# AHLT - Lab - DDI ML

Authors: Ricard Monge (group 12) and Cristina Capdevila (group 10)

This notebook contains the deliverables for the AHLT Lab DDI Machine Learning assignment.
The notebook contains the following sections:

- [Feature extractor function *extract features*](#features)
    - [Depdency tree utility functions](#dependency)
- [Learner function *learner*](#learner)
    - [Get features and labels utility function](#get_features)
- [Classifier function *classifier*](#classifier)
    - [Output generator utility function](#output)
- [Model comparison on Devel dataset](#dev_table_results)
- [Model comparison on Test dataset](#test_table_results)
- [Conclusion](#conclusion)

We do not present the *analyze* function as it is the same as for the previous task *DDI_baseline*.


<a id='features'></a>
## Feature extractor function *extract_features*

To improve upon the baseline DDI classification we devise a set of features with used to train the classifiers to detect Drug Drug Interactions (DDI).

We employ a set of utility functions to extract information from the sentence analysis depdenency tree, [see here](#dependency).

Finally, we come up with 5 types of features:

- **Lemma features**, which relate to the appearence of certain lemmas inside the sentence's tree. For instance the appearance of specific lemmas associated with each DDI type, the appearence of modal verbs (characteristic of advise DDI) or the relation with each entitiy ancestor verb with the DDI type assocated lemmas.

- **Dependency features**, which relate to the dependencies on the dependency tree associated with each entity and its ancestors. For instance, which dependency the entity has with their head, i.e. entities which are direct obejcts *dobj* nominal subjects *nsubj*, etc or the string of connected depdencies of the entity until the root of the sentence.

- **Common ancestor features**, which relate to the properties of the first common node in each entity's ancestors, such as its dependencies and tags, or its distance to the entities and distance to the root node.

- **Tree addres features**, which check the existence of certain tree relationships withtin the entity, such as entities wich act as a direct object related to entites that act as a nominal modifier, which may be effects/mechanics DDI characteristics.

- **NER features**, features related to the properties of the entity's lemma, such as the 2 or 3 prefix/sufix or the jaccard or edit distance between entities. This could help identify the types of drugs involved and thus the DDI.

Given that the **MaxEnt** classifier allows for selection of feature columns on training, we decide to use for it only a subset of the features presented. We specified the final selected features in the following section.

In [11]:
def extract_features(analysis, entities, e1, e2):
    """
    Extract Features.
    Function which receives an analyzed sentence tree, the entities
    present in the sentence, and the ids of the two target entities and returns
    a list of features to pass to a ML model to predict DDI.
    Args:
        - analysis: DependencyGraph object instance with sentence parsed
            information.
        - entities: dictionary of entities indexed by id with offset as value.
        - e1: string with id of the first entity to consider.
        - e2: string with id of the second entity to consider.
    Return:
        - feats: list of features extracted from the tree and e1, e2.
    """
    feats = []

    # Get entity nodes from tree
    n1 = get_entity_node(analysis, entities, e1)
    n2 = get_entity_node(analysis, entities, e2)

    # Get verb ancestor from entities
    v1 = get_verb_ancestor(analysis, n1)
    v2 = get_verb_ancestor(analysis, n2)

    # Get ancestors nodes list for entity nodes and verb nodes
    ance1 = get_ancestors(analysis, n1)
    ance2 = get_ancestors(analysis, n2)
    ancev1 = get_ancestors(analysis, v1)
    ancev2 = get_ancestors(analysis, v2)

    # DDI-type characteristic lemmas
    advise_lemmas = ["administer", "use", "recommend", "consider", "approach",
                     "avoid", "monitor", "advise", "require", "contraindicate"]
    effect_lemmas = ["increase", "report", "potentiate", "enhance", "decrease",
                     "include", "result", "reduce", "occur", "produce",
                     "prevent", "effect"]
    int_lemmas = ["interact", "interaction"]
    mechanism_lemmas = ["reduce", "increase", "decrease"]
    
    # Mix lemmas
    mix_lemmas = list(set(
        advise_lemmas + effect_lemmas + int_lemmas + mechanism_lemmas))
    
    # Modal verbs lemmas
    modal_vb = ["can", "could", "may", "might", "must", "will", "would",
                "shall", "should"]

    # Modal verbs and DDI-type lemmas present in sentence
    modal_present = check_lemmas(analysis, modal_vb)
    lemma_present = check_lemmas(analysis, mix_lemmas)
    advise_present = check_lemmas(analysis, advise_lemmas)
    effect_present = check_lemmas(analysis, effect_lemmas)
    int_present = check_lemmas(analysis, int_lemmas)
    mechanism_present = check_lemmas(analysis, effect_lemmas)

    # e1<-*-VB Verb is part DDI-type lemmas
    advise_v1 = True if v1["lemma"] in advise_lemmas else "null"
    effect_v1 = True if v1["lemma"] in effect_lemmas else "null"
    int_v1 = True if v1["lemma"] in int_lemmas else "null"
    mechanism_v1 = True if v1["lemma"] in mechanism_lemmas else "null"
    # e2<-*-VB Verb is part DDI-type lemmas
    advise_v2 = True if v2["lemma"] in advise_lemmas else "null"
    effect_v2 = True if v2["lemma"] in effect_lemmas else "null"
    int_v2 = True if v2["lemma"] in int_lemmas else "null"
    mechanism_v2 = True if v2["lemma"] in mechanism_lemmas else "null"


    # Check if entities hang from the same verb
    v1_lemma = v1["lemma"]  # NOT USED
    v2_lemma = v2["lemma"]  # NOT USED
    v1_equal_v2 = v1 == v2

    # Get head dependencies
    e1_rel = n1["rel"]
    e2_rel = n2["rel"]
    v1_rel = v1["rel"]
    v2_rel = v2["rel"]

    # Get node dependencies from all its ancestors
    e1_deps = "_".join(n1["deps"].keys()) if len(n1["deps"]) else "null"
    e2_deps = "_".join(n2["deps"].keys()) if len(n2["deps"]) else "null"
    v1_deps = "_".join(v1["deps"].keys()) if len(v1["deps"]) else "null"
    v2_deps = "_".join(v2["deps"].keys()) if len(v2["deps"]) else "null"
    ance1_deps = "_".join([a["rel"] for a in ance1]) if len(ance1) else "null"
    ance2_deps = "_".join([a["rel"] for a in ance2]) if len(ance2) else "null"

    # Get node order
    e1_over_e2 = n1 in ance2
    e2_over_e1 = n2 in ance1  # NOT USED
    v1_over_v2 = v1 in ancev2
    v2_over_v1 = v2 in ancev1

    # Common ancestor features
    common = ([n for n in ance1 if n in ance2] if len(ance1) > len(ance2) else
              [n for n in ance2 if n in ance1])
    common_rel = common[0]["rel"] if len(common) else "null"
    common_deps = ("_".join(common[0]["deps"].keys())
                   if len(common) and len(common[0]["deps"]) else "null")
    common_tag = common[0]["tag"] if len(common) else "null"
    common_tag = dict_tags[common_tag]
    common_dist_root = (len(ance1) - 1 - ance1.index(common[0])
                        if len(common) else 99)
    common_dist_e1 = ance1.index(common[0]) if len(common) else 99
    common_dist_e2 = ance2.index(common[0]) if len(common) else 99

    # Common ancestor son's rel for each entity's branch
    common_dep11_rel = (
        ance1[ance1.index(common[0]) - 1]["rel"]
        if len(common) and ance1.index(common[0]) > 0 else "null")
    common_dep12_rel = (
        ance1[ance1.index(common[0]) - 2]["rel"]
        if len(common) and ance1.index(common[0]) > 1 else "null")
    common_dep13_rel = (
        ance1[ance1.index(common[0]) - 3]["rel"]
        if len(common) and ance1.index(common[0]) > 2 else "null")
    common_dep21_rel = (
        ance2[ance2.index(common[0]) - 1]["rel"]
        if len(common) and ance2.index(common[0]) > 0 else "null")
    common_dep22_rel = (
        ance2[ance2.index(common[0]) - 2]["rel"]
        if len(common) and ance2.index(common[0]) > 1 else "null")
    common_dep23_rel = (
        ance2[ance2.index(common[0]) - 3]["rel"]
        if len(common) and ance2.index(common[0]) > 2 else "null")

    # Common ancestor son's tag for each entity's branch
    common_dep11_tag = (
        dict_tags[ance1[ance1.index(common[0]) - 1]["tag"]]
        if len(common) and ance1.index(common[0]) > 0 else "null")

    common_dep22_tag = (
        dict_tags[ance2[ance2.index(common[0]) - 2]["tag"]]
        if len(common) and ance2.index(common[0]) > 1 else "null")

    # Tree address features
    # e1<-conj-x<-dobj-VB-nmod->e2
    e2_nmod = get_dependency_address(v2, "nmod") == n2["address"]
    x_dobj = get_dependency_address(v1, "dobj")
    nx = analysis.nodes[x_dobj] if x_dobj != -1 else v1
    e1_conj_dobj = get_dependency_address(nx, "conj") == n1["address"]
    e1_conj_dobj_nmod_e2 = e1_conj_dobj and e2_nmod  # NOT USED

    # NER features
        
    # Entity lemma features
    lemma1 = str(n1["lemma"])  # NOT USED
    lemma2 = str(n2["lemma"])  # NOT USED
    
    # 3-Prefix/Suffix from lemma
    pre3_1 = lemma1[:3].lower()
    pre3_2 = lemma2[:3].lower()
    suf3_1 = lemma1[-3:].lower()
    suf3_2 = lemma2[-3:].lower()
    
    # Number of capitals in token
    capitals1 = sum(i.isupper() for i in lemma1)  # NOT USED
    capitals2 = sum(i.isupper() for i in lemma2)  # NOT USED

    # Gather variables
    feats = [
        modal_present,  
        lemma_present,
        advise_present,
        effect_present,
        int_present,
        mechanism_present,  
        advise_v1,
        effect_v1,
        int_v1,
        mechanism_v1,
        advise_v2,  
        effect_v2,
        int_v2,
        mechanism_v2,
        v1_equal_v2,
        e1_rel,
        e2_rel,
        v1_rel,
        v2_rel,
        e1_deps,  
        e2_deps,
        e1_over_e2,
        v1_over_v2,
        v2_over_v1,
        common_rel,  
        common_tag,
        common_dist_root,
        common_dist_e1,
        common_dist_e2,
        common_deps, 
        common_dep11_rel,
        common_dep12_rel,
        common_dep13_rel,
        common_dep21_rel,
        common_dep22_rel, 
        common_dep23_rel,
        common_dep11_tag,
        common_dep22_tag,
        v1_deps,
        v2_deps,  
        ance1_deps,
        ance2_deps,
        pre3_1,
        pre3_2,
        suf3_1,
        suf3_2,
    ]
    # Turn variables f to categorical var_i=f
    feats = [f"var_{i}={f}" for i, f in enumerate(feats)]
    return feats

<a id="dependency"></a>
### Depdency tree utility functions

In order to analyse and extract the mentioend features from the dependency tree we build a series of utility functions to extract relations:

- **Get entity node** function which retrieves the node corresponding with a given entity.

- **Get verb ancestor** function which retrieves the first ancestor of type verb for a given entity and a given sentence analysis.

- **Get dependency address** function which returns the address of a certain dependency for a given node, or -1 if the node has no such dependency.

- **Check lemmas** function which checks if the words in the sentence contain the given lemmas, and returns the found lemma with the highest position in the depdency tree.

- **Get ancestors** function which retrieves the list of ancestor nodes from the given node.

In [2]:
def get_entity_node(analysis, entities, entity):
    """
    Get Entity Node.
    Function which finds the node in the Dependency Tree which corresponds to
    the root of the entity.
    Args:
        - analysis: DependencyTree object instance with sentence analysis.
        - entities: dictionary with entity information.
        - entity: string with id of entity to get.
    Returns:
        - node: dictionary with node from DependencyTree.
    """
    # Get nodes list
    nodes = [analysis.nodes[k] for k in analysis.nodes]
    ents = entities[entity]["text"].split()
    # Capture possible tree nodes containing or that are contained in entity
    possible = sorted(
        [node for node in nodes if node["word"] is not None and
         any(ent in node["word"] for ent in ents)],
        key=lambda x: x["head"])
    node = possible[0] if len(possible) else nodes[0]
    return node


def get_verb_ancestor(analysis, node):
    """
    Get Verb Ancestor.
    Function which looks in the node's antecessor nodes inthe analysis tree
    until it finds a verb VB, and returns such verb.
    Args:
        - analysis: DependencyTree object instance with sentence analysis.
        - node: dictionary with node to start from.
    Return:
        - node: dictionary with verb antecessor node from DependencyTree.
    """
    nodes = analysis.nodes
    while node["tag"] != "TOP" and "VB" not in node["tag"]:
        node = nodes[node["head"]]
        if not node["tag"]:
            break
    return node


def get_dependency_address(node, dependency):
    """
    Get Dependency Address.
    Function which returns the address of a given dependency for a given node,
    or a non tractable value -1, which always evaluates to False in the
    features. To use when extracting features.
    Args:
        - node: dictionary with node to look dependencies from.
        - dependency: string with dependency name to look for in node.
    Return:
        - _: string with address of found dependency, or -1 if not found.
    """
    dep = node["deps"][dependency]
    # If dependency exists, return address
    # If dependency does not exist, return non-value
    return dep[0] if len(dep) else -1


def check_lemmas(analysis, lemmas):
    """
    Check Lemmas.
    Function which checks if the words in the sentence contain the given
    lemmas. Then returns the tree-higher encountered lemma, or "null" if none
    found.
    Args:
        - analysis: DependencyTree object instance with sentence analysis.
        - lemmas: list of strings with lemmas to check.
    Returns:
        - _: string with present lemma or None.
    """
    nds = analysis.nodes
    present = [nds[n] for n in nds
               if (nds[n]["word"] is not None and nds[n]["lemma"] in lemmas)]
    present = sorted(present, key=lambda x: x["head"])
    # return present[0]["lemma"] if len(present) else "null"
    return "True" if len(present) else "False"


def get_ancestors(analysis, node):
    """
    Get Ancestors.
    Function which returns the given node's ancestor nodes.
    Args:
        - analysis: DependencyTree object instance with sentence analysis.
        - node: dictionary with node to start from.
    Return:
        - node: dictionary with verb antecessor node from DependencyTree.
    """
    ancs = []
    nds = analysis.nodes
    while node["tag"] and node["tag"] != "TOP":
        ancs.append(node)
        node = nds[node["head"]]
    return ancs


<a id='learner'></a>
## Learner function *learner*

The learner function takes the generated training features and the gold class labels for each entity pair together with the selected **model** and trains the selected classifer, then saves it for later use.

We decide to try four different classifiers, which we think are able to capture the relations between selected features to detect DDI and their types.

- **Maximum Entropy classifier** (**MaxEnt**), using the [MEGAM](http://users.umiacs.umd.edu/~hal/megam/version0_3/) optimizer package through command line executable abd a subset of the features used for the other classifiers.

- **Multi-layer Perceptron Classifier** (**MLP**), through its implementation in *Sklearn* Python package.
   
- **Support Vector Classification** (**SVC**), through its implementation in *Sklearn* Python package.

- **Logistic Regression** (**LR**), through its implementation in *Sklearn* Python package.

After trying different configurations for the different classifiers, we decide for each classifier a set of hyper-parameters that give best results for the Devel dataset. Similarly, for the MaxEnt classifier we select a subset of features. Furthermore, for the *Sklearn* classifiers, we use One-Hot-Encoding to turn the categorical features into binary sets of features.

The learner function, as well as the later mentioned classifer function, uses the [get_features_labels](#get_features) utility function to extract the sentence and entity ids, the generated features and gold standard class labels.

In [1]:
## Hyper-parameters
random_seed = 10

# MaxEnt params
feat_col = "4-10,12,16-46,48"

# MLP params
hidden_layer_sizes = 
alpha = 1
activation = "relu"
solver = "adam"
n_epochs = 100
early_stopping = True
verbose = False

# LR params
C = 1e6
multi_class = 'ovr'
penalty = 'l2'
max_iter = 1000
lr_solver = 'lbfgs'
n_jobs = -1
lr_verbose = 0

def learner(model, feature_input, output_fn):
    """
    Learner.
    Function which calls the learner with a given feature filename and an
    output filename to save model to.
    Args:
        - model: string with model type to use.
        - feature_input: string with filename of the file to extract features
            from to fit the model.
        - output_fn: string with filename of output file for trained model.
    """
    if model == "MaxEnt":
        # MaxEnt learner flow
        megam_features = f"{tmp_path}/megam_train_features.dat"
        megam_model = f"{output_fn}.megam"
        system(f"cat {feature_input}  | cut -f {feat_col} > \
            {megam_features}")
        system(f"./{megam} -quiet -nc -nobias multiclass \
            {megam_features} > {megam_model}")

    elif model == "MLP":
        _, x_cat, y = get_features_labels(feature_input)
        # OneHotEncode variables
        encoder = OneHotEncoder(handle_unknown="ignore")
        encoder.fit(x_cat)
        x = encoder.transform(x_cat)
        # Create MLP instance
        model = MLPClassifier(
            hidden_layer_sizes=hidden_layer_sizes,
            activation=activation,
            solver=solver,
            max_iter=n_epochs,
            early_stopping=early_stopping,
            random_state=random_seed,
            verbose=verbose)
        # Train MLP instance
        model.fit(x, y)
        # Save model to pickle
        with open(f"{output_fn}.MLP", "wb") as fp:
            pickle.dump([model, encoder], fp)

    elif model == "SVC":
        _, x_cat, y = get_features_labels(feature_input)
        # OneHotEncode variables
        encoder = OneHotEncoder(handle_unknown="ignore")
        encoder.fit(x_cat)
        x = encoder.transform(x_cat)
        # Create SVC instance
        model = SVC(random_state=random_seed)
        # Train SVC instance
        model.fit(x, y)
        # Save model to pickle
        with open(f"{output_fn}.SVC", "wb") as fp:
            pickle.dump([model, encoder], fp)

    elif model == "LR":
        _, x_cat, y = get_features_labels(feature_input)
        # OneHotEncode variables
        encoder = OneHotEncoder(handle_unknown="ignore")
        encoder.fit(x_cat)
        x = encoder.transform(x_cat)
        # Create LR instance
        model = LR(
            C=C,
            multi_class=multi_class,
            penalty=penalty,
            max_iter=max_iter,
            solver=lr_solver,
            random_state=random_seed,
            n_jobs=n_jobs,
            verbose=lr_verbose)
        # Train LR instance
        model.fit(x, y)
        # Save model to pickle
        with open(f"{output_fn}.LR", "wb") as fp:
            pickle.dump([model, encoder], fp)

    else:
        print(f"[ERROR] Model {model} not implemented")
        raise NotImplementedError


<a id='get_features'></a>
### Get features and labels

Utiltiy function to extract sentence id, entities id for each entity of a given pair, generated features and gold standard DDI type from the given **input** file.

In [2]:
def get_features_labels(input):
    """
    Get Features & Labels.
    Function which opens the given filename and extracts the feature and label
    vectors, togehter with the sentence and pair entities ids.
    Args:
        - input: string with filename of file to extract features from.
    Returns:
        - ids: list of lists with sentence id and entity pairs ids.
        - feats: list of lists with binary feature vector.
        - labels: list of labels for each entity pair, for the trainer to use.
    """
    with open(input, "r") as fp:
        lines = fp.read()
    pairs = [sent.split("\t") for sent in lines.split("\n")[:-1]]
    ids = []
    labels = []
    feats = []
    for p in pairs:
        ids.append((p[0], p[1], p[2]))
        labels.append(p[3])
        feat = [elem.split("=")[1] for elem in p[4:]]
        feats.append(feat)
    return ids, feats, labels


<a id='classifier'></a>
## Classifier function *classifier*

The classifier function takes the generated features for the data and the trained model, according to the **model** parameter and outputs the predictions given by the model. The different prediction formats of each model type are normalized into the same format and finally passed onto the [ouput_features](#output) function.

This function uses the [get_features_labels](#get_features) utility function to extract the sentence and entity ids, the generated features and gold standard class labels.

In [3]:
def classifier(model, feature_input, model_input, outputfile):
    """
    Classifier.
    Function which retrived a trainer model and predicts the output for a given
    validation set features file, to print output to another file.
    Args:
        - model: string with model type to use.
        - feature_input: string with filename of the file to extract features
            from to validate the model.
        - outputfile: string with filename of output file for validation
            predictions.
    """
    # Retrieve sentences, entities and feature vectos
    ids, x, _ = get_features_labels(feature_input)
    if model == "MaxEnt":
        # MaxEnt classifier flow
        megam_features = f"{tmp_path}/megam_valid_features.dat"
        megam_predictions = f"{tmp_path}/megam_predictions.dat"
        system(f"cat {feature_input} | cut -f {feat_col} > \
            {megam_features}")
        # system(f"cat {feature_input} | cut -f4- > \
        #     {megam_features}")
        system(f"./{megam} -quiet -nc -nobias -predict {model_input}.megam \
            multiclass {megam_features} > {megam_predictions}")
        with open(megam_predictions, "r") as fp:
            lines = fp.readlines()
        predictions = [line.split("\t")[0] for line in lines]

    elif model == "MLP":
        # Retrieve model
        with open(f"{model_input}.MLP", "rb") as fp:
            model, encoder = pickle.load(fp)
        # OneHotEncode variables
        x_ = encoder.transform(x)
        # Predict classes
        predictions = model.predict(x_)

    elif model == "SVC":
        # Retrieve model
        with open(f"{model_input}.SVC", "rb") as fp:
            model, encoder = pickle.load(fp)
        # OneHotEncode variables
        x_ = encoder.transform(x)
        # Predict classes
        predictions = model.predict(x_)

    elif model == "GBC":
        # Retrieve model
        with open(f"{model_input}.GBC", "rb") as fp:
            model, encoder = pickle.load(fp)
        # OneHotEncode variables
        x_ = encoder.transform(x)
        # Predict classes
        predictions = model.predict(x_)

    elif model == "LR":
        # Retrieve model
        with open(f"{model_input}.LR", "rb") as fp:
            model, encoder = pickle.load(fp)
        # OneHotEncode variables
        x_ = encoder.transform(x)
        # Predict classes
        predictions = model.predict(x_)

    else:
        print(f"[ERROR] Model {model} not implemented")
        raise NotImplementedError

    # Ouput entites for each sentence
    with open(outputfile, "w") as outf:
        for (id, id_e1, id_e2), type in zip(ids, predictions):
            output_ddi(id, id_e1, id_e2, type, outf)


<a id='output'></a>
### Output generator utility function

This function recieves the token list  **tokens** for each sentence, identified by the **id** parameter, the ids of each entity of the considered pairs (**e1**,**e2**) in the given sentence, their predicted classes **type** of interaction and the list of extracted **features**. Then it outputs the correspondng line to write in the output features file object **outf**.

In [1]:
def output_features(id, e1, e2, type, features, out):
    """
    Output Features.
    Function which outputs to the given opened file object the entity pair
    specified with the features extracted from their sentence.
    Args:
        - id: string with sentence id.
        - e1: string with id of the first entity to consider.
        - e2: string with id of the second entity to consider.
        - type: string with gold class of DDI, for use in training.
        - features: list of extracted features from sentence tree.
        - outf: file object with opened file for writing output features.
    """
    feature_str = "\t".join(features)
    txt = f"{id}\t{e1}\t{e2}\t{type}\t{feature_str}\n"
    out.write(txt)


<a id='dev_table_results'></a>
## Model comparison  on Devel dataset

Model comparison:

|model|prec|recall|F1|
|--|--|--|--|
|MaxEnt|0.6452|0.3821|0.4800|
|--|--|--|--|
|**MLP**|**0.5891**|**0.4601**|**0.5166**|
|--|--|--|--|
|SVC|0.4936|0.3521|0.411|
|--|--|--|--|
|**LR**|**0.4905**|**0.5530**|**0.5199**|


We see that the **MLP** and **LR** models have the highest *F1* scores, with higher precision for the **MLP** and higher recall and slightly higher *F1* for the **LR** model.

Conversely, the maximum precision score is achieved with the **MaxEnt** classifier.

Here are the evaluator outputs for the four models:

```
SCORES FOR THE GROUP: ML_MaxEnt RUN=1
Gold Dataset: /Devel

Partial Evaluation: only detection of DDI (regadless to the type)
tp	fp	fn	total	prec	recall	F1
216	88	268	484	0.7105	0.4463	0.5482


Detection and Classification of DDI
tp	fp	fn	total	prec	recall	F1
158	146	326	484	0.5197	0.3264	0.401


________________________________________________________________________

SCORES FOR DDI TYPE
Scores for ddi with type mechanism
tp	fp	fn	total	prec	recall	F1
38	37	163	201	0.5067	0.1891	0.2754


Scores for ddi with type effect
tp	fp	fn	total	prec	recall	F1
72	76	90	162	0.4865	0.4444	0.4645


Scores for ddi with type advise
tp	fp	fn	total	prec	recall	F1
47	33	72	119	0.5875	0.395	0.4724


Scores for ddi with type int
tp	fp	fn	total	prec	recall	F1
1	0	1	2	1	0.5	0.6667


MACRO-AVERAGE MEASURES:
	P	R	F1
	0.6452	0.3821	0.48
________________________________________________________________________

```

```
SCORES FOR THE GROUP: ML_MLP RUN=1
Gold Dataset: /Devel

Partial Evaluation: only detection of DDI (regadless to the type)
tp	fp	fn	total	prec	recall	F1
278	211	206	484	0.5685	0.5744	0.5714


Detection and Classification of DDI
tp	fp	fn	total	prec	recall	F1
213	276	271	484	0.4356	0.4401	0.4378


________________________________________________________________________

SCORES FOR DDI TYPE
Scores for ddi with type mechanism
tp	fp	fn	total	prec	recall	F1
63	91	138	201	0.4091	0.3134	0.3549


Scores for ddi with type effect
tp	fp	fn	total	prec	recall	F1
101	143	61	162	0.4139	0.6235	0.4975


Scores for ddi with type advise
tp	fp	fn	total	prec	recall	F1
48	42	71	119	0.5333	0.4034	0.4593


Scores for ddi with type int
tp	fp	fn	total	prec	recall	F1
1	0	1	2	1	0.5	0.6667


MACRO-AVERAGE MEASURES:
	P	R	F1
	0.5891	0.4601	0.5166
________________________________________________________________________

```

```
SCORES FOR THE GROUP: ML_SVC RUN=1
Gold Dataset: /Devel

Partial Evaluation: only detection of DDI (regadless to the type)
tp	fp	fn	total	prec	recall	F1
252	80	232	484	0.759	0.5207	0.6176


Detection and Classification of DDI
tp	fp	fn	total	prec	recall	F1
217	115	267	484	0.6536	0.4483	0.5319


________________________________________________________________________

SCORES FOR DDI TYPE
Scores for ddi with type mechanism
tp	fp	fn	total	prec	recall	F1
58	35	143	201	0.6237	0.2886	0.3946


Scores for ddi with type effect
tp	fp	fn	total	prec	recall	F1
97	55	65	162	0.6382	0.5988	0.6178


Scores for ddi with type advise
tp	fp	fn	total	prec	recall	F1
62	25	57	119	0.7126	0.521	0.6019


Scores for ddi with type int
tp	fp	fn	total	prec	recall	F1
0	0	2	2	0	0	0


MACRO-AVERAGE MEASURES:
	P	R	F1
	0.4936	0.3521	0.411
________________________________________________________________________

```

```
SCORES FOR THE GROUP: ML_LR RUN=1
Gold Dataset: /Devel

Partial Evaluation: only detection of DDI (regadless to the type)
tp	fp	fn	total	prec	recall	F1
251	158	233	484	0.6137	0.5186	0.5622


Detection and Classification of DDI
tp	fp	fn	total	prec	recall	F1
192	217	292	484	0.4694	0.3967	0.43


________________________________________________________________________

SCORES FOR DDI TYPE
Scores for ddi with type mechanism
tp	fp	fn	total	prec	recall	F1
53	81	148	201	0.3955	0.2637	0.3164


Scores for ddi with type effect
tp	fp	fn	total	prec	recall	F1
91	103	71	162	0.4691	0.5617	0.5112


Scores for ddi with type advise
tp	fp	fn	total	prec	recall	F1
46	31	73	119	0.5974	0.3866	0.4694


Scores for ddi with type int
tp	fp	fn	total	prec	recall	F1
2	2	0	2	0.5	1	0.6667


MACRO-AVERAGE MEASURES:
	P	R	F1
	0.4905	0.553	0.5199
________________________________________________________________________
```

<a id='test_table_results'></a>
## Model comparison  on Test-DDI dataset

Model comparison:

|model|prec|recall|F1|
|--|--|--|--|
|**MaxEnt**|**0.5386**|**0.3231**|**0.4039**|
|--|--|--|--|
|MLP|0.3678|0.3823|0.3749|
|--|--|--|--|
|**SVC**|**0.6114**|**0.4013**|**0.4845**|
|--|--|--|--|
|LR|0.4489|0.3596|0.3993|

We see that opposite to the Devel dataset results, the **MaxEnt** and **SVC** models have the highest *F1* scores, with higher precision and *F1* for the **SVC**.

Contrary, the models with higher *F1* score in the Devel dataset **MaxEnt** and **LR** have greater generalization error and thus much lower performance in the Test dataset. This shows these models were overfitted.

Here are the evaluator outputs for the four models:

```
SCORES FOR THE GROUP: ML_MaxEnt RUN=2
Gold Dataset: /Test-DDI

Partial Evaluation: only detection of DDI (regadless to the type)
tp	fp	fn	total	prec	recall	F1
426	177	553	979	0.7065	0.4351	0.5386


Detection and Classification of DDI
tp	fp	fn	total	prec	recall	F1
296	307	683	979	0.4909	0.3023	0.3742


________________________________________________________________________

SCORES FOR DDI TYPE
Scores for ddi with type mechanism
tp	fp	fn	total	prec	recall	F1
53	53	249	302	0.5	0.1755	0.2598


Scores for ddi with type effect
tp	fp	fn	total	prec	recall	F1
125	173	235	360	0.4195	0.3472	0.3799


Scores for ddi with type advise
tp	fp	fn	total	prec	recall	F1
78	62	143	221	0.5571	0.3529	0.4321


Scores for ddi with type int
tp	fp	fn	total	prec	recall	F1
40	19	56	96	0.678	0.4167	0.5161


MACRO-AVERAGE MEASURES:
	P	R	F1
	0.5386	0.3231	0.4039
________________________________________________________________________
```

```
SCORES FOR THE GROUP: ML_MLP RUN=2
Gold Dataset: /Test-DDI

Partial Evaluation: only detection of DDI (regadless to the type)
tp	fp	fn	total	prec	recall	F1
563	533	416	979	0.5137	0.5751	0.5427


Detection and Classification of DDI
tp	fp	fn	total	prec	recall	F1
380	716	599	979	0.3467	0.3882	0.3663


________________________________________________________________________

SCORES FOR DDI TYPE
Scores for ddi with type mechanism
tp	fp	fn	total	prec	recall	F1
92	212	210	302	0.3026	0.3046	0.3036


Scores for ddi with type effect
tp	fp	fn	total	prec	recall	F1
163	346	197	360	0.3202	0.4528	0.3751


Scores for ddi with type advise
tp	fp	fn	total	prec	recall	F1
90	99	131	221	0.4762	0.4072	0.439


Scores for ddi with type int
tp	fp	fn	total	prec	recall	F1
35	59	61	96	0.3723	0.3646	0.3684


MACRO-AVERAGE MEASURES:
	P	R	F1
	0.3678	0.3823	0.3749
________________________________________________________________________

```

```
SCORES FOR THE GROUP: ML_SVC RUN=2
Gold Dataset: /Test-DDI

Partial Evaluation: only detection of DDI (regadless to the type)
tp	fp	fn	total	prec	recall	F1
533	155	446	979	0.7747	0.5444	0.6395


Detection and Classification of DDI
tp	fp	fn	total	prec	recall	F1
396	292	583	979	0.5756	0.4045	0.4751


________________________________________________________________________

SCORES FOR DDI TYPE
Scores for ddi with type mechanism
tp	fp	fn	total	prec	recall	F1
92	55	210	302	0.6259	0.3046	0.4098


Scores for ddi with type effect
tp	fp	fn	total	prec	recall	F1
161	161	199	360	0.5	0.4472	0.4721


Scores for ddi with type advise
tp	fp	fn	total	prec	recall	F1
108	59	113	221	0.6467	0.4887	0.5567


Scores for ddi with type int
tp	fp	fn	total	prec	recall	F1
35	17	61	96	0.6731	0.3646	0.473


MACRO-AVERAGE MEASURES:
	P	R	F1
	0.6114	0.4013	0.4845
________________________________________________________________________
```

```
SCORES FOR THE GROUP: ML_LR RUN=2
Gold Dataset: /Test-DDI

Partial Evaluation: only detection of DDI (regadless to the type)
tp	fp	fn	total	prec	recall	F1
516	349	463	979	0.5965	0.5271	0.5597


Detection and Classification of DDI
tp	fp	fn	total	prec	recall	F1
343	522	636	979	0.3965	0.3504	0.372


________________________________________________________________________

SCORES FOR DDI TYPE
Scores for ddi with type mechanism
tp	fp	fn	total	prec	recall	F1
81	180	221	302	0.3103	0.2682	0.2877


Scores for ddi with type effect
tp	fp	fn	total	prec	recall	F1
137	250	223	360	0.354	0.3806	0.3668


Scores for ddi with type advise
tp	fp	fn	total	prec	recall	F1
87	59	134	221	0.5959	0.3937	0.4741


Scores for ddi with type int
tp	fp	fn	total	prec	recall	F1
38	33	58	96	0.5352	0.3958	0.4551


MACRO-AVERAGE MEASURES:
	P	R	F1
	0.4489	0.3596	0.3993
________________________________________________________________________

```

<a id="conclusion"></a>
## Conclusion

To conclude, we would recommend to use **MaxEnt** or the **SVC** classiier for further DDI detection with the selected features. They have been the models wich have presented less difference between the Devel and Test results.