# AHLT - Second delivery - Task 9.2 DDI
**Albert Rial**   
**Karen Lliguin**   

This delivery addresses Task 9.2 of the SemEval-2013 challenge, which consists of classifying drug-drug interactions (DDI) between pairs of drugs.

The dataset consists of XML files containing sentences, the drug entities that appear in them, and the interaction between each pair of entities. There are four interaction types: mechanism, int, effect and advise. The data is already split into three subsets: Train, Devel and Test.
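As an illustration of this input format, the sketch below reads one sentence with its entities and pairs. The element and attribute names (`sentence`, `entity`, `pair`, `charOffset`, `e1`, `e2`, `ddi`, `type`) are assumed from the public DDI corpus layout and are not shown in this report. The sample XML is made up, and `read_sentences` is a hypothetical helper. The `(starts, ends)` string layout is inferred from how `get_entity_nodes` reads its `entities` argument.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the layout assumed for the DDI corpus files.
SAMPLE = """
<document id="d0">
  <sentence id="d0.s0" text="Aspirin may increase the effect of warfarin.">
    <entity id="d0.s0.e0" charOffset="0-6" type="drug" text="Aspirin"/>
    <entity id="d0.s0.e1" charOffset="35-42" type="drug" text="warfarin"/>
    <pair id="d0.s0.p0" e1="d0.s0.e0" e2="d0.s0.e1" ddi="true" type="effect"/>
  </sentence>
</document>
"""


def read_sentences(xml_text):
    """Yield (text, entities, pairs) for every sentence in the XML string."""
    root = ET.fromstring(xml_text.strip())
    for sent in root.iter("sentence"):
        entities = {}
        for ent in sent.findall("entity"):
            # Discontinuous entities list several "start-end" spans separated by ';'
            spans = [span.split("-") for span in ent.get("charOffset").split(";")]
            starts = ";".join(start for start, _ in spans)
            ends = ";".join(end for _, end in spans)
            entities[ent.get("id")] = (starts, ends)
        pairs = []
        for pair in sent.findall("pair"):
            # Pairs without an interaction get the "null" label
            label = pair.get("type") if pair.get("ddi") == "true" else "null"
            pairs.append((pair.get("e1"), pair.get("e2"), label))
        yield sent.get("text"), entities, pairs


for text, entities, pairs in read_sentences(SAMPLE):
    print(text, entities, pairs)
```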

To complete this task, several methods and resources have been used, among them the Stanford CoreNLP dependency parser. We have divided the task into different subgoals.

## Goal 1: Rule-based DDI
### Introduction

First, simple heuristic rules are used to carry out the task. In this version, only the information given by the Train dataset is used. The final goal is to achieve an overall F1 score of at least 0.15 on the Devel dataset.

### Data exploration


In order to build meaningful rules (and features for the next goal), we first explored the Test dataset. The following aspects have been analysed for each interaction type:
- The most common words that appear between the two drugs that interact (clue_words).
- The most common words that appear in each sentence containing drugs that interact (sentence_words).
- The most common words in the dependency tree of the sentence under which entity1 appears (e1_under).
- The most common words in the dependency tree of the sentence under which entity2 appears (e2_under).
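A minimal sketch of how the counts for the first metric could be collected. It assumes whitespace tokenization and inclusive character offsets; the function names and the two example sentences are hypothetical and not taken from the dataset.

```python
from collections import Counter, defaultdict


def words_between(text, e1_end, e2_start):
    """Lower-cased whitespace tokens strictly between two entity spans."""
    return text[e1_end + 1:e2_start].lower().split()


def count_clue_words(examples):
    """Count in-between words per interaction type.

    examples: iterable of (text, e1_end, e2_start, ddi_type) tuples.
    """
    counts = defaultdict(Counter)
    for text, e1_end, e2_start, ddi_type in examples:
        counts[ddi_type].update(words_between(text, e1_end, e2_start))
    return counts


# Invented sentences, used only to exercise the functions.
examples = [
    ("Aspirin may increase the effect of warfarin.", 6, 35, "effect"),
    ("Aspirin should not be taken with warfarin.", 6, 33, "advise"),
]

clue_words = count_clue_words(examples)
for ddi_type, counter in clue_words.items():
    print(ddi_type, counter.most_common(3))
```

The same pattern would apply to the other metrics by swapping `words_between` for a function that returns the sentence words or the head word of an entity.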

In addition, to get a clearer understanding of how each word is distributed across the interaction types, we stored the previous information so that a word can be looked up under any of the metrics above, showing how many times it appears for each type. For instance, given the word "effect" and the metric "e1_under", the following information is obtained:
```
--------------------
effect
48
1525
--------------------
mechanism
16
1118
--------------------
int
0
186
--------------------
advise
2
707
--------------------
none
307
21553
```
This insight is used to build the rules.
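The report does not show how the stored counts are queried, so the following is only a sketch of one possible lookup. It assumes the counts for a metric are kept as one `Counter` per interaction type. The names `lookup` and `share` and the toy numbers are invented for illustration.

```python
from collections import Counter


def lookup(word, counts_by_type):
    """Return {ddi_type: occurrences of `word`} for one metric."""
    return {ddi_type: counter[word] for ddi_type, counter in counts_by_type.items()}


def share(word, counts_by_type):
    """Fraction of the word's occurrences that fall in each type."""
    per_type = lookup(word, counts_by_type)
    total = sum(per_type.values())
    if total == 0:
        return {ddi_type: 0.0 for ddi_type in per_type}
    return {ddi_type: count / total for ddi_type, count in per_type.items()}


# Toy counts for a single metric; the numbers are invented.
e1_under = {
    "effect": Counter({"effect": 3, "increase": 2}),
    "mechanism": Counter({"absorption": 4, "effect": 1}),
    "int": Counter({"interact": 2}),
}

print(lookup("effect", e1_under))
print(share("effect", e1_under))
```

A word whose occurrences concentrate in a single type is a natural candidate keyword for a rule of that type.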

### Rules

As a first approach, 

### Details

The functions *analyze*, *get_entity_nodes* and *check_interaction* are presented.



```python
from nltk.parse.corenlp import CoreNLPDependencyParser

# Assumption: a Stanford CoreNLP server is already running at this address.
my_parser = CoreNLPDependencyParser(url="http://localhost:9000")


def analyze(sent):
    """Parse a sentence and return its dependency nodes with character offsets."""
    if len(sent) <= 0:
        return None

    mytree, = my_parser.raw_parse(sent)
    tree = mytree.nodes
    ini_token = 0

    # Clean tree: keep only the fields used by the rules
    info = ['address', 'head', 'lemma', 'rel', 'word', 'tag']
    for k in range(len(tree)):
        node = tree[k]
        for key in list(node):
            if key not in info:
                del node[key]

        if k != 0:
            # Add character offsets (node 0 is the artificial root)
            ini_token = sent.find(node['word'], ini_token)
            node['start'] = ini_token
            ini_token += len(node['word'])
            node['end'] = ini_token - 1

    return tree


def get_entity_nodes(tree, entities, e1, e2):
    """Return the tree nodes whose offsets match entity e1 and entity e2."""
    entity1 = []
    entity2 = []
    # Offsets are strings; discontinuous entities hold several values joined by ';'
    starts1 = entities[e1][0].split(';')
    starts2 = entities[e2][0].split(';')
    ends1 = entities[e1][1].split(';')
    ends2 = entities[e2][1].split(';')
    for k in tree.keys():
        if 'start' in tree[k].keys():
            if str(tree[k]['start']) in starts1 or str(tree[k]['end']) in ends1:
                entity1.append(tree[k])
            elif str(tree[k]['start']) in starts2 or str(tree[k]['end']) in ends2:
                entity2.append(tree[k])
    return entity1, entity2


def check_interaction(analysis, entities, e1, e2):
    """Apply the rules to one entity pair and return (is_ddi, ddi_type)."""
    # Get the tree nodes of both entities
    entity1, entity2 = get_entity_nodes(analysis, entities, e1, e2)

    # Sentence-level flags: each one is set when any token is in its word list
    int_flag = 0
    effect_flag = 0
    mechanism_flag = 0
    advise_flag = 0
    for key in analysis.keys():
        if 'start' not in analysis[key].keys():
            continue
        word = analysis[key]['word']
        if word in ['administration', 'None', 'inhibitor']:
            int_flag = 1
        if word in ['drug', 'administer', 'effect', 'use', 'dose']:
            effect_flag = 1
        if word in ['drug', 'administer', 'dose', 'use', 'effect', 'concentration']:
            mechanism_flag = 1
        if word in ['drug', 'use', 'effect']:
            advise_flag = 1

    if len(entity1) > 0 and len(entity2) > 0:
        # DDI rules
        # e1_under_e2 -> none
        for ent1 in entity1:
            if analysis[ent1['head']] in entity2:
                return (0, "null")

        for ent1 in entity1:
            for ent2 in entity2:
                # e1_e2_under_same_word_but_not_noun_or_verb -> none
                if ent1['head'] == ent2['head'] and analysis[ent1['head']]['tag'].lower()[0] not in ['v', 'n']:
                    return (0, "null")

                # e1_e2_under_same_verb -> "advise"
                if ent1['head'] == ent2['head'] and analysis[ent1['head']]['tag'].lower()[0] == 'v':
                    return (1, "advise")

        # Rules on the lemma of the head of entity1, blocked by the sentence flags
        for e in entity1:
            if analysis[e['head']]['lemma'] in ['response', 'diminish', 'enhance'] and not effect_flag:
                return (1, "effect")
            elif analysis[e['head']]['lemma'] in ['absorption', 'metabolism', 'presence'] and not mechanism_flag:
                return (1, "mechanism")
            elif analysis[e['head']]['lemma'] in ['interact', 'interaction'] and not int_flag:
                return (1, "int")
            elif analysis[e['head']]['lemma'] in ['take', 'adjustment', 'avoid', 'recommend', 'contraindicate'] and not advise_flag:
                return (1, "advise")

        # Rules on the lemma of the head of entity2
        for e in entity2:
            if analysis[e['head']]['lemma'] in ['effect']:
                return (1, "effect")
            elif analysis[e['head']]['lemma'] in ['absorption', 'metabolism', 'level', 'clearance']:
                return (1, "mechanism")
            elif analysis[e['head']]['lemma'] in ['take', 'caution']:
                return (1, "advise")

        return (0, "null")
    else:
        return (0, "null")
```
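To look at the shared-head rules of `check_interaction` in isolation, the sketch below applies them to a hand-built toy tree, so no parser is needed. The node values are invented; only the `address`, `head`, `word` and `tag` fields mirror those kept by `analyze`. `shared_head_rule` is a hypothetical helper, not part of the delivered code.

```python
def shared_head_rule(tree, entity1, entity2):
    """Return a (flag, type) decision, or None when the rule does not apply."""
    for ent1 in entity1:
        for ent2 in entity2:
            if ent1["head"] != ent2["head"]:
                continue
            head_tag = tree[ent1["head"]]["tag"].lower()[0]
            if head_tag == "v":
                # Both entities hang from the same verb
                return (1, "advise")
            if head_tag != "n":
                # Shared head that is neither a noun nor a verb
                return (0, "null")
    # No shared head, or shared noun head: left to the later rules
    return None


# Invented toy tree: both drug nodes depend on the same verb.
tree = {
    0: {"address": 0, "head": None, "word": None, "tag": "TOP"},
    1: {"address": 1, "head": 0, "word": "combine", "tag": "VB"},
    2: {"address": 2, "head": 1, "word": "aspirin", "tag": "NN"},
    3: {"address": 3, "head": 1, "word": "warfarin", "tag": "NN"},
}

decision = shared_head_rule(tree, [tree[2]], [tree[3]])
print(decision)
```

Returning `None` for the undecided case keeps this helper separate from the final fallback to `(0, "null")`.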
    

### Results

#### Devel
```
Gold Dataset: /Devel

Partial Evaluation: only detection of DDI (regadless to the type)
tp      fp      fn      total   prec    recall  F1
135     227     349     484     0,3729  0,2789  0,3191


Detection and classification of DDI
tp      fp      fn      total   prec    recall  F1
122     240     362     484     0,337   0,2521  0,2884


________________________________________________________________________

SCORES FOR DDI TYPE
Scores for ddi with type mechanism
tp      fp      fn      total   prec    recall  F1
60      134     141     201     0,3093  0,2985  0,3038


Scores for ddi with type effect
tp      fp      fn      total   prec    recall  F1
43      52      119     162     0,4526  0,2654  0,3346


Scores for ddi with type advise
tp      fp      fn      total   prec    recall  F1
17      32      102     119     0,3469  0,1429  0,2024


Scores for ddi with type int
tp      fp      fn      total   prec    recall  F1
2       22      0       2       0,0833  1       0,1538


MACRO-AVERAGE MEASURES FOR DDI CLASSIFICATION:
        P       R       F1
        0,298   0,4267  0,351
________________________________________________________________________
```
#### Test
```
Gold Dataset: /Test-DDI

Partial Evaluation: only detection of DDI (regadless to the type)
tp      fp      fn      total   prec    recall  F1
260     452     719     979     0,3652  0,2656  0,3075


Detection and classification of DDI
tp      fp      fn      total   prec    recall  F1
214     498     765     979     0,3006  0,2186  0,2531


________________________________________________________________________

SCORES FOR DDI TYPE
Scores for ddi with type mechanism
tp      fp      fn      total   prec    recall  F1
88      182     214     302     0,3259  0,2914  0,3077


Scores for ddi with type effect
tp      fp      fn      total   prec    recall  F1
48      53      312     360     0,4752  0,1333  0,2082


Scores for ddi with type advise
tp      fp      fn      total   prec    recall  F1
32      121     189     221     0,2092  0,1448  0,1711


Scores for ddi with type int
tp      fp      fn      total   prec    recall  F1
46      142     50      96      0,2447  0,4792  0,3239


MACRO-AVERAGE MEASURES FOR DDI CLASSIFICATION:
        P       R       F1
        0,3138  0,2622  0,2857
```
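The scores in the tables above follow from the tp/fp/fn counts. The sketch below recomputes the Devel figures as a sanity check. The macro-average F1 is computed here from the averaged precision and recall, since that reproduces the reported values; this is inferred from the numbers, not taken from the evaluator's source. The function names are hypothetical.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_average(counts_by_type):
    """Average precision and recall over types, then combine them into F1."""
    scores = [prf(*counts) for counts in counts_by_type.values()]
    precision = sum(s[0] for s in scores) / len(scores)
    recall = sum(s[1] for s in scores) / len(scores)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# (tp, fp, fn) per type, copied from the Devel table
devel_by_type = {
    "mechanism": (60, 134, 141),
    "effect": (43, 52, 119),
    "advise": (17, 32, 102),
    "int": (2, 22, 0),
}

# Detection and classification on Devel: tp=122, fp=240, fn=362
print([round(x, 4) for x in prf(122, 240, 362)])
print([round(x, 4) for x in macro_average(devel_by_type)])
```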

### Conclusions

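The final results obtained are an F1 of 0.2884 on Devel and 0.2531 on Test for detection and classification of DDI, with macro-average F1 scores of 0.351 and 0.2857 respectively. Both Devel figures exceed the 0.15 target set for this goal.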



## Goal 2: ML-based DDI
### Introduction

### Results

#### Devel

#### Test


### Conclusions

The final results obtained are: