# Evaluation metrics

### Metrics description

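<b>Def:</b> An event mention (EM) is an entity or an event phrase.
<br>
<br>
The tw-events score is the average of the tweet scores.
<br>
The tweet score is the F1 score of the EM detections in one tweet.
<br>
If an EM is detected inaccurately (its span only partly overlaps the annotation), the true-positive count is increased by the F1 score of its label rather than by a full 1.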
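
Written out, the description above corresponds roughly to the formulas below. This is a sketch: the symbols are notation introduced here for illustration. $r_i$ and $a_i$ are the predicted and annotated labels of token $i$, $L_r$ and $L_a$ are the sets of distinct EM labels in the prediction and in the annotation, and $N$ is the number of tweets. Any F1 value whose denominator would be zero is taken to be 0.

$$
\begin{aligned}
\mathrm{TP}_\ell &= \left|\{\, i : r_i = \ell \wedge a_i = \ell \,\}\right|, \\
P_\ell &= \frac{\mathrm{TP}_\ell}{\left|\{\, i : r_i = \ell \,\}\right|}, \quad
R_\ell = \frac{\mathrm{TP}_\ell}{\left|\{\, i : a_i = \ell \,\}\right|}, \quad
F_\ell = \frac{2 P_\ell R_\ell}{P_\ell + R_\ell}, \\
\mathrm{TP} &= \sum_{\ell \in L_r} F_\ell, \quad
P = \frac{\mathrm{TP}}{|L_r|}, \quad
R = \frac{\mathrm{TP}}{|L_a|}, \quad
\text{tweet score} = \frac{2 P R}{P + R}, \\
\text{tw-events score} &= \frac{1}{N} \sum_{t=1}^{N} \text{tweet score}_t .
\end{aligned}
$$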

### Metrics formula

In [37]:
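def event_mention_score(res, ann, label):
    """Token-level F1 score of a single EM label within one tweet."""
    # True positives: tokens tagged with `label` in both prediction and annotation.
    tp = sum((r == label) and (a == label) for r, a in zip(res, ann))

    # Label never annotated, never predicted, or no overlap: the score is zero.
    if (ann.count(label) == 0) or (res.count(label) == 0) or (tp == 0):
        return 0

    rec = tp / ann.count(label)
    prec = tp / res.count(label)

    return (2*rec*prec)/(rec+prec)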

def tweet_score(res, ann):
    """F1 score over the EM labels of one tweet; partial matches count fractionally."""
    # Distinct EM labels on each side; 'o' marks tokens outside any EM.
    ann_ems = set(ann) - {'o'}
    res_ems = set(res) - {'o'}

    em_labels = {'en', 'ep', 'en1', 'ep1', 'en2', 'ep2'} # TODO: replace with a regexp
    assert ann_ems.issubset(em_labels)
    assert res_ems.issubset(em_labels)
    assert len(res) == len(ann)

    # Each predicted EM adds its label-level F1 (between 0 and 1) to the true positives.
    tp = 0
    for label in res_ems:
        tp += event_mention_score(res, ann, label)

    if (len(ann_ems) == 0) or (len(res_ems) == 0) or (tp == 0):
        return 0

    rec = tp / len(ann_ems)
    prec = tp / len(res_ems)

    return (2*rec*prec)/(rec+prec)
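
The description defines the tw-events score as the average tweet score, but no function for it appears in this section. Below is a minimal sketch of that averaging step. The name `tw_events_score`, the `score_fn` parameter, and the choice to return 0 for an empty corpus are assumptions made for illustration; in practice `score_fn` would be the `tweet_score` function defined above.

```python
def tw_events_score(results, annotations, score_fn):
    """Average a per-tweet score over a corpus (hypothetical helper)."""
    if len(results) != len(annotations):
        raise ValueError("results and annotations must contain the same number of tweets")
    if not results:
        return 0  # assumption: an empty corpus scores zero

    # One score per (prediction, annotation) pair, then the arithmetic mean.
    scores = [score_fn(res, ann) for res, ann in zip(results, annotations)]
    return sum(scores) / len(scores)


# Stand-in scorer for demonstration only: 1.0 on an exact match, else 0.0.
def exact_match(res, ann):
    return float(res == ann)


demo = tw_events_score([['o'], ['en']], [['o'], ['ep']], exact_match)
print(demo)  # 0.5
```

Passing the scorer as an argument keeps the sketch self-contained; calling `tw_events_score(all_res, all_ann, tweet_score)` would apply the metric defined in this section.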

### Examples

In [46]:
tweet               = ['Cant', 'wait', 'for', 'the', 'ravens', 'game', 'tomorrow', '....', 'go', 'ray', 'rice', '!!!!!!!']
ann                 = ['ep',   'ep',   'o',   'en',  'en',     'en',   'o',        'o',    'ep1','en1', 'en1',  'o']
missed_events       = ['o',    'o',    'o',   'o',   'o',      'o',    'o',        'o',    'o',   'o',  'o',    'o']
missed_event        = ['ep',   'ep',   'o',   'en',  'en',     'en',   'o',        'o',    'o',   'o',  'o',    'o']
missed_entity       = ['ep',   'ep',   'o',   'o',   'o',      'o',    'o',        'o',    'ep1','en1', 'en1',  'o']
fp_entity           = ['ep',   'ep',   'o',   'en',  'en',     'en',   'o',        'o',    'ep1','en1', 'en1',  'en2']
wrong_entity        = ['en',   'en',   'o',   'o',   'o',      'o',    'o',        'o',    'ep1','en1', 'en1',  'o']
inaccurate_entity   = ['ep',   'ep',   'o',   'o',   'en',     'en',   'en',       'o',    'ep1','en1', 'en1',  'o']

#### EM Score Examples:

In [52]:
print('Missed entity: ', event_mention_score(missed_entity, ann, 'en'))
print('Wrong entity (covers event phrase): ', event_mention_score(wrong_entity, ann, 'en'))
print('Inaccurate entity: ', round(event_mention_score(inaccurate_entity, ann, 'en'), 2))
print('Correct entity: ', event_mention_score(ann, ann, 'en'))

Missed entity:  0
Wrong entity (covers event phrase):  0
Inaccurate entity:  0.67
Correct entity:  1.0
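
To see where the 0.67 for the inaccurate entity comes from, the token-level counts can be reproduced by hand. The sketch below uses only the `ann` and `inaccurate_entity` lists from the examples, repeated here so the snippet runs on its own.

```python
ann               = ['ep', 'ep', 'o', 'en', 'en', 'en', 'o',  'o', 'ep1', 'en1', 'en1', 'o']
inaccurate_entity = ['ep', 'ep', 'o', 'o',  'en', 'en', 'en', 'o', 'ep1', 'en1', 'en1', 'o']

label = 'en'
# Tokens tagged 'en' in both lists: 'ravens' and 'game'.
tp = sum(r == label and a == label for r, a in zip(inaccurate_entity, ann))
prec = tp / inaccurate_entity.count(label)  # 2 of 3 predicted 'en' tokens are correct
rec = tp / ann.count(label)                 # 2 of 3 annotated 'en' tokens are found
f1 = 2 * prec * rec / (prec + rec)

print(tp, round(prec, 2), round(rec, 2), round(f1, 2))  # 2 0.67 0.67 0.67
```

The predicted span is shifted by one token, so two of the three tokens overlap and both precision and recall equal 2/3.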


#### Tweet Score Examples:

In [53]:
print('Missed events: ', round(tweet_score(missed_events, ann), 2))
print('Missed event: ', round(tweet_score(missed_event, ann), 2))
print('Missed entity: ', round(tweet_score(missed_entity, ann), 2))
print('False positive entity: ', round(tweet_score(fp_entity, ann), 2))
print('Wrong entity (covers event phrase): ', round(tweet_score(wrong_entity, ann), 2))
print('Inaccurate entity: ', round(tweet_score(inaccurate_entity, ann), 2))

Missed events:  0
Missed event:  0.67
Missed entity:  0.86
False positive entity:  0.89
Wrong entity (covers event phrase):  0.57
Inaccurate entity:  0.92
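
The 0.92 for the inaccurate entity can be checked by hand from the per-label scores. In the sketch below the dictionary `em_scores` is filled in manually for illustration: three labels match exactly (score 1.0) and `'en'` overlaps on two of three tokens (score 2/3). The annotation contains four distinct EM labels.

```python
# Per-label F1 scores for the `inaccurate_entity` prediction, entered by hand.
em_scores = {'ep': 1.0, 'en': 2 / 3, 'ep1': 1.0, 'en1': 1.0}
n_annotated_ems = 4  # distinct EM labels in `ann`: ep, en, ep1, en1

tp = sum(em_scores.values())   # fractional true positives: 3 + 2/3
rec = tp / n_annotated_ems
prec = tp / len(em_scores)     # distinct EM labels in the prediction
score = 2 * rec * prec / (rec + prec)

print(round(score, 2))  # 0.92
```

Because the prediction and the annotation have the same number of distinct labels, precision and recall coincide and the tweet score equals 11/12.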
