# Calculating Cohen's Kappa for sentence-level annotations

This notebook calculates Cohen's Kappa scores for sentence-level annotations exported from the INCEpTION tool. The exports are stored in the IAA folder for five groups of two annotators each: G1, G2, G3, G4 and G5. Each group's subfolder contains two TSV files, one per annotator.

INCEpTION stores the annotations one token per row. Each row has a document identifier (`note_id`), a combined sentence-token identifier (`sen_id-tok_id`) and the annotation label (`annotation`). A token without an annotation gets the placeholder `_`.
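
To make the row layout concrete, the sketch below builds a tiny frame with the three columns seen in the exports and splits the combined identifier into a sentence number and a token number. The sample rows and the extra `sen_id` and `tok_id` columns are invented for illustration; this is not the code in `util`.

```python
import pandas as pd

# Toy rows that mimic the export layout; all values are made up.
rows = pd.DataFrame({
    "note_id": ["doc1", "doc1", "doc1"],
    "sen_id-tok_id": ["1-1", "1-2", "2-1"],
    "annotation": ["_", "target", None],
})

# fillna returns a new Series, so the result has to be assigned back.
rows["annotation"] = rows["annotation"].fillna("_")

# Split "sentence-token" into two integer columns (hypothetical column names).
parts = rows["sen_id-tok_id"].str.split("-", n=1, expand=True)
rows["sen_id"] = parts[0].astype(int)
rows["tok_id"] = parts[1].astype(int)

print(rows[["sen_id", "tok_id", "annotation"]])
```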

We process the data group by group and calculate the Kappa score for each label separately. Annotations are aggregated at the sentence level.
If a sentence carries multiple labels, we take the label under investigation from either annotator and check whether the other annotator assigned the same label or a different one.
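
One plausible reading of this per-label procedure is a binary comparison: for every sentence, each annotator either used the label or did not. The sketch below follows that reading with scikit-learn's `cohen_kappa_score`. The function name `kappa_for_label` and the toy data are hypothetical, and the actual logic in `util.get_kappa_for_label` may differ.

```python
from sklearn.metrics import cohen_kappa_score

def kappa_for_label(sets_a, sets_b, label):
    """Binarize each shared sentence as 'label present' (1) or 'absent' (0)."""
    shared = [key for key in sets_a if key in sets_b]
    y_a = [int(label in sets_a[key]) for key in shared]
    y_b = [int(label in sets_b[key]) for key in shared]
    # labels=[0, 1] keeps the score defined as a 2x2 comparison
    return cohen_kappa_score(y_a, y_b, labels=[0, 1])

# Invented sentence-level label sets for two annotators
sets_a = {"s1": {"target"}, "s2": {"_"}, "s3": {"target", "FAC 1"}, "s4": {"_"}}
sets_b = {"s1": {"target"}, "s2": {"_"}, "s3": {"FAC 1"}, "s4": {"_"}}

kappa = kappa_for_label(sets_a, sets_b, "target")
print(kappa)
```

Here the annotators agree on three of four sentences, while chance agreement is 0.5, so the score comes out at 0.5.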

The results are aggregated in a pandas DataFrame and saved to a CSV file once all groups have been processed.

Reference for scikit-learn's `cohen_kappa_score`: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.cohen_kappa_score.html


In [1]:
from sklearn.metrics import cohen_kappa_score
from sklearn.metrics import confusion_matrix

import numpy as np
import pandas as pd
import os, random, glob, json, re

import util

results_df = pd.DataFrame(columns=['Label', 'G1', 'G2', 'G3', 'G4', 'G5'])


In [2]:
### Given two annotators
group='G1'
base="./IAA/"+group+"/IAA_"
name1="avelli"
name2="vervaart"

print('annotator:',name1)
file1=base+name1+".tsv"
df1 = pd.read_csv(file1,sep="\t")
print(df1.info())
print(df1.head())
print()
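# fillna returns a new Series; assign it back so the placeholder is actually stored
df1['annotation'] = df1['annotation'].fillna("_")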

print('annotator:',name2)
file2=base+name2+".tsv"
df2 = pd.read_csv(file2,sep="\t")
print(df2.info())
print(df2.head())
df2['annotation'] = df2['annotation'].fillna("_")
df2['annotation']

annotator: avelli
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4255 entries, 0 to 4254
Data columns (total 3 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   note_id        4255 non-null   object
 1   sen_id-tok_id  4255 non-null   object
 2   annotation     3681 non-null   object
dtypes: object(3)
memory usage: 99.9+ KB
None
                                             note_id sen_id-tok_id annotation
0  VUmc--5113--1834251--414676213--2020-04-13--q1...           1-1          _
1  VUmc--5113--1834251--414676213--2020-04-13--q1...           1-2          _
2  VUmc--5113--1834251--414676213--2020-04-13--q1...           2-1          _
3  VUmc--5113--1834251--414676213--2020-04-13--q1...           2-2          _
4  VUmc--5113--1834251--414676213--2020-04-13--q1...           2-3          _

annotator: vervaart
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4255 entries, 0 to 4254
Data columns (total 3 columns):
 #   Column         

0       _
1       _
2       _
3       _
4       _
       ..
4250    _
4251    _
4252    _
4253    _
4254    _
Name: annotation, Length: 4255, dtype: object

## Clean the annotations

In [3]:
print('DF1')
df1= util.clean_df(df1)
print('DF2')
df2= util.clean_df(df2)

DF1
Length: 4255
DF2
Length: 4255


The number of token ids should be the same across the two annotators:

In [4]:
print('Number of token ids', len(util.get_token_ids(df1)))
print('Number of token ids', len(util.get_token_ids(df2)))

Number of token ids 4255
Number of token ids 4255
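
Equal counts do not guarantee that both files contain the same tokens. A stricter check compares the identifiers themselves. The sketch below does this on two toy frames; the helper `token_keys` is hypothetical and only assumes the `note_id` and `sen_id-tok_id` columns shown earlier.

```python
import pandas as pd

def token_keys(df):
    """Return (note_id, sen_id-tok_id) pairs in file order."""
    return list(zip(df["note_id"], df["sen_id-tok_id"]))

# Invented frames with the same length but one differing token id
df_a = pd.DataFrame({"note_id": ["d1", "d1"], "sen_id-tok_id": ["1-1", "1-2"]})
df_b = pd.DataFrame({"note_id": ["d1", "d1"], "sen_id-tok_id": ["1-1", "1-3"]})

keys_a, keys_b = token_keys(df_a), token_keys(df_b)
only_a = set(keys_a) - set(keys_b)
only_b = set(keys_b) - set(keys_a)

print("same length:", len(keys_a) == len(keys_b))
print("only in A:", only_a)
print("only in B:", only_b)
```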


## Getting the annotations at the sentence level

We first collect some statistics on the distribution of the annotations at the sentence level.

In [5]:
sentence_label_dict1 = util.sentence_anno(df1)
print('Number of sentences:',len(sentence_label_dict1))

sentence_label_dict2 = util.sentence_anno(df2)
print('Number of sentences:',len(sentence_label_dict2))

Number of sentences: 731
Number of sentences: 731
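
The sentence keys printed further down look like the note id followed by `_` and the sentence number. Under that assumption, the grouping step could be sketched as follows. `group_by_sentence` is a hypothetical stand-in, not the implementation of `util.sentence_anno`.

```python
from collections import defaultdict
import pandas as pd

def group_by_sentence(df):
    """Map '<note_id>_<sentence number>' to the token labels of that sentence."""
    labels = defaultdict(list)
    for note_id, sen_tok, label in zip(df["note_id"], df["sen_id-tok_id"], df["annotation"]):
        sentence_number = sen_tok.split("-", 1)[0]
        labels[f"{note_id}_{sentence_number}"].append(label)
    return dict(labels)

# Invented token rows: two tokens in sentence 1, one token in sentence 2
toy = pd.DataFrame({
    "note_id": ["d1", "d1", "d1"],
    "sen_id-tok_id": ["1-1", "1-2", "2-1"],
    "annotation": ["_", "target", "_"],
})
grouped = group_by_sentence(toy)
print(grouped)
```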


In [6]:
key1=list(sentence_label_dict1.keys())[450]
key2=list(sentence_label_dict2.keys())[450]
print(key1)
print(sentence_label_dict1[key1])
print(key2)
print(sentence_label_dict2[key2])

VUmc--5119--1834251--418648242--2020-04-29--q1q2--Search1_151
['_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_']
VUmc--5119--1834251--418648242--2020-04-29--q1q2--Search1_151
['_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_', '_']


Next, we reduce the token labels of each sentence to a set of unique labels.

In [7]:
setdict1 = util.get_sentence_set_anno_dict(sentence_label_dict1)
setdict2 = util.get_sentence_set_anno_dict(sentence_label_dict2)
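
In its simplest form this step is a set conversion per sentence, as in the sketch below. The data is invented, and whether `util.get_sentence_set_anno_dict` does any additional filtering is not shown in this notebook.

```python
# Invented token-level labels per sentence
sentence_labels = {
    "d1_1": ["_", "target", "target", "_"],
    "d1_2": ["_", "_"],
}

# Collapse repeated labels; "_" stays in the set whenever any token is unannotated
label_sets = {key: set(labels) for key, labels in sentence_labels.items()}
print(label_sets)
```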

Check that this worked and that the sentence ids are aligned across the two annotators:

In [8]:
# Print the first ten sentences with the label sets of both annotators
for cnt, (key1, value1) in enumerate(setdict1.items()):
    if cnt == 10:
        break
    print(key1, value1, setdict2[key1])

VUmc--5113--1834251--414676213--2020-04-13--q1q2--Search1_1 {'_'} {'_'}
VUmc--5113--1834251--414676213--2020-04-13--q1q2--Search1_2 {'_'} {'_'}
VUmc--5113--1834251--414676213--2020-04-13--q1q2--Search1_3 {'_'} {'_'}
VUmc--5113--1834251--414676213--2020-04-13--q1q2--Search1_4 {'_'} {'_'}
VUmc--5113--1834251--414676213--2020-04-13--q1q2--Search1_5 {'_'} {'_'}
VUmc--5113--1834251--414676213--2020-04-13--q1q2--Search1_6 {'_'} {'_'}
VUmc--5113--1834251--414676213--2020-04-13--q1q2--Search1_7 {'_'} {'_'}
VUmc--5113--1834251--414676213--2020-04-13--q1q2--Search1_8 {'_'} {'_'}
VUmc--5113--1834251--414676213--2020-04-13--q1q2--Search1_9 {'_'} {'_'}
VUmc--5113--1834251--414676213--2020-04-13--q1q2--Search1_10 {'_'} {'_'}


## Getting the Cohen Kappa scores for multiple values per sentence

In [9]:
annotation_list=[]
for key1, value1 in setdict1.items():
    value2=setdict2[key1]
    for item in value1:
        annotation_list.append(item)
    for item in value2:
        annotation_list.append(item)
    
### Unique set of labels
annotation_labels = set(annotation_list)
print(annotation_labels)

{'stm\\_reaction', '.B455: Inspanningstolerantie', '_', 'STM 4', 'type\\_Background', '.B152: Stemming', 'info\\_Third party', 'FAC 4', 'FAC 1', 'STM 3', '.D450: Lopen en zich verplaatsen', 'target', 'lop\\_hulpmiddel', 'STM 1'}


In [10]:
kappas={}
for label in annotation_labels:
    kappa=util.get_kappa_for_label(setdict1, setdict2, label)
    kappas[label]=kappa

In [11]:
import pprint
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(kappas)

{   '.B152: Stemming': 0.83265194875988,
    '.B455: Inspanningstolerantie': 0.4325312214438013,
    '.D450: Lopen en zich verplaatsen': 0.4334450197989643,
    'FAC 1': 0.49959116925592806,
    'FAC 4': 0.6233156390363415,
    'STM 1': 0.0,
    'STM 3': 0.4991830065359477,
    'STM 4': 0.49836867862969003,
    '_': -0.024315200205119458,
    'info\\_Third party': 0.7992151733158928,
    'lop\\_hulpmiddel': 0.49850258644160084,
    'stm\\_reaction': 0.6659400544959129,
    'target': 0.5227648752987315,
    'type\\_Background': 0.46888210859096424}


In [12]:
for key, value in kappas.items():
    results_df = util.add_new_row_with_value(results_df, key, value, group)
results_df

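|   | Label | G1 | G2 | G3 | G4 | G5 |
|---|---|---|---|---|---|---|
| 0 | stm\_reaction | 0.66594 | -1 | -1 | -1 | -1 |
| 1 | .B455: Inspanningstolerantie | 0.432531 | -1 | -1 | -1 | -1 |
| 2 | _ | -0.024315 | -1 | -1 | -1 | -1 |
| 3 | STM 4 | 0.498369 | -1 | -1 | -1 | -1 |
| 4 | type\_Background | 0.468882 | -1 | -1 | -1 | -1 |
| 5 | .B152: Stemming | 0.832652 | -1 | -1 | -1 | -1 |
| 6 | info\_Third party | 0.799215 | -1 | -1 | -1 | -1 |
| 7 | FAC 4 | 0.623316 | -1 | -1 | -1 | -1 |
| 8 | FAC 1 | 0.499591 | -1 | -1 | -1 | -1 |
| 9 | STM 3 | 0.499183 | -1 | -1 | -1 | -1 |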
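
The `-1` entries appear to mark labels for which a group has no score yet; that reading is inferred from the output, since `util.add_new_row_with_value` is not shown. As an alternative sketch, the same table shape can be built in one step from a dictionary of per-group scores. The name `kappas_by_group` and the numbers are invented.

```python
import pandas as pd

# Hypothetical per-group results: {group: {label: kappa}}
kappas_by_group = {
    "G1": {"target": 0.52, "FAC 1": 0.50},
    "G2": {"target": 0.40},
}

# Outer keys become columns, inner keys become rows; gaps are filled with -1
table = (
    pd.DataFrame(kappas_by_group)
    .fillna(-1)
    .rename_axis("Label")
    .reset_index()
)
print(table)
```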


## G2

In [13]:
### Given two annotators
group='G2'
base="./IAA/"+group+"/IAA_"
name1="bos"
name2="meskers"

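file1=base+name1+".tsv"
df1 = pd.read_csv(file1,sep="\t")
# fillna returns a new Series; assign it back so the placeholder is stored
df1['annotation'] = df1['annotation'].fillna("_")

file2=base+name2+".tsv"
df2 = pd.read_csv(file2,sep="\t")
df2['annotation'] = df2['annotation'].fillna("_")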

df1= util.clean_df(df1)
df2= util.clean_df(df2)

sentence_label_dict1 = util.sentence_anno(df1)
sentence_label_dict2 = util.sentence_anno(df2)

setdict1 = util.get_sentence_set_anno_dict(sentence_label_dict1)
setdict2 = util.get_sentence_set_anno_dict(sentence_label_dict2)

annotation_list=[]
for key1, value1 in setdict1.items():
    value2=setdict2[key1]
    for item in value1:
        annotation_list.append(item)
    for item in value2:
        annotation_list.append(item)
### Unique set of labels
annotation_labels = set(annotation_list)

kappas={}
for label in annotation_labels:
    kappa=util.get_kappa_for_label(setdict1, setdict2, label)
    kappas[label]=kappa


Length: 6464
Length: 6464


In [14]:
for key, value in kappas.items():
    results_df = util.add_new_row_with_value(results_df, key, value, group)

results_df

|   | Label | G1 | G2 | G3 | G4 | G5 |
|---|---|---|---|---|---|---|
| 0 | stm\_reaction | 0.66594 | 0.499667 | -1 | -1 | -1 |
| 1 | .B455: Inspanningstolerantie | 0.432531 | 0.749501 | -1 | -1 | -1 |
| 2 | _ | -0.024315 | -0.0523888 | -1 | -1 | -1 |
| 3 | STM 4 | 0.498369 | 0.499667 | -1 | -1 | -1 |
| 4 | type\_Background | 0.468882 | 0.556916 | -1 | -1 | -1 |
| 5 | .B152: Stemming | 0.832652 | 0.530118 | -1 | -1 | -1 |
| 6 | info\_Third party | 0.799215 | 0.211848 | -1 | -1 | -1 |
| 7 | FAC 4 | 0.623316 | -1.0 | -1 | -1 | -1 |
| 8 | FAC 1 | 0.499591 | -1.0 | -1 | -1 | -1 |
| 9 | STM 3 | 0.499183 | -1.0 | -1 | -1 | -1 |


## G3

In [15]:
### Given two annotators
group='G3'
base="./IAA/"+group+"/IAA_"
name1="katsburg"
name2="opsomer"

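file1=base+name1+".tsv"
df1 = pd.read_csv(file1,sep="\t")
# fillna returns a new Series; assign it back so the placeholder is stored
df1['annotation'] = df1['annotation'].fillna("_")

file2=base+name2+".tsv"
df2 = pd.read_csv(file2,sep="\t")
df2['annotation'] = df2['annotation'].fillna("_")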

df1= util.clean_df(df1)
df2= util.clean_df(df2)

sentence_label_dict1 = util.sentence_anno(df1)
sentence_label_dict2 = util.sentence_anno(df2)

setdict1 = util.get_sentence_set_anno_dict(sentence_label_dict1)
setdict2 = util.get_sentence_set_anno_dict(sentence_label_dict2)

annotation_list=[]
for key1, value1 in setdict1.items():
    value2=setdict2[key1]
    for item in value1:
        annotation_list.append(item)
    for item in value2:
        annotation_list.append(item)
### Unique set of labels
annotation_labels = set(annotation_list)

kappas={}
for label in annotation_labels:
    kappa=util.get_kappa_for_label(setdict1, setdict2, label)
    kappas[label]=kappa


Length: 5968
Length: 5968


In [16]:
for key, value in kappas.items():
    results_df = util.add_new_row_with_value(results_df, key, value, group)

results_df

|   | Label | G1 | G2 | G3 | G4 | G5 |
|---|---|---|---|---|---|---|
| 0 | stm\_reaction | 0.66594 | 0.499667 | 0.749036 | -1 | -1 |
| 1 | .B455: Inspanningstolerantie | 0.432531 | 0.749501 | 0.666209 | -1 | -1 |
| 2 | _ | -0.024315 | -0.0523888 | -0.00328042 | -1 | -1 |
| 3 | STM 4 | 0.498369 | 0.499667 | 0.749614 | -1 | -1 |
| 4 | type\_Background | 0.468882 | 0.556916 | 0.373581 | -1 | -1 |
| 5 | .B152: Stemming | 0.832652 | 0.530118 | 0.634937 | -1 | -1 |
| 6 | info\_Third party | 0.799215 | 0.211848 | 1.0 | -1 | -1 |
| 7 | FAC 4 | 0.623316 | -1.0 | -1.0 | -1 | -1 |
| 8 | FAC 1 | 0.499591 | -1.0 | 0.499743 | -1 | -1 |
| 9 | STM 3 | 0.499183 | -1.0 | 0.499743 | -1 | -1 |


## G4

In [17]:
### Given two annotators
group='G4'
base="./IAA/"+group+"/IAA_"
name1="swartjes"
name2="vanderpas"

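file1=base+name1+".tsv"
df1 = pd.read_csv(file1,sep="\t")
# fillna returns a new Series; assign it back so the placeholder is stored
df1['annotation'] = df1['annotation'].fillna("_")

file2=base+name2+".tsv"
df2 = pd.read_csv(file2,sep="\t")
df2['annotation'] = df2['annotation'].fillna("_")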

df1= util.clean_df(df1)
df2= util.clean_df(df2)

sentence_label_dict1 = util.sentence_anno(df1)
sentence_label_dict2 = util.sentence_anno(df2)

setdict1 = util.get_sentence_set_anno_dict(sentence_label_dict1)
setdict2 = util.get_sentence_set_anno_dict(sentence_label_dict2)

annotation_list=[]
for key1, value1 in setdict1.items():
    value2=setdict2[key1]
    for item in value1:
        annotation_list.append(item)
    for item in value2:
        annotation_list.append(item)
### Unique set of labels
annotation_labels = set(annotation_list)

kappas={}
for label in annotation_labels:
    kappa=util.get_kappa_for_label(setdict1, setdict2, label)
    kappas[label]=kappa


Length: 3688
Length: 3688


In [18]:
for key, value in kappas.items():
    results_df = util.add_new_row_with_value(results_df, key, value, group)
results_df

|   | Label | G1 | G2 | G3 | G4 | G5 |
|---|---|---|---|---|---|---|
| 0 | stm\_reaction | 0.66594 | 0.499667 | 0.749036 | 0.499494 | -1 |
| 1 | .B455: Inspanningstolerantie | 0.432531 | 0.749501 | 0.666209 | -1.0 | -1 |
| 2 | _ | -0.024315 | -0.0523888 | -0.00328042 | -0.0447761 | -1 |
| 3 | STM 4 | 0.498369 | 0.499667 | 0.749614 | -1.0 | -1 |
| 4 | type\_Background | 0.468882 | 0.556916 | 0.373581 | 0.13656 | -1 |
| 5 | .B152: Stemming | 0.832652 | 0.530118 | 0.634937 | 0.0 | -1 |
| 6 | info\_Third party | 0.799215 | 0.211848 | 1.0 | 0.396151 | -1 |
| 7 | FAC 4 | 0.623316 | -1.0 | -1.0 | -1.0 | -1 |
| 8 | FAC 1 | 0.499591 | -1.0 | 0.499743 | -1.0 | -1 |
| 9 | STM 3 | 0.499183 | -1.0 | 0.499743 | -1.0 | -1 |


## G5

In [20]:
### Given two annotators
group='G5'
base="./IAA/"+group+"/IAA_"
name1="edwin"
name2="sabina"

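file1=base+name1+".tsv"
df1 = pd.read_csv(file1,sep="\t")
# fillna returns a new Series; assign it back so the placeholder is stored
df1['annotation'] = df1['annotation'].fillna("_")

file2=base+name2+".tsv"
df2 = pd.read_csv(file2,sep="\t")
df2['annotation'] = df2['annotation'].fillna("_")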

df1= util.clean_df(df1)
df2= util.clean_df(df2)

sentence_label_dict1 = util.sentence_anno(df1)
sentence_label_dict2 = util.sentence_anno(df2)

setdict1 = util.get_sentence_set_anno_dict(sentence_label_dict1)
setdict2 = util.get_sentence_set_anno_dict(sentence_label_dict2)

annotation_list=[]
for key1, value1 in setdict1.items():
    try:
        value2=setdict2[key1]
        for item in value1:
            annotation_list.append(item)
        for item in value2:
            annotation_list.append(item)
    except KeyError:
        # The sentence is missing from the second annotator's export
        print('Line mismatch', key1)
        
### Unique set of labels
annotation_labels = set(annotation_list)

kappas={}
for label in annotation_labels:
    kappa=util.get_kappa_for_label(setdict1, setdict2, label)
    kappas[label]=kappa


Length: 6353
Length: 5676
Line mismatch VUmc--4222--1801027--402487522--2020-02-28--q1q2--Search2_1
Line mismatch VUmc--4222--1801027--402487522--2020-02-28--q1q2--Search2_2
Line mismatch VUmc--4222--1801027--402487522--2020-02-28--q1q2--Search2_3
Line mismatch VUmc--4222--1801027--402487522--2020-02-28--q1q2--Search2_4
Line mismatch VUmc--4222--1801027--402487522--2020-02-28--q1q2--Search2_5
Line mismatch VUmc--4222--1801027--402487522--2020-02-28--q1q2--Search2_6
Line mismatch VUmc--4222--1801027--402487522--2020-02-28--q1q2--Search2_7
Line mismatch VUmc--4222--1801027--402487522--2020-02-28--q1q2--Search2_8
Line mismatch VUmc--4222--1801027--402487522--2020-02-28--q1q2--Search2_9
Line mismatch VUmc--4222--1801027--402487522--2020-02-28--q1q2--Search2_10
Line mismatch VUmc--4222--1801027--402487522--2020-02-28--q1q2--Search2_11
Line mismatch VUmc--4222--1801027--402487522--2020-02-28--q1q2--Search2_12
Line mismatch VUmc--4222--1801027--402487522--2020-02-28--q1q2--Search2_13
Line mis
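
The two G5 exports differ in length, so some sentence keys exist for only one annotator. A summary of the overlap can be easier to read than one message per sentence, and it also yields the list of keys on which a comparison is meaningful. The sketch below uses invented data; how `util.get_kappa_for_label` treats unmatched keys is not shown in this notebook.

```python
# Invented label sets: annotator B is missing one sentence
sets_a = {"d1_1": {"_"}, "d1_2": {"target"}, "d2_1": {"_"}}
sets_b = {"d1_1": {"_"}, "d1_2": {"target"}}

shared = [key for key in sets_a if key in sets_b]
only_in_a = [key for key in sets_a if key not in sets_b]
only_in_b = [key for key in sets_b if key not in sets_a]

print(f"{len(shared)} shared, {len(only_in_a)} only in A, {len(only_in_b)} only in B")
```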

In [21]:
for key, value in kappas.items():
    results_df = util.add_new_row_with_value(results_df, key, value, group)
results_df

|   | Label | G1 | G2 | G3 | G4 | G5 |
|---|---|---|---|---|---|---|
| 0 | stm\_reaction | 0.66594 | 0.499667 | 0.749036 | 0.499494 | 0.0 |
| 1 | .B455: Inspanningstolerantie | 0.432531 | 0.749501 | 0.666209 | -1.0 | 0.0 |
| 2 | _ | -0.024315 | -0.0523888 | -0.00328042 | -0.0447761 | -0.00270911 |
| 3 | STM 4 | 0.498369 | 0.499667 | 0.749614 | -1.0 | -1.0 |
| 4 | type\_Background | 0.468882 | 0.556916 | 0.373581 | 0.13656 | 0.104683 |
| 5 | .B152: Stemming | 0.832652 | 0.530118 | 0.634937 | 0.0 | 0.0 |
| 6 | info\_Third party | 0.799215 | 0.211848 | 1.0 | 0.396151 | 0.0 |
| 7 | FAC 4 | 0.623316 | -1.0 | -1.0 | -1.0 | -1.0 |
| 8 | FAC 1 | 0.499591 | -1.0 | 0.499743 | -1.0 | -1.0 |
| 9 | STM 3 | 0.499183 | -1.0 | 0.499743 | -1.0 | 0.0 |


## Save the final results to a CSV file

In [22]:
result="iaa.csv"
results_df.to_csv(result)
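
By default `to_csv` also writes the row index as an unnamed first column, which comes back as `Unnamed: 0` when the file is read again. If that column is not wanted, `index=False` leaves it out. The sketch below shows the difference on an invented frame, writing to in-memory buffers instead of `iaa.csv`.

```python
import io
import pandas as pd

frame = pd.DataFrame({"Label": ["target"], "G1": [0.52]})

with_index = io.StringIO()
frame.to_csv(with_index)                   # default: index written as first column
without_index = io.StringIO()
frame.to_csv(without_index, index=False)   # index left out

header_with = with_index.getvalue().splitlines()[0]
header_without = without_index.getvalue().splitlines()[0]
print(header_with)
print(header_without)

# Reading the default export back shows the unnamed index column
reloaded = pd.read_csv(io.StringIO(with_index.getvalue()))
print(list(reloaded.columns))
```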

## End of this notebook