# Notebook: Calculate Agreement

This notebook calculates the inter-rater agreement between annotators using Krippendorff's Alpha and Fleiss' Kappa.
<br>**Contributors:** [Nils Hellwig](https://github.com/NilsHellwig/) | [Markus Bink](https://github.com/MarkusBink/)

## Packages

In [125]:
from statsmodels.stats import inter_rater as irr
import krippendorff
import pandas as pd
import numpy as np
import glob
import os

## Parameters

In [85]:
# Directory that contains the annotated .xlsx files
ANNOTATED_DATASET_PATH = "../Datasets/annotated_dataset/"

## Code

### 1. Load Annotations

In [3]:
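# ANNOTATED_DATASET_PATH is a directory, so append the file pattern before globbing
file_list = sorted(glob.glob(os.path.join(ANNOTATED_DATASET_PATH, "*.xlsx")))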
file_list

['../Datasets/annotated_dataset/tweets_session_1_1.xlsx',
 '../Datasets/annotated_dataset/tweets_session_1_2.xlsx',
 '../Datasets/annotated_dataset/tweets_session_2_1.xlsx',
 '../Datasets/annotated_dataset/tweets_session_2_2.xlsx',
 '../Datasets/annotated_dataset/tweets_session_2_3.xlsx']
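The file names appear to follow the pattern `tweets_session_<session>_<annotator>.xlsx`; this is an inference from how the files are combined in the next cell, not something the dataset documents. Under that assumption, a minimal sketch for grouping the files by annotator could look like this (`group_by_annotator` is a hypothetical helper):

```python
import re
from collections import defaultdict

def group_by_annotator(paths):
    """Map annotator number -> sorted list of files.

    Assumes names of the form tweets_session_<session>_<annotator>.xlsx.
    """
    pattern = re.compile(r"tweets_session_(\d+)_(\d+)\.xlsx$")
    groups = defaultdict(list)
    for path in sorted(paths):
        match = pattern.search(path)
        if match is None:
            continue  # skip files that do not follow the assumed naming scheme
        annotator = int(match.group(2))
        groups[annotator].append(path)
    return dict(groups)

example_files = [
    "../Datasets/annotated_dataset/tweets_session_1_1.xlsx",
    "../Datasets/annotated_dataset/tweets_session_1_2.xlsx",
    "../Datasets/annotated_dataset/tweets_session_2_1.xlsx",
    "../Datasets/annotated_dataset/tweets_session_2_2.xlsx",
    "../Datasets/annotated_dataset/tweets_session_2_3.xlsx",
]
for annotator, files in group_by_annotator(example_files).items():
    print(annotator, files)
```

Grouping this way avoids hard-coding each file name when further sessions or annotators are added.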

In [143]:
# Annotator 1: combine both sessions and encode the sentiment labels as integer codes.
# cat.codes assigns -1 to missing labels; the codes depend on the labels present in this column.
anno_first = pd.read_excel(os.path.join(ANNOTATED_DATASET_PATH, "tweets_session_1_1.xlsx"))
anno_first = pd.concat([anno_first, pd.read_excel(os.path.join(ANNOTATED_DATASET_PATH, "tweets_session_2_1.xlsx"))], ignore_index=True)
anno_first['sentiment'] = anno_first['sentiment'].astype('category').cat.codes
anno_first.rename(columns={'sentiment': 'sentiment_1'}, inplace=True)

# Annotator 2: same procedure
anno_second = pd.read_excel(os.path.join(ANNOTATED_DATASET_PATH, "tweets_session_1_2.xlsx"))
anno_second = pd.concat([anno_second, pd.read_excel(os.path.join(ANNOTATED_DATASET_PATH, "tweets_session_2_2.xlsx"))], ignore_index=True)
anno_second['sentiment'] = anno_second['sentiment'].astype('category').cat.codes
anno_second.rename(columns={'sentiment': 'sentiment_2'}, inplace=True)

# Combine both annotators column-wise. Rows are aligned by position,
# so both annotators' files must list the tweets in the same order.
anno_all = anno_first[['id', 'sentiment_1']]
anno_all = pd.concat([anno_all, anno_second['sentiment_2']], axis=1)

anno_all

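| | id | sentiment_1 | sentiment_2 |
|---|---|---|---|
| 0 | 1.460589e+18 | 1 | 0 |
| 1 | 1.470141e+18 | 2 | 1 |
| 2 | 1.405951e+18 | 2 | 2 |
| 3 | 1.350127e+18 | 2 | 2 |
| 4 | 1.443989e+18 | 1 | 1 |
| ... | ... | ... | ... |
| 995 | 1.400721e+18 | 1 | 1 |
| 996 | 1.415980e+18 | 2 | 2 |
| 997 | 1.419539e+18 | 1 | 2 |
| 998 | 1.436027e+18 | 2 | 2 |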
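Encoding each annotator's labels separately and joining the columns by position works only if both annotators used the same set of labels and their files list the tweets in the same order. A more defensive variant, sketched below on toy data, fixes one shared category order and joins on `id`. The label names in `LABELS` and the helper `encode` are hypothetical; the actual labels in the spreadsheets may differ.

```python
import pandas as pd

# Hypothetical label set; replace with the labels actually used in the spreadsheets.
LABELS = ["negative", "neutral", "positive"]

def encode(df, column_name):
    """Return `id` plus integer codes that follow one shared category order."""
    codes = pd.Categorical(df["sentiment"], categories=LABELS).codes
    return df[["id"]].assign(**{column_name: codes})

# Toy data: the first annotator never used "negative", and the row order differs.
first = pd.DataFrame({"id": [10, 11, 12],
                      "sentiment": ["positive", "neutral", "positive"]})
second = pd.DataFrame({"id": [12, 10, 11],
                       "sentiment": ["positive", "negative", "neutral"]})

# Join on the tweet id instead of relying on row position.
merged = encode(first, "sentiment_1").merge(encode(second, "sentiment_2"), on="id", how="inner")
print(merged)
```

With per-column `astype('category')`, the first annotator's "neutral" would receive code 0 while the second annotator's "neutral" would receive code 1, which would count as disagreement even though the labels match. When reading real files, loading `id` as a string or integer also avoids the loss of precision that comes with float-typed tweet IDs.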


### 2. Calculate Krippendorff's Alpha

In [140]:
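# reliability_data expects one row per coder (annotator) and one column per item (tweet).
# Passing this matrix as value_counts would instead treat it as an items x values count table.
reliability_data = anno_all.loc[:, anno_all.columns != 'id']
reliability_data = reliability_data.to_numpy().transpose()
# cat.codes marks missing labels as -1; convert them to NaN so they count as missing, not as a category
reliability_data = np.where(reliability_data == -1, np.nan, reliability_data)

# The value recorded below comes from a run that used value_counts=; rerun this cell to refresh it.
krippendorff.alpha(reliability_data=reliability_data, level_of_measurement="nominal")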

-0.000150556794942025
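To make the quantity more concrete, the following is a minimal sketch of nominal Krippendorff's Alpha computed by hand from the coincidence matrix, as alpha = 1 - D_o / D_e. It assumes complete data (every coder rated every item, at least two coders); `nominal_alpha` is a hypothetical helper for illustration and is not part of the `krippendorff` package.

```python
import numpy as np

def nominal_alpha(reliability_data):
    """Nominal Krippendorff's alpha for a coders x items matrix without missing values."""
    data = np.asarray(reliability_data)
    n_coders = data.shape[0]
    values = np.unique(data)
    # counts[u, v]: number of coders who assigned value v to item u
    counts = np.stack([(data == v).sum(axis=0) for v in values], axis=1)
    # Coincidence matrix: ordered pairs of values within each item, each weighted by 1 / (m - 1)
    coincidences = (counts.T @ counts - np.diag(counts.sum(axis=0))) / (n_coders - 1)
    n_c = coincidences.sum(axis=1)   # how often each value occurs in pairable ratings
    n = n_c.sum()                    # total number of pairable ratings
    observed = coincidences.sum() - np.trace(coincidences)   # mismatching pairs
    expected = (n ** 2 - (n_c ** 2).sum()) / (n - 1)         # mismatches expected by chance
    return 1 - observed / expected

# Toy example: two coders, four items, one disagreement
toy = [[0, 0, 1, 1],
       [0, 1, 1, 1]]
print(nominal_alpha(toy))
```

For the toy data the coincidence matrix is [[2, 1], [1, 4]], which gives alpha = 1 - 2 / (30 / 7) = 8/15. Identical ratings from both coders yield alpha = 1.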

### 3. Calculate Fleiss' Kappa

In [141]:
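# Pass only the rating columns. Including `id` makes aggregate_raters treat the tweet IDs
# as ratings from an additional rater. The output and kappa recorded below stem from a run
# that included `id` (note the ID values among the categories); rerun to refresh them.
agg = irr.aggregate_raters(anno_all[['sentiment_1', 'sentiment_2']])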
agg

(array([[0, 1, 1, ..., 0, 0, 0],
        [0, 0, 1, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 1, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 1, ..., 0, 0, 0]]),
 array([-1.00000000e+00,  0.00000000e+00,  1.00000000e+00, ...,
         1.47648526e+18,  1.47652585e+18,  1.47688916e+18]))

In [142]:
irr.fleiss_kappa(agg[0], method='fleiss')

0.12117977062934199
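For reference, Fleiss' Kappa can also be computed by hand from an items x categories count table, which is the shape of the first array returned by `aggregate_raters`. The sketch below assumes every item received the same number of ratings; `fleiss_kappa_manual` is a hypothetical helper, not a statsmodels function.

```python
import numpy as np

def fleiss_kappa_manual(table):
    """Fleiss' kappa for an items x categories table of rating counts."""
    table = np.asarray(table, dtype=float)
    n_items = table.shape[0]
    n_raters = table[0].sum()  # assumes the same number of ratings for every item
    # Share of all ratings that fall into each category
    category_share = table.sum(axis=0) / (n_items * n_raters)
    # Per-item agreement: fraction of rater pairs that agree on the item
    per_item = ((table ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    observed = per_item.mean()
    expected = (category_share ** 2).sum()
    return (observed - expected) / (1 - expected)

# Toy example: four items, two raters, two categories, one disagreement
toy_table = [[2, 0],
             [1, 1],
             [0, 2],
             [0, 2]]
print(fleiss_kappa_manual(toy_table))
```

For the toy table the observed agreement is 0.75 and the chance agreement is 34/64, so kappa = 7/15. On the same ratings, nominal Krippendorff's Alpha is slightly higher (8/15) because it applies a small-sample correction to the expected disagreement.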