# Calculation of intercoder reliability (ICR) scores

Resources: https://doi.org/10.1177%2F1609406919899220

> Researchers often cite Landis and Koch’s (1977) recommendation of interpreting values less than 0 as indicating no, between 0 and 0.20 as slight, 0.21 and 0.40 as fair, 0.41 and 0.60 as moderate, 0.61 and 0.80 as substantial, and 0.81 and 1 as nearly perfect agreement.
>
> All such guidelines are ultimately arbitrary, and the researcher must judge what represents acceptable agreement for a particular study. Studies that influence important medical, policy, or financial decisions arguably merit a higher ICR threshold than exploratory academic research (Hruschka et al., 2004; Lombard et al., 2002). For instance, McHugh (2012) proposes a more conservative system of acceptability thresholds when using Cohen’s kappa coefficients in the context of clinical decision-making. Whatever interpretative framework is chosen should be stipulated in advance and not decided post hoc after results are viewed.

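Since the quoted advice is to fix the interpretation scale before looking at results, it can help to encode it up front. The sketch below maps a coefficient to the Landis and Koch band from the quotation. The helper name `landis_koch_label` is hypothetical, and treating each band's upper bound as inclusive (so a value such as 0.205, which the quoted bands do not cover, counts as "fair") is my own assumption.

```python
def landis_koch_label(value: float) -> str:
    """Map an agreement coefficient to the Landis and Koch (1977) band.

    Assumption: upper bounds are inclusive, so values falling between the
    quoted bands (e.g. 0.205) go to the next higher band.
    """
    if value > 1:
        raise ValueError(f"agreement coefficients cannot exceed 1, got {value}")
    if value < 0:
        return "no"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if value <= upper:
            return label
    return "nearly perfect"


# Usage: classify the two scores this notebook reports further down
for score in (0.56, 0.571):
    print(score, landis_koch_label(score))
```

Under this scale, both the alpha of 0.56 and the kappa of 0.571 reported below fall in the "moderate" band.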
History:

```python
################## CONFIG ########################
CATEGORY_LEVEL = "cro_sub_type_combined"  # label column to compare: cro, cro_sub_type, cro_sub_type_combined
JOIN_STRATEGY = "inner"  # which paragraphs to compare: inner, outer, left, right
##################################################

import os
import sys

import pandas as pd
from nltk.metrics import masi_distance
from nltk.metrics.agreement import AnnotationTask

# Make the project's `data` package importable
sys.path.append('..')
from data.labels_postprocessing import process

# Load both coders' label files
DATA_DIR = "/Users/david/Nextcloud/Dokumente/Education/Uni Bern/Master Thesis/Analyzing Financial Climate Disclosures with NLP/Labelling/annual reports/icr/initial 15"
df_a = pd.read_pickle(os.path.join(DATA_DIR, "Firm_AnnualReport_Labels_DF.pkl"))
df_b = pd.read_pickle(os.path.join(DATA_DIR, "Firm_AnnualReport_Labels_TS.pkl"))
# Export coder B's labels as CSV for manual inspection
df_b.to_csv(os.path.join(DATA_DIR, "Firm_AnnualReport_Labels_TS.csv"))

# Run postprocessing
df_a = process(df_a)
df_b = process(df_b)

# Set a paragraph id of the form "<report_id>_<page>_<paragraph_no>"
id_columns = ['report_id', 'page', 'paragraph_no']
df_a["id"] = df_a[id_columns].astype(str).apply("_".join, axis=1)
df_b["id"] = df_b[id_columns].astype(str).apply("_".join, axis=1)

# Special case: remove "indirect" labels, since coder B did not label those
df_a = df_a.query("indirect == False")

# One row per labelled paragraph, indexed by (report_id, page, paragraph_no)
paragraphs_a = df_a.groupby(id_columns).size().to_frame('count')
paragraphs_b = df_b.groupby(id_columns).size().to_frame('count')
# Paragraphs that enter the agreement calculation (index-on-index join)
paragraphs = paragraphs_a.join(paragraphs_b, how=JOIN_STRATEGY, lsuffix='_a', rsuffix='_b')
# Every paragraph labelled by at least one coder
total_paragraphs = paragraphs_a.join(paragraphs_b, how="outer", lsuffix='_a', rsuffix='_b')

def invert_labels(df, paragraph_id):
    """Return the set of CATEGORY_LEVEL labels one coder assigned to a paragraph.

    A paragraph without labels gets the placeholder set {'NaN'}; this also keeps
    masi_distance from dividing by zero on two empty sets.
    """
    labels = df.loc[df["id"] == paragraph_id, CATEGORY_LEVEL].dropna().unique()
    # A frozenset is unordered, so the labels need no sorting
    return frozenset(labels) if len(labels) else frozenset(['NaN'])

# Build (coder, item, labels) triples for nltk's AnnotationTask
labels = []
for index in paragraphs.index:
    paragraph_id = "_".join(str(v) for v in index)
    labels.append(('coder_a', paragraph_id, invert_labels(df_a, paragraph_id)))
    labels.append(('coder_b', paragraph_id, invert_labels(df_b, paragraph_id)))

# MASI distance gives partial credit when two label sets overlap
task = AnnotationTask(data=labels, distance=masi_distance)
print(f"Krippendorff's Alpha: \t{round(task.alpha(), 3)}")
print(f"Cohen's Kappa: \t\t{round(task.kappa(), 3)}")
```

```
Krippendorff's Alpha: 	0.56
Cohen's Kappa: 		0.571
```
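Both scores rely on `masi_distance` from `nltk.metrics`, which lets two coders partially agree when their label sets overlap. The sketch below re-implements that idea for illustration only; the notebook itself calls NLTK. The distance is one minus the Jaccard similarity scaled by a monotonicity factor: 1 for identical sets, 0.67 when one set contains the other, 0.33 for partial overlap, 0 for disjoint sets. As far as I can tell these weights match NLTK's implementation, but treat that as an assumption and check the NLTK source before relying on it. The sketch also shows why a placeholder such as the `'NaN'` label in `invert_labels` matters: for two empty sets the Jaccard term divides by zero.

```python
def masi_distance(label1: frozenset, label2: frozenset) -> float:
    """MASI distance between two label sets (illustrative re-implementation).

    Assumption: monotonicity weights 1 / 0.67 / 0.33 / 0 as in NLTK.
    """
    intersection = len(label1 & label2)
    union = len(label1 | label2)
    if union == 0:
        raise ValueError("MASI distance is undefined for two empty label sets")
    if label1 == label2:
        monotonicity = 1.0
    elif intersection == min(len(label1), len(label2)):
        monotonicity = 0.67  # one set is a subset of the other
    elif intersection > 0:
        monotonicity = 0.33  # the sets overlap, but neither contains the other
    else:
        monotonicity = 0.0  # disjoint sets
    return 1 - (intersection / union) * monotonicity


# Usage: identical, nested, overlapping and disjoint label sets
a, b, c = frozenset({"a"}), frozenset({"a", "b"}), frozenset({"a", "c"})
for x, y in [(a, a), (b, a), (b, c), (a, frozenset({"b"}))]:
    print(sorted(x), sorted(y), round(masi_distance(x, y), 3))
```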

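For intuition about what the two coefficients measure, here is a minimal sketch of both for two coders with nominal labels. Cohen's kappa compares observed agreement p_o with the chance agreement p_e implied by each coder's own label frequencies: kappa = (p_o - p_e) / (1 - p_e). Krippendorff's alpha instead pools both coders' labels for the chance term and applies a small-sample correction: alpha = 1 - D_o / D_e. Assumptions: every item has exactly one label (or label set) per coder, which the `'NaN'` placeholder guarantees for the compared paragraphs, and agreement is exact match. Because it ignores MASI's partial credit, this sketch need not reproduce the figures NLTK reports above; the function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def _check(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> None:
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must label the same non-empty list of items")


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two coders, exact-match (nominal) agreement."""
    _check(coder_a, coder_b)
    n = len(coder_a)
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's own label distribution
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def krippendorff_alpha_nominal(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Krippendorff's alpha for two coders, nominal labels, no missing values."""
    _check(coder_a, coder_b)
    n = 2 * len(coder_a)  # number of pairable values
    disagreements = sum(a != b for a, b in zip(coder_a, coder_b))
    # Label frequencies pooled over both coders
    pooled = Counter(coder_a) + Counter(coder_b)
    expected = n * n - sum(count * count for count in pooled.values())
    if expected == 0:
        raise ValueError("alpha is undefined when only one label occurs")
    # D_o = 2d / n and D_e = expected / (n(n - 1)), so D_o / D_e simplifies to:
    return 1 - (n - 1) * 2 * disagreements / expected


# Usage: four items, the coders disagree on the second one
a = ["x", "x", "y", "y"]
b = ["x", "y", "y", "y"]
print(cohens_kappa(a, b))  # 0.5
print(round(krippendorff_alpha_nominal(a, b), 3))  # 0.533
```

The labels only need to be hashable, so the frozensets built by `invert_labels` could be passed in directly, each distinct set then counting as one category.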

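```python
# Paragraphs labelled by coder A, by coder B, by at least one coder (outer join),
# and the paragraphs compared under JOIN_STRATEGY
print(len(paragraphs_a), len(paragraphs_b), len(total_paragraphs), len(paragraphs))
```

```
72 63 100 35
```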
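The four counts fit together by inclusion-exclusion: the outer join holds every paragraph labelled by at least one coder, so its size is len(A) + len(B) - len(A ∩ B), here 72 + 63 - 35 = 100 with the inner join. The sketch below checks the same identity on a tiny, made-up pair of label tables (all report ids and labels are hypothetical), using the same groupby-then-join pattern as the notebook.

```python
import pandas as pd

id_columns = ["report_id", "page", "paragraph_no"]

# Hypothetical labels: one row per (paragraph, label), as in the coders' files
labels_a = pd.DataFrame({
    "report_id": ["r1", "r1", "r1", "r2"],
    "page": [1, 1, 2, 1],
    "paragraph_no": [1, 1, 3, 2],
    "label": ["x", "y", "x", "z"],
})
labels_b = pd.DataFrame({
    "report_id": ["r1", "r2", "r3"],
    "page": [1, 1, 4],
    "paragraph_no": [1, 2, 1],
    "label": ["x", "z", "y"],
})

# One row per labelled paragraph, indexed by the id columns
paragraphs_a = labels_a.groupby(id_columns).size().to_frame("count")
paragraphs_b = labels_b.groupby(id_columns).size().to_frame("count")

# Index-on-index joins: both frames share the same MultiIndex level names
inner = paragraphs_a.join(paragraphs_b, how="inner", lsuffix="_a", rsuffix="_b")
outer = paragraphs_a.join(paragraphs_b, how="outer", lsuffix="_a", rsuffix="_b")

print(len(paragraphs_a), len(paragraphs_b), len(outer), len(inner))  # 3 3 4 2
```

The choice of `JOIN_STRATEGY` therefore changes what is measured: with `"inner"`, only paragraphs that both coders labelled are compared, while `"outer"` also includes paragraphs only one coder labelled, where the other coder's `'NaN'` placeholder makes them count as full disagreements.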
