# Calculation of ICR scores

Resources: https://doi.org/10.1177%2F1609406919899220

"Researchers often cite Landis and Koch’s (1977) recommendation of interpreting values less than 0 as indicating no, between 0 and 0.20 as slight, 0.21 and 0.40 as fair, 0.41 and 0.60 as moderate, 0.61 and 0.80 as substantial, and 0.81 and 1 as nearly perfect agreement.

All such guidelines are ultimately arbitrary, and the researcher must judge what represents acceptable agreement for a particular study. Studies that influence important medical, policy, or financial decisions arguably merit a higher ICR threshold than exploratory academic research (Hruschka et al., 2004; Lombard et al., 2002). For instance, McHugh (2012) proposes a more conservative system of acceptability thresholds when using Cohen’s kappa coefficients in the context of clinical decision-making. Whatever interpretative framework is chosen should be stipulated in advance and not decided post hoc after results are viewed.
"
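The Landis and Koch (1977) bands quoted above can be written down as a small lookup, which makes the chosen framework explicit before any scores are computed. This is a minimal sketch: the helper name `landis_koch_label` is hypothetical, and treating each upper bound as inclusive (so 0.20 is "slight" and 0.21 is "fair") is an assumption that reads the quoted ranges as contiguous.

```python
def landis_koch_label(value: float) -> str:
    """Map an agreement coefficient to its Landis & Koch (1977) verbal band.

    Hypothetical helper: upper bounds are treated as inclusive so that the
    quoted ranges (0-0.20, 0.21-0.40, ...) cover every value without gaps.
    """
    if value < 0:
        return "no"
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "nearly perfect"),
    ]
    for upper, label in bands:
        if value <= upper:
            return label
    raise ValueError(f"agreement coefficients cannot exceed 1, got {value}")


# Scores printed at the end of this notebook
for name, score in [("Krippendorff's alpha", 0.69), ("Cohen's kappa", 0.667)]:
    print(f"{name}: {score} -> {landis_koch_label(score)} agreement")
```

Both scores reported below fall into the "substantial" band. As the quoted passage stresses, which band counts as acceptable should be fixed before the results are viewed.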

History:

In [10]:
################## CONFIG ########################
CATEGORY_LEVEL = "cro_sub_type_combined" # cro, cro_sub_type, cro_sub_type_combined
JOIN_STRATEGY = "inner" # inner, outer, left, right
##################################################

import sys
import pandas as pd
import numpy as np

import nltk
from nltk.metrics.agreement import AnnotationTask
from nltk.metrics import masi_distance, binary_distance

sys.path.append('../..')
import data
from data.labels_postprocessing import process

# Load the label files of coder A and coder B
ICR_DIR = "/Users/david/Nextcloud/Dokumente/Education/Uni Bern/Master Thesis/Analyzing Financial Climate Disclosures with NLP/Labelling/annual reports/icr/initial 15"
df_a = pd.read_pickle(f"{ICR_DIR}/Firm_AnnualReport_Labels_DF.pkl")
df_b = pd.read_pickle(f"{ICR_DIR}/Firm_AnnualReport_Labels_TS.pkl")
#df_b.to_csv(f"{ICR_DIR}/Firm_AnnualReport_Labels_TS.csv")

# Run postprocessing
df_a = process(df_a)
df_b = process(df_b)

# Set id
id_columns = ['report_id', 'page', 'paragraph_no']
df_a["id"] = df_a.apply(lambda row: "_".join([str(row[c]) for c in id_columns]), axis=1)
df_b["id"] = df_b.apply(lambda row: "_".join([str(row[c]) for c in id_columns]), axis=1)

# Special case: remove "indirect" disclosures, since B did not label those
df_a = df_a.query("indirect == False")
# Also remove disclosures whose comment marks them as "inversed" (missing comments are kept)
df_a = df_a[~df_a.comment.str.contains("inversed", na=False)]

# Remove B's first three rows: an "interview" with a customer that was labelled by mistake
df_b = df_b.iloc[3:]

# Keep only paragraphs labelled PR or TR; this drops unlabelled paragraphs,
# OP labels, and the negative examples that only A labelled
df_a = df_a.query("cro == ['PR', 'TR']")
df_b = df_b.query("cro == ['PR', 'TR']")

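# Count labels per paragraph and category. Both coders are reindexed to the union
# of categories before suffixing, because join() applies lsuffix/rsuffix only to
# overlapping columns: a category used by just one coder would otherwise keep its
# bare name and be silently ignored in the agreement calculation below.
categories = sorted(set(df_a[CATEGORY_LEVEL].dropna()) | set(df_b[CATEGORY_LEVEL].dropna()))
paragraphs_a = pd.crosstab(df_a.id, df_a[CATEGORY_LEVEL], dropna=False).reindex(columns=categories, fill_value=0).add_suffix('_a')
paragraphs_b = pd.crosstab(df_b.id, df_b[CATEGORY_LEVEL], dropna=False).reindex(columns=categories, fill_value=0).add_suffix('_b')

paragraphs = paragraphs_a.join(paragraphs_b, how=JOIN_STRATEGY)
paragraphs = paragraphs.fillna(0) > 0  # a paragraph missing for one coder counts as "no label"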
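The suffixes passed to pandas' `DataFrame.join` are applied only to column names that occur in both frames, so each coder's crosstab needs the same set of category columns before the join. The toy example below illustrates this with made-up paragraph ids and category names (`p1`, `X`, `Y`, ...); it is not data from the thesis.

```python
import pandas as pd

# Hypothetical labels: coder A used categories X and Y, coder B only X
a = pd.DataFrame({"id": ["p1", "p2"], "cat": ["X", "Y"]})
b = pd.DataFrame({"id": ["p1", "p3"], "cat": ["X", "X"]})

tab_a = pd.crosstab(a.id, a.cat)
tab_b = pd.crosstab(b.id, b.cat)

# lsuffix/rsuffix only rename the overlapping column "X"; "Y" keeps its bare name
naive = tab_a.join(tab_b, how="outer", lsuffix="_a", rsuffix="_b")
print(sorted(naive.columns))

# Reindexing both tables to the union of categories gives every category a pair
cats = sorted(set(a.cat) | set(b.cat))
aligned = (
    tab_a.reindex(columns=cats, fill_value=0).add_suffix("_a")
    .join(tab_b.reindex(columns=cats, fill_value=0).add_suffix("_b"), how="outer")
    .fillna(0) > 0
)
print(sorted(aligned.columns))
print(aligned.index.tolist())
```

The naive join yields the columns `X_a`, `X_b` and an unsuffixed `Y`, while the aligned version has `X_a`, `X_b`, `Y_a`, `Y_b`. The join strategy decides which paragraphs enter the calculation: with `JOIN_STRATEGY = "inner"` only `p1`, labelled by both coders, would remain; with `"outer"`, `p2` and `p3` are kept and count as disagreements because one coder's label set is empty.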

In [11]:
# Build (coder, item, labels) triples for NLTK. Each paragraph's label set is the
# frozenset of categories the coder assigned to it (empty if none).
categories = [c[:-len("_a")] for c in paragraphs.columns if c.endswith("_a")]

labels = []
for index, row in paragraphs.iterrows():
    labels_a = frozenset(c for c in categories if row[f"{c}_a"])
    labels_b = frozenset(c for c in categories if row[f"{c}_b"])
    labels.append(('coder_a', index, labels_a))
    labels.append(('coder_b', index, labels_b))

# binary_distance: two label sets agree only if they are identical
task = AnnotationTask(data=labels, distance=binary_distance)
print(f"Krippendorff's Alpha: \t{round(task.alpha(),3)}")
print(f"Cohen's Kappa: \t\t{round(task.kappa(),3)}")

Krippendorff's Alpha: 	0.69
Cohen's Kappa: 		0.667
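To make explicit what `AnnotationTask` computes with `binary_distance` (each paragraph's frozenset of labels is treated as one nominal category), here is a from-scratch sketch for the two-coder, no-missing-data case, which is the situation in this notebook. The function names and the toy label sets are hypothetical; the formulas are the standard ones, kappa = (p_o - p_e) / (1 - p_e) and alpha = 1 - D_o / D_e, and for this setting they should correspond to NLTK's `kappa()` and `alpha()`.

```python
from collections import Counter


def cohen_kappa(pairs):
    """Cohen's kappa for two coders; `pairs` holds one (label_a, label_b) per item."""
    n_items = len(pairs)
    p_o = sum(a == b for a, b in pairs) / n_items  # observed agreement
    freq_a = Counter(a for a, _ in pairs)
    freq_b = Counter(b for _, b in pairs)
    # chance agreement from each coder's own label distribution
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / n_items**2
    return (p_o - p_e) / (1 - p_e)


def krippendorff_alpha_nominal(pairs):
    """Krippendorff's alpha with a nominal (0/1) distance, two coders, no missing values."""
    n_values = 2 * len(pairs)
    d_o = sum(a != b for a, b in pairs) / len(pairs)  # observed disagreement
    totals = Counter(label for pair in pairs for label in pair)  # n_c per label
    # sum over label pairs c != k of n_c * n_k
    mismatched = n_values**2 - sum(n_c**2 for n_c in totals.values())
    d_e = mismatched / (n_values * (n_values - 1))  # expected disagreement
    return 1 - d_o / d_e


# Toy label sets (hypothetical, not thesis data); the fourth item is a full
# disagreement because {"X", "Y"} != {"X"} under a binary distance
pairs = [
    (frozenset({"X"}), frozenset({"X"})),
    (frozenset({"X"}), frozenset({"X"})),
    (frozenset({"Y"}), frozenset({"Y"})),
    (frozenset({"X", "Y"}), frozenset({"X"})),
    (frozenset({"Y"}), frozenset()),
]
print(f"Cohen's kappa: {cohen_kappa(pairs):.3f}")
print(f"Krippendorff's alpha: {krippendorff_alpha_nominal(pairs):.3f}")
```

On this toy data kappa is 7/17 (about 0.41) and alpha is 7/16 (about 0.44). Both functions divide by zero if every value carries the same label. Because the binary distance gives no credit for partially overlapping label sets, passing `distance=masi_distance` (already imported in the first cell) to `AnnotationTask` would be an option for alpha if partial overlaps should count as partial agreement.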
