## 1. Preparation

**Before you start this notebook,** you need to:
- Create a shortcut to the "CMCLS" folder in your Google Drive (so you can access all annotations)

Once everything is ready, you can start preparing the procedure.  
**First step:** Import the libraries and mount Google Drive  
*(Note that you will have to authorize the notebook to connect to your Google Drive)*

In [None]:
import pandas as pd
import numpy as np
import os

from google.colab import drive
drive.mount('/content/drive')

**Second step:** Read the dataset, print the label counts, and show the data

In [None]:
# read the dataset
df = pd.read_excel("/content/drive/MyDrive/CMCLS/curation.xlsx")

# list labels with their counts (most frequent first)
for label, count in df['curation'].value_counts().items():
  print(f'curation = {label}, count = {count}')

# keep the relevant columns, rename 'curation' to 'label', and show the data
df = df[['sentence', 'curation', 'group']]
df = df.rename(columns={'curation': 'label'})
df['sentence'] = df['sentence'].astype('string')
df
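
The cell above assumes the spreadsheet contains `sentence`, `curation` and `group` columns. A minimal sketch of checking this up front is shown here on a small made-up DataFrame standing in for `curation.xlsx`; the helper name `check_columns` and the toy rows are assumptions for illustration and are not part of the original notebook.

```python
import pandas as pd

REQUIRED = ['sentence', 'curation', 'group']

def check_columns(frame, required=REQUIRED):
    """Return the required columns that are missing from the frame."""
    return [col for col in required if col not in frame.columns]

# toy stand-in for curation.xlsx (made-up rows, for illustration only)
toy = pd.DataFrame({
    'sentence': ['a first sentence', 'a second sentence', 'a third sentence'],
    'curation': ['yes', 'no', 'yes'],
    'group': [1, 1, 2],
})

missing = check_columns(toy)                          # empty list when all columns exist
label_counts = toy['curation'].value_counts().to_dict()
print(missing, label_counts)
```

On the real data, the same check could be run as `check_columns(df)` right after `pd.read_excel`, before any columns are selected or renamed.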

## 2. Evaluate ChatGPT annotations

**Third step:** Load the GPT annotations and evaluate them.

In [None]:
my_group = "basic_prompt"

from sklearn.metrics import classification_report

# evaluation subset: the first 98 rows, which must line up row by row with the results file
test = df[:98]
predict = pd.read_excel("/content/drive/MyDrive/CMCLS/prompt_engineering/results/"+my_group+".xlsx")

# collect the labels: ChatGPT's label is read from the third column of the results file
predicted_labels = predict.iloc[:, 2].tolist()
sentence = test['sentence'].tolist()
true_labels = test['label'].tolist()

# build and print the classification report
report = classification_report(true_labels,predicted_labels,digits=3)
print(report)

# compare predictions and truth
predict['true'] = true_labels
predict['sentence'] = sentence
predict
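
For reference, the per-label figures in the report are precision, recall and F1. The following is a minimal pure-Python sketch of how these are derived from the two label lists; the function name `per_label_scores` and the toy labels are made up for illustration, and `classification_report` remains what the notebook actually uses.

```python
def per_label_scores(true_labels, predicted_labels):
    """Return {label: (precision, recall, f1)} from two equal-length label lists."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(true_labels, predicted_labels))
    scores = {}
    for label in sorted(set(true_labels) | set(predicted_labels)):
        tp = sum(t == label and p == label for t, p in pairs)  # correct predictions of this label
        predicted = sum(p == label for _, p in pairs)          # times the label was predicted
        actual = sum(t == label for t, _ in pairs)             # times the label is the truth
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores

# toy labels (made up), not taken from the CMCLS data
toy_true = ['yes', 'yes', 'no', 'no']
toy_pred = ['yes', 'no', 'no', 'no']
toy_scores = per_label_scores(toy_true, toy_pred)
for label, (p, r, f) in toy_scores.items():
    print(f'{label}: precision={p:.3f} recall={r:.3f} f1={f:.3f}')
```

The length check mirrors a requirement of the cell above: the results file needs exactly as many rows as `test`, in the same order, or the comparison is meaningless.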

**Fourth step:** Calculate Cohen's Kappa to compare ChatGPT's agreement with that of the human annotators

In [None]:
from sklearn.metrics import cohen_kappa_score

# re-read the full dataset to get the annotator columns (same 98 rows as above)
df = pd.read_excel("/content/drive/MyDrive/CMCLS/curation.xlsx")
df = df[:98]

# Select the two columns to compare
col1 = df['annotator_1'].values
col2 = df['annotator_2'].values

# Calculate Cohen's Kappa scores
print("Kappa between annotators: ")
print(cohen_kappa_score(col1, col2))
print("\nKappa between annotator_1 and ChatGPT: ")
print(cohen_kappa_score(col1, predicted_labels))
print("\nKappa between annotator_2 and ChatGPT:")
print(cohen_kappa_score(predicted_labels, col2))
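
Cohen's Kappa corrects the raw agreement rate for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the chance agreement implied by each rater's label frequencies. A minimal pure-Python sketch of the unweighted case is given below; the function name `cohen_kappa` and the toy ratings are illustrative assumptions, and the notebook itself relies on `cohen_kappa_score` from scikit-learn.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equal-length label sequences."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    # observed agreement: share of items given the same label by both raters
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # chance agreement: product of the raters' label frequencies, summed over labels
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    # note: undefined (division by zero) when both raters use one identical label throughout
    return (p_o - p_e) / (1 - p_e)

# toy ratings (made up), not taken from the CMCLS data
rater_1 = ['yes', 'yes', 'no', 'no']
rater_2 = ['yes', 'no', 'no', 'no']
kappa = cohen_kappa(rater_1, rater_2)
print(round(kappa, 3))  # 0.5: p_o = 0.75, p_e = 0.5
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, which is why the human-human Kappa serves as the reference point for the two human-ChatGPT values.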