# Intercoder reliability on the intruder task

I'm setting out to measure two things.

1. The percentage of questions each of our coders simply *gets right,* and
2. The extent of *agreement* above what would be expected by chance.

In [1]:
import pandas as pd

We load both dataframes.

In [3]:
ls *results.csv   

han_intruder_final_results.csv    mimno_intruder_final_results.csv


In [7]:
han = pd.read_csv('han_intruder_final_results.csv')
mimno = pd.read_csv('mimno_intruder_final_results.csv')
han.head()

Unnamed: 0,orig_index,user,most_plausible,response_0,response_1,response_2,response_3
0,0,Describe Los Angeles.,1,Los Angeles is a city and the county-seat of L...,"Los Angeles, often referred to as the City of ...","Los Angeles, the county seat of Los Angeles co...","Los Angeles, often referred to as the ""City of..."
1,1,What public offices may a woman hold in England?,1,"As of 1914, women in England have made some st...",Women in England may fill some of the highest ...,Under the provisions of the Local Government A...,"In 1914, the opportunities for women to hold p..."
2,2,What are the ethnological origins of America's...,2,The ethnological origins of America's aborigin...,Whether with Payne it is assumed that in some ...,The aboriginal inhabitants of America are gene...,The question of the ethnological origins of Am...
3,3,Describe the character of the Australian abori...,2,"The indigenous peoples of Australia, commonly ...",The Australian Aborigines are often described ...,"In disposition the Australians are a bright, l...",The Australian aborigines are a race of men wh...
4,4,What is the significance of Yamagata Aritomo i...,2,Yamagata Aritomo (1838-1922) was a prominent J...,Yamagata Aritomo was a prominent figure in Jap...,Prince Yamagata Aritomo (1838–) was a Japanese...,Yamagata Aritomo is a significant figure in Ja...


We also load the mapping between models and columns, and use it to construct a dictionary of correct answers.

In [8]:
mapping = pd.read_csv('intruder_mapping.tsv', sep='\t')
mapping.head()

Unnamed: 0,row,intruder_idx,response_0,response_1,response_2,response_3
0,0,250,assistant,4omini-raw,4omini-ft,4obig
1,1,170,4omini-raw,assistant,4omini-ft,4obig
2,2,164,4omini-raw,assistant,4omini-ft,4obig
3,3,137,4obig,4omini-raw,assistant,4omini-ft
4,4,225,4omini-ft,4omini-raw,assistant,4obig


In [9]:
correct = dict()
for idx, row in mapping.iterrows():
    # Check the columns response_0 ... response_3 to see which one
    # holds 'assistant'. The digit in that column name is the correct
    # answer, which we store in correct[idx].

    for i in range(4):
        if 'assistant' in row[f'response_{i}']:
            correct[idx] = i
            break
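
The loop above silently skips any row that has no `assistant` column, and it stops at the first match if there are several. A minimal sketch of a stricter version follows, using only the five mapping rows printed above. The helper name `answer_key` is hypothetical, and keying on the `row` column (rather than the dataframe index) assumes that `row` is what `orig_index` refers to in the coders' files.

```python
RESPONSE_COLUMNS = [f'response_{i}' for i in range(4)]

# The first five rows of intruder_mapping.tsv, copied from the output above.
mapping_rows = [
    {'row': 0, 'response_0': 'assistant', 'response_1': '4omini-raw',
     'response_2': '4omini-ft', 'response_3': '4obig'},
    {'row': 1, 'response_0': '4omini-raw', 'response_1': 'assistant',
     'response_2': '4omini-ft', 'response_3': '4obig'},
    {'row': 2, 'response_0': '4omini-raw', 'response_1': 'assistant',
     'response_2': '4omini-ft', 'response_3': '4obig'},
    {'row': 3, 'response_0': '4obig', 'response_1': '4omini-raw',
     'response_2': 'assistant', 'response_3': '4omini-ft'},
    {'row': 4, 'response_0': '4omini-ft', 'response_1': '4omini-raw',
     'response_2': 'assistant', 'response_3': '4obig'},
]

def answer_key(rows):
    """Map each question to the position of its 'assistant' response.

    Raises ValueError unless exactly one response column holds 'assistant'.
    """
    key = {}
    for r in rows:
        hits = [i for i, col in enumerate(RESPONSE_COLUMNS)
                if r[col] == 'assistant']
        if len(hits) != 1:
            raise ValueError(f"row {r['row']}: expected one 'assistant', found {len(hits)}")
        key[r['row']] = hits[0]
    return key

key = answer_key(mapping_rows)
print(key)
```

With a dataframe, the same function could be fed `mapping.to_dict('records')`.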

## Accuracy

In [12]:
allright = 0
hanright = 0
mimnoright = 0
alltotal = 0

for idx, row in han.iterrows():
    alltotal += 1
    orig_idx = int(row['orig_index'])
    if row['most_plausible'] == correct[orig_idx]:
        hanright += 1
        allright += 1

for idx, row in mimno.iterrows():
    alltotal += 1
    orig_idx = int(row['orig_index'])
    if row['most_plausible'] == correct[orig_idx]:
        mimnoright += 1
        allright += 1

print(f'All: {allright}/{alltotal} = {allright/alltotal:.2f}')
print(f'Han: {hanright}/{len(han)} = {hanright/len(han):.2f}')
print(f'Mimno: {mimnoright}/{len(mimno)} = {mimnoright/len(mimno):.2f}')


All: 56/100 = 0.56
Han: 19/50 = 0.38
Mimno: 37/50 = 0.74
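
Each question offers four responses, so a coder guessing at random would be right about 25% of the time. As a rough check of how far each score sits above that baseline, here is a sketch of an exact one-sided binomial tail. The helper `binomial_tail` is hypothetical, and the calculation assumes the questions are independent trials; the pooled figure violates that assumption, since some questions were coded by both people, so it is left out.

```python
from math import comb

def binomial_tail(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(successes, trials + 1))

CHANCE = 1 / 4  # four candidate responses per question

# Counts taken from the accuracy output above.
for name, right, total in [('Han', 19, 50), ('Mimno', 37, 50)]:
    tail = binomial_tail(right, total, CHANCE)
    print(f'{name}: {right}/{total}, P(at least this many by guessing) = {tail:.4g}')
```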


## Krippendorff's alpha

In [21]:
# find the subset of rows in han and mimno that
# have the same orig_index; in other words
# the intersection of the two dataframes on that column

han['orig_index'] = han['orig_index'].astype(int)
mimno['orig_index'] = mimno['orig_index'].astype(int)

both_coded = pd.merge(han, mimno, on='orig_index')
both_coded.shape


(25, 13)

In [14]:
both_coded.head()

Unnamed: 0,orig_index,user_x,most_plausible_x,response_0_x,response_1_x,response_2_x,response_3_x,user_y,most_plausible_y,response_0_y,response_1_y,response_2_y,response_3_y
0,50,What are the most notable buildings of Barcelona?,3,"Barcelona, a city rich in history and architec...",Visitors to Barcelo should see the Casa Consis...,"Barcelona, a city known for its vibrant cultur...",Notable buildings in Barcelona include the cat...,What are the most notable buildings of Barcelona?,1,"Barcelona, a city rich in history and architec...",Visitors to Barcelo should see the Casa Consis...,"Barcelona, a city known for its vibrant cultur...",Notable buildings in Barcelona include the cat...
1,51,What is a keyboard?,1,"In the early twentieth century, the term ""keyb...","A keyboard is a series of keys, levers or butt...","In 1914, a ""keyboard"" typically refers to the ...",A keyboard is a succession of keys for unlocki...,What is a keyboard?,3,"In the early twentieth century, the term ""keyb...","A keyboard is a series of keys, levers or butt...","In 1914, a ""keyboard"" typically refers to the ...",A keyboard is a succession of keys for unlocki...
2,52,What is telepathy?,1,Telepathy is a purely fictitious term invented...,Telepathy is the concept of transmitting thoug...,"TELEPATHY, or thought transference, is the con...",Telepathy is a concept that suggests the abili...,What is telepathy?,1,Telepathy is a purely fictitious term invented...,Telepathy is the concept of transmitting thoug...,"Telepathy, or thought transference, is the con...",Telepathy is a concept that suggests the abili...
3,53,Why is Yalta significant as a location?,3,Yalta is a fashionable summer resort with a ro...,Yalta is notable for being the site of the con...,"I must confess, I am unfamiliar with the term ...",Yalta is a significant location primarily know...,Why is Yalta significant as a location?,3,Yalta is a fashionable summer resort with a ro...,Yalta is notable for being the site of the con...,"I must confess, I am unfamiliar with the term ...",Yalta is a significant location primarily know...
4,54,What evidence is used to divide mankind into s...,2,"In the early twentieth century, the division o...",The classification of mankind into a number of...,The most important evidence is based on the pe...,"In the early twentieth century, the classifica...",What evidence is used to divide mankind into s...,1,"In the early twentieth century, the division o...",The classification of mankind into a number of...,The most important evidence is based on the pe...,"In the early twentieth century, the classifica..."


In [16]:
# percentage agreement on most_plausible in both_coded

agreement = (both_coded['most_plausible_x'] == both_coded['most_plausible_y']).mean()
agreement

np.float64(0.44)

In [17]:
import krippendorff

In [28]:
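# Note: krippendorff.alpha documents reliability_data as one row per coder
# and one column per unit. This frame has one row per question, so it would
# need .T for the two coders to be compared with each other.
alpha = krippendorff.alpha(both_coded[['most_plausible_x', 'most_plausible_y']],
                           level_of_measurement='nominal')
alpha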

np.float64(-0.010726072607260884)
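
As I read the `krippendorff` package, `reliability_data` is laid out with one row per coder and one column per unit, so the orientation of the array matters. To make the calculation itself explicit, here is a sketch of nominal alpha for exactly two coders with no missing values, written without the library. The function `nominal_alpha` is hypothetical, and the input is only the ten rows of `both_coded` visible in this notebook's output, so its result is not expected to match the figure above.

```python
from collections import Counter

def nominal_alpha(coder_a, coder_b):
    """Krippendorff's alpha (nominal) for two coders, no missing values.

    alpha = 1 - D_o / D_e, where D_o is the observed disagreement and
    D_e the disagreement expected if codes were paired at random.
    """
    if len(coder_a) != len(coder_b):
        raise ValueError('both coders must rate the same units')
    pairs = list(zip(coder_a, coder_b))
    n = 2 * len(pairs)  # total number of pairable values
    # Each disagreeing unit contributes two ordered mismatched pairs.
    d_o = 2 * sum(a != b for a, b in pairs) / n
    # Pool the codes of both coders to get the overall category counts.
    counts = Counter(coder_a) + Counter(coder_b)
    d_e = (n * n - sum(c * c for c in counts.values())) / (n * (n - 1))
    if d_e == 0:
        return float('nan')  # only one category was ever used
    return 1.0 - d_o / d_e

# most_plausible_x and most_plausible_y for orig_index 50-59.
han_codes = [3, 1, 1, 3, 2, 2, 1, 0, 2, 0]
mimno_codes = [1, 3, 1, 3, 1, 2, 0, 1, 0, 2]

print(nominal_alpha(han_codes, mimno_codes))
```

Swapping the two arguments leaves the result unchanged, which is a quick way to confirm that the coders, not the questions, are being treated as raters.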

## Discussion

That's a very low alpha. Pooled across both coders, more answers are right than wrong (56%), although the two differ sharply: Mimno is right 74% of the time and Han only 38%. On the 25 questions that both of them coded, they *agree* only 44% of the time, and Krippendorff's alpha is actually negative.

In [31]:
# print the subset of both_coded with just the columns
# 'orig_index', 'most_plausible_x', 'most_plausible_y'

both_coded[['orig_index', 'most_plausible_x', 'most_plausible_y']]

Unnamed: 0,orig_index,most_plausible_x,most_plausible_y
0,50,3,1
1,51,1,3
2,52,1,1
3,53,3,3
4,54,2,1
5,55,2,2
6,56,1,0
7,57,0,1
8,58,2,0
9,59,0,2


In [6]:
from difflib import SequenceMatcher

def similar(a, b):
    return SequenceMatcher(None, a, b).ratio()

stringA = 'Marie Curie'
stringB = 'Marie Sklodowska Curie'
stringC = 'Marie S. Curie'
stringD = 'Curie, Marie'

print(similar(stringA, stringB), "Marie Curie <-> Marie Sklodowska Curie")
print(similar(stringA, stringC), "Marie Curie <-> Marie S. Curie")
print(similar(stringA, stringD), "Marie Curie <-> Curie, Marie")



0.6666666666666666 Marie Curie <-> Marie Sklodowska Curie
0.88 Marie Curie <-> Marie S. Curie
0.43478260869565216 Marie Curie <-> Curie, Marie
