# Fleiss Kappa

Interpretation
> It can be interpreted as expressing the extent to which the observed amount of agreement among raters exceeds what would be expected if all raters made their ratings completely randomly.

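Unlike Cohen's kappa, which requires the same two raters to rate every item, Fleiss' kappa works with any fixed number of ratings per item, and the raters need not be the same individuals from item to item: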
> Fleiss' kappa specifically allows that although there are a fixed number of raters (e.g., three), different items may be rated by different individuals

\begin{equation*}
\kappa =  \frac{\bar p - \bar p_e}{1-\bar p_e}
\end{equation*}
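
The formula above uses two quantities it does not define. The standard definitions are sketched below, writing $n_{ij}$ for the number of raters who assigned item $i$ to category $j$, $N$ for the number of items, $n$ for the number of ratings per item, and $k$ for the number of categories (this notation matches the worked example that follows):

\begin{equation*}
P_i = \frac{1}{n(n-1)} \left( \sum_{j=1}^{k} n_{ij}^2 - n \right), \qquad
p_j = \frac{1}{N n} \sum_{i=1}^{N} n_{ij}
\end{equation*}

\begin{equation*}
\bar p = \frac{1}{N} \sum_{i=1}^{N} P_i, \qquad
\bar p_e = \sum_{j=1}^{k} p_j^2
\end{equation*}

Here $P_i$ is the agreement on item $i$, $p_j$ is the share of all assignments that went to category $j$, $\bar p$ is the mean observed agreement, and $\bar p_e$ is the agreement expected by chance.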

# Worked Example

In the following example, 3 raters (n) assign 5 subjects (N) to 2 categories (k). The categories are in the columns and the subjects are in the rows. Each cell gives the number of raters who assigned that row's subject to that column's category.

| nij   | yes     | no    | Pi     |
|-------|---------|-------|--------|
| 1     | 3       | 0     | 1.00   |
| 2     | 1       | 2     | 0.33   |
| 3     | 2       | 1     | 0.33   |
| 4     | 0       | 3     | 1.00   |
| 5     | 2       | 1     | 0.33   |
| Total | 8       | 7     | 3.00   |
| pj    | 0.533   | 0.467 |        |


N = 5, n = 3, k = 2 (yes/no)

For example, the first row (P_1): square each count, add the squares, subtract n, and divide by n(n - 1):
```
P_1 = (3 ** 2 + 0 ** 2 - 3) / (3 * 2) = 6/6 = 1
```

The second row (P_2):
```
P_2 = (1 ** 2 + 2 ** 2 - 3) / (3 * 2) = 2/6 = 1/3
```

And the first column (p_1), the column total divided by N * n:
```
p_1 = 8 / (5 * 3) = 8/15 = 0.533
```

Go through the worked example [here](https://www.wikiwand.com/en/Fleiss'_kappa#/Worked_example) if this is not clear.

Now you can calculate the mean observed agreement (P_bar) and the agreement expected by chance (P_bar_e):
```
P_bar = (1 / 5) * (1 + 1/3 + 1/3 + 1 + 1/3) = 3/5 = 0.6
P_bar_e = 0.533 ** 2 + 0.467 ** 2 = 0.284 + 0.218 = 0.502
```

At this point we have everything we need and `kappa` is calculated:
```
kappa = (0.6 - 0.502) / (1 - 0.502) = 0.098/0.498 = 0.20
```

A kappa of about 0.20 indicates only slight agreement among the three raters.
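
The hand calculation can be cross-checked with a few lines of Python. This is a minimal sketch rather than a library API: the function name `fleiss_kappa` and the list-of-rows input format are assumptions made for illustration, and it assumes every item received the same number of ratings.

```python
def fleiss_kappa(table):
    """Return Fleiss' kappa for a table of per-item category counts.

    table[i][j] is the number of raters who put item i in category j.
    Assumes every item has the same number of ratings n, with n >= 2.
    """
    N = len(table)        # number of items
    n = sum(table[0])     # ratings per item
    k = len(table[0])     # number of categories
    if any(sum(row) != n for row in table):
        raise ValueError("every item must have the same number of ratings")

    # Per-item agreement P_i
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in table]
    # Share of all assignments that went to each category, p_j
    p = [sum(row[j] for row in table) / (N * n) for j in range(k)]

    P_bar = sum(P) / N                    # mean observed agreement
    P_bar_e = sum(pj * pj for pj in p)    # agreement expected by chance
    return (P_bar - P_bar_e) / (1 - P_bar_e)


# Counts from the worked example: one [yes, no] pair per subject
worked_example = [[3, 0], [1, 2], [2, 1], [0, 3], [2, 1]]
print(round(fleiss_kappa(worked_example), 3))
```

For the table in this example it prints `0.196`. The sketch does not guard against the degenerate case where every rating falls in a single category, which makes the denominator zero.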


# Interpretation

 - < 0: Poor agreement
 - 0.01 – 0.20: Slight agreement
 - 0.21 – 0.40: Fair agreement
 - 0.41 – 0.60: Moderate agreement
 - 0.61 – 0.80: Substantial agreement
 - 0.81 – 1.00: Almost perfect agreement
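
The bands above can be turned into a small lookup. This is a sketch under two assumptions: the helper name `interpret_kappa` is hypothetical, and values that fall in the gaps between the listed ranges (for example between 0 and 0.01) are assigned to the next band up.

```python
def interpret_kappa(kappa):
    """Map a kappa value to the agreement label from the list above."""
    if kappa < 0:
        return "Poor agreement"
    # Upper bound of each band, checked in ascending order
    bands = [
        (0.20, "Slight agreement"),
        (0.40, "Fair agreement"),
        (0.60, "Moderate agreement"),
        (0.80, "Substantial agreement"),
        (1.00, "Almost perfect agreement"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


print(interpret_kappa(0.73))
```

With `0.73` as input this prints `Substantial agreement`.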

In [1]:
N = 26709

In [2]:
import csv
import sys
# number of raters n
n = 3

# category assignment (yes/no)
k = 2

# total number of tweets
N = 26709

# Proportion of all assignments to Yes
p_yes = 0
sum_of_yes_per_tweet = 0

# Proportion of all assignments to No
p_no = 0
sum_of_no_per_tweet = 0

total_extent = 0
sarcasm_corpus = 0
non_sarcasm_corpus = 0

In [3]:
## Construct a table

i = 0
list_of_tuples = []
# Open both files in one with-statement so they are always closed;
# newline='' stops the csv module from writing blank rows on Windows
with open("rated.csv", 'r', encoding='utf-8', errors='ignore') as csvfile, \
     open("fleiss_kappa.csv", 'wt', newline='') as f:
    reader = csv.reader(csvfile)
    fleiss_kappa_writer = csv.writer(f)
    for tweet, r1, r2, r3 in reader:
        # Skip rows with a missing tweet or a missing rating
        if tweet and r1 and r2 and r3:
            i += 1
            ratings = [r.lower() for r in (r1, r2, r3)]
            yes_per_tweet = ratings.count("yes")
            # "not sure" is counted as "no"
            no_per_tweet = sum(r in ("no", "not sure") for r in ratings)

            row = (i, yes_per_tweet, no_per_tweet)
            list_of_tuples.append(row)
            fleiss_kappa_writer.writerow(row)
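
The per-tweet counting in the cell above can be isolated into a small function, which makes it possible to check the logic on in-memory data without `rated.csv`. The name `count_ratings` is hypothetical; the rule that "not sure" counts as "no" is taken from the cell above.

```python
def count_ratings(ratings):
    """Return (yes_count, no_count) for one item's ratings.

    "not sure" is counted as "no". Any other label is ignored, so the
    two counts can sum to less than len(ratings).
    """
    normalized = [r.lower() for r in ratings]
    yes_count = normalized.count("yes")
    no_count = sum(r in ("no", "not sure") for r in normalized)
    return yes_count, no_count


print(count_ratings(["Yes", "no", "Not sure"]))
```

This prints `(1, 2)`. Because unrecognized labels are silently dropped, a row whose counts do not add up to n would violate the fixed-number-of-ratings assumption behind Fleiss' kappa, so it is worth checking the row sums before computing kappa.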

In [4]:
# Calculate Fleiss' kappa to find out how good the agreement among raters is

sum_of_all_yes = 0
sum_of_all_no = 0

list_of_P_i = []
for tweet,n_yes,n_no in list_of_tuples:
   sum_of_all_yes += n_yes
   sum_of_all_no += n_no
   list_of_P_i.append( (1/(float(n)*(n-1))) * (((n_yes**2) + (n_no**2)) - n) )

p_yes = (1/(float(N) * n)) * sum_of_all_yes
p_no = (1/(float(N) * n)) * sum_of_all_no


print("Proportion of all assignments to the YES category (p_yes): ",p_yes)
print("Proportion of all assignments to the NO category (p_no): " ,p_no)

sum_of_all_p_i = 0
for p in list_of_P_i:
   sum_of_all_p_i += p

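# Mean observed agreement: the average of the per-tweet P_i values
p_dash = (sum_of_all_p_i/float(N))
print("Overall extent of agreement(p_mean): ", p_dash)
# Agreement expected by chance: the sum of the squared category proportions
p_expected = (p_yes**2) + (p_no**2)

print("Mean proportion of agreement(p_expected): ", p_expected)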

kappa = (p_dash - p_expected)/(1-p_expected)
print("KAPPA: ",kappa)



Proportion of all assignments to the YES category (p_yes):  0.501079536236225
Proportion of all assignments to the NO category (p_no):  0.498920463763775
Overall extent of agreement(p_mean):  0.5013665805534948
Mean proportion of agreement(p_expected):  0.5000023307969708
KAPPA:  0.0027285122322641907


# Exercise - work out Fleiss' Kappa for the following table


| nij   | yes     | no   | Pi     |
|-------|---------|------|---------
| 1     | 3       | 0    |        | 
| 2     | 0       | 3    |        | 
| 3     | 3       | 0    |        | 
| 4     | 0       | 3    |        | 
| 5     | 2       | 1    |        | 
| Total |         |      |        |
| pj    |         |      |        |

N = 5, n = 3, k = 2 (yes/no)

# Solution

N=5, n=3, k=2

P_1 (square the row counts and add them) = ((3**2 + 0**2) - 3)/(3*2) = (9 - 3)/6 = 1
P_2 = 1
P_3 = 1
P_4 = 1
P_5 = ((2**2 + 1**2) - 3)/(3*2) = 2/6 = 1/3

P = (1+1+1+1+1/3)/5 = 13/15 = 0.867

Pe yes = (3+0+3+0+2)/(3*5) = 8/15
Pe no = (0+3+0+3+1)/(3*5) = 7/15

Pe = (8/15)**2 + (7/15)**2 = 0.502

K = (P - Pe)/(1 - Pe) = (0.867 - 0.502)/(1 - 0.502) ~= 0.73, which is substantial agreement


In [14]:
p_1 = ((3**2 + 0**2)-3)/(3*2)
p_2 = ((0**2 + 3**2)-3)/(3*2)
p_3 = ((3**2 + 0**2)-3)/(3*2)
p_4 = ((0**2 + 3**2)-3)/(3*2)
p_5 = ((2**2 + 1**2)-3)/(3*2)
print(p_1,p_2,p_3,p_4,p_5)

p_bar = (p_1+p_2+p_3+p_4+p_5)/5
print(p_bar)

1.0 1.0 1.0 1.0 0.3333333333333333
0.8666666666666666


In [10]:
Pe_yes = ((3+0+3+0+2)/(5*3))**2
print(Pe_yes)

0.28444444444444444


In [11]:
Pe_no = ((0+3+0+3+1)/(5*3))**2
print(Pe_no)

0.2177777777777778


In [13]:
Pe = Pe_yes + Pe_no
print(Pe)

0.5022222222222222


In [16]:
Kappa = (p_bar-Pe)/(1-Pe) 
print(Kappa)

0.732142857142857
