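Before the constructed dataset is assessed by fine-tuning a language model on it, it is first evaluated by inter-annotator agreement (IAA). The metric used here is Fleiss' Kappa, which measures agreement when there are multiple classes and multiple raters.  
**This notebook shows how to compute IAA.**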
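
For reference, the quantities computed by the function defined in this notebook can be written as below. This is a sketch in commonly used notation, not taken from the notebook itself: $N$ is the number of items, $n$ the number of raters per item, $k$ the number of categories, and $n_{ij}$ the number of raters who assigned item $i$ to category $j$. The symbols $P_i$, $p_j$, $\bar{P}$ and $\bar{P}_e$ are introduced here for illustration; the code names the last two `PA` and `PE`.

$$
P_i = \frac{1}{n(n-1)}\left(\sum_{j=1}^{k} n_{ij}^2 - n\right), \qquad
\bar{P} = \frac{1}{N}\sum_{i=1}^{N} P_i
$$

$$
p_j = \frac{1}{Nn}\sum_{i=1}^{N} n_{ij}, \qquad
\bar{P}_e = \sum_{j=1}^{k} p_j^2, \qquad
\kappa = \frac{\bar{P} - \bar{P}_e}{1 - \bar{P}_e}
$$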

### 1. Save the tagging results as `iaa_sample.xlsx`, then load the file.

In [48]:
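import pandas as pd
# Load the tagging results (one column per annotator).
# This run read '수정후_relation.xlsx'; point the path at your own file.
result = pd.read_excel('수정후_relation.xlsx', engine='openpyxl')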

In [49]:
# Map each relation label to an integer class ID (1 to 15)
labels = {"stu:하위_학문":1, "stu:상위_학문":2, "stu:별칭":3, "stu:기여자":4, "stu:시대":5, "stu:연구_집단":6, "stu:영향":7, "stu:요소":8,
          "관계_없음":9, "lan:하위_언어":10, "lan:상위_언어":11, "lan:사용_집단":12, "lan:사용_지역":13, "lan:파생물":14, "lan:별칭":15}

In [50]:
result

Unnamed: 0,동민,성현,인희,재욱,한성
0,lan:별칭,lan:별칭,lan:별칭,lan:별칭,lan:별칭
1,lan:사용_지역,lan:사용_지역,lan:사용_지역,lan:사용_지역,lan:사용_지역
2,lan:하위_언어,lan:하위_언어,lan:하위_언어,lan:하위_언어,lan:하위_언어
3,관계_없음,관계_없음,관계_없음,관계_없음,관계_없음
4,lan:사용_집단,lan:사용_집단,lan:사용_집단,lan:사용_집단,lan:사용_집단
...,...,...,...,...,...
1309,관계_없음,관계_없음,관계_없음,관계_없음,관계_없음
1310,stu:영향,stu:요소,stu:영향,stu:영향,stu:영향
1311,stu:기여자,stu:기여자,stu:기여자,stu:기여자,stu:기여자
1312,stu:상위_학문,stu:상위_학문,stu:상위_학문,stu:상위_학문,stu:상위_학문


In [51]:
import numpy as np
result = result.to_numpy()
result

array([['lan:별칭', 'lan:별칭', 'lan:별칭', 'lan:별칭', 'lan:별칭'],
       ['lan:사용_지역', 'lan:사용_지역', 'lan:사용_지역', 'lan:사용_지역', 'lan:사용_지역'],
       ['lan:하위_언어', 'lan:하위_언어', 'lan:하위_언어', 'lan:하위_언어', 'lan:하위_언어'],
       ...,
       ['stu:기여자', 'stu:기여자', 'stu:기여자', 'stu:기여자', 'stu:기여자'],
       ['stu:상위_학문', 'stu:상위_학문', 'stu:상위_학문', 'stu:상위_학문', 'stu:상위_학문'],
       ['stu:하위_학문', 'stu:하위_학문', 'stu:하위_학문', 'stu:하위_학문', 'stu:하위_학문']],
      dtype=object)

In [52]:
result.shape

(1314, 5)

In [53]:
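# Replace each label string with its integer class ID, in place.
# Note: labels.get returns None for a string that is not a key in `labels`.
for i in range(len(result)):
    for j in range(len(result[i])):
        result[i][j] = labels.get(result[i][j])
result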

array([[15, 15, 15, 15, 15],
       [13, 13, 13, 13, 13],
       [10, 10, 10, 10, 10],
       ...,
       [4, 4, 4, 4, 4],
       [2, 2, 2, 2, 2],
       [1, 1, 1, 1, 1]], dtype=object)
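
Because `labels.get` returns `None` for any string that is not a key in `labels`, a typo or stray whitespace in the spreadsheet would not be reported at this step. The following is a minimal sketch of a stricter encoding step, not the notebook's own method. The helper `encode_labels` and the toy rows are hypothetical and written in plain Python so the idea can be checked without the Excel file.

```python
# Hypothetical helper (not part of the original notebook): encode label strings
# as integer class IDs and fail loudly on labels missing from the mapping.
def encode_labels(rows, label_to_id):
    unknown = sorted({lab for row in rows for lab in row if lab not in label_to_id})
    if unknown:
        raise ValueError(f"Labels missing from the mapping: {unknown}")
    return [[label_to_id[lab] for lab in row] for row in rows]

# Toy data for illustration only; the IDs follow the `labels` dictionary.
toy_labels = {"stu:영향": 7, "stu:요소": 8, "관계_없음": 9}
toy_rows = [
    ["stu:영향", "stu:요소", "stu:영향"],
    ["관계_없음", "관계_없음", "관계_없음"],
]
encoded = encode_labels(toy_rows, toy_labels)
print(encoded)  # [[7, 8, 7], [9, 9, 9]]
```

Raising an error here keeps the problem close to its cause instead of letting a `None` travel into the later steps.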

In [54]:
num_classes = int(np.max(result))
print(num_classes)

15


### 2. Define the Fleiss' Kappa function.

In [55]:
'''
Created on Aug 1, 2016
@author: skarumbaiah

Computes Fleiss' Kappa
Joseph L. Fleiss, Measuring Nominal Scale Agreement Among Many Raters, 1971.
'''

def checkInput(rate, n):
    """
    Check correctness of the input matrix
    @param rate - ratings matrix
    @param n - number of raters
    @throws AssertionError
    """
    N = len(rate)
    k = len(rate[0])
    # Every row (one per subject) must have one count per category
    assert all(len(rate[i]) == k for i in range(N)), "Row length != #categories"
    assert all(isinstance(rate[i][j], int) for i in range(N) for j in range(k)), "Element not integer"
    assert all(sum(row) == n for row in rate), "Sum of ratings != #raters"

def fleissKappa(rate,n):
    """
    Computes the Kappa value
    @param rate - ratings matrix containing number of ratings for each subject per category
    [size - N X k where N = #subjects and k = #categories]
    @param n - number of raters
    @return fleiss' kappa
    """

    N = len(rate)
    k = len(rate[0])
    print("#raters = ", n, ", #subjects = ", N, ", #categories = ", k)
    checkInput(rate, n)

    #mean of the extent to which raters agree for the ith subject
    PA = sum([(sum([i**2 for i in row])- n) / (n * (n - 1)) for row in rate])/N
    print("PA = ", PA)

    # mean of squares of proportion of all assignments which were to jth category
    PE = sum([j**2 for j in [sum([rows[i] for rows in rate])/(N*n) for i in range(k)]])
    print("PE =", PE)

    kappa = -float("inf")
    try:
        kappa = (PA - PE) / (1 - PE)
        kappa = float("{:.3f}".format(kappa))
    except ZeroDivisionError:
        print("Expected agreement = 1")

    print("Fleiss' Kappa =", kappa)

    return kappa
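
To see what the function computes, it can help to run the same formulas on a matrix small enough to check by hand. The sketch below restates the calculation compactly. The name `fleiss_kappa_sketch` and the toy matrices are made up for this illustration and are not part of the notebook.

```python
# Compact re-statement of the same formulas; `fleiss_kappa_sketch` is a
# hypothetical name used only for this illustration.
def fleiss_kappa_sketch(counts, n_raters):
    n_items = len(counts)
    n_cats = len(counts[0])
    # Observed agreement: mean over items of the per-item agreement.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items
    # Chance agreement: sum of squared category proportions.
    p_e = sum(
        (sum(row[j] for row in counts) / (n_items * n_raters)) ** 2
        for j in range(n_cats)
    )
    return (p_bar - p_e) / (1 - p_e)

# Made-up count matrix: 4 items, 3 raters, 2 categories.
toy_counts = [[3, 0], [0, 3], [2, 1], [1, 2]]
toy_kappa = fleiss_kappa_sketch(toy_counts, n_raters=3)
print(round(toy_kappa, 3))  # 0.333

# Perfect agreement on every item gives kappa = 1.
perfect_kappa = fleiss_kappa_sketch([[3, 0], [0, 3]], n_raters=3)
print(perfect_kappa)  # 1.0
```

In the first matrix, two items have full agreement and two have a 2-to-1 split, so the observed agreement is 2/3 against a chance agreement of 1/2, which gives a kappa of 1/3.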

### 3. Transform the data into the form required to compute Fleiss' Kappa.

In [56]:
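# Build an N x k matrix: row i counts how many raters chose each class for item i
transformed_result = []
for i in range(len(result)):
    temp = np.zeros(num_classes)
    for j in range(len(result[i])):
        # Class IDs start at 1, so class c is stored in column c - 1
        temp[int(result[i][j]-1)] += 1
    transformed_result.append(temp.astype(int).tolist())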
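
The transformation turns each row of per-rater class IDs into a row of per-class counts. As a small illustration, the same step can be sketched with `collections.Counter` on toy input. The helper name `to_count_matrix` and the toy rows are assumptions made for this example.

```python
from collections import Counter

# Hypothetical helper: one row of per-category counts for each item.
def to_count_matrix(encoded_rows, n_classes):
    matrix = []
    for row in encoded_rows:
        tally = Counter(row)
        # Class IDs start at 1, so class c goes to column c - 1.
        matrix.append([tally.get(c, 0) for c in range(1, n_classes + 1)])
    return matrix

# Toy input: 2 items, 3 raters, 3 classes.
toy_encoded = [[1, 1, 1], [2, 3, 2]]
toy_matrix = to_count_matrix(toy_encoded, n_classes=3)
print(toy_matrix)  # [[3, 0, 0], [0, 2, 1]]
```

Each output row sums to the number of raters, which is the condition `checkInput` asserts before the kappa is computed.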

### 4. Compute the IAA.

In [57]:
kappa = fleissKappa(transformed_result,len(result[0]))

#raters =  5 , #subjects =  1314 , #categories =  15
PA =  0.8951293759512942
PE = 0.12292918551878956
Fleiss' Kappa = 0.88
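
As a quick arithmetic check, the reported kappa can be recomputed from the printed `PA` and `PE` values. The variable names below are chosen for this sketch; the numbers are copied from the output above.

```python
# Values copied from the printed output of fleissKappa
PA = 0.8951293759512942   # observed agreement
PE = 0.12292918551878956  # agreement expected by chance

kappa_check = (PA - PE) / (1 - PE)
print(round(kappa_check, 3))  # 0.88
```

The function rounds to three decimals, so the value 0.880 is printed as `0.88`.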
