In statistics, inter-rater reliability (also known as inter-rater agreement, inter-rater concordance, inter-observer reliability, or inter-coder reliability) is the degree of agreement among independent observers who rate, code, or assess the same phenomenon.
Cohen’s Kappa is a metric that measures the agreement between two raters. It is implemented here with sklearn.metrics.cohen_kappa_score.
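It is defined as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement between the two raters and p_e is the proportion of agreement expected by chance from the raters' marginal label frequencies; kappa equals 1 under perfect agreement and 0 when agreement is at chance level.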
Fleiss’ Kappa is a metric that measures agreement when a study involves more than two raters; it extends Cohen’s Kappa to the multi-rater case. It is implemented here with statsmodels.stats.inter_rater.fleiss_kappa.
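It is defined analogously as kappa = (P_bar - P_e) / (1 - P_e), where P_bar is the mean proportion of agreeing rater pairs per subject and P_e is the agreement expected by chance from the overall label proportions.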
# Import the required packages
import pandas as pd
from sklearn.metrics import cohen_kappa_score

# Prepare the dataset; suppose you have the two raters' labels in DataFrame format
raters_1 = pd.DataFrame({'confirm_A': [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0]})
raters_2 = pd.DataFrame({'confirm_B': [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]})
# Calculate Cohen's kappa on the two label columns
kappa = cohen_kappa_score(raters_1['confirm_A'], raters_2['confirm_B'])
print(kappa)
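As a sanity check, the same value can be computed directly from the definition above. The following is a minimal sketch that reuses the raters_1 and raters_2 frames; for the sample data it should print approximately 0.689, matching cohen_kappa_score.

# Hand-compute Cohen's kappa from the definition, as a sanity check
a = raters_1['confirm_A'].to_numpy()
b = raters_2['confirm_B'].to_numpy()

# Observed agreement: share of items where both raters assign the same label
p_o = (a == b).mean()

# Chance agreement: sum over labels of the product of the marginal frequencies
p_e = sum((a == label).mean() * (b == label).mean() for label in (0, 1))

kappa_manual = (p_o - p_e) / (1 - p_e)
print(kappa_manual)  # ~0.689, matching cohen_kappa_score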
# Import the required packages
import pandas as pd
from statsmodels.stats.inter_rater import fleiss_kappa
# Prepare the dataset; suppose all raters' labels are in a single DataFrame
n_confirm = pd.DataFrame({
'confirm_A': [0, 0, 1, 1, 1, 1, 1, 1, 1],
'confirm_B': [0, 0, 1, 1, 1, 1, 1, 1, 1],
'confirm_C': [0, 0, 1, 1, 1, 1, 1, 1, 0]
})
"""
n_confirm represents a DataFrame containing n raters,
where confirm_A represents the review content of rater A,
confirm_B represents the review content of rater B, and
confirm_C represents the review content of rater C, and
its example content is a binary classification
"""
# Calculate Fleiss' kappa
def Fleiss_kappa(n_confirm: pd.DataFrame):
    # Count how many raters assigned each label (0 or 1) in each row;
    # labels missing from a row get a count of 0
    value_counts = n_confirm.apply(lambda row: row.value_counts(), axis=1).fillna(0)
    # fleiss_kappa expects a subjects x categories table of counts
    return fleiss_kappa(value_counts.to_numpy())
print(Fleiss_kappa(n_confirm))
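The same kind of check works for Fleiss' kappa. The sketch below recomputes it from the subjects-by-categories count matrix, reusing the n_confirm frame defined above; for the sample data it should print approximately 0.807, matching fleiss_kappa.

# Hand-compute Fleiss' kappa from the count matrix, as a sanity check
import numpy as np

counts = n_confirm.apply(lambda row: row.value_counts(), axis=1).fillna(0).to_numpy()
N = counts.shape[0]        # number of subjects (rows)
n = counts.sum(axis=1)[0]  # number of raters per subject

# Per-subject agreement: proportion of agreeing rater pairs for each subject
P_i = (np.sum(counts ** 2, axis=1) - n) / (n * (n - 1))
P_bar = P_i.mean()

# Chance agreement from the overall label proportions
p_j = counts.sum(axis=0) / (N * n)
P_e = np.sum(p_j ** 2)

kappa_manual = (P_bar - P_e) / (1 - P_e)
print(kappa_manual)  # ~0.807, matching fleiss_kappa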