# Computing QWK as Described in the Paper

We investigated how the prediction was affected by different RRI samples in the
same person by evaluating the inter-sample reliability of age prediction using Cohen’s
kappa. Two distinct 5 min RRI samples from the same subject were randomly selected
without replacement and predicted age groups from these two samples were acquired.
We repeated this process for all test subjects with two or more 5 min samples. Based on
these results, we calculated Cohen’s kappa, which measured the agreement between two
predictions obtained from the same subject. We employed the quadratic weights described
in Table 4 for the calculations to account for the ordinal nature of the predictions. For the
inter-sample reliability, we evaluated the kappa values based on the models using hybrid
loss with ADASYN.
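
To make the quadratic weighting concrete, here is a minimal pure-Python sketch of quadratic weighted kappa. It assumes the age groups are encoded as integers `0..n_classes-1` and uses the common weight `(i - j)^2 / (n_classes - 1)^2`. The paper's Table 4 is not reproduced here, so that weight definition is an assumption, and the function name `quadratic_weighted_kappa` is illustrative rather than taken from the paper.

```python
from collections import Counter


def quadratic_weighted_kappa(a, b, n_classes):
    """Cohen's kappa with quadratic weights for integer labels 0..n_classes-1.

    Assumed weight: w[i][j] = (i - j)^2 / (n_classes - 1)^2.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("a and b must be non-empty and of equal length")
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")

    n = len(a)
    observed = Counter(zip(a, b))  # joint counts of (first, second) predictions
    hist_a = Counter(a)            # marginal counts of the first predictions
    hist_b = Counter(b)            # marginal counts of the second predictions

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += w * observed[(i, j)] / n
            weighted_expected += w * hist_a[i] * hist_b[j] / (n * n)

    if weighted_expected == 0.0:
        # Both prediction lists contain a single identical class: kappa is undefined.
        return float("nan")
    return 1.0 - weighted_observed / weighted_expected


# Identical predictions give perfect agreement.
print(quadratic_weighted_kappa([0, 1, 2], [0, 1, 2], n_classes=3))  # 1.0
```

Disagreements between distant age groups are penalised more than disagreements between adjacent ones, which is what makes the statistic suitable for ordinal predictions. As far as I can tell, `cohen_kappa_score` infers the class set from the labels that actually occur unless `labels=` is passed, so it may be worth passing the full list of age groups explicitly when a fold never predicts one of them.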

## 1. Computing inter-sample QWK

In [5]:
import pandas as pd
import numpy as np
from sklearn.metrics import cohen_kappa_score

def inter_sample_qwk_all_pairs(df, id_col="ID", pred_col="y_pred"):
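    """
    Compute QWK from every pair (i, j) of predictions that share the same ID.
    Note: an ID with many samples produces more pairs and therefore carries
    more weight. This differs from the paper, which draws one random pair
    of samples per subject.
    """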
    df_multi = df.groupby(id_col).filter(lambda g: len(g) >= 2)

    preds_1 = []
    preds_2 = []

    for _, g in df_multi.groupby(id_col):
        values = g[pred_col].to_numpy()
        n = len(values)
        for i in range(n):
            for j in range(i + 1, n):
                preds_1.append(values[i])
                preds_2.append(values[j])

    kappa = cohen_kappa_score(preds_1, preds_2, weights="quadratic")
    return kappa
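
The paper draws one random pair of samples per subject, whereas `inter_sample_qwk_all_pairs` uses every pair. The following is a dependency-free sketch of the paper-style selection. The helper name `sample_one_pair_per_subject` and the `seed` parameter are hypothetical, introduced only for illustration.

```python
import random
from collections import defaultdict


def sample_one_pair_per_subject(ids, preds, seed=0):
    """Pick two distinct samples per subject, without replacement.

    `ids` and `preds` are parallel iterables (for example two DataFrame columns).
    Subjects with fewer than two samples are skipped.
    Returns two lists: the first and second prediction of each sampled pair.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible

    by_subject = defaultdict(list)
    for subject, pred in zip(ids, preds):
        by_subject[subject].append(pred)

    first, second = [], []
    for values in by_subject.values():
        if len(values) < 2:
            continue
        a, b = rng.sample(values, 2)  # two different positions, no replacement
        first.append(a)
        second.append(b)
    return first, second


# Toy data: subject 2 has a single sample and is skipped.
first, second = sample_one_pair_per_subject(
    ids=[1, 1, 1, 2, 3, 3],
    preds=[0, 0, 0, 1, 2, 2],
)
print(first, second)
```

The two returned lists can be passed to `cohen_kappa_score(first, second, weights="quadratic")` in place of `preds_1` and `preds_2`. Because each subject contributes exactly one pair, subjects with many samples no longer dominate the statistic, at the cost of a result that depends on the random seed.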


In [6]:
import pandas as pd
Resnet34 = pd.read_csv('../SOTA_results_storage/Resnet34/Resnet34 hybrid- remove_low_quality/_20251203-020440/Resnet34_20251203-020440_oof_predictions.csv', sep = ',')
Resnet34

Unnamed: 0,fold,row_index,ID,y_true,y_pred
0,2,0,1,0,0
1,2,1,1,0,0
2,3,2,5,0,0
3,3,3,5,0,0
4,3,4,5,0,0
...,...,...,...,...,...
3066,4,3066,1119,0,1
3067,4,3067,1119,0,1
3068,4,3068,1119,0,1
3069,4,3069,1119,0,0


In [7]:
kappa_by_fold = {}
kappa_list = []

for fold, df_fold in Resnet34.groupby("fold"):
    kappa_fold = inter_sample_qwk_all_pairs(df_fold,
                                             id_col="ID",
                                             pred_col="y_pred")
    kappa_by_fold[fold] = kappa_fold
    kappa_list.append(kappa_fold)
    print(f"Fold {fold}: inter-sample QWK = {kappa_fold:.4f}")

kappa_array = np.array(kappa_list)

# Mean and standard deviation (SD)
mean_kappa = kappa_array.mean()
std_kappa = kappa_array.std(ddof=1)  # sample SD (ddof=1 is usually appropriate across CV folds)

# Standard error and 95% confidence interval (optional; normal approximation)
se_kappa = std_kappa / np.sqrt(len(kappa_array))
ci_low = mean_kappa - 1.96 * se_kappa
ci_high = mean_kappa + 1.96 * se_kappa

print(f"\nInter-sample QWK (mean ± SD): {mean_kappa:.4f} ± {std_kappa:.4f}")
print(f"Inter-sample QWK 95% CI: [{ci_low:.4f}, {ci_high:.4f}]")


Fold 1: inter-sample QWK = 0.8508
Fold 2: inter-sample QWK = 0.8307
Fold 3: inter-sample QWK = 0.6246
Fold 4: inter-sample QWK = 0.8033
Fold 5: inter-sample QWK = 0.8101

Inter-sample QWK (mean ± SD): 0.7839 ± 0.0910
Inter-sample QWK 95% CI: [0.7041, 0.8637]
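
The interval above uses the normal critical value 1.96, which tends to be optimistic with only five folds. A hedged alternative is a Student's t interval with 4 degrees of freedom, sketched below from the per-fold values printed above. The critical value 2.776 is hard-coded from a standard t-table and is an assumption that holds only for exactly five folds.

```python
import math
import statistics

# Per-fold inter-sample QWK values copied from the printed output above.
fold_kappas = [0.8508, 0.8307, 0.6246, 0.8033, 0.8101]

# Two-sided 95% critical value of Student's t with n - 1 = 4 degrees of freedom
# (table lookup; hard-coded assumption, valid only for 5 folds).
T_CRIT_DF4 = 2.776

mean_kappa = statistics.mean(fold_kappas)
sd_kappa = statistics.stdev(fold_kappas)  # sample SD, equivalent to ddof=1
se_kappa = sd_kappa / math.sqrt(len(fold_kappas))

ci_low = mean_kappa - T_CRIT_DF4 * se_kappa
ci_high = mean_kappa + T_CRIT_DF4 * se_kappa
print(f"Inter-sample QWK 95% CI (t, df=4): [{ci_low:.4f}, {ci_high:.4f}]")
```

Since 2.776 is larger than 1.96, this interval is wider than the normal-approximation one. Either way, fold-level kappas are not fully independent, so the interval is best read as a rough indication of spread.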
