# Inter-rater reliability between humans
1. Sample Size Determination
2. Inter-rater Reliability Testing

In [4]:
library(pwr)  # loaded for power analysis; not called in the cells below
library(irr)  # provides N.cohen.kappa() and kappa2()

## 1. Sample Size Determination

**Parameters based on thesis methodology:**
- Significance level (α): 0.05  
- Power: 0.8  
- Expected κ (kappa): 0.7 *(substantial agreement)*  
- Null hypothesis κ₀: 0.3 *(fair agreement)*
- Two-sided test: yes, because there is no strong prior evidence that agreement lies above or below κ₀.

---

### Cohen's Kappa Interpretation Guide (Landis & Koch, 1977)
| Kappa (κ)         | Interpretation          |
|-------------------|--------------------------|
| < 0.00            | Poor agreement           |
| 0.00–0.20         | Slight agreement         |
| 0.21–0.40         | Fair agreement           |
| 0.41–0.60         | Moderate agreement       |
| 0.61–0.80         | Substantial agreement    |
| 0.81–1.00         | Almost perfect agreement |
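
The bands in the table can be turned into a small lookup so that a computed κ is labelled consistently. This is a minimal sketch in base R; `interpret_kappa` is a hypothetical helper name, and it assumes that κ = 0 itself counts as slight agreement and that each positive band includes its upper bound.

```r
# Hypothetical helper: map kappa values to the interpretation bands above.
# Assumes 0 belongs to "Slight" and every positive band is closed on the right.
interpret_kappa <- function(kappa) {
  stopifnot(is.numeric(kappa), all(kappa >= -1 & kappa <= 1))
  labels <- c("Poor agreement", "Slight agreement", "Fair agreement",
              "Moderate agreement", "Substantial agreement",
              "Almost perfect agreement")
  # findInterval() with left.open = TRUE counts how many cut points lie
  # strictly below kappa, which gives right-closed bands such as (0.20, 0.40].
  band <- findInterval(kappa, c(0.20, 0.40, 0.60, 0.80), left.open = TRUE)
  labels[ifelse(kappa < 0, 1L, 2L + band)]
}

interpret_kappa(c(0.7, 0.3))
```

With the thesis parameters, the expected κ of 0.7 maps to "Substantial agreement" and the null value of 0.3 to "Fair agreement".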

In [39]:
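# Sample size for Cohen's kappa via irr::N.cohen.kappa(), which assumes a binary outcome.
# rate1/rate2: each rater's probability of a positive rating; k1: expected kappa; k0: null kappa.
cohen_sample_size <- N.cohen.kappa(rate1 = 1/2, rate2 = 1/2, k1 = 0.7, k0 = 0.3,
                                   alpha = 0.05, power = 0.8, twosided = TRUE)
cat("Sample size for Cohen's Kappa:", cohen_sample_size, "\n")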



Sample size for Cohen's Kappa: 39 
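
The value of 39 can be cross-checked by hand. The sketch below assumes a binary outcome with both raters choosing each category half the time (the `1/2, 1/2` arguments), in which case the large-sample variance factor of κ simplifies to 1 − κ². Whether `N.cohen.kappa()` uses exactly this formula internally is an assumption; the variable names are illustrative.

```r
# Hand check of the sample size, assuming two categories used equally often
# by both raters, so the large-sample variance factor is 1 - kappa^2.
alpha <- 0.05
power <- 0.80
k1 <- 0.7   # expected kappa
k0 <- 0.3   # kappa under the null hypothesis

z_alpha <- qnorm(1 - alpha / 2)  # two-sided critical value
z_beta  <- qnorm(power)

n_raw <- ((z_alpha * sqrt(1 - k0^2) + z_beta * sqrt(1 - k1^2)) / (k1 - k0))^2
n_required <- ceiling(n_raw)

cat("Unrounded n:", round(n_raw, 2), "\n")   # about 38.15
cat("Required subjects:", n_required, "\n")  # 39
```

Rounding up gives 39 subjects, matching the output above. Because the calculation assumes two categories, it is only an approximation for ratings with more categories.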


## 2. Inter-rater Reliability Testing

Cohen's Kappa is computed unweighted because the rating categories are nominal (unordered), so every disagreement counts equally.
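
To make explicit what the unweighted statistic measures, here is a minimal base-R sketch that computes κ = (p_o − p_e) / (1 − p_e) directly from two rating vectors. `cohen_kappa_manual` and the toy ratings are hypothetical and only illustrate the formula; the analysis itself relies on `kappa2()`.

```r
# Hypothetical helper: unweighted Cohen's kappa from two rating vectors.
cohen_kappa_manual <- function(r1, r2) {
  stopifnot(length(r1) == length(r2))
  lv  <- sort(union(r1, r2))                       # shared category set
  tab <- table(factor(r1, levels = lv), factor(r2, levels = lv))
  n   <- sum(tab)
  p_o <- sum(diag(tab)) / n                        # observed agreement
  p_e <- sum(rowSums(tab) * colSums(tab)) / n^2    # agreement expected by chance
  (p_o - p_e) / (1 - p_e)                          # NaN if p_e == 1
}

# Toy data (made up): the raters disagree on one of six items.
rater_a <- c("x", "x", "y", "y", "z", "z")
rater_b <- c("x", "x", "y", "z", "z", "z")
cohen_kappa_manual(rater_a, rater_b)  # 0.75
```

Here p_o = 5/6 and p_e = 1/3, so κ = 0.75. No disagreement is treated as "closer" than another, which is the appropriate choice for nominal categories.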

In [37]:
set.seed(123)

# Placeholder data only: two independent raters drawing uniformly at random.
# The real ratings will replace this later.
generate_ratings <- function(n_items, n_categories = 10) {
  matrix(sample(1:n_categories, n_items * 2, replace = TRUE), 
         ncol = 2)
}

# Calculate Cohen's Kappa
calculate_cohen_kappa <- function(ratings_matrix) {
  kappa2(ratings_matrix, weight = 'unweighted')
}


For now, this is only a test run on the simulated placeholder data.
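
Because the placeholder raters are drawn independently, their expected κ is about 0, so the test run cannot show the pipeline on data with real agreement. A possible alternative, sketched below under stated assumptions, lets the second rater copy the first with probability `p_agree` and guess uniformly otherwise. `generate_correlated_ratings` and `p_agree` are hypothetical names that are not part of the original notebook.

```r
# Hypothetical generator: rater 2 copies rater 1 with probability p_agree,
# otherwise draws a category uniformly at random (may still match by chance).
generate_correlated_ratings <- function(n_items, n_categories = 10, p_agree = 0.8) {
  stopifnot(p_agree >= 0, p_agree <= 1)
  rater1      <- sample.int(n_categories, n_items, replace = TRUE)
  independent <- sample.int(n_categories, n_items, replace = TRUE)
  copies      <- runif(n_items) < p_agree
  rater2      <- ifelse(copies, rater1, independent)
  cbind(rater1 = rater1, rater2 = rater2)
}

set.seed(123)
demo_ratings <- generate_correlated_ratings(39, p_agree = 0.7)
head(demo_ratings)
```

With uniformly distributed categories, the expected κ of this generator is approximately `p_agree`, so `p_agree = 0.7` mimics the agreement level assumed in the sample size calculation. The resulting two-column matrix can be passed to `calculate_cohen_kappa()` in the same way as the placeholder data.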

In [38]:
ratings <- generate_ratings(cohen_sample_size)
cohen_result <- calculate_cohen_kappa(ratings)

print("Cohen's Kappa Result:")
print(cohen_result)


[1] "Cohen's Kappa Result:"
 Cohen's Kappa for 2 Raters (Weights: unweighted)

 Subjects = 39 
   Raters = 2 
    Kappa = -0.0905 

        z = -1.7 
  p-value = 0.0896 
