# Comparisons for Three or More Groups

When comparing **three or more groups**, the choice of statistical test depends on:

1. **Data type** – Continuous or categorical
2. **Independence** – Independent or repeated-measures
3. **Assumptions** – Normality and equal variances
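
The third criterion can be checked in R before a test is chosen. The following is a minimal sketch on hypothetical toy data, using `shapiro.test()` on the model residuals for normality and `bartlett.test()` for equal variances. Both checks are heuristics under stated assumptions, not a definitive decision rule.

```r
# Hypothetical toy data: three independent groups of four observations
values <- c(1.2, 3.4, 2.2, 5.1,  4.8, 6.3, 7.7, 5.9,  8.1, 9.4, 6.8, 10.2)
groups <- factor(rep(c("A", "B", "C"), each = 4))

# Normality: Shapiro-Wilk test on the residuals of the one-way model
fit <- aov(values ~ groups)
normality <- shapiro.test(residuals(fit))

# Equal variances: Bartlett's test across the groups
homogeneity <- bartlett.test(values ~ groups)

normality$p.value
homogeneity$p.value
```

Small p-values would argue against the corresponding assumption and point toward the non-parametric tests described later.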

---

### **Parametric Case (ANOVA)**

#### **1. One-way ANOVA (Independent samples)**
Used when comparing $k \geq 3$ independent groups.  
Null hypothesis:

$$
H_0: \mu_1 = \mu_2 = \dots = \mu_k
$$

Test statistic:

$$
F = \frac{\text{Between-group variance}}{\text{Within-group variance}}
= \frac{MS_{\text{between}}}{MS_{\text{within}}}
$$

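Where $n_i$ is the size of group $i$, $\bar{X}_i$ is its sample mean, $\bar{X}$ is the grand mean of all observations, and $N = \sum_{i=1}^k n_i$ is the total sample size: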

$$
MS_{\text{between}} = \frac{\sum_{i=1}^k n_i (\bar{X}_i - \bar{X})^2}{k - 1}
$$

$$
MS_{\text{within}} = \frac{\sum_{i=1}^k \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2}{N - k}
$$
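
To make the two mean squares concrete, the sketch below computes them by hand on hypothetical toy data and compares the resulting ratio with the value reported by `aov()`. The variable names are illustrative only.

```r
# Hypothetical toy data: three groups of four observations each
values <- c(4, 5, 6, 5,  6, 7, 8, 7,  8, 9, 10, 9)
groups <- factor(rep(c("A", "B", "C"), each = 4))

k <- nlevels(groups)                        # number of groups
N <- length(values)                         # total sample size
n_i <- tapply(values, groups, length)       # group sizes
group_means <- tapply(values, groups, mean) # group means
grand_mean <- mean(values)                  # grand mean

# Between-group and within-group sums of squares
ss_between <- sum(n_i * (group_means - grand_mean)^2)
ss_within <- sum((values - ave(values, groups))^2)  # ave() returns each observation's group mean

ms_between <- ss_between / (k - 1)
ms_within <- ss_within / (N - k)
f_manual <- ms_between / ms_within

# The same statistic as reported by aov()
f_aov <- summary(aov(values ~ groups))[[1]][["F value"]][1]

c(manual = f_manual, aov = f_aov)
```

Under $H_0$, this ratio is referred to an $F$ distribution with $k - 1$ and $N - k$ degrees of freedom, e.g. via `pf(f_manual, k - 1, N - k, lower.tail = FALSE)`.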

---

#### **2. Repeated-measures ANOVA**
Used when the **same subjects** are measured under $k$ different conditions.  
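It accounts for **within-subject correlation** by partitioning the total variance into between-subjects and within-subjects components, so that stable differences between subjects are removed from the error term. It also assumes **sphericity**: the variances of the differences between all pairs of conditions are equal.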
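
The partition can be sketched by hand for a small complete design. The example below assumes a hypothetical 4 × 3 score matrix (rows are subjects, columns are conditions) and then fits the same model with base R's `aov()` and an `Error()` term for comparison. It is an illustration, not the only way to fit such a model.

```r
# Hypothetical scores: 4 subjects (rows) measured under 3 conditions (columns)
scores <- matrix(c(5, 6, 8,
                   4, 6, 7,
                   6, 7, 9,
                   5, 5, 8), nrow = 4, byrow = TRUE)
n <- nrow(scores)  # subjects
k <- ncol(scores)  # conditions
grand_mean <- mean(scores)

# Partition the total sum of squares
ss_total <- sum((scores - grand_mean)^2)
ss_subjects <- k * sum((rowMeans(scores) - grand_mean)^2)   # between-subjects
ss_treatment <- n * sum((colMeans(scores) - grand_mean)^2)  # within-subjects: condition effect
ss_error <- ss_total - ss_subjects - ss_treatment           # within-subjects: residual

f_rm <- (ss_treatment / (k - 1)) / (ss_error / ((n - 1) * (k - 1)))
f_rm

# Same analysis with aov(), using the data in long format
long <- data.frame(
  score = as.vector(scores),                 # column-major: condition 1 first
  subject = factor(rep(1:n, times = k)),
  treatment = factor(rep(1:k, each = n))
)
summary(aov(score ~ treatment + Error(subject / treatment), data = long))
```

The F value in the `subject:treatment` stratum of the `aov()` summary should match `f_rm`, with $k - 1$ and $(n - 1)(k - 1)$ degrees of freedom.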

---

### **Non-parametric Case**

- **Kruskal–Wallis test**:  
  Rank-based test for $k$ independent groups.  
  Test statistic:

$$
H = \frac{12}{N(N+1)} \sum_{i=1}^k \frac{R_i^2}{n_i} - 3(N+1)
$$

Where $R_i$ is the sum of ranks in group $i$, $n_i$ is the size of group $i$, and $N$ is the total sample size.
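
As a check on the formula, the sketch below computes $H$ by hand for hypothetical data without ties and compares it with `kruskal.test()`. With tied values R applies an additional tie correction, so the two numbers would then differ slightly.

```r
# Hypothetical data without ties: three independent groups
g1 <- c(1.2, 3.4, 2.2, 5.1)
g2 <- c(4.8, 6.3, 7.7, 5.9)
g3 <- c(8.1, 9.4, 6.8, 10.2)

values <- c(g1, g2, g3)
groups <- factor(rep(c("g1", "g2", "g3"), times = c(length(g1), length(g2), length(g3))))

N <- length(values)
ranks <- rank(values)                 # ranks over the pooled sample
R_i <- tapply(ranks, groups, sum)     # rank sum per group
n_i <- tapply(ranks, groups, length)  # group sizes

h_manual <- 12 / (N * (N + 1)) * sum(R_i^2 / n_i) - 3 * (N + 1)

# Statistic reported by R (identical here because there are no ties)
h_r <- unname(kruskal.test(list(g1, g2, g3))$statistic)

c(manual = h_manual, kruskal.test = h_r)
```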

- **Friedman test**:  
  Rank-based test for $k$ repeated measures; the observations are ranked within each subject (row) separately.  
  Test statistic:

$$
Q = \frac{12}{n k (k+1)} \sum_{j=1}^k R_j^2 - 3n(k+1)
$$

Where $R_j$ is the sum of ranks for treatment $j$ across subjects, $n$ is the number of subjects, and $k$ is the number of treatments.
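
The same kind of check works for the Friedman statistic. The sketch below assumes a hypothetical 4 × 3 matrix with no ties within any row (rows are subjects, columns are treatments) and compares the hand computation with `friedman.test()`, which would apply a tie correction if ties were present.

```r
# Hypothetical scores: 4 subjects (rows), 3 treatments (columns), no ties within rows
scores <- matrix(c(5, 6, 8,
                   4, 6, 7,
                   6, 7, 9,
                   5, 4, 8), nrow = 4, byrow = TRUE)
n <- nrow(scores)  # subjects
k <- ncol(scores)  # treatments

# Rank within each subject; apply() returns the ranks column-wise, so transpose back
row_ranks <- t(apply(scores, 1, rank))
R_j <- colSums(row_ranks)  # rank sum per treatment

q_manual <- 12 / (n * k * (k + 1)) * sum(R_j^2) - 3 * n * (k + 1)

# Statistic reported by R (identical here because there are no ties)
q_r <- unname(friedman.test(scores)$statistic)

c(manual = q_manual, friedman.test = q_r)
```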

---

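If the overall test is significant (commonly $p < 0.05$), it shows only that at least one group differs from the others. **Post-hoc tests** (Tukey's HSD after ANOVA, pairwise Wilcoxon tests with a multiple-comparison correction after the rank-based tests, etc.) are then used to find which groups differ.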
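
A minimal sketch of both post-hoc approaches in base R, on hypothetical toy data. The Holm adjustment passed to `pairwise.wilcox.test()` is one reasonable choice among several and is an assumption here, not something the text above prescribes.

```r
# Hypothetical toy data without ties: three independent groups
values <- c(1.2, 3.4, 2.2, 5.1,  4.8, 6.3, 7.7, 5.9,  8.1, 9.4, 6.8, 10.2)
groups <- factor(rep(c("A", "B", "C"), each = 4))

# Parametric follow-up: Tukey's HSD on the fitted one-way ANOVA
tukey <- TukeyHSD(aov(values ~ groups))
tukey$groups   # one row per pair: diff, lwr, upr, p adj

# Non-parametric follow-up: pairwise Wilcoxon rank-sum tests, Holm-adjusted
pw <- pairwise.wilcox.test(values, groups, p.adjust.method = "holm")
pw$p.value     # lower-triangular matrix of adjusted p-values
```

With $k$ groups there are $k(k-1)/2$ pairwise comparisons, which is why both procedures adjust for multiplicity.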


In [1]:
# One-way ANOVA example
set.seed(123)
group_A <- rnorm(10, mean = 5, sd = 1)
group_B <- rnorm(10, mean = 6, sd = 1)
group_C <- rnorm(10, mean = 7, sd = 1)

values <- c(group_A, group_B, group_C)
groups <- factor(rep(c("A", "B", "C"), each = 10))

anova_result <- aov(values ~ groups)
summary(anova_result)

            Df Sum Sq Mean Sq F value  Pr(>F)   
groups       2  12.24   6.122   6.435 0.00518 **
Residuals   27  25.68   0.951                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In [2]:
# Kruskal–Wallis test example (non-parametric)
set.seed(321)
grp1 <- rexp(8, rate = 0.5)
grp2 <- rexp(8, rate = 0.6)
grp3 <- rexp(8, rate = 0.7)

kw_result <- kruskal.test(list(grp1, grp2, grp3))
kw_result


	Kruskal-Wallis rank sum test

data:  list(grp1, grp2, grp3)
Kruskal-Wallis chi-squared = 2.355, df = 2, p-value = 0.308


In [5]:
# Repeated-measures ANOVA example
if(!require(ez)) install.packages("ez")
library(ez)

# Simulated dataset
set.seed(42)
subject <- factor(1:6)
treatment <- factor(rep(c("T1", "T2", "T3"), each = 6))
score <- c(rnorm(6, 5), rnorm(6, 6), rnorm(6, 7))

data_rm <- data.frame(subject = rep(subject, 3),
                      treatment = treatment,
                      score = score)

ez_result <- ezANOVA(data = data_rm,
                     dv = score,
                     wid = subject,
                     within = treatment)
ez_result

**ANOVA**

| Effect | DFn | DFd | F | p | p<.05 | ges |
|---|---|---|---|---|---|---|
| treatment | 2 | 10 | 4.262341 | 0.04584016 | * | 0.4104552 |

**Mauchly's Test for Sphericity**

| Effect | W | p | p<.05 |
|---|---|---|---|
| treatment | 0.4379655 | 0.1918138 | |

**Sphericity Corrections**

| Effect | GGe | p[GG] | p[GG]<.05 | HFe | p[HF] | p[HF]<.05 |
|---|---|---|---|---|---|---|
| treatment | 0.6401907 | 0.07659173 | | 0.7638268 | 0.06409709 | |


# Real-World Analogies

- **One-way ANOVA**: Compare the average yield of **three different fertilizer types** on crops.  
- **Repeated-measures ANOVA**: Test the same patients' blood pressure at **three time points**.  
- **Kruskal–Wallis**: Compare customer satisfaction scores (non-normal) between **three stores**.  
- **Friedman test**: Compare performance of algorithms on the **same datasets**.
