# Two-Group Comparisons

When comparing **two groups**, we choose the statistical test based on:
1. **Data type** – Continuous or categorical
2. **Independence** – Independent or paired samples
3. **Assumptions** – Normal distribution, equal variances
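
The three criteria above map onto the tests covered in this section roughly as follows. This is only a sketch: the function name `choose_two_group_test` and its arguments are hypothetical, and in practice the decision also depends on sample size and study design.

```r
# Hypothetical helper: returns the name of the test this section would suggest.
# continuous    - TRUE for continuous data, FALSE for categorical data
# paired        - TRUE for paired samples, FALSE for independent samples
# parametric_ok - TRUE if normality (and, for independent samples,
#                 equal variances) is a reasonable assumption
choose_two_group_test <- function(continuous, paired, parametric_ok) {
  if (!continuous) {
    # The only categorical options named in this section
    return("Chi-square or Fisher's exact test")
  }
  if (parametric_ok) {
    if (paired) "Paired t-test" else "Independent samples t-test"
  } else {
    if (paired) "Wilcoxon signed-rank test" else "Mann-Whitney U test"
  }
}

choose_two_group_test(continuous = TRUE, paired = FALSE, parametric_ok = TRUE)
choose_two_group_test(continuous = TRUE, paired = TRUE,  parametric_ok = FALSE)
```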

---

### **Parametric case (t-test)**

#### **1. Independent samples t-test**
If $X_1, X_2, \dots, X_n \sim N(\mu_1, \sigma^2)$ and 
$Y_1, Y_2, \dots, Y_m \sim N(\mu_2, \sigma^2)$ are independent samples that share a common variance $\sigma^2$,  
the null hypothesis is:

$$
H_0 : \mu_1 = \mu_2
$$

The test statistic, which follows a $t$ distribution with $n + m - 2$ degrees of freedom under $H_0$, is:

$$
t = \frac{\bar{X} - \bar{Y}}{s_p \sqrt{\frac{1}{n} + \frac{1}{m}}}
$$

where $s_X^2$ and $s_Y^2$ are the sample variances of the two groups and the pooled standard deviation is:

$$
s_p = \sqrt{\frac{(n-1)s_X^2 + (m-1)s_Y^2}{n+m-2}}
$$
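
The two formulas above can be checked by computing the statistic by hand and comparing it with R's built-in `t.test`. The data below are simulated purely for illustration; the seed, sample sizes, and means are arbitrary assumptions.

```r
# Hand computation of the pooled two-sample t statistic (illustrative data)
set.seed(1)                          # arbitrary seed, for reproducibility only
x <- rnorm(10, mean = 50, sd = 5)
y <- rnorm(14, mean = 53, sd = 5)

n <- length(x)
m <- length(y)

# Pooled standard deviation s_p
s_p <- sqrt(((n - 1) * var(x) + (m - 1) * var(y)) / (n + m - 2))

# Test statistic and two-sided p-value on n + m - 2 degrees of freedom
t_stat <- (mean(x) - mean(y)) / (s_p * sqrt(1 / n + 1 / m))
df     <- n + m - 2
p_val  <- 2 * pt(-abs(t_stat), df)

# Cross-check against the built-in pooled-variance test
ref <- t.test(x, y, var.equal = TRUE)
all.equal(t_stat, unname(ref$statistic))
all.equal(p_val, ref$p.value)
```

Both `all.equal` calls should return `TRUE`, since `t.test(..., var.equal = TRUE)` uses the same pooled formula.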

---

#### **2. Paired t-test**
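If we have paired data $(X_i, Y_i)$ for $i = 1, \dots, n$,  
we define the differences $D_i = X_i - Y_i$ and assume they are normally distributed with mean $\mu_D$.  
The null hypothesis is: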

$$
H_0 : \mu_D = 0
$$

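With $\bar{D}$ and $s_D$ denoting the sample mean and standard deviation of the differences, the test statistic (which has $n - 1$ degrees of freedom under $H_0$) is: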

$$
t = \frac{\bar{D}}{s_D / \sqrt{n}}
$$
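
As a quick check of this formula, the sketch below computes the paired statistic by hand and compares it with `t.test`. The data are simulated for illustration only, and the variable names `pre` and `post` are assumptions.

```r
# Hand computation of the paired t statistic (illustrative data)
set.seed(2)                                  # arbitrary seed
pre  <- rnorm(9, mean = 70, sd = 4)
post <- pre + rnorm(9, mean = 2, sd = 3)

d <- pre - post                              # differences D_i = X_i - Y_i
n <- length(d)

t_stat <- mean(d) / (sd(d) / sqrt(n))        # t = D-bar / (s_D / sqrt(n))
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)   # two-sided p-value

# The paired test is a one-sample t-test on the differences
ref_paired <- t.test(pre, post, paired = TRUE)
ref_one    <- t.test(d, mu = 0)

all.equal(t_stat, unname(ref_paired$statistic))
all.equal(unname(ref_paired$statistic), unname(ref_one$statistic))
```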

---

### **Non-parametric case**

- **Mann–Whitney U test**: Compares two independent groups of sizes $n_X$ and $n_Y$ using ranks assigned in the pooled sample.  
  The $U$ statistic is based on the sum of ranks $R_X$ for group X:

$$
U_X = n_X n_Y + \frac{n_X(n_X + 1)}{2} - R_X
$$

- **Wilcoxon signed-rank test**: For paired data, ranks the absolute differences $|D_i|$ and then sums the ranks separately for the positive and the negative differences; the test statistic is based on these two rank totals.
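
Both rank statistics can be computed by hand, as sketched below on simulated data (seed, sample sizes, and variable names are arbitrary assumptions). Note that R's `wilcox.test` reports $W = R_X - n_X(n_X+1)/2$ for unpaired samples, which is the complement of $U_X$ as defined above ($U_X + W = n_X n_Y$), and reports $V$, the sum of ranks of the positive differences, for paired samples.

```r
# --- Mann-Whitney U by hand (illustrative data, no ties expected) ---
set.seed(3)
x <- rexp(8,  rate = 0.2)
y <- rexp(11, rate = 0.3)
n_x <- length(x)
n_y <- length(y)

r   <- rank(c(x, y))             # ranks in the pooled sample
R_x <- sum(r[seq_len(n_x)])      # rank sum for group X

U_x <- n_x * n_y + n_x * (n_x + 1) / 2 - R_x   # formula from this section
W   <- R_x - n_x * (n_x + 1) / 2               # statistic reported by R

all.equal(W, unname(wilcox.test(x, y)$statistic))
all.equal(U_x + W, n_x * n_y)

# --- Wilcoxon signed-rank by hand (illustrative paired data) ---
a <- rexp(10, rate = 0.3)
b <- a + rnorm(10, mean = 0.5, sd = 1)

d  <- a - b
d  <- d[d != 0]                  # zero differences are dropped
rk <- rank(abs(d))               # ranks of |D_i|

V_pos <- sum(rk[d > 0])          # rank total of positive differences
V_neg <- sum(rk[d < 0])          # rank total of negative differences

all.equal(V_pos, unname(wilcox.test(a, b, paired = TRUE)$statistic))
all.equal(V_pos + V_neg, length(d) * (length(d) + 1) / 2)
```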

---

If the data are categorical, tests such as the chi-square test or Fisher's exact test are used instead; Fisher's exact test is usually preferred when expected cell counts are small.
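
A minimal sketch of both categorical tests on a 2×2 table follows. The counts, group labels, and outcome labels are made up for illustration and do not come from any real data set.

```r
# Hypothetical 2x2 contingency table (made-up counts)
tab <- matrix(c(18,  7,
                11, 14),
              nrow = 2, byrow = TRUE,
              dimnames = list(group   = c("A", "B"),
                              outcome = c("pass", "fail")))

# Chi-square test of independence; for 2x2 tables chisq.test applies
# Yates' continuity correction by default (correct = TRUE)
chi_res <- chisq.test(tab)

# Fisher's exact test on the same table
fisher_res <- fisher.test(tab)

chi_res$expected      # expected counts under independence
chi_res$p.value
fisher_res$p.value
```

Inspecting `chi_res$expected` is one way to judge whether the chi-square approximation is reasonable for the table at hand.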


```r
# Independent samples t-test example

set.seed(42)
grp1 <- rnorm(12, mean = 50, sd = 5)
grp2 <- rnorm(12, mean = 54, sd = 5)

# var.equal = TRUE requests the pooled-variance (Student) t-test;
# the default, var.equal = FALSE, runs Welch's t-test instead
t_result <- t.test(grp1, grp2, var.equal = TRUE)
t_result
```

```
	Two Sample t-test

data:  grp1 and grp2
t = 1.0356, df = 22, p-value = 0.3117
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.396201  7.175997
sample estimates:
mean of x mean of y 
 53.77687  51.38697 
```


```r
# Paired t-test example

set.seed(101)
before <- rnorm(8, mean = 70, sd = 4)
after  <- before + rnorm(8, mean = 3, sd = 2)

# Differences are taken as before - after, so a negative mean
# difference corresponds to an increase from before to after
paired_res <- t.test(before, after, paired = TRUE)
paired_res
```

```
	Paired t-test

data:  before and after
t = -4.5347, df = 7, p-value = 0.002685
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -4.547741 -1.430402
sample estimates:
mean difference 
      -2.989072 
```


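```r
# Mann–Whitney U test example (non-parametric)

set.seed(202)
non_norm1 <- rexp(15, rate = 0.2)
non_norm2 <- rexp(15, rate = 0.25)

# For two unpaired samples, wilcox.test runs the Wilcoxon rank-sum test,
# which is equivalent to the Mann–Whitney U test. The reported W is the
# rank sum of the first sample minus n_X (n_X + 1) / 2.
mw_res <- wilcox.test(non_norm1, non_norm2)
mw_res
```

```
	Wilcoxon rank sum exact test

data:  non_norm1 and non_norm2
W = 187, p-value = 0.001408
alternative hypothesis: true location shift is not equal to 0
```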


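```r
# Wilcoxon signed-rank test example

set.seed(303)
sk_before <- rexp(10, rate = 0.3)
sk_after  <- sk_before + rexp(10, rate = 0.4)

# paired = TRUE switches to the signed-rank test. V is the rank total of
# the positive differences (sk_before - sk_after); sk_after always exceeds
# sk_before here, so every difference is negative and V = 0.
wilcoxon_res <- wilcox.test(sk_before, sk_after, paired = TRUE)
wilcoxon_res
```

```
	Wilcoxon signed rank exact test

data:  sk_before and sk_after
V = 0, p-value = 0.001953
alternative hypothesis: true location shift is not equal to 0
```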


# Real-World Analogy

**Example:**  
- Independent t-test: Compare average exam scores between **two different classes**.  
- Paired t-test: Compare the **same students' scores before and after** a special training program.  
- Mann–Whitney U: Compare exam scores between **two different classes** when the scores are skewed, so that ranks are compared instead of means.  
- Wilcoxon signed-rank: Compare the **same students' scores before and after** the program when the score differences are not normally distributed.
