# Chapter 2: Simple Comparative Experiments (Sections 2.1–2.6)

## 2.1 Introduction
- Goal: Compare **two conditions (treatments)** and decide if they produce different results.
- Example: Does adding polymer latex to cement mortar change curing time or strength?
- Key idea: Observed differences may be due to:
  1. **True treatment effect**
  2. **Random variation (noise)**

We use **statistical inference** to separate signal from noise.

---

## 2.2 Random Sampling and Sampling Distributions
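- A **random sample** of size $n$ is one in which the observations are drawn independently from the same population and every possible sample of that size is equally likely to be chosen.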
- Let’s say we have two populations with means:
  - Population 1 mean: $$\mu_1$$
  - Population 2 mean: $$\mu_2$$
- Sample means:
  - $$\bar{y}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} y_{i1}$$
  - $$\bar{y}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} y_{i2}$$
- Sampling distributions tell us how these statistics behave over repeated experiments.

**Central Limit Theorem (CLT):**
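- For large $n$, the sample mean $$\bar{y}$$ of independent observations from a population with mean $$\mu$$ and finite variance $$\sigma^2$$ is approximately normally distributed, whatever the shape of the population: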
  $$
  \bar{y} \sim N\left(\mu, \frac{\sigma^2}{n}\right)
  $$
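
A quick simulation can make the CLT statement concrete. The sketch below assumes a hypothetical exponential population (chosen only because it is clearly non-normal) with mean 2 and variance 4. The helper name `simulate_sample_means`, the seed, and the sample sizes are illustrative choices, not part of the chapter.

```python
import numpy as np

def simulate_sample_means(n, reps, scale=2.0, seed=0):
    """Draw `reps` samples of size `n` from an exponential population
    (mean = scale, variance = scale**2) and return each sample's mean."""
    rng = np.random.default_rng(seed)
    samples = rng.exponential(scale=scale, size=(reps, n))
    return samples.mean(axis=1)

n = 50
means = simulate_sample_means(n=n, reps=20_000)

# CLT prediction: the mean of y-bar is mu = 2 and its variance is sigma^2 / n = 4 / 50
print("average of sample means: ", means.mean())
print("variance of sample means:", means.var(ddof=1))
print("predicted variance:      ", 2.0**2 / n)
```

The simulated variance should land close to the predicted $$\sigma^2 / n$$, and a histogram of `means` would look roughly bell-shaped even though the population itself is skewed.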

---

## 2.3 Comparing Two Means: The Two-Sample t-Test
We want to test:
- Null hypothesis: $$H_0 : \mu_1 = \mu_2$$
- Alternative hypothesis: $$H_a : \mu_1 \neq \mu_2$$ (two-sided)

### Equal variance assumption
If both populations have variance $$\sigma^2$$:
- **Pooled variance estimate**:
  $$
  s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}
  $$
- **Test statistic**:
  $$
  t_0 = \frac{\bar{y}_1 - \bar{y}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}
  $$
- Degrees of freedom: $$df = n_1 + n_2 - 2$$
- Decision rule: reject $$H_0$$ at significance level $$\alpha$$ if $$|t_0| > t_{\alpha/2, df}$$.
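
The pooled formulas above can be computed step by step, as in the following minimal sketch. The function name `pooled_t_test` is hypothetical, and the data are the illustrative `group1` and `group2` values from the Python Examples section of these notes. The two-sided p-value uses `scipy.stats.t.sf`.

```python
import numpy as np
from scipy import stats

def pooled_t_test(y1, y2):
    """Two-sample t-test assuming equal population variances.
    Returns (t0, df, two-sided p-value)."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    n1, n2 = len(y1), len(y2)
    s1_sq = y1.var(ddof=1)  # sample variance of group 1
    s2_sq = y2.var(ddof=1)  # sample variance of group 2
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df  # pooled variance
    t0 = (y1.mean() - y2.mean()) / np.sqrt(sp_sq * (1 / n1 + 1 / n2))
    p_value = 2 * stats.t.sf(abs(t0), df)  # two-sided alternative
    return t0, df, p_value

group1 = [16.5, 16.7, 17.1, 16.8, 16.9, 17.0, 16.6, 16.7, 16.8, 16.9]
group2 = [17.0, 17.2, 16.9, 17.1, 17.3, 17.0, 17.1, 17.2, 17.1, 17.2]

t0, df, p_value = pooled_t_test(group1, group2)
print(f"t0 = {t0:.3f}, df = {df}, p = {p_value:.5f}")
```

The result should agree with `scipy.stats.ttest_ind(group1, group2, equal_var=True)`, which is a convenient cross-check on the hand calculation.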

---

## 2.4 Confidence Intervals for Difference of Means
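- A **confidence interval** gives a range of plausible values for the true difference $$\mu_1 - \mu_2$$.
- Formula for a $$100(1-\alpha)\%$$ interval under the equal-variance assumption, with $$df = n_1 + n_2 - 2$$: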
  $$
  (\bar{y}_1 - \bar{y}_2) \pm t_{\alpha/2, df} \cdot s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
  $$

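Interpretation: With $$\alpha = 0.05$$, we are 95% confident that the interval contains the true mean difference, in the sense that 95% of intervals built this way over repeated experiments would contain it. If the interval excludes 0, the two-sided test rejects $$H_0$$ at level $$\alpha$$.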
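
As a sketch of how the interval formula translates to code, the hypothetical helper `pooled_ci` below computes the pooled-variance interval. It reuses the illustrative `group1` and `group2` values from the Python Examples section and takes the critical value from `scipy.stats.t.ppf`.

```python
import numpy as np
from scipy import stats

def pooled_ci(y1, y2, alpha=0.05):
    """100(1 - alpha)% confidence interval for mu1 - mu2,
    assuming equal population variances."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    n1, n2 = len(y1), len(y2)
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * y1.var(ddof=1) + (n2 - 1) * y2.var(ddof=1)) / df
    diff = y1.mean() - y2.mean()
    t_crit = stats.t.ppf(1 - alpha / 2, df)  # upper alpha/2 point of t with df
    half_width = t_crit * np.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return diff - half_width, diff + half_width

group1 = [16.5, 16.7, 17.1, 16.8, 16.9, 17.0, 16.6, 16.7, 16.8, 16.9]
group2 = [17.0, 17.2, 16.9, 17.1, 17.3, 17.0, 17.1, 17.2, 17.1, 17.2]

low, high = pooled_ci(group1, group2, alpha=0.05)
print(f"95% CI for mu1 - mu2: ({low:.3f}, {high:.3f})")
```

For these values the whole interval lies below zero, which matches a two-sided test that rejects $$H_0$$ at the 5% level.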

---

## 2.5 Paired t-Test (Blocking Principle)
- Sometimes samples are **naturally paired** (before/after, same subject, same material piece).
- Define **differences**:
  $$
  d_i = y_{i1} - y_{i2}
  $$
- Mean of differences:
  $$
  \bar{d} = \frac{1}{n} \sum_{i=1}^n d_i
  $$
- Standard deviation:
  $$
  s_d = \sqrt{\frac{1}{n-1}\sum_{i=1}^n (d_i - \bar{d})^2}
  $$
- Test statistic for $$H_0 : \mu_d = 0$$, where $$\mu_d = \mu_1 - \mu_2$$ is the mean difference:
  $$
  t_0 = \frac{\bar{d}}{s_d / \sqrt{n}}, \quad df = n-1
  $$
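- Blocks (pairs) remove pair-to-pair differences from the **unexplained variability**, which usually gives a more powerful test. The cost is fewer degrees of freedom ($$n - 1$$ instead of $$2n - 2$$), so pairing pays off when units within a pair are much more alike than units in different pairs.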
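
The paired calculation can be sketched directly from the formulas above. The function name `paired_t_test` is hypothetical, and the `before`/`after` measurements are made-up values used only for illustration. The last lines run an unpaired pooled test on the same numbers to show what ignoring the pairing does.

```python
import numpy as np
from scipy import stats

def paired_t_test(y1, y2):
    """Paired t-test on the differences d_i = y_i1 - y_i2.
    Returns (t0, df, two-sided p-value)."""
    d = np.asarray(y1, dtype=float) - np.asarray(y2, dtype=float)
    n = len(d)
    d_bar = d.mean()
    s_d = d.std(ddof=1)  # sample standard deviation of the differences
    t0 = d_bar / (s_d / np.sqrt(n))
    df = n - 1
    p_value = 2 * stats.t.sf(abs(t0), df)
    return t0, df, p_value

# Hypothetical paired measurements on the same six units
before = [12.1, 11.4, 13.0, 12.6, 11.9, 12.3]
after = [11.8, 11.5, 12.4, 12.1, 11.7, 12.0]

t_paired, df, p_paired = paired_t_test(before, after)
print(f"paired:   t0 = {t_paired:.3f}, df = {df}, p = {p_paired:.4f}")

# Ignoring the pairing treats unit-to-unit variation as noise
unpaired = stats.ttest_ind(before, after, equal_var=True)
print(f"unpaired: t0 = {unpaired.statistic:.3f}, p = {unpaired.pvalue:.4f}")
```

On this data the paired statistic is noticeably larger in magnitude than the unpaired one, because most of the spread in the raw values comes from differences between units rather than from the treatment.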

---

## 2.6 Summary
- To compare two groups:
  - If samples are independent → **two-sample t-test**
  - If samples are paired → **paired t-test**
- Always check assumptions:
  1. Normality (especially with small $n$)
  2. Equal variances (for pooled $t$-test)
  3. Independence
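
One possible way to screen the first two assumptions numerically is sketched below. The notes do not name specific diagnostic tests, so the choice of the Shapiro-Wilk test (`scipy.stats.shapiro`) for normality and Levene's test (`scipy.stats.levene`) for equal variances is an assumption made here. The data are the illustrative `group1` and `group2` values from the Python Examples section.

```python
import numpy as np
from scipy import stats

group1 = np.array([16.5, 16.7, 17.1, 16.8, 16.9, 17.0, 16.6, 16.7, 16.8, 16.9])
group2 = np.array([17.0, 17.2, 16.9, 17.1, 17.3, 17.0, 17.1, 17.2, 17.1, 17.2])

# Shapiro-Wilk: H0 is that the sample comes from a normal distribution
for name, group in (("group1", group1), ("group2", group2)):
    w_stat, p_norm = stats.shapiro(group)
    print(f"{name}: Shapiro-Wilk W = {w_stat:.3f}, p = {p_norm:.3f}")

# Levene (median-centered): H0 is that the groups have equal variances
lev_stat, p_var = stats.levene(group1, group2, center="median")
print(f"Levene statistic = {lev_stat:.3f}, p = {p_var:.3f}")
```

Small p-values cast doubt on the corresponding assumption, although with samples this small the tests have little power, so a normal probability plot is a useful companion. Independence cannot be checked this way; it has to come from how the experiment was run, for example by randomizing the run order.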

---

# Python Examples

```python
import numpy as np
import scipy.stats as stats

# Example: Two-sample t-test
group1 = np.array([16.5, 16.7, 17.1, 16.8, 16.9, 17.0, 16.6, 16.7, 16.8, 16.9])
group2 = np.array([17.0, 17.2, 16.9, 17.1, 17.3, 17.0, 17.1, 17.2, 17.1, 17.2])

t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=True)
print("Two-sample t-test:")
print("t-statistic =", t_stat, "p-value =", p_value)

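# Example: Paired t-test (before vs after), illustrative values.
# The differences must not all be identical: if every before - after
# difference is the same, s_d = 0 and the t-statistic is undefined.
before = np.array([45, 47, 50, 46, 48])
after  = np.array([44, 45, 49, 44, 47])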

t_stat, p_value = stats.ttest_rel(before, after)
print("\nPaired t-test:")
print("t-statistic =", t_stat, "p-value =", p_value)
