# Mod4/L5: Small-Sample Confidence Intervals for Differences in Means

## Introduction
In this lesson, we discuss confidence intervals for the difference between means of two different populations, focusing on small sample sizes.

## Assumptions
1. Populations have normal distributions.
2. Variances are unknown.
3. At least one of the sample sizes is small (≤ 30).

## Setup
- **Sample 1**: Size $n_1$, from $N(\mu_1, \sigma_1^2)$
- **Sample 2**: Size $n_2$, from $N(\mu_2, \sigma_2^2)$
- Samples are independent.

## Steps to Find Confidence Interval

### Step 1: Estimator
A natural estimator for $\mu_1 - \mu_2$ is $\bar{X}_1 - \bar{X}_2$:
$$ \bar{X}_1 = \frac{1}{n_1} \sum_{i=1}^{n_1} X_{1i} $$
$$ \bar{X}_2 = \frac{1}{n_2} \sum_{i=1}^{n_2} X_{2i} $$
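
As a minimal sketch, the point estimate can be computed in R from two sample vectors. The values below are made up purely for illustration, and the names `x1`, `x2`, and `point_estimate` are hypothetical.

```r
# Hypothetical measurements for two independent samples (illustrative values only)
x1 <- c(23.1, 25.4, 21.8, 24.0, 22.7)
x2 <- c(24.9, 26.2, 23.5, 25.1)

# Sample means: (1/n) * sum of the observations
xbar1 <- sum(x1) / length(x1)   # same as mean(x1)
xbar2 <- sum(x2) / length(x2)   # same as mean(x2)

# Point estimate of mu_1 - mu_2
point_estimate <- xbar1 - xbar2
point_estimate
```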

### Step 2: Distribution of Estimator
- $\bar{X}_1 \sim N(\mu_1, \frac{\sigma_1^2}{n_1})$
- $\bar{X}_2 \sim N(\mu_2, \frac{\sigma_2^2}{n_2})$
- $\bar{X}_1 - \bar{X}_2 \sim N(\mu_1 - \mu_2, \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2})$
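
The distribution of $\bar{X}_1 - \bar{X}_2$ can be checked empirically with a small simulation. This is only a sketch: the population parameters, sample sizes, seed, and number of replications below are arbitrary assumptions chosen for illustration.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Assumed (hypothetical) population parameters and sample sizes
mu1 <- 5; sigma1 <- 2; n1 <- 9
mu2 <- 3; sigma2 <- 3; n2 <- 8

# Draw many pairs of independent samples and record the difference in sample means
diffs <- replicate(20000, mean(rnorm(n1, mu1, sigma1)) - mean(rnorm(n2, mu2, sigma2)))

# Theoretical mean and variance of the difference
theo_mean <- mu1 - mu2
theo_var  <- sigma1^2 / n1 + sigma2^2 / n2

# The simulated values should be close to the theoretical ones
c(simulated_mean = mean(diffs), theoretical_mean = theo_mean)
c(simulated_var  = var(diffs),  theoretical_var  = theo_var)
```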

### Step 3: Standardization
Standardize the estimator to a Z-score:
$$ Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0,1) $$

### Step 4: Confidence Interval
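For a $(1-\alpha)100\%$ confidence interval when the variances are known:
$$ (\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} $$

In this lesson the variances are unknown, so $\sigma_1^2$ and $\sigma_2^2$ are replaced by sample variances and $Z_{\alpha/2}$ by a $t$ critical value. If the two population variances are assumed equal, they are estimated jointly by the pooled variance $S_p^2$, and the interval becomes:
$$ (\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\, n_1 + n_2 - 2} \sqrt{S_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} $$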
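
For the known-variance ($Z$-based) form of the interval, a minimal R sketch might look like the following. The helper name `z_interval` and the inputs in the example call are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: z-interval for mu_1 - mu_2 when both population variances are known
z_interval <- function(xbar1, xbar2, var1, var2, n1, n2, conf_level = 0.95) {
  alpha    <- 1 - conf_level
  z_crit   <- qnorm(1 - alpha / 2)          # upper alpha/2 critical value of N(0, 1)
  se       <- sqrt(var1 / n1 + var2 / n2)   # standard deviation of the estimator
  estimate <- xbar1 - xbar2
  c(lower = estimate - z_crit * se, upper = estimate + z_crit * se)
}

# Illustrative call with made-up inputs
ci <- z_interval(xbar1 = 10, xbar2 = 8, var1 = 4, var2 = 9, n1 = 16, n2 = 36)
ci
```

The interval is symmetric around the point estimate, so its midpoint is always $\bar{X}_1 - \bar{X}_2$.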

## Example
### Problem
- **Group 1**: 9 samples, mean = 23.2, variance = 4.3
- **Group 2**: 8 samples, mean = 24.7, variance = 5.2
- Find a 90% confidence interval for the difference in means.

### Solution
1. **Calculate Pooled Variance**:
   $$ S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} $$
   $$ S_p^2 = \frac{(9 - 1)4.3 + (8 - 1)5.2}{9 + 8 - 2} = 4.72 $$

2. **Calculate Critical Value**:
   - For 90% CI, $\alpha = 0.10$
   - Degrees of freedom = $n_1 + n_2 - 2 = 15$
   - $t_{\alpha/2, df} = t_{0.05, 15} = 1.753$

3. **Apply Formula**:
   $$ (\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2, df} \sqrt{S_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} $$
   - Point estimate: $23.2 - 24.7 = -1.5$
   - Margin of error: $1.753 \times \sqrt{4.72 \left(\frac{1}{9} + \frac{1}{8}\right)} \approx 1.753 \times 1.056 \approx 1.85$
   - Confidence interval: $-1.5 \pm 1.85 = (-3.35, 0.35)$

### Interpretation
The 90% confidence interval for the difference in means is $(-3.35, 0.35)$. Since the interval includes 0, it is plausible that the means are equal. This corresponds to a two-sided hypothesis test of $H_0: \mu_1 = \mu_2$ at the 10% significance level, which would fail to reject $H_0$.
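
The arithmetic in this example can be checked in R directly from the summary statistics. This is a sketch that assumes equal population variances (the pooled approach used in the solution); the variable names are illustrative.

```r
# Summary statistics from the example
n1 <- 9; xbar1 <- 23.2; s1_sq <- 4.3
n2 <- 8; xbar2 <- 24.7; s2_sq <- 5.2
conf_level <- 0.90

# Step 1: pooled variance
sp_sq <- ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

# Step 2: upper alpha/2 critical value of the t-distribution
alpha  <- 1 - conf_level
df     <- n1 + n2 - 2
t_crit <- qt(1 - alpha / 2, df)

# Step 3: margin of error and interval
estimate <- xbar1 - xbar2
margin   <- t_crit * sqrt(sp_sq * (1 / n1 + 1 / n2))
ci       <- c(lower = estimate - margin, upper = estimate + margin)

round(c(sp_sq = sp_sq, t_crit = t_crit, margin = margin), 3)
round(ci, 2)
```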

## Conclusion
- Confidence intervals provide a range of plausible values for the difference in means.
- For small samples, we use the t-distribution.
- Assumptions about equal variances can simplify calculations but may not always be valid.
- Welch's approximation can be used when variances are not assumed equal.

## Welch's Approximation
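When variances are not assumed equal, each variance is estimated separately and the interval is
$$ (\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2, \nu} \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} $$
where the degrees of freedom $\nu$ are given by Welch's approximation:
$$ \nu = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{\left(\frac{S_1^2}{n_1}\right)^2}{n_1 - 1} + \frac{\left(\frac{S_2^2}{n_2}\right)^2}{n_2 - 1}} $$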
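
To make the formula concrete, the sketch below computes Welch's degrees of freedom $\nu$ and the corresponding interval $(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2, \nu} \sqrt{S_1^2/n_1 + S_2^2/n_2}$ by hand, then compares them with R's built-in `t.test()`. The helper name `welch_interval`, the seed, and the simulated data are assumptions made for illustration.

```r
# Hypothetical helper: Welch interval for mu_1 - mu_2 from summary statistics
welch_interval <- function(xbar1, xbar2, s1_sq, s2_sq, n1, n2, conf_level = 0.95) {
  v1 <- s1_sq / n1
  v2 <- s2_sq / n2
  # Welch's approximate degrees of freedom (generally not an integer)
  nu <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  t_crit   <- qt(1 - (1 - conf_level) / 2, df = nu)
  margin   <- t_crit * sqrt(v1 + v2)
  estimate <- xbar1 - xbar2
  list(df = nu, lower = estimate - margin, upper = estimate + margin)
}

# Compare with t.test() on simulated data (seed, sizes, and sd are arbitrary)
set.seed(42)
x <- rnorm(10)
y <- rnorm(14, sd = 2)
manual  <- welch_interval(mean(x), mean(y), var(x), var(y),
                          length(x), length(y), conf_level = 0.90)
builtin <- t.test(x, y, conf.level = 0.90)

c(manual_df = manual$df, builtin_df = unname(builtin$parameter))
rbind(manual  = c(manual$lower, manual$upper),
      builtin = as.numeric(builtin$conf.int))
```

Both rows of the final comparison should agree, since `t.test()` applies Welch's approximation unless `var.equal = TRUE` is passed.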

### Example in R
```r
# Simulate data
set.seed(123)
x <- rnorm(10)
y <- rnorm(14)

# Perform t-test; t.test() uses Welch's approximation by default (var.equal = FALSE)
t.test(x, y, conf.level = 0.90)
```

Output:

```

	Welch Two Sample t-test

data:  x and y
t = 0.35532, df = 20.032, p-value = 0.7261
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
 -0.5503036  0.8359084
sample estimates:
  mean of x   mean of y 
 0.07462564 -0.06817676 
```


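## Looking Up the t-Critical Value
Use R's `qt()` function to look up a t-critical value. For the upper-tail critical value $t_{0.05, 15}$ used in the example:
```r
# Lower-tail quantile: returns the negative value -t_{0.05, 15}
qt(p = 0.05, df = 15)

# Upper-tail critical value t_{0.05, 15}, approximately 1.753
qt(0.05, 15, lower.tail = FALSE)
```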