# ANOVA and Bootstrapping

## Comparing More Than Two Means

* Compare means of 2 groups using a T statistic.
* Compare means of 3+ groups using a new test called **analysis of variance (ANOVA)** and a new statistic called **F**.


* ANOVA
    * H0: The mean outcome is the same across all categories.
    * HA: At least one pair of means is different from each other.


$$ F = \frac{\text{variability between groups}}{\text{variability within groups}}$$

* A large F statistic arises when the variability between the sample means is large relative to the variability within the samples.

## ANOVA

* ANOVA partitions the total variability in the response variable into two parts:


* **Group**: **between-group variability**.
* **Error**: **within-group variability**.


### Degrees of Freedom

* **Total degrees of freedom** is calculated as sample size minus one.

$$
df_T = n - 1
$$

* **Group degrees of freedom** is calculated as number of groups minus one.

$$
df_G = k - 1
$$

* **Error degrees of freedom** is the difference between the total and group degrees of freedom (equivalently, $n - k$).

$$
df_E = df_T - df_G
$$
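
As a quick illustration of the three formulas above, here is a minimal R sketch. The helper name `anova_df` is hypothetical, and R's built-in `PlantGrowth` dataset (30 observations in 3 groups) is used only as a stand-in; it is not the data discussed in these notes.

```r
# Degrees of freedom for a one-way ANOVA with n observations and k groups.
# `anova_df` is an illustrative helper, not a base R function.
anova_df <- function(n, k) {
  df_total <- n - 1
  df_group <- k - 1
  df_error <- df_total - df_group   # equivalently n - k
  c(total = df_total, group = df_group, error = df_error)
}

# PlantGrowth ships with R: 30 plant weights in 3 treatment groups
anova_df(n = nrow(PlantGrowth), k = nlevels(PlantGrowth$group))
```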

### Sum of Squares

* **Sum of squares total (SST)** measures the **total variability** in the response variable.
* Calculated like the sample variance, except that the sum is not divided by $n - 1$.

$$
SST = \sum^n_{i=1} (y_i - \bar{y})^2
$$

$$
SST = SSG + SSE
$$

* **Sum of squares groups (SSG)** measures the variability **between groups**. 
* It is the **explained variability**.

$$
SSG = \sum^k_{j=1} n_j (\bar{y}_j - \bar{y})^2
$$

* **Sum of squares error (SSE)** measures the variability **within groups**.
* It is the **unexplained variability**, unexplained by the group variable, due to other reasons.

$$
SSE = SST - SSG
$$
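
The partition $SST = SSG + SSE$ can be checked numerically. The sketch below is one possible way to do it in R, again assuming the built-in `PlantGrowth` dataset as stand-in data; the variable names are illustrative.

```r
y <- PlantGrowth$weight   # response variable
g <- PlantGrowth$group    # grouping variable (factor with 3 levels)

grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)     # one mean per group
group_sizes <- tapply(y, g, length)   # n_j for each group

sst <- sum((y - grand_mean)^2)                          # total variability
ssg <- sum(group_sizes * (group_means - grand_mean)^2)  # between groups
sse <- sst - ssg                                        # within groups

# Cross-check: SSE computed directly as squared deviations of each
# observation from its own group mean (ave() returns that mean per row)
sse_direct <- sum((y - ave(y, g))^2)

c(SST = sst, SSG = ssg, SSE = sse, SSE_direct = sse_direct)
```

The last two entries should agree up to floating-point error, which is the partition identity in action.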

### Mean Squares

* Mean squares is the average variability between and within groups, calculated as the total variability (sum of squares) scaled by the associated degrees of freedom.

* **Mean squares group (MSG)**

$$
MSG = \frac{SSG}{df_G}
$$

* **Mean squares error (MSE)**

$$
MSE = \frac{SSE}{df_E}
$$

### F-Statistic

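* **F-statistic** is the ratio of the average between-group and within-group variabilities.
* It is a ratio of non-negative quantities, so it is never negative. The F distribution is right-skewed, and the p-value is the area in its upper tail.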

$$
F = \frac{MSG}{MSE}
$$
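
To see how the mean squares combine into F, the following sketch computes the statistic by hand and compares it with R's own ANOVA table. It assumes the built-in `PlantGrowth` data as a stand-in; the variable names are illustrative.

```r
y <- PlantGrowth$weight
g <- PlantGrowth$group
n <- length(y)     # total sample size
k <- nlevels(g)    # number of groups

# Sums of squares between and within groups
ssg <- sum(tapply(y, g, length) * (tapply(y, g, mean) - mean(y))^2)
sse <- sum((y - ave(y, g))^2)

# Mean squares: sums of squares scaled by their degrees of freedom
msg <- ssg / (k - 1)
mse <- sse / (n - k)

# F statistic: ratio of between-group to within-group variability
(f_stat <- msg / mse)

# R's built-in ANOVA table for comparison (see the "F value" column)
anova(lm(weight ~ group, data = PlantGrowth))
```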

### P-Value

* **P-value** is the probability of at least as large a ratio between the "between" and "within" group variabilities if in fact the means of all groups are equal.

**Example**

* F-statistic = 21.735
* DF_G = 3
* DF_E = 791

```r
# Upper-tail area of the F distribution with df1 = df_G and df2 = df_E
pf(q = 21.735, df1 = 3, df2 = 791, lower.tail = FALSE)

# If the p-value is small (less than alpha), the data provide convincing
# evidence that at least one pair of population means is different
# from each other (but we can't tell which one).

# If the p-value is large, the data do not provide convincing evidence that
# at least one pair of population means is different from each other;
# the observed differences in sample means are attributable to
# sampling variability (or chance).
```

## Conditions for ANOVA

* (1) **Independence**
    * Within groups: sampled observations must be independent.
        * Random sample / assignment
        * Each $n_j$ less than 10% of respective population
    * Between groups: the groups must be independent of each other (non-paired).
        * Carefully consider whether the groups may be dependent; if they are, a repeated measures ANOVA is needed instead.
* (2) **Approximate normality**: distribution should be nearly normal within each group.
    * Especially important when sample sizes are small.
* (3) **Equal variance**: groups should have roughly equal variability.
    * Especially important when sample sizes differ between groups.
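
A rough numeric look at the last two conditions might start from per-group summaries, as in the sketch below. It uses the built-in `PlantGrowth` data as a stand-in and does not replace a visual check (for example side-by-side boxplots or normal probability plots); independence has to be judged from the study design, not from the data.

```r
y <- PlantGrowth$weight
g <- PlantGrowth$group

# Per-group sample size, mean, and standard deviation
group_summary <- data.frame(
  n    = as.vector(tapply(y, g, length)),
  mean = as.vector(tapply(y, g, mean)),
  sd   = as.vector(tapply(y, g, sd)),
  row.names = levels(g)
)
group_summary

# Ratio of the largest to the smallest group standard deviation;
# values far from 1 suggest the equal-variance condition deserves a closer look
max(group_summary$sd) / min(group_summary$sd)
```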

## Multiple Comparisons

* Which means are different?


* Two sample T tests for differences in each possible pair of groups.
* Running many tests inflates the overall Type I error rate above the nominal significance level $\alpha$.
* Solution: use **modified significance level**.


* Testing many pairs of groups is called **multiple comparisons**.
* The **Bonferroni correction** suggests using a more stringent significance level $\alpha^\star$ for these tests.
    * Adjust $\alpha$ by the number of comparisons $K$ being considered.

$$
K = \frac{k(k-1)}{2}
$$

$$
\alpha^\star = \frac{\alpha}{K}
$$

* Because ANOVA assumes constant variance, use a consistent standard error (based on MSE) and degrees of freedom ($df_E$) for all tests.
* Compare the p-values from each test to the modified significance level.


* **Standard error for multiple pairwise comparisons**

$$
SE = \sqrt{
\frac{MSE}{n_1} + \frac{MSE}{n_2}
}
$$

* **Degrees of freedom for multiple pairwise comparisons**

$$
df = df_E
$$
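
Putting the pieces together, the sketch below runs every pairwise comparison with the MSE-based standard error, $df_E$ degrees of freedom, and the Bonferroni-corrected significance level. The function name `pairwise_bonferroni` is hypothetical, and the built-in `PlantGrowth` data is only a stand-in.

```r
# Illustrative helper (not a base R function): all pairwise t-tests using
# the pooled MSE and error degrees of freedom from the one-way ANOVA.
pairwise_bonferroni <- function(y, g, alpha = 0.05) {
  g    <- factor(g)
  n_j  <- tapply(y, g, length)            # group sizes
  m_j  <- tapply(y, g, mean)              # group means
  df_e <- length(y) - nlevels(g)          # error degrees of freedom
  mse  <- sum((y - ave(y, g))^2) / df_e   # mean squares error

  pairs      <- combn(levels(g), 2)       # one column per pair of groups
  alpha_star <- alpha / ncol(pairs)       # Bonferroni-corrected level

  p_values <- apply(pairs, 2, function(p) {
    se     <- sqrt(mse / n_j[[p[1]]] + mse / n_j[[p[2]]])
    t_stat <- (m_j[[p[1]]] - m_j[[p[2]]]) / se
    2 * pt(abs(t_stat), df = df_e, lower.tail = FALSE)  # two-sided
  })

  data.frame(group1 = pairs[1, ], group2 = pairs[2, ],
             p_value = p_values, reject = p_values < alpha_star)
}

pairwise_bonferroni(PlantGrowth$weight, PlantGrowth$group)
```

R's built-in `pairwise.t.test()` with `p.adjust.method = "bonferroni"` takes the equivalent route of multiplying each p-value by $K$ (capped at 1) and comparing it to the original $\alpha$, which leads to the same decisions.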

**Example**

* If the explanatory variable in an ANOVA has 3 levels, and the F-test in ANOVA yields a significant result, how many pairwise comparisons are needed to compare each group to one another?

```r
# Number of pairwise comparisons for k = 3 groups
3 * (3 - 1) / 2
```

**Example**

* 4 class levels
* $\alpha$ = 0.05 for the original ANOVA

```r
# Number of comparisons
(K <- 4 * (4 - 1) / 2)
# Corrected significance level
0.05 / K
```

* Is there a difference between the average vocabulary scores of middle and lower class Americans? (A single pairwise comparison.)
* DF_E = 791
* MSE = 3.628
* Lower class
    * N = 41
    * Mean = 5.07
* Middle class
    * N = 331
    * Mean = 6.76

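```r
# H0: mu_middle - mu_lower = 0
# HA: mu_middle - mu_lower != 0

# Standard error based on the MSE from the ANOVA table
(se <- sqrt(3.628 / 41 + 3.628 / 331))

# Test statistic (named t_stat so it does not shadow base R's t())
(t_stat <- ((6.76 - 5.07) - 0) / se)

# Two-sided p-value using the error degrees of freedom
2 * pt(abs(t_stat), df = 791, lower.tail = FALSE)

# The p-value is smaller than the corrected alpha of 0.05 / 6 = 0.00833,
# so we reject the null hypothesis.
```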

## Bootstrapping

* Take a bootstrap sample - a random sample taken **with replacement** from **the original sample**, of **the same size** as the original sample.
* Calculate the bootstrap statistic - a statistic such as mean, median, proportion, etc. computed on the bootstrap samples.
* Repeat the above two steps many times to create a bootstrap distribution - a distribution of bootstrap statistics.
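
The three steps above can be sketched in a few lines of R. The seed, the number of bootstrap samples, and the use of the built-in `PlantGrowth` weights as "the original sample" are all arbitrary choices made for illustration.

```r
set.seed(1)               # arbitrary seed, only for reproducibility
x <- PlantGrowth$weight   # stand-in for the original sample
n_boot <- 2000            # number of bootstrap samples (arbitrary)

boot_medians <- replicate(n_boot, {
  # Step 1: resample with replacement, same size as the original sample
  resample <- sample(x, size = length(x), replace = TRUE)
  # Step 2: compute the bootstrap statistic (here the median)
  median(resample)
})

# Step 3: the collection of statistics is the bootstrap distribution
summary(boot_medians)
```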


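* Two ways to construct a bootstrap confidence interval:
    * **Percentile method**: use the middle XX% of the bootstrap distribution as the XX% confidence interval.
    * **Standard error method**: use the point estimate plus or minus a critical value times the standard deviation of the bootstrap distribution.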
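
Both interval methods can be sketched from the same bootstrap distribution. The example below builds 95% intervals for a mean, using the built-in `PlantGrowth` weights as stand-in data. Using a t critical value with $n - 1$ degrees of freedom for the standard error method is an assumption here; some texts use a normal critical value instead.

```r
set.seed(1)               # arbitrary seed, only for reproducibility
x <- PlantGrowth$weight   # stand-in for the original sample

# Bootstrap distribution of the sample mean
boot_means <- replicate(2000, mean(sample(x, replace = TRUE)))

# Percentile method: middle 95% of the bootstrap distribution
(ci_percentile <- quantile(boot_means, probs = c(0.025, 0.975)))

# Standard error method: point estimate +/- critical value * bootstrap SE
se_boot <- sd(boot_means)                  # SE estimated from the bootstrap
t_star  <- qt(0.975, df = length(x) - 1)   # assumed critical value
(ci_se <- mean(x) + c(-1, 1) * t_star * se_boot)
```

When the bootstrap distribution is roughly symmetric, the two intervals tend to be close; a large gap between them is a hint that the distribution is skewed.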


* The conditions are less rigid than those for CLT-based methods.
* If the bootstrap distribution is extremely skewed or sparse, the bootstrap interval might be unreliable.
* A representative sample is still required - if the sample is biased, the estimates resulting from this sample will also be biased.