# **Week 10: Analysis of Variance (ANOVA)**

```
.------------------------------------.
|   __  ____  ______  _  ___ _____   |
|  |  \/  \ \/ / __ )/ |/ _ \___  |  |
|  | |\/| |\  /|  _ \| | | | | / /   |
|  | |  | |/  \| |_) | | |_| |/ /    |
|  |_|  |_/_/\_\____/|_|\___//_/     |
'------------------------------------'

```

In this workshop, we will focus on analysis of variance (ANOVA) techniques.

ANOVA is a family of statistical methods used to compare means across multiple groups and to understand how different factors contribute to variability in the data. We will explore:

- The fundamental ideas behind ANOVA.
- One-way and two-way ANOVA designs.
- Assumptions underlying these models.
- How to interpret results and post-hoc comparisons.
- Practical applications with real datasets.

## **Pre-Configuring the Notebook**

### **Switching to the R Kernel on Colab**

By default, Google Colab uses Python as its programming language. To use R instead, you’ll need to manually switch the kernel by going to **Runtime > Change runtime type**, and selecting R as the kernel. This allows you to run R code in the Colab environment.

However, our notebook is already configured to use R by default. Unless something goes wrong, you shouldn’t need to manually change runtime type.

### **Importing Required Packages**
**Run the following lines of code**:

In [None]:
#Do not modify

setwd("/content")

# Remove `MXB107-Notebooks` if it already exists
if (dir.exists("MXB107-Notebooks")) {
  system("rm -rf MXB107-Notebooks")
}

# Clone the repository
system("git clone https://github.com/edelweiss611428/MXB107-Notebooks.git")

# Change working directory to "MXB107-Notebooks"
setwd("MXB107-Notebooks")

# Run the pre-configuration script (loads the required packages)
invisible(source("R/preConfigurated.R"))

**Do not modify the following**

In [None]:
if (!require("testthat")) install.packages("testthat"); library("testthat")

test_that("Test if all packages have been loaded", {

  expect_true(all(c("ggplot2", "tidyr", "dplyr", "stringr", "magrittr", "knitr") %in% loadedNamespaces()))

})

## **One-Way ANOVA**

We will take a closer look at one-way ANOVA. The essence of two-way ANOVA is similar, although the formulas and interpretation are a bit more complex.


### **Formulation of One-Way ANOVA**

Consider a dataset $Y$ consisting of observations from $k$ **independent** groups.  
Let $Y_{ij}$ denote the $j$-th observation in group $i$, where $i = 1, \dots, k$ and $j = 1, \dots, n_i$.  

The one-way ANOVA model assumes that each observation can be modelled as:

$$
Y_{ij} = \mu_i + \varepsilon_{ij},
$$

where
- $\mu_i$: mean of group $i$
- $\varepsilon_{ij}$: random error, assumed $\sim \mathcal{N}(0, \sigma^2)$  

In this unit, for simplicity, we assume equal group sizes: $n_1 = n_2 = \cdots = n_k$.

In one-way ANOVA, the goal is to test whether the means of all groups are equal.  This can be formalised as follows:

$$
H_0: \mu_1 = \mu_2 = \cdots = \mu_k\\
H_1: \text{At least one group mean differs from the others}
$$

- **Interpretation:**  
  - $H_0$ states that all group means are equal.  
  - $H_1$ states that at least one group mean differs from the others.

Note that the two-sample t-test assuming equal variances is a special case of one-way ANOVA with $k = 2$ (for a two-sided test, the squared $t$ statistic equals the $F$ statistic). Moreover, this formulation assumes **equal population variance** across groups.
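
The $k = 2$ special case can be checked numerically. The sketch below uses simulated data (the data frame `two_groups` and its values are made up for illustration) and base R only. The squared equal-variance $t$ statistic should match the ANOVA $F$ statistic, and the two p-values should agree.

```r
# Simulated two-group data (hypothetical values, for illustration only)
set.seed(1)
two_groups <- data.frame(
  value = c(rnorm(8, mean = 0), rnorm(8, mean = 1)),
  group = factor(rep(c("a", "b"), each = 8))
)

# Two-sample t-test assuming equal variances
t_res <- t.test(value ~ group, data = two_groups, var.equal = TRUE)

# One-way ANOVA on the same data; [[1]] extracts the ANOVA table
aov_tab <- summary(aov(value ~ group, data = two_groups))[[1]]

t_sq  <- unname(t_res$statistic)^2   # squared t statistic
F_val <- aov_tab[["F value"]][1]     # F statistic for `group`
p_aov <- aov_tab[["Pr(>F)"]][1]      # ANOVA p-value

all.equal(t_sq, F_val)               # should be TRUE
all.equal(t_res$p.value, p_aov)      # should be TRUE
```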



### **Decomposition of Total Sum of Squares (SSTOT)**

**NOTE:** In this unit, $n_1 = n_2 = \cdots = n_k$.

Total variability in the data is:

$$
\text{SSTOT} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{..})^2
$$

where $\bar{Y}_{..}$ is the overall mean.  

It is possible to show that

$$
\text{SSTOT} = \text{SSTr} + \text{SSE}
$$

- **SSTr** = Treatment Sum of Squares, or sometimes called Between-group Sum of Squares (due to differences between group means)  
- **SSE** = Error Sum of Squares, or sometimes called Within-group Sum of Squares (variability within groups)

**Between-Group Sum of Squares (SSTr):**

$$
\text{SSTr} = \sum_{i=1}^{k} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{..})^2
$$

- $n_i$ = number of observations in group $i$  
- $\bar{Y}_{i\cdot}$ = mean of group $i$  
- $\bar{Y}_{..}$ = overall mean  

**Within-Group Sum of Squares (SSE):**

$$
\text{SSE} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2
$$
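
The identity $\text{SSTOT} = \text{SSTr} + \text{SSE}$ can be sketched from the definitions above. Each deviation from the overall mean splits into a within-group part and a between-group part:

$$
Y_{ij} - \bar{Y}_{..} = (Y_{ij} - \bar{Y}_{i\cdot}) + (\bar{Y}_{i\cdot} - \bar{Y}_{..})
$$

Squaring both sides and summing over $i$ and $j$ gives

$$
\text{SSTOT} = \underbrace{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2}_{\text{SSE}} + \underbrace{\sum_{i=1}^{k} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{..})^2}_{\text{SSTr}} + 2 \sum_{i=1}^{k} (\bar{Y}_{i\cdot} - \bar{Y}_{..}) \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})
$$

The last (cross) term vanishes because deviations from a group's own mean sum to zero within each group, $\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot}) = 0$.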


### **Essence of One-Way ANOVA**


Under the null hypothesis:

$$
H_0: \mu_1 = \mu_2 = \cdots = \mu_k
$$

- All group means are equal.
- In this case, **SSTr** will be relatively small compared to **SSE**, because the differences between group means are mostly due to random variation within groups.  
- **SSE** dominates the total variability.

When $H_0$ is **false** (i.e., at least one group mean differs from the others):  

- **SSTr** becomes large relative to **SSE**, because there is substantial variability due to differences between group means.  
- This situation motivates the use of the **F-test**, which compares SSTr and SSE after scaling each by its degrees of freedom (giving the mean squares MSTr and MSE).

**F-test rationale:**  

- Under $H_0$, the F-statistic:

$$
F = \frac{\text{MSTr}}{\text{MSE}} = \frac{\text{SSTr}/(k-1)}{\text{SSE}/(N-k)}
$$

follows an $F$ distribution with $(k-1, N-k)$ degrees of freedom ($F_{k-1, N-k}$), where $N = \sum_{i=1}^{k} n_i$ is the total number of observations.  
- If $H_0$ is false, the observed $F$ statistic tends to be **large**, taking values that would be unlikely under the null.  
- A large $F$ statistic therefore provides **evidence against $H_0$**, indicating that not all group means are equal.
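
The rationale above can be illustrated with a small simulation. This is a sketch under assumed settings ($k = 3$ groups of 5 standard normal observations, 2000 replications, all chosen for illustration). When $H_0$ is true, the $F$ statistic should exceed the 5% critical value in roughly 5% of the simulated datasets.

```r
set.seed(2024)
k <- 3                      # number of groups (assumed for illustration)
n_per_group <- 5            # equal group sizes
N <- k * n_per_group        # total number of observations
group <- factor(rep(seq_len(k), each = n_per_group))

# Simulate F statistics when H0 is true (all group means equal to 0)
sim_F <- replicate(2000, {
  y <- rnorm(N)
  group_means <- tapply(y, group, mean)
  SSTr <- sum(n_per_group * (group_means - mean(y))^2)  # between-group SS
  SSE  <- sum((y - ave(y, group))^2)                    # within-group SS
  (SSTr / (k - 1)) / (SSE / (N - k))                    # F = MSTr / MSE
})

# Critical value of the right-tailed test at the 5% level
crit <- qf(0.95, df1 = k - 1, df2 = N - k)

# Proportion of simulated datasets in which H0 is (wrongly) rejected
rejection_rate <- mean(sim_F > crit)
rejection_rate
```

Replacing `rnorm(N)` with data whose group means differ should push the rejection rate well above 5%.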

Unlike the t-test, which can be two-sided or one-sided, the F-test in ANOVA is **always a right-tailed test**. The F-statistic is a ratio of mean squares (MSTr / MSE), so it is **always non-negative**, and any departure from $H_0$ inflates MSTr relative to MSE. Only large values of $F$ therefore count as evidence against $H_0$. For example, take a look at the PDF of $F_{2,17}$:


In [None]:
#F_{2,17}
vals = seq(0, 5, length.out = 500)
densities = df(vals, df1  = 2, df2 = 17)

# Plot
plot(vals, densities, type = "l", lwd = 2,
     main = "F-Distribution (df1 = 2, df2 = 17)",
     xlab = "F-value", ylab = "Density", col = "blue")

### **R Examples**

#### **R Quick Reference for F-Distribution**

Before we show how to run ANOVA in R, we will first discuss how to perform it manually.  This requires understanding and working with the $F$ distribution.


`R` supports the following functions for computing distributional quantities and simulating from $F$-distributions:

- `df(x, df1, df2, log = FALSE)` computes the **density** (PDF) of `F(df1, df2)` at `x`  
- `pf(q, df1, df2, lower.tail = TRUE, log.p = FALSE)` computes the **CDF** of `F(df1, df2)` at `q`  
- `qf(p, df1, df2, lower.tail = TRUE, log.p = FALSE)` computes the **p-quantile** of `F(df1, df2)`  
- `rf(n, df1, df2)` simulates `n` random numbers from an `F(df1, df2)` distribution
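
As a quick illustration of how these functions fit together, the sketch below computes a 5% critical value with `qf()` and checks it against the right-tail probability from `pf()`. The observed statistic `F_hyp` is a made-up number used only to show how a p-value would be computed.

```r
# Critical value: the 95th percentile of F(2, 17)
crit <- qf(0.95, df1 = 2, df2 = 17)

# Right-tail probability beyond the critical value (should be 0.05)
tail_prob <- pf(crit, df1 = 2, df2 = 17, lower.tail = FALSE)

# p-value for a hypothetical observed F statistic
F_hyp <- 4.2
p_hyp <- pf(F_hyp, df1 = 2, df2 = 17, lower.tail = FALSE)

crit
tail_prob
p_hyp

# The two decision rules agree: F_hyp > crit exactly when p_hyp < 0.05
(F_hyp > crit) == (p_hyp < 0.05)
```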


##### **Exercises**

The following dataset consists of three independent groups, each generated from a Gaussian distribution.

In [None]:

set.seed(123)
group1 = rnorm(5, 1)
group2 = rnorm(5,0)
group3 = rnorm(5,-1)

# Combine into long-format
df = data.frame(
  value = c(group1, group2, group3),
  group = factor(c(
    rep(1, length(group1)),
    rep(2, length(group2)),
    rep(3, length(group3))
  ))
)

df %>% head(10)


###### **Exercise 1**

Verify the following identity:

$$
\text{SSTOT} = \text{SSTr} + \text{SSE}.
$$


<details>
<summary>▶️ Click to show the solution</summary>

```r
df %>%
  summarise(SSTOT = (n()-1)*var(value)) %>%
    pull(SSTOT) -> SSTOT
df %>%
  group_by(group) %>%
    summarise(groupAvg = mean(value), ni = n(), .groups = "drop") %>%
    summarise(SSTr = sum(ni * (groupAvg - mean(df$value))^2)) %>%
    pull(SSTr)  -> SSTr
df %>%
  group_by(group) %>%
  summarise(SSEi = (n()-1)*var(value), .groups = "drop") %>%
  summarise(SSE = sum(SSEi)) %>%
  pull(SSE) -> SSE

SSTOT
SSTr
SSE

all.equal(SSTOT, SSTr + SSE)
```

</details>

###### **Exercise 2**

Use ANOVA to *manually* test whether the group means are equal. Use a 5% significance level (Type I error rate).


<details>
<summary>▶️ Click to show the solution</summary>

```r
# Hypotheses:
# H0: mu1 = mu2 = mu3
# H1: At least one mean is different

k = 3
n = nrow(df)
MSTr = SSTr/(k-1)
MSE = SSE/(n-k)
F_obs = MSTr/MSE #F_obs ~ F(k-1, n-k) under H0
F_obs
F_obs > qf(0.95, df1 = k-1, df2 = n-k) #unlikely under H0
#Evidence to reject the null hypothesis
```

</details>

###### **Exercise 3**

Rerun the previous code blocks several times under different scenarios:

- When the null hypothesis holds (all group means are equal).
- When the group means are moderately different (e.g., $\mu_1 = -1$, $\mu_2 = 0$, $\mu_3 = 1$).
- When the group means are very different (e.g., $\mu_1 = -5$, $\mu_2 = 0$, $\mu_3 = 5$).

What happens to the $F$ statistic and the rejection decision in each case?

In [None]:

set.seed(123)
group1 = rnorm(5, 5)
group2 = rnorm(5,0)
group3 = rnorm(5,-5)

# Combine into long-format
df = data.frame(
  value = c(group1, group2, group3),
  group = factor(c(
    rep(1, length(group1)),
    rep(2, length(group2)),
    rep(3, length(group3))
  ))
)

df %>% head(10)



<details>
<summary>▶️ Click to show the solution</summary>

```r
group1 = rnorm(5, 5)
group2 = rnorm(5,0)
group3 = rnorm(5,-5)

# Combine into long-format
df = data.frame(
  value = c(group1, group2, group3),
  group = factor(c(
    rep(1, length(group1)),
    rep(2, length(group2)),
    rep(3, length(group3))
  ))
)

df %>%
  summarise(SSTOT = (n()-1)*var(value)) %>%
    pull(SSTOT) -> SSTOT
df %>%
  group_by(group) %>%
    summarise(groupAvg = mean(value), ni = n(), .groups = "drop") %>%
    summarise(SSTr = sum(ni * (groupAvg - mean(df$value))^2)) %>%
    pull(SSTr)  -> SSTr
df %>%
  group_by(group) %>%
  summarise(SSEi = (n()-1)*var(value), .groups = "drop") %>%
  summarise(SSE = sum(SSEi)) %>%
  pull(SSE) -> SSE

# Hypotheses:
# H0: mu1 = mu2 = mu3
# H1: At least one mean is different

k = 3
n = nrow(df)
MSTr = SSTr/(k-1)
MSE = SSE/(n-k)
F_obs = MSTr/MSE #F_obs ~ F(k-1, n-k) under H0
F_obs
```
As the deviation between group means becomes larger, we tend to observe more extreme $F$ statistics, providing stronger evidence against the null hypothesis. When the null hypothesis is true, $F$ values are generally small.

</details>

#### **ANOVA in R via `aov`**

R provides a dedicated function for performing ANOVA: the `aov()` function. It has a formula interface, similar to `t.test()`, `lm()`, and many other functions used in statistical modeling in R. However, `aov()` is not the only way to perform ANOVA in R. You can also use `lm()`, since ANOVA is a special case of (multiple) linear regression models.

**Usage:**

```r
aov(formula,
    data = NULL,
    projections = FALSE,
    qr = TRUE,
    contrasts = NULL,
    ...)
```

**Arguments:**

- `formula`: a model formula of the form `response ~ predictors` (e.g., `y ~ group`)  
- `data`: a data frame containing variables in the model  
- `projections`: logical; if TRUE, returns projection matrices  
- `qr`: logical; if TRUE, returns the QR decomposition of the model fit  
- `contrasts`: a list of contrast specifications for factors  
- `...`: additional arguments passed to `lm()`  



Back to the previous simulated dataset:

In [None]:

set.seed(123)
group1 = rnorm(5, 1)
group2 = rnorm(5,0)
group3 = rnorm(5,-1)

# Combine into long-format
df = data.frame(
  value = c(group1, group2, group3),
  group = factor(c(
    rep(1, length(group1)),
    rep(2, length(group2)),
    rep(3, length(group3))
  )),
  groupInt = c(
    rep(1, length(group1)),
    rep(2, length(group2)),
    rep(3, length(group3))
  )
)

df %>% head(10)


Note that the datatype of `group` is not integer, but factor. This means it should be treated as a categorical variable rather than a numeric one. This distinction is important for ANOVA because if group were treated as an integer, the model would interpret the group IDs (e.g., 1, 2, 3) as numeric values with a meaningful order or magnitude, rather than as mere labels representing different categories. Using a factor ensures that ANOVA correctly compares group means.

In [None]:
aov(value ~ group, data = df) %>% summary()

`aov()` returns a fitted model object. Applying `summary()` to it produces the **ANOVA table**, which includes:

- $F$-statistic (F value): the value of the test statistic for the ANOVA  
- Degrees of freedom (Df): for the $F$-distribution under the null hypothesis  
- Sum of Squares (Sum Sq): variation attributed to each source (between groups vs within groups/residual)  
- Mean Squares (Mean Sq): average variation per degree of freedom (Sum Sq / Df)  
- p-value (Pr(>F)): the probability, assuming the null hypothesis is true, of observing an $F$ statistic at least as large as the one observed  
  - If the **p-value is smaller than the significance level** $\alpha$, we reject the null hypothesis. This is equivalent to the observed $F$ statistic exceeding the critical value, and is useful for deciding whether group means differ significantly.
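
The entries of the ANOVA table can also be extracted programmatically. The sketch below regenerates the simulated dataset under the hypothetical name `sim` (so the notebook's `df` is left untouched), pulls out the $F$ statistic and p-value, and confirms that the p-value is the right-tail probability of the $F$ distribution. It uses base R only.

```r
# Same simulated data as above, stored under a different (hypothetical) name
set.seed(123)
sim <- data.frame(
  value = c(rnorm(5, 1), rnorm(5, 0), rnorm(5, -1)),
  group = factor(rep(1:3, each = 5))
)

# summary() of an aov fit is a list; its first element is the ANOVA table
tab <- summary(aov(value ~ group, data = sim))[[1]]

F_obs <- tab[["F value"]][1]   # F statistic for `group`
p_val <- tab[["Pr(>F)"]][1]    # reported p-value

# Recompute the p-value from the F distribution (right tail)
p_manual <- pf(F_obs,
               df1 = tab[["Df"]][1],   # k - 1
               df2 = tab[["Df"]][2],   # N - k
               lower.tail = FALSE)

all.equal(p_val, p_manual)     # should be TRUE
```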


ANOVA is a special case of a linear regression model where the regressors are categorical. One can also use the `lm()` function to run ANOVA and then apply `anova()` on the `lm` object to extract the results.  Details about the mathematics behind this approach are beyond the scope of this unit.


In [None]:
lm(value ~ group, data = df) %>% anova()

**Be cautious!!!**

If one forgets to convert `group` to a factor when it is numeric and performs ANOVA, this can lead to different results. In this case, ANOVA will treat `group` as a numeric predictor and fit a linear model where `value` depends linearly on `group`, rather than comparing group means. This can produce misleading conclusions.

For example, in `df`, we have `groupInt`, which stores group IDs as integers.

In [None]:
aov(value ~ groupInt, data = df) %>% summary()

In [None]:
anova(lm(value ~ groupInt, data = df))
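
One way to spot this mistake is to look at the degrees of freedom. A numeric predictor uses a single degree of freedom (one slope), while a factor with $k$ levels uses $k - 1$. The sketch below rebuilds the data under the hypothetical name `sim` and shows that wrapping the integer column in `factor()` inside the formula restores the intended analysis.

```r
set.seed(123)
sim <- data.frame(
  value    = c(rnorm(5, 1), rnorm(5, 0), rnorm(5, -1)),
  groupInt = rep(1:3, each = 5)          # integer group IDs
)

# Treated as numeric: a straight-line regression on the group ID
df_numeric <- summary(aov(value ~ groupInt, data = sim))[[1]][["Df"]]

# Converted to a factor: a genuine comparison of three group means
df_factor <- summary(aov(value ~ factor(groupInt), data = sim))[[1]][["Df"]]

df_numeric   # 1 and 13
df_factor    # 2 and 12
```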

#### **Pairwise Comparisons**

After performing a one-way ANOVA, if the null hypothesis is rejected, one might be tempted to conduct **pairwise comparisons** between all groups to determine which specific means differ. For $k$ groups, there are $\frac{k(k-1)}{2}$ possible pairwise comparisons. Each comparison can be formulated as a standard two-sample t-test:

$$
t = \frac{\bar{Y}_{i\cdot} - \bar{Y}_{j\cdot}}{\sqrt{s_p^2 \left(\frac{1}{n_i} + \frac{1}{n_j}\right)}},
$$

where $s_p^2$ is the pooled variance and $n_i$, $n_j$ are the sample sizes of groups $i$ and $j$. However, performing multiple t-tests **inflates the type I error rate**: each test has some probability of incorrectly rejecting its null hypothesis, so the more tests are performed, the higher the overall chance of at least one false positive. This is why adjustments such as Tukey's HSD, the Bonferroni correction, or other multiple testing procedures are recommended after ANOVA. They control the **family-wise error rate**, that is, the probability of incorrectly rejecting at least one pairwise comparison hypothesis when the original null hypothesis is true:

$$
H_0: \mu_1 = \mu_2 = \dots = \mu_k.
$$

Of course, if one is solely interested in testing whether all group means are equal, it is more appropriate to test them **simultaneously** using the overall F-test from ANOVA, rather than performing multiple pairwise comparisons.
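
To get a feel for how quickly the error rate inflates, the sketch below computes the chance of at least one false positive across all pairwise tests. It makes the simplifying assumption that the tests are independent, which pairwise comparisons sharing groups are not, so the numbers are only a rough guide. It also shows the per-test level a Bonferroni correction would use.

```r
alpha <- 0.05
k <- 3:6                      # number of groups
m <- choose(k, 2)             # number of pairwise comparisons, k(k-1)/2

# Chance of at least one false positive if the m tests were independent
# (a simplifying assumption made for this illustration)
fwer_unadjusted <- 1 - (1 - alpha)^m

# Bonferroni: test each comparison at level alpha / m instead
alpha_bonferroni <- alpha / m

data.frame(k, m, fwer_unadjusted, alpha_bonferroni)
```

With only three groups, the unadjusted family-wise error rate under this approximation is already about 14%.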


We focus on the practical implementation of Tukey's HSD. It is straightforward: one only needs to pipe an `aov` object to the `TukeyHSD()` function to perform all pairwise comparisons with appropriate adjustment for multiple testing.



In [None]:
aov(value ~ group, data = df) %>% TukeyHSD()

There are several ways to interpret the output of `TukeyHSD()`. One handy approach is to look at the **adjusted p-values** (though this is not covered in detail in this unit). If the adjusted p-value is smaller than the chosen type I error rate $\alpha$, we can reject the null hypothesis that the two group means are equal.  

Another approach is to examine the `lwr` and `upr` columns. These represent confidence intervals for the differences between group means. If a confidence interval does **not** include 0, it is equivalent to rejecting the null hypothesis that the two group means are equal, thanks to the connection between hypothesis testing and confidence intervals.

In our example, we reject the null hypothesis that $\mu_1 = \mu_3$ and do not reject the null hypotheses $\mu_1 = \mu_2$ and $\mu_2 = \mu_3$.
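
The two ways of reading the output can be compared directly. The sketch below rebuilds the data under the hypothetical name `sim`, extracts the comparison table from the `TukeyHSD()` result (stored under the factor's name, here `group`), and flags each pair both by its adjusted p-value and by whether its confidence interval excludes 0.

```r
set.seed(123)
sim <- data.frame(
  value = c(rnorm(5, 1), rnorm(5, 0), rnorm(5, -1)),
  group = factor(rep(1:3, each = 5))
)

# Matrix with columns diff, lwr, upr and "p adj"; one row per pair
tukey_tab <- TukeyHSD(aov(value ~ group, data = sim))$group

# Decision rule 1: adjusted p-value below alpha = 0.05
small_p <- tukey_tab[, "p adj"] < 0.05

# Decision rule 2: 95% confidence interval does not contain 0
excludes_zero <- tukey_tab[, "lwr"] > 0 | tukey_tab[, "upr"] < 0

data.frame(tukey_tab, small_p, excludes_zero, check.names = FALSE)
```

Both rules should flag exactly the same pairs, since the intervals and the adjusted p-values come from the same reference distribution.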


##### **Exercise**

The `PlantGrowth` dataset contains the weight of plants under three different treatment groups (`ctrl`, `trt1`, `trt2`).

- Use the `aov()` function in R to test whether there is a significant difference in mean plant weight among the three groups.
- Interpret the ANOVA table and determine whether the null hypothesis of equal group means can be rejected.
- Perform an appropriate method for pairwise comparisons between the groups.

Use a 5% significance level (Type I error rate).


In [None]:
PlantGrowth %>% str()


<details>
<summary>▶️ Click to show the solution</summary>

```r
aov(weight ~ group, PlantGrowth) %>% summary()
F_val = 4.846 # "F value" column of the ANOVA table (the p-value is 0.0159)
F_val > qf(0.95, 2, 27)
```
As the $F$ statistic is larger than `qf(0.95, 2, 27)` (equivalently, the p-value 0.0159 is below 0.05), we reject the null hypothesis that there is no difference in mean plant weight between the 3 groups in favour of the alternative hypothesis.


```r
aov(weight~group, PlantGrowth) %>% TukeyHSD()
```
Pairwise comparisons using the TukeyHSD method suggest that there is **no evidence** against the null hypothesis for `trt1 - ctrl` and `trt2 - ctrl` (their mean differences are not statistically significant). However, there **is evidence** that `trt1` and `trt2` lead to different mean plant weights.

**Note:**  

Even if we find  
- no statistical evidence for a difference between A and B, and  
- no statistical evidence for a difference between A and C,  

this does **not** mean there is no statistical difference between B and C.  

For example, it could be that B has a slightly higher mean than A, and C has a slightly lower mean than A. Each difference might be too small for a t-test to detect individually. However, when comparing B and C directly, the differences "add up," making it easier to detect a statistically significant difference.


</details>

## **Two-Way ANOVA**



### **Formulation of Two-Way ANOVA**
We will not go into two-way ANOVA in as much detail as one-way ANOVA, but the idea is a natural extension. Here, instead of having only one treatment variable, we now allow for two treatment variables (factors). The underlying ideas are much the same.

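Suppose we have two factors, **A** with $I$ levels and **B** with $J$ levels, and $K$ replicate observations for each combination of levels, so that $Y_{ijk}$ is the $k$-th observation at level $i$ of A and level $j$ of B. The two-way ANOVA model is:  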

$$
Y_{ijk} = \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
$$  

where:  
- $\alpha_i$: effect of the $i$-th level of factor A  
- $\beta_j$: effect of the $j$-th level of factor B  
- $(\alpha\beta)_{ij}$: **interaction effect** between level $i$ of A and level $j$ of B  
- $\varepsilon_{ijk} \sim \mathcal{N}(0, \sigma^2)$: random error term  

Sometimes the interaction is not included in the model. However, if interactions are present and significant, they fundamentally change how we interpret the effects of the factors. In particular:

- If the interaction term is **not significant**, the data are consistent with the effect of one factor being the same across the levels of the other factor.  
- If the interaction term is **significant**, the effect of one factor depends on the level of the other factor (the effects of the two factors are not simply additive).  
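
A small numerical illustration may help. The cell means below are made-up numbers for a hypothetical 2 × 2 design. If there were no interaction, the effect of A (moving from `a1` to `a2`) would be the same at both levels of B. Here it is not.

```r
# Hypothetical population cell means (rows: levels of A, columns: levels of B)
cell_means <- matrix(c(10, 12, 14, 20), nrow = 2,
                     dimnames = list(A = c("a1", "a2"), B = c("b1", "b2")))
cell_means

# Effect of A at each level of B
effect_A_at_b1 <- cell_means["a2", "b1"] - cell_means["a1", "b1"]  # 12 - 10
effect_A_at_b2 <- cell_means["a2", "b2"] - cell_means["a1", "b2"]  # 20 - 14

# Non-zero difference of differences: A and B interact
interaction_contrast <- effect_A_at_b2 - effect_A_at_b1

c(effect_A_at_b1, effect_A_at_b2, interaction_contrast)
```

Changing the last cell mean from 20 to 16 would make the two effects equal, which corresponds to a purely additive model.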

#### **Hypothesis Testing for Two-Way ANOVA**
  

In two-way ANOVA, we are interested in testing the following hypotheses:

- **Main effect of factor A:**  
  $$
  H_0^A: \alpha_1 = \alpha_2 = \dots = \alpha_I\\
  H_1^A: \text{At least one } \alpha_i \text{ is different from the others}
  $$  

- **Main effect of factor B:**  
  $$
  H_0^B: \beta_1 = \beta_2 = \dots = \beta_J\\
  H_1^B: \text{At least one } \beta_j\text{ is different from the others}
  $$  

- **Interaction effect:**  
  $$
  H_0^{AB}: (\alpha\beta)_{ij} = 0 \quad \text{for all } i,j\\
  H_1^{AB}: \text{There is an interaction between the two factors}
  $$  

Below is a typical ANOVA table obtained from statistical software (e.g., `R`) for two-way ANOVA.

| Source | Degrees of Freedom | Sum of Squares | Mean Squares            | F          |
|--------|--------------------|----------------|-------------------------|------------|
| A      | I - 1              | SSA            | SSA / (I - 1)           | MSA / MSE  |
| B      | J - 1              | SSB            | SSB / (J - 1)           | MSB / MSE  |
| A × B  | (I - 1)(J - 1)     | SSAB           | SSAB / ((I - 1)(J - 1)) | MSAB / MSE |
| Error  | IJ(K - 1)          | SSE            | SSE / (IJ(K - 1))       |            |
| Total  | IJK - 1            | SSTOT          |                         |            |

The $F$-test for a factor (e.g., factor A) is performed by comparing the mean square of that factor (`MSA`) with the mean square of error (`MSE`) using the corresponding degrees of freedom:

$$
F_A = \frac{MSA}{MSE},
$$


where, under $H_0^A$, $F_A$ follows an $F$ distribution with $\textit{df}_1 = I-1$ and $\textit{df}_2 = IJ(K-1)$ degrees of freedom.
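
The degrees of freedom in the table can be checked against R's output. The sketch below uses a simulated balanced design with assumed sizes $I = 3$, $J = 2$, $K = 4$ and pure-noise responses (all hypothetical), then reads the `Df` column and computes the critical value for the test of factor A.

```r
set.seed(42)
I <- 3   # levels of factor A (assumed for illustration)
J <- 2   # levels of factor B
K <- 4   # replicates per combination of levels

# Balanced design: every (A, B) combination appears K times
design <- expand.grid(
  rep = seq_len(K),
  A   = factor(paste0("a", seq_len(I))),
  B   = factor(paste0("b", seq_len(J)))
)
design$y <- rnorm(nrow(design))   # noise only, so all null hypotheses hold

tab <- summary(aov(y ~ A * B, data = design))[[1]]

# Rows: A, B, A:B, Residuals
tab[["Df"]]
c(I - 1, J - 1, (I - 1) * (J - 1), I * J * (K - 1))

# 5% critical value for the F-test of factor A
crit_A <- qf(0.95, df1 = I - 1, df2 = I * J * (K - 1))
crit_A
```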



### **Randomised Complete Block Design (RCBD)**

A **block design** is a special case of two-way ANOVA, where one factor is the **treatment of interest** and the other factor is a **blocking factor** (a nuisance variable) used to control for variability across experimental units.

Suppose we have:  
- Treatments: factor A with $I$ levels  
- Blocks: factor B with $J$ levels  
- One replicate per treatment within each block (standard RCBD).

The model is:

$$
Y_{ij} = \tau_i + \beta_j + \varepsilon_{ij},
$$

where:   
- $\tau_i$: effect of the $i$-th treatment (factor of interest)  
- $\beta_j$: effect of the $j$-th block (blocking factor)  
- $\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)$: random error term  

**Notes:**

- There is **no interaction term**.
- Including blocks in the model reduces residual variability, making it easier to detect treatment effects.  
- The **null hypothesis** for treatments:

$$
H_0: \tau_1 = \tau_2 = \dots = \tau_I
$$

- The **null hypothesis** for blocks:

$$
H_0: \beta_1 = \beta_2 = \dots = \beta_J
$$

Interpretation of ANOVA output is the same as in one-way or two-way ANOVA. Below is a typical ANOVA table obtained from statistical software (e.g., `R`) for RCBD.

| Source    | Degrees of Freedom | Sum of Squares | Mean Squares           | F          |
|-----------|--------------------|----------------|------------------------|------------|
| Block     | J - 1              | SSB            | SSB / (J - 1)          | MSB / MSE  |
| Treatment | I - 1              | SSTr           | SSTr / (I - 1)         | MSTr / MSE |
| Error     | (I - 1)(J - 1)     | SSE            | SSE / ((I - 1)(J - 1)) |            |
| Total     | IJ - 1             | SSTOT          |                        |            |
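
The effect of blocking can be seen in a small simulated RCBD. The sketch below assumes $I = 4$ treatments, $J = 5$ blocks, no true treatment effect and strong block effects (all values are made up). It compares the error mean square with and without the block term in the model.

```r
set.seed(7)
I <- 4   # treatments (assumed for illustration)
J <- 5   # blocks

# One observation per treatment within each block
rcbd <- expand.grid(
  treatment = factor(paste0("t", seq_len(I))),
  block     = factor(paste0("b", seq_len(J)))
)

# Hypothetical block effects plus unit-variance noise
block_effect <- c(-4, -2, 0, 2, 4)
rcbd$y <- block_effect[as.integer(rcbd$block)] + rnorm(nrow(rcbd))

# RCBD model: additive block and treatment effects, no interaction
tab_rcbd <- summary(aov(y ~ block + treatment, data = rcbd))[[1]]
tab_rcbd[["Df"]]                    # J - 1, I - 1, (I - 1)(J - 1)

# Ignoring the blocks pushes block-to-block variation into the error term
tab_oneway <- summary(aov(y ~ treatment, data = rcbd))[[1]]

mse_rcbd   <- tab_rcbd[["Mean Sq"]][3]     # error mean square with blocks
mse_oneway <- tab_oneway[["Mean Sq"]][2]   # error mean square without blocks
c(mse_rcbd = mse_rcbd, mse_oneway = mse_oneway)
```

A smaller error mean square gives a larger $F$ statistic for the same treatment differences, which is why blocking can make treatment effects easier to detect.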



### **R Reference**

We can use the `aov()` function (or equivalently `lm() %>% anova()`) to perform two-way ANOVA.

```r
aov(formula,
    data = NULL,
    projections = FALSE,
    qr = TRUE,
    contrasts = NULL,
    ...)
```
There is no fundamental difference from one-way ANOVA, but now we can specify an additional factor and, optionally, their interaction.

| Model formula                  | Includes main effects | Includes interaction | Interpretation                                                                 |
|--------------------------------|-----------------------|----------------------|--------------------------------------------------------------------------------|
| `response ~ factorA + factorB` | ✅ Yes                | ❌ No                | Each factor’s effect is assumed to act **independently** of the other.         |
| `response ~ factorA * factorB` | ✅ Yes                | ✅ Yes               | Each factor’s effect may **depend on the level** of the other factor.          |

Everything else (e.g., interpretation of the ANOVA table, p-values, hypothesis testing, and connections to confidence intervals) follows the same principles as in one-way ANOVA.
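
To see what each formula expands to, one can inspect its terms without fitting anything. In the sketch below, `response`, `factorA` and `factorB` are placeholder names taken from the table above. `terms()` only parses the formula, so no data are required.

```r
# Terms of the additive model: main effects only
additive <- attr(terms(response ~ factorA + factorB), "term.labels")

# Terms of the model written with `*`
with_interaction <- attr(terms(response ~ factorA * factorB), "term.labels")

# The same model written out explicitly with `:`
explicit <- attr(terms(response ~ factorA + factorB + factorA:factorB),
                 "term.labels")

additive            # "factorA" "factorB"
with_interaction    # "factorA" "factorB" "factorA:factorB"
identical(with_interaction, explicit)   # TRUE
```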

## **Workshop Questions**

In this section, a significance level (Type I error rate) of 5% is assumed.

### **Question 1**

The built-in dataset `warpbreaks` contains the number of warp breaks per loom, grouped by two factors:  
- `wool` (type of wool: `"A"` or `"B"`)  
- `tension` (tension level: `"L"`, `"M"`, `"H"`)  



In [None]:
warpbreaks %>% str()

#### **Question 1.1**

Use the `aov()` function to test for the effects of `wool`, `tension`, and their interaction on the number of breaks.  Interpret the ANOVA table.



In [None]:
aov(breaks ~ wool * tension, data = warpbreaks) %>% summary()

# Main effect of wool: p-value ≈ 0.058 > 0.05. At the 5% significance level, there is insufficient evidence against
# the null hypothesis that wool type has no effect on the number of breaks, after allowing for tension.
# However, we could interpret this as slight evidence against H0 if we are not committed to a strict significance level.
# Main effect of tension: p-value ≈ 0.0007 < 0.05. Different tension levels significantly affect the number of breaks,
# after allowing for wool type.
# Interaction effect (wool:tension): p-value ≈ 0.02 < 0.05. There is evidence against the null hypothesis of no interaction,
# in favour of the hypothesis that the effect of tension depends on the wool type (and vice versa).

# Of course, one may also compare F statistics to critical values

F_wool = 3.765
F_tension = 8.498
F_interaction = 4.189

F_wool > qf(0.95, 1,48)
F_tension > qf(0.95, 2,48)
F_interaction > qf(0.95, 2,48)


#### **Question 1.2**

Use an appropriate post-hoc method to perform pairwise comparisons between levels of `tension`. Interpret the results to identify significant differences. Comment on the directions of the differences.

In [None]:
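# `which = "tension"` restricts the output to the pairwise comparisons between tension levels
aov(breaks ~ wool * tension, data = warpbreaks) %>% TukeyHSD(which = "tension")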

# M vs L: CI does not include 0. There is evidence that medium tension produces fewer breaks than low tension.
# H vs L: CI does not include 0. There is evidence that high tension produces fewer breaks than low tension.
# H vs M: CI does include 0. No evidence of difference between high and medium tension.

# all of these should be interpreted as accounting or allowing for having wool type in the model

# Technically, if a symmetric two-sided test is rejected, then the corresponding one-sided test in the same direction as the observed difference (`diff`) would also be rejected.
# So, while TukeyHSD formally only performs two-sided tests for pairwise differences, it is still valid to **describe the observed direction of the difference** when reporting results.


### **Question 2**

The `ToothGrowth` dataset contains the tooth length of 60 guinea pigs. Each animal received one of three dose levels of Vitamin C (0.5, 1, or 2 mg/day) with one of two delivery methods: orange juice (OJ) or ascorbic acid (VC).


In [None]:
ToothGrowth %>% str()

#### **Question 2.1**

Fit a two-way ANOVA model using the `aov()` function to test for the effects of supplement and dose level, and their interaction on tooth length. Interpret the ANOVA table.


<details>
<summary>▶️ Click to show the solution</summary>
Solution will be released at the end of the week!

</details>

#### **Question 2.2**


Use an appropriate post-hoc method to perform pairwise comparisons between dose levels. Interpret the results to identify significant differences. Comment on the directions of the differences.


<details>
<summary>▶️ Click to show the solution</summary>
Solution will be released at the end of the week!

</details>