# Higher-order ANOVA I: The Additive Model
In the previous parts of this lesson, we examined models that contained only a *single* categorical predictor variable. But what about those times when we have *multiple* categorical predictor variables? This is the domain of the *higher-order* ANOVA model, which we will explore in both this part and the final part of this lesson.

## The Higher-order ANOVA Framework
To begin with, it is important that we establish some terminology and core concepts about factorial experimental designs. This is especially important for ANOVA models, because many of these concepts directly inform the mechanics of how an ANOVA model works. This is by design, as the ANOVA was originally developed by Fisher as a principled way of analysing data from factorial designs. As such, we need to understand the concepts behind the experiment in order to understand the analysis.

### Terminology
When we talk about ANOVA models, we usually do so with specific reference to both the number of factors and the number of levels. We saw this earlier with a 1-way ANOVA, so-called because it only has *one* factor. For higher-order ANOVA models, we typically refer to them in the following way. If the model contains two factors, we would call it a 2-way ANOVA. If each factor had 2 levels, we could refer to the specific model as a $2 \times 2$ ANOVA. Here, each number represents a factor, with the value of the number indicating the number of levels. For instance, if the second factor had 3 levels, we would call it a $2 \times 3$ ANOVA. Similarly, if there were 3 factors, we would have a 3-way ANOVA. If each factor had two levels we could call it a $2 \times 2 \times 2$ ANOVA. If the second factor had 3 levels and the third had 5 levels it would be a $2 \times 3 \times 5$ ANOVA, and so on. Hopefully the pattern is clear.


### Means Tables
The reason for conceptualising the ANOVA in this fashion is that it makes the organisation of the experimental manipulation clear. If we think of the experiment in terms of a *table* then the number of *cells* is given by the multiplication of the factor levels. The simplest example is a $2 \times 2$ ANOVA that can be represented using a table with $2 \times 2 = 4$ cells. For instance:

|                       | Factor B: Level 1 | Factor B: Level 2 | 
|-----------------------|-------------------|-------------------|
| **Factor A: Level 1** |                   |                   |
| **Factor A: Level 2** |                   |                   |

The cells therefore represent the *intersection* of the factor levels. All data from the experiment is collected from one of these cells. Thus, the entire structure of the experiment can be represented this way.
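As a small illustration of this bookkeeping, the sketch below enumerates the cells of two designs in `R` using `expand.grid()`. The factor names (`A`, `B`, `C`) and level labels are hypothetical placeholders rather than anything from a real dataset.

```r
# Hypothetical factor names and level labels, used purely for illustration
design.2x2 <- expand.grid(A = c("A1", "A2"),
                          B = c("B1", "B2"))
design.2x3x5 <- expand.grid(A = paste0("A", 1:2),
                            B = paste0("B", 1:3),
                            C = paste0("C", 1:5))

# Each row is one cell, i.e. one combination of factor levels
print(design.2x2)
print(nrow(design.2x2))     # number of cells in the 2 x 2 design
print(nrow(design.2x3x5))   # number of cells in the 2 x 3 x 5 design
```

The number of rows in each grid is the product of the numbers of levels, which is the same count you would get from drawing the table by hand.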

### Cell Means
Conceptualising the experiment as a table also makes it clear how we can *summarise* the effects of the experimental manipulations. Given that we have data representative of each cell of the design, the simplest approach is simply to average all the data within each cell to produce a *cell mean*. These cell means are typically represented by the Greek letter $\mu$, with subscripts indicating the specific cell. For instance, $\mu_{12}$ would indicate the mean of the data collected from level 1 of Factor A and level 2 of Factor B. The complete picture of cell means in a $2 \times 2$ design is therefore

|                       | Factor B: Level 1 | Factor B: Level 2 | 
|-----------------------|-------------------|-------------------|
| **Factor A: Level 1** | $\mu_{11}$        | $\mu_{12}$        |
| **Factor A: Level 2** | $\mu_{21}$        | $\mu_{22}$        |

This matters because all higher-order ANOVA models are fundamentally concerned with modelling the data using *cell means*. As such, the concept of the cell mean is central to understanding ANOVA models.


### Marginal Means
Beyond cell means, we can also derive another form of summary known as a *marginal mean*, so-called because they are written in the *margins* of the means table. These types of means concern a specific level of one factor, *averaged-over* the other factors. For instance, there are two cell means that contain level 1 of Factor A: $\mu_{11}$ and $\mu_{12}$. The marginal mean for level 1 of Factor A is therefore $(\mu_{11} + \mu_{12})/2 = \mu_{1.}$. Here, the dot subscript is shorthand for an index that has been averaged-over. This marginal mean is therefore representative of level 1 of Factor A, *ignoring* Factor B. We can see the complete picture of both *cell means* and *marginal means* below, where $\mu$ indicates the *grand mean*.

|                         | Factor B: Level 1 | Factor B: Level 2 | Marginal Means of A |
|-------------------------|-------------------|-------------------|---------------------|
| **Factor A: Level 1**   | $\mu_{11}$        | $\mu_{12}$        | $\mu_{1.}$          |
| **Factor A: Level 2**   | $\mu_{21}$        | $\mu_{22}$        | $\mu_{2.}$          |
| **Marginal Means of B** | $\mu_{.1}$        | $\mu_{.2}$        | $\mu$               |

As we go through this section and the next, the relevance of cell means and marginal means will become clearer. For now, it is just important that you understand what these terms mean within the context of a factorial experimental design.
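To make the definitions concrete, here is a minimal sketch that builds a means table with its margins in `R`. The four cell means are invented numbers chosen only for illustration. The marginal means are computed as unweighted averages of the cell means, matching the formula $(\mu_{11} + \mu_{12})/2$ given above.

```r
# Hypothetical cell means for a 2 x 2 design (values invented for illustration)
cell.means <- matrix(c(10, 14,
                       12, 20),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("A1", "A2"), c("B1", "B2")))

marg.A <- rowMeans(cell.means)  # mu_{1.} and mu_{2.}: averaged over the levels of B
marg.B <- colMeans(cell.means)  # mu_{.1} and mu_{.2}: averaged over the levels of A
grand  <- mean(cell.means)      # mu: average of all four cell means

# Attach the margins to the table of cell means
means.tbl <- rbind(cbind(cell.means, "Marginal A" = marg.A),
                   "Marginal B" = c(marg.B, grand))
print(means.tbl)
```

Note that averaging the cell means in this way is not always the same as averaging the raw data for a factor level. The two differ when the cells contain different numbers of observations.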

## The Additive Model
Now that we have addressed terminology, we can turn to how we include *multiple* factors within a model. Let us start with the simplest approach, which is to simply add another factor to the model equation. In terms of notation, this is a basic extension of what we already know

$$
y_{ijk} = \mu + \alpha_{i} + \beta_{j} + \epsilon_{ijk},
$$

where $\alpha_{i}$ is the effect associated with Factor A and $\beta_{j}$ is the effect associated with Factor B. The most basic form of this model would be one that represents a $2 \times 2$ design, with $i = 1,2$ and $j = 1,2$. We would therefore have 4 cell means 

|                       | Factor B: Level 1 | Factor B: Level 2 | 
|-----------------------|-------------------|-------------------|
| **Factor A: Level 1** | $\mu_{11}$        | $\mu_{12}$        |
| **Factor A: Level 2** | $\mu_{21}$        | $\mu_{22}$        |

and thus 4 unique predicted values formed from:

$$
\begin{alignat*}{1}
    \mu_{11} &= \mu + \alpha_{1} + \beta_{1} \\
    \mu_{21} &= \mu + \alpha_{2} + \beta_{1} \\
    \mu_{12} &= \mu + \alpha_{1} + \beta_{2} \\
    \mu_{22} &= \mu + \alpha_{2} + \beta_{2}. 
\end{alignat*}
$$

Although probably not immediately obvious, this model assumes that the difference between the levels of each factor is the *same*, irrespective of the levels of the other factor. In other words, the model assumes a *constant* difference between the rows or the columns of the means table. For instance, the two differences between the 1st and 2nd levels of Factor A are

$$
\begin{alignat*}{2}
    \mu_{11} - \mu_{21} &= \left(\mu + \alpha_{1} + \beta_{1}\right) - \left(\mu + \alpha_{2} + \beta_{1}\right) &&= \alpha_{1} - \alpha_{2} \quad\text{(effect of A at level 1 of B)} \\
    \mu_{12} - \mu_{22} &= \left(\mu + \alpha_{1} + \beta_{2}\right) - \left(\mu + \alpha_{2} + \beta_{2}\right) &&= \alpha_{1} - \alpha_{2} \quad\text{(effect of A at level 2 of B)}
\end{alignat*}
$$

As such, no matter the level of Factor B, the effect of Factor A is always $\alpha_{1} - \alpha_{2}$. The same is true across the levels of Factor A, where the effect of Factor B is always $\beta_{1} - \beta_{2}$. In other words, this model makes the strong assumption that the effects of the two factors are entirely *independent*: the effect of one factor does not change with the level of the other. The effects of the factors only *add* and so this is known as the assumption of *additivity*.

It is important to recognise that additivity is actually a *constraint* on the fitting procedure. By specifying the model in this fashion, either least-squares or maximum likelihood will produce estimated means that adhere to additivity. The estimated cell means will therefore have a constant difference between the rows and the columns of the table. However, if the data does not adhere to additivity, the estimated cell means and the sample cell means will be *different*. The degree to which the model does not fit the actual sample means is therefore indicative of the degree to which the additivity assumption does not hold. We will see this in the example below, and it will be the starting point for justifying the concept of an *interaction* a little later.
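One compact way to express this constraint, using only the cell-mean equations above, is as a *difference of differences*. Under the additive model this quantity is always zero:

$$
\begin{alignat*}{1}
    \left(\mu_{11} - \mu_{21}\right) - \left(\mu_{12} - \mu_{22}\right) &= \left(\alpha_{1} - \alpha_{2}\right) - \left(\alpha_{1} - \alpha_{2}\right) \\
                                                                      &= 0.
\end{alignat*}
$$

Whenever the same quantity computed from the sample cell means is not zero, the additive model cannot reproduce those sample means exactly.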

```{admonition} ANOVA Examples
:class: tip
It can be difficult to conceptualise what an ANOVA model is saying when working in abstract terms such as "Factor A" or $\mu_{12}$. Often, it is useful to have a concrete example to drive the point home. For instance, imagine that Factor A is *depression diagnosis* with two levels: *depressed* and *non-depressed*. Now imagine that Factor B is *anxiety status* with two levels: *high-anxiety* and *low-anxiety*. Our $2 \times 2$ table of means would be

|                   | Depression: Non-depressed   | Depression: Depressed   | 
|-------------------|-----------------------------|-------------------------|
| **Anxiety: Low**  | Low-anxiety, Non-depressed  | Low-anxiety, Depressed  |
| **Anxiety: High** | High-anxiety, Non-depressed | High-anxiety, Depressed |

<br>

Remembering that the additive model assumes a *constant row difference* and a *constant column difference*, this is the same as assuming that the difference between those with and without depression is the same, irrespective of their anxiety (constant *column* difference). Similarly, this is the same as assuming that the difference between those with high and low anxiety is the same, irrespective of whether they are depressed (constant *row* difference). For this particular example, it would seem unlikely that depression and anxiety are two completely independent conditions that do not influence each other in any way.
```

### Additive Model Example in `R`
As an example, let us expand our `mtcars` analysis with an additional categorical predictor. Within `mtcars` there already exists a variable called `vs`, which indicates whether the engine is V-shaped or straight[^engine-foot]. This is already coded as a dummy variable, but we will label it so that it is clearer what it means before turning it into a factor.

In [8]:
data(mtcars)
mtcars$origin <- c('Other','Other','USA','USA','USA','USA','USA','Other','Other','Other',
                   'Other','Other','Other','Other','USA','USA','USA','Other','Other',
                   'Other','Other','USA','USA','USA','USA','Other','Other','Other',
                   'USA','Other','Other','Other')
mtcars$origin <- as.factor(mtcars$origin)

In [9]:
vs.lab <- rep("",length(mtcars$vs)) 
vs.lab[mtcars$vs == 0] <- "V-shaped"
vs.lab[mtcars$vs == 1] <- "Straight"

mtcars$vs <- as.factor(vs.lab)
print(levels(mtcars$vs))

[1] "Straight" "V-shaped"


We will also work with the simpler version of `origin`, where we only had 2 levels: `USA` and `Other`. 

In [10]:
print(levels(mtcars$origin))

[1] "Other" "USA"  


Our table of means is therefore

|                   | VS: Straight | VS: V-shaped | 
|-------------------|--------------|--------------|
| **Origin: Other** | $\mu_{11}$   | $\mu_{12}$   |
| **Origin: USA**   | $\mu_{21}$   | $\mu_{22}$   |

We can now examine how `R` has coded both `vs` and `origin` as dummy variables.

In [11]:
print(contrasts(mtcars$origin))
print(contrasts(mtcars$vs))

      USA
Other   0
USA     1
         V-shaped
Straight        0
V-shaped        1


So there are now 4 unique combinations of dummy values that lead to the 4 cell means

$$
\begin{alignat*}{2}
    \mu_{11} &= \beta_{0} + (\beta_{1} \times \mathbf{0}) + (\beta_{2} \times \mathbf{0}) = \beta_{0} &&\quad\texttt{(Other,Straight)} \\
    \mu_{21} &= \beta_{0} + (\beta_{1} \times \mathbf{1}) + (\beta_{2} \times \mathbf{0}) = \beta_{0} + \beta_{1} &&\quad\texttt{(USA,Straight)} \\
    \mu_{12} &= \beta_{0} + (\beta_{1} \times \mathbf{0}) + (\beta_{2} \times \mathbf{1}) = \beta_{0} + \beta_{2} &&\quad\texttt{(Other,V-shaped)} \\
    \mu_{22} &= \beta_{0} + \underbrace{(\beta_{1} \times \mathbf{1})}_{\texttt{origin}} + \underbrace{(\beta_{2} \times \mathbf{1})}_{\texttt{vs}} = \beta_{0} + \beta_{1} + \beta_{2} &&\quad\texttt{(USA,V-shaped)}\\
\end{alignat*}
$$

Using the same logic we used in the previous part of this lesson, we can work out that

| Parameter   | Meaning                                               | Interpretation           |
|-------------|-------------------------------------------------------|--------------------------|
| $\beta_{0}$ | Mean of `(Other,Straight)` cell                       | Reference cell           |
| $\beta_{1}$ | Mean difference `(USA,Straight) - (Other,Straight)`   | Constant *row* effect    |
| $\beta_{2}$ | Mean difference `(Other,V-shaped) - (Other,Straight)` | Constant *column* effect |

This also helps make sense of how the model prediction works. Thinking of the means table, if we start at the reference cell ($\beta_{0} = \mu_{11}$) and then add the *row effect* ($\beta_{1}$) we move *down* a row and end up at $\mu_{21}$. Instead, if we start at the reference cell ($\beta_{0} = \mu_{11}$) and then add the *column effect* ($\beta_{2}$) we move *across* a column and end up at $\mu_{12}$. Finally, if we start at the reference cell ($\beta_{0} = \mu_{11}$) and then add the *row effect* ($\beta_{1}$) and the *column effect* ($\beta_{2}$) we move down a row and across a column and end up at $\mu_{22}$. 


```{warning}
The ability to parameterise the cell means in this way only works because of the *additive assumptions* about constant row and column effects. If the cell means did not adhere to this, you could not move up/down or left/right across the table of means freely, because the row/column effects would change depending upon which cell you started in. Under the additive model, it does not matter where you start because the differences are always the same.
```
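The four dummy-code combinations described above can also be inspected directly. The sketch below is a hedged illustration rather than part of the original analysis: it builds a small hypothetical data frame called `cells`, with one row per cell of the design, and asks `model.matrix()` for the corresponding design matrix. The factor levels are set explicitly so that the reference levels match those shown earlier.

```r
# One row per cell of the 2 x 2 design ('cells' is an illustrative name)
cells <- expand.grid(
  origin = factor(c("Other", "USA"),         levels = c("Other", "USA")),
  vs     = factor(c("Straight", "V-shaped"), levels = c("Straight", "V-shaped"))
)

# Design matrix for the additive model: intercept plus one dummy per factor
X <- model.matrix(~ origin + vs, data = cells)

# Show each cell next to its dummy codes
print(cbind(cells, X))
```

Each row of `X` contains the multipliers applied to $\beta_{0}$, $\beta_{1}$ and $\beta_{2}$ for that cell, so multiplying a row by the coefficient vector gives the predicted cell mean.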

Given all this, we can now see how `R` fits this model and check that it aligns with our understanding

In [12]:
add.mod <- lm(mpg ~ origin + vs, data=mtcars)
print(summary(add.mod))


Call:
lm(formula = mpg ~ origin + vs, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.7035 -3.2079  0.1795  1.9298  8.3965 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   25.504      1.157  22.036  < 2e-16 ***
originUSA     -4.416      1.587  -2.783  0.00939 ** 
vsV-shaped    -6.433      1.571  -4.094  0.00031 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.139 on 29 degrees of freedom
Multiple R-squared:  0.5588,	Adjusted R-squared:  0.5283 
F-statistic: 18.36 on 2 and 29 DF,  p-value: 7.044e-06



These estimates certainly look like the values we determined in the table above, with the intercept giving us a *cell mean* and the two slope parameters giving us *mean differences*. Based on this, we can construct the estimated cell means from the estimated parameters

In [13]:
beta <- coef(add.mod)

# fitted cell means
mu.other.str <- beta[1]
mu.USA.str   <- beta[1] + beta[2]
mu.other.v   <- beta[1] + beta[3]
mu.USA.v     <- beta[1] + beta[2] + beta[3]

# means table
est.means.tbl <- data.frame("origin.other"=c(mu.other.str,mu.other.v),
                            "origin.USA"  =c(mu.USA.str,  mu.USA.v),
                            row.names=c("vs.straight","vs.vshaped"))

print(est.means.tbl)

            origin.other origin.USA
vs.straight     25.50350   21.08716
vs.vshaped      19.07019   14.65385


which should align directly with the 4 unique model predictions[^round-foot]

In [14]:
predicted.vals <- round(fitted(add.mod),8)
print(unique(predicted.vals))

[1] 19.07019 21.08716 14.65385 25.50350
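As a hedged alternative to adding the coefficients by hand, the same four values can be requested from `predict()` by giving it one row per cell. The sketch below rebuilds the data exactly as in the cells above so that it runs on its own. The names `cells` and `fitted` are illustrative choices.

```r
# Rebuild the data as in the earlier cells so this sketch is self-contained
data(mtcars)
mtcars$origin <- factor(c('Other','Other','USA','USA','USA','USA','USA','Other','Other','Other',
                          'Other','Other','Other','Other','USA','USA','USA','Other','Other',
                          'Other','Other','USA','USA','USA','USA','Other','Other','Other',
                          'USA','Other','Other','Other'))
mtcars$vs <- factor(ifelse(mtcars$vs == 0, "V-shaped", "Straight"))
add.mod   <- lm(mpg ~ origin + vs, data = mtcars)

# One row per cell of the design, then the model prediction for each cell
cells <- expand.grid(origin = levels(mtcars$origin),
                     vs     = levels(mtcars$vs))
cells$fitted <- predict(add.mod, newdata = cells)
print(cells)

# The same predictions arranged as a means table
print(xtabs(fitted ~ vs + origin, data = cells))
```

This approach scales more comfortably to designs with more factors or levels, where writing out each sum of coefficients becomes error-prone.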


As expected, these fitted values have a constant difference between the rows of the printed table, which here correspond to the levels of `vs`[^unname-foot]

In [15]:
print(unname(mu.other.str - mu.other.v))
print(unname(mu.USA.str   - mu.USA.v))

[1] 6.433314
[1] 6.433314


and a constant difference between the columns of the printed table, which here correspond to the levels of `origin`

In [16]:
print(unname(mu.other.str - mu.USA.str))
print(unname(mu.other.v   - mu.USA.v))

[1] 4.416336
[1] 4.416336


However, these estimated means do not match the actual sample means

In [17]:
# sample cell means
mu.other.str <- mean(mtcars$mpg[mtcars$origin == "Other" & mtcars$vs == "Straight"])
mu.USA.str   <- mean(mtcars$mpg[mtcars$origin == "USA"   & mtcars$vs == "Straight"])
mu.other.v   <- mean(mtcars$mpg[mtcars$origin == "Other" & mtcars$vs == "V-shaped"])
mu.USA.v     <- mean(mtcars$mpg[mtcars$origin == "USA"   & mtcars$vs == "V-shaped"])

# means table
samp.means.tbl <- data.frame("origin.other"=c(mu.other.str, mu.other.v),
                             "origin.USA"  =c(mu.USA.str,   mu.USA.v),  
                             row.names=c("vs.straight","vs.vshaped"))

cat("Sample means:\n") # allows more control over printing than print()
print(samp.means.tbl)
cat("\nEstimated means:\n")
print(est.means.tbl)

Sample means:
            origin.other origin.USA
vs.straight     25.59091   20.76667
vs.vshaped      18.95000   14.75000

Estimated means:
            origin.other origin.USA
vs.straight     25.50350   21.08716
vs.vshaped      19.07019   14.65385


Now, the estimated means and the sample means are not too far apart, so there are a few possibilities here. Firstly, it could be that the assumptions of the additive model do not hold in this example. Alternatively, it is possible that these effects are truly additive in the population and any deviation is simply sampling noise. Or, it could be that the degree to which these two factors influence each other is very minor and that we could treat these factors as additive for the sake of simplicity. We will come back to this when we explore the Full Factorial model in the next part of the lesson.
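To put a number on how far the sample means depart from additivity, the sketch below computes the cell-by-cell discrepancies and a difference of differences for each table. The values are copied from the printed output above, so they carry its rounding. The helper name `dod` is a hypothetical choice for this illustration.

```r
# Values copied from the printed sample and estimated means tables above
samp <- matrix(c(25.59091, 18.95000,    # origin.other: straight, V-shaped
                 20.76667, 14.75000),   # origin.USA:   straight, V-shaped
               nrow = 2,
               dimnames = list(c("vs.straight", "vs.vshaped"),
                               c("origin.other", "origin.USA")))
est  <- matrix(c(25.50350, 19.07019,
                 21.08716, 14.65385),
               nrow = 2, dimnames = dimnames(samp))

# Cell-by-cell discrepancy between the data and the additive fit
print(samp - est)

# Difference of differences: (vs effect for Other) minus (vs effect for USA)
dod <- function(m) (m[1, 1] - m[2, 1]) - (m[1, 2] - m[2, 2])
print(dod(samp))  # non-zero: the sample means are not exactly additive
print(dod(est))   # zero, up to rounding of the printed values
```

On its own, this quantity does not tell us which of the possibilities above is correct, only how large the departure from additivity is in the sample.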

### Main Effects
In the previous part of this lesson, we discussed the concept of an *omnibus test* as an overall reflection of whether any of the mean differences across the levels of a categorical predictor are non-zero. This idea carries over into higher-order ANOVA models. The main difference is that our null models are rarely intercept-only. Instead, we include all other factors in the null model *except* the factor of interest. The logic is that the additional factors may relate to the outcome variable and thus should be included in the null model. This keeps the error term as accurate as possible, so that it does not accidentally contain sources of systematic variation that would make it larger than it should be. Another way of thinking about this is that the omnibus null for one variable is not based on assuming that there are *no relationships with any other variables*. As such, the null model simply involves removing a *single variable*, not all of them[^reganova-foot].

```{admonition} Main Effects Definition
:class: tip
Within the context of an ANOVA, the omnibus effect of a single categorical predictor is known as a *main effect*. Under additivity, this reflects the omnibus effect of a single factor, ignoring all other factors in the model. For instance, what is the effect of `origin`, irrespective of whether an engine is `Straight` or `V-shaped`? Or, what is the effect of *depression*, irrespective of whether someone is *highly-anxious* or not? Main effects *require* additivity to be interpretable and are based on comparing *marginal means* for the factor of interest.
```

Based on everything above, the *main effect* of Factor A is equivalent to the following model comparison

$$
\begin{alignat*}{2}
    \mathcal{M}_{0} &: y_{jk}  &&= \mu + \beta_{j} + \epsilon_{jk} \\
    \mathcal{M}_{1} &: y_{ijk} &&= \mu + \alpha_{i} + \beta_{j} + \epsilon_{ijk},
\end{alignat*}
$$

and the main effect of Factor B is equivalent to the following model comparison

$$
\begin{alignat*}{2}
    \mathcal{M}_{0} &: y_{ik}  &&= \mu + \alpha_{i} + \epsilon_{ik} \\
    \mathcal{M}_{1} &: y_{ijk} &&= \mu + \alpha_{i} + \beta_{j} + \epsilon_{ijk}.
\end{alignat*}
$$

In each case, we simply remove the terms associated with the factor of interest and see whether the change in the residual sums-of-squares is large relative to the error. This is then equivalent to comparing all the *marginal means* associated with a single factor. In the case of a $2 \times 2$ design, the omnibus null for these factors would be

$$
\begin{alignat*}{1}
    \mathcal{H}_{0} &: \mu_{1.} = \mu_{2.} \quad\text{(Main effect of Factor A)} \\
    \mathcal{H}_{0} &: \mu_{.1} = \mu_{.2} \quad\text{(Main effect of Factor B)}. \\
\end{alignat*}
$$

Although both these hypotheses only involve two means, they still remain dissimilar to a $t$-test by virtue of taking the *other factor* into account. This is not something you can do with a simple $t$-test. Of course, when dealing with factors that contain $> 2$ levels, the need for an omnibus test becomes even clearer.
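To connect the marginal-mean hypotheses above to the model parameters, we can substitute the additive cell-mean equations into the definition of the marginal means for Factor A in a $2 \times 2$ design:

$$
\begin{alignat*}{1}
    \mu_{1.} &= \frac{\mu_{11} + \mu_{12}}{2} = \mu + \alpha_{1} + \frac{\beta_{1} + \beta_{2}}{2} \\
    \mu_{2.} &= \frac{\mu_{21} + \mu_{22}}{2} = \mu + \alpha_{2} + \frac{\beta_{1} + \beta_{2}}{2} \\
    \mu_{1.} - \mu_{2.} &= \alpha_{1} - \alpha_{2}.
\end{alignat*}
$$

Under additivity, the null hypothesis $\mu_{1.} = \mu_{2.}$ is therefore the same statement as $\alpha_{1} = \alpha_{2}$, which is exactly what is removed when the $\alpha_{i}$ terms are dropped from the model. The same argument applies to Factor B with the roles of $\alpha$ and $\beta$ swapped.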

#### Main Effects in `R`
As we saw previously, we can form omnibus tests from explicit model comparisons using the `anova()` function. Using this method, the main effect of `origin` would be

In [18]:
null.mod <- lm(mpg ~ vs,          data=mtcars)
full.mod <- lm(mpg ~ origin + vs, data=mtcars)

print(anova(null.mod,full.mod))

Analysis of Variance Table

Model 1: mpg ~ vs
Model 2: mpg ~ origin + vs
  Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
1     30 629.52                                
2     29 496.86  1    132.66 7.7428 0.009386 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


and the main effect of `vs` would be

In [19]:
null.mod <- lm(mpg ~ origin,      data=mtcars)
full.mod <- lm(mpg ~ origin + vs, data=mtcars)

print(anova(null.mod,full.mod))

Analysis of Variance Table

Model 1: mpg ~ origin
Model 2: mpg ~ origin + vs
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1     30 784.06                                  
2     29 496.86  1     287.2 16.763 0.0003096 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Or, more practically, we can simply give the full model to the `Anova()` function from the `car` package.

In [20]:
library(car)
print(Anova(add.mod))

Loading required package: carData



Anova Table (Type II tests)

Response: mpg
          Sum Sq Df F value    Pr(>F)    
origin    132.66  1  7.7428 0.0093864 ** 
vs        287.20  1 16.7628 0.0003096 ***
Residuals 496.86 29                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Compare the `Anova()` output with the outputs from the model comparisons to reassure yourself that these are *identical*. 
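To see where these $F$-statistics come from, the sketch below recomputes them by hand from the residual sums-of-squares printed in the model comparisons above. The numbers are copied from that output, so small rounding differences are expected. The helper `f.stat` is a hypothetical name introduced only for this illustration.

```r
# Residual sums-of-squares and degrees of freedom copied from the anova() output above
rss.full      <- 496.86   # mpg ~ origin + vs
df.full       <- 29       # residual degrees of freedom of the full model
rss.no.origin <- 629.52   # mpg ~ vs     (null model for the main effect of origin)
rss.no.vs     <- 784.06   # mpg ~ origin (null model for the main effect of vs)

# F = (change in RSS per parameter removed) / (error variance of the full model)
f.stat <- function(rss.null, rss.full, df.diff, df.full) {
  ((rss.null - rss.full) / df.diff) / (rss.full / df.full)
}

F.origin <- f.stat(rss.no.origin, rss.full, df.diff = 1, df.full = df.full)
F.vs     <- f.stat(rss.no.vs,     rss.full, df.diff = 1, df.full = df.full)

# Upper-tail probabilities from the F-distribution
p.origin <- pf(F.origin, 1, df.full, lower.tail = FALSE)
p.vs     <- pf(F.vs,     1, df.full, lower.tail = FALSE)

print(c(F.origin = F.origin, p.origin = p.origin))
print(c(F.vs = F.vs, p.vs = p.vs))
```

Both tests share the same denominator, namely the error variance of the full additive model, which is why each main effect is assessed with the other factor taken into account.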

Based on this result, there are significant differences in `mpg` between the levels of both `origin` and `vs`, assuming the two factors have additive effects on `mpg`. In context, this assumes that the difference in MPG between `USA` and `Other` cars is the same for both engine shapes, and that the difference in MPG between engine shapes is the same regardless of origin. Under that assumption, we would conclude that *both* the engine shape and the country of origin make separate, additive contributions to MPG.

`````{topic} What do you now know?
In this section, we have explored ... . After reading this section, you should have a good sense of:

- The difference between *cell means*, *marginal means* and how these correspond to a factorial experimental design.
- ...

`````

[^engine-foot]: Not that it really matters for understanding an ANOVA, but a straight engine has cylinders in a straight line, while a V-shaped engine has two banks of cylinders arranged in a "V". V-shaped engines tend to be more compact and powerful. Straight engines are often more balanced, whereas V engines are shorter and can be better for performance-focused applications.

[^round-foot]: The use of the `round()` function here is only because, due to the way that `R` solves for the model parameters, there can be some differences between estimates deep into the decimal places. This can fool the `unique()` function, which will return values that are apparently identical. This is simply an issue with numeric precision. As such, it is always worth using `round()` in combination with `unique()` to prevent this.

[^unname-foot]: All the `unname()` function is doing here is preventing `R` from printing any labels alongside these results and making them less clear. Sometimes, values carry labels with them across different calculations and you get some odd names, like `(Intercept)` being printed above the calculations. This stops that from happening. You can remove it, if you like, and see what happens.

[^reganova-foot]: If we want to remove *everything*, we can always use the regression ANOVA report at the bottom of the summary table, as mentioned at the end of the previous part of the lesson.