# Higher-order ANOVA Models


## The Higher-order ANOVA Framework

### Terminology and Mean Tables

### Cell Means and Marginal Means


```{admonition} Higher-order terminology
:class: tip
... For instance, the most basic higher-order ANOVA is the two-way ANOVA, which contains two factors. If each factor had two levels, we would write this as a $2 \times 2$ ANOVA. If the second factor had three levels, we would call it a $2 \times 3$ ANOVA. If there were *three* factors, we would have a three-way ANOVA. If each factor had two levels, we could call it a $2 \times 2 \times 2$ ANOVA, and so on.
```

## The Additive Model
Let us start with the simplest approach, which is to just add another factor to the model. In terms of notation, this is a basic extension of what we already know:

$$
y_{ijk} = \mu + \alpha_{i} + \beta_{j} + \epsilon_{ijk},
$$

where $\alpha_{i}$ is the effect associated with Factor A and $\beta_{j}$ is the effect associated with Factor B. The most basic form of this model would be one that represents a $2 \times 2$ design, with $i = 1,2$ and $j = 1,2$.

It is important to recognise at this point the assumptions that this model makes. If we stick with a $2 \times 2$ design then we have 4 cell means 

|                       | Factor B: Level 1 | Factor B: Level 2 | 
|-----------------------|-------------------|-------------------|
| **Factor A: Level 1** | $\mu_{11}$        | $\mu_{12}$        |
| **Factor A: Level 2** | $\mu_{21}$        | $\mu_{22}$        |

and thus 4 unique predicted values formed from:

$$
\begin{alignat*}{1}
    \mu_{11} &= \mu + \alpha_{1} + \beta_{1} \\
    \mu_{21} &= \mu + \alpha_{2} + \beta_{1} \\
    \mu_{12} &= \mu + \alpha_{1} + \beta_{2} \\
    \mu_{22} &= \mu + \alpha_{2} + \beta_{2}. 
\end{alignat*}
$$

Although probably not immediately obvious, this model makes the assumption that the difference between the levels of each factor is the *same* irrespective of the levels of the other factor. In other words, the model assumes a *constant* difference between the rows or the columns of the means table, no matter which cell you start in. For instance, the two differences between the 1st and 2nd levels of Factor A are

$$
\begin{alignat*}{2}
    \mu_{11} - \mu_{21} &= \left(\mu + \alpha_{1} + \beta_{1}\right) - \left(\mu + \alpha_{2} + \beta_{1}\right) &&= \alpha_{1} - \alpha_{2} \\
    \mu_{12} - \mu_{22} &= \left(\mu + \alpha_{1} + \beta_{2}\right) - \left(\mu + \alpha_{2} + \beta_{2}\right) &&= \alpha_{1} - \alpha_{2}
\end{alignat*}
$$

As such, no matter the level of Factor B, $\alpha_{1} - \alpha_{2}$ is always the same. The same is true across the levels of Factor A, where $\beta_{1} - \beta_{2}$ is always the same. In other words, this model makes the strong assumption that the two factors are entirely *independent* and do not affect each other in any way.
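
To make the constant-difference property concrete, the following minimal sketch builds the four cell means from made-up parameter values and checks the differences numerically. The values of `mu`, `alpha` and `beta` are purely illustrative assumptions and do not come from any dataset.

```r
# Hypothetical parameter values, chosen only for illustration
mu    <- 20
alpha <- c(2, -2)   # effects for Factor A, levels 1 and 2
beta  <- c(3, -3)   # effects for Factor B, levels 1 and 2

# Additive cell means: mu_ij = mu + alpha_i + beta_j
cell.means <- mu + outer(alpha, beta, "+")
dimnames(cell.means) <- list(c("A1", "A2"), c("B1", "B2"))
print(cell.means)

# Difference between the levels of Factor A within each level of Factor B
a.diff <- cell.means["A1", ] - cell.means["A2", ]
print(a.diff)   # both entries equal alpha[1] - alpha[2]

# Difference between the levels of Factor B within each level of Factor A
b.diff <- cell.means[, "B1"] - cell.means[, "B2"]
print(b.diff)   # both entries equal beta[1] - beta[2]
```

Whatever values are substituted for the parameters, the two entries of `a.diff` agree with each other, as do the two entries of `b.diff`.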


```{admonition} Grounding ANOVA Examples
:class: tip
It can be difficult at times to conceptualise what an ANOVA model is saying when working in abstract terms such as "Factor A" or $\mu_{12}$. Often, it is useful to have a concrete example to drive the point home. For instance, imagine that Factor A is *depression diagnosis* with two levels: *depressed* and *non-depressed*. Now imagine that Factor B is *anxiety status* with two levels: *high-anxiety* and *low-anxiety*. Our $2 \times 2$ table of means would be

|                   | Depression: Non-Depressed | Depression: Depressed | 
|-------------------|---------------------------|-----------------------|
| **Anxiety: Low**  | $\mu_{11}$                | $\mu_{12}$            |
| **Anxiety: High** | $\mu_{21}$                | $\mu_{22}$            |


Remembering that the additive model assumes a *constant* row difference and a *constant* column difference, this is the same as assuming that the difference between those with and without depression is the same, irrespective of their anxiety. Similarly, this is the same as assuming that the difference between those with high and low anxiety is the same, irrespective of whether they are depressed. Of course, this depends entirely on what our outcome measure actually is. However, in the real world, it would seem unlikely that depression and anxiety are two completely independent conditions that do not influence each other in any way. As such, this assumption of the additive model is somewhat questionable.

```

It is important to recognise that this assumption of additivity is actually a *constraint* on the fitting procedure. By specifying the model in this fashion, either least-squares or maximum likelihood will produce estimated means that adhere to additivity. The estimated cell means will therefore have a constant difference between the rows and the columns of the table. However, if this assumption is not true, the estimated cell means and the sample cell means will be *different*. The degree to which the model does not fit the actual sample means is therefore indicative of the degree to which the additivity assumption does not hold. We will see this in the example below, and it will be the starting point for justifying the concept of an *interaction* a little later.
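
One compact way to state this constraint for the $2 \times 2$ case follows directly from the two Factor A differences derived earlier. Under additivity, the difference between those differences must be zero:

$$
\left(\mu_{11} - \mu_{21}\right) - \left(\mu_{12} - \mu_{22}\right) = \left(\alpha_{1} - \alpha_{2}\right) - \left(\alpha_{1} - \alpha_{2}\right) = 0.
$$

Any set of sample cell means for which this quantity is not zero cannot be reproduced exactly by the additive model.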

### Additive Model Example in `R`
As an example, let us expand our `mtcars` example with an additional categorical predictor. Within `mtcars` there already exists a factor called `vs` which indicates whether the engine is V-shaped or straight. This is already coded as a dummy variable, but we will label it so that it is clearer what it means before turning it into a factor.

In [27]:
data(mtcars)
mtcars$origin <- c('Other','Other','USA','USA','USA','USA','USA','Other','Other','Other',
                   'Other','Other','Other','Other','USA','USA','USA','Other','Other',
                   'Other','Other','USA','USA','USA','USA','Other','Other','Other',
                   'USA','Other','Other','Other')
mtcars$origin <- as.factor(mtcars$origin)

In [28]:
vs.lab <- rep("",length(mtcars$mpg)) 
vs.lab[mtcars$vs == 0] <- "v-shaped"
vs.lab[mtcars$vs == 1] <- "straight"

mtcars$vs <- vs.lab
mtcars$vs <- as.factor(mtcars$vs)
print(levels(mtcars$vs))

[1] "straight" "v-shaped"


We will also work with the simpler version of `origin`, where we only had 2 levels: `USA` and `Other`. Our table of means is therefore

|                  | Origin: Other | Origin: USA | 
|------------------|---------------|-------------|
| **VS: Straight** | $\mu_{11}$    | $\mu_{12}$  |
| **VS: V-shaped** | $\mu_{21}$    | $\mu_{22}$  |

We can now examine how `R` has coded both `vs` and `origin` as dummy variables.

In [29]:
print(contrasts(mtcars$origin))
print(contrasts(mtcars$vs))

      USA
Other   0
USA     1
         v-shaped
straight        0
v-shaped        1


So there are now 4 unique combinations of dummy values that lead to the 4 cell means. This leads to

$$
\begin{alignat*}{2}
    &\mu^{(\texttt{other},\texttt{straight})} &&= \beta_{0} + (\beta_{1} \times \mathbf{0}) + (\beta_{2} \times \mathbf{0}) = \beta_{0} \\
    &\mu^{(\texttt{USA},\texttt{straight})}   &&= \beta_{0} + (\beta_{1} \times \mathbf{1}) + (\beta_{2} \times \mathbf{0}) = \beta_{0} + \beta_{1} \\
    &\mu^{(\texttt{other},\texttt{v-shaped})} &&= \beta_{0} + (\beta_{1} \times \mathbf{0}) + (\beta_{2} \times \mathbf{1}) = \beta_{0} + \beta_{2} \\
    &\mu^{(\texttt{USA},\texttt{v-shaped})}   &&= \beta_{0} + \underbrace{(\beta_{1} \times \mathbf{1})}_{\texttt{origin}} + \underbrace{(\beta_{2} \times \mathbf{1})}_{\texttt{vs}} = \beta_{0} + \beta_{1} + \beta_{2} \\
\end{alignat*}
$$

Based on this we can surmise that:

| Parameter   | Meaning                                               | Interpretation           |
|-------------|-------------------------------------------------------|--------------------------|
| $\beta_{0}$ | Mean of `(other,straight)` cell                       | Reference cell           |
| $\beta_{1}$ | Mean difference `(USA,straight) - (other,straight)`   | Constant *column* effect |
| $\beta_{2}$ | Mean difference `(other,v-shaped) - (other,straight)` | Constant *row* effect    |
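
As a quick check of this mapping, the sketch below builds the design matrix for the four factor combinations. The `grid` data frame is a hypothetical stand-in for the four cells (it is not the `mtcars` data), and the sketch assumes `R`'s default treatment contrasts are in effect.

```r
# All four combinations of the two factors (a hypothetical 2 x 2 grid)
grid <- expand.grid(origin = factor(c("Other", "USA")),
                    vs     = factor(c("straight", "v-shaped")))

# Design matrix implied by the additive formula under dummy coding
X <- model.matrix(~ origin + vs, data = grid)

# Each row shows which coefficients are switched on for that cell
print(cbind(grid, X))
```

Each row of `X` contains the multipliers of $\beta_{0}$, $\beta_{1}$ and $\beta_{2}$ for one cell, so the reference cell has only the intercept switched on and the `(USA,v-shaped)` cell has all three.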

This also helps make sense of how the model prediction works. If we start at the reference cell ($\beta_{0} = \mu_{11}$) and then add the *row effect* ($\beta_{2}$) we move *down* a row and end up at $\mu_{21}$. Instead, if we start at the reference cell ($\beta_{0} = \mu_{11}$) and then add the *column effect* ($\beta_{1}$) we move *across* a column and end up at $\mu_{12}$. Finally, if we start at the reference cell ($\beta_{0} = \mu_{11}$) and then add both the *column effect* ($\beta_{1}$) and the *row effect* ($\beta_{2}$) we move down a row and across a column and end up at $\mu_{22}$. Note that this only works because of the additive assumptions about constant row and column effects.

Given all this, we can now see how `R` fits this model and check that it aligns with our understanding.

In [30]:
add.mod <- lm(mpg ~ origin + vs, data=mtcars)
print(summary(add.mod))


Call:
lm(formula = mpg ~ origin + vs, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.7035 -3.2079  0.1795  1.9298  8.3965 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   25.504      1.157  22.036  < 2e-16 ***
originUSA     -4.416      1.587  -2.783  0.00939 ** 
vsv-shaped    -6.433      1.571  -4.094  0.00031 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.139 on 29 degrees of freedom
Multiple R-squared:  0.5588,	Adjusted R-squared:  0.5283 
F-statistic: 18.36 on 2 and 29 DF,  p-value: 7.044e-06



Based on this, we can construct the estimated cell means

In [49]:
beta <- coef(add.mod)

# fitted cell means
mu.other.straight <- beta[1]
mu.USA.straight   <- beta[1] + beta[2]
mu.other.vshaped  <- beta[1] + beta[3]
mu.USA.vshaped    <- beta[1] + beta[2] + beta[3]

# means table
means.tbl <- data.frame("origin.other"=c(mu.other.straight,mu.other.vshaped),
                        "origin.USA"  =c(mu.USA.straight,  mu.USA.vshaped),
                        row.names=c("vs.straight","vs.vshaped"))

print(means.tbl)

            origin.other origin.USA
vs.straight     25.50350   21.08716
vs.vshaped      19.07019   14.65385


As expected, these fitted values have a constant difference between each row

In [47]:
print(unname(mu.other.straight - mu.other.vshaped))
print(unname(mu.USA.straight   - mu.USA.vshaped))

[1] 6.433314
[1] 6.433314


and a constant difference between each column

In [48]:
print(unname(mu.other.straight - mu.USA.straight))
print(unname(mu.other.vshaped  - mu.USA.vshaped))

[1] 4.416336
[1] 4.416336


Unfortunately, these estimated means do not match the actual sample means

In [52]:
# sample cell means
mu.other.straight <- mean(mtcars$mpg[mtcars$origin == "Other" & mtcars$vs == "straight"])
mu.USA.straight   <- mean(mtcars$mpg[mtcars$origin == "USA"   & mtcars$vs == "straight"])
mu.other.vshaped  <- mean(mtcars$mpg[mtcars$origin == "Other" & mtcars$vs == "v-shaped"])
mu.USA.vshaped    <- mean(mtcars$mpg[mtcars$origin == "USA"   & mtcars$vs == "v-shaped"])

# means table
means.tbl <- data.frame("origin.other"=c(mu.other.straight,mu.other.vshaped),
                        "origin.USA"  =c(mu.USA.straight,  mu.USA.vshaped),
                        row.names=c("vs.straight","vs.vshaped"))

print(means.tbl)

            origin.other origin.USA
vs.straight     25.59091   20.76667
vs.vshaped      18.95000   14.75000


The fitted and sample means are not too far apart. However, the mismatch does suggest that the assumptions of the additive model may not hold exactly in this example. It is also possible that the fitted values are close enough that we can treat these factors as additive for simplicity. We will come back to this when we explore the Full Factorial model below. First, we need to discuss the ANOVA omnibus effects in an additive $2 \times 2$ design.
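
Before turning to those effects, the gap between the fitted and sample means can be computed directly rather than read off two tables. The sketch below is one way of doing so. It rebuilds the data on a copy called `cars` (a name introduced here only for illustration), using row positions that reproduce the `origin` labels defined earlier.

```r
# Rebuild the data on a copy so the mtcars object used elsewhere is untouched
cars <- datasets::mtcars
usa.rows <- c(3:7, 15:17, 22:25, 29)  # rows labelled 'USA' in the origin vector above
cars$origin <- factor(ifelse(seq_len(nrow(cars)) %in% usa.rows, "USA", "Other"))
cars$vs     <- factor(ifelse(cars$vs == 0, "v-shaped", "straight"))

# Sample cell means as a vs-by-origin table
sample.means <- with(cars, tapply(mpg, list(vs = vs, origin = origin), mean))

# Fitted cell means from the additive model, arranged the same way
fit  <- lm(mpg ~ origin + vs, data = cars)
grid <- expand.grid(vs = levels(cars$vs), origin = levels(cars$origin))
fitted.means <- matrix(predict(fit, newdata = grid), nrow = 2,
                       dimnames = dimnames(sample.means))

# Cell-by-cell gap between the sample means and the additive fit
print(round(sample.means - fitted.means, 3))

# Difference of row differences in the sample (zero under exact additivity)
print(unname(diff(sample.means[, "Other"]) - diff(sample.means[, "USA"])))
```

The last quantity is the part of the sample means that an additive model cannot capture, no matter how its parameters are chosen.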

### Main Effects

So, the main effect of Factor A is equivalent to the following model comparison

$$
\begin{alignat*}{2}
    \mathcal{M}_{0} &: y_{jk}  &&= \mu + \beta_{j} + \epsilon_{jk} \\
    \mathcal{M}_{1} &: y_{ijk} &&= \mu + \alpha_{i} + \beta_{j} + \epsilon_{ijk},
\end{alignat*}
$$

and the main effect of Factor B is equivalent to the following model comparison

$$
\begin{alignat*}{2}
    \mathcal{M}_{0} &: y_{ik}  &&= \mu + \alpha_{i} + \epsilon_{ik} \\
    \mathcal{M}_{1} &: y_{ijk} &&= \mu + \alpha_{i} + \beta_{j} + \epsilon_{ijk}.
\end{alignat*}
$$

In each case, we simply remove the terms associated with the factor of interest and see whether the change in the residual sums-of-squares is large relative to the error.
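
Written out, "large relative to the error" refers to the usual $F$-statistic for comparing nested linear models, where $RSS$ is the residual sum-of-squares and $df$ the residual degrees of freedom of each model. As a worked check, plugging in the values that `R` reports for the `origin` comparison in this section reproduces the reported statistic, up to rounding of the inputs:

$$
F = \frac{\left(RSS_{0} - RSS_{1}\right) / \left(df_{0} - df_{1}\right)}{RSS_{1} / df_{1}} = \frac{\left(629.52 - 496.86\right) / \left(30 - 29\right)}{496.86 / 29} \approx \frac{132.66}{17.13} \approx 7.74.
$$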

As we saw previously, we can do this as explicit model comparisons using the `anova()` function, where the main effect of `origin` would be:

In [38]:
null.mod <- lm(mpg ~ vs,          data=mtcars)
full.mod <- lm(mpg ~ origin + vs, data=mtcars)

print(anova(null.mod,full.mod))

Analysis of Variance Table

Model 1: mpg ~ vs
Model 2: mpg ~ origin + vs
  Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
1     30 629.52                                
2     29 496.86  1    132.66 7.7428 0.009386 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


and the main effect of `vs` would be:

In [39]:
null.mod <- lm(mpg ~ origin,      data=mtcars)
full.mod <- lm(mpg ~ origin + vs, data=mtcars)

print(anova(null.mod,full.mod))

Analysis of Variance Table

Model 1: mpg ~ origin
Model 2: mpg ~ origin + vs
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1     30 784.06                                  
2     29 496.86  1     287.2 16.763 0.0003096 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Or, we could much more easily give the full model to the `Anova()` function from the `car` package to deal with the model comparisons automatically.

In [35]:
library(car)
print(Anova(full.mod))

Anova Table (Type II tests)

Response: mpg
          Sum Sq Df F value    Pr(>F)    
origin    132.66  1  7.7428 0.0093864 ** 
vs        287.20  1 16.7628 0.0003096 ***
Residuals 496.86 29                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


This agrees with the comparisons we ran above. Notice, however, that generating the full ANOVA table using `anova()` gives us a *different* answer.

In [36]:
print(anova(full.mod))

Analysis of Variance Table

Response: mpg
          Df Sum Sq Mean Sq F value    Pr(>F)    
origin     1 341.99  341.99  19.961 0.0001110 ***
vs         1 287.20  287.20  16.763 0.0003096 ***
Residuals 29 496.86   17.13                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


We will discuss the reasons why later in the unit. For now, this should be evidence enough for always using `Anova()` instead of `anova()`.
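
Without pre-empting that later discussion, one quick way to see that something order-dependent is going on is to fit the same model with the two factors swapped. For a single `lm` fit, `anova()` tests the terms sequentially in the order they appear in the formula. The sketch below rebuilds the data on a copy called `cars` (a name used here only for illustration), with row positions that reproduce the `origin` labels defined earlier.

```r
# Rebuild the data on a copy so the mtcars object used elsewhere is untouched
cars <- datasets::mtcars
usa.rows <- c(3:7, 15:17, 22:25, 29)  # rows labelled 'USA' in the origin vector above
cars$origin <- factor(ifelse(seq_len(nrow(cars)) %in% usa.rows, "USA", "Other"))
cars$vs     <- factor(ifelse(cars$vs == 0, "v-shaped", "straight"))

# Same model, with the terms entered in two different orders
fit.ov <- lm(mpg ~ origin + vs, data = cars)
fit.vo <- lm(mpg ~ vs + origin, data = cars)

print(anova(fit.ov))   # origin is tested first, before vs is in the model
print(anova(fit.vo))   # origin is tested last, after vs is in the model
```

In the second table the `origin` row should match the `Anova()` result above, because `origin` is now the last term entered.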

## The Full Factorial Model

So, the interaction term represents that "bit extra" that we need in order to allow the fitted means to be the same as the sample means. The additive model only gets us so far. The interaction term allows for a cell-specific adjustment to shift the additive model means so that they are the actual cell means. The larger this adjustment, the larger the interaction effect; in other words, the more the effect of one factor depends upon the levels of the other.
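
As a hedged illustration using the `mtcars` example from earlier, the sketch below fits the model with the interaction included and checks that the fitted cell means now coincide with the sample cell means. The copy `cars` and the object name `int.mod` are introduced here only for illustration, and the row positions reproduce the `origin` labels defined earlier.

```r
# Rebuild the data on a copy so the mtcars object used elsewhere is untouched
cars <- datasets::mtcars
usa.rows <- c(3:7, 15:17, 22:25, 29)  # rows labelled 'USA' in the origin vector above
cars$origin <- factor(ifelse(seq_len(nrow(cars)) %in% usa.rows, "USA", "Other"))
cars$vs     <- factor(ifelse(cars$vs == 0, "v-shaped", "straight"))

# Full factorial model: both main effects plus their interaction
int.mod <- lm(mpg ~ origin * vs, data = cars)

# Sample cell means and fitted cell means, both as vs-by-origin tables
sample.means <- with(cars, tapply(mpg, list(vs = vs, origin = origin), mean))
fitted.means <- with(cars, tapply(fitted(int.mod), list(vs = vs, origin = origin), mean))
print(all.equal(sample.means, fitted.means))

# Under dummy coding the interaction coefficient is the cell-specific adjustment
print(coef(int.mod)["originUSA:vsv-shaped"])
```

With dummy coding, this single interaction coefficient is the amount by which the `(USA,v-shaped)` cell departs from what the two additive effects alone would predict.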

Note that you do not *have* to include an interaction. A typical approach in Psychology is always to specify a *full factorial* model containing every main effect and every possible interaction. In general, this is probably the approach you should take, otherwise you make strong assumptions that all interactions are 0. However, there is nothing that says you *have* to do this. You have the flexibility to only include the effects that you want in the model. You just have to be aware of the consequences of doing so. 

$$
SS_{\text{model}} = SS_{A} + SS_{B} + SS_{AB}
$$

### Cell Means and Marginal Means


```{admonition} Interaction notation
:class: tip
There are different conventions for writing an interaction into a model. Some authors like to add an additional symbol to denote the interaction, leaving the subscripts to imply that the term is an interaction rather than a main effect. For instance

$$
y_{ijk} = \mu +\alpha_{i} + \beta_{j} + \gamma_{ij} + \epsilon_{ijk},
$$

where $\gamma_{ij}$ is the interaction. Personally, we like to use the following notation as it makes the interaction terms much more explicit, particularly in terms of which effects the interaction corresponds to

$$
y_{ijk} = \mu +\alpha_{i} + \beta_{j} + (\alpha\beta)_{ij} + \epsilon_{ijk}.
$$

For a model containing multiple interactions, this helps to make the meaning of the terms clearer, rather than somewhat hiding it in the subscripts. For instance, a 3-way full factorial ANOVA would be

$$
y_{ijkl} = \mu +\alpha_{i} + \beta_{j} + \gamma_{k} + (\alpha\beta)_{ij} + (\beta\gamma)_{jk} + (\alpha\gamma)_{ik} + (\alpha\beta\gamma)_{ijk} + \epsilon_{ijkl},
$$

where each main effect, 2-way interaction and 3-way interaction is clearly denoted.
```
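
The same structure appears in `R`'s formula syntax, where `*` expands to every main effect and interaction and `:` denotes a single interaction term. The sketch below uses hypothetical variable names (`y`, `a`, `b`, `c`) purely to show the expansion; no data are needed to inspect the terms of a formula.

```r
# Terms generated by a three-way full factorial formula
full3 <- terms(y ~ a * b * c)
print(attr(full3, "term.labels"))

# The * operator is shorthand for main effects plus their interaction
two.way  <- attr(terms(y ~ a * b), "term.labels")
longhand <- attr(terms(y ~ a + b + a:b), "term.labels")
print(identical(two.way, longhand))
```

The seven labels correspond one-to-one with the three main effects, three 2-way interactions and one 3-way interaction in the equation above.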


### Interactions as Multiplicative Effects

### Interpreting Main Effects in the Presence of an Interaction
... We can see that averaging over either the rows or columns of the means table only makes sense when additivity is assumed. If we do this when there is not a constant row or column effect, the very meaning of the main effect breaks down. If we want to think of a main effect as the consistent effect of a factor irrespective of other factors, this is no longer valid when those effects are *not* consistent. As such, main effects must assume additivity to make any sense. If there is a large interaction effect, the very concept of a main effect no longer makes sense.

Indeed, you do not need to know anything about the model to see this. The interaction tells us that the main effect depends upon the level of another factor. Why would we then try to look at a main effect *ignoring* that factor? We know that other factor matters. That is what the interaction tells us. Unfortunately, it is common practice in Psychology to ignore this and try to interpret main effects in the presence of significant interactions. Hopefully it is clear how meaningless this actually is.
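
An extreme, made-up case makes the point. The cell means in the sketch below are hypothetical values chosen to produce a crossover interaction, where the effect of Factor A reverses direction across the levels of Factor B.

```r
# Hypothetical cell means with a crossover interaction (illustrative values only)
cell.means <- matrix(c(10, 20,
                       20, 10), nrow = 2, byrow = TRUE,
                     dimnames = list(A = c("A1", "A2"), B = c("B1", "B2")))

# Marginal means for Factor A, averaging over Factor B
print(rowMeans(cell.means))

# Effect of Factor A separately at each level of Factor B
print(cell.means["A1", ] - cell.means["A2", ])
```

Here the marginal means for Factor A are identical, so the "main effect" is zero, even though Factor A makes a difference of 10 units at both levels of Factor B, just in opposite directions.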