# Unbalanced ANOVA Models

```{figure} images/unbalanced-text.webp
---
scale: 80%
align: right
---
```

... Indeed, whole textbooks were written about unbalanced data (as can be seen on the *right*). So this is a topic that deserves some attention, even if it is largely *ignored* by modern teaching in Psychology. There is something of an assumption that the issues of balance have been *solved* and thus do not need considering anymore. However, this is not really true. The "solution" implemented by SAS and SPSS is the Type III sums-of-squares, which researchers continue to use because it is the default[^default-foot]. However, as discussed briefly last week, this approach is highly flawed.

In this part of the lesson, we will dig deeper into the Type I/II/III debate so that you understand what each type of sums-of-squares means, when they are most appropriate to use and what the various arguments are for/against them. In general, we will be recommending Type II for 95% of all use-cases. However, it is important not to just take our word for it. Instead, it is important that you *understand* the difference and can make your own informed judgement.

## The Problem of Imbalance
The arithmetic behind the traditional ANOVA relates to a simple decomposition of the sums-of-squares. When there is an equal number of data points in each cell (using a 2-way ANOVA as an example), we simply have

$$
SS_{\text{A}} + SS_{\text{B}} + SS_{\text{AB}} = SS_{\text{Model}}.
$$

So, the total amount of variance explained by the model can be neatly decomposed into several chunks. These decompositions are said to be *orthogonal*, which you can take to mean *independent*. The value of each sum-of-squares is not affected by any of the others and they represent a neat and simple partition of the amount explained by the model. Together, we then have

$$
SS_{\text{Total}} = SS_{\text{Model}} + SS_{\text{Error}}.
$$

Unfortunately, when there is an *unequal* number of data points across cells, application of the standard ANOVA equations results in

$$
SS_{\text{A}} + SS_{\text{B}} + SS_{\text{AB}} > SS_{\text{Model}}.
$$

Adding these decompositions together is now *not* the same as the amount of variance explained by the model. What happens is that the effects "bleed" into each other. They no longer represent an independent partition of the variance. A lack of balance kills the symmetry that allows the ANOVA to neatly decompose the variance. What this means practically is that each effect now contains some element of the other effects and adding them together means we double-count some chunks of variance. This leads to a larger sum $\left(SS_{\text{A}} + SS_{\text{B}} + SS_{\text{AB}}\right)$ than the model actually explains. 
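
The loss of a neat partition can be checked numerically. The following is a minimal sketch, assuming the built-in `warpbreaks` data as a stand-in (it is balanced, with 9 observations per cell) rather than anything from this lesson. The helpers `ss_model()` and `compare_ss()` and the particular rows dropped to create imbalance are purely illustrative. To keep things small, only the two main effects are considered: each is fitted on its own and the sum of the two is compared with the sum-of-squares explained by the additive model.

```r
data(warpbreaks)   # built-in data: breaks by wool (2 levels) and tension (3 levels)

# Hypothetical unbalanced version: remove an arbitrary handful of rows
unbal <- warpbreaks[-c(1:6, 10:12, 28), ]
print(with(unbal, table(wool, tension)))

# Sum-of-squares explained by a model = total SS - residual SS
ss_model <- function(formula, data) {
  fit <- lm(formula, data = data)
  y   <- model.response(model.frame(fit))
  sum((y - mean(y))^2) - sum(resid(fit)^2)
}

# Compare the sum of the separately-fitted effects with the joint (additive) model
compare_ss <- function(data) {
  c(separate = ss_model(breaks ~ wool, data) + ss_model(breaks ~ tension, data),
    joint    = ss_model(breaks ~ wool + tension, data))
}

bal.ss   <- compare_ss(warpbreaks)
unbal.ss <- compare_ss(unbal)
print(rbind(balanced = bal.ss, unbalanced = unbal.ss))
```

With the balanced data the two columns should agree up to rounding. Once rows are dropped, the separately-calculated effects no longer add up to what the joint model explains.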

What does this mean in terms of applying an ANOVA model to unbalanced data? It means that each sum-of-squares we calculate is influenced *by the other terms in the model*. This means we have several options when we decompose the sums-of-squares, depending on what else is in the model at the time. Each chunk that gets calculated will represent the variance associated with a given effect *minus* the overlap with anything else in the model. Unfortunately, wherever there is choice, there is also disagreement. For an unbalanced ANOVA, this disagreement surrounds three possible ways of decomposing the sums-of-squares. These are known as Type I, Type II and Type III sums-of-squares and will be the focus of this part of the lesson.

### Venn Diagram Intuition
Perhaps the simplest way to gain intuition about what happens in an unbalanced ANOVA is to return to the Venn diagram visualisation we saw previously in multiple regression. Here, each circle represents the sum-of-squares associated with each main effect $\text{A}$ and $\text{B}$, along with their interaction $\text{AB}$.

When the ANOVA is *balanced*, the situation is as shown below

```{figure} images/venn-diagrams/orthog-ANOVA.png
---
scale: 55%
align: center
---
```

Here, there is no overlap between the circles. Each effect is completely independent and it does not matter what else is in the model at the point where we calculate its sum-of-squares. We could entirely remove $\text{B}$ and $\text{AB}$ when calculating $SS_{\text{A}}$ and it would not make any difference. The other model terms therefore do not matter[^modterms-foot].

When the ANOVA is *unbalanced*, the situation is as shown below

```{figure} images/venn-diagrams/unbalanced-ANOVA.png
---
scale: 55%
align: center
---
```

Here, the effects now *overlap*. This means there is some element of both $\text{B}$ and $\text{AB}$ inside the sum-of-squares for $\text{A}$. This tells us why the sum of these terms is too big. If we sum the areas of the $\text{A}$ circle, $\text{B}$ circle and $\text{AB}$ circle we will double-count the areas of overlap. This will be larger than the total area of all the circles (the $SS_{\text{Model}}$). Furthermore, because each sum-of-squares will now depend upon the other terms in the model, we now have several options when it comes to calculating them.

Using $SS_{\text{A}}$ as an example, we could

- Calculate $SS_{\text{A}}$ with nothing else in the model
- Calculate $SS_{\text{A}}$ with $\text{B}$ in the model, but no $\text{AB}$. 
- Calculate $SS_{\text{A}}$ with *both* $\text{B}$ and $\text{AB}$ in the model. 

In each case, the $SS_{\text{A}}$ will represent only the *unique* portion of the circle, with the overlaps removed. These options are illustrated below and correspond to the Type I, Type II and Type III sums-of-squares.

```{figure} images/venn-diagrams/SS-Types.png
---
scale: 55%
align: center
---
```

## The Principle of Marginality
In trying to determine which of the sums-of-squares to choose, we can be guided by the idea of building *meaningful* models. This is encapsulated by the *principle of marginality*, which was laid out by [Nelder (1977)](https://www.jstor.org/stable/2344517) as a response to his dissatisfaction with the way that linear models were being applied in statistics. In brief, this principle states that *if an interaction is in the model, all the constituent lower-order terms must also be in the model*.

For example, if we include $\text{AB}$, we must also include $\text{A}$ and $\text{B}$. If we include $\text{ABC}$ we must also include $\text{A}$, $\text{B}$, $\text{C}$, $\text{AB}$, $\text{AC}$, $\text{BC}$. From this perspective, interpretation flows *downwards*. We interpret the highest-order significant terms and ignore any lower-order terms that are nested inside it. We must start at the *bottom* of the ANOVA table and work our way up, respecting marginality along the way. This reflects the fact that removing a lower-order term destroys the meaning of the interaction term. Remember, an interaction is defined as a *departure from additivity*. Without the additive components present in the model, this deviation has no meaning. 

For instance, if we fit the model

$$
y_{ijk} = \mu + \beta_{k} + (\alpha\beta)_{jk} + \epsilon_{ijk},
$$

then there is no main effect of $\text{A}$, there is only the main effect of $\text{B}$ and the $\text{AB}$ interaction. However, the interaction term $(\alpha\beta)_{jk}$ will no longer behave as a "deviation from additivity". Instead, it will need to soak-up both the main effect of $\text{A}$ and the $\text{AB}$ interaction. So the value of this term will be a combination of the main effect and the interaction, which is uninterpretable. This is all because this model does not respect the principle of marginality and thus the clean interpretation of each term is destroyed. Although the model above does not seem sensible, we will see below that the Type III sums-of-squares actually make *implicit* comparisons with these forms of models.   
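
This absorption can be seen directly in `R`. The sketch below again assumes the built-in `warpbreaks` data as a stand-in, with `wool` playing the role of $\text{A}$ and `tension` the role of $\text{B}$. Dropping the `wool` main effect while keeping the interaction changes how `lm()` codes the interaction columns, but the fit itself should be left untouched because the interaction columns take over the job of the missing main effect.

```r
data(warpbreaks)

# Full model and a model that violates marginality (no main effect of wool)
full    <- lm(breaks ~ wool + tension + wool:tension, data = warpbreaks)
no.wool <- lm(breaks ~ tension + wool:tension,        data = warpbreaks)

# Residual sums-of-squares for both fits
rss.full    <- sum(resid(full)^2)
rss.no.wool <- sum(resid(no.wool)^2)
print(c(full = rss.full, no.wool = rss.no.wool))

# Same number of parameters, but the interaction columns are coded differently
print(names(coef(full)))
print(names(coef(no.wool)))
```

Both models have the same number of coefficients and the same residual sum-of-squares, so comparing them tells us nothing about $\text{A}$. The only thing that changes is what the interaction parameters mean.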


## Type I Sums-of-squares
With some intuition about the problem of imbalance, and armed with the principle of marginality, we can now explore our options for deriving sums-of-squares in unbalanced models. We will start with Type I. As indicated above, the Type I sums-of-squares for factor $\text{A}$ are calculated based on nothing else being in the model. However, this does not tell the full story. To understand the Type I effects more clearly, it is useful to introduce some new notation. In comparing the following two models

$$
\begin{alignat*}{1}
    \mathcal{M}_{0} &: y_{ijk} = \mu + \beta_{k} + \epsilon_{ijk} \\
    \mathcal{M}_{1} &: y_{ijk} = \mu + \alpha_{j} + \beta_{k} + \epsilon_{ijk} \\
\end{alignat*}
$$

we can denote the *reduction* in residual sums-of-squares as follows

$$
R(\alpha|\beta).
$$

This is read as "the reduction in residual sums-of-squares for $\alpha$, after taking $\beta$ into account". So the terms on the *left* of $|$ are those added or removed between the two models, whereas the terms on the *right* of $|$ remain in both models.
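
As a rough illustration of the notation, the reduction can be written as a small function of two model formulas. The helper name `R_reduction()` is hypothetical, as are the use of the built-in `warpbreaks` data and the rows dropped to make it unbalanced. Here `wool` stands in for $\alpha$ and `tension` for $\beta$.

```r
# R(left | right): drop in residual SS when moving from the model containing
# only the "right" terms (reduced) to the model that also has the "left" terms (full)
R_reduction <- function(reduced, full, data) {
  rss <- function(f) sum(resid(lm(f, data = data))^2)
  rss(reduced) - rss(full)
}

data(warpbreaks)
unbal <- warpbreaks[-c(1:6, 10:12, 28), ]  # arbitrary rows removed to break balance

# R(alpha | mu, beta): wool after taking tension into account
R.wool.given.tension <- R_reduction(breaks ~ tension, breaks ~ wool + tension, unbal)

# R(alpha | mu): wool with nothing else in the model
R.wool.alone <- R_reduction(breaks ~ 1, breaks ~ wool, unbal)

print(c(given.tension = R.wool.given.tension, alone = R.wool.alone))
```

With unbalanced data the two reductions differ, which is exactly why the choice of what sits on the right of $|$ matters.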

With this in mind, the Type I sums-of-squares are a *sequential* decomposition, where terms are added *in the same order* as the model equation. Thus, for a 2-way ANOVA, the sums-of-squares for each term are as follows

$$
\begin{alignat*}{1}
    SS_{\text{A}}  &: R(\alpha|\mu)        \\
    SS_{\text{B}}  &: R(\beta|\mu, \alpha) \\
    SS_{\text{AB}} &: R((\alpha\beta)|\mu, \alpha, \beta)
\end{alignat*}
$$

As such, the order of the model equation matters here. Each term is added *in turn* and then compared to the previous model. So we start with only $\mu$, we then add $\alpha$ and see what the difference is. We then add $\beta$ and see what the difference is from the previous model. Finally, we add the interaction and see what the difference is from the previous model. Written long-hand, this would be:

$$
\begin{alignat*}{1}
SS_{\text{A}}&
\begin{cases}
  y_{ijk} = \mu + \epsilon_{ijk} \\
  y_{ijk} = \mu + \alpha_{j} + \epsilon_{ijk}   
\end{cases} \\
SS_{\text{B}}&
\begin{cases}
  y_{ijk} = \mu + \alpha_{j} + \epsilon_{ijk} \\
  y_{ijk} = \mu + \alpha_{j} + \beta_{k} + \epsilon_{ijk}   
\end{cases} \\
SS_{\text{AB}}&
\begin{cases}
  y_{ijk} = \mu + \alpha_{j} + \beta_{k} + \epsilon_{ijk} \\
  y_{ijk} = \mu + \alpha_{j} + \beta_{k} + (\alpha\beta)_{jk} + \epsilon_{ijk}
\end{cases}
\end{alignat*}
$$

In terms of the Venn diagram intuition, we can see below what the standard order of terms produces in terms of the Type I tests.

```{figure} images/venn-diagrams/SS-I.png
---
scale: 50%
align: center
---
```

Of these, the tests of $\text{B}$ and $\text{AB}$ are useful because they take other terms into account. However, the test of $\text{A}$ is less useful, because it does not take the effect of $\text{B}$ into account. Importantly, however, this will change entirely if the model is specified in a *different* order. This reliance on order makes the Type I sums-of-squares dubious in their usefulness. Unfortunately, this is exactly what the `anova()` function from base `R` produces and why this method is not really suitable for *unbalanced* data. Here, the adherence to marginality depends entirely upon the order in which the terms enter the model and will only produce *some* useful tests, but not necessarily all.
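
The dependence on order is easy to demonstrate. The sketch below assumes the built-in `warpbreaks` data with some arbitrarily chosen rows removed (an illustrative choice, not the lesson's data) and fits the same model twice, swapping the order of the main effects.

```r
data(warpbreaks)
unbal <- warpbreaks[-c(1:6, 10:12, 28), ]  # arbitrary rows removed to break balance

# Same model, different term order
tab.wool.first    <- anova(lm(breaks ~ wool + tension + wool:tension, data = unbal))
tab.tension.first <- anova(lm(breaks ~ tension + wool + wool:tension, data = unbal))

print(tab.wool.first)
print(tab.tension.first)
```

The interaction row and the total of the three effects should be the same in both tables, but the sums-of-squares for `wool` and `tension` change depending on which of them entered first.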

`````{admonition} Why does R default to Type I sums-of-squares?
:class: info
It may seem strange that the `anova()` function would choose to use the Type I sums-of-squares. However, we need to understand that `anova()` is only designed for use on *balanced* data. With this in mind, the choice of decomposition technique is entirely due to computational ease. The Type I effects are easy to calculate because we simply loop through each term, comparing the change in residual sums-of-squares to the previous model. Under balance, this is identical to the traditional ANOVA decomposition and is also identical to Type II and Type III tests. From this perspective, it is easy to see why `anova()` does things this way. However, it is important to understand that this should not be used with *unbalanced* data. At least, not without understanding what the tests actually mean.
``````

## Type II Sums-of-squares
Moving on to Type II, these sums-of-squares have a *strict* adherence to marginality. Indeed, these are the only tests that do this, which is why they are the recommended approach. In brief, each Type II effect is tested based on a model that contains none of its *higher-order* relatives. We will unpack what this means in more detail in this section.

As an example, in the 2-way ANOVA we have the effects $\text{A}$, $\text{B}$ and $\text{AB}$. If we wanted to test $\text{A}$ then the Type II tests would include all the *lower-order* terms and *same-order* terms, but none of the *higher-order* terms that involve $\text{A}$. This would mean including $\text{B}$ but *not* $\text{AB}$. Similarly, the Type II test of $\text{B}$ would include $\text{A}$, but *not* $\text{AB}$. Finally, the test of $\text{AB}$ would include both $\text{A}$ *and* $\text{B}$. In this scheme, the sums-of-squares would be

$$
\begin{alignat*}{1}
    SS_{\text{A}}  &: R(\alpha|\mu, \beta)        \\
    SS_{\text{B}}  &: R(\beta|\mu, \alpha) \\
    SS_{\text{AB}} &: R((\alpha\beta)|\mu, \alpha, \beta)
\end{alignat*}
$$

Long-hand, this gives:

$$
\begin{alignat*}{1}
SS_{\text{A}}&
\begin{cases}
  y_{ijk} = \mu + \beta_{k} + \epsilon_{ijk} \\
  y_{ijk} = \mu + \alpha_{j} + \beta_{k} + \epsilon_{ijk}   
\end{cases} \\
SS_{\text{B}}&
\begin{cases}
  y_{ijk} = \mu + \alpha_{j} + \epsilon_{ijk} \\
  y_{ijk} = \mu + \alpha_{j} + \beta_{k} + \epsilon_{ijk}   
\end{cases} \\
SS_{\text{AB}}&
\begin{cases}
  y_{ijk} = \mu + \alpha_{j} + \beta_{k} + \epsilon_{ijk} \\
  y_{ijk} = \mu + \alpha_{j} + \beta_{k} + (\alpha\beta)_{jk} + \epsilon_{ijk}
\end{cases}
\end{alignat*}
$$

So, notice that the main effects are tested *assuming their interaction is 0*. 

In general, for a given effect $T$ the Type II rule is that the model comparison is based on including all lower-order terms, all same-order terms and all higher-order terms that *do not* contain $T$. The reason is that this scheme respects marginality. This is the only way to guarantee that each test is always based on an effect that is *interpretable*. If we make sure we are only ever testing the highest-order effect for the factor in question, it means the hypothesis always has a clear interpretation.
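
The general rule can be sketched as a small function that builds each pair of models automatically. This is only an illustration for purely categorical designs: the helper `type2_ss()` is hypothetical, it assumes a single untransformed response, and it is not how any package implements these tests. It is demonstrated on the built-in `warpbreaks` data with arbitrarily chosen rows removed.

```r
type2_ss <- function(formula, data) {
  tt     <- terms(formula)
  labels <- attr(tt, "term.labels")
  member <- attr(tt, "factors") > 0   # rows: variables, columns: model terms
  resp   <- all.vars(formula)[1]      # assumes a single, untransformed response

  # Residual SS for a model built from a vector of term labels
  rss <- function(rhs) {
    f <- reformulate(if (length(rhs) > 0) rhs else "1", response = resp)
    sum(resid(lm(f, data = data))^2)
  }

  sapply(labels, function(term) {
    vars.in.term <- member[, term]
    # TRUE for the term itself and for every higher-order term that contains it
    contains <- apply(member, 2, function(other) all(other[vars.in.term]))
    keep     <- labels[!contains]     # everything that does not contain the term
    rss(keep) - rss(c(keep, term))    # reduction from adding the term back in
  })
}

data(warpbreaks)
unbal <- warpbreaks[-c(1:6, 10:12, 28), ]  # arbitrary rows removed to break balance
ss2   <- type2_ss(breaks ~ wool + tension + wool:tension, unbal)
print(ss2)
```

For the 2-way case this reduces to the three comparisons written out above: each main effect is compared against a model containing only the other main effect, and the interaction is compared against the additive model.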

However, just because an effect is *interpretable* does not necessarily mean it is *meaningful*. Whether the Type II effects have any useful meaning depends upon the other effects in the ANOVA table. We therefore have to interpret Type II effects hierarchically, from highest-order to lowest-order. Each test only makes sense if its higher-order relatives are null. So, in the 2-way example, we would only interpret the main effects of $\text{A}$ and $\text{B}$ if the interaction $\text{AB}$ were null. This makes sense because

- If the interaction was *significant* then the main effects are not meaningful and we would just interpret the interaction on its own.
- If the interaction was *non-significant*, the only way to interpret the main effects sensibly is in a model *without* an interaction term (i.e. a purely additive model). 

In terms of our Venn diagram intuition, the Type II tests in the 2-way ANOVA are shown below.

```{figure} images/venn-diagrams/SS-II.png
---
scale: 50%
align: center
---
```

Here we can see that $\text{A}$ and $\text{B}$ are tested controlling for each other, but assuming that $\text{AB} = 0$. This makes both $\text{A}$ and $\text{B}$ interpretable, but only *meaningful* if the additive model fits. If it does, we ignore $\text{AB}$. If it does not, we ignore $\text{A}$ and $\text{B}$ and only interpret $\text{AB}$.

## Type III Sums-of-squares
Finally, moving on to Type III, these sums-of-squares *ignore* marginality. In the presence of an interaction, the Type III tests of main effects do not assume that the interaction is 0 (as Type II does). Instead, these tests retain the interaction in the model and make a comparison of the form

$$
\begin{alignat*}{1}
    \mathcal{M}_{0} &: y_{ijk} = \mu + \beta_{k} + (\alpha\beta)_{jk} + \epsilon_{ijk} \\
    \mathcal{M}_{1} &: y_{ijk} = \mu + \alpha_{j} + \beta_{k} + (\alpha\beta)_{jk} + \epsilon_{ijk} \\
\end{alignat*}
$$

In other words, the Type III logic implies testing a model that contains an interaction, without one of the associated main effects. In this way, the test of $\text{A}$ would be conducted after correcting for both $\text{B}$ *and* $\text{AB}$. In terms of the $R$ notation, this would be

$$
\begin{alignat*}{1}
    SS_{\text{A}}  &: R(\alpha|\mu, \beta, (\alpha\beta)) \\
    SS_{\text{B}}  &: R(\beta|\mu, \alpha, (\alpha\beta)) \\
    SS_{\text{AB}} &: R((\alpha\beta)|\mu, \alpha, \beta)
\end{alignat*}
$$

We can see that the test of the interaction agrees with both Type I and Type II, but the main effects do not. Now, as we argued above, a comparison such as $R(\alpha|\mu, \beta, (\alpha\beta))$ is not particularly meaningful because the model will simply absorb the main effects into the interaction term. This makes the interaction parameters difficult to interpret, but it keeps the model fit *identical*. In fact, you cannot create the Type III tests from model comparisons alone. If you tried, the model fit would be the same and you would have nothing. This is already a hint that Type III tests are not asking a sensible question. In fact, Type III tests are generated using an entirely different approach to model comparisons which, as we will see further below, requires further technical changes to our model in order to generate them correctly. 

Even beyond the technical elements, conceptually the model comparison that Type III tests imply is wholly *meaningless*, as it suggests a situation where there is 0 difference between the levels of a factor ($\alpha_{j} = 0$), but also that this difference changes depending upon another factor ($(\alpha\beta)_{jk} \neq 0$). To bring back the example from last week, if the effectiveness of a treatment depends upon diagnosis, it makes little sense to calculate the effect of treatment *averaging over* diagnosis. It is like someone asking you whether the treatment works. Quite rightly, you say it depends upon diagnosis. So, which diagnosis are we talking about? They then just look at you blankly and ask again whether the treatment works. In this situation, the Type II tests would say "ok, if we set the interaction to 0, this is what we get. However, the interaction is not 0 and an additive model does not fit well here". The Type III tests would say, "ok, well let us correct the main effects for the interaction effect and interpret what is left over". This can be severely misleading. In this situation we could easily conclude "the treatment does not work". Except that the treatment *does* work, but only if you give it to the patients and not the healthy controls.

In terms of the Venn diagram intuition, we can see below that the Type III main effects have the overlap with the interaction term removed. The problem is that this remaining chunk is *uninterpretable*. Yes, a number can be calculated, but conceptually it is problematic. This is most easily seen when we have a significant interaction and a non-significant main effect. The Type III claim is that these are both interpretable, but clearly this situation creates a contradiction. A treatment cannot simultaneously work and not work. We cannot conclude "When taking diagnosis into account, the treatment is effective. However, ignoring diagnosis the treatment does nothing."

```{figure} images/venn-diagrams/SS-III.png
---
scale: 50%
align: center
---
```

Just to hammer this point home, consider the following main effect questions that Type III effects would pose:

- What is the average effect of brakes on stopping a car, adjusting for whether the car is moving?
- What is the average effect of an umbrella on keeping you dry, adjusting for whether it is raining?
- What is the average effect of an antibiotic for treatment, adjusting for whether there is an infection present?
- What is the average effect of sunscreen on preventing sunburn, adjusting for whether you are indoors or outside?

Hopefully it is clear in all these cases that the context provided by the interaction is *essential* for interpretation and that simply adjusting that context away leaves you with something entirely meaningless.

## Sums-of-Squares in `R`
We will now turn to how to calculate these various types of sums-of-squares in `R`. In general, both Type I and Type III are of little practical use. As such, we would recommend just using `Anova()` for everything, which will default to Type II. However, there may be times where you wish to generate the other types (if only to placate colleagues who are more used to SPSS), so we will see how to do this below.

By way of an example, we will return to the $2 \times 3$ ANOVA example from `mtcars`, which we regenerate using the code in the drop-down below.

In [1]:
data(mtcars)

# Origin factor
mtcars$origin <- c('Japan','Japan','USA','USA','USA','USA','USA','Europe','Europe',
                   'Europe','Europe','Europe','Europe','Europe','USA','USA','USA',
                   'Europe','Japan','Japan','Japan','USA','USA','USA','USA',
                   'Europe','Europe','Europe','USA','Europe','Europe','Europe')
mtcars$origin <- as.factor(mtcars$origin)

# VS factor
vs.lab <- rep("",length(mtcars$vs)) 
vs.lab[mtcars$vs == 0] <- "V-shaped"
vs.lab[mtcars$vs == 1] <- "Straight"
mtcars$vs <- as.factor(vs.lab)

We can examine the degree of imbalance in these data using the `table()` function to generate a contingency table of the factors

In [2]:
with(mtcars, table(origin,vs))

        vs
origin   Straight V-shaped
  Europe        8        6
  Japan         3        2
  USA           3       10

So we can see that this is *severely* unbalanced. We fit the model below and will then examine the different types of sums-of-squares.

In [3]:
mpg.mod <- lm(mpg ~ origin + vs + origin:vs, data=mtcars)

### Type I and Type II in `R`
Type I and II sums-of-squares are simple to produce. For Type I, we can use the built-in `anova()` function.

In [4]:
print(anova(mpg.mod))

Analysis of Variance Table

Response: mpg
          Df Sum Sq Mean Sq F value    Pr(>F)    
origin     2 393.88 196.938 11.4370 0.0002733 ***
vs         1 282.26 282.261 16.3921 0.0004117 ***
origin:vs  2   2.21   1.104  0.0641 0.9380730    
Residuals 26 447.70  17.219                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


These sums-of-squares can be reproduced using model comparisons, based on sequentially adding terms to each model. In the code below, we replicate the values in the `Sum Sq` column given above. This is just to illustrate the interpretation of the Type I effects.

In [5]:
mod.0     <- lm(mpg ~ 1,      data=mtcars)
mod.1     <- lm(mpg ~ origin, data=mtcars)
SS.origin <- sum(resid(mod.0)^2) - sum(resid(mod.1)^2)

mod.0     <- lm(mpg ~ origin,      data=mtcars)
mod.1     <- lm(mpg ~ origin + vs, data=mtcars)
SS.vs     <- sum(resid(mod.0)^2) - sum(resid(mod.1)^2)

mod.0     <- lm(mpg ~ origin + vs,             data=mtcars)
mod.1     <- lm(mpg ~ origin + vs + origin:vs, data=mtcars)
SS.inter  <- sum(resid(mod.0)^2) - sum(resid(mod.1)^2)

print(SS.origin)
print(SS.vs)
print(SS.inter)

[1] 393.8751
[1] 282.2613
[1] 2.207004


Note that we cannot recreate the full Type I *tests* from pairwise model comparisons within the `anova()` function, as only the sums-of-squares will agree. All the tests in the Type I table use the error term from the model containing *all* the factors, whereas a pairwise comparison uses the error term from the larger of the two models being compared. So we can recreate the *numerators* of the $F$-statistics, but the denominators will be different in all but the final comparison.
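
The role of the error term can be checked with a short sketch. It assumes the built-in `warpbreaks` data with arbitrarily chosen rows removed, so the numbers will not match the `mtcars` output above. A two-model call to `anova()` takes its error term from the larger of the two models, whereas the Type I table always uses the full model. Passing the whole sequence of models in a single call is documented to scale every test by the largest model, so in this sketch it lines up with the Type I table.

```r
data(warpbreaks)
unbal <- warpbreaks[-c(1:6, 10:12, 28), ]  # arbitrary rows removed to break balance

m0 <- lm(breaks ~ 1,                             data = unbal)
m1 <- lm(breaks ~ wool,                          data = unbal)
m2 <- lm(breaks ~ wool + tension,                data = unbal)
m3 <- lm(breaks ~ wool + tension + wool:tension, data = unbal)

type1.tab <- anova(m3)              # Type I table from the full model
pair.tab  <- anova(m0, m1)          # pairwise: error term comes from m1
seq.tab   <- anova(m0, m1, m2, m3)  # sequence: error term comes from m3

# F for wool by hand: Type I numerator over the full-model mean-square error
ss.wool  <- sum(resid(m0)^2) - sum(resid(m1)^2)
df.wool  <- df.residual(m0) - df.residual(m1)
ms.error <- sum(resid(m3)^2) / df.residual(m3)
F.wool   <- (ss.wool / df.wool) / ms.error

print(c(by.hand  = F.wool,
        type.I   = type1.tab["wool", "F value"],
        pairwise = pair.tab$F[2],
        sequence = seq.tab$F[2]))
```

The hand calculation, the Type I table and the sequential call should agree, while the pairwise comparison gives a different $F$ because its denominator ignores `tension` and the interaction.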

For Type II, we can use `Anova()` from `car` without any options, as the Type II tests are the default. We can also specify `type='II'`, if we want to be explicit about it.

In [6]:
library(car)
print(Anova(mpg.mod)) # or Anova(mpg.mod, type="II")

Loading required package: carData



Anova Table (Type II tests)

Response: mpg
          Sum Sq Df F value    Pr(>F)    
origin    179.61  2  5.2153 0.0124621 *  
vs        282.26  1 16.3921 0.0004117 ***
origin:vs   2.21  2  0.0641 0.9380730    
Residuals 447.70 26                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Again, we can recreate the `Sum Sq` column manually, just to make the logic clear.

In [7]:
mod.0     <- lm(mpg ~ vs,          data=mtcars)
mod.1     <- lm(mpg ~ origin + vs, data=mtcars)
SS.origin <- sum(resid(mod.0)^2) - sum(resid(mod.1)^2)

mod.0     <- lm(mpg ~ origin,      data=mtcars)
mod.1     <- lm(mpg ~ origin + vs, data=mtcars)
SS.vs     <- sum(resid(mod.0)^2) - sum(resid(mod.1)^2)

mod.0     <- lm(mpg ~ origin + vs,             data=mtcars)
mod.1     <- lm(mpg ~ origin + vs + origin:vs, data=mtcars)
SS.inter  <- sum(resid(mod.0)^2) - sum(resid(mod.1)^2)

print(SS.origin)
print(SS.vs)
print(SS.inter)

[1] 179.6085
[1] 282.2613
[1] 2.207004


As expected, both `origin:vs` and `vs` are the same as the Type I tests, but `origin` is different. Refer back to the Venn diagrams to see why this is the case.


### Type III in `R`
For Type III, things get trickier. Recall from above that a Type III main effect is making the implicit comparison

$$
\begin{alignat*}{1}
    \mathcal{M}_{0} &: y_{ijk} = \mu + \beta_{k} + (\alpha\beta)_{jk} + \epsilon_{ijk} \\
    \mathcal{M}_{1} &: y_{ijk} = \mu + \alpha_{j} + \beta_{k} + (\alpha\beta)_{jk} + \epsilon_{ijk} \\
\end{alignat*}
$$

So you might think we can compare `y ~ A + B + A:B` to `y ~ B + A:B`. Unfortunately, this will not work. ... In fact, no actual model comparison exists that will generate the Type III test for us because the models will not behave in the way the Type III tests require. This in and of itself should indicate why Type III is not doing anything sensible. The approach that `Anova()` uses is based on manipulating the model parameters to create the comparison that the Type III tests imply. However, in order to do this correctly, we need to use a *very specific dummy coding scheme*. ...

We have two options here. The first is to change the dummy coding *globally* using 

In [8]:
options(contrasts=c('contr.sum','contr.poly'))

The disadvantage of doing this is that it will change the interpretation of the parameters in all the models you fit after setting it, unless you remember to set it back to normal afterwards. This means you always need to specify models in the following way

In [9]:
library(car)

options(contrasts=c('contr.sum','contr.poly'))              # set sum-to-zero coding
mod.sum <- lm(mpg ~ vs + origin + vs:origin, data=mtcars)   # fit model
print(Anova(mod.sum, type='III'))                           # Type III ANOVA table
options(contrasts=c('contr.treatment','contr.poly'))        # put the coding back to the default

Anova Table (Type III tests)

Response: mpg
             Sum Sq Df  F value    Pr(>F)    
(Intercept) 10488.5  1 609.1096 < 2.2e-16 ***
vs            251.9  1  14.6285 0.0007372 ***
origin        166.5  2   4.8354 0.0163901 *  
vs:origin       2.2  2   0.0641 0.9380730    
Residuals     447.7 26                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
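
If the global route is used, one way to make it harder to forget the reset is to wrap the fit in a function that restores the previous setting on exit. This is only a sketch: the helper name `fit_sum_coded()` is hypothetical and the built-in `warpbreaks` data are used as a stand-in.

```r
# Fit a linear model under sum-to-zero coding, then restore the old option
fit_sum_coded <- function(formula, data) {
  old <- options(contrasts = c("contr.sum", "contr.poly"))  # returns previous setting
  on.exit(options(old), add = TRUE)                         # restored even on error
  lm(formula, data = data)
}

data(warpbreaks)
before  <- getOption("contrasts")
mod.sum <- fit_sum_coded(breaks ~ wool + tension + wool:tension, warpbreaks)
after   <- getOption("contrasts")

print(identical(before, after))  # the global option is unchanged afterwards
print(names(coef(mod.sum)))      # sum-coded columns are numbered, e.g. wool1
```

The returned model was built with sum-to-zero coding, so it could be passed to `Anova()` with `type='III'` in the same way as `mod.sum` above, without the global setting leaking into later models.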


The better approach, though messier in terms of syntax, is to tell `lm()` how to code each variable explicitly. This can be done using the `contrasts=` argument, which takes a list of each factor alongside how we want them coded. For this example, we will therefore use

In [10]:
mod.sum <- lm(mpg ~ vs + origin + vs:origin, data=mtcars, 
              contrasts=list(vs=contr.sum, origin=contr.sum)) # fit model w/specific coding
print(Anova(mod.sum, type='III'))                             # Type III ANOVA table

Anova Table (Type III tests)

Response: mpg
             Sum Sq Df  F value    Pr(>F)    
(Intercept) 10488.5  1 609.1096 < 2.2e-16 ***
vs            251.9  1  14.6285 0.0007372 ***
origin        166.5  2   4.8354 0.0163901 *  
vs:origin       2.2  2   0.0641 0.9380730    
Residuals     447.7 26                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


This additional hassle and the dependence of the Type III tests on something as arbitrary as the model coding should be enough to dissuade you from this approach. Indeed, this very dependence is the reason why `Anova()` will *not* refit the model for you and automatically change the coding. [John Fox](https://uk.sagepub.com/en-gb/eur/author/john-david-fox) (the author of `car`) wants this dependence to be clear. Type I and Type II effects do not change with the dummy variable scheme. Type III effects *do* change. This fact needs to be *understood*, not *hidden*.
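
The dependence on coding can be illustrated without `car`. The sketch below swaps in a different technique: base `R`'s `drop1()` with `scope = ~ .`, which removes the columns belonging to each term in turn while leaving the rest of the design matrix alone, and so ignores marginality. This is not a claim about how `Anova()` works internally, although under sum-to-zero coding the resulting sums-of-squares should correspond to the Type III values. The built-in `warpbreaks` data with arbitrarily chosen rows removed are assumed as a stand-in.

```r
data(warpbreaks)
unbal <- warpbreaks[-c(1:6, 10:12, 28), ]  # arbitrary rows removed to break balance

# Identical model, two different coding schemes
fit.trt <- lm(breaks ~ wool + tension + wool:tension, data = unbal,
              contrasts = list(wool = "contr.treatment", tension = "contr.treatment"))
fit.sum <- lm(breaks ~ wool + tension + wool:tension, data = unbal,
              contrasts = list(wool = "contr.sum", tension = "contr.sum"))

# Drop each term's columns while keeping everything else in the model
drop.trt <- drop1(fit.trt, scope = ~ ., test = "F")
drop.sum <- drop1(fit.sum, scope = ~ ., test = "F")

print(drop.trt)
print(drop.sum)
```

The two fits are the same model with the same fitted values, and the interaction row agrees across codings. The main effect rows do not, because under treatment coding the "main effect" columns represent differences at the reference level of the other factor rather than differences averaged over its levels.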

All that being said, if you want an easier method, there is the `ezANOVA()` function from the `ez` package. This will take care of all the coding mess for you behind the scenes. However, there are some distinct disadvantages here:

- `ezANOVA` aims to create output that mimics SPSS. This is not done within the linear models framework, meaning there is no access to residuals, diagnostic plots, parameter estimates or any of the useful output we want. You get an ANOVA table and nothing else[^ez-foot].
- By abstracting away the difficulties of Type III tests, `ezANOVA` gives the impression of simplicity and does not engage you with any of the controversy. It prints a generic warning under imbalance, but nothing else.
- Fundamentally, this package aims to re-express linear models in the language of Psychology and then *hide* information from you. This is rarely a good thing. If this is what someone needs in order to use `R`, it is arguable that they should not be using it at all.

Nevertheless, if you want the simplest possible method of generating a Type III table, you can do the following

In [11]:
library(ez)

mtcars$idx <- as.factor(seq_len(nrow(mtcars)))          # ezANOVA needs subject IDs
ezAOV      <- ezANOVA(data=mtcars,          # data
                      dv=mpg,               # outcome
                      between=.(vs,origin), # between-subject factors  
                      wid=idx,              # subject IDs
                      type=3)               # sums-of-squares

print(ezAOV$ANOVA)

“Data is unbalanced (unequal N per group). Make sure you specified a well-considered value for the type argument to ezANOVA().”
Coefficient covariances computed by hccm()



     Effect DFn DFd           F            p p<.05         ges
2        vs   1  26 14.62853431 0.0007372137     * 0.360055674
3    origin   2  26  4.83541867 0.0163900610     * 0.271113270
4 vs:origin   2  26  0.06408491 0.9380730371       0.004905426


## The Sums-of-squares Circus
... The truth is that the main reason all this hassle exists is because the neat partition of the ANOVA effects disappears when the data are imbalanced. In order to resolve this, we have to choose a method of partitioning the sums-of-squares. The definitions given above come directly from SAS, whose aim was not some principled statistical derivation that makes sense, rather it was to give their users what they wanted: identical ANOVA output irrespective of balance. Because the traditional ANOVA was not seen as an exercise in model building, it was not typical to remove terms that appeared redundant. In order to maintain this completeness, SAS wanted ANOVA tables that contained *all* terms, rather than certain terms disappearing under imbalance. As such, different methods for decomposing these effects were developed and a choice was provided.

From a modern perspective, all this hassle is unnecessary if we engage with the process of *model building*. This is something we will discuss in much greater detail in the machine learning module next semester. However, the idea is very simple. If a term adds little predictive utility, remove it and create the simplest model you can. From this perspective, if the highest-order interaction is *small* it would be removed and then the lower-order terms become interpretable again. No need for Type II tests to make them interpretable *despite* the presence of the interaction term. However, if an interaction is *large*, it stays in the model and we only interpret the highest-order term for each factor. Under this scheme, the whole Type I/II/III debate disappears.

As an example, say we have the model 

$$
Y = A + B + C + AB + AC + BC + ABC.
$$

If the 3-way interaction is uninteresting, we can drop it to form

$$
Y = A + B + C + AB + AC + BC.
$$

Now, say that $AC$ and $BC$ are also uninteresting, we can settle on

$$
Y = A + B + C + AB.
$$

We would now interpret the 2-way interaction $AB$ and the main effect $C$. Because we have respected marginality here when building these models, all these terms have interpretable effects.
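
A rough sketch of this workflow in `R` is given below, using a 2-way model for brevity and assuming the built-in `warpbreaks` data with arbitrarily chosen rows removed. By default, base `R`'s `drop1()` only offers terms that can be removed without violating marginality, so it fits naturally with this style of model building. Whether a term is actually "uninteresting" is a judgement for the analyst; here the interaction is dropped purely to show the mechanics.

```r
data(warpbreaks)
unbal <- warpbreaks[-c(1:6, 10:12, 28), ]  # arbitrary rows removed to break balance

# Start from the full model: only the interaction is a candidate for removal
full   <- lm(breaks ~ wool + tension + wool:tension, data = unbal)
step.1 <- drop1(full, test = "F")
print(step.1)

# Suppose the interaction is judged uninteresting: refit without it
additive <- update(full, . ~ . - wool:tension)

# Now the main effects become candidates, each adjusted for the other
step.2 <- drop1(additive, test = "F")
print(step.2)
```

In the second step, each main effect is assessed with the other in the model and no interaction present, so the sums-of-squares match the Type II values for the main effects. The error term, however, now comes from the additive model rather than the full one.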

`````{topic} What do you now know?
In this section, we have explored ... After reading this section, you should have a good sense of:

- ...
- ...
- ...

`````

[^default-foot]: Always be wary of defaults. If there is one way of getting an entire scientific field to adhere to a particular way of doing something without the need for any critical evaluation, simply make it the default in software. Defaults do not automatically hold some higher level of credibility simply because they were the value that the developer picked. Many times these are well-considered, but this is not a *guarantee*. We can easily be led astray by default choices because we do not have to justify using them. This does not have an official name, but we could perhaps call it *the default authority effect*. It is effectively a reversal of the burden of proof: deviating from defaults requires defence, whereas using defaults is treated as neutral. Yet this presupposes that the defaults are normatively sound, which is rarely demonstrated or even documented.

[^modterms-foot]: At least in terms of the sums-of-squares and mean-squares. The $F$-statistic and $p$-value depend on the error sums-of-squares, which will change depending upon the other terms in the model.

[^ez-foot]: A column of effect sizes will also be given (labelled `ges`). We will discuss the nature of omnibus effect sizes later in this lesson.