# Unbalanced ANOVA Models

```{figure} images/unbalanced-text.webp
---
scale: 80%
align: right
---
```

... Indeed, whole textbooks have been written about unbalanced data (as can be seen on the *right*). So this is a topic that deserves some attention, even if it is largely *ignored* in modern teaching in Psychology. There is something of an assumption that the issues of balance have been *solved* and no longer need to be considered. However, this is not really true. The "solution" implemented by SAS and SPSS is the Type III sums-of-squares, which researchers continue to use because it is the default[^default-foot]. However, as discussed briefly last week, this approach is highly flawed.

In this part of the lesson, we will dig deeper into the Type I/II/III debate so that you understand what each type of sums-of-squares means, when each is most appropriate to use and what the various arguments for and against them are. In general, we will be recommending Type II for 95% of all use-cases. However, you should not just take our word for it. Instead, it is important that you *understand* the differences so that you can make your own informed judgement.

## The Problem of Imbalance
The arithmetic behind the traditional ANOVA relates to a simple decomposition of the sums-of-squares. When there is an equal number of data points in each cell (using a 2-way ANOVA as an example), we simply have

$$
SS_{\text{A}} + SS_{\text{B}} + SS_{\text{AB}} = SS_{\text{Model}}.
$$

So, the total amount of variance explained by the model can be neatly decomposed into several chunks. These components are said to be *orthogonal*, which you can take to mean *independent*. The value of each sum-of-squares is not affected by any of the others, and together they form a neat and simple partition of the amount explained by the model. Adding the error, we then have

$$
SS_{\text{Total}} = SS_{\text{Model}} + SS_{\text{Error}}.
$$
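
A quick way to check this decomposition is a minimal R sketch on a small, made-up balanced dataset (two observations in each cell of a 2-by-2 design; the numbers are purely illustrative). With balance, the sequential sums-of-squares reported by `anova()` add up exactly to $SS_{\text{Model}}$, and the value for A is the same whether A enters the formula first or second.

```r
# Made-up balanced 2-by-2 data: two observations in every cell
dat <- data.frame(
  A = factor(rep(c("a1", "a2"), each = 4)),
  B = factor(rep(c("b1", "b1", "b2", "b2"), times = 2)),
  y = c(1, 3, 5, 7, 4, 6, 8, 12)
)

fit_AB <- lm(y ~ A * B, data = dat)   # A entered first
fit_BA <- lm(y ~ B * A, data = dat)   # B entered first

tab_AB <- anova(fit_AB)               # sequential sums-of-squares
tab_BA <- anova(fit_BA)

ss_A  <- tab_AB["A", "Sum Sq"]
ss_B  <- tab_AB["B", "Sum Sq"]
ss_AB <- tab_AB["A:B", "Sum Sq"]
ss_model <- sum((fitted(fit_AB) - mean(dat$y))^2)

c(A = ss_A, B = ss_B, AB = ss_AB, total = ss_A + ss_B + ss_AB, model = ss_model)
# A = 24.5, B = 40.5, AB = 0.5, total = 65.5, model = 65.5

# Under balance, A gets the same sum-of-squares whether it enters first or second
all.equal(ss_A, tab_BA["A", "Sum Sq"])
```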

Unfortunately, when there is an *unequal* number of data points across cells, application of the standard ANOVA equations results in

$$
SS_{\text{A}} + SS_{\text{B}} + SS_{\text{AB}} > SS_{\text{Model}}.
$$

Adding these components together no longer gives the amount of variance explained by the model. What happens is that the effects "bleed" into each other, so they no longer form an independent partition of the variance. A lack of balance breaks the symmetry that allows the ANOVA to neatly decompose the variance. Practically, this means that each effect now contains some element of the other effects, so adding them together double-counts some chunks of variance. This leads to a total that is larger than the actual $SS_{\text{Model}}$.
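
The double-counting can be reproduced with a minimal R sketch. The data below are a made-up unbalanced 2-by-2 design (cell sizes 3, 1, 1 and 2), and, to keep things short, the sketch only uses the two main effects: each factor's sum-of-squares is computed with nothing else in the model and then compared with the sum-of-squares of the model containing both. The helper `ss_explained()` is a hypothetical name used only for this illustration.

```r
# Made-up unbalanced 2-by-2 data: cell sizes 3 (a1,b1), 1 (a1,b2), 1 (a2,b1), 2 (a2,b2)
dat <- data.frame(
  A = factor(c("a1", "a1", "a1", "a1", "a2", "a2", "a2")),
  B = factor(c("b1", "b1", "b1", "b2", "b1", "b2", "b2")),
  y = c(1, 2, 3, 5, 4, 8, 10)
)

# Hypothetical helper: sum-of-squares explained relative to the intercept-only model
ss_explained <- function(formula, data) {
  deviance(lm(y ~ 1, data = data)) - deviance(lm(formula, data = data))
}

ss_A    <- ss_explained(y ~ A, dat)      # A on its own
ss_B    <- ss_explained(y ~ B, dat)      # B on its own
ss_both <- ss_explained(y ~ A + B, dat)  # A and B together

round(c(A = ss_A, B = ss_B, sum = ss_A + ss_B, model = ss_both), 3)
# A = 36.012, B = 45.762, sum = 81.774, model = 58.017
```

The separately computed pieces add up to more than the variance the two factors actually explain together, which is the overlap being counted twice.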

What does this mean in terms of using an ANOVA model? It means that each sum-of-squares we calculate is influenced *by the other terms in the model*. As such, we have several options for decomposing the sums-of-squares, depending on what else is in the model at the time. Each chunk that gets calculated will represent the variance associated with a given effect *minus* its overlap with anything else in the model. Unfortunately, wherever there is choice, there is also disagreement. For an unbalanced ANOVA, this disagreement surrounds 3 possible ways of decomposing the sums-of-squares, known as the Type I, Type II and Type III sums-of-squares. These will be the focus of this part of the lesson.

### Venn Diagram Intuition
Perhaps the simplest way to gain intuition about what happens in an unbalanced ANOVA is to return to the Venn diagram visualisation we saw previously in multiple regression. Here, each circle represents the sum-of-squares associated with each main effect A and B, along with their interaction AB.

When the ANOVA is *balanced*, the situation is as shown below

```{figure} images/venn-diagrams/orthog-ANOVA.png
---
scale: 55%
align: center
---
```

Here, there is no overlap between the circles. Each effect is completely independent, so it does not matter what else is in the model at the point where we calculate its sum-of-squares. We could entirely remove B and AB when calculating $SS_{\text{A}}$ and it would not make any difference. The choice of model therefore does not matter, as the sum-of-squares will always be the same.

When the ANOVA is *unbalanced*, the situation is as shown below

```{figure} images/venn-diagrams/unbalanced-ANOVA.png
---
scale: 55%
align: center
---
```

Here, the effects now *overlap*. This means there is some element of both $B$ and $AB$ inside the sum-of-squares for $A$. We can now see why the sum of these terms is too big: if we add up the $A$, $B$ and $AB$ circles, we double-count the areas of overlap. This is why $SS_{\text{A}} + SS_{\text{B}} + SS_{\text{AB}} > SS_{\text{Model}}$. Furthermore, we now have several options when it comes to decomposing the sums-of-squares.

Using $SS_{\text{A}}$ as an example, we could calculate the sum-of-squares with nothing else in the model, with B in the model (but no interaction), or with both B and AB in the model. These options are illustrated below and correspond to the Type I, Type II and Type III sums-of-squares, respectively.

```{figure} images/venn-diagrams/SS-Types.png
---
scale: 55%
align: center
---
```

As we can see, the Type I method (with A entered first) would calculate all the explanatory variance associated with factor A, without removing any of the overlap. The Type II method would calculate the explanatory variance after taking B into account. As such, the effect is based on removing the overlap with B, but nothing else. Finally, the Type III method would calculate the explanatory variance after taking both B and AB into account. This removes *all* overlap with the other effects, leaving only the unique element of A.
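
These options can be written as comparisons between nested models. The minimal R sketch below, on a made-up unbalanced 2-by-2 dataset (cell sizes 3, 1, 1 and 2), computes the Type I (A entered first) and Type II sums-of-squares for A as the drop in residual sum-of-squares (`deviance()` of an `lm` fit) when A is added to the model. The helper `ss_gain()` is a hypothetical name used only for this illustration, and Type III is left out here because it additionally depends on how the factors are coded.

```r
# Made-up unbalanced 2-by-2 data: cell sizes 3 (a1,b1), 1 (a1,b2), 1 (a2,b1), 2 (a2,b2)
dat <- data.frame(
  A = factor(c("a1", "a1", "a1", "a1", "a2", "a2", "a2")),
  B = factor(c("b1", "b1", "b1", "b2", "b1", "b2", "b2")),
  y = c(1, 2, 3, 5, 4, 8, 10)
)

# Hypothetical helper: drop in residual sum-of-squares between two nested models
ss_gain <- function(smaller, larger) deviance(smaller) - deviance(larger)

# Type I (A entered first): A against an intercept-only model
ss_typeI_A <- ss_gain(lm(y ~ 1, data = dat), lm(y ~ A, data = dat))

# Type II: A after B has been taken into account, with no interaction in either model
ss_typeII_A <- ss_gain(lm(y ~ B, data = dat), lm(y ~ A + B, data = dat))

round(c(TypeI = ss_typeI_A, TypeII = ss_typeII_A), 3)
# TypeI = 36.012, TypeII = 12.255
```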

## The Principle of Marginality
In trying to determine which of the sums-of-squares to choose, we can be guided by the idea of building *meaningful* models. This is encapsulated by the *principle of marginality*, which was laid out by [Nelder (1977)](https://www.jstor.org/stable/2344517) as a response to his dissatisfaction with the way that linear models were being applied in statistics.

## Type I Sums-of-squares
... In terms of their adherence to marginality, this depends entirely upon the order in which the terms enter the model.

## Type II Sums-of-squares
... In terms of their adherence to marginality, the Type II sums-of-squares are the *only* method of the 3 that respects marginality.

## Type III Sums-of-squares
... In terms of their adherence to marginality, the Type III sums-of-squares *ignore* marginality, instead comparing models of the form

$$
\begin{alignat*}{1}
    \mathcal{M}_{0} &: y_{ijk} = \mu + \beta_{k} + (\alpha\beta)_{jk} + \epsilon_{ijk} \\
    \mathcal{M}_{1} &: y_{ijk} = \mu + \alpha_{j} + \beta_{k} + (\alpha\beta)_{jk} + \epsilon_{ijk} \\
\end{alignat*}
$$

In other words, Type III considers a model that contains an interaction without one of its associated main effects. This is arguably a wholly *meaningless* model, in which we simultaneously suggest that there is zero difference between the levels of a factor, but also that this difference changes depending upon another factor. What this model comparison ends up calculating is the effect of A after removing the effect of B *and* AB; in other words, a main effect with the interaction *removed*. So what exactly does this mean? It answers the question: if we pretend that the effect of A does not depend upon B, what would the effect of A be? If there is a meaningful interaction, this is calculating a fantasy. To bring back the example from last week, if the effectiveness of a treatment depends upon diagnosis, it makes little sense to calculate the effect of treatment *ignoring* diagnosis. It is like someone asking you whether the treatment works; you ask them what the diagnosis is, they refuse to answer and simply ask you again whether the treatment works. From this perspective, Type III sums-of-squares are of little use to us, despite being the default in many statistical packages.
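
The $\mathcal{M}_{0}$ versus $\mathcal{M}_{1}$ comparison can be sketched directly in R on a made-up unbalanced 2-by-2 design (cell sizes 3, 1, 1 and 2). The sketch assumes sum-to-zero (`contr.sum`) coding of both factors, which is the coding Type III tests are usually computed under; with R's default treatment coding, removing the A column would instead test the effect of A at the reference level of B.

```r
# Made-up unbalanced 2-by-2 data: cell sizes 3 (a1,b1), 1 (a1,b2), 1 (a2,b1), 2 (a2,b2)
dat <- data.frame(
  A = factor(c("a1", "a1", "a1", "a1", "a2", "a2", "a2")),
  B = factor(c("b1", "b1", "b1", "b2", "b1", "b2", "b2")),
  y = c(1, 2, 3, 5, 4, 8, 10)
)

# Full-rank design with sum-to-zero coding; columns: (Intercept), A1, B1, A1:B1
X <- model.matrix(~ A * B, data = dat,
                  contrasts.arg = list(A = "contr.sum", B = "contr.sum"))

# M1: mu + alpha_j + beta_k + (alpha beta)_jk
m1 <- lm(dat$y ~ X - 1)

# M0: the same model with the alpha_j column removed but the interaction kept
X0 <- X[, colnames(X) != "A1"]
m0 <- lm(dat$y ~ X0 - 1)

anova(m0, m1)                              # "Sum of Sq" is the Type III SS for A
ss3_A <- deviance(m0) - deviance(m1)       # about 12.706 for these data
```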

## Sums-of-Squares in `R`
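
In base R, `anova()` on an `lm` fit reports *sequential* (Type I) sums-of-squares, so the value for a term depends on its position in the formula. The minimal sketch below, on made-up unbalanced 2-by-2 data (cell sizes 3, 1, 1 and 2), shows one way to obtain all three types with base R only: reordering the formula for Type I, `drop1()` on the additive model for the Type II main effects, and `drop1()` with `scope = ~ .` on a sum-to-zero coded model for Type III. Many people use `car::Anova()` with `type = 2` or `type = 3` instead; it is not used here so that the sketch has no package dependencies.

```r
# Made-up unbalanced 2-by-2 data: cell sizes 3 (a1,b1), 1 (a1,b2), 1 (a2,b1), 2 (a2,b2)
dat <- data.frame(
  A = factor(c("a1", "a1", "a1", "a1", "a2", "a2", "a2")),
  B = factor(c("b1", "b1", "b1", "b2", "b1", "b2", "b2")),
  y = c(1, 2, 3, 5, 4, 8, 10)
)

# Type I: sequential, so the SS for A depends on where A sits in the formula
type1_A_first <- anova(lm(y ~ A * B, data = dat))["A", "Sum Sq"]   # about 36.012
type1_B_first <- anova(lm(y ~ B * A, data = dat))["A", "Sum Sq"]   # about 12.255 (A after B)

# Type II main effects: drop each main effect from the additive model
type2 <- drop1(lm(y ~ A + B, data = dat), test = "F")

# Type III: sum-to-zero coding, then drop every term from the full model,
# including main effects whose interaction stays in (scope = ~ .)
fit_sum <- lm(y ~ A * B, data = dat,
              contrasts = list(A = "contr.sum", B = "contr.sum"))
type3 <- drop1(fit_sum, scope = ~ ., test = "F")

type2["A", "Sum of Sq"]   # about 12.255
type3["A", "Sum of Sq"]   # about 12.706
```

The sum-to-zero coding matters for the Type III line: with R's default treatment contrasts, the same `drop1()` call returns different numbers for the main effects.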

## Resolving the Sums-of-squares Circus
... The truth is that the main reason all this hassle exists is that the neat partition of the ANOVA effects disappears when the data are unbalanced. To resolve this, we have to choose a method of partitioning the sums-of-squares. The definitions given above come directly from SAS, whose aim was not a principled statistical derivation but rather to give their users what they wanted: identical ANOVA output irrespective of balance. Because the traditional ANOVA was not seen as an exercise in model building, it was not typical to remove terms that appeared redundant. To maintain this completeness, SAS wanted ANOVA tables that contained *all* terms, rather than certain terms disappearing under imbalance. As such, different methods for decomposing these effects were developed and users were given a choice.

From a modern perspective, all this hassle is unnecessary if we engage with the process of *model building*. This is something we will discuss in much greater detail in the machine learning module next semester. However, the idea is very simple: if a term adds little predictive utility, remove it and create the simplest model you can. From this perspective, if the highest-order interaction is *small*, it would be removed and the lower-order terms then become interpretable again. There is no need for Type II tests to make them interpretable *despite* the presence of the interaction term. However, if an interaction is *large*, it stays in the model and we only interpret the highest-order term for each factor. Under this scheme, the whole Type I/II/III debate disappears.

As an example, say we have the model 

$$
Y = A + B + C + AB + AC + BC + ABC.
$$

If the 3-way interaction is uninteresting, we can drop it to form

$$
Y = A + B + C + AB + AC + BC.
$$

Now, if $AC$ and $BC$ are also uninteresting, we can settle on

$$
Y = A + B + C + AB.
$$

We would now interpret the 2-way interaction $AB$ and the main effect $C$. Because we have respected marginality when building these models, all these terms have interpretable effects.
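
The model-building sequence above can be sketched in R with `anova()` comparisons between nested `lm` fits. Everything here is illustrative: the 2-by-2-by-2 design, its unequal cell sizes and the simulated outcome are assumptions, and the F-tests are only one possible yardstick for whether a term is "uninteresting" (a predictive criterion such as cross-validation is another). No output is shown because it depends on the simulated noise.

```r
set.seed(2024)

# Hypothetical unbalanced 2-by-2-by-2 design: every cell is filled, but sizes differ
cells <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"), C = c("c1", "c2"))
sizes <- c(4, 6, 5, 7, 3, 8, 6, 5)
dat <- cells[rep(seq_len(nrow(cells)), times = sizes), ]
rownames(dat) <- NULL

# Simulated outcome (an assumption for illustration): an AB interaction plus a C effect
dat$y <- 2 * (dat$A == "a2" & dat$B == "b2") + (dat$C == "c2") + rnorm(nrow(dat))

m_full  <- lm(y ~ A * B * C, data = dat)       # Y = A + B + C + AB + AC + BC + ABC
m_2way  <- lm(y ~ (A + B + C)^2, data = dat)   # Y = A + B + C + AB + AC + BC
m_final <- lm(y ~ A * B + C, data = dat)       # Y = A + B + C + AB

anova(m_2way, m_full)    # does the 3-way interaction add anything?
anova(m_final, m_2way)   # do AC and BC add anything beyond A * B + C?
```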

```r
library('datarium')

# Cell counts for the risk-by-treatment design
print(with(headache, table(risk, treatment)))
```

```
      treatment
risk    X  Y  Z
  high 12 12 12
  low  12 12 12
```


`````{topic} What do you now know?
In this section, we have explored ... After reading this section, you should have a good sense of:

- ...
- ...
- ...

`````

[^default-foot]: Always be wary of defaults. If there is one way of getting an entire scientific field to adhere to a particular way of doing something without the need for any critical evaluation, it is simply to make it the default in software. Defaults do not automatically carry a higher level of credibility simply because they were the values the developer picked. Many times these are well-considered, but this is not a *guarantee*. We can easily be led astray by default choices because we do not have to justify using them. This does not have an official name, but we could perhaps call it *the default authority effect*. It is effectively a reversal of the burden of proof: deviating from defaults requires defence, whereas using defaults is treated as neutral. Yet this presupposes that the defaults are normatively sound, which is rarely demonstrated or even documented.