# Unbalanced ANOVA Models

```{figure} images/unbalanced-text.webp
---
scale: 80%
align: right
---
```

... Indeed, whole textbooks were written about unbalanced data (as can be seen on the *right*). This topic therefore deserves some attention, even if it is largely *ignored* in modern Psychology teaching. There is an implicit assumption that the issues of balance have been *solved* and no longer need considering. This is not really true. The "solution" implemented by SAS and SPSS is Type III sums-of-squares, which researchers continue to use largely because it is the default[^default-foot]. However, as discussed briefly last week, this approach is highly flawed.

In this part of the lesson, we will dig deeper into the Type I/II/III debate so that you understand what each type of sums-of-squares means, when each is most appropriate, and what the main arguments for and against them are. In general, we recommend Type II for 95% of all use-cases. However, do not just take our word for it: it is important that you *understand* the differences and can make your own informed judgement.

## The Problem of Imbalance
... Perhaps the most important thing to recognise here is that imbalance is only a problem when we insist on interpreting effects that *do not make sense* in the context of the model, for instance, a main effect in the presence of an interaction. If an interaction effect is *large*, the main effects make little sense on their own. If, however, the interaction effect is *small*, it adds little to the predictive accuracy of the model and should be removed.

The key point here is that all this hassle goes away if we engage with the idea of *model building* and only interpret tests once we have a suitable model in place.

## The Principle of Marginality

## Type I Sums-of-squares

## Type II Sums-of-squares

## Type III Sums-of-squares

## Resolving the Sums-of-squares Circus
... The main reason all this hassle exists is that the neat partition of the ANOVA effects disappears when the data are imbalanced: the factors are no longer orthogonal, so the sums-of-squares attributed to each term depend on which other terms are already in the model. To resolve this, we have to choose a method of partitioning the sums-of-squares. The definitions given above come directly from SAS, whose aim was not a principled statistical derivation but rather to give their users what they wanted: identical ANOVA output irrespective of balance. Because the traditional ANOVA was not seen as an exercise in model building, it was not typical to remove terms that appeared redundant. To maintain this completeness, SAS wanted ANOVA tables that contained *all* terms, rather than certain terms disappearing under imbalance. As such, different methods for decomposing these effects were developed and users were offered a choice between them.
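A minimal sketch in R, using simulated data, makes the loss of the neat partition concrete. The designs, the cell sizes and the helper functions `add_response()` and `ss_A()` are hypothetical and chosen only for illustration: the response depends on `B` alone, and we compare the sequential (Type I) sum-of-squares for `A` when it enters the model first versus second.

```r
# Sketch with simulated (hypothetical) data: order dependence of sequential
# (Type I) sums-of-squares in balanced vs. unbalanced designs
set.seed(42)

# Hypothetical helper: only B affects y; A has no true effect
add_response <- function(d) {
  d$y <- 5 * (d$B == "b2") + rnorm(nrow(d), sd = 0.5)
  d
}

# Hypothetical helper: Type I sum-of-squares for A when it enters first vs. second
ss_A <- function(d) {
  first  <- anova(lm(y ~ A + B, data = d))["A", "Sum Sq"]
  second <- anova(lm(y ~ B + A, data = d))["A", "Sum Sq"]
  c(A_first = first, A_second = second)
}

# Balanced 2x2 design: 3 observations in every cell
# (expand.grid converts the character vectors to factors by default)
bal <- add_response(expand.grid(obs = 1:3, A = c("a1", "a2"), B = c("b1", "b2")))

# Unbalanced 2x2 design: cell sizes 6, 2, 1 and 3
unbal <- add_response(data.frame(
  A = factor(rep(c("a1", "a1", "a2", "a2"), times = c(6, 2, 1, 3))),
  B = factor(rep(c("b1", "b2", "b1", "b2"), times = c(6, 2, 1, 3)))
))

print(ss_A(bal))    # the two values agree (up to rounding error)
print(ss_A(unbal))  # the two values differ
```

With balanced data, the centred `A` and `B` columns are orthogonal, so `A` receives the same sum-of-squares whichever order the terms enter. With unbalanced data, `A` is correlated with `B`, so when `A` enters first it absorbs part of `B`'s effect. How to share out this overlapping variance is precisely what the different types of sums-of-squares disagree about.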

From a modern perspective, all this hassle is unnecessary if we engage with the process of *model building*. This is something we will discuss in much greater detail in the machine learning module next semester. However, the idea is very simple: if a term adds little predictive utility, remove it, and aim for the simplest model that still predicts well. From this perspective, if the highest-order interaction is *small*, it is removed and the lower-order terms become interpretable again. There is no need for Type II tests to make them interpretable *despite* the presence of the interaction term. If, instead, an interaction is *large*, it stays in the model and we only interpret the highest-order term containing each factor. Under this scheme, the whole Type I/II/III debate disappears.

As an example, say we have the model 

$$
Y = A + B + C + AB + AC + BC + ABC.
$$

If the 3-way interaction is uninteresting, we can drop it to form

$$
Y = A + B + C + AB + AC + BC.
$$

Now, if $AC$ and $BC$ are also uninteresting, we can drop them and settle on

$$
Y = A + B + C + AB.
$$

We would now interpret the 2-way interaction $AB$ and the main effect $C$. Because we respected marginality when building these models (never removing a lower-order term while keeping a higher-order term that contains it), both of these terms have interpretable effects.
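The model sequence above can be sketched in R. The data are simulated from a hypothetical, unbalanced $2 \times 2 \times 2$ design in which only $AB$ and $C$ truly matter, so the numbers carry no substantive meaning. The point is the workflow: compare nested models with `anova()` (F-tests are used here as one possible criterion for "adds little predictive utility"), and note that base R's `drop1()` only offers terms whose removal respects marginality.

```r
# Sketch of the model-building sequence using simulated data.
# The design, cell sizes and effect sizes are hypothetical.
set.seed(1)

# Unbalanced 2x2x2 design: the 8 cells have different numbers of observations
cells <- expand.grid(A = c("a1", "a2"), B = c("b1", "b2"), C = c("c1", "c2"))
d <- cells[rep(seq_len(nrow(cells)), times = c(5, 3, 4, 6, 3, 5, 6, 4)), ]

# Simulated response: only the AB interaction and the main effect of C matter
d$y <- with(d, 2 * (A == "a2") * (B == "b2") + (C == "c2")) + rnorm(nrow(d))

m_full  <- lm(y ~ A * B * C, data = d)         # A + B + C + AB + AC + BC + ABC
m_2way  <- update(m_full, . ~ . - A:B:C)       # A + B + C + AB + AC + BC
m_final <- update(m_2way, . ~ . - A:C - B:C)   # A + B + C + AB

# F-tests: does each dropped set of terms add predictive utility?
print(anova(m_2way, m_full))
print(anova(m_final, m_2way))

# drop1() only considers terms whose removal respects marginality,
# so for m_final it tests C and A:B, but not A or B on their own
print(drop1(m_final, test = "F"))
```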

```r
library('datarium')
library('car')
data(headache)

# Full three-way factorial model, with Type II tests from car::Anova()
mod <- lm(pain_score ~ gender*risk*treatment, data=headache)
print(Anova(mod))

# Reduced model that respects marginality: gender + risk*treatment
mod <- lm(pain_score ~ gender + risk + treatment + risk:treatment, data=headache)

# The same model with sum-to-zero contrasts, needed for meaningful Type III tests
mod.sum <- lm(pain_score ~ gender + risk + treatment + risk:treatment, data=headache,
              contrasts=list(gender=contr.sum, risk=contr.sum, treatment=contr.sum))

print(anova(mod))                   # Type I (sequential) tests
print(Anova(mod))                   # Type II tests
print(Anova(mod.sum, type="III"))   # Type III tests
```

```
Anova Table (Type II tests)

Response: pain_score
                       Sum Sq Df F value    Pr(>F)    
gender                 313.36  1 16.1957 0.0001625 ***
risk                  1793.56  1 92.6988   8.8e-14 ***
treatment              283.17  2  7.3177 0.0014328 ** 
gender:risk              2.73  1  0.1411 0.7084867    
gender:treatment       129.18  2  3.3384 0.0422001 *  
risk:treatment          27.60  2  0.7131 0.4942214    
gender:risk:treatment  286.60  2  7.4063 0.0013345 ** 
Residuals             1160.89 60                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Analysis of Variance Table

Response: pain_score
               Df  Sum Sq Mean Sq F value    Pr(>F)    
gender          1  313.36  313.36 12.8962 0.0006333 ***
risk            1 1793.56 1793.56 73.8135 2.607e-12 ***
treatment       2  283.17  141.58  5.8269 0.0047027 ** 
risk:treatment  2   27.60   13.80  0.5678 0.5695375    
Residuals      65 1579.40   24.30                      
---
```

`````{topic} What do you now know?
In this section, we have explored ... After reading this section, you should have a good sense of:

- ...
- ...
- ...

`````

[^default-foot]: Always be wary of defaults. If you want an entire scientific field to adopt a particular practice without any critical evaluation, simply make it the default in software. Defaults do not carry a higher level of credibility simply because a developer picked them. Many are well-considered, but this is not *guaranteed*. We can easily be led astray by default choices because we do not have to justify using them. This phenomenon does not have an official name, but we could perhaps call it *the default authority effect*. It is effectively a reversal of the burden of proof: deviating from a default requires defence, whereas using it is treated as neutral. Yet this presupposes that the defaults are normatively sound, which is rarely demonstrated or even documented.