# Statistical Data Management Session 11: ANOVA


## Exercise 1: Animal-assisted therapy for heart patients *(exercise 10.38 and 10.58 from the book)*

A study presented at the *American Heart Association Conference* (Nov. 2005) set out to gauge whether animal-assisted therapy can improve the physiological responses of heart failure patients. In it, 76 heart patients were randomly assigned to one of three groups. Each patient in group T was visited by a human volunteer accompanied by a trained dog, each patient in group V was visited by a volunteer only, and the patients in group C were not visited at all. The anxiety level of each patient was measured (in points) both before and after the visits. The accompanying table gives summary statistics for the drop in anxiety level in each of the three groups. The mean drops of the three groups were compared using an analysis of variance. Although an ANOVA table was not provided in the article, sufficient information is given to reconstruct it.

To do so, recall that for a sample $x_1,\dots,x_n$ with mean $\bar{x}$, the sample variance is calculated as $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$.
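As a quick check of this formula, the sketch below computes the sample variance by hand and compares it with NumPy's `np.var(..., ddof=1)`. The data values are made up purely for illustration. It also shows the rearrangement $(n-1)s^2 = \sum_i (x_i-\bar{x})^2$, which recovers a within-group sum of squares when only the standard deviation is reported.

```python
import numpy as np

# Hypothetical sample, used only to illustrate the formula
x = np.array([4.0, 7.0, 1.0, 9.0, 4.0])
n = x.size
xbar = x.mean()

# Sum of squared deviations from the sample mean
ss = np.sum((x - xbar) ** 2)

# Sample variance: divide by n - 1 (not n)
s2_manual = ss / (n - 1)

# NumPy equivalent: ddof=1 selects the n - 1 denominator
s2_numpy = np.var(x, ddof=1)

# Going the other way: recover the sum of squares from s
ss_from_s = (n - 1) * np.std(x, ddof=1) ** 2

print(s2_manual, s2_numpy, ss, ss_from_s)
```

Note that `np.var` and `np.std` default to `ddof=0`, so the `ddof=1` argument is needed to match the formula above.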

| Group | Sample Size | Mean Drop | Std. Dev. |
|:---| :---:|:---:|:---:|
|Group T: Volunteer + Trained dog|26|10.5|7.6|
|Group V: Volunteer only |25|3.9|7.5|
|Group C: Control group (no visit)|25|1.4|7.5|

  
1. Compute the sum of squares for treatments (SST).
2. Compute the sum of squares for error (SSE).
3. Use the results from parts 1 and 2 to construct the ANOVA table.
4. Is there sufficient evidence (at $\alpha = 0.01$) of differences among the mean drops in anxiety levels by the patients in the three groups? Use Python to calculate the critical $F$ value.
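One possible way to approach parts 1 to 4 is sketched below, working only from the summary statistics in the table. It assumes the usual one-way ANOVA decomposition, $SST = \sum_j n_j(\bar{x}_j - \bar{x})^2$ and $SSE = \sum_j (n_j - 1)s_j^2$, and uses `scipy.stats.f.ppf` for the critical value. The variable names are illustrative choices, not part of the exercise.

```python
import numpy as np
from scipy import stats as sts

# Summary statistics from the table (groups T, V, C)
n = np.array([26, 25, 25])
means = np.array([10.5, 3.9, 1.4])
sds = np.array([7.6, 7.5, 7.5])

k = n.size                     # number of treatments
N = n.sum()                    # total sample size
grand_mean = np.sum(n * means) / N

# Sum of squares for treatments and for error
sst = np.sum(n * (means - grand_mean) ** 2)
sse = np.sum((n - 1) * sds ** 2)

# Degrees of freedom, mean squares and F statistic
df_t, df_e = k - 1, N - k
mst, mse = sst / df_t, sse / df_e
f_stat = mst / mse

# Critical value and p-value at alpha = 0.01
alpha = 0.01
f_crit = sts.f.ppf(1 - alpha, df_t, df_e)
p_value = sts.f.sf(f_stat, df_t, df_e)

print(f"Treatments: df={df_t}, SS={sst:.2f}, MS={mst:.2f}, F={f_stat:.2f}")
print(f"Error:      df={df_e}, SS={sse:.2f}, MS={mse:.2f}")
print(f"Total:      df={N - 1}, SS={sst + sse:.2f}")
print(f"Critical F={f_crit:.3f}, p-value={p_value:.5f}")
```

$H_0$ is rejected when `f_stat` exceeds `f_crit`, or equivalently when `p_value` is below `alpha`.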

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats as sts
import pandas as pd
%matplotlib inline





5. Comment on the validity of the ANOVA assumptions. How might this affect the results of the study?
6. If you found evidence of a difference among the treatment means, conduct a post hoc analysis: use the Bonferroni method to construct confidence intervals for the differences between treatment means, and rank the three treatment means. Use an experimentwise error rate of $\alpha = 0.03$. Interpret the results. Use Python to calculate the relevant $t$ values.
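For part 6, a minimal sketch of the Bonferroni intervals could look as follows. It assumes the standard form $(\bar{x}_i - \bar{x}_j) \pm t_{\alpha/(2c)}\sqrt{MSE\,(1/n_i + 1/n_j)}$ with $c = k(k-1)/2$ pairwise comparisons and the error degrees of freedom $N-k$. The variable names are illustrative.

```python
import itertools
import numpy as np
from scipy import stats as sts

# Summary statistics from the table
labels = ["T", "V", "C"]
n = np.array([26, 25, 25])
means = np.array([10.5, 3.9, 1.4])
sds = np.array([7.6, 7.5, 7.5])

k = len(labels)
df_e = n.sum() - k
mse = np.sum((n - 1) * sds ** 2) / df_e   # pooled variance estimate

# Bonferroni: split the experimentwise alpha over c two-sided comparisons
alpha_exp = 0.03
c = k * (k - 1) // 2
t_crit = sts.t.ppf(1 - alpha_exp / (2 * c), df_e)

intervals = {}
for i, j in itertools.combinations(range(k), 2):
    diff = means[i] - means[j]
    half_width = t_crit * np.sqrt(mse * (1 / n[i] + 1 / n[j]))
    intervals[(labels[i], labels[j])] = (diff - half_width, diff + half_width)

print(f"t critical = {t_crit:.3f}")
for pair, (lo, hi) in intervals.items():
    print(f"mu_{pair[0]} - mu_{pair[1]}: ({lo:.2f}, {hi:.2f})")
```

An interval that excludes zero indicates a significant difference for that pair, while an interval that contains zero does not allow the two means to be ranked.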

## Exercise 2: Do different EE1 coaches grade differently?

The data loaded below contain marks given by different coaches (A to F) to EE1 groups (credit to Christel Willemaerts for making the anonymised 2024 dataset available). We want to analyse whether coaches mark groups significantly differently. Assuming the underlying distribution of scores is normal and that all coaches grade with the same variance, conduct an ANOVA to determine whether there is a significant difference (use $\alpha = 0.05$) between the coaches' mean scores.

1. There is a short way (one line, Google this!) to do it in Python. Formulate $H_0$ and $H_a$ and draw the appropriate conclusion.
2. As a challenge, reconstruct this function yourself. To keep it as general as possible, have it accept a list of dataframes. This is beyond the scope of the exam material.
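For part 1, one candidate for the one-line approach is `scipy.stats.f_oneway`, which tests $H_0$: all group means are equal against $H_a$: at least two group means differ. The sketch below runs it on a small made-up dataframe that only mimics the assumed layout of the marks file (one column per coach, padded with `NaN` where a coach has fewer groups). The numbers are hypothetical and are not the real marks.

```python
import numpy as np
import pandas as pd
from scipy import stats as sts

# Hypothetical stand-in data: one column per coach, NaN-padded
demo_marks = pd.DataFrame({
    "A": [12.0, 14.0, 13.5, 15.0],
    "B": [11.0, 13.0, 12.5, np.nan],
    "C": [15.0, 16.0, 14.5, 15.5],
})

# Drop missing values per coach so that group sizes may differ
samples = [demo_marks[col].dropna() for col in demo_marks.columns]

# The one-line ANOVA: returns the F statistic and its p-value
result = sts.f_oneway(*samples)

alpha = 0.05
reject_h0 = result.pvalue < alpha
print(f"F = {result.statistic:.3f}, p = {result.pvalue:.4f}, reject H0: {reject_h0}")
```

Dropping the `NaN` entries first matters because `f_oneway` does not skip missing values by default.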

In [None]:
marks = pd.read_csv("../../shared/marks_ee1.csv", sep=";")
marks_A = marks["A"].dropna()
marks_B = marks["B"].dropna()
marks_C = marks["C"].dropna()
marks_D = marks["D"].dropna()
marks_E = marks["E"].dropna()
marks_F = marks["F"].dropna()
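For the challenge in part 2, a hand-rolled one-way ANOVA might be sketched as below. The function name `one_way_anova` is a hypothetical choice, and it accepts any list of one-dimensional samples (such as the per-coach series loaded above) rather than strictly dataframes. It is checked here against `scipy.stats.f_oneway` on randomly generated data, not on the real marks.

```python
import numpy as np
import pandas as pd
from scipy import stats as sts

def one_way_anova(samples):
    """Return (F statistic, p-value) for a list of 1-D samples."""
    # Convert to float arrays and drop missing values
    groups = [np.asarray(s, dtype=float) for s in samples]
    groups = [g[~np.isnan(g)] for g in groups]

    k = len(groups)
    n = np.array([g.size for g in groups])
    total_n = n.sum()
    means = np.array([g.mean() for g in groups])
    grand_mean = np.concatenate(groups).mean()

    # Between-group (treatment) and within-group (error) sums of squares
    sst = np.sum(n * (means - grand_mean) ** 2)
    sse = sum(np.sum((g - m) ** 2) for g, m in zip(groups, means))

    f_stat = (sst / (k - 1)) / (sse / (total_n - k))
    p_value = sts.f.sf(f_stat, k - 1, total_n - k)
    return f_stat, p_value

# Hypothetical demo data with unequal group sizes
rng = np.random.default_rng(0)
demo = [pd.Series(rng.normal(loc, 2.0, size))
        for loc, size in [(12.0, 8), (13.0, 10), (12.5, 9)]]

f_manual, p_manual = one_way_anova(demo)
f_ref, p_ref = sts.f_oneway(*demo)
print(f_manual, f_ref, p_manual, p_ref)
```

With the real data, the call would presumably be `one_way_anova([marks_A, marks_B, marks_C, marks_D, marks_E, marks_F])`.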