# Statistical Data Management Session 11: ANOVA


## Exercise 1: Animal-assisted therapy for heart patients *(exercises 10.38 and 10.58 from the book)*

In the *American Heart Association Conference* (Nov. 2005) study, designed to gauge whether animal-assisted therapy can improve the physiological responses of heart failure patients, 76 heart patients were randomly assigned to one of three groups. Each patient in group T was visited by a human volunteer accompanied by a trained dog, each patient in group V was visited by a volunteer only, and the patients in group C were not visited at all. The anxiety level of each patient was measured (in points) both before and after the visits. The accompanying table gives summary statistics for the drop in anxiety level in each of the three groups. The mean drops of the three groups were compared using analysis of variance. Although an ANOVA table was not provided in the article, sufficient information is given to reconstruct it.

| Group | Sample Size | Mean Drop | Standard Deviation |
|:---| :---:|:---:|:---:|
|Group T: Volunteer + Trained dog|26|10.5|7.6|
|Group V: Volunteer only |25|3.9|7.5|
|Group C: Control group (no visit)|25|1.4|7.5|

  
1. Compute the sum of squares for treatments (SST).

    $SST = \sum_{i=1}^k n_i(\bar{x_i}-\bar{x})^2$.

    $\bar{x} = \frac{26\bar{x_1} + 25 \bar{x_2} + 25 \bar{x_3}}{76} = 5.34$.
    
    $SST = 1132.194$.


2. Compute the sum of squares for error (SSE).

    $SSE = \sum_{j=1}^{n_1}(x_{1,j}-\bar{x_1})^2 + \cdots + \sum_{j=1}^{n_k}(x_{k,j}-\bar{x_k})^2$.

    $SSE = (n_1-1)s_1^2 + \cdots + (n_k-1)s^2_k = 25\cdot 7.6^2 + 24\cdot 7.5^2 + 24\cdot 7.5^2 = 4144$.

3. Use the results from parts 1 and 2 to construct the ANOVA table.

| Source | $\qquad$ df $\qquad$ | $\qquad SS \qquad$ | $\qquad \qquad MS \qquad \qquad$ | $\qquad \qquad F \qquad \qquad$ |
|---:| :---:|:---:|:---:|:---:|
|Treatments| $k-1$ |$SST$|$MST=\frac{SST}{k-1}$|$\frac{MST}{MSE}$|
|Error| $n-k$ |$SSE$|$MSE=\frac{SSE}{n-k}$| |


| Source | $\qquad$ df $\qquad$ | $\qquad$ SS $\qquad$ | $\qquad $ MS $ \qquad$ | $\qquad$ F $ \qquad$ |
|:---:| :---:|:---:|:---:|:---:|
|Treatments| 2  |1132.194|566.1|9.97|
|Error| 73 |4144|56.8| $\qquad$|

 
4. Is there sufficient evidence (at $\alpha = 0.01$) of differences among the mean drops in anxiety levels by the patients in the three groups? Use Python to calculate the critical $F$ value.


    $\alpha = 0.01$, numerator df $= k-1 = 2$, denominator df $= n-k = 73$

    Critical $F$-value (calculated below) $\approx 4.9$
    
    $H_0: \mu_C = \mu_V = \mu_T$
    
    $H_a:$ at least two of the treatment means differ.
    
    $F=9.97 > 4.9 \approx F_{0.01}$
    
    We reject the null hypothesis: at $\alpha=0.01$, there is sufficient evidence to conclude that the mean drop in anxiety level differs among the three groups.
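
The same decision can also be reached through the $p$-value instead of the critical value. The sketch below is only an illustration: it takes $SST$ and $SSE$ from the reconstructed ANOVA table and uses the survival function of the $F$ distribution; the variable names are chosen here and are not part of the exercise.

```python
from scipy import stats

# Values taken from the reconstructed ANOVA table above.
sst, sse = 1132.194, 4144.0
k, n = 3, 76  # number of groups, total sample size

f_stat = (sst / (k - 1)) / (sse / (n - k))

# Survival function = 1 - cdf: upper-tail probability P(F >= f_stat) under H0.
p_value = stats.f.sf(f_stat, k - 1, n - k)

print(f"F = {f_stat:.2f}, p-value = {p_value:.5f}")
print("reject H0 at alpha = 0.01:", p_value < 0.01)
```

Rejecting $H_0$ when the $p$-value is below $\alpha$ is equivalent to rejecting when $F$ exceeds the critical value $F_{\alpha}$.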

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats as sts
import pandas as pd
%matplotlib inline

k = 3
n = 76
xbar = (26*10.5 + 25*3.9 + 25*1.4)/n
print('xbar: ', xbar)

SST = 26*(10.5-xbar)**2 + 25*(3.9-xbar)**2 + 25*(1.4-xbar)**2
print('SST: ', SST)

SSE = 25 * 7.6**2 + 24* 7.5**2 + 24* 7.5**2
print('SSE: ', SSE)

MST = SST / (k-1)
print('MST: ', MST)

MSE = SSE / (n-k)
print('MSE: ', MSE)

F = MST / MSE
print('F: ',F)

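# F distribution under H0 with (k-1, n-k) degrees of freedom
F_distr = sts.f(k-1, n-k)
interval = np.linspace(0, 7, 1000)
plt.plot(interval, F_distr.pdf(interval))
plt.xlabel('F')
plt.ylabel('density')
plt.show()
plt.close()

# Critical value: the (1 - alpha) quantile for alpha = 0.01
F_critical = F_distr.ppf(1-0.01)
print('Threshold F: ', F_critical)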

5. Comment on the validity of the ANOVA assumptions. How might this affect the results of the study?

    There are three assumptions:

* Samples were randomly and independently selected from the three treatment populations.
    
    This was accomplished here by randomly assigning the heart patients to one of the three groups.
        
* All three treatment populations have approximately normal distributions.
    
    This cannot be checked from the summary statistics alone, but ANOVA is fairly robust to moderate departures from normality.
        
* The three treatment populations have the same (population) variance.
    
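    The sample standard deviations (7.6, 7.5 and 7.5) are nearly identical, so this assumption appears reasonable. When in doubt, one could perform pairwise F-tests to compare the variances. Careful: ANOVA is *not* robust to violations of this condition, particularly when the sample sizes differ.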
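
As an illustration of the pairwise F-test for equal variances mentioned in the last point, the sketch below compares groups T and V using only the summary statistics from the table. It assumes normal populations (this variance test is itself sensitive to non-normality), and the variable names are chosen here for illustration.

```python
from scipy import stats

# Sample standard deviations and sizes from the summary table.
s_T, n_T = 7.6, 26
s_V, n_V = 7.5, 25

# Put the larger sample variance in the numerator so that ratio >= 1.
ratio = s_T**2 / s_V**2
df_num, df_den = n_T - 1, n_V - 1

# Two-sided p-value: double the upper-tail probability (capped at 1).
p_two_sided = min(1.0, 2 * stats.f.sf(ratio, df_num, df_den))

print(f"variance ratio = {ratio:.3f}, two-sided p-value = {p_two_sided:.3f}")
```

A large $p$-value here gives no reason to doubt equal variances for T and V. Groups V and C have identical sample standard deviations, so their ratio is exactly 1.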

6. If you found evidence of a difference among the treatment means, conduct a post-hoc analysis. Use the Bonferroni method to construct confidence intervals for the differences between the treatment means and rank the three treatment means. Use an experimentwise error rate of $\alpha = 0.03$. Interpret the results. Use Python to calculate the relevant $t$ values.

    $\mu_T-\mu_V \in \left[(\bar{x_T}-\bar{x_V}) \pm t_{\frac{\alpha}{2c}}\, s_p \sqrt{\frac{1}{n_T} + \frac{1}{n_V}}\right]$, with
    
    $s_p^2 = MSE = 56.8 \Rightarrow s_p = 7.53$ (pooled variance and pooled standard deviation).
    
    The critical value $t_{\frac{\alpha}{2c}}$ follows from
    
    * $\alpha = 0.03$
    * $c = {3\choose2} = 3$ pairwise comparisons,
    
    so $\frac{\alpha}{2c} = 0.005$ and $t_{0.005} = 2.64$ ($t$-value with $n-k = 73$ degrees of freedom).
  
    $\mu_T-\mu_V \in [1.02, 12.18]$.
  
    Likewise,
  
    $\mu_T-\mu_C \in [3.52, 14.68]$ and
  
    $\mu_V-\mu_C \in [-3.14, 8.14]$.
  
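    Only the last interval contains $0$, so there is no significant difference between V and C. The intervals for T versus V and for T versus C lie entirely above $0$, so the mean drop for T is significantly larger than for both V and C. The ranking is therefore $\mu_T > (\mu_V, \mu_C)$, with V and C not distinguishable, which is consistent with $H_a$. With the Bonferroni correction, we are at least $97\%$ confident that all three intervals hold simultaneously.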

In [None]:
alpha = 0.03
c = k*(k-1)/2
sp = np.sqrt(MSE)
print('pooled standard deviation: ', sp)
t = sts.t(n-k)
t_val = t.ppf(1-alpha/(2*c))
print('Bonferroni multiple comparison critical value: ', t_val)

# T-V
diff = 10.5 - 3.9
print('difference T-V', diff)
margin = t_val * sp * np.sqrt(1/26+1/25)  # margin of error = critical t * standard error
print('margin of error T-V: ', margin)
print('confidence interval: ', [diff - margin, diff + margin])

# T-C
diff = 10.5 - 1.4
print('difference T-C', diff)
margin = t_val * sp * np.sqrt(1/26+1/25)
print('margin of error T-C: ', margin)
print('confidence interval: ', [diff - margin, diff + margin])

# V-C
diff = 3.9 - 1.4
print('difference V-C', diff)
margin = t_val * sp * np.sqrt(1/25+1/25)
print('margin of error V-C: ', margin)
print('confidence interval: ', [diff - margin, diff + margin])
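
The three near-identical computations can also be folded into one loop over all pairs of groups. The following is a sketch rather than the reference solution: the `groups` dictionary and the variable names are introduced here for illustration, and the numbers are the summary statistics and ANOVA results from above.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Summary statistics from the table: group -> (mean drop, sample size).
groups = {"T": (10.5, 26), "V": (3.9, 25), "C": (1.4, 25)}
mse, df_error = 4144 / 73, 73  # MSE and error df from the ANOVA table
alpha = 0.03                   # experimentwise error rate

pairs = list(combinations(groups, 2))  # (T, V), (T, C), (V, C)
t_crit = stats.t.ppf(1 - alpha / (2 * len(pairs)), df_error)

intervals = {}
for a, b in pairs:
    (mean_a, n_a), (mean_b, n_b) = groups[a], groups[b]
    diff = mean_a - mean_b
    margin = t_crit * np.sqrt(mse * (1 / n_a + 1 / n_b))
    lower, upper = diff - margin, diff + margin
    intervals[(a, b)] = (lower, upper)
    verdict = "no significant difference" if lower < 0 < upper else "significant"
    print(f"mu_{a} - mu_{b}: [{lower:.2f}, {upper:.2f}] -> {verdict}")
```

Because the number of comparisons is derived from the list of pairs, the same loop would work unchanged for more than three groups.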

## Exercise 2: Does the time of an exam influence the results?

A university wants to test whether the time of day at which an exam takes place influences the test results. Students are assigned at random to one of three groups: an exam in the morning (M), early afternoon (E) or late afternoon (L). Their results are given below. Assuming that the underlying distribution of exam scores is normal and that the variances in the subgroups are equal, conduct an ANOVA to determine whether there is a significant difference (use $\alpha = 0.05$) between the groups.

1. There is a short way (one line!) to do this in Python. Formulate $H_0$ and $H_a$ and draw the appropriate conclusion.
2. As a challenge: reconstruct this function yourself. To keep it as general as possible, let it accept a list of dataframes. This is beyond the scope of the exam material.

In [None]:
M = pd.DataFrame([13, 10, 9, 8, 6, 13, 9, 9, 11, 9, 9, 13, 11, 10, 10, 11, 8, 6, 9, 10, 9, 8, 9, 11, 8, 11, 11, 9, 10, 7, 11, 9, 10, 10, 10, 11, 12, 14, 8, 9, 14, 8, 11, 6, 8, 10, 12, 12, 9, 9, 11, 13, 10, 11, 12, 8, 10, 12, 13, 12, 11, 8, 12, 8, 9, 8, 11, 10, 10, 12, 7, 13, 11, 7, 10, 11, 14, 11, 7, 10])
E = pd.DataFrame([13, 9, 12, 14, 12, 9, 11, 12, 13, 10, 12, 9, 12, 8, 13, 9, 11, 10, 7, 8, 7, 11, 8, 11, 5, 12, 12, 7, 11, 8, 10, 10, 8, 9, 8, 13, 8, 10, 11, 10, 11, 12, 14, 9, 7, 7, 8, 7, 4, 11, 9, 7, 8, 8, 11, 7, 9, 10, 11, 9, 10, 11, 9, 11, 13, 8, 15, 6, 5, 14, 9, 12, 9, 12, 8, 12, 13, 8, 5, 8])
L = pd.DataFrame([9, 11, 8, 8, 9, 12, 12, 7, 9, 4, 5, 9, 12, 10, 9, 5, 8, 7, 10, 9, 7, 10, 14, 10, 9, 8, 9, 8, 10, 16, 10, 10, 10, 11, 13, 13, 7, 10, 12, 8, 8, 7, 13, 9, 12, 13, 9, 10, 8, 12, 11, 7, 6, 9, 6, 12, 4, 7, 11, 6, 8, 12, 14, 10, 12, 11, 6, 8, 11, 8, 11, 7, 13, 7, 12, 8, 9, 10, 12, 10])

# Pass the single column of each dataframe so that F and p come back as scalars
F, p = sts.f_oneway(M[0], E[0], L[0])
print("F-value:", F)
print("p-value:", p)

def my_f_oneway(dataframes):
    means = []
    sizes = []
    variances = []
    for dataframe in dataframes:
        means += [dataframe.mean()]
        sizes += [len(dataframe)]
        variances += [dataframe.var()]
    n = sum(sizes)
    k = len(dataframes)
    
    global_mean = 0
    for i in range(k):
        global_mean += sizes[i] * means[i]
    global_mean = global_mean / n

    SST = 0
    SSE = 0
    for i in range(k):
        SST += sizes[i] * (means[i] - global_mean)**2
        SSE += (sizes[i]-1) * variances[i]
    MST = SST/(k-1)
    MSE = SSE/(n-k)
    F = MST/MSE
    p = sts.f(k-1, n-k).sf(F)  # upper-tail probability; the error df is n-k, not n-1

    return F,p
    
    

F, p = my_f_oneway([M,E,L])
print(F,p)



    


$H_0: \mu_M = \mu_E = \mu_L$ (the mean exam score is the same at all three times of day); $H_a$: at least two of the means differ.

The $p$-value is the probability, assuming $H_0$ is true, of observing an $F$-statistic at least as extreme as the one found. As the $p$-value is larger than the threshold $\alpha=0.05$, we cannot reject $H_0$: there is insufficient evidence to conclude that the time of the exam leads to significantly different exam results.
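
A quick way to check a hand-written one-way ANOVA is to compare it against `scipy.stats.f_oneway` on a small data set. The sketch below uses hypothetical toy scores (not the exam data of this exercise) and 1-D arrays, for which `f_oneway` returns scalar values. It computes $SSE$ directly from the observations and uses $n-k$ denominator degrees of freedom.

```python
import numpy as np
from scipy import stats

# Hypothetical toy scores, NOT the exam data from the exercise.
groups = [
    np.array([10.0, 12.0, 9.0, 11.0]),
    np.array([8.0, 9.0, 7.0, 10.0, 9.0]),
    np.array([12.0, 13.0, 11.0]),
]

k = len(groups)
n = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

# Between-group and within-group sums of squares.
sst = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)

f_manual = (sst / (k - 1)) / (sse / (n - k))
p_manual = stats.f.sf(f_manual, k - 1, n - k)  # denominator df is n - k

f_scipy, p_scipy = stats.f_oneway(*groups)
print("manual:", f_manual, p_manual)
print("scipy: ", f_scipy, p_scipy)
```

If the two lines of output agree, the manual formulas for $SST$, $SSE$ and the degrees of freedom match the library implementation.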