# Module 2 Practice 2 Answers - One-Way Repeated Measures ANOVA

In this practice exercise, you will perform a one-way Repeated Measures ANOVA on a dataset in long form.

The data is documented [here](../resources/theoph.txt).

Refer to the pingouin `rm_anova` [documentation](https://pingouin-stats.org/generated/pingouin.rm_anova.html) for how to pass data in long format.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sys
!{sys.executable} -m pip install pingouin
import pingouin as pg

pd.set_option('display.max_rows', None)

In [None]:
data = pd.read_csv('../resources/theoph.csv')

data.head(10)

## State the Null and Alternative Hypotheses
The Null Hypothesis is that mean theophylline clearance within subjects is equal when theophylline is given in conjunction with placebo, Pepcid, and Tagamet.

The Alternative Hypothesis is that at least one of the means differs.

## Check for missing data


In [None]:
data[data.isna().any(axis=1)]
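
A repeated measures design needs every subject measured under every agent, so it can also help to confirm that no subject is missing a measurement. The following is a minimal sketch on a small made-up DataFrame. The values and the agent labels are illustrative assumptions, not the theophylline data; only the column names `subject`, `agent`, and `clearance` follow the ones used in this notebook.

```python
import numpy as np
import pandas as pd

# Made-up long-format data: one row per subject per agent (NOT the real data).
toy = pd.DataFrame({
    'subject':   [1, 1, 1, 2, 2, 2, 3, 3],
    'agent':     ['placebo', 'pepcid', 'tagamet'] * 2 + ['placebo', 'pepcid'],
    'clearance': [3.1, 2.9, 2.0, 1.8, np.nan, 1.2, 4.0, 3.7],
})

# Count of missing values in each column.
missing_per_column = toy.isna().sum()

# Non-missing clearance values per subject; a complete subject has one per agent.
n_agents = toy['agent'].nunique()
obs_per_subject = toy.groupby('subject')['clearance'].count()
incomplete_subjects = obs_per_subject[obs_per_subject < n_agents].index.tolist()

print(missing_per_column)
print('Subjects missing at least one measurement:', incomplete_subjects)
```

In this toy data, subject 2 has a missing value and subject 3 has no row at all for one agent, so both are flagged. The second case would not be caught by `isna()` alone.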

## Check the distribution of data

In [None]:
data['clearance'].hist()

## Apply a transformation to normalize the data
Some appropriate transformations for right-skewed distributions are:
1. log base 2 or log base 10 (requires strictly positive values, so it cannot be used where the data contain zero)
1. root transformation - square root, cube root, or higher order roots
1. inverse transformation (also cannot be used where the data contain zero)

In [None]:
# Show some possible transformations and their effect on the distribution.
# Each one overwrites clearance_trans, so the fourth root (applied last) is the one later cells use.

data['clearance_trans'] = np.log10(data['clearance'])
data['clearance_trans'].hist()
plt.title('log base 10 transformation')
plt.show()

data['clearance_trans'] = 1/data['clearance']
data['clearance_trans'].hist()
plt.title('inverse transformation')
plt.show()

data['clearance_trans'] = np.sqrt(data['clearance'])
data['clearance_trans'].hist()
plt.title('square root transformation')
plt.show()

data['clearance_trans'] = data['clearance'] ** (1/4)
data['clearance_trans'].hist()
plt.title('fourth root transformation')
plt.show()
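
The histograms can be complemented with a numeric summary. The sketch below compares the sample skewness of each candidate transformation using pandas' `Series.skew()`. It runs on synthetic lognormal values generated with a fixed seed, which is an assumption made purely for illustration and not the theophylline data, so the numbers it prints say nothing about which transformation suits the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed values (lognormal); NOT the theophylline data.
rng = np.random.default_rng(0)
x = pd.Series(rng.lognormal(mean=1.0, sigma=0.8, size=200))

candidates = {
    'raw': x,
    'square root': np.sqrt(x),
    'fourth root': x ** (1 / 4),
    'log base 10': np.log10(x),
    'inverse': 1 / x,
}

# Sample skewness of each candidate; values near 0 indicate a more symmetric distribution.
skewness = pd.Series({name: values.skew() for name, values in candidates.items()})
print(skewness.round(2))
```

Skewness closer to zero suggests a more symmetric distribution, but it is only one summary, so the histograms should still be inspected before settling on a transformation.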



## Test the sphericity
Use the significance level of $\alpha$ = 0.05.
Check the [documentation](https://pingouin-stats.org/generated/pingouin.sphericity.html) for the correct invocation of the method for data in long format.  The transformed clearance data should be used as the dependent variable.

In [None]:
pg.sphericity(data, dv='clearance_trans', subject='subject', within='agent', alpha=0.05)

## Does the data pass the sphericity test?  Do you expect a correction to be applied when running the Repeated Measures ANOVA?

The data pass the sphericity test at $\alpha$ = 0.05, so we should not expect a sphericity correction to be applied when running the Repeated Measures ANOVA.
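
As background, sphericity means that the variances of the within-subject differences between every pair of conditions are roughly equal. The sketch below shows that quantity informally by pivoting long-format data to wide format and computing the variance of each pairwise difference. The data are made up for illustration (not the theophylline data), and this is not a substitute for the formal test that `pg.sphericity` performs.

```python
from itertools import combinations

import pandas as pd

# Made-up long-format data (NOT the theophylline data): 4 subjects x 3 agents.
toy = pd.DataFrame({
    'subject': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    'agent': ['placebo', 'pepcid', 'tagamet'] * 4,
    'clearance_trans': [1.30, 1.28, 1.15, 1.10, 1.12, 1.01,
                        1.42, 1.38, 1.25, 1.21, 1.24, 1.08],
})

# One row per subject, one column per agent.
wide = toy.pivot(index='subject', columns='agent', values='clearance_trans')

# Sample variance of the within-subject differences for each pair of agents.
diff_variances = {
    f'{a} - {b}': (wide[a] - wide[b]).var()
    for a, b in combinations(wide.columns, 2)
}
print(pd.Series(diff_variances))
```

If one of these variances is much larger than the others, sphericity is in doubt and a correction to the degrees of freedom is usually considered.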

## Perform a One-Way ANOVARM
Use the significance level of $\alpha$ = 0.05.

Check the [documentation](https://pingouin-stats.org/generated/pingouin.rm_anova.html) for the correct invocation of the method for data in long format. It should be very similar to the parameters used to test sphericity.

In [None]:
pg.rm_anova(data=data, dv='clearance_trans', subject='subject', within='agent')
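
To see where the F statistic comes from, the sketch below partitions the sums of squares for a balanced one-way repeated measures design by hand. The helper `rm_anova_f` is a hypothetical function written for this illustration, not part of pingouin, and the toy values are made up. It assumes every subject has exactly one observation per condition and applies no sphericity correction.

```python
import pandas as pd

def rm_anova_f(df, dv, subject, within):
    """F statistic and degrees of freedom for a balanced one-way RM ANOVA (illustrative)."""
    n = df[subject].nunique()   # number of subjects
    k = df[within].nunique()    # number of conditions
    grand_mean = df[dv].mean()

    # Sums of squares for the condition effect, the subject effect, and the total.
    ss_effect = n * ((df.groupby(within)[dv].mean() - grand_mean) ** 2).sum()
    ss_subjects = k * ((df.groupby(subject)[dv].mean() - grand_mean) ** 2).sum()
    ss_total = ((df[dv] - grand_mean) ** 2).sum()

    # What remains after removing condition and subject effects is the error term.
    ss_error = ss_total - ss_effect - ss_subjects

    df_effect = k - 1
    df_error = (k - 1) * (n - 1)
    f_stat = (ss_effect / df_effect) / (ss_error / df_error)
    return {'F': float(f_stat), 'df_effect': df_effect, 'df_error': df_error}

# Made-up data (NOT the theophylline data): 3 subjects x 3 agents.
toy = pd.DataFrame({
    'subject': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'agent': ['placebo', 'pepcid', 'tagamet'] * 3,
    'clearance_trans': [1, 2, 4, 2, 4, 5, 3, 3, 6],
})

result = rm_anova_f(toy, dv='clearance_trans', subject='subject', within='agent')
print(result)
```

For this toy data the sums of squares are 14 for the agent effect, 14/3 for subjects, and 4/3 for error, which gives F = 7 / (1/3) = 21 on 2 and 4 degrees of freedom. Removing the between-subject variability from the error term is what distinguishes this from an ordinary one-way ANOVA.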

## Interpret the results
Do we reject the Null Hypothesis?

The p-value of 0.00001 is significant at $\alpha$ = 0.05. Therefore, we reject the Null Hypothesis that mean theophylline clearance within subjects is equal when theophylline is given in conjunction with placebo, Pepcid, and Tagamet, and conclude that at least one of the means differs.

## Display an appropriate chart to highlight where there might be differences

In [None]:
data.boxplot(column='clearance', by='agent')

A box plot is appropriate for this type of ANOVARM. Unlike in the lab, where we were looking at changes over time, here we are looking at changes within subjects based on three different treatments. The box plot shows us the median and quartiles for each agent, the spread of the data through the whiskers, and any outliers beyond them.

Also, note that the original, non-transformed clearance values should be plotted, and not the transformed values. The transformed values would be difficult to interpret with regard to the clearance of theophylline.
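
If a summary computed on the transformed scale ever needs to be reported in clearance units, a fourth-root transformation can be undone by raising to the fourth power. The sketch below uses made-up values (not the theophylline data) to show this, and to show that the back-transformed mean is not the same as the mean of the raw values.

```python
import pandas as pd

# Made-up clearance values (NOT the theophylline data).
toy = pd.DataFrame({
    'agent': ['placebo', 'placebo', 'tagamet', 'tagamet'],
    'clearance': [1.0, 16.0, 16.0, 81.0],
})
toy['clearance_trans'] = toy['clearance'] ** (1 / 4)

# Mean on the transformed scale, then back-transformed to clearance units.
mean_trans = toy.groupby('agent')['clearance_trans'].mean()
back_transformed = mean_trans ** 4

summary = pd.DataFrame({
    'raw_mean': toy.groupby('agent')['clearance'].mean(),
    'back_transformed_mean': back_transformed,
})
print(summary)
```

In this toy data the back-transformed means are smaller than the raw means because the fourth root compresses large values more than small ones, which is one more reason to plot and describe the original clearance values directly.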