<div style="text-align: center;"> <h3>Statistical Theory</h3>
<h5>Summative Assessment 2</h5>
<h5><u>By Romand Lansangan</u></h5>
    </div>
    
---

## Introduction
Mice are used in an experiment to test drugs that may prevent Alzheimer’s disease. Half the mice are transgenic – have been genetically modified to have Alzheimer’s disease. The other half of the mice are “wild type” – they have not been modified in any way, and are considered free of Alzheimer’s disease. The mice are assigned to treatment conditions and given one of four drugs, then tested on memory using a maze. The number of errors made in the maze is recorded for the Training Day and the Memory Day.

## Methodology
We use a two-way ANOVA to test for main effects and an interaction effect in our dataset. The hypotheses are as follows:

#### Main Effect of AD Status
**Null Hypothesis ($H_0$)**: The means of maze errors are the same for transgenic mice and wild-type mice (no effect of AD status). 
$$
\mu_{Transgenic} = \mu_{Wild}
$$

**Alternative Hypothesis ($H_1$)**: The mean number of maze errors differs between transgenic and wild-type mice.
$$
\mu_{Transgenic} \neq \mu_{Wild}
$$

#### Main Effect of Drug Treatment
**Null Hypothesis ($H_0$)**: The means of maze errors are the same across all drug treatments (no effect of drug treatment). 
$$
\mu_{Drug \ A} = \mu_{Drug \ B} = \mu_{Drug \ C} = \mu_{Drug \ D}
$$

**Alternative Hypothesis ($H_1$)**: At least one drug treatment leads to a different mean number of maze errors.
$$
H_1 : \text{At least one mean of drug category is not equal to the rest}
$$


#### Interaction Effect Between Drug Treatment and AD Status
**Null Hypothesis ($H_0$)**: There is no interaction effect between drug treatment and AD status. The effect of drug treatment on maze errors is the same regardless of AD status. 
$$
H_0 : \text{The effect of drug treatment is independent of AD status.}
$$

**Alternative Hypothesis ($H_1$)**: There is an interaction effect between drug treatment and AD status. The effect of drug treatment on maze errors depends on whether the mice are transgenic or wild type.
$$
H_1 : \text{The effect of drug treatment depends on AD status}
$$

Each null hypothesis is tested at a 0.05 significance level. In other words, we reject the null hypothesis if and only if the p-value is below 0.05. It is also worth noting that choosing a 0.05 level of significance means accepting a 5% risk of committing a Type I error (a false positive, i.e. rejecting the null hypothesis when it is in fact true).

---

# Training Day Maze Errors

In [45]:
import pandas as pd
from scipy.stats import shapiro
from scipy.stats import levene
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
from pingouin import anova
from statsmodels.stats.multicomp import pairwise_tukeyhsd
import numpy as np
from scipy.stats import t

In [19]:
df = pd.read_csv('Alzheimers Mice Data.csv')
df.head()

Unnamed: 0,AD_Status,Treatment,Training,Memory
0,1,1,12,10
1,1,1,15,12
2,1,1,13,13
3,1,1,12,10
4,1,1,14,13


## Checking for assumptions

### Assumption 1: You have one dependent variable that is measured at the continuous level (i.e., the interval or ratio level).
In our case, the dependent variable is the `Training` column, which records the number of errors made during the training day. Technically, this is a discrete count, but it can reasonably be treated as continuous here, especially since the analysis compares means.

### Assumption 2: You have two independent variables where each independent variable consists of two or more categorical, independent groups.
We have two independent and categorical variables: `AD_Status` (1: Transgenic; 2: Wild) and `Treatment` (1: Drug A, 2: Drug B, 3: Drug C, 4: Drug D).

Note that since we have two factors, with `AD_Status` having 2 categories and `Treatment` having 4 categories, we have a total of $2 \times 4 = 8$ cells.

In [20]:
ad_mapping = {1: 'Transgenic', 2: 'Wild'}
drug_mapping = {1: 'A', 2: 'B', 3: 'C', 4: 'D'}

df['AD_Status'] = df['AD_Status'].map(ad_mapping)
df['Treatment'] = df['Treatment'].map(drug_mapping)
df.head()

Unnamed: 0,AD_Status,Treatment,Training,Memory
0,Transgenic,A,12,10
1,Transgenic,A,15,12
2,Transgenic,A,13,13
3,Transgenic,A,12,10
4,Transgenic,A,14,13


### Assumption 3: You should have independence of observations
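Each mouse belongs to exactly one `AD_Status` group and is given exactly one `Treatment`, so every cell contains different mice and no mouse contributes more than one observation to this analysis. There's no issue here.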

### Assumption 4: There should be no significant outliers in any cell of the design.

We have used the IQR method to flag outliers. The IQR is computed as follows:

$$
IQR = Q_3 - Q_1
$$

Then the acceptable range for observed data shall be:
$$
(Q_1 - 1.5 \times IQR \  \ , \ \ Q_3 + 1.5 \times IQR) 
$$

Any values outside of this interval shall be flagged as outliers.

In [21]:
grouped = df.groupby(['AD_Status', 'Treatment'])

outlier_info = []

for (ad, drug), group in grouped:
    Q1 = group['Training'].quantile(0.25)
    Q3 = group['Training'].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    outliers = group[(group['Training'] < lower_bound) | (group['Training'] > upper_bound)]
    
    outlier_info.append({
        'AD_Status': ad,
        'Treatment': drug,
        'Q1': Q1,
        'Q3': Q3,
        'IQR': IQR,
        'Lower Bound': lower_bound,
        'Upper Bound': upper_bound,
        'Outliers': outliers['Training'].tolist()
    })

outlier_df = pd.DataFrame(outlier_info)
outlier_df

Unnamed: 0,AD_Status,Treatment,Q1,Q3,IQR,Lower Bound,Upper Bound,Outliers
0,Transgenic,A,12.0,14.0,2.0,9.0,17.0,[]
1,Transgenic,B,15.0,17.0,2.0,12.0,20.0,[]
2,Transgenic,C,14.0,16.0,2.0,11.0,19.0,[]
3,Transgenic,D,13.0,14.0,1.0,11.5,15.5,[]
4,Wild,A,14.0,17.0,3.0,9.5,21.5,[]
5,Wild,B,14.0,17.0,3.0,9.5,21.5,[]
6,Wild,C,14.0,16.0,2.0,11.0,19.0,[]
7,Wild,D,13.0,14.0,1.0,11.5,15.5,[]


Looking at the `Outliers` column above, no value is flagged as an outlier in any cell, as assessed by the IQR method.

A quick look at the raincloud plots below reinforces this.

![image.png](attachment:20511488-0d0d-47a5-84c3-cbc1d2a7344e.png)

### Assumption 5: The distribution of the dependent variable (residuals) should be approximately normally distributed in every cell of the design.
Let us now test for normality using Shapiro-Wilk test.

In [22]:
normality_results = []

for (ad, drug), group in grouped:
    stat, p_value = shapiro(group['Training'])
    normality_results.append({
        'AD_Status': ad,
        'Treatment': drug,
        'Shapiro-Wilk Statistic': stat,
        'p-value': p_value,
        'Normal Distribution': 'Yes' if p_value > 0.05 else 'No'
    })

normality_df = pd.DataFrame(normality_results)
normality_df

Unnamed: 0,AD_Status,Treatment,Shapiro-Wilk Statistic,p-value,Normal Distribution
0,Transgenic,A,0.90202,0.42115,Yes
1,Transgenic,B,0.90202,0.42115,Yes
2,Transgenic,C,0.978716,0.927636,Yes
3,Transgenic,D,0.960859,0.813952,Yes
4,Wild,A,0.866836,0.253846,Yes
5,Wild,B,0.893924,0.377222,Yes
6,Wild,C,0.978716,0.927636,Yes
7,Wild,D,0.960859,0.813952,Yes


### Assumption 6: The variance of the dependent variable (residuals) should be equal in every cell of the design.
We use Levene's test for homogeneity of variance because we are comparing independent ("between-subjects") groups.

In [23]:
group_values = [group['Training'].values for _, group in grouped]

statistic, p_value = levene(*group_values, center='median')

alpha = 0.05
if p_value > alpha:
    result = "Fail to reject the null hypothesis: Variances are equal across groups."
else:
    result = "Reject the null hypothesis: Variances are not equal across groups."

print(f"Levene's Test Statistic: {statistic:.4f}")
print(f"p-value: {p_value:.4f}")
print(result)

Levene's Test Statistic: 0.4346
p-value: 0.8731
Fail to reject the null hypothesis: Variances are equal across groups.


## Two-way Anova

In [24]:
aov = anova(dv='Training', between=['AD_Status', 'Treatment'], data=df, detailed=True)
aov['p'] = aov['p-unc'].apply(lambda x: "< 0.001" if x < 0.001 else f"{x:.4f}")
SS_residual = aov.loc[aov['Source'] == 'Residual', 'SS'].values[0]
aov['partial_eta_sq'] = aov['SS'] / (aov['SS'] + SS_residual)
aov.drop(columns=['np2', 'omega_sq'], inplace=True, errors='ignore')  # Drop if they exist
aov = aov[['Source', 'SS', 'DF', 'MS', 'F', 'p', 'partial_eta_sq']]

aov

Unnamed: 0,Source,SS,DF,MS,F,p,partial_eta_sq
0,AD_Status,3.025,1,3.025,1.21608,0.2784,0.036611
1,Treatment,28.275,3,9.425,3.788945,0.0197,0.262109
2,AD_Status * Treatment,9.075,3,3.025,1.21608,0.3198,0.10234
3,Residual,79.6,32,2.4875,,,0.5


As the table shows, there is no statistically significant evidence of a main effect of `AD_Status` (*p* = .2784). For `Treatment`, however, there is statistically significant evidence (*p* = .0197) to reject the null hypothesis that `Treatment` has no effect on `Maze Errors during Training Day`. It is also important to check for an interaction between the two factors, `AD_Status` and `Treatment`, that is, whether the effect of `Treatment` on `Maze Errors during Training Day` depends on the `AD_Status` group. The two-way ANOVA gives no statistically significant evidence of an interaction effect between the two factors (*p* = .3198).
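As a check on the table above, the $F$ statistic and the partial $\eta^2$ for `Treatment` can be reproduced by hand from the `SS` and `MS` columns, using the same formula as the `partial_eta_sq` column in the code. The subscripted symbols below are notation introduced here for illustration only:

$$
F_{\text{Treatment}} = \frac{MS_{\text{Treatment}}}{MS_{\text{Residual}}} = \frac{9.425}{2.4875} \approx 3.789,
\qquad
\eta^2_{p,\ \text{Treatment}} = \frac{SS_{\text{Treatment}}}{SS_{\text{Treatment}} + SS_{\text{Residual}}} = \frac{28.275}{28.275 + 79.6} \approx 0.262
$$

Applying the same formula to the `Residual` row gives $79.6 / (79.6 + 79.6) = 0.5$, which is why that row shows 0.5; that value is a by-product of the computation and not an effect size.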

Given these results, no post-hoc analysis is needed for the main effect of `AD_Status` or for the interaction effect between `AD_Status` and `Treatment`. However, the mean differences among the `Treatment` groups need to be examined. Because this involves several pairwise comparisons, the significance level is adjusted with a Bonferroni correction:

$$
\alpha_{\text{adjusted}} = \frac{\alpha}{m} = \frac{0.05}{6} \approx 0.0083
$$
where $\alpha$ is the original level of significance and $m = \binom{4}{2} = 6$ is the number of pairwise comparisons among the 4 `Treatment` groups. Pairwise differences are therefore tested at a significance level of 0.0083. Note that Tukey's HSD already controls the family-wise error rate, so applying it at this adjusted level is a conservative choice.

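The adjusted level can be reproduced in a few lines. This is a minimal sketch using only the standard library; the variable names `n_treatments`, `m`, and `alpha_adjusted` are chosen here for illustration.

```python
from math import comb

alpha = 0.05          # original significance level stated in the Methodology
n_treatments = 4      # Drug A, B, C, D

# Number of distinct treatment pairs: C(4, 2)
m = comb(n_treatments, 2)

# Bonferroni-adjusted significance level
alpha_adjusted = alpha / m

print(f"m = {m}, adjusted alpha = {alpha_adjusted:.4f}")
```
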
## Tukey’s HSD Post Hoc

In [25]:
tukey = pairwise_tukeyhsd(endog=df['Training'], groups=df['Treatment'], alpha=0.0083)

print(tukey.summary())

Multiple Comparison of Means - Tukey HSD, FWER=0.01
group1 group2 meandiff p-adj   lower  upper  reject
---------------------------------------------------
     A      B      1.5  0.172 -0.9372 3.9372  False
     A      C      0.9 0.5931 -1.5372 3.3372  False
     A      D     -0.7 0.7612 -3.1372 1.7372  False
     B      C     -0.6 0.8347 -3.0372 1.8372  False
     B      D     -2.2 0.0196 -4.6372 0.2372  False
     C      D     -1.6 0.1314 -4.0372 0.8372  False
---------------------------------------------------
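Because the `p-adj` column is already adjusted for the family of six comparisons, the `reject` column depends heavily on the threshold it is compared against. The sketch below copies the adjusted p-values from the table above and lists which pairs fall below a given threshold. `significant_pairs` is a hypothetical helper written for this illustration, not part of statsmodels.

```python
# Adjusted p-values copied from the Tukey HSD table above
p_adj = {
    ("A", "B"): 0.172,
    ("A", "C"): 0.5931,
    ("A", "D"): 0.7612,
    ("B", "C"): 0.8347,
    ("B", "D"): 0.0196,
    ("C", "D"): 0.1314,
}


def significant_pairs(p_values, alpha):
    """Return the pairs whose adjusted p-value is below alpha."""
    return [pair for pair, p in p_values.items() if p < alpha]


# Threshold used in this analysis
print(significant_pairs(p_adj, alpha=0.0083))
# Conventional family-wise level, shown only for comparison
print(significant_pairs(p_adj, alpha=0.05))
```

At 0.0083 no pair is flagged, which matches the `reject` column; at 0.05 only the B versus D pair would be flagged.
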


![image.png](attachment:557c8a0b-f2b5-4004-9a8b-bd8a86c42164.png)

## Reporting
A two-way ANOVA was conducted to analyze the effects of `AD_Status` and `Treatment` on a mouse's `Maze Errors during Training Day`. A residual analysis was done to test the assumptions before conducting the two-way ANOVA. Outliers were assessed through the IQR method and inspection of the raincloud plots; neither method showed any significant outliers. A Shapiro-Wilk test was also done on each cell to test for normality, and no cell showed a significant deviation from normality (*p* > .05). Levene's test was done to test for homogeneity of variance, and we fail to reject the null hypothesis of equal variances (*p* = .87).

The two-way ANOVA showed that there is no statistically significant interaction between `AD_Status` and `Treatment` (*F*(3, 32) = 1.216, *p* = .320, partial $\eta^2$ = .102), so we fail to reject the null hypothesis of no interaction. There is also no significant difference between the `AD_Status` groups in the number of `Maze Errors during Training Day` (*F*(1, 32) = 1.216, *p* = .278, partial $\eta^2$ = .037). However, there is significant evidence that at least one drug treatment leads to a different mean number of `Maze Errors during Training Day` (*F*(3, 32) = 3.789, *p* = .020, partial $\eta^2$ = .262).

To investigate further, pairwise comparisons were conducted using Tukey’s HSD test with an adjusted alpha level of 0.0083 to control for multiple comparisons. None of the pairwise comparisons were statistically significant at the adjusted alpha level. For example, the comparison between Treatment A and Treatment B yielded a mean difference of 1.5 (*p* = .172), and the comparison between Treatment B and Treatment D yielded a mean difference of -2.2 (*p* = .0196); neither comparison met the adjusted threshold for significance.

In summary, while the ANOVA indicated a significant overall effect of `Treatment`, the post-hoc tests revealed no statistically significant pairwise differences. This suggests that the significant main effect might be due to small but cumulative differences across all groups rather than large differences between specific pairs of treatments.


--- 

# Memory Day Maze Errors

## Checking for assumptions

### Assumption 1: You have one dependent variable that is measured at the continuous level (i.e., the interval or ratio level).
In our case, the dependent variable is the `Memory` column, which records the number of errors made during the memory day. Technically, this is a discrete count, but it can reasonably be treated as continuous here, especially since the analysis compares means.

### Assumption 2: You have two independent variables where each independent variable consists of two or more categorical, independent groups.
We have two independent and categorical variables: `AD_Status` (1: Transgenic; 2: Wild) and `Treatment` (1: Drug A, 2: Drug B, 3: Drug C, 4: Drug D).

Note that since we have two factors, with `AD_Status` having 2 categories and `Treatment` having 4 categories, we have a total of $2 \times 4 = 8$ cells.

### Assumption 3: You should have independence of observations
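Each mouse belongs to exactly one `AD_Status` group and is given exactly one `Treatment`, so every cell contains different mice and no mouse contributes more than one observation to this analysis. There's no issue here.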

### Assumption 4: There should be no significant outliers in any cell of the design.

We have used the IQR method to flag outliers. The IQR is computed as follows:

$$
IQR = Q_3 - Q_1
$$

Then the acceptable range for observed data shall be:
$$
(Q_1 - 1.5 \times IQR \  \ , \ \ Q_3 + 1.5 \times IQR) 
$$

Any values outside of this interval shall be flagged as outliers.

In [26]:
grouped = df.groupby(['AD_Status', 'Treatment'])

outlier_info = []

for (ad, drug), group in grouped:
    Q1 = group['Memory'].quantile(0.25)
    Q3 = group['Memory'].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    outliers = group[(group['Memory'] < lower_bound) | (group['Memory'] > upper_bound)]
    
    outlier_info.append({
        'AD_Status': ad,
        'Treatment': drug,
        'Q1': Q1,
        'Q3': Q3,
        'IQR': IQR,
        'Lower Bound': lower_bound,
        'Upper Bound': upper_bound,
        'Outliers': outliers['Memory'].tolist()
    })

outlier_df = pd.DataFrame(outlier_info)
outlier_df

Unnamed: 0,AD_Status,Treatment,Q1,Q3,IQR,Lower Bound,Upper Bound,Outliers
0,Transgenic,A,10.0,13.0,3.0,5.5,17.5,[]
1,Transgenic,B,13.0,14.0,1.0,11.5,15.5,[11]
2,Transgenic,C,11.0,14.0,3.0,6.5,18.5,[]
3,Transgenic,D,10.0,12.0,2.0,7.0,15.0,[]
4,Wild,A,8.0,9.0,1.0,6.5,10.5,[]
5,Wild,B,7.0,9.0,2.0,4.0,12.0,[]
6,Wild,C,8.0,9.0,1.0,6.5,10.5,[]
7,Wild,D,5.0,8.0,3.0,0.5,12.5,[]


Looking at the `Outliers` column above, one value (11) is flagged as an outlier in the Transgenic, Drug B cell, as assessed by the IQR method.

A quick look at the raincloud plots below reinforces this.

![image.png](attachment:0976ce27-8cd5-4a1d-9a78-966c03ad1778.png)

Upon inspecting the raincloud plots, this value does lie well below the first quartile of its cell. To reduce its influence while retaining the data point (with such a small sample size, we can't afford to delete any observation), we move it up to the lower bound of the acceptable IQR range, which is 11.5.
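This kind of adjustment is often called winsorizing. A minimal sketch of it with `Series.clip` is shown below; it clips to the IQR bounds instead of matching the flagged value by hand. The helper name `winsorize_to_iqr_fences` and the five scores are hypothetical and made up for illustration; they are not the actual cell data.

```python
import pandas as pd


def winsorize_to_iqr_fences(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR], keeping every data point."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    # Cast to float so a fractional bound such as 11.5 can be stored
    return values.astype(float).clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# Hypothetical cell of five scores (illustrative values only)
cell = pd.Series([11, 13, 13, 14, 14])
adjusted = winsorize_to_iqr_fences(cell)
print(adjusted.tolist())
```

Under the same assumptions, the helper could be applied to every cell at once with `df.groupby(['AD_Status', 'Treatment'])['Memory'].transform(winsorize_to_iqr_fences)`.
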

### Assumption 5: The distribution of the dependent variable (residuals) should be approximately normally distributed in every cell of the design.
Before assessing normality, the flagged value is first adjusted as described above, and the IQR check is rerun to confirm that no outliers remain.

In [27]:
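# Cast to float first so the fractional value 11.5 can be stored without a dtype warning
df['Memory'] = df['Memory'].astype(float)
# Move the flagged outlier (11) in the Transgenic / Drug B cell up to the lower IQR bound (11.5)
df.loc[(df['AD_Status'] == 'Transgenic') & (df['Treatment'] == 'B') & (df['Memory'] == 11), 'Memory'] = 11.5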


In [28]:
grouped = df.groupby(['AD_Status', 'Treatment'])

outlier_info = []

for (ad, drug), group in grouped:
    Q1 = group['Memory'].quantile(0.25)
    Q3 = group['Memory'].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    outliers = group[(group['Memory'] < lower_bound) | (group['Memory'] > upper_bound)]
    
    outlier_info.append({
        'AD_Status': ad,
        'Treatment': drug,
        'Q1': Q1,
        'Q3': Q3,
        'IQR': IQR,
        'Lower Bound': lower_bound,
        'Upper Bound': upper_bound,
        'Outliers': outliers['Memory'].tolist()
    })

outlier_df = pd.DataFrame(outlier_info)
outlier_df

Unnamed: 0,AD_Status,Treatment,Q1,Q3,IQR,Lower Bound,Upper Bound,Outliers
0,Transgenic,A,10.0,13.0,3.0,5.5,17.5,[]
1,Transgenic,B,13.0,14.0,1.0,11.5,15.5,[]
2,Transgenic,C,11.0,14.0,3.0,6.5,18.5,[]
3,Transgenic,D,10.0,12.0,2.0,7.0,15.0,[]
4,Wild,A,8.0,9.0,1.0,6.5,10.5,[]
5,Wild,B,7.0,9.0,2.0,4.0,12.0,[]
6,Wild,C,8.0,9.0,1.0,6.5,10.5,[]
7,Wild,D,5.0,8.0,3.0,0.5,12.5,[]
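The per-cell normality check for `Memory` can follow the same pattern as the loop used for `Training`. The sketch below wraps that loop in a hypothetical helper, `shapiro_by_cell`, and runs it on made-up toy data so that it is self-contained; the toy values are not the experiment's data and no result for the real dataset is implied.

```python
import pandas as pd
from scipy.stats import shapiro


def shapiro_by_cell(data: pd.DataFrame, dv: str, factors: list) -> pd.DataFrame:
    """Run a Shapiro-Wilk test on `dv` within each combination of `factors`."""
    rows = []
    for keys, group in data.groupby(factors):
        stat, p_value = shapiro(group[dv])
        rows.append({
            **dict(zip(factors, keys)),
            'Shapiro-Wilk Statistic': stat,
            'p-value': p_value,
            'Normal Distribution': 'Yes' if p_value > 0.05 else 'No',
        })
    return pd.DataFrame(rows)


# Hypothetical toy data with two cells of five observations each
toy = pd.DataFrame({
    'AD_Status': ['Transgenic'] * 5 + ['Wild'] * 5,
    'Treatment': ['A'] * 10,
    'Memory': [10, 12, 13, 10, 13, 8, 9, 9, 7, 10],
})
toy_results = shapiro_by_cell(toy, dv='Memory', factors=['AD_Status', 'Treatment'])
print(toy_results)
```

In the notebook itself, the equivalent call would be `shapiro_by_cell(df, dv='Memory', factors=['AD_Status', 'Treatment'])`, run after the outlier adjustment.
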


### Assumption 6: The variance of the dependent variable (residuals) should be equal in every cell of the design.
We use Levene's test for homogeneity of variance because we are comparing independent ("between-subjects") groups.

In [32]:
group_values = [group['Memory'].values for _, group in grouped]

statistic, p_value = levene(*group_values, center='median')

alpha = 0.05
if p_value > alpha:
    result = "Fail to reject the null hypothesis: Variances are equal across groups."
else:
    result = "Reject the null hypothesis: Variances are not equal across groups."

print(f"Levene's Test Statistic: {statistic:.4f}")
print(f"p-value: {p_value:.4f}")
print(result)

Levene's Test Statistic: 0.8752
p-value: 0.5365
Fail to reject the null hypothesis: Variances are equal across groups.


## Two-way Anova

In [33]:
aov = anova(dv='Memory', between=['AD_Status', 'Treatment'], data=df, detailed=True)
aov['p'] = aov['p-unc'].apply(lambda x: "< 0.001" if x < 0.001 else f"{x:.4f}")
SS_residual = aov.loc[aov['Source'] == 'Residual', 'SS'].values[0]
aov['partial_eta_sq'] = aov['SS'] / (aov['SS'] + SS_residual)
aov.drop(columns=['np2', 'omega_sq'], inplace=True, errors='ignore')  # Drop if they exist
aov = aov[['Source', 'SS', 'DF', 'MS', 'F', 'p', 'partial_eta_sq']]

aov

Unnamed: 0,Source,SS,DF,MS,F,p,partial_eta_sq
0,AD_Status,191.40625,1,191.40625,78.125,< 0.001,0.709421
1,Treatment,14.96875,3,4.989583,2.036565,0.1285,0.160319
2,AD_Status * Treatment,9.31875,3,3.10625,1.267857,0.3020,0.106234
3,Residual,78.4,32,2.45,,,0.5


As the table shows, there is statistically significant evidence of a main effect of `AD_Status` (*p* < .001). For `Treatment`, there is no statistically significant evidence to reject the null hypothesis that `Treatment` has no effect on `Maze Errors during Memory Day` (*p* = .129). The two-way ANOVA also gives no statistically significant evidence of an interaction effect between the `AD_Status` and `Treatment` factors (*p* = .302).

Given these results, no post-hoc analysis is needed for the main effect of `Treatment` or for the interaction effect between `AD_Status` and `Treatment`. However, the mean difference between the `AD_Status` groups needs to be examined.

Since `AD_Status` has only two categories, there is a single pairwise comparison and no need to adjust the alpha.

## Tukey’s HSD Post Hoc

In [46]:
tukey = pairwise_tukeyhsd(endog=df['Memory'], groups=df['AD_Status'], alpha=0.05)

print(tukey.summary())

  Multiple Comparison of Means - Tukey HSD, FWER=0.05  
  group1   group2 meandiff p-adj  lower   upper  reject
-------------------------------------------------------
Transgenic   Wild   -4.375   0.0 -5.4274 -3.3226   True
-------------------------------------------------------


## Reporting
A two-way ANOVA was conducted to analyze the effects of `AD_Status` and `Treatment` on a mouse's `Maze Errors during Memory Day`. A residual analysis was done to test the assumptions before conducting the two-way ANOVA. Outliers were assessed through the IQR method and inspection of the raincloud plots; both flagged one potential outlier in the cell with 'Transgenic' `AD_Status` and 'B' `Treatment`. To reduce its influence without deleting the data point, this single outlier was moved up to the lower bound of the acceptable IQR range for its cell (11.5). A Shapiro-Wilk test was also done on each cell to test for normality, and no cell showed a significant deviation from normality (*p* > .05). Levene's test was done to test for homogeneity of variance, and we fail to reject the null hypothesis of equal variances (*p* = .54).

The two-way ANOVA showed that there is no statistically significant interaction between `AD_Status` and `Treatment` (*F*(3, 32) = 1.268, *p* = .302, partial $\eta^2$ = .106), so we fail to reject the null hypothesis of no interaction. There is also no significant difference between the `Treatment` groups in the number of `Maze Errors during Memory Day` (*F*(3, 32) = 2.037, *p* = .129, partial $\eta^2$ = .160). However, there is significant evidence that the mean number of `Maze Errors during Memory Day` differs between the `AD_Status` categories (*F*(1, 32) = 78.125, *p* < .001, partial $\eta^2$ = .709).

A Tukey's HSD test was conducted to examine the mean difference in `Maze Errors during Memory Day` between the "Transgenic" and "Wild" groups. The test indicated a statistically significant difference between the two groups, with a mean difference (Wild minus Transgenic) of -4.375 (*p* < .001, 95% CI [-5.427, -3.323]). These results suggest that `Maze Errors during Memory Day` differ significantly between the two AD status groups, with the "Wild" group making fewer errors on average.