# Hypothesis Test Examples

## Example 1: Class survey

We surveyed the more than 100 students in this class, asking:
"What do you think about the pace of the course?"

There were three options: about right, too fast, and too slow.

My hypothesis is that a large majority of the class, more than 80% of the students, thinks the pace is about right.
We want to test this hypothesis using the survey results.

Since there are three possible outcomes, we can reduce this to a more familiar two-outcome (binomial) problem
by grouping "too fast" and "too slow" together as "not about right".

Letting $p$ denote the true proportion of students who think the pace is about right, we have:
$$H_0: p = 0.8$$
$$H_a: p > 0.8$$

Under $H_0$, the sampling distribution of the sample proportion is approximately Normal with mean $p_0 = 0.8$ and standard deviation $\sqrt{p_0(1-p_0)/n}$, where $n$ is the number of responses.
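
Written out, the standard error under $H_0$ and the resulting test statistic take the form below. The symbols $\hat{p}$ (observed sample proportion) and $\text{SE}$ are notation assumed here to match the variables `p_hat` and `se` in the code that follows.

$$\text{SE} = \sqrt{\frac{p_0(1-p_0)}{n}}, \qquad z = \frac{\hat{p} - p_0}{\text{SE}}$$

We reject $H_0$ at level $\alpha$ when $z$ exceeds the $(1-\alpha)$ quantile of the standard Normal distribution, which is equivalent to $\hat{p}$ exceeding the critical value $p_0 + z_{1-\alpha}\,\text{SE}$.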

In [21]:
import numpy as np
from scipy.stats import norm

n_samples = 75
n_yes = 67
p_hat = n_yes / n_samples

p_0 = 0.8
alpha = 0.05

# Calculate the standard error of the proportion
se = np.sqrt(p_0 * (1 - p_0) / n_samples)

# Approach 1: find the critical value and the corresponding rejection region
print("="* 10 + "\nApproach 1: find the critical value and the corresponding rejection region\n")
p_critical = norm.ppf(1 - alpha, loc=p_0, scale=se)

print(f"The critical value is {p_critical:.3f}.")
print(f"The rejection region is {p_critical:.3f} or more.")
print(f"The observed proportion is {p_hat:.3f}.")

# Approach 2: find the p-value
print("\n" + "="* 10 + "\nApproach 2: find the p-value\n")
p_value = 1 - norm.cdf(p_hat, loc=p_0, scale=se)  # or norm.sf(p_hat, loc=p_0, scale=se)
print(f"The p-value is {p_value:.3f}.")
print(f"The preset alpha is {alpha:.3f}.")


Approach 1: find the critical value and the corresponding rejection region

The critical value is 0.876.
The rejection region is 0.876 or more.
The observed proportion is 0.893.

Approach 2: find the p-value

The p-value is 0.022.
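The preset alpha is 0.050.

The observed proportion (0.893) falls in the rejection region and, equivalently, the p-value (0.022) is below alpha (0.05). Both approaches therefore lead to the same conclusion: we reject $H_0$ in favor of $H_a$.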
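
Because the problem was framed as a binomial one, the Normal approximation can be cross-checked against an exact binomial tail probability. The following is a minimal sketch using only Python's standard library (`math.comb`) and the same counts as the cell above. The variable name `p_value_exact` is introduced here for illustration and is not part of the original analysis.

```python
from math import comb

n_samples = 75   # number of survey responses, as in the cell above
n_yes = 67       # responses of "about right"
p_0 = 0.8        # proportion under the null hypothesis
alpha = 0.05

# Exact one-sided p-value: P(X >= n_yes) when X ~ Binomial(n_samples, p_0)
p_value_exact = sum(
    comb(n_samples, k) * p_0**k * (1 - p_0) ** (n_samples - k)
    for k in range(n_yes, n_samples + 1)
)

print(f"The exact binomial p-value is {p_value_exact:.3f}.")
print(f"Reject H0 at alpha = {alpha}: {p_value_exact < alpha}")
```

Here $np_0 = 60$ and $n(1-p_0) = 15$, so the Normal approximation is expected to be reasonable, and the exact and approximate p-values should be close.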


## Example 2: Alzheimer's Disease drug trial (Biogen)

Biogen is a company that develops drugs for Alzheimer's disease.
It aims to develop a drug that slows the progression of the disease.

One such study is EMERGE (protocol 221AD302): a Phase 3, randomized, double-blind,
placebo-controlled study of aducanumab (BIIB037) in patients with early Alzheimer's disease.

The trial had two dose arms (high dose and low dose) compared to placebo (control), with monthly infusions.
The primary outcome measure is the change from baseline in the Clinical Dementia Rating Scale - Sum of Boxes (CDR-SB) score at Week 78.

Below is a summary of the trial results (change from baseline in CDR-SB at Week 78):

| Group     | Sample Size | Sample Mean | Sample Standard Deviation |
|-----------|-------------|-------------|---------------------------|
| Control   | 288         | 1.74        | 1.95                      |
| Low dose  | 290         | 1.47        | 1.97                      |
| High dose | 299         | 1.35        | 1.99                      |
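
The comparison of each dose arm with control can be formalized as below. This is a sketch under stated assumptions: a higher CDR-SB score indicates greater impairment, so a smaller increase from baseline is better, and the alternative is taken to be one-sided (the treated group worsens less than control). The subscripts $C$ (control) and $T$ (treated arm) are notation introduced here.

$$H_0: \mu_C = \mu_T \qquad H_a: \mu_C > \mu_T$$

$$z = \frac{\bar{x}_C - \bar{x}_T}{\sqrt{s_C^2/n_C + s_T^2/n_T}}$$

With sample sizes near 300 per arm, $z$ is treated as approximately standard Normal under $H_0$, and the one-sided p-value is the upper tail probability $P(Z \ge z)$.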


In [22]:
# Control group stats
n_c, mean_c, sd_c = 288, 1.74, 1.95
# Low dose
n_l, mean_l, sd_l = 290, 1.47, 1.97
# High dose
n_h, mean_h, sd_h = 299, 1.35, 1.99

def calculate_z_score(mean_1, mean_2, sd_1, sd_2, n_1, n_2):
    # Calculate the z-score under the null hypothesis that the two groups have the same mean
    return (mean_1 - mean_2) / np.sqrt(sd_1**2 / n_1 + sd_2**2 / n_2)

# Between the low dose and control
z_c_l = calculate_z_score(mean_c, mean_l, sd_c, sd_l, n_c, n_l)
p_c_l = norm.sf(z_c_l)
print(f"The p-value for the low dose vs control is {p_c_l:.3f}.")

# Between the high dose and control
z_c_h = calculate_z_score(mean_c, mean_h, sd_c, sd_h, n_c, n_h)
p_c_h = norm.sf(z_c_h)
print(f"The p-value for the high dose vs control is {p_c_h:.3f}.")

The p-value for the low dose vs control is 0.049.
The p-value for the high dose vs control is 0.008.
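
The p-values above are one-sided. As a complementary sketch, not part of the original analysis, the same summary statistics can be used to compute a two-sided p-value and a confidence interval for the difference in means. The example uses the standard library's `statistics.NormalDist` in place of `scipy.stats.norm`, and the helper name `compare_to_control` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the table above: (sample size, mean, standard deviation)
control = (288, 1.74, 1.95)
low_dose = (290, 1.47, 1.97)
high_dose = (299, 1.35, 1.99)

def compare_to_control(treated, control, confidence=0.95):
    """Return (two-sided p-value, CI for mean_control - mean_treated)."""
    n_t, mean_t, sd_t = treated
    n_c, mean_c, sd_c = control
    diff = mean_c - mean_t
    # Standard error of the difference between two independent sample means
    se = sqrt(sd_c**2 / n_c + sd_t**2 / n_t)
    z = diff / se
    std_normal = NormalDist()
    # Two-sided p-value: probability of a |Z| at least as large as the observed |z|
    p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))
    # Normal-based confidence interval for the difference in means
    z_crit = std_normal.inv_cdf(1 - (1 - confidence) / 2)
    return p_two_sided, (diff - z_crit * se, diff + z_crit * se)

for label, arm in [("low dose", low_dose), ("high dose", high_dose)]:
    p, (lower, upper) = compare_to_control(arm, control)
    print(f"{label} vs control: two-sided p = {p:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

A two-sided p-value is twice the one-sided value when the observed difference is in the hypothesized direction, so a comparison that is borderline under a one-sided test may no longer fall below 0.05 under a two-sided one.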
