# 3.5 - Hypothesis Testing

Given your model, how do you test hypotheses about the data? For instance, how do you assess whether there is a relation between some feature and the quantity being predicted?
More formally, one formulates a hypothesis $H_1$ (the alternative hypothesis) and its negation $H_0$ (the null hypothesis, which assumes no true relation exists). $H_0$ is in a way the "status quo": no discovery, no relationship. It is rejected only if the observed data would be very unlikely under $H_0$. To quantify this, we compute a test statistic $T = T(X)$ from the data $X$.

If we know (or can estimate) the distribution of the test statistic $T$ under $H_0$, then we can compute the probability that $T$ takes, by chance alone, a value at least as extreme as the one observed. That is, if $t$ is the observed outcome of the test statistic, we can compute the p-value $p \equiv \mathcal{P}_{H_0}[T \geq t]$ (written here for a right-tailed test, where large values of $T$ count as extreme). A small p-value thus indicates that a value as extreme as $t$ would rarely arise by chance if $H_0$ were true (i.e., $t$ lies far out in the tail of the distribution of $T$ under $H_0$).

There are two ways to decide on the null hypothesis:
1. Reject $H_0$ if $T$ falls in some critical region. For instance, if $T$ is too large (or too small).
2. Reject $H_0$ if the p-value is smaller than some significance level $\alpha$.
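
For a right-tailed test with a continuous null distribution, the two rules lead to the same decision. The sketch below illustrates this under the assumption that $T \sim \mathcal{N}(0, 1)$ under $H_0$; the function name `right_tailed_decisions`, the default $\alpha = 0.05$, and the example values of $t$ are illustrative choices, not part of the text above.

```python
from scipy.stats import norm

def right_tailed_decisions(t_obs, alpha=0.05):
    """Apply both decision rules, assuming T ~ N(0, 1) under H0."""
    null_dist = norm(loc=0.0, scale=1.0)  # assumed null distribution

    # Rule 1: reject if T lands in the critical region {T > c},
    # where c is the (1 - alpha) quantile of the null distribution.
    critical_value = null_dist.ppf(1.0 - alpha)
    reject_by_region = t_obs > critical_value

    # Rule 2: reject if the p-value P_H0[T >= t_obs] is smaller than alpha.
    p_value = null_dist.sf(t_obs)  # survival function, 1 - CDF
    reject_by_p_value = p_value < alpha

    return critical_value, p_value, reject_by_region, reject_by_p_value

for t_obs in (1.0, 2.0, 3.0):
    c, p, by_region, by_p = right_tailed_decisions(t_obs)
    print(f"t = {t_obs}: critical value = {c:.3f}, p-value = {p:.4f}, "
          f"reject (region) = {by_region}, reject (p-value) = {by_p}")
```

The critical value depends only on $\alpha$ and the null distribution, so it can be fixed before seeing the data, whereas the p-value additionally reports how extreme the observed statistic is.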


The basic approach is:
1. Formulate the model for the data.
2. Formulate the null and alternative hypotheses in terms of parameters of the model.
3. Determine an appropriate test statistic (that is a function of the data only).
4. Calculate the outcome of the test statistic for the data.
5. Determine the (approximate) distribution of the test statistic under $H_0$.
6. Calculate the p-value.
7. Reject or fail to reject $H_0$, based on a pre-defined $\alpha$.

The whole trick here is to know the distribution of the test statistic under $H_0$. For this, we either make assumptions about the data that imply a known distribution, or we estimate the distribution numerically through resampling techniques (e.g., bootstrapping).
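
As a minimal sketch of the numerical route, the null distribution can be approximated by simulating many data sets under $H_0$ and recomputing the test statistic on each. Note that this is plain Monte Carlo simulation from an assumed null model, not bootstrapping from observed data. The helper `monte_carlo_p_value`, the sample size of 20, and the observed mean of 0.5 are hypothetical values chosen only for illustration.

```python
import numpy as np
from scipy.stats import norm

def monte_carlo_p_value(statistic, simulate_null, t_obs, n_sim=20_000, seed=0):
    """Estimate P_H0[T >= t_obs] by simulating data sets under H0."""
    rng = np.random.default_rng(seed)
    # Each simulated data set yields one draw of T under H0.
    null_stats = np.array([statistic(simulate_null(rng)) for _ in range(n_sim)])
    # Fraction of simulated statistics at least as extreme as the observed one.
    return np.mean(null_stats >= t_obs)

# Hypothetical setting: 20 observations, H0 says they are i.i.d. N(0, 1),
# the test statistic is the sample mean, and we observed a mean of 0.5.
n_obs, t_obs = 20, 0.5
p_mc = monte_carlo_p_value(
    np.mean, lambda rng: rng.normal(0.0, 1.0, size=n_obs), t_obs
)

# Under these assumptions the exact answer is available for comparison:
# the sample mean is N(0, 1/n_obs) under H0.
p_exact = norm.sf(t_obs, loc=0.0, scale=1.0 / np.sqrt(n_obs))

print("Monte Carlo p-value:", p_mc)
print("Exact p-value:      ", p_exact)
```

The simulated estimate carries sampling noise that shrinks as `n_sim` grows; the appeal of the approach is that it still works when no closed-form null distribution is available.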

### Example 1 from book:

Suppose that the blood pressure in a certain group of people is normally distributed with mean 127 and standard deviation 7, i.e., that for that group, $P \sim \mathcal{N}(127, 7^2)$. A study looks into $N = 101$ diabetics and reports a sample mean of 130.

The question is: is this good evidence that diabetics have a higher average blood pressure than the other group of people? To answer this, let us go step by step.

1. Since we only know the sample mean of the data, a reasonable model for the data is that $P_1, ..., P_{101} \sim _{iid} \mathcal{N}(\mu, 7^2)$. It then follows that the sample mean is distributed as $\overline{P} \sim \mathcal{N}(\mu, 7^2/101)$.
2. The null hypothesis is $H_0 : \mu = 127$. The alternative hypothesis is $H_1 : \mu > 127$.
3. A test statistic can simply be the average blood pressure $\overline{P}$.
4. The outcome of the test statistic for the data is known: $\overline{P} = 130$.
5. From the model, the sample mean follows $\overline{P} \sim \mathcal{N}(\mu, 7^2/101)$, so under $H_0$ we have $\overline{P} \sim \mathcal{N}(127, 7^2/101)$.
6. Compute the p-value $p = \mathcal{P}_{H_0}(\overline{P} \geq 130)$. This is just a matter of consulting the cumulative distribution function (CDF) of a Gaussian. In more detail: $\mathcal{P}_{H_0}(\overline{P} \geq 130) = \mathcal{P}\left( \frac{\overline{P}-127}{\sqrt{49/101}} \geq \frac{130-127}{\sqrt{49/101}}\right) = \mathcal{P}(Z \geq 4.31)$, where we standardize using $\frac{\overline{P} - \mu}{\sigma/\sqrt{N}} \sim \mathcal{N}(0,1)$ and write $Z$ for a standard normal variable. Then, we consult a table for $Z$ to find that $p \approx 8.16 \times 10^{-6}$.
7. Such a small value indicates it is extremely unlikely that the event $\overline{P} \geq 130$ occurs if the two groups have the same mean blood pressure (i.e., if $H_0$ is true). Therefore, we reject the null hypothesis based on our assumptions and the data collected.

Note: there were three important mathematical concepts in the previous example:
1. Distribution of the average of i.i.d. random variables: if $P_1, ..., P_{101} \sim _{iid} \mathcal{N}(\mu, 7^2)$, then $\overline{P} \sim \mathcal{N}(\mu, 7^2/101)$.
In general, the sample mean (average) of $N$ values drawn independently from a Gaussian distribution with mean $\mu$ and standard deviation $\sigma$ is itself Gaussian, with mean $\mu$ and standard deviation $\sigma/\sqrt{N}$. For other distributions with finite variance, this holds approximately for large $N$ (central limit theorem, CLT).


2. Standardization
If $X \sim \mathcal{N}(\mu, \sigma^2)$, then the transformation $Z = (X-\mu)/\sigma$ follows $\mathcal{N}(0, 1)$. Applied to the sample mean, whose standard deviation is $\sigma/\sqrt{N}$, this gives $Z = (\overline{P}-\mu)/(\sigma/\sqrt{N})$.

3. CDF: see code below

In [2]:
# Small note: consulting a CDF table via python:
from scipy.stats import norm

# Mean and variance of the sample mean under H0
mean = 127
variance = 0.485  # 7**2 / 101, rounded to three decimals
std_dev = variance ** 0.5  # Standard deviation is the square root of the variance

# Threshold value
threshold = 130

# Probability of sampling a value higher or equal to the threshold
probability = 1 - norm.cdf(threshold, loc=mean, scale=std_dev)

print("Probability:", probability)

Probability: 8.246221081642524e-06
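
The first two concepts in the note above can also be checked numerically. The following is a rough sketch that simulates many samples of size 101 under $H_0$, compares the spread of their means with $\sigma/\sqrt{N}$, and standardizes them. The seed and the number of simulated samples (20,000) are arbitrary choices; the last step recomputes the z-score and tail probability using the exact variance $49/101$ rather than a rounded value.

```python
import numpy as np
from scipy.stats import norm

mu, sigma, n = 127.0, 7.0, 101    # values from Example 1 (mu is the H0 value)
rng = np.random.default_rng(0)    # fixed seed, chosen arbitrarily

# Draw 20,000 samples of size n under H0 and take the mean of each.
sample_means = rng.normal(mu, sigma, size=(20_000, n)).mean(axis=1)

# 1. The sample means should have standard deviation close to sigma / sqrt(n).
theoretical_std = sigma / np.sqrt(n)
empirical_std = sample_means.std(ddof=1)

# 2. After standardization they should have mean close to 0 and std close to 1.
z_values = (sample_means - mu) / theoretical_std

# 3. Observed z-score and its right-tail probability, using the exact variance 49/101.
z_obs = (130.0 - mu) / theoretical_std
p_value = norm.sf(z_obs)

print("theoretical std:", theoretical_std, "empirical std:", empirical_std)
print("standardized mean:", z_values.mean(), "standardized std:", z_values.std(ddof=1))
print("z_obs:", z_obs, "p-value:", p_value)
```

Using `norm.sf` instead of `1 - norm.cdf` is a design choice: for tail probabilities this small, the survival function avoids subtracting two nearly equal numbers.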


### Example 2 from book:

There is a suspicion that a certain die is loaded (the probability of getting a six is higher than 1/6). In 100 throws we observe 25 sixes. Is there enough evidence to justify the suspicion?

1. The number of sixes in 100 independent throws follows a binomial distribution, so the model is $X \sim Bin(100, p)$, where $p$ (the probability of a six) is unknown.
2. The null hypothesis is $H_0: p = 1/6$, and the alternative is $H_1: p > 1/6$.
3. Test statistic is simply the number of sixes we get from 100 throws.
4. Outcome is known: 25.
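5. Under $H_0$, the test statistic is distributed as $Bin(100, 1/6)$.
6. The p-value is $\mathcal{P}_{H_0}(X \geq 25)= ... = 0.0217$.
7. The p-value is fairly small (below the common significance level $\alpha = 0.05$), so there is reasonable evidence that the die is loaded.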
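
The binomial tail probability in step 6 can be evaluated directly with `scipy.stats.binom`. The sketch below also computes the corresponding critical region; the significance level $\alpha = 0.05$ is an assumed value, since the example does not fix one.

```python
from scipy.stats import binom

n_throws, p_null, observed_sixes = 100, 1 / 6, 25
alpha = 0.05  # assumed significance level, not given in the example

# P(X >= 25) under H0. binom.sf(k, n, p) returns P(X > k), hence k = 24.
p_value = binom.sf(observed_sixes - 1, n_throws, p_null)

# Critical region {X >= c}: the smallest c with P(X >= c) <= alpha.
# binom.ppf(1 - alpha, ...) is the smallest k with P(X <= k) >= 1 - alpha.
critical_value = int(binom.ppf(1 - alpha, n_throws, p_null)) + 1

print("p-value:", p_value)
print("critical value:", critical_value)
print("reject H0:", p_value < alpha, observed_sixes >= critical_value)
```

Because the binomial distribution is discrete, the off-by-one in `binom.sf(observed_sixes - 1, ...)` matters: passing 25 instead of 24 would compute $\mathcal{P}(X \geq 26)$.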

In [3]:
# In the future, implement an example that uses resampling to estimate the distribution from the data itself.