# Hypothesis Testing


When do you need a hypothesis test? Whenever you want to draw a conclusion about a population from a sample of that population.


We draw a sample from a population. If you only want to describe that sample, you use **descriptive statistics**: you calculate, for instance, the mean or the standard deviation. If instead you want to make a statement about the whole population, you use the sample to infer something about the population. The goal of hypothesis testing is therefore to use a sample from a population to test a hypothesis about that population.

When you formulate hypotheses, there are always two hypotheses that claim the opposite of each other:

## Null Hypothesis

**Null Hypothesis ($ H_0 $)**: This is a statement that there is no effect or no difference. It's what you're trying to test against. For instance there is no difference between **Drug A** and **Drug B**.

## Alternative Hypothesis 
**Alternative Hypothesis ($ H_a $ or $ H_1 $)**: This is what you want to find evidence for, e.g., that there is a difference or an effect.
For instance, **Drug A** is superior to **Drug B**.


Hypothesis testing never gives certainty: the decision to reject or not reject the null hypothesis always carries some probability of error.





## P values

The **$ p $-value** is used in statistics to measure how surprising or unlikely your data are under the null hypothesis.

**The intuition behind P values**
Imagine you have a coin and you suspect it might be biased towards heads. To test this, you decide to flip the coin 100 times. Your null hypothesis (the assumption you're testing against) is that the coin is fair, meaning it has an equal chance of landing heads or tails.

After flipping the coin 100 times, suppose you get an unusually high number of heads, say 70 heads and 30 tails. You might start to think this result is pretty strange if the coin were truly fair.

Here's where the **$ p $-value** comes in. The **$ p $-value** is a number between 0 and 1 that tells you how likely it is to see a result as extreme as yours (or more extreme) if the null hypothesis were true. In our example, if the p-value is very low (let's say 0.01), it means that getting 70 heads out of 100 flips would be very unlikely if the coin were fair. A low p-value suggests that maybe your assumption (the null hypothesis) that the coin is fair might not be right.
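
To make the coin example concrete, here is a minimal sketch that computes the exact p-value for 70 heads in 100 flips under the fair-coin null hypothesis, using only the binomial distribution. The function name `binomial_p_value_upper` is illustrative, and the one-sided alternative ("biased towards heads") is an assumption taken from the suspicion stated in the example.

```python
from math import comb

def binomial_p_value_upper(heads: int, flips: int, p_heads: float = 0.5) -> float:
    """P(X >= heads) for X ~ Binomial(flips, p_heads): a one-sided p-value."""
    return sum(
        comb(flips, k) * p_heads**k * (1 - p_heads) ** (flips - k)
        for k in range(heads, flips + 1)
    )

# Probability of 70 or more heads in 100 flips if the coin is fair
p_value = binomial_p_value_upper(heads=70, flips=100)
print(f"p-value for 70 heads in 100 flips: {p_value:.2e}")
```

For this particular outcome the exact p-value turns out to be far smaller than the illustrative 0.01 mentioned in the text, so the data would be very surprising for a fair coin.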


- A low **$ p $-value** (typically below a threshold such as 0.05, i.e. 5%) suggests that the observed data are inconsistent with the null hypothesis, so you might reject the null hypothesis in favor of the alternative hypothesis (the hypothesis that there is an effect or a difference).
- A high **$ p $-value** means you don't have enough statistical evidence to reject the null hypothesis.

However, a low p-value doesn't prove the alternative hypothesis is true. It only suggests that the data you observed are unlikely under the assumption that the null hypothesis is true. Other factors, like the design of the experiment and the assumptions of the statistical test, also play critical roles in the interpretation of a **$ p $-value**.

**Technical way of describing the p-value**

The **$ p $-value** is the probability of observing a test statistic as extreme as, or more extreme than, the statistic computed from the data, assuming that the null hypothesis is true. 


**Test Statistic**: This is a number calculated from your data that is used to evaluate how compatible your data is with the null hypothesis. The form of the test statistic depends on the type of test you're conducting (e.g., t-test, chi-square test). For instance, in our coin flip example, the test statistic could be the number of heads observed.

**Observing a Test Statistic as Extreme as, or More Extreme Than, the Statistic Computed from the Data**: This means comparing what you actually observed in your experiment or study with what you would expect under the null hypothesis. "As extreme as, or more extreme than" refers to outcomes that are at least as unlikely as the actual outcome you got, given that the null hypothesis is true.

**Assuming That the Null Hypothesis is True**: The calculation of the p-value is done under the assumption that the null hypothesis is correct. This is crucial because the p-value is meant to test the strength of evidence against the null hypothesis.

**Probability**: The **$ p $-value** itself is a probability. It measures the likelihood of observing your actual test statistic (or one more extreme) purely by chance if the null hypothesis were true. 

## Test Statistic
### t-test
The t-test is a statistical test used to determine whether there is a significant difference between means: either between a sample mean and a known value, or between the means of two groups, which may be independent or related. It is widely used in hypothesis testing. There are several types of t-tests, including:

1. **One-sample t-test** compares the mean of a single group against a known mean. For instance, a chocolate factory claims that its chocolate bar weighs 50 grams. We weigh 30 bars, find a mean of 48 grams, and compare it to the claimed 50 grams.

2. **Two-sample (independent) t-test** compares the means of two independent or unrelated groups to determine if there is a significant difference between them. Example: the effectiveness of two painkillers given to two groups of patients, comparing the mean time each painkiller took to take effect.

3. **Paired t-test** compares the means from the same group at different times (say, one year apart), or the means from two groups that are somehow related or paired. For example, to assess the effectiveness of a diet, we compare the mean weight of the participants before and after the diet.
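
As a rough guide to how the three variants map onto code, the sketch below calls the corresponding functions from `scipy.stats`. All arrays are made-up illustration data (the text gives no raw measurements for these examples), so the printed numbers carry no meaning beyond showing the calls.

```python
import numpy as np
from scipy import stats

# All numbers below are hypothetical illustration data, not measurements.
bar_weights = np.array([48.2, 47.5, 49.1, 48.8, 47.9, 48.4, 47.2, 48.6])  # grams
relief_a = np.array([22.0, 25.5, 19.8, 24.1, 21.3, 23.7])  # minutes, painkiller A
relief_b = np.array([27.4, 26.0, 29.9, 24.8, 28.3, 30.1])  # minutes, painkiller B
weight_before = np.array([82.0, 91.5, 77.3, 88.0, 95.2])   # kg, before the diet
weight_after = np.array([80.1, 89.0, 77.0, 85.4, 92.8])    # kg, same people after

# 1. One-sample: is the mean bar weight different from the claimed 50 g?
one_sample = stats.ttest_1samp(bar_weights, popmean=50)

# 2. Two-sample (independent): do two unrelated groups differ in their means?
two_sample = stats.ttest_ind(relief_a, relief_b)

# 3. Paired: did the same participants change between the two measurements?
paired = stats.ttest_rel(weight_before, weight_after)

for name, res in [("one-sample", one_sample), ("two-sample", two_sample), ("paired", paired)]:
    print(f"{name}: t = {res.statistic:.3f}, p = {res.pvalue:.4f}")
```

The choice between `ttest_ind` and `ttest_rel` depends only on whether the two sets of measurements come from unrelated subjects or from matched pairs.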



### Formula for Two-sample (independent) t-test

The formula to calculate the t-value in an independent t-test is:

$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{ \frac{s_1^2}{N_1} + \frac{s_2^2}{N_2} }} $

where:
- $ \bar{X}_1 $ and $ \bar{X}_2 $ are the sample means,
- $ s_1^2 $ and $ s_2^2 $ are the sample variances,
- $ N_1 $ and $ N_2 $ are the sample sizes.

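The degrees of freedom (df) for this test are $ df = N_1 + N_2 - 2 $, which assumes that the two groups have equal variances. With equal sample sizes ($ N_1 = N_2 $), the pooled-variance t-statistic reduces to the formula above; with unequal sample sizes or clearly unequal variances, use the pooled-variance form or Welch's t-test, which has its own df formula.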

After calculating the t-value, you compare it against a critical value from the t-distribution table based on your chosen significance level (alpha, typically 0.05) and degrees of freedom to determine if the difference is statistically significant.
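
The formula translates almost line by line into code. The following is a minimal sketch using only the Python standard library; the function name `independent_t` and the toy samples are hypothetical, and the degrees of freedom follow the $ N_1 + N_2 - 2 $ rule stated above.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(sample_1, sample_2):
    """Return (t, df) for two independent samples, following the formula above."""
    n1, n2 = len(sample_1), len(sample_2)
    # statistics.variance is the sample variance (it divides by n - 1)
    standard_error = sqrt(variance(sample_1) / n1 + variance(sample_2) / n2)
    t = (mean(sample_1) - mean(sample_2)) / standard_error
    df = n1 + n2 - 2
    return t, df

# Hypothetical toy samples, chosen only to keep the arithmetic easy to follow
t, df = independent_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"t = {t:.3f}, df = {df}")  # prints: t = -1.897, df = 8
```

Here the means are 3 and 6 and the sample variances are 2.5 and 10, so the denominator is $ \sqrt{0.5 + 2} \approx 1.581 $ and $ t \approx -3 / 1.581 $.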

### Numerical Example

Imagine you want to test if there is a significant difference in the exam scores between two groups of students, Group A and Group B. Here are their scores:

- Group A: 85, 86, 88, 75, 78, 94, 98, 90, 94, 92
- Group B: 78, 74, 88, 82, 82, 85, 89, 92, 90, 83

Let's calculate the t-statistic for these two groups.

The sample means are 88.0 for Group A and 84.3 for Group B, and the calculated t-statistic for the difference between them is approximately 1.27. The critical t-value for a two-tailed test with $ \alpha = 0.05 $ (95% confidence level) and 18 degrees of freedom is approximately 2.10. Since the absolute value of the t-statistic is less than the critical t-value, we fail to reject the null hypothesis. This means that, based on our sample data and the chosen significance level, there is not enough evidence to conclude that there is a significant difference in the exam scores between the two groups. The p-value associated with our t-statistic is approximately 0.219, which is greater than 0.05 and therefore leads to the same conclusion.
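
These figures can be checked with a few lines of Python. The sketch below assumes SciPy is available and uses `scipy.stats.ttest_ind` with its default equal-variance setting, plus the t-distribution's inverse CDF (`ppf`) in place of a printed table for the critical value.

```python
from scipy import stats

group_a = [85, 86, 88, 75, 78, 94, 98, 90, 94, 92]
group_b = [78, 74, 88, 82, 82, 85, 89, 92, 90, 83]

# Student's two-sample t-test (equal variances assumed, SciPy's default)
result = stats.ttest_ind(group_a, group_b)

# Two-tailed critical value for alpha = 0.05 with N1 + N2 - 2 = 18 df
alpha = 0.05
df = len(group_a) + len(group_b) - 2
t_critical = stats.t.ppf(1 - alpha / 2, df)

print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3f}, critical t = {t_critical:.2f}")
print("reject H0" if abs(result.statistic) > t_critical else "fail to reject H0")
```

Comparing |t| with the critical value and comparing the p-value with alpha are two views of the same decision, so they always agree.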

## Example: Drug A and Drug B

Imagine you have **Drug A** and **Drug B** and you test each of them on a single patient. If one drug worked on **patient 1** and the other did not work on **patient 2**, can we conclude that the first drug is better? Not really, because several other factors might have contributed to that result. So now let's try them on $2000$ patients: **Drug A** cured $97\%$ of people while **Drug B** cured only $3\%$. With a gap this large, it is unrealistic to attribute the result to chance and claim there is no difference between the drugs. Now imagine instead that the success rate of **Drug A** is $37\%$ and that of **Drug B** is $31\%$ on $50$ patients. Here the difference is small and so is the sample, so it is much less clear whether the difference is real.

So given that no study is perfect and there are always a few random things that change the result, how can we become confident that Drug A is superior?

That's where the **$ p $-value** comes in. **$ p $-values** are numbers between 0 and 1 that, in this example, quantify how confident we should be that **Drug A** is different from **Drug B**.
The closer a **$ p $-value** is to 0, the more confident we are that **Drug A** and **Drug B** are different.


In practice, the commonly used threshold is $0.05$, meaning that if there is no difference between **Drug A** and **Drug B** and we repeated the exact same experiment many times, only about $5\%$ of those experiments would result in the wrong decision (concluding that the drugs differ). Suppose we repeat the experiment several times and obtain the following results (**$ p $-values** calculated using Fisher's exact test):



| Drug A cured | Drug A not cured | Drug B cured | Drug B not cured | p-value |
|--------------|------------------|--------------|------------------|---------|
| 73           | 125              | 71           | 127              | 0.9     |
| 71           | 127              | 72           | 126              | 1.0     |
| 75           | 123              | 70           | 128              | 0.7     |
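
The p-values in the table can be recomputed from the counts. This is a minimal sketch that assumes a two-sided Fisher's exact test on each 2×2 table, using `scipy.stats.fisher_exact`; the variable names are illustrative.

```python
from scipy.stats import fisher_exact

# (Drug A cured, Drug A not cured, Drug B cured, Drug B not cured), from the table
experiments = [
    (73, 125, 71, 127),
    (71, 127, 72, 126),
    (75, 123, 70, 128),
]

p_values = []
for a_cured, a_not, b_cured, b_not in experiments:
    # 2x2 contingency table: rows are the drugs, columns are the outcomes
    table = [[a_cured, a_not], [b_cured, b_not]]
    _, p = fisher_exact(table, alternative="two-sided")
    p_values.append(p)
    print(f"{table}: p = {p:.1f}")
```

All three p-values are far above 0.05, so none of these repetitions gives a reason to reject the hypothesis that the two drugs perform the same.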


## Example: Effects of a New Fertilizer on Plant Growth

Imagine you're a botanist studying the effects of a new fertilizer on plant growth. You have two groups of plants:

1. **Control Group**: Plants not given the fertilizer.
2. **Treatment Group**: Plants given the fertilizer.

You want to know if the fertilizer has a significant effect on plant growth. To do this, you measure the height of the plants after a fixed period.

**Hypotheses**:
- $ H_0 $: The fertilizer has no effect on plant growth. (Mean height of Control Group = Mean height of Treatment Group)
- $ H_a $: The fertilizer has an effect on plant growth. (Mean height of Control Group ≠ Mean height of Treatment Group)

**Data**:
Let's assume you measured the height (in cm) of 10 plants from each group:

- Control Group: [15, 17, 16, 14, 15, 16, 17, 15, 16, 17]
- Treatment Group: [18, 19, 20, 19, 18, 21, 19, 20, 19, 18]

We'll use a two-sample t-test to determine if there's a significant difference in the means of the two groups.

To calculate the p-value using a t-test for comparing the means of the control group and the treatment group, we'll go through the following steps:

1. **Calculate the mean** of each group.
2. **Calculate the standard deviation** of each group.
3. **Calculate the standard error of the mean (SEM)** for each group.
4. **Calculate the t-statistic** using the means, SEMs, and sample sizes of both groups.
5. **Calculate the degrees of freedom** needed to look up the p-value.
6. **Calculate the p-value** based on the t-statistic and degrees of freedom.



For a two-tailed test (which checks for any difference between the means, not specifying direction), the p-value can be conceptualized as:

$ \text{p-value} = 2 \times (1 - \text{CDF}(|t|, df)) $

Where:
- $ \text{CDF} $ refers to the cumulative distribution function for the t-distribution.
- $ |t| $ is the absolute value of the observed t-statistic calculated from your data (the sign of $ t $ only reflects which group mean is subtracted from which).
- $ df $ is the degrees of freedom, which, for an independent samples t-test, is usually calculated as $ n_1 + n_2 - 2 $ for equal variances, or using a more complex formula for unequal variances and sample sizes.

This formula is calculating the probability of observing a t-statistic as extreme as, or more extreme than, the observed t-statistic under the null hypothesis. The "2 ×" part accounts for both tails of the distribution since we're interested in differences in either direction (higher or lower).

In practice, this calculation is not done manually but through statistical software. Library functions use the properties of the t-distribution internally to find the p-value corresponding to the calculated t-statistic and the degrees of freedom for your specific test scenario.
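
To show what such software does internally, the sketch below walks through the six steps by hand for the plant heights listed above, assuming equal variances (so $ df = n_1 + n_2 - 2 $) and using `scipy.stats.t` only for the t-distribution itself. It takes the absolute value of the t-statistic before the tail lookup, so a negative statistic is handled correctly.

```python
import numpy as np
from scipy import stats

control = np.array([15, 17, 16, 14, 15, 16, 17, 15, 16, 17])
treatment = np.array([18, 19, 20, 19, 18, 21, 19, 20, 19, 18])

# Steps 1-3: mean, sample standard deviation (ddof=1), and SEM of each group
sem_control = control.std(ddof=1) / np.sqrt(len(control))
sem_treatment = treatment.std(ddof=1) / np.sqrt(len(treatment))

# Step 4: t-statistic; with equal group sizes this equals the pooled-variance value
t_stat = (control.mean() - treatment.mean()) / np.sqrt(sem_control**2 + sem_treatment**2)

# Step 5: degrees of freedom under the equal-variance assumption
df = len(control) + len(treatment) - 2

# Step 6: two-tailed p-value, 2 * (1 - CDF(|t|, df)); sf(x) is 1 - cdf(x)
p_value = 2 * stats.t.sf(abs(t_stat), df)

print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.3e}")
# prints: t = -7.279, df = 18, p = 9.162e-07
```

Using the survival function `sf` instead of `1 - cdf` avoids losing precision when the tail probability is very small, as it is here.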



### Control Group
- **Mean**: 15.8 cm
- **Standard Deviation**: 1.033 cm
- **Standard Error of the Mean (SEM)**: 0.327 cm

### Treatment Group
- **Mean**: 19.1 cm
- **Standard Deviation**: 0.994 cm
- **Standard Error of the Mean (SEM)**: 0.314 cm

### T-test Results
- **T-statistic**: -7.279
- **P-value**: approximately 0.0000009162

The t-statistic is -7.279 and the p-value is far below 0.001, so we reject the null hypothesis: there is a statistically significant difference in plant height between the control group and the treatment group. The t-statistic is negative because the control mean (15.8 cm) is lower than the treatment mean (19.1 cm), so the data favor the hypothesis that the new fertilizer has a positive effect on plant growth.



Refs [1](https://www.youtube.com/watch?v=vemZtEM63GY), [2](https://www.youtube.com/watch?v=udyAvvaMjfM), [3](https://www.youtube.com/watch?v=p0W1oKPP6eQ), [4](https://www.youtube.com/watch?v=0oc49DyA3hU), [5](https://www.youtube.com/watch?v=JQc3yx0-Q9E)



```python
from scipy.stats import ttest_ind
import numpy as np

# Data
control_group = np.array([15, 17, 16, 14, 15, 16, 17, 15, 16, 17])
treatment_group = np.array([18, 19, 20, 19, 18, 21, 19, 20, 19, 18])

# Step 1 & 2: Calculate mean and standard deviation
mean_control = np.mean(control_group)
std_dev_control = np.std(control_group, ddof=1)  # Sample standard deviation
mean_treatment = np.mean(treatment_group)
std_dev_treatment = np.std(treatment_group, ddof=1)  # Sample standard deviation

# Step 3: Calculate the Standard Error of the Mean (SEM) for each group
n_control = len(control_group)
n_treatment = len(treatment_group)
sem_control = std_dev_control / np.sqrt(n_control)
sem_treatment = std_dev_treatment / np.sqrt(n_treatment)

# Step 4 & 5: Calculate t-statistic and degrees of freedom
# Using scipy to calculate t-statistic and p-value directly
t_stat, p_value = ttest_ind(control_group, treatment_group)

mean_control, std_dev_control, mean_treatment, std_dev_treatment, sem_control, sem_treatment, t_stat, p_value
```

```
(15.8,
 1.0327955589886444,
 19.1,
 0.9944289260117533,
 0.3265986323710904,
 0.31446603773522014,
 -7.278624758728698,
 9.162003368633656e-07)
```