# Hypothesis Testing

## Z-tests
A z-test is a statistical method used to determine if there is a significant difference between sample data and a **known population parameter**, or between two sample means when the **population variance is known or the sample size is large**. 

Assumptions of a Z-Test:
- Normality: The data should be approximately normally distributed. For large samples (commonly $n \geq 30$ as a rule of thumb), the Central Limit Theorem makes the sample mean approximately normal even when the data themselves are not.
- Known Population Variance: For a z-test, the population standard deviation $\sigma$ is assumed to be known. If it's unknown, a t-test is typically used instead.
- Independent Observations: The samples should be independent of each other.

## One vs Two Sided 
One-sided (one-tailed) and two-sided (two-tailed) tests are two types of hypothesis tests. They differ in the form of the alternative hypothesis and in which region of the distribution counts as evidence against the null hypothesis.

### One-Sided (One-Tailed) Test
- A one-sided test is used when we are only interested in deviations in one direction (greater than or less than the hypothesized value).
- The alternative hypothesis specifies the direction of the effect.

**Null Hypothesis $H_0$**: The parameter is equal to a specific value.
 - $ H_0: \mu = \mu_0 $
   
**Alternative Hypothesis $H_1$**: The parameter is either greater than or less than the specific value, depending on the test direction.
- $ H_1: \mu > \mu_0 $ (Right-tailed test)
- $ H_1: \mu < \mu_0 $ (Left-tailed test)

For example, let's say I develop a new teaching method. I'd like to know if it improves scores. I only care if scores are higher, not lower.
- $H_0$: The new method does not improve scores, $\mu \leq \mu_0$ (the test is carried out at the boundary $\mu = \mu_0$).
- $H_1$: The new method improves scores, $\mu > \mu_0$.

### Two-Sided (Two-Tailed) Test
A two-sided test is used when we want to detect whether a parameter (e.g., mean, proportion) differs significantly from a hypothesized value in either direction, whether greater or less. The alternative hypothesis covers deviations in **both directions** from the null hypothesis.

**Null Hypothesis $H_0$**: The parameter is equal to a specific value.
  - $ H_0: \mu = \mu_0 $
    
**Alternative Hypothesis $H_1$**: The parameter is not equal to that specific value.
  - $ H_1: \mu \neq \mu_0 $

For example, I am interested in how diet affects cholesterol levels. I am interested in this regardless of whether they increase or decrease.
  - $H_0$: The diet has no effect on cholesterol levels $\mu = \mu_0$.
  - $H_1$: The diet affects cholesterol levels $\mu \neq \mu_0$.



### Choosing Between the Tests

| Aspect | One-Sided Test | Two-Sided Test |
|--------|----------------|----------------|
| **Alternative Hypothesis** | Tests for deviation in one direction ($>$ or $<$) | Tests for deviation in both directions ($\neq$) |
| **Rejection Region** | Located in one tail (either left or right) | Located in both tails |
| **Significance Level $\alpha$** | Entire $\alpha$ is in one tail | $\alpha$ is split between both tails ($\alpha/2$ in each tail) |
| **When to Use** | When you have a directional hypothesis | When you are testing for any difference, regardless of direction |

## Calculate the Test Statistic
The test statistic $ z $ is calculated using:

$$
z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}
$$

Where:
- $ \bar{X} $ is the sample mean.
- $ \mu_0 $ is the population mean under the null hypothesis.
- $ \sigma $ is the population standard deviation.
- $ n $ is the sample size.
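
The formula translates directly into a few lines of Python. This is a minimal sketch: the function name `z_statistic` and the sample numbers are illustrative assumptions, not values from the text.

```python
import math

def z_statistic(sample_mean, mu_0, sigma, n):
    """Return z = (sample_mean - mu_0) / (sigma / sqrt(n))."""
    if sigma <= 0 or n <= 0:
        raise ValueError("sigma and n must be positive")
    standard_error = sigma / math.sqrt(n)  # standard deviation of the sample mean
    return (sample_mean - mu_0) / standard_error

# Illustrative numbers: the standard error is 8 / sqrt(64) = 1
print(z_statistic(sample_mean=52.0, mu_0=50.0, sigma=8.0, n=64))  # 2.0
```

The denominator is the standard error of the mean, so $z$ measures how many standard errors the sample mean lies from $\mu_0$.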

## Determine the Critical Value and Significance Level
- Choose a significance level $ \alpha $, commonly set at 0.05 or 0.01.
- Determine the critical value from the standard normal distribution for a one-tailed or two-tailed test.

### One-Sided Test
**$ \alpha = 0.05 $**:
- Right-tailed critical value: $ z_{0.05} \approx 1.645 $
- Left-tailed critical value: $ -z_{0.05} \approx -1.645 $

**$ \alpha = 0.01 $**:
- Right-tailed critical value: $ z_{0.01} \approx 2.33 $
- Left-tailed critical value: $ -z_{0.01} \approx -2.33 $

For a right-tailed test with $\alpha = 0.05$, the critical value is the z-value that leaves an area of 0.05 in the right tail, which is $\approx 1.645$.
![oneside](../../../images/onesided.png)

### Two-Sided Test

**$ \alpha = 0.05 $**:
- With $ \alpha = 0.05 $, each tail holds $ \alpha/2 = 0.025 $, so the critical values are $ \pm z_{0.025} \approx \pm 1.96 $.

**$ \alpha = 0.01 $**:
- With $ \alpha = 0.01 $, each tail holds $ \alpha/2 = 0.005 $, so the critical values are $ \pm z_{0.005} \approx \pm 2.58 $.

For a two-sided test with $\alpha = 0.05$, the critical values are the z-values that leave an area of 0.025 in each tail, which are $\approx \pm 1.96$.
![twoside](../../../images/twosided.png)

### Summary of Critical Values for Z-Distribution

| Significance Level ($\alpha$) | One-Sided Critical Value (Right Tail) | One-Sided Critical Value (Left Tail) | Two-Sided Critical Values |
|---------------------------------|---------------------------------------|--------------------------------------|---------------------------|
| $ \alpha = 0.05 $             | $ 1.645 $                           | $ -1.645 $                         | $ \pm 1.96 $            |
| $ \alpha = 0.01 $             | $ 2.33 $                            | $ -2.33 $                          | $ \pm 2.58 $            |
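
The table values can be reproduced from the inverse CDF of the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+); the helper name `critical_values` and its `tail` argument are hypothetical names chosen for illustration.

```python
from statistics import NormalDist

def critical_values(alpha, tail="two-sided"):
    """Return a tuple with the critical z-value(s) for significance level alpha."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "right":
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "left":
        return (std_normal.inv_cdf(alpha),)
    if tail == "two-sided":
        upper = std_normal.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
        return (-upper, upper)
    raise ValueError("tail must be 'right', 'left', or 'two-sided'")

for alpha in (0.05, 0.01):
    right = critical_values(alpha, "right")[0]
    left = critical_values(alpha, "left")[0]
    _, upper = critical_values(alpha, "two-sided")
    print(f"alpha={alpha}: right={right:.3f}, left={left:.3f}, two-sided=±{upper:.3f}")
```

To three decimals the $\alpha = 0.01$ values are 2.326 and 2.576, which the table rounds to 2.33 and 2.58.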

### Draw a Conclusion
Compare the calculated $ z $-value to the critical value(s):
- If the $ z $-value falls in the rejection region (beyond the critical value), reject the null hypothesis.
- If the $ z $-value does not fall in the rejection region, fail to reject the null hypothesis.
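
The decision rule can be sketched as a small function. The name `reject_null` and the `tail` labels are assumptions made for this example; the critical values come from `statistics.NormalDist` in the standard library.

```python
from statistics import NormalDist

def reject_null(z, alpha=0.05, tail="two-sided"):
    """Return True if z falls in the rejection region for the chosen test."""
    std_normal = NormalDist()
    if tail == "right":
        return z > std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        return z < std_normal.inv_cdf(alpha)
    if tail == "two-sided":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("tail must be 'right', 'left', or 'two-sided'")

# z = 1.8 lies beyond 1.645 (right-tailed) but inside ±1.96 (two-sided)
print(reject_null(1.8, tail="right"))      # True
print(reject_null(1.8, tail="two-sided"))  # False
```

The same $z$-value can lead to different decisions depending on the test, which is why the direction of the alternative hypothesis should be fixed before looking at the data.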

### Example: Two-Sided Z-Test
Suppose a sample of $ n = 30 $ test scores has a mean of $ \bar{X} = 85 $, the hypothesized population mean is $ \mu_0 = 80 $, and the population standard deviation is known to be $ \sigma = 10 $.

1. Hypotheses: 
   - $ H_0: \mu = 80 $
   - $ H_1: \mu \neq 80 $

2. Calculate the z-Statistic:
$$
   z = \frac{85 - 80}{10 / \sqrt{30}} \approx 2.74
$$

3. Critical Value at $ \alpha = 0.05 $ (Two-Tailed):
   - Critical values are $ \pm 1.96 $.

4. Decision:
   - $ |z| = 2.74 $ is greater than $ 1.96 $, so $ z $ falls in the rejection region and we reject the null hypothesis.

5. Conclusion:
   - There is a statistically significant difference between the sample mean and the population mean at the 0.05 significance level.
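
The five steps of this example can be checked with a short script. It is a sketch using only the numbers given above; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Values from the example
sample_mean, mu_0, sigma, n = 85, 80, 10, 30
alpha = 0.05

z = (sample_mean - mu_0) / (sigma / math.sqrt(n))
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # upper two-sided critical value

print(f"z = {z:.2f}, critical values = ±{z_crit:.2f}")
print("reject H0" if abs(z) > z_crit else "fail to reject H0")
```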

## P-value
### Understanding the P-Value
Instead of comparing the computed z-value to a critical value, you can directly calculate the probability of a value at least that extreme arising if $H_0$ is true.

The p-value is the probability of obtaining a test statistic at least as extreme as the one observed in your sample data, assuming that the null hypothesis is true. It quantifies the evidence against the null hypothesis. A smaller p-value indicates stronger evidence against the null hypothesis.
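
For a z-test, "at least as extreme" corresponds to a tail area under the standard normal curve. A minimal sketch, assuming the hypothetical helper name `p_value` and the same `tail` labels as a one-sided or two-sided test:

```python
from statistics import NormalDist

def p_value(z, tail="two-sided"):
    """Return the p-value of a z-statistic under the standard normal distribution."""
    std_normal = NormalDist()
    if tail == "right":
        return 1 - std_normal.cdf(z)             # area to the right of z
    if tail == "left":
        return std_normal.cdf(z)                 # area to the left of z
    if tail == "two-sided":
        return 2 * (1 - std_normal.cdf(abs(z)))  # area in both tails
    raise ValueError("tail must be 'right', 'left', or 'two-sided'")

print(p_value(1.96))                 # close to 0.05
print(p_value(1.645, tail="right"))  # close to 0.05
```

A z-value sitting exactly on a critical value gives a p-value equal to the corresponding $\alpha$, so the critical-value and p-value approaches lead to the same decision.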

### Draw a Conclusion
Compare the p-value to the significance level $\alpha$, a threshold you set before conducting the test.
- If $\text{p-value} \leq \alpha$: There is enough evidence to reject the null hypothesis.
- If $\text{p-value} > \alpha$: There is not enough evidence to reject the null hypothesis.

Statistically Significant: When the p-value is less than or equal to $\alpha$, the result is considered statistically significant, meaning that data at least this extreme would be unlikely if the null hypothesis were true.
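
Applying this rule to the earlier two-sided example ($\bar{X} = 85$, $\mu_0 = 80$, $\sigma = 10$, $n = 30$) gives the same decision as the critical-value approach. The script is a sketch; the variable names are illustrative.

```python
import math
from statistics import NormalDist

alpha = 0.05
z = (85 - 80) / (10 / math.sqrt(30))    # z-statistic from the example
p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value

print(f"z = {z:.2f}, p-value = {p:.4f}")
if p <= alpha:
    print("statistically significant: reject H0")
else:
    print("not statistically significant: fail to reject H0")
```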