# Reference: Hypothesis Testing

### Step 1: Experiment Design, Conducting Trials and Collecting Data

We need to collect data while avoiding bias. The best tool scientists have for this is *Randomization*.

> ***Randomization*** is the process of assigning participants to treatment and control groups such that each participant has an equal chance of being assigned to any group. ([NIH](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2267325/#:~:text=What%20Is%20Randomization%3F,being%20assigned%20to%20any%20group))

In *Randomized Experiments*, we have two groups:

- ***Test or Treatment Group***: a group where the treatment is applied
- ***Control Group***: a group where no treatment is applied (often, a placebo is given to eliminate psychological effects)
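A minimal sketch of simple random assignment, assuming a hypothetical list of participant IDs and a 50/50 split. The function name `randomize` and the IDs are illustrative only; real trials often use blocked or stratified randomization instead.

```python
import random

def randomize(participants, seed=None):
    """Hypothetical helper: randomly split participants into two groups.

    Shuffling gives every ordering equal probability, so each participant
    is equally likely to land in the treatment or the control group.
    """
    rng = random.Random(seed)          # seeded generator for reproducibility
    shuffled = list(participants)      # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]   # (treatment, control)

# Hypothetical participant IDs, for illustration only
ids = [f"P{i:03d}" for i in range(1, 11)]
treatment, control = randomize(ids, seed=42)
print("Treatment:", treatment)
print("Control:  ", control)
```

With an odd number of participants, this sketch puts the extra person in the control group.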

### Step 2: Inference about each group

Let's first state our hypotheses:
- The Null Hypothesis $H_0$ states that the mean of the Test Group, $\mu_a$, is equal to the mean of the Control Group, $\mu_0$: $H_0: \mu_a = \mu_0$.
- The Alternative Hypothesis $H_a$ states that the two means differ: $H_a: \mu_a \neq \mu_0$.

These are two populations (Treatment and Control). We choose our Confidence Level ($95\%$ by convention) and estimate their means, $\mu_0$ and $\mu_a$, through repeated (say, $k = 1000$) large (say, $n = 50$) samples. This goes back to the [Central Limit Theorem](../techniques/central_limit_theorem.ipynb): for large $n$, the sample mean is approximately normally distributed around the population mean, with standard error $\sigma / \sqrt{n}$.
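The repeated-sampling idea can be simulated with NumPy. This is a sketch under an assumed, hypothetical skewed population (exponential with mean 10, hence standard deviation 10); $k$ and $n$ are the values suggested in the text. Even though the population is not normal, the sample means cluster around the true mean with a spread close to $\sigma / \sqrt{n}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical skewed population: exponential with mean (and std) = 10
population_mean = 10.0
k, n = 1000, 50  # k repeated samples, each of size n

# Draw k samples of size n and take the mean of each one
sample_means = rng.exponential(scale=population_mean, size=(k, n)).mean(axis=1)

# Average of the sample means estimates the population mean
estimate = sample_means.mean()
# CLT: spread of the sample means should be close to sigma / sqrt(n)
spread = sample_means.std(ddof=1)

print(f"estimated mean: {estimate:.3f} (true: {population_mean})")
print(f"std of sample means: {spread:.3f} (theory: {population_mean / np.sqrt(n):.3f})")
```

In a real experiment you usually have only one sample per group; the simulation just makes the sampling distribution of the mean visible.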

### Step 3: Statistical Significance

We now have two estimates of the means, $\mu_0$ and $\mu_a$. But how can we be sure that the difference between them is not due to random chance? This is where *Statistical Significance* comes in.

The **t-test** is a statistical test that assesses whether the difference between two means is statistically significant. It produces a t-statistic and a corresponding p-value. For a *Confidence Level* of $95\%$, the *Significance Level* is $\alpha = 1 - 0.95 = 0.05$, and the result is statistically significant if the p-value is less than $\alpha$.

The ***p-value*** is the probability of observing a test statistic at least as extreme as the one computed from the sample data, assuming that the null hypothesis ($\mu_a = \mu_0$) is true. A small p-value means the observed difference would be unlikely if $H_0$ held.

Remember, the p-value will always be tied to a null hypothesis.
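A minimal two-sample t-test sketch with SciPy, assuming hypothetical simulated outcomes (control around 100, treatment shifted by +10, both with standard deviation 15). It uses Welch's variant (`equal_var=False`), which does not assume equal variances; passing `equal_var=True` gives the classic Student's t-test. The decision rule compares the p-value with $\alpha = 0.05$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical outcome data for the two groups
control = rng.normal(loc=100, scale=15, size=50)
treatment = rng.normal(loc=110, scale=15, size=50)

alpha = 0.05  # significance level for a 95% confidence level

# Welch's two-sample t-test of H0: mu_a == mu_0
result = stats.ttest_ind(treatment, control, equal_var=False)
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")

if result.pvalue < alpha:
    print("Reject H0: the difference in means is statistically significant.")
else:
    print("Fail to reject H0: the difference could be due to chance.")
```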

### Step 4: Check for Type I and Type II Errors

- **Type I Error (False Positive)**: rejecting $H_0$ when it is true, i.e., mistakenly concluding an effect is real when it is due to chance. Its probability is the significance level $\alpha$.
    - example: the test result says you have coronavirus, but you actually don’t
    - example: the test result says the drug is effective, but it actually isn’t
- **Type II Error (False Negative)**: failing to reject $H_0$ when it is false, i.e., mistakenly concluding an effect is due to chance when it is real. Its probability is usually denoted $\beta$.
    - example: the test result says you don’t have coronavirus, but you actually do
    - example: the test result says the drug is not effective, but it actually is
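The link between $\alpha$ and Type I errors can be checked by simulation. This sketch assumes hypothetical standard-normal data where both groups come from the same population, so $H_0$ is true by construction; every rejection is therefore a false positive, and the rejection rate should land near $\alpha$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
alpha = 0.05
trials, n = 2000, 30

# Both groups drawn from the SAME population: H0 is true in every trial
a = rng.normal(loc=0, scale=1, size=(trials, n))
b = rng.normal(loc=0, scale=1, size=(trials, n))

# One two-sample t-test per row (trial)
pvalues = stats.ttest_ind(a, b, axis=1).pvalue

# Fraction of trials that wrongly rejected H0 (Type I errors)
type_i_rate = float(np.mean(pvalues < alpha))
print(f"Observed Type I error rate: {type_i_rate:.3f} (expected about {alpha})")
```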

# Statistical Tests

### z-test

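A **z-test** is a statistical test used to determine whether a sample mean $\bar x$ differs significantly from a hypothesized population mean $\mu$ (two-sample variants compare two group means in the same spirit). For a sample of size $n$ from a population with standard deviation $\sigma$, the *z-statistic* is given by: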

$$
z = \frac{\bar x - \mu}{\sigma / \sqrt{n}}
$$

The issue with the z-test is that it requires the population standard deviation ($\sigma$) to be known. In practice, this is rarely the case. Therefore, the t-test is more commonly used.
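A worked one-sample z-test using only the standard library, assuming hypothetical numbers: a sample of $n = 100$ with mean $52$, a null mean $\mu = 50$, and a known $\sigma = 10$. The two-sided p-value $2(1 - \Phi(|z|))$ is computed through the identity $2(1 - \Phi(|z|)) = \operatorname{erfc}(|z| / \sqrt{2})$. The helper name `z_test` is illustrative.

```python
import math

def z_test(sample_mean, mu, sigma, n):
    """Hypothetical helper: one-sample z-test returning (z, two-sided p).

    Assumes the population standard deviation sigma is known.
    """
    z = (sample_mean - mu) / (sigma / math.sqrt(n))
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Hypothetical example values
z, p = z_test(52, 50, 10, 100)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.00, p = 0.0455
```

Since $p \approx 0.0455 < 0.05$, this hypothetical result would be significant at the $95\%$ level.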

### t-test

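In a **t-test**, the *t-statistic* is the difference between a sample mean $\bar x$ and the hypothesized mean $\mu$, divided by the estimated standard error $\frac{s}{\sqrt{n}}$, where $s$ is the sample standard deviation. The sample $s$ stands in for the population $\sigma$ because $\sigma$ is unknown. Under $H_0$, $t$ follows a t-distribution with $n - 1$ degrees of freedom.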

$$
t = \frac{\bar x - \mu}{s / \sqrt{n}}
$$
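The formula above translates directly into code. This sketch uses hypothetical data and a null mean of $50$; `statistics.stdev` uses $n - 1$ in the denominator, which matches $s$. For this particular data, $\bar x = 51$, $s = \sqrt{2.5}$, and the standard error is $\sqrt{0.5}$, so $t = \sqrt{2}$.

```python
import math
import statistics

def t_statistic(sample, mu):
    """One-sample t-statistic: (x_bar - mu) / (s / sqrt(n))."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)          # sample std dev (n - 1 denominator)
    standard_error = s / math.sqrt(n)
    return (x_bar - mu) / standard_error

# Hypothetical data, testing H0: mu = 50
data = [51, 49, 53, 50, 52]
t = t_statistic(data, mu=50)
print(f"t = {t:.4f}")  # t = 1.4142
```

If SciPy is available, `scipy.stats.ttest_1samp(data, 50)` should report the same statistic, plus a p-value from the t-distribution with $n - 1 = 4$ degrees of freedom.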

### z- and t-distributions

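The t-distribution is symmetric and bell-shaped like the standard normal distribution, but with heavier tails, which account for the extra uncertainty of estimating $\sigma$ with $s$. As the degrees of freedom grow, the t-distribution converges to the standard normal, so for large samples the t-test and z-test give nearly the same results.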

<img src="https://upload.wikimedia.org/wikipedia/commons/2/2a/Comparing_the_Standard_Normal_Distribution_and_Student%27s_T_Distribution.png" height="260">

[Figure](https://commons.wikimedia.org/wiki/File:Comparing_the_Standard_Normal_Distribution_and_Student%27s_T_Distribution.png#filelinks): Comparing the Standard Normal Distribution and Student's T Distribution.
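The heavier tails show up as larger critical values. This sketch uses SciPy's `t.ppf` and `norm.ppf` to compare two-sided $95\%$ critical values for a few arbitrarily chosen degrees of freedom: about $4.30$ at $df = 2$, shrinking toward the normal's $1.96$ as $df$ grows.

```python
from scipy import stats

# Two-sided 95% critical value of the standard normal, about 1.96
z_crit = stats.norm.ppf(0.975)

# Illustrative degrees of freedom
dfs = (2, 5, 10, 30, 100, 1000)
t_crits = {df: stats.t.ppf(0.975, df) for df in dfs}

for df, t_crit in t_crits.items():
    print(f"df = {df:>4}: t_crit = {t_crit:.3f}  vs  z_crit = {z_crit:.3f}")
```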