# Overview of Hypothesis Testing

We will discuss the basic concepts of hypothesis testing in statistics and illustrate them with coding examples.
### Outline
- Hypothesis testing basics
- Null vs Alternative Hypothesis
- Normal Distribution, Z-scores, Standardization
- Level of Statistical Significance (alpha)
- P-value, Power
- Type I vs Type II error
- One Tailed Test, Two Tailed Test
- Degrees of Freedom

## What is Hypothesis Testing?
A hypothesis is an educated guess about the state of the world that can be tested by an experiment or observation. Hypothesis testing is a formal procedure, widely used in science, for deciding whether observed data provide enough evidence to reject a default assumption about a population.


## Types of Hypotheses
We usually compare two hypotheses against each other. 

The **null hypothesis**, denoted by $H_0$, usually represents the status quo: the hypothesis that the given variable **has no effect**.

The **alternative hypothesis**, denoted by $H_1$ or $H_A$, is the opposite of the null: the hypothesis that the observations on a sample reflect a real effect of the given variable, **not** random chance.


> Simple example:
> We have a sample dataset of student observations with two attributes available: test score and number of chat messages sent to the prof. The null hypothesis would be "The number of chat messages has no effect on test score." The alternative hypothesis would be "If a student sends more chat messages, then she will have a higher test score."


## Normal Distribution, Standardization, Z-scores

The Normal, or Gaussian, distribution has a *bell-shaped curve*. It's a common type of continuous probability distribution. We write
$$ X \sim N(\mu, \sigma^2) $$

to indicate that the random variable $X$ is normally distributed with mean $\mu$ and variance $\sigma^2$ (standard deviation $\sigma$). The special case with $\mu = 0$ and $\sigma^2 = 1$, written $N(0, 1)$, is called the **standard normal distribution**.

<img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/8/8c/Standard_deviation_diagram.svg/525px-Standard_deviation_diagram.svg.png">
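
To make the picture concrete, here is a minimal sketch that computes how much of a normal curve lies within 1, 2 and 3 standard deviations of the mean (the "68-95-99.7 rule"). It assumes Python 3.9+ and uses the standard library's `statistics.NormalDist`, so no extra packages are needed (`scipy.stats.norm` offers the same CDF). The helper name `prob_within` is just for illustration.

```python
from statistics import NormalDist

# Standard normal distribution N(0, 1)
std_normal = NormalDist(mu=0.0, sigma=1.0)

def prob_within(k: float, dist: NormalDist = std_normal) -> float:
    """Probability that a draw from `dist` lies within k standard deviations of its mean."""
    lo = dist.mean - k * dist.stdev
    hi = dist.mean + k * dist.stdev
    return dist.cdf(hi) - dist.cdf(lo)

for k in (1, 2, 3):
    print(f"within {k} sd: {prob_within(k):.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```

The same percentages come out for any mean and standard deviation, e.g. `prob_within(1, NormalDist(70, 10))`, which is why the curve can be labeled in units of $\sigma$.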

### Why do we care about the normal distribution?
The normal distribution is important because of the **Central Limit Theorem**. If you have a population with some (possibly unknown) mean $\mu$ and finite variance $\sigma^2$, and you repeatedly take large enough samples of size $n$, then the distribution of those samples' means $\bar{X}$ will be approximately normal, with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$, even if the population itself is not normal. The larger the sample size, the more the curve looks like a bell shape.
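
A quick simulation can illustrate the theorem. This sketch assumes a deliberately skewed population, an exponential distribution with mean 1 and standard deviation 1; the helper `sample_means`, the sample size and the number of samples are hypothetical choices. The sample means should center on 1, their spread should be close to $1/\sqrt{n}$, and about 68% of them should fall within one standard deviation of their center, as for a normal curve.

```python
import random
import statistics

def sample_means(n: int, num_samples: int, seed: int = 0) -> list[float]:
    """Return the means of `num_samples` samples of size n drawn from a skewed
    Exponential(rate=1) population (population mean 1, standard deviation 1)."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]

n = 50
means = sample_means(n=n, num_samples=5_000)
m, s = statistics.fmean(means), statistics.stdev(means)
print(f"mean of sample means: {m:.3f}")  # close to mu = 1
print(f"sd of sample means:   {s:.3f} vs sigma/sqrt(n) = {1 / n ** 0.5:.3f}")

# For a normal curve, about 68% of values fall within 1 sd of the mean.
share = sum(abs(x - m) <= s for x in means) / len(means)
print(f"share within 1 sd:    {share:.3f}")
```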

## What's a Z-score?
Closely related to the idea of a normal distribution is the unit of measurement called a **Z-Score**.

It describes how far an observation lies from the population mean. Thus, we need to know the population mean and standard deviation in order to use it.

Given an observation, the **Z-score** is the **number of standard deviations it lies away from the mean**. It lets us compare a single observation with the population as a whole. For example, if we are given one student's test score, it would be useful to know how he scored compared to the rest of the class.

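Formula for the Z-score of an observation $x$, given the population mean $\mu$ and standard deviation $\sigma$:
$$ z = \frac{x-\mu}{\sigma} $$

A positive $z$ means the observation lies above the mean, a negative $z$ below it. Converting values to Z-scores in this way is called **standardization**: if $X \sim N(\mu, \sigma^2)$, then $Z = \frac{X-\mu}{\sigma} \sim N(0, 1)$.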
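
A minimal sketch of the formula, using made-up class statistics (mean 70, standard deviation 10); `z_score` is a hypothetical helper. If we are also willing to assume the scores are normally distributed, the standard normal CDF turns the Z-score into the fraction of the class scoring below the student.

```python
from statistics import NormalDist

def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that observation x lies from the mean mu."""
    return (x - mu) / sigma

# Hypothetical class: mean test score 70, standard deviation 10.
z = z_score(85, mu=70, sigma=10)
print(z)  # 1.5

# Assuming normally distributed scores: fraction of the class scoring below 85.
print(f"{NormalDist().cdf(z):.3f}")  # 0.933
```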




## What is Statistical Significance?
Following an experiment or observation, we want to know whether the results are "significant". A result is statistically significant when it would be unlikely to arise from random chance alone if the null hypothesis were true. E.g., do we really know that the more a student chats with the prof, the higher her score will be? Or could the pattern just be random chance?

If a statistic, such as the sample mean $\bar{x}$, is statistically significant, then a value at least that extreme would be very **unlikely** if the null hypothesis were true.

## What is Significance Level?
The significance level, denoted by $\alpha$, is the probability of rejecting the null hypothesis, *given the null is assumed to be true*. We usually pre-set the significance level to a certain threshold, like 5%, before collecting results.
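
For a test whose statistic follows the standard normal distribution under $H_0$ (a z-test, which is the assumption here), choosing $\alpha$ fixes the cutoff beyond which we reject the null. This is an illustrative sketch; `critical_z` is a hypothetical helper. A two-tailed test splits $\alpha$ between both ends of the curve, while a one-tailed test puts all of it in one end.

```python
from statistics import NormalDist

def critical_z(alpha: float, two_tailed: bool = True) -> float:
    """Cutoff for a standard-normal test statistic at significance level alpha."""
    tail = alpha / 2 if two_tailed else alpha  # probability mass in each rejection tail
    return NormalDist().inv_cdf(1 - tail)

print(f"{critical_z(0.05):.3f}")                    # 1.960
print(f"{critical_z(0.05, two_tailed=False):.3f}")  # 1.645
```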

## What is P-value?
The p-value is the probability of obtaining a test statistic at least *as extreme (as small or as large)* as the one we actually observed, *given that the null is assumed to be true*. Therefore, if we get a reaaally tiny p-value, our data would be very surprising under the null hypothesis, which counts as evidence against it. (The p-value is *not* the probability that the null is true.) So a small p-value is a "good thing" if we want to support our alternative hypothesis.

We say the results of an experiment are "statistically significant" when the p-value is *less than or equal to* the significance level $\alpha$.

In other words, **reject the null hypothesis** in favor of the alternative hypothesis if 
$$ p \leq \alpha $$

<img src="https://www.simplypsychology.org/p-value.png">

In the graphic, the outcomes that would be very unlikely if the null were true lie in the tails of the distribution. If the observation falls where the dark green text indicates, then the p-value is the area of the shaded region beyond it. Since that area is smaller than the significance level (the observation lies past the 95% threshold), we can say the results are statistically significant.
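
Putting the pieces together, here is a sketch of a two-tailed one-sample z-test. It assumes the population standard deviation is known, and the function name and numbers (a hypothesized mean score of 70 with $\sigma = 10$, and 25 students averaging 74) are hypothetical. The p-value is the area in both tails beyond the observed $|z|$.

```python
from math import sqrt
from statistics import NormalDist

def z_test_p_value(sample_mean: float, mu0: float, sigma: float, n: int) -> tuple[float, float]:
    """Two-tailed one-sample z-test with known population sd sigma.
    Returns (z statistic, p-value)."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # P(|Z| >= |z|) under H0: the area in both tails beyond the observed statistic
    p = 2 * NormalDist().cdf(-abs(z))
    return z, p

alpha = 0.05  # chosen before looking at the data
# Hypothetical data: H0 says the mean score is 70 (sigma = 10); 25 students average 74.
z, p = z_test_p_value(sample_mean=74, mu0=70, sigma=10, n=25)
print(f"z = {z:.2f}, p = {p:.4f}")                        # z = 2.00, p = 0.0455
print("reject H0" if p <= alpha else "fail to reject H0")  # reject H0
```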

## Type I vs Type II Error

<img src="https://sphweb.bumc.bu.edu/otlt/mph-modules/bs/sas/sas4-onesamplettest/Figure1.PNG">

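If the null hypothesis is true, but we accidentally reject it anyway in favor of the alternative, then we made a **Type I Error** (a false positive). When the null is true, the probability of a Type I error is the significance level $\alpha$.

If the null hypothesis is false, but we fail to reject it, then we made a **Type II Error** (a false negative). Its probability is usually denoted $\beta$.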

> Example: A doctor is evaluating the results of a pregnancy test. The null hypothesis is that the patient is NOT pregnant. If the patient really is NOT pregnant (e.g., the test was run on a man), but we reject the null and say he is pregnant, then we made a Type I error. If the patient IS pregnant (e.g., the test was run on a woman who really is pregnant), but we fail to reject the null and say she is not pregnant, then we made a Type II error.

<img src="https://www.statisticssolutions.com/wp-content/uploads/2017/12/rachnovblog.jpg">
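
Both error types can be seen in a small simulation. This sketch assumes a two-tailed z-test with known $\sigma$, and every number in it ($H_0$ mean 70, $\sigma = 10$, $n = 25$, $\alpha = 0.05$, and an alternative true mean of 74) is a hypothetical choice; `rejection_rate` is an illustrative helper. When the data really come from the null, the share of rejections estimates the Type I error rate and should land near $\alpha$. When the true mean is 74 instead, the share of non-rejections estimates the Type II error rate.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def rejection_rate(true_mean: float, mu0: float = 70.0, sigma: float = 10.0,
                   n: int = 25, alpha: float = 0.05,
                   trials: int = 10_000, seed: int = 1) -> float:
    """Fraction of simulated experiments in which a two-tailed z-test rejects
    H0: mean == mu0, when the data really come from N(true_mean, sigma^2)."""
    rng = random.Random(seed)
    cutoff = NormalDist().inv_cdf(1 - alpha / 2)  # |z| >= cutoff  <=>  p <= alpha
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) >= cutoff:
            rejections += 1
    return rejections / trials

# H0 true (true mean 70): every rejection is a Type I error; rate should be near alpha.
print(f"Type I error rate:  {rejection_rate(true_mean=70):.3f}")
# H0 false (true mean 74): every failure to reject is a Type II error.
print(f"Type II error rate: {1 - rejection_rate(true_mean=74):.3f}")
```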

## What is Power?
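Power is, loosely, the counterpart of the significance level: $\alpha$ is the probability of wrongly rejecting a true null, while power is the probability of correctly rejecting a false one.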

**Power is the probability of rejecting the null hypothesis, given the null is false and the alternative is true.**

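Power is the probability of avoiding a Type II error: if $\beta$ denotes the probability of a Type II error, then power $= 1 - \beta$. In other words, it is the probability of detecting an effect that really exists.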

> In the pregnancy example above, power answers the question "if the patient really is pregnant, how likely is the test to say so?" E.g., a power of 99.99% would mean the test almost never misses a real pregnancy.

Real-world uses: power analysis is usually done *before* scientists run an experiment. An estimate of power tells us how large the sample size needs to be for the study to have a good chance of detecting an effect of a given size.
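
Here is a minimal power-analysis sketch for the simplest case, a two-tailed one-sample z-test with known $\sigma$. The helper names and example numbers are hypothetical, and real studies often use t-tests or dedicated power-analysis software instead. `z_test_power` computes the probability of rejecting $H_0$ when the true mean is off by `delta`; `sample_size_for_power` inverts this, using the usual approximation that drops the negligible far-tail term, to find the smallest sample size reaching a target power.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

def z_test_power(delta: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of a two-tailed one-sample z-test (known sigma) when the true mean
    differs from the hypothesized mean by delta."""
    cutoff = Z.inv_cdf(1 - alpha / 2)
    shift = delta * sqrt(n) / sigma  # under H1 the z statistic is ~ N(shift, 1)
    # Reject if z lands beyond either cutoff.
    return Z.cdf(shift - cutoff) + Z.cdf(-shift - cutoff)

def sample_size_for_power(delta: float, sigma: float,
                          power: float = 0.8, alpha: float = 0.05) -> int:
    """Smallest n reaching roughly the target power (ignores the tiny far-tail term)."""
    n = ((Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)) * sigma / delta) ** 2
    return ceil(n)

# Hypothetical study: true mean is 4 points away from H0, sigma = 10, alpha = 0.05.
print(f"power with n=25: {z_test_power(delta=4, sigma=10, n=25):.3f}")  # 0.516
n_needed = sample_size_for_power(delta=4, sigma=10, power=0.8)
print(n_needed)  # 50
print(f"power with n={n_needed}: {z_test_power(delta=4, sigma=10, n=n_needed):.3f}")
```

With these made-up numbers, 25 students give only about a coin-flip chance of detecting the effect, while 50 students push power just above the conventional 80% target.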