# Hypothesis Testing

> - Overview of Hypothesis Testing
> - Bayesian approach to Hypothesis Testing
> - An example of hypothesis testing involving coin-tossing

--- 

### Hypothesis Testing

A **hypothesis** is a statement about a population parameter.

We create two hypotheses:
- The **null hypothesis** $H_0$ 
- The **alternative hypothesis** $H_1$ or $H_a$

We decide which one to call the null depending on how the problem is set up.

### Hypothesis testing: Decision Rules

A **hypothesis testing procedure** gives us a rule to decide: 
- For which values of the test statistic do we **accept** $H_0$
- For which values of the test statistic do we **reject** $H_0$ and accept $H_1$

You may hear some people say that you can reject $H_0$ but that you never accept $H_0$. 

Here, this doesn't matter very much, since we're using hypothesis testing in order to decide which of two paths to take in the project.

### Hypothesis Testing: Bayesian Approach

In the **Bayesian approach** (example to follow), we don't get a decision boundary. Instead, we get a **posterior probability** that $H_0$ is true.


### Coin tossing example

You have two coins: 
- Coin 1 has a 70% probability of coming up heads
- Coin 2 has a 50% probability of coming up heads

Pick one coin without looking. 

Toss the coin 10 times and record the number of heads.

Given the number of heads you see, which of the two coins did you toss?

---

### Coin Tossing Example: Likelihood Ratio

Given what we know about coins 1 and 2, we can make a table of the probability of seeing x heads out of 10 tosses.  

![image.png](attachment:image.png)

We can now calculate a likelihood ratio, based on the number of heads we saw, when tossing the unidentified coin.

![image-2.png](attachment:image-2.png)

Suppose we saw three heads.

- P2(3)/P1(3) = 0.117/0.009 ≈ 13
- Coin 2 was about 13 times more likely than coin 1 to give us the observed output (3 heads)
- We call this the **likelihood ratio**.
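The probability table and the ratio for three heads can be rebuilt from the binomial formula. This is a minimal sketch using only the standard library; it takes coin 1 as the 70% coin and coin 2 as the 50% coin, as defined in the example, and the helper name `binom_pmf` is an illustrative choice rather than something from the original notes.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k heads in n tosses when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

N_TOSSES = 10
P_COIN_1 = 0.7  # coin 1: 70% chance of heads
P_COIN_2 = 0.5  # coin 2: 50% chance of heads

# Probability of seeing x heads under each coin, for x = 0..10
for x in range(N_TOSSES + 1):
    p1 = binom_pmf(x, N_TOSSES, P_COIN_1)
    p2 = binom_pmf(x, N_TOSSES, P_COIN_2)
    print(f"{x:2d} heads: P1 = {p1:.3f}  P2 = {p2:.3f}")

# Likelihood of the observed data (3 heads) under each coin
p1_3 = binom_pmf(3, N_TOSSES, P_COIN_1)
p2_3 = binom_pmf(3, N_TOSSES, P_COIN_2)
ratio_coin2_vs_coin1 = p2_3 / p1_3
print(f"P1(3) = {p1_3:.3f}, P2(3) = {p2_3:.3f}, ratio = {ratio_coin2_vs_coin1:.1f}")
```

With three heads, the 50% coin explains the data far better than the 70% coin, which is what the ratio of roughly 13 expresses.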

---

### Hypothesis Testing: Bayesian Interpretation

In the Bayesian interpretation, we need **priors** for each hypothesis.

- In this case, we randomly chose the coin to flip
- P($H_1$ = we chose coin 1) = 1/2 and 
- P($H_2$ = we chose coin 2) = 1/2

Since we have no way, before seeing the data, to determine the coin that was chosen, we just assign 1/2 to each.

Priors: P($H_1$) = P($H_2$) = 1/2

Updating the priors after seeing the data, 3 heads (Bayes' Rule):

![image-3.png](attachment:image-3.png)

The prior odds are multiplied by the likelihood ratio, which does not depend on the priors, to give the posterior odds.

The likelihood ratio tells us how we should update the priors in reaction to seeing a given set of data.
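The update can also be carried out numerically. The sketch below applies Bayes' Rule to the two-coin example with equal priors and three observed heads; the variable names are illustrative, and the likelihoods come from the binomial formula rather than from the table image.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k heads in n tosses when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Priors: the coin was picked at random
prior_coin_1 = 0.5
prior_coin_2 = 0.5

# Likelihood of the data (3 heads in 10 tosses) under each hypothesis
like_coin_1 = binom_pmf(3, 10, 0.7)
like_coin_2 = binom_pmf(3, 10, 0.5)

# Bayes' Rule: posterior = prior * likelihood / total probability of the data
evidence = prior_coin_1 * like_coin_1 + prior_coin_2 * like_coin_2
posterior_coin_1 = prior_coin_1 * like_coin_1 / evidence
posterior_coin_2 = prior_coin_2 * like_coin_2 / evidence

# Equivalent view: posterior odds = prior odds * likelihood ratio
prior_odds = prior_coin_2 / prior_coin_1
posterior_odds = posterior_coin_2 / posterior_coin_1

print(f"P(coin 1 | 3 heads) = {posterior_coin_1:.3f}")
print(f"P(coin 2 | 3 heads) = {posterior_coin_2:.3f}")
print(f"posterior odds (coin 2 : coin 1) = {posterior_odds:.1f}")
```

Changing the priors (for example, if one coin were more likely to be picked) changes the posterior but leaves the likelihood ratio untouched.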

---

### Type I vs Type II Errors

> - Hypothesis testing terminology including Type I and Type II errors
> - Examples of Hypothesis Tests in practice

### Neyman-Pearson Interpretation

The **Neyman-Pearson paradigm** (1933) is non-Bayesian.

This gives an up or down vote on $H_0$ vs $H_1$.

Terminology: 

![image-4.png](attachment:image-4.png)

**Power** of a test = 1 - P(Type II error)
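To make these quantities concrete, the sketch below computes them for the two-coin example. The setup is an assumption made for illustration and is not part of the original notes: $H_0$ is "we tossed the 50% coin", $H_1$ is "we tossed the 70% coin", and we reject $H_0$ when we see 8 or more heads in 10 tosses.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k heads in n tosses when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

N_TOSSES = 10
REJECT_AT = 8  # hypothetical decision rule: reject H0 if heads >= 8

def prob_reject(p_heads):
    """Probability that the rule rejects H0 when the true P(heads) is p_heads."""
    return sum(binom_pmf(k, N_TOSSES, p_heads) for k in range(REJECT_AT, N_TOSSES + 1))

# Type I error: H0 (50% coin) is true, but we reject it
type_1_error = prob_reject(0.5)

# Type II error: H1 (70% coin) is true, but we fail to reject H0
type_2_error = 1 - prob_reject(0.7)

# Power = 1 - P(Type II error)
power = 1 - type_2_error

print(f"P(Type I error)  = {type_1_error:.3f}")
print(f"P(Type II error) = {type_2_error:.3f}")
print(f"Power            = {power:.3f}")
```

Lowering the rejection cutoff would raise the power but also raise the Type I error rate, which is the trade-off the terminology is meant to capture.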

---
### Example: Customer Churn

Customer **churn** occurs when a customer leaves a company.

Data related to churn may include a target variable for whether or not the customer left.

Features could include:
- The length of time as a customer
- The type and amount purchased
- Other customer characteristics

Churn prediction is often approached by predicting a score for individuals that estimates the probability the customer will leave. 

### Customer Churn: Type I vs Type II Errors

Suppose we use data on customer characteristics to predict who will churn over the next year.

In our data, customers who have been with the company for longer are less likely to churn.

This could be due to an underlying effect, or due to chance:

- A **Type I error** occurs when this effect is due to chance, but we find it to be significant in the model. 

- A **Type II error** occurs when we ascribe the effect to chance, but the effect is non-coincidental.


### Hypothesis Testing: Terminology

The likelihood ratio is called a **test statistic**: we use it to decide whether to accept/reject $H_0$.

The **rejection region** is the set of values of the test statistic that lead to rejection of $H_0$.

The **acceptance region** is the set of values of the test statistic that lead to acceptance of $H_0$.

The **null distribution** is the test statistic's distribution when the null is true.

### Hypothesis Testing: Marketing Intervention

Testing the effectiveness of marketing interventions:
- For a new direct mail marketing campaign to existing customers, the null hypothesis ($H_0$) states that the campaign does not impact purchasing. 
- The alternative hypothesis ($H_1$) suggests it has an impact.

### Hypothesis Testing: Website Layout

Testing a change in website layout:
- For a proposed change to a web layout, we may test a null hypothesis ($H_0$) that the change has no impact on traffic.
- Here, we would look for evidence to reject the null in favor of an alternative hypothesis ($H_1$) that there is an impact on traffic.

### Hypothesis Testing: Product Quality/Size

Testing whether a product meets expected size thresholds:
- Suppose a product is produced in various factories, with expected size S.
- To confirm that the product size meets the standard within a margin of error, the company might: 
    - Randomly sample from each production source, 
    - Establish $H_0$ (product size is not significantly different from S),
    - and $H_1$ (there is a significant deviation in product size),
    - Test whether $H_0$ can be rejected in favor of $H_1$, based on the observed mean and standard deviation.
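The steps above can be sketched as a simple two-sided test on the sample mean. This is only an illustration under stated assumptions: the measurements are made-up values, the target size of 10.0 is hypothetical, and a normal approximation (z-test) is used, which is reasonable for a sample of this size but would normally be replaced by a t-test for small samples.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def z_test_mean(sample, target):
    """Two-sided z-test of H0: the population mean equals target.

    Returns the z statistic and its p-value (normal approximation).
    """
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    z = (fmean(sample) - target) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical measurements from one production source (100 items)
sizes = [10.0, 10.2, 9.8, 10.1, 9.9] * 20

ALPHA = 0.05  # significance level, chosen before looking at the data

# H0: mean size equals the expected size S = 10.0
z_on_target, p_on_target = z_test_mean(sizes, target=10.0)

# Same data tested against a different hypothetical standard, S = 10.05
z_off_target, p_off_target = z_test_mean(sizes, target=10.05)

print(f"S = 10.00: z = {z_on_target:.2f}, p = {p_on_target:.3f}, reject H0: {p_on_target < ALPHA}")
print(f"S = 10.05: z = {z_off_target:.2f}, p = {p_off_target:.4f}, reject H0: {p_off_target < ALPHA}")
```

In practice the same test would be run per production source, which ties into the multiple-comparison concerns covered under power and sample size.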

--- 
### Significance Level and P-Values
> - Hypothesis testing: Significance level and P-values
> - Power and sample size considerations 

### Significance Level and P-Values

We know the distribution of the test statistic under the null hypothesis.

To get a rejection region, we calculate the test statistic. 

We will choose, before testing the data, the level at which we will reject the null hypothesis.

The **significance level ($\alpha$)** is a probability threshold chosen in advance: the null hypothesis is rejected when the p-value falls below it.

We must choose an $\alpha$ before computing the test statistic.
If we don't, we might be accused of **p-hacking**.

Choosing $\alpha$ is somewhat arbitrary, but common choices are 0.01 and 0.05.

Important terminology:

- The **p-value**: the smallest significance level at which the null hypothesis would be rejected.
- The **confidence interval**: the values of the statistic for which we accept the null. 

![image-5.png](attachment:image-5.png)

### Coin Tossing Example: P-Value

![image-6.png](attachment:image-6.png)

Suppose we saw three heads in 10 tosses:

- P(3 or fewer heads) = P(0 heads) + P(1 head) + P(2 heads) + P(3 heads) ≈ 17%
- Under the null hypothesis, a value this extreme occurs 17% of the time.

### Hypothesis Testing: Coin Tossing

In the coin tossing example:
- $H_0$: The coin is fair and P(H) = 0.5
- $H_1$: The coin is not fair and P(H) < 0.5

How can we test the null hypothesis if we observe 3 heads in 10 flips?

Testing the null hypothesis:
- Under $H_0$, the number of heads is distributed binom(10, 0.5)
- Choose a **p-value cutoff** (the significance level), say 5%
- Evaluate the CDF of a binom(10, 0.5) at 3 heads.
- CDF = 17.2% (above our cutoff)
- This is > 5%, so we don't reject $H_0$.
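The procedure above can be reproduced in a few lines. This is a minimal sketch using only the standard library; the helper name `binom_cdf` and the variable names are illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ binom(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

N_TOSSES = 10
OBSERVED_HEADS = 3
ALPHA = 0.05  # p-value cutoff, chosen before looking at the data

# One-sided test (H1: P(H) < 0.5), so the p-value is the lower tail:
# the probability of 3 or fewer heads if the coin is fair
p_value = binom_cdf(OBSERVED_HEADS, N_TOSSES, 0.5)
reject_null = p_value < ALPHA

print(f"p-value = {p_value:.4f}")
print(f"reject H0 at the {ALPHA:.0%} level: {reject_null}")
```

Three heads out of ten is not unusual enough under a fair coin to reject $H_0$ at the 5% level.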

---

### F-Statistic

$H_0$: The data can be modeled by setting all betas to zero.

Reject the null if the p-value is small enough.

![image-7.png](attachment:image-7.png)

### Power and Sample Size

If you run many 5% significance tests looking for a significant result, the chance of making at least one type I error increases. 

The probability of at least one type I error is approximately $1 - (1 - 0.05)^{\#\text{tests}}$

This is roughly $0.05 \times (\#\text{tests})$, if you have 10 or fewer tests.

![image-8.png](attachment:image-8.png)
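The growth in the chance of at least one false positive can be checked numerically. The sketch below assumes the tests are independent and that every null hypothesis is true; the function name `familywise_error` is an illustrative choice.

```python
ALPHA = 0.05  # significance level of each individual test

def familywise_error(n_tests, alpha=ALPHA):
    """Probability of at least one Type I error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

# Compare the exact value with the linear approximation alpha * n_tests
for n_tests in (1, 2, 5, 10, 20, 50):
    exact = familywise_error(n_tests)
    approx = ALPHA * n_tests
    print(f"{n_tests:3d} tests: exact = {exact:.3f}, approx = {approx:.3f}")
```

The linear approximation always overstates the exact value, and the gap widens as the number of tests grows.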

### Power: Bonferroni Correction

The **Bonferroni correction** says: "choose $p_{\text{threshold}}$ so that the probability of making a Type I error (assuming no effect) is 5%."

Typically choose: $p_{\text{threshold}} = 0.05 / (\#\text{tests})$

The Bonferroni correction allows the probability of a type I error to be controlled, but at the cost of power.

Effects need to be larger, or the tests need larger samples, to be detected. 

Best practice is to limit the number of comparisons to a few well-motivated cases. 
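A short check of the Bonferroni threshold shows that dividing 0.05 by the number of tests keeps the overall chance of a false positive at or just below 5%. As before, this sketch assumes independent tests with every null hypothesis true, and the function names are illustrative.

```python
TARGET_ERROR = 0.05  # desired overall probability of any Type I error

def bonferroni_threshold(n_tests, target=TARGET_ERROR):
    """Per-test p-value threshold under the Bonferroni correction."""
    return target / n_tests

def familywise_error(n_tests, per_test_alpha):
    """Probability of at least one Type I error across n independent tests."""
    return 1 - (1 - per_test_alpha) ** n_tests

for n_tests in (1, 5, 10, 100):
    threshold = bonferroni_threshold(n_tests)
    uncorrected = familywise_error(n_tests, TARGET_ERROR)
    corrected = familywise_error(n_tests, threshold)
    print(f"{n_tests:3d} tests: threshold = {threshold:.4f}, "
          f"uncorrected = {uncorrected:.3f}, corrected = {corrected:.4f}")
```

The per-test threshold shrinks quickly as tests are added, which is why each individual effect becomes harder to detect.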

---

### Correlation vs Causation 

> - Correlation vs Causation
> - Confounding variables
> - Examples of spurious correlations

### Does it Rain More on Cooler Days?

We associate rain with cold weather.

Does it actually rain more when days are cooler?

Maybe it depends on where you are. 

Some places have summer monsoons, so maybe as it gets warmer there, it rains more. 

- Warmer weather increases evaporation, which can increase humidity.
   - In warm weather, there is water in the air to form precipitation.
   - This mechanism would suggest warmer weather --> more rain. 

- Cooler air can hold less water vapor, so it reaches its dew point more easily.
    - This suggests that if humid air enters an area and cools, it will rain.
    - This mechanism would suggest cooler weather --> more rain.

### How correlations are important 

If two variables X and Y are correlated, then X is useful for predicting Y.

If we are trying to model Y, and we find things that are correlated with Y, we may improve the modelling.

We should be careful about changing X with the hope of changing Y.

X and Y can be correlated for many reasons:

- X causes Y (what we want)
- Y causes X (mixing up cause-and-effect)
- X and Y are both caused by something else (confounding)
- X and Y aren't really related, we just got unlucky in the sample (spurious)

### Mixing up cause and effect 

1. Student test scores are **positively correlated** with amount of time studied. 

This **doesn't** mean we should get students to study more by curving everyone's grades upward (this would likely have the opposite effect!). It is more likely that studying helps students learn material, so studying causes better performance. 

2. Customer satisfaction is **negatively correlated** with customer service call volume. 

This **doesn't** mean that we should remove or hide the customer service numbers, with the hope of improving customer satisfaction. 

### Confounding Variables

A **confounding variable** is something that causes both X and Y to change. 

X and Y are correlated even though X doesn't cause Y, and Y doesn't cause X.

Example of confounding variable:

1. The number of annual car accidents and the number of people named "John" are positively correlated. (both are correlated with **population size**)

2. The amount of ice-cream sold and the number of drownings in a week are positively correlated. (both are positively correlated with **temperature**)

3. Number of factories a chip manufacturer owns and the number of chips sold are positively correlated. (both are driven by **demand from the market**)
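A small simulation shows how a confounder produces correlation without causation. Everything here is simulated under assumed, made-up coefficients: weekly temperature drives both ice-cream sales and drownings, and neither of those two affects the other. The helper `pearson_r` is written out by hand so the sketch needs only the standard library.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable
N_WEEKS = 1000

# The confounder: weekly average temperature (hypothetical units)
temperature = [rng.gauss(20, 5) for _ in range(N_WEEKS)]

# Both outcomes depend on temperature plus independent noise
ice_cream = [3.0 * t + rng.gauss(0, 5) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 1) for t in temperature]

r_raw = pearson_r(ice_cream, drownings)

# Remove the temperature effect (known here because we simulated it)
ice_cream_resid = [i - 3.0 * t for i, t in zip(ice_cream, temperature)]
drownings_resid = [d - 0.5 * t for d, t in zip(drownings, temperature)]
r_adjusted = pearson_r(ice_cream_resid, drownings_resid)

print(f"correlation before adjusting for temperature: {r_raw:.2f}")
print(f"correlation after adjusting for temperature:  {r_adjusted:.2f}")
```

With real data the temperature effect would have to be estimated, for example by regression, rather than subtracted using known coefficients.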

### Spurious Correlations 

These are correlations that are just "coincidences" due to the particular sample, and would probably not hold on larger samples / different samples.

![image-9.png](attachment:image-9.png)

![image-10.png](attachment:image-10.png)
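Spurious correlations are easy to manufacture by searching through many unrelated variables with few observations each. The sketch below generates independent random series (so any true correlation is zero by construction) and reports the strongest pairwise correlation found; the series counts and lengths are arbitrary choices for illustration.

```python
import random
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def max_abs_correlation(n_series, n_points, rng):
    """Largest |correlation| among all pairs of independent random series."""
    series = [[rng.gauss(0, 1) for _ in range(n_points)] for _ in range(n_series)]
    return max(abs(pearson_r(a, b)) for a, b in combinations(series, 2))

rng = random.Random(0)  # fixed seed so the run is repeatable

# 40 unrelated variables, only 10 observations each
best_small_sample = max_abs_correlation(n_series=40, n_points=10, rng=rng)

# Same number of variables, but 500 observations each
best_large_sample = max_abs_correlation(n_series=40, n_points=500, rng=rng)

print(f"strongest correlation with 10 points:  {best_small_sample:.2f}")
print(f"strongest correlation with 500 points: {best_large_sample:.2f}")
```

With short series, some pair almost always looks strongly related; with longer series the strongest coincidental correlation shrinks toward zero.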
