In [1]:
import math

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from scipy import stats


## Hypothesis Testing

- **alpha**: $\alpha$: 1 - confidence level (95% confidence level -> $\alpha = .05$)
- **null hypothesis**: $H_0$: the "status quo"
- **alternative hypothesis**: $H_a$: the opposite of $H_0$; the alternative claim

We either *reject* or *fail to reject* the null hypothesis.

**p-value**

- $P(\text{data} \mid H_0)$
- The probability of seeing evidence at least as extreme as what we have at hand, assuming the null hypothesis is true
- In other words: if the null hypothesis is true, how likely is it that we would observe our data?

if $p < \alpha$: we reject $H_0$

if $p \geq \alpha$: we fail to reject $H_0$
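
The decision rule above can be written as a small helper. This is a minimal sketch; the name `decide` and the default `alpha=0.05` are illustrative choices, not part of any library.

```python
def decide(p, alpha=0.05):
    """Apply the decision rule: reject H0 only when p is strictly below alpha."""
    if p < alpha:
        return "reject H0"
    return "fail to reject H0"


print(decide(0.01))  # reject H0
print(decide(0.20))  # fail to reject H0
print(decide(0.05))  # fail to reject H0 (p equal to alpha is not below alpha)
```

Note that the function never returns "accept H0": a large p-value only means the data did not give us enough evidence to reject it.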

- Are the average grades in web development vs data science classes different?

    $H_0$: The average grades for webdev and data science are the same.
    
    $H_a$: The average grades for webdev and data science are not the same.
    
- Is there a relationship between how early a student shows up to class and their grade?

    $H_0$: there is no relationship between how early a student comes to class and their grade
    
    $H_a$: there is a relationship between how early a student comes to class and their grade
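
As a rough sketch, each of the two questions above maps onto a standard SciPy test: a two-sample t-test for comparing averages, and a Pearson correlation test for a linear relationship. The grades and arrival times below are randomly generated stand-ins, not real class data, and all variable names are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05

# Made-up grades for the two classes (illustration only).
webdev_grades = rng.normal(loc=80, scale=8, size=40)
ds_grades = rng.normal(loc=83, scale=8, size=40)

# H0: the average grades are the same; Ha: they are not the same (two-sided).
t_stat, p_means = stats.ttest_ind(webdev_grades, ds_grades)
print("different averages?", "reject H0" if p_means < alpha else "fail to reject H0")

# Made-up minutes-early values and grades for 40 students.
minutes_early = rng.uniform(0, 30, size=40)
grades = 75 + 0.3 * minutes_early + rng.normal(scale=6, size=40)

# H0: no linear relationship; Ha: there is a linear relationship.
r, p_corr = stats.pearsonr(minutes_early, grades)
print("relationship?", "reject H0" if p_corr < alpha else "fail to reject H0")
```

Pearson's test only looks for a linear relationship; that choice is an assumption of this sketch, since the hypotheses above do not say what kind of relationship is meant.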



> Are the plants in the classroom helping?

- survey scores
    - $H_0$: Survey scores before and after plants were introduced are no different.
    - $H_a$: Survey scores improved after plants were introduced in the classroom.


- True Positive
    - small p-value (e.g. .001), less than alpha
    - reject $H_0$
    - avg survey score before plants: 3.2
    - avg survey score after plants: 4.5
- False Positive
    - small p-value
    - reject $H_0$
    - the scores went up, but not because of the plants: we cancelled a quiz right before everyone took the surveys
- True Negative
    - higher p-value
    - fail to reject $H_0$
    - avg survey score before plants: 3.9
    - avg survey score after plants: 4.1
- False Negative
    - higher p-value
    - fail to reject $H_0$
    - avg survey score before plants: 3.5
    - avg survey score after plants: 3.3
    - the plants really did help, but everyone took the second survey while they were in the middle of the Tableau project
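
The plants question could be tested along these lines. This is a sketch under several assumptions: the scores are simulated on an assumed 1 to 5 scale around the "True Positive" averages above, the before and after groups are treated as independent samples, and the `alternative` keyword requires SciPy 1.6 or newer.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05

# Simulated survey scores (not real data), clipped to an assumed 1-5 scale.
before = np.clip(rng.normal(3.2, 0.8, size=30), 1, 5)
after = np.clip(rng.normal(4.5, 0.8, size=30), 1, 5)

# One-sided test, because Ha says scores *improved* after the plants arrived.
t_stat, p = stats.ttest_ind(after, before, alternative="greater")
print(f"t = {t_stat:.2f}, p = {p:.4g}")
print("reject H0" if p < alpha else "fail to reject H0")
```

The test itself cannot tell a true positive from a false positive: a cancelled quiz would produce the same small p-value as plants that really help.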

---

> Do houses with even street numbers sell for more money?


- $H_0$: the property values for even-numbered houses are the same as or lower than the overall average property value
- $H_a$: the property values for even-numbered houses are higher than the overall average property value

- True Positive
    - low p-value
    - reject $H_0$
    - avg property value for even # houses -> \$200,000
    - avg property value overall -> \$150,000
- False Positive
    - low p-value
    - reject $H_0$
    - a significant difference in property values
    - we found the overall average property value from the county website
    - we calculated the average property value for even # houses based on sampling the dominion
- False Negative
    - fail to reject $H_0$
    - high p-value
    - the average house price for even numbered houses is not higher than the overall average
    - for example: response bias on an online survey
- True Negative
    - high p-value
    - fail to reject $H_0$
    - avg price for even # houses: \$145,000
    - overall avg house price: \$140,000
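
Because one group is compared against a single known overall average, a one-sample, one-sided t-test is a natural fit here. The sketch below uses simulated prices based on the "True Positive" figures above; the sample size, the spread, and the variable names are assumptions, and the `alternative` keyword requires SciPy 1.6 or newer.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05

# Treated as a known value, e.g. looked up on the county website.
overall_avg = 150_000

# Simulated sale prices for a sample of even-numbered houses (not real data).
even_prices = rng.normal(loc=200_000, scale=40_000, size=50)

# One-sample, one-sided test. Ha: mean of even-numbered houses > overall average.
t_stat, p = stats.ttest_1samp(even_prices, overall_avg, alternative="greater")
print(f"t = {t_stat:.2f}, p = {p:.4g}")
print("reject H0" if p < alpha else "fail to reject H0")
```

If the sample of even-numbered houses comes from a single unrepresentative neighborhood, this code still reports a small p-value, which is exactly the false positive scenario described above.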

## Exercise Review

2 x 2 categories

- False / True: Whether we concluded the right thing
- Positive / Negative: Whether we concluded there is something happening (+) or there isn't something happening (-)
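
The 2 x 2 grid can be expressed as a tiny function. This is only an illustrative sketch; `classify_outcome` and its parameter names are made up here.

```python
def classify_outcome(rejected_h0, effect_is_real):
    """Name the outcome given what we concluded and what is actually true."""
    # True / False: did our conclusion match reality?
    correctness = "True" if rejected_h0 == effect_is_real else "False"
    # Positive / Negative: did we conclude something is happening?
    kind = "Positive" if rejected_h0 else "Negative"
    return f"{correctness} {kind}"


for rejected in (True, False):
    for real in (True, False):
        label = classify_outcome(rejected, real)
        print(f"rejected H0={rejected}, effect real={real}: {label}")
```

In practice we only ever know `rejected_h0`; whether the effect is real is the thing we are trying to infer, which is why the four outcomes are a thinking tool and not something we can compute from the data.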

> Has the network latency gone up since we switched ISPs?

- $H_0$: the network latency is the same or lower since the ISP switch.
- $H_a$: the new ISP has higher latency
- True Positive: We reject $H_0$; our latency is ~300ms on average with the new ISP, and it was ~50ms with the old one
- False Positive: We reject $H_0$, our data says the latency is higher; our data is biased, we took recordings between 6pm and 8pm when everyone is streaming media
- False Negative: We fail to reject $H_0$: we took recordings between 6am and 7am when no one is on the network
- True Negative: Our data says there isn't much difference in latency, we fail to reject $H_0$

> Is the website redesign any good?

- $H_0$: the number of click throughs has stayed the same or decreased since the site redesign
- $H_a$: there are more click throughs since the site redesign
- TP: reject $H_0$, we conclude that the redesign helped, and it really did
- FP: reject $H_0$, we conclude that the redesign helped, but really it didn't; we only showed the new website to previously engaged customers
- FN: fail to reject $H_0$, we conclude the redesign did not help, but really it did; we only showed the new website to people who provided negative feedback
- TN: fail to reject $H_0$, we conclude the redesign did not help, and it really didn't

> Is our TV ad driving more sales?

We're Frito-Lay, advertising Cool Ranch Doritos.

- $H_0$: Cool Ranch Doritos sales do not increase while we're running an advertisement
- $H_a$: Sales for Cool Ranch Doritos increase while we're running an ad
- FP: reject $H_0$; we conclude that the advertisement helped sales, but really it didn't
- FN: fail to reject $H_0$; we conclude that the ad did not help, but really it did
- TP: reject $H_0$; we conclude that the ad helped, and it did
- TN: fail to reject $H_0$; we conclude the ad did not boost sales, and really it didn't

> suppose we have a production issue while the ad is running -- there's not enough product to sell, sales are low

- FN: conclude that the ad didn't help, but really it did, we just didn't have enough inventory

> suppose there's a global pandemic, and everyone is staying inside while the ad is running. There's increased sales of cool ranch doritos.

- FP: conclude that the ad helped, when really it didn't
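
Confounders like the ones above are not the only source of false positives. Even with clean data, a test run at $\alpha = .05$ rejects a true $H_0$ about 5% of the time purely by chance. The simulation below is a sketch with invented sales figures: both groups are drawn from the same distribution, so every rejection is a false positive.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
alpha = 0.05
n_experiments, n_days = 2000, 30

# Simulated daily sales with and without the ad, drawn from the SAME
# distribution, so H0 ("the ad does not increase sales") is true by construction.
sales_with_ad = rng.normal(loc=1000, scale=100, size=(n_experiments, n_days))
sales_without_ad = rng.normal(loc=1000, scale=100, size=(n_experiments, n_days))

# One one-sided t-test per simulated experiment (one experiment per row).
_, p_values = stats.ttest_ind(
    sales_with_ad, sales_without_ad, axis=1, alternative="greater"
)

# Fraction of experiments where we would wrongly conclude the ad helped.
false_positive_rate = np.mean(p_values < alpha)
print(f"false positive rate: {false_positive_rate:.3f}")  # roughly alpha
```

Lowering `alpha` reduces how often this happens, at the cost of making real effects harder to detect (more false negatives).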