# Hypothesis Testing

## Scenarios

- Chemistry - do inputs from two different barley fields produce different
yields?
- Astrophysics - do star systems with near-orbiting gas giants have hotter
stars?
- Economics - demography, surveys, etc.
- Medicine - BMI vs. Hypertension, etc.
- Business - which ad is more effective given engagement?

![img1](./img/img1.png)

![img2](./img/img2.png)

### Null Hypothesis / Alternative Hypothesis Structure

![img3](./img/img3.png)                  

### The Null Hypothesis

![gmonk](https://vignette.wikia.nocookie.net/villains/images/2/2f/Ogmork.jpg/revision/latest?cb=20120217040244) There is NOTHING, **no** difference.
![the nothing](https://vignette.wikia.nocookie.net/theneverendingstory/images/a/a0/Bern-foster-the-neverending-story-clouds.jpg/revision/latest?cb=20160608083230)

### The Alternative hypothesis

![difference](./img/giphy.gif)

### Error

- TYPE I: False positive (incorrectly reject the null hypothesis)
- TYPE II: False negative (incorrectly fail to reject the null hypothesis)
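
To see what a Type I error rate means in practice, here is a minimal simulation sketch. It assumes a two-tailed z-test with a known standard deviation of 1 and a null hypothesis that is actually true, so every rejection is a false positive. The function name `type_i_error_rate`, the sample size, and the number of trials are illustrative choices, not part of the original material.

```python
import random
from statistics import NormalDist

def type_i_error_rate(alpha=0.05, n=30, trials=2000, seed=0):
    """Fraction of experiments that wrongly reject a true null hypothesis."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # The null is true here: data really come from a mean-0, sd-1 population
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(sample) / n) / (1 / n ** 0.5)
        p = 2 * (1 - std_normal.cdf(abs(z)))  # two-tailed p-value
        if p < alpha:
            rejections += 1  # a false positive
    return rejections / trials

rate = type_i_error_rate()
print(rate)  # should land close to alpha
```

The observed rejection rate hovers around the chosen α, which is exactly what the significance level promises.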

### Choosing the right error rate

- Alpha, α
- Sigma, σ
- Depends on field of study, 0.00001 ≤ α ≤ 0.2
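
As a rough illustration of how α and σ thresholds relate, the sketch below converts a significance level into a critical z value and a "number of sigmas" into a one-tailed probability. It relies only on `statistics.NormalDist` from the standard library; the helper names and the example values (including the 5σ threshold) are assumptions chosen for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def critical_z(alpha, two_tailed=False):
    """Critical z value for a given significance level."""
    tail = alpha / 2 if two_tailed else alpha
    return std_normal.inv_cdf(1 - tail)

def tail_probability(n_sigma):
    """One-tailed probability of landing more than n_sigma above the mean."""
    return 1 - std_normal.cdf(n_sigma)

for alpha in (0.2, 0.05, 0.01):
    one = round(critical_z(alpha), 3)
    two = round(critical_z(alpha, two_tailed=True), 3)
    print(alpha, one, two)

print(tail_probability(5))  # the alpha implied by a 5-sigma threshold
```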

### T-test

Why use it?
- Sometimes the population standard deviation is irrelevant, and sometimes it’s unknown (we’ll get to the different types of t-test later).
- Sometimes a sample is too small to be confident that it’s an accurate representation of reality.

### T vs Z (again)

A t-test is like a modified z-test:
- Penalize for small sample size - “degrees of freedom”
- Use sample std. dev. s to estimate population σ
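
For reference, the two statistics can be written side by side. These are the standard textbook one-sample forms, given here as a sketch; the figures in this section may use slightly different notation.

$$
z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}, \qquad
t = \frac{\bar{x} - \mu}{s / \sqrt{n}}, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}, \qquad
\mathrm{df} = n - 1
$$

The only difference in the statistic itself is that $s$ replaces $\sigma$. The small-sample penalty comes from comparing $t$ against a t-distribution with $n - 1$ degrees of freedom instead of the normal distribution.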

![img5](./img/img5.png)

### T and Z in detail
![img4](./img/img4.png)

### T-value table

![img6](./img/img6.png)

### P-Values
![picjellybeans](https://imgs.xkcd.com/comics/significant.png)

### Language of Hypothesis Testing

If p < α : we *reject* the null hypothesis<br>
If p > α : we *fail to reject* the null hypothesis
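
The wording rule above can be sketched as a tiny helper. The function name `decision` and the second example p-value are hypothetical. Treating the boundary case p = α as "fail to reject" is an assumption, since the rule above only covers strict inequalities.

```python
def decision(p_value, alpha=0.05):
    """Return the conventional wording for a test outcome."""
    if p_value < alpha:
        return "reject the null hypothesis"
    # Never "accept the null": the data just were not strong enough
    return "fail to reject the null hypothesis"

print(decision(0.1217))  # p-value scipy reports in Question 1
print(decision(0.003))   # hypothetical p-value
```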


Language is **important**

### What if the experiment fails?

- Don’t throw out failed experiments
- A non-significant result only tells you that this methodology, with this data, does not produce significant results. A follow-up may need:
  - More data
  - More time
  - More details

### T-test success recipe

Regardless of the type of t-test you are performing, there are five main steps:

- Set up null and alternative hypotheses

- Choose a significance level

- Calculate the test statistic

- Determine the critical value or the p-value (find the rejection region)

- Compare the t-value with the critical t-value (or the p-value with α) to reject or fail to reject the null hypothesis.
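
The five steps can be sketched as a short script. This is a minimal illustration, not a definitive implementation: it uses the data from Question 1 below, assumes a one-tailed alternative, and hard-codes a critical value read from a standard t-table. The helper name `one_sample_t` is hypothetical.

```python
from statistics import mean, stdev

def one_sample_t(data, mu):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(data)
    t = (mean(data) - mu) / (stdev(data) / n ** 0.5)
    return t, n - 1

# Step 1: H0: mean <= 85, HA: mean > 85 (one-tailed, an assumption)
data = [90, 100, 110]
# Step 2: choose a significance level
alpha = 0.05
# Step 3: calculate the test statistic
t_stat, df = one_sample_t(data, mu=85)
# Step 4: critical value from a t-table (df = 2, one-tailed alpha = 0.05)
t_crit = 2.920
# Step 5: compare and phrase the conclusion carefully
verdict = "reject H0" if t_stat > t_crit else "fail to reject H0"
print(round(t_stat, 3), df, verdict)
```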

# Question 1
Is this sample any different from the population?
- Population mean = 85
- Sample = [90,100,110]

#### Using `scipy`

In [1]:
from scipy.stats import ttest_1samp
data = [90,100,110]
ttest_1samp(data,85)

Ttest_1sampResult(statistic=2.5980762113533156, pvalue=0.12168993434632014)

We have a sample that might be different from the population. How do we formulate our hypotheses?

H0: 

HA: 

#### Manual implementation

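In [16]:
import numpy as np
from statistics import stdev

data = [90,100,110]
mu = 85          # population mean
n = len(data)    # sample size
s = stdev(data)  # sample standard deviation
df = n-1         # degrees of freedom

# t statistic: distance of the sample mean from mu, in standard errors
t = (np.mean(data)-mu)/(s/(n**.5))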

Before we check the result, we should decide how confident we want to be in it.

A good rule of thumb is 95% confidence in general, though there are times when a different level is appropriate.

So here we have 2 degrees of freedom and a 95% confidence level (α = 0.05). Let's pull up our t-table and get working!

We see we have a critical value (CV) of 2.92. This is the one-tailed value for α = 0.05 with 2 degrees of freedom; the two-tailed value would be 4.303. Let's print our t value to see if it is high enough to reject the null.

In [17]:
print(t)
print(df)

2.5980762113533156
2


2.598 < 2.92, so we fail to reject (FTR) the null hypothesis.

Failing to reject the null means that the data do not support the alternative hypothesis at the confidence level we chose. It does not prove that the null hypothesis is true.
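
Instead of reading a printed t-table, the same numbers can be obtained from `scipy.stats.t`. The sketch below assumes SciPy is installed (it is already used above) and uses the t value computed in this question. It shows both the one-tailed and two-tailed readings, since the choice changes the critical value.

```python
from scipy.stats import t as t_dist

df = 2
t_stat = 2.5980762113533156  # t value computed above
alpha = 0.05

# Critical values: the point that leaves alpha (or alpha / 2) in the upper tail
cv_one_tailed = t_dist.ppf(1 - alpha, df)      # about 2.920
cv_two_tailed = t_dist.ppf(1 - alpha / 2, df)  # about 4.303

# p-values: sf is the upper-tail area beyond t_stat
p_one_tailed = t_dist.sf(t_stat, df)
p_two_tailed = 2 * p_one_tailed  # same quantity ttest_1samp reports by default

print(cv_one_tailed, cv_two_tailed)
print(p_one_tailed, p_two_tailed)
```

Either way the conclusion is the same here: t is below both critical values, and both p-values are above 0.05.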

# Question 2

Since the mean of the sample is higher than the population mean, our alternative hypothesis is that the sample has significantly higher prices.

Since our sample size is large enough (n = 30), we can use either a T or a Z test. Let's work both for some practice!

H0: XBar <= mu

HA: XBar > mu

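In [44]:
mu = 180    # population mean
n = 30      # sample size
SD = 15     # population standard deviation (used for Z)
df = n-1    # degrees of freedom
XBar = 205  # sample mean
SSD = 20    # sample standard deviation (used for t)

t = (XBar-mu)/(SSD/(n**.5))
Z = (XBar-mu)/(SD/(n**.5))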

Critical value at 1% for T (one-tailed, df = 29): 2.462

With Z scores we can use the critical region approach and look for the value beyond which the upper-tail area is 1%.
That CV is 2.33

In [45]:
print(t)
print(df)
print(Z)

6.846531968814576
29
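9.128709291752768

Both statistics are far above their critical values, so we reject the null hypothesis at the 1% level.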


# Question 3
Given the same data as in Question 1, how many more samples would you need to reach significance at α = 0.01, assuming the sample mean and sample standard deviation do not change?

In [33]:
data = [90,100,110]
mu = 85
n = len(data)
s = stdev(data)
df = n-1

t = (np.mean(data)-mu)/(s/(n**.5))

In [34]:
print(t)

2.5980762113533156


In [35]:
for n in range(3,10):
    df = n-1
    # mean and s are held fixed; only the sample size grows
    t = (np.mean(data)-mu)/(s/(n**.5))
    print(df, t)

2 2.5980762113533156
3 3.0
4 3.3541019662496843
5 3.674234614174767
6 3.968626966596886
7 4.242640687119286
8 4.5


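With a one-tailed test at α = 0.01, the critical value at 4 degrees of freedom is 3.747 (t = 3.354 falls short), while at 5 degrees of freedom it is 3.365 (t = 3.674 clears it). You'd need 5 degrees of freedom, n = 6. That's 3 more samples.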
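
The table lookup can be folded into the loop. The sketch below hard-codes one-tailed critical values at α = 0.01 copied from a standard t-table (the dictionary name `T_CRIT_01` is an illustrative choice) and stops at the first sample size whose t value exceeds its critical value. It assumes a one-tailed test; a two-tailed test would need larger critical values and therefore more samples.

```python
from statistics import mean, stdev

data = [90, 100, 110]
mu = 85
xbar, s = mean(data), stdev(data)  # held fixed, as the question assumes

# One-tailed critical t values at alpha = 0.01, keyed by degrees of freedom
T_CRIT_01 = {2: 6.965, 3: 4.541, 4: 3.747, 5: 3.365,
             6: 3.143, 7: 2.998, 8: 2.896}

for n in range(3, 10):
    df = n - 1
    t = (xbar - mu) / (s / n ** 0.5)
    if t > T_CRIT_01[df]:
        print(f"n = {n} (df = {df}): t = {t:.3f} > {T_CRIT_01[df]}")
        break
```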

# Question 4

Suppose it is up to you to determine if a certain state (Michigan) receives a significantly different amount of public school funding (per student) than the USA average. You know that the USA mean public school yearly funding is $6800 per student per year, with a standard deviation of $400.

Next, suppose you collect a sample (n = 1000) from Michigan and determine that the sample mean for Michigan (per student per year) is $6873.

http://www.mathandstatistics.com/learn-stats/hypothesis-testing/two-tailed-z-test-hypothesis-test-by-hand

H0: Xbar == mu


HA: Xbar != mu

Let's be 90% sure about this (α = 0.10).
But because this is two-tailed, we have to do it a bit differently.

Both tails together have an area of 10%, so each individual tail has half that. So instead of looking up an alpha of .1, we look up .05 in each tail.

T CV = 1.646 (df = 999)


Z CV = 1.65

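In [42]:
mu = 6800    # USA mean funding per student
Xbar=6873    # Michigan sample mean
n = 1000     # sample size
SD = 400     # population standard deviation (used for Z)
SSD = 500    # sample standard deviation (used for t)
df = n-1     # degrees of freedom

t = (Xbar-mu)/(SSD/(n**.5))
Z = (Xbar-mu)/(SD/(n**.5))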

In [43]:
print(t)
print(df)
print(Z)

4.6169253838458335
999
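5.771156729807292

Both t and Z exceed their critical values, so we reject the null hypothesis: Michigan's per-student funding differs significantly from the USA average.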
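
The two-tailed Z calculation can also be done without a table. This is a minimal sketch using `statistics.NormalDist` from the standard library and the numbers given in the problem statement; the variable names are illustrative.

```python
from statistics import NormalDist

mu, xbar, n, sigma = 6800, 6873, 1000, 400
alpha = 0.10  # 90% confidence, split across two tails

z = (xbar - mu) / (sigma / n ** 0.5)

# Two-tailed: put alpha / 2 in each tail
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 3), round(z_crit, 3), p_value)
print("reject H0" if abs(z) > z_crit else "fail to reject H0")
```

Comparing `abs(z)` with the critical value handles both tails at once, so the same code works when the sample mean falls below the population mean.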
