# Hypothesis Testing

## Scenarios

- Chemistry - do inputs from two different barley fields produce different
yields?
- Astrophysics - do star systems with near-orbiting gas giants have hotter
stars?
- Economics - demography, surveys, etc.
- Medicine - BMI vs. Hypertension, etc.
- Business - which ad is more effective given engagement?

![img1](./img/img1.png)

![img2](./img/img2.png)

### Null Hypothesis / Alternative Hypothesis Structure

<img src="img/img3.png" width=350>

### The Null Hypothesis

There is NOTHING, **no** difference.

### The Alternative Hypothesis

![difference](./img/giphy.gif)

### Error

- TYPE I: False positive (incorrectly reject a true null hypothesis)
- TYPE II: False negative (incorrectly fail to reject a false null hypothesis)

### Choosing the right error rate

- Alpha, α
- Sigma, σ
- Depends on field of study, 0.00001 ≤ α ≤ 0.2

### T-test

Why use it?
- Sometimes the population standard deviation is irrelevant, and sometimes it is
unknown (we’ll get to the different types of t-test later).
- Sometimes a sample is too small for us to be confident that it accurately represents reality.

### T vs Z (again)

A t-test is like a modified z-test:
- Penalize for small sample size - “degrees of freedom”
- Use sample std. dev. s to estimate population σ
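
For reference, the two statistics can be written side by side. This is the standard textbook form rather than a transcription of the slides; here $\bar{x}$ is the sample mean, $\mu$ the hypothesized population mean, $n$ the sample size, $\sigma$ the population standard deviation and $s$ the sample standard deviation.

$$
z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}, \qquad
t = \frac{\bar{x} - \mu}{s / \sqrt{n}}, \qquad
\text{df} = n - 1
$$

The only differences are that $s$ replaces $\sigma$ and that $t$ is compared against a t-distribution with $n - 1$ degrees of freedom instead of the standard normal.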

<img src="img/img5.png" width=500>

### T and Z in detail
<img src="img/img4.png" width=500>

### T-value table

<img src="img/img6.png" width=500>
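
The values in a t-table can also be computed directly. The sketch below assumes a two-sided test at α = 0.05 and an arbitrary handful of degrees of freedom (both chosen only for illustration) and compares the critical t-values with the corresponding z-value.

```python
from scipy.stats import norm, t

alpha = 0.05  # two-sided significance level (assumed for illustration)

# Two-sided critical value from the standard normal, for comparison.
z_crit = norm.ppf(1 - alpha / 2)

# Two-sided critical t-values for a few degrees of freedom.
t_crit = {df: t.ppf(1 - alpha / 2, df) for df in (2, 5, 10, 30, 100)}

for df, crit in t_crit.items():
    print(f"df={df:>3}  t_crit={crit:.3f}")
print(f"z_crit={z_crit:.3f}")
```

The critical t-value shrinks toward the z-value as the degrees of freedom grow, which is the "penalty for small sample size" in numeric form.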

### P-Values
<img src="https://imgs.xkcd.com/comics/significant.png" width=500>

[Source](https://xkcd.com/882/)

### Language of Hypothesis Testing

If p < α : we *reject* the null hypothesis<br>
If p > α : we *fail to reject* the null hypothesis


Language is **important**
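
The wording can be kept consistent by wrapping the rule in a small helper. `decision` is a hypothetical function written for this illustration; treating p exactly equal to α as "fail to reject" is an assumption, since the rule above does not cover that case.

```python
def decision(p_value, alpha=0.05):
    """Return the conventional wording for a hypothesis test decision."""
    if p_value < alpha:
        return "reject the null hypothesis"
    # We never "accept" the null; we only fail to reject it.
    return "fail to reject the null hypothesis"


print(decision(0.1217))
print(decision(0.0001))
```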

### What if the experiment fails?

- Don’t throw out failed experiments
- This methodology, with this data, does not produce significant results
 - More data
 - More time
 - More details

### T-test success recipe

Regardless of the type of t-test you are performing, there are five main steps:

- Set up null and alternative hypotheses

- Choose a significance level

- Calculate the test statistic

- Determine the critical value or the p-value (find the rejection region)

- Compare the t-value with the critical t-value (or the p-value with α) to reject or fail to reject the null hypothesis.
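
The five steps can be sketched as one function for the one-sample, two-sided case. `one_sample_t_test` and the sample values are hypothetical, made up for this illustration; only `scipy.stats.t` and the standard library are used.

```python
from statistics import mean, stdev

from scipy.stats import t as t_dist


def one_sample_t_test(data, mu0, alpha=0.05):
    # Step 1: H0: population mean == mu0, H1: population mean != mu0.
    # Step 2: alpha is the chosen significance level.
    n = len(data)
    dof = n - 1
    # Step 3: test statistic = (sample mean - mu0) / standard error.
    t_stat = (mean(data) - mu0) / (stdev(data) / n ** 0.5)
    # Step 4: two-sided critical value and p-value.
    t_crit = t_dist.ppf(1 - alpha / 2, dof)
    p_value = 2 * t_dist.sf(abs(t_stat), dof)
    # Step 5: reject H0 only if the statistic falls in the rejection region.
    reject = bool(abs(t_stat) > t_crit)
    return t_stat, t_crit, p_value, reject


# Hypothetical measurements, invented for illustration.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
t_stat, t_crit, p_value, reject = one_sample_t_test(sample, mu0=5.0)
print(t_stat, t_crit, p_value, reject)
```

Comparing `abs(t_stat)` with `t_crit` and comparing `p_value` with `alpha` are two views of the same decision, so either can be used in the last step.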

# Question 1
Is this sample any different from the population?
- Population mean = 85
- Sample = [90,100,110]

#### Using `scipy`

In [1]:
from scipy.stats import ttest_1samp
data = [90, 100, 110]
ttest_1samp(data, 85)

Ttest_1sampResult(statistic=2.5980762113533156, pvalue=0.12168993434632014)

In [5]:
type(Out[1])
result = Out[1]

In [8]:
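# one-sided test:
#     H0 = our sample mean is not greater than the population mean
#     H1 = our sample mean is greater than the population mean
data = [90, 100, 110]
result = ttest_1samp(data, 85)
result.pvalue/2
# ^interpretation for one-sided test: the statistic is positive (in the direction of H1),
#  so we divide the two-sided pvalue by 2 and check whether it meets the 5% threshold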

0.06084496717316007
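
Halving the p-value is only valid when the statistic points in the direction of H1. As an alternative sketch, assuming SciPy 1.6 or newer, `ttest_1samp` accepts an `alternative` argument that handles the direction for us.

```python
from scipy.stats import ttest_1samp

data = [90, 100, 110]

# Assumes SciPy >= 1.6, where the `alternative` keyword is available.
one_sided = ttest_1samp(data, 85, alternative="greater")
two_sided = ttest_1samp(data, 85)

print(one_sided.pvalue)       # one-sided p-value computed by SciPy
print(two_sided.pvalue / 2)   # manual halving, valid here because statistic > 0
```

Both lines print the same value for this data, and it is still above 0.05, so we fail to reject the null at the 5% level.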

#### Manual implementation

In [2]:
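from statistics import mean, stdev

data = [90,100,110]
mu = 85
n = len(data)
# ^regular n, not dof
s = stdev(data)
# ^sample std dev
df = n-1
# ^degrees of freedom

t = (mean(data)-mu)/(s/(n**.5))
# ^t-statistic: (sample mean - population mean) / standard error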

In [3]:
print(t)
print(df)

2.5980762113533156
2
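
The manual calculation stops at the t-value. As a sketch of the remaining step, the two-sided p-value can be read off the t-distribution with `scipy.stats.t.sf` (the survival function, 1 - CDF). The variable names here are chosen for illustration.

```python
from statistics import mean, stdev

from scipy.stats import t as t_dist

data = [90, 100, 110]
mu = 85
n = len(data)
dof = n - 1

t_stat = (mean(data) - mu) / (stdev(data) / n ** 0.5)

# Two-sided p-value: probability of a statistic at least this extreme
# in either tail of a t-distribution with `dof` degrees of freedom.
p_two_sided = 2 * t_dist.sf(abs(t_stat), dof)
print(t_stat, p_two_sided)
```

This reproduces the `statistic` and `pvalue` reported by `ttest_1samp` above.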


# Question 2

I'm buying jeans from store A and store B. I know nothing about their inventory other than prices. Should I go to just one store for a less expensive pair of jeans?
I'm pretty apprehensive about this big decision, so alpha = 0.10.

Try this both manually and with scipy

- Store A: [20,30,30,50,75,25,30,30,40,80]
- Store B: [60,30,70,90,60,40,70,40]

In [40]:
from scipy.stats import ttest_ind

In [42]:
store1 = [20,30,30,50,75,25,30,30,40,80]
store2 = [60,30,70,90,60,40,70,40]

In [43]:
ttest_ind(store1, store2, equal_var=False)

Ttest_indResult(statistic=-1.7120298677915535, pvalue=0.10685037968363302)
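
For the manual version, here is a sketch of Welch's t-test, which is what `equal_var=False` selects. The degrees of freedom come from the Welch-Satterthwaite approximation; the variable names are chosen for illustration.

```python
from statistics import mean, variance

from scipy.stats import t as t_dist

store1 = [20, 30, 30, 50, 75, 25, 30, 30, 40, 80]
store2 = [60, 30, 70, 90, 60, 40, 70, 40]

n1, n2 = len(store1), len(store2)

# Squared standard error contributed by each sample (sample variance / n).
se1_sq = variance(store1) / n1
se2_sq = variance(store2) / n2

# Welch's t-statistic: difference in means over the combined standard error.
t_stat = (mean(store1) - mean(store2)) / (se1_sq + se2_sq) ** 0.5

# Welch-Satterthwaite approximation for the degrees of freedom.
dof = (se1_sq + se2_sq) ** 2 / (
    se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
)

p_value = 2 * t_dist.sf(abs(t_stat), dof)
print(t_stat, dof, p_value)
```

The statistic and p-value match the `ttest_ind` result. Since the p-value (about 0.107) is above alpha = 0.10, we fail to reject the null hypothesis that the two stores have the same mean price.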

In [20]:
import numpy as np

In [44]:
from scipy import stats
# np.random.seed(12345678)
# Test with sample with identical means:
rvs1 = stats.norm.rvs(loc=5, scale=10, size=500)
rvs2 = stats.norm.rvs(loc=5, scale=10, size=500)
stats.ttest_ind(rvs1, rvs2)
# ^ even with samples from same distribution, we can get wildly different values

Ttest_indResult(statistic=0.15424515519934606, pvalue=0.8774476248539035)

In [14]:
np.mean(rvs1)

4.900716400695466

In [15]:
np.mean(rvs2)

5.538717619681068
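
Two samples from the same distribution will occasionally look different enough to reject the null. One way to see how often is to repeat the experiment many times and count the rejections. This simulation is an illustration added here, not part of the original run; the seed, the number of trials and α = 0.05 are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed (arbitrary) for reproducibility
alpha = 0.05
n_trials = 2000

# Each row is one experiment: two samples of 500 from the same distribution.
a = rng.normal(loc=5, scale=10, size=(n_trials, 500))
b = rng.normal(loc=5, scale=10, size=(n_trials, 500))

# One t-test per row.
p_values = stats.ttest_ind(a, b, axis=1).pvalue

# Fraction of experiments that (wrongly) reject a true null hypothesis.
false_positive_rate = np.mean(p_values < alpha)
print(false_positive_rate)
```

Because the null hypothesis is true in every trial, the printed fraction estimates the Type I error rate and should land close to α.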


# Question 3
Given the same data as in Question 1, how many more observations would you need to reach p ≤ 0.01, assuming the sample mean and sample std. dev. do not change?

In [16]:
data = [90,100,110]
mu = 85
n = len(data)
s = stdev(data)
df = n-1

t = (100-85)/(s/(n**.5))

In [17]:
print(t)

2.5980762113533156


In [18]:
for n in range(3,10):
    df = n-1
    t = (100-85)/(s/(n**.5))
    print (df,t)

2 2.5980762113533156
3 3.0
4 3.3541019662496843
5 3.674234614174767
6 3.968626966596886
7 4.242640687119286
8 4.5


In [None]:
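# with a one-sided test (H1: mean > 85) we need 3 more observations (n = 6, df = 5) to reject null at p = 0.01:
# t = 3.67 exceeds the critical value of about 3.36. A two-sided test would need 4 more (n = 7).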
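
The loop above prints only t-values, which still have to be looked up in a t-table. As a sketch, the p-value for each n can be computed directly with `scipy.stats.t.sf`; the variable names are chosen for illustration, and the sample mean and standard deviation are held fixed as the question assumes.

```python
from statistics import mean, stdev

from scipy.stats import t as t_dist

data = [90, 100, 110]
mu = 85
xbar, s = mean(data), stdev(data)  # held fixed while n grows

first_n_one_sided = None
first_n_two_sided = None
for n in range(3, 10):
    dof = n - 1
    t_stat = (xbar - mu) / (s / n ** 0.5)
    p_one = t_dist.sf(t_stat, dof)  # H1: mean > mu
    p_two = 2 * p_one               # valid because t_stat is positive
    print(n, round(t_stat, 3), round(p_one, 4), round(p_two, 4))
    if first_n_one_sided is None and p_one < 0.01:
        first_n_one_sided = n
    if first_n_two_sided is None and p_two < 0.01:
        first_n_two_sided = n

print(first_n_one_sided, first_n_two_sided)
```

The one-sided p-value first drops below 0.01 at n = 6 (3 more observations), while the two-sided p-value needs n = 7 (4 more).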

# Using t-tests for hypothesis testing of means

In [24]:
import pandas as pd
df = pd.read_csv('../day-2-hypothesis-testing/data/WA_Fn-UseC_-Telco-Customer-Churn.csv')

[Link to the dataset](https://www.kaggle.com/blastchar/telco-customer-churn)

__Your Turn__

1. Find how many different values there are in the PaymentMethod column.


In [25]:
df.head()

|  | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |


In [26]:
df.PaymentMethod.value_counts()

Electronic check             2365
Mailed check                 1612
Bank transfer (automatic)    1544
Credit card (automatic)      1522
Name: PaymentMethod, dtype: int64

In [33]:
df.describe()

|  | SeniorCitizen | tenure | MonthlyCharges |
| --- | --- | --- | --- |
| count | 7043.0 | 7043.0 | 7043.0 |
| mean | 0.162147 | 32.371149 | 64.761692 |
| std | 0.368612 | 24.559481 | 30.090047 |
| min | 0.0 | 0.0 | 18.25 |
| 25% | 0.0 | 9.0 | 35.5 |
| 50% | 0.0 | 29.0 | 70.35 |
| 75% | 0.0 | 55.0 | 89.85 |
| max | 1.0 | 72.0 | 118.75 |


__Your Turn__

1. Select one of the categories above in PaymentMethod; we will investigate whether its data differ significantly from the national data.

2. Suppose we know that the nationwide average monthly spending for the service is $70, but we don't know the standard deviation. Construct a hypothesis test of whether the average for the chosen PaymentMethod differs from the national average.
  - hint: use `scipy.stats.ttest_1samp`

3. In our case we will focus on PaymentMethod == `'Mailed check'`, but you can work with others too.

$H_{a}$: spending by mailed check is different than national average spending
 
$H_{0}$: there is no difference in the average spending between mailed check and national average

$\alpha$: 0.05


In [28]:
sample = df.loc[df.PaymentMethod == 'Mailed check'].MonthlyCharges

In [29]:
ttest_1samp(sample, 70)

Ttest_1sampResult(statistic=-39.79616513656546, pvalue=8.78311106869432e-242)

In [None]:
# We get a p-value of practically zero, so we can reject the null hypothesis.

In [30]:
sample.mean()

43.917059553349915

In [32]:
ttest_1samp(sample.sample(10), 70)

Ttest_1sampResult(statistic=-6.252317282738224, pvalue=0.00014922092927042023)

In [None]:
# Even with a random sample of only 10 of the 1612 observations, the p-value is still small enough to reject the null (the exact value changes on every run).

__Your Turn__

1. From the data set `df`, get the rows with SeniorCitizen == 1 and keep them in a variable called `seniors`.

2. Keep the other records in a variable called `others`.

3. Check how many observations we have in each sample.

In [35]:
df.SeniorCitizen.value_counts()

0    5901
1    1142
Name: SeniorCitizen, dtype: int64

In [34]:
seniors = df.loc[df.SeniorCitizen == 1]

In [36]:
others = df.loc[df.SeniorCitizen != 1]

In [46]:
seniors.head()

|  | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 20 | 8779-QRDMV | Male | 1 | No | No | 1 | No | No phone service | DSL | No | ... | Yes | No | No | Yes | Month-to-month | Yes | Electronic check | 39.65 | 39.65 | Yes |
| 30 | 3841-NFECX | Female | 1 | Yes | No | 71 | Yes | Yes | Fiber optic | Yes | ... | Yes | Yes | No | No | Two year | Yes | Credit card (automatic) | 96.35 | 6766.95 | No |
| 31 | 4929-XIHVW | Male | 1 | Yes | No | 2 | Yes | No | Fiber optic | No | ... | Yes | No | Yes | Yes | Month-to-month | Yes | Credit card (automatic) | 95.5 | 181.65 | No |
| 34 | 3413-BMNZE | Male | 1 | No | No | 1 | Yes | No | DSL | No | ... | No | No | No | No | Month-to-month | No | Bank transfer (automatic) | 45.25 | 45.25 | No |
| 50 | 8012-SOUDQ | Female | 1 | No | No | 43 | Yes | Yes | Fiber optic | No | ... | No | No | Yes | No | Month-to-month | Yes | Electronic check | 90.25 | 3838.75 | No |


__Your Turn__

1. Now we would like to compare the MonthlyCharges for seniors and others.

I hypothesize that seniors have a lower average MonthlyCharges than others.

2. Write a hypothesis test that checks this claim.


$H_{a}:$ Seniors have lower monthly charges than others

$H_{0}:$ Seniors have the same or higher monthly charges than others

$\alpha:$ 0.05

Now we will test the hypothesis with a two-sample t-test (`scipy.stats.ttest_ind`).

In [48]:
from scipy.stats import ttest_ind
ttest_ind(seniors.MonthlyCharges, others.MonthlyCharges, equal_var=False)

Ttest_indResult(statistic=22.288279118400933, pvalue=3.826212668910673e-98)

In [None]:
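# conclusion: the two-sided p-value is tiny, but the statistic is positive (+22.29), so seniors' mean monthly charges are higher than others', not lower.
# The difference is in the opposite direction from Ha, so the one-sided p-value is 1 - pvalue/2 (practically 1) and we fail to reject the null.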
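
A one-sided alternative can also be passed to `ttest_ind` directly, which takes the sign of the statistic into account. The sketch below assumes SciPy 1.6 or newer and uses two small hypothetical groups, invented for illustration and unrelated to the Telco data.

```python
from scipy.stats import ttest_ind

# Hypothetical monthly charges, invented for illustration only.
group_a = [80, 85, 90, 95, 100, 105]
group_b = [55, 60, 65, 70, 75, 80]

# Default two-sided test: only asks whether the means differ.
two_sided = ttest_ind(group_a, group_b, equal_var=False)

# One-sided test with H1: mean(group_a) < mean(group_b).
# Assumes SciPy >= 1.6, where the `alternative` keyword is available.
one_sided = ttest_ind(group_a, group_b, equal_var=False, alternative="less")

print(two_sided.statistic, two_sided.pvalue)
print(one_sided.pvalue)
```

Here the two-sided p-value is small, yet the one-sided p-value is close to 1 because `group_a` has the higher mean. A small two-sided p-value supports a directional claim only when the statistic has the matching sign.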