# Hypothesis Testing

* Statistical hypothesis tests are based on a statement called the null hypothesis, which assumes nothing interesting is going on between whatever variables you are testing. The exact form of the null hypothesis varies from one type of test to another: if you are testing whether groups differ, the null hypothesis states that the groups are the same. For instance, if you wanted to test whether the average age of voters in your home state differs from the national average, the null hypothesis would be that there is no difference between the average ages.
-----------
* The purpose of a hypothesis test is to determine whether the sample data are consistent with the null hypothesis. If there is little evidence against the null hypothesis given the data, you fail to reject it, which is not the same as proving it true. If the observed data would be unlikely under the null hypothesis, you might reject the null in favor of the alternative hypothesis: that something interesting is going on. The exact form of the alternative hypothesis will depend on the specific test you are carrying out. Continuing with the example above, the alternative hypothesis would be that the average age of voters in your state does in fact differ from the national average.
-----------
* Once you have the null and alternative hypotheses in hand, you choose a significance level (often denoted by the Greek letter α). The significance level is a probability threshold that determines when you reject the null hypothesis. After carrying out a test, if the probability of getting a result at least as extreme as the one you observe, assuming the null hypothesis is true, is lower than the significance level, you reject the null hypothesis in favor of the alternative. This probability of seeing a result as extreme or more extreme than the one observed is known as the p-value.
-----------
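The role of the significance level can be illustrated with a small simulation. The sketch below is not part of the walkthrough that follows: the sample size, the number of trials, and the helper name `reject_null` are assumptions chosen for illustration. It draws many samples for which the null hypothesis is actually true and counts how often the p-value falls below α. That share should come out close to α itself.

```python
import numpy as np
import scipy.stats as stats

def reject_null(p_value, alpha=0.05):
    """Decision rule: reject the null hypothesis when the p-value is below alpha."""
    return p_value < alpha

# Illustrative settings (assumptions, not taken from the text above)
alpha = 0.05
n_trials = 2000
sample_size = 30

rng = np.random.default_rng(0)

# Every row is a sample from a population whose true mean is 0,
# so the null hypothesis "mean == 0" is true in every trial.
samples = rng.normal(loc=0, scale=1, size=(n_trials, sample_size))

# One one-sample t-test per row
p_values = stats.ttest_1samp(samples, popmean=0, axis=1).pvalue

# Share of trials in which a true null hypothesis was rejected
false_positive_rate = np.mean(reject_null(p_values, alpha))
print("Share of true nulls rejected:", false_positive_rate)
```

Rejecting a true null hypothesis is a false positive, so choosing α amounts to choosing the false positive rate you are willing to tolerate.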
## T tests
* The t-test is a statistical test used to determine whether the mean of a numeric data sample differs significantly from the population mean, or whether the means of two samples differ from one another.

### One-Sample T-Test
* A one-sample t-test checks whether a sample mean differs from the population mean. Let's create some dummy age data for the population of voters in the entire country and a sample of voters in NYC, and test whether the average age of voters in NYC differs from that of the population:



* H0 : Null Hypothesis - No difference between the mean age of the population and the mean age of NYC voters
* H1 : Alternate Hypothesis - There is a difference between the mean age of the population and the mean age of NYC voters
  
  Significance level (α) = 0.05

In [1]:
import numpy as np
import pandas as pd
import math
import scipy.stats as stats
import matplotlib.pyplot as plt

In [2]:
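np.random.seed(6)

# Population: a mixture of two Poisson distributions, shifted by 18 so no voter is younger than 18
pop_age1=stats.poisson.rvs(loc=18,mu=35,size=150000)
pop_age2=stats.poisson.rvs(loc=18,mu=10,size=100000)

pop_age=np.concatenate((pop_age1,pop_age2))

# NYC sample: 50 voters drawn from a slightly younger mixture (mu=30 instead of mu=35)
nyc_age1=stats.poisson.rvs(loc=18,mu=30,size=30)
nyc_age2=stats.poisson.rvs(loc=18,mu=10,size=20)

nyc_age=np.concatenate((nyc_age1,nyc_age2))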

In [3]:
print("Average age of country ", pop_age.mean())
print("Average age of NYC ", np.mean(nyc_age))

Average age of country  43.000112
Average age of NYC  39.26


* Let's conduct a t-test at a 95% confidence level (α = 0.05) and see if it correctly rejects the null hypothesis that the sample comes from the same distribution as the population. To conduct a one-sample t-test, we can use the stats.ttest_1samp() function:

In [4]:
stats.ttest_1samp(a=nyc_age,popmean=pop_age.mean())

Ttest_1sampResult(statistic=-2.5742714883655027, pvalue=0.013118685425061678)

* The result shows a t-statistic of -2.574 with a p-value of 0.013. Since the p-value is below our significance level of 0.05 (a 95% confidence level), we <b>reject the null hypothesis</b> that there is no difference between the country's mean age and the NYC mean age.
----------------------
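To make the test statistic less of a black box, here is a minimal sketch of the computation behind a one-sample t-test: the difference between the sample mean and the hypothesized mean, divided by the standard error of the mean, with a two-sided p-value taken from a t-distribution with n - 1 degrees of freedom. The small `sample` array and the `popmean` value are made up for illustration and are not the NYC data.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical data for illustration only
sample = np.array([38, 41, 29, 45, 33, 36, 40, 31])
popmean = 43

n = len(sample)
sample_mean = sample.mean()
sample_std = sample.std(ddof=1)      # sample standard deviation (n - 1 in the denominator)
std_error = sample_std / np.sqrt(n)  # standard error of the mean

# t statistic: how many standard errors the sample mean is from popmean
t_manual = (sample_mean - popmean) / std_error

# Two-sided p-value from the t-distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# SciPy's built-in test should agree with the manual calculation
result = stats.ttest_1samp(a=sample, popmean=popmean)
print("manual:", t_manual, p_manual)
print("scipy: ", result.statistic, result.pvalue)
```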
* We can also check whether the t-statistic, -2.574, lies outside the critical values for a 95% confidence level. For a two-sided test these are the 0.025 and 0.975 quantiles of the t-distribution with n - 1 = 49 degrees of freedom:

In [5]:
stats.t.ppf(q=0.025,df=49)

-2.0095752344892093

In [6]:
stats.t.ppf(q=0.975,df=49)

2.009575234489209

* The t-statistic of -2.574 lies outside the range between the two quantiles (roughly -2.01 to 2.01), which agrees with the decision to reject the null hypothesis.
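The quantile check and the p-value are two views of the same decision. As a sketch, the snippet below plugs in the t-statistic and degrees of freedom reported above and shows that both rules lead to the same conclusion. The variable names are illustrative.

```python
import scipy.stats as stats

t_stat = -2.5742714883655027   # statistic reported by stats.ttest_1samp() above
df = 49                        # n - 1 for a sample of 50 voters
alpha = 0.05

# Two-sided critical value: the 1 - alpha/2 quantile of the t-distribution
critical_value = stats.t.ppf(q=1 - alpha / 2, df=df)

# Rule 1: reject when the statistic falls outside [-critical_value, critical_value]
reject_by_quantile = abs(t_stat) > critical_value

# Rule 2: reject when the two-sided p-value (mass in both tails beyond |t_stat|) is below alpha
p_value = 2 * stats.t.cdf(-abs(t_stat), df=df)
reject_by_p_value = p_value < alpha

print(critical_value, reject_by_quantile)
print(p_value, reject_by_p_value)
```

The p-value computed this way should match the one reported by `stats.ttest_1samp()`, about 0.013.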

### Two-Sample T-Test

* A two-sample t-test investigates whether the means of two independent data samples differ from one another. In a two-sample test, the null hypothesis is that the means of both groups are the same. Let's generate a sample of voter age data for Georgia and test it against the sample we made earlier:

* H0 : Null Hypothesis - No difference between the mean age of GA voters and the mean age of NYC voters
* H1 : Alternate Hypothesis - There is a difference between the mean age of GA voters and the mean age of NYC voters
  
  Significance level (α) = 0.05
  
  
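Note - The two samples are generated from slightly different distributions: the GA sample uses Poisson means of 33 and 13, while the NYC sample uses 30 and 10.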

In [7]:
np.random.seed(12)

ga_age1=stats.poisson.rvs(loc=18,mu=33,size=30)
ga_age2=stats.poisson.rvs(loc=18,mu=13,size=20)

ga_age=np.concatenate((ga_age1,ga_age2))

print(ga_age.mean())

42.8


In [8]:
stats.ttest_ind(a=nyc_age,b=ga_age,equal_var=False)

Ttest_indResult(statistic=-1.7083870793286842, pvalue=0.09073104343957748)

* The test yields a p-value of about 0.09, which is greater than our significance level of 0.05, so we <b>fail to reject</b> H0, the null hypothesis.
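Passing `equal_var=False` makes `stats.ttest_ind()` perform Welch's t-test, which does not assume that the two groups have equal variances. The sketch below computes Welch's statistic and its approximate degrees of freedom (the Welch-Satterthwaite formula) by hand on two small made-up samples and compares them with SciPy's result. The arrays `group_a` and `group_b` are hypothetical and are not the NYC or GA data.

```python
import numpy as np
import scipy.stats as stats

# Hypothetical samples for illustration only
group_a = np.array([39, 44, 28, 51, 33, 47, 30, 42])
group_b = np.array([45, 52, 38, 49, 55, 41, 36, 50, 47])

n_a, n_b = len(group_a), len(group_b)

# Squared standard error contributed by each group (sample variance / sample size)
se2_a = group_a.var(ddof=1) / n_a
se2_b = group_b.var(ddof=1) / n_b

# Welch's t statistic: difference in means over the combined standard error
t_manual = (group_a.mean() - group_b.mean()) / np.sqrt(se2_a + se2_b)

# Welch-Satterthwaite approximation of the degrees of freedom
df_welch = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))

# Two-sided p-value
p_manual = 2 * stats.t.sf(abs(t_manual), df=df_welch)

result = stats.ttest_ind(a=group_a, b=group_b, equal_var=False)
print("manual:", t_manual, p_manual)
print("scipy: ", result.statistic, result.pvalue)
```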

### Paired T-Test

* The basic two-sample t-test is designed for testing differences between independent groups. In some cases, we might be interested in testing differences between samples of the same group at different points in time. For instance, a hospital might want to test whether a weight-loss drug works by checking the weights of the same group of patients before and after treatment. A paired t-test lets you check whether the means of samples from the same group differ.

In [9]:
np.random.seed(11)

before=stats.norm.rvs(scale=30,loc=250,size=100)

after=before+stats.norm.rvs(scale=5,loc=-1.25,size=100)

In [10]:
weight=pd.DataFrame({'weight_before':before, 'weight_after':after, 'weight_change':after-before})

In [11]:
weight.head()

| | weight_before | weight_after | weight_change |
|---|---|---|---|
| 0 | 302.483642 | 305.605006 | 3.121364 |
| 1 | 241.41781 | 240.526071 | -0.891739 |
| 2 | 235.463046 | 226.017788 | -9.445258 |
| 3 | 170.400443 | 165.91393 | -4.486513 |
| 4 | 249.751461 | 252.590309 | 2.838848 |


In [12]:
weight.describe()

| | weight_before | weight_after | weight_change |
|---|---|---|---|
| count | 100.0 | 100.0 | 100.0 |
| mean | 250.345546 | 249.115171 | -1.230375 |
| std | 28.132539 | 28.422183 | 4.783696 |
| min | 170.400443 | 165.91393 | -11.495286 |
| 25% | 230.421042 | 229.148236 | -4.046211 |
| 50% | 250.830805 | 251.134089 | -1.413463 |
| 75% | 270.637145 | 268.927258 | 1.738673 |
| max | 314.700233 | 316.720357 | 9.759282 |


* The summary shows that patients lost about 1.23 pounds on average after treatment. Let's conduct a paired t-test to see whether this difference is significant at a 95% confidence level:

In [13]:
stats.ttest_rel(a=before,b=after)

Ttest_relResult(statistic=2.5720175998568284, pvalue=0.011596444318439857)

* Since p < 0.05, there is a statistically significant difference between the mean weights before and after treatment. At the 0.05 significance level, this suggests the drug had an effect on the patients' weight.
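A paired t-test is equivalent to a one-sample t-test on the per-subject differences, tested against a mean of zero. The sketch below checks this on freshly generated data. The seed and the use of `np.random.default_rng` are assumptions for illustration, so the numbers will not reproduce the results above.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)

# Simulated weights for the same 100 patients before and after treatment
before = rng.normal(loc=250, scale=30, size=100)
after = before + rng.normal(loc=-1.25, scale=5, size=100)

# Paired t-test on the two measurements
paired = stats.ttest_rel(a=before, b=after)

# One-sample t-test on the differences against a mean of zero
one_sample = stats.ttest_1samp(a=before - after, popmean=0)

print("paired:    ", paired.statistic, paired.pvalue)
print("one-sample:", one_sample.statistic, one_sample.pvalue)
```

Because the test works on within-patient differences, the large spread in weight between patients does not enter the standard error. Only the spread of the changes does.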