## 14 - Hypothesis Testing

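**Null Hypothesis (H0):** The status quo, i.e. the default assumption that there is no effect or no difference.

**Alternative Hypothesis (Ha):** The claim that challenges the status quo.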

### One-Sample t-Test
A one-sample t-test checks whether the mean of a sample differs significantly from a hypothesized population mean.
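As a rough illustration of what the test computes, the t statistic can be worked out by hand and compared with `stats.ttest_1samp`. This is only a sketch: the seed, the sample parameters, and the hypothesized mean `mu0` are values assumed for the example, not taken from the cells in this notebook.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)               # fixed seed, assumed only for reproducibility
sample = rng.normal(loc=10000, scale=1000, size=50)
mu0 = 9800                                   # hypothetical population mean under H0

# t = (sample mean - mu0) / (sample std / sqrt(n)), using the n-1 (ddof=1) sample std
n = sample.size
t_manual = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))

t_scipy = stats.ttest_1samp(sample, mu0).statistic
print(t_manual, t_scipy)                     # the two values should agree
```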

### Two-Sample t-Test
A two-sample t-test investigates whether the means of two independent data samples differ from one another.
In a two-sample test, the null hypothesis is that the means of both groups are the same.
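The default `stats.ttest_ind` call assumes both groups share the same variance and uses a pooled estimate of it. A minimal sketch of that calculation, with made-up sample parameters and seed, might look like this:

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(1)               # fixed seed, assumed only for reproducibility
a = rng.normal(1000, 100, 50)                # illustrative group A
b = rng.normal(1000, 100, 50)                # illustrative group B

na, nb = a.size, b.size
# Pooled variance: the two sample variances weighted by their degrees of freedom
sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)

# t = difference of sample means / standard error of that difference
t_manual = (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / na + 1 / nb))

t_scipy = stats.ttest_ind(a, b).statistic    # equal_var=True is the default
print(t_manual, t_scipy)                     # the two values should agree
```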

In [93]:
import math
import scipy.stats as stats
import numpy as np

In [94]:
x1 = stats.norm.rvs(10000, 1000, 50)
x2 = stats.norm.rvs(1000, 100, 50)
x3 = stats.norm.rvs(1000, 100, 50)
print(x1)
print(x2)
print(x3)

[10012.93726395 10646.05303011 10602.09391472  9566.47452246
  8737.50314446  9537.69081156 11169.32011653  9937.24196242
 11422.28457801  8600.09098033 10267.35633433 11590.55969822
  9284.62912847  9493.88322116  9497.84457499 10295.33751743
 10768.61404061  8648.70483576 10885.76257612 10118.92413997
  9852.53934703 10376.87074406  8464.44907397  9822.92906336
 10114.01781819  9302.00167677 10898.37713412  9429.77542658
 10859.02559281  9734.85263932 10927.7268361  10822.35843428
  9496.91901979 10459.68918736 10418.33303232 10728.59966894
 11564.98999388  9593.41788357  8293.40101729  9795.9368847
 10786.21063232  9780.2416118  10152.74013002 10229.90584579
  9467.80085343  8777.71007743 11551.61890546 11027.11888246
 10936.27621954  9393.12163097]
[ 880.46003368 1005.48804137 1144.55274657  724.2569291   876.26535344
 1071.80166669 1123.94198282 1036.75850259  833.3455589  1195.12810584
 1075.41073142 1068.35456357 1125.96762312  964.2433148   853.27033606
 1033.28173762 1005.6825

In [95]:
print(x1.mean())
print(x2.mean())
print(x3.mean())

10082.845233105245
987.0211604581028
977.8476924112699


### One Sample t-Test

H0: The mean of the population that x1 was sampled from is 0

Ha: The mean of the population that x1 was sampled from is not 0


In [96]:
stats.ttest_1samp(x1, 0)

Ttest_1sampResult(statistic=84.1104184713285, pvalue=1.1879218277615648e-54)

The p-value is the probability, assuming the null hypothesis is true, of observing a sample at least as extreme as the one we have.
If the population mean were 0 (the null hypothesis), the chance of drawing a sample with a mean of about 10,083 would be vanishingly small, so we reject the null hypothesis.
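To make the link between the t statistic and the p-value concrete, the two-sided p-value can be recomputed from the t distribution with n - 1 degrees of freedom. The sketch below uses an assumed seed and a hypothetical null value `mu0`, so the numbers will not match the output above.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(2)               # fixed seed, assumed only for reproducibility
sample = rng.normal(1000, 100, 50)
mu0 = 1010                                   # hypothetical population mean under H0

result = stats.ttest_1samp(sample, mu0)
df = sample.size - 1                         # degrees of freedom for a one-sample test

# Two-sided p-value: probability that |T| is at least |t observed| when H0 is true
p_manual = 2 * stats.t.sf(abs(result.statistic), df)
print(p_manual, result.pvalue)               # the two values should agree
```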

### Two Sample Test - x1 vs x2

H0: The populations that x1 and x2 were sampled from have the same mean

Ha: The populations that x1 and x2 were sampled from have different means

In [97]:
stats.ttest_ind(x1,x2)

Ttest_indResult(statistic=75.2982688917182, pvalue=1.5464486135779753e-88)

The p-value is extremely small, so we reject the null hypothesis in favor of the alternative: the two population means differ.
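One caveat: x1 and x2 were generated with very different standard deviations (1000 vs 100), while `stats.ttest_ind` assumes equal variances by default. Passing `equal_var=False` runs Welch's t-test, which drops that assumption. A small sketch, with an assumed seed and illustrative variable names `g1` and `g2`:

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(3)               # fixed seed, assumed only for reproducibility
g1 = rng.normal(10000, 1000, 50)             # same parameters as x1
g2 = rng.normal(1000, 100, 50)               # same parameters as x2

pooled = stats.ttest_ind(g1, g2)                   # default: pooled (equal-variance) test
welch = stats.ttest_ind(g1, g2, equal_var=False)   # Welch's t-test

print(pooled.statistic, pooled.pvalue)
print(welch.statistic, welch.pvalue)
```

With equal sample sizes the two t statistics coincide; only the degrees of freedom, and therefore the p-values, differ. Here the conclusion is the same either way because the means are so far apart.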

### Two Sample Test - x2 vs x3

H0: The populations that x2 and x3 were sampled from have the same mean

Ha: The populations that x2 and x3 were sampled from have different means

In [98]:
stats.ttest_ind(x2, x3)

Ttest_indResult(statistic=0.44192131789576433, pvalue=0.6595198044117564)

The p-value is large (about 0.66), so we cannot reject the null hypothesis.

### Two Sample Test - x4 vs x5 

Let's make the population means different (1000 vs 1005)

H0: The populations that x4 and x5 were sampled from have the same mean

Ha: The populations that x4 and x5 were sampled from have different means

In [99]:
x4 = stats.norm.rvs(1000, 100, 50)
x5 = stats.norm.rvs(1005, 100, 50)
stats.ttest_ind(x4,x5)

Ttest_indResult(statistic=-0.6165788219522748, pvalue=0.5389422757221363)

The p-value is still relatively large, so we cannot reject the null hypothesis.
Although the population means are different, the difference observed between the samples is not statistically significant.
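A quick back-of-the-envelope calculation suggests why a difference of 5 is so hard to detect here. With a standard deviation of 100 and 50 observations per group (the values used for x4 and x5), the standard error of the difference between the two sample means is about 20, so the true difference is only a quarter of a standard error:

```python
import math

sigma = 100   # population standard deviation used for x4 and x5
n = 50        # number of observations in each sample

# Standard error of the difference between two independent sample means
se_diff = math.sqrt(sigma**2 / n + sigma**2 / n)
ratio = 5 / se_diff   # true difference expressed in standard errors

print(se_diff)  # 20.0
print(ratio)    # 0.25
```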

### Two Sample Test - x4 vs x5 

Let's make the population means more different (1000 vs 1020)

H0: The populations that x4 and x5 were sampled from have the same mean

Ha: The populations that x4 and x5 were sampled from have different means

In [100]:
x4 = stats.norm.rvs(1000, 100, 50)
x5 = stats.norm.rvs(1020, 100, 50)
stats.ttest_ind(x4,x5)

Ttest_indResult(statistic=-3.035035317110791, pvalue=0.003080086448108852)

The p-value (about 0.003) is well below the conventional 0.05 threshold used in research, so in this run we reject the null hypothesis and conclude that the population means differ. Because the samples are random, rerunning the cell can give a noticeably different p-value.

### Two Sample Test - x4 vs x5 

Let's make the difference between the population means larger (1000 vs 1050), a 5% difference

H0: The populations that x4 and x5 were sampled from have the same mean

Ha: The populations that x4 and x5 were sampled from have different means

In [101]:
x4 = stats.norm.rvs(1000, 100, 50)
x5 = stats.norm.rvs(1050, 100, 50)
stats.ttest_ind(x4,x5)

Ttest_indResult(statistic=-1.2924524938869988, pvalue=0.19923961028448847)

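In this run the p-value (about 0.20) is greater than the standard 0.05, so we cannot reject the null hypothesis, even though the population means differ by 5%.
The samples are random, so a rerun may give a different result; whenever the p-value falls below 0.05, we reject the null hypothesis and state that the means of the two populations are different.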
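Because every cell draws fresh random samples, a single p-value depends heavily on the draw. One way to see how often a given difference in means is actually detected is to repeat the experiment many times and count the rejections. The helper `rejection_rate` below is a hypothetical function written for this illustration; the number of trials and the seed are arbitrary choices.

```python
import numpy as np
import scipy.stats as stats

def rejection_rate(diff, n=50, sigma=100, alpha=0.05, trials=2000, seed=0):
    """Fraction of simulated experiments in which ttest_ind gives p < alpha."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated experiment with n observations per group
    a = rng.normal(1000, sigma, size=(trials, n))
    b = rng.normal(1000 + diff, sigma, size=(trials, n))
    pvalues = stats.ttest_ind(a, b, axis=1).pvalue
    return float(np.mean(pvalues < alpha))

# Differences in population means used in the cells above
rates = {diff: rejection_rate(diff) for diff in (5, 20, 50)}
for diff, rate in rates.items():
    print(diff, rate)
```

The rejection rate should rise as the difference grows, which is why one run with a larger difference can still fail to reject while another run with a smaller difference happens to reject.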

### The End