# Hypothesis Testing

The purpose of a hypothesis test is to determine whether an observed difference, for example between two data sets, is statistically significant or could plausibly be due to chance.



## Overview

This module covers:

1) One sample and Two sample t-tests

2) ANOVA

3) Type I and Type II errors

4) Chi-Squared Tests

## Question 1 

*A student is trying to decide between two GPUs. He wants to use the GPU for his research to run deep learning algorithms, so the only thing he is concerned with is speed.*

*He picks a deep learning algorithm on a large data set and runs it on both GPUs 15 times, timing each run in hours. Results are given in the lists GPU1 and GPU2 below.*

In [167]:
from scipy import stats
from scipy.stats import ttest_1samp
import numpy as np

In [168]:
GPU1 = np.array([11,9,10,11,10,12,9,11,12,9,11,12,9,10,9])
GPU2 = np.array([11,13,10,13,12,9,11,12,12,11,12,12,10,11,13])

#Assumption: Both the datasets (GPU1 & GPU 2) are random, independent, parametric & normally distributed

Hint: You can import ttest function from scipy to perform t tests 

**First T test**

*One sample t-test*

Check if the mean of the GPU1 is equal to zero.
- Null Hypothesis is that mean is equal to zero.
- Alternate hypothesis is that it is not equal to zero.

In [169]:
#Calculation using ttest library
t_stats, p_value = ttest_1samp(GPU1,0)
print("T-Stat : ",t_stats)
print("p-value : ", p_value)

T-Stat :  34.056241516158195
p-value :  7.228892044970457e-15


**Answer: One-sample t-test**
* The p-value is below the 5% level of significance.
* Hence we reject the null hypothesis.
* The mean of GPU1 is significantly different from zero.
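
The statistic reported above can be reproduced by hand, which shows what `ttest_1samp` computes. This is a minimal sketch, assuming the usual one-sample formula $t = (\bar{x} - \mu_0) / (s / \sqrt{n})$ with the sample standard deviation (`ddof=1`); the variable names are illustrative and not part of the original notebook.

```python
import numpy as np
from scipy import stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
mu0 = 0  # hypothesised population mean

n = GPU1.size
sample_mean = GPU1.mean()
sample_std = GPU1.std(ddof=1)        # sample standard deviation (n - 1 in the denominator)
std_error = sample_std / np.sqrt(n)  # standard error of the mean

t_manual = (sample_mean - mu0) / std_error
# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

print("Manual T-Stat : ", t_manual)
print("Manual p-value : ", p_manual)
```

Testing against zero is mostly a mechanical exercise here, since run times are always positive; the same code works for any hypothesised mean by changing `mu0`.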

## Question 2

Given,

Null Hypothesis : There is no significant difference between data sets

Alternate Hypothesis : There is a significant difference

*Do two-sample testing and check whether to reject Null Hypothesis or not.*

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html

In [170]:
# Null Hypothesis -      Ho : µ1 = µ2
# Alternate Hypothesis - Ha : µ1 != µ2

In [171]:
t_stat_2t_ind, p_value_2t_ind = stats.ttest_ind(GPU1, GPU2)
print("2 Sample Independent tStats : ", t_stat_2t_ind)
print("2 Sample Independent pValue : ", p_value_2t_ind)

2 Sample Independent tStats :  -2.627629513471839
2 Sample Independent pValue :  0.013794282041452725


**Answer: Two-sample independent t-test**
* The p-value for the two-sample independent test between GPU1 and GPU2 is below the 5% level of significance.
* Hence we reject the null hypothesis.
* We conclude that there is a significant difference between the mean run times of GPU1 and GPU2.
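
By default `stats.ttest_ind` assumes equal population variances and uses a pooled variance estimate. The sketch below reproduces that calculation by hand and, as an optional comparison that is not part of the original exercise, also runs Welch's variant (`equal_var=False`), which drops the equal-variance assumption.

```python
import numpy as np
from scipy import stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
GPU2 = np.array([11, 13, 10, 13, 12, 9, 11, 12, 12, 11, 12, 12, 10, 11, 13])

n1, n2 = GPU1.size, GPU2.size
var1, var2 = GPU1.var(ddof=1), GPU2.var(ddof=1)

# Pooled variance: sample variances weighted by their degrees of freedom
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
std_error = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_pooled = (GPU1.mean() - GPU2.mean()) / std_error
p_pooled = 2 * stats.t.sf(abs(t_pooled), df=n1 + n2 - 2)

# Welch's t-test: same idea without assuming equal variances
t_welch, p_welch = stats.ttest_ind(GPU1, GPU2, equal_var=False)

print("Pooled t, p : ", t_pooled, p_pooled)
print("Welch  t, p : ", t_welch, p_welch)
```

Because both samples have the same size, the pooled and Welch statistics coincide; only the degrees of freedom, and therefore the p-value, differ slightly.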

## Question 3

He is trying a third GPU - GPU3.

* Null Hypothesis -      $H_0$ : $\mu_1 = \mu_3$
* Alternate Hypothesis - $H_a$ : $\mu_1 \neq \mu_3$

In [172]:
GPU3 = np.array([9,10,9,11,10,13,12,9,12,12,13,12,13,10,11])

#Assumption: Both the datasets (GPU1 & GPU 3) are random, independent, parametric & normally distributed

*Do two-sample testing and check whether there is a significant difference between the speeds of the two GPUs, GPU1 and GPU3.*

#### Answer:

In [173]:
t_stat_2t_ind, p_value_2t_ind = stats.ttest_ind(GPU1, GPU3)
print("2 Sample Independent tStats : ", t_stat_2t_ind)
print("2 Sample Independent pValue : ", p_value_2t_ind)

2 Sample Independent tStats :  -1.4988943759093303
2 Sample Independent pValue :  0.14509210993138993


* pValue for the Two-Sample Independent test between GPU1 and GPU3 is more than 5% level of significance
* Hence we FAIL to reject the NULL hypothesis. 
* This concludes there is NO SIGNIFICANT difference between GPU1 and GPU3
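
The decision rule used in these answers is the same each time: compare the p-value with the chosen level of significance. A small helper makes the rule explicit; `decide` is a hypothetical name introduced here for illustration, and the two p-values are the ones printed above.

```python
def decide(p_value, alpha=0.05):
    """Return the test decision at significance level alpha."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


# p-values reported by stats.ttest_ind in the cells above
p_gpu1_vs_gpu2 = 0.013794282041452725
p_gpu1_vs_gpu3 = 0.14509210993138993

print("GPU1 vs GPU2 : ", decide(p_gpu1_vs_gpu2))
print("GPU1 vs GPU3 : ", decide(p_gpu1_vs_gpu3))
```

Note that "fail to reject" is not the same as showing the two GPUs are equally fast; it only means the data do not provide enough evidence of a difference.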

## ANOVA

## Question 4 

If you need to compare more than two data sets at a time, an ANOVA is your best bet. 

*The results from three experiments with overlapping 95% confidence intervals are given below, and we want to confirm that the results for all three experiments are not significantly different.*

Before conducting the ANOVA, check whether the equal-variance assumption holds using Levene's test. If it does not, state that the ANOVA result cannot be relied on.

In [174]:
import numpy as np
import scipy.stats

e1 = np.array([1.595440,1.419730,0.000000,0.000000])
e2 = np.array([1.433800,2.079700,0.892139,2.384740])
e3 = np.array([0.036930,0.938018,0.995956,1.006970])

#Assumption: All the 3 datasets (e1,e2 & e3) are random, independent, parametric & normally distributed

Perform Levene's test on the data.

The Levene test tests the null hypothesis that all input samples are from populations with equal variances. Levene’s test is an alternative to Bartlett’s test (`bartlett`) in the case where there are significant deviations from normality.

source: scipy.org

#### Answer: Levene Test

In [175]:
f_stats_lev, p_value_lev = stats.levene(e1,e2,e3)

In [176]:
print('P Value by Levene Test : ', p_value_lev)

P Value by Levene Test :  0.12259792666001798


* The p-value is greater than the 0.05 level of significance.
* Hence we FAIL to reject $H_0$, which means the variances are NOT significantly different among the samples, so the equal-variance assumption for ANOVA is reasonable.
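
To see what Levene's test measures, it can be rebuilt from a one-way ANOVA on the absolute deviations of each observation from its group median (SciPy's default `center='median'`). The following sketch assumes that default; the names `abs_dev`, `w_manual`, and `p_manual` are illustrative.

```python
import numpy as np
from scipy import stats

e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])
groups = [e1, e2, e3]

# Absolute deviation of every observation from its own group median
abs_dev = [np.abs(g - np.median(g)) for g in groups]

# Levene's W is the one-way ANOVA F statistic computed on those deviations
w_manual, p_manual = stats.f_oneway(*abs_dev)

w_scipy, p_scipy = stats.levene(e1, e2, e3)  # center='median' by default

print("Manual W, p : ", w_manual, p_manual)
print("SciPy  W, p : ", w_scipy, p_scipy)
```

If the spread differs between groups, the typical absolute deviation differs too, and the ANOVA on deviations picks that up.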

## Question 5

The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes.

Use the `stats.f_oneway()` function to perform the one-way ANOVA test.

In [192]:
one_way_stat, one_way_pValue = stats.f_oneway(e1, e2, e3)
print("pValue from One Way ANOVA Test : ", one_way_pValue)

pValue from One Way ANOVA Test :  0.13574644501798466


**Answer: ANOVA test**

* The p-value (0.13574644501798466) is greater than the 5% level of significance.
* Hence we FAIL to reject $H_0$.
* So there is no significant evidence that the means ($\mu$) of the three experiments differ.

In [177]:
import pandas as pd
exp_results = pd.DataFrame()
exp1 = pd.DataFrame({"Exp":"1", "Results":e1})
exp2 = pd.DataFrame({"Exp":"2", "Results":e2})
exp3 = pd.DataFrame({"Exp":"3", "Results":e3})

In [185]:
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the three frames instead
exp_results = pd.concat([exp1, exp2, exp3], ignore_index=True)

In [179]:
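# Critical F value at the 0.5% level (q = 1 - 0.005) with dfn=2 and dfd=60.
# Note: the ANOVA on e1, e2, e3 has 2 and 9 degrees of freedom (12 observations, 3 groups).
crit = stats.f.ppf(q=1-0.005,dfn=2, dfd=60)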
crit

5.794990754114814

In [180]:
import statsmodels.api as sm
from statsmodels.formula.api import ols

In [181]:
model = ols('Results ~ Exp', data=exp_results).fit()
# anova_lm's keyword is `typ`, not `type`; typ=1 yields the Type I table with the mean_sq column
anova_model = sm.stats.anova_lm(model, typ=1)
print(anova_model)

           df    sum_sq   mean_sq         F    PR(>F)
Exp       2.0  2.399066  1.199533  2.513576  0.135746
Residual  9.0  4.294994  0.477222       NaN       NaN
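
The entries of the ANOVA table above can be recomputed directly from the between-group and within-group sums of squares. This is a minimal sketch using only NumPy and `scipy.stats`; the critical value at the end assumes the 5% level of significance used in the answer and the degrees of freedom of this data (2 and 9).

```python
import numpy as np
from scipy import stats

e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])
groups = [e1, e2, e3]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()

# Between-group and within-group sums of squares
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1               # k - 1
df_within = all_values.size - len(groups)  # N - k

f_manual = (ss_between / df_between) / (ss_within / df_within)
p_manual = stats.f.sf(f_manual, df_between, df_within)

# Critical F value at alpha = 0.05 (assumed level) for these degrees of freedom
f_crit = stats.f.ppf(1 - 0.05, dfn=df_between, dfd=df_within)

print("SS between, SS within : ", ss_between, ss_within)
print("F, p-value : ", f_manual, p_manual)
print("Reject H0 : ", f_manual > f_crit)
```

Comparing `f_manual` with `f_crit` leads to the same decision as comparing the p-value with 0.05.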


## Question 6

*In one or two sentences, explain **Type I** and **Type II** errors.*

#### Answer:

**Type I Error:**
* This is called a "False Alarm".
* It happens when $H_0$ is true, but we reject it.
* The probability of a Type I error is called alpha ($\alpha$).

**Type II Error:**
* This is called a "Missed Opportunity".
* It happens when $H_0$ is false, but we fail to reject it.
* Because we fail to reject $H_0$, we might miss a real effect. The probability of a Type II error is called beta ($\beta$).
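
Both error rates can be estimated by simulation. The sketch below is an illustration under assumed settings that do not come from the exercise (normal data with unit standard deviation, 15 observations per sample, a true mean of 0.5 for the Type II case, and a fixed random seed).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
alpha = 0.05
n_sims, n_obs = 2000, 15        # assumed simulation settings

# Type I error: H0 (mean = 0) is true; count how often it is wrongly rejected
null_samples = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n_obs))
_, p_null = stats.ttest_1samp(null_samples, 0.0, axis=1)
type1_rate = np.mean(p_null < alpha)

# Type II error: H0 is false (true mean = 0.5); count how often we fail to reject it
alt_samples = rng.normal(loc=0.5, scale=1.0, size=(n_sims, n_obs))
_, p_alt = stats.ttest_1samp(alt_samples, 0.0, axis=1)
type2_rate = np.mean(p_alt >= alpha)

print("Estimated Type I error rate : ", type1_rate)
print("Estimated Type II error rate : ", type2_rate)
```

The estimated Type I rate should land close to `alpha`, while the Type II rate depends on the sample size and on how far the true mean is from the hypothesised one.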


## Question 7 

You are the manager of a Chinese restaurant. You want to determine whether the waiting time to place an order has changed in the past month from its previous population mean value of 4.5 minutes.
State the null and alternative hypotheses.

#### Answer:


* $\mu_p$ - previous population mean waiting time (4.5 minutes)
* $\mu_c$ - mean waiting time for the past month
* $H_0$ : $\mu_c$ = $\mu_p$ = 4.5
* $H_A$ : $\mu_c$ $\neq$ $\mu_p$ (that is, $\mu_c$ $\neq$ 4.5)
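
The question gives only the previous mean and no population standard deviation, so one natural way to test these hypotheses, stated here as an assumption, is a two-sided one-sample t-test on the $n$ waiting times observed in the past month, with sample mean $\bar{x}$ and sample standard deviation $s$:

$$
t = \frac{\bar{x} - 4.5}{s / \sqrt{n}}, \qquad \text{reject } H_0 \text{ if } |t| > t_{\alpha/2,\; n-1}
$$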

## Chi square test

## Question 8

Let's create a small dataset for dice rolls of four players

In [148]:
import numpy as np

d1 = [5, 8, 3, 8]
d2 = [9, 6, 8, 5]
d3 = [8, 12, 7, 2]
d4 = [4, 16, 7, 3]
d5 = [3, 9, 6, 5]
d6 = [7, 2, 5, 7]

dice = np.array([d1, d2, d3, d4, d5, d6])

Run the test using the SciPy stats library.

Depending on the test, we are generally looking for a threshold at either 0.05 or 0.01. Our test is significant (i.e. we reject the null hypothesis) if we get a p-value below our threshold.

For our purposes, we’ll use 0.01 as the threshold.

Use the `stats.chi2_contingency()` function.

This function computes the chi-square statistic and p-value for the hypothesis test of independence of the observed frequencies in the contingency table.

Print the following:

- chi2 statistic
- p-value
- degrees of freedom
- expected frequencies (contingency table)



In [149]:
dice

array([[ 5,  8,  3,  8],
       [ 9,  6,  8,  5],
       [ 8, 12,  7,  2],
       [ 4, 16,  7,  3],
       [ 3,  9,  6,  5],
       [ 7,  2,  5,  7]])

In [150]:
chi2_stat, p_valu_chi2, dof_chi2, exp_freq_chi2 = stats.chi2_contingency(dice)
print("Chi-Square Stats : ", chi2_stat)
print("Chi-Square pValue : ", p_valu_chi2)
print("Chi-Square DOF : ", dof_chi2)
print("Chi-Square Contingency : ",exp_freq_chi2)

Chi-Square Stats :  23.315671914716496
Chi-Square pValue :  0.07766367301496693
Chi-Square DOF :  15
Chi-Square Contingency :  [[ 5.57419355  8.20645161  5.57419355  4.64516129]
 [ 6.50322581  9.57419355  6.50322581  5.41935484]
 [ 6.73548387  9.91612903  6.73548387  5.61290323]
 [ 6.96774194 10.25806452  6.96774194  5.80645161]
 [ 5.34193548  7.86451613  5.34193548  4.4516129 ]
 [ 4.87741935  7.18064516  4.87741935  4.06451613]]


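**Answer: Chi-square test**
* The p-value (0.0777) is greater than the chosen level of significance (0.01).
* Hence we fail to reject the null hypothesis: there is no significant evidence that the rows and columns of the table are dependent.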
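
The numbers returned by `chi2_contingency` can be rebuilt from the table's row and column totals. This is a minimal sketch of the standard test of independence; it matches SciPy here because the continuity correction is only applied when there is a single degree of freedom.

```python
import numpy as np
from scipy import stats

dice = np.array([
    [5, 8, 3, 8],
    [9, 6, 8, 5],
    [8, 12, 7, 2],
    [4, 16, 7, 3],
    [3, 9, 6, 5],
    [7, 2, 5, 7],
])

row_totals = dice.sum(axis=1, keepdims=True)  # shape (6, 1)
col_totals = dice.sum(axis=0, keepdims=True)  # shape (1, 4)
grand_total = dice.sum()

# Expected count under independence: row total * column total / grand total
expected = row_totals * col_totals / grand_total

chi2_manual = ((dice - expected) ** 2 / expected).sum()
dof_manual = (dice.shape[0] - 1) * (dice.shape[1] - 1)
p_manual = stats.chi2.sf(chi2_manual, dof_manual)

print("Manual chi2 : ", chi2_manual)
print("Manual DOF : ", dof_manual)
print("Manual p-value : ", p_manual)
```

The `expected` array is the same matrix that `chi2_contingency` returns as its fourth value.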

## Question 9

### Z-test

Get the z-scores for the dice data above using the `stats.zscore` function from SciPy. Convert the z-score values to p-values and take the mean of the array.

In [200]:
dice

array([[ 5,  8,  3,  8],
       [ 9,  6,  8,  5],
       [ 8, 12,  7,  2],
       [ 4, 16,  7,  3],
       [ 3,  9,  6,  5],
       [ 7,  2,  5,  7]])

In [201]:
zscores = stats.zscore(dice)

In [203]:
zscores

array([[-0.46291005, -0.18884739, -1.83711731,  1.44115338],
       [ 1.38873015, -0.64208114,  1.22474487,  0.        ],
       [ 0.9258201 ,  0.7176201 ,  0.61237244, -1.44115338],
       [-0.9258201 ,  1.62408759,  0.61237244, -0.96076892],
       [-1.38873015,  0.03776948,  0.        ,  0.        ],
       [ 0.46291005, -1.54854863, -0.61237244,  0.96076892]])

In [211]:
# norm.sf is the survival function, so these are one-sided (upper-tail) p-values P(Z > z)
p_values_zscore = stats.norm.sf(zscores)
print('Pvalues from Zscores : ', p_values_zscore)
print('Mean of pValues : ',p_values_zscore.mean())

Pvalues from Zscores :  [[0.67828558 0.57489379 0.96690371 0.07477068]
 [0.08245741 0.73958975 0.11033568 0.5       ]
 [0.17726974 0.23649578 0.27014569 0.92522932]
 [0.82273026 0.05217856 0.27014569 0.83166582]
 [0.91754259 0.48493574 0.5        0.5       ]
 [0.32171442 0.93925487 0.72985431 0.16833418]]
Mean of pValues :  0.49478056512575197
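
Two details of the cells above are easy to miss: `stats.zscore` standardises along `axis=0` by default (column by column, with the population standard deviation, `ddof=0`), and `norm.sf` returns an upper-tail probability. The sketch below makes both explicit and also shows the two-sided conversion as an alternative; which tail is appropriate depends on the question being asked.

```python
import numpy as np
from scipy import stats

dice = np.array([
    [5, 8, 3, 8],
    [9, 6, 8, 5],
    [8, 12, 7, 2],
    [4, 16, 7, 3],
    [3, 9, 6, 5],
    [7, 2, 5, 7],
])

# Column-wise standardisation, equivalent to stats.zscore(dice) with its defaults
col_mean = dice.mean(axis=0)
col_std = dice.std(axis=0)  # ddof=0, the population standard deviation
z_manual = (dice - col_mean) / col_std

p_upper = stats.norm.sf(z_manual)                   # P(Z > z), as in the cell above
p_two_sided = 2 * stats.norm.sf(np.abs(z_manual))   # P(|Z| > |z|)

print("Matches stats.zscore : ", np.allclose(z_manual, stats.zscore(dice)))
print("Mean upper-tail p-value : ", p_upper.mean())
print("Mean two-sided p-value : ", p_two_sided.mean())
```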


## Question 10

A Paired sample t-test compares means from the same group at different times.

The basic two sample t-test is designed for testing differences between independent groups. 
In some cases, you might be interested in testing differences between samples of the same group at different points in time. 
We can conduct a paired t-test using the scipy function stats.ttest_rel(). 

In [152]:
before = stats.norm.rvs(scale=30, loc=100, size=500)  # 500 draws from a normal distribution with mean 100 and std 30
after = before + stats.norm.rvs(scale=5, loc=-1.25, size=500)  # add a per-patient change with mean -1.25 and std 5

* $H_0$: $\mu_{after} - \mu_{before} = 0$
* $H_A$: $\mu_{after} - \mu_{before} \neq 0$

Test whether a weight-loss drug works by comparing the weights of the same group of patients before and after treatment, using the data above.

In [194]:
t_stats_2t_rel, p_value_2t_rel = stats.ttest_rel(after, before)
print('pValue : ', p_value_2t_rel)

pValue :  1.7971029024318626e-10


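**Answer: Paired t-test for related data**
* The p-value is below the standard 5% level of significance (0.05).
* Hence we reject $H_0$.
* The data indicate a statistically significant change in weight after the drug. The test is two-sided, so the direction of the change (a loss) comes from the negative mean difference built into the simulated data (`loc=-1.25`), not from the p-value alone.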
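
A paired t-test is equivalent to a one-sample t-test on the per-patient differences. The sketch below checks this on freshly simulated data with the same parameters as above; the fixed seed is an assumption added for repeatability, so the numbers will not match the output printed earlier. It also derives a one-sided p-value for the directional claim that weight decreased.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # assumed seed, for repeatability only
before = rng.normal(loc=100, scale=30, size=500)
after = before + rng.normal(loc=-1.25, scale=5, size=500)

diff = after - before

# Paired test and one-sample test on the differences give the same result
t_rel, p_rel = stats.ttest_rel(after, before)
t_diff, p_diff = stats.ttest_1samp(diff, 0.0)

# One-sided p-value for H_A: mean(after - before) < 0
p_one_sided = p_rel / 2 if t_rel < 0 else 1 - p_rel / 2

print("Paired t, p : ", t_rel, p_rel)
print("One-sample on differences t, p : ", t_diff, p_diff)
print("One-sided p-value (weight loss) : ", p_one_sided)
```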