# Hypothesis Testing

The purpose of a hypothesis test is to determine whether there is a statistically significant difference between data sets, or between a data set and a hypothesised value.



## Overview

This module covers:

1) One-sample and two-sample t-tests

2) ANOVA

3) Type I and Type II errors

4) Chi-Squared Tests

## Question 1 

*A student is trying to decide between two GPUs. He wants to use the GPU for his research to run deep learning algorithms, so the only thing he is concerned with is speed.*

*He picks a deep learning algorithm on a large data set and runs it on both GPUs 15 times, timing each run in hours. The results are given in the lists GPU1 and GPU2 below.*

In [1]:
from scipy import stats
import numpy as np

In [2]:
GPU1 = np.array([11,9,10,11,10,12,9,11,12,9,11,12,9,10,9])
GPU2 = np.array([11,13,10,13,12,9,11,12,12,11,12,12,10,11,13])

Assumption: both datasets (GPU1 and GPU2) are random, independent samples from normally distributed populations.

Hint: you can import the t-test functions from scipy.stats to perform t-tests.

**First T test**

*One sample t-test*

Check whether the mean of GPU1 is equal to zero.
- Null Hypothesis is that mean is equal to zero.
- Alternate hypothesis is that it is not equal to zero.
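For intuition, the one-sample t statistic can be computed by hand as t = (x̄ - μ0) / (s / √n), compared against a t distribution with n - 1 degrees of freedom. The sketch below uses a hypothetical helper `one_sample_t` (not part of scipy) and checks it against `ttest_1samp`.

```python
import numpy as np
from scipy import stats


def one_sample_t(sample, popmean):
    """Hypothetical helper: t statistic and two-sided p-value for H0: mean == popmean."""
    x = np.asarray(sample, dtype=float)
    n = x.size
    # Standard error of the mean, using the sample standard deviation (ddof=1)
    se = x.std(ddof=1) / np.sqrt(n)
    t = (x.mean() - popmean) / se
    # Two-sided p-value from a t distribution with n - 1 degrees of freedom
    p = 2 * stats.t.sf(abs(t), df=n - 1)
    return t, p


GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
t, p = one_sample_t(GPU1, 0)
print(f"by hand: t = {t:.4f}, p = {p:.3e}")
print("scipy:  ", stats.ttest_1samp(GPU1, 0))
```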

In [3]:
from scipy.stats import ttest_1samp, ttest_ind, wilcoxon, ttest_ind_from_stats
# import scipy.stats as stats  # already imported
from statsmodels.stats.power import ttest_power
import matplotlib.pyplot as plt

In [4]:
# Given the assumption above that GPU1 is a random sample from a normal population
print(ttest_1samp(GPU1, 0))
# The p-value is less than 0.05, hence reject the null hypothesis
# print(wilcoxon(GPU1 - 0))  # non-parametric alternative: Wilcoxon signed-rank test

Ttest_1sampResult(statistic=34.056241516158195, pvalue=7.228892044970457e-15)


The p-value is far below 0.05, hence we reject the null hypothesis: the mean of GPU1 is not equal to zero.

## Question 2

Given:

Null Hypothesis : There is no significant difference between data sets

Alternate Hypothesis : There is a significant difference

*Perform a two-sample t-test and check whether to reject the null hypothesis.*

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html
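By default `ttest_ind` runs Student's t-test, which assumes equal population variances (`equal_var=True`). A minimal sketch of the pooled-variance calculation, using a hypothetical helper `pooled_t_test`, is below; passing `equal_var=False` instead gives Welch's t-test, which drops the equal-variance assumption.

```python
import numpy as np
from scipy import stats


def pooled_t_test(a, b):
    """Hypothetical helper: Student's two-sample t-test with pooled variance."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n1, n2 = a.size, b.size
    df = n1 + n2 - 2
    # Pooled variance: df-weighted average of the two sample variances
    sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / df
    se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (a.mean() - b.mean()) / se
    # Two-sided p-value from a t distribution with n1 + n2 - 2 degrees of freedom
    p = 2 * stats.t.sf(abs(t), df)
    return t, p


GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
GPU2 = np.array([11, 13, 10, 13, 12, 9, 11, 12, 12, 11, 12, 12, 10, 11, 13])
t, p = pooled_t_test(GPU1, GPU2)
print(f"pooled (Student): t = {t:.4f}, p = {p:.5f}")
print("Welch:", stats.ttest_ind(GPU1, GPU2, equal_var=False))
```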

In [5]:

t_statistic, p_value = ttest_ind(GPU1, GPU2)
print('P Value %1.5f' % p_value)
# The p-value (0.014) is less than the 5% significance level, hence we reject the null hypothesis:
# the average speeds of the two GPUs are not the same.

P Value 0.01379


Conclusion: we reject the null hypothesis, so the average speeds of the two GPUs are not the same.

## Question 3

He is now trying a third GPU, GPU3.

In [7]:
GPU3 = np.array([9,10,9,11,10,13,12,9,12,12,13,12,13,10,11])

Assumption: both datasets (GPU1 and GPU3) are random, independent samples from normally distributed populations.

*Perform a two-sample t-test and check whether there is a significant difference between the speeds of the two GPUs, GPU1 and GPU3.*

#### Answer:

In [8]:
# H0: mean speed of GPU1 = mean speed of GPU3
# Ha: mean speed of GPU1 != mean speed of GPU3
ttest_ind(GPU1, GPU3)


Ttest_indResult(statistic=-1.4988943759093303, pvalue=0.14509210993138993)

Conclusion: p (0.145) > 0.05, so we fail to reject the null hypothesis. There is no significant difference between the mean speeds of GPU1 and GPU3.

## ANOVA

## Question 4 

If you need to compare more than two data sets at a time, an ANOVA is your best bet. 
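One reason to prefer a single ANOVA over several pairwise t-tests is that each extra test adds another chance of a false positive. Assuming the tests are independent (an approximation, since pairwise tests share groups), the probability of at least one Type I error across m tests at level α is 1 - (1 - α)^m:

```python
alpha = 0.05
k = 3                 # number of groups being compared
m = k * (k - 1) // 2  # number of pairwise t-tests needed: 3
# Probability of at least one false positive, assuming independent tests
fwer = 1 - (1 - alpha) ** m
print(f"{m} pairwise tests at alpha={alpha}: family-wise error rate ~ {fwer:.3f}")
# -> 3 pairwise tests at alpha=0.05: family-wise error rate ~ 0.143
```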

*The results from three experiments with overlapping 95% confidence intervals are given below, and we want to confirm that the results for all three experiments are not significantly different.*

Before conducting the ANOVA, use Levene's test to check whether the equal-variance assumption is satisfied. If it is not, state that we cannot rely on the result of the ANOVA.

In [9]:
import numpy as np

e1 = np.array([1.595440,1.419730,0.000000,0.000000])
e2 = np.array([1.433800,2.079700,0.892139,2.384740])
e3 = np.array([0.036930,0.938018,0.995956,1.006970])

Assumption: all three datasets (e1, e2 and e3) are random, independent samples from normally distributed populations.

Perform Levene's test on the data.

The Levene test tests the null hypothesis that all input samples are from populations with equal variances. Levene’s test is an alternative to Bartlett’s test (`bartlett`) in the case where there are significant deviations from normality.

source: scipy.org
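By default scipy's `levene` uses `center='median'` (the Brown-Forsythe variant). Its statistic is equivalent to a one-way ANOVA F statistic computed on the absolute deviations of each observation from its group median, which the sketch below checks; `levene_by_hand` is a hypothetical helper name, and the Bartlett result is printed only for comparison.

```python
import numpy as np
from scipy import stats

e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])


def levene_by_hand(*groups):
    """Hypothetical helper: median-centred Levene test via one-way ANOVA."""
    # Absolute deviation of each observation from its own group median
    deviations = [np.abs(g - np.median(g)) for g in groups]
    # Levene's W equals the one-way ANOVA F statistic on these deviations
    return stats.f_oneway(*deviations)


print("by hand: ", levene_by_hand(e1, e2, e3))
print("levene:  ", stats.levene(e1, e2, e3))
print("bartlett:", stats.bartlett(e1, e2, e3))
```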

#### Answer:

In [10]:
from scipy.stats import levene
alpha = 0.05  # assumed significance level

# H0: the variances of e1, e2 and e3 are equal
# Ha: at least one variance differs from the others
levene(e1, e2, e3)


LeveneResult(statistic=2.6741725711150446, pvalue=0.12259792666001798)

Conclusion: p-value (0.123) > 0.05, so we do not reject the null hypothesis. There is no significant evidence that the variances of e1, e2 and e3 differ, so the equal-variance assumption of ANOVA is not violated.

## Question 5

The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes.
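The F statistic behind the one-way ANOVA compares the variability between group means to the variability within groups. A minimal sketch with a hypothetical helper `one_way_anova`, checked against `stats.f_oneway`:

```python
import numpy as np
from scipy import stats


def one_way_anova(*groups):
    """Hypothetical helper: one-way ANOVA F statistic and p-value."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n_total = sum(g.size for g in groups)
    grand_mean = np.concatenate(groups).mean()
    # Between-group sum of squares: spread of the group means around the grand mean
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    # p-value: upper tail of the F(df_between, df_within) distribution
    p = stats.f.sf(f, df_between, df_within)
    return f, p


e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])
f, p = one_way_anova(e1, e2, e3)
print(f"by hand: F = {f:.4f}, p = {p:.4f}")
print("scipy:  ", stats.f_oneway(e1, e2, e3))
```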

Use the stats.f_oneway() function to perform the one-way ANOVA test.

- H0: μ(e1) = μ(e2) = μ(e3), i.e. the population mean is the same for all 3 groups.
- HA: at least one pair of population means is unequal.
- Assumption: alpha = 0.05 if not given.

In [11]:
stats.f_oneway(e1,e2,e3)

F_onewayResult(statistic=2.51357622845924, pvalue=0.13574644501798466)

Conclusion: p-value (0.136) > 0.05, so we fail to reject the null hypothesis: there is no significant difference between the population means of the three groups.

## Question 6

*In one or two sentences explain about **TypeI** and **TypeII** errors.*

#### Answer:

- **Type I error:** rejecting the null hypothesis when it is actually true. Its probability is α, the significance level of the test (the probability that the test statistic falls in the critical region when H0 is true); 1 - α is the confidence level of the test.
- **Type II error:** failing to reject ("accepting") the null hypothesis when it is actually false. Its probability is β; 1 - β is the power of the test, i.e. the probability of correctly rejecting a false null hypothesis.
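Both error rates can be estimated by simulation. The sketch below assumes normally distributed data with n = 15 per sample and a one-sample t-test of H0: μ = 0; the true means (0 when H0 holds, 1 when it does not), the seed and the number of simulations are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the estimates are reproducible
alpha, n, n_sims = 0.05, 15, 2000

# Type I error: H0 (mean = 0) is true, so every rejection is a false positive
null_samples = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n))
p_null = stats.ttest_1samp(null_samples, 0.0, axis=1).pvalue
type1_rate = np.mean(p_null < alpha)

# Type II error: H0 is false (true mean = 1), so every non-rejection is a miss
alt_samples = rng.normal(loc=1.0, scale=1.0, size=(n_sims, n))
p_alt = stats.ttest_1samp(alt_samples, 0.0, axis=1).pvalue
power = np.mean(p_alt < alpha)
type2_rate = 1 - power

print(f"estimated Type I error rate: {type1_rate:.3f} (alpha = {alpha})")
print(f"estimated power: {power:.3f}, Type II error rate (beta): {type2_rate:.3f}")
```

The estimated Type I error rate should land close to α, while β depends on the effect size and sample size chosen.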

## Question 7 

You are the manager of a Chinese restaurant. You want to determine whether the waiting time to place an order has changed in the past month from its previous population mean value of 4.5 minutes.
State the null and alternative hypotheses.

#### Answer:


H0 (null hypothesis): the mean waiting time to place an order has not changed, μ = 4.5 minutes.

HA (alternative hypothesis): the mean waiting time to place an order has changed, μ ≠ 4.5 minutes (a two-tailed test). Here μ is the population mean waiting time over the past month, which the past month's sample mean is used to test.
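Given a sample of last month's waiting times, these hypotheses could be tested with a two-sided one-sample t-test against 4.5. The helper `waiting_time_changed` is hypothetical, and the waiting times in the usage are simulated, not real restaurant data.

```python
import numpy as np
from scipy import stats


def waiting_time_changed(waiting_times, mu0=4.5, alpha=0.05):
    """Hypothetical helper: two-sided one-sample t-test of H0: mean waiting time == mu0."""
    result = stats.ttest_1samp(waiting_times, popmean=mu0)
    # Reject H0 (conclude the mean waiting time has changed) when p < alpha
    return bool(result.pvalue < alpha), result


# Simulated stand-in for last month's waiting times in minutes (NOT real data)
rng = np.random.default_rng(42)
simulated_times = rng.normal(loc=4.5, scale=1.0, size=30)
changed, result = waiting_time_changed(simulated_times)
print("mean changed?", changed, result)
```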

## Chi-square test

## Question 8

Let's create a small dataset of dice rolls for four players.

In [15]:
import numpy as np

d1 = [5, 8, 3, 8]
d2 = [9, 6, 8, 5]
d3 = [8, 12, 7, 2]
d4 = [4, 16, 7, 3]
d5 = [3, 9, 6, 5]
d6 = [7, 2, 5, 7]

dice = np.array([d1, d2, d3, d4, d5, d6])

Run the test using the SciPy stats library.

Depending on the test, we are generally looking for a threshold at either 0.05 or 0.01. Our test is significant (i.e. we reject the null hypothesis) if we get a p-value below our threshold.

For our purposes, we’ll use 0.01 as the threshold.

Use the stats.chi2_contingency() function.

This function computes the chi-square statistic and p-value for the hypothesis test of independence of the observed frequencies in the contingency table
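Under the null hypothesis of independence, the expected count for each cell is (row total × column total) / grand total, the statistic is Σ (O - E)² / E, and the degrees of freedom are (rows - 1)(columns - 1). A sketch computing these by hand for the dice table, checked against `chi2_contingency` (which applies Yates' continuity correction only when there is 1 degree of freedom, so not here):

```python
import numpy as np
from scipy import stats

dice = np.array([[5, 8, 3, 8],
                 [9, 6, 8, 5],
                 [8, 12, 7, 2],
                 [4, 16, 7, 3],
                 [3, 9, 6, 5],
                 [7, 2, 5, 7]])

row_totals = dice.sum(axis=1, keepdims=True)  # shape (6, 1)
col_totals = dice.sum(axis=0, keepdims=True)  # shape (1, 4)
grand_total = dice.sum()

# Expected count for each cell if rows and columns were independent
expected = row_totals * col_totals / grand_total
chi2_stat = ((dice - expected) ** 2 / expected).sum()
dof = (dice.shape[0] - 1) * (dice.shape[1] - 1)
p_value = stats.chi2.sf(chi2_stat, dof)  # upper-tail probability

print(f"chi2 = {chi2_stat:.4f}, p = {p_value:.4f}, dof = {dof}")
```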

Print the following:

- chi2 stat
- p-value
- degree of freedom
- contingency (the expected frequencies under the null hypothesis)



In [16]:
from scipy.stats import chisquare,chi2_contingency

In [17]:
dice  # treat the dice array as a contingency table and pass it to chi2_contingency

array([[ 5,  8,  3,  8],
       [ 9,  6,  8,  5],
       [ 8, 12,  7,  2],
       [ 4, 16,  7,  3],
       [ 3,  9,  6,  5],
       [ 7,  2,  5,  7]])

In [42]:
# chi2_contingency returns the statistic, p-value, degrees of freedom and expected frequencies
chi2stat, pvalue, dof, expected = chi2_contingency(dice)
print("chi2stat :", chi2stat)
print("p-value :", pvalue)
print("Degree_of_freedom:", dof)
print("contingency", expected)

chi2stat : 23.315671914716496
p-value : 0.07766367301496693
Degree_of_freedom: 15
contingency [[ 5.57419355  8.20645161  5.57419355  4.64516129]
 [ 6.50322581  9.57419355  6.50322581  5.41935484]
 [ 6.73548387  9.91612903  6.73548387  5.61290323]
 [ 6.96774194 10.25806452  6.96774194  5.80645161]
 [ 5.34193548  7.86451613  5.34193548  4.4516129 ]
 [ 4.87741935  7.18064516  4.87741935  4.06451613]]


Conclusion: since the p-value (0.078) > 0.01 (the chosen threshold α), we fail to reject the null hypothesis of independence between the rows and columns of the table.

## Question 9

### Z-test

Compute z-scores for the dice data above using scipy's stats.zscore function. Convert the z-score values to p-values and take the mean of the array.

In [95]:
dice

array([[ 5,  8,  3,  8],
       [ 9,  6,  8,  5],
       [ 8, 12,  7,  2],
       [ 4, 16,  7,  3],
       [ 3,  9,  6,  5],
       [ 7,  2,  5,  7]])
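With its defaults (`axis=0`, `ddof=0`), `stats.zscore` standardises each column using that column's mean and population standard deviation. The sketch below reproduces this by hand. Note that `norm.cdf(z)` gives a left-tail probability; a two-sided p-value for each z-score would instead be 2·P(Z > |z|), shown here as an alternative rather than as the method used in the cells below.

```python
import numpy as np
from scipy import stats

dice = np.array([[5, 8, 3, 8],
                 [9, 6, 8, 5],
                 [8, 12, 7, 2],
                 [4, 16, 7, 3],
                 [3, 9, 6, 5],
                 [7, 2, 5, 7]])

# Column-wise z-scores: subtract each column's mean and divide by its
# population standard deviation (ddof=0), matching stats.zscore's defaults
z = (dice - dice.mean(axis=0)) / dice.std(axis=0)

left_tail = stats.norm.cdf(z)             # P(Z <= z) for each entry
two_sided = 2 * stats.norm.sf(np.abs(z))  # two-sided alternative: P(|Z| >= |z|)

print(np.allclose(z, stats.zscore(dice)))
print(left_tail.mean(), two_sided.mean())
```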

In [96]:
zscore = stats.zscore(dice)  # z-score of each entry, computed column-wise (axis=0, ddof=0 by default)
print(zscore)

[[-0.46291005 -0.18884739 -1.83711731  1.44115338]
 [ 1.38873015 -0.64208114  1.22474487  0.        ]
 [ 0.9258201   0.7176201   0.61237244 -1.44115338]
 [-0.9258201   1.62408759  0.61237244 -0.96076892]
 [-1.38873015  0.03776948  0.          0.        ]
 [ 0.46291005 -1.54854863 -0.61237244  0.96076892]]


In [106]:
p = stats.t.cdf(zscore, df=3)  # left-tail probability of each z-score under a t distribution with df=3
p

array([[0.33747058, 0.43113293, 0.08175319, 0.87740306],
       [0.8704713 , 0.28325714, 0.845966  , 0.5       ],
       [0.78858687, 0.73758805, 0.70820859, 0.12259694],
       [0.21141313, 0.89858974, 0.70820859, 0.20377218],
       [0.1295287 , 0.51387787, 0.5       , 0.5       ],
       [0.66252942, 0.10962418, 0.29179141, 0.79622782]])

In [98]:
p_value = stats.norm.cdf(zscore)  # left-tail probability of each z-score under the standard normal distribution
print(p_value)

[[0.32171442 0.42510621 0.03309629 0.92522932]
 [0.91754259 0.26041025 0.88966432 0.5       ]
 [0.82273026 0.76350422 0.72985431 0.07477068]
 [0.17726974 0.94782144 0.72985431 0.16833418]
 [0.08245741 0.51506426 0.5        0.5       ]
 [0.67828558 0.06074513 0.27014569 0.83166582]]


In [99]:
print("T-distribution prob value", np.mean(p))  # mean of the t-distribution CDF values
print("Std normalized dist value", np.mean(p_value))  # mean of the standard normal CDF values

T-distribution prob value 0.5045832367959608
Std normalized dist value 0.505219434874248


In [100]:
print(np.mean(dice))    # mean of the raw dice counts
print(np.mean(zscore))  # mean of the z-scores (zero up to floating-point error)

6.458333333333333
-3.700743415417188e-17


## Question 10

A paired-sample t-test compares means from the same group at different times.

The basic two sample t-test is designed for testing differences between independent groups. 
In some cases, you might be interested in testing differences between samples of the same group at different points in time. 
We can conduct a paired t-test using the scipy function stats.ttest_rel(). 
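A paired t-test is equivalent to a one-sample t-test of the per-subject differences against zero. The sketch below checks this on seeded simulated data; the seed and the use of `np.random.default_rng` are assumptions added for reproducibility (the cells below draw unseeded data with `stats.norm.rvs`).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # seeded for reproducibility (not in the original cells)
before = rng.normal(loc=100, scale=30, size=500)
after = before + rng.normal(loc=-1.25, scale=5, size=500)

# Paired test: one-sample t-test of the per-subject differences against 0
differences = before - after
by_hand = stats.ttest_1samp(differences, 0.0)
paired = stats.ttest_rel(before, after)

print("one-sample on differences:", by_hand)
print("ttest_rel:                ", paired)
```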

In [55]:
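# No random seed is set, so re-running this cell produces different values
before = stats.norm.rvs(scale=30, loc=100, size=500)  # 500 draws from a normal distribution with mean 100 and std 30
after = before + stats.norm.rvs(scale=5, loc=-1.25, size=500)  # add a change with mean -1.25 and std 5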

Using the data above, test whether a weight-loss drug works by comparing the weights of the same group of patients before and after treatment.

In [56]:
before

array([ 87.94943319,  68.02583598,  85.8207894 ,  64.93697595,
       133.79079219,  87.94389603,  42.9323999 , 103.84572305,
       104.69340863,  93.53808116, 114.50188542,  84.98816285,
       113.83154207,  84.3652939 , 128.92809543,  70.63924504,
       121.280334  , 172.61659738, 119.35447785, 134.79174173,
       136.05184803, 100.46230663, 118.2011052 ,  77.15450977,
        94.10591649, 134.41458727, 134.25038213,  96.48420753,
        78.03937592,  70.25505322,  70.08714004,  43.84984463,
       147.03328616, 103.64479204, 106.74228574,  87.92622623,
        58.83835071, 123.41929487, 125.83851227,  96.27033902,
       103.82063822,  58.20557213,  46.2286103 ,  97.24577306,
        69.74483636, 126.4234211 ,  99.83233881, 133.58864291,
        82.16895582, 138.25367406, 110.40929374,  71.32831047,
       127.80447935, 101.63566888, 103.92244353, 140.75331808,
        84.62831171,  65.53857721, 117.1895991 , 108.06492169,
        78.0281685 ,  85.45640096,  79.45612277, 144.40

In [57]:
after

array([ 84.96987846,  69.16134557,  83.84015267,  61.26211037,
       138.43296269,  99.25689455,  39.67783642, 101.24682162,
       102.00852096,  88.65667944, 101.0703703 ,  85.71183481,
       110.66930881,  77.1945676 , 119.88733437,  78.2271675 ,
       120.84262402, 170.26190025, 129.41188314, 138.50571195,
       135.53979545,  94.83524438, 126.15698082,  80.22533522,
        91.03429351, 124.39781471, 129.24711112,  99.77467738,
        82.45056561,  61.16654103,  65.13734539,  37.73162547,
       151.42269974,  99.7381904 , 108.92447244,  90.75773484,
        59.3881114 , 124.27280514, 121.78704747, 100.68180973,
       111.22067229,  60.48445894,  41.11175326,  97.28598071,
        66.05871861, 123.87111919,  99.33259945, 133.93039231,
        82.59586805, 139.14404008, 108.99309216,  74.4181836 ,
       126.30297275, 105.35411632,  99.61174807, 135.96997383,
        87.62011237,  74.20784586, 106.44798884, 100.68354436,
        67.9294092 ,  83.66164792,  80.4666168 , 140.25

H0: the mean difference between the weights before and after treatment is zero.
Ha: the mean difference between the weights before and after treatment is not zero.

In [58]:
stats.ttest_rel(before,after)

Ttest_relResult(statistic=7.044291534446359, pvalue=6.21672425442972e-12)

Conclusion: the p-value (6.2e-12) is far below 0.05, so we reject the null hypothesis: there is a significant difference between the weights before and after treatment.
The t statistic is positive (before - after > 0), so weights decreased after treatment, indicating a significant reduction in weight with the drug.