# Hypothesis Testing

The purpose of a hypothesis test is to decide whether an observed difference (for example, between two data sets) is statistically significant or could plausibly be due to chance.



## Overview

This module covers:

1) One sample and Two sample t-tests

2) ANOVA

3) Type I and Type II errors

4) Chi-Squared Tests

## Question 1 

*A student is trying to decide between two GPUs. He wants to use the GPU to run deep learning algorithms for his research, so the only thing he is concerned with is speed.*

*He picks a deep learning algorithm on a large data set and runs it on both GPUs 15 times, timing each run in hours. The results are given in the lists GPU1 and GPU2 below.*

In [1]:
from scipy import stats 
import numpy as np
import pandas as pd

In [2]:
from scipy.stats             import ttest_1samp,ttest_ind, wilcoxon
from statsmodels.stats.power import ttest_power
import matplotlib.pyplot     as     plt
import math

%matplotlib inline

In [3]:
GPU1 = np.array([11,9,10,11,10,12,9,11,12,9,11,12,9,10,9])
GPU2 = np.array([11,13,10,13,12,9,11,12,12,11,12,12,10,11,13])

# Assumption: both samples (GPU1 and GPU2) are random, independent, and approximately normally distributed

In [4]:
print(GPU1.size)
print(GPU2.size)


15
15


Hint: `scipy.stats` provides `ttest_1samp` and `ttest_ind` for performing t-tests.

**First T test**

*One sample t-test*

Check whether the mean of GPU1 is equal to zero.
- The null hypothesis is that the mean is equal to zero.
- The alternative hypothesis is that the mean is not equal to zero.

* $H_0$: $\mu$ = 0
* $H_A$: $\mu$ != 0

### Assuming the data is normally distributed

In [5]:
N=GPU1.size
alpha=0.05
t_statistic, p_value = ttest_1samp(GPU1, 0)
print(t_statistic,p_value)
if p_value<alpha:
    print("Reject Null ,i.e- Mean of GPU1 is not equal to zero")
else:
    print('Fail to reject null,i.e- Mean of GPU1 is equal to zero')

34.056241516158195 7.228892044970457e-15
Reject Null ,i.e- Mean of GPU1 is not equal to zero
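As a check on what `ttest_1samp` computes, the statistic can be reproduced by hand from $t = (\bar{x} - \mu_0) / (s / \sqrt{n})$. The following is a minimal sketch; the names `mu0`, `std_err`, `t_manual`, and `p_manual` are introduced here only for illustration.

```python
import numpy as np
from scipy import stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])

mu0 = 0.0                                 # hypothesized mean under H0
n = GPU1.size
std_err = GPU1.std(ddof=1) / np.sqrt(n)   # standard error of the sample mean
t_manual = (GPU1.mean() - mu0) / std_err

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# Compare with SciPy's built-in one-sample t-test
t_scipy, p_scipy = stats.ttest_1samp(GPU1, mu0)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```

Both routes should agree up to floating-point rounding.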


### Power of test

In [6]:
delta=(np.mean(GPU1)-0)/np.std(GPU1,ddof=1)
delta

8.79328374843322

In [7]:
print(ttest_power(delta,nobs=GPU1.size,alpha=alpha,alternative='two-sided'))

1.0


The estimated power is essentially 1.0, so with an effect this large the test is almost certain to reject the null hypothesis.
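To see how power depends on the effect size, it can be computed directly from the noncentral t distribution. This is a sketch under the assumption of a one-sample, two-sided test; the helper `one_sample_power` is a hypothetical name, not part of SciPy or statsmodels.

```python
import numpy as np
from scipy import stats

def one_sample_power(effect_size, n, alpha=0.05):
    """Two-sided power of a one-sample t-test for a standardized effect size."""
    df = n - 1
    ncp = effect_size * np.sqrt(n)            # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-sided critical value
    # Probability that |T| exceeds the critical value when T is noncentral t
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

# Small, medium, and large standardized effects with 15 observations
powers = [one_sample_power(d, n=15) for d in (0.2, 0.5, 0.8)]
for d, pw in zip((0.2, 0.5, 0.8), powers):
    print(d, pw)
```

With only 15 runs, small and medium effects are detected far less reliably than the very large standardized effect measured for GPU1 against zero.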

## Question 2

Given,

Null Hypothesis : There is no significant difference between data sets

Alternate Hypothesis : There is a significant difference

*Do two-sample testing and check whether to reject Null Hypothesis or not.*

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html

* $H_0$: $\mu1$ = $\mu2$
* $H_A$: $\mu1$ != $\mu2$
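
For two samples of equal size $n$, the standard pooled-variance statistic can be written as follows. This is a sketch of the textbook formula, where $s_1^2$ and $s_2^2$ denote the sample variances (computed with `ddof=1`) and $\bar{x}_1$, $\bar{x}_2$ the sample means:

$$
s_p = \sqrt{\frac{s_1^2 + s_2^2}{2}}, \qquad
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{2/n}}, \qquad
\mathrm{df} = 2n - 2
$$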

### Assuming the data is independent and normally distributed

In [8]:
n1=GPU1.size
n2=GPU2.size
var_GPU1 = GPU1.var(ddof=1)
var_GPU2 = GPU2.var(ddof=1)

In [9]:
# Pooled standard deviation (valid in this form because n1 == n2)
s = np.sqrt((var_GPU1 + var_GPU2)/2)
s

1.1812019705529173

In [10]:
## Calculate the t-statistic (n1 == n2, so the standard error is s*sqrt(2/n1))
t = (GPU1.mean() - GPU2.mean())/(s*np.sqrt(2/n1))
t

-2.6276295134718395

In [11]:
## Compare with the critical t-value
#Degrees of freedom
df = (n1-1)+(n2-1)
df

28

In [12]:
# Two-sided p-value; -abs(t) keeps the formula valid when t is positive
p = 2*(stats.t.cdf(-abs(t),df=df))
p

0.013794282041452685

In [13]:
print("t_statistic = " + str(t))
print("p_value = " + str(p))

t_statistic = -2.6276295134718395
p_value = 0.013794282041452685


In [14]:
## Cross Checking with the internal scipy function
t2, p2 = stats.ttest_ind(GPU1,GPU2)
print("t = " + str(t2))
print("p = " + str(p2))
if p2<alpha:
    print("Reject Null ,i.e- Means are not equal")
else:
    print('Fail to reject null,i.e- Mean of GPU1-GPU2 is equal to zero')

t = -2.627629513471839
p = 0.013794282041452725
Reject Null ,i.e- Means are not equal
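
`ttest_ind` assumes equal population variances by default. If that assumption is in doubt, Welch's t-test can be requested with `equal_var=False`. The sketch below compares the two variants on the same data; the variable names are introduced here for illustration.

```python
import numpy as np
from scipy import stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
GPU2 = np.array([11, 13, 10, 13, 12, 9, 11, 12, 12, 11, 12, 12, 10, 11, 13])

# Pooled-variance (Student) t-test: assumes equal population variances
t_pooled, p_pooled = stats.ttest_ind(GPU1, GPU2)

# Welch's t-test: does not assume equal variances
t_welch, p_welch = stats.ttest_ind(GPU1, GPU2, equal_var=False)

print("pooled:", t_pooled, p_pooled)
print("welch: ", t_welch, p_welch)
```

Because both samples have the same size, the two t statistics coincide; only the degrees of freedom, and therefore the p-value, can differ.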


## Question 3

The student now tries a third GPU, GPU3.

In [15]:
GPU3 = np.array([9,10,9,11,10,13,12,9,12,12,13,12,13,10,11])

# Assumption: both samples (GPU1 and GPU3) are random, independent, and approximately normally distributed

*Do two-sample testing and check whether there is a significant difference between the speeds of the two GPUs, GPU1 and GPU3.*

#### Answer:

* $H_0$: $\mu1$ = $\mu3$
* $H_A$: $\mu1$ != $\mu3$

In [16]:
n1=GPU1.size
n2=GPU3.size
var_GPU1 = GPU1.var(ddof=1)
var_GPU3 = GPU3.var(ddof=1)
s = np.sqrt((var_GPU1 + var_GPU3)/2)
t = (GPU1.mean() - GPU3.mean())/(s*np.sqrt(2/n1))
df = (n1-1)+(n2-1)
p = 2*(stats.t.cdf(-abs(t),df=df))   # two-sided p-value, valid for either sign of t
print("t_statistic = " + str(t))
print("p_value = " + str(p))

t_statistic = -1.4988943759093303
p_value = 0.14509210993138993


In [17]:
## Cross Checking with the internal scipy function
t2, p2 = stats.ttest_ind(GPU1,GPU3)
print("t = " + str(t2))
print("p = " + str(p2))
if p2<alpha:
    print("Reject Null ,i.e- Means are not equal")
else:
    print('Fail to reject null,i.e- no significant difference between the mean run times of GPU1 and GPU3 was detected')

t = -1.4988943759093303
p = 0.14509210993138993
Fail to reject null,i.e- no significant difference between the mean run times of GPU1 and GPU3 was detected
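
Failing to reject the null hypothesis does not prove that the two GPUs are equally fast. A confidence interval for the difference in means shows how large a difference is still compatible with the data. The following is a sketch using the pooled-variance standard error; the names `sp2`, `std_err`, `ci_low`, and `ci_high` are introduced here for illustration.

```python
import numpy as np
from scipy import stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
GPU3 = np.array([9, 10, 9, 11, 10, 13, 12, 9, 12, 12, 13, 12, 13, 10, 11])

n1, n3 = GPU1.size, GPU3.size
df = n1 + n3 - 2

# Pooled variance (general form, also valid for unequal sample sizes)
sp2 = ((n1 - 1) * GPU1.var(ddof=1) + (n3 - 1) * GPU3.var(ddof=1)) / df
std_err = np.sqrt(sp2 * (1 / n1 + 1 / n3))

diff = GPU1.mean() - GPU3.mean()
t_crit = stats.t.ppf(0.975, df)   # critical value for a 95% two-sided interval
ci_low, ci_high = diff - t_crit * std_err, diff + t_crit * std_err
print("difference in means:", diff)
print("95% CI:", (ci_low, ci_high))
```

The interval contains zero, which is consistent with the p-value above being larger than 0.05.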


## ANOVA

## Question 4 

If you need to compare more than two data sets at a time, an ANOVA is your best bet. 

*The results from three experiments with overlapping 95% confidence intervals are given below, and we want to confirm that the results for all three experiments are not significantly different.*

Before conducting ANOVA, check whether the equality-of-variances assumption is satisfied (using Levene's test). If it is not, state that we cannot rely on the result of the ANOVA.

In [18]:
import numpy as np

e1 = np.array([1.595440,1.419730,0.000000,0.000000])
e2 = np.array([1.433800,2.079700,0.892139,2.384740])
e3 = np.array([0.036930,0.938018,0.995956,1.006970])

# Assumption: all three samples (e1, e2 and e3) are random, independent, and approximately normally distributed

Perform Levene's test on the data.

The Levene test tests the null hypothesis that all input samples are from populations with equal variances. Levene's test is an alternative to Bartlett's test (`bartlett`) in the case where there are significant deviations from normality.

source: scipy.org

#### Answer:

* $H_0$: The variances are equal

* $H_A$: The variances are not equal

In [19]:
from scipy.stats import levene
print(levene(e1,e2))
print(levene(e1,e3))
print(levene(e2,e3))
print(levene(e1,e2,e3))

LeveneResult(statistic=2.502047337898582, pvalue=0.16478397645307163)
LeveneResult(statistic=4.721691050668699, pvalue=0.07276576314239869)
LeveneResult(statistic=1.1400525071736345, pvalue=0.32670615872898084)
LeveneResult(statistic=2.6741725711150446, pvalue=0.12259792666001798)


As every p-value is higher than 0.05, we fail to reject the null hypothesis of equal variances. The samples e1, e2 and e3 are therefore consistent with the equal-variance assumption, so we can perform the ANOVA test.
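
By default, `scipy.stats.levene` centers each sample on its median (`center='median'`). The sketch below compares that default with the mean-centered variant and with Bartlett's test, which relies more heavily on normality. The result names are introduced here for illustration.

```python
import numpy as np
from scipy import stats

e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])

lev_median = stats.levene(e1, e2, e3)                 # default: center='median'
lev_mean = stats.levene(e1, e2, e3, center='mean')    # mean-centered variant
bart = stats.bartlett(e1, e2, e3)                     # assumes normal populations

print("Levene (median):", lev_median.pvalue)
print("Levene (mean):  ", lev_mean.pvalue)
print("Bartlett:       ", bart.pvalue)
```

With only four observations per group, all of these tests have little power, so a non-significant result is weak evidence of equal variances.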

## Question 5

The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes.

Use the stats.f_oneway() function to perform a one-way ANOVA test.

* $H_0$: $\mu1$ = $\mu2$ = $\mu3$
* $H_A$: At least one $\mu$ differs 

In [34]:
stats.f_oneway(e1,e2,e3)


F_onewayResult(statistic=2.51357622845924, pvalue=0.13574644501798466)

As the p-value is higher than 0.05, we fail to reject the null hypothesis (i.e. there is no significant evidence that the means differ).
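
The F statistic reported by `f_oneway` is the ratio of the between-group mean square to the within-group mean square. The following sketch recomputes it by hand; names such as `ss_between` and `ss_within` are introduced here for illustration.

```python
import numpy as np
from scipy import stats

e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])
groups = [e1, e2, e3]

k = len(groups)                            # number of groups
n_total = sum(g.size for g in groups)      # total number of observations
grand_mean = np.concatenate(groups).mean()

# Variation of the group means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Variation of the observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)
print(f_manual, p_manual)
```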

## Question 6

*In one or two sentences, explain **Type I** and **Type II** errors.*

#### Answer:

A **Type I** error occurs when we reject the null hypothesis although the null hypothesis is actually true. It is also known as a "false positive" or "false hit".

For example: there is no cancer, but the report comes back positive.


A **Type II** error occurs when we fail to reject the null hypothesis although the null hypothesis is actually false. It is also known as a "false negative" or "miss".

For example: the cancer report comes back negative although cancer is in fact present.
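
Both error rates can be estimated by simulation. The sketch below uses synthetic data only; the sample size, the true mean of 0.5 under the alternative, the seed, and the number of simulations are all assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, n_sim = 0.05, 15, 2000

# H0 is true (mean 0): every rejection is a Type I error
null_samples = rng.normal(loc=0.0, scale=1.0, size=(n_sim, n))
p_null = stats.ttest_1samp(null_samples, 0.0, axis=1).pvalue
type1_rate = np.mean(p_null < alpha)

# H0 is false (true mean 0.5): every non-rejection is a Type II error
alt_samples = rng.normal(loc=0.5, scale=1.0, size=(n_sim, n))
p_alt = stats.ttest_1samp(alt_samples, 0.0, axis=1).pvalue
type2_rate = np.mean(p_alt >= alpha)

print("Estimated Type I error rate: ", type1_rate)
print("Estimated Type II error rate:", type2_rate)
```

The Type I rate should land near `alpha`, while the Type II rate depends on the sample size and on how far the true mean is from the hypothesized one.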

## Question 7 

You are the manager of a Chinese restaurant. You want to determine whether the waiting time to place an order has changed in the past month from its previous population mean value of 4.5 minutes. 
State the null and alternative hypothesis.

#### Answer:


The null hypothesis is that the population mean has not changed from its previous value of 4.5 minutes
* $H_0$: $\mu$ = 4.5

The alternative hypothesis is that the population mean is not 4.5 minutes
* $H_A$: $\mu$ != 4.5
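
Once waiting times have been collected, these hypotheses could be tested with a two-sided one-sample t-test. The question provides no data, so the sample below is synthetic and purely hypothetical (30 values drawn with an assumed mean of 5.0 and standard deviation of 1.5 minutes).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical waiting times in minutes (synthetic, not real restaurant data)
wait_times = rng.normal(loc=5.0, scale=1.5, size=30)

alpha = 0.05
# Two-sided test of H0: mu = 4.5 against HA: mu != 4.5
t_stat, p_value = stats.ttest_1samp(wait_times, popmean=4.5)

decision = "Reject H0" if p_value < alpha else "Fail to reject H0"
print(t_stat, p_value, decision)
```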

## Chi square test

## Question 8

Let's create a small dataset for dice rolls of four players

In [21]:
import numpy as np

d1 = [5, 8, 3, 8]
d2 = [9, 6, 8, 5]
d3 = [8, 12, 7, 2]
d4 = [4, 16, 7, 3]
d5 = [3, 9, 6, 5]
d6 = [7, 2, 5, 7]

dice = np.array([d1, d2, d3, d4, d5, d6])

Run the test using the SciPy stats library.

Depending on the test, we generally use a significance threshold of either 0.05 or 0.01. The test is significant (i.e. we reject the null hypothesis) if the p-value falls below the threshold.

For our purposes, we’ll use 0.01 as the threshold.

Use the stats.chi2_contingency() function.

This function computes the chi-square statistic and p-value for the hypothesis test of independence of the observed frequencies in the contingency table.

Print the following:

- chi2 stat
- p-value
- degrees of freedom
- expected frequencies (printed under the "Contingency Table" label)



In [22]:
from scipy.stats import chisquare,chi2_contingency
chi2_stat, p_val, dof, ex=chi2_contingency(dice)

In [23]:
print("===Chi2 Stat===")
print(chi2_stat)
print("\n")
print("===Degrees of Freedom===")
print(dof) #So we take (6–1) and multiply by (4–1) to get 15 degrees of freedom
print("\n")
print("===P-Value===")
print(p_val)
print("\n")
print("===Contingency Table===")
print(ex)

===Chi2 Stat===
23.315671914716496


===Degrees of Freedom===
15


===P-Value===
0.07766367301496693


===Contingency Table===
[[ 5.57419355  8.20645161  5.57419355  4.64516129]
 [ 6.50322581  9.57419355  6.50322581  5.41935484]
 [ 6.73548387  9.91612903  6.73548387  5.61290323]
 [ 6.96774194 10.25806452  6.96774194  5.80645161]
 [ 5.34193548  7.86451613  5.34193548  4.4516129 ]
 [ 4.87741935  7.18064516  4.87741935  4.06451613]]


In [24]:
if p_val>0.01:
    print('We have not met the threshold for statistical significance (i.e. Fail to reject null hypothesis)')
else:
    print(' Test is significant (i.e. Reject the null hypothesis)')

We have not met the threshold for statistical significance (i.e. Fail to reject null hypothesis)
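
The expected frequencies returned by `chi2_contingency` come from the row and column totals, and the statistic sums the squared deviations scaled by those expected values. The following sketch recomputes both by hand; names such as `expected` and `chi2_manual` are introduced here for illustration.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

dice = np.array([[5, 8, 3, 8],
                 [9, 6, 8, 5],
                 [8, 12, 7, 2],
                 [4, 16, 7, 3],
                 [3, 9, 6, 5],
                 [7, 2, 5, 7]])

row_totals = dice.sum(axis=1, keepdims=True)   # shape (6, 1)
col_totals = dice.sum(axis=0, keepdims=True)   # shape (1, 4)

# Expected count under independence: row total * column total / grand total
expected = row_totals * col_totals / dice.sum()

chi2_manual = ((dice - expected) ** 2 / expected).sum()
dof_manual = (dice.shape[0] - 1) * (dice.shape[1] - 1)
p_manual = chi2.sf(chi2_manual, dof_manual)

# Compare with SciPy's built-in test
chi2_stat, p_val, dof, ex = chi2_contingency(dice)
print(chi2_manual, chi2_stat)
print(p_manual, p_val)
```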


## Question 9

### Z-test

Compute z-scores for the dice data above using the stats.zscore function from SciPy. Convert the z-scores to p-values and take the mean of the resulting array.

In [25]:
z_score=stats.zscore(dice)
z_score

array([[-0.46291005, -0.18884739, -1.83711731,  1.44115338],
       [ 1.38873015, -0.64208114,  1.22474487,  0.        ],
       [ 0.9258201 ,  0.7176201 ,  0.61237244, -1.44115338],
       [-0.9258201 ,  1.62408759,  0.61237244, -0.96076892],
       [-1.38873015,  0.03776948,  0.        ,  0.        ],
       [ 0.46291005, -1.54854863, -0.61237244,  0.96076892]])

In [26]:
p_values1 = stats.norm.sf(abs(z_score)) #one-sided

p_values2 = stats.norm.sf(abs(z_score))*2 #twosided

print(p_values1,'\n\n',p_values2)
print('\n\n')
print('One-sided',p_values1.mean(),'\nTwo-sided',p_values2.mean())

[[0.32171442 0.42510621 0.03309629 0.07477068]
 [0.08245741 0.26041025 0.11033568 0.5       ]
 [0.17726974 0.23649578 0.27014569 0.07477068]
 [0.17726974 0.05217856 0.27014569 0.16833418]
 [0.08245741 0.48493574 0.5        0.5       ]
 [0.32171442 0.06074513 0.27014569 0.16833418]] 

 [[0.64342884 0.85021243 0.06619258 0.14954135]
 [0.16491482 0.5208205  0.22067136 1.        ]
 [0.35453948 0.47299156 0.54029137 0.14954135]
 [0.35453948 0.10435712 0.54029137 0.33666837]
 [0.16491482 0.96987148 1.         1.        ]
 [0.64342884 0.12149026 0.54029137 0.33666837]]



One-sided 0.23428473233691496 
Two-sided 0.4685694646738299
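
Note that `stats.zscore` standardizes along `axis=0` by default, so each column of `dice` is scaled with its own mean and standard deviation (using `ddof=0`). The sketch below reproduces that by hand and, for contrast, standardizes over the whole array with `axis=None`; the names `z_manual`, `z_scipy`, and `z_all` are introduced here for illustration.

```python
import numpy as np
from scipy import stats

dice = np.array([[5, 8, 3, 8],
                 [9, 6, 8, 5],
                 [8, 12, 7, 2],
                 [4, 16, 7, 3],
                 [3, 9, 6, 5],
                 [7, 2, 5, 7]])

# Column-wise standardization with the population standard deviation (ddof=0)
z_manual = (dice - dice.mean(axis=0)) / dice.std(axis=0)
z_scipy = stats.zscore(dice)              # default axis=0

# Standardizing over all 24 values at once gives different z-scores
z_all = stats.zscore(dice, axis=None)

print(np.allclose(z_manual, z_scipy))
print(z_all.round(3))
```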


## Question 10

A Paired sample t-test compares means from the same group at different times.

The basic two sample t-test is designed for testing differences between independent groups. 
In some cases, you might be interested in testing differences between samples of the same group at different points in time. 
We can conduct a paired t-test using the scipy function stats.ttest_rel(). 

In [27]:
before= stats.norm.rvs(scale=30, loc=100, size=500) ## Draws 500 samples from a normal distribution with mean 100 and standard deviation 30
after = before + stats.norm.rvs(scale=5, loc=-1.25, size=500)

Test whether a weight-loss drug works by comparing the weights of the same group of patients before and after treatment, using the data above.

##### Hypothesis

$H_0$: mean(before) = mean(after), i.e. the drug does not work

$H_A$: mean(before) != mean(after), i.e. the drug works

In [28]:
p_mean=100
sd=30
n1=before.size
n2=after.size

In [29]:
var_before = before.var(ddof=1)
var_after = after.var(ddof=1)
# Independent-samples (pooled) t statistic, shown for comparison only; the data are paired
s = np.sqrt((var_before + var_after)/2)
t = (before.mean() - after.mean())/(s*np.sqrt(2/n1))
df = (n1-1)+(n2-1)


In [30]:
## Cross Checking with the internal scipy function
from scipy.stats import ttest_rel
t2, p2 = ttest_rel(a=before,b=after)
print("t = " + str(t2))
print("p = " + str(p2))
if p2<alpha:
    print("Reject Null ,i.e- Weight-Loss drug works")
else:
    print('Fail to reject null,i.e- Weight-Loss drug does not work (Means are equal)')

t = 6.274248779561384
p = 7.628846581354597e-10
Reject Null ,i.e- Weight-Loss drug works
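
A paired t-test is equivalent to a one-sample t-test on the per-patient differences. The sketch below illustrates this on data generated the same way as above, but with a fixed seed (the seed is an assumption added for reproducibility, so the numbers will not match the run shown above).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # assumed seed, for reproducibility only
before = rng.normal(loc=100, scale=30, size=500)
after = before + rng.normal(loc=-1.25, scale=5, size=500)

diff = before - after            # per-patient weight change

# Paired t-test and one-sample t-test on the differences
t_rel, p_rel = stats.ttest_rel(before, after)
t_one, p_one = stats.ttest_1samp(diff, 0.0)

# The same statistic computed by hand
t_manual = diff.mean() / (diff.std(ddof=1) / np.sqrt(diff.size))

print(t_rel, t_one, t_manual)
print(p_rel, p_one)
```

Working with the differences removes the large between-patient spread (standard deviation 30), which is why the paired test can detect a mean change of about 1.25.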


In [31]:
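# Power of test
# For a paired design the effect size is the mean of the paired differences
# divided by their standard deviation
diff = before - after
delta = diff.mean()/diff.std(ddof=1)
print(delta)
print(ttest_power(delta,nobs=n1,alpha=0.05,alternative='two-sided'))

The printed values vary from run to run because `before` and `after` are drawn without a fixed random seed. With 500 pairs and an effect as clear as the one detected above, the power is very close to 1.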
