# Hypothesis Testing

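The purpose of hypothesis testing is to decide whether an observed difference, for example between a sample mean and a hypothesized value or between two data sets, is statistically significant or could plausibly be due to chance.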



## Overview

This module covers:

1) One-sample and two-sample t-tests

2) ANOVA

3) Type I and Type II errors

4) Chi-squared tests

## Question 1 

*A student is trying to decide between two GPUs. He wants to use the GPU to run deep learning algorithms for his research, so the only thing he cares about is speed.*

*He picks a deep learning algorithm and a large data set and runs the algorithm on both GPUs 15 times, timing each run in hours. The results are given in the lists GPU1 and GPU2 below.*

In [6]:
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy.stats import ttest_1samp, ttest_ind, wilcoxon
from statsmodels.stats.power import ttest_power

In [2]:
GPU1 = np.array([11,9,10,11,10,12,9,11,12,9,11,12,9,10,9])
GPU2 = np.array([11,13,10,13,12,9,11,12,12,11,12,12,10,11,13])

# Assumption: GPU1 and GPU2 are random, independent samples drawn from normally distributed populations

Hint: import the t-test functions (for example `ttest_1samp` and `ttest_ind`) from `scipy.stats` to perform the t-tests.

**First t-test**

*One-sample t-test*

Check whether the mean of GPU1 is equal to zero.
- Null hypothesis: the mean is equal to zero.
- Alternative hypothesis: the mean is not equal to zero.

In [48]:
# Step 1: State the hypotheses
# H0: mu = 0
# HA: mu != 0

# Step 2: Choose the significance level; here alpha = 0.05 and the sample size n = 15
alpha = 0.05
mu = 0
n = len(GPU1)

# Step 3: Compute the sample mean, the sample standard deviation (ddof=1) and the standard error
Xbar = np.mean(GPU1)
print(Xbar)
s = np.std(GPU1, ddof=1)
print('Mean is %2.1f Sd is %2.1f' % (Xbar, s))
se = s / np.sqrt(n)

# Two-tailed critical values for the sample mean under H0 (t distribution with n - 1 df)
print("Critical values")
print(stats.t.isf(0.025, df=n - 1, loc=mu, scale=se))
print(stats.t.isf(0.975, df=n - 1, loc=mu, scale=se))

# Step 4: Run the one-sample t-test and compare the p-value with alpha
t_statistic, p_value = ttest_1samp(GPU1, mu)
print(t_statistic, p_value)
print("Since the p-value is less than alpha, there is sufficient evidence to reject the null hypothesis that the mean is equal to zero")
# The sample mean (10.3) also lies far outside the non-rejection interval (about -0.65 to 0.65)


10.333333333333334
Mean is 10.3 Sd is 1.2
Critical values
0.6507704546500327
-0.6507704546500326
34.056241516158195 7.228892044970457e-15
Since the p-value is less than alpha, there is sufficient evidence to reject the null hypothesis that the mean is equal to zero
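As a cross-check, the one-sample t-statistic is `t = (mean - mu0) / (s / sqrt(n))` with `n - 1` degrees of freedom, and the two-tailed p-value is twice the upper-tail probability of `|t|`. The sketch below recomputes both by hand for GPU1 and compares them with `ttest_1samp`; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
mu0 = 0  # hypothesized population mean under H0

n = len(GPU1)
xbar = GPU1.mean()
s = GPU1.std(ddof=1)      # sample standard deviation
se = s / np.sqrt(n)       # standard error of the mean

t_manual = (xbar - mu0) / se
# Two-tailed p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(GPU1, mu0)
print(t_manual, p_manual)
print(t_scipy, p_scipy)
```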


## Question 2

Given:

Null hypothesis: there is no significant difference between the mean run times of GPU1 and GPU2.

Alternative hypothesis: there is a significant difference between the mean run times of GPU1 and GPU2.

*Perform a two-sample t-test and check whether to reject the null hypothesis.*

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html

In [20]:
# Independent two-sample t-test (Student's t-test, which assumes equal variances)
ttest_stats, pval = ttest_ind(GPU1, GPU2)
print(ttest_stats, pval)
print("p-value:", pval)
print("Rejecting the null hypothesis since p-value < 0.05")
# The negative t-statistic means GPU1's mean run time is lower than GPU2's

-2.627629513471839 0.013794282041452725
p-value: 0.013794282041452725
Rejecting the null hypothesis since p-value < 0.05
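To show what `ttest_ind` computes by default (`equal_var=True`), the sketch below builds the pooled-variance t-statistic by hand and compares it with SciPy. It also runs Welch's variant (`equal_var=False`), which drops the equal-variance assumption; the manual variable names are illustrative.

```python
import numpy as np
from scipy import stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
GPU2 = np.array([11, 13, 10, 13, 12, 9, 11, 12, 12, 11, 12, 12, 10, 11, 13])

n1, n2 = len(GPU1), len(GPU2)
v1, v2 = GPU1.var(ddof=1), GPU2.var(ddof=1)

# Student's t-test pools the two sample variances
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = (GPU1.mean() - GPU2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
df = n1 + n2 - 2
p_pooled = 2 * stats.t.sf(abs(t_pooled), df=df)

t_scipy, p_scipy = stats.ttest_ind(GPU1, GPU2)

# Welch's t-test: separate variances, Welch-Satterthwaite degrees of freedom
t_welch, p_welch = stats.ttest_ind(GPU1, GPU2, equal_var=False)
print(t_pooled, p_pooled)
print(t_welch, p_welch)
```

Because both groups have 15 runs, the Welch statistic equals the pooled one here; only the degrees of freedom, and therefore the p-value, change slightly.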


## Question 3

He is now trying a third GPU, GPU3.

In [21]:
GPU3 = np.array([9,10,9,11,10,13,12,9,12,12,13,12,13,10,11])

# Assumption: GPU1 and GPU3 are random, independent samples drawn from normally distributed populations

*Perform a two-sample t-test to check whether there is a significant difference between the speeds of GPU1 and GPU3.*

#### Answer:

In [25]:
alpha = 0.05
ttest_stats, pval = ttest_ind(GPU1, GPU3)
print("P value:", pval)

print("\nNull hypothesis: no significant difference between the speeds of GPU1 and GPU3")
print("\nAlternative hypothesis: there is a significant difference between the speeds of GPU1 and GPU3")

print("\nSince pval > alpha (0.05), we fail to reject the null hypothesis: there is no significant evidence of a difference in speed between GPU1 and GPU3")

P value: 0.14509210993138993

Null hypothesis: no significant difference between the speeds of GPU1 and GPU3

Alternative hypothesis: there is a significant difference between the speeds of GPU1 and GPU3

Since pval > alpha (0.05), we fail to reject the null hypothesis: there is no significant evidence of a difference in speed between GPU1 and GPU3
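Failing to reject H0 does not prove that the two GPUs are equally fast. A 95% confidence interval for the difference in mean run times makes the remaining uncertainty explicit. The sketch below uses the pooled-variance interval, which matches the default Student's t-test in `ttest_ind`; it contains 0 exactly when the two-sided test at α = 0.05 fails to reject.

```python
import numpy as np
from scipy import stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
GPU3 = np.array([9, 10, 9, 11, 10, 13, 12, 9, 12, 12, 13, 12, 13, 10, 11])

alpha = 0.05
n1, n3 = len(GPU1), len(GPU3)
diff = GPU1.mean() - GPU3.mean()

# Pooled variance and standard error of the difference in means
sp2 = ((n1 - 1) * GPU1.var(ddof=1) + (n3 - 1) * GPU3.var(ddof=1)) / (n1 + n3 - 2)
se = np.sqrt(sp2 * (1 / n1 + 1 / n3))

# Two-sided critical value from the t distribution with n1 + n3 - 2 df
t_crit = stats.t.ppf(1 - alpha / 2, df=n1 + n3 - 2)
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se
print("Difference in means:", diff)
print("95% CI:", (ci_low, ci_high))
```

The interval runs from roughly -1.7 to 0.3 hours, so the data are compatible both with no difference and with GPU1 being noticeably faster.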


## ANOVA

## Question 4 

To compare the means of more than two data sets at once, use a one-way ANOVA (analysis of variance).
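One reason to prefer a single ANOVA over running every pairwise t-test is that each extra test adds another chance of a false positive. A rough sketch, assuming for simplicity that the pairwise tests are independent (tests that share a group are not, so this is only an approximation), computes the family-wise error rate for the three experiments below:

```python
alpha = 0.05
n_groups = 3
n_pairs = n_groups * (n_groups - 1) // 2  # 3 pairwise t-tests for three groups

# Probability of at least one false positive when every null hypothesis is true
fwer = 1 - (1 - alpha) ** n_pairs
print(f"{n_pairs} pairwise tests -> family-wise error rate = {fwer:.4f}")
# -> 3 pairwise tests -> family-wise error rate = 0.1426
```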

*The results from three experiments with overlapping 95% confidence intervals are given below, and we want to confirm that the results for all three experiments are not significantly different.*

Before conducting the ANOVA, use Levene's test to check whether the equal-variance assumption is satisfied. If it is not, note that the ANOVA result cannot be relied on.

In [28]:
import numpy as np
from scipy.stats import levene,f_oneway
e1 = np.array([1.595440,1.419730,0.000000,0.000000])
e2 = np.array([1.433800,2.079700,0.892139,2.384740])
e3 = np.array([0.036930,0.938018,0.995956,1.006970])

# Assumption: e1, e2 and e3 are random, independent samples drawn from normally distributed populations

Perform Levene's test on the data.

The Levene test tests the null hypothesis that all input samples are from populations with equal variances. Levene's test is an alternative to Bartlett's test (`bartlett`) in the case where there are significant deviations from normality.

source: scipy.org
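To see what `levene` computes, note that with its default `center='median'` (the Brown-Forsythe variant) the statistic is a one-way ANOVA F-statistic on the absolute deviations of each observation from its own group's median. The sketch below checks that equivalence on the three experiments.

```python
import numpy as np
from scipy.stats import levene, f_oneway

e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])

# Absolute deviations of each observation from its group's median
deviations = [np.abs(g - np.median(g)) for g in (e1, e2, e3)]

# One-way ANOVA on the deviations reproduces Levene's statistic and p-value
f_stat, f_p = f_oneway(*deviations)
lev_stat, lev_p = levene(e1, e2, e3)
print(f_stat, f_p)
print(lev_stat, lev_p)
```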

#### Answer:

In [31]:
levene(e1,e2,e3)


LeveneResult(statistic=2.6741725711150446, pvalue=0.12259792666001798)

In [32]:
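print("Since the p-value (0.123) >= 0.05, we fail to reject the null hypothesis of equal variances")
print("There is no evidence against the equal-variance assumption (Levene's test), so we can proceed with the ANOVA")

Since the p-value (0.123) >= 0.05, we fail to reject the null hypothesis of equal variances
There is no evidence against the equal-variance assumption (Levene's test), so we can proceed with the ANOVA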


## Question 5

The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes.
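The F-statistic behind this test is the ratio of the between-group mean square to the within-group mean square, compared against an F distribution with (k - 1, N - k) degrees of freedom for k groups and N observations. The sketch below computes it by hand for e1, e2 and e3 and compares the result with `f_oneway`; the intermediate names are illustrative.

```python
import numpy as np
from scipy import stats

e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])

groups = [e1, e2, e3]
k = len(groups)
N = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (N - k)
F = ms_between / ms_within
p = stats.f.sf(F, k - 1, N - k)  # upper-tail probability of the F distribution

F_scipy, p_scipy = stats.f_oneway(*groups)
print(F, p)
print(F_scipy, p_scipy)
```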

Use `stats.f_oneway()` to perform the one-way ANOVA test.

In [33]:
from scipy.stats import f_oneway

In [37]:
f_statistic, p_value = f_oneway(e1, e2, e3)
print("\nP VALUE:", p_value)
print("\nChosen level of significance : 0.05")
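print("\nSince the p_value is GREATER than the chosen level of significance, we fail to reject the null hypothesis: there is no significant difference between the means of the three experiments")


P VALUE: 0.13574644501798466

Chosen level of significance : 0.05

Since the p_value is GREATER than the chosen level of significance, we fail to reject the null hypothesis: there is no significant difference between the means of the three experiments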


## Question 6

*In one or two sentences, explain **Type I** and **Type II** errors.*

#### Answer:

Type I:
A Type I error is the rejection of a true null hypothesis (a "false positive"). Its probability is the significance level α.

Type II:
A Type II error is the failure to reject a false null hypothesis (a "false negative"). Its probability is denoted β, and 1 − β is the power of the test.
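Both error rates can be estimated by simulation. The sketch below, under illustrative assumptions (standard normal data, n = 15, α = 0.05, and an arbitrarily chosen true mean of 0.5 for the Type II case), repeats a one-sample t-test many times and counts how often it reaches the wrong conclusion.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
alpha, n, n_sims = 0.05, 15, 5000

# Type I error: H0 (mean = 0) is true, so every rejection is a false positive
null_samples = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n))
_, p_null = stats.ttest_1samp(null_samples, 0.0, axis=1)
type1_rate = np.mean(p_null < alpha)

# Type II error: the true mean is 0.5 (illustrative effect), so every non-rejection is a false negative
alt_samples = rng.normal(loc=0.5, scale=1.0, size=(n_sims, n))
_, p_alt = stats.ttest_1samp(alt_samples, 0.0, axis=1)
type2_rate = np.mean(p_alt >= alpha)

print("Estimated Type I error rate:", type1_rate)
print("Estimated Type II error rate:", type2_rate, "power:", 1 - type2_rate)
```

The Type I rate should land close to α, while the Type II rate depends on the effect size and sample size: a larger effect or more runs lowers β.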


## Question 7 

You are the manager of a Chinese restaurant. You want to determine whether the mean waiting time to place an order has changed in the past month from its previous population mean value of 4.5 minutes.
State the null and alternative hypotheses.

#### Answer:


- Null hypothesis (H0): the mean waiting time to place an order is equal to 4.5 minutes (μ = 4.5).
- Alternative hypothesis (HA): the mean waiting time to place an order is not equal to 4.5 minutes (μ ≠ 4.5). Because the question asks whether the time has *changed*, in either direction, this is a two-tailed test.
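For reference, if the manager collected a sample of n waiting times with sample mean $\bar{x}$ and sample standard deviation $s$ (the population standard deviation being unknown), these hypotheses would typically be tested with a two-tailed one-sample t-test, whose statistic is:

$$
H_0: \mu = 4.5 \qquad H_A: \mu \neq 4.5
$$

$$
t = \frac{\bar{x} - 4.5}{s / \sqrt{n}}, \qquad \text{df} = n - 1
$$

H0 is rejected at level $\alpha$ when $|t| > t_{\alpha/2,\,n-1}$, or equivalently when `ttest_1samp(sample, 4.5)` returns a p-value below $\alpha$.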

## Chi-square test

## Question 8

Let's create a small dataset for dice rolls of four players

In [39]:
import numpy as np

d1 = [5, 8, 3, 8]
d2 = [9, 6, 8, 5]
d3 = [8, 12, 7, 2]
d4 = [4, 16, 7, 3]
d5 = [3, 9, 6, 5]
d6 = [7, 2, 5, 7]

dice = np.array([d1, d2, d3, d4, d5, d6])

Run the test using the SciPy stats library.

Depending on the test, we are generally looking for a threshold at either 0.05 or 0.01. Our test is significant (i.e. we reject the null hypothesis) if we get a p-value below our threshold.

For our purposes, we'll use 0.01 as the threshold.

Use the `stats.chi2_contingency()` function.

This function computes the chi-square statistic and p-value for the hypothesis test of independence of the observed frequencies in the contingency table.

Print the following:

- chi2 stat
- p-value
- degrees of freedom
- contingency table (observed counts) and expected frequencies



In [40]:
from scipy.stats import chi2_contingency

alpha = 0.01

# H0: the row and column classifications of the table are independent
chi_sq_Stat, p_value, deg_freedom, exp_freq = chi2_contingency(dice)

print("CONTINGENCY TABLE: {}\n".format(dice))
print("chi2 stat : {}\n".format(chi_sq_Stat))
print("P_VALUE : {}\n".format(p_value))
print("Chosen level of significance : {}\n".format(alpha))
print("DEGREE OF FREEDOM : {}\n".format(deg_freedom))
print("EXPECTED FREQUENCIES OF VALUES : {}".format(exp_freq))

print("\n\nSince p_value %.3f is greater than the chosen level of significance %.2f, we cannot reject the NULL hypothesis." % (p_value, alpha))

CONTINGENCY TABLE: [[ 5  8  3  8]
 [ 9  6  8  5]
 [ 8 12  7  2]
 [ 4 16  7  3]
 [ 3  9  6  5]
 [ 7  2  5  7]]

chi2 stat : 23.315671914716496

P_VALUE : 0.07766367301496693

Chosen level of significance : 0.01

DEGREE OF FREEDOM : 15

EXPECTED FREQUENCIES OF VALUES : [[ 5.57419355  8.20645161  5.57419355  4.64516129]
 [ 6.50322581  9.57419355  6.50322581  5.41935484]
 [ 6.73548387  9.91612903  6.73548387  5.61290323]
 [ 6.96774194 10.25806452  6.96774194  5.80645161]
 [ 5.34193548  7.86451613  5.34193548  4.4516129 ]
 [ 4.87741935  7.18064516  4.87741935  4.06451613]]


Since p_value 0.078 is greater than the chosen level of significance 0.01, we cannot reject the NULL hypothesis.
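The quantities returned by `chi2_contingency` can be reproduced by hand: each expected count is (row total × column total) / grand total, the statistic is the sum of (observed − expected)² / expected over all cells, and the degrees of freedom are (rows − 1) × (columns − 1). SciPy only applies Yates' continuity correction when the degrees of freedom equal 1, so no correction is involved here. The manual variable names below are illustrative.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

dice = np.array([[5, 8, 3, 8], [9, 6, 8, 5], [8, 12, 7, 2],
                 [4, 16, 7, 3], [3, 9, 6, 5], [7, 2, 5, 7]])

row_totals = dice.sum(axis=1, keepdims=True)
col_totals = dice.sum(axis=0, keepdims=True)
grand_total = dice.sum()

# Expected count for each cell if rows and columns are independent
expected = row_totals * col_totals / grand_total

chi2_stat = ((dice - expected) ** 2 / expected).sum()
dof = (dice.shape[0] - 1) * (dice.shape[1] - 1)
p_value = chi2.sf(chi2_stat, dof)  # upper-tail probability of the chi-square distribution

ref_stat, ref_p, ref_dof, ref_expected = chi2_contingency(dice)
print(chi2_stat, p_value, dof)
print(ref_stat, ref_p, ref_dof)
```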


## Question 9

### Z-test

Compute z-scores for the dice data above using `stats.zscore` from SciPy. Convert the z-scores to p-values and take the mean of the resulting array.

In [41]:
from scipy import stats
from scipy.stats import zscore

z_score_value = zscore(dice)  # standardizes each column (axis=0 by default) using the population SD (ddof=0)
print("Z SCORES:\n", z_score_value)

p_values = stats.norm.sf(abs(z_score_value)) * 2  # two-tailed p-value for each z-score
print("\n\nP VALUES:\n", p_values)

mean_val = np.mean(p_values)
print("\nMEAN VALUE OF THE P_VAL ARRAY:", mean_val, "\n")

Z SCORES:
 [[-0.46291005 -0.18884739 -1.83711731  1.44115338]
 [ 1.38873015 -0.64208114  1.22474487  0.        ]
 [ 0.9258201   0.7176201   0.61237244 -1.44115338]
 [-0.9258201   1.62408759  0.61237244 -0.96076892]
 [-1.38873015  0.03776948  0.          0.        ]
 [ 0.46291005 -1.54854863 -0.61237244  0.96076892]]


P VALUES:
 [[0.64342884 0.85021243 0.06619258 0.14954135]
 [0.16491482 0.5208205  0.22067136 1.        ]
 [0.35453948 0.47299156 0.54029137 0.14954135]
 [0.35453948 0.10435712 0.54029137 0.33666837]
 [0.16491482 0.96987148 1.         1.        ]
 [0.64342884 0.12149026 0.54029137 0.33666837]]

MEAN VALUE OF THE P_VAL ARRAY: 0.4685694646738299 
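The sketch below reproduces what `zscore` does by hand: each value is standardized against the mean and population standard deviation (ddof=0) of its own column, and each two-tailed p-value is 2·P(Z ≥ |z|) under a standard normal distribution.

```python
import numpy as np
from scipy import stats

dice = np.array([[5, 8, 3, 8], [9, 6, 8, 5], [8, 12, 7, 2],
                 [4, 16, 7, 3], [3, 9, 6, 5], [7, 2, 5, 7]])

# Standardize each column: subtract its mean, divide by its population SD
z_manual = (dice - dice.mean(axis=0)) / dice.std(axis=0)
z_scipy = stats.zscore(dice)

# Two-tailed p-value: probability of a standard normal value at least as extreme as |z|
p_manual = 2 * stats.norm.sf(np.abs(z_manual))
print(np.round(z_manual, 3))
print(p_manual.mean())
```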



## Question 10

A paired-sample t-test compares means from the same group measured at two different times.

The basic two-sample t-test is designed for testing differences between independent groups.
In some cases, you might be interested in testing differences between samples of the same group at different points in time.
We can conduct a paired t-test using the SciPy function `stats.ttest_rel()`.
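A paired t-test is equivalent to a one-sample t-test on the within-subject differences (`after - before`) against 0. The sketch below checks this on simulated weights shaped like the data used in this question; it uses its own seeded generator, so its numbers will not match the notebook output. It also shows a one-sided test for "the drug lowers weight" via the `alternative` argument, which assumes SciPy 1.6 or later.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # seeded so the sketch is reproducible
before = rng.normal(loc=100, scale=30, size=500)
after = before + rng.normal(loc=-1.25, scale=5, size=500)

# Paired t-test versus one-sample t-test on the differences
diff = after - before
t_paired, p_paired = stats.ttest_rel(after, before)
t_diff, p_diff = stats.ttest_1samp(diff, 0.0)

# One-sided version: HA is that the mean of (after - before) is below 0
t_less, p_less = stats.ttest_rel(after, before, alternative="less")
print(t_paired, p_paired)
print(t_diff, p_diff)
print(p_less)
```

When the observed mean difference is negative, the one-sided p-value is half the two-sided one, which is why a directional hypothesis like "the drug reduces weight" is often tested one-sided.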

In [44]:
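# Draw 500 values from a normal distribution with mean 100 and SD 30 (no seed is set, so results vary between runs)
before = stats.norm.rvs(scale=30, loc=100, size=500)
# Simulated post-treatment weights: each patient's change is drawn from a normal distribution with mean -1.25 and SD 5
after = before + stats.norm.rvs(scale=5, loc=-1.25, size=500)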

Using the data above, test whether a weight-loss drug works by comparing the weights of the same group of patients before and after treatment.

In [42]:
from scipy.stats import ttest_rel

In [45]:
# H0: the mean of (after - before) is 0; HA: it is not 0 (two-sided test)
t_statistic, p_value = ttest_rel(after, before)
print("P VALUE:",p_value)
print("\nChosen Level of Significance:","0.05")

P VALUE: 1.514757300639864e-08

Chosen Level of Significance: 0.05


In [46]:
print("\nSince the resultant p_value is less than the chosen level of significance, we REJECT the null hypothesis and conclude that the patients' mean weight changed significantly after the treatment, which is consistent with the weight-loss drug working")


Since the resultant p_value is less than the chosen level of significance, we REJECT the null hypothesis and conclude that the patients' mean weight changed significantly after the treatment, which is consistent with the weight-loss drug working
