# Hypothesis Testing

The purpose of a hypothesis test is to tell whether there is a statistically significant difference between two data sets, or between a data set and a reference value.



## Overview

This module covers:

1) One sample and Two sample t-tests

2) ANOVA

3) Type I and Type II errors

4) Chi-Squared Tests

## Question 1 

*A student is trying to decide between two GPUs. He wants to use the GPU for his research to run deep learning algorithms, so the only thing he is concerned with is speed.*

*He picks a Deep Learning algorithm on a large data set and runs it on both GPUs 15 times, timing each run in hours. Results are given in the below lists GPU1 and GPU2.*

In [127]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import stats
from scipy.stats import ttest_1samp, ttest_ind, ttest_rel
from statsmodels.formula.api import ols

In [128]:
GPU1 = np.array([11,9,10,11,10,12,9,11,12,9,11,12,9,10,9])
GPU2 = np.array([11,13,10,13,12,9,11,12,12,11,12,12,10,11,13])

#Assumption: Both the datasets (GPU1 & GPU 2) are random, independent, parametric & normally distributed

Hint: You can import ttest function from scipy to perform t tests 

**First T test**

*One sample t-test*

Check if the mean of the GPU1 is equal to zero.
- Null Hypothesis is that mean is equal to zero.
- Alternate hypothesis is that it is not equal to zero.

In [171]:
print('Mean is %2.1f Sd is %2.1f' % (GPU1.mean(),np.std(GPU1,ddof = 1)))


Mean is 10.3 Sd is 1.2


In [172]:
t_statistic,p_value=ttest_1samp(GPU1,0)
print("P value",p_value)


P value 7.228892044970457e-15


In [234]:
## The p-value is very low, so we reject the null hypothesis in favor of the alternative.
## The mean of GPU1 is not equal to zero.
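
The one-sample result above can be reproduced by hand, which makes it clearer what `ttest_1samp` reports. This is a minimal sketch, not the notebook's own solution; the names `mu0`, `t_manual` and `p_manual` are introduced here for illustration only.

```python
import numpy as np
from scipy import stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])

mu0 = 0                          # hypothesized population mean (from the exercise)
n = GPU1.size
sample_mean = GPU1.mean()
sample_sd = GPU1.std(ddof=1)     # sample standard deviation (n - 1 in the denominator)

# t = (sample mean - hypothesized mean) / standard error of the mean
t_manual = (sample_mean - mu0) / (sample_sd / np.sqrt(n))

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# SciPy's built-in test for comparison
t_scipy, p_scipy = stats.ttest_1samp(GPU1, mu0)

print(t_manual, p_manual)
print(t_scipy, p_scipy)
```

Both lines should print the same statistic and p-value. Testing run times against a mean of zero is mostly a mechanical exercise, since a run time can never be zero.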

## Question 2

Given,

Null Hypothesis : There is no significant difference between data sets

Alternate Hypothesis : There is a significant difference

*Do two-sample testing and check whether to reject Null Hypothesis or not.*

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html

In [257]:
##t_statistic,p_value=ttest_ind(GPU1,GPU2,axis=0,nan_policy='propagate')
t_statistic,p_value=ttest_ind(GPU1,GPU2)
print(p_value)

0.013794282041452725


In [258]:
## The p-value is low (below the conventional 0.05 level), so we reject the null hypothesis in favor of the alternative.
## There is a significant difference between the two data sets.
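
By default `ttest_ind` assumes both populations share the same variance. As a hedged sketch, the comparison below also runs Welch's variant (`equal_var=False`), which drops that assumption. The 0.05 significance level is an assumption, since the exercise does not state one.

```python
import numpy as np
from scipy.stats import ttest_ind

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
GPU2 = np.array([11, 13, 10, 13, 12, 9, 11, 12, 12, 11, 12, 12, 10, 11, 13])

# Student's t-test: assumes a common population variance (SciPy's default)
t_pooled, p_pooled = ttest_ind(GPU1, GPU2)

# Welch's t-test: does not assume equal variances
t_welch, p_welch = ttest_ind(GPU1, GPU2, equal_var=False)

alpha = 0.05  # assumed significance level
for name, p in [("pooled", p_pooled), ("Welch", p_welch)]:
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{name}: p = {p:.4f} -> {decision}")
```

Because both samples have 15 observations, the two t statistics coincide and only the degrees of freedom differ, so the p-values should be very close.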

## Question 3

He is trying a third GPU - GPU3.

In [259]:
GPU3 = np.array([9,10,9,11,10,13,12,9,12,12,13,12,13,10,11])

#Assumption: Both the datasets (GPU1 & GPU 3) are random, independent, parametric & normally distributed

*Do two-sample testing and check whether there is a significant difference between the speeds of the two GPUs, GPU1 and GPU3.*

#### Answer:

In [260]:
## Null hypothesis: there is no significant difference between the data sets
## Alternate hypothesis: there is a significant difference between the data sets


In [261]:
t_statistic,p_value=ttest_ind(GPU1,GPU3,axis=0,nan_policy='propagate')
print(p_value)

0.14509210993138993


In [262]:
## The p-value is high (above 0.05), so we fail to reject the null hypothesis.
## There is no evidence of a significant difference between the data sets.

## ANOVA

## Question 4 

If you need to compare more than two data sets at a time, an ANOVA is your best bet. 

*The results from three experiments with overlapping 95% confidence intervals are given below, and we want to confirm that the results for all three experiments are not significantly different.*

Before conducting ANOVA, check whether the equal-variance assumption is satisfied using Levene's test. If it is not, mention that we cannot depend on the result of ANOVA.

In [242]:
import numpy as np

e1 = np.array([1.595440,1.419730,0.000000,0.000000])
e2 = np.array([1.433800,2.079700,0.892139,2.384740])
e3 = np.array([0.036930,0.938018,0.995956,1.006970])

#Assumption: All the 3 datasets (e1,e2 & e3) are random, independent, parametric & normally distributed

Perform Levene's test on the data.

The Levene test tests the null hypothesis that all input samples are from populations with equal variances. Levene’s test is an alternative to Bartlett’s test (`bartlett`) in the case where there are significant deviations from normality.

source: scipy.org

#### Answer:

In [263]:
## Null hypothesis: the variances are equal across the three experiments
## Alternate hypothesis: the variances are different

In [264]:
stats.levene(e1,e2,e3)

LeveneResult(statistic=2.6741725711150446, pvalue=0.12259792666001798)

In [265]:
## The p-value is high, so we fail to reject the null hypothesis: there is no evidence that the population variances differ.
## Hence we proceed with ANOVA.
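
To see what the Levene statistic measures, it can be rebuilt from a one-way ANOVA on each observation's absolute deviation from its group median. This is a sketch under the assumption that `center="median"` is used, which is SciPy's default for `stats.levene`; the names `abs_dev` and `w_manual` are illustrative.

```python
import numpy as np
from scipy import stats

e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])
samples = [e1, e2, e3]

# Absolute deviation of every observation from its own group's median
abs_dev = [np.abs(s - np.median(s)) for s in samples]

# A one-way ANOVA on those deviations yields the Levene statistic
w_manual, p_manual = stats.f_oneway(*abs_dev)

# SciPy's built-in test for comparison
w_scipy, p_scipy = stats.levene(*samples, center="median")

print(w_manual, p_manual)
print(w_scipy, p_scipy)
```

Groups with very different spreads produce very different mean deviations, which is what drives the statistic up.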

In [266]:
## Null hypothesis: the results of all three experiments are not significantly different
## Alternate hypothesis: the results of at least one experiment are significantly different

In [267]:
print('Count, Mean and standard deviation of results of experiment 1: %3d, %3.2f and %3.2f' % (len(e1), e1.mean(),np.std(e1,ddof =1)))
print('Count, Mean and standard deviation of results of experiment 2: %3d, %3.2f and %3.2f' % (len(e2), e2.mean(),np.std(e2,ddof =1)))
print('Count, Mean and standard deviation of results of experiment 3: %3d, %3.2f and %3.2f' % (len(e3), e3.mean(),np.std(e3,ddof =1)))

Count, Mean and standard deviation of results of experiment 1:   4, 0.75 and 0.87
Count, Mean and standard deviation of results of experiment 2:   4, 1.70 and 0.67
Count, Mean and standard deviation of results of experiment 3:   4, 0.74 and 0.47


In [268]:
results_of_experiment_df = pd.DataFrame()
df1 = pd.DataFrame({'exp':'1','results':e1})
df2 = pd.DataFrame({'exp':'2','results':e2})
df3 = pd.DataFrame({'exp':'3','results':e3})

In [269]:
# DataFrame.append was removed in pandas 2.0, so stack the three frames with pd.concat
results_of_experiment_df = pd.concat([df1, df2, df3])
results_of_experiment_df

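| | exp | results |
|---|---|---|
| 0 | 1 | 1.59544 |
| 1 | 1 | 1.41973 |
| 2 | 1 | 0.0 |
| 3 | 1 | 0.0 |
| 0 | 2 | 1.4338 |
| 1 | 2 | 2.0797 |
| 2 | 2 | 0.892139 |
| 3 | 2 | 2.38474 |
| 0 | 3 | 0.03693 |
| 1 | 3 | 0.938018 |
| 2 | 3 | 0.995956 |
| 3 | 3 | 1.00697 |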


In [270]:
mod=ols('results ~ exp',data = results_of_experiment_df).fit()
aov_table = sm.stats.anova_lm(mod, typ=2)

In [271]:
print(aov_table)

            sum_sq   df         F    PR(>F)
exp       2.399066  2.0  2.513576  0.135746
Residual  4.294994  9.0       NaN       NaN


In [272]:
## The p-value (0.136) is greater than the 0.05 significance level, so we fail to reject the null hypothesis.
## There is no significant difference in the results of the three experiments.

## Question 5

The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes.

Use the stats.f_oneway() function to perform a one-way ANOVA test.

In [273]:
stats.f_oneway(e1,e2,e3)

F_onewayResult(statistic=2.51357622845924, pvalue=0.13574644501798466)
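
The F statistic above is the ratio of between-group to within-group variability. As a rough sketch (variable names such as `ss_between` are introduced here for illustration), it can be rebuilt from the sums of squares, which should also match the `sum_sq` column of the ANOVA table in Question 4.

```python
import numpy as np
from scipy import stats

e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])
groups = [e1, e2, e3]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)             # number of groups
n_total = all_values.size   # total number of observations

# Between-group sum of squares: how far each group mean is from the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-group (residual) sum of squares: spread of values around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = k - 1
df_within = n_total - k

# F is the ratio of the two mean squares
f_manual = (ss_between / df_between) / (ss_within / df_within)
p_manual = stats.f.sf(f_manual, df_between, df_within)

print(ss_between, ss_within)
print(f_manual, p_manual)
```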

## Question 6

*In one or two sentences, explain **Type I** and **Type II** errors.*

#### Answer:

Type I error: the rejection of a true null hypothesis, that is, the null hypothesis (H0) is true but is rejected. It asserts something that is absent (a false hit) and is also called a false positive: a result indicating that a given condition is present when it actually is not. A type I error usually leads to the conclusion that a supposed effect or relationship exists when in fact it does not. An example is a test that shows a patient to have a disease when the patient does not have it. The type I error rate, or significance level, is the probability of rejecting the null hypothesis given that it is true. It is denoted by the Greek letter α (alpha) and is also called the alpha level. The significance level is commonly set to 0.05 (5%), implying that a 5% probability of incorrectly rejecting the null hypothesis is acceptable.

Type II error: the failure to reject a false null hypothesis, that is, the null hypothesis is false but is not rejected. It fails to assert what is present (a miss) and is often called a false negative, in a test checking for a single condition with a definitive result of true or false. An example is a blood test failing to detect the disease it was designed to detect in a patient who really has the disease. The type II error rate is denoted by β (beta).
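
Both error rates can be estimated by simulation. The sketch below is illustrative only: the sample size, number of simulations, seed and the effect size of 0.3 are all assumptions chosen for the example, not values from the exercise.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
alpha = 0.05                     # significance level
n_sims, n = 2000, 30             # assumed number of simulated studies and sample size

# Case 1: H0 is true (the population mean really is 0).
# Every rejection here is a Type I error.
null_samples = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n))
_, p_null = stats.ttest_1samp(null_samples, 0.0, axis=1)
type1_rate = np.mean(p_null < alpha)

# Case 2: H0 is false (the population mean is 0.3, an assumed effect size).
# Every failure to reject here is a Type II error.
alt_samples = rng.normal(loc=0.3, scale=1.0, size=(n_sims, n))
_, p_alt = stats.ttest_1samp(alt_samples, 0.0, axis=1)
type2_rate = np.mean(p_alt >= alpha)

print(f"Estimated Type I error rate:  {type1_rate:.3f}")
print(f"Estimated Type II error rate: {type2_rate:.3f}")
```

The Type I estimate should land near `alpha`, while the Type II estimate depends on the effect size and sample size: larger effects or larger samples make misses less likely.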

## Question 7 

You are a manager of a Chinese restaurant. You want to determine whether the waiting time to place an order has changed in the past month from its previous population mean value of 4.5 minutes. 
State the null and alternative hypotheses.

#### Answer:


Null Hypothesis: There is no change in the waiting time to place an order from the previous mean of 4.5 minutes. $\mu = 4.5$


Alternate Hypothesis: There is a change in the waiting time to place an order from the previous mean of 4.5 minutes. $\mu \neq 4.5$
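
Once the hypotheses are stated, the test itself is a two-sided one-sample t-test against 4.5. The question provides no sample, so the sketch below uses simulated waiting times purely as a placeholder; the data, the seed and the 0.05 significance level are all assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical sample: 25 simulated waiting times in minutes (NOT real data)
waiting_times = rng.normal(loc=5.0, scale=1.5, size=25)

mu0 = 4.5      # previous population mean, from the question
alpha = 0.05   # assumed significance level

# ttest_1samp is two-sided by default, matching H1: mean != 4.5
t_stat, p_value = stats.ttest_1samp(waiting_times, mu0)

if p_value < alpha:
    print(f"p = {p_value:.4f}: reject H0, the mean waiting time appears to have changed")
else:
    print(f"p = {p_value:.4f}: fail to reject H0, no evidence of a change")
```

With real data, `waiting_times` would be replaced by the observed waiting times from the past month.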

## Chi square test

## Question 8

Let's create a small dataset for dice rolls of four players

In [274]:
import numpy as np
# d1 to d6 are six dice; the four values in each list are the values rolled by player 1 to player 4

d1 = [1, 6, 3, 4]
d2 = [2, 5, 1, 3]
d3 = [4, 2, 3, 1]
d4 = [3, 4, 1, 2]
d5 = [1, 6, 3, 5]
d6 = [3, 2, 2, 1]

dice = np.array([d1, d2, d3, d4, d5, d6])

Run the test using the SciPy stats library.

Depending on the test, we are generally looking for a threshold at either 0.05 or 0.01. Our test is significant (i.e. we reject the null hypothesis) if we get a p-value below our threshold.

For our purposes, we’ll use 0.01 as the threshold.

Use the stats.chi2_contingency() function.

This function computes the chi-square statistic and p-value for the hypothesis test of independence of the observed frequencies in the contingency table.

Print the following:

- chi2 stat
- p-value
- degree of freedom
- contingency



In [275]:
chi_sq_Stat, p_value,deg_freedom,exp_frequency = stats.chi2_contingency(dice)
print(chi_sq_Stat)
print(p_value)
print(deg_freedom)
print(exp_frequency)

11.445004959326388
0.720458335452983
15
[[2.88235294 5.14705882 2.67647059 3.29411765]
 [2.26470588 4.04411765 2.10294118 2.58823529]
 [2.05882353 3.67647059 1.91176471 2.35294118]
 [2.05882353 3.67647059 1.91176471 2.35294118]
 [3.08823529 5.51470588 2.86764706 3.52941176]
 [1.64705882 2.94117647 1.52941176 1.88235294]]


In [276]:
## The p-value (0.72) is higher than our threshold of 0.01, so we fail to reject the null hypothesis of independence.
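
The expected frequencies, statistic and degrees of freedom printed above follow directly from the row and column totals. The sketch below recomputes them by hand as a cross-check; names such as `expected` and `chi2_manual` are illustrative.

```python
import numpy as np
from scipy import stats

dice = np.array([
    [1, 6, 3, 4],
    [2, 5, 1, 3],
    [4, 2, 3, 1],
    [3, 4, 1, 2],
    [1, 6, 3, 5],
    [3, 2, 2, 1],
])

row_totals = dice.sum(axis=1, keepdims=True)   # shape (6, 1)
col_totals = dice.sum(axis=0, keepdims=True)   # shape (1, 4)
grand_total = dice.sum()

# Expected count under independence: row total * column total / grand total
expected = row_totals * col_totals / grand_total

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells
chi2_manual = ((dice - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1)
dof = (dice.shape[0] - 1) * (dice.shape[1] - 1)
p_manual = stats.chi2.sf(chi2_manual, dof)

print(chi2_manual, p_manual, dof)
```

Most expected counts here are below 5, so as a common rule of thumb the chi-square approximation may be rough for a table this small.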

## Question 9

### Z-test

Get z-scores on the above dice data using the stats.zscore function from scipy. Convert the z-score values to p-values and take the mean of the array.

In [277]:
z_scores = stats.zscore(dice)
print(z_scores)
p_values = stats.norm.sf(abs(z_scores))
print(p_values)
print('mean is ',p_values.mean())


[[-1.20604538  1.09454091  0.92847669  0.89442719]
 [-0.30151134  0.4975186  -1.29986737  0.2236068 ]
 [ 1.50755672 -1.29354835  0.92847669 -1.11803399]
 [ 0.60302269 -0.09950372 -1.29986737 -0.4472136 ]
 [-1.20604538  1.09454091  0.92847669  1.56524758]
 [ 0.60302269 -1.29354835 -0.18569534 -1.11803399]]
[[0.1139     0.13685891 0.17658018 0.18554668]
 [0.3815123  0.3094117  0.09682322 0.41153164]
 [0.06583401 0.09791074 0.17658018 0.13177624]
 [0.2732468  0.46036917 0.09682322 0.32736042]
 [0.1139     0.13685891 0.17658018 0.05876243]
 [0.2732468  0.09791074 0.42634184 0.13177624]]
mean is  0.20239343794918715
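
As a cross-check of what `stats.zscore` computes, the same values can be obtained by standardizing each column by hand. This is a minimal sketch; `z_manual` and `p_one_sided` are names introduced for illustration.

```python
import numpy as np
from scipy import stats

dice = np.array([
    [1, 6, 3, 4],
    [2, 5, 1, 3],
    [4, 2, 3, 1],
    [3, 4, 1, 2],
    [1, 6, 3, 5],
    [3, 2, 2, 1],
])

# stats.zscore works along axis 0 by default, so each column (player) is
# centered on its own mean and scaled by its own population std (ddof=0)
z_manual = (dice - dice.mean(axis=0)) / dice.std(axis=0)

# norm.sf gives the upper-tail probability P(Z > |z|), i.e. a one-sided p-value
p_one_sided = stats.norm.sf(np.abs(z_manual))

print(np.allclose(z_manual, stats.zscore(dice)))
print(p_one_sided.mean())
```

Note that `stats.norm.sf(abs(z))` is a one-sided tail probability; a two-sided p-value would be twice that.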


## Question 10

A Paired sample t-test compares means from the same group at different times.

The basic two sample t-test is designed for testing differences between independent groups. 
In some cases, you might be interested in testing differences between samples of the same group at different points in time. 
We can conduct a paired t-test using the scipy function stats.ttest_rel(). 

In [278]:
before = stats.norm.rvs(scale=30, loc=100, size=500) ## Draws 500 samples from a normal distribution with a mean of 100 and std of 30
after = before + stats.norm.rvs(scale=5, loc=-1.25, size=500)


Test whether a weight-loss drug works by checking the weights of the same group of patients before and after treatment, using the above data.

In [279]:
## Null hypothesis: no change in the patients' weights after treatment
## Alternate hypothesis: the patients' weights change after treatment


In [280]:
statistic_value,p_value = stats.ttest_rel(before,after)

In [281]:
print(p_value)

1.7334069656974153e-08


In [282]:
## The p-value is very low, so we reject the null hypothesis. The patients' weights changed after treatment, which suggests the drug has an effect.
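
A paired t-test is equivalent to a one-sample t-test on the per-patient differences, which is why it suits before/after data. The sketch below illustrates this on data generated the same way as above; the seed is an assumption added only to make the run repeatable, so the printed p-value will differ from the one shown earlier.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # assumed seed, only for repeatability

# Same setup as above: baseline weights, then an average change of -1.25
before = rng.normal(loc=100, scale=30, size=500)
after = before + rng.normal(loc=-1.25, scale=5, size=500)

# Paired t-test on the two related samples
t_paired, p_paired = stats.ttest_rel(before, after)

# Equivalent one-sample t-test on the differences against zero
diff = before - after
t_diff, p_diff = stats.ttest_1samp(diff, 0.0)

print(t_paired, p_paired)
print(t_diff, p_diff)
print("mean change (after - before):", (after - before).mean())
```

The two tests give the same statistic and p-value. The sign of the mean change shows the direction of the effect, which the two-sided p-value alone does not.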