# Hypothesis Testing

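The purpose of a hypothesis test is to decide whether an observed difference (between a sample and a reference value, or between two or more data sets) is statistically significant or could plausibly be due to chance.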



## Overview

This module covers:

1) One sample and Two sample t-tests

2) ANOVA

3) Type I and Type II errors

4) Chi-Squared Tests

## Question 1 

*A student is trying to decide between two GPUs. He wants to use the GPU to run deep learning algorithms for his research, so the only thing he is concerned with is speed.*

*He picks a Deep Learning algorithm on a large data set and runs it on both GPUs 15 times, timing each run in hours. Results are given in the below lists GPU1 and GPU2.*

In [1]:
# Import numpy and stats from scipy

from scipy import stats
import numpy as np

In [2]:
GPU1 = np.array([11,9,10,11,10,12,9,11,12,9,11,12,9,10,9])
GPU2 = np.array([11,13,10,13,12,9,11,12,12,11,12,12,10,11,13])

#Assumption: Both the datasets (GPU1 & GPU 2) are random, independent, parametric & normally distributed

Hint: You can import the t-test functions (for example `ttest_1samp` and `ttest_ind`) from scipy.stats to perform t-tests.

**First T test**

*One sample t-test*

Check if the mean of the GPU1 is equal to zero.
- Null Hypothesis is that mean is equal to zero.
- Alternate hypothesis is that it is not equal to zero.
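
As a rough sketch of what the one-sample t-test computes, the statistic is the distance between the sample mean and the hypothesized mean, measured in standard errors: t = (mean - mu0) / (s / sqrt(n)). The names `mu0`, `t_manual` and `p_manual` below are illustrative and not part of the original notebook.

```python
import numpy as np
from scipy import stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])

mu0 = 0.0                      # hypothesized population mean (illustrative name)
n = GPU1.size

# Standard error of the mean, using the sample standard deviation (ddof=1)
standard_error = GPU1.std(ddof=1) / np.sqrt(n)

# t statistic and two-sided p-value from the t distribution with n-1 degrees of freedom
t_manual = (GPU1.mean() - mu0) / standard_error
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# Reference values from scipy for comparison
t_scipy, p_scipy = stats.ttest_1samp(GPU1, mu0)
print(t_manual, t_scipy)
```

The manual value and the scipy value should agree up to floating-point error.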

In [3]:
#Import the function ttest_1samp from scipy.stats,
#To calculate the T-test for the mean of ONE group or sample of scores. 

from scipy.stats import ttest_1samp

In [4]:
#One-Sample t-test on GPU1
#Null Hypothesis : Mean of GPU1 is equal to zero
#Alternate Hypothesis : Mean of GPU1 is not equal to zero

t_statistic_one_samp, p_value_one_samp = ttest_1samp(GPU1, 0)


In [5]:
print ('One-Sample t-test')
print ('=================')

print("t_statistic = ", t_statistic_one_samp, "\np_value = ",p_value_one_samp, '\n')

One-Sample t-test
t_statistic =  34.056241516158195 
p_value =  7.228892044970457e-15 



In [6]:
print("Since the p-value is very low, we reject the Null Hypothesis.\nSo, Mean of GPU1 is not equal to zero.")

Since the p-value is very low, we reject the Null Hypothesis.
So, Mean of GPU1 is not equal to zero.


## Question 2

Given,

Null Hypothesis : There is no significant difference between data sets

Alternate Hypothesis : There is a significant difference

*Do two-sample testing and check whether to reject Null Hypothesis or not.*

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html
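
For intuition, the default `ttest_ind` call uses a pooled variance estimate and assumes equal population variances. The sketch below reproduces that calculation by hand and also runs Welch's variant (`equal_var=False`), which drops the equal-variance assumption. The names `pooled_var`, `t_manual` and `t_welch` are illustrative.

```python
import numpy as np
from scipy import stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
GPU2 = np.array([11, 13, 10, 13, 12, 9, 11, 12, 12, 11, 12, 12, 10, 11, 13])

n1, n2 = GPU1.size, GPU2.size

# Pooled variance: weighted average of the two sample variances
pooled_var = ((n1 - 1) * GPU1.var(ddof=1) + (n2 - 1) * GPU2.var(ddof=1)) / (n1 + n2 - 2)

# t statistic and two-sided p-value with n1 + n2 - 2 degrees of freedom
t_manual = (GPU1.mean() - GPU2.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n1 + n2 - 2)

# scipy's pooled-variance test and Welch's test for comparison
t_scipy, p_scipy = stats.ttest_ind(GPU1, GPU2)
t_welch, p_welch = stats.ttest_ind(GPU1, GPU2, equal_var=False)
print(t_manual, t_scipy, t_welch)
```

A negative statistic here means the mean of the first sample (GPU1) is lower than the mean of the second.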

In [7]:
#Import the function ttest_ind from scipy.stats,
#To calculate the T-test for the means of two independent samples of scores.

from scipy.stats import ttest_ind

In [8]:
#Two-Sample t-test on GPU1 and GPU2
#Null Hypothesis : There is no significant difference between GPU1 and GPU2
#Alternate Hypothesis : There is a significant difference between GPU1 and GPU2

t_statistic_two_samp, p_value_two_samp = ttest_ind(GPU1, GPU2)


In [9]:
print ('Two-Sample t-test')
print ('=================')

print("t_statistic = ",t_statistic_two_samp, "\np_value = ", p_value_two_samp, '\n')

Two-Sample t-test
t_statistic =  -2.627629513471839 
p_value =  0.013794282041452725 



In [10]:
print("Since the p-value is less than 0.05, we reject the Null Hypothesis.")
print("So,there is a significant difference between the datasets GPU1 and GPU2")

Since the p-value is less than 0.05, we reject the Null Hypothesis.
So,there is a significant difference between the datasets GPU1 and GPU2


## Question 3

He is trying a third GPU - GPU3.

In [11]:
GPU3 = np.array([9,10,9,11,10,13,12,9,12,12,13,12,13,10,11])

#Assumption: Both the datasets (GPU1 & GPU 3) are random, independent, parametric & normally distributed

*Do two-sample testing and check whether there is a significant difference between the speeds of the two GPUs, GPU1 and GPU3.*

#### Answer:

In [12]:
#Two-Sample t-test on GPU1 and GPU3
#Null Hypothesis: There is no significant difference between speeds of GPU1 and GPU3
#Alternate Hypothesis: There is a significant difference between speeds of GPU1 and GPU3

t_statistic_two_samp2, p_value_two_samp2 = ttest_ind(GPU1, GPU3)


In [13]:
print ('Two-Sample t-test')
print ('=================')

print("t_statistic = ",t_statistic_two_samp2, "\np_value = ", p_value_two_samp2, '\n')

Two-Sample t-test
t_statistic =  -1.4988943759093303 
p_value =  0.14509210993138993 



In [14]:
print("Since the p-value is greater than 0.05, we do not reject the Null Hypothesis.")
print("So, there is no significant difference between the datasets GPU1 and GPU3")

Since the p-value is greater than 0.05, we do not reject the Null Hypothesis.
So, there is no significant difference between the datasets GPU1 and GPU3


## ANOVA

## Question 4 

If you need to compare more than two data sets at a time, an ANOVA is your best bet. 

*The results from three experiments with overlapping 95% confidence intervals are given below, and we want to confirm that the results for all three experiments are not significantly different.*

Before conducting ANOVA, test whether the equality-of-variances assumption is satisfied (using Levene's test). If it is not, note that we cannot depend on the result of ANOVA.

In [15]:
import numpy as np

e1 = np.array([1.595440,1.419730,0.000000,0.000000])
e2 = np.array([1.433800,2.079700,0.892139,2.384740])
e3 = np.array([0.036930,0.938018,0.995956,1.006970])

#Assumption: All the 3 datasets (e1,e2 & e3) are random, independent, parametric & normally distributed

Perform Levene's test on the data.

The Levene test tests the null hypothesis that all input samples are from populations with equal variances. Levene’s test is an alternative to Bartlett’s test (`bartlett`) in the case where there are significant deviations from normality.

source: scipy.org

#### Answer:

In [16]:
#Import levene and shapiro functions from scipy.stats
#Perform the Shapiro-Wilk test for normality, to test that the data was drawn from a normal distribution.
#Perform Levene test for equal variances, to test that all input samples are from populations with equal variances

from scipy.stats import levene, shapiro

In [17]:
# Shapiro-Wilk Test to test if a random sample is drawn from a normal distribution or not.
# Null Hypothesis: The input sample is from a population with normal distribution.
# Alternate Hypothesis: The input sample is not from a population with normal distribution.

t_statistic_sh_e1, p_value_sh_e1 = shapiro(e1)
t_statistic_sh_e2, p_value_sh_e2 = shapiro(e2)
t_statistic_sh_e3, p_value_sh_e3 = shapiro(e3)

In [18]:
print ('Shapiro Test')
print ('============')
print("t_statistic_e1 = ",t_statistic_sh_e1, "  p_value_e1 = ", p_value_sh_e1)
print("t_statistic_e2 = ",t_statistic_sh_e2, "  p_value_e2 = ", p_value_sh_e2)
print("t_statistic_e3 = ",t_statistic_sh_e3, "  p_value_e3 = ", p_value_sh_e3, '\n')

Shapiro Test
t_statistic_e1 =  0.7761102914810181   p_value_e1 =  0.0658247321844101
t_statistic_e2 =  0.9608921408653259   p_value_e2 =  0.784522294998169
t_statistic_e3 =  0.6824523210525513   p_value_e3 =  0.007115834858268499 



In [19]:
print("\nFor e1 and e2 the p-value is greater than 0.05, so we do not reject the Null Hypothesis of normality.")
print("For e3 the p-value is less than 0.05, so we reject the Null Hypothesis: e3 does not appear to be normally distributed.\n")


For e1 and e2 the p-value is greater than 0.05, so we do not reject the Null Hypothesis of normality.
For e3 the p-value is less than 0.05, so we reject the Null Hypothesis: e3 does not appear to be normally distributed.
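
Because each sample gets its own Shapiro-Wilk p-value, the decision is best made per sample. A minimal sketch, assuming a 0.05 threshold; the names `samples`, `alpha` and `rejects_normality` are illustrative and not part of the original notebook.

```python
import numpy as np
from scipy.stats import shapiro

samples = {
    "e1": np.array([1.595440, 1.419730, 0.000000, 0.000000]),
    "e2": np.array([1.433800, 2.079700, 0.892139, 2.384740]),
    "e3": np.array([0.036930, 0.938018, 0.995956, 1.006970]),
}
alpha = 0.05  # assumed significance threshold

rejects_normality = {}
for name, data in samples.items():
    # shapiro returns the W statistic and the p-value
    w_stat, p_val = shapiro(data)
    rejects_normality[name] = bool(p_val < alpha)
    verdict = "reject normality" if rejects_normality[name] else "do not reject normality"
    print(f"{name}: W = {w_stat:.4f}, p = {p_val:.4f} -> {verdict}")
```

With only four observations per sample, the test has little power, so a non-rejection is weak evidence of normality.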



In [20]:
# Levene's Test to test that all input samples are from populations with equal variances.
# Null Hypothesis: All input samples are from populations with equal variances.
# Alternate Hypothesis: At least one of the samples is from a population whose variance differs from the others.

levene_statistic, p_value_levene = levene(e1,e2,e3)

In [21]:
print ('Levene Test')
print ('===========')
print ('Levene Result Statistic:', levene_statistic)
print ('Levene Result P value:', p_value_levene, '\n')

Levene Test
Levene Result Statistic: 2.6741725711150446
Levene Result P value: 0.12259792666001798 



In [22]:
print("\nSince the p-value is greater than 0.05, we do not reject the Null Hypothesis.")
print("So, all input samples are from populations with equal variances.\n")
print("Equality of variances using Levene's test is satisfied. So, we can depend on the result of ANOVA.\n")


Since the p-value is greater than 0.05, we do not reject the Null Hypothesis.
So, all input samples are from populations with equal variances.

Equality of variances using Levene's test is satisfied. So, we can depend on the result of ANOVA.



## Question 5

The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes.

Use the `stats.f_oneway()` function to perform the one-way ANOVA test.
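
To make the F statistic less of a black box, the sketch below computes it by hand as the ratio of between-group to within-group mean squares and compares it with `f_oneway`. The names `ss_between`, `ss_within`, `f_manual` and `p_manual` are illustrative.

```python
import numpy as np
from scipy import stats

e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])
groups = [e1, e2, e3]

grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: spread of group means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1
df_within = sum(g.size for g in groups) - len(groups)

# F is the ratio of the two mean squares; the p-value is the upper tail of the F distribution
f_manual = (ss_between / df_between) / (ss_within / df_within)
p_manual = stats.f.sf(f_manual, df_between, df_within)

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f_manual, f_scipy)
```

A large F means the group means differ by more than the within-group scatter would explain.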

In [23]:
#Import f_oneway function from scipy.stats
#f_oneway function performs a 1-way ANOVA.
#The One-Way ANOVA tests that two or more groups have the same population mean.
#This test is applied to samples from two or more groups, possibly with differing sizes.

from scipy.stats import f_oneway

In [24]:
# One-Way ANOVA Test
# Null Hypothesis: All input samples are from populations with the same mean.
# Alternate Hypothesis: At least one of the samples is from a population whose mean differs from the others.

f, p = f_oneway(e1,e2,e3)

In [25]:
print ('One-way ANOVA')
print ('=============')
print ('F value:', f)
print ('P value:', p, '\n')

One-way ANOVA
F value: 2.51357622845924
P value: 0.13574644501798466 



In [26]:
print("\nSince the p-value is greater than 0.05, we do not reject the Null Hypothesis.")
print("So, there is no significant difference between the population means of the three experiments\n")


Since the p-value is greater than 0.05, we do not reject the Null Hypothesis.
So, there is no significant difference between the population means of the three experiments



## Question 6

*In one or two sentences, explain **Type I** and **Type II** errors.*

#### Answer:
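
A Type I error is rejecting a null hypothesis that is actually true (a false positive); a Type II error is failing to reject a null hypothesis that is actually false (a false negative). The simulation below is only an illustrative sketch: the sample size, the number of trials, the true means and the seed are assumptions chosen for the example, not values from this module.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily
alpha = 0.05                     # significance threshold
n_trials, n = 2000, 15           # assumed number of experiments and sample size

# Case 1: the null hypothesis is TRUE (both groups share mean 10).
# Every rejection here is a Type I error.
a = rng.normal(10.0, 1.0, size=(n_trials, n))
b = rng.normal(10.0, 1.0, size=(n_trials, n))
p_null = stats.ttest_ind(a, b, axis=1).pvalue
type1_rate = np.mean(p_null < alpha)

# Case 2: the null hypothesis is FALSE (true means differ by 0.5).
# Every failure to reject here is a Type II error.
c = rng.normal(10.5, 1.0, size=(n_trials, n))
p_alt = stats.ttest_ind(a, c, axis=1).pvalue
type2_rate = np.mean(p_alt >= alpha)

print("Estimated Type I error rate:", type1_rate)
print("Estimated Type II error rate:", type2_rate)
```

The Type I rate should land close to `alpha`, while the Type II rate depends on the sample size and on how large the true difference is.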

## Question 7 

You are the manager of a Chinese restaurant. You want to determine whether the waiting time to place an order has changed in the past month from its previous population mean value of 4.5 minutes. 
State the null and alternative hypotheses.

#### Answer:
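
Since the question asks whether the waiting time has *changed* (in either direction), a two-sided test fits: the null hypothesis is that the mean waiting time is 4.5 minutes, and the alternative is that it is not 4.5 minutes. The sketch below shows how such a test could be run; the `wait_times` values are made up purely for illustration and are not data from this module.

```python
import numpy as np
from scipy.stats import ttest_1samp

# HYPOTHETICAL waiting times in minutes, invented for this example
wait_times = np.array([4.2, 5.1, 4.8, 3.9, 4.6, 5.3, 4.4, 4.9, 5.0, 4.7])

previous_mean = 4.5   # population mean stated in the question
alpha = 0.05          # assumed significance threshold

# Null Hypothesis: mean waiting time == 4.5 minutes
# Alternate Hypothesis: mean waiting time != 4.5 minutes (two-sided)
t_stat, p_val = ttest_1samp(wait_times, previous_mean)
reject_null = bool(p_val < alpha)

print("t_statistic =", t_stat, " p_value =", p_val, " reject null:", reject_null)
```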


## Chi square test

## Question 8

Let's create a small dataset for dice rolls of four players

In [27]:
import numpy as np

d1 = [5, 8, 3, 8]
d2 = [9, 6, 8, 5]
d3 = [8, 12, 7, 2]
d4 = [4, 16, 7, 3]
d5 = [3, 9, 6, 5]
d6 = [7, 2, 5, 7]

dice = np.array([d1, d2, d3, d4, d5, d6])

Run the test using the SciPy stats library.

Depending on the test, we are generally looking for a threshold at either 0.05 or 0.01. Our test is significant (i.e. we reject the null hypothesis) if we get a p-value below our threshold.

For our purposes, we’ll use 0.01 as the threshold.

Use the `stats.chi2_contingency()` function.

This function computes the chi-square statistic and p-value for the hypothesis test of independence of the observed frequencies in the contingency table.

Print the following:

- chi2 stat
- p-value
- degree of freedom
- expected frequencies (the contingency table expected under independence)
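
As a sketch of what `chi2_contingency` returns, the expected frequencies under independence are (row total × column total) / grand total, and the degrees of freedom are (rows - 1) × (columns - 1). The names `expected_manual`, `dof_manual` and `chi2_manual` are illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency

dice = np.array([
    [5, 8, 3, 8],
    [9, 6, 8, 5],
    [8, 12, 7, 2],
    [4, 16, 7, 3],
    [3, 9, 6, 5],
    [7, 2, 5, 7],
])

row_totals = dice.sum(axis=1, keepdims=True)   # shape (6, 1)
col_totals = dice.sum(axis=0, keepdims=True)   # shape (1, 4)

# Expected count in each cell if rows and columns were independent
expected_manual = row_totals @ col_totals / dice.sum()
dof_manual = (dice.shape[0] - 1) * (dice.shape[1] - 1)

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected
chi2_manual = ((dice - expected_manual) ** 2 / expected_manual).sum()

chi2_stat, p_value, dof, expected = chi2_contingency(dice)
print(chi2_manual, chi2_stat, dof_manual, dof)
```

Note that the fourth return value is the table of expected frequencies, not the observed table that was passed in.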



In [28]:
#Import chi2_contingency function from scipy.stats
#chi2_contingency function performs Chi-square test of independence of variables in a contingency table.

from scipy.stats import chi2_contingency

In [29]:
#Chi-Square Test of Independence
#Null Hypothesis : The dice outcomes (rows) are independent of the player (columns).
#Alternate Hypothesis : The dice outcomes depend on the player.

chi2_stat, p_value, degree_of_freedom, expected = chi2_contingency(dice)


In [30]:
print ('Chi square Test')
print ('===============\n')

print("===Chi2 Stat===")
print(chi2_stat,'\n')

print("===Degrees of Freedom===")
print(degree_of_freedom,'\n')

print("===P-Value===")
print(p_value,'\n')

print("===Contingency Table===")
print(expected,'\n')

Chi square Test

===Chi2 Stat===
23.315671914716496 

===Degrees of Freedom===
15 

===P-Value===
0.07766367301496693 

===Contingency Table===
[[ 5.57419355  8.20645161  5.57419355  4.64516129]
 [ 6.50322581  9.57419355  6.50322581  5.41935484]
 [ 6.73548387  9.91612903  6.73548387  5.61290323]
 [ 6.96774194 10.25806452  6.96774194  5.80645161]
 [ 5.34193548  7.86451613  5.34193548  4.4516129 ]
 [ 4.87741935  7.18064516  4.87741935  4.06451613]] 



In [31]:
print("\nSince the p-value is greater than the threshold 0.01, we do not reject the Null Hypothesis.")
print("So, our test is not significant: there is no evidence that the dice outcomes depend on the player.\n")


Since the p-value is greater than the threshold 0.01, we do not reject the Null Hypothesis.
So, our test is not significant: there is no evidence that the dice outcomes depend on the player.



## Question 9

### Z-test

Compute z-scores for the above dice data using the `stats.zscore` function from scipy. Convert the z-scores to p-values and take the mean of the resulting array.
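
By default `zscore` works column by column (`axis=0`) and uses the population standard deviation (`ddof=0`). The sketch below reproduces that by hand and then converts the z-scores to two-sided p-values; `z_manual` and `p_two_sided` are illustrative names.

```python
import numpy as np
from scipy.stats import zscore, norm

dice = np.array([
    [5, 8, 3, 8],
    [9, 6, 8, 5],
    [8, 12, 7, 2],
    [4, 16, 7, 3],
    [3, 9, 6, 5],
    [7, 2, 5, 7],
])

# Standardize each column: subtract the column mean, divide by the column std (ddof=0)
z_manual = (dice - dice.mean(axis=0)) / dice.std(axis=0)
z_scipy = zscore(dice)

# Two-sided p-value: probability of a value at least this extreme in either tail
p_two_sided = 2 * norm.sf(np.abs(z_scipy))
print(p_two_sided.mean())
```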

In [32]:
#Import zscore function from scipy.stats
#zscore function calculates the z score of each value in the sample, relative to the sample mean and standard deviation.

from scipy.stats import zscore

In [33]:
z_scores = zscore(dice)
z_scores

array([[-0.46291005, -0.18884739, -1.83711731,  1.44115338],
       [ 1.38873015, -0.64208114,  1.22474487,  0.        ],
       [ 0.9258201 ,  0.7176201 ,  0.61237244, -1.44115338],
       [-0.9258201 ,  1.62408759,  0.61237244, -0.96076892],
       [-1.38873015,  0.03776948,  0.        ,  0.        ],
       [ 0.46291005, -1.54854863, -0.61237244,  0.96076892]])

In [34]:
#Import ndtr function from scipy.special
#ndtr is a Gaussian cumulative distribution function.
#Returns the area under the standard Gaussian probability density function,integrated from minus infinity to x

from scipy.special import ndtr

In [35]:
#Using scipy.special.ndtr() function to convert Z scores to one-sided (upper-tail) p-values

p_values_ndtr_func = 1- ndtr(z_scores)
p_values_ndtr_func

array([[0.67828558, 0.57489379, 0.96690371, 0.07477068],
       [0.08245741, 0.73958975, 0.11033568, 0.5       ],
       [0.17726974, 0.23649578, 0.27014569, 0.92522932],
       [0.82273026, 0.05217856, 0.27014569, 0.83166582],
       [0.91754259, 0.48493574, 0.5       , 0.5       ],
       [0.32171442, 0.93925487, 0.72985431, 0.16833418]])

In [36]:
#Import norm from scipy.stats
#Use the sf function in scipy.stats.norm to convert Z scores to p-values

from scipy.stats import norm

In [37]:
#Using scipy.stats.norm.sf() function to convert Z scores to p-values
#The norm.sf() is based on the assumption that the distribution is a normal one.
#We are multiplying by 2 to consider both sides of the Z scores (Upper tail and lower tail) for p-values

p_values_sf_func = norm.sf(abs(z_scores))*2
p_values_sf_func

array([[0.64342884, 0.85021243, 0.06619258, 0.14954135],
       [0.16491482, 0.5208205 , 0.22067136, 1.        ],
       [0.35453948, 0.47299156, 0.54029137, 0.14954135],
       [0.35453948, 0.10435712, 0.54029137, 0.33666837],
       [0.16491482, 0.96987148, 1.        , 1.        ],
       [0.64342884, 0.12149026, 0.54029137, 0.33666837]])

In [38]:
#Mean of p-values obtained using ndtr() function

p_values_ndtr_func.mean()

0.49478056512575197

In [39]:
#Mean of p-values obtained using sf() function
#The two means differ because 1 - ndtr(z) gives one-sided (upper-tail) p-values, while sf(abs(z))*2 gives two-sided p-values

p_values_sf_func.mean()

0.4685694646738299

## Question 10

A Paired sample t-test compares means from the same group at different times.

The basic two sample t-test is designed for testing differences between independent groups. 
In some cases, you might be interested in testing differences between samples of the same group at different points in time. 
We can conduct a paired t-test using the scipy function stats.ttest_rel(). 
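
One way to see what the paired test does: it is equivalent to a one-sample t-test on the per-subject differences against zero. The sketch below checks this on simulated data; the seed and the use of `numpy.random.default_rng` are assumptions made so the example is reproducible.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # fixed seed, chosen arbitrarily

# Simulated weights before treatment, and after with an average change of -1.25
before = rng.normal(loc=100, scale=30, size=500)
after = before + rng.normal(loc=-1.25, scale=5, size=500)

# Paired t-test on the two related samples
t_rel, p_rel = stats.ttest_rel(before, after)

# Equivalent one-sample t-test on the differences (before - after) against 0
t_diff, p_diff = stats.ttest_1samp(before - after, 0.0)

print(t_rel, t_diff)
```

A positive statistic for `ttest_rel(before, after)` means the "before" values are larger on average, that is, weights went down after treatment.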

In [40]:
before = stats.norm.rvs(scale=30, loc=100, size=500) ## Draws 500 samples from a normal distribution with a mean of 100 and std of 30
after = before + stats.norm.rvs(scale=5, loc=-1.25, size=500)

Test whether a weight-loss drug works by checking the weights of the same group patients before and after treatment using above data.

In [41]:
#Import ttest_rel function from scipy.stats
#Paired sample t-test performs the T-test on TWO RELATED samples of scores, a and b.
#Paired sample t-test is a two-sided test to test that 2 related or repeated samples have identical expected values.

from scipy.stats import ttest_rel

In [42]:
#Paired sample t-test to compare means from the same group at different times.
#Null Hypothesis: The mean of weights is identical.
#Alternate Hypothesis: The mean of weights is not identical.

ttest_rel(before, after)

Ttest_relResult(statistic=5.07032369825051, pvalue=5.607034007947108e-07)

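In [43]:
print("\nSince the p-value is very low, we reject the Null Hypothesis.")
print("So, the mean weights before and after are not identical. The positive t-statistic (before - after) indicates weights decreased on average, which is consistent with the weight-loss drug working.\n")


Since the p-value is very low, we reject the Null Hypothesis.
So, the mean weights before and after are not identical. The positive t-statistic (before - after) indicates weights decreased on average, which is consistent with the weight-loss drug working.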

