# Hypothesis Testing

The purpose of a hypothesis test is to determine whether there is a statistically significant difference between two data sets.



## Overview

This module covers:

1) One sample and Two sample t-tests

2) ANOVA

3) Type I and Type II errors

4) Chi-Squared Tests

## Question 1 

*A student is trying to decide between two GPUs. He wants to use the GPU for his research to run Deep Learning algorithms, so the only thing he is concerned with is speed.*

*He picks a Deep Learning algorithm on a large data set and runs it on both GPUs 15 times, timing each run in hours. The results are given below in the arrays GPU1 and GPU2.*

In [0]:
from scipy import stats 
import numpy as np

In [0]:
GPU1 = np.array([11,9,10,11,10,12,9,11,12,9,11,12,9,10,9])
GPU2 = np.array([11,13,10,13,12,9,11,12,12,11,12,12,10,11,13])

#Assumption: Both the datasets (GPU1 & GPU 2) are random, independent, parametric & normally distributed

Hint: You can import the t-test functions (for example ttest_1samp and ttest_ind) from scipy.stats to perform t-tests.

**First T test**

*One sample t-test*

Check whether the mean of GPU1 is equal to zero.
- The null hypothesis is that the mean is equal to zero.
- The alternative hypothesis is that the mean is not equal to zero.
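The one-sample t statistic behind these hypotheses can be computed by hand and compared with SciPy. The following is a minimal sketch, assuming the two-sided test described above; the names `mu0`, `t_manual` and `p_manual` are introduced here for illustration only.

```python
import numpy as np
from scipy import stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
mu0 = 0  # hypothesized population mean under the null hypothesis

n = GPU1.size
sample_mean = GPU1.mean()
sample_std = GPU1.std(ddof=1)  # sample standard deviation (n - 1 in the denominator)

# t = (sample mean - hypothesized mean) / standard error of the mean
t_manual = (sample_mean - mu0) / (sample_std / np.sqrt(n))
# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# SciPy's built-in one-sample t-test should agree with the manual values
t_scipy, p_scipy = stats.ttest_1samp(GPU1, mu0)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```

A p-value below the chosen significance level (commonly 0.05) leads to rejecting the null hypothesis.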

In [111]:
# One-sample t-test: is the mean of GPU1 equal to zero?
import numpy as np
from scipy import stats

GPU1 = np.array([11,9,10,11,10,12,9,11,12,9,11,12,9,10,9])
mean_GPU1 = np.mean(GPU1)
print ("Mean of GPU1 = ", mean_GPU1)
# ttest_1samp compares the sample mean with the hypothesized population mean (0 here)
stats.ttest_1samp(GPU1, 0)

Mean of GPU1 =  10.333333333333334

The t statistic is large (roughly 34) and the p-value is far below 0.05, so we reject the null hypothesis: the mean of GPU1 is not equal to zero.

## Question 2

Given,

Null Hypothesis : There is no significant difference between data sets

Alternate Hypothesis : There is a significant difference

*Do two-sample testing and check whether to reject Null Hypothesis or not.*

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html

In [56]:
# Two-sample t-test on the measured run times of GPU1 and GPU2
import numpy as np
from scipy import stats

GPU1 = np.array([11,9,10,11,10,12,9,11,12,9,11,12,9,10,9])
GPU2 = np.array([11,13,10,13,12,9,11,12,12,11,12,12,10,11,13])
stats.ttest_ind(GPU1, GPU2)

On these measurements the test gives t ≈ -2.63 and p ≈ 0.014. Since p < 0.05, we reject the null hypothesis: the run times of GPU1 and GPU2 differ significantly, with GPU1 faster on average.
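To make the two-sample test on the measured GPU1 and GPU2 run times more concrete, here is a hedged sketch that computes the pooled-variance t statistic by hand and checks it against `stats.ttest_ind`. The significance level `alpha = 0.05` is an assumption (the question does not fix one), and the variable names are illustrative.

```python
import numpy as np
from scipy import stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
GPU2 = np.array([11, 13, 10, 13, 12, 9, 11, 12, 12, 11, 12, 12, 10, 11, 13])
alpha = 0.05  # assumed significance level

n1, n2 = GPU1.size, GPU2.size
# Pooled variance: weighted average of the two sample variances
pooled_var = ((n1 - 1) * GPU1.var(ddof=1) + (n2 - 1) * GPU2.var(ddof=1)) / (n1 + n2 - 2)
t_manual = (GPU1.mean() - GPU2.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))
# Two-sided p-value with n1 + n2 - 2 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n1 + n2 - 2)

# ttest_ind uses equal_var=True by default, i.e. the same pooled-variance test
t_scipy, p_scipy = stats.ttest_ind(GPU1, GPU2)
reject_null = p_scipy < alpha
print(t_manual, p_manual, reject_null)
```

The decision rule is the comparison on the last lines: the null hypothesis is rejected only when the p-value falls below `alpha`.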

## Question 3

The student now tries a third GPU, GPU3.

In [34]:
GPU3 = np.array([9,10,9,11,10,13,12,9,12,12,13,12,13,10,11])

#Assumption: Both the datasets (GPU1 & GPU 3) are random, independent, parametric & normally distributed

*Do two-sample testing and check whether there is a significant difference between the speeds of the two GPUs, GPU1 and GPU3.*

#### Answer:

In [44]:
# Two sample test to check difference between speeds of GPU1 and GPU3.
GPU1 = np.array([11,9,10,11,10,12,9,11,12,9,11,12,9,10,9])
GPU3 = np.array([9,10,9,11,10,13,12,9,12,12,13,12,13,10,11])
mean_GPU1 = np.mean(GPU1)
mean_GPU3 = np.mean(GPU3)
stats.ttest_ind(GPU1,GPU3,equal_var=False) 

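Ttest_indResult(statistic=-1.4988943759093303, pvalue=0.1456833060579751)

With equal_var=False this is Welch's t-test, which does not assume equal variances. The p-value (about 0.15) is above 0.05, so we fail to reject the null hypothesis: there is no significant difference between the speeds of GPU1 and GPU3.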
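The call with `equal_var=False` corresponds to Welch's t-test. As a rough sketch, the statistic and its Welch-Satterthwaite degrees of freedom can be reproduced by hand; the helper names (`se1`, `se2`, `df_welch`) are illustrative and not part of the original notebook.

```python
import numpy as np
from scipy import stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
GPU3 = np.array([9, 10, 9, 11, 10, 13, 12, 9, 12, 12, 13, 12, 13, 10, 11])

n1, n3 = GPU1.size, GPU3.size
# Squared standard errors of each sample mean (variances are NOT pooled)
se1 = GPU1.var(ddof=1) / n1
se3 = GPU3.var(ddof=1) / n3

t_manual = (GPU1.mean() - GPU3.mean()) / np.sqrt(se1 + se3)
# Welch-Satterthwaite approximation for the degrees of freedom
df_welch = (se1 + se3) ** 2 / (se1 ** 2 / (n1 - 1) + se3 ** 2 / (n3 - 1))
p_manual = 2 * stats.t.sf(abs(t_manual), df=df_welch)

t_scipy, p_scipy = stats.ttest_ind(GPU1, GPU3, equal_var=False)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```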

## ANOVA

## Question 4 

If you need to compare more than two data sets at a time, an ANOVA is your best bet. 

*The results from three experiments with overlapping 95% confidence intervals are given below, and we want to confirm that the results for all three experiments are not significantly different.*

Before conducting the ANOVA, check whether the equality-of-variances assumption is satisfied, using Levene's test. If it is not, state that we cannot rely on the result of the ANOVA.

In [0]:
import numpy as np

e1 = np.array([1.595440,1.419730,0.000000,0.000000])
e2 = np.array([1.433800,2.079700,0.892139,2.384740])
e3 = np.array([0.036930,0.938018,0.995956,1.006970])

#Assumption: All the 3 datasets (e1,e2 & e3) are random, independent, parametric & normally distributed

Perform Levene's test on the data.

The Levene test tests the null hypothesis that all input samples are from populations with equal variances. Levene's test is an alternative to Bartlett's test (scipy.stats.bartlett) in the case where there are significant deviations from normality.

Source: scipy.org
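One way to see what Levene's test measures: with the median as the center (the default in `stats.levene`), the statistic is the one-way ANOVA F statistic computed on the absolute deviations of each sample from its own median. The sketch below illustrates this equivalence; the name `abs_dev` is introduced here for illustration.

```python
import numpy as np
from scipy import stats

e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])

# Absolute deviation of every observation from its own group median
abs_dev = [np.abs(g - np.median(g)) for g in (e1, e2, e3)]

# One-way ANOVA on the deviations reproduces the Levene statistic
f_on_dev, p_on_dev = stats.f_oneway(*abs_dev)
levene_stat, levene_p = stats.levene(e1, e2, e3)  # center='median' by default

print(f_on_dev, levene_stat)
print(p_on_dev, levene_p)
```

Groups with very different spreads have very different mean absolute deviations, which is what drives the statistic up.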

#### Answer:

In [46]:
# Performing Levene's test using given 3 datasets
import numpy as np

e1 = np.array([1.595440,1.419730,0.000000,0.000000])
e2 = np.array([1.433800,2.079700,0.892139,2.384740])
e3 = np.array([0.036930,0.938018,0.995956,1.006970])
stats.levene(e1,e2,e3)

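LeveneResult(statistic=2.6741725711150446, pvalue=0.12259792666001798)

The p-value (about 0.12) is above 0.05, so we fail to reject the null hypothesis of equal variances. The equal-variance assumption is not contradicted by the data, so we can proceed with the ANOVA.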

## Question 5

The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes.

Use the stats.f_oneway() function to perform a one-way ANOVA test.
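The F statistic of a one-way ANOVA is the ratio of the between-group variance to the within-group variance. The following sketch computes it by hand for the three experiments and compares it with `stats.f_oneway`; the intermediate names (`ss_between`, `ss_within`) are illustrative.

```python
import numpy as np
from scipy import stats

e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])
groups = [e1, e2, e3]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)       # number of groups
N = all_values.size   # total number of observations

# Between-group sum of squares: spread of the group means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

f_manual = (ss_between / (k - 1)) / (ss_within / (N - k))
p_manual = stats.f.sf(f_manual, k - 1, N - k)

f_scipy, p_scipy = stats.f_oneway(e1, e2, e3)
print(f_manual, f_scipy)
print(p_manual, p_scipy)
```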

In [45]:
# Performing a one-way ANOVA on the 3 samples (null hypothesis: all have the same population mean)
import numpy as np

e1 = np.array([1.595440,1.419730,0.000000,0.000000])
e2 = np.array([1.433800,2.079700,0.892139,2.384740])
e3 = np.array([0.036930,0.938018,0.995956,1.006970])
stats.f_oneway(e1,e2,e3)

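F_onewayResult(statistic=2.51357622845924, pvalue=0.13574644501798466)

The p-value (about 0.14) is above 0.05, so we fail to reject the null hypothesis: the means of the three experiments are not significantly different.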

## Question 6

*In one or two sentences explain about **TypeI** and **TypeII** errors.*

**Type I error**

A Type I error occurs when we reject the null hypothesis although it is actually true. This type of error is also known as a "false positive" or "false hit".

- The probability of making a Type I error is alpha (α), the significance level chosen for the hypothesis test.
- E.g., α = 0.05 means we are willing to accept a 5% chance of being wrong when we reject the null hypothesis.
- To lower the risk, choose a smaller value of α.
- When H0 is true, failing to reject it is the correct decision (probability = 1 - α), and rejecting it is a Type I error (probability = α).

**Type II error**

A Type II error occurs when the null hypothesis is false and we fail to reject it. This type of error is also known as a "false negative".

- The probability of making a Type II error is β.
- The probability of rejecting the null hypothesis when it is false is 1 - β. This value is the power of the test.
- We can decrease the risk of committing a Type II error by ensuring the test has enough power, for example by making the sample size large enough to detect a practical difference when one truly exists.
- When H0 is false, rejecting it is the correct decision (probability = 1 - β), and failing to reject it is a Type II error (probability = β).
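Both error rates can be illustrated with a small simulation. The sketch below is not part of the original assignment: the sample size (30), the true effect (a mean of 0.5 with standard deviation 1), the number of simulations (5000) and the seed are all assumed values chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
alpha = 0.05                    # significance level
n_sims, n = 5000, 30            # assumed number of experiments and sample size

# Case 1: H0 is true (population mean really is 0).
# Every rejection here is a Type I error.
null_samples = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n))
p_null = stats.ttest_1samp(null_samples, 0.0, axis=1).pvalue
type1_rate = np.mean(p_null < alpha)

# Case 2: H0 is false (population mean is 0.5, an assumed effect size).
# Every failure to reject here is a Type II error.
alt_samples = rng.normal(loc=0.5, scale=1.0, size=(n_sims, n))
p_alt = stats.ttest_1samp(alt_samples, 0.0, axis=1).pvalue
type2_rate = np.mean(p_alt >= alpha)
power = 1 - type2_rate

print("Estimated Type I error rate:", type1_rate)
print("Estimated Type II error rate:", type2_rate)
print("Estimated power:", power)
```

The estimated Type I error rate should land close to α, while the Type II error rate depends on the effect size and the sample size.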

## Question 7 

You are the manager of a Chinese restaurant. You want to determine whether the waiting time to place an order has changed in the past month from its previous population mean value of 4.5 minutes. 
State the null and alternative hypotheses.

#### Answer:


Null and alternative hypotheses for checking whether the waiting time to place an order has changed in the past month:

The null hypothesis is that the population mean waiting time has not changed from its previous value of 4.5 minutes.
H0: µ = 4.5

The alternative hypothesis is the opposite of the null hypothesis: the population mean waiting time is no longer 4.5 minutes and may have increased or decreased.
H1: µ ≠ 4.5
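These hypotheses describe a two-sided one-sample t-test. The question provides no measurements, so the sketch below uses simulated waiting times purely as a placeholder; the sample size, the distribution parameters and the 0.05 significance level are all assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Hypothetical sample: 40 simulated waiting times in minutes (NOT real restaurant data)
wait_times = rng.normal(loc=5.0, scale=1.2, size=40)

previous_mean = 4.5  # population mean stated in the question
alpha = 0.05         # assumed significance level

# Two-sided test of H0: mu = 4.5 against H1: mu != 4.5
t_stat, p_value = stats.ttest_1samp(wait_times, popmean=previous_mean)

if p_value < alpha:
    print("Reject H0: the mean waiting time appears to have changed.")
else:
    print("Fail to reject H0: no evidence that the mean waiting time has changed.")
```

With real data, `wait_times` would be replaced by the waiting times recorded during the past month.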

## Chi square test

## Question 8

Let's create a small dataset for dice rolls of four players

In [0]:
import numpy as np

d1 = [5, 8, 3, 8]
d2 = [9, 6, 8, 5]
d3 = [8, 12, 7, 2]
d4 = [4, 16, 7, 3]
d5 = [3, 9, 6, 5]
d6 = [7, 2, 5, 7]

dice = np.array([d1, d2, d3, d4, d5, d6])

Run the test using the SciPy stats library.

Depending on the test, we are generally looking for a threshold at either 0.05 or 0.01. Our test is significant (i.e. we reject the null hypothesis) if we get a p-value below our threshold.

For our purposes, we’ll use 0.01 as the threshold.

Use the stats.chi2_contingency() function.

This function computes the chi-square statistic and p-value for the hypothesis test of independence of the observed frequencies in the contingency table
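Under the independence hypothesis, the expected count of each cell is (row total × column total) / grand total, and the statistic sums (observed - expected)² / expected over all cells. The following sketch computes these by hand and compares them with `stats.chi2_contingency`; the `_manual` names are illustrative.

```python
import numpy as np
from scipy import stats

dice = np.array([
    [5, 8, 3, 8],
    [9, 6, 8, 5],
    [8, 12, 7, 2],
    [4, 16, 7, 3],
    [3, 9, 6, 5],
    [7, 2, 5, 7],
])

row_totals = dice.sum(axis=1, keepdims=True)  # shape (6, 1)
col_totals = dice.sum(axis=0, keepdims=True)  # shape (1, 4)
grand_total = dice.sum()

# Expected counts if rows and columns were independent
expected_manual = row_totals * col_totals / grand_total
chi2_manual = ((dice - expected_manual) ** 2 / expected_manual).sum()
# Degrees of freedom: (rows - 1) * (columns - 1)
dof_manual = (dice.shape[0] - 1) * (dice.shape[1] - 1)
p_manual = stats.chi2.sf(chi2_manual, dof_manual)

chi2_stat, p_val, dof, expected = stats.chi2_contingency(dice)
print(chi2_manual, chi2_stat)
print(p_manual, p_val)
print(dof_manual, dof)
```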

Print the following:

- chi2 stat
- p-value
- degree of freedom
- contingency



In [52]:
# Performing Chi Square test and printing chi2stat,p-value,degree of freedom and contingency from result got after chi test.
# Threshold = 0.01
import numpy as np
from scipy import stats

d1 = [5, 8, 3, 8]
d2 = [9, 6, 8, 5]
d3 = [8, 12, 7, 2]
d4 = [4, 16, 7, 3]
d5 = [3, 9, 6, 5]
d6 = [7, 2, 5, 7]

dice = np.array([d1, d2, d3, d4, d5, d6])
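Contingency = stats.chi2_contingency(dice)
# Unpack the result: test statistic, p-value, degrees of freedom, expected frequencies
chi2_stat, p_val, dof, expected = Contingency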
print("------Chi2 Stat--------")
print(chi2_stat)
print("\n")
print("----Degrees of Freedom----")
print(dof)
print("\n")
print("------P-Value-------")
print(p_val)
print("\n")
print("------Contingency Table----")
print(Contingency)

------Chi2 Stat--------
23.315671914716496


----Degrees of Freedom----
15


------P-Value-------
0.07766367301496693


------Contingency Table----
(23.315671914716496, 0.07766367301496693, 15, array([[ 5.57419355,  8.20645161,  5.57419355,  4.64516129],
       [ 6.50322581,  9.57419355,  6.50322581,  5.41935484],
       [ 6.73548387,  9.91612903,  6.73548387,  5.61290323],
       [ 6.96774194, 10.25806452,  6.96774194,  5.80645161],
       [ 5.34193548,  7.86451613,  5.34193548,  4.4516129 ],
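       [ 4.87741935,  7.18064516,  4.87741935,  4.06451613]]))

The p-value (about 0.078) is above the 0.01 threshold, so we fail to reject the null hypothesis of independence.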


## Question 9

### Z-test

Compute z-scores for the dice data above using the stats.zscore function from scipy. Convert the z-scores to p-values and take the mean of the array.
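By default `stats.zscore` standardizes each column (axis=0) using the population standard deviation (ddof=0). The sketch below reproduces that by hand and shows one way to turn z-scores into p-values; whether a one-tailed or two-tailed p-value is wanted is an assumption left open by the question, so both are computed.

```python
import numpy as np
from scipy import stats

dice = np.array([
    [5, 8, 3, 8],
    [9, 6, 8, 5],
    [8, 12, 7, 2],
    [4, 16, 7, 3],
    [3, 9, 6, 5],
    [7, 2, 5, 7],
])

# z = (value - column mean) / column standard deviation (ddof=0)
z_manual = (dice - dice.mean(axis=0)) / dice.std(axis=0)
z_scipy = stats.zscore(dice)  # axis=0 and ddof=0 are the defaults

# Survival function of the standard normal gives the upper-tail probability
p_one_tailed = stats.norm.sf(np.abs(z_manual))
p_two_tailed = 2 * p_one_tailed

print(np.allclose(z_manual, z_scipy))
print(p_one_tailed.mean(), p_two_tailed.mean())
```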

In [73]:
# Computing z-scores for the dice data (standardized per column by default)
import numpy as np
from scipy import stats

d1 = [5, 8, 3, 8]
d2 = [9, 6, 8, 5]
d3 = [8, 12, 7, 2]
d4 = [4, 16, 7, 3]
d5 = [3, 9, 6, 5]
d6 = [7, 2, 5, 7]

dice = np.array([d1, d2, d3, d4, d5, d6])
Z_score = stats.zscore(dice)
Z_score

array([[-0.46291005, -0.18884739, -1.83711731,  1.44115338],
       [ 1.38873015, -0.64208114,  1.22474487,  0.        ],
       [ 0.9258201 ,  0.7176201 ,  0.61237244, -1.44115338],
       [-0.9258201 ,  1.62408759,  0.61237244, -0.96076892],
       [-1.38873015,  0.03776948,  0.        ,  0.        ],
       [ 0.46291005, -1.54854863, -0.61237244,  0.96076892]])

In [78]:
# Converting z-scores to one-tailed p-values (upper-tail probability of |z|)
p_value = stats.norm.sf(np.abs(Z_score))
p_value

array([[0.32171442, 0.42510621, 0.03309629, 0.07477068],
       [0.08245741, 0.26041025, 0.11033568, 0.5       ],
       [0.17726974, 0.23649578, 0.27014569, 0.07477068],
       [0.17726974, 0.05217856, 0.27014569, 0.16833418],
       [0.08245741, 0.48493574, 0.5       , 0.5       ],
       [0.32171442, 0.06074513, 0.27014569, 0.16833418]])

In [83]:
# Finding the mean of the dice array (the raw counts, not the p-values)
Mean_Value = np.mean(dice)
Mean_Value

6.458333333333333

## Question 10

A Paired sample t-test compares means from the same group at different times.

The basic two sample t-test is designed for testing differences between independent groups. 
In some cases, you might be interested in testing differences between samples of the same group at different points in time. 
We can conduct a paired t-test using the scipy function stats.ttest_rel(). 
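A paired t-test is equivalent to a one-sample t-test on the per-subject differences, tested against a mean of zero. The sketch below illustrates this with simulated data; the seed and the use of NumPy's `default_rng` generator are choices made here for repeatability, not part of the original notebook.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # fixed seed so the run is repeatable
# Simulated weights: same distribution parameters as in the exercise
before = rng.normal(loc=100, scale=30, size=500)
after = before + rng.normal(loc=-1.25, scale=5, size=500)

# Paired test on the two related samples
paired = stats.ttest_rel(before, after)
# One-sample test on the differences against a mean of zero
one_sample = stats.ttest_1samp(before - after, 0)

print(paired.statistic, one_sample.statistic)
print(paired.pvalue, one_sample.pvalue)
```

Because each patient acts as their own control, the large between-patient spread (standard deviation 30) cancels out in the differences, which is why the paired test can detect a small average change.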

In [0]:
before = stats.norm.rvs(scale=30, loc=100, size=500) # Draws 500 samples from a normal distribution with mean 100 and standard deviation 30
after = before + stats.norm.rvs(scale=5, loc=-1.25, size=500)

Test whether a weight-loss drug works by checking the weights of the same group of patients before and after treatment, using the above data.

In [86]:
# Paired t-test on the weights of the same group of patients before and after treatment
# (the data is random and no seed is set, so the values vary between runs)
before= stats.norm.rvs(scale=30, loc=100, size=500)
after = before + stats.norm.rvs(scale=5, loc=-1.25, size=500)
stats.ttest_rel(before,after)

Ttest_relResult(statistic=5.709360426076865, pvalue=1.948849148195536e-08)

In [104]:
# Mean weight after treatment, using the same 'after' sample that was tested above
Mean_after = after.mean()
print("Mean value of patients weight after drug test = ", Mean_after)

Mean value of patients weight after drug test =  97.1416067550648


In [105]:
Mean_before= before.mean()
print (" Mean value of patients weight before drug test = " , Mean_before)

 Mean value of patients weight before drug test =  98.47598166947209


In [106]:
# Inferring from the paired t-test whether the treatment results in weight loss
print('The paired t-test p-value is below 0.05 and the mean weight after treatment is lower than before, so the weight-loss drug appears to work')

The paired t-test p-value is below 0.05 and the mean weight after treatment is lower than before, so the weight-loss drug appears to work
