# Hypothesis Testing

The purpose of a hypothesis test is to decide whether an observed difference between two data sets is statistically significant or could plausibly have arisen by chance.



## Overview

This module covers:

1) One-sample and two-sample t-tests

2) ANOVA

3) Type I and Type II errors

4) Chi-Squared Tests

## Question 1 

*A student is trying to decide between two GPUs. He wants to use the GPU for his research to run deep learning algorithms, so the only thing he is concerned with is speed.*

*He picks a deep learning algorithm and a large data set and runs it on both GPUs 15 times, timing each run in hours. The results are given in the lists GPU1 and GPU2 below.*

In [93]:
import scipy.stats as stats
from scipy.stats import ttest_1samp
import numpy as np

In [94]:
GPU1 = np.array([11,9,10,11,10,12,9,11,12,9,11,12,9,10,9])
GPU2 = np.array([11,13,10,13,12,9,11,12,12,11,12,12,10,11,13])

#Assumption: Both the datasets (GPU1 & GPU 2) are random, independent, parametric & normally distributed

Hint: You can use the t-test functions in scipy.stats (for example ttest_1samp and ttest_ind) to perform t-tests.

**First T test**

*One sample t-test*

Check whether the mean of GPU1 is equal to zero.
- The null hypothesis is that the mean is equal to zero.
- The alternate hypothesis is that the mean is not equal to zero.

In [95]:
# Population Parameters
print("[A] Population Parameters")
print("-------------------------")
mu = 0
print("mu = ", mu)
print("")

# Sample Parameters
print("[B] Sample Parameters")
print("---------------------")
xbar = GPU1.mean()
print("xbar = ", xbar)
s = np.std(GPU1, ddof = 1)
print("s = ", s)
n = len(GPU1)
print("n = ", n)
se = s/np.sqrt(n)
print("se = ", se)
dof = n - 1
print("dof = ", dof)
print("")

print("[C] Hypothesis")
print("--------------")
print("H0: mu = 0")
print("H1: mu != 0")
alpha = 0.05
print("Chosen Level of Significance (alpha) = ", alpha)
print("This is a two tailed test")
print("")

print("[D] Analysis")
print("------------")

print("Hypothesis test using Critical Value Approach:")
t4xbar = (xbar - mu)/se
print("\tt-statistics = ", t4xbar)

upper = stats.t.isf(q = 0.025, df = dof)
lower = stats.t.ppf(q = 0.025, df = dof)

print("\tCritical values (lower, upper) = ", (lower, upper))

reject_h0_using_critical_values_test = (t4xbar < lower) or (t4xbar > upper)
print("\tShould we reject H0 ? - ", np.where(reject_h0_using_critical_values_test, "Yes", "No"))

print("")
print("Hypothesis test using p-value Approach:")
# abs() keeps the two-tailed p-value valid when the t-statistic is negative
p_value = 2 * stats.t.sf(abs(t4xbar), df = dof)
print("\tp-value = ", p_value)
reject_h0_using_p_value = (p_value < alpha)
print("\tShould we reject H0 ? - ", np.where(reject_h0_using_p_value, "Yes", "No"))


[A] Population Parameters
-------------------------
mu =  0

[B] Sample Parameters
---------------------
xbar =  10.333333333333334
s =  1.1751393027860062
n =  15
se =  0.3034196632775998
dof =  14

[C] Hypothesis
--------------
H0: mu = 0
H1: mu != 0
Chosen Level of Significance (alpha) =  0.05
This is a two tailed test

[D] Analysis
------------
Hypothesis test using Critical Value Approach:
	t-statistics =  34.056241516158195
	Critical values (lower, upper) =  (-2.1447866879169277, 2.1447866879169277)
	Should we reject H0 ? -  Yes

Hypothesis test using p-value Approach:
	p-value =  7.228892044970457e-15
	Should we reject H0 ? -  Yes
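
As a cross-check, the same test can be run in a single call. The sketch below assumes the GPU1 data from above and uses stats.ttest_1samp, which returns the t-statistic and the two-sided p-value; the variable names t_stat and t_manual are illustrative only.

```python
import numpy as np
from scipy import stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])

# One-sample, two-sided t-test of H0: mu = 0
t_stat, p_value = stats.ttest_1samp(GPU1, popmean=0)

# Manual t-statistic for comparison: (xbar - mu) / (s / sqrt(n))
t_manual = (GPU1.mean() - 0) / (np.std(GPU1, ddof=1) / np.sqrt(len(GPU1)))

print("t_stat  =", t_stat)
print("p_value =", p_value)
print("matches manual t-statistic:", np.isclose(t_stat, t_manual))
```

Both routes should lead to the same decision, since ttest_1samp computes the same statistic as the step-by-step calculation.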


## Question 2

Given,

Null Hypothesis : There is no significant difference between data sets

Alternate Hypothesis : There is a significant difference

*Perform a two-sample test and decide whether to reject the null hypothesis.*

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html

In [96]:
print("[A] Hypothesis")
print("--------------")
print("H0: muGPU1 - muGPU2  = 0")
print("H1: muGPU1 - muGPU2 != 0")
alpha = 0.05
print("Level of significance (alpha) = ", alpha)
print("")

print("[B] Analysis")
print("------------")
print("This is a case of Independent samples.")
print("Assuming equal variance, using pooled-variance t-test.")
# Compute t_statistic and p_value
t_statistic, p_value = stats.ttest_ind(GPU1, GPU2)
print("t_statistic = ", t_statistic)
print("p_value     = ", p_value)
print("")
reject_h0 = p_value < alpha
print(np.where(reject_h0, 
               "Conclusion : There is a significant difference between performance of GPU1 and GPU2.", 
               "Conclusion : There isn't significant difference between performance of GPU1 and GPU2."))

[A] Hypothesis
--------------
H0: muGPU1 - muGPU2  = 0
H1: muGPU1 - muGPU2 != 0
Level of significance (alpha) =  0.05

[B] Analysis
------------
This is a case of Independent samples.
Assuming equal variance, using pooled-variance t-test.
t_statistic =  -2.627629513471839
p_value     =  0.013794282041452725

Conclusion : There is a significant difference between performance of GPU1 and GPU2.
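
The pooled-variance test above relies on the equal-variance assumption. One way to see how much the conclusion depends on it is to rerun the test with equal_var=False, which makes stats.ttest_ind perform Welch's t-test instead. This is a minimal sketch using the GPU1 and GPU2 data from above, not part of the original exercise.

```python
import numpy as np
from scipy import stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
GPU2 = np.array([11, 13, 10, 13, 12, 9, 11, 12, 12, 11, 12, 12, 10, 11, 13])

# Pooled-variance t-test (assumes equal population variances)
t_pooled, p_pooled = stats.ttest_ind(GPU1, GPU2, equal_var=True)

# Welch's t-test (does not assume equal population variances)
t_welch, p_welch = stats.ttest_ind(GPU1, GPU2, equal_var=False)

print("pooled: t =", t_pooled, " p =", p_pooled)
print("Welch : t =", t_welch, " p =", p_welch)
```

Because both samples have the same size, the two t-statistics coincide; only the degrees of freedom, and therefore the p-value, can differ.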


## Question 3

The student now tries a third GPU, GPU3.

In [97]:
GPU3 = np.array([9,10,9,11,10,13,12,9,12,12,13,12,13,10,11])

#Assumption: Both the datasets (GPU1 & GPU 3) are random, independent, parametric & normally distributed

*Perform a two-sample test and check whether there is a significant difference between the speeds of the two GPUs, GPU1 and GPU3.*

#### Answer:

In [98]:
print("[A] Hypothesis")
print("--------------")
print("H0: muGPU1 - muGPU3  = 0")
print("H1: muGPU1 - muGPU3 != 0")
alpha = 0.05
print("Level of significance (alpha) = ", alpha)
print("")

print("[B] Analysis")
print("------------")
print("This is a case of Independent samples.")
print("Assuming equal variance, using pooled-variance t-test.")
# Compute t_statistic and p_value
t_statistic, p_value = stats.ttest_ind(GPU1, GPU3, equal_var=True)
print("t_statistic = ", t_statistic)
print("p_value     = ", p_value)
print("")
reject_h0 = p_value < alpha
print(np.where(reject_h0, 
               "Conclusion : There is a significant difference between performance of GPU1 and GPU3.", 
               "Conclusion : There isn't significant difference between performance of GPU1 and GPU3."))

[A] Hypothesis
--------------
H0: muGPU1 - muGPU3  = 0
H1: muGPU1 - muGPU3 != 0
Level of significance (alpha) =  0.05

[B] Analysis
------------
This is a case of Independent samples.
Assuming equal variance, using pooled-variance t-test.
t_statistic =  -1.4988943759093303
p_value     =  0.14509210993138993

Conclusion : There isn't significant difference between performance of GPU1 and GPU3.


## ANOVA

## Question 4 

If you need to compare more than two data sets at a time, an ANOVA is your best bet. 

*The results from three experiments with overlapping 95% confidence intervals are given below, and we want to confirm that the results for all three experiments are not significantly different.*

Before conducting ANOVA, use Levene's test to check whether the equality-of-variances assumption is satisfied. If it is not, state that we cannot depend on the result of ANOVA.

In [99]:
import numpy as np
import scipy.stats as stats

e1 = np.array([1.595440,1.419730,0.000000,0.000000])
e2 = np.array([1.433800,2.079700,0.892139,2.384740])
e3 = np.array([0.036930,0.938018,0.995956,1.006970])

#Assumption: All the 3 datasets (e1,e2 & e3) are random, independent, parametric & normally distributed

Perform Levene's test on the data.

The Levene test tests the null hypothesis that all input samples are from populations with equal variances. Levene’s test is an alternative to Bartlett’s test (scipy.stats.bartlett) in the case where there are significant deviations from normality.

source: scipy.org

#### Answer:

In [100]:
print("[A] Levene's Test")
print("------------------")
print("Performing Levene's Test to check homogeneity of variance (i.e. all 3 groups have same variance)")
print("")

# Levene's test - Test that all input samples are from populations with equal variances.
f_stat_4_levene, p_val = stats.levene(e1, e2, e3)

print("F-Stat for Levene's test = ", f_stat_4_levene)

p_value_4_levene = stats.f.sf(f_stat_4_levene, dfn =2 , dfd = 9)
print("p-value for Levene's test = ", p_value_4_levene)

f_crit = stats.f.isf(q = 0.05, dfn =2 , dfd = 9)
print("F-Critical value = ", f_crit)

reject_h0_4_levene = f_stat_4_levene > f_crit

print(np.where(reject_h0_4_levene, 
               "Conclusion : There is evidence of significant difference among the variances of given groups.",
               "Conclusion : There is insufficient evidence of significant difference among the variances of given groups. i.e. All groups have same/nearly same variance. So we can depend on ANOVA."))
print("")
    

[A] Levene's Test
------------------
Performing Levene's Test to check homogeneity of variance (i.e. all 3 groups have same variance)

F-Stat for Levene's test =  2.6741725711150446
p-value for Levene's test =  0.12259792666001798
F-Critical value =  4.25649472909375
Conclusion : There is insufficient evidence of significant difference among the variances of given groups. i.e. All groups have same/nearly same variance. So we can depend on ANOVA.
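
The p-value recomputed from the F distribution above should agree with the one that stats.levene already returns. The sketch below checks this with the degrees of freedom derived from the data (k - 1 and N - k) rather than hard-coded; the names k, n_total and p_manual are illustrative.

```python
import numpy as np
from scipy import stats

e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])

# Levene's test returns the test statistic and its p-value directly
stat, p_val = stats.levene(e1, e2, e3)

# Degrees of freedom: (number of groups - 1, total observations - number of groups)
k = 3
n_total = len(e1) + len(e2) + len(e3)
p_manual = stats.f.sf(stat, dfn=k - 1, dfd=n_total - k)

print("statistic =", stat)
print("p-value from levene() =", p_val)
print("p-value from F dist   =", p_manual)
```

Comparing p_val with alpha gives the same decision as comparing the statistic with the critical value.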



## Question 5

The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes.

Use the stats.f_oneway() function to perform a one-way ANOVA test.

In [101]:
print("[A] One-Way ANOVA (Analysis of Variance)")
print("----------------------------------------")

f_stat, p_value = stats.f_oneway(e1, e2, e3)
print("F-Stat = ", f_stat)
print("p-value = ", p_value)
reject_h0_using_f_stat = f_stat > f_crit

print(np.where(reject_h0_using_f_stat, 
               "Conclusion: There is a significant difference among the means of given groups.",
               "Conclusion : There isn't a significant difference among the means of given groups."))

[A] One-Way ANOVA (Analysis of Variance)
----------------------------------------
F-Stat =  2.51357622845924
p-value =  0.13574644501798466
Conclusion : There isn't a significant difference among the means of given groups.
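
To make the F-statistic less of a black box, the sketch below rebuilds it from the between-group and within-group sums of squares and compares the result with stats.f_oneway. It uses the e1, e2 and e3 arrays from above; the helper names (ss_between, ss_within, f_manual) are illustrative.

```python
import numpy as np
from scipy import stats

e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])

groups = [e1, e2, e3]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)            # number of groups
n_total = all_values.size  # total number of observations

# Between-group variation: how far each group mean is from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group variation: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = stats.f.sf(f_manual, dfn=k - 1, dfd=n_total - k)

f_scipy, p_scipy = stats.f_oneway(e1, e2, e3)
print("manual: F =", f_manual, " p =", p_manual)
print("scipy : F =", f_scipy, " p =", p_scipy)
```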


## Question 6

*In one or two sentences, explain **Type I** and **Type II** errors.*

#### Answer:

Type I error describes a situation where you reject the null hypothesis when it is actually true. This type of error is also known as a "false positive" or "false hit".

Type II error describes a situation where you fail to reject the null hypothesis when it is actually false. Type II error is also known as a "false negative" or "miss".
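
A small simulation can make the two error types concrete. The sketch below is an illustration under assumed settings (seed, sample size 15, 2000 repetitions and a true mean of 0.5 for the second case are all hypothetical choices): it estimates how often a one-sample t-test rejects a true null hypothesis and how often it fails to reject a false one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
alpha = 0.05
n_experiments, n = 2000, 15     # hypothetical simulation settings

# Case 1: H0 (mu = 0) is true, so every rejection is a Type I error
null_samples = rng.normal(loc=0.0, scale=1.0, size=(n_experiments, n))
_, p_null = stats.ttest_1samp(null_samples, popmean=0.0, axis=1)
type1_rate = np.mean(p_null < alpha)

# Case 2: H0 (mu = 0) is false (true mean is 0.5), so every non-rejection is a Type II error
alt_samples = rng.normal(loc=0.5, scale=1.0, size=(n_experiments, n))
_, p_alt = stats.ttest_1samp(alt_samples, popmean=0.0, axis=1)
type2_rate = np.mean(p_alt >= alpha)

print("Estimated Type I error rate :", type1_rate)
print("Estimated Type II error rate:", type2_rate)
```

The Type I rate should land close to alpha, while the Type II rate depends on the effect size and the sample size.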

## Question 7 

You are the manager of a Chinese restaurant. You want to determine whether the waiting time to place an order has changed in the past month from its previous population mean value of 4.5 minutes. 
State the null and alternative hypotheses.

#### Answer:


The null hypothesis is that the population mean has not changed from its previous value of 4.5 minutes. 
This is stated as

H0: µ = 4.5

The alternative hypothesis is the opposite of the null hypothesis. Because the null hypothesis is that the population mean is 4.5 minutes, the alternative hypothesis is that the population mean is not 4.5 minutes. 

This is stated as

H1: µ ≠ 4.5

## Chi-Squared Test

## Question 8

Let's create a small dataset for dice rolls of four players

In [102]:
import numpy as np

d1 = [5, 8, 3, 8]
d2 = [9, 6, 8, 5]
d3 = [8, 12, 7, 2]
d4 = [4, 16, 7, 3]
d5 = [3, 9, 6, 5]
d6 = [7, 2, 5, 7]

dice = np.array([d1, d2, d3, d4, d5, d6])
dice

array([[ 5,  8,  3,  8],
       [ 9,  6,  8,  5],
       [ 8, 12,  7,  2],
       [ 4, 16,  7,  3],
       [ 3,  9,  6,  5],
       [ 7,  2,  5,  7]])

Run the test using the SciPy stats library.

Depending on the test, we are generally looking for a threshold at either 0.05 or 0.01. Our test is significant (i.e. we reject the null hypothesis) if we get a p-value below our threshold.

For our purposes, we’ll use 0.01 as the threshold.

Use the stats.chi2_contingency() function.

This function computes the chi-square statistic and p-value for the hypothesis test of independence of the observed frequencies in the contingency table.

Print the following:

- chi2 stat
- p-value
- degree of freedom
- expected frequencies (the contingency table expected under independence)



In [103]:
chi2stat, p_value, dof, expected = stats.chi2_contingency(dice)

print("[A] Chi-Square Test of Independence")
print("-----------------------------------")
print("chi2 = ", chi2stat)
print("p_value = ", p_value)
print("dof = ", dof)
print("expected : \n", expected)
print("")

# Extra Work
print("[B] Extra Work (Hypothesis) : ")
print("------------------------------")
print("H0 : There is no relationship between Dice Roll and Player outcome.")
print("H1 : There is a relationship between Dice Roll and Player outcome.")

chi_crit = stats.chi2.isf(q = 0.01, df = dof)
print("chi_crit = ", chi_crit)

reject_h0 = chi2stat > chi_crit

print(np.where(reject_h0, 
               "Conclusion : There is a relationship between Dice Roll and Player outcome.",
               "Conclusion : There is no relationship between Dice Roll and Player outcome."))


[A] Chi-Square Test of Independence
-----------------------------------
chi2 =  23.315671914716496
p_value =  0.07766367301496693
dof =  15
expected : 
 [[ 5.57419355  8.20645161  5.57419355  4.64516129]
 [ 6.50322581  9.57419355  6.50322581  5.41935484]
 [ 6.73548387  9.91612903  6.73548387  5.61290323]
 [ 6.96774194 10.25806452  6.96774194  5.80645161]
 [ 5.34193548  7.86451613  5.34193548  4.4516129 ]
 [ 4.87741935  7.18064516  4.87741935  4.06451613]]

[B] Extra Work (Hypothesis) : 
------------------------------
H0 : There is no relationship between Dice Roll and Player outcome.
H1 : There is a relationship between Dice Roll and Player outcome.
chi_crit =  30.577914166892494
Conclusion : There is no relationship between Dice Roll and Player outcome.
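
The expected frequencies and the chi-square statistic can also be reconstructed by hand, which shows what stats.chi2_contingency computes. The sketch below uses the dice table from above; the names expected_manual, chi2_manual and dof_manual are illustrative.

```python
import numpy as np
from scipy import stats

dice = np.array([[5, 8, 3, 8],
                 [9, 6, 8, 5],
                 [8, 12, 7, 2],
                 [4, 16, 7, 3],
                 [3, 9, 6, 5],
                 [7, 2, 5, 7]])

# Expected count under independence: row total * column total / grand total
row_totals = dice.sum(axis=1, keepdims=True)
col_totals = dice.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / dice.sum()

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi2_manual = ((dice - expected_manual) ** 2 / expected_manual).sum()
dof_manual = (dice.shape[0] - 1) * (dice.shape[1] - 1)
p_manual = stats.chi2.sf(chi2_manual, dof_manual)

chi2_scipy, p_scipy, dof_scipy, expected_scipy = stats.chi2_contingency(dice)
print("manual: chi2 =", chi2_manual, " dof =", dof_manual, " p =", p_manual)
print("scipy : chi2 =", chi2_scipy, " dof =", dof_scipy, " p =", p_scipy)
```

Comparing the p-value with the 0.01 threshold gives the same decision as comparing the statistic with the critical value.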


## Question 9

### Z-test

Compute z-scores for the dice data above using the stats.zscore function from SciPy (by default it standardizes each column separately). Convert the z-scores to p-values and take the mean of the array.

In [104]:
print("[A] Z-Scores of 6*4 Matrix")
print("--------------------------")
print("")
# Calculate the z-score of each value relative to the mean and standard deviation of its column (axis=0 is the default).
zscores = stats.zscore(dice)
print(zscores)
print("")

print("[B] Corresponding p-values")
print("--------------------------")
print("")
pvalues = stats.norm.cdf(zscores)
print(pvalues)
print("")

print("[C] Mean of given 6* 4 Matrix")
print("-----------------------------")
print("")
print(dice.mean())

[A] Z-Scores of 6*4 Matrix
--------------------------

[[-0.46291005 -0.18884739 -1.83711731  1.44115338]
 [ 1.38873015 -0.64208114  1.22474487  0.        ]
 [ 0.9258201   0.7176201   0.61237244 -1.44115338]
 [-0.9258201   1.62408759  0.61237244 -0.96076892]
 [-1.38873015  0.03776948  0.          0.        ]
 [ 0.46291005 -1.54854863 -0.61237244  0.96076892]]

[B] Corresponding p-values
--------------------------

[[0.32171442 0.42510621 0.03309629 0.92522932]
 [0.91754259 0.26041025 0.88966432 0.5       ]
 [0.82273026 0.76350422 0.72985431 0.07477068]
 [0.17726974 0.94782144 0.72985431 0.16833418]
 [0.08245741 0.51506426 0.5        0.5       ]
 [0.67828558 0.06074513 0.27014569 0.83166582]]

[C] Mean of given 6* 4 Matrix
-----------------------------

6.458333333333333
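
The sketch below spells out what stats.zscore does with its defaults (column-wise standardization with the population standard deviation) and shows two ways of turning z-scores into p-values: the lower-tail probability used above and a two-sided version. It also takes the mean of the p-value array, which is one possible reading of "take the mean of the array"; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

dice = np.array([[5, 8, 3, 8],
                 [9, 6, 8, 5],
                 [8, 12, 7, 2],
                 [4, 16, 7, 3],
                 [3, 9, 6, 5],
                 [7, 2, 5, 7]])

# stats.zscore defaults: axis=0 (per column) and ddof=0 (population std)
z = stats.zscore(dice)
z_manual = (dice - dice.mean(axis=0)) / dice.std(axis=0)

# Lower-tail probability P(Z <= z), as computed with norm.cdf above
p_lower = stats.norm.cdf(z)
# Two-sided probability P(|Z| >= |z|)
p_two_sided = 2 * stats.norm.sf(np.abs(z))

print("zscore matches manual formula:", np.allclose(z, z_manual))
print("mean of lower-tail p-values:", p_lower.mean())
print("mean of two-sided p-values :", p_two_sided.mean())
```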


## Question 10

A Paired sample t-test compares means from the same group at different times.

The basic two sample t-test is designed for testing differences between independent groups. 
In some cases, you might be interested in testing differences between samples of the same group at different points in time. 
We can conduct a paired t-test using the scipy function stats.ttest_rel(). 

Test whether a weight-loss drug works by comparing the weights of the same group of patients before and after treatment, using the simulated data below.

In [105]:
before= stats.norm.rvs(scale=30, loc=100, size=500) ## Creates a normal distribution with a mean value of 100 and std of 30
after = before + stats.norm.rvs(scale=5, loc=-1.25, size=500)

print("[A] Sample Parameters ")
print("----------------------")
n = len(before)
print("n = ", n)
dof = n - 1
print("dof = ", dof)
print("")

print("[B] Hypothesis")
print("--------------")
print("H0 : muBefore - muAfter  = 0")
print("H1 : muBefore - muAfter != 0")
print("Level of significance (alpha) : 0.05")
print("")

print("[C] Analysis")
print("------------")

t_crit = stats.t.isf(q = 0.025, df = dof)
print("t_crit = ", t_crit)

t_stat, p_value = stats.ttest_rel(before, after)
print("t_stat = ", t_stat)
print("p_value = ", p_value)
print("")
# Two-tailed test: compare the magnitude of the t-statistic with the critical value
reject_h0 = abs(t_stat) > t_crit
print(np.where(reject_h0, 
               "Conclusion : There is a significant difference in before-drug and after-drug test.",
               "Conclusion : There isn't significant difference in before-drug and after-drug test."))


[A] Sample Parameters 
----------------------
n =  500
dof =  499

[B] Hypothesis
--------------
H0 : muBefore - muAfter  = 0
H1 : muBefore - muAfter != 0
Level of significance (alpha) : 0.05

[C] Analysis
------------
t_crit =  1.9647293909876653
t_stat =  8.222170537904674
p_value =  1.7415917486649281e-15

Conclusion : There is a significant difference in before-drug and after-drug test.
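
A paired t-test is equivalent to a one-sample t-test on the per-patient differences. The sketch below demonstrates this on data simulated the same way as above, except that it uses a seeded NumPy generator (an assumption added here so the run is reproducible), so its numbers will not match the output shown above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # hypothetical seed, for reproducibility only

# Simulated weights: "after" is "before" plus a small average loss of 1.25
before = rng.normal(loc=100, scale=30, size=500)
after = before + rng.normal(loc=-1.25, scale=5, size=500)

# Paired t-test on the two related samples
t_rel, p_rel = stats.ttest_rel(before, after)

# One-sample t-test on the differences against a mean of zero
t_one, p_one = stats.ttest_1samp(before - after, popmean=0)

print("ttest_rel  : t =", t_rel, " p =", p_rel)
print("ttest_1samp: t =", t_one, " p =", p_one)
```

Viewing the test this way also makes clear why pairing helps: the large between-patient spread (standard deviation 30) cancels out in the differences.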
