# Hypothesis Testing

The purpose of a hypothesis test is to determine whether an observed difference (between two data sets, or between a sample and a reference value) is statistically significant.



## Overview

This module covers:

1) One-sample and two-sample t-tests

2) ANOVA

3) Type I and Type II errors

4) Chi-Squared Tests

## Question 1 

*A student is trying to decide between two GPUs. He wants to use the GPU for his research to run deep learning algorithms, so the only thing he is concerned with is speed.*

*He picks a deep learning algorithm on a large data set and runs it on both GPUs 15 times, timing each run in hours. The results are given in the lists GPU1 and GPU2 below.*

In [2]:
from scipy import stats 
from scipy.stats import ttest_1samp
from scipy.stats import ttest_ind
from scipy.stats import t
from scipy.stats import f
import numpy as np

In [3]:
GPU1 = np.array([11,9,10,11,10,12,9,11,12,9,11,12,9,10,9])
GPU2 = np.array([11,13,10,13,12,9,11,12,12,11,12,12,10,11,13])

#Assumption: Both the datasets (GPU1 & GPU 2) are random, independent, parametric & normally distributed

Hint: You can import the t-test functions (`ttest_1samp`, `ttest_ind`) from `scipy.stats` to perform t-tests.

**First T test**

*One sample t-test*

Check if the mean of the GPU1 is equal to zero.
- Null Hypothesis is that mean is equal to zero.
- Alternate hypothesis is that it is not equal to zero.

In [4]:
mu="\u03BC"
alp="\u03B1"
not_equal="\u2260"
alpha=0.05

In [77]:
print("h0:",mu,"= 0")
print("\nh1:",mu,not_equal,"0\n")
print(alp,":",alpha)
print("\n{Two-Tailed Test}\n\nt critical value against given alpha is: +/-",t.ppf(1-(alpha/2),df=(len(GPU1)-1)))


t_statistic, p_value = stats.ttest_1samp(GPU1, 0)
print("\nt-statistic is: %.3f" %t_statistic)
print("\np-value is: %.7f" %p_value)

if(p_value<alpha):
    print("\nObservation: p_value < ",alp)
    print("\nConclusion: Reject h0. Mean of GPU1 is not equal to 0")
else:
    print("\nObservation: p_value >= ",alp)
    print("\nConclusion: Fail to reject h0. No evidence that the mean of GPU1 differs from 0")


h0: μ = 0

h1: μ ≠ 0

α : 0.05

{Two-Tailed Test}

t critical value against given alpha is: +/- 2.1447866879169273

t-statistic is: 34.056

p-value is: 0.0000000

Observation: p_value <  α

Conclusion: Reject h0. Mean of GPU1 is not equal to 0
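
The statistic reported by `ttest_1samp` can be reproduced by hand from t = (sample mean - hypothesised mean) / (s / sqrt(n)). The following is a minimal sketch, assuming the same GPU1 timings as above; the names `mu0`, `t_manual` and `p_manual` are illustrative and not part of the original notebook.

```python
import numpy as np
from scipy import stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
mu0 = 0  # hypothesised population mean (illustrative name)

n = GPU1.size
sample_mean = GPU1.mean()
sample_std = GPU1.std(ddof=1)             # sample standard deviation (n - 1 denominator)
standard_error = sample_std / np.sqrt(n)  # standard error of the mean

t_manual = (sample_mean - mu0) / standard_error
# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(GPU1, mu0)
print("manual t: %.3f, scipy t: %.3f" % (t_manual, t_scipy))
```

Both routes should agree, which makes it easier to see that the very large t-statistic simply reflects a sample mean far from 0 relative to its standard error.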


## Question 2

Given,

Null Hypothesis : There is no significant difference between data sets

Alternate Hypothesis : There is a significant difference

*Perform a two-sample t-test and decide whether or not to reject the null hypothesis.*

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html

In [72]:
def calculate_population_variablity(sample1,sample2):
    # F-test for equality of two variances; relies on the globals alpha and alp.
    # ddof=1 gives the unbiased sample variance (np.var defaults to ddof=0).
    # With equal sample sizes the ratio is the same either way, up to rounding.
    sample1_variance=np.var(sample1,ddof=1)
    sample2_variance=np.var(sample2,ddof=1)
    # Put the larger variance in the numerator so that f_statistics >= 1.
    # Using >= also covers the case of exactly equal variances.
    if(sample1_variance>=sample2_variance):
        degrees_of_freedom_num=len(sample1)-1
        degrees_of_freedom_den=len(sample2)-1
        f_statistics=sample1_variance/sample2_variance
    else:
        degrees_of_freedom_num=len(sample2)-1
        degrees_of_freedom_den=len(sample1)-1
        f_statistics=sample2_variance/sample1_variance

    # Upper critical value for a two-tailed test at level alpha
    f_critical_value=f.ppf(q=1-(alpha/2),dfn=degrees_of_freedom_num,dfd=degrees_of_freedom_den)

    print(alp,":",alpha)
    print("\n{Two-Tailed Test} \n\nf_critical_value against given alpha is: +/-",f_critical_value)
    print("\nf_statistics:",f_statistics)

    if(f_statistics>f_critical_value):
        print("\nf_statistics > f_critical_value")
        print("\nVariability in samples. Cannot Proceed with the Pooled Variance T-Test")
    else:
        print("\nf_statistics < f_critical_value")
        print("\nNo Variability in samples. Proceed with the Pooled Variance T-Test ")
        pooled_variance_ttest(sample1,sample2)
        
              
    

In [73]:
def pooled_variance_ttest(sample1,sample2):
    print("\n\n")
    print("\nLet",mu+"1 be the mean of sample1")
    print("\nLet",mu+"2 be the mean of sample2")
    print("\n")
    print("h0:",mu+"1 =",mu+"2")
    print("\nh1:",mu+"1",not_equal,mu+"2\n")
    print(alp,":",alpha)
    print("\n{Two-Tailed Test} \n\nt critical value against given alpha is: +/-",t.ppf((1-(alpha/2)),df=len(sample1)+len(sample2)-2))

    t_statistic,p_value=ttest_ind(sample1,sample2)
    print("\nt_statistic is:",t_statistic)
    print("\np_value is:",p_value)

    if(p_value<alpha):
        print("\nObservation: p_value < ",alp)
        print("\nConclusion: Reject h0. There is significant difference between speed of given two GPUs")
    elif(p_value>=alpha):
        print("\nObservation: p_value > ",alp)
        print("\nConclusion: Fail to Reject h0. There is no significant difference between speed of given two GPUs")
    

In [79]:
calculate_population_variablity(GPU1,GPU2)

α : 0.05

{Two-Tailed Test} 

f_critical_value against given alpha is: +/- 2.97858752410188

f_statistics: 1.020689655172414

f_statistics < f_critical_value

No Variability in samples. Proceed with the Pooled Variance T-Test 




Let μ1 be the mean of sample1

Let μ2 be the mean of sample2


h0: μ1 = μ2

h1: μ1 ≠ μ2

α : 0.05

{Two-Tailed Test} 

t critical value against given alpha is: +/- 2.048407141795244

t_statistic is: -2.627629513471839

p_value is: 0.013794282041452725

Observation: p_value <  α

Conclusion: Reject h0. There is significant difference between speed of given two GPUs
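
To make the pooled-variance calculation behind `ttest_ind` explicit, the sketch below recomputes the statistic by hand for GPU1 and GPU2. It assumes equal population variances, as the F-test above suggests; the names `pooled_var`, `t_manual` and `p_manual` are illustrative only.

```python
import numpy as np
from scipy import stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
GPU2 = np.array([11, 13, 10, 13, 12, 9, 11, 12, 12, 11, 12, 12, 10, 11, 13])

n1, n2 = GPU1.size, GPU2.size
var1, var2 = GPU1.var(ddof=1), GPU2.var(ddof=1)  # unbiased sample variances

# Pooled variance: weighted average of the two sample variances
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
standard_error = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_manual = (GPU1.mean() - GPU2.mean()) / standard_error
p_manual = 2 * stats.t.sf(abs(t_manual), df=n1 + n2 - 2)  # two-sided p-value

t_scipy, p_scipy = stats.ttest_ind(GPU1, GPU2)  # equal_var=True by default
print("manual t: %.4f, scipy t: %.4f" % (t_manual, t_scipy))
print("manual p: %.4f, scipy p: %.4f" % (p_manual, p_scipy))
```

The negative sign of the statistic only indicates that the GPU1 sample mean is smaller than the GPU2 sample mean, i.e. GPU1 ran faster on average in this sample.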


## Question 3

The student is now trying a third GPU, GPU3.

In [75]:
GPU3 = np.array([9,10,9,11,10,13,12,9,12,12,13,12,13,10,11])

#Assumption: Both the datasets (GPU1 & GPU 3) are random, independent, parametric & normally distributed

*Perform a two-sample t-test and check whether there is a significant difference between the speeds of the two GPUs, GPU1 and GPU3.*

#### Answer:

In [76]:
calculate_population_variablity(GPU1,GPU3)

α : 0.05

{Two-Tailed Test} 

f_critical_value against given alpha is: +/- 2.97858752410188

f_statistics: 1.6000000000000008

f_statistics < f_critical_value

No Variability in samples. Proceed with the Pooled Variance T-Test 




Let μ1 be the mean of sample1

Let μ2 be the mean of sample2


h0: μ1 = μ2

h1: μ1 ≠ μ2

α : 0.05

{Two-Tailed Test} 

t critical value against given alpha is: +/- 2.048407141795244

t_statistic is: -1.4988943759093303

p_value is: 0.14509210993138993

Observation: p_value >  α

Conclusion: Fail to Reject h0. There is no significant difference between speed of given two GPUs
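
When the F-test does reject equal variances, the helper above stops without running a test. One common alternative in that situation is Welch's t-test, which SciPy exposes through `ttest_ind(..., equal_var=False)`. The sketch below is an illustration of that option on GPU1 and GPU3, not part of the original solution.

```python
import numpy as np
from scipy import stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
GPU3 = np.array([9, 10, 9, 11, 10, 13, 12, 9, 12, 12, 13, 12, 13, 10, 11])

# Pooled-variance t-test (assumes equal population variances)
t_pooled, p_pooled = stats.ttest_ind(GPU1, GPU3)

# Welch's t-test (does not assume equal population variances)
t_welch, p_welch = stats.ttest_ind(GPU1, GPU3, equal_var=False)

print("pooled: t = %.4f, p = %.4f" % (t_pooled, p_pooled))
print("welch : t = %.4f, p = %.4f" % (t_welch, p_welch))
```

Because both samples have the same size here, the two t-statistics coincide; only the degrees of freedom, and therefore the p-value, differ slightly.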


## ANOVA

## Question 4 

If you need to compare more than two data sets at a time, an ANOVA is your best bet. 

*The results from three experiments with overlapping 95% confidence intervals are given below, and we want to confirm that the results for all three experiments are not significantly different.*

Before conducting the ANOVA, check whether the equality-of-variances assumption is satisfied using Levene's test. If it is not, state that the result of the ANOVA cannot be relied upon.

In [28]:
import numpy as np

e1 = np.array([1.595440,1.419730,0.000000,0.000000])
e2 = np.array([1.433800,2.079700,0.892139,2.384740])
e3 = np.array([0.036930,0.938018,0.995956,1.006970])

#Assumption: All the 3 datasets (e1,e2 & e3) are random, independent, parametric & normally distributed

Perform Levene's test on the data.

The Levene test tests the null hypothesis that all input samples are from populations with equal variances. Levene’s test is an alternative to Bartlett’s test (`scipy.stats.bartlett`) in the case where there are significant deviations from normality.

source: scipy.org

In [78]:
#Calculate the f-critical-value
total_groups=3
total_elements=12
f_critical_value=f.ppf(q=1-(alpha),dfn=total_groups-1,dfd=total_elements-total_groups)
print("{Right-Tailed Test} \n\nf_critical_value is:",f_critical_value,"\n")
print(alp,":",alpha)

#Levene's test statistic and p-value (SciPy's default center='median' is the Brown-Forsythe variant)
f_stats,p_value=stats.levene(e1,e2,e3)
print("\nf_statistics:",f_stats)
print("\np-value is:",p_value)

if(f_stats>f_critical_value):
    print("\nObservation: f_statistics > f_critical_value")
    print("\nPopulation from which samples are drawn do not have equal variances ")
else:
    print("\nObservation: f_statistics < f_critical_value")
    print("\nPopulation from which samples are drawn have equal variances. Proceed with ANOVA Test")

{Right-Tailed Test} 

f_critical_value is: 4.25649472909375 

α : 0.05

f_statistics: 2.6741725711150446

p-value is: 0.12259792666001798

Observation: f_statistics < f_critical_value

Population from which samples are drawn have equal variances. Proceed with ANOVA Test
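
Comparing the statistic with the critical value and comparing the p-value with alpha are two views of the same decision. The sketch below illustrates this for Levene's test on the three experiments; the names `reject_by_critical`, `reject_by_p` and `p_from_f` are illustrative, and the degrees of freedom (k - 1, N - k) follow the cell above.

```python
import numpy as np
from scipy import stats

e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])

alpha = 0.05
k = 3                                   # number of groups
n_total = e1.size + e2.size + e3.size   # total number of observations

w_stat, p_value = stats.levene(e1, e2, e3)  # default center='median'

# Route 1: compare the statistic with the right-tail F critical value
f_crit = stats.f.ppf(1 - alpha, dfn=k - 1, dfd=n_total - k)
reject_by_critical = w_stat > f_crit

# Route 2: compare the p-value with alpha
reject_by_p = p_value < alpha

# The p-value is the right-tail area of the same F distribution
p_from_f = stats.f.sf(w_stat, dfn=k - 1, dfd=n_total - k)

print("reject by critical value:", reject_by_critical)
print("reject by p-value       :", reject_by_p)
```

Using the p-value directly avoids hard-coding the degrees of freedom, which is convenient when group sizes change.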


## Question 5

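The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes. The alternative hypothesis is that at least one group mean differs from the others; the shorthand μ1 ≠ μ2 ≠ μ3 printed below should be read in that sense, not as a claim that all three means are pairwise different.

Use the `stats.f_oneway()` function to perform the one-way ANOVA test.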

In [36]:
print("\nLet",mu+"1 be the mean of e1")
print("\nLet",mu+"2 be the mean of e2")
print("\nLet",mu+"3 be the mean of e3")

print("\n")
print("h0:",mu+"1 =",mu+"2","=",mu+"3")
print("\nh1:",mu+"1",not_equal,mu+"2",not_equal,mu+"3")

#Calculate the f-critical-value
total_groups=3
total_elements=12
f_critical_value=f.ppf(q=1-(alpha),dfn=total_groups-1,dfd=total_elements-total_groups)
print("\n\n{Right Tailed Test} \n\nf_critical_value is:",f_critical_value,"\n")
print(alp,": 0.05")

f_stats,p_value=stats.f_oneway(e1,e2,e3)
print("\nf_statistics is:",f_stats)
print("\np_value is:",p_value)

if(f_stats>f_critical_value):
    print("\nObservation: f_statistics > f_critical_value")
    print("\nReject h0. Groups have different mean")
else:
    print("\nObservation: f_statistics < f_critical_value")
    print("\nFail to Reject h0. Groups have same mean")
    


Let μ1 be the mean of e1

Let μ2 be the mean of e2

Let μ3 be the mean of e3


h0: μ1 = μ2 = μ3

h1: μ1 ≠ μ2 ≠ μ3


{Right Tailed Test} 

f_critical_value is: 4.25649472909375 

α : 0.05

f_statistics is: 2.51357622845924

p_value is: 0.13574644501798466

Observation: f_statistics < f_critical_value

Fail to Reject h0. Groups have same mean
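
The F-statistic returned by `f_oneway` is the ratio of the between-group mean square to the within-group mean square. The following sketch recomputes it by hand for e1, e2 and e3; variable names such as `ss_between` and `ss_within` are illustrative.

```python
import numpy as np
from scipy import stats

e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])
groups = [e1, e2, e3]

grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: spread of the group means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of the observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = len(groups) - 1
df_within = sum(g.size for g in groups) - len(groups)

f_manual = (ss_between / df_between) / (ss_within / df_within)
p_manual = stats.f.sf(f_manual, df_between, df_within)  # right-tail area

f_scipy, p_scipy = stats.f_oneway(*groups)
print("manual F: %.4f, scipy F: %.4f" % (f_manual, f_scipy))
print("manual p: %.4f, scipy p: %.4f" % (p_manual, p_scipy))
```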


## Question 6

*In one or two sentences, explain **Type I** and **Type II** errors.*


In [None]:
Type I Error:

A Type I error occurs when you reject the null hypothesis even though the null hypothesis is TRUE (a false positive).

The probability of committing a Type I error is the significance level (alpha).

Type II Error:

A Type II error occurs when you fail to reject the null hypothesis even though the null hypothesis is FALSE (a false negative).

Power: P(Reject h0 | h0 is false), that is, the probability of rejecting the null hypothesis when it is false.

Equivalently, Power = 1 - P(Not rejecting h0 | h0 is false) = 1 - P(Type II error), so power is the probability of not committing a Type II error.

The higher the significance level, the higher the probability of committing a Type I error, but the lower the probability of committing a Type II error, and hence the higher the power.

Depending on what is defined as the null hypothesis and what is defined as the alternative hypothesis in a given problem, committing a Type I error is the costlier mistake in some scenarios and committing a Type II error is the costlier mistake in others.
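
The two error rates can be estimated by simulation. The sketch below is an illustration under assumed settings (normal data, 15 observations per experiment, a true mean of 0.5 in the second case); none of these values come from the original notebook.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
alpha = 0.05
n_experiments, n_obs = 2000, 15

# Case 1: h0 is true (true mean is 0), so every rejection is a Type I error
null_samples = rng.normal(loc=0.0, scale=1.0, size=(n_experiments, n_obs))
p_null = stats.ttest_1samp(null_samples, 0.0, axis=1).pvalue
type1_rate = np.mean(p_null < alpha)

# Case 2: h0 is false (true mean is 0.5), so every non-rejection is a Type II error
alt_samples = rng.normal(loc=0.5, scale=1.0, size=(n_experiments, n_obs))
p_alt = stats.ttest_1samp(alt_samples, 0.0, axis=1).pvalue
type2_rate = np.mean(p_alt >= alpha)
power = 1 - type2_rate

print("estimated Type I error rate :", type1_rate)
print("estimated Type II error rate:", type2_rate)
print("estimated power             :", power)
```

The estimated Type I error rate should land close to alpha, while the Type II error rate depends on the assumed effect size and sample size.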



## Question 7 

You are the manager of a Chinese restaurant. You want to determine whether the waiting time to place an order has changed in the past month from its previous population mean value of 4.5 minutes.
State the null and alternative hypotheses.



Let mu be the population mean

(Null Hypothesis) h0: mu = 4.5

(Alternate Hypothesis) h1: mu is not equal to 4.5
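
Once waiting times have been collected, these hypotheses map onto a two-sided one-sample t-test against 4.5. The question provides no data, so the sketch below uses simulated, purely hypothetical waiting times (`waits`) to show the shape of the call; the numbers carry no meaning beyond illustration.

```python
import numpy as np
from scipy import stats

alpha = 0.05
previous_mean = 4.5  # population mean stated in the question (minutes)

# HYPOTHETICAL data: 25 simulated waiting times, not real measurements
rng = np.random.default_rng(1)
waits = rng.normal(loc=4.8, scale=1.0, size=25)

# Two-sided test of h0: mu = 4.5 against h1: mu != 4.5
result = stats.ttest_1samp(waits, popmean=previous_mean)
reject_h0 = result.pvalue < alpha

print("t-statistic:", result.statistic)
print("p-value    :", result.pvalue)
print("reject h0  :", reject_h0)
```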

## Chi square test

## Question 8

Let's create a small dataset for dice rolls of four players

In [39]:
import numpy as np

d1 = [5, 8, 3, 8]
d2 = [9, 6, 8, 5]
d3 = [8, 12, 7, 2]
d4 = [4, 16, 7, 3]
d5 = [3, 9, 6, 5]
d6 = [7, 2, 5, 7]

dice = np.array([d1, d2, d3, d4, d5, d6])

Run the test using the SciPy stats library.

Depending on the test, we are generally looking for a threshold at either 0.05 or 0.01. Our test is significant (i.e. we reject the null hypothesis) if we get a p-value below our threshold.

For our purposes, we’ll use 0.01 as the threshold.

Use the `stats.chi2_contingency()` function.

This function computes the chi-square statistic and p-value for the hypothesis test of independence of the observed frequencies in the contingency table. It also returns the degrees of freedom and the table of expected frequencies.

Print the following:

- chi2 stat
- p-value
- degree of freedom
- contingency



In [40]:
#Get the degrees of freedom: (rows - 1) * (columns - 1)
degrees_of_freedom=(dice.shape[0]-1)*(dice.shape[1]-1)
print("Degrees of Freedom is:",degrees_of_freedom)

Degrees of Freedom is: 15


In [42]:
threshold=0.01
chi2_critical_value=stats.chi2.ppf((1-threshold),degrees_of_freedom)
print("{Right Tailed Test} \n\nchi2 critical value is:",chi2_critical_value,"\n")
print(alp,":",threshold)

{Right Tailed Test} 

chi2 critical value is: 30.57791416689249 

α : 0.01


In [44]:
chi2_stats,p_value,dof,contingency=stats.chi2_contingency(dice)

print("\nchi2_statistics is:",chi2_stats)
print("\np_value is:",p_value)
print("\nDegrees of Freedom is:",dof)
print("\nContingency Table for Expected Values is:\n\n",contingency)


chi2_statistics is: 23.315671914716496

p_value is: 0.07766367301496693

Degrees of Freedom is: 15

Contingency Table for Expected Values is:

 [[ 5.57419355  8.20645161  5.57419355  4.64516129]
 [ 6.50322581  9.57419355  6.50322581  5.41935484]
 [ 6.73548387  9.91612903  6.73548387  5.61290323]
 [ 6.96774194 10.25806452  6.96774194  5.80645161]
 [ 5.34193548  7.86451613  5.34193548  4.4516129 ]
 [ 4.87741935  7.18064516  4.87741935  4.06451613]]
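
The expected frequencies and the chi-square statistic can be rebuilt from the row and column totals, which also makes the final decision explicit. This is a minimal sketch on the same `dice` table; names such as `expected`, `chi2_manual` and `reject_h0` are illustrative.

```python
import numpy as np
from scipy import stats

dice = np.array([
    [5, 8, 3, 8],
    [9, 6, 8, 5],
    [8, 12, 7, 2],
    [4, 16, 7, 3],
    [3, 9, 6, 5],
    [7, 2, 5, 7],
])
threshold = 0.01

# Expected count under independence: row total * column total / grand total
row_totals = dice.sum(axis=1, keepdims=True)
col_totals = dice.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / dice.sum()

chi2_manual = ((dice - expected) ** 2 / expected).sum()
dof = (dice.shape[0] - 1) * (dice.shape[1] - 1)
p_manual = stats.chi2.sf(chi2_manual, dof)  # right-tail area

reject_h0 = p_manual < threshold
print("chi2:", chi2_manual, "dof:", dof, "p-value:", p_manual)
print("reject h0 (independence):", reject_h0)
```

With a p-value above the 0.01 threshold, the null hypothesis of independence is not rejected for this table.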


## Question 9

### Z-test

Compute z-scores for the dice data above using the `stats.zscore` function from SciPy. Convert the z-scores to p-values and take the mean of the resulting array.

In [56]:
zscore=stats.zscore(dice,axis=1)

print("zscore array is:\n",zscore)

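# Note: norm.pdf returns the density of the standard normal at each z-score,
# not a tail probability. A two-sided p-value would be 2*stats.norm.sf(np.abs(zscore)).
p_values=stats.norm.pdf(zscore)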

print("\n\np-value array is:\n",p_values)

print("\nMean of p-values is:",p_values.mean())

zscore array is:
 [[-0.47140452  0.94280904 -1.41421356  0.94280904]
 [ 1.26491106 -0.63245553  0.63245553 -1.26491106]
 [ 0.21055872  1.33353857 -0.07018624 -1.47391105]
 [-0.68313005  1.65903012 -0.09759001 -0.87831007]
 [-1.27017059  1.5011107   0.11547005 -0.34641016]
 [ 0.85518611 -1.58820278 -0.12216944  0.85518611]]


p-value array is:
 [[0.35698924 0.25579397 0.14676266 0.25579397]
 [0.17925632 0.32662631 0.32662631 0.17925632]
 [0.39019603 0.1639652  0.39796087 0.13464071]
 [0.31591823 0.10074839 0.39704707 0.2712667 ]
 [0.17806525 0.12930191 0.39629151 0.37570969]
 [0.27675845 0.11302655 0.39597618 0.27675845]]

Mean of p-values is: 0.2641973460944451
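
The values printed above are normal densities (`norm.pdf`), which are not tail probabilities. A two-sided p-value for a z-score is usually taken as twice the upper-tail area beyond |z|. The sketch below shows that conversion on the same data; `p_two_sided` and `mean_p` are illustrative names.

```python
import numpy as np
from scipy import stats

dice = np.array([
    [5, 8, 3, 8],
    [9, 6, 8, 5],
    [8, 12, 7, 2],
    [4, 16, 7, 3],
    [3, 9, 6, 5],
    [7, 2, 5, 7],
])

# z-scores computed within each row, as in the cell above
zscore = stats.zscore(dice, axis=1)

# Two-sided p-value: probability of a value at least as extreme as |z|
p_two_sided = 2 * stats.norm.sf(np.abs(zscore))
mean_p = p_two_sided.mean()

print("two-sided p-values:\n", p_two_sided)
print("\nmean of p-values:", mean_p)
```

A z-score of 0 maps to a p-value of 1 and a z-score near ±1.96 maps to roughly 0.05, which is the behaviour expected of a p-value and not of a density.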


## Question 10

A Paired sample t-test compares means from the same group at different times.

The basic two sample t-test is designed for testing differences between independent groups. 
In some cases, you might be interested in testing differences between samples of the same group at different points in time. 
We can conduct a paired t-test using the scipy function stats.ttest_rel(). 

In [58]:
before= stats.norm.rvs(scale=30, loc=100, size=500) ## Draws 500 values from a normal distribution with mean 100 and std 30 (no seed is set, so results vary between runs)
after = before + stats.norm.rvs(scale=5, loc=-1.25, size=500)

Test whether a weight-loss drug works by comparing the weights of the same group of patients before and after treatment, using the data above.

In [62]:
print("\nLet",mu+"1 be the mean weight before taking drug")
print("\nLet",mu+"2 be the mean weight after taking drug")

print("\n")
print("h0:",mu+"1 =",mu+"2")
print("\nh1:",mu+"1",not_equal,mu+"2\n")
print(alp,":",alpha)

t_critical_value=t.ppf(1-(alpha/2),df=before.size-1)
print("\n{Two-Tailed Test} \n\nt_critical_value against given alpha is: +/-",t_critical_value)

t_stats,p_value=stats.ttest_rel(before,after)

print("\nt-statistics is:",t_stats)
print("\np-value is: %.7f" %p_value)

if(p_value<alpha):
    print("\nObservation: p_value < alpha")
    print("\nReject H0. Significant difference in weight before and after taking drug")
else:
    print("\nObservation: p_value >= alpha")
    print("\nFail to reject H0. No change in weight before and after taking the drug")




Let μ1 be the mean weight before taking drug

Let μ2 be the mean weight after taking drug


h0: μ1 = μ2

h1: μ1 ≠ μ2

α : 0.05

{Two-Tailed Test} 

t_critical_value against given alpha is: +/- 1.9647293909876649

t-statistics is: 4.7125916240354995

p-value is: 0.0000032

Observation: p_value < alpha

Reject H0. Significant difference in weight before and after taking drug
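
A paired t-test is equivalent to a one-sample t-test on the per-patient differences. The sketch below illustrates this with freshly simulated data (a fixed seed is assumed here for repeatability, so the numbers differ from the output above). It also shows a one-sided variant, which matches the directional question "does the drug reduce weight?"; the `alternative` argument assumes SciPy 1.6 or later.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seed chosen for this sketch only
before = rng.normal(loc=100, scale=30, size=500)
after = before + rng.normal(loc=-1.25, scale=5, size=500)

# Paired t-test on the two measurements
t_paired, p_paired = stats.ttest_rel(before, after)

# Equivalent one-sample t-test on the differences against 0
differences = before - after
t_diff, p_diff = stats.ttest_1samp(differences, 0)

# One-sided version: h1 is that the mean weight before is greater than after
one_sided = stats.ttest_rel(before, after, alternative="greater")

print("paired     : t = %.4f, p = %.7f" % (t_paired, p_paired))
print("differences: t = %.4f, p = %.7f" % (t_diff, p_diff))
print("one-sided p-value: %.7f" % one_sided.pvalue)
```

When the statistic is positive, the one-sided p-value is half of the two-sided one, so the directional test is more sensitive to a weight reduction.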
