# Hypothesis Testing

The purpose of a hypothesis test is to tell whether an observed difference, for example between two data sets, is statistically significant.



## Overview

This module covers:

1) One sample and Two sample t-tests

2) ANOVA

3) Type I and Type II errors

4) Chi-Squared Tests

## Question 1 

*A student is trying to decide between two GPUs. He wants to use the GPU to run deep learning algorithms for his research, so the only thing he is concerned with is speed.*

*He picks a deep learning algorithm and a large data set and runs it on both GPUs 15 times, timing each run in hours. The results are given in the lists GPU1 and GPU2 below.*

In [1]:
from scipy import stats 
import numpy as np

In [2]:
GPU1 = np.array([11,9,10,11,10,12,9,11,12,9,11,12,9,10,9])
GPU2 = np.array([11,13,10,13,12,9,11,12,12,11,12,12,10,11,13])

#Assumption: Both the datasets (GPU1 & GPU 2) are random, independent, parametric & normally distributed

Hint: You can import the t-test functions from scipy.stats to perform t-tests.

**First T test**

*One sample t-test*

Check whether the mean of GPU1 is equal to zero.
- The null hypothesis is that the mean is equal to zero.
- The alternate hypothesis is that the mean is not equal to zero.

Answer:

Since we do not know the population standard deviation, we use the t statistic and run a one-sample t-test on GPU1.

    H0: Mu1=0
    HA: Mu1!=0
    
Since the alternate hypothesis states that the mean is not equal to zero, we use a two-tailed one-sample t-test.

In [63]:
print('Here we select α = 0.05\n')

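from scipy.stats import ttest_1samp 
Mu=0  # hypothesized population mean under H0
Xbar=np.mean(GPU1)
print ('Observed Sample Mean of GPU1:',Xbar)
S=np.std(GPU1,ddof=1)  # sample standard deviation (ddof=1)
print ('Standard Deviation of GPU1:',S)
n=len(GPU1)  # sample size (15 runs)
SE=S/np.sqrt(n)  # standard error of the mean
print ('Standard Error:', SE)
# Critical values on the sample-mean scale: upper bound first, then lower bound
print('Critical value:', stats.t.isf(0.025,df=n-1, loc=Mu,scale=SE) ,'and', stats.t.isf(.975,df=n-1,loc=Mu,scale=SE))
# Two-tailed p-value (valid here because Xbar > Mu); 1 - cdf loses precision when the tail probability is tiny
P_value= 2*(1-stats.t.cdf(Xbar,df=n-1, loc=Mu,scale=SE))
print('P value: ',P_value)
t_stat=(Xbar-Mu)/SE  # t statistic: distance of the sample mean from Mu in standard errors
print('t Statistic: ',t_stat)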

print('\nUsing ttest_1samp function:')
t_statistic, p_value = ttest_1samp(GPU1, 0)
print('t_statistic:',t_statistic, ' P value: ',p_value)

print('\nSince the p-value is far below the 5% significance level, we reject the null hypothesis.')
print('That means the mean of GPU1 is not equal to zero.')

Here we select α = 0.05

Observed Sample Mean of GPU1: 10.333333333333334
Standard Deviation of GPU1: 1.1751393027860062
Standard Error: 0.3034196632775998
Critical value: 0.6507704546500327 and -0.6507704546500326
P value:  7.105427357601002e-15
t Statistic:  34.056241516158195

Using ttest_1samp function:
t_statistic: 34.056241516158195  P value:  7.228892044970457e-15

Since the p-value is far below the 5% significance level, we reject the null hypothesis.
That means the mean of GPU1 is not equal to zero.
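
The manual p-value above (7.105e-15) differs slightly from the one reported by ttest_1samp (7.229e-15). A likely explanation is floating-point cancellation: when the cdf is extremely close to 1, computing 1 - cdf keeps only a few significant bits. The following is a minimal sketch, assuming the same GPU1 data, that uses the survival function stats.t.sf instead; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np
from scipy import stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])

n = len(GPU1)
se = np.std(GPU1, ddof=1) / np.sqrt(n)   # standard error of the mean
t_stat = (np.mean(GPU1) - 0) / se        # t statistic for H0: mean = 0

# sf(x) is the upper-tail probability, evaluated without subtracting from 1
p_sf = 2 * stats.t.sf(abs(t_stat), df=n - 1)

# Reference values from the library routine
t_ref, p_ref = stats.ttest_1samp(GPU1, 0)

print('t statistic:', t_stat)
print('p-value via sf:', p_sf)
print('p-value via ttest_1samp:', p_ref)
```

With this approach the manual p-value should agree with ttest_1samp to many significant digits, and the conclusion (reject the null hypothesis) is unchanged.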


## Question 2

Given,

Null Hypothesis : There is no significant difference between data sets

Alternate Hypothesis : There is a significant difference

*Do two-sample testing and check whether to reject Null Hypothesis or not.*

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html

In [50]:
print ('Mean of GPU1:',np.mean(GPU1))
print ('Mean of GPU2:',np.mean(GPU2))

t_statistic,p_value = stats.ttest_ind(GPU1,GPU2)
print ('t_statistic:',t_statistic, ' P value: ',p_value)
print('\nSince the p-value is below the 5% significance level, we reject the null hypothesis.')
print('That means there is a significant difference between the mean run times of GPU1 and GPU2.')

Mean of GPU1: 10.333333333333334
Mean of GPU2: 11.466666666666667
t_statistic: -2.627629513471839  P value:  0.013794282041452725

Since the p-value is below the 5% significance level, we reject the null hypothesis.
That means there is a significant difference between the mean run times of GPU1 and GPU2.
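
By default, stats.ttest_ind pools the two sample variances, which assumes equal population variances. If that assumption is in doubt, Welch's t-test (equal_var=False) is a common alternative. The sketch below, which is an addition and not part of the original exercise, runs both on the same GPU1 and GPU2 data for comparison.

```python
import numpy as np
from scipy import stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
GPU2 = np.array([11, 13, 10, 13, 12, 9, 11, 12, 12, 11, 12, 12, 10, 11, 13])

# Pooled-variance t-test (the default, equal_var=True)
t_pooled, p_pooled = stats.ttest_ind(GPU1, GPU2)

# Welch's t-test: does not assume equal population variances
t_welch, p_welch = stats.ttest_ind(GPU1, GPU2, equal_var=False)

print('Pooled: t =', t_pooled, ' p =', p_pooled)
print('Welch:  t =', t_welch, ' p =', p_welch)
```

Because both samples have the same size, the two t statistics coincide; only the degrees of freedom, and therefore the p-value, can differ slightly.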


## Question 3

He is now trying a third GPU, GPU3.

In [51]:
GPU3 = np.array([9,10,9,11,10,13,12,9,12,12,13,12,13,10,11])

#Assumption: Both the datasets (GPU1 & GPU 3) are random, independent, parametric & normally distributed

*Do two-sample testing and check whether there is a significant difference between the speeds of the two GPUs, GPU1 and GPU3.*

#### Answer:

In [64]:
print ('Mean of GPU1:',np.mean(GPU1))
print ('Mean of GPU3:',np.mean(GPU3))
print('Here we select α = 0.05\n')

t_statistic,p_value = stats.ttest_ind(GPU1,GPU3)
print ('t_statistic:',t_statistic, ' P value: ',p_value)
print('\nThe p-value (about 14.5%) is greater than the 5% significance level, so we fail to reject the null hypothesis.')
print('That means there is no significant difference between the mean run times of GPU1 and GPU3.')

Mean of GPU1: 10.333333333333334
Mean of GPU3: 11.066666666666666
Here we select α = 0.05

t_statistic: -1.4988943759093303  P value:  0.14509210993138993

The p-value (about 14.5%) is greater than the 5% significance level, so we fail to reject the null hypothesis.
That means there is no significant difference between the mean run times of GPU1 and GPU3.
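
To make the two-sample result less of a black box, the pooled-variance t statistic can be computed by hand. This is a sketch under the same equal-variance assumption that stats.ttest_ind uses by default; the helper variable names are illustrative.

```python
import numpy as np
from scipy import stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
GPU3 = np.array([9, 10, 9, 11, 10, 13, 12, 9, 12, 12, 13, 12, 13, 10, 11])

n1, n3 = len(GPU1), len(GPU3)
df = n1 + n3 - 2                                   # degrees of freedom

# Pooled variance: weighted average of the two sample variances
sp2 = ((n1 - 1) * np.var(GPU1, ddof=1) + (n3 - 1) * np.var(GPU3, ddof=1)) / df

se = np.sqrt(sp2 * (1 / n1 + 1 / n3))              # standard error of the mean difference
t_manual = (np.mean(GPU1) - np.mean(GPU3)) / se    # t statistic
p_manual = 2 * stats.t.sf(abs(t_manual), df)       # two-tailed p-value

t_ref, p_ref = stats.ttest_ind(GPU1, GPU3)
print('manual:    t =', t_manual, ' p =', p_manual)
print('ttest_ind: t =', t_ref, ' p =', p_ref)
```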


## ANOVA

## Question 4 

If you need to compare more than two data sets at a time, an ANOVA is your best bet. 

*The results from three experiments with overlapping 95% confidence intervals are given below, and we want to confirm that the results for all three experiments are not significantly different.*

But before conducting ANOVA, test whether the equality-of-variances assumption is satisfied (using Levene's test). If it is not, state that we cannot depend on the result of ANOVA.

In [54]:
import numpy as np

e1 = np.array([1.595440,1.419730,0.000000,0.000000])
e2 = np.array([1.433800,2.079700,0.892139,2.384740])
e3 = np.array([0.036930,0.938018,0.995956,1.006970])

#Assumption: All the 3 datasets (e1,e2 & e3) are random, independent, parametric & normally distributed

Perform Levene's test on the data.

The Levene test tests the null hypothesis that all input samples are from populations with equal variances. Levene’s test is an alternative to Bartlett’s test in the case where there are significant deviations from normality.

source: scipy.org

#### Answer:

In [106]:
print('Null Hypothesis: All input samples are from populations with equal variances')
print('Alternate Hypothesis: Not all input samples are from populations with equal variances\n')
print('Here we select α = 0.05\n')

statistic, p_value = stats.levene(e1,e2,e3) #Note: the default center='median' is used here; pass center='mean' for the mean-centered version
print('statistic:', statistic, 'P_value:' , p_value )
print('\nThe p-value is greater than the 5% significance level, so we fail to reject the null hypothesis.')
print('That means there is no significant evidence that the population variances differ.')

Null Hypothesis: All input samples are from populations with equal variances
Alternate Hypothesis: Not all input samples are from populations with equal variances

Here we select α = 0.05

statistic: 2.6741725711150446 P_value: 0.12259792666001798

The p-value is greater than the 5% significance level, so we fail to reject the null hypothesis.
That means there is no significant evidence that the population variances differ.
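
stats.levene accepts a center argument that controls how deviations are measured. Its default is 'median' (often called the Brown-Forsythe variant), while 'mean' gives the original mean-centered Levene test. The sketch below, added for illustration, compares the two on the same e1, e2 and e3 data.

```python
import numpy as np
from scipy import stats

e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])

# Default call: deviations are taken from each group's median
stat_default, p_default = stats.levene(e1, e2, e3)

# The same test with the centering made explicit
stat_median, p_median = stats.levene(e1, e2, e3, center='median')

# Mean-centered variant, usually recommended for symmetric, moderate-tailed data
stat_mean, p_mean = stats.levene(e1, e2, e3, center='mean')

print('center=median: statistic =', stat_median, ' p =', p_median)
print('center=mean:   statistic =', stat_mean, ' p =', p_mean)
```

With only four observations per group, either variant has little power, so a non-significant result here is weak evidence of equal variances.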


## Question 5

The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes.

Use the stats.f_oneway() function to perform a one-way ANOVA test.

In [70]:
print('Null Hypothesis: All groups have the same population mean')
print('Alternate Hypothesis: At least one group has different mean\n')
print('Here we select α = 0.05\n')

statistic, p_value = stats.f_oneway(e1,e2,e3)
print('statistic:', statistic, 'P_value:' , p_value )
print('\nThe p-value is greater than the 5% significance level, so we fail to reject the null hypothesis.')
print('That means there is no significant evidence that the group means differ.')

Null Hypothesis: All groups have the same population mean
Alternate Hypothesis: At least one group has different mean

Here we select α = 0.05

statistic: 2.51357622845924 P_value: 0.13574644501798466

The p-value is greater than the 5% significance level, so we fail to reject the null hypothesis.
That means there is no significant evidence that the group means differ.
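
The F statistic reported by stats.f_oneway is the ratio of between-group to within-group mean squares. A minimal sketch of that calculation on the same three experiments, with illustrative variable names, is:

```python
import numpy as np
from scipy import stats

e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])

groups = [e1, e2, e3]
k = len(groups)                          # number of groups
N = sum(len(g) for g in groups)          # total number of observations
grand_mean = np.mean(np.concatenate(groups))

# Between-group sum of squares: spread of the group means around the grand mean
ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(np.sum((g - np.mean(g)) ** 2) for g in groups)

f_manual = (ss_between / (k - 1)) / (ss_within / (N - k))
p_manual = stats.f.sf(f_manual, k - 1, N - k)    # upper tail of the F distribution

f_ref, p_ref = stats.f_oneway(e1, e2, e3)
print('manual:   F =', f_manual, ' p =', p_manual)
print('f_oneway: F =', f_ref, ' p =', p_ref)
```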


## Question 6

*In one or two sentences, explain **Type I** and **Type II** errors.*

#### Answer:

Type I and Type II errors are the two ways a hypothesis test can reach the wrong conclusion. A Type I error (false positive) occurs when the null hypothesis H0 is rejected even though it is true, so the test reports a difference that does not exist. A Type II error (false negative) occurs when the null hypothesis is not rejected even though it is false, so a real difference goes undetected.
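
Both error rates can be estimated by simulation. The sketch below is illustrative only: the sample size, effect size, number of simulations and random seed are arbitrary assumptions, not values from this module. It repeats a one-sample t-test many times, first with the null hypothesis true and then with it false.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)           # fixed seed, chosen arbitrarily
alpha, n, n_sims = 0.05, 15, 2000        # assumed settings for the illustration

# Case 1: H0 is true (the population mean really is 0).
# Every rejection is a Type I error.
null_samples = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n))
p_null = stats.ttest_1samp(null_samples, 0.0, axis=1).pvalue
type1_rate = np.mean(p_null < alpha)

# Case 2: H0 is false (the population mean is 0.5).
# Every failure to reject is a Type II error.
alt_samples = rng.normal(loc=0.5, scale=1.0, size=(n_sims, n))
p_alt = stats.ttest_1samp(alt_samples, 0.0, axis=1).pvalue
type2_rate = np.mean(p_alt >= alpha)

print('Estimated Type I error rate:', type1_rate)
print('Estimated Type II error rate:', type2_rate)
```

The estimated Type I rate should land near the chosen α, while the Type II rate depends on the effect size and sample size.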

## Question 7 

You are the manager of a Chinese restaurant. You want to determine whether the waiting time to place an order has changed in the past month from its previous population mean value of 4.5 minutes. 
State the null and alternative hypothesis.

#### Answer:


The question asks whether the waiting time has changed, in either direction, from its previous population mean of 4.5 minutes, so a two-tailed test is appropriate. If the manager only cared about an increase in waiting time, a one-tailed alternative (mean waiting time > 4.5 minutes) would be used instead.

Null Hypothesis : mean waiting time = 4.5 minutes  
Alternate Hypothesis: mean waiting time != 4.5 minutes
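
How the choice of alternative hypothesis affects the p-value can be sketched with stats.ttest_1samp and its alternative parameter. The waiting times below are hypothetical values invented purely for illustration; they are not data from the restaurant.

```python
import numpy as np
from scipy import stats

# Hypothetical waiting times in minutes (made up for this illustration)
waits = np.array([4.2, 5.1, 4.8, 5.5, 4.9, 5.3, 4.4, 5.0, 4.7, 5.2])
mu0 = 4.5   # previous population mean from the question

# Two-tailed test: has the mean waiting time changed in either direction?
t_two, p_two = stats.ttest_1samp(waits, mu0)

# One-tailed test: has the mean waiting time increased?
t_gt, p_gt = stats.ttest_1samp(waits, mu0, alternative='greater')

print('two-sided: t =', t_two, ' p =', p_two)
print('greater:   t =', t_gt, ' p =', p_gt)
```

The t statistic is the same in both cases. When the sample mean lies on the side named by the one-tailed alternative, the one-tailed p-value is half the two-tailed one, which is why the direction should be fixed before looking at the data.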

## Chi square test

## Question 8

Let's create a small dataset for dice rolls of four players

In [71]:
import numpy as np

d1 = [5, 8, 3, 8]
d2 = [9, 6, 8, 5]
d3 = [8, 12, 7, 2]
d4 = [4, 16, 7, 3]
d5 = [3, 9, 6, 5]
d6 = [7, 2, 5, 7]

dice = np.array([d1, d2, d3, d4, d5, d6])

Run the test using the SciPy stats library.

Depending on the test, we are generally looking for a threshold at either 0.05 or 0.01. Our test is significant (i.e. we reject the null hypothesis) if we get a p-value below our threshold.

For our purposes, we’ll use 0.01 as the threshold.

Use the stats.chi2_contingency() function.

This function computes the chi-square statistic and p-value for the hypothesis test of independence of the observed frequencies in the contingency table.

Print the following:

- chi2 stat
- p-value
- degrees of freedom
- contingency (the table of expected frequencies)



In [78]:
chi2,P,DOF,Contingency = stats.chi2_contingency(dice)  # Contingency holds the expected frequencies
print('Chi2 stat: ' , chi2)
print('p-value: ', P)
print('Degrees of Freedom:', DOF)
print('Contingency: \n' ,Contingency)
print('\nThe p-value (about 0.078) is greater than the 0.01 significance level, so we fail to reject the null hypothesis.')

Chi2 stat:  23.315671914716496
p-value:  0.07766367301496693
Degrees of Freedom: 15
Contingency: 
 [[ 5.57419355  8.20645161  5.57419355  4.64516129]
 [ 6.50322581  9.57419355  6.50322581  5.41935484]
 [ 6.73548387  9.91612903  6.73548387  5.61290323]
 [ 6.96774194 10.25806452  6.96774194  5.80645161]
 [ 5.34193548  7.86451613  5.34193548  4.4516129 ]
 [ 4.87741935  7.18064516  4.87741935  4.06451613]]

The p-value (about 0.078) is greater than the 0.01 significance level, so we fail to reject the null hypothesis.
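
The expected-frequency table and the statistic can be reproduced by hand: under independence, each expected count is (row total × column total) / grand total. The following sketch, with illustrative variable names, checks that against stats.chi2_contingency for the same dice table.

```python
import numpy as np
from scipy import stats

dice = np.array([
    [5, 8, 3, 8],
    [9, 6, 8, 5],
    [8, 12, 7, 2],
    [4, 16, 7, 3],
    [3, 9, 6, 5],
    [7, 2, 5, 7],
])

row_totals = dice.sum(axis=1, keepdims=True)     # shape (6, 1)
col_totals = dice.sum(axis=0, keepdims=True)     # shape (1, 4)
expected = row_totals * col_totals / dice.sum()  # expected counts under independence

chi2_manual = ((dice - expected) ** 2 / expected).sum()
dof = (dice.shape[0] - 1) * (dice.shape[1] - 1)  # (rows - 1) * (columns - 1)
p_manual = stats.chi2.sf(chi2_manual, dof)       # upper-tail probability

chi2_ref, p_ref, dof_ref, expected_ref = stats.chi2_contingency(dice)
print('manual: chi2 =', chi2_manual, ' p =', p_manual, ' dof =', dof)
print('scipy:  chi2 =', chi2_ref, ' p =', p_ref, ' dof =', dof_ref)
```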


## Question 9

### Z-test

Get the z-scores of the above dice data using the stats.zscore function from scipy. Convert the z-score values to p-values and take the mean of each array.

In [105]:
Z_d1=stats.zscore(d1,ddof=1)
print('Zscore of d1:', Z_d1)
Z_d2=stats.zscore(d2,ddof=1)
print('Zscore of d2:', Z_d2)
Z_d3=stats.zscore(d3,ddof=1)
print('Zscore of d3:', Z_d3)
Z_d4=stats.zscore(d4,ddof=1)
print('Zscore of d4:', Z_d4)
Z_d5=stats.zscore(d5,ddof=1)
print('Zscore of d5:', Z_d5)
Z_d6=stats.zscore(d6,ddof=1)
print('Zscore of d6:', Z_d6)

print('\nAssuming a one-tailed (left-tailed) test\n')
print('Z-scores are standardized, so the reference distribution has Mu = 0 and Sigma = 1\n')

P_d1=stats.norm.cdf(Z_d1,loc=0,scale=1)
print('P_value of d1:', P_d1)
P_d2=stats.norm.cdf(Z_d2,loc=0,scale=1)
print('P_value of d2:', P_d2)
P_d3=stats.norm.cdf(Z_d3,loc=0,scale=1)
print('P_value of d3:', P_d3)
P_d4=stats.norm.cdf(Z_d4,loc=0,scale=1)
print('P_value of d4:', P_d4)
P_d5=stats.norm.cdf(Z_d5,loc=0,scale=1)
print('P_value of d5:', P_d5)
P_d6=stats.norm.cdf(Z_d6,loc=0,scale=1)
print('P_value of d6:', P_d6)

print('\nMean of p_values of d1: ', np.mean(P_d1))
print('Mean of p_values of d2: ', np.mean(P_d2))
print('Mean of p_values of d3: ', np.mean(P_d3))
print('Mean of p_values of d4: ', np.mean(P_d4))
print('Mean of p_values of d5: ', np.mean(P_d5))
print('Mean of p_values of d6: ', np.mean(P_d6))

Zscore of d1: [-0.40824829  0.81649658 -1.22474487  0.81649658]
Zscore of d2: [ 1.09544512 -0.54772256  0.54772256 -1.09544512]
Zscore of d3: [ 0.1823492   1.15487828 -0.06078307 -1.27644442]
Zscore of d4: [-0.59160798  1.43676223 -0.08451543 -0.76063883]
Zscore of d5: [-1.1  1.3  0.1 -0.3]
Zscore of d6: [ 0.7406129  -1.37542395 -0.10580184  0.7406129 ]

Assuming a one-tailed (left-tailed) test

Z-scores are standardized, so the reference distribution has Mu = 0 and Sigma = 1

P_value of d1: [0.3415457  0.79289191 0.11033568 0.79289191]
P_value of d2: [0.86333916 0.29194121 0.70805879 0.13666084]
P_value of d3: [0.57234566 0.87592986 0.47576599 0.10089923]
P_value of d4: [0.27705657 0.92460722 0.46632332 0.22343641]
P_value of d5: [0.13566606 0.90319952 0.53982784 0.38208858]
P_value of d6: [0.77053591 0.08450002 0.45786979 0.77053591]

Mean of p_values of d1:  0.5094163004680505
Mean of p_values of d2:  0.5
Mean of p_values of d3:  0.5062351845851735
Mean of p_values of d4:  0.4728558780897726
Me
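
A z-score is the distance of each value from the sample mean, measured in standard deviations. As a small sketch of what stats.zscore computes with ddof=1, the calculation for d5 can be written out directly; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

d5 = np.array([3, 9, 6, 5])

# z = (x - mean) / sample standard deviation (ddof=1, matching the call above)
z_manual = (d5 - d5.mean()) / d5.std(ddof=1)
z_scipy = stats.zscore(d5, ddof=1)

# Left-tailed p-values: area under the standard normal curve to the left of each z
p_left = stats.norm.cdf(z_manual)

print('manual z-scores:', z_manual)
print('scipy z-scores: ', z_scipy)
print('left-tailed p-values:', p_left)
```

For d5 the mean is 5.75 and the sample standard deviation is 2.5, which gives the z-scores -1.1, 1.3, 0.1 and -0.3 shown in the output above.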

## Question 10

A Paired sample t-test compares means from the same group at different times.

The basic two sample t-test is designed for testing differences between independent groups. 
In some cases, you might be interested in testing differences between samples of the same group at different points in time. 
We can conduct a paired t-test using the scipy function stats.ttest_rel(). 

In [67]:
before= stats.norm.rvs(scale=30, loc=100, size=500) ## Draws 500 samples from a normal distribution with a mean of 100 and std of 30
after = before + stats.norm.rvs(scale=5, loc=-1.25, size=500)

Test whether a weight-loss drug works by checking the weights of the same group of patients before and after treatment, using the above data.

In [89]:
print('Null Hypothesis: Mean weight is the same before and after the weight-loss drug treatment, i.e. Mu of the differences is zero')
print('Alternate Hypothesis: Mean weight is not the same before and after the weight-loss drug treatment, i.e. Mu of the differences is not zero\n')
print('Here we select α = 0.05\n')


Null Hypothesis: Mean weight is the same before and after the weight-loss drug treatment, i.e. Mu of the differences is zero
Alternate Hypothesis: Mean weight is not the same before and after the weight-loss drug treatment, i.e. Mu of the differences is not zero

Here we select α = 0.05



In [95]:
statistic, p_value = stats.ttest_rel(after,before)
print('statistic:', statistic, 'P_value:' , p_value )
print('\nThe p-value is below the 5% significance level, so we reject the null hypothesis.')
print('The mean weight differs before and after treatment, and the negative statistic indicates lower weights after treatment, which is consistent with the drug working.')

statistic: -6.0022637654511914 P_value: 3.746148844127565e-09

The p-value is below the 5% significance level, so we reject the null hypothesis.
The mean weight differs before and after treatment, and the negative statistic indicates lower weights after treatment, which is consistent with the drug working.
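
A paired t-test is equivalent to a one-sample t-test on the per-patient differences. The sketch below illustrates this with freshly simulated data. It uses a fixed random seed (an assumption added for reproducibility), so its numbers will not match the output above. It also shows a one-tailed version, which is the more direct test of "weights went down".

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)   # arbitrary seed, used only to make the run repeatable
before = stats.norm.rvs(scale=30, loc=100, size=500, random_state=rng)
after = before + stats.norm.rvs(scale=5, loc=-1.25, size=500, random_state=rng)

# Paired t-test on the two measurements of the same patients
t_rel, p_rel = stats.ttest_rel(after, before)

# Equivalent one-sample t-test on the per-patient differences
diff = after - before
t_one, p_one = stats.ttest_1samp(diff, 0)

# One-tailed version: is the mean weight after treatment lower than before?
t_less, p_less = stats.ttest_rel(after, before, alternative='less')

print('paired:     t =', t_rel, ' p =', p_rel)
print('one-sample: t =', t_one, ' p =', p_one)
print('one-tailed: t =', t_less, ' p =', p_less)
```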
