# Hypothesis Testing

The purpose of a hypothesis test is to determine whether an observed difference between two data sets is statistically significant.



## Overview

This module covers:

1) One sample and Two sample t-tests

2) ANOVA

3) Type I and Type II errors

4) Chi-Squared Tests

## Question 1 

*A student is trying to decide between two GPUs. He wants to use the GPU for his research to run deep learning algorithms, so the only thing he is concerned with is speed.*

*He picks a deep learning algorithm on a large data set and runs it on both GPUs 15 times, timing each run in hours. The results are given in the lists GPU1 and GPU2 below.*

In [52]:
from scipy import stats 
import numpy as np
from scipy.stats import ttest_1samp, ttest_ind,mannwhitneyu,levene,shapiro,wilcoxon
from statsmodels.stats.power import ttest_power
import pandas as pd
from matplotlib import pyplot as plt

In [53]:
GPU1 = np.array([11,9,10,11,10,12,9,11,12,9,11,12,9,10,9])
GPU2 = np.array([11,13,10,13,12,9,11,12,12,11,12,12,10,11,13])

#Assumption: Both the datasets (GPU1 & GPU 2) are random, independent, parametric & normally distributed

Hint: You can import the t-test functions (`ttest_1samp`, `ttest_ind`) from `scipy.stats` to perform t-tests.

**First T test**

*One sample t-test*

Check whether the mean of GPU1 is equal to zero.
- The null hypothesis is that the mean is equal to zero.
- The alternative hypothesis is that the mean is not equal to zero.

In [54]:
gpu1a = np.mean(GPU1)
print(gpu1a)
gpu2a = np.mean(GPU2)
print(gpu2a)

10.333333333333334
11.466666666666667


In [55]:
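# Note: popmean is GPU1's own sample mean here, so t = 0 and p = 1 by construction.
# To test the null hypothesis stated above (mean equal to zero), pass 0 as popmean.
t_statistic, p_value = ttest_1samp(GPU1,gpu1a)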

In [56]:
print(t_statistic, p_value)

0.0 1.0


In [57]:
if p_value < 0.05:    # alpha value is 0.05 or 5%
    print("we are rejecting null hypothesis")
else:
    print("we are failing to reject null hypothesis")

we are failing to reject null hypothesis
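
The call above compares GPU1 against its own sample mean, which always yields t = 0 and p = 1. A minimal sketch of the test against the stated null value of zero might look like the following; the names `popmean` and `alpha` are illustrative and not part of the original notebook.

```python
import numpy as np
from scipy.stats import ttest_1samp

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])

popmean = 0    # hypothesized population mean under the null hypothesis
alpha = 0.05   # significance level (assumed, matching the cells above)

t_statistic, p_value = ttest_1samp(GPU1, popmean)

# Every run takes roughly 10 hours, so a true mean of zero is implausible
# and the p-value is expected to fall far below alpha.
reject_null = p_value < alpha
print(t_statistic, p_value, reject_null)
```

Because all run times are far from zero, this version rejects the null hypothesis, which is the opposite of the conclusion printed above.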


## Question 2

Given,

Null Hypothesis : There is no significant difference between data sets

Alternate Hypothesis : There is a significant difference

*Do two-sample testing and check whether to reject Null Hypothesis or not.*

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html

In [58]:
t_statistic, p_value = ttest_ind(GPU1, GPU2)
print(t_statistic,p_value)

-2.627629513471839 0.013794282041452725


In [59]:
if p_value < 0.05:    # alpha value is 0.05 or 5%
    print("we are rejecting null hypothesis")
else:
    print("we are failing to reject null hypothesis")

we are rejecting null hypothesis
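
By default `ttest_ind` pools the two sample variances, which assumes the populations have equal variance. The sketch below is one way to check that assumption with Levene's test and to compare the result with Welch's t-test (`equal_var=False`); the variable names are illustrative.

```python
import numpy as np
from scipy.stats import levene, ttest_ind

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
GPU2 = np.array([11, 13, 10, 13, 12, 9, 11, 12, 12, 11, 12, 12, 10, 11, 13])

# Levene's test: the null hypothesis is that both samples have equal variances.
levene_stat, levene_p = levene(GPU1, GPU2)

# Student's t-test pools the variances; Welch's t-test does not.
t_student, p_student = ttest_ind(GPU1, GPU2)
t_welch, p_welch = ttest_ind(GPU1, GPU2, equal_var=False)

print("Levene :", levene_stat, levene_p)
print("Student:", t_student, p_student)
print("Welch  :", t_welch, p_welch)
```

With equal sample sizes, as here, the two t statistics coincide and only the degrees of freedom (and therefore the p-values) differ slightly.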


## Question 3

He is trying a third GPU - GPU3.

In [60]:
from scipy.stats import ttest_1samp, ttest_ind,mannwhitneyu,levene,shapiro,wilcoxon
from statsmodels.stats.power import ttest_power
GPU3 = np.array([9,10,9,11,10,13,12,9,12,12,13,12,13,10,11])

#Assumption: Both the datasets (GPU1 & GPU 3) are random, independent, parametric & normally distributed

*Do two-sample testing and check whether there is a significant difference between the speeds of the two GPUs, GPU1 and GPU3.*

#### Answer:

In [61]:
gpu3a = np.mean(GPU3)
print(gpu3a)

11.066666666666666


In [62]:
t_statistic, p_value = ttest_ind(GPU1, GPU3)
print(t_statistic,p_value)

-1.4988943759093303 0.14509210993138993


In [63]:
u, p_value = mannwhitneyu(GPU1,GPU3)
print(u, p_value)

79.0 0.08037248132236419


In [64]:
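# Paired comparison: one-sample t-test on the run-by-run differences (treats run i on GPU1 and run i on GPU3 as a pair)
t_statistic,p_value = ttest_1samp(GPU3-GPU1,0)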
print(t_statistic,p_value)

1.585355832526882 0.13520778142018045


In [65]:
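# Non-parametric paired alternative: Wilcoxon signed-rank test on the same differences (the returned statistic is a rank sum, not a z-score)
z_statistic,p_value = wilcoxon(GPU3-GPU1)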
print(z_statistic,p_value)

13.0 0.135545449694452
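
The default `alternative` of `mannwhitneyu` has changed between SciPy releases, so the p-value printed earlier depends on the installed version. A sketch that states the alternative explicitly, assuming a two-sided question ("is there any difference?"), could look like this:

```python
import numpy as np
from scipy.stats import mannwhitneyu

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
GPU3 = np.array([9, 10, 9, 11, 10, 13, 12, 9, 12, 12, 13, 12, 13, 10, 11])

alpha = 0.05  # significance level (assumed, matching the earlier cells)

# Requesting the two-sided alternative explicitly makes the result
# independent of the library's default for this argument.
u_statistic, p_two_sided = mannwhitneyu(GPU1, GPU3, alternative="two-sided")

print(u_statistic, p_two_sided)
print("reject null" if p_two_sided < alpha else "fail to reject null")
```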


## ANOVA

## Question 4 

If you need to compare more than two data sets at a time, an ANOVA is your best bet. 

*The results from three experiments with overlapping 95% confidence intervals are given below, and we want to confirm that the results for all three experiments are not significantly different.*

Before conducting ANOVA, check whether the equality-of-variances assumption is satisfied (using Levene's test). If it is not, state that the ANOVA result cannot be relied upon.

In [66]:
import numpy as np
from scipy.stats import ttest_1samp, ttest_ind,mannwhitneyu,levene,shapiro,wilcoxon
from statsmodels.stats.power import ttest_power
import pandas as pd
from matplotlib import pyplot as plt

e1 = np.array([1.595440,1.419730,0.000000,0.000000])
e2 = np.array([1.433800,2.079700,0.892139,2.384740])
e3 = np.array([0.036930,0.938018,0.995956,1.006970])

#Assumption: All the 3 datasets (e1,e2 & e3) are random, independent, parametric & normally distributed

Perform Levene's test on the data.

The Levene test tests the null hypothesis that all input samples are from populations with equal variances. Levene’s test is an alternative to Bartlett’s test (`bartlett`) in the case where there are significant deviations from normality.

source: scipy.org

#### Answer:

In [67]:
t_statistic, p_value = stats.levene(e1, e2,e3)
print(t_statistic,p_value)

2.6741725711150446 0.12259792666001798
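
The question asks for a statement on whether ANOVA can be relied upon. The following sketch turns the Levene p-value into that decision; the 0.05 threshold and the variable names are assumptions, not taken from the original notebook.

```python
import numpy as np
from scipy import stats

e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])

alpha = 0.05  # assumed significance level

# SciPy's levene uses center='median' by default.
w_statistic, p_value = stats.levene(e1, e2, e3)

if p_value < alpha:
    print("Variances differ: the ANOVA result should not be relied upon.")
else:
    print("No evidence of unequal variances: ANOVA can proceed.")
```

With only four observations per group the test has little power, so a non-significant result is weak evidence that the variances are truly equal.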


In [68]:
t_statistic, p_value = ttest_ind(e1, e2)
print(t_statistic,p_value)

-1.717241805824198 0.1367445156591996


In [69]:
u, p_value = mannwhitneyu(e1,e2)
print(u, p_value)

3.0 0.09563349343943439


In [70]:
t_statistic,p_value = ttest_1samp(e2-e1,0)
print(t_statistic,p_value)

1.7779303837799494 0.17347797110091911


In [71]:
z_statistic,p_value = wilcoxon(e2-e1)
print(z_statistic,p_value)

1.0 0.14412703481601533


In [72]:
levene(e1,e2)

LeveneResult(statistic=2.502047337898582, pvalue=0.16478397645307163)

In [73]:
t_statistic, p_value = ttest_ind(e2, e3)
print(t_statistic,p_value)

2.3307461823929563 0.05858088218526373


In [74]:
u, p_value = mannwhitneyu(e2,e3)
print(u, p_value)

3.0 0.09696542614120535


In [75]:
t_statistic,p_value = ttest_1samp(e3-e2,0)
print(t_statistic,p_value)

-2.6693631881468565 0.07573147423508503


In [76]:
z_statistic,p_value = wilcoxon(e3-e2)
print(z_statistic,p_value)

1.0 0.14412703481601533


In [77]:
levene(e2,e3)

LeveneResult(statistic=1.1400525071736345, pvalue=0.32670615872898084)

In [78]:
t_statistic, p_value = ttest_ind(e3, e1)
print(t_statistic,p_value)

-0.018778416184578622 0.9856267549440176


In [79]:
u, p_value = mannwhitneyu(e3,e1)
print(u, p_value)

8.0 0.44227471942648117


In [80]:
t_statistic,p_value = ttest_1samp(e3-e1,0)
print(t_statistic,p_value)

-0.014951819285598852 0.9890093859816056


In [81]:
z_statistic,p_value = wilcoxon(e3-e1)
print(z_statistic,p_value)

5.0 1.0


In [82]:
levene(e3,e1)

LeveneResult(statistic=4.721691050668699, pvalue=0.07276576314239869)

## Question 5

The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes.

Use the `stats.f_oneway()` function to perform the one-way ANOVA test.

In [86]:
e4 = stats.f_oneway(e1,e2,e3)
e4

F_onewayResult(statistic=2.51357622845924, pvalue=0.13574644501798466)
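
To make the F statistic less of a black box, the sketch below recomputes it from the between-group and within-group sums of squares and compares it with `stats.f_oneway`. The helper names (`ss_between`, `ss_within`, and so on) are illustrative.

```python
import numpy as np
from scipy import stats

e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])

groups = [e1, e2, e3]
k = len(groups)                         # number of groups
n_total = sum(len(g) for g in groups)   # total number of observations
grand_mean = np.concatenate(groups).mean()

# Variation of the group means around the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Variation of the observations around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

df_between = k - 1
df_within = n_total - k
f_manual = (ss_between / df_between) / (ss_within / df_within)
p_manual = stats.f.sf(f_manual, df_between, df_within)

f_scipy, p_scipy = stats.f_oneway(e1, e2, e3)
print(f_manual, p_manual)
print(f_scipy, p_scipy)
```

Since the p-value is above 0.05, the null hypothesis of equal population means is not rejected for these three experiments.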

## Question 6

*In one or two sentences explain about **TypeI** and **TypeII** errors.*

#### Answer:

Type I error, also known as a “false positive”: the error of rejecting a null hypothesis when it is actually true. In other words, this is the error of accepting an alternative hypothesis (the real hypothesis of interest) when the results can be attributed to chance. Plainly speaking, it occurs when we are observing a difference when in truth there is none. So the probability of making a Type I error in a test with rejection region $R$ is $P(R \mid H_0 \text{ is true})$.

Type II error, also known as a "false negative": the error of not rejecting a null hypothesis when the alternative hypothesis is the true state of nature. In other words, this is the error of failing to accept an alternative hypothesis when you don't have adequate power. Plainly speaking, it occurs when we are failing to observe a difference when in truth there is one. So the probability of making a Type II error in a test with rejection region $R$ is $1 - P(R \mid H_a \text{ is true})$. The power of the test is $P(R \mid H_a \text{ is true})$.
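
Both error rates can be estimated by simulation. The sketch below is an illustration under assumed settings (standard normal data, 15 observations per sample, a true mean shift of 1.0 under the alternative, a fixed random seed); none of these values come from the original notebook.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # fixed seed so the run is repeatable
alpha = 0.05
n_sims, n_obs = 2000, 15

# Case 1: the null hypothesis (mean = 0) is true.
# Every rejection here is a Type I error.
null_samples = rng.normal(loc=0.0, scale=1.0, size=(n_sims, n_obs))
_, p_null = stats.ttest_1samp(null_samples, 0.0, axis=1)
type1_rate = np.mean(p_null < alpha)

# Case 2: the null hypothesis is false (true mean = 1.0).
# Every failure to reject here is a Type II error.
alt_samples = rng.normal(loc=1.0, scale=1.0, size=(n_sims, n_obs))
_, p_alt = stats.ttest_1samp(alt_samples, 0.0, axis=1)
power = np.mean(p_alt < alpha)
type2_rate = 1.0 - power

print("Type I error rate :", type1_rate)
print("Power             :", power)
print("Type II error rate:", type2_rate)
```

The estimated Type I error rate should land close to `alpha`, while the Type II error rate depends on the effect size and the sample size.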

## Question 7 

You are the manager of a Chinese restaurant. You want to determine whether the waiting time to place an order has changed in the past month from its previous population mean value of 4.5 minutes.
State the null and alternative hypotheses.

#### Answer:


Step 1: The null hypothesis is that the population mean has not changed from its previous value of 4.5 minutes: $H_0: \mu = 4.5$. The alternative hypothesis is the opposite of the null hypothesis, namely that the population mean is not 4.5 minutes: $H_1: \mu \neq 4.5$.

Step 2 : You have selected a sample of n =25. The level of significance is .05 (that is α = .05)

Step 3 : Because σ is known, you use the normal distribution and the Z test statistic.

Step 4: Because α = 0.05, the critical values of the Z statistic are -1.96 and +1.96. The rejection region is Z < -1.96 or Z > +1.96.

Step 5: You collect the sample data and compute $\bar{X} = 5.1$. Compute the test statistic using the equation $Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \dfrac{5.1 - 4.5}{1.2 / \sqrt{25}} = 2.50$.

Step 6 : Because Z = 2.50 > 1.96, you reject the null hypothesis and conclude that there is evidence that the population mean waiting time to place an order has changed from its previous value of 4.5 minutes. The mean waiting time for customers is longer now than it was last month
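
The six steps above can be reproduced numerically. This is a minimal sketch that takes σ = 1.2 from the Step 5 calculation; the variable names are illustrative.

```python
import math
from scipy import stats

mu0 = 4.5      # population mean under the null hypothesis (minutes)
x_bar = 5.1    # observed sample mean (minutes)
sigma = 1.2    # known population standard deviation, as used in Step 5
n = 25         # sample size
alpha = 0.05   # significance level

# Z statistic for a one-sample test with known sigma.
z = (x_bar - mu0) / (sigma / math.sqrt(n))

# Two-sided critical value and p-value from the standard normal distribution.
z_crit = stats.norm.ppf(1 - alpha / 2)
p_value = 2 * stats.norm.sf(abs(z))

reject_null = abs(z) > z_crit
print(z, z_crit, p_value, reject_null)
```

Comparing `abs(z)` with the critical value and comparing the p-value with `alpha` are equivalent decision rules for this test.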


## Chi square test

## Question 8

Let's create a small dataset for dice rolls of four players

In [83]:
import numpy as np
from scipy.stats import chisquare,chi2_contingency

d1 = [5, 8, 3, 8]
d2 = [9, 6, 8, 5]
d3 = [8, 12, 7, 2]
d4 = [4, 16, 7, 3]
d5 = [3, 9, 6, 5]
d6 = [7, 2, 5, 7]

dice = np.array([d1, d2, d3, d4, d5, d6])

Run the test using the SciPy stats library.

Depending on the test, we are generally looking for a threshold at either 0.05 or 0.01. Our test is significant (i.e. we reject the null hypothesis) if we get a p-value below our threshold.

For our purposes, we’ll use 0.01 as the threshold.

Use the `stats.chi2_contingency()` function.

This function computes the chi-square statistic and p-value for the hypothesis test of independence of the observed frequencies in the contingency table.

Print the following:

- chi2 stat
- p-value
- degree of freedom
- contingency (the table of expected frequencies)



In [87]:
chi_sq_stat, p_value, deg_freedom, exp_freq = stats.chi2_contingency(dice)

In [94]:
print ('Chi Square Test-',chi_sq_stat, 'P-Value', p_value, 'degree of freedom',deg_freedom, 
       'contingency -', exp_freq)

Chi Square Test- 23.315671914716496 P-Value 0.07766367301496693 degree of freedom 15 contingency - [[ 5.57419355  8.20645161  5.57419355  4.64516129]
 [ 6.50322581  9.57419355  6.50322581  5.41935484]
 [ 6.73548387  9.91612903  6.73548387  5.61290323]
 [ 6.96774194 10.25806452  6.96774194  5.80645161]
 [ 5.34193548  7.86451613  5.34193548  4.4516129 ]
 [ 4.87741935  7.18064516  4.87741935  4.06451613]]
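
The "contingency" array printed above is the table of expected frequencies under independence. As a rough sketch, it can be rebuilt from the row and column totals, and the statistic can then be compared with the 0.01 threshold chosen earlier; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

dice = np.array([
    [5, 8, 3, 8],
    [9, 6, 8, 5],
    [8, 12, 7, 2],
    [4, 16, 7, 3],
    [3, 9, 6, 5],
    [7, 2, 5, 7],
])

alpha = 0.01  # threshold stated in the text above

# Expected count for each cell = row total * column total / grand total.
row_totals = dice.sum(axis=1, keepdims=True)
col_totals = dice.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / dice.sum()

chi2_manual = ((dice - expected) ** 2 / expected).sum()
dof = (dice.shape[0] - 1) * (dice.shape[1] - 1)
p_manual = stats.chi2.sf(chi2_manual, dof)

chi2_scipy, p_scipy, dof_scipy, expected_scipy = stats.chi2_contingency(dice)

print(chi2_manual, p_manual, dof)
print("reject null" if p_manual < alpha else "fail to reject null")
```

At the 0.01 threshold the p-value of about 0.078 is not significant, so the hypothesis of independence is not rejected.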


## Question 9

### Z-test

Get the z-scores of the above dice data using the `stats.zscore` function from SciPy. Convert the z-score values to p-values and take the mean of the array.

In [95]:
print ("\nZ-score for dice : \n", stats.zscore(dice, axis = 0))  # column-wise z-scores
print ("\nZ-score for dice : \n", stats.zscore(dice, axis = 1))  # row-wise z-scores


Z-score for dice : 
 [[-0.46291005 -0.18884739 -1.83711731  1.44115338]
 [ 1.38873015 -0.64208114  1.22474487  0.        ]
 [ 0.9258201   0.7176201   0.61237244 -1.44115338]
 [-0.9258201   1.62408759  0.61237244 -0.96076892]
 [-1.38873015  0.03776948  0.          0.        ]
 [ 0.46291005 -1.54854863 -0.61237244  0.96076892]]

Z-score for dice : 
 [[-0.47140452  0.94280904 -1.41421356  0.94280904]
 [ 1.26491106 -0.63245553  0.63245553 -1.26491106]
 [ 0.21055872  1.33353857 -0.07018624 -1.47391105]
 [-0.68313005  1.65903012 -0.09759001 -0.87831007]
 [-1.27017059  1.5011107   0.11547005 -0.34641016]
 [ 0.85518611 -1.58820278 -0.12216944  0.85518611]]


In [99]:
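# stats.zscore defaults to axis=0, so these are the column-wise z-scores
dice1 = stats.zscore(dice)
# norm.sf gives the upper-tail probability, i.e. one-sided p-values
p_values = stats.norm.sf(abs(dice1))
p_values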

array([[0.32171442, 0.42510621, 0.03309629, 0.07477068],
       [0.08245741, 0.26041025, 0.11033568, 0.5       ],
       [0.17726974, 0.23649578, 0.27014569, 0.07477068],
       [0.17726974, 0.05217856, 0.27014569, 0.16833418],
       [0.08245741, 0.48493574, 0.5       , 0.5       ],
       [0.32171442, 0.06074513, 0.27014569, 0.16833418]])
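
The question also asks for the mean of the p-value array, which the cell above does not compute. A minimal sketch, assuming column-wise z-scores as in that cell, is shown here; the two-sided variant is included only for comparison.

```python
import numpy as np
from scipy import stats

dice = np.array([
    [5, 8, 3, 8],
    [9, 6, 8, 5],
    [8, 12, 7, 2],
    [4, 16, 7, 3],
    [3, 9, 6, 5],
    [7, 2, 5, 7],
])

# Column-wise z-scores (axis=0 is the default of stats.zscore).
z = stats.zscore(dice, axis=0)

# One-sided p-values: upper-tail probability of |z| under a standard normal.
p_one_sided = stats.norm.sf(np.abs(z))
# Two-sided p-values double the one-sided tail.
p_two_sided = 2 * p_one_sided

mean_p_one_sided = p_one_sided.mean()
mean_p_two_sided = p_two_sided.mean()
print(mean_p_one_sided, mean_p_two_sided)
```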

## Question 10

A paired-sample t-test compares means from the same group at different times.

The basic two sample t-test is designed for testing differences between independent groups. 
In some cases, you might be interested in testing differences between samples of the same group at different points in time. 
We can conduct a paired t-test using the scipy function stats.ttest_rel(). 

In [108]:
import numpy as np
from scipy.stats import ttest_1samp, ttest_ind,mannwhitneyu,levene,shapiro,wilcoxon
from statsmodels.stats.power import ttest_power
import pandas as pd
from matplotlib import pyplot as plt

before= stats.norm.rvs(scale=30, loc=100, size=500) ## Draws 500 samples from a normal distribution with a mean of 100 and a standard deviation of 30
after = before + stats.norm.rvs(scale=5, loc=-1.25, size=500)

Test whether a weight-loss drug works by checking the weights of the same group of patients before and after treatment, using the data above.

In [109]:
t_statistic,p_value = stats.ttest_rel(before,after)
print(t_statistic,p_value)

7.002424986574714 8.166494531475022e-12
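
A paired t-test is equivalent to a one-sample t-test on the per-patient differences. The sketch below illustrates this with a seeded generator so the run is repeatable; the seed and the use of `np.random.default_rng` are assumptions, so the numbers will not match the unseeded output above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # assumed seed, for repeatability only

# Same distributions as in the cell above: baseline weights, then a small
# average loss of 1.25 with individual noise.
before = rng.normal(loc=100, scale=30, size=500)
after = before + rng.normal(loc=-1.25, scale=5, size=500)

# Paired t-test on the two measurements of the same patients.
t_rel, p_rel = stats.ttest_rel(before, after)

# Equivalent one-sample t-test on the differences against zero.
t_one, p_one = stats.ttest_1samp(before - after, 0)

print(t_rel, p_rel)
print(t_one, p_one)
```

A positive t statistic here means the average weight before treatment is higher than after, which is consistent with the drug having an effect.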
