# Hypothesis Testing

The purpose of the test is to tell if there is any significant difference between two data sets.



## Overview

This module covers:

1) One sample and Two sample t-tests

2) ANOVA

3) Type I and Type II errors

4) Chi-Squared Tests

## Question 1 

*A student is trying to decide between two GPUs. He wants to use the GPU for his research to run deep learning algorithms, so the only thing he is concerned with is speed.*

*He picks a deep learning algorithm on a large data set and runs it on both GPUs 15 times, timing each run in hours. The results are given in the lists GPU1 and GPU2 below.*

In [1]:
import scipy.stats as stats 
import numpy as np

In [2]:
GPU1 = np.array([11,9,10,11,10,12,9,11,12,9,11,12,9,10,9])
GPU2 = np.array([11,13,10,13,12,9,11,12,12,11,12,12,10,11,13])

#Assumption: Both the datasets (GPU1 & GPU 2) are random, independent, parametric & normally distributed

Hint: The `scipy.stats` module provides t-test functions such as `ttest_1samp` and `ttest_ind`.

**First T test**

*One sample t-test*

Check if the mean of the GPU1 is equal to zero.
- Null Hypothesis is that mean is equal to zero.
- Alternate hypothesis is that it is not equal to zero.

### Answer:
* $H_0$: $\mu = 0$ (the null hypothesis is that the mean is equal to zero)
* $H_A$: $\mu \neq 0$ (the alternate hypothesis is that the mean is not equal to zero)
* $\alpha$ = 0.05 and sample size (n) = 15

In [3]:
xbar = np.mean(GPU1)
print("Sample mean: {}".format(xbar))

std = np.std(GPU1,ddof=1)
print("Standard deviation: {}".format(std))

Sample mean: 10.333333333333334
Standard deviation: 1.1751393027860062


In [4]:
mu=0
n=15
se = std/np.sqrt(n)
print("Standard Error: {}".format(se))

Standard Error: 0.3034196632775998


In [5]:
print("Critical Values:")
print(stats.t.isf(0.025, df=n-1, loc=mu, scale=se))
print(stats.t.isf(0.975, df=n-1, loc=mu, scale=se))

Critical Values:
0.6507704546500327
-0.6507704546500326


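In [6]:
# Two-sided p-value: double the upper-tail probability of the observed distance from mu
print("P-Value:")
print("{:.0e}".format(2*stats.t.sf(mu+abs(xbar-mu),df=n-1,loc=mu,scale=se)))

P-Value:
7e-15


### Here the p-value is about 7e-15, which is far below the 5% significance level. The sample mean of 10.33 also lies well outside the critical values of ±0.65. Hence we reject the null hypothesis: the GPU1 mean is not equal to zero.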
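
As a cross-check, the same one-sample test can be run in a single call with `stats.ttest_1samp`. The sketch below recomputes the t statistic by hand and compares it with SciPy's result; names such as `t_manual` are illustrative and not part of the original notebook. The two-sided p-value is taken from the upper tail of |t|, so it always lies between 0 and 1.

```python
import numpy as np
import scipy.stats as stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])

# Manual t statistic: (sample mean - hypothesised mean) / standard error
n = len(GPU1)
t_manual = (GPU1.mean() - 0) / (GPU1.std(ddof=1) / np.sqrt(n))

# Two-sided p-value from the upper tail of |t|
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# The same test in one call
t_scipy, p_scipy = stats.ttest_1samp(GPU1, 0)

print("t (manual): {:.4f}, t (scipy): {:.4f}".format(t_manual, t_scipy))
print("Reject H0 at alpha = 0.05:", p_scipy < 0.05)
```

A t statistic this far from zero means the sample mean is many standard errors away from the hypothesised value, so the null hypothesis is rejected.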

## ------------------------------------------------Question1 Complete------------------------------------------------

## Question 2

Given,

Null Hypothesis : There is no significant difference between data sets

Alternate Hypothesis : There is a significant difference

*Do two-sample testing and check whether to reject Null Hypothesis or not.*

https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html

### Answer:

In [7]:
xbar_gpu1 = np.mean(GPU1)
print("Sample mean GPU1: {}".format(xbar_gpu1))

xbar_gpu2 = np.mean(GPU2)
print("Sample mean GPU2: {}".format(xbar_gpu2))

Sample mean GPU1: 10.333333333333334
Sample mean GPU2: 11.466666666666667


In [8]:
std_gpu1 = np.std(GPU1,ddof=1)
print("Standard deviation GPU1: {}".format(std_gpu1))

std_gpu2 = np.std(GPU2,ddof=1)
print("Standard deviation GPU2: {}".format(std_gpu2))

Standard deviation GPU1: 1.1751393027860062
Standard deviation GPU2: 1.1872336794093274


In [9]:
#Calculate the p_value and test statistic

t_statistic, p_value  =  stats.ttest_ind(GPU1,GPU2)
print('P Value %1.3f' % p_value) 

P Value 0.014


#### The p-value is 0.014, which is less than the 5% level of significance. So the statistical decision is to reject the null hypothesis in favour of the alternate hypothesis at the 5% level of significance. Hence there is a significant difference between the run times of GPU1 and GPU2.
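
To make the calculation behind `stats.ttest_ind` visible, here is a minimal sketch of the pooled-variance t statistic computed by hand. It also runs Welch's variant (`equal_var=False`), which drops the equal-variance assumption. The names `sp2`, `t_manual` and `p_welch` are illustrative.

```python
import numpy as np
import scipy.stats as stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
GPU2 = np.array([11, 13, 10, 13, 12, 9, 11, 12, 12, 11, 12, 12, 10, 11, 13])

n1, n2 = len(GPU1), len(GPU2)

# Pooled variance: weighted average of the two sample variances
sp2 = ((n1 - 1) * GPU1.var(ddof=1) + (n2 - 1) * GPU2.var(ddof=1)) / (n1 + n2 - 2)

# t statistic and two-sided p-value with n1 + n2 - 2 degrees of freedom
t_manual = (GPU1.mean() - GPU2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n1 + n2 - 2)

# SciPy: pooled (default) and Welch (unequal variances) versions
t_pooled, p_pooled = stats.ttest_ind(GPU1, GPU2)
t_welch, p_welch = stats.ttest_ind(GPU1, GPU2, equal_var=False)

print("manual: t = {:.3f}, p = {:.3f}".format(t_manual, p_manual))
print("pooled: t = {:.3f}, p = {:.3f}".format(t_pooled, p_pooled))
print("Welch:  t = {:.3f}, p = {:.3f}".format(t_welch, p_welch))
```

Because both samples have the same size and nearly identical standard deviations, the pooled and Welch tests should lead to the same decision here.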

## ------------------------------------------------Question2 Complete------------------------------------------------

## Question 3

He is trying a third GPU - GPU3.

In [10]:
GPU3 = np.array([9,10,9,11,10,13,12,9,12,12,13,12,13,10,11])

#Assumption: Both the datasets (GPU1 & GPU 3) are random, independent, parametric & normally distributed

*Do two-sample testing and check whether there is a significant difference between the speeds of the two GPUs, GPU1 and GPU3.*

### Answer:

In [11]:
GPU_Diff = np.array(GPU1 - GPU3)
print(GPU_Diff)

[ 2 -1  1  0  0 -1 -3  2  0 -3 -2  0 -4  0 -2]


### Null hypothesis is that the mean of the differences in run time (GPU1 - GPU3) is equal to zero
* $H_0$: $\mu = 0$

### Alternate hypothesis is that the mean of the differences in run time is not equal to zero
* $H_A$: $\mu \neq 0$
* $\alpha$ = 0.05
* Sample size (n) = 15

In [12]:
xbar_Diff = np.mean(GPU_Diff)
print("Sample mean: {}".format(xbar_Diff))

std_Diff = np.std(GPU_Diff,ddof=1)
print("Standard deviation: {}".format(std_Diff))

Sample mean: -0.7333333333333333
Standard deviation: 1.7915143899851347


In [13]:
t_statistic, p_value = stats.ttest_1samp(GPU_Diff,0)
print(p_value)

0.13520778142018045


#### The p-value is 0.135, which is more than the 5% level of significance. There is insufficient evidence to reject the null hypothesis. Hence there is no significant difference between the speeds of the two GPUs, GPU1 and GPU3.
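
Running a one-sample test on the element-wise differences is the same as a paired t-test (`stats.ttest_rel`). Since the stated assumption is that the two data sets are independent, the independent two-sample test (`stats.ttest_ind`) is arguably the closer match to the question. The sketch below runs both so the decisions can be compared; it is an illustration, not the notebook's original method.

```python
import numpy as np
import scipy.stats as stats

GPU1 = np.array([11, 9, 10, 11, 10, 12, 9, 11, 12, 9, 11, 12, 9, 10, 9])
GPU3 = np.array([9, 10, 9, 11, 10, 13, 12, 9, 12, 12, 13, 12, 13, 10, 11])

# Paired test and one-sample test on the differences give identical results
t_rel, p_rel = stats.ttest_rel(GPU1, GPU3)
t_one, p_one = stats.ttest_1samp(GPU1 - GPU3, 0)

# Independent two-sample test (pooled variance), matching the independence assumption
t_ind, p_ind = stats.ttest_ind(GPU1, GPU3)

print("paired:      p = {:.3f}".format(p_rel))
print("differences: p = {:.3f}".format(p_one))
print("independent: p = {:.3f}".format(p_ind))
print("Reject H0 at alpha = 0.05 (independent):", p_ind < 0.05)
```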

## ------------------------------------------------Question3 Complete------------------------------------------------

## ANOVA

## Question 4 

If you need to compare more than two data sets at a time, an ANOVA is your best bet. 

*The results from three experiments with overlapping 95% confidence intervals are given below, and we want to confirm that the results for all three experiments are not significantly different.*

Before conducting ANOVA, check whether the assumption of equal variances is satisfied (using Levene's test). If it is not, state that we cannot depend on the result of ANOVA.

In [14]:
import numpy as np

e1 = np.array([1.595440,1.419730,0.000000,0.000000])
e2 = np.array([1.433800,2.079700,0.892139,2.384740])
e3 = np.array([0.036930,0.938018,0.995956,1.006970])

#Assumption: All the 3 datasets (e1,e2 & e3) are random, independent, parametric & normally distributed

Perform Levene's test on the data.

The Levene test tests the null hypothesis that all input samples are from populations with equal variances. Levene’s test is an alternative to Bartlett’s test (`bartlett`) in the case where there are significant deviations from normality.

source: scipy.org

### Answer:

### Null hypothesis is that variances are equal across all samples.
* $H_0$: $\sigma_1^2 = \sigma_2^2 = \sigma_3^2$

### Alternate hypothesis is that variances are not equal for at least one pair
* $H_A$: $\sigma_i^2 \neq \sigma_j^2$ for at least one pair $(i, j)$

### Significance Level
* $\alpha$ = 0.05

In [15]:
#Levene test on the data
statistic, p_value = stats.levene(e1,e2,e3)
print(p_value)

0.12259792666001798


#### The p-value is 0.123, which is more than the 5% level of significance. There is insufficient evidence to reject the null hypothesis, so we proceed on the assumption that the variances are equal across all samples and the ANOVA result can be relied on.
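
For intuition about what Levene's test measures: with SciPy's default `center='median'`, the Levene statistic is the one-way ANOVA F statistic computed on the absolute deviations of each observation from its group median. The sketch below checks that equivalence on the same data; `abs_dev` is an illustrative name.

```python
import numpy as np
import scipy.stats as stats

e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])

# Absolute deviation of every observation from its own group median
abs_dev = [np.abs(s - np.median(s)) for s in (e1, e2, e3)]

# One-way ANOVA on the deviations
f_stat, p_anova = stats.f_oneway(*abs_dev)

# Levene's test (center='median' is the default)
w_stat, p_levene = stats.levene(e1, e2, e3)

print("ANOVA on deviations: F = {:.4f}, p = {:.4f}".format(f_stat, p_anova))
print("Levene:              W = {:.4f}, p = {:.4f}".format(w_stat, p_levene))
```

Groups with larger spread produce larger absolute deviations, so a difference in variances shows up as a difference in the means of these deviations.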

## ------------------------------------------------Question4 Complete------------------------------------------------

## Question 5

The one-way ANOVA tests the null hypothesis that two or more groups have the same population mean. The test is applied to samples from two or more groups, possibly with differing sizes.

Use the stats.f_oneway() function to perform a one-way ANOVA test.

### Answer:

In [16]:
f_statistic, p_value = stats.f_oneway(e1,e2,e3)
print(p_value)

0.13574644501798466


#### The p-value is 0.136, which is more than the 5% level of significance. There is insufficient evidence to reject the null hypothesis. Hence the means of the three experiments are not significantly different.
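
The F statistic returned by `stats.f_oneway` is the ratio of between-group to within-group mean squares. A minimal sketch of that calculation, using illustrative names (`ss_between`, `ss_within`), is:

```python
import numpy as np
import scipy.stats as stats

e1 = np.array([1.595440, 1.419730, 0.000000, 0.000000])
e2 = np.array([1.433800, 2.079700, 0.892139, 2.384740])
e3 = np.array([0.036930, 0.938018, 0.995956, 1.006970])

groups = [e1, e2, e3]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k, N = len(groups), all_values.size

# Between-group sum of squares: group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F = mean square between / mean square within, with (k-1, N-k) degrees of freedom
f_manual = (ss_between / (k - 1)) / (ss_within / (N - k))
p_manual = stats.f.sf(f_manual, k - 1, N - k)

f_scipy, p_scipy = stats.f_oneway(e1, e2, e3)
print("manual: F = {:.4f}, p = {:.4f}".format(f_manual, p_manual))
print("scipy:  F = {:.4f}, p = {:.4f}".format(f_scipy, p_scipy))
```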

## ------------------------------------------------Question5 Complete------------------------------------------------

## Question 6

*In one or two sentences, explain **Type I** and **Type II** errors.*

### Answer:

### Type I: 
#### A Type I error is rejecting the null hypothesis when it is actually TRUE. Its conditional probability, P(reject NULL | NULL is TRUE), is the significance level $\alpha$. A Type I error is also referred to as a false positive (a result that indicates that a given condition is present when it actually is not present).

### Type II:
#### A Type II error occurs when the null hypothesis is FALSE but we erroneously fail to reject it. Its probability is denoted $\beta$. A Type II error is often called a false negative in a test checking for a single condition with a definitive result of true or false. A Type II error is committed when a true alternative hypothesis is not believed.
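
Both error rates can be estimated by simulation. The sketch below is an illustration under assumed settings that do not come from the notebook: normally distributed data with standard deviation 1, sample size 15, a true mean of 0.5 when the null is false, and an arbitrary seed.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
alpha, n, n_sim = 0.05, 15, 5000  # assumed settings for this illustration

# Case 1: H0 is true (population mean really is 0).
# Every rejection is a Type I error, so the rejection rate should be close to alpha.
null_samples = rng.normal(loc=0.0, scale=1.0, size=(n_sim, n))
p_null = stats.ttest_1samp(null_samples, 0, axis=1).pvalue
type1_rate = np.mean(p_null < alpha)

# Case 2: H0 is false (population mean is 0.5, an assumed effect size).
# Every failure to reject is a Type II error.
alt_samples = rng.normal(loc=0.5, scale=1.0, size=(n_sim, n))
p_alt = stats.ttest_1samp(alt_samples, 0, axis=1).pvalue
type2_rate = np.mean(p_alt >= alpha)

print("Estimated Type I error rate:  {:.3f}".format(type1_rate))
print("Estimated Type II error rate: {:.3f}".format(type2_rate))
print("Estimated power (1 - beta):   {:.3f}".format(1 - type2_rate))
```

Lowering `alpha` reduces the Type I rate but raises the Type II rate for a fixed sample size; increasing `n` lowers the Type II rate.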

## ------------------------------------------------Question6 Complete------------------------------------------------

## Question 7 

You are the manager of a Chinese restaurant. You want to determine whether the waiting time to place an order has changed in the past month from its previous population mean value of 4.5 minutes. 
State the null and alternative hypotheses.

### Answer:


### Null hypothesis is that the mean waiting time to place an order is equal to 4.5 minutes (unchanged from the previous population mean)
* $H_0$: $\mu_w = 4.5$

### Alternative hypothesis is that the mean waiting time to place an order is not equal to 4.5 minutes (changed from the previous population mean)
* $H_A$: $\mu_w \neq 4.5$
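
Once waiting times have been collected, these hypotheses could be tested with a two-sided one-sample t-test against 4.5. The sketch below uses HYPOTHETICAL simulated waiting times, since the question provides no data; the sample size, distribution and seed are assumptions made purely for illustration.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(1)  # arbitrary seed

# HYPOTHETICAL data: 30 simulated waiting times in minutes (not from the question)
waiting_times = rng.normal(loc=5.0, scale=1.0, size=30)

# Two-sided one-sample t-test of H0: mu_w = 4.5
t_stat, p_value = stats.ttest_1samp(waiting_times, popmean=4.5)

alpha = 0.05
print("t = {:.3f}, p = {:.4f}".format(t_stat, p_value))
if p_value < alpha:
    print("Reject H0: the mean waiting time appears to have changed.")
else:
    print("Fail to reject H0: no evidence that the mean waiting time changed.")
```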

## ------------------------------------------------Question7 Complete------------------------------------------------

## Chi square test

## Question 8

Let's create a small dataset for dice rolls of four players

In [17]:
import numpy as np

d1 = [5, 8, 3, 8]
d2 = [9, 6, 8, 5]
d3 = [8, 12, 7, 2]
d4 = [4, 16, 7, 3]
d5 = [3, 9, 6, 5]
d6 = [7, 2, 5, 7]

dice = np.array([d1, d2, d3, d4, d5, d6])

Run the test using the SciPy stats library.

Depending on the test, we are generally looking for a threshold at either 0.05 or 0.01. Our test is significant (i.e. we reject the null hypothesis) if we get a p-value below our threshold.

For our purposes, we’ll use 0.01 as the threshold.

Use the stats.chi2_contingency() function.

This function computes the chi-square statistic and p-value for the hypothesis test of independence of the observed frequencies in the contingency table.

Print the following:

- chi2 stat
- p-value
- degree of freedom
- expected frequencies (the contingency table expected under independence)



### Answer:

In [18]:
chi_sq_Stat, p_value, deg_freedom, exp_freq = stats.chi2_contingency(dice)
print("chi2 stat: {}".format(chi_sq_Stat))
print("p_value: {}".format(p_value))
print("deg_freedom: {}".format(deg_freedom))
print("expected frequencies: {}".format(exp_freq))

chi2 stat: 23.315671914716496
p_value: 0.07766367301496693
deg_freedom: 15
expected frequencies: [[ 5.57419355  8.20645161  5.57419355  4.64516129]
 [ 6.50322581  9.57419355  6.50322581  5.41935484]
 [ 6.73548387  9.91612903  6.73548387  5.61290323]
 [ 6.96774194 10.25806452  6.96774194  5.80645161]
 [ 5.34193548  7.86451613  5.34193548  4.4516129 ]
 [ 4.87741935  7.18064516  4.87741935  4.06451613]]

#### The p-value is 0.0777, which is more than the 0.01 threshold. There is insufficient evidence to reject the null hypothesis that the rows and columns of the table are independent.
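
The values returned by `stats.chi2_contingency` can be reproduced by hand: each expected frequency is (row total × column total) / grand total, the statistic sums (observed − expected)² / expected over all cells, and the degrees of freedom are (rows − 1) × (columns − 1). A minimal sketch, with illustrative names such as `chi2_manual`:

```python
import numpy as np
import scipy.stats as stats

dice = np.array([
    [5, 8, 3, 8],
    [9, 6, 8, 5],
    [8, 12, 7, 2],
    [4, 16, 7, 3],
    [3, 9, 6, 5],
    [7, 2, 5, 7],
])

# Expected frequencies under independence
row_totals = dice.sum(axis=1, keepdims=True)
col_totals = dice.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / dice.sum()

# Chi-square statistic, degrees of freedom and upper-tail p-value
chi2_manual = ((dice - expected) ** 2 / expected).sum()
dof = (dice.shape[0] - 1) * (dice.shape[1] - 1)
p_manual = stats.chi2.sf(chi2_manual, dof)

chi2_stat, p_value, deg_freedom, exp_freq = stats.chi2_contingency(dice)
print("manual: chi2 = {:.4f}, dof = {}, p = {:.4f}".format(chi2_manual, dof, p_manual))
print("scipy:  chi2 = {:.4f}, dof = {}, p = {:.4f}".format(chi2_stat, deg_freedom, p_value))
```

SciPy applies Yates' continuity correction only when there is a single degree of freedom, so the manual and library results agree for this 6 × 4 table.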


## ------------------------------------------------Question8 Complete------------------------------------------------

## Question 9

### Z-test

Get z-scores for the dice data above using the stats.zscore function from SciPy. Convert the z-score values to p-values and take the mean of the array.

### Answer:

In [19]:
zscore = stats.zscore(dice)
print(zscore)

[[-0.46291005 -0.18884739 -1.83711731  1.44115338]
 [ 1.38873015 -0.64208114  1.22474487  0.        ]
 [ 0.9258201   0.7176201   0.61237244 -1.44115338]
 [-0.9258201   1.62408759  0.61237244 -0.96076892]
 [-1.38873015  0.03776948  0.          0.        ]
 [ 0.46291005 -1.54854863 -0.61237244  0.96076892]]
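
For reference, `stats.zscore` standardises each column by default (`axis=0`), using the population standard deviation (`ddof=0`). A minimal sketch that reproduces it with NumPy, where `z_manual` is an illustrative name:

```python
import numpy as np
import scipy.stats as stats

dice = np.array([
    [5, 8, 3, 8],
    [9, 6, 8, 5],
    [8, 12, 7, 2],
    [4, 16, 7, 3],
    [3, 9, 6, 5],
    [7, 2, 5, 7],
])

# z = (value - column mean) / column standard deviation (ddof=0)
z_manual = (dice - dice.mean(axis=0)) / dice.std(axis=0)
z_scipy = stats.zscore(dice)

print(np.allclose(z_manual, z_scipy))
```

Each z-score therefore measures how far a roll count is from its column's mean, in units of that column's standard deviation.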


In [20]:
p_value_1side = stats.norm.sf(zscore)
# Two-sided p-value: double the upper-tail probability of |z| so values stay within [0, 1]
p_value_2side = stats.norm.sf(np.abs(zscore)) *2

print("one-sided p-value:")
print(p_value_1side)

print("two-sided p-value:")
print(p_value_2side)

one-sided p-value:
[[0.67828558 0.57489379 0.96690371 0.07477068]
 [0.08245741 0.73958975 0.11033568 0.5       ]
 [0.17726974 0.23649578 0.27014569 0.92522932]
 [0.82273026 0.05217856 0.27014569 0.83166582]
 [0.91754259 0.48493574 0.5        0.5       ]
 [0.32171442 0.93925487 0.72985431 0.16833418]]
two-sided p-value:
[[0.64342884 0.85021243 0.06619258 0.14954135]
 [0.16491482 0.5208205  0.22067136 1.        ]
 [0.35453948 0.47299156 0.54029137 0.14954135]
 [0.35453948 0.10435712 0.54029137 0.33666837]
 [0.16491482 0.96987148 1.         1.        ]
 [0.64342884 0.12149026 0.54029137 0.33666837]]


In [21]:
p_value_1side_mean = np.mean(p_value_1side)
print("One sided pvalue mean: {:.4f}".format(p_value_1side_mean))

p_value_2side_mean = np.mean(p_value_2side)
print("Two sided pvalue mean: {:.4f}".format(p_value_2side_mean))

One sided pvalue mean: 0.4948
Two sided pvalue mean: 0.4686


## ------------------------------------------------Question9 Complete------------------------------------------------

## Question 10

A Paired sample t-test compares means from the same group at different times.

The basic two sample t-test is designed for testing differences between independent groups. 
In some cases, you might be interested in testing differences between samples of the same group at different points in time. 
We can conduct a paired t-test using the scipy function stats.ttest_rel(). 

In [22]:
before= stats.norm.rvs(scale=30, loc=100, size=500) ## Draws 500 samples from a normal distribution with a mean of 100 and a std of 30
after = before + stats.norm.rvs(scale=5, loc=-1.25, size=500)

Test whether a weight-loss drug works by checking the weights of the same group patients before and after treatment using above data.

### Answer
* Here the null hypothesis states that there is no difference in mean weight: $\mu_{After}$ equals $\mu_{Before}$.
* The alternative hypothesis states that the difference in mean weight is not equal to 0: $\mu_{After}$ $\neq$ $\mu_{Before}$.

* $H_0$: $\mu_{After}$ - $\mu_{Before}$ =  0
* $H_A$: $\mu_{After}$ - $\mu_{Before}$ $\neq$  0
* $\alpha$ = 0.05
* sample size = 500

In [23]:
t_statistic, pvalue = stats.ttest_rel(after, before)
print(pvalue)

8.077792101161151e-12


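#### The p-value is 8.08e-12, which is far less than the 5% level of significance, so we reject the null hypothesis: the mean weight after treatment differs from the mean weight before treatment. The test is two-sided, so the sign of t_statistic (negative when $\mu_{After}$ < $\mu_{Before}$) shows whether weights went down, as expected if the drug works. The data is drawn at random without a fixed seed, so the exact p-value changes from run to run.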
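
A paired t-test is equivalent to a one-sample t-test on the per-patient differences. The sketch below shows that equivalence on reproducible data and also runs a one-sided version that asks directly whether weights decreased. The seeds are arbitrary assumptions, and the `alternative` argument assumes SciPy 1.6 or later.

```python
import numpy as np
import scipy.stats as stats

# Same generating process as in the cell above, with arbitrary seeds for reproducibility
before = stats.norm.rvs(scale=30, loc=100, size=500, random_state=42)
after = before + stats.norm.rvs(scale=5, loc=-1.25, size=500, random_state=43)

# Paired test and one-sample test on the differences are the same test
t_rel, p_rel = stats.ttest_rel(after, before)
t_diff, p_diff = stats.ttest_1samp(after - before, 0)

# One-sided test: H_A is mean(after) < mean(before), i.e. the drug reduces weight
t_less, p_less = stats.ttest_rel(after, before, alternative="less")

print("two-sided paired:  t = {:.3f}, p = {:.3g}".format(t_rel, p_rel))
print("differences test:  t = {:.3f}, p = {:.3g}".format(t_diff, p_diff))
print("one-sided (less):  t = {:.3f}, p = {:.3g}".format(t_less, p_less))
print("Mean change in weight: {:.3f}".format(np.mean(after - before)))
```

When the t statistic is negative, the one-sided p-value is half the two-sided one, which is why a directional hypothesis is easier to confirm if the effect goes the expected way.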

## ----------------------------------------------------------END----------------------------------------------------------