# Hypothesis testing

## Z test
### One sample test

Formula: $$  z  = \frac{(\bar x -  \Delta)}{\frac{\sigma}{\sqrt n}}  $$

Question 1 : A herd of 1,500 steer was fed a special high‐protein grain for a month. A random sample of 29 were weighed and had gained an average of 6.7 pounds. If the standard deviation of weight gain for the entire herd is 7.1, test the hypothesis that the average weight gain per steer for the month was more than 5 pounds.

In [105]:
from scipy.stats import norm
from scipy import stats
from math import sqrt,ceil
import math
import numpy as np

In [108]:
# H0 => mean=5
# Ha => mean>5
# It is a one(right) tailed test
# Given population mean=5, sample mean=6.7, population sd=7.1, n=29
population_mean=5
sample_mean=6.7
population_sd=7.1
n=29

z = (sample_mean-population_mean) / (population_sd/sqrt(n))
print("z - score = ",z)

pValue = 1-norm.cdf(z)               # right tailed
print("p Value = ", pValue)

z - score =  1.2894056580462898
p Value =  0.0986285477062051


Here p (0.0986) > alpha (0.05), so we fail to reject the null hypothesis.
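The same calculation can be wrapped in a small reusable helper. This is a minimal sketch that uses only the standard library (`statistics.NormalDist`, Python 3.8+); the function name `one_sample_z_test` and its `tail` parameter are illustrative assumptions, not part of SciPy.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, hypothesized_mean, population_sd, n, tail="right"):
    """Return (z, p_value) for a one-sample z test with a known population sd."""
    z = (sample_mean - hypothesized_mean) / (population_sd / sqrt(n))
    cdf = NormalDist().cdf(z)  # standard normal CDF at z
    if tail == "right":
        p_value = 1 - cdf
    elif tail == "left":
        p_value = cdf
    elif tail == "two-sided":
        p_value = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("tail must be 'right', 'left', or 'two-sided'")
    return z, p_value


# Question 1: H0 mean = 5, Ha mean > 5
z, p_value = one_sample_z_test(6.7, 5, 7.1, 29, tail="right")
print("z =", round(z, 4), "p =", round(p_value, 4))
```

The result should agree with the z-score and p value printed above.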

## ---------------------------------------------------------------------------------

Question 2 :In national use, a vocabulary test is known to have a mean score of 68 and a standard deviation of 13. A class of 19 students takes the test and has a mean score of 65. Is the class typical of others who have taken the test? Assume a significance level of p < 0.05.

In [121]:
# Let's solve the problem
# It is a two tailed test 
# If the significance level is 5%, we put 2.5% in each tail of the distribution
# H0 => mean=68
# Ha => mean!=68
# Given population mean=68, sample mean=65, population sd=13, n=19
population_mean=68
sample_mean=65
population_sd=13
n=19

z = (sample_mean-population_mean) / (population_sd/sqrt(n))
print("z - score = ",z)

critical_z = norm.ppf(0.025)
print("critical_z : ", critical_z)

z - score =  -1.005899756201694
critical_z :  -1.9599639845400545


Since the z-score lies between the critical values (-1.96 and 1.96), we fail to reject the null hypothesis.
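A two-tailed decision can be made either with both critical values or with a two-sided p-value. The sketch below shows both for Question 2; the variable names are illustrative.

```python
from math import sqrt
from scipy.stats import norm

alpha = 0.05
z = (65 - 68) / (13 / sqrt(19))

# Critical values that bound the central 95% of the standard normal
lower, upper = norm.ppf(alpha / 2), norm.ppf(1 - alpha / 2)

# Two-sided p-value: area in both tails beyond |z|
p_two_sided = 2 * norm.sf(abs(z))

print("acceptance region :", (lower, upper))
print("two-sided p-value :", p_two_sided)
print("reject H0         :", not (lower < z < upper))
```

Both routes give the same decision: z falls inside the acceptance region exactly when the two-sided p-value is above alpha.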

## ---------------------------------------------------------------------

Question 3 :A sample of 12 machine pins has a mean diameter of 1.15 inches, and the population standard deviation is known to be 0.04. What is a 99 percent confidence interval of diameter width for the population?

In [25]:
# n=12, sample mean=1.15,  population standard deviation = 0.04

In [110]:
# we have to calculate the z score at pvalue = 1-0.99 i.e., 0.01
# since it is two tailed, we have to calculate z score at both sides i.e., 0.005
# calculating zscore on left tail
zscore = norm.ppf(0.005)
print("zscore : ",zscore)

# calculating interval
val1 = 1.15 + zscore * (0.04/sqrt(12))
val2 = 1.15 - zscore * (0.04/sqrt(12))

print("Interval_99 : ("+str(val1)+","+str(val2)+")")

zscore :  -2.575829303548901
Interval_99 : (1.1202568851641903,1.1797431148358095)
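As a cross-check, SciPy can return both ends of the interval in one call with `norm.interval`. This is a sketch; the confidence level is passed positionally because the keyword name of that argument differs between SciPy versions.

```python
from math import sqrt
from scipy.stats import norm

sample_mean, population_sd, n = 1.15, 0.04, 12
standard_error = population_sd / sqrt(n)

# First positional argument is the confidence level (0.99 here)
ci_low, ci_high = norm.interval(0.99, loc=sample_mean, scale=standard_error)
print("Interval_99 :", (ci_low, ci_high))
```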


## -------------------------------------------------------------------------
Question 4: How many subjects will be needed to find the average age of students at Fisher College plus or minus a year, with a 95 percent confidence level and a population standard deviation of 3.5?

In [41]:
# n=?, pValue = 0.05 which is 0.025 on both sides
# critical z score = 1.96
# let sample mean=x and sample size=n

# eq1 => x-1 = x - 1.96 * (3.5 / sqrt(n) )
# eq2 => x+1 = x + 1.96 * (3.5 / sqrt(n) )

# eq2-eq1 => 
n = pow(1.96*3.5,2)

print("Sample size : ", math.ceil(n))

Sample size :  48
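Solving the margin-of-error equation for n gives n = (z · σ / E)², rounded up. A minimal sketch of that formula as a function follows; `required_sample_size` is a hypothetical helper name, and it uses the standard library's `NormalDist` for the critical z value instead of the rounded 1.96.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(population_sd, margin_of_error, confidence=0.95):
    """Smallest n whose z-based margin of error is at most margin_of_error."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * population_sd / margin_of_error) ** 2)


n_required = required_sample_size(population_sd=3.5, margin_of_error=1.0)
print("Sample size :", n_required)

# Halving the margin of error roughly quadruples the required sample size
n_half_margin = required_sample_size(population_sd=3.5, margin_of_error=0.5)
print("Sample size for +/- 0.5 :", n_half_margin)
```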


### Two sample test

Formula : $$ z = \frac{\bar x_1 - \bar x_2 - \Delta}{\sqrt{ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} }} $$

Question 5:The amount of a certain trace element in blood is known to vary with a standard deviation of 14.1 ppm (parts per million) for male blood donors and 9.5 ppm for female donors. Random samples of 75 male and 50 female donors yield concentration means of 28 and 33 ppm, respectively. What is the likelihood that the population means of concentrations of the element are the same for men and women?

In [123]:
# H0 => mean1=mean2
# Ha => mean1!=mean2

zscore = ((28-33)-(0)) / sqrt( (14.1**2 / 75) + (9.5**2) / 50 )
print("zscore : ",zscore)

# it is a two tailed test over significance level 5%
critical_z = norm.ppf(0.025)
print("critical_z : ", critical_z)

zscore :  -2.368684181472862
critical_z :  -1.9599639845400545


Since the z-score (-2.37) falls outside the range between the critical values (-1.96 and 1.96), we reject the null hypothesis.
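The question asks for a likelihood, which is more directly answered by a two-sided p-value than by a critical value. A sketch of that calculation, with illustrative variable names:

```python
from math import sqrt
from scipy.stats import norm

mean_m, sd_m, n_m = 28, 14.1, 75   # male donors
mean_f, sd_f, n_f = 33, 9.5, 50    # female donors

# Standard error of the difference between two independent sample means
standard_error = sqrt(sd_m**2 / n_m + sd_f**2 / n_f)
z = (mean_m - mean_f - 0) / standard_error

# Probability of a difference at least this extreme if the population means are equal
p_two_sided = 2 * norm.sf(abs(z))
print("zscore :", z)
print("two-sided p-value :", p_two_sided)
```

The p-value comes out a little under 0.02, which is consistent with rejecting the null hypothesis at the 5% level.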

## -------------------------------------------------------------------------------
## T test

### One sample test

Formula: $$  t  = \frac{(\bar x -  \Delta)}{\frac{s}{\sqrt n}}  $$

Question 6:A professor wants to know if her introductory statistics class has a good grasp of basic math. Six students are chosen at random from the class and given a math proficiency test. The professor wants the class to be able to score above 70 on the test. The six students get scores of 62, 92, 75, 68, 83, and 95. Can the professor have 90 percent confidence that the mean score for the class on the test would be above 70?

In [118]:
# H0 => mean=70
# Ha => mean>70
# It is an upper tailed test (one tailed), significance level = 0.1
n=6
df=n-1

sample = np.array([62, 92, 75, 68, 83, 95])
sample_mean = np.mean(sample)
sample_std  = sqrt(sum(np.square(sample-sample_mean))/(n-1))

print("sample_mean : %s,  sample_std : %s"%(sample_mean, sample_std))
tstat = (sample_mean-70)/(sample_std/sqrt(6))
print("tstat : ",tstat)
pValue = 1-stats.t.cdf(tstat,df=df)
print("pValue : ",pValue)

sample_mean : 79.16666666666667,  sample_std : 13.166877635440631
tstat :  1.705313636019149
pValue :  0.07442681355650138


Since the p value (0.074) < 0.1, we can reject the null hypothesis.
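The manual calculation can be cross-checked with `scipy.stats.ttest_1samp`. This sketch assumes SciPy 1.6 or later, where the `alternative` argument is available.

```python
import numpy as np
from scipy import stats

sample = np.array([62, 92, 75, 68, 83, 95])

# Upper-tailed test of H0: mean = 70 against Ha: mean > 70
result = stats.ttest_1samp(sample, popmean=70, alternative="greater")
print("tstat  :", result.statistic)
print("pValue :", result.pvalue)
```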

## -------------------------------------------------------------------

Question 7:A Little League baseball coach wants to know if his team is representative of other teams in scoring runs. Nationally, the average number of runs scored by a Little League team in a game is 5.7. He chooses five games at random in which his team scored 5 , 9, 4, 11, and 8 runs. Is it likely that his team's scores could have come from the national distribution? Assume an alpha level of 0.05.

In [130]:
# p_mean = 5.7
# significance level = 0.05
# H0 => mean = 5.7
# Ha => mean != 5.7
# It is a two tailed test
n=5;df=4;p_mean=5.7
sample = np.array([5,9,4,11,8])

sample_mean = np.mean(sample)
sample_std  = sqrt(sum(np.square(sample-sample_mean))/(n-1))

print("sample_mean : %s,  sample_std : %s"%(sample_mean, sample_std))
tstat = (sample_mean-p_mean)/(sample_std/sqrt(n))
print("tstat : ",tstat)
critical_t = stats.t.ppf(0.025,df=df)
print("critical_t : ", critical_t)

sample_mean : 7.4,  sample_std : 2.8809720581775866
tstat :  1.319455893700766
critical_t :  -2.7764451051977996


Since the t statistic lies between the critical values (-2.776 and 2.776), we fail to reject the null hypothesis.

## ------------------------------------------------------------------------------------

Question 8:Using the previous example, what is a 95 percent confidence interval for runs scored per team per game?

In [132]:
# calculating tstat on left tail
n=5;df=4;p_mean=5.7
sample = np.array([5,9,4,11,8])

sample_mean = np.mean(sample)
sample_std  = sqrt(sum(np.square(sample-sample_mean))/(n-1))

tstat_95 = stats.t.ppf(0.05/2,df=df)
print("tstat_95 : ",tstat_95)
val1 = sample_mean + tstat_95 * (sample_std/(sqrt(n)))
val2 = sample_mean - tstat_95 * (sample_std/(sqrt(n)))
print("Interval_95 : ("+str(val1)+","+str(val2)+")")

tstat_95 :  -2.7764451051977996
Interval_95 : (3.822800715529883,10.977199284470117)
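The same interval can be obtained from `stats.t.interval` together with `stats.sem`, which computes the standard error with the n-1 denominator by default. This is a sketch; the confidence level is passed positionally because its keyword name differs between SciPy versions.

```python
import numpy as np
from scipy import stats

sample = np.array([5, 9, 4, 11, 8])
sample_mean = np.mean(sample)
standard_error = stats.sem(sample)  # s / sqrt(n), with ddof=1 by default

ci_low, ci_high = stats.t.interval(
    0.95, len(sample) - 1, loc=sample_mean, scale=standard_error
)
print("Interval_95 :", (ci_low, ci_high))
```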


## --------------------------------------------------------------------------
### Two sample test

Formula : $$ t = \frac{\bar x_1 - \bar x_2 - \Delta}{\sqrt{ \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} }}  $$

Question 9:An experiment is conducted to determine whether intensive tutoring (covering a great deal of material in a fixed amount of time) is more effective than paced tutoring (covering less material in the same amount of time). Two randomly chosen groups are tutored separately and then administered proficiency tests. Use a significance level of α < 0.05.

For intensive : n = 12, xbar = 46.31, s = 6.44

For paced     : n = 10, xbar = 42.79, s = 7.92

In [137]:
# H0 => mean1 = mean2
# Ha => mean1 > mean2
# tstat = ((sample_mean1-sample_mean2)-(mu1-mu2)) / sqrt( (ssd1**2 / n1) + (ssd2**2) / n2 )
df = 10+12-2
tstat = ((46.31-42.79)-(0)) / sqrt( (6.44**2 / 12) + (7.92**2) / 10 )
print("tstat : ",tstat)
# it is a one tailed test
pValue = 1-stats.t.cdf(tstat,df=df)   # upper tail area beyond the observed tstat
print("pValue : ",round(pValue,3))

tstat :  1.1285313315395324
pValue :  0.136


Null hypothesis can not be rejected
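When only summary statistics are available, `scipy.stats.ttest_ind_from_stats` can run the test directly. The sketch below compares the pooled-variance test with Welch's test (`equal_var=False`), whose statistic matches the unpooled formula above. The one-sided p-value is taken as half the two-sided value, which is valid here because the statistic is positive and Ha is mean1 > mean2.

```python
from scipy import stats

# Intensive: mean 46.31, s 6.44, n 12; paced: mean 42.79, s 7.92, n 10
pooled = stats.ttest_ind_from_stats(46.31, 6.44, 12, 42.79, 7.92, 10, equal_var=True)
welch = stats.ttest_ind_from_stats(46.31, 6.44, 12, 42.79, 7.92, 10, equal_var=False)

# Convert two-sided p-values to one-sided (statistic > 0, Ha: mean1 > mean2)
pooled_p_one_sided = pooled.pvalue / 2
welch_p_one_sided = welch.pvalue / 2

print("pooled :", pooled.statistic, pooled_p_one_sided)
print("welch  :", welch.statistic, welch_p_one_sided)
```

Both variants give a one-sided p-value well above 0.05, so the conclusion does not depend on which one is used.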

## -----------------------------------------------------------------------
Question 10:Estimate a 90 percent confidence interval for the difference between the number of raisins per box in two brands of breakfast cereal.


Brand A : n = 6, xbar = 102.1, s = 12.3

Brand B : n = 9, xbar = 93.6, s = 7.52

In [138]:
df = 9+6-2 # n1 + n2 - 2
# confidence level = 0.9, so significance level = 0.1 (0.05 in each tail)
tstat = stats.t.ppf(0.05,df=df)
print(tstat)
val1 = 102.1-93.6 + tstat * sqrt( (12.3**2 / 6) + (7.52**2) / 9 )
val2 = 102.1-93.6 - tstat * sqrt( (12.3**2 / 6) + (7.52**2) / 9 )
print("Interval_90 : ("+str(val1)+","+str(val2)+")")

-1.7709333959867992
Interval_90 : (-1.439083096975132,18.439083096975132)


## ----------------------------------------------------------------------------

### Paired difference test

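Formula : $$ t = \frac{(\bar x -  \Delta)}{\frac{s}{\sqrt n}} $$

Here $\bar x$ and $s$ are the mean and standard deviation of the paired differences, and $n$ is the number of pairs.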

Question 11:A farmer decides to try out a new fertilizer on a test plot containing 10 stalks of corn. Before applying the fertilizer, he measures the height of each stalk. Two weeks later, he measures the stalks again, being careful to match each stalk's new height to its previous one. The stalks would have grown an average of 6 inches during that time even without the fertilizer. Did the fertilizer help? Use a significance level of 0.05.


Before height = 35.5,31.7,31.2,36.3,22.8,28,24.6,26.1,34.5,27.7

After height = 45.3,36,38.6,44.7,31.4,33.5,28.8,35.8,42.9,35.0

In [140]:
# H0 => mean = 6
# Ha => mean > 6
# It is a one tailed (upper tailed) test
before_height = np.array([35.5,31.7,31.2,36.3,22.8,28,24.6,26.1,34.5,27.7])
after_height  = np.array([45.3,36,38.6,44.7,31.4,33.5,28.8,35.8,42.9,35.0])
hypothesised_diff=6
n=10
df = n-1   # paired test: one difference per stalk, so df = number of pairs - 1

change = after_height - before_height
print("change : ", change)

change_mean = np.mean(change)
change_std  = sqrt(sum(np.square(change-change_mean))/(10-1))
print("change_mean : %s, change_std : %s"%(change_mean, change_std))

tstat = (change_mean-hypothesised_diff)/(change_std/sqrt(n))
print("tstat",tstat)

pValue = 1-stats.t.cdf(tstat,df=df) # upper tailed
print("pValue",round(pValue,3))


change :  [9.8 4.3 7.4 8.4 8.6 5.5 4.2 9.7 8.4 7.3]
change_mean : 7.359999999999999, change_std : 2.053290042833695
tstat 2.0945397523545712
pValue 0.033


Since the p value is below 0.05, the null hypothesis is rejected.
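A paired difference test is a one-sample t test on the differences, so it can be cross-checked with `scipy.stats.ttest_1samp`, which uses n - 1 degrees of freedom (9 here). This sketch assumes SciPy 1.6 or later for the `alternative` argument.

```python
import numpy as np
from scipy import stats

before_height = np.array([35.5, 31.7, 31.2, 36.3, 22.8, 28, 24.6, 26.1, 34.5, 27.7])
after_height = np.array([45.3, 36, 38.6, 44.7, 31.4, 33.5, 28.8, 35.8, 42.9, 35.0])

change = after_height - before_height

# H0: mean growth = 6 inches, Ha: mean growth > 6 inches
result = stats.ttest_1samp(change, popmean=6, alternative="greater")
print("tstat  :", result.statistic)
print("pValue :", result.pvalue)
```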

### Wilcoxon Signed-Rank Test (A non parametric test) For paired sample
It is used to test whether the distributions of two paired samples are equal or not.

In [141]:
# H0 => the two paired samples have the same distribution
# Ha => the two paired samples have different distributions
alpha = 0.05
from scipy.stats import wilcoxon
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
stat, p = wilcoxon(data1,data2)
print(stat, p)
if p >= alpha:
    print('Accept null hypothesis')
else:
    print('Reject null hypothesis')

21.0 0.5076243443095237
Accept null hypothesis


## ------------------------------------------------------------------------------
## ANOVA and F-test

### One way anova

$$ F = \frac{\text{between-group variability}}{\text{within-group variability}} $$

In [89]:
# assuming level of significance=5%
# H0 => the means of the samples are equal.
# Ha => one or more of the means of the samples are unequal.
from scipy.stats import f_oneway
alpha = 0.05
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
data3 = [-0.208, 0.696, 0.928, -1.148, -0.213, 0.229, 0.137, 0.269, -0.870, -1.204]

stat, p = f_oneway(data1, data2, data3)
print(stat, p)
if p >= alpha:
    print('Accept null hypothesis')
else:
    print('Reject null hypothesis')

0.09641783499925058 0.9083957433926546
Accept null hypothesis
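To connect the output to the formula, the F statistic can be computed by hand as the between-group mean square divided by the within-group mean square. The sketch below does this with NumPy and compares it to `f_oneway`; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import f_oneway

data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
data3 = [-0.208, 0.696, 0.928, -1.148, -0.213, 0.229, 0.137, 0.269, -0.870, -1.204]

groups = [np.array(d) for d in (data1, data2, data3)]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k, n_total = len(groups), all_values.size

# Between-group variability: spread of group means around the grand mean
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group variability: spread of values around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
f_scipy, p_scipy = f_oneway(*groups)
print("manual F :", f_manual)
print("scipy F  :", f_scipy, "p :", p_scipy)
```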


### Kruskal - Wallis H test (A non parametric test) For two or more samples
Used to test whether the distributions of two or more independent samples are equal or not.

In [90]:
# H0 => the distributions of all samples are equal.
# Ha => the distributions of one or more samples are not equal.
from scipy.stats import kruskal
alpha = 0.05
data1 = [0.873, 2.817, 0.121, -0.945, -0.055, -1.436, 0.360, -1.478, -1.637, -1.869]
data2 = [1.142, -0.432, -0.938, -0.729, -0.846, -0.157, 0.500, 1.183, -1.075, -0.169]
stat, p = kruskal(data1, data2)
print(stat, p)
if p >= alpha:
    print('Accept null hypothesis')
else:
    print('Reject null hypothesis')

0.5714285714285694 0.4496917979688917
Accept null hypothesis


## Chi-Squared Test ( A non parametric test)

$$ \chi^2 = \sum \frac{(Observed-Expected)^2}{Expected} $$

In [148]:
# significance level = 5%
# H0 => the row and column variables are independent
# Ha => the row and column variables are dependent
table = [[10, 20, 30],
         [6,  9,  17]]
stat, p, dof, expected = stats.chi2_contingency(table)
alpha = 0.05

if p > alpha:
    print('Accept null hypothesis')
else:
    print('Reject null hypothesis')

Accept null hypothesis
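The cell above prints only the decision. The sketch below also shows the statistic, p-value, degrees of freedom, and expected counts, and recomputes the statistic from the formula by summing over all cells. For this 2x3 table the degrees of freedom are 2, so `chi2_contingency` applies no continuity correction and the two values should match.

```python
import numpy as np
from scipy import stats

table = np.array([[10, 20, 30],
                  [6,  9,  17]])

stat, p, dof, expected = stats.chi2_contingency(table)

# Sum of (Observed - Expected)^2 / Expected over every cell
manual_stat = ((table - expected) ** 2 / expected).sum()

print("chi2 :", stat, "manual :", manual_stat)
print("p :", p, "dof :", dof)
print("expected counts :")
print(expected)
```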

