## Examples of Hypothesis testing

In [1]:
from scipy import stats
import pandas as pd 
import scipy

- A supermarket plans to launch a loyalty program if it results in average spending per shopper of more than 120 dollars per week. A random sample of 80 shoppers enrolled in the pilot program spent an average of 130 dollars in a week, with a standard deviation of 40 dollars. Should the loyalty program be launched?

In [2]:
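# mu0 (hypothesized population mean) = 120, n (sample size) = 80, x_bar (sample mean) = 130, s (sample std. deviation) = 40
# H0: population mean <= 120
# HA: population mean > 120
# Choice of test: the population standard deviation is unknown, so we use a t-test.
# One-tailed (right-tailed) test: we only care whether the mean exceeds 120.
# Confidence level = 95% (0.95), alpha = 0.05

# Test statistic: t = (x_bar - mu0) / (s / sqrt(n)) = (130 - 120) / (40 / sqrt(80)) ≈ 2.236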

In [3]:
stats.t.cdf(2.33, df=79)  # left-tail probability for t = 2.33 (the statistic computed from the sample is about 2.24)

0.9888199812854052

In [4]:
p = 1 - stats.t.cdf(2.23, df=79)  # right-tail probability P(T > 2.23)
p  # p-value

0.014292908802574056

In [5]:
alpha = 0.05
if p < alpha:
    # Rejecting H0 means the data support mean spending above 120 dollars.
    print('Reject H0. Launch the program.')
else:
    print('Fail to reject H0. Do not launch the program.')

Reject H0. Launch the program.
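
The test statistic and p-value above were typed in by hand. The following is a minimal sketch, assuming only the summary figures from the problem statement, that derives both from the formula t = (x_bar - mu0) / (s / sqrt(n)). The variable names are illustrative and not part of the original notebook.

```python
import math
from scipy import stats

mu0 = 120    # hypothesized mean weekly spending
n = 80       # sample size
x_bar = 130  # sample mean
s = 40       # sample standard deviation

# One-sample t statistic computed from summary data
t_stat = (x_bar - mu0) / (s / math.sqrt(n))

# Right-tail p-value; sf is the survival function, i.e. 1 - cdf
p_value = stats.t.sf(t_stat, df=n - 1)

print(round(t_stat, 3), round(p_value, 4))
```

Computing the statistic in code avoids transcription slips when the same value is reused across cells.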


## One sample test

- Ex. An outbreak of Salmonella-related illness was attributed to ice cream produced at a certain factory. Scientists measured the level of Salmonella in 9 randomly sampled batches of ice cream. 

The levels (in MPN/g) were 0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418. 

Is there evidence that the mean level of Salmonella in the ice cream is greater than 0.3 MPN/g?

In [6]:
# Hypothesized population mean = 0.3
# n = sample size = 9
# If the mean Salmonella level is <= 0.3, the ice cream is considered safe.
# If the mean Salmonella level is > 0.3, the ice cream is considered harmful.
# We assume the ice cream is safe to consume until there is evidence otherwise -> null hypothesis.
# H0: population mean <= 0.3
# HA: population mean > 0.3
# t-test
# One-tailed test

In [7]:
data = pd.Series([0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418])
data

0    0.593
1    0.142
2    0.329
3    0.691
4    0.231
5    0.793
6    0.519
7    0.392
8    0.418
dtype: float64

In [8]:
scipy.stats.ttest_1samp(data, 0.3)
# This returns the two-tailed p-value, i.e. the sum of the probabilities in both tails.

TtestResult(statistic=2.2050588385131595, pvalue=0.05853032968489765, df=8)

In [9]:
s,p = scipy.stats.ttest_1samp(data,0.3)

In [10]:
s

2.2050588385131595

In [11]:
p  #p-value

0.05853032968489765

In [12]:
p_value = p / 2  # one-tailed p-value; halving is valid here because the statistic is positive, i.e. in the direction of HA
p_value

0.029265164842448826

In [13]:
alpha = 0.05
if p_value < alpha:
    print('Reject H0. The mean Salmonella level is greater than 0.3 MPN/g.')
else:
    print('Fail to reject H0. No evidence that the mean Salmonella level exceeds 0.3 MPN/g.')

Reject H0. The mean Salmonella level is greater than 0.3 MPN/g.
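
Halving the two-tailed p-value works only when the statistic falls on the side named by HA. As an alternative sketch, assuming SciPy 1.6 or later, `ttest_1samp` accepts an `alternative` argument that returns the one-tailed p-value directly. The name `levels` is illustrative.

```python
from scipy import stats

levels = [0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418]

# alternative='greater' tests HA: population mean > 0.3
result = stats.ttest_1samp(levels, 0.3, alternative='greater')
one_tailed_p = result.pvalue

print(result.statistic, one_tailed_p)
```

This should agree with the halved p-value from the cells above, since the statistic is positive.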


## Two sample test

- Ex. Six subjects were given a drug (treatment group) and another six subjects were given a placebo (control group). Their reaction time to a stimulus was measured (in ms). We want to perform a two-sample t-test to compare the means of the treatment and control groups.

Control : 91, 87, 99, 77, 88, 91

Treat :101, 110, 103, 93, 99, 104

In [14]:
# Actual effect  -> Treatment != Control: the drug changes the reaction time.
# No drug effect -> Treatment = Control: both groups respond the same, so any change is not due to the drug.
# Placebo effect: patients take a dummy pill and improve because they believe it is medicine; the improvement comes from their psychology, not from the drug.

In [15]:
# H0: T = C  -> the drug is not effective
# HA: T != C -> the drug is effective

In [16]:
Control = pd.Series([91, 87, 99, 77, 88, 91])
Treat = pd.Series([101, 110, 103, 93, 99, 104]) 

In [17]:
Control

0    91
1    87
2    99
3    77
4    88
5    91
dtype: int64

In [18]:
Treat

0    101
1    110
2    103
3     93
4     99
5    104
dtype: int64

In [19]:
stats.ttest_ind(Control, Treat)

TtestResult(statistic=-3.4456126735364876, pvalue=0.006272124350809803, df=10.0)

In [20]:
s, p = stats.ttest_ind(Control, Treat)

In [21]:
s,p

(-3.4456126735364876, 0.006272124350809803)

In [22]:
alpha = 0.05
if p < alpha:
    print('Reject Null Hypothesis. The drug has a significant effect on reaction time.')
else:
    print('Fail to reject Null Hypothesis. No evidence that the drug affects reaction time.')

Reject Null Hypothesis. The drug has a significant effect on reaction time.
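
To make the calculation behind `stats.ttest_ind` visible, here is a minimal sketch that computes the pooled-variance t statistic by hand and compares it with SciPy. It assumes equal variances in both groups, which is the default of `ttest_ind`; the lowercase variable names are illustrative.

```python
import numpy as np
from scipy import stats

control = np.array([91, 87, 99, 77, 88, 91])
treat = np.array([101, 110, 103, 93, 99, 104])

n1, n2 = len(control), len(treat)

# Pooled variance from the two sample variances (ddof=1)
pooled_var = ((n1 - 1) * control.var(ddof=1) + (n2 - 1) * treat.var(ddof=1)) / (n1 + n2 - 2)

# Standard error of the difference in means
se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_manual = (control.mean() - treat.mean()) / se
p_manual = 2 * stats.t.sf(abs(t_manual), df=n1 + n2 - 2)  # two-tailed p-value

t_scipy, p_scipy = stats.ttest_ind(control, treat)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```

If the equal-variance assumption is doubtful, `stats.ttest_ind(control, treat, equal_var=False)` runs Welch's t-test instead.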


## Two-proportion t-test

In [23]:
import numpy as np

In [24]:
# Data: sample sizes (n1, n2) and success proportions (p1, p2) of the two groups
n1 = 247
p1 = 0.37

n2 = 308
p2 = 0.39

In [25]:
# Simulate two samples of 0/1 outcomes; no random seed is set, so the values change on every run.
population1 = np.random.binomial(1, p1, n1)
population2 = np.random.binomial(1, p2, n2)

In [26]:
population1

array([1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1,
       0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1,
       0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1,
       0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0,
       0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1,
       0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0,
       0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0,
       0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
       0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1,
       0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 1,
       1, 0, 0, 0, 1])

In [27]:
population2

array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1,
       1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1,
       1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0,
       0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1,
       0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0,
       0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0,
       0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0,
       0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0,
       1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0,
       0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0,
       0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0,
       0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0,
       1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1,
       1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0,

In [28]:
len(population1)

247

In [29]:
len(population2)

308

In [30]:
import statsmodels.api as sm
t,p,df = sm.stats.ttest_ind(population1, population2)

In [31]:
t

0.7376498966453473

In [32]:
p

0.46104010583509225

In [33]:
df

553.0

In [34]:
if p < 0.05:
    print('Reject Null Hypothesis. The two proportions differ.')
else:
    print('Fail to reject Null Hypothesis. No significant difference between the two proportions.')

Fail to reject Null Hypothesis. No significant difference between the two proportions.
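
A t-test on simulated 0/1 draws depends on the random sample that happens to be generated. A common alternative for comparing two proportions is the pooled two-proportion z-test. The sketch below assumes the proportions 0.37 and 0.39 were observed in samples of 247 and 308, and rounds them to hypothetical success counts (91 and 120). These counts are an assumption, not data from the original notebook.

```python
import math
from scipy import stats

n1, n2 = 247, 308
x1 = round(0.37 * n1)  # hypothetical success count: 91
x2 = round(0.39 * n2)  # hypothetical success count: 120

p1_hat, p2_hat = x1 / n1, x2 / n2

# Pooled proportion under H0: p1 == p2
p_pool = (x1 + x2) / (n1 + n2)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

z = (p1_hat - p2_hat) / se
p_value = 2 * stats.norm.sf(abs(z))  # two-tailed p-value

print(z, p_value)
```

Because it works from counts rather than random draws, this version gives the same result on every run.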


## ANOVA test or F-test

In [35]:
df = pd.read_csv('https://raw.githubusercontent.com/aishwaryamate/Datasets/main/Iris.csv',index_col=0)
df

URLError: <urlopen error [Errno 11001] getaddrinfo failed>

In [None]:
df['SepalLengthCm']

In [None]:
f, p = stats.f_oneway(df['SepalWidthCm'], df['SepalLengthCm'], df['PetalLengthCm'], df['PetalWidthCm'])

In [None]:
f

In [None]:
p

In [None]:
if p < 0.05:
    print('Reject Null Hypothesis. At least one group mean is different.')
else:
    print('Fail to reject Null Hypothesis. No evidence that the group means differ.')
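
Since the Iris CSV requires network access, here is a minimal offline sketch of what `stats.f_oneway` computes: the ratio of the between-group mean square to the within-group mean square. The three groups below are made-up numbers for illustration only, not values from the Iris dataset.

```python
import numpy as np
from scipy import stats

# Made-up measurements for three hypothetical groups
groups = [
    np.array([5.1, 4.9, 5.0, 5.2, 4.8]),
    np.array([5.9, 6.1, 6.0, 5.8, 6.2]),
    np.array([6.5, 6.7, 6.6, 6.4, 6.8]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)            # number of groups
n_total = len(all_values)  # total number of observations

# Sum of squares between groups and within groups
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f_manual, f_scipy)
```

A large F value means the group means are spread far apart relative to the variation inside each group.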

## Chi-square test

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/aishwaryamate/Python/main/Statistics/chi2.csv')
df

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/aishwaryamate/Python/main/Statistics/chi2.csv',index_col=0)
df

In [None]:
from scipy.stats import chi2_contingency

In [None]:
obs = pd.crosstab(index=df['Athlete'],columns=df['Smoker'])
obs

In [None]:
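# Use dof/expected as names so the DataFrame df is not overwritten by the degrees of freedom
chi2, p, dof, expected = chi2_contingency(obs)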

In [None]:
chi2

In [None]:
p

In [None]:
if p < 0.05:
    print('Reject Null Hypothesis. The columns are dependent on each other.')
else:
    print('Fail to reject Null Hypothesis. No evidence that the columns are dependent on each other.')
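
The chi-square test compares observed counts with the counts expected if the two columns were independent. A minimal offline sketch, using a hypothetical 2x2 table rather than the contents of `chi2.csv`, shows how the expected counts and the statistic are formed:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: rows = Athlete (no, yes), columns = Smoker (no, yes)
observed = np.array([[30, 20],
                     [40, 10]])

# Expected counts under independence: row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / observed.sum()

chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

# correction=False turns off the Yates continuity correction that SciPy
# applies to 2x2 tables by default, so the result matches the plain formula.
chi2_stat, p_value, dof, expected = chi2_contingency(observed, correction=False)

print(chi2_manual, chi2_stat, p_value, dof)
```

With the default `correction=True`, SciPy reports a slightly smaller statistic for 2x2 tables.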