# Hypothesis tests

## Terms
### Statistics
-  mean
-  median
-  mode
-  skewness
-  kurtosis
-  std (standard deviation)
-  var (variance)

### Population

### Sample

### Null Hypothesis (Ho)

### Alternative Hypothesis(H1/Ha)

### Level of Significance
The level of significance is denoted alpha (α). A typical value of alpha (α) is 0.05, which means accepting a 5% risk of rejecting Ho when it is actually true (a Type I error). It corresponds to a 95% confidence level.
### P-Value



### Tail - 1 or 2
-  2 Tail
    -  Ho : U1 = U2  ; Ha : U1 != U2 ; alpha = .05 (split as .05/2 in each tail)
-  1 Tail
    - Left  : Ho : U1 >= U2  ; Ha : U1 < U2  ;  alpha = .05 (all in the left tail)
    - Right : Ho : U1 <= U2  ; Ha : U1 > U2  ;  alpha = .05 (all in the right tail)
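
As a rough illustration of how the tail choice moves the rejection boundary, the sketch below computes standard normal critical values for alpha = .05. It uses only the Python standard library; the variable names are illustrative and not part of the notebook.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # standard normal: mean 0, sd 1

# Two-tailed test: alpha is split across both tails (alpha/2 in each).
z_two_tailed = std_normal.inv_cdf(1 - alpha / 2)

# One-tailed tests: the whole alpha sits in a single tail.
z_right_tailed = std_normal.inv_cdf(1 - alpha)
z_left_tailed = std_normal.inv_cdf(alpha)

print(round(z_two_tailed, 3), round(z_right_tailed, 3), round(z_left_tailed, 3))
# 1.96 1.645 -1.645
```

A two-tailed test therefore needs a more extreme statistic (|z| > 1.96) than a one-tailed test (z > 1.645 or z < -1.645) at the same alpha.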

In [1]:
import numpy as np
p_value = np.array([0.04])
alpha = np.array([0.04])

In [2]:
if p_value > alpha:
    print('Accept Ho & Reject Ha')
else:
    print('Reject Ho in favour of Ha')

Reject Ho in favour of Ha


In [3]:
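def testCheck(tn=None, pv=None, a=.05):
    # tn = test name, pv = p-value (NumPy scalar or array), a = significance level
    # 'Accept Ho' is shorthand here for 'fail to reject Ho'
    # the trailing True/False flag printed in both branches is always (pv < a)
    if pv > a:
        print(tn, ' : Accept Ho & Reject Ha : pv=', pv.round(2), ':: a= ', a, ' T/F pv > a is ', pv < a)
    else:
        print(tn, ' : Reject Ho in favour of Ha : pv=', pv.round(2), ':: a= ', a, ' T/F pv < a is ', pv < a)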

In [4]:
testCheck(tn='Stats', pv=p_value, a=alpha)

Stats  : Reject Ho in favour of Ha : pv= [0.04] :: a=  [0.04]  T/F pv < a is  [False]


## Python Libraries
-  import scipy.stats as stats
-  from scipy.stats import ttest_1samp  # 1 sample t-test
-  t_value, p_value = stats.ttest_ind(data1, data2) # 2 sample t-test Independent
-  t_value, p_value = stats.ttest_rel(data1, data2)  # 2 sample t-test Dependent/Paired
-  https://docs.scipy.org/doc/scipy/reference/stats.html

# Assumptions in these parametric tests
- The two sample groups are independent of each other.
- The data in each group follow an approximately normal distribution.
- The two samples have similar variances. This is also known as the homogeneity of variance assumption.
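
One possible way to check the normality and equal-variance assumptions before running a t-test is sketched below, using `scipy.stats.shapiro` and `scipy.stats.levene`. The two arrays are hypothetical illustration data, and the choice of these two checks is an assumption rather than something the notebook prescribes.

```python
import numpy as np
from scipy import stats

# Hypothetical scores for two independent groups (illustration only).
group_a = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14])
group_b = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14])

# Shapiro-Wilk: Ho = the sample comes from a normal distribution.
shapiro_a = stats.shapiro(group_a)
shapiro_b = stats.shapiro(group_b)

# Levene: Ho = the groups have equal variances.
levene_res = stats.levene(group_a, group_b)

print('Shapiro p-values:', shapiro_a.pvalue, shapiro_b.pvalue)
print('Levene p-value  :', levene_res.pvalue)
```

A small p-value in either check is evidence against that assumption. Independence of the groups cannot be tested this way; it follows from how the data were collected.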

In [5]:
#import libraries
import numpy as np
import pandas as pd
import scipy.stats as stats

In [6]:
np.set_printoptions(precision=4)

## Z tests
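- Under Ho, the z-statistic follows a standard normal distribution
-  As per the central limit theorem, the sample mean is approximately normally distributed as the sample size increases, whatever the distribution of the individual observations.
-  Hence the z-test is most appropriate for samples larger than 30 (or when the population standard deviation is known).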
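
To make the statistic concrete, here is a minimal sketch of a one-sample z-test computed by hand with the standard library. The hypothesised mean `mu0 = 100` is an assumed value chosen for illustration, and the sample standard deviation stands in for the unknown population value.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

sample = [89, 93, 95, 93, 97, 98, 96, 99, 93, 97,
          110, 104, 119, 105, 104, 110, 110, 112, 115, 114]
mu0 = 100  # hypothesised population mean (assumed for illustration)

n = len(sample)
std_error = stdev(sample) / sqrt(n)        # sample sd (n - 1 denominator) over sqrt(n)
z_stat = (mean(sample) - mu0) / std_error  # distance from mu0 in standard errors

# Two-sided p-value: probability of a |Z| at least this large under Ho.
p_val = 2 * (1 - NormalDist().cdf(abs(z_stat)))
print(round(z_stat, 3), round(p_val, 4))
```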

In [7]:
from scipy import stats  
from statsmodels.stats import weightstats as stests  

In [8]:
data1 = [89, 93, 95, 93, 97, 98, 96, 99, 93, 97,  110, 104, 119, 105, 104, 110, 110, 112, 115, 114]
print(len(data1), np.mean(data1))

20 102.65


In [9]:
alpha = 0.05
z_test , p_value = stests.ztest(data1, x2 = None, value = 160)
testName = 'Z-Test : 1 sample'
print(testName, z_test.round(3), p_value,  p_value.round(3))

Z-Test : 1 sample -29.114 2.417334226169332e-186 0.0


In [10]:
testCheck(testName, p_value, alpha)

Z-Test : 1 sample  : Reject Ho in favour of Ha : pv= 0.0 :: a=  0.05  T/F pv < a is  True


### Z-test : 2-sample

In [11]:
data1 = [83, 85, 86, 90, 90, 93, 93, 95, 97, 97, 106, 108, 106, 108, 111, 113, 113, 112, 116, 111]  
data2 = [92, 92, 90, 93, 93, 97, 94, 98, 109, 108, 110, 117, 110, 115, 114, 114, 130, 130, 149, 131]  
testName = 'Z-Test : 2 sample : '  
print(len(data1), len(data2))
z_test ,p_value = stests.ztest(x1=data1, x2 = data2, value = 0, alternative = 'two-sided')  
print(testName, z_test.round(3), p_value,  p_value.round(3)) 

20 20
Z-Test : 2 sample :  -1.976 0.04813782199434202 0.048


In [12]:
testCheck(testName, p_value, alpha)

Z-Test : 2 sample :   : Reject Ho in favour of Ha : pv= 0.05 :: a=  0.05  T/F pv < a is  True


# T - Tests
- Independent Samples T-test: This test is used to compare the averages or means for two groups.
- Paired Sample T-test: This test is used to compare means from the same group at different times (For example, one year apart).
- One Sample T-test: This test is used to test the mean of a single group against an acknowledged mean.

## t-test - 1 Sample
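-  Tests whether a sample mean differs significantly from a known or assumed population mean
-  Typically used when the sample size is small (< 30) and the population standard deviation is unknown
<br> eg
-  Is the average age of 10 staff equal to 30?
-  Is the average salary of 25 staff equal to 50000?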
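
The one-sample t statistic can also be computed by hand, which shows what `ttest_1samp` does internally. This is a minimal sketch on ten ages with a hypothesised mean of 30; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

ages = np.array([45, 89, 23, 46, 12, 69, 45, 24, 34, 67])
mu0 = 30  # hypothesised population mean

n = ages.size
# t = (sample mean - mu0) / (sample sd / sqrt(n)), with n - 1 degrees of freedom
t_manual = (ages.mean() - mu0) / (ages.std(ddof=1) / np.sqrt(n))
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)  # two-sided p-value

t_scipy, p_scipy = stats.ttest_1samp(ages, mu0)
print(round(t_manual, 2), round(p_manual, 4))
print(np.isclose(t_manual, t_scipy), np.isclose(p_manual, p_scipy))
```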

In [13]:
from scipy.stats import ttest_1samp  
data1 = [45, 89, 23, 46, 12, 69, 45, 24, 34, 67]  
np.mean(data1)

45.4

In [14]:
np.mean(data1) == 30

False

In [15]:
testName = 'T-Test : 1-sample : '
t_test, p_value = ttest_1samp(data1, 30)
alpha = .05
print(testName, t_test.round(2), p_value, p_value.round(4), alpha)

T-Test : 1-sample :  2.04 0.07179988272763561 0.0718 0.05


In [16]:
testCheck(testName, p_value, alpha)  # 2 tail alpha = alpha

T-Test : 1-sample :   : Accept Ho & Reject Ha : pv= 0.07 :: a=  0.05  T/F pv > a is  False


## t-test - 2 Sample (Independent)
2 tails
Ho : U1  = U2
Ha : U1 != U2
egs:
-  Is the mean salary of males and females the same? Here males and females form different population groups, hence independent
-  Is the mean age of staff from the HR and Marketing depts the same? Here the 2 depts form different populations from which the samples are drawn

### TTest-2SI
2 Different classes - Samples are independent
We collect heights of 20 students out of 60 from each class
-  Ho : mean(class1) = mean(class2)
-  Ha : mean(class1) != mean(class2)


In [17]:
class1 = np.array([14, 15, 15, 16, 13, 8, 14,17, 16, 14, 19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
class2 = np.array([15, 17, 14, 17, 14, 8, 12,19, 19, 14, 17, 22, 24, 16, 13, 16, 13, 18, 15, 13])
print(len(class1), len(class2))
print(np.var(class1).round(2), np.var(class2).round(2)) #variance
print(np.mean(class1).round(2), np.mean(class2).round(2))  #mean

20 20
7.73 12.26
15.15 15.8


In [18]:
# Perform the two sample t-test with equal variances
alpha=.05
t_stats, p_value = stats.ttest_ind(a=class1, b=class2, equal_var=True)
print(p_value.round(2), ' : is pvalue < a level for rejecting Ho : ', p_value < alpha)
#Cannot reject the null hypothesis of the test.
#We do not have sufficient evidence to say that the mean height of students differs between the two classes.

0.53  : is pvalue < a level for rejecting Ho :  False
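
The call above pools the two variances (`equal_var=True`), although the printed sample variances differ (7.73 vs 12.26). A hedged alternative is Welch's t-test, which `ttest_ind` runs when `equal_var=False`; the sketch below compares the two on the same class data.

```python
import numpy as np
from scipy import stats

class1 = np.array([14, 15, 15, 16, 13, 8, 14, 17, 16, 14,
                   19, 20, 21, 15, 15, 16, 16, 13, 14, 12])
class2 = np.array([15, 17, 14, 17, 14, 8, 12, 19, 19, 14,
                   17, 22, 24, 16, 13, 16, 13, 18, 15, 13])

# Student's t-test: assumes both groups share one variance.
pooled = stats.ttest_ind(class1, class2, equal_var=True)

# Welch's t-test: no equal-variance assumption, adjusted degrees of freedom.
welch = stats.ttest_ind(class1, class2, equal_var=False)

print('pooled:', round(pooled.statistic, 3), round(pooled.pvalue, 3))
print('welch :', round(welch.statistic, 3), round(welch.pvalue, 3))
```

With equal group sizes the two t statistics coincide and only the degrees of freedom differ, so the conclusion here is unlikely to change. Welch's version matters more when both sizes and variances are unequal.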


#### Eg TTest - 2S - Indep

In [19]:
import random
gender = random.choices(['M','F'], k=1000, weights=[.6, .4])
#pd.Series(gender).value_counts()
sal = np.random.randint(low=30000, high=200000, size=1000)
salaries = pd.DataFrame({'gender':gender, 'salary':sal})
salaries.head()

Unnamed: 0,gender,salary
0,M,70368
1,F,35884
2,M,70664
3,F,59589
4,F,49353


In [20]:
salaries.groupby('gender').agg(['count', np.mean]).reset_index().round()

Unnamed: 0_level_0,gender,salary,salary
Unnamed: 0_level_1,Unnamed: 1_level_1,count,mean
0,F,410,114807.0
1,M,590,110990.0


In [21]:
# 10 samples of each of male and female
mSal = salaries.loc[salaries['gender']=='M', 'salary'].sample(n=10).tolist()
fSal = salaries.loc[salaries['gender']=='F', 'salary'].sample(n=10).tolist()
genderSal = pd.DataFrame({'mSal':mSal, 'fSal':fSal})
genderSal

Unnamed: 0,mSal,fSal
0,86810,148932
1,168353,76887
2,189414,96843
3,125728,135463
4,122573,199411
5,157609,98025
6,123799,161529
7,107623,56775
8,146515,62441
9,69323,150549


In [22]:
genderSal.agg([np.mean, np.std, 'count', np.min, np.max]).round()

Unnamed: 0,mSal,fSal
mean,129775.0,118686.0
std,36811.0,47389.0
count,10.0,10.0
amin,69323.0,56775.0
amax,189414.0,199411.0


In [23]:
# H test
testName = ': T-Test 2-sample IND : '
from scipy.stats import ttest_ind  
alpha = .05
t_test,p_value = ttest_ind(genderSal.mSal, genderSal.fSal)
print(testName, t_test.round(2), p_value.round(2), p_value.round(4), alpha)

: T-Test 2-sample IND :  0.58 0.57 0.5662 0.05


In [24]:
testCheck(testName, p_value, alpha) 

: T-Test 2-sample IND :   : Accept Ho & Reject Ha : pv= 0.57 :: a=  0.05  T/F pv > a is  False


In [25]:
#eg - Part 2: R Style Data frame : statsmodels
from statsmodels.stats.weightstats import ttest_ind

In [26]:
t_stats, p_value, dof = ttest_ind(genderSal['mSal'], genderSal['fSal'])
print('T_Stats :', t_stats.round(2), ' P Value : ', p_value.round(2), ' Degrees of Freedom ', dof)

T_Stats : 0.58  P Value :  0.57  Degrees of Freedom  18.0


In [27]:
testCheck(testName, p_value, alpha)
#mean salary of the male sample is not statistically different from that of the female sample

: T-Test 2-sample IND :   : Accept Ho & Reject Ha : pv= 0.57 :: a=  0.05  T/F pv > a is  False


In [28]:
#end

## t-test - 2 Sample (Dependent) / Paired
2 tails  : Ho : U1 = U2 ; Ha : U1 != U2 
1 tail   : Ho : U1 >= U2 ; Ha : U1 < U2  (or Ho : U1 <= U2 ; Ha : U1 > U2)
<br> Same set of people (Paired/Dependent); measure whether there was any change.
<br> Medicine is sent to market only if trials show a positive change (a 1-tail question)
egs
-  Persons BP before and after medications
-  Performance of staff before and after training
-  Performance of staff during covid and after covid
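
A paired t-test is a one-sample t-test on the per-person differences. The sketch below illustrates this on before/after training scores, and also runs a one-tailed version with `alternative='less'` (this keyword assumes SciPy 1.6 or newer).

```python
import numpy as np
from scipy import stats

before = np.array([23, 20, 19, 21, 18, 20, 18, 17, 23, 16, 19])
after = np.array([24, 19, 22, 18, 20, 22, 20, 20, 23, 20, 18])

# Paired test on the two measurements of the same people.
paired = stats.ttest_rel(before, after)

# Same test expressed as a one-sample test of the differences against 0.
on_diffs = stats.ttest_1samp(before - after, 0)

# One-tailed: Ha : mean(before) < mean(after), i.e. scores improved.
one_sided = stats.ttest_rel(before, after, alternative='less')

print(round(paired.statistic, 2), round(paired.pvalue, 2))
print(round(on_diffs.statistic, 2), round(on_diffs.pvalue, 2))
print(round(one_sided.pvalue, 2))
```

Because the t statistic is negative, the one-tailed p-value in the 'less' direction is half the two-tailed one.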

In [29]:
alpha = 0.05
beforeTrg =[23, 20, 19, 21, 18, 20, 18, 17, 23, 16, 19] #d1
afterTrg = [24, 19, 22, 18, 20, 22, 20, 20, 23, 20, 18] #d2

In [30]:
print(np.mean(beforeTrg).round(2))
print(np.mean(afterTrg).round(2))
print(len(beforeTrg))
#some improvement, but statistically check the data

19.45
20.55
11


In [31]:
testName = 'T-Test 2-sample Paired/Dependent '
t_value,p_value=stats.ttest_rel(beforeTrg, afterTrg)

In [32]:
print(testName, 'T-Stats = ', t_value.round(2) , ' : P-Value = ', p_value.round(2), ' : Significance Level alpha = ', alpha)

T-Test 2-sample Paired/Dependent  T-Stats =  -1.71  : P-Value =  0.12  : Significance Level alpha =  0.05


In [33]:
testCheck(testName, p_value, alpha) 

T-Test 2-sample Paired/Dependent   : Accept Ho & Reject Ha : pv= 0.12 :: a=  0.05  T/F pv > a is  False


In [34]:
#p-value (0.12) > alpha (0.05) : we fail to reject Ho
#the data do not show a statistically significant change in performance after the training
#this is not proof that the training had no effect; a sample of 11 may simply be too small to detect it

## Anova 
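1 way F-test : compares the means of 3 or more groups in a single test (with 2 groups it is equivalent to the equal-variance t-test)
<br> Ho : all group means are equal ; Ha : at least one group mean differs
<br> The test shows whether some difference exists, not which groups differ; pairwise (post-hoc) comparisons are needed for that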
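
The F statistic is the ratio of between-group to within-group variability. The sketch below computes it by hand on three small hypothetical groups (made-up numbers, for illustration only) and checks the result against `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for three groups (illustration only).
groups = [np.array([4.0, 5.0, 6.0, 5.0]),
          np.array([6.0, 7.0, 8.0, 7.0]),
          np.array([9.0, 10.0, 9.0, 8.0])]

k = len(groups)                        # number of groups
n_total = sum(g.size for g in groups)  # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: how far group means sit from the grand mean.
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
print(round(f_manual, 3), round(p_manual, 5))
print(np.isclose(f_manual, f_scipy), np.isclose(p_manual, p_scipy))
```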

In [35]:
#data
np.set_printoptions(precision=4)
data1 = [0.0842, 0.0368, 0.0847, 0.0935, 0.0376, 0.0963, 0.0684, 0.0758, 0.0854, 0.0855]  
data2 = [0.0785, 0.0845, 0.0758, 0.0853, 0.0946, 0.0785, 0.0853, 0.0685]  
data3 = [0.0864, 0.2522, 0.0894, 0.2724, 0.0853, 0.1367, 0.853] 
print('\n Data', data1, '\n',data2, '\n', data3)
print("\n Length :",  len(data1),  len(data2),  len(data3), sep='\t')
print("\n Mean : ", np.mean(data1).round(2), np.mean(data2).round(2), np.mean(data3).round(2), sep='\t')
print("\n SD : ", np.std(data1), np.std(data2), np.std(data3), sep='\t' )


 Data [0.0842, 0.0368, 0.0847, 0.0935, 0.0376, 0.0963, 0.0684, 0.0758, 0.0854, 0.0855] 
 [0.0785, 0.0845, 0.0758, 0.0853, 0.0946, 0.0785, 0.0853, 0.0685] 
 [0.0864, 0.2522, 0.0894, 0.2724, 0.0853, 0.1367, 0.853]

 Length :	10	8	7

 Mean : 	0.07	0.08	0.25

 SD : 	0.020235997627989583	0.0073216715987539345	0.2553831111671008


In [36]:
import scipy.stats 
testName = 'Anova (F-Test) : more than 2 group'
f_test, p_value = scipy.stats.f_oneway(data1, data2, data3)  

In [37]:
print(testName, f_test.round(2), p_value.round(2), p_value.round(4), alpha)

Anova (F-Test) : more than 2 group 3.72 0.04 0.0404 0.05


In [38]:
testCheck(testName, p_value, alpha) 
#At least one group mean differs from the others; the test does not say which one

Anova (F-Test) : more than 2 group  : Reject Ho in favour of Ha : pv= 0.04 :: a=  0.05  T/F pv < a is  True


### eg Blood Pressure : Before and After ----
https://www.pythonfordatascience.org/independent-samples-t-test-python/#ttest_scipy_stats

In [39]:
import researchpy as rp

ModuleNotFoundError: No module named 'researchpy'

In [None]:
df = pd.read_csv("https://raw.githubusercontent.com/researchpy/Data-sets/master/blood_pressure.csv")
df.info()

In [None]:
df.head()

In [None]:
rp.ttest(group1= df['bp_after'][df['sex'] == 'Male'], group1_name= "Male",   group2= df['bp_after'][df['sex'] == 'Female'], group2_name= "Female")


In [None]:
summary, results = rp.ttest(group1= df['bp_after'][df['sex'] == 'Male'], group1_name= "Male",
                            group2= df['bp_after'][df['sex'] == 'Female'], group2_name= "Female")
print(summary)

In [None]:
print(results)

In [None]:
import scipy.stats as stats
stats.ttest_ind(df['bp_after'][df['sex'] == 'Male'],  df['bp_after'][df['sex'] == 'Female'])

In [None]:
#There is a statistically significant difference in the average post-treatment blood pressure 
#between males and females, t= 3.3480, p= 0.001.

## Chi-Square
- This test is used when two categorical variables are measured on the same population. Its purpose is to decide whether the two variables are significantly associated (chi-square test of independence).
<br> eg
- Employees giving feedback on org culture : Gender(M/F) - Feedback Rating(Good/Satisfactory/Poor). Does gender have an effect on ratings?
- Needs a large sample size and 2 nominal/categorical variables
- Types 
    -  Independence : whether feedback type is related to gender type
    -  Goodness of fit : whether the observed feedback counts fit the proportions we expect
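
For the independence test, the expected count of each cell under Ho is row total × column total / grand total. A minimal sketch on a 2×3 gender-by-feedback table follows; it computes the statistic by hand and compares it with `chi2_contingency` (variable names are illustrative).

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Rows: F, M. Columns: Good, Poor, Satisfactory.
observed = np.array([[149, 98, 156],
                     [251, 140, 206]])

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)

# Expected counts if gender and feedback were independent.
expected = row_totals * col_totals / observed.sum()

chi2_stat = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_manual = chi2.sf(chi2_stat, dof)

stat, p, dof_scipy, expected_scipy = chi2_contingency(observed)
print(round(chi2_stat, 2), dof, round(p_manual, 4))
print(np.isclose(chi2_stat, stat), np.isclose(p_manual, p))
```

`chi2_contingency` applies Yates' continuity correction only when there is a single degree of freedom (a 2×2 table), so the hand calculation matches it here.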

In [None]:
from scipy.stats import chi2_contingency  
testName = 'Chi Square Test for Independence'

Ho : Gender and Feedback are independent (no association)
H1 : Gender and Feedback are associated : Males & Females show different patterns in giving feedback

In [40]:
#create data
random.seed(100)
gender = random.choices(['M','F'], k=1000, weights=[.6, .4])
random.seed(115)
feedback = random.choices(['Good','Satisfactory','Poor'], k=1000, weights=[.4, .3,.2])
empFeedback = pd.DataFrame({'gender':gender, 'feedback':feedback})
empFeedback.head()

Unnamed: 0,gender,feedback
0,M,Poor
1,M,Good
2,F,Good
3,F,Good
4,F,Satisfactory


In [41]:
empF1 = pd.crosstab(empFeedback['gender'],empFeedback['feedback'])
empF1

feedback,Good,Poor,Satisfactory
gender,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
F,149,98,156
M,251,140,206


In [42]:
empF1.sum(axis=1)
#data = [[149, 98, 156],[251, 140, 206]]

gender
F    403
M    597
dtype: int64

In [43]:
empF1.div(empF1.sum(axis=1), axis=0).round(2)

feedback,Good,Poor,Satisfactory
gender,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
F,0.37,0.24,0.39
M,0.42,0.23,0.35


In [44]:
#data = [[231, 256, 321], [245, 312, 213]]  
data = [[149, 98, 156],[251, 140, 206]]
print(np.array(data))

[[149  98 156]
 [251 140 206]]


In [45]:
from scipy.stats import chi2_contingency  # needed before calling chi2_contingency
test, p_value, dof, expected_val = chi2_contingency(data)  
print(testName, test.round(2), p_value.round(2), p_value.round(4), alpha)

In [46]:
test, p_value, dof, expected_val = chi2_contingency(empF1)  
print(testName, test.round(2), p_value.round(2), p_value.round(4), alpha)

In [47]:
testCheck(testName, p_value, alpha) 
#if pvalue < .05, there is a significant association between gender and feedback type; otherwise there is no evidence against independence
#here pvalue (about 0.25) > .05, hence we fail to reject Ho : no evidence that gender and feedback are related


In [48]:
data = [[207, 282, 241], [234, 242, 232]]
stat, p, dof, expected = chi2_contingency(data)
print(p) #  Ho - indep


In [360]:
data = [[149, 98, 156],[251, 140, 206]]
stat, p, dof, expected = chi2_contingency(data)
print(p) # Ho - indep

0.24695302988409282


In [363]:
M=[200,150,50]
F=[250,300,50]
cn=["BJP","Congress", "Independent"]
data = pd.DataFrame({'BJP':[200, 250], 'Congress':[150, 300], 'Independent':[50, 50]}, index=['M','F'])
print(data)
stat, p, dof, expected = chi2_contingency(data)
print(p) # Reject Ho, Accept Ha : gender-Party dependent

   BJP  Congress  Independent
M  200       150           50
F  250       300           50
0.0003029775487145488


In [364]:
data.div(data.sum(axis=1), axis=0).round(2)

Unnamed: 0,BJP,Congress,Independent
M,0.5,0.38,0.12
F,0.42,0.5,0.08


### Goodness of Fit
-  https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.goodness_of_fit.html
-  res = stats.goodness_of_fit(stats.rayleigh, x, statistic='cvm',  known_params={'loc': 0}, random_state=rng)
-  res.fit_result , res.statistic, res.pvalue
-  chi_square_test_statistic, p_value = stats.chisquare(observed_data, expected_data)
-  Questions
    -  Equal proportions of M & F employees
    -  Equal proportion of ...
    -  More Good Feedback and very less Poor feedback
-  https://www.scribbr.com/statistics/chi-square-goodness-of-fit/
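
As a sketch of the first question above, the counts of 403 female and 597 male employees from the earlier crosstab can be tested against two different expectations with `stats.chisquare`. The 40/60 split is an assumption taken from the sampling weights used to generate that data.

```python
from scipy import stats

observed_counts = [403, 597]  # F, M

# Ho : equal proportions. With no f_exp, equal expected frequencies are assumed.
equal_split = stats.chisquare(observed_counts)

# Ho : 40% F and 60% M. Expected counts must sum to the observed total (1000).
weighted_split = stats.chisquare(observed_counts, f_exp=[400, 600])

print(round(equal_split.statistic, 3), equal_split.pvalue)
print(round(weighted_split.statistic, 4), round(weighted_split.pvalue, 3))
```

The equal-proportion hypothesis is rejected, while the 40/60 hypothesis is not, which is what one would expect given how the gender column was generated.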

In [381]:
from scipy import stats

In [382]:
# no of hours a staff works in day k of a week vs expected no of hours
observed_data = [8, 6, 10, 7, 8, 11, 9] 
expected_data = [9, 8, 11, 8, 10, 7, 6]
print(pd.Series(expected_data)/sum(expected_data))

0    0.152542
1    0.135593
2    0.186441
3    0.135593
4    0.169492
5    0.118644
6    0.101695
dtype: float64


In [383]:
testName = 'Chi Sqs Goodness of Fit '
#Ho : the observed counts follow the expected distribution
#a high p-value means the observed values are close to the expected ones, so Ho is not rejected
chi_square_test_statistic, p_value = stats.chisquare( observed_data, expected_data)

In [372]:
# chi square test statistic and p value
print('chi_square_test_statistic is : ' +  str(chi_square_test_statistic.round(2)))
print('p_value : ' + str(p_value.round(2)))  
# find Chi-Square critical value
print(stats.chi2.ppf(1-0.05, df=6))

chi_square_test_statistic is : 5.01
p_value : 0.54
12.591587243743977


In [373]:
print(testName, chi_square_test_statistic.round(2), p_value.round(2), p_value.round(4), alpha)

Chi Sqs Goodness of Fit  5.01 0.54 0.5422 0.05


In [374]:
testCheck(testName, p_value, alpha) 

Chi Sqs Goodness of Fit   : Accept Ho & Reject Ha : pv= 0.54 :: a=  0.05  T/F pv > a is  False


## F-test Variance

## Links
-  https://www.datacamp.com/tutorial/an-introduction-to-python-t-tests
-  https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ttest_ind.html
