#### Hypothesis Testing
- A statistical hypothesis is a hypothesis that is testable on the basis of observed data modelled as the realised values taken by a collection of random variables
- Hypothesis testing is a form of statistical inference that uses data from a sample to draw conclusions about a population parameter or a population probability distribution
- A hypothesis test evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data
- Null hypothesis or H0: a tentative assumption made about the parameter or distribution
- Alternative hypothesis or H1: an assumption that is the opposite of what is stated in the null hypothesis
- The hypothesis-testing procedure involves using sample data to determine whether or not H0 can be rejected

![alt text](https://miro.medium.com/max/875/1*fEPOHXPQO_ZNJC4UQDXmqw.png "H0 vs H1")

##### Level of significance
- The significance level (α) is the threshold for deciding whether to reject the null hypothesis: H0 is rejected when the p-value is below α
- α is also the probability of rejecting H0 when it is actually true (a Type I error); since no test can be 100% accurate, we choose an acceptable error rate, most commonly α = 0.05 (5%)
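The sketch below makes the significance level concrete. It simulates many one-sample t-tests in a setting where H0 is true and counts how often H0 is (wrongly) rejected, and the share should come out close to α. The population (normal, mean 30, SD 5), the sample size of 20 and the helper name `type_i_error_rate` are illustrative assumptions. They are not part of the original notebook.

```python
import numpy as np
from scipy import stats


def type_i_error_rate(alpha=0.05, n=20, n_sims=2000, seed=0):
    """Share of simulated one-sample t-tests that reject a TRUE null hypothesis."""
    rng = np.random.default_rng(seed)
    # Each row is one sample of size n from N(30, 5^2), so H0: mu = 30 is true
    samples = rng.normal(loc=30, scale=5, size=(n_sims, n))
    # axis=1 runs a separate test on every row
    _, p_values = stats.ttest_1samp(samples, popmean=30, axis=1)
    return float(np.mean(p_values < alpha))


print("alpha = 0.05 ->", type_i_error_rate(alpha=0.05))
print("alpha = 0.01 ->", type_i_error_rate(alpha=0.01))
```

Lowering α reduces these false rejections, but it also makes real effects harder to detect.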

##### Parametric Tests 
- Parametric tests assume that the population follows a known distribution (usually the normal distribution). Alternatively, they assume that the sampling distribution of the test statistic can be approximated by a normal distribution, which the Central Limit Theorem justifies for sufficiently large samples
- Common parametric tests:
    1. T-test
    2. Z-test
    3. ANOVA

##### Non-Parametric Tests
- Non-parametric tests make no assumption about the parameters of the population being studied
- They do not rely on a fixed set of parameters or on any particular distribution, which is why they are also called `distribution-free tests`
- Common non-parametric tests:
    1. Chi-square test
    2. Mann-Whitney U-test
    3. Kruskal-Wallis H-test

#### T Tests
- A t-test is an inferential test used to determine two things: whether a sample mean differs significantly from a hypothesised value, or whether the means of two groups (independent or paired) differ significantly
- It is used when the data are approximately normally distributed and the population variance is unknown, so the variance has to be estimated from the sample

##### One sample t-test
- The One Sample t Test determines whether the sample mean is statistically different from a known or hypothesised population mean
- The One Sample t Test is a parametric test

In [1]:
# Example: Suppose you have a sample of age records and want to test whether the average age is 30

# H0: The population mean age is 30 years
# H1: The population mean age is not 30 years

In [2]:
# Reading the data

import pandas as pd
import numpy as np
age = pd.read_csv("Age_data_one_sample.csv")
age

|   | Age |
|---|---|
| 0 | 32 |
| 1 | 34 |
| 2 | 29 |
| 3 | 29 |
| 4 | 22 |
| 5 | 39 |
| 6 | 38 |
| 7 | 37 |
| 8 | 38 |
| 9 | 36 |


In [22]:
# Calculating the mean

mean_age = age["Age"].mean()
mean_age

31.0

In [23]:
std_age = age["Age"].std()
std_age

6.2634470725607025

In [10]:
import math

In [24]:
math.sqrt(len(age))

3.7416573867739413

In [25]:
t_stat = (mean_age - 30) / (std_age/math.sqrt(len(age)))
t_stat

0.5973799001456603

In [34]:
# Carrying out one sample t-test

from scipy.stats import ttest_1samp
tstat, pval = ttest_1samp(age["Age"], 30,alternative='two-sided')
print("p-values: ", pval)

if pval < 0.05:
    print(" we are rejecting null hypothesis; we accept the alternate hypothesis")
else:
    print("we fail to reject null hypothesis")

p-values:  0.5605155888171379
we fail to reject null hypothesis


In [4]:
tstat

0.5973799001456603
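The manual statistic above can be taken one step further. The two-sided p-value is the probability, under a t-distribution with n - 1 degrees of freedom, of obtaining a t value at least as extreme as the one observed. The sketch below uses hypothetical ages, not the contents of `Age_data_one_sample.csv`. It checks the manual result against `ttest_1samp`, including the one-sided `alternative='greater'` case.

```python
import numpy as np
from scipy import stats

# Hypothetical ages for illustration only
ages = np.array([28, 35, 31, 26, 33, 30, 37, 29, 32, 34])
mu0 = 30  # population mean under H0

n = len(ages)
sd = ages.std(ddof=1)  # sample SD (ddof=1), the same estimator pandas .std() uses
t_manual = (ages.mean() - mu0) / (sd / np.sqrt(n))
# Two-sided p-value: area in both tails of t(n - 1) beyond |t|
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(ages, mu0, alternative='two-sided')
# One-sided H1 (mean > mu0): only the upper tail counts
_, p_greater = stats.ttest_1samp(ages, mu0, alternative='greater')

print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy : t = {t_scipy:.4f}, p = {p_scipy:.4f}, one-sided p = {p_greater:.4f}")
```

The sample mean here is above 30 (t > 0), so the one-sided p-value is half the two-sided one.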

`scipy.stats.ttest_1samp(a, popmean, axis=0, nan_policy='propagate', alternative='two-sided')`

**Parameters:**
   - a: array_like
     
     Sample observation

   - popmean: float or array_like
     
     Expected value in null hypothesis; if array_like, then it must have the same shape as a excluding the axis dimension

   - axis: int or None, optional
     
     Axis along which to compute test; default is 0; if None, compute over the whole array a

   - nan_policy: {‘propagate’, ‘raise’, ‘omit’}, optional
     
     Defines how to handle when input contains nan. The following options are available (default is ‘propagate’):

     ‘propagate’: returns nan

     ‘raise’: throws an error

     ‘omit’: performs the calculations ignoring nan values

   - alternative: {‘two-sided’, ‘less’, ‘greater’}, optional
     
     Defines the alternative hypothesis. The following options are available (default is ‘two-sided’):

     ‘two-sided’

     ‘less’: one-sided

     ‘greater’: one-sided

In [5]:
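# Conclusion: p-value (0.56) > 0.05, so we fail to reject H0; the data are consistent with a population mean age of 30 years
# (failing to reject H0 does not prove that the mean is exactly 30)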

##### Two sample t-test
- Compares the means of two independent groups in order to determine whether there is statistical evidence that the associated population means are significantly different
- Parametric test

In [6]:
# Example: Analyze the age data for two independent samples and examine whether their population means differ
# H0: The two population means are equal
# H1: The two population means are not equal

In [17]:
# Reading the data

age_df = pd.read_csv("Age_data_two_sample.csv")
age_df

|   | Age_1 | Age_2 |
|---|---|---|
| 0 | 32 | 39 |
| 1 | 34 | 27 |
| 2 | 29 | 32 |
| 3 | 29 | 32 |
| 4 | 22 | 25 |
| 5 | 39 | 35 |
| 6 | 38 | 39 |
| 7 | 37 | 35 |
| 8 | 38 | 28 |
| 9 | 36 | 22 |


In [18]:
# Calculating the mean and std dev of both the samples

mean_age1 = age_df["Age_1"].mean()
mean_age2 = age_df["Age_2"].mean()
print("Mean of first age sample: ", mean_age1)
print("Mean of second age sample: ", mean_age2)

std_age1 = age_df["Age_1"].std()
std_age2 = age_df["Age_2"].std()
print("Std dev of first age sample: ", std_age1)
print("Std dev of second age sample: ", std_age2)

Mean of first age sample:  31.0
Mean of second age sample:  30.571428571428573
Std dev of first age sample:  6.2634470725607025
Std dev of second age sample:  5.774137107885414


In [20]:
# Two sample t-test

from scipy.stats import ttest_ind
tstat, pval = ttest_ind(age_df["Age_1"], age_df["Age_2"],alternative='two-sided')
print("p-value",pval)

if pval <0.05:
    print("we reject null hypothesis")
else:
    print("we fail to reject null hypothesis")

p-value 0.8521525600803064
we fail to reject null hypothesis


`scipy.stats.ttest_ind(a, b, axis=0, alternative='two-sided')`

**Parameters**
   - a, b: array_like
     
     The arrays must have the same shape, except in the dimension corresponding to axis (the first, by default).

   - axis: int or None, optional
     
     Axis along which to compute test. If None, compute over the whole arrays, a, and b.
   
   - alternative: {‘two-sided’, ‘less’, ‘greater’}, optional
     
     Defines the alternative hypothesis. The following options are available (default is ‘two-sided’):
     
     ‘two-sided’

     ‘less’: one-sided

     ‘greater’: one-sided
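By default (`equal_var=True`), `ttest_ind` performs Student's t-test, which pools the two sample variances. Passing `equal_var=False` runs Welch's t-test instead, which does not assume equal population variances. The sketch below reproduces both statistics by hand on hypothetical data. The helper names `student_t` and `welch_t` are illustrative.

```python
import numpy as np
from scipy import stats

# Hypothetical independent samples of different sizes
a = np.array([31, 35, 28, 30, 26, 38, 33, 36], dtype=float)
b = np.array([29, 27, 33, 30, 24, 35, 31, 28, 26, 32], dtype=float)


def student_t(x, y):
    """Two-sample t-statistic using the pooled variance (equal variances assumed)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var * (1 / nx + 1 / ny))


def welch_t(x, y):
    """Two-sample t-statistic with separate variance estimates (Welch)."""
    return (x.mean() - y.mean()) / np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))


t_student, p_student = stats.ttest_ind(a, b)               # equal_var=True by default
t_welch, p_welch = stats.ttest_ind(a, b, equal_var=False)  # Welch's t-test
print(f"Student: t = {t_student:.4f}, p = {p_student:.4f}")
print(f"Welch  : t = {t_welch:.4f}, p = {p_welch:.4f}")
```

Welch's version is usually the safer choice when the group variances or sample sizes differ noticeably.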

In [10]:
# Conclusion: p-value (0.85) > 0.05, so we fail to reject H0
# There is no evidence that the two population means differ

##### Paired sample t-test 
- The paired sample t-test is also called the dependent sample t-test
- It is a univariate test that checks for a significant difference between two related measurements, such as the same subjects measured before and after a treatment

In [11]:
# Example: You recorded the blood pressure of a group of people before and after a treatment is administered
# and want to test whether mean blood pressure is lower after the treatment

# H0: The mean difference (before - after) is zero
# H1: The mean difference (before - after) is greater than zero, i.e. blood pressure dropped
# (this one-sided H1 matches alternative='greater' in the test below)

In [3]:
# Reading the data

bp_df = pd.read_csv("blood_pressure.csv")
bp_df.head()

|   | patient | sex | agegrp | bp_before | bp_after |
|---|---|---|---|---|---|
| 0 | 1 | Male | 30-45 | 143 | 153 |
| 1 | 2 | Male | 30-45 | 163 | 170 |
| 2 | 3 | Male | 30-45 | 153 | 168 |
| 3 | 4 | Male | 30-45 | 153 | 142 |
| 4 | 5 | Male | 30-45 | 146 | 141 |


In [4]:
print(bp_df['bp_before'].mean())
print(bp_df['bp_after'].mean())

156.45
151.35833333333332


In [42]:
bp_df.shape

(120, 5)

In [46]:
# Paired sampled t-test

from scipy.stats import ttest_rel
ttest,pval = ttest_rel(bp_df['bp_before'], bp_df['bp_after'],alternative='greater')
print("p-value: ", pval)

if pval < 0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")

p-value:  0.0005648957322420411
reject null hypothesis


`scipy.stats.ttest_rel(a, b, axis=0, nan_policy='propagate', alternative='two-sided')`

**Parameters**
   - a, b: array_like
     
     The arrays must have the same shape.

   - axis: int or None, optional
     
     Axis along which to compute test. If None, compute over the whole arrays, a, and b.

   - nan_policy: {‘propagate’, ‘raise’, ‘omit’}, optional
     
     Defines how to handle when input contains nan. The following options are available (default is ‘propagate’):

     ‘propagate’: returns nan

     ‘raise’: throws an error

     ‘omit’: performs the calculations ignoring nan values

   - alternative: {‘two-sided’, ‘less’, ‘greater’}, optional
     
     Defines the alternative hypothesis. The following options are available (default is ‘two-sided’):

     ‘two-sided’

     ‘less’: one-sided

     ‘greater’: one-sided
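A paired t-test is simply a one-sample t-test on the within-subject differences, which is why pairing removes between-subject variation. The sketch below checks this equivalence on hypothetical before/after readings, not the `blood_pressure.csv` data. It uses the same one-sided `alternative='greater'` as the example above.

```python
import numpy as np
from scipy import stats

# Hypothetical readings for the same 8 patients
before = np.array([150, 162, 148, 155, 141, 158, 147, 160])
after = np.array([146, 158, 150, 149, 138, 151, 145, 152])

# Paired test, H1: mean(before - after) > 0
t_rel, p_rel = stats.ttest_rel(before, after, alternative='greater')
# The same test written as a one-sample t-test on the differences
t_diff, p_diff = stats.ttest_1samp(before - after, 0, alternative='greater')

print(f"ttest_rel  : t = {t_rel:.4f}, p = {p_rel:.5f}")
print(f"ttest_1samp: t = {t_diff:.4f}, p = {p_diff:.5f}")
```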

In [14]:
# Conclusion: p-value < 0.05, so we reject H0; mean blood pressure after the treatment is significantly lower than before

#### Z-Test
- A parametric test used to determine whether means differ when the population variance is known and the sample size is large (typically n > 30)
- Assumptions of this test:
   1. Population distribution is normal
   2. Samples are random and independent
   3. The sample size is large
   4. Population standard deviation is known

**`When To Use Z-Test`**
- Your sample size is greater than 30; otherwise use a t test
- Data points should be independent from each other
- Your data should be normally distributed; however, for large sample sizes (over 30) this doesn’t always matter
- Your data should be randomly selected from a population, where each item has an equal chance of being selected
- Sample sizes should be equal if at all possible

**One Sample Z-test: To compare a sample mean with that of the population mean**

In [15]:
# Example: Can the sample of blood pressure collected after the treatment be considered to come from
# a population with mean bp 156?

# H0: The bp_after sample comes from a population with mean 156
# H1: The bp_after sample does not come from a population with mean 156

In [5]:
# One sample z test

from statsmodels.stats import weightstats as stests
ztest_stat, pval = stests.ztest(bp_df['bp_after'], x2=None, value=156)
print("p-value: ", float(pval))

if pval < 0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")

p-value:  0.00033524859139319116
reject null hypothesis


`stests.ztest(x1, x2=None, value=0, alternative='two-sided', usevar='pooled', ddof=1.0)`

**Parameters**
   - x1: array_like, 1-D or 2-D
     first of the two independent samples

   - x2: array_like, 1-D or 2-D
     second of the two independent samples; pass None for a one-sample test

   - value: float
     In the one sample case, value is the mean of x1 under the Null hypothesis.
     
     In the two sample case, value is the difference between mean of x1 and
     mean of x2 under the Null hypothesis; the test statistic is `x1_mean - x2_mean - value`

   - alternative: str
     The alternative hypothesis, H1, has to be one of the following:

     'two-sided'- H1: difference in means not equal to value (default)
     
     'larger' -   H1: difference in means larger than value
     
     'smaller' -  H1: difference in means smaller than value
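As far as we know, `ztest` estimates the standard deviation from the sample (its `ddof` argument defaults to 1). That makes it a large-sample approximation rather than a test with a truly known σ. When the population standard deviation really is known, as the z-test assumptions above require, the statistic z = (x̄ - μ0) / (σ / √n) can be computed directly. The sketch below does this for one sample. The function name `one_sample_z_test`, the 'larger'/'smaller' labels (borrowed from `ztest`) and the simulated data with σ = 11 are assumptions for illustration.

```python
import numpy as np
from scipy import stats


def one_sample_z_test(x, mu0, sigma, alternative='two-sided'):
    """One-sample z-test assuming the population SD `sigma` is known."""
    x = np.asarray(x, dtype=float)
    z = (x.mean() - mu0) / (sigma / np.sqrt(len(x)))
    if alternative == 'two-sided':
        p = 2 * stats.norm.sf(abs(z))  # both tails of N(0, 1)
    elif alternative == 'larger':
        p = stats.norm.sf(z)           # upper tail
    elif alternative == 'smaller':
        p = stats.norm.cdf(z)          # lower tail
    else:
        raise ValueError("alternative must be 'two-sided', 'larger' or 'smaller'")
    return z, p


# Simulated (hypothetical) readings from a population with known sigma = 11
rng = np.random.default_rng(1)
readings = rng.normal(loc=152, scale=11, size=40)

z, p = one_sample_z_test(readings, mu0=156, sigma=11)
print(f"z = {z:.3f}, two-sided p = {p:.5f}")
```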


In [17]:
# Conclusion: p-value < 0.05, so we reject H0; the mean of the post-treatment blood pressure sample (about 151.4) differs significantly from 156

**Two Sample Z-test: To compare the means of two different samples**

In [18]:
# Example: Examine whether the bp_before and after samples have different means or not

# H0: Mean difference between two groups is 0
# H1: Mean difference between two groups is not 0

In [19]:
ztest ,pval = stests.ztest(bp_df['bp_before'], bp_df['bp_after'], value=0, alternative='two-sided')
print("p-value: ", float(pval))

if pval < 0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")

p-value:  0.002162306611369422
reject null hypothesis


In [20]:
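# Conclusion: p-value < 0.05, so we reject H0; mean blood pressure differs before and after the treatment
# Note: the before/after readings come from the same patients, so the paired t-test above is the more appropriate test; this example only illustrates the two-sample z-test API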

#### ANOVA 
- Analysis of variance (ANOVA) is a parametric hypothesis test
- It extends the t-test and z-test
- It tests whether the mean values of more than two groups differ significantly
- It uses an F-test to check whether the group means are equal, comparing the variance between the group means with the variance within the groups
- Assumptions of ANOVA:
    1. The response in each group is normally distributed
    2. Samples are random and independent
    3. Homogeneity of variance: all groups have the same population variance
- F-statistic = variance between the group means / variance within the groups

![alt text](https://miro.medium.com/max/800/1*SV4xlXEFCKzgT4PbB6nlgA.png "F-statistic")

- Unlike the Z and T distributions, the F-distribution takes no negative values. The between-group and within-group variabilities are sums of squared deviations, so they are never negative
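The F-statistic described above can be written out explicitly. Suppose there are $k$ groups and $N$ observations in total. Group $j$ has $n_j$ observations $x_{ij}$ and group mean $\bar{x}_j$, and $\bar{x}$ is the grand mean. Then:

$$
SS_B = \sum_{j=1}^{k} n_j\,(\bar{x}_j - \bar{x})^2,
\qquad
SS_W = \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2
$$

$$
F = \frac{MS_B}{MS_W} = \frac{SS_B / (k - 1)}{SS_W / (N - k)}
$$

Under H0, $F$ follows an F-distribution with $(k - 1, N - k)$ degrees of freedom. The one-way ANOVA example in this section provides a check. It has $k = 3$ dose groups and $N = 15$ observations, and its sums of squares are $SS_B = 20.133$ and $SS_W = 23.6$. These give

$$
F = \frac{20.133 / 2}{23.6 / 12} = \frac{10.067}{1.967} \approx 5.12
$$

which matches the F value reported by `f_oneway` and `anova_lm`.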

**One-way ANOVA: tests whether the means of two or more groups, defined by a single factor, are equal, using the F-statistic**

In [21]:
import pandas as pd
import numpy as np
import researchpy as rp
# pip install researchpy

###### Researchpy 
- Researchpy produces Pandas DataFrames that contain the statistical testing information commonly required for academic research
- The information is returned as Pandas DataFrames to make for quick and easy exporting of results to any format/method that works with the traditional Pandas DataFrame
- Researchpy is essentially a wrapper that combines various established packages such as pandas, scipy.stats, numpy, and statsmodels to get all the standard required information in one method

In [22]:
# Example: A new medication was developed to increase the immunity of those who take the medication
# The purpose of this study was to test for a difference between the dosage levels

# H0: Mean immunity is the same in all dose groups
# H1: Mean immunity differs in at least one dose group

In [23]:
one_way_df = pd.read_csv('one-way.txt')
one_way_df.head()

|   | person | dose | immunity |
|---|---|---|---|
| 0 | 1 | 1 | 3 |
| 1 | 2 | 1 | 2 |
| 2 | 3 | 1 | 1 |
| 3 | 4 | 1 | 1 |
| 4 | 5 | 1 | 4 |


In [24]:
one_way_df.drop('person', axis= 1, inplace= True)

In [25]:
# Recoding value from numeric to string

one_way_df['dose'] = one_way_df['dose'].replace({1: 'placebo', 2: 'low', 3: 'high'})
one_way_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15 entries, 0 to 14
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   dose      15 non-null     object
 1   immunity  15 non-null     int64 
dtypes: int64(1), object(1)
memory usage: 368.0+ bytes


In [26]:
# Analyzing mean immunity across various dosage groups
one_way_df.groupby('dose').agg({'immunity': 'mean'})

| dose | immunity |
|---|---|
| high | 5.0 |
| low | 3.2 |
| placebo | 2.2 |


In [27]:
rp.summary_cont(one_way_df['immunity'].groupby(one_way_df['dose']))

| dose | N | Mean | SD | SE | 95% Conf. | Interval |
|---|---|---|---|---|---|---|
| high | 5 | 5.0 | 1.5811 | 0.7071 | 3.0368 | 6.9632 |
| low | 5 | 3.2 | 1.3038 | 0.5831 | 1.5811 | 4.8189 |
| placebo | 5 | 2.2 | 1.3038 | 0.5831 | 0.5811 | 3.8189 |


`summary_cont()` returns a Pandas DataFrame with, for each group, the number of non-missing observations (N), the mean, the standard deviation (SD), the standard error (SE), and the lower and upper bounds of the 95% confidence interval

In [28]:
# ONE-WAY ANOVA USING SCIPY.STATS

import scipy.stats as stats

stats.f_oneway(one_way_df['immunity'][one_way_df['dose'] == 'high'],
               one_way_df['immunity'][one_way_df['dose'] == 'low'],
               one_way_df['immunity'][one_way_df['dose'] == 'placebo'])

F_onewayResult(statistic=5.11864406779661, pvalue=0.024694289538222603)

`scipy.stats.f_oneway(*args, axis=0)`

**Parameters**
   - sample1, sample2, …: array_like
     
     The sample measurements for each group. There must be at least two arguments. If the arrays are multidimensional, then all the dimensions of the array must be the same except for axis.

   - axis: int, optional
     
     Axis of the input arrays along which the test is applied. Default is 0.
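The sketch below computes the one-way ANOVA F-statistic directly from the between-group and within-group sums of squares and compares it with `f_oneway`. The immunity scores are hypothetical and do not come from the `one-way.txt` data. `one_way_f` is an illustrative helper.

```python
import numpy as np
from scipy import stats


def one_way_f(*groups):
    """One-way ANOVA F-statistic and p-value computed from raw group data."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    pooled = np.concatenate(groups)
    grand_mean = pooled.mean()
    k, n_total = len(groups), len(pooled)
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    p_value = stats.f.sf(f_stat, k - 1, n_total - k)  # upper tail of F(k-1, N-k)
    return f_stat, p_value


# Hypothetical immunity scores for three dose groups
placebo = [2, 3, 1, 2, 3]
low = [3, 4, 2, 4, 3]
high = [5, 6, 4, 5, 7]

print("manual  :", one_way_f(placebo, low, high))
print("f_oneway:", stats.f_oneway(placebo, low, high))
```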

##### Inference
- The overall average immunity was 3.5, 95% CI (2.5, 4.4). The group averages were 2.2, 95% CI (0.58, 3.82) for the placebo group; 3.2, 95% CI (1.58, 4.82) for the low dose group; and 5.0, 95% CI (3.04, 6.96) for the high dose group
- There is a statistically significant difference in mean immunity between the groups, F(2, 12) = 5.12, p-value = 0.0247 < 0.05; thereby H0 is rejected in favour of H1
- ANOVA only shows that at least one group mean differs; it does not say which groups differ

##### ONE-WAY ANOVA USING STATSMODELS

- statsmodels is a Python module that provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration
- An extensive list of result statistics is available for each estimator. The results are tested against existing statistical packages to ensure that they are correct
- `statsmodels.formula.api.ols(formula, data, subset=None, drop_cols=None)`: creates a model from a formula and a dataframe

In [29]:
import statsmodels.api as sm
from statsmodels.formula.api import ols

model = ols('immunity ~ C(dose)', data=one_way_df).fit()
aov_table = sm.stats.anova_lm(model, typ=2)
aov_table

|   | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| C(dose) | 20.133333 | 2.0 | 5.118644 | 0.024694 |
| Residual | 23.6 | 12.0 |  |  |


`statsmodels.stats.anova.anova_lm(*args, **kwargs)`

**Parameters**
   - args: fitted linear model results instance
     
     One or more fitted linear models
     
   - test: str {“F”, “Chisq”, “Cp”} or None
     
     Test statistics to provide. Default is “F”.

   - typ: str or int {“I”, “II”, “III”} or {1, 2, 3}
     
     The type of ANOVA test to perform, i.e. which type of sums of squares to use

**Two-Way ANOVA: used when there are two categorical independent variables (factors); it tests the effect of each factor and of their interaction**

In [30]:
# Example: In the ToothGrowth data, examine whether the supplement type (supp) and the dose have any impact on tooth growth (len),
# independently as well as together (interaction)

In [31]:
# Reading the data
tooth_df = pd.read_csv('ToothGrowth.csv')
tooth_df.head()

|   | Unnamed: 0 | len | supp | dose |
|---|---|---|---|---|
| 0 | 1 | 4.2 | VC | 0.5 |
| 1 | 2 | 11.5 | VC | 0.5 |
| 2 | 3 | 7.3 | VC | 0.5 |
| 3 | 4 | 5.8 | VC | 0.5 |
| 4 | 5 | 6.4 | VC | 0.5 |


In [32]:
# Two-way ANOVA testing

formula = 'len ~ C(supp) + C(dose) + C(supp):C(dose)'
model = ols(formula, tooth_df).fit()
aov_table = sm.stats.anova_lm(model, typ=2)
aov_table

|   | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| C(supp) | 205.35 | 1.0 | 15.571979 | 0.0002311828 |
| C(dose) | 2426.434333 | 2.0 | 91.999965 | 4.046291e-18 |
| C(supp):C(dose) | 108.319 | 2.0 | 4.106991 | 0.02186027 |
| Residual | 712.106 | 54.0 |  |  |


**Conclusion:** supplement type (p ≈ 0.0002), dose (p ≈ 4.0e-18) and their interaction (p ≈ 0.022) all have a statistically significant effect on tooth length at the 5% significance level

#### Chi-Square Test 
- A non-parametric test 
- Helps to test independence between two categorical variables
- It makes a comparison between the expected frequencies and the observed frequencies
- The test is applied when you have two categorical variables from a single population
- Contingency Table: A contingency table (also called crosstab) is used in statistics to summarise the relationship between several categorical variables

In [33]:
# Example: Data is collected based on people's gender and their color preference
# Is there any association between the two variables in question?

# H0: Gender and color preference are independent of each other
# H1: Gender and color preference are associated (not independent)

In [34]:
# Creating dataframe of observed values for the variables in question

chi_df = pd.DataFrame([[48,22,33,47],[35,36,42,27]], index=["Male","Female"], columns=["Black","White","Red","Blue"])
chi_df.head()

|   | Black | White | Red | Blue |
|---|---|---|---|---|
| Male | 48 | 22 | 33 | 47 |
| Female | 35 | 36 | 42 | 27 |


In [35]:
# Calculating chi-square statistic

from scipy import stats
chi_stat,pval,dof,expected_values = stats.chi2_contingency(chi_df)

`chi2_contingency` computes the chi-square statistic, the p-value, the degrees of freedom and the expected frequencies for the test of independence of the variables in the observed contingency table
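The sketch below shows what `chi2_contingency` does under the hood, using the gender and colour table from this example. It rebuilds the expected counts (row total × column total / grand total) and then sums (observed - expected)² / expected over all cells. SciPy applies its continuity correction only when there is 1 degree of freedom. The manual values should therefore agree with the output shown in this section (statistic ≈ 11.57 with 3 degrees of freedom).

```python
import numpy as np
from scipy import stats

# Observed counts from the example (rows: Male, Female; columns: Black, White, Red, Blue)
observed = np.array([[48, 22, 33, 47],
                     [35, 36, 42, 27]])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, 4)
grand_total = observed.sum()

# Expected counts if gender and colour preference were independent
expected = row_totals * col_totals / grand_total
chi_stat_manual = ((observed - expected) ** 2 / expected).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_manual = stats.chi2.sf(chi_stat_manual, dof_manual)  # upper-tail probability

print(f"chi2 = {chi_stat_manual:.4f}, dof = {dof_manual}, p = {p_manual:.5f}")
```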

In [36]:
print('Chi-square statistic:',chi_stat)
print('p-value:',pval)
print('Degree of Freedom: ',dof)
print('Expected values: \n', expected_values)

Chi-square statistic: 11.56978992417547
p-value: 0.00901202511379703
Degree of Freedom:  3
Expected values: 
 [[42.93103448 30.         38.79310345 38.27586207]
 [40.06896552 28.         36.20689655 35.72413793]]


In [37]:
# Critical value

from scipy.stats import chi2
alpha = 0.05
critical_value = chi2.ppf(q=1-alpha, df=dof)
print('critical_value:', critical_value)

critical_value: 7.814727903251179


`scipy.stats.chi2.ppf(q, df, loc=0, scale=1)`: percent point function (inverse of the CDF, i.e. percentiles)

In [40]:
# Inference

if chi_stat >= critical_value:
    print("Reject H0, There is a relationship between gender and colour preference")
else:
    print("Fail to reject H0, There is no evidence of a relationship between gender and colour preference")
    
if pval <= alpha:
    print("Reject H0, There is a relationship between gender and colour preference")
else:
    print("Fail to reject H0, There is no evidence of a relationship between gender and colour preference")

Reject H0, There is a relationship between gender and colour preference
Reject H0, There is a relationship between gender and colour preference


#### Mann Whitney U Test
- The Mann Whitney U test, sometimes called the Mann Whitney Wilcoxon Test or the Wilcoxon Rank Sum Test, is used to test whether two samples are likely to derive from the same population 
- The null hypothesis is that there is no difference between the distributions of the data samples
- We can implement the Mann-Whitney U test in Python using the SciPy function `mannwhitneyu()`; the function takes the two data samples as arguments and returns the test statistic and the p-value
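The following is a minimal usage sketch on hypothetical scores for two independent groups. Passing `alternative` explicitly avoids depending on the default, which differed in older SciPy releases.

```python
from scipy.stats import mannwhitneyu

# Hypothetical scores for two independent groups
group_a = [12, 15, 11, 18, 14, 16, 13]
group_b = [19, 22, 17, 21, 24, 20, 18]

# H0: both samples come from the same distribution
u_stat, p_value = mannwhitneyu(group_a, group_b, alternative='two-sided')
print(f"U = {u_stat}, p-value = {p_value:.4f}")

if p_value < 0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")
```

Almost every value in `group_b` exceeds every value in `group_a`, so U is small and H0 is rejected at the 5% level.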

#### Kruskal-Wallis H test
- The Kruskal-Wallis H test (sometimes also called the "one-way ANOVA on ranks") is a rank-based nonparametric test that can be used to determine if there are statistically significant differences between two or more groups of an independent variable on a continuous or ordinal dependent variable
- It is considered the nonparametric alternative to the one-way ANOVA, and an extension of the Mann-Whitney U test to allow the comparison of more than two independent groups
- The default assumption or the null hypothesis is that all data samples were drawn from the same distribution; specifically, that the population medians of all groups are equal
- The Kruskal-Wallis H-test can be implemented in Python using the `kruskal()` SciPy function; it takes two or more data samples as arguments and returns the test statistic and p-value as the result
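The following is a minimal usage sketch on three hypothetical groups. Their values do not overlap at all, so the ranks are completely separated and the test should reject H0 clearly.

```python
from scipy.stats import kruskal

# Hypothetical measurements for three independent groups
group_1 = [2.1, 3.4, 1.9, 2.8, 3.0]
group_2 = [3.9, 4.2, 3.5, 4.8, 4.0]
group_3 = [5.5, 6.1, 5.0, 6.4, 5.8]

# H0: all groups come from the same distribution
h_stat, p_value = kruskal(group_1, group_2, group_3)
print(f"H = {h_stat:.2f}, p-value = {p_value:.4f}")  # H = 12.50 (no ties, ranks fully separated)
```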
