In [1]:
import pandas as pd
from scipy import stats
from statsmodels.stats import weightstats as stests

## One Sample T-test

One-sample t-test: the one-sample t-test determines whether a sample mean is statistically different from a known or hypothesised population mean. It is a parametric test, so it assumes the data are approximately normally distributed.

Example: you have a sample of ages and want to check whether the average age is 30. The Python code below runs this test.

In [4]:
from scipy.stats import ttest_1samp
import numpy as np
ages = np.genfromtxt('ages.csv')
print(ages)
ages_mean = np.mean(ages)
print(ages_mean)

tset, pval = ttest_1samp(ages, 30)
print('p-values', pval)
print('t-values', tset)
alpha = 0.05    # significance level (5%)
if pval < alpha:
    print("we are rejecting null hypothesis")
else:
    print("we have failed to reject null hypothesis")

[26. 43. 22. 33. 34. 37. 29. 30. 21. 27. 33. 40. 50. 39. 29. 27. 27. 28.
 40. 39. 25. 41. 21. 24. 24. 48. 22.]
31.814814814814813
p-values 0.26819660606308904
t-values 1.1314440755093675
we have failed to reject null hypothesis
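
The t statistic and p-value printed above can be reproduced from the textbook formula t = (sample mean - hypothesised mean) / (s / sqrt(n)), where s is the sample standard deviation. The sketch below is illustrative only: it copies the ages from the printed array instead of reading `ages.csv`, and the variable names are chosen for this example.

```python
import numpy as np
from scipy import stats

# Ages copied from the printed output of the cell above.
ages = np.array([26, 43, 22, 33, 34, 37, 29, 30, 21, 27, 33, 40, 50, 39,
                 29, 27, 27, 28, 40, 39, 25, 41, 21, 24, 24, 48, 22], dtype=float)
mu0 = 30.0  # hypothesised population mean

n = ages.size
sample_std = ages.std(ddof=1)             # sample standard deviation (n - 1 in the denominator)
standard_error = sample_std / np.sqrt(n)  # standard error of the mean
t_manual = (ages.mean() - mu0) / standard_error
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)  # two-sided p-value

t_scipy, p_scipy = stats.ttest_1samp(ages, mu0)
print("manual:", t_manual, p_manual)
print("scipy: ", t_scipy, p_scipy)
```

Both lines should print the same pair of numbers, which shows that `ttest_1samp` runs a two-sided test with n - 1 degrees of freedom.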


## Two-sample t-test

The independent samples t-test (2-sample t-test) compares the means of two independent groups to determine whether there is statistical evidence that the associated population means differ significantly. It is a parametric test, also known as the independent t-test.

Example: is there a difference between the mean of week1 and the mean of week2? (The Python code is given below.)

In [12]:
from scipy.stats import ttest_ind
import numpy as np
week1 = np.genfromtxt("week1.csv",  delimiter=",")
week2 = np.genfromtxt("week2.csv",  delimiter=",")
print(week1)
print("week2 data :-\n")
print(week2)
week1_mean = np.mean(week1)
week2_mean = np.mean(week2)
print("week1 mean value:",week1_mean)
print("week2 mean value:",week2_mean)
week1_std = np.std(week1)
week2_std = np.std(week2)
print("week1 std value:",week1_std)
print("week2 std value:",week2_std)
ttest, pval = ttest_ind(week1, week2)
print("p-value", pval)
if pval < 0.05:
    print("we reject null hypothesis")
else:
    print("we fail to reject null hypothesis")

[26. 43. 22. 33. 34. 37. 29. 30. 21. 27. 33. 40. 50. 39. 29. 27. 27. 28.
 40. 39. 25. 41. 21. 24. 24. 48. 22.]
week2 data :-

[23. 32. 42. 15. 26. 25. 39.  9. 29. 21. 16. 15. 31. 25. 22. 40.  5.  7.
 35. 50. 10. 14.  5. 18. 46. 20. 48.]
week1 mean value: 31.814814814814813
week2 mean value: 24.74074074074074
week1 std value: 8.178730486641179
week2 std value: 13.074126534361229
p-value 0.023213828798588056
we reject null hypothesis
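
The two standard deviations printed above (about 8.2 and 13.1) are noticeably different, while `ttest_ind` assumes equal population variances by default. As a possible cross-check, not part of the original analysis, Welch's t-test (`equal_var=False`) drops that assumption. The sketch below copies both samples from the printed arrays.

```python
import numpy as np
from scipy import stats

# Both samples copied from the printed output of the cell above.
week1 = np.array([26, 43, 22, 33, 34, 37, 29, 30, 21, 27, 33, 40, 50, 39,
                  29, 27, 27, 28, 40, 39, 25, 41, 21, 24, 24, 48, 22], dtype=float)
week2 = np.array([23, 32, 42, 15, 26, 25, 39, 9, 29, 21, 16, 15, 31, 25,
                  22, 40, 5, 7, 35, 50, 10, 14, 5, 18, 46, 20, 48], dtype=float)

# Default: Student's t-test with a pooled variance estimate.
pooled = stats.ttest_ind(week1, week2)
# Welch's t-test: does not assume equal variances.
welch = stats.ttest_ind(week1, week2, equal_var=False)

print("pooled: t =", pooled.statistic, "p =", pooled.pvalue)
print("welch:  t =", welch.statistic, "p =", welch.pvalue)
```

Because the two samples have the same size, both variants give the same t statistic; only the degrees of freedom, and therefore the p-value, differ.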


## Example 2

Paired sample t-test: the paired sample t-test is also called the dependent sample t-test. It is a univariate test for a significant difference between two related variables. An example is measuring the blood pressure of the same individuals before and after some treatment, condition, or time point.

H0: the mean difference between the two samples is 0

H1: the mean difference between the two samples is not 0

The code below runs this test.

In [5]:
import pandas as pd
from scipy import stats
df = pd.read_csv("blood_pressure.csv")
df[['bp_before','bp_after']].describe()
ttest,pval = stats.ttest_rel(df['bp_before'], df['bp_after'])
print(pval)
if pval < 0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")

0.0011297914644840823
reject null hypothesis
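
A paired t-test is equivalent to a one-sample t-test on the per-subject differences against 0, which is what H0 above states. The sketch below illustrates this with made-up data: the arrays are randomly generated stand-ins, not the contents of `blood_pressure.csv`.

```python
import numpy as np
from scipy import stats

# Hypothetical before/after measurements (NOT the values from blood_pressure.csv).
rng = np.random.default_rng(0)
bp_before = rng.normal(loc=156, scale=11, size=30)
bp_after = bp_before - rng.normal(loc=5, scale=8, size=30)

# Paired t-test on the two related samples.
paired = stats.ttest_rel(bp_before, bp_after)
# One-sample t-test on the differences, testing a mean difference of 0.
on_differences = stats.ttest_1samp(bp_before - bp_after, 0.0)

print("paired:         ", paired.statistic, paired.pvalue)
print("on differences: ", on_differences.statistic, on_differences.pvalue)
```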


In [15]:
import pandas as pd
from scipy import stats
from statsmodels.stats import weightstats as stests
# One-sample z-test: is the mean of bp_before equal to 156?
ztest, pval = stests.ztest(df['bp_before'], x2=None, value=156)
print(float(pval))
if pval < 0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")

0.6651614730255063
fail to reject null hypothesis
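
The cell above runs a one-sample z-test of whether the mean of `bp_before` equals 156. The statistic can be sketched by hand as z = (sample mean - hypothesised mean) / (s / sqrt(n)), with a p-value taken from the standard normal distribution. The data below are randomly generated stand-ins, and the sketch assumes the sample standard deviation is computed with `ddof=1`, which is the documented default of `ztest`.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements (NOT the values from blood_pressure.csv).
rng = np.random.default_rng(1)
bp_before = rng.normal(loc=156, scale=11, size=120)
hypothesised_mean = 156.0

# Standard error of the mean, using the sample standard deviation.
standard_error = bp_before.std(ddof=1) / np.sqrt(bp_before.size)
z = (bp_before.mean() - hypothesised_mean) / standard_error
p_value = 2 * stats.norm.sf(abs(z))  # two-sided p-value from the normal distribution

print("z =", z, "p =", p_value)
```

The statistic has the same form as the one-sample t statistic; the z-test only swaps the t distribution for the normal distribution, which is a reasonable approximation for large samples.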


## ANOVA F test

ANOVA (F-test): the t-test works well for two groups, but sometimes we want to compare more than two groups at the same time. For example, to test whether voter age differs by a categorical variable such as race, we have to compare the means of each level (group) of that variable. We could run a separate t-test for each pair of groups, but conducting many tests increases the chance of false positives. The analysis of variance (ANOVA) is a statistical inference test that compares multiple groups at the same time.

F = Between group variability / Within group variability
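
As an illustration of this ratio, the sketch below computes a one-way F statistic by hand and compares it with `scipy.stats.f_oneway`. The three groups are small made-up samples chosen only for this example.

```python
import numpy as np
from scipy import stats

# Three hypothetical groups (illustrative values only).
groups = [
    np.array([20.0, 22.0, 19.0, 24.0, 25.0]),
    np.array([28.0, 30.0, 27.0, 26.0, 29.0]),
    np.array([18.0, 17.0, 21.0, 20.0, 19.0]),
]
all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)            # number of groups
n_total = all_values.size  # total number of observations

# Between-group variability: spread of the group means around the grand mean.
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group variability: spread of the observations around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
print("manual:", f_manual, p_manual)
print("scipy: ", f_scipy, p_scipy)
```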

## Example 1

Two-way F-test: the two-way F-test is an extension of the one-way F-test. It is used when we have 2 independent variables and 2 or more groups. The two-way F-test does not tell us which variable is dominant; to check individual significance, post-hoc testing needs to be performed.

Now let’s take a look at the grand mean crop yield (the mean crop yield not split by any sub-group), the mean crop yield by each factor, and the mean crop yield by the factors grouped together.
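
One way these means could be computed is with pandas `groupby`. The sketch below uses the column names `Yield`, `Fert`, and `Water` from the model formula, but the rows are made-up values, not the contents of `crop_yield.csv`.

```python
import pandas as pd

# Hypothetical data with the same column names as the model formula
# (NOT the values from crop_yield.csv).
crops = pd.DataFrame({
    "Fert": ["A"] * 4 + ["B"] * 4,
    "Water": ["High", "High", "Low", "Low"] * 2,
    "Yield": [10.0, 12.0, 14.0, 16.0, 20.0, 22.0, 24.0, 26.0],
})

grand_mean = crops["Yield"].mean()                               # all observations
mean_by_fert = crops.groupby("Fert")["Yield"].mean()             # by fertilizer
mean_by_water = crops.groupby("Water")["Yield"].mean()           # by water level
mean_by_both = crops.groupby(["Fert", "Water"])["Yield"].mean()  # by both factors

print("grand mean:", grand_mean)
print(mean_by_fert)
print(mean_by_water)
print(mean_by_both)
```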

In [20]:
import statsmodels.api as sm
from statsmodels.formula.api import ols
df_anova2 = pd.read_csv("crop_yield.csv")
model = ols('Yield ~ C(Fert)*C(Water)', df_anova2).fit()
print(f"Overall model F({model.df_model: .0f},{model.df_resid: .0f}) = {model.fvalue: .3f}, p = {model.f_pvalue: .4f}")
res = sm.stats.anova_lm(model, typ= 2)
res

Overall model F( 3, 16) =  4.112, p =  0.0243


| | sum_sq | df | F | PR(>F) |
|---|---|---|---|---|
| C(Fert) | 69.192 | 1.0 | 5.766 | 0.028847 |
| C(Water) | 63.368 | 1.0 | 5.280667 | 0.035386 |
| C(Fert):C(Water) | 15.488 | 1.0 | 1.290667 | 0.272656 |
| Residual | 192.0 | 16.0 | | |
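
Each F value in this table is the term's mean square (sum_sq / df) divided by the residual mean square, and PR(>F) is the upper tail of the F distribution at that value. The sketch below recomputes both columns from the sums of squares and degrees of freedom shown in the table; the variable names are chosen for this example.

```python
from scipy import stats

# Sums of squares and degrees of freedom copied from the ANOVA table above.
sum_sq = {"C(Fert)": 69.192, "C(Water)": 63.368, "C(Fert):C(Water)": 15.488}
df_effect = 1
ss_resid, df_resid = 192.0, 16

ms_resid = ss_resid / df_resid  # residual mean square
results = {}
for term, ss in sum_sq.items():
    f_value = (ss / df_effect) / ms_resid
    p_value = stats.f.sf(f_value, df_effect, df_resid)  # P(F > f_value)
    results[term] = (f_value, p_value)
    print(f"{term}: F = {f_value:.6f}, p = {p_value:.6f}")
```

At a 0.05 threshold, the two main effects have p-values below the threshold and the interaction term does not.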
