# T-test

A t-test is an inferential statistical test used to determine whether there is a significant difference
between the means of two groups (which may be related in certain features), or between a sample mean and a reference value.

The t-test has two types: (1) the one-sample t-test and (2) the two-sample t-test.

One-sample t-test: the one-sample t-test determines whether the sample mean is statistically different
from a known or hypothesised population mean. It is a parametric test.
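For reference, the statistic behind the one-sample test can be written out as below. The symbols are introduced here for illustration and do not appear in the original text: $\bar{x}$ is the sample mean, $\mu_0$ the hypothesised population mean, $s$ the sample standard deviation and $n$ the sample size.

```latex
t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad \text{degrees of freedom} = n - 1
```

The p-value is then read from the t distribution with $n - 1$ degrees of freedom.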

In [1]:
#-----------------------------------------T-test---------------------------------#


from scipy.stats import ttest_1samp
import numpy as np

# 14 ages; we are checking whether the average age is 30 or not.
# H0: The average age is 30.
# H1: The average age is not 30.
ages = np.array([32, 34, 29, 29, 22, 39, 38, 37, 38, 36, 30, 26, 22, 22])
print(ages)
# Mean of the ages
ages_mean = np.mean(ages)
print(ages_mean)
# One-sample t-test
tstat, pval = ttest_1samp(ages, 30)
print('p-values', pval)
if pval < 0.05:    # alpha value is 0.05 or 5%
    print("we are rejecting null hypothesis")
else:
    # A large p-value does not prove H0; we only fail to reject it.
    print("we fail to reject the null hypothesis")


[32 34 29 29 22 39 38 37 38 36 30 26 22 22]
31.0
p-values 0.5605155888171379
we fail to reject the null hypothesis
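The two-sample t-test is listed above but not demonstrated. The following is a minimal sketch of how it might look with `scipy.stats.ttest_ind`. The first group reuses the ages from the one-sample example; the second group, `group_b`, is hypothetical data made up purely for illustration. Setting `equal_var=False` selects Welch's variant, which does not assume equal variances; that is a choice made here, not something the original specifies.

```python
import numpy as np
from scipy.stats import ttest_ind

# H0: the two groups have the same mean age
# H1: the two groups have different mean ages
group_a = np.array([32, 34, 29, 29, 22, 39, 38, 37, 38, 36, 30, 26, 22, 22])  # ages from the example above
group_b = np.array([27, 39, 52, 35, 45, 33, 48, 41, 30, 44, 36, 50])          # hypothetical second sample

# Welch's two-sample t-test (does not assume equal variances)
t_stat, p_val = ttest_ind(group_a, group_b, equal_var=False)
print("t statistic:", t_stat)
print("p-value:", p_val)

if p_val < 0.05:  # alpha value is 0.05 or 5%
    print("reject the null hypothesis")
else:
    print("fail to reject the null hypothesis")
```

If the two samples were measurements on the same subjects, a paired test such as `scipy.stats.ttest_rel` would usually be the more appropriate choice.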


# Z-Test

A z-test is used if:

- Your sample size is greater than 30. Otherwise, use a t-test.
- Data points are independent from each other. In other words, one data point isn't related to and doesn't affect another data point.
- Your data are normally distributed. However, for large sample sizes (over 30) this doesn't always matter.
- Your data are randomly selected from a population, where each item has an equal chance of being selected.
- Sample sizes are equal, if at all possible.
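To make the conditions above concrete, here is a minimal sketch of a one-sample z-test computed by hand. The data are hypothetical (randomly generated with a fixed seed) and the names `sample`, `mu0`, `z_stat` and `p_value` are illustrative. The sample standard deviation stands in for the population standard deviation, which is a common large-sample approximation and an assumption of this sketch.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical sample of 120 readings (more than 30 observations)
rng = np.random.default_rng(0)
sample = rng.normal(loc=155, scale=10, size=120)

mu0 = 156  # hypothesised population mean (H0: mean == mu0)
n = sample.size

# z = (sample mean - hypothesised mean) / standard error of the mean
standard_error = sample.std(ddof=1) / np.sqrt(n)
z_stat = (sample.mean() - mu0) / standard_error

# Two-sided p-value from the standard normal distribution
p_value = 2 * norm.sf(abs(z_stat))

print("z statistic:", z_stat)
print("p-value:", p_value)
if p_value < 0.05:
    print("reject the null hypothesis")
else:
    print("fail to reject the null hypothesis")
```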

In [2]:
#-----------One Sample Z-test-----------#
import pandas as pd
from scipy import stats
from statsmodels.stats import weightstats as stests
df = pd.read_csv("blood_pressure.csv")

# H0: the mean of bp_before is 156
# H1: the mean of bp_before is not 156
ztest, pval = stests.ztest(df['bp_before'], x2=None, value=156)
print('One-sample z-test')
print(float(pval))
if pval < 0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")

#-----------Two Sample Z-test-----------#
# Two-sample z-test: compares two independent groups of data and decides whether their means are equal.
# H0: the difference between the two group means is 0
# H1: the difference between the two group means is not 0
# Example: compare the blood pressure readings taken before and after.
# Note: stests.ztest treats the two columns as independent samples.

ztest, pval1 = stests.ztest(df['bp_before'], x2=df['bp_after'], value=0, alternative='two-sided')
print('Two-sample z-test')
print(float(pval1))
if pval1 < 0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")

One-sample z-test
0.6651614730255063
fail to reject null hypothesis
Two-sample z-test
0.002162306611369422
reject null hypothesis




# ANOVA (F-test)

The t-test works well when dealing with two groups, but sometimes we want to compare more than two groups at the same time.
For example, if we wanted to test whether voter age differs based on some categorical variable like race, we would have to compare the means of each level (group) of that variable.
The analysis of variance, or ANOVA, is a statistical inference test that lets you compare multiple groups at the same time.
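As a rough illustration of what the test computes, the sketch below builds the one-way F statistic by hand (between-group variance divided by within-group variance) and compares it with `scipy.stats.f_oneway`. The three groups are hypothetical values invented for this example, and the variable names are illustrative.

```python
import numpy as np
from scipy.stats import f, f_oneway

# Three hypothetical groups of measurements
groups = [
    np.array([4.2, 4.8, 5.1, 4.6, 5.0]),
    np.array([4.9, 5.5, 5.8, 5.2, 5.6]),
    np.array([5.9, 6.1, 5.7, 6.4, 6.0]),
]

k = len(groups)                         # number of groups
n_total = sum(len(g) for g in groups)   # total number of observations
grand_mean = np.concatenate(groups).mean()

# Between-group sum of squares: how far each group mean is from the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F is the ratio of the two mean squares
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = f.sf(f_manual, k - 1, n_total - k)

# Reference result from SciPy for comparison
f_ref, p_ref = f_oneway(*groups)
print("manual F:", f_manual, "p:", p_manual)
print("scipy  F:", f_ref, "p:", p_ref)
```

A large F means the group means differ by more than the spread within the groups would explain.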


In [3]:

#------------------------One Way F-test(Anova)------------------------# 
# Tests whether two or more groups are similar, based on their means and the F-score.
# Example: there are 3 categories of plant with their weights, and we need to check whether all 3 groups are similar.
import pandas as pd
from scipy import stats
from statsmodels.stats import weightstats as stests
print('One-way Anova')
df_anova = pd.read_csv('PlantGrowth.csv')
df_anova = df_anova[['weight','group']]
grps = pd.unique(df_anova.group.values)
d_data = {grp:df_anova['weight'][df_anova.group == grp] for grp in grps}
 
F, p = stats.f_oneway(d_data['ctrl'], d_data['trt1'], d_data['trt2'])
print("p-value for significance is: ", p)
if p<0.05:
    print("reject null hypothesis")
else:
    print("fail to reject null hypothesis")

#------------------------------------Two Way F-test-----------------------------------# 
# The two-way F-test is an extension of the one-way F-test. It is used when we have 2 independent variables and 2 or more groups.
# The two-way F-test does not tell which variable is dominant. If individual significance needs to be checked, post-hoc testing needs to be performed.

# Example: the grand mean crop yield (the mean crop yield not split by any sub-group), the mean crop yield by each factor,
# and the mean crop yield by the factors grouped together.
import statsmodels.api as sm
from statsmodels.formula.api import ols
print('Two-way ANova')
df_anova2 = pd.read_csv("https://raw.githubusercontent.com/Opensourcefordatascience/Data-sets/master/crop_yield.csv")
model = ols('Yield ~ C(Fert)*C(Water)', df_anova2).fit()
print(f"Overall model F({model.df_model: .0f},{model.df_resid: .0f}) = {model.fvalue: .3f}, p = {model.f_pvalue: .4f}")
res = sm.stats.anova_lm(model, typ= 2)
print(res)

One-way Anova
p-value for significance is:  0.0159099583256229
reject null hypothesis
Two-way ANova
Overall model F( 3, 16) =  4.112, p =  0.0243
                   sum_sq    df         F    PR(>F)
C(Fert)            69.192   1.0  5.766000  0.028847
C(Water)           63.368   1.0  5.280667  0.035386
C(Fert):C(Water)   15.488   1.0  1.290667  0.272656
Residual          192.000  16.0       NaN       NaN
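The F and `PR(>F)` columns in the table above can be reproduced from the `sum_sq` and `df` columns. The sketch below is an illustration, not part of the original notebook. It copies the numbers from the printed table, divides each effect's mean square by the residual mean square, and looks up the p-value with `scipy.stats.f.sf`. The names `table`, `ms_resid` and `results` are illustrative.

```python
from scipy.stats import f

# sum_sq and df copied from the printed ANOVA table
table = {
    "C(Fert)": (69.192, 1),
    "C(Water)": (63.368, 1),
    "C(Fert):C(Water)": (15.488, 1),
}
resid_ss, resid_df = 192.000, 16

# Residual mean square: the denominator of every F ratio in the table
ms_resid = resid_ss / resid_df

results = {}
for name, (sum_sq, df) in table.items():
    f_value = (sum_sq / df) / ms_resid   # effect mean square / residual mean square
    p_value = f.sf(f_value, df, resid_df)  # upper-tail probability of the F distribution
    results[name] = (f_value, p_value)
    print(f"{name:18s} F = {f_value:.6f}, p = {p_value:.6f}")

# For this table the three effect sums of squares add up to the model sum of squares,
# which reproduces the overall model F reported above.
model_ss = sum(ss for ss, _ in table.values())
overall_f = (model_ss / 3) / ms_resid
print(f"Overall model F(3, 16) = {overall_f:.3f}")
```

Read this way, both main effects have p-values below 0.05, while the `C(Fert):C(Water)` interaction row does not.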
