## T-Test
- A t-test is an inferential statistical test used to determine whether there is a significant difference between the means of two groups, which may be related in certain features (for example, paired measurements on the same subjects).

- The t-test has 2 main types: 1. the one-sample t-test, 2. the two-sample t-test (independent or paired).

## One-sample T-test with Python
- The one-sample t-test tells us whether the mean of a sample differs significantly from a known or hypothesized population mean.
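To see what `ttest_1samp` computes, here is a minimal sketch that evaluates the one-sample t-statistic t = (sample mean - mu0) / (s / sqrt(n)) by hand, where s is the sample standard deviation and n the sample size, and compares it with SciPy. It uses the full `age` list from the example below and a hypothesized mean `popmean = 30` chosen for illustration; the two-sided p-value comes from the t distribution with n - 1 degrees of freedom.

```python
import numpy as np
from scipy import stats

# Ages from the example below
age = [10, 20, 35, 50, 28, 40, 55, 18, 16, 55, 30, 25, 43, 18, 30, 28,
       14, 24, 16, 17, 32, 35, 26, 27, 65, 18, 43, 23, 21, 20, 19, 70]
popmean = 30  # hypothesized population mean (assumed for illustration)

x = np.asarray(age, dtype=float)
n = x.size
# Sample standard deviation: ddof=1 divides by n - 1
s = x.std(ddof=1)
t_manual = (x.mean() - popmean) / (s / np.sqrt(n))
# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

t_scipy, p_scipy = stats.ttest_1samp(x, popmean)
print(t_manual, p_manual)
print(t_scipy, p_scipy)
```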

In [2]:
age = [10,20,35,50,28,40,55,18,16,55,30,25,43,18,30,28,14,24,16,17,32,35,26,27,65,18,43,23,21,20,19,70]

- Find Mean

In [4]:
import numpy as np
age_mean = np.mean(age)
age_mean

30.34375

- Draw a random sample of 10 ages (with replacement)

In [6]:
age_sample = np.random.choice(age, 10)

In [5]:
from scipy.stats import ttest_1samp

- Usage: `t_statistic, p_value = ttest_1samp(sample, popmean)`, where `popmean` is the hypothesized population mean
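By default `ttest_1samp` runs a two-sided test. In SciPy 1.6 and later it also accepts an `alternative` argument (`'two-sided'`, `'less'` or `'greater'`) for one-sided hypotheses. A small sketch, reusing the `age` list and assuming H0: mean = 30: when the t-statistic is positive, the one-sided `'greater'` p-value is half of the two-sided one.

```python
import numpy as np
from scipy import stats

age = [10, 20, 35, 50, 28, 40, 55, 18, 16, 55, 30, 25, 43, 18, 30, 28,
       14, 24, 16, 17, 32, 35, 26, 27, 65, 18, 43, 23, 21, 20, 19, 70]

# Two-sided test (default): H1 is "population mean != 30"
t_two, p_two = stats.ttest_1samp(age, 30)
# One-sided test: H1 is "population mean > 30" (needs SciPy >= 1.6)
t_gt, p_gt = stats.ttest_1samp(age, 30, alternative="greater")
print(t_two, p_two)
print(t_gt, p_gt)
```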

In [7]:
ttest, p_value = ttest_1samp(age_sample, 30)  # H0: the population mean age is 30

In [8]:
print(p_value)

0.4837410160142954


In [9]:
if p_value < 0.05:    # alpha value is 0.05 or 5%
    print("We reject the null hypothesis")
else:
    print("We fail to reject the null hypothesis")

We fail to reject the null hypothesis

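Every test in this notebook ends with the same if/else on the p-value. A minimal sketch of a helper that could replace it; `decide` is a hypothetical name, not a SciPy function. Note the wording: a large p-value means the data give no evidence against H0, so we fail to reject it rather than accept it.

```python
def decide(p_value, alpha=0.05):
    """Return the decision of a significance test at level alpha.

    Hypothetical helper (not part of SciPy) mirroring the if/else cells.
    """
    if p_value < alpha:
        return "Reject the null hypothesis"
    # A large p-value is not proof that H0 is true
    return "Fail to reject the null hypothesis"


print(decide(0.48))
print(decide(0.0004))
```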

---

## Some More Examples
- Consider the ages of students across a whole college (`school_ages`) and in one class, Class A (`classA_ages`)

In [11]:
import numpy as np
import pandas as pd
import scipy.stats as stats
import math

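- In `stats.poisson.rvs`, `loc` shifts the distribution so values start from `loc`, and `mu` is the Poisson mean before the shift; the generated ages therefore average about `loc + mu`.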
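A quick check of the `loc`/`mu` behaviour, assuming an arbitrary seed via `np.random.default_rng(0)` and a large sample so the average is stable: every draw is at least `loc`, and the mean is close to `loc + mu` (53 for the school ages).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
# loc shifts every draw by +18; mu is the Poisson mean before the shift
draws = stats.poisson.rvs(mu=35, loc=18, size=100_000, random_state=rng)

print(draws.min())                      # never below loc = 18
print(draws.mean())                     # close to loc + mu = 53
print(stats.poisson.mean(35, loc=18))   # exact mean of the shifted distribution
```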

In [15]:
np.random.seed(6)
school_ages=stats.poisson.rvs(loc=18,mu=35,size=1500)
classA_ages=stats.poisson.rvs(loc=18,mu=30,size=60)

- Find Mean

In [23]:
classA_ages_mean = np.mean(classA_ages)

In [19]:
school_ages_mean = school_ages.mean()

In [24]:
# Sanity check: testing Class A against its own mean gives t = 0 and p = 1.0
ttest, p_value = stats.ttest_1samp(classA_ages, classA_ages_mean)

In [25]:
p_value

1.0

In [26]:
if p_value < 0.05:    # alpha value is 0.05 or 5%
    print("We reject the null hypothesis")
else:
    print("We fail to reject the null hypothesis")

We fail to reject the null hypothesis


In [27]:
# H0: Class A's mean age equals the school-wide mean age
ttest, p_value = stats.ttest_1samp(classA_ages, school_ages_mean)

In [28]:
if p_value < 0.05:    # alpha value is 0.05 or 5%
    print("We reject the null hypothesis")
else:
    print("We fail to reject the null hypothesis")

We reject the null hypothesis


---

## Two-sample T-test With Python
- The independent-samples t-test (2-sample t-test) compares the means of two independent groups to determine whether there is statistical evidence that the associated population means are significantly different.
- It is a parametric test, also known as the independent t-test. Passing `equal_var=False` to `stats.ttest_ind`, as below, runs Welch's t-test, which does not assume that the two groups have equal variances.
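As a sketch of what `equal_var=False` computes: Welch's t-statistic divides the difference in means by sqrt(s_a²/n_a + s_b²/n_b), and the degrees of freedom come from the Welch-Satterthwaite approximation. The helper `welch_t` and the simulated groups (same Poisson parameters as Class A and Class B, but an arbitrary seed, so not the notebook's exact values) are assumptions for illustration.

```python
import numpy as np
from scipy import stats


def welch_t(a, b):
    """Welch's t-statistic, Welch-Satterthwaite df and two-sided p-value."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va = a.var(ddof=1) / a.size  # squared standard error of mean(a)
    vb = b.var(ddof=1) / b.size  # squared standard error of mean(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    p = 2 * stats.t.sf(abs(t), df)
    return t, df, p


rng = np.random.default_rng(0)  # arbitrary seed
group_a = stats.poisson.rvs(mu=30, loc=18, size=60, random_state=rng)
group_b = stats.poisson.rvs(mu=33, loc=18, size=60, random_state=rng)

t, df, p = welch_t(group_a, group_b)
t_scipy, p_scipy = stats.ttest_ind(group_a, group_b, equal_var=False)
print(t, df, p)
print(t_scipy, p_scipy)
```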

In [34]:
np.random.seed(12)
ClassB_ages = stats.poisson.rvs(loc=18, mu=33, size=60)

- Find Mean

In [35]:
ClassB_ages_mean = ClassB_ages.mean()

In [42]:
ttest, p_value = stats.ttest_ind(classA_ages, ClassB_ages, equal_var=False)

In [43]:
if p_value < 0.05:    # alpha value is 0.05 or 5%
    print("We reject the null hypothesis")
else:
    print("We fail to reject the null hypothesis")
print(p_value)

We reject the null hypothesis
0.00039942095100859375


---

## Paired T-test With Python
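- Use a paired t-test when the two samples are related, for example the same subjects measured twice (before and after).

- Here each value in `weight2` is the corresponding value in `weight1` plus random noise, so the test is applied to the per-pair differences.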
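A paired t-test is equivalent to a one-sample t-test of the per-pair differences against 0. The sketch below checks this with `weight1` from the example that follows and noise generated the same way (the seed is arbitrary, so the values differ from the notebook's).

```python
import numpy as np
from scipy import stats

weight1 = np.array([25, 30, 28, 35, 28, 34, 26, 29, 30, 26, 28, 32, 31, 30, 45],
                   dtype=float)
rng = np.random.default_rng(1)  # arbitrary seed
weight2 = weight1 + stats.norm.rvs(loc=-1.25, scale=5, size=15, random_state=rng)

t_rel, p_rel = stats.ttest_rel(weight1, weight2)
# Same test expressed as a one-sample t-test on the differences
t_diff, p_diff = stats.ttest_1samp(weight1 - weight2, 0)
print(t_rel, p_rel)
print(t_diff, p_diff)
```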

In [45]:
weight1=[25,30,28,35,28,34,26,29,30,26,28,32,31,30,45]
weight2=weight1+stats.norm.rvs(scale=5,loc=-1.25,size=15)

In [46]:
weight_df = pd.DataFrame({"weight_10": np.array(weight1),
                          "weight_20": np.array(weight2),
                          "weight_change": np.array(weight2) - np.array(weight1)})

In [47]:
_,p_value=stats.ttest_rel(a=weight1,b=weight2)

In [48]:

if p_value < 0.05:    # alpha value is 0.05 or 5%
    print("We reject the null hypothesis")
else:
    print("We fail to reject the null hypothesis")

We fail to reject the null hypothesis


---

## Correlation

In [50]:
import seaborn as sns
df=sns.load_dataset('iris')

In [51]:
df.corr(numeric_only=True)  # skip the non-numeric 'species' column

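|              | sepal_length | sepal_width | petal_length | petal_width |
|--------------|-------------:|------------:|-------------:|------------:|
| sepal_length | 1.0          | -0.11757    | 0.871754     | 0.817941    |
| sepal_width  | -0.11757     | 1.0         | -0.42844     | -0.366126   |
| petal_length | 0.871754     | -0.42844    | 1.0          | 0.962865    |
| petal_width  | 0.817941     | -0.366126   | 0.962865     | 1.0         |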
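`DataFrame.corr` uses the Pearson correlation coefficient by default: the covariance of two columns divided by the product of their standard deviations, which always lies between -1 and 1. A minimal sketch on small made-up data (not the iris values), comparing a hand-written `pearson_r` (hypothetical helper) with pandas and with `scipy.stats.pearsonr`, which additionally returns a p-value for H0: no linear correlation.

```python
import numpy as np
import pandas as pd
from scipy import stats


def pearson_r(x, y):
    """Pearson correlation: covariance over the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())


# Small made-up data for illustration (not the iris values)
x = [1.4, 1.3, 4.7, 4.5, 6.0, 5.1]
y = [0.2, 0.2, 1.4, 1.5, 2.5, 1.9]

r = pearson_r(x, y)
r_pandas = pd.DataFrame({"x": x, "y": y}).corr().loc["x", "y"]
r_scipy, p_value = stats.pearsonr(x, y)
print(r, r_pandas, r_scipy, p_value)
```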


## Chi-Square Test
- The chi-square test of independence is applied when you have two categorical variables from a single population.
- It is used to determine whether there is a significant association between the two variables.
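Under H0 (independence), the expected count in each cell is row total × column total / grand total, and the statistic is the sum over cells of (observed - expected)² / expected, with (rows - 1) × (columns - 1) degrees of freedom. The sketch below applies this to the `sex` by `smoker` counts of the tips data shown further down. One caveat: for 2×2 tables `chi2_contingency` applies Yates' continuity correction by default, so `correction=False` is passed here to match the uncorrected hand computation.

```python
import numpy as np
from scipy import stats

# Sex (rows: Male, Female) by smoker (columns: Yes, No) counts from the tips data
observed = np.array([[60, 97],
                     [33, 54]])

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
n = observed.sum()
# Expected count under independence: row total * column total / grand total
expected = row_totals * col_totals / n
chi_square = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_value = stats.chi2.sf(chi_square, dof)

# correction=False disables Yates' correction so SciPy matches the sum above
res = stats.chi2_contingency(observed, correction=False)
print(chi_square, p_value, dof)
print(res[0], res[1], res[2])
```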

In [52]:
dataset=sns.load_dataset('tips')

In [53]:
dataset.head()

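|   | total_bill | tip  | sex    | smoker | day | time   | size |
|---|-----------:|-----:|--------|--------|-----|--------|-----:|
| 0 | 16.99      | 1.01 | Female | No     | Sun | Dinner | 2    |
| 1 | 10.34      | 1.66 | Male   | No     | Sun | Dinner | 3    |
| 2 | 21.01      | 3.5  | Male   | No     | Sun | Dinner | 3    |
| 3 | 23.68      | 3.31 | Male   | No     | Sun | Dinner | 2    |
| 4 | 24.59      | 3.61 | Female | No     | Sun | Dinner | 4    |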


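- `pd.crosstab` counts how often each combination of categories occurs, turning the two columns into a contingency table (rows: `sex`, columns: `smoker`)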
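A tiny made-up example of `pd.crosstab`: each cell counts the rows with that combination of categories, and `margins=True` adds row and column totals labelled `All`.

```python
import pandas as pd

# Toy data, made up for illustration
df = pd.DataFrame({
    "sex":    ["Male", "Female", "Male", "Male", "Female"],
    "smoker": ["Yes",  "No",     "No",   "Yes",  "No"],
})
# Each cell counts rows with that (sex, smoker) pair; margins=True adds totals
table = pd.crosstab(df["sex"], df["smoker"], margins=True)
print(table)
```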

In [54]:
dataset_table=pd.crosstab(dataset['sex'],dataset['smoker'])
print(dataset_table)

smoker  Yes  No
sex            
Male     60  97
Female   33  54


In [55]:
#Observed Values
Observed_Values = dataset_table.values 
print("Observed Values :-\n",Observed_Values)

Observed Values :-
 [[60 97]
 [33 54]]


In [57]:
val=stats.chi2_contingency(dataset_table)

In [58]:
Expected_Values = val[3]  # chi2_contingency returns (statistic, p-value, dof, expected frequencies)

In [60]:
print(dataset_table.iloc[0:2,0])

sex
Male      60
Female    33
Name: Yes, dtype: int64


In [61]:
dataset_table.iloc[0,0:2]

smoker
Yes    60
No     97
Name: Male, dtype: int64

In [62]:
no_of_rows=len(dataset_table.iloc[0:2,0])
no_of_columns=len(dataset_table.iloc[0,0:2])
ddof=(no_of_rows-1)*(no_of_columns-1)
print("Degree of Freedom:-",ddof)
alpha = 0.05

Degree of Freedom:- 1


In [63]:
from scipy.stats import chi2
# Sum of (observed - expected)^2 / expected over every cell of the table
chi_square_statistic = ((Observed_Values - Expected_Values) ** 2 / Expected_Values).sum()

In [64]:
print("chi-square statistic:-",chi_square_statistic)

chi-square statistic:- 0.001934818536627623


- Method 1: compare the statistic with the critical value

In [65]:
critical_value=chi2.ppf(q=1-alpha,df=ddof)
print('critical_value:',critical_value)

critical_value: 3.841458820694124


- Method 2: compare the p-value with the significance level

In [67]:
#p-value
p_value=1-chi2.cdf(x=chi_square_statistic,df=ddof)
print('p-value:',p_value)
print('Significance level: ',alpha)

p-value: 0.964915107315732
Significance level:  0.05


In [68]:

if chi_square_statistic >= critical_value:
    print("Reject H0: there is a significant relationship between the 2 categorical variables")
else:
    print("Retain H0: there is no significant relationship between the 2 categorical variables")

if p_value <= alpha:
    print("Reject H0: there is a significant relationship between the 2 categorical variables")
else:
    print("Retain H0: there is no significant relationship between the 2 categorical variables")

Retain H0: there is no significant relationship between the 2 categorical variables
Retain H0: there is no significant relationship between the 2 categorical variables


---

## ANOVA Test (F-Test)
- The t-test works well when dealing with two groups, but sometimes we want to compare more than two groups at the same time.

- For example, to test whether petal_width differs according to a categorical variable such as species, we need to compare the means of each level (group) of that variable.

-> One-way F-test (ANOVA):
- It tests whether the means of two or more groups are equal, using the F-statistic (the ratio of between-group variance to within-group variance).

- Example: the iris dataset contains 3 species of flower; we check whether the mean petal width is the same for all 3 species.
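The F-statistic compares variability between groups with variability within groups: F = (SS_between / (k - 1)) / (SS_within / (n - k)) for k groups and n observations in total. A minimal sketch on made-up petal-width-like values (not the iris data), checked against `stats.f_oneway`; the helper `one_way_f` is hypothetical.

```python
import numpy as np
from scipy import stats


def one_way_f(*groups):
    """One-way ANOVA F-statistic and p-value from sums of squares."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n = sum(g.size for g in groups)
    grand_mean = np.concatenate(groups).mean()
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    p = stats.f.sf(f, k - 1, n - k)
    return f, p


# Made-up values for three groups (illustrative, not the iris data)
a = [0.2, 0.3, 0.2, 0.4, 0.1]
b = [1.3, 1.5, 1.2, 1.4, 1.0]
c = [2.1, 1.8, 2.3, 2.0, 2.5]

f_manual, p_manual = one_way_f(a, b, c)
f_scipy, p_scipy = stats.f_oneway(a, b, c)
print(f_manual, p_manual)
print(f_scipy, p_scipy)
```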

In [69]:
import seaborn as sns
df1=sns.load_dataset('iris')
df_anova = df1[['petal_width','species']]
grps = pd.unique(df_anova.species.values)
d_data = {grp:df_anova['petal_width'][df_anova.species == grp] for grp in grps}
F, p = stats.f_oneway(d_data['setosa'], d_data['versicolor'], d_data['virginica'])
if p<0.05:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")

Reject the null hypothesis
