#### One-sample t-test with Python:

##### The test tells us whether the mean of the sample differs from the population mean.
##### $t = \dfrac{\bar{x} - \mu}{S_{\bar{x}}}$, where $S_{\bar{x}} = \dfrac{S}{\sqrt{n}}$
##### $\mu$ = proposed constant for the population mean
##### $\bar{x}$ = sample mean
##### $n$ = sample size (i.e. number of observations)
##### $S$ = sample standard deviation
##### $S_{\bar{x}}$ = estimated standard error of the mean ($S / \sqrt{n}$)
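
As a minimal sketch of the formula above, the statistic can be computed by hand with NumPy and compared with `scipy.stats.ttest_1samp`. The ten sample values and the proposed mean of 30 are illustrative assumptions, not part of the formula itself.

```python
import numpy as np
from scipy import stats

# Illustrative sample and proposed population mean (assumed values)
sample = np.array([23, 30, 16, 10, 18, 65, 27, 24, 30, 65])
mu = 30

n = sample.size
x_bar = sample.mean()
s = sample.std(ddof=1)            # sample standard deviation (n - 1 in the denominator)
standard_error = s / np.sqrt(n)   # estimated standard error of the mean
t_manual = (x_bar - mu) / standard_error

# SciPy computes the same statistic internally
t_scipy = stats.ttest_1samp(sample, mu).statistic
print(t_manual, t_scipy)
```

Note the `ddof=1` argument: NumPy's default `std` divides by n, whereas the t-test uses the sample standard deviation, which divides by n - 1.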

In [6]:
ages = [10, 20, 35, 50, 28, 40, 55, 18, 16, 55, 30, 25, 43, 18, 30, 28, 14, 24, 16, 17, 32, 35, 26, 27, 65, 18, 43, 23, 21,  20, 19, 70]

In [7]:
len(ages)

32

In [8]:
import numpy as np 
ages_mean = np.mean(ages)
print(ages_mean)

30.34375


In [9]:
# Draw a random sample of 10 ages (np.random.choice samples with replacement by default)
sample_size = 10
age_sample = np.random.choice(ages, sample_size)

In [10]:
age_sample

array([23, 30, 16, 10, 18, 65, 27, 24, 30, 65])

In [11]:
from scipy.stats import ttest_1samp

In [12]:
# The second parameter is popmean, the expected population mean under the null hypothesis
# (press Shift + Tab in Jupyter to open the documentation).
# We use 30 because the population mean computed above is 30.34375.
ttest, p_value = ttest_1samp(age_sample, 30)

In [13]:
print(p_value)

0.8974529081430626


In [14]:
if p_value < 0.05: # alpha value is 0.05 or 5%
    print("we are rejecting null hypothesis")
else:
    print("we are failing to reject null hypothesis")

we are failing to reject null hypothesis
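
The p-value compared with alpha above is the two-sided tail probability of a t distribution with n - 1 degrees of freedom. The following is a sketch of that calculation, assuming the sample is exactly the array printed earlier (a fresh `np.random.choice` call would give different values).

```python
import numpy as np
from scipy import stats

# Sample values copied from the printed age_sample output (assumed to be the tested sample)
age_sample = np.array([23, 30, 16, 10, 18, 65, 27, 24, 30, 65])
popmean = 30

n = age_sample.size
t_stat = (age_sample.mean() - popmean) / (age_sample.std(ddof=1) / np.sqrt(n))

# Two-sided p-value: probability of a |t| at least this large under the null hypothesis
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 1)

p_scipy = stats.ttest_1samp(age_sample, popmean).pvalue
print(p_manual, p_scipy)
```

A large p-value such as this one means the data are compatible with the proposed mean; it does not prove that the null hypothesis is true.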


#### Some more complex examples:

In [18]:
import numpy as np 
import pandas as pd 
import scipy.stats as stats 
import math
np.random.seed(6)
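# 1500 students; Poisson(mu=35) shifted by loc=18, so ages start at 18 and average about 53.
school_ages = stats.poisson.rvs(loc=18, mu=35, size=1500)
# 60 students in class A; Poisson(mu=30) shifted by loc=18, so the average is about 48.
classA_ages = stats.poisson.rvs(loc=18, mu=30, size=60)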

In [19]:
classA_ages.mean()

46.9

In [20]:
ttest, p_value = ttest_1samp(a = classA_ages, popmean= school_ages.mean())

In [21]:
print(p_value)

1.139027071016194e-13


In [22]:
school_ages.mean()

53.303333333333335

In [23]:
if p_value < 0.05: # alpha value is 0.05 or 5%
    print("we are rejecting null hypothesis")
else:
    print("we are failing to reject null hypothesis")

we are rejecting null hypothesis


#### Two-sample t-test with Python:

##### The independent samples t-test (also called the two-sample t-test or independent t-test) compares the means of two independent groups to determine whether there is statistical evidence that the associated population means differ significantly. It is a parametric test.
##### We do not need to memorize the formulas, because libraries such as SciPy implement them.
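
For readers who still want to see what the library does, here is a hedged sketch of the unequal-variance (Welch) form of the two-sample test, computed by hand and checked against `stats.ttest_ind(..., equal_var=False)`. The names `group_a` and `group_b`, the seeds, and the synthetic Poisson data are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic, illustrative groups (hypothetical data, fixed seeds for repeatability)
group_a = stats.poisson.rvs(mu=30, loc=18, size=60, random_state=6)
group_b = stats.poisson.rvs(mu=33, loc=18, size=60, random_state=12)

n_a, n_b = group_a.size, group_b.size
var_a, var_b = group_a.var(ddof=1), group_b.var(ddof=1)

# Welch's t statistic: difference of means over the combined standard error
se = np.sqrt(var_a / n_a + var_b / n_b)
t_manual = (group_a.mean() - group_b.mean()) / se

# Welch-Satterthwaite approximation of the degrees of freedom
df = (var_a / n_a + var_b / n_b) ** 2 / (
    (var_a / n_a) ** 2 / (n_a - 1) + (var_b / n_b) ** 2 / (n_b - 1)
)
p_manual = 2 * stats.t.sf(abs(t_manual), df)

result = stats.ttest_ind(group_a, group_b, equal_var=False)
print(t_manual, result.statistic)
print(p_manual, result.pvalue)
```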

In [24]:
np.random.seed(12)
ClassB_ages = stats.poisson.rvs(loc=18, mu=33, size = 60)
ClassB_ages.mean()

50.63333333333333

In [26]:
# equal_var=False runs Welch's t-test, which does not assume equal population variances.
ttest, p_value = stats.ttest_ind(a = classA_ages, b = ClassB_ages, equal_var = False)

In [27]:
if p_value < 0.05: # alpha value is 0.05 or 5%
    print("we are rejecting null hypothesis")
else:
    print("we are failing to reject null hypothesis")

we are rejecting null hypothesis


#### Paired t-test with Python:

##### Use a paired t-test when you want to compare two sets of measurements taken on the same group, such as the two weight measurements below.

In [28]:
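weight1 = [25,30,28,35,28,34,26,29,30,26,28,32,31,30,45]
# weight2 is weight1 plus normally distributed noise (mean 1.25, standard deviation 5).
# No seed is set here, so the values change on every run.
weight2 = weight1 + stats.norm.rvs(scale =5, loc = 1.25, size = 15)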

In [29]:
print(weight1)
print(weight2)

[25, 30, 28, 35, 28, 34, 26, 29, 30, 26, 28, 32, 31, 30, 45]
[33.07926457 37.41022437 31.50444617 33.04295091 22.36201983 40.07873174
 20.8299827  23.8771395  38.86420881 34.55941216 29.43827982 32.019014
 28.92851213 33.00667769 43.82984284]


In [34]:

weight_df = pd.DataFrame({"weight_10": np.array(weight1), "weight_20": np.array(weight2), "weight_change":np.array(weight2) - np.array(weight1)})

In [35]:
weight_df

| | weight_10 | weight_20 | weight_change |
|---|---|---|---|
| 0 | 25 | 33.079265 | 8.079265 |
| 1 | 30 | 37.410224 | 7.410224 |
| 2 | 28 | 31.504446 | 3.504446 |
| 3 | 35 | 33.042951 | -1.957049 |
| 4 | 28 | 22.36202 | -5.63798 |
| 5 | 34 | 40.078732 | 6.078732 |
| 6 | 26 | 20.829983 | -5.170017 |
| 7 | 29 | 23.877139 | -5.122861 |
| 8 | 30 | 38.864209 | 8.864209 |
| 9 | 26 | 34.559412 | 8.559412 |


In [36]:
ttest, p_value = stats.ttest_rel(a = weight1, b = weight2)

In [37]:
print(p_value)

0.22252101288036757


In [38]:
if p_value < 0.05:
    print("we are rejecting null hypothesis")
else:
    print("we are failing to reject null hypothesis")

we are failing to reject null hypothesis
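
One way to see what the paired test does: it is equivalent to a one-sample t-test on the per-subject differences, with a null-hypothesis mean of 0. The sketch below assumes hypothetical names `before` and `after` and a fixed `random_state`, so its numbers will not match the unseeded run above.

```python
import numpy as np
from scipy import stats

# First measurement reuses the weights from the example; the second is synthetic
before = np.array([25, 30, 28, 35, 28, 34, 26, 29, 30, 26, 28, 32, 31, 30, 45])
after = before + stats.norm.rvs(loc=1.25, scale=5, size=before.size, random_state=0)

# Paired t-test on the two related samples
paired = stats.ttest_rel(before, after)

# One-sample t-test on the differences against a mean of 0
one_sample = stats.ttest_1samp(before - after, popmean=0)

print(paired.statistic, one_sample.statistic)
print(paired.pvalue, one_sample.pvalue)
```

Because only the differences matter, the paired test removes between-subject variation that an independent two-sample test would count as noise.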


#### Chi-square test:

##### This test is applied when you have two categorical variables from a single population. It is used to determine whether there is a significant association between the two variables.

In [40]:
import scipy.stats as stats

In [42]:
import seaborn as sns
import pandas as pd 
import numpy as np
dataset = sns.load_dataset('tips')

In [43]:
dataset.head()

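| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.5 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |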


In [45]:
# crosstab builds a contingency table: 60 male smokers and 97 male non-smokers,
# 33 female smokers and 54 female non-smokers.
dataset_table = pd.crosstab(dataset['sex'], dataset['smoker'])
print(dataset_table)

smoker  Yes  No
sex            
Male     60  97
Female   33  54


In [46]:
# observed value 

observed_values = dataset_table.values

In [47]:
print("Observed values: - \n", observed_values)

Observed values: - 
 [[60 97]
 [33 54]]


In [48]:
val = stats.chi2_contingency(dataset_table) # returns the chi-square statistic, p-value, degrees of freedom and expected frequencies

In [49]:
val

(0.008763290531773594,
 0.925417020494423,
 1,
 array([[59.84016393, 97.15983607],
        [33.15983607, 53.84016393]]))

##### The result holds, in order, the chi-square statistic, the p-value, the degrees of freedom and the array of expected frequencies. We focus on the last item, at index 3. These are the counts we would expect if sex and smoking status were independent: 59.84, 97.16, 33.16 and 53.84. They differ only slightly from the observed values above (60, 97, 33, 54).
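
The expected frequencies can be reproduced from the row and column totals: each expected count is (row total × column total) / grand total. Below is a minimal sketch using the observed counts from the table above. It passes `correction=False` to disable Yates' continuity correction, which SciPy applies by default to 2×2 tables, so the statistic it prints is the uncorrected one and differs from the value shown earlier.

```python
import numpy as np
from scipy import stats

# Observed counts from the sex x smoker contingency table
observed = np.array([[60, 97],
                     [33, 54]])

row_totals = observed.sum(axis=1, keepdims=True)   # shape (2, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 2)
grand_total = observed.sum()

# Expected count per cell under independence
expected_manual = row_totals * col_totals / grand_total

# Uncorrected Pearson chi-square statistic
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

chi2, p, dof, expected = stats.chi2_contingency(observed, correction=False)
print(expected_manual)
print(chi2_manual, chi2)
```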

In [50]:
expected_values = val[3]

In [51]:
no_of_rows, no_of_columns = dataset_table.shape
ddof = (no_of_rows - 1) * (no_of_columns - 1) # degrees of freedom
print("degree of freedom:", ddof)
alpha = 0.05 # significance level

degree of freedom: 1
