# Two Sample T-Test

Two-sample tests are appropriate for comparing two samples, typically experimental and control samples from a scientifically controlled experiment.

https://en.wikipedia.org/wiki/Test_statistic

Two-proportion z-test, pooled for
![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)

![image-3.png](attachment:image-3.png)
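
For reference, the pooled two-proportion z-statistic for the null hypothesis $H_0: p_1 = p_2$ is commonly written as below. The notation is an assumption and may differ from the attached figures: $x_i$ is the number of successes in sample $i$ and $n_i$ is its size.

$$
z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\,(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}},
\qquad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2},
\quad \hat{p}_i = \frac{x_i}{n_i}
$$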

## Test of Proportions

Tests of proportions are analogous to tests of means: a sample proportion is the mean of a binary (0/1) outcome, so the same z-statistic logic applies.


In [1]:
from scipy.stats import f
import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.stats.proportion import proportions_ztest
from statsmodels.stats import proportion


#### Example 1:
From vendor A we test 200 pieces and find 30 defectives. From vendor B we test 100 pieces and find 10 defectives. Is there a significant difference in quality between the two vendors? Use a 95% confidence level (α = 0.05).


In [2]:
count1 = 30
n1 = 200
count2 = 10
n2 = 100
proportion.test_proportions_2indep(count1=count1, nobs1=n1, count2=count2, nobs2=n2)

<class 'statsmodels.stats.base.HolderTuple'>
statistic = 1.145433008876846
pvalue = 0.2520298311822946
compare = diff
method = agresti-caffo
diff = 0.04999999999999999
ratio = 1.4999999999999998
odds_ratio = 1.588235294117647
variance = 0.001586401953652147
alternative = two-sided
value = 0
tuple = (1.145433008876846, 0.2520298311822946)

In [3]:
proportion.test_proportions_2indep(count1=count1, nobs1=n1, count2=count2, nobs2=n2, method='score')

<class 'statsmodels.stats.base.HolderTuple'>
statistic = 1.1989578808281796
pvalue = 0.2305443235633593
compare = diff
method = score
variance = 0.001739130434782609
alternative = two-sided
prop1_null = 0.13333333333333333
prop2_null = 0.13333333333333333
tuple = (1.1989578808281796, 0.2305443235633593)
diff = 0.04999999999999999
ratio = 1.4999999999999998
odds_ratio = 1.588235294117647
value = 0
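
As a cross-check on the statsmodels output, the classic pooled z-statistic can be computed by hand from the same four counts. This is a minimal sketch using only the standard library. It is not the Agresti-Caffo or score method used above, so the statistic differs slightly from both.

```python
from math import sqrt, erfc

# Counts from the vendor example
count1, n1 = 30, 200   # vendor A: defectives, pieces tested
count2, n2 = 10, 100   # vendor B: defectives, pieces tested

p1, p2 = count1 / n1, count2 / n2
p_pool = (count1 + count2) / (n1 + n2)   # pooled proportion under H0: p1 == p2

se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = erfc(abs(z) / sqrt(2))         # two-sided p-value from the standard normal

alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.3f}, p-value = {p_value:.3f}: {decision}")
```

The hand-computed z is about 1.20, close to both statsmodels statistics, and the p-value stays well above 0.05. All three variants therefore agree: at the 95% confidence level there is no significant difference between the two vendors.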

#### Example 2:
Test of proportions: `sex` and `smoker` are two categorical variables. We want to see whether the proportion of smokers among females differs significantly from the proportion among males.

H0: The two proportions are equal

Ha: The two proportions are not equal

In [4]:
smokers_df = pd.read_csv('data/insurance.csv')

In [5]:
smokers_df.sample(5)

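| Unnamed: 0 | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 130 | 59 | female | 26.505 | 0 | no | northeast | 12815.44495 |
| 167 | 32 | female | 33.155 | 3 | no | northwest | 6128.79745 |
| 344 | 49 | female | 41.47 | 4 | no | southeast | 10977.2063 |
| 1059 | 32 | male | 33.82 | 1 | no | northwest | 4462.7218 |
| 468 | 28 | female | 24.32 | 1 | no | northeast | 23288.9284 |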


In [6]:
smokers_df.loc[(smokers_df['sex']=='female') & (smokers_df['smoker']=='yes')].shape[0]

115

In [8]:
n_female_smokers = smokers_df.loc[(smokers_df['sex']=='female') & (smokers_df['smoker']=='yes')].shape[0]
n_male_smokers = smokers_df.loc[(smokers_df['sex']=='male') & (smokers_df['smoker']=='yes')].shape[0]
n_females = smokers_df[smokers_df['sex']=='female'].shape[0]
n_males = smokers_df[smokers_df['sex']=='male'].shape[0]

In [9]:
print([n_female_smokers, n_male_smokers] , [n_females, n_males])
print(f'Proportion of smokers in females, males = '
      f'{round(n_female_smokers*100/n_females, 2)}%, '
      f'{round(n_male_smokers*100/n_males, 2)}% respectively')

[115, 159] [662, 676]
Proportion of smokers in females, males = 17.37%, 23.52% respectively


In [11]:
# Since the sample size is large, we will use proportions_ztest
stat, p_value_prop = proportions_ztest([n_female_smokers, n_male_smokers], [n_females, n_males])

In [12]:
if p_value_prop < 0.05:
    print(f'With a p-value of {round(p_value_prop,4)} the difference is significant. aka |We reject the null|')
else:
    print(f'With a p-value of {round(p_value_prop,4)} the difference is not significant. aka |We fail to reject the null|')

With a p-value of 0.0053 the difference is significant. aka |We reject the null|
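
If the question is directional (is the smoking rate among females lower than among males?), a one-sided alternative is the natural fit. The sketch below recomputes the pooled z-test from the counts printed earlier and reports both p-values. `pooled_ztest` is a hypothetical helper written for illustration, not a statsmodels function. In statsmodels itself, `proportions_ztest` accepts `alternative='smaller'` for the same purpose.

```python
from math import sqrt
from statistics import NormalDist

def pooled_ztest(count1, n1, count2, n2, alternative="two-sided"):
    """Pooled two-proportion z-test (illustrative helper, not part of statsmodels)."""
    p1, p2 = count1 / n1, count2 / n2
    p_pool = (count1 + count2) / (n1 + n2)
    z = (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    cdf = NormalDist().cdf(z)
    if alternative == "smaller":      # Ha: p1 < p2
        p_value = cdf
    elif alternative == "larger":     # Ha: p1 > p2
        p_value = 1 - cdf
    else:                             # Ha: p1 != p2
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value

# Female smokers / females, male smokers / males
z, p_two_sided = pooled_ztest(115, 662, 159, 676)
_, p_smaller = pooled_ztest(115, 662, 159, 676, alternative="smaller")
print(f"z = {z:.3f}, two-sided p = {p_two_sided:.4f}, one-sided p = {p_smaller:.4f}")
```

Because z is negative, the one-sided p-value is half the two-sided one. The directional claim is therefore supported at least as strongly as the two-sided result.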


Two-proportion z-test, unpooled for ![image.png](attachment:image.png)

![image-2.png](attachment:image-2.png)
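
For reference, the unpooled two-proportion z-statistic is commonly written as below. It is typically used when the null hypothesis specifies a nonzero difference $d_0$ between the proportions. The notation is an assumption and may differ from the attached figures: $\hat{p}_i = x_i / n_i$ is the sample proportion in group $i$.

$$
z = \frac{(\hat{p}_1 - \hat{p}_2) - d_0}{\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}
$$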