# Two Sample Z Test of Proportions

A two-sample Z test of proportions checks whether two populations differ significantly in how often a particular characteristic occurs. In other words, it compares the proportion of members with a single binary characteristic (for example, "survived") in two independent populations.
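To make the mechanics concrete, here is a minimal from-scratch sketch of the test that uses only the Python standard library. The function name `two_proportion_ztest` is hypothetical and is not a library API. The sketch assumes the usual pooled-variance form with a two-sided alternative, which is also what `proportions_ztest` from statsmodels computes by default for two samples.

```python
import math
from typing import Tuple


def two_proportion_ztest(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Hypothetical helper: pooled two-sample z test of proportions, two-sided.

    x1, x2 are the success counts and n1, n2 the sample sizes of the two groups.
    """
    p1 = x1 / n1  # sample proportion of group 1
    p2 = x2 / n2  # sample proportion of group 2
    # Under H0 (p1 == p2) both groups share one proportion, estimated by pooling.
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts from the Titanic case study: male survivors/total first, female second.
z, p = two_proportion_ztest(109, 577, 233, 314)
print(f"z = {z:.5f}, p = {p:.3g}")
```

With these counts the statistic should agree with the `proportions_ztest` result later in this notebook (about -16.21883). Computing the tail with `erfc` keeps a very small p-value from rounding to exactly zero, as `1 - cdf` would. The sketch does not guard against a pooled proportion of exactly 0 or 1, where the standard error is zero.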

**Import Libraries**

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.stats.api as sms
from statsmodels.stats.proportion import proportions_ztest

# Display settings: show all columns, at most 10 rows, floats to 5 decimals
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 10)
pd.set_option('display.float_format', lambda x: '%.5f' % x)
```

**Case Study**

Problem: Is there a statistically significant difference between the survival rates of male and female passengers on the Titanic?

Hypothesis: There is no statistically significant difference between the survival rates of male and female passengers.

P1 = Survival rate of male passengers<br>
P2 = Survival rate of female passengers

H0: P1 = P2<br>
H1: P1 ≠ P2
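Under H0 the two groups share a common proportion, so the test statistic is usually computed with a pooled estimate. As a sketch of the standard form (the one `proportions_ztest` uses by default for two samples), let $x_i$ be the number of survivors among the $n_i$ passengers in group $i$, with group 1 = male and group 2 = female:

$$
\hat{p}_1 = \frac{x_1}{n_1}, \qquad \hat{p}_2 = \frac{x_2}{n_2}, \qquad \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}
$$

$$
z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad p\text{-value} = 2\left(1 - \Phi(|z|)\right)
$$

Here $\Phi$ is the standard normal CDF, and the p-value is for the two-sided alternative H1: P1 ≠ P2.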

```python
# Load the Titanic dataset bundled with seaborn (fetched and cached on first use)
df = sns.load_dataset('titanic')
```

```python
df.head()
```

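| | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | True | NaN | Southampton | no | True |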


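First, count the female and male passengers who survived. Because `survived` is coded 0/1, summing it within a group gives the number of survivors.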

```python
survival_succ_female = df.loc[df['sex'] == 'female', 'survived'].sum()
survival_succ_female
```

```text
233
```

```python
survival_succ_male = df.loc[df['sex'] == 'male', 'survived'].sum()
survival_succ_male
```

```text
109
```

Second, count the total number of observations (passengers) in each group.

```python
total_count_female = df.loc[df['sex'] == 'female', 'survived'].shape[0]
total_count_female
```

```text
314
```

```python
total_count_male = df.loc[df['sex'] == 'male', 'survived'].shape[0]
total_count_male
```

```text
577
```
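A more compact sketch collects the survivor count, the group size and the survival proportion in a single `groupby` call. The small DataFrame below is hypothetical toy data with the same column names as the Titanic `df`, so the code runs without downloading the dataset.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the Titanic DataFrame.
toy = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "male", "female"],
    "survived": [0, 1, 1, 0, 1, 0],
})

# sum -> number of survivors, count -> group size, mean -> survival proportion
summary = toy.groupby("sex")["survived"].agg(["sum", "count", "mean"])
print(summary)

# Arrays in the order (male, female), ready for proportions_ztest(count=..., nobs=...)
count = summary.loc[["male", "female"], "sum"].to_numpy()
nobs = summary.loc[["male", "female"], "count"].to_numpy()
```

On the real data, `df.groupby('sex')['survived'].agg(['sum', 'count'])` should reproduce the 233/314 (female) and 109/577 (male) figures above, since `survived` has no missing values. Note that `count` counts non-missing values, whereas `.shape[0]` counts rows.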

**Let's Analyze**

```python
# Male counts first, female second; the default alternative is two-sided
test_stat, p_value = proportions_ztest(count=(survival_succ_male, survival_succ_female),
                                       nobs=(total_count_male, total_count_female))
print('Test statistic: %.5f\np value: %.5f' % (test_stat, p_value))
```

```text
Test statistic: -16.21883
p value: 0.00000
```
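The z test relies on a normal approximation to the sampling distribution of each proportion. A common rule of thumb is that every group should have at least a minimum number of both successes and failures; textbooks use thresholds of 5 or 10. A minimal check, with the counts from the cells above typed in as plain integers and a threshold of 10 assumed:

```python
# Successes (survivors) and group sizes from the cells above.
groups = {"male": (109, 577), "female": (233, 314)}
MIN_COUNT = 10  # rule-of-thumb threshold; some texts use 5

for name, (successes, n) in groups.items():
    failures = n - successes
    ok = successes >= MIN_COUNT and failures >= MIN_COUNT
    print(f"{name}: successes={successes}, failures={failures}, ok={ok}")

# True only if every group passes the rule of thumb
all_ok = all(s >= MIN_COUNT and n - s >= MIN_COUNT for s, n in groups.values())
```

Both groups pass comfortably here, so the normal approximation behind the z statistic is reasonable for this data.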


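The p-value is less than 0.05, so we reject H0, the hypothesis that there is no statistical difference between the survival proportions of men and women. (The printed 0.00000 means the p-value is below 0.000005 at five decimal places, not that it is exactly zero.) The negative test statistic indicates that the male survival proportion, passed first, is lower than the female survival proportion.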
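The conclusion above uses the p-value rule. An equivalent way to state the two-sided decision at α = 0.05 is to compare |z| with the standard normal critical value. The sketch below uses `statistics.NormalDist` from the Python standard library (3.8+) and plugs in the test statistic printed above as a hard-coded number.

```python
from statistics import NormalDist

alpha = 0.05
z = -16.21883  # test statistic reported by proportions_ztest above (hard-coded)

# Two-sided critical value: the (1 - alpha/2) quantile of the standard normal.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Reject H0 when the statistic falls in either tail beyond the critical value.
reject = abs(z) > z_crit
print(f"critical value = {z_crit:.3f}, |z| = {abs(z):.3f}, reject H0: {reject}")
```

For α = 0.05 the critical value is about 1.96, and |z| ≈ 16.2 is far beyond it, which matches the p-value decision.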