### Difference in proportions Z test

There are two samples from two populations, and we wish to know whether the populations' proportions differ. For example, the nation might be asked "do you smoke?" on two occasions 20 years apart.

$ \hat{p}_1 = 0.37 $

$ \hat{p}_2 = 0.33 $

In [8]:
n_1 = 30
n_2 = 51

In [9]:
              #yes/total
proportion_1 = 15/n_1
proportion_2 = 26/n_2

Each of the $ n $ answers is a random trial: the result of interest occurs with probability $ p $ and the opposite with probability $ q = 1-p $. The variance of any single trial is $ p\times q $

The variance of the proportion observed over $ n $ independent trials is $ \frac{p\times q}{n} $

The standard deviation of that proportion is therefore $ \sqrt{\frac{p\times q}{n}} $
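
As a rough check of this formula, a short simulation can compare the variance of many simulated sample proportions with $ \frac{p\times q}{n} $. This is only a sketch: `p_true`, `n_trials` and `n_simulations` are illustrative assumptions, not figures from the survey.

```r
set.seed(1)            # fixed seed so the sketch is reproducible

p_true = 0.5           # assumed probability of a "yes" (illustrative)
n_trials = 30          # assumed sample size (illustrative)
n_simulations = 10000  # number of simulated samples

# Each draw is the number of "yes" answers in one sample;
# dividing by n_trials turns it into a sample proportion
simulated_proportions = rbinom(n_simulations, n_trials, p_true) / n_trials

theoretical_variance = p_true * (1 - p_true) / n_trials
empirical_variance = var(simulated_proportions)

c(theoretical = theoretical_variance, empirical = empirical_variance)
```

The two values should agree closely, and the agreement tightens as `n_simulations` grows.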

In [10]:
proportion_difference = proportion_1 - proportion_2
proportion_difference

In [11]:
difference_variance = proportion_1*(1-proportion_1)/n_1 + proportion_2*(1-proportion_2)/n_2
difference_variance

The variance of the difference is:

$ S^2 = \frac{p_1 q_1}{n_1}+\frac{p_2 q_2}{n_2} $

That means the standard deviation is:

$ S = \sqrt{\frac{p_1 q_1}{n_1}+\frac{p_2 q_2}{n_2}} $
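
The sum in this formula relies on the two samples being independent, which is an assumption of the test. For independent estimates, the variance of a difference is the sum of the variances:

$ \mathrm{Var}(\hat{p}_1 - \hat{p}_2) = \mathrm{Var}(\hat{p}_1) + \mathrm{Var}(\hat{p}_2) = \frac{p_1 q_1}{n_1}+\frac{p_2 q_2}{n_2} $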

In [12]:
difference_sd = sqrt(difference_variance)
difference_sd

### Let's define the hypotheses

Here, we want to test whether the proportion differs between the two populations.

$ h_0 : p_1 - p_2 = 0 $

$ h_a : p_1 - p_2 \ne 0 $

$ \alpha = 0.05 $

In [13]:
H_0_value = 0
alpha = 0.05 # A two-tailed test puts alpha/2 in each tail; a one-tailed test uses just alpha

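When both samples are large, the difference in sample proportions is approximately normally distributed, so we can use the normal distribution here. If either sample were small, this approximation would not be reliable and an exact method (for example Fisher's exact test) would be safer.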
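
One common rule of thumb for "large enough" is at least 10 "yes" and 10 "no" answers in each sample. The threshold of 10 is an assumption introduced here, not part of the original analysis, and other texts use different cut-offs. A minimal sketch of the check, using the counts from the cells above:

```r
# Counts taken from the cells above: "yes" answers and sample sizes
yes_counts = c(15, 26)
sample_sizes = c(30, 51)
no_counts = sample_sizes - yes_counts

min_count = 10  # assumed rule-of-thumb threshold, not from the original analysis

# TRUE only when every sample has enough "yes" and enough "no" answers
normal_approximation_ok = all(yes_counts >= min_count, no_counts >= min_count)
normal_approximation_ok
```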

$ Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{S} $

Under the null hypothesis there is no difference between the proportions, so $ p_1 - p_2 = 0 $ and the statistic reduces to

$ Z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{S} $

In [14]:
test_statistic = (proportion_difference - H_0_value)/difference_sd
test_statistic
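
As a cross-check, base R's `prop.test` can run a two-sample test on the same counts. It is not identical to the calculation above: it pools the two samples to estimate a single proportion under $ h_0 $, and it reports a chi-squared statistic, which (with `correct = FALSE`) is the square of the pooled Z statistic. The sketch below computes the pooled statistic by hand and compares it with `prop.test`; the variable names are illustrative.

```r
yes_counts = c(15, 26)
sample_sizes = c(30, 51)
sample_proportions = yes_counts / sample_sizes

# Pooled proportion: all "yes" answers over all answers, as assumed under h_0
pooled_proportion = sum(yes_counts) / sum(sample_sizes)
pooled_sd = sqrt(pooled_proportion * (1 - pooled_proportion) * sum(1 / sample_sizes))
pooled_statistic = (sample_proportions[1] - sample_proportions[2]) / pooled_sd

# correct = FALSE switches off the continuity correction
check = prop.test(yes_counts, sample_sizes, correct = FALSE)

c(z_squared = pooled_statistic^2, chi_squared = unname(check$statistic))
```

For samples like these, where the two proportions are close, the pooled and unpooled standard deviations are nearly the same, so both approaches should lead to the same decision.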

In [15]:
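# Critical Z value for a two-tailed test at significance level alpha
critical_value = qnorm(1-alpha/2)
# Bounds of the rejection region: reject h_0 if the test statistic lies outside them
region_of_rejection = c(-critical_value, critical_value)
region_of_rejection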

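If the test statistic falls outside these bounds, in either region of rejection, we can reject $ h_0 $ in favour of $ h_a $. With the counts defined above (15 of 30 and 26 of 51), the test statistic is about -0.085, which lies between the critical values of roughly -1.96 and 1.96, so we cannot reject $ h_0 $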
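
The same decision can be made with a p-value instead of critical values: $ h_0 $ is rejected when the two-tailed p-value is below $ \alpha $. The sketch below is self-contained and reuses the counts from the earlier cells; `yes_counts` and `sample_sizes` are illustrative names.

```r
yes_counts = c(15, 26)
sample_sizes = c(30, 51)
alpha = 0.05

sample_proportions = yes_counts / sample_sizes
difference_sd = sqrt(sum(sample_proportions * (1 - sample_proportions) / sample_sizes))
test_statistic = (sample_proportions[1] - sample_proportions[2]) / difference_sd

# Two-tailed p-value: probability of a Z at least this far from 0 in either direction
p_value = 2 * pnorm(-abs(test_statistic))

p_value
p_value < alpha  # TRUE means h_0 is rejected at significance level alpha
```

Comparing `p_value` with `alpha` is equivalent to comparing the absolute test statistic with `qnorm(1 - alpha/2)`.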

### Confidence interval

$ (\hat{p}_1 - \hat{p}_2) \pm z\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} $

Equivalently, the margin of error is the critical Z value multiplied by the standard deviation $ S $:

$ (\hat{p}_1 - \hat{p}_2) \pm zS $

In [16]:
# Lower bound of the confidence interval
(proportion_1 - proportion_2) - critical_value * sqrt(proportion_1*(1-proportion_1)/n_1 + proportion_2*(1-proportion_2)/n_2)

In [17]:
# Upper bound of the confidence interval
(proportion_1 - proportion_2) + critical_value * sqrt(proportion_1*(1-proportion_1)/n_1 + proportion_2*(1-proportion_2)/n_2)
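
The two bounds can also be wrapped in a small helper so that both come from one call. The function below is a hypothetical convenience, not part of base R; it simply applies the confidence interval formula from this section to the counts used earlier.

```r
# Hypothetical helper: confidence interval for p_1 - p_2
proportion_difference_ci = function(yes_1, n_1, yes_2, n_2, confidence = 0.95) {
  p_1 = yes_1 / n_1
  p_2 = yes_2 / n_2
  z = qnorm(1 - (1 - confidence) / 2)  # critical Z value for the chosen confidence
  s = sqrt(p_1 * (1 - p_1) / n_1 + p_2 * (1 - p_2) / n_2)  # standard deviation S
  c(lower = (p_1 - p_2) - z * s, upper = (p_1 - p_2) + z * s)
}

interval = proportion_difference_ci(15, 30, 26, 51)
interval
```

An interval that contains 0 corresponds to failing to reject $ h_0 $ in the two-tailed test at the matching $ \alpha $, because both use the same standard deviation $ S $.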