### Difference in Proportions

#### Problem Statement

- Two random samples of `1000` individuals each are selected: one group receives a placebo and the other receives a new drug believed to offset the most severe symptoms of Covid-19. Both are administered soon after the onset of symptoms of the disease.

- **128** individuals from the control group (receiving the placebo) are hospitalized with severe disease, while **86** individuals from the treatment group are hospitalized.

Now answer the following questions:

- **Confidence Interval**: What is the 95% confidence interval for the difference in proportions?

- **Hypothesis Tests**: Is there a statistically significant difference in hospitalization rates for the two groups?



### Confidence Interval

Here are the steps for the calculation of the confidence interval:

Given:

- Sample proportions $\hat{p}_1$ and $\hat{p}_2$
- Sample Sizes $n_1$ and $n_2$
- Confidence level (which gives the Z-multiple)

Using the above values, we calculate:

 - Point estimate: $\hat{p}_1 - \hat{p}_2$

 - SE: $\LARGE\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

 - Confidence Interval: Point Estimate $\pm$ Z-multiple $\times$ SE

Let's first calculate the confidence interval manually.

For the given question, we have the following values:

In [None]:
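X1 = 128  # hospitalized in the control (placebo) group
n1 = 1000  # size of the control group
X2 = 86  # hospitalized in the treatment group
n2 = 1000  # size of the treatment group
Zmultiple = 1.96  # corresponding to a 95% confidence level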
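
The Z-multiple of 1.96 used above comes from the standard normal distribution. As a minimal sketch, it can be derived from the confidence level using only the Python standard library. The names `confidence_level`, `alpha`, and `z_multiple` are illustrative and not part of the original notebook.

```python
from statistics import NormalDist

confidence_level = 0.95
alpha = 1 - confidence_level

# Two-sided interval: alpha/2 of the probability sits in each tail,
# so we need the (1 - alpha/2) quantile of the standard normal.
z_multiple = NormalDist().inv_cdf(1 - alpha / 2)
print(round(z_multiple, 2))
```

Changing `confidence_level` (for example to 0.99) gives the multiple for a different interval width.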

In [None]:
## Let's calculate p1, p2 
p1 = X1/n1
p2 = X2/n2

In [None]:
p1,p2

(0.128, 0.086)

In [None]:
#Calculate point estimate
pestimate = p1-p2
pestimate

0.04200000000000001

In [None]:
import numpy as np

In [None]:
## Calculate standard error
SE = np.sqrt(p1*(1-p1)/n1 + p2*(1-p2)/n2)
SE

0.01379202668210876

In [None]:
##Calculate the confidence interval
ConfInt = (pestimate - SE*Zmultiple, pestimate + SE*Zmultiple)
ConfInt

(0.014967627703066838, 0.06903237229693318)
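
The manual steps above can be collected into one small function. This is only a sketch under the same normal-approximation assumptions; `wald_ci_diff` is a hypothetical helper name, and it uses the standard library instead of `numpy` so that it runs without extra dependencies.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci_diff(x1, n1, x2, n2, confidence_level=0.95):
    """Normal-approximation CI for p1 - p2 (illustrative helper)."""
    p1, p2 = x1 / n1, x2 / n2
    point_estimate = p1 - p2
    # Unpooled standard error of the difference in sample proportions
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # Z-multiple for a two-sided interval at the requested level
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return point_estimate - z * se, point_estimate + z * se


lower, upper = wald_ci_diff(128, 1000, 86, 1000)
print(f"[{lower:.3f}, {upper:.3f}]")
```

Because the whole interval lies above zero, the data suggest a higher hospitalization rate in the control group than in the treatment group.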

---

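Now let's use `statsmodels` to calculate the confidence interval. Note that when no `method` is passed, `confint_proportions_2indep` picks its own default method (Newcombe's, according to the statsmodels documentation), so the result can differ slightly from the normal-approximation interval computed manually above.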

In [None]:
from statsmodels.stats.proportion import confint_proportions_2indep

In [None]:
?confint_proportions_2indep

In [None]:
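Sc = 128  # hospitalizations in the control group
Nc = 1000  # size of the control group
St = 86  # hospitalizations in the treatment group
Nt = 1000  # size of the treatment group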

In [None]:
#Calculate the confidence interval 
(lower, upper) = confint_proportions_2indep(Sc, Nc, St, Nt, compare='diff', alpha=0.05)

In [None]:
print(f'95% CI for difference in proportion: [{lower:.3f}, {upper:.3f}]')

95% CI for difference in proportion: [0.015, 0.069]


### Hypothesis Test for Difference in Proportions



Let $p_1$ and $p_2$ be the population proportions, i.e. the true hospitalization rates for the control and treatment groups.

Our null and alternative hypotheses are as follows:

$$H_0: p_1-p_2 = 0$$
$$H_a: p_1-p_2 \neq 0$$

> This will be a two-sided test, with $\alpha$ = 0.05

We'll perform this test directly using the `statsmodels` library.

In [None]:
from statsmodels.stats.proportion import test_proportions_2indep

In [None]:
?test_proportions_2indep

In [None]:
Sc = 128
Nc = 1000
St = 86
Nt = 1000

In [None]:
z_stat, pval = test_proportions_2indep(Sc, Nc, St, Nt, compare='diff', alternative='two-sided')
print(f'z statistic: {z_stat:.2f}')
print(f'p-value: {pval:.3f}')

z statistic: 3.03
p-value: 0.002
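
To see roughly what such a test computes, here is a minimal sketch of a two-proportion z-test that uses the pooled proportion under $H_0$. This is a textbook variant and not necessarily the method `statsmodels` applies by default, so its figures may differ slightly from the output above. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_c, n_c = 128, 1000  # hospitalizations and size, control group
x_t, n_t = 86, 1000   # hospitalizations and size, treatment group

p_c, p_t = x_c / n_c, x_t / n_t

# Under H0 both groups share one proportion, estimated by pooling the samples
p_pool = (x_c + x_t) / (n_c + n_t)
se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))

z = (p_c - p_t) / se_pool
# Two-sided p-value from the standard normal distribution
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
reject_null = p_value < alpha
print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_null}")
```

Since the p-value is below $\alpha$ = 0.05, we reject $H_0$: there is a statistically significant difference in hospitalization rates between the two groups.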
