# Confidence Intervals and Classical Hypothesis Testing: Proportions
*Curtis Miller*

The first major topic is reaching conclusions about proportions.

In a sample of size $N$ there are $M$ "successes" (say, people who clicked on an advertisement) and $N - M$ "failures" (everyone else, who did not click on an advertisement). The **sample proportion** is then:

$$\hat{p} = \frac{M}{N}$$

In fact, if your data $x_i$ is 1 for every "success" and 0 for every "failure", then we can say:

$$\hat{p} = \frac{1}{N} \sum_{i = 1}^{N} x_i = \bar{x}$$

That is, the sample proportion is the sample mean of the dataset.
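
As a quick illustration of this identity, here is a minimal sketch using a small made-up dataset (the array `x` below is hypothetical, not data from the website example):

```python
import numpy as np

# Hypothetical data: 1 = clicked the ad ("success"), 0 = did not click ("failure")
x = np.array([1, 0, 0, 1, 0, 0, 0, 1, 0, 0])

M = x.sum()      # Number of "successes"
N = x.size       # Sample size
p_hat = M / N    # Sample proportion

print(p_hat, x.mean())    # The sample proportion and the sample mean agree
```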

Let's say we want to know what proportion of visitors (including future visitors, not yet seen) will click on our ad based on previous data. How can we go from a sample proportion to a statement about the **population proportion**?

## Confidence Interval for Population Proportion

We can construct a **confidence interval**: an interval, with a lower and an upper bound, that we believe contains the true population proportion of visitors who click our ad with some level of confidence. For a 95% confidence interval, we are "95% confident" the true proportion is in the interval, in the sense that intervals constructed this way contain the population proportion 95% of the time over repeated samples.

The classical way to construct this interval is:

$$\hat{p} \pm z_{1 - \frac{\alpha}{2}} \sqrt{\frac{\hat{p}(1 - \hat{p})}{N}} \equiv \left(\hat{p} - z_{1 - \frac{\alpha}{2}} \sqrt{\frac{\hat{p}(1 - \hat{p})}{N}}, \hat{p} + z_{1 - \frac{\alpha}{2}} \sqrt{\frac{\hat{p}(1 - \hat{p})}{N}}\right)$$

where $z_{p}$ is the $100\times p$th percentile of the standard [Normal distribution](https://en.wikipedia.org/wiki/Normal_distribution) and $1 - \alpha$ is the confidence level (so $\alpha = 0.05$ for a 95% interval).
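
To make the "95% of the time" interpretation concrete, the following is a small simulation sketch. It assumes a hypothetical true proportion (`p_true = 0.3`) and a hypothetical sample size, both chosen only for illustration. It then counts how often the Normal-approximation interval, with standard error $\sqrt{\hat{p}(1 - \hat{p}) / N}$, contains that true value.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)    # Seeded so the simulation is reproducible

p_true = 0.3     # Hypothetical "true" population proportion (an assumption)
N = 1000         # Size of each simulated sample (an assumption)
reps = 10000     # Number of simulated samples
alpha = 0.05     # 1 minus the confidence level

# Simulate the number of successes in each sample, then the sample proportions
p_hat = rng.binomial(N, p_true, size=reps) / N

z = norm.ppf(1 - alpha / 2)    # Standard Normal percentile, roughly 1.96
half_width = z * np.sqrt(p_hat * (1 - p_hat) / N)

# Fraction of simulated intervals that contain the true proportion
covered = (p_hat - half_width <= p_true) & (p_true <= p_hat + half_width)
coverage = covered.mean()
print(coverage)    # Expected to be close to 0.95
```

In practice we only ever see one sample, so we cannot know whether our particular interval is one of the roughly 95% that cover the truth.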

In Python, the **statsmodels** package can be used for statistical computations such as computing a confidence interval.

Let's suppose that on a certain website, out of 1126 visitors on a given day, 310 clicked on an ad purchased by a sponsor. Let's construct a confidence interval for the *population* proportion of visitors who click the ad.

In [None]:
import statsmodels.api as sm

In [None]:
310 / 1126    # Sample proportion

In [None]:
from statsmodels.stats.proportion import proportion_confint    # Function for computing confidence intervals
proportion_confint(count=310,    # Number of "successes"
                   nobs=1126,    # Number of trials
                   alpha=(1 - 0.95))    # Alpha, which is 1 minus the confidence level

If we wanted a 99% confidence interval, we would have a wider interval, but more confidence that the true proportion lies in this interval.

In [None]:
proportion_confint(310, 1126, alpha=(1 - 0.99))
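
For comparison, the interval can also be computed by hand from the Normal-approximation formula. This is a sketch rather than a replacement for `proportion_confint()`; the helper name `wald_interval` is made up for illustration. As far as I know, the result should agree with the default `method='normal'` of `proportion_confint()`.

```python
import numpy as np
from scipy.stats import norm

M, N = 310, 1126    # Clicks and visitors from the example
p_hat = M / N       # Sample proportion

def wald_interval(p_hat, n, conf_level):
    """Normal-approximation confidence interval for a proportion (illustrative helper)."""
    alpha = 1 - conf_level
    z = norm.ppf(1 - alpha / 2)    # Standard Normal percentile
    half_width = z * np.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

ci_95 = wald_interval(p_hat, N, 0.95)
ci_99 = wald_interval(p_hat, N, 0.99)
print(ci_95)
print(ci_99)    # Wider than the 95% interval
```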

## Testing the Proportion

The website administrator claims that 30% of visitors to the website click the advertisement. Is this true? The sample proportion (about 27.5%) does not match the administrator's claim, but this alone does not discredit the claim, since a sample proportion varies from sample to sample.

We will do a **statistical test** to test the administrator's claim. We test the **null hypothesis**:

$$H_0: p = 0.3$$

(where $p$ denotes the true proportion of visitors who click the ad on the site) against the **alternative hypothesis**:

$$H_A: p \neq 0.3$$

How do we do this? We first compute a **test statistic**.

$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{N}}} = \frac{\hat{p} - 0.3}{\sqrt{\frac{0.3(1 - 0.3)}{N}}}$$

We then compute a $p$-value, which can be interpreted as the probability, assuming $H_0$ is true, of observing a test statistic at least as "extreme" as the test statistic actually observed. If the $p$-value is small, we will reject $H_0$ and conclude that the administrator's claim is false; the proportion of visitors who click the ad is not $0.3$. If the $p$-value is not small, then we do not reject $H_0$; the evidence from our data does not contradict the claim.

What counts as a "small" $p$-value? Here, we will decide that if a $p$-value is less than 0.05, then the $p$-value is "small" and we reject the null hypothesis. If we see a $p$-value greater than 0.05, we will not reject the null hypothesis. (We could have chosen a number other than 0.05; maybe 0.01 if we wanted to err on the side of not contradicting the administrator.)

We now conduct the test and compute the $p$-value.

In [None]:
from statsmodels.stats.proportion import proportions_ztest    # Performs the test just described

res = proportions_ztest(count=310,
                        nobs=1126,
                        value=0.3,    # The hypothesized value of population proportion p
                        alternative='two-sided')    # Tests the "not equal to" alternative hypothesis

res    # A tuple; the first entry is the value of the test statistic, and the second is the p-value

Here, we got a test statistic of $z \approx -1.85$ and a $p$-value of $\approx 0.0636 > 0.05$. We conclude there is not enough statistical evidence to disagree with the website administrator.
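
The reported value $z \approx -1.85$ appears to correspond to a standard error built from the sample proportion $\hat{p}$, which, to my understanding, is what `proportions_ztest()` does by default. Building the standard error from the hypothesized value $p_0 = 0.3$ instead gives a slightly different statistic; statsmodels documents a `prop_var` argument for that choice. The sketch below computes both versions by hand so they can be compared. The variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm

M, N, p0 = 310, 1126, 0.3    # Clicks, visitors, hypothesized proportion
p_hat = M / N                # Sample proportion

# Version 1: standard error based on the sample proportion
se_sample = np.sqrt(p_hat * (1 - p_hat) / N)
z_sample = (p_hat - p0) / se_sample
pval_sample = 2 * norm.sf(abs(z_sample))    # Two-sided p-value

# Version 2: standard error based on the hypothesized proportion p0
se_null = np.sqrt(p0 * (1 - p0) / N)
z_null = (p_hat - p0) / se_null
pval_null = 2 * norm.sf(abs(z_null))        # Two-sided p-value

print(z_sample, pval_sample)
print(z_null, pval_null)
```

Both versions give a $p$-value above 0.05 here, so the conclusion is the same either way.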

## Testing for Common Proportions

The website decides to conduct an experiment. One day, the website shows its visitors different versions of an advertisement created by a sponsor. Users are randomly assigned to Version A and Version B. The website tracks how often Version A was clicked and how often Version B was clicked.

On this day, 516 visitors saw Version A of the ad, and 510 saw Version B. Of those who saw Version A, 108 clicked the ad, while 144 clicked Version B when shown.

Which ad generates more clicks?

Here we test the following hypotheses:

$$H_0: p_A = p_B$$
$$H_A: p_A \neq p_B$$

The test statistic for this test is:

$$z = \frac{\hat{p}_A - \hat{p}_B}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}}$$

where $\hat{p}_A$ and $\hat{p}_B$ are the sample proportions for group A and group B, $n_A$ and $n_B$ are the two group sizes, and $\hat{p}$ is the proportion from the pooled sample (grouping A and B together).

`proportions_ztest()` can perform this test.

In [None]:
import numpy as np

In [None]:
np.array([108, 144])

In [None]:
proportions_ztest(count=np.array([108, 144]),
                  nobs=np.array([516, 510]),
                  alternative='two-sided')

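With a $p$-value of about 0.0066, which is below 0.05, we reject the null hypothesis; it appears that the two ads do not have the same proportion of clicks. In this sample, Version B had the higher click rate (144/510, about 28%, versus 108/516, about 21%, for Version A).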
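
As a check on what the function is doing, here is a hand computation of the pooled two-sample $z$ statistic, using standard error $\sqrt{\hat{p}(1 - \hat{p})(1/n_A + 1/n_B)}$. This is a sketch under the assumption that the pooled form is the one being applied; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import norm

clicks = np.array([108, 144])    # Clicks for Version A and Version B
views = np.array([516, 510])     # Visitors shown Version A and Version B

p_A, p_B = clicks / views                # Sample proportions for each group
p_pool = clicks.sum() / views.sum()      # Pooled sample proportion

# Standard error of the difference under the null hypothesis p_A = p_B
se = np.sqrt(p_pool * (1 - p_pool) * (1 / views[0] + 1 / views[1]))

z = (p_A - p_B) / se
p_value = 2 * norm.sf(abs(z))    # Two-sided p-value

print(z, p_value)
```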

## $\chi^2$ Test for Goodness of Fit

The **$\chi^2$ test for goodness of fit** generalizes the test for a population proportion. Whereas we have worked before with variables that either do or do not have some characteristic (such as: a visitor either did or did not click an ad), this test checks whether a categorical variable, one that can fall into any of several categories, follows a specified distribution.

Suppose a website offers five colors of shoes: blue, black, brown, white, and red. We want to know whether each color is equally likely or not. That is, if $p_{\text{color}}$ is the proportion of shoe buyers who bought a particular color, we wish to test:

$$H_0: p_{\text{blue}} = p_{\text{black}} = p_{\text{brown}} = p_{\text{white}} = p_{\text{red}}$$
$$H_A: H_0 \text{ is false}$$

Suppose that out of 464 buyers of shoes, 98 bought blue shoes, 117 bought black shoes, 80 bought brown shoes, 73 bought white shoes, and 96 bought red shoes. If each color is equally likely to be bought, then $p_{\text{color}} = 0.2$ for every color. We would expect to see $0.2 \times 464 = 92.8$ pairs of each color sold if this were true.

We can now use the function `chisquare()` from **scipy.stats** to perform the test.

In [None]:
from scipy.stats import chisquare

In [None]:
chisquare(f_obs=[98, 117, 80, 73, 96],    # Observed frequency for each color
          f_exp=[464 * .2, 464 * .2, 464 * .2, 464 * .2, 464 * .2])    # Expected frequency under the null hypothesis

The $p$-value is approximately 0.0128. This is below 0.05, so we reject the null hypothesis. It's likely that some shoe colors are more popular than others (black is a prime suspect).
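
The statistic behind this test is the usual Pearson sum $\chi^2 = \sum_i (O_i - E_i)^2 / E_i$, where $O_i$ and $E_i$ are the observed and expected counts, compared against a $\chi^2$ distribution with (number of categories $- 1$) degrees of freedom. The following sketch reproduces the computation by hand; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import chi2

# Observed sales: blue, black, brown, white, red
observed = np.array([98, 117, 80, 73, 96])

# Expected counts if every color is equally likely (92.8 each)
expected = np.full(observed.size, observed.sum() / observed.size)

stat = np.sum((observed - expected) ** 2 / expected)    # Pearson chi-square statistic
df = observed.size - 1                                  # Degrees of freedom
p_value = chi2.sf(stat, df)                             # Upper-tail probability

print(stat, p_value)
```

Looking at the individual terms of the sum shows which colors contribute most to the statistic; here black (117 observed against 92.8 expected) contributes the largest share.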