## Lecture 17 - Confidence Intervals
### Math 032 - Summer 2024

```python
import numpy as np
from scipy import stats  # scipy.stats provides the normal distribution (norm.cdf, norm.ppf)
```

Let $Z$ be a standard normal random variable, so $\mu = 0$ and $\sigma^2 = 1$. Because $\sigma = 1$, the value of $Z$ is exactly the *number of standard deviations* it lies away from the mean.

#### 68-95-99.7 Rule
$$P(-1 \leq Z \leq 1) \approx 0.683$$
$$P(-2 \leq Z \leq 2) \approx 0.954$$
$$P(-3 \leq Z \leq 3) \approx 0.997$$
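These three probabilities can be checked numerically. The sketch below uses `scipy.stats.norm.cdf`, the standard normal CDF $\Phi$, and the identity $P(-k \leq Z \leq k) = \Phi(k) - \Phi(-k)$.

```python
from scipy import stats

# P(-k <= Z <= k) = Phi(k) - Phi(-k), where Phi is the standard normal CDF
probs = {k: stats.norm.cdf(k) - stats.norm.cdf(-k) for k in (1, 2, 3)}

for k, prob in probs.items():
    print(f"P(-{k} <= Z <= {k}) = {prob:.3f}")
```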

### Confidence Intervals

$X_1,\dots,X_n$ are independent samples from a common distribution with mean $\mu$ and standard deviation $\sigma$. The sample mean $\overline{X}= \frac{X_1+\dots+X_n}{n}$ is used to estimate the true population mean $\mu$.

## Example -- Sampling Voters

$\mu = $ proportion of registered voters in the U.S. who will vote for Candidate A.

$\overline{X} = $ proportion of voters sampled in a poll who say they will vote for Candidate A.

Suppose we <u>observe</u> that $504$ out of $1000$ sampled voters say they will vote for A, so $\overline{X} = 0.504$. <u>How confident</u> should you be that A will win?

$$P(\mu > 0.5\  \mid \  \overline{X} = 0.504) = \ ?$$


Asking what $P(\mu > 0.5)$ equals is actually a *silly question*: $\mu$ is a fixed (though unknown) number, not a random variable. Either $\mu > 0.5$ or $\mu \leq 0.5$, so the probability is either $0$ or $1$.

True $\mu$ unknown $\rightarrow$ can't use $z$-table to find probability.

For example, we can't compute $Z = \frac{\overline{X}-\mu}{\sigma/\sqrt{n}}$, since we don't know $\mu$, or even $\sigma$.

#### <u>Observation</u>: 

$$\overline{X} = \frac{504}{1000}=.504$$

By the Central Limit Theorem, $Z = \frac{\overline{X}-\mu}{\sigma/\sqrt{n}}$ is approximately standard normal when $n$ is large.
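A quick simulation can make the Central Limit Theorem claim concrete. This is a sketch under a hypothetical assumption: a true proportion $\mu = 0.5$, which is *not* known in the real poll. It simulates many 1000-voter polls, standardizes each sample proportion, and checks that the results behave like a standard normal.

```python
import numpy as np

# Hypothetical setup for illustration only: true proportion mu = 0.5
rng = np.random.default_rng(seed=0)
mu, n, trials = 0.5, 1000, 10_000
sigma = np.sqrt(mu * (1 - mu))  # standard deviation of one Bernoulli(mu) answer

# The number of "A" answers in a poll of n voters is Binomial(n, mu),
# so dividing by n gives one simulated sample proportion per poll
xbars = rng.binomial(n, mu, size=trials) / n
z_scores = (xbars - mu) / (sigma / np.sqrt(n))

# For a standard normal: mean 0, standard deviation 1, about 95.4% within 2
frac_within_2 = np.mean(np.abs(z_scores) <= 2)
print(z_scores.mean(), z_scores.std(), frac_within_2)
```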

### z-scores

For a significance level $\alpha$, the z-score $z_\alpha$ is the value that a standard normal $Z$ exceeds with probability $\alpha$. In other words, it is the number of standard deviations above the mean that leaves probability $\alpha$ in the upper tail. For example, if $\alpha = 0.1$, then

$$P(Z>z_{0.1}) = 0.1$$

Using a $z$-table, we find that $z_{0.1} \approx 1.28$.


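```python
from scipy import stats

alpha = 0.1
# z_alpha satisfies P(Z > z_alpha) = alpha, so it is the (1 - alpha) quantile of Z
z = stats.norm.ppf(1 - alpha)
print(z)  # about 1.2816
```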

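A **confidence interval** with confidence level $CL$ is an interval of the form $(c_l,c_u)$ (two-sided) or $(c_l,\infty)$ (one-sided), computed from the sample, that we are $CL$\% *confident* contains the true mean $\mu$. Since $\mu$ is fixed, the $CL$\% describes the procedure: if we repeated the sampling many times, about $CL$\% of the intervals constructed this way would contain $\mu$.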


Steps in constructing a **Confidence Interval**:

- Choose a **Confidence Level** CL.
    - e.g. 90\%, 95\%, or 99\%
- Set $\alpha = 1-CL$.
    - If CL = 90\%, then $\alpha = 1-0.9 = 0.1$.
- Find the corresponding **z-score** using a table or calculator.
- If $\sigma$ is known, use $\sigma$. Otherwise, use the worst case: the largest value $\sigma$ could possibly take.
- Construct a one- or two-sided **confidence interval**.
    - If one-sided, the interval is $(c_l, \infty)$, where
    $$c_l = \overline{X} - z_\alpha \frac{\sigma}{\sqrt{n}}$$
    - If two-sided, the interval is $(c_l, c_u)$, where
    $$c_l,c_u = \overline{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
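The steps above can be sketched as a small function. The name `confidence_interval` and its parameters are hypothetical, chosen for illustration. It uses `scipy.stats.norm.ppf` in place of a z-table and assumes $\sigma$ (known or worst case) is supplied by the caller.

```python
import numpy as np
from scipy import stats

def confidence_interval(xbar, sigma, n, cl, two_sided=True):
    """Return (c_l, c_u) for the sample mean xbar.

    sigma is the known (or worst-case) standard deviation and cl is the
    confidence level as a fraction, e.g. 0.90. For a one-sided interval
    the upper end is infinity.
    """
    alpha = 1 - cl
    se = sigma / np.sqrt(n)                 # standard deviation of xbar
    if two_sided:
        z = stats.norm.ppf(1 - alpha / 2)   # z_{alpha/2}
        return xbar - z * se, xbar + z * se
    z = stats.norm.ppf(1 - alpha)           # z_alpha
    return xbar - z * se, np.inf
```

For the voter example, `confidence_interval(0.504, 0.5, 1000, 0.90, two_sided=False)` gives a lower bound of about $0.4837$ and an upper end of infinity.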
   

<u>Observation</u>: 

The proportion of polled voters who say they will vote for A is

$$\overline{X} = \frac{504}{1000}=.504$$


- Let's use a **confidence level** of $CL = 90$\%
- $\alpha = 1-0.9 = 0.1 $
- The corresponding **z-score** is $z_{0.1} \approx 1.28$.
- In our case, since the underlying distribution is Bernoulli, $\sigma^2 = p(1-p) = \mu(1-\mu)$. Since $\mu(1-\mu) \leq 1/4$ for every $\mu \in [0,1]$, we have $\sigma \leq 1/2$, so in the worst case we use $\sigma = 1/2$.
- The lower bound for the **confidence interval** is

  $$c_l = \overline{X} - z_\alpha \frac{\sigma}{\sqrt{n}} = .504 - 1.28\frac{0.5}{\sqrt{1000}}$$
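The worst-case bound on $\sigma$ for a Bernoulli variable follows from completing the square:

$$\sigma^2 = \mu(1-\mu) = \frac{1}{4} - \left(\mu - \frac{1}{2}\right)^2 \leq \frac{1}{4} \quad\Longrightarrow\quad \sigma \leq \frac{1}{2},$$

with equality exactly when $\mu = 1/2$. So using $\sigma = 1/2$ can only make the interval wider, never narrower, than the one built from the true $\sigma$.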



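```python
import numpy as np
from scipy import stats

xbar = 504 / 1000              # observed sample proportion
n = 1000                       # number of voters polled
alpha = 0.1                    # 90% confidence level
z = stats.norm.ppf(1 - alpha)  # one-sided z-score z_alpha, about 1.28
sigma = 0.5                    # worst-case Bernoulli standard deviation
c_lower = xbar - z * sigma / np.sqrt(n)
print(c_lower)                 # about 0.4837
```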

So we are $90$\% confident that $\mu \in (0.4837,1]$. That is, we are $90$\% confident that the *true proportion* of voters who vote for Candidate A is *at least* $0.4837$.
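The $90$\% describes the procedure rather than $\mu$ itself: across many repeated polls, about $90$\% of the intervals $(c_l, 1]$ built this way should contain the true $\mu$. The sketch below checks this by simulation under a hypothetical true proportion $\mu = 0.49$, chosen only for illustration (the real $\mu$ is unknown), re-running the 1000-voter poll many times with the worst-case $\sigma = 1/2$.

```python
import numpy as np
from scipy import stats

# Hypothetical true proportion, chosen only for this simulation
rng = np.random.default_rng(seed=1)
mu, n, trials, alpha = 0.49, 1000, 10_000, 0.1
z = stats.norm.ppf(1 - alpha)  # z_{0.1}, about 1.28

# Simulate many polls and build the one-sided 90% interval (c_l, 1] for each
xbars = rng.binomial(n, mu, size=trials) / n
c_lower = xbars - z * 0.5 / np.sqrt(n)  # worst-case sigma = 1/2

# Fraction of intervals that actually contain the true mu
coverage = np.mean(c_lower < mu)
print(coverage)
```

Because $\sigma = 1/2$ is an upper bound, the coverage should come out at roughly $90$\% or slightly above.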

### Question:

What if we had sampled $100{,}000$ voters and observed $50{,}400$ of them voting for Candidate A? What would change in our estimate?

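```python
import numpy as np
from scipy import stats

xbar = 50400 / 100000          # observed sample proportion (same 0.504)
n = 100000                     # number of voters polled
alpha = 0.1                    # 90% confidence level
z = stats.norm.ppf(1 - alpha)  # one-sided z-score z_alpha, about 1.28
sigma = 0.5                    # worst-case Bernoulli standard deviation
c_lower = xbar - z * sigma / np.sqrt(n)
print(c_lower)                 # about 0.5020
```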

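We would now be $90$\% confident that $\mu \in (0.502,1]$. That is, we would be $90$\% confident that the *true proportion* of voters who vote for Candidate A is *at least* $0.502$. Since $0.502 > 0.5$, we would be $90$\% confident that Candidate A will win the election. The lower bound moved closer to $\overline{X}$ because the margin $z_\alpha \frac{\sigma}{\sqrt{n}}$ shrinks like $1/\sqrt{n}$: sampling $100$ times as many voters makes it $10$ times smaller. This is why it is important to get a large sample size.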
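To see how the margin $z_\alpha \frac{\sigma}{\sqrt{n}}$ depends on the sample size, here is a short sketch tabulating it for a few values of $n$ with $z_{0.1}$ and the worst-case $\sigma = 1/2$. The intermediate size $n = 10{,}000$ is added only for illustration.

```python
import numpy as np
from scipy import stats

z = stats.norm.ppf(0.9)  # z_{0.1}, about 1.28
sigma = 0.5              # worst-case Bernoulli standard deviation

# Margin subtracted from xbar to get the one-sided lower bound c_l
margins = {n: z * sigma / np.sqrt(n) for n in (1_000, 10_000, 100_000)}

for n, margin in margins.items():
    print(f"n = {n:>7}: margin = {margin:.4f}")
```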

## Examples 

<img src="lec17_examples.png" style="float: center; width: 50%">