# Content   

[Confidence Intervals and Margin of Error]()

[Confidence Interval Simulation and Interpreting the Confidence *Level*]()

[Conditions for a Valid Confidence Interval for a Proportion]()

[Critical Value (z*) for a Given Confidence Level]()

[Constructing and Interpreting a Confidence Interval for `p`]()

[Determining Sample Size]()

### Confidence Intervals and Margin of Error

#### Theory
A **point estimate** (like a sample proportion, `p̂`) is our single best guess for the true value of a population parameter (like `p`). However, because of random sampling variability, this guess will almost never equal the true value exactly.

A **confidence interval** addresses this by providing a range of plausible values for the true parameter. Instead of a single point, we give an interval that we believe contains the true value.

The general structure of a confidence interval is:
**Point Estimate ± Margin of Error**

The **Margin of Error (ME)** is the largest distance we expect between our sample statistic (`p̂`) and the true parameter (`p`) due to random chance, at a given level of confidence. It is the "plus or minus" part of the interval, so the full interval is twice the margin of error wide. A smaller margin of error means a more precise estimate.

**Analogy:** Imagine trying to catch a fish (`p`, the true parameter) in a lake with a small net (`p̂`, your sample). You might miss. A confidence interval is like using a much larger net. You don't know exactly where the fish is inside the net, but you're much more confident that you've caught it. The margin of error is like the radius of your big net.

---

### Confidence Interval Simulation and Interpreting the Confidence *Level*

This is one of the most misunderstood concepts in statistics.

#### Theory
The **confidence level** (e.g., 95%) does **not** tell us the probability that our *one* calculated interval contains the true parameter. The true parameter `p` is a fixed number; it's either in our interval or it's not. The probability is either 1 or 0.

Instead, the **confidence level refers to the long-run success rate of the method**.

**Simulation Idea:**
Imagine 100 different researchers all go out and collect their own random samples from the same population. They each compute their own 95% confidence interval based on their sample data. Because of sampling variability, their intervals will all be slightly different.

A 95% confidence level means that we expect about **95 of those 100 intervals** to successfully capture the true population parameter. The other 5 intervals will miss the true parameter entirely, just due to bad luck in their random sample.
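The simulation idea above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the true proportion (`0.55`), the sample size (`400`), the number of simulated researchers (`2000`), and the function name `coverage_rate` are all assumptions chosen for the demo. It uses only the standard library and the interval formula developed later in these notes.

```python
import random
from statistics import NormalDist

def coverage_rate(true_p, n, confidence=0.95, trials=2000, seed=0):
    """Fraction of simulated intervals that contain true_p (hypothetical helper)."""
    rng = random.Random(seed)
    # Critical value for the chosen confidence level
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(trials):
        # One researcher's sample: count successes among n random draws
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        margin = z_star * (p_hat * (1 - p_hat) / n) ** 0.5
        # Did this researcher's interval capture the true proportion?
        if p_hat - margin <= true_p <= p_hat + margin:
            hits += 1
    return hits / trials

rate = coverage_rate(true_p=0.55, n=400)
print(f"Share of intervals capturing p: {rate:.3f}")
```

The printed share should land close to 0.95 but will rarely equal it exactly, because the simulation itself is subject to random variation.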

**Correct Interpretation of a 95% Confidence Level:**
*   "This interval was built with a method that captures the true population proportion in about 95% of all possible random samples."
*   "If I were to take many, many samples and construct an interval for each, I would expect about 95% of those intervals to contain the true population proportion."

**Incorrect Interpretation:**
*   "There is a 95% chance that the true proportion `p` is in my interval [0.52, 0.58]." (This is wrong: `p` is fixed, so this particular interval either contains it or it doesn't.)

---

### Conditions for a Valid Confidence Interval for a Proportion

For the math behind the confidence interval to be reliable, we must check the same conditions we did for the sampling distribution.

1.  **Random:** The data must come from a well-designed random sample or randomized experiment. This prevents bias.
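2.  **Independent (10% Rule):** When sampling without replacement, the sample size `n` should be no more than 10% of the population size `N` (`n ≤ 0.10N`), so that the observations can be treated as approximately independent.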
3.  **Large Counts Condition (Normality):** Since we don't know the true `p`, we use our sample proportion `p̂` as our best guess. The condition requires that the number of successes and failures *in our sample* are both at least 10.
    *   **`n * p̂ ≥ 10`** and **`n * (1-p̂) ≥ 10`**
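The two numeric conditions are easy to check in code. The sketch below is only an illustration: the function name `check_conditions` and the example counts and population size are hypothetical. The Random condition is about how the data were collected, so it cannot be verified from the numbers alone.

```python
def check_conditions(successes, n, population_size=None):
    """Check the numeric conditions for a one-proportion z-interval (hypothetical helper)."""
    failures = n - successes
    results = {
        # n * p_hat and n * (1 - p_hat) are simply the observed counts
        "large_counts": successes >= 10 and failures >= 10,
    }
    # The 10% rule can only be checked when the population size is known
    if population_size is not None:
        results["ten_percent"] = n <= 0.10 * population_size
    return results

print(check_conditions(successes=30, n=120, population_size=5000))
# {'large_counts': True, 'ten_percent': True}
print(check_conditions(successes=4, n=50))
# {'large_counts': False}
```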

---

### Critical Value (z*) for a Given Confidence Level

#### Theory
The **critical value**, denoted `z*` (read "z-star"), controls how wide the interval is for a chosen confidence level. It's the number of standard deviations we need to go out from the mean of a standard Normal distribution, in both directions, to capture a central area equal to our confidence level.

*   For a **95%** confidence interval, we want the central 95% of the distribution. This leaves 5% for the tails, or 2.5% in each tail. We need the Z-score that corresponds to the 97.5th percentile (1 - 0.025). This value is **z* = 1.96**.
*   For a **90%** confidence interval, we leave 5% in each tail. We need the Z-score for the 95th percentile. This value is **z* = 1.645**.
*   For a **99%** confidence interval, we leave 0.5% in each tail. We need the Z-score for the 99.5th percentile. This value is **z* = 2.576**.
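The three values above can be reproduced with the inverse Normal CDF. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8+); the helper name `critical_value` is an assumption made for illustration.

```python
from statistics import NormalDist

def critical_value(confidence):
    """z* that leaves (1 - confidence) / 2 in each tail of a standard Normal."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence -> z* = {critical_value(level):.3f}")
# 90% confidence -> z* = 1.645
# 95% confidence -> z* = 1.960
# 99% confidence -> z* = 2.576
```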

---

### Constructing and Interpreting a Confidence Interval for `p`

#### Theory
Now we combine all the pieces. The formula for a one-sample z-interval for a population proportion `p` is:

**`p̂ ± z* * √[ p̂(1-p̂) / n ]`**

Let's break it down:
*   `p̂`: The point estimate (from our sample).
*   `z*`: The critical value (determined by the confidence level).
*   `√[ p̂(1-p̂) / n ]`: This is the **Standard Error of the Proportion (SE_{p̂})**. It's our estimate of the standard deviation of the sampling distribution, since we have to use `p̂` instead of the unknown `p`.
*   The entire `z* * SE_{p̂}` part is the **Margin of Error (ME)**.

#### Calculation Example
A city wants to build a new park and surveys 400 residents. 220 of them say they support the new park. Let's construct and interpret a 95% confidence interval.

1.  **State:** We want to estimate the true proportion `p` of all city residents who support the park, with 95% confidence.
2.  **Plan:** We will use a one-sample z-interval for `p`.
    *   **Check Conditions:**
        *   **Random:** The problem only says that residents were surveyed, so we assume the sample was random.
        *   **Independent:** 400 residents is likely less than 10% of all residents in a city.
        *   **Large Counts:** `p̂ = 220/400 = 0.55`.
            *   `n * p̂ = 400 * 0.55 = 220` (which is ≥ 10).
            *   `n * (1-p̂) = 400 * 0.45 = 180` (which is ≥ 10).
        *   Conditions are met.
3.  **Do:**
    *   `p̂ = 0.55`
    *   `z*` for 95% confidence is `1.96`.
    *   `SE_{p̂} = √[ 0.55(1-0.55) / 400 ] = √[ 0.2475 / 400 ] ≈ 0.02487`
    *   `ME = z* * SE_{p̂} = 1.96 * 0.02487 ≈ 0.04875`
    *   **Interval = p̂ ± ME = 0.55 ± 0.04875**
    *   Interval = (0.50125, 0.59875) or about **(50.1%, 59.9%)**

4.  **Conclude:** "We are 95% confident that the interval from 50.1% to 59.9% captures the true proportion of all city residents who support building the new park."
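As a check on the hand calculation, the sketch below recomputes the park survey interval at three confidence levels. The function name `proportion_interval` is hypothetical, and the code uses the unrounded critical value, so the last digits may differ slightly from the worked steps above.

```python
from statistics import NormalDist

def proportion_interval(successes, n, confidence):
    """One-sample z-interval for a proportion (illustrative helper)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = (p_hat * (1 - p_hat) / n) ** 0.5
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin

# Park survey: 220 supporters out of 400 residents
for level in (0.90, 0.95, 0.99):
    low, high = proportion_interval(220, 400, level)
    print(f"{level:.0%}: ({low:.4f}, {high:.4f})")
```

The 95% row should agree with the interval computed by hand. With the same data, a higher confidence level gives a wider interval, because `z*` grows while `p̂` and the standard error stay the same.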

---

### Determining Sample Size

#### Theory
What if we want to plan a study and need to achieve a specific margin of error? We can work backward from the margin of error formula to solve for the required sample size `n`.

`ME = z* * √[ p*(1-p*) / n ]`

Squaring both sides gives `ME² = (z*)² * p*(1-p*) / n`, and solving for `n` gives:

**`n = (z* / ME)² * p*(1-p*)`**

There's a catch: to calculate the sample size `n`, we need `p*`, which is a guess for the true proportion `p`... which we don't know yet! We have two options:
1.  **Use a prior estimate:** If a similar study has been done before, use that `p̂` as your guess for `p*`.
2.  **Be conservative (most common):** Use **p* = 0.5**. The term `p*(1-p*)` is maximized when `p*=0.5`. Using this value will guarantee your sample size is large enough to achieve the desired margin of error, regardless of the true value of `p`.
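To see why `p* = 0.5` is the safe choice, the sketch below evaluates `p*(1-p*)` and the resulting sample size for several guesses. The 3% margin of error, the list of guesses, and the helper name `required_n` are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def required_n(margin, p_star=0.5, confidence=0.95):
    """Smallest whole n that meets the desired margin of error (illustrative helper)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up: a fractional respondent is not possible
    return math.ceil((z_star / margin) ** 2 * p_star * (1 - p_star))

for p_star in (0.1, 0.3, 0.5, 0.7, 0.9):
    product = p_star * (1 - p_star)
    print(f"p* = {p_star:.1f}: p*(1-p*) = {product:.2f}, n = {required_n(0.03, p_star)}")
```

The product, and with it the required `n`, peaks at `p* = 0.5` and falls off symmetrically on either side, so planning with 0.5 never underestimates the sample size.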

#### Calculation Example
A political campaign wants to estimate its candidate's support with **95% confidence** and a **margin of error of no more than 3%** (`ME = 0.03`). How many people do they need to survey?

1.  `z*` for 95% confidence is `1.96`.
2.  `ME = 0.03`.
3.  Since they have no prior information, they use the conservative estimate `p* = 0.5`.
4.  `n = (1.96 / 0.03)² * 0.5(1-0.5)`
    `n = (65.333)² * 0.25`
    `n = 4268.4 * 0.25 ≈ 1067.1`

Since we can't survey a fraction of a person, we **always round up** to the next whole number. They need to survey **1068** people.

---

### Python Code Illustration


```python
import numpy as np
from scipy.stats import norm

# --- Part 1: Finding the Critical Value (z*) ---
confidence_level = 0.95
# The area in one tail is (1 - confidence_level) / 2
alpha_half = (1 - confidence_level) / 2
# We want the z-score at the upper boundary, so we look up the percentile 1 - alpha_half
z_star = norm.ppf(1 - alpha_half)
print("--- Critical Value (z*) ---")
print(f"The critical value z* for a {confidence_level*100}% confidence level is: {z_star:.3f}\n")


# --- Part 2: Constructing a Confidence Interval ---
print("--- Constructing a Confidence Interval ---")
# Using our park survey example
n = 400
successes = 220
p_hat = successes / n

# We already found z_star for 95% confidence
# Calculate the Standard Error (SE)
standard_error = np.sqrt(p_hat * (1 - p_hat) / n)

# Calculate the Margin of Error (ME)
margin_of_error = z_star * standard_error

# Calculate the confidence interval bounds
lower_bound = p_hat - margin_of_error
upper_bound = p_hat + margin_of_error

print(f"Sample Proportion (p̂): {p_hat:.4f}")
print(f"Standard Error (SE): {standard_error:.4f}")
print(f"Margin of Error (ME): {margin_of_error:.4f}")
print(f"95% Confidence Interval: ({lower_bound:.4f}, {upper_bound:.4f})\n")
# Format the bounds from the computed values instead of hard-coding them
print(f"Interpretation: We are 95% confident that the interval from {lower_bound:.2%} to {upper_bound:.2%} captures the true proportion of all residents who support the park.")
print("-" * 50)


# --- Part 3: Determining Sample Size ---
print("\n--- Determining Sample Size ---")
# Goal: 95% confidence, Margin of Error of 3%

# z* for 95% confidence is the same as above
ME_desired = 0.03

# Case 1: No prior information, use the conservative p* = 0.5
p_star_conservative = 0.5
n_conservative = ((z_star / ME_desired)**2) * (p_star_conservative * (1-p_star_conservative))
# Always round up
n_conservative = np.ceil(n_conservative)

print(f"Required sample size (using p*=0.5): {n_conservative:.0f}")

# Case 2: We have a prior belief that the true proportion is around 55%
p_star_prior = 0.55
n_prior = ((z_star / ME_desired)**2) * (p_star_prior * (1-p_star_prior))
n_prior = np.ceil(n_prior)

print(f"Required sample size (using prior p*=0.55): {n_prior:.0f}")
print("\nNote that using a prior estimate (if you trust it) can reduce the required sample size.")
```


```text
--- Critical Value (z*) ---
The critical value z* for a 95.0% confidence level is: 1.960

--- Constructing a Confidence Interval ---
Sample Proportion (p̂): 0.5500
Standard Error (SE): 0.0249
Margin of Error (ME): 0.0488
95% Confidence Interval: (0.5012, 0.5988)

Interpretation: We are 95% confident that the interval from 50.12% to 59.88% captures the true proportion of all residents who support the park.
--------------------------------------------------

--- Determining Sample Size ---
Required sample size (using p*=0.5): 1068
Required sample size (using prior p*=0.55): 1057

Note that using a prior estimate (if you trust it) can reduce the required sample size.
```
