# Confidence Intervals and Related Distributions

This notebook discusses the motivation for confidence intervals, introduces the Gamma, Chi-Square, and Student-t distributions, and explains how they are used to construct confidence intervals. It includes definitions, formulas, conditions for validity, Python visualizations, and worked examples.

## Why Do We Care About Confidence Intervals?

In real-world data analysis, we often want to estimate unknown population parameters — such as the mean income in a city, the proportion of defective items in a shipment, or the true support for a political candidate.

However, we almost never have access to the entire population. Instead, we collect a sample and compute statistics like the sample mean $\bar{X}$ or sample proportion $\hat{p}$.

But sample statistics vary from one sample to another. If we report just a single number — like a point estimate — we provide **no sense of how precise or uncertain** that estimate is.

---

### What Is a Confidence Interval?

A **confidence interval** gives us a **range of plausible values** for the unknown population parameter, based on our sample data.
- It helps us **quantify the uncertainty** in our estimate in a statistically principled way.

---

### Why Not Just Use the Sample Estimate?

Sample estimates (like $\bar{X}$) are useful, but they can be misleading without context. Two studies might report the same sample mean, but if one is based on 20 observations and the other on 2,000, our confidence in those estimates should be very different.

Confidence intervals:
- Reflect the **sample size** and **variability** in the data
- Provide a much richer and more honest summary of what we know

---

In short, confidence intervals help us make better decisions and communicate results with appropriate caution and clarity.


## Beyond the Normal: Key Distributions for Confidence Intervals

Before we introduce the theorems and formulas for confidence intervals, we need to review several key probability distributions beyond the normal distribution.

We will use code and its output to help us better understand the concepts and theorems. Below, we import the relevant Python libraries and set up the Seaborn display style.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gamma, chi2, t, norm
import seaborn as sns
sns.set(style="whitegrid")



### Gamma Distribution

The Gamma distribution is a two-parameter continuous probability distribution characterized by shape parameter $ \alpha > 0 $ and scale parameter $ \beta > 0$.

- **PDF**:
  $$
  f_{\alpha, \beta}(x) = \frac{x^{\alpha-1} e^{-x/\beta}}{\beta^\alpha \Gamma(\alpha)}, \quad x > 0,
  $$
  where $\Gamma(\alpha) = \int_{0}^{\infty} s^{\alpha - 1} e^{-s} \, ds$ is the Gamma function.

- **Mean**: $ \mathbb{E}[X] = \alpha \beta $  
- **Variance**: $ \text{Var}(X) = \alpha \beta^2 $
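As a quick check of these formulas, the sketch below compares $\alpha\beta$ and $\alpha\beta^2$ with the mean and variance reported by `scipy.stats.gamma`, which takes the shape $\alpha$ as `a` and the scale $\beta$ as `scale`. The values $\alpha = 3$, $\beta = 2$ are arbitrary illustrative choices.

```python
from scipy.stats import gamma

# Arbitrary illustrative parameters: shape alpha and scale beta
alpha_, beta_ = 3.0, 2.0

# SciPy parameterizes the Gamma distribution by shape `a` and `scale`
mean, var = gamma.stats(a=alpha_, scale=beta_, moments='mv')

print(f"SciPy mean = {float(mean):.4f}, formula alpha*beta   = {alpha_ * beta_:.4f}")
print(f"SciPy var  = {float(var):.4f}, formula alpha*beta^2 = {alpha_ * beta_**2:.4f}")
```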

Note that the Gamma function has no closed-form expression in general. However, it satisfies the recursive property  
$\Gamma(x + 1) = x \, \Gamma(x)$ for all $x > 0$.  

In particular, when $x$ is a positive integer, $\Gamma(x) = (x - 1)!$.

Thus, the Gamma function is considered a generalization of the factorial function to the real (and even complex) numbers.
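The recursive property and the factorial identity can be spot-checked numerically with Python's built-in `math.gamma`. The sketch below uses an arbitrary non-integer test point $x = 2.7$ and also checks the well-known value $\Gamma(1/2) = \sqrt{\pi}$.

```python
import math

# Recursive property Gamma(x + 1) = x * Gamma(x), at an arbitrary non-integer point
x0 = 2.7
print(f"Gamma({x0 + 1}) = {math.gamma(x0 + 1):.6f}, {x0} * Gamma({x0}) = {x0 * math.gamma(x0):.6f}")

# For positive integers k, Gamma(k) = (k - 1)!
for k in range(1, 7):
    print(f"Gamma({k}) = {math.gamma(k):.1f}, ({k} - 1)! = {math.factorial(k - 1)}")

# A classic non-integer value: Gamma(1/2) = sqrt(pi)
print(f"Gamma(0.5) = {math.gamma(0.5):.6f}, sqrt(pi) = {math.sqrt(math.pi):.6f}")
```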



Next we visualize the Gamma distributions for different shape and scale parameters.

In [None]:
# Plot Gamma Distribution for different parameters
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

# Varying shape parameter (α)
x = np.linspace(0, 10, 1000)
shapes = [1, 2, 3, 5, 10.5]
scale = 2

for shape in shapes:
    pdf = gamma.pdf(x, a=shape, scale=scale)
    axes[0].plot(x, pdf, label=f'α={shape}, β={scale}', linewidth=2)

axes[0].set_title('Gamma Distribution - Varying Shape Parameter (α)')
axes[0].set_xlabel('x')
axes[0].set_ylabel('Probability Density')
axes[0].legend()
axes[0].grid(True)

# Varying scale parameter (β)
shape = 2
scales = [5, 2, 1, 0.5, 0.2]

for scale in scales:
    pdf = gamma.pdf(x, a=shape, scale=scale)
    axes[1].plot(x, pdf, label=f'α={shape}, β={scale}', linewidth=2)

axes[1].set_title('Gamma Distribution - Varying Scale Parameter (β)')
axes[1].set_xlabel('x')
axes[1].set_ylabel('Probability Density')
axes[1].legend()
axes[1].grid(True)

plt.tight_layout()
plt.show()


The left subplot shows Gamma PDF curves for different shape parameters while keeping the scale parameter fixed. As expected, the curves have noticeably different shapes.

The right subplot shows Gamma PDF curves for different scale parameters while keeping the shape parameter fixed. These curves have similar overall shapes but appear stretched or compressed depending on the scale.

### Chi-Square Distribution


#### Definition and Properties

The Chi-Square distribution is a special case of the Gamma distribution and is widely used in confidence intervals and hypothesis testing. Its parameter $n$ is called the number of degrees of freedom.

- **Relation**: $ \chi^2_n \sim \text{Gamma}(\alpha = \frac{n}{2}, \beta = 2) $
- **PDF**: $f_{n}(x) = \frac{x^{\frac{n}{2}-1} e^{-\frac{x}{2}}}{2^{\frac{n}{2}} \Gamma(\frac{n}{2})}$ for $x > 0$
- **Mean**: $ n $
- **Variance**: $ 2n $


Next, we will display the Chi-Square distributions for different degrees of freedom.

In [None]:
# Plot Chi-Square Distribution for different degrees of freedom
plt.figure(figsize=(10, 6))

x = np.linspace(0, 20, 1000)
degrees_of_freedom = [1, 2, 3, 5, 10]

for df in degrees_of_freedom:
    pdf = chi2.pdf(x, df)
    plt.plot(x, pdf, label=f'n={df}', linewidth=2)  # n = degrees of freedom, as in the text

plt.title('Chi-Square Distribution for Different Degrees of Freedom')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.show()



These PDF curves of the chi-square distribution have shapes similar to those of the Gamma curves shown earlier in the left subplot. This is expected, since the chi-square distribution is a special case of the Gamma distribution.

Next we will plot related chi-square and Gamma distributions together.

#### Relationship Between the Chi-Square and Gamma Distributions

Recall that  $ \chi^2_n \sim \text{Gamma}(\alpha = \frac{n}{2}, \beta = 2)$.

Next we generate and plot both the $\chi^2_n$ PDF and the corresponding Gamma PDF.



In [None]:
# Demonstrate relationship with Gamma distribution
print("\n### Relationship between Chi-Square and Gamma:")
n = 4
x = np.linspace(0, 15, 1000)

# Chi-square PDF
chi2_pdf = chi2.pdf(x, n)

# Equivalent Gamma PDF (shape = n/2, scale = 2)
gamma_pdf = gamma.pdf(x, a=n/2, scale=2)

plt.figure(figsize=(10, 6))
plt.plot(x, chi2_pdf, 'b-', linewidth=3, label=f'Chi-Square (n={n})')
plt.plot(x, gamma_pdf, 'r--', linewidth=2, label=f'Gamma (α={n/2}, β=2)')
plt.title(f'Chi-Square (n={n}) as Special Case of Gamma Distribution')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.show()

These two PDF curves, generated by the chi-square and Gamma distributions, coincide. This is expected, since the two distributions are identical.
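Beyond visual agreement, the equivalence can be confirmed numerically. This minimal sketch evaluates both PDFs and both CDFs on a grid (starting slightly above 0, an arbitrary choice) and reports the largest absolute differences, which should be at the level of floating-point rounding.

```python
import numpy as np
from scipy.stats import chi2, gamma

n = 4
x = np.linspace(0.01, 15, 1000)

# Largest absolute gap between the chi-square and Gamma(n/2, scale=2) PDFs
max_pdf_diff = np.max(np.abs(chi2.pdf(x, n) - gamma.pdf(x, a=n/2, scale=2)))
# The CDFs must agree too, since the distributions are the same
max_cdf_diff = np.max(np.abs(chi2.cdf(x, n) - gamma.cdf(x, a=n/2, scale=2)))

print(f"max |PDF difference| = {max_pdf_diff:.2e}")
print(f"max |CDF difference| = {max_cdf_diff:.2e}")
```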

#### Relationship Between the Chi-Square and Normal Distributions

The chi-square distribution is directly related to the standard normal distribution.

If $Z_1, Z_2, \dots, Z_n$ are independent standard normal random variables (i.e., $Z_i \sim N(0, 1)$), then the sum of their squares follows a chi-square distribution with $n$ degrees of freedom:

$$
X = \sum_{i=1}^{n} Z_i^2 \sim \chi^2_n
$$

In other words, the chi-square distribution with $n$ degrees of freedom is the distribution of the sum of the squares of $n$ independent standard normal variables.



**Interpretation:**

This relationship is foundational in statistics. It explains why the chi-square distribution is used in:

- Constructing confidence intervals for population variance  
- Hypothesis testing involving variance  
- Goodness-of-fit tests and contingency tables

It also forms the basis for deriving the Student-$t$ distribution and other distributions (e.g., the F distribution, not covered in the book), which involve ratios that include chi-square-distributed variables.


Next we will simulate the sum of the squares of 5 independent standard normal variables, and plot its histogram along with the theoretical chi-square distribution.

In [None]:
# Simulation settings
num_samples = 100000  # number of simulated experiments
n = 5  # degrees of freedom (number of independent standard normals)

# Generate n independent standard normal samples per experiment
Z = np.random.randn(num_samples, n)

# Compute the sum of squares of each row → Chi-Square(n)
X = np.sum(Z**2, axis=1)

# Plot the histogram and the theoretical chi-square PDF
x_vals = np.linspace(0, 30, 500)
plt.figure(figsize=(8, 5))
plt.hist(X, bins=100, density=True, alpha=0.6, label=f'Simulated: sum of {n} $Z^2$')

# Overlay theoretical Chi-Square PDF
plt.plot(x_vals, chi2.pdf(x_vals, df=n), 'r-', lw=2, label=f'Chi-Square PDF (df={n})')

plt.title(rf'Simulation: Sum of {n} Squared Standard Normals $\sim \chi^2_{{{n}}}$')

plt.xlabel('Value')
plt.ylabel('Density')
plt.legend()
plt.grid(True)
plt.show()


Indeed, the histogram of the sum of squares of five independent standard normal variables closely matches the theoretical Chi-Square distribution.

### Student t-Distribution


#### Definition and Properties

**Probability Density Function (PDF)**

$$
f_n(x) = \frac{\Gamma\left(\frac{n + 1}{2}\right)}{\sqrt{n \pi} \, \Gamma\left(\frac{n}{2}\right)} \left(1 + \frac{x^2}{n} \right)^{-\frac{n + 1}{2}}
$$

where $x \in \mathbb{R}$ is the variable and $n$ is the number of degrees of freedom (a positive integer).



**Key Properties:**
- **Expected Value (Mean):** $\mathbb{E}[T] = 0$ for $n > 1$ (undefined for $n = 1$)
- **Variance:** $\text{Var}(T) = \frac{n}{n-2}$ for $n > 2$; infinite for $1 < n \leq 2$; undefined otherwise
- Symmetric and bell-shaped like the normal distribution, but with heavier tails
- Approaches the standard normal distribution as $n \to \infty$
- Central to inference from small samples when the population standard deviation is unknown
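The variance formula, and how it approaches 1 (the variance of the standard normal) as $n$ grows, can be checked against SciPy's `t.var`. Only $n > 2$ is used here, where the variance is finite; the chosen degrees of freedom are arbitrary.

```python
from scipy.stats import t

dfs = [3, 5, 10, 30, 100]
for n_df in dfs:
    # SciPy's variance of t_n versus the formula n / (n - 2), valid for n > 2
    print(f"n = {n_df:>3}: SciPy var = {t.var(n_df):.4f}, "
          f"n/(n-2) = {n_df / (n_df - 2):.4f}")
```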

Next we plot the t-distribution for different degrees of freedom, along with the standard normal distribution.

In [None]:
# Plot t-Distribution for different degrees of freedom
plt.figure(figsize=(10, 6))

x = np.linspace(-4, 4, 1000)
degrees_of_freedom = [1, 2, 5, 10, 30]

for df in degrees_of_freedom:
    pdf = t.pdf(x, df)
    plt.plot(x, pdf, label=f'n={df}', linewidth=2)

# Standard normal for comparison
normal_pdf = norm.pdf(x)
plt.plot(x, normal_pdf, 'k--', linewidth=2, label='Standard Normal')

plt.title("Student's t-Distribution vs Standard Normal")
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.show()

All of these t-distributions are symmetric about the y-axis and resemble the standard normal distribution. As the degrees of freedom increase, the t-distribution approaches the normal distribution.

Next, we'll focus on the right tail of these curves to take a closer look at how the t-distributions have heavier tails.

In [None]:

# Compare tail behavior
plt.figure(figsize=(10, 6))
x_tail = np.linspace(2.5, 4, 1000)

for df in [1, 2, 5, 10, 30]:
    pdf = t.pdf(x_tail, df)
    plt.plot(x_tail, pdf, label=f't (n={df})', linewidth=2)

normal_pdf = norm.pdf(x_tail)
plt.plot(x_tail, normal_pdf, 'k--', linewidth=2, label='Standard Normal')

plt.title('Tail Comparison: t-Distribution vs Normal')
plt.xlabel('x')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.show()

Indeed, all of these t-distributions have higher PDF values than the standard normal distribution in the tail region. This indicates that these tails have greater probability mass and, thus, are considered heavier.
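Heavier tails can also be quantified directly as right-tail probabilities. This sketch compares $P(T > 2)$ for several degrees of freedom with $P(Z > 2)$, using SciPy's survival function `sf`, which returns $1 - \text{CDF}$; the threshold 2 is an arbitrary choice.

```python
from scipy.stats import t, norm

threshold = 2.0
dfs = [1, 2, 5, 10, 30]

# sf(x) = P(X > x), the right-tail probability
normal_tail = norm.sf(threshold)
t_tails = [t.sf(threshold, n_df) for n_df in dfs]

print(f"Standard normal: P(Z > {threshold}) = {normal_tail:.4f}")
for n_df, p in zip(dfs, t_tails):
    print(f"t (n={n_df:>2}):      P(T > {threshold}) = {p:.4f}")
```

Every t tail probability exceeds the normal one, and the gap shrinks as $n$ increases.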


#### Connection to Normal and Chi-Square Distributions

If  
- $Z \sim N(0, 1)$  
- $V \sim \chi^2_n$  
- $Z$ and $V$ are independent

then  

$ T = \frac{Z}{\sqrt{V/n}} \sim t_n$

This shows that the Student-$t$ distribution arises as the ratio of a standard normal variable to the square root of a scaled chi-square variable.


---



Next we simulate how the Student-t distribution arises from a standard normal numerator and a scaled chi-square denominator. Then we compare the simulated t-distribution, the theoretical t-distribution (from `scipy.stats.t`), and the standard normal distribution.

In [None]:

# Settings
num_samples = 100000
n = 10  # degrees of freedom

# Step 1: Generate standard normal values
Z = np.random.randn(num_samples)

# Step 2: Generate chi-square values with n degrees of freedom
V = np.random.chisquare(df=n, size=num_samples)

# Step 3: Compute the T values
T = Z / np.sqrt(V / n)

# Plot the simulated t-distribution and compare with theoretical t and normal
x_vals = np.linspace(-5, 5, 500)
plt.figure(figsize=(8, 5))

# Histogram of simulated T values
plt.hist(T, bins=100, density=True, alpha=0.5, label='Simulated $t_n$')

# Overlay theoretical t-distribution PDF
plt.plot(x_vals, t.pdf(x_vals, df=n), 'r-', lw=2, label=f'Theoretical $t_{{{n}}}$')

# Overlay standard normal for comparison
plt.plot(x_vals, norm.pdf(x_vals), 'k--', lw=2, label='Standard Normal')

plt.title(rf'Simulation: $T = Z / \sqrt{{V / {n}}} \sim t_{{{n}}}$')
plt.xlabel('Value')
plt.ylabel('Density')
plt.legend()
plt.grid(True)
plt.show()


The histogram of the $T$ values computed from independent standard normal and chi-square values matches the theoretical t-curve well. Both are also close to the standard normal curve.

## Confidence Intervals


### Understanding the Critical z-value

Before we define confidence intervals, it’s important to understand the **critical z-value**.

A **critical z-value** is a value from the **standard normal distribution** such that the area **in the right tail** beyond this value equals a specified probability $\alpha$. In other words,

$$
P(Z > z_{\alpha}) = {\alpha}
$$


In a two-tailed context, the critical value $z_{\alpha/2}$ satisfies $P(Z > z_{\alpha/2}) = \alpha/2$. This means the area to the right of $z_{\alpha/2}$ is $\alpha/2$, and by symmetry, the area to the left of $-z_{\alpha/2}$ is also $\alpha/2$. The area **between** $-z_{\alpha/2}$ and $z_{\alpha/2}$ is therefore $1 - \alpha$:

$$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha$$
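These relationships can be verified numerically: `norm.ppf(1 - alpha/2)` returns $z_{\alpha/2}$, and the area between $-z_{\alpha/2}$ and $z_{\alpha/2}$ computed with `norm.cdf` should equal $1 - \alpha$. A minimal sketch for three common values of $\alpha$:

```python
from scipy.stats import norm

crit = {}
for a in [0.10, 0.05, 0.01]:
    z = norm.ppf(1 - a / 2)               # z_{alpha/2}: right-tail area alpha/2
    central = norm.cdf(z) - norm.cdf(-z)  # area between -z_{alpha/2} and z_{alpha/2}
    crit[a] = z
    print(f"alpha = {a:.2f}: z_(alpha/2) = {z:.3f}, central area = {central:.3f}")
```

The critical values come out near 1.645, 1.960, and 2.576.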



Next, we define a function that accepts an alpha parameter, computes the corresponding critical value, and plots the standard normal distribution with the confidence region and its boundaries highlighted.


In [None]:
def plot_critical_z(alpha=0.05):
    """
    Plot the standard normal distribution showing the critical z-values
    and the central confidence region corresponding to (1 - alpha).

    Parameters:
    alpha (float): significance level; 1 - alpha is the confidence level (e.g., 0.05 for 95%)
    """
    z_crit = norm.ppf(1 - alpha / 2)

    x = np.linspace(-4, 4, 1000)
    y = norm.pdf(x)

    plt.figure(figsize=(10, 5))
    plt.plot(x, y, label='Standard Normal PDF', color='black')

    # Fill central (1 - alpha) region
    plt.fill_between(x, y, where=(x >= -z_crit) & (x <= z_crit),
                     color='skyblue', alpha=0.6,
                     label=f'Confidence Region ({100*(1 - alpha):.0f}%)')

    # Fill tail regions
    plt.fill_between(x, y, where=(x < -z_crit), color='salmon', alpha=0.6,
                     label=f'Each Tail Area ({100*alpha/2:.1f}%)')
    plt.fill_between(x, y, where=(x > z_crit), color='salmon', alpha=0.6)

    # Annotate critical values
    plt.axvline(-z_crit, color='blue', linestyle='--',
                label=f'$-z_{{\\alpha/2}} \\approx {-z_crit:.2f}$')
    plt.axvline(z_crit, color='blue', linestyle='--',
                label=f'$z_{{\\alpha/2}} \\approx {z_crit:.2f}$')

    plt.title(f'Critical z-values and Confidence Region for α = {alpha}')
    plt.xlabel('Z')
    plt.ylabel('Density')
    plt.legend(loc='upper left')
    plt.grid(True)
    plt.show()


Next, we apply the function with alpha values of 0.10, 0.05, and 0.01, which correspond to 90%, 95%, and 99% confidence levels.


In [None]:
plot_critical_z(alpha=0.10)   # 90% confidence


In [None]:
plot_critical_z(alpha=0.05)   # 95% confidence


In [None]:
plot_critical_z(alpha=0.01)   # 99% confidence

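As the error rate $\alpha$ decreases (and the confidence level increases), the critical value $z_{\alpha/2}$ moves farther from zero, so the central region, and hence the confidence interval, becomes wider.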

### Assumption of Simple Random Sampling

Before we introduce any confidence interval formulas, it’s important to highlight a key assumption shared across all of them:

> **All standard confidence interval formulas assume that the sample is a simple random sample (SRS) from the population.**

A **simple random sample** means:
- Every member of the population has an equal chance of being selected,
- The observations in the sample are independent of one another.

This assumption is critical because it ensures that:
- Sample statistics (like the sample mean $\bar{X}$, sample proportion $\hat{p}$, sample variance $s^2$) are **unbiased estimators** of population parameters,
- The theoretical formulas for confidence intervals yield **valid coverage probabilities** (e.g., 95%).
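In simulations, a simple random sample of size $n$ from a finite population can be drawn by sampling without replacement, so that every subset of size $n$ is equally likely. The sketch below uses NumPy's `Generator.choice` with `replace=False` on a made-up population; the population values, sizes, and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)   # seeded for reproducibility

# Hypothetical finite population of 1,000 measurements (made-up values)
population = rng.normal(loc=100, scale=15, size=1000)

# Simple random sample of size 25: draw without replacement
srs = rng.choice(population, size=25, replace=False)

print(f"Population mean: {population.mean():.2f}")
print(f"Sample mean:     {srs.mean():.2f}")
```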

---

### What If the Sample Is Not Random?

If the sample is not drawn randomly (e.g., convenience sample or voluntary response), or if the observations are not independent (e.g., clustered or time-dependent), then:
- The confidence interval may not be valid,
- The actual error rate may be much higher than expected,
- Any conclusions drawn from the interval could be misleading.

In such cases, more advanced methods are needed to adjust for the sampling design or dependence structure.

---

### Summary

| Type of Confidence Interval         | Requires Simple Random Sample? | Why?                                 |
|------------------------------------|-------------------------------|--------------------------------------|
| Population Mean                    | Yes                           | Ensures $\bar{X}$ is unbiased        |
| Population Proportion              | Yes                           | Validates binomial and CLT-based logic |
| Population Variance                | Yes                           | Chi-square derivation assumes independent observations |
| Correlation Coefficient            | Yes                           | Assumes independent bivariate pairs  |

In all cases, **randomness and independence** are essential for reliable inference.

In all of our examples and homework problems, the data are assumed to come from a simple random sample.


### Confidence Interval for the Mean (σ Known)

$$
\bar{X} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}
$$

where $1-\alpha$ is the confidence level and $n$ is the sample size. The value $\alpha$ is the total error rate: over repeated sampling, it is the probability that an interval constructed this way fails to contain the true population mean $\mu$.


**Is a normal population required?**

- Yes, if the sample size is small. The formula is exact only when the population is normally distributed.
- No, if the sample size is large (typically $n \geq 30$). The Central Limit Theorem (CLT) allows the sampling distribution of the sample mean to be approximately normal, so the formula is still valid for large samples.
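The interpretation of $1 - \alpha$ as a long-run coverage rate can be checked by simulation. The sketch below assumes a normal population with made-up values $\mu = 50$, $\sigma = 10$, draws many samples of size $n = 15$, builds the known-$\sigma$ interval for each, and records how often it contains $\mu$. The fraction should be close to the nominal 95%.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=1)

# Hypothetical population and sampling settings (not from the textbook)
mu, sigma, n_obs = 50.0, 10.0, 15
conf = 0.95
num_reps = 20000

z = norm.ppf(1 - (1 - conf) / 2)

# Each row is one simulated sample of size n_obs
samples = rng.normal(mu, sigma, size=(num_reps, n_obs))
means = samples.mean(axis=1)
moe = z * sigma / np.sqrt(n_obs)

# An interval covers mu if mu lies between its lower and upper bounds
covered = (means - moe <= mu) & (mu <= means + moe)
coverage = covered.mean()
print(f"Empirical coverage: {coverage:.3f} (nominal {conf:.2f})")
```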






#### Margin of Error

In the confidence interval formula introduced above:

$$
\bar{X} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}},
$$

the term $z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$ is commonly referred to as the **margin of error**.

The margin of error bounds the difference between the sample mean and the true population mean at the specified confidence level: $|\bar{X} - \mu|$ is at most the margin of error with probability $1 - \alpha$.

This concept generalizes beyond the population mean — it also applies to confidence intervals for other quantities, such as **proportions** and **variances**, and to other probability distributions. We will explore these extensions later.

The margin of error also defines the **half-width** of the confidence interval:

$$
\text{Interval Center} \pm \text{Margin of Error}
$$

A smaller margin of error means the estimate is more precise. The margin of error can generally be reduced by:

- Increasing the sample size $n$
- Decreasing the confidence level (which lowers $z_{\alpha/2}$)
- Reducing the variability in the population (i.e., a smaller $\sigma$)
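The effect of sample size can be made concrete. The sketch below fixes $\sigma = 7$ (an illustrative value) and a 95% confidence level and computes the margin of error for several sample sizes; because the margin is proportional to $1/\sqrt{n}$, each fourfold increase in $n$ halves it.

```python
import numpy as np
from scipy.stats import norm

sigma = 7.0                 # illustrative population standard deviation
z = norm.ppf(0.975)         # critical value for 95% confidence

# Margin of error z * sigma / sqrt(n) for sample sizes growing by a factor of 4
moes = {n_obs: z * sigma / np.sqrt(n_obs) for n_obs in [10, 40, 160, 640]}

for n_obs, moe in moes.items():
    print(f"n = {n_obs:>3}: margin of error = {moe:.3f}")
```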


#### Example

Next we work through an example from the textbook, allowing different confidence levels for easy exploration with the code.

First, we enter the data, compute the sample mean, and set the known population standard deviation.

In [None]:

# Example 3.5: Trace element in ingots (known variance)
print("\n### Example 3.5: Trace Elements in Ingots (Known Variance)")

# Data from Table 3.3
data = np.array([125, 110, 112, 116, 131, 108,
                 114, 121, 107, 106, 121, 106,
                 107, 113, 110, 113, 100, 121,
                 112, 109, 113, 113, 116, 109,
                 114, 123, 104, 112, 108, 113])

sample_mean = np.mean(data)
n = len(data)
known_variance = 49  # σ² = 49
known_std = np.sqrt(known_variance)  # σ = 7

print(f"Sample size: n = {n}")
print(f"Sample mean: X̄ = {sample_mean:.3f} ppm")
print(f"Known population standard deviation: σ = {known_std}")

The output confirms the sample mean reported in the book. Next, we compute the confidence interval for the mean using the confidence level set by the variables below.

In [None]:

# --- confidence levels ---
confidence_level = 0.90  # Change to 0.95, 0.99, etc.
alpha = 1 - confidence_level

In [None]:

z_critical = norm.ppf(1 - alpha / 2)  # ppf is the inverse CDF (quantile function)

margin_of_error = z_critical * known_std / np.sqrt(n)
ci_lower = sample_mean - margin_of_error
ci_upper = sample_mean + margin_of_error

print(f"\n{int(confidence_level * 100)}% Confidence Interval:")
print(f"Critical value: z_{alpha/2:.3f} = {z_critical:.3f}")
print(f"Margin of error: {margin_of_error:.3f}")
print(f"Confidence Interval: ({ci_lower:.3f}, {ci_upper:.3f})")

With `confidence_level = 0.95`, the results match those presented in the textbook. The output above uses whatever confidence level was set earlier (0.90 by default), so it may differ from the book's 95% interval.

Next, we visualize the confidence interval.

In [None]:
# Visualization
plt.figure(figsize=(10, 6))

# Plot normal distribution
x = np.linspace(sample_mean - 4*known_std/np.sqrt(n),
                sample_mean + 4*known_std/np.sqrt(n), 1000)
pdf = norm.pdf(x, sample_mean, known_std/np.sqrt(n))

plt.plot(x, pdf, 'b-', linewidth=2, label='Normal curve centered at X̄ (sd = σ/√n)')
plt.axvline(sample_mean, color='red', linestyle='--',
            label=f'Sample Mean = {sample_mean:.1f}')

# Shade confidence interval
x_ci = np.linspace(ci_lower, ci_upper, 100)
pdf_ci = norm.pdf(x_ci, sample_mean, known_std/np.sqrt(n))
plt.fill_between(x_ci, pdf_ci, alpha=0.3, color='blue',
                 label=f'{int(confidence_level * 100)}% Confidence Interval')

plt.axvline(ci_lower, color='green', linestyle=':', alpha=0.7)
plt.axvline(ci_upper, color='green', linestyle=':', alpha=0.7)

plt.title(f'{int(confidence_level * 100)}% Confidence Interval for Mean (Known Variance)')
plt.xlabel('Sample Mean (ppm)')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.show()


### Confidence Interval for the Mean (σ Unknown)

**Formula:**

$$
\bar{X} \pm t_{n-1, \alpha/2} \cdot \frac{s_n}{\sqrt{n}}
$$

where $s_n$ is the sample standard deviation (computed with divisor $n - 1$), and $t_{n-1, \alpha/2}$ is the critical value from the t-distribution with $n - 1$ degrees of freedom.

In general, the upper-tail critical value $t_{n,\alpha}$ of the t-distribution with $n$ degrees of freedom is defined by:

$$
P(T_n > t_{n,\alpha}) = \alpha
$$

Conceptually, in the computation of confidence intervals, $t_{n-1, \alpha/2}$ plays the same role in the t-distribution that $z_{\alpha/2}$ plays in the standard normal distribution: it marks the cutoff point beyond which only $\alpha/2$ of the probability remains in each tail.
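To see how $t_{n-1, \alpha/2}$ compares with $z_{\alpha/2}$, the sketch below computes both for $\alpha = 0.05$ and a few arbitrary sample sizes. The $t$ critical values are larger, which widens the interval to account for estimating $\sigma$ by $s_n$, and they shrink toward $z_{\alpha/2} \approx 1.960$ as $n$ grows.

```python
from scipy.stats import norm, t

alpha_level = 0.05
z_crit = norm.ppf(1 - alpha_level / 2)
print(f"z_(alpha/2) = {z_crit:.3f}")

sizes = [5, 10, 30, 100, 1000]
# Critical value t_{n-1, alpha/2} for each sample size n
t_crits = [t.ppf(1 - alpha_level / 2, n_obs - 1) for n_obs in sizes]
for n_obs, t_crit in zip(sizes, t_crits):
    print(f"n = {n_obs:>4}: t_(n-1, alpha/2) = {t_crit:.3f}")
```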


**Is a normal population required?**

- Yes, especially for small sample sizes. The $t$-distribution is derived under the assumption that the population is normal.
- No, if the sample size is large. By the Central Limit Theorem, $\bar{X}$ is approximately normal, $s_n$ is close to $\sigma$, and the $t$-distribution approaches the standard normal distribution, so the formula is approximately valid.

Next we use code to solve Example 3.6.

In [None]:
# Example 3.6: Trace element in ingots (unknown variance)
print("\n### Example 3.6: Trace Elements in Ingots (Unknown Variance)")


# Assuming data, sample_mean, known_std, and n are defined
sample_std = np.std(data, ddof=1)  # Sample standard deviation
print(f"Sample standard deviation: s = {sample_std:.1f}")

# --- Confidence Interval using t-distribution ---
df = n - 1
t_critical = t.ppf(1 - alpha / 2, df)

margin_of_error_t = t_critical * sample_std / np.sqrt(n)
ci_lower_t = sample_mean - margin_of_error_t
ci_upper_t = sample_mean + margin_of_error_t

print(f"\n{int(confidence_level * 100)}% Confidence Interval (Unknown Variance):")
print(f"Degrees of freedom: df = {df}")
print(f"Critical value: t_({df}, {alpha/2:.3f}) = {t_critical:.3f}")
print(f"Margin of error: {margin_of_error_t:.3f}")
print(f"Confidence Interval: ({ci_lower_t:.3f}, {ci_upper_t:.3f})")

Once again, with `confidence_level = 0.95`, the results match those presented in the textbook; the output above reflects whatever confidence level was set earlier.

Next we compare the confidence interval computed from the sample standard deviation (just above, using the t-distribution) with the one computed from the known standard deviation (earlier, using the normal distribution), and plot both distributions.

In [None]:
# --- Comparison (assuming known variance CI already computed) ---
print("\nComparison:")
print(f"Known variance CI: ({ci_lower:.3f}, {ci_upper:.3f})")
print(f"Unknown variance CI: ({ci_lower_t:.3f}, {ci_upper_t:.3f})")
print(f"Difference in width: {((ci_upper_t - ci_lower_t) - (ci_upper - ci_lower)):.3f}")

# --- Visualization ---
plt.figure(figsize=(12, 6))

x = np.linspace(sample_mean - 4 * sample_std / np.sqrt(n),
                sample_mean + 4 * sample_std / np.sqrt(n), 1000)

pdf_normal = norm.pdf(x, sample_mean, known_std / np.sqrt(n))
pdf_t = t.pdf((x - sample_mean) / (sample_std / np.sqrt(n)), df) * np.sqrt(n) / sample_std

plt.plot(x, pdf_normal, 'b-', linewidth=2, label='Normal (Known σ)')
plt.plot(x, pdf_t, 'r-', linewidth=2, label=f't-distribution (df={df})')

plt.axvline(ci_lower, color='blue', linestyle='--', alpha=0.7, label='Normal CI bounds')
plt.axvline(ci_upper, color='blue', linestyle='--', alpha=0.7)
plt.axvline(ci_lower_t, color='red', linestyle='--', alpha=0.7, label='t-dist CI bounds')
plt.axvline(ci_upper_t, color='red', linestyle='--', alpha=0.7)
plt.axvline(sample_mean, color='black', linestyle='-', label=f'Sample Mean = {sample_mean:.1f}')

plt.title(f'Comparison: {int(confidence_level * 100)}% Confidence Intervals (Known vs Unknown Variance)')
plt.xlabel('Sample Mean (ppm)')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.show()

For this particular example, the two distribution curves are very similar, and so are the two confidence intervals.

### Confidence Interval for Proportions
**Note:** We covered Section 3.3.2 (including Example 3.7) and Problem 3.6 in class. The text below is provided for your reference on the theoretical aspects of this topic.


**Motivation**

Estimating a population proportion is a common goal in statistics. For example, we may want to estimate:

- The proportion of defective parts in a shipment
- The proportion of voters who support a candidate
- The proportion of emails that are spam

Rather than trying to measure the entire population, we collect a sample and compute the sample proportion $\bar{X} = \hat{p} = \frac{x}{n}$, where:

- $x$ is the number of "successes" in the sample
- $n$ is the total sample size

We then construct a **confidence interval** to estimate the true population proportion $p$.

---

**Confidence Interval Formula**

When the sample size is sufficiently large, the confidence interval for the population proportion $p$ is given by:

$$
\hat{p} \pm z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}
$$

where:

- $\hat{p}$ is the sample proportion
- $z_{\alpha/2}$ is the critical value from the standard normal distribution
- $n$ is the sample size
- $1 - \alpha$ is the confidence level

---

#### When Is This Formula Valid?

This formula is **approximately valid** under the following conditions:

- The sample is a **simple random sample** from the population
- The sample size is **large enough** so that both:
  - $n \hat{p} \geq 5$
  - $n(1 - \hat{p}) \geq 5$

These conditions are a common rule of thumb for when the sampling distribution of $\hat{p}$ is approximately normal, by the Central Limit Theorem.
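Putting the formula and the validity conditions together, here is a minimal sketch of the normal-approximation interval (often called the Wald interval). The function name `wald_proportion_ci` and the data (40 successes out of 200) are hypothetical and for illustration only.

```python
import numpy as np
from scipy.stats import norm

def wald_proportion_ci(x, n, confidence=0.95):
    """Normal-approximation CI for a proportion (hypothetical helper).

    x: number of successes, n: sample size.
    Returns (p_hat, lower, upper); prints a warning when the
    conditions n*p_hat >= 5 and n*(1 - p_hat) >= 5 fail.
    """
    p_hat = x / n
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        print("Warning: n*p_hat or n*(1 - p_hat) is below 5; "
              "the normal approximation may be poor.")
    z = norm.ppf(1 - (1 - confidence) / 2)
    se = np.sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error
    moe = z * se                            # margin of error
    return p_hat, p_hat - moe, p_hat + moe

# Hypothetical data: 40 successes in a sample of 200
p_hat, lo, hi = wald_proportion_ci(40, 200, confidence=0.95)
print(f"p_hat = {p_hat:.3f}, 95% CI = ({lo:.4f}, {hi:.4f})")
```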

---

#### Is a Normal Population Required?

**No, when the sample size is large enough.** Each observation is a success/failure (0/1) outcome, so the population itself is **not normally distributed**.

However, the formula relies on the **sampling distribution of the sample proportion** $\hat{p}$ being approximately normal. This happens when the sample size is large and the success/failure conditions above are met.



Specifically:

$$
\hat{p} \approx N\left( p, \sqrt{p(1 - p) / n} \right)
$$

That is, the sampling distribution of $\hat{p}$ is approximately normal, with:

- **Mean**: $p$ (the true population proportion)
- **Standard deviation**: $\sqrt{p(1 - p) / n}$. Because $p$ is unknown, we plug in $\hat{p}$, which gives the estimated **standard error**  
  $$
  \sqrt{ \hat{p}(1 - \hat{p}) / n }
  $$

This approximation allows us to use the **standard normal distribution** to construct a confidence interval, even though $\hat{p}$ is discrete (it can take only the values $0, 1/n, 2/n, \dots, 1$).

---

#### What to Do Instead for Small Samples

If the sample size is small and the success/failure conditions are not met (i.e.,  
$n \hat{p} < 5$ or $n(1 - \hat{p}) < 5$), then the formula above should **not** be used for constructing a confidence interval for a proportion.

Instead, consider one of the following approaches:

- **Exact Binomial Interval (Clopper–Pearson):**  
  A conservative method based on the exact binomial distribution, not an approximation.

- **Wilson Score Interval:**  
  A more accurate method than the standard normal approximation, especially for small $n$ or extreme proportions.

- **Bayesian Intervals:**  
  Use a prior distribution and provide a credible interval for the true proportion. These are conceptually different from classical confidence intervals but often better calibrated for small samples.

These methods provide more reliable results when the sample size is small or the proportion is very close to 0 or 1. They are beyond the scope of our course.
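Although these methods are beyond the scope of the course, for reference SciPy (version 1.7 or later) provides the Clopper-Pearson and Wilson intervals through `scipy.stats.binomtest(...).proportion_ci`. The data below (2 successes out of 15, so $n\hat{p} < 5$) are hypothetical; the sketch also computes the normal-approximation lower bound for comparison.

```python
import numpy as np
from scipy.stats import binomtest, norm

x_succ, n_obs = 2, 15          # hypothetical small sample
p_hat = x_succ / n_obs

result = binomtest(k=x_succ, n=n_obs)
exact = result.proportion_ci(confidence_level=0.95, method='exact')    # Clopper-Pearson
wilson = result.proportion_ci(confidence_level=0.95, method='wilson')  # Wilson score

# Normal-approximation lower bound, for comparison
z = norm.ppf(0.975)
wald_lower = p_hat - z * np.sqrt(p_hat * (1 - p_hat) / n_obs)

print(f"p_hat = {p_hat:.3f}")
print(f"Clopper-Pearson 95% CI: ({exact.low:.4f}, {exact.high:.4f})")
print(f"Wilson score 95% CI:    ({wilson.low:.4f}, {wilson.high:.4f})")
print(f"Normal-approximation lower bound: {wald_lower:.4f}")
```

Here the normal-approximation formula gives a lower bound below zero, which is impossible for a proportion, while both alternative intervals stay within $[0, 1]$.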

