---

# **Confidence Intervals – Theory Notes**

## 1) Definition of Confidence Interval (CI)

A **confidence interval (CI)** is a range of values, derived from a sample statistic, that is likely to contain the true value of the population parameter (e.g., mean, proportion) with a certain degree of confidence.

* Example: A 95% CI of \[48, 52] for the population mean means we are 95% confident that the true mean lies between 48 and 52.
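* It does **not** mean there is a 95% probability that the true mean lies in this particular interval: the true mean is fixed, and a computed interval either contains it or does not. Rather, if we repeatedly sampled and built intervals the same way, about 95% of them would contain the true mean.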
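The repeated-sampling interpretation can be checked with a small simulation. This is a minimal sketch, assuming a hypothetical normal population (mean 50, σ = 10) and samples of size 25; none of these values come from the notes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical population parameters (assumed for illustration only).
true_mean, sigma = 50.0, 10.0
n, n_trials, alpha = 25, 10_000, 0.05

# Margin of error for a Z-interval with known sigma.
z_crit = stats.norm.ppf(1 - alpha / 2)
margin = z_crit * sigma / np.sqrt(n)

# Draw n_trials independent samples of size n and take each sample's mean.
sample_means = rng.normal(true_mean, sigma, size=(n_trials, n)).mean(axis=1)

# An interval x_bar ± margin covers the true mean when |x_bar - mu| <= margin.
covered = np.abs(sample_means - true_mean) <= margin
coverage = covered.mean()
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```

The printed fraction should land close to 0.95, which is what the confidence level describes: a property of the procedure, not of any single interval.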

---

## 2) Confidence Level and Significance Level

* **Confidence level (1 – α):** The proportion of times the confidence interval will contain the true parameter in repeated samples.

  * Common choices: 90%, 95%, 99%.
* **Significance level (α):** The proportion of intervals, over repeated samples, that fail to contain the true parameter (the risk of missing it).

  * Example: For a 95% confidence level, α = 0.05.
* Relation:

  $$
  \text{Confidence level} = 1 - \alpha
  $$

---

## 3) CI for Population Mean when **σ is known**

When population standard deviation **σ** is known (rare in practice), and sample mean = $\bar{X}$, sample size = $n$:

$$
CI = \bar{X} \; \pm \; Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}
$$

* Here $Z_{\alpha/2}$ is the **Z-critical value** from the standard normal distribution.
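The formula translates almost directly into code. Below is a minimal sketch; the helper name `z_interval` and the input values are hypothetical, chosen only to illustrate the calculation.

```python
import math
from scipy import stats

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Return (lower, upper) for the population mean when sigma is known."""
    alpha = 1 - confidence
    z_crit = stats.norm.ppf(1 - alpha / 2)   # Z_{alpha/2}
    margin = z_crit * sigma / math.sqrt(n)   # critical value times standard error
    return x_bar - margin, x_bar + margin

# Hypothetical inputs: sample mean 100, known sigma 15, sample size 36.
lower, upper = z_interval(x_bar=100.0, sigma=15.0, n=36)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```

Here the standard error is 15/6 = 2.5, so the margin is roughly 1.96 × 2.5 ≈ 4.90.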

---

## 4) CI for Population Mean when **σ is unknown**

When population standard deviation is **unknown** (most common case), we use the sample standard deviation $s$ instead of σ, and the **t-distribution** instead of Z.

$$
CI = \bar{X} \; \pm \; t_{\alpha/2, \, n-1} \cdot \frac{s}{\sqrt{n}}
$$

* Here $t_{\alpha/2, n-1}$ is the **t-critical value** with $(n-1)$ degrees of freedom.
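* The t-distribution has heavier tails than the standard normal, so its critical values are larger than the corresponding Z values and the interval is wider (to account for the extra uncertainty from estimating σ).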
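When working from raw data rather than summary statistics, $s$ has to be computed first. The sketch below assumes a hypothetical helper `t_interval` and made-up measurements; note `ddof=1`, which gives the sample (not population) standard deviation.

```python
import numpy as np
from scipy import stats

def t_interval(sample, confidence=0.95):
    """Return (lower, upper) for the population mean when sigma is unknown."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    x_bar = sample.mean()
    s = sample.std(ddof=1)                       # sample standard deviation
    alpha = 1 - confidence
    t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)  # t_{alpha/2, n-1}
    margin = t_crit * s / np.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical measurements, for illustration only.
data = [12.1, 11.8, 12.4, 12.0, 11.7, 12.3, 12.2, 11.9]
lower, upper = t_interval(data)
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
```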

---

## 5) Critical Z and t values for Common Confidence Levels

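When σ is unknown, **t-values** are the appropriate choice at any sample size. For large samples (n > 30) the t and Z critical values are close, so **Z-values** are often used as an approximation; for small samples (n ≤ 30) the difference matters and t-values (which depend on the degrees of freedom) should be used.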

| Confidence Level | α    | Z (two-tailed) | t (approx, df = 30) |
| ---------------- | ---- | -------------- | ------------------- |
| 90%              | 0.10 | 1.645          | ≈ 1.697             |
| 95%              | 0.05 | 1.960          | ≈ 2.042             |
| 99%              | 0.01 | 2.576          | ≈ 2.750             |

* As degrees of freedom increase, **t → Z**.
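The table values can be reproduced with SciPy's inverse CDF (`ppf`). This is a small sketch; the list of degrees of freedom in the second loop is an arbitrary choice for illustration.

```python
from scipy import stats

# Two-tailed critical values: the upper-tail area is alpha / 2.
rows = []
for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence
    z_crit = stats.norm.ppf(1 - alpha / 2)
    t_crit = stats.t.ppf(1 - alpha / 2, df=30)
    rows.append((confidence, z_crit, t_crit))
    print(f"{confidence:.0%}  Z = {z_crit:.3f}  t(df=30) = {t_crit:.3f}")

# The 95% t critical value approaches Z as the degrees of freedom grow.
z_95 = stats.norm.ppf(0.975)
t_by_df = {df: stats.t.ppf(0.975, df=df) for df in (5, 15, 30, 100, 1000)}
for df, t_crit in t_by_df.items():
    print(f"df = {df:>4}: t = {t_crit:.3f}  (Z = {z_95:.3f})")
```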

---

✅ Summary:

* If σ known → use Z and population σ.
* If σ unknown → use t and sample s.
* Confidence level sets how wide the interval is (higher confidence → wider interval).

---




# 1) σ Known → **Z–interval for the population mean**

**Problem.**
A factory measures the weight of a product. The population standard deviation is known to be **σ = 10 g**. From a sample of **n = 25** items, the sample mean is **x̄ = 52 g**. Construct a **95% CI** for the true mean weight μ.

**Solution (step by step).**

1. Confidence level = 95% ⇒ α = 0.05 ⇒ $Z_{\alpha/2} = 1.96$.
2. Standard error: $\text{SE} = \sigma/\sqrt{n} = 10/\sqrt{25} = 10/5 = 2$.
3. Margin of error: $E = Z_{\alpha/2} \times \text{SE} = 1.96 \times 2 = 3.92$.
4. CI: $\mu \in [\,\bar{x} - E,\; \bar{x} + E\,] = [\,52 - 3.92,\; 52 + 3.92\,]$.

**Answer:** **95% CI = \[48.08, 55.92] g**.
**Interpretation:** If we repeated this sampling many times, about 95% of such intervals would cover the true mean.
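The arithmetic in this worked example can be verified in a few lines. A minimal sketch using the problem's values (σ = 10, n = 25, x̄ = 52); the variable names are my own.

```python
import math
from scipy import stats

# Values from the problem statement.
sigma, n, x_bar, alpha = 10, 25, 52, 0.05

z_crit = stats.norm.ppf(1 - alpha / 2)  # about 1.96
se = sigma / math.sqrt(n)               # standard error
margin = z_crit * se                    # margin of error E
ci = (x_bar - margin, x_bar + margin)

print(f"SE = {se:.2f}, E = {margin:.2f}, CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```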

---
**Outline**
1. Choose the distribution: σ known → Z; σ unknown → t with df = n − 1.
2. Compute SE (σ/√n or s/√n).
3. Pick the critical value ($Z_{\alpha/2}$ or $t_{\alpha/2,\,df}$).
4. Margin = critical × SE.
5. Report **x̄ ± margin** with a clear interpretation.


# 2) σ Unknown → **t–interval for the population mean**

**Problem.**
A new training module’s completion time (minutes) is sampled: **n = 16**, sample mean **x̄ = 20.5**, sample standard deviation **s = 4.0**. Construct a **95% CI** for the true mean time μ.

**Solution (step by step).**

1. σ unknown ⇒ use **t** with **df = n − 1 = 15**.
2. For 95%: α = 0.05 ⇒ $t_{\alpha/2,\,15} \approx 2.131$.
3. Standard error: $\text{SE} = s/\sqrt{n} = 4/\sqrt{16} = 4/4 = 1$.
4. Margin of error: $E = t_{\alpha/2,\,15} \times \text{SE} = 2.131 \times 1 = 2.131$.
5. CI: $\mu \in [\,\bar{x} - E,\; \bar{x} + E\,] = [\,20.5 - 2.131,\; 20.5 + 2.131\,]$.

**Answer:** **95% CI = \[18.369, 22.631] minutes**.
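**Interpretation:** If we repeated this sampling many times, about 95% of such intervals would cover the true mean. Because σ is estimated by s, the t critical value (2.131) is larger than the Z value (1.96), so the interval is wider than a Z-interval built from the same numbers would be.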
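To see how much wider the t-interval is, the sketch below recomputes this example and, purely for comparison, the interval that would result from using the Z critical value with the same numbers. The Z version is not the recommended method here, since it treats s as if it were σ.

```python
import math
from scipy import stats

# Values from the problem statement.
n, x_bar, s, alpha = 16, 20.5, 4.0, 0.05
se = s / math.sqrt(n)

t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
z_crit = stats.norm.ppf(1 - alpha / 2)

t_ci = (x_bar - t_crit * se, x_bar + t_crit * se)
z_ci = (x_bar - z_crit * se, x_bar + z_crit * se)  # comparison only

print(f"t-interval: [{t_ci[0]:.3f}, {t_ci[1]:.3f}]  width = {t_ci[1] - t_ci[0]:.3f}")
print(f"Z-interval: [{z_ci[0]:.3f}, {z_ci[1]:.3f}]  width = {z_ci[1] - z_ci[0]:.3f}")
```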

---


#### Credit card problem

In [10]:
import numpy as np
from scipy import stats

In [7]:
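# Given
n = 140        # sample size
x_bar = 1990   # sample mean
sigma = 2500   # population standard deviation (treated as known, so Z is used)
alpha = 0.05   # significance level for a 95% CI

# Two-tailed Z critical value
z_stat = stats.norm.ppf(1-alpha/2)
CI_left = x_bar - z_stat*(sigma/np.sqrt(n))
CI_right = x_bar + z_stat*(sigma/np.sqrt(n))
print(CI_left, CI_right)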

1575.8820248378292 2404.1179751621708


#### Employee timings problem

In [14]:
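# Given
n = 100        # sample size
x_bar = 40     # sample mean
s = 10         # treated as the sample standard deviation, since t is used
alpha = 0.05   # significance level for a 95% CI
df = n-1       # degrees of freedom

# Two-tailed t critical value
t_stat = stats.t.ppf(1-alpha/2, df)
CI_left = x_bar - t_stat*(s/np.sqrt(n))
CI_right = x_bar + t_stat*(s/np.sqrt(n))
print(CI_left, CI_right)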

38.015783048491315 41.984216951508685
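Both results can be cross-checked against SciPy's built-in `interval` method, which returns the central interval of a distribution given a location and scale. A minimal sketch: the confidence level is passed positionally because, to my understanding, the keyword name for that argument differs between SciPy versions.

```python
import numpy as np
from scipy import stats

# Credit card problem: Z-interval, scale is the standard error sigma / sqrt(n).
z_low, z_high = stats.norm.interval(0.95, loc=1990, scale=2500 / np.sqrt(140))

# Employee timings problem: t-interval with df = n - 1 = 99.
t_low, t_high = stats.t.interval(0.95, 99, loc=40, scale=10 / np.sqrt(100))

print(z_low, z_high)
print(t_low, t_high)
```

The printed endpoints should agree with the manually computed intervals in the two cells above.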
