# Confidence Intervals (CI)

A **Confidence Interval** is a range of values, derived from sample statistics, that is likely to contain the value of an unknown population parameter. While a Point Estimate gives you a single value, a Confidence Interval expresses the **uncertainty** of that estimate.

---

## 1. The Mathematical Formula
For a population mean ($\mu$), the confidence interval is calculated as:

$$CI = \bar{x} \pm \text{Margin of Error}$$
$$CI = \bar{x} \pm \left( Z^* \times \frac{\sigma}{\sqrt{n}} \right)$$

Where:
* $\bar{x}$: The Sample Mean (Point Estimate).
* $Z^*$: The **Critical Value** (determined by the confidence level).
* $\sigma$: The population standard deviation (assumed known in this formula).
* $n$: The sample size.
* $\frac{\sigma}{\sqrt{n}}$: The **Standard Error (SE)**.
* $Z^* \times SE$: The **Margin of Error (MoE)**.
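
As a worked example of the formula, take purely illustrative numbers (assumed here, not drawn from any real dataset): a sample mean of $\bar{x} = 50$, a known $\sigma = 10$, a sample size of $n = 100$, and a 95% confidence level ($Z^* \approx 1.96$).

$$SE = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{100}} = 1$$
$$MoE = Z^* \times SE = 1.96 \times 1 = 1.96$$
$$CI = 50 \pm 1.96 = [48.04,\ 51.96]$$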

### Common Critical Values ($Z^*$):
* **90% Confidence:** $Z^* \approx 1.645$
* **95% Confidence:** $Z^* \approx 1.96$
* **99% Confidence:** $Z^* \approx 2.576$



---

## 2. Interpretation: What does "95% Confidence" mean?
It does **NOT** mean there is a 95% probability that the population mean lies within your specific interval.

It **DOES** mean that if you were to take 100 different samples and compute a confidence interval for each, approximately 95 of those intervals would contain the true population mean.
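
The repeated-sampling interpretation can be checked with a small simulation. The sketch below assumes a hypothetical normal population (the values of `true_mean`, `sigma`, `n`, and `n_trials` are made up for the demo) and counts how often a 95% Z-interval built from each sample contains the true mean.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Hypothetical population and experiment settings (assumptions for this demo)
true_mean, sigma = 170.0, 5.0
n, n_trials, confidence = 30, 2000, 0.95

z_star = norm.ppf((1 + confidence) / 2)
margin = z_star * sigma / np.sqrt(n)  # identical for every sample because sigma is known

# Draw n_trials samples of size n and take the mean of each one
sample_means = rng.normal(true_mean, sigma, size=(n_trials, n)).mean(axis=1)

# An interval covers the true mean when |sample mean - true mean| <= margin
covered = np.abs(sample_means - true_mean) <= margin
coverage = covered.mean()

print(f"Intervals containing the true mean: {coverage:.1%}")
```

The printed share should land close to 95%, with some run-to-run variation if the seed is changed. Any single interval either contains the true mean or it does not; the 95% describes the long-run success rate of the procedure.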



---

## 3. Data Science & ML Use Cases
* **Model Evaluation:** Instead of saying "Our model is 80% accurate," you say "We are 95% confident the model accuracy is between 78% and 82%."
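* **A/B Testing:** Deciding if the uplift in conversion rate is real by checking whether the confidence interval for the difference between Control and Treatment excludes zero. Requiring that the two groups' individual confidence intervals do not overlap is a stricter, more conservative version of the same check.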
* **Risk Management:** Predicting future stock prices or energy demand within a range to account for volatility.
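
To make the model evaluation case concrete, here is a minimal sketch that puts a 95% interval around a classifier's accuracy using the normal-approximation (Wald) interval for a proportion. The counts `n_correct` and `n_total` are hypothetical, and the Wald interval is one common choice rather than the only one.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical evaluation result: 800 correct predictions on 1000 test examples
n_correct, n_total = 800, 1000
confidence = 0.95

accuracy = n_correct / n_total
z_star = norm.ppf((1 + confidence) / 2)

# Normal-approximation (Wald) standard error for a proportion
std_err = np.sqrt(accuracy * (1 - accuracy) / n_total)
margin = z_star * std_err

lower, upper = accuracy - margin, accuracy + margin
print(f"Accuracy: {accuracy:.3f}")
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
```

This approximation becomes unreliable for small test sets or accuracies very close to 0 or 1, where an alternative such as the Wilson interval is usually preferred.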

---

## 4. Python Implementation: Calculating CI
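Use this code to calculate a 95% Confidence Interval for a set of sample data. Because the population standard deviation ($\sigma$) is not known here, the code estimates it from the sample and uses a critical value from the t-distribution in place of $Z^*$.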


```python
import numpy as np
from scipy import stats

# 1. Sample Data (e.g., heights of 30 people in cm)
data = [172, 168, 175, 170, 169, 174, 171, 173, 167, 170,
        172, 169, 171, 170, 172, 168, 174, 171, 173, 170,
        169, 172, 171, 170, 175, 168, 172, 169, 171, 170]

# 2. Parameters
confidence = 0.95
n = len(data)
mean = np.mean(data)
std_err = stats.sem(data)  # Standard Error of the Mean: s / sqrt(n), with s computed using ddof=1

# 3. Calculate Interval using the T-distribution
#    (appropriate when sigma is unknown and estimated from the sample)
h = std_err * stats.t.ppf((1 + confidence) / 2, n - 1)  # margin of error, n - 1 degrees of freedom

start = mean - h
end = mean + h

print(f"Point Estimate (Mean): {mean:.2f}")
print(f"95% Confidence Interval: [{start:.2f}, {end:.2f}]")
print(f"Margin of Error: {h:.2f}")
```

Output:

```text
Point Estimate (Mean): 170.87
95% Confidence Interval: [170.08, 171.65]
Margin of Error: 0.78
```
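
The same interval can also be obtained in a single call. The sketch below uses `scipy.stats.t.interval`, passing the confidence level and degrees of freedom positionally, on the same height data; it is offered as a cross-check of the manual calculation rather than as a replacement for it.

```python
import numpy as np
from scipy import stats

# Same sample as above (heights of 30 people in cm)
data = [172, 168, 175, 170, 169, 174, 171, 173, 167, 170,
        172, 169, 171, 170, 172, 168, 174, 171, 173, 170,
        169, 172, 171, 170, 175, 168, 172, 169, 171, 170]

# Arguments: confidence level, degrees of freedom, centre (loc), standard error (scale)
low, high = stats.t.interval(0.95, len(data) - 1,
                             loc=np.mean(data), scale=stats.sem(data))

print(f"95% Confidence Interval: [{low:.2f}, {high:.2f}]")
```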


# The Z-Procedure

The **Z-procedure** is used to make inferences about a population mean ($\mu$) when the sampling distribution of the sample mean is (at least approximately) normal and the population standard deviation ($\sigma$) is already known.

---

## 1. Mathematical Foundation
The Z-procedure relies on the **Standard Normal Distribution** ($Z \sim N(0, 1)$). 

### The Z-Interval Formula
To find the range where the true population mean likely exists:
$$\text{CI} = \bar{x} \pm Z^* \left( \frac{\sigma}{\sqrt{n}} \right)$$

### The Z-Test Statistic
To test if a sample comes from a population with a specific mean:
$$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

**Where:**
* $\bar{x}$: Sample mean.
* $\mu_0$: Hypothesized population mean.
* $\sigma$: **Known** population standard deviation.
* $n$: Sample size.
* $Z^*$: Critical value (multiplier for confidence).



---

## 2. Necessary Criteria (Assumptions)
For the Z-procedure to be valid, these three conditions must be met:
1. **Randomness:** The data must come from a random sample or randomized experiment.
2. **Normality:** Either the population is normally distributed, or the sample size is large enough (a common rule of thumb is $n \ge 30$) for the Central Limit Theorem to apply.
3. **$\sigma$ is Known:** This is the specific requirement for Z-procedures. If you only have the sample standard deviation ($s$), you must use the **T-procedure** instead.
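
The practical difference between the two procedures shows up in the critical value. As a rough illustration (the sample sizes below are arbitrary choices), this sketch compares the 95% critical value $t^*$ at several sample sizes with the fixed $Z^*$.

```python
from scipy.stats import norm, t

confidence = 0.95
tail = (1 + confidence) / 2  # cumulative probability at the upper critical point

z_star = norm.ppf(tail)
print(f"Z* = {z_star:.3f}")

# t* depends on the degrees of freedom (n - 1); Z* does not
t_stars = {n: t.ppf(tail, df=n - 1) for n in (5, 10, 30, 100, 1000)}
for n, t_star in t_stars.items():
    print(f"n = {n:>4}: t* = {t_star:.3f}")
```

The $t^*$ values are larger than $Z^*$ for small samples, which widens the interval to account for the extra uncertainty in estimating $\sigma$ with $s$, and they approach $Z^*$ as $n$ grows.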

---

## 3. How to find the Z-Critical Value ($Z^*$)
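The $Z^*$ value is the number of standard errors you must extend on either side of the sample mean so that the central area under the standard normal curve equals the confidence level. Equivalently, $Z^* = \Phi^{-1}(1 - \alpha/2)$, where $\Phi^{-1}$ is the inverse CDF of the standard normal distribution and $\alpha$ is 1 minus the confidence level.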

| Confidence Level | Alpha ($\alpha$) | $Z^*$ Value |
| :--- | :--- | :--- |
| 90% | 0.10 | 1.645 |
| 95% | 0.05 | 1.960 |
| 99% | 0.01 | 2.576 |
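
The table values can be reproduced with the inverse CDF of the standard normal distribution. A minimal sketch, where `z_critical` is a helper name introduced here for illustration:

```python
from scipy.stats import norm

def z_critical(confidence):
    """Two-sided critical value: the point with 1 - alpha/2 of the area to its left."""
    alpha = 1 - confidence
    return norm.ppf(1 - alpha / 2)

for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence
    print(f"{confidence:.0%} confidence, alpha = {alpha:.2f}: Z* = {z_critical(confidence):.3f}")
```

The split of $\alpha$ into two tails of $\alpha/2$ is why a 95% level uses the 97.5th percentile rather than the 95th.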



---

## 4. Use Cases in Data Science & ML
* **A/B Testing (Large Scale):** When you have millions of users and a stable baseline variance, Z-tests are used to see if a new feature improved conversion rates.
* **Anomaly Detection:** Calculating how many standard deviations a new data point is from the known population mean to identify "outliers."
* **Quality Control:** In manufacturing, if the "process spread" ($\sigma$) is known from historical data, a Z-procedure checks if the current batch mean is drifting.
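
The anomaly detection use case can be sketched in a few lines. The population parameters, the readings, and the threshold of 3 standard deviations are all assumptions chosen for illustration; a real system would tune the threshold to its own tolerance for false alarms.

```python
import numpy as np

# Hypothetical "known" population parameters and incoming readings
pop_mean, pop_std = 100.0, 15.0
readings = np.array([98.0, 112.0, 151.0, 87.0, 40.0])
threshold = 3.0  # common rule of thumb, not a universal constant

# Z-score of each individual reading: distance from the mean in standard deviations
z_scores = (readings - pop_mean) / pop_std
is_outlier = np.abs(z_scores) > threshold

for value, z, flag in zip(readings, z_scores, is_outlier):
    label = "outlier" if flag else "ok"
    print(f"{value:6.1f}  z = {z:+.2f}  {label}")
```

Note that this scores single observations, so it divides by $\sigma$ itself rather than by the standard error $\sigma / \sqrt{n}$ used for sample means.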

---

## 5. Python Implementation
This code demonstrates how to find the Z-score and the P-value for an observed sample mean.


```python
import numpy as np
from scipy.stats import norm

# 1. Inputs
pop_mean = 100       # Null hypothesis mean
pop_std = 15         # Known sigma
sample_size = 50
observed_mean = 105

# 2. Calculate Z-Statistic
standard_error = pop_std / np.sqrt(sample_size)
z_score = (observed_mean - pop_mean) / standard_error

# 3. Find P-Value (Two-tailed test)
#    norm.sf(x) is the survival function 1 - cdf(x), which is more accurate in the far tails
p_value = 2 * norm.sf(abs(z_score))

print(f"Z-Statistic: {z_score:.4f}")
print(f"P-Value: {p_value:.4f}")

if p_value < 0.05:
    print("Result is statistically significant (Reject Null)")
else:
    print("Result is not significant (Fail to Reject Null)")
```

Output:

```text
Z-Statistic: 2.3570
P-Value: 0.0184
Result is statistically significant (Reject Null)
```
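
A two-sided Z-test at the 5% level and a 95% Z-interval lead to the same decision. As a sketch using the same inputs (null mean 100, known sigma 15, sample size 50, observed mean 105), the interval below can be checked against the hypothesized mean.

```python
import numpy as np
from scipy.stats import norm

# Same inputs as the Z-test above
pop_mean, pop_std, sample_size, observed_mean = 100, 15, 50, 105
confidence = 0.95

z_star = norm.ppf((1 + confidence) / 2)
margin = z_star * pop_std / np.sqrt(sample_size)

lower, upper = observed_mean - margin, observed_mean + margin
inside = lower <= pop_mean <= upper

print(f"95% Z-interval: [{lower:.2f}, {upper:.2f}]")
print(f"Hypothesized mean inside interval: {inside}")
```

The hypothesized mean of 100 falls outside the interval, which agrees with the P-value being below 0.05.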
