# **Confidence Interval (CI)**


A confidence interval is a range of values, computed from a sample, that is likely to contain the true population parameter. The range depends on the sample and on a chosen confidence level. The confidence level describes how often intervals constructed this way would contain the true population parameter if the sampling were repeated many times. A confidence interval quantifies the uncertainty around a sample estimate and is a fundamental concept in inferential statistics.

## **Key Concepts of Confidence Interval**

1. **Point Estimate**: A single-value estimate of a population parameter (e.g., the sample mean $\bar{x}$ or the sample proportion $\hat{p}$).

2. **Confidence Level**: The long-run proportion of intervals, built by the same procedure from repeated samples, that contain the true population parameter. Common confidence levels are 90%, 95%, and 99%.

3. **Margin of Error**: The half-width of the interval, that is, the amount added to and subtracted from the point estimate. It equals the critical value from the sampling distribution multiplied by the standard error of the estimate.

4. **Critical Value**: A factor that reflects the desired confidence level, typically obtained from the standard normal distribution (z-distribution) or the t-distribution.
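To make the link between confidence level and critical value concrete, here is a minimal sketch that looks up two-sided z critical values with SciPy. The dictionary name `z_values` is only illustrative.

```python
from scipy import stats

# Two-sided z critical values for the common confidence levels
z_values = {}
for confidence_level in (0.90, 0.95, 0.99):
    # Half of the leftover probability goes in each tail,
    # so we look up the (1 + level) / 2 quantile.
    z_values[confidence_level] = stats.norm.ppf((1 + confidence_level) / 2)
    print(f"{confidence_level:.0%}: z = {z_values[confidence_level]:.3f}")
```

This should print values close to 1.645, 1.960, and 2.576: a higher confidence level needs a larger critical value and therefore a wider interval.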

## **Formula for Confidence Interval**

For a population mean with a known standard deviation, the confidence interval is given by:

$$
\text{CI} = \bar{x} \pm z \frac{\sigma}{\sqrt{n}}
$$

where:

- $\bar{x}$ is the sample mean.
- $z$ is the critical value from the standard normal distribution for the desired confidence level.
- $\sigma$ is the population standard deviation.
- $n$ is the sample size.

When the population standard deviation is unknown, the sample standard deviation 𝑠 is used, and the t-distribution is applied:

$$
\text{CI} = \bar{x} \pm t \frac{s}{\sqrt{n}}
$$

where 𝑡 is the critical value from the t-distribution with n−1 degrees of freedom.
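The two formulas differ only in the critical value and in which standard deviation is plugged in. As a rough sketch, they can be wrapped in one helper; the function name `mean_ci`, its parameters, and the input numbers below are hypothetical and chosen only for illustration.

```python
import math
from scipy import stats

def mean_ci(sample_mean, std_dev, n, confidence_level=0.95, sigma_known=True):
    """Return (lower, upper) for a population mean.

    std_dev is the population sigma when sigma_known is True,
    otherwise the sample standard deviation s.
    """
    tail = (1 + confidence_level) / 2
    if sigma_known:
        critical = stats.norm.ppf(tail)         # z critical value
    else:
        critical = stats.t.ppf(tail, df=n - 1)  # t critical value, n - 1 df
    margin = critical * std_dev / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical inputs: mean 100, standard deviation 15, n = 25
z_low, z_high = mean_ci(100.0, 15.0, 25, sigma_known=True)
t_low, t_high = mean_ci(100.0, 15.0, 25, sigma_known=False)
print(f"z interval: ({z_low:.2f}, {z_high:.2f})")
print(f"t interval: ({t_low:.2f}, {t_high:.2f})")
```

For the same inputs the t interval comes out wider, which reflects the extra uncertainty from estimating the standard deviation.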

## **When Should You Use a Z-Test or a T-Test?**

- **Z-test**
    - The z-test is appropriate when the population standard deviation is known.
    - The sample size is large (n ≥ 30), so the sampling distribution of the mean is approximately normal.
    - Example: You are estimating the population mean height of a certain species of plant. The population standard deviation is known to be 3 cm, and you have a sample size of 50 plants.

- **T-test**
    - The t-test is used when the population standard deviation is unknown and the sample standard deviation is used as an estimate.
    - It matters most when the sample size is small (n < 30), where the t critical value is noticeably larger than the z critical value.
    - The data should be approximately normally distributed, especially for small sample sizes. As the sample size grows, the t-distribution approaches the normal distribution.
    - Example: You are estimating the average exam score of students in a class. The population standard deviation is unknown, and you have a sample size of 20 students.
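The claim that the t-distribution approaches the normal distribution can be checked numerically. The sketch below compares 95% two-sided critical values for a few arbitrarily chosen sample sizes.

```python
from scipy import stats

z_critical = stats.norm.ppf(0.975)  # 95% two-sided z critical value

# t critical values for increasing sample sizes (df = n - 1)
t_criticals = {}
for n in (5, 10, 30, 100, 1000):
    t_criticals[n] = stats.t.ppf(0.975, df=n - 1)
    print(f"n = {n:>4}: t = {t_criticals[n]:.3f}  (z = {z_critical:.3f})")
```

The t value shrinks toward the z value as n grows, which is why the choice between the two matters most for small samples.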


## **Example: Confidence Interval for a Population Mean**

**Known Population Standard Deviation**

Suppose we want to estimate the average height of a population, and we have a sample of 30 individuals with a mean height of 170 cm and a known population standard deviation of 5 cm. We want to construct a 95% confidence interval for the population mean.

1. **Point Estimate**: $\bar{x} = 170$ cm
2. **Confidence Level**: 95%
3. **Critical Value**: For 95% confidence, $z = 1.96$
4. **Standard Error**: $\sigma/\sqrt{n} = 5/\sqrt{30} \approx 0.91$

- CI = 170 ± 1.96×0.91
- CI = 170 ± 1.78
- CI = [168.22, 171.78]

**Unknown Population Standard Deviation**

If the population standard deviation is unknown and the sample standard deviation is $s = 5$ cm, we use the t-distribution. A sample size of 30 gives 29 degrees of freedom:

1. **Point Estimate**: $\bar{x} = 170$ cm
2. **Confidence Level**: 95%
3. **Critical Value**: For 95% confidence and 29 degrees of freedom, $t \approx 2.045$
4. **Standard Error**: $s/\sqrt{n} = 5/\sqrt{30} \approx 0.91$

- CI = 170 ± 2.045×0.91
- CI = 170 ± 1.86
- CI = [168.14, 171.86]
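Both hand calculations can be cross-checked with SciPy's built-in `interval` methods. This is a minimal sketch using the numbers from the example (n = 30, mean 170 cm, standard deviation 5 cm); the variable names are illustrative.

```python
import math
from scipy import stats

n, sample_mean, std_dev = 30, 170.0, 5.0
standard_error = std_dev / math.sqrt(n)

# Known sigma: interval based on the standard normal distribution
z_low, z_high = stats.norm.interval(0.95, loc=sample_mean, scale=standard_error)

# Unknown sigma: interval based on the t-distribution with n - 1 = 29 df
t_low, t_high = stats.t.interval(0.95, n - 1, loc=sample_mean, scale=standard_error)

print(f"z interval: ({z_low:.2f}, {z_high:.2f})")
print(f"t interval: ({t_low:.2f}, {t_high:.2f})")
```

The results may differ from the hand calculation by about 0.01 because the hand calculation rounds the standard error to 0.91 before multiplying.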

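**Interpreting Confidence Intervals**

- **Correct Interpretation**: "We are 95% confident that the true population mean height lies between 168.22 cm and 171.78 cm." Here "95% confident" describes the method: about 95% of intervals built this way from repeated samples would contain the true mean.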
- **Incorrect Interpretation**: "There is a 95% probability that the true population mean is between 168.22 cm and 171.78 cm." (The true population mean is a fixed value; the interval either contains it or it doesn't.)
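The repeated-sampling meaning of "95% confident" can be illustrated with a small simulation. The sketch below assumes a normal population whose true mean (170) and standard deviation (5) are known to the simulation, draws many samples of size 30, and counts how often the z-based interval contains the true mean. The seed and trial count are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)         # fixed seed for reproducibility
true_mean, sigma, n = 170.0, 5.0, 30   # assumed population and sample size
n_trials = 10_000

# Margin of error is the same for every sample because sigma is known
margin = stats.norm.ppf(0.975) * sigma / np.sqrt(n)

# Each row is one simulated sample of size n
samples = rng.normal(true_mean, sigma, size=(n_trials, n))
sample_means = samples.mean(axis=1)

# An interval covers the true mean when the sample mean is within the margin
covered = np.abs(sample_means - true_mean) <= margin
coverage = covered.mean()
print(f"Fraction of intervals containing the true mean: {coverage:.3f}")
```

The fraction should land close to 0.95. Any single interval either contains the true mean or does not; the 95% figure describes the procedure across many samples.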

## **Example: Confidence Interval for a Proportion**

Suppose we want to estimate the proportion of voters who support a particular candidate. We sample 500 voters, and 280 of them support the candidate. We want to construct a 95% confidence interval for the population proportion.
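A minimal sketch of this calculation, assuming the usual normal-approximation interval for a proportion, with standard error sqrt(p̂(1 − p̂)/n):

```python
import math
from scipy import stats

n = 500            # voters sampled
supporters = 280   # voters who support the candidate
p_hat = supporters / n  # point estimate of the population proportion

z_critical = stats.norm.ppf(0.975)                   # 95% two-sided
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)  # SE of a proportion
margin_of_error = z_critical * standard_error

lower, upper = p_hat - margin_of_error, p_hat + margin_of_error
print(f"Sample proportion: {p_hat:.2f}")
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

With these numbers the point estimate is 0.56 and the interval is roughly 0.52 to 0.60, so the whole interval lies above 0.5.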





## **Example 1: Confidence Interval for a Population Mean (Known Population Standard Deviation)**

```python
import numpy as np
import scipy.stats as stats

# Sample data
sample_data = [50, 52, 51, 49, 48, 47, 52, 50, 53, 51]
sample_mean = np.mean(sample_data)
n = len(sample_data)
sigma = 5  # Known population standard deviation
confidence_level = 0.95

# Z-critical value
z_critical = stats.norm.ppf((1 + confidence_level) / 2)

# Standard error
standard_error = sigma / np.sqrt(n)

# Margin of error
margin_of_error = z_critical * standard_error

# Confidence interval
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)
print(f"Sample Mean: {sample_mean}")
print(f"Confidence Interval: {confidence_interval}")
```

Output:

```
Sample Mean: 50.3
Confidence Interval: (47.20102483847719, 53.398975161522806)
```


## **Example 2: Confidence Interval for a Population Mean (Unknown Population Standard Deviation)**

```python
import numpy as np
import scipy.stats as stats

# Sample data
sample_data = [50, 52, 51, 49, 48, 47, 52, 50, 53, 51]
sample_mean = np.mean(sample_data)
n = len(sample_data)
sample_std = np.std(sample_data, ddof=1)  # Sample standard deviation (n - 1 in the denominator)
confidence_level = 0.95

# T-critical value with n - 1 degrees of freedom
t_critical = stats.t.ppf((1 + confidence_level) / 2, df=n-1)

# Standard error
standard_error = sample_std / np.sqrt(n)

# Margin of error
margin_of_error = t_critical * standard_error

# Confidence interval
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)
print(f"Sample Mean: {sample_mean}")
print(f"Confidence Interval: {confidence_interval}")
```

Output:

```
Sample Mean: 50.3
Confidence Interval: (48.9490040857493, 51.65099591425069)
```


## **Example 3: Confidence Interval for a Population Proportion**

```python
import numpy as np
import scipy.stats as stats

# Sample data
n = 200  # Sample size
successes = 60  # Number of successes
sample_proportion = successes / n
confidence_level = 0.95

# Z-critical value
z_critical = stats.norm.ppf((1 + confidence_level) / 2)

# Standard error of a proportion
standard_error = np.sqrt((sample_proportion * (1 - sample_proportion)) / n)

# Margin of error
margin_of_error = z_critical * standard_error

# Confidence interval
confidence_interval = (sample_proportion - margin_of_error, sample_proportion + margin_of_error)
print(f"Sample Proportion: {sample_proportion}")
print(f"Confidence Interval: {confidence_interval}")
```

Output:

```
Sample Proportion: 0.3
Confidence Interval: (0.2364899081898882, 0.3635100918101118)
```
