# Confidence Intervals

```python
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
```

```python
from scipy.stats import t
```

A **confidence interval (CI)** is a range of values used to estimate an unknown population parameter, such as a **population mean** or **proportion**, with a specified level of certainty, called the **confidence level (CL)**.

A 95% CL **does not** imply a 95% probability that the true parameter lies within a particular calculated interval. 
The confidence level instead reflects the long-run reliability of the method used to generate the interval.
For example, a 95% **confidence interval** means that if you were to take 100 different samples and calculate a **confidence interval** for each, you would expect about 95 of those intervals to contain the true population parameter; over many repetitions, the proportion of intervals that cover it approaches 95%.
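The long-run interpretation can be checked by simulation. The sketch below assumes a normal population with a known mean and standard deviation (chosen arbitrarily for illustration), draws many samples, builds the t-based interval from Example 1 below for each one, and reports the fraction of intervals that contain the true mean.

```python
import numpy as np
from scipy.stats import t

# Illustrative assumptions: population parameters are known here only so coverage can be checked
rng = np.random.default_rng(seed=0)
mu, sigma = 5.0, 2.0        # true population mean and standard deviation
n, trials = 20, 10_000      # sample size and number of repeated samples
cl = 0.95                   # confidence level

t_crit = t.ppf(1 - (1 - cl) / 2, df=n - 1)   # two-tailed critical value

# Each row is one independent sample of size n
samples = rng.normal(mu, sigma, size=(trials, n))
xbar = samples.mean(axis=1)
se = samples.std(axis=1, ddof=1) / np.sqrt(n)   # standard error s / sqrt(n)

# An interval "covers" mu if lower <= mu <= upper
covered = (xbar - t_crit * se <= mu) & (mu <= xbar + t_crit * se)
coverage = covered.mean()
print(f"Empirical coverage over {trials} intervals: {coverage:.3f}")
```

The empirical coverage should land close to 0.95; any single interval either contains $\mu$ or it does not.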

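A confidence interval is typically expressed as:
$$\text{Confidence Interval} = \text{Point Estimate}\pm\text{Margin of Error}$$
where the **point estimate** is the single value calculated from the sample (e.g., the sample mean) and the **margin of error** is typically a critical value of the relevant sampling distribution multiplied by the standard error of the estimate.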

## Method of Derivation

There are many ways to calculate **confidence intervals**, and the best method depends on the situation.
Two widely applicable methods are:
- **bootstrapping**, and
- **the central limit theorem (CLT)**.
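As a sketch of the first method, a percentile bootstrap resamples the data with replacement many times, recomputes the statistic on each resample, and takes the empirical $\alpha/2$ and $1-\alpha/2$ quantiles of those values. The helper `bootstrap_ci`, the exponential test data, and the number of resamples are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def bootstrap_ci(data, stat=np.mean, n_boot=5_000, cl=0.95, seed=0):
    """Hypothetical helper: percentile bootstrap CI for a statistic of a 1-D sample."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    # Each row holds indices for one resample (with replacement, same size as data)
    idx = rng.integers(0, len(data), size=(n_boot, len(data)))
    boot_stats = np.array([stat(data[row]) for row in idx])
    alpha = 1 - cl
    # Percentile method: empirical quantiles of the bootstrap distribution
    lo, hi = np.quantile(boot_stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Illustrative, skewed sample
rng = np.random.default_rng(1)
sample = rng.exponential(scale=3.0, size=50)
lo, hi = bootstrap_ci(sample)
print(f"95% bootstrap CI for the mean: ({lo:.3f}, {hi:.3f})")
```

The percentile method is the simplest bootstrap variant; SciPy (1.7 and later) also ships `scipy.stats.bootstrap`, which defaults to the bias-corrected and accelerated (BCa) method.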

The latter method is reliable only for large samples. It computes the sample mean $\bar{X}$ and the sample standard deviation $s$ and relies on the fact that, by the central limit theorem, the quantity
$$\frac{\bar{X}-\mu}{s/\sqrt{n}}$$
is approximately standard normal when $n$ is large (it is asymptotically standard normal as $n \to \infty$), where $\mu$ and $n$ are the population mean and the sample size, respectively.
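A minimal sketch of the CLT-based interval, assuming a large simulated sample from a skewed (exponential) population: the margin of error uses the standard normal quantile $z_{0.975} \approx 1.96$ in place of a $t$ quantile.

```python
import numpy as np
from scipy.stats import norm

# Illustrative data: a large, non-normal sample
rng = np.random.default_rng(2)
sample = rng.exponential(scale=3.0, size=1_000)
n = sample.size

xbar = sample.mean()                # point estimate
s = sample.std(ddof=1)              # sample standard deviation
z_crit = norm.ppf(0.975)            # two-sided 95% normal critical value, about 1.96
margin = z_crit * s / np.sqrt(n)    # margin of error
ci = (xbar - margin, xbar + margin)
print(f"Sample mean = {xbar:.3f}, 95% CLT-based CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```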

### Example 1

Suppose $X_{1},\ldots ,X_{n}$ is an independent sample from a normally distributed population with unknown mean $\mu$ and unknown variance $\sigma ^{2}$. Define the sample mean $\bar{X}$ and the unbiased sample variance $s^{2}$ as
\begin{equation*}
\begin{split}
\bar{X} & = \frac{1}{n}(X_1 + \ldots + X_n), \\
s^2     & = \frac{1}{n-1} \sum_{k=1}^n(X_k-\bar{X})^2 .
\end{split}
\end{equation*}
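The two definitions translate directly into NumPy. The five-value sample below is hypothetical; the point of the sketch is that `np.var` divides by $n$ by default (`ddof=0`), so `ddof=1` is needed to get the unbiased $s^2$ with the $n-1$ denominator.

```python
import numpy as np

x = np.array([2.1, 3.4, 1.9, 4.2, 3.3])   # hypothetical sample
n = x.size

xbar = x.sum() / n                          # sample mean
s2 = ((x - xbar) ** 2).sum() / (n - 1)      # unbiased sample variance

# NumPy equivalents: ddof=1 switches the denominator from n to n - 1
assert np.isclose(xbar, x.mean())
assert np.isclose(s2, x.var(ddof=1))
print(f"sample mean = {xbar:.3f}, s^2 = {s2:.4f}")
```

Here the deviations from $\bar{X} = 2.98$ square-sum to $3.708$, so $s^2 = 3.708/4 = 0.927$.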

Then the statistic
$$T = \frac{\bar{X}-\mu}{s/\sqrt{n}}$$
follows a **Student's t distribution** with $n-1$ **degrees of freedom**, whatever the unknown value of $\sigma$.
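Inverting this statistic gives the interval for $\mu$. Let $t^{*} = t_{1-\alpha/2,\,n-1}$ denote the $(1-\alpha/2)$ quantile of the $t$ distribution with $n-1$ degrees of freedom (for a 95% CL, $\alpha = 0.05$ and the quantile is $0.975$). Then

$$P\left(-t^{*} \le \frac{\bar{X}-\mu}{s/\sqrt{n}} \le t^{*}\right) = 1-\alpha
\quad\Longleftrightarrow\quad
P\left(\bar{X} - t^{*}\frac{s}{\sqrt{n}} \le \mu \le \bar{X} + t^{*}\frac{s}{\sqrt{n}}\right) = 1-\alpha,$$

so the interval is $\bar{X} \pm t^{*}\, s/\sqrt{n}$: the sample mean is the point estimate and $t^{*}s/\sqrt{n}$ is the margin of error.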

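```python
# Degrees of freedom: df = n - 1, so df = 10 corresponds to a sample of n = 11
df = 10
# Central interval holding 95% of the standard t(df) distribution, i.e. (-t*, t*).
# Scaling by s/sqrt(n) and shifting by the sample mean turns it into the CI for mu.
confidence_interval = t.interval(0.95, df)
print(f"95% Confidence Interval: {confidence_interval}")
```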

95% Confidence Interval: (np.float64(-2.2281388519649385), np.float64(2.2281388519649385))
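The bounds $\pm 2.228$ are wider than the normal $\pm 1.96$ because a sample of 11 leaves extra uncertainty in $s$. The sketch below tabulates two-tailed 95% critical values for a few arbitrarily chosen degrees of freedom; they shrink toward the standard normal value as $n$ grows, which is why the CLT-based method above is reasonable only for large samples.

```python
from scipy.stats import norm, t

z_crit = norm.ppf(0.975)                    # standard normal critical value, about 1.96
dfs = [2, 5, 10, 30, 100, 1000]             # illustrative degrees of freedom
t_crits = [t.ppf(0.975, d) for d in dfs]    # two-tailed 95% t critical values

for d, tc in zip(dfs, t_crits):
    print(f"df = {d:>4}: t* = {tc:.4f}   (z* = {z_crit:.4f})")
```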


```python
# Critical t-value for a two-tailed 95% interval:
# alpha = 0.05 is split as alpha/2 = 0.025 in each tail, so we need the 0.975 quantile.
# t.ppf is the percent point function (PPF), i.e. the quantile function or inverse CDF:
# it returns the value x with P(T <= x) = q for a given cumulative probability q.
critical_t_value = t.ppf(0.975, df)
print(f"Critical t-value for 95% CI: {critical_t_value}")
```

Critical t-value for 95% CI: 2.2281388519649385
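To turn the critical value into an actual interval for $\mu$, multiply it by the standard error $s/\sqrt{n}$ and center it at the sample mean. The sketch below assumes a simulated normal sample of $n = 11$ (so $df = 10$ as above; the population mean 100, standard deviation 15, and seed are arbitrary) and checks the manual result against `t.interval` with `loc` and `scale` set.

```python
import numpy as np
from scipy.stats import t, sem

# Hypothetical sample of n = 11 observations, so df = n - 1 = 10
rng = np.random.default_rng(3)
sample = rng.normal(loc=100.0, scale=15.0, size=11)
n = sample.size
df = n - 1

xbar = sample.mean()                      # point estimate
se = sample.std(ddof=1) / np.sqrt(n)      # standard error s / sqrt(n)
t_crit = t.ppf(0.975, df)                 # two-tailed 95% critical value
margin = t_crit * se                      # margin of error
manual_ci = (xbar - margin, xbar + margin)

# Same interval via SciPy: shift by the sample mean, scale by the standard error
# (scipy.stats.sem uses ddof=1 by default)
scipy_ci = t.interval(0.95, df, loc=xbar, scale=sem(sample))

print(f"Point estimate: {xbar:.3f}, margin of error: {margin:.3f}")
print(f"Manual CI: ({manual_ci[0]:.3f}, {manual_ci[1]:.3f})")
print(f"SciPy  CI: ({scipy_ci[0]:.3f}, {scipy_ci[1]:.3f})")
```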
