## Confidence Interval

A confidence interval (CI) is an interval estimate of a population parameter, computed from the statistics of the observed sample data.

In contrast to point estimation, which gives a single value as the estimate of a population parameter, interval estimation specifies a range within which the parameter is estimated to lie.

### Interpretation
- A 95% confidence level does not mean that, for a given realized interval, there is a 95% probability that the population parameter lies within the interval.
- Once an interval is calculated, it either covers the parameter value or it does not; it is no longer a matter of probability.
- The 95% probability relates to the reliability of the estimation procedure, not to a specific calculated interval.

If the estimation procedure (the same procedure used to construct the confidence interval) is repeated 100 times on independent samples, about 95 of the resulting intervals are expected to contain the population parameter.
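The repeated-sampling interpretation can be checked with a small simulation. The sketch below is only illustrative: it assumes normally distributed data with a known true mean (the values 85 and 5 are borrowed from the apple example), builds a 95% t-interval for each simulated sample, and counts how often the interval contains the true mean. The function name `coverage_rate` and its parameters are hypothetical.

```python
import numpy as np
from scipy import stats


def coverage_rate(mu=85.0, sigma=5.0, n=30, level=0.95, trials=10_000, seed=0):
    """Fraction of simulated t-intervals that contain the true mean mu."""
    rng = np.random.default_rng(seed)
    # one row per repetition of the estimation procedure
    samples = rng.normal(mu, sigma, size=(trials, n))
    means = samples.mean(axis=1)
    sems = samples.std(axis=1, ddof=1) / np.sqrt(n)
    # two-sided critical value of the t distribution with n - 1 degrees of freedom
    t_value = stats.t.ppf(1 - (1 - level) / 2, n - 1)
    lower = means - t_value * sems
    upper = means + t_value * sems
    # each interval either covers mu or it does not; average the hits
    return np.mean((lower <= mu) & (mu <= upper))


rate = coverage_rate()
print(f"Fraction of intervals covering the true mean: {rate:.3f}")
```

The printed fraction should land close to 0.95, while any single interval in the simulation simply does or does not contain 85.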

### Basic steps

1. Identify the sample mean, x_bar.
2. <img src="resources/confidence_interval.png" alt="Drawing" style="width: 400px; margin-left: 4em"/>

In practice, z-values are used most often, with the sample standard deviation standing in for the unknown population standard deviation. The z* values in the table below are critical values of the standard normal distribution with right-tail probability (1 - C)/2, where C is the confidence level. However, t-values are used when the sample size is below 30 and the population standard deviation is unknown.
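As a sketch consistent with the code in the example problems below (and assuming the interval is centered on the sample mean), the two-sided interval for the mean can be written with either critical value, where $s$ is the sample standard deviation, $n$ the sample size, and $t^{*}_{n-1}$ the critical value of the t distribution with $n-1$ degrees of freedom:

$$
\bar{x} \pm z^{*}\,\frac{s}{\sqrt{n}}
\qquad \text{or} \qquad
\bar{x} \pm t^{*}_{n-1}\,\frac{s}{\sqrt{n}}
$$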


|  C  |  z* |
|-----|-----|
|99.9%|3.291|
|99.5%|2.807|
| 99% |2.576|
| 98% |2.326|
| 95% |1.960|
| 90% |1.645|
| 85% |1.440|
| 80% |1.282|
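The table values can be reproduced with `scipy.stats.norm.ppf`. The sketch below assumes each z* is the two-sided critical value, i.e. the quantile at `1 - alpha/2` with `alpha = 1 - C`. It also prints the corresponding t* for a hypothetical sample size of 20 to show why small samples give wider intervals.

```python
from scipy import stats

levels = [0.999, 0.995, 0.99, 0.98, 0.95, 0.90, 0.85, 0.80]
n_small = 20  # illustrative small sample size (assumption)

z_table = {}
t_table = {}
for c in levels:
    alpha = 1 - c
    # two-sided critical values: right-tail probability is alpha / 2
    z_table[c] = stats.norm.ppf(1 - alpha / 2)
    t_table[c] = stats.t.ppf(1 - alpha / 2, n_small - 1)
    print(f"C = {c:.1%}   z* = {z_table[c]:.3f}   t* (n = {n_small}) = {t_table[c]:.3f}")
```

For every confidence level the t* value is larger than z*, and the gap shrinks as the sample size grows.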

In [1]:
import numpy as np
from scipy import stats

### Example problem

#### 1. What is the 95% confidence interval for the weight of apples on a farm, given the weights of 30 samples?

In [2]:
alpha = 0.05
n = 30
mu = 85
sigma = 5
samples = np.random.normal(mu, sigma, n)

mean = np.mean(samples)
s = np.std(samples, ddof=1)
# use z-values and estimate the population std by sample std
z_value = stats.norm.ppf(1 - alpha/2)

upper = mean + z_value * s / np.sqrt(n)
lower = mean - z_value * s / np.sqrt(n)

print(f"95% confidence interval is ({lower}, {upper})")

from confidence_interval import get_confidence_interval
print(get_confidence_interval(samples, 0.95))

95% confidence interval is (83.33472798864574, 87.00495965490191)
(83.33472798864574, 87.00495965490191)
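As a cross-check that does not depend on the `confidence_interval` helper, SciPy's `stats.norm.interval` returns the same kind of two-sided z-interval when given the sample mean as `loc` and the standard error as `scale`. This is a minimal sketch with a seeded generator, so the numbers will differ from the output above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)   # seeded only so the sketch is repeatable
samples = rng.normal(85, 5, 30)  # same mu, sigma, n as the example above

mean = np.mean(samples)
sem = stats.sem(samples)         # s / sqrt(n), with ddof=1 by default

# manual two-sided 95% z-interval
z_value = stats.norm.ppf(1 - 0.05 / 2)
manual = (mean - z_value * sem, mean + z_value * sem)

# the same interval from SciPy's frozen-distribution helper
builtin = stats.norm.interval(0.95, loc=mean, scale=sem)

print(manual)
print(builtin)
```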


#### 2. What is the 90% confidence interval for the height of teens in a city, given the heights of 20 samples?

In [4]:
alpha = 0.1
n = 20
mu = 170
sigma = 12
samples = np.random.normal(mu, sigma, n)

mean = np.mean(samples)
s = np.std(samples, ddof=1)
# use t-values because the sample size is small; estimate the population std by the sample std
t_value = stats.t.ppf(1 - alpha/2, n-1)

upper = mean + t_value * s / np.sqrt(n)
lower = mean - t_value * s / np.sqrt(n)

print(f"90% confidence interval is ({lower}, {upper})")
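# one-sided (right-tailed) interval from the helper: only a lower bound,
# so it differs from the two-sided interval printed above
print(get_confidence_interval(samples, 0.90, "t", "right"))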

90% confidence interval is (166.05875693960917, 174.59424510363988)
(167.0494802116305, inf)
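The two-sided t-interval can also be cross-checked against SciPy's `stats.t.interval`, which takes the confidence level, the degrees of freedom, and the same `loc` and `scale` arguments. This is a minimal sketch with a seeded generator, so the numbers will differ from the output above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)    # seeded only so the sketch is repeatable
samples = rng.normal(170, 12, 20) # same mu, sigma, n as the example above

n = len(samples)
mean = np.mean(samples)
sem = stats.sem(samples)          # s / sqrt(n), with ddof=1 by default

# manual two-sided 90% t-interval with n - 1 degrees of freedom
t_value = stats.t.ppf(1 - 0.10 / 2, n - 1)
manual = (mean - t_value * sem, mean + t_value * sem)

# the same interval from SciPy
builtin = stats.t.interval(0.90, n - 1, loc=mean, scale=sem)

print(manual)
print(builtin)
```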


#### For one-tailed z-values or t-values

Simply pass `1 - alpha` (upper critical value) or `alpha` (lower critical value) to `ppf` instead of `1 - alpha/2`.
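A minimal sketch of the one-tailed case, assuming a 90% confidence level and the same kind of data as in example 2. The variable names are illustrative, and the mapping to the helper's `"right"` or `"left"` argument is not assumed here.

```python
import numpy as np
from scipy import stats

alpha = 0.10
rng = np.random.default_rng(2)    # seeded only so the sketch is repeatable
samples = rng.normal(170, 12, 20)

n = len(samples)
mean = np.mean(samples)
sem = stats.sem(samples)

t_two_sided = stats.t.ppf(1 - alpha / 2, n - 1)  # splits alpha across both tails
t_upper = stats.t.ppf(1 - alpha, n - 1)          # all of alpha in the right tail
t_lower = stats.t.ppf(alpha, n - 1)              # all of alpha in the left tail (negative)

# one-sided intervals: one finite bound, the other side unbounded
lower_bounded = (mean - t_upper * sem, np.inf)
upper_bounded = (-np.inf, mean + t_upper * sem)

print(f"two-sided t* = {t_two_sided:.3f}, one-sided t* = {t_upper:.3f}")
print(lower_bounded)
print(upper_bounded)
```

Because the one-sided critical value is smaller than the two-sided one, the finite bound of a one-sided interval sits closer to the sample mean than the corresponding end of the two-sided interval at the same confidence level.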