# Confidence Interval

## Use Case: Estimate population mean using sample mean
  * Intuitively, a sample represents the population well if the sample mean is close to the population mean.
  * We can think of the population mean as lying somewhere in a range centered on the sample mean.
  * The population mean can therefore be estimated with a lower and an upper bound around the sample mean.
  
### Step 1:  Standardizing Sample Mean

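Why? Different samples have different means and standard deviations, so their sample means are not directly comparable.
  * After standardization, the sample mean follows (approximately, for large n) the standard normal distribution, also called the Z-distribution. Hence the resulting value is called Z.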

$$
Z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}} \\
\bar{x} = sample~mean \\
\mu = population~mean \\
\sigma = population~standard~deviation \\
n = sample~size
$$
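A minimal sketch of the standardization formula above. The function name `standardize_sample_mean` and the input numbers are made up for illustration only; they do not come from the stock data used later.

```python
import numpy as np

def standardize_sample_mean(x_bar, mu, sigma, n):
    """Return Z = (x_bar - mu) / (sigma / sqrt(n))."""
    standard_error = sigma / np.sqrt(n)  # standard deviation of the sample mean
    return (x_bar - mu) / standard_error

# Hypothetical values: sample mean 52, population mean 50,
# population standard deviation 10, sample size 100
z = standardize_sample_mean(x_bar=52.0, mu=50.0, sigma=10.0, n=100)
print(z)  # 2.0, i.e. the sample mean is two standard errors above mu
```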

### Quantiles of Z-distribution
![Zquant.png](./img/Zquant.png)

$$
Z_{1-\frac{\alpha}{2}} = -Z_{\frac{\alpha}{2}}
$$
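The symmetry of the quantiles can be checked numerically with `scipy.stats.norm.ppf`, the inverse CDF of the standard normal distribution. This is a small sketch; the choice $\alpha = 0.10$ is an assumption made here so that it matches a 90% confidence level.

```python
from scipy.stats import norm

alpha = 0.10  # assumed significance level, giving a 90% confidence level

z_lower = norm.ppf(alpha / 2)      # Z_{alpha/2}, the lower quantile (negative)
z_upper = norm.ppf(1 - alpha / 2)  # Z_{1 - alpha/2}, the upper quantile (positive)

print(z_lower, z_upper)  # roughly -1.645 and 1.645
```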

### Calculating the Confidence Interval at 1 - $\alpha$

$$
P(Z_{\frac{\alpha}{2}} \leq \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}} \leq Z_{1 - \frac{\alpha}{2}}) = 1 - \alpha \\
P(\bar{x} + \frac{\sigma}{\sqrt{n}}Z_{\frac{\alpha}{2}} \leq \mu \leq \bar{x} + \frac{\sigma}{\sqrt{n}}Z_{1 - \frac{\alpha}{2}}) = 1 - \alpha
$$

  * In practice $\sigma$ (the population standard deviation) is usually unknown, so we replace it with $s$ (the sample standard deviation), which is a reasonable approximation when n is large enough.
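The interval can be sketched in code as follows, using the sample standard deviation $s$ in place of $\sigma$. The helper name `mean_confidence_interval` and the synthetic data are assumptions for illustration; they are not the notebook's own implementation or the real stock returns.

```python
import numpy as np
from scipy.stats import norm

def mean_confidence_interval(sample, alpha=0.10):
    """Return (lower, upper) bounds of the 1 - alpha interval for the mean."""
    sample = np.asarray(sample, dtype=float)
    sample = sample[~np.isnan(sample)]  # drop missing values
    n = sample.size
    x_bar = sample.mean()
    s = sample.std(ddof=1)              # sample standard deviation
    z = norm.ppf(1 - alpha / 2)         # Z_{1 - alpha/2}
    margin = z * s / np.sqrt(n)
    return x_bar - margin, x_bar + margin

# Synthetic data standing in for daily returns (not real data)
rng = np.random.default_rng(0)
synthetic = rng.normal(loc=0.001, scale=0.02, size=2000)

lower, upper = mean_confidence_interval(synthetic, alpha=0.10)
print(lower, upper)
```

Because $Z_{\frac{\alpha}{2}} = -Z_{1-\frac{\alpha}{2}}$, a single quantile is enough: the same margin is subtracted for the lower bound and added for the upper bound.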

In [1]:
import pandas as pd
import numpy as np
from scipy.stats import norm

In [5]:
# Estimate the average stock return with a 90% confidence interval
# Read the CSV, tolerating whitespace around the comma separators
stock = pd.read_csv('./data/TSLA.csv', sep=r'\s*,\s*', encoding='ascii', engine='python')
# Strip '$' and ',' from every column after Date and convert the values to float
stock[stock.columns[1:]] = stock[stock.columns[1:]].replace(r'[\$,]', '', regex=True).astype(float)
stock = stock.iloc[::-1]  # reverse the dataframe so dates are in ascending order
stock['Date'] = stock['Date'].convert_dtypes(convert_string=True)
stock.index = stock['Date']  # make date the index
stock.head()

| Date (index) | Date | Close/Last | Volume | Open | High | Low |
|---|---|---|---|---|---|---|
| 06/29/2010 | 06/29/2010 | 23.89 | 18751150.0 | 19.0 | 25.0 | 17.54 |
| 06/30/2010 | 06/30/2010 | 23.83 | 17165210.0 | 25.79 | 30.4192 | 23.3 |
| 07/01/2010 | 07/01/2010 | 21.96 | 8216789.0 | 25.0 | 25.92 | 20.27 |
| 07/02/2010 | 07/02/2010 | 19.2 | 5135795.0 | 23.0 | 23.1 | 18.71 |
| 07/06/2010 | 07/06/2010 | 16.11 | 6858092.0 | 20.0 | 20.0 | 15.83 |


In [6]:
stock['logReturn'] = np.log(stock['Close/Last'].shift(-1)) - np.log(stock['Close/Last'])
sample_size = 