# Confidence Interval of Model Parameter

A confidence interval is a range of values that is expected, with some quantifiable degree of confidence, to contain the value of an unknown quantity of interest (Petty, 2012). For example, suppose a random sample of 100 boxes of cereal is selected from among all of the boxes filled by an automatic filling machine during a work shift. The mean weight of the 100 boxes in the sample is found to be 12.05 ounces and the standard deviation to be 0.1 ounces. Using the procedures to be described in the next section, we can calculate an interval [12.0304, 12.0696] for the mean weight of all boxes filled at the station and associate a confidence level of 0.95 (95%) with that interval. We call the calculated interval [12.0304, 12.0696], together with its associated confidence level, a confidence interval.
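The interval quoted above can be reproduced with a short calculation. The sketch below assumes the usual normal-approximation (z) interval for a mean, which is consistent with the numbers in the example; the function name `mean_confidence_interval` is our own and does not come from the cited source.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Normal-approximation (z) interval for a population mean."""
    # Two-sided critical value, about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Margin of error: critical value times the standard error of the mean.
    margin = z * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Cereal example: 100 boxes, mean 12.05 oz, standard deviation 0.1 oz.
lower, upper = mean_confidence_interval(12.05, 0.1, 100)
print(f"[{lower:.4f}, {upper:.4f}]")
```

With a sample of 100 the z and t critical values are close, so the normal approximation is a reasonable choice here; for small samples a t-based interval would be wider.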

Here, we use a simple regression model to obtain confidence intervals for its parameters, which we hope complements the bootstrapping method the professor taught.

In [3]:
from pandas import DataFrame
import statsmodels.api as sm

Linear regression is a predictive model that assumes a linear relationship between the dependent variable (the variable we are trying to predict or estimate) and one or more independent variables (the input variables used in the prediction).

For example, you may use linear regression to predict the price of the stock market (your dependent variable) based on the following macroeconomic input variables:

Interest Rate
Unemployment Rate

Under Simple Linear Regression, only one independent/input variable is used to predict the dependent variable. It has the following structure:

Y = C + M*X

Y = Dependent variable (output/outcome/prediction/estimation)
C = Constant (Y-Intercept)
M = Slope of the regression line (the effect that X has on Y)
X = Independent variable (input variable used in the prediction of Y)
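As a minimal sketch of how C and M can be estimated, the snippet below applies the standard least-squares formulas for a single input variable. The function name `fit_simple_regression` and the toy data are hypothetical and chosen only so the result is easy to check by hand.

```python
def fit_simple_regression(x, y):
    """Least-squares estimates of C and M in Y = C + M*X."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c = y_mean - m * x_mean
    return c, m


# Hypothetical toy data lying exactly on the line Y = 1 + 2*X.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
c, m = fit_simple_regression(x, y)
print(f"C = {c}, M = {m}")
```

Because the toy points lie exactly on a line, the fit recovers the intercept 1 and slope 2; with real data the same formulas give the line that minimizes the sum of squared residuals.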

In [4]:
Stock_Market = {'Year': [2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016],
                'Month': [12, 11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1],
                'Interest_Rate': [2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75],
                'Unemployment_Rate': [5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
                'Stock_Index_Price': [1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719]        
                }

df = DataFrame(Stock_Market,columns=['Year','Month','Interest_Rate','Unemployment_Rate','Stock_Index_Price']) 

# Two independent variables are used here, which makes this a multiple regression.
# For a simple linear regression, use a single variable, e.g. X = df['Interest_Rate'].
# Alternatively, you may add additional variables within the brackets.
X = df[['Interest_Rate','Unemployment_Rate']]
Y = df['Stock_Index_Price']

X = sm.add_constant(X) # adding a constant

model = sm.OLS(Y, X).fit()
predictions = model.predict(X) 

print_model = model.summary()
print(print_model)

                            OLS Regression Results                            
Dep. Variable:      Stock_Index_Price   R-squared:                       0.898
Model:                            OLS   Adj. R-squared:                  0.888
Method:                 Least Squares   F-statistic:                     92.07
Date:                Thu, 04 Apr 2019   Prob (F-statistic):           4.04e-11
Time:                        15:53:47   Log-Likelihood:                -134.61
No. Observations:                  24   AIC:                             275.2
Df Residuals:                      21   BIC:                             278.8
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
const              1798.4040    899.24

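The table above reports three model parameters: the constant term, Interest_Rate, and Unemployment_Rate, with coefficients 1798.4040, 345.5401, and -250.1466, respectively. The 95% confidence interval for the constant term is [-71.685, 3668.493]; because this interval contains zero, the constant term is not statistically distinguishable from zero at the 5% level.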
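To make explicit where the reported bounds come from, here is a sketch that recomputes the coefficients and their 95% intervals with NumPy alone, as estimate plus or minus a t critical value times the standard error. It is not how statsmodels is implemented internally. The constant `T_CRIT` is an assumption: the 0.975 quantile of Student's t with 21 degrees of freedom, hardcoded to avoid an extra dependency (`scipy.stats.t.ppf(0.975, 21)` would compute it).

```python
import numpy as np

# Same data as in the Stock_Market dictionary above.
interest_rate = [2.75, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.25, 2.25, 2.25, 2, 2,
                 2, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75, 1.75]
unemployment_rate = [5.3, 5.3, 5.3, 5.3, 5.4, 5.6, 5.5, 5.5, 5.5, 5.6, 5.7, 5.9,
                     6, 5.9, 5.8, 6.1, 6.2, 6.1, 6.1, 6.1, 5.9, 6.2, 6.2, 6.1]
stock_index_price = [1464, 1394, 1357, 1293, 1256, 1254, 1234, 1195, 1159, 1167, 1130, 1075,
                     1047, 965, 943, 958, 971, 949, 884, 866, 876, 822, 704, 719]

# Design matrix with a leading column of ones for the constant term.
X = np.column_stack([np.ones(len(interest_rate)), interest_rate, unemployment_rate])
y = np.array(stock_index_price, dtype=float)

n, k = X.shape                       # 24 observations, 3 parameters
xtx_inv = np.linalg.inv(X.T @ X)
beta = xtx_inv @ X.T @ y             # ordinary least-squares coefficients
residuals = y - X @ beta
sigma2 = residuals @ residuals / (n - k)         # residual variance, n - k = 21 df
std_err = np.sqrt(np.diag(sigma2 * xtx_inv))     # standard error of each coefficient

# Assumption: 0.975 quantile of Student's t with 21 degrees of freedom.
T_CRIT = 2.079614

lower = beta - T_CRIT * std_err
upper = beta + T_CRIT * std_err

for name, b, lo, hi in zip(["const", "Interest_Rate", "Unemployment_Rate"], beta, lower, upper):
    print(f"{name:>18}: {b:10.4f}  [{lo:10.3f}, {hi:10.3f}]")
```

In practice the same bounds are available directly from the fitted statsmodels result through `model.conf_int()`, which defaults to a 95% level.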

The const coefficient is the Y-intercept. It means that if both Interest_Rate and Unemployment_Rate are zero, then the expected output (i.e., Y) equals the const coefficient.
The Interest_Rate coefficient represents the change in the output Y due to a change of one unit in the interest rate (everything else held constant).
The Unemployment_Rate coefficient represents the change in the output Y due to a change of one unit in the unemployment rate (everything else held constant).
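The interpretation above can be checked numerically by plugging the reported coefficients into the fitted equation. The function name `predict_stock_index` is hypothetical, and the input values are taken from the first row of the data (interest rate 2.75, unemployment rate 5.3).

```python
# Coefficients as reported in the regression summary.
CONST = 1798.4040
B_INTEREST = 345.5401
B_UNEMPLOYMENT = -250.1466


def predict_stock_index(interest_rate, unemployment_rate):
    """Point prediction from the fitted equation Y = const + b1*X1 + b2*X2."""
    return CONST + B_INTEREST * interest_rate + B_UNEMPLOYMENT * unemployment_rate


baseline = predict_stock_index(2.75, 5.3)
# Raise the interest rate by one unit while holding unemployment constant.
bumped = predict_stock_index(3.75, 5.3)

print(f"prediction at (2.75, 5.3): {baseline:.2f}")
print(f"change for +1 interest rate: {bumped - baseline:.4f}")
print(f"prediction at (0, 0): {predict_stock_index(0, 0):.4f}")
```

The difference between the two predictions equals the Interest_Rate coefficient, and the prediction at zero inputs equals the constant term. This is a point prediction only; it does not carry the uncertainty expressed by the confidence intervals.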

We can also repeat the above model by the bootstrapping method, to get the 