<a href="https://colab.research.google.com/github/boyerb/Investments/blob/master/MultipleRegression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

***Foundations of Investment Analysis***, Bates, Boyer, and Fletcher

# Example 7: Running Regressions Using Python
In this example we show how to load in data, run regressions using Python, and estimate the 95% confidence intervals for each parameter.

### Imports and Setup
We first import the pandas and statsmodels.api packages. Pandas lets us load the data, and statsmodels.api provides the regression tools we need. We then load the data from the *7-Google Tech and Oil* sheet of the *Examples_3.1.xlsx* workbook.

In [10]:
# Import packages
import pandas as pd
import statsmodels.api as sm
# Load the data by first specifying the URL where the data can be found
url = 'https://github.com/boyerb/Investments/raw/master/Examples_3.1.xlsx'
# Specify which columns to read
columns_to_read = ['Date', 'Google', 'Tech Portfolio', 'Oil Portfolio']
# Read the data into a DataFrame (header=1: column names are in the second row of the sheet)
df = pd.read_excel(url, sheet_name='7-Google Tech and Oil', header=1, usecols=columns_to_read, engine='openpyxl')
print(df.head(10))  # prints the first 10 rows

        Date    Google  Tech Portfolio  Oil Portfolio
0 2017-01-31  0.035005        0.043127      -0.034677
1 2017-02-28  0.030164        0.049392      -0.026522
2 2017-03-31  0.003397        0.022535      -0.010781
3 2017-04-28  0.090493        0.023372      -0.034950
4 2017-05-31  0.067678        0.044092      -0.040502
5 2017-06-30 -0.058161       -0.025369      -0.005047
6 2017-07-31  0.017017        0.041324       0.025763
7 2017-08-31  0.010301        0.031092      -0.055739
8 2017-09-29  0.019346        0.007918       0.104575
9 2017-10-31  0.060921        0.073952      -0.010686
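
Before running a regression, it can help to confirm that the return columns loaded as numbers and contain no missing values. The following is a minimal sketch, assuming a hypothetical helper `check_returns` (not part of pandas or the original notebook), applied to the first three rows printed above. The same call should work on the full `df`.

```python
import pandas as pd

# First three rows copied from the printed output above
sample = pd.DataFrame({
    'Date': pd.to_datetime(['2017-01-31', '2017-02-28', '2017-03-31']),
    'Google': [0.035005, 0.030164, 0.003397],
    'Tech Portfolio': [0.043127, 0.049392, 0.022535],
    'Oil Portfolio': [-0.034677, -0.026522, -0.010781],
})

def check_returns(frame, return_cols):
    """Hypothetical helper: basic sanity checks on the return columns."""
    return {
        'rows': len(frame),
        # total count of missing values across the return columns
        'missing': int(frame[return_cols].isna().sum().sum()),
        # True only if every return column has a numeric dtype
        'all_numeric': all(pd.api.types.is_numeric_dtype(frame[c]) for c in return_cols),
    }

report = check_returns(sample, ['Google', 'Tech Portfolio', 'Oil Portfolio'])
print(report)
```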


### Run Regression
Here we define our explanatory and dependent variables, run the regression, and then pull the output we want: (1) the parameter estimates, (2) the 95% confidence interval for each parameter, and (3) the R-squared of the regression.

In [16]:
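# Explanatory variables (X) and dependent variable (Y)
X = df[['Tech Portfolio', 'Oil Portfolio']]
Y = df['Google']
X = sm.add_constant(X)  # add a column of ones so the regression includes an intercept
model = sm.OLS(Y, X).fit()  # estimate the regression by ordinary least squares
params = model.params  # parameter estimates
conf_int = model.conf_int(alpha=0.05)  # 95% CIs (columns 0 and 1 hold the lower and upper bounds)
r_squared = model.rsquared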
results_df = pd.DataFrame({
    'Parameter': params.index,
    'Estimate': params.values,
    'CI Lower': conf_int[0].values,
    'CI Upper': conf_int[1].values
})

print(results_df)
print()
print("R-squared", r_squared)

        Parameter  Estimate  CI Lower  CI Upper
0           const  0.006127 -0.007722  0.019975
1  Tech Portfolio  0.707272  0.425962  0.988581
2   Oil Portfolio  0.114462 -0.032401  0.261325

R-squared 0.49785526371092204
