<a href="https://colab.research.google.com/github/boyerb/Investments/blob/master/Ex9-MultipleRegression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

***Investment Analysis***, Bates, Boyer, and Fletcher

# Example Chapter 9: Running Regressions Using Python
In this example we show how to load in data, run regressions using Python, and estimate the 95% confidence intervals for each parameter.

### Imports and Setup
We first import the pandas and statsmodels.api packages. Pandas lets us load the data, and statsmodels.api provides the regression tools we need. We then load the data from the `9-Lakers2` tab of the *Examples_3.45.xlsx* workbook.

In [3]:
# Import packages
import pandas as pd
import statsmodels.api as sm
# Load the data by first specifying the URL where the workbook can be found
url = 'https://github.com/boyerb/Investments/raw/master/Examples_3.45.xlsx'
# Specify which columns to read
columns_to_read = ['Season', 'PPG', 'Pace', 'Win Percentage']
# Read the data into a DataFrame
# The headers are in row 4 of the sheet, which is row 3 in pandas because indexing starts at 0
df = pd.read_excel(url, sheet_name='9-Lakers2', header=3, usecols=columns_to_read, engine='openpyxl')
print(df.head(10))  # prints the first 10 rows

    Season   PPG   Pace  Win Percentage
0  1974-75  30.0  106.4            36.6
1  1975-76  27.7  108.0            48.8
2  1976-77  26.2  104.7            64.6
3  1977-78  25.8  106.1            54.9
4  1978-79  23.8  105.9            57.3
5  1979-80  24.8  104.1            73.2
6  1980-81  26.2  102.7            65.9
7  1981-82  23.9  103.1            69.5
8  1982-83  21.8  103.8            70.7
9  1983-84  21.5  103.7            65.9
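
The zero-based `header` argument can be illustrated without the workbook. The following is a minimal sketch, not the notebook's own code: it uses `pd.read_csv` on a small in-memory text file (a hypothetical stand-in whose filler lines and `Notes` column are invented for illustration), on the assumption that `header` and `usecols` behave the same way there as in `pd.read_excel`.

```python
import io
import pandas as pd

# Hypothetical stand-in for the sheet: three filler lines, then the header line.
# The two data rows are copied from the output printed above.
raw = io.StringIO(
    "Illustrative file\n"
    "Filler line two\n"
    "Filler line three\n"
    "Season,PPG,Pace,Win Percentage,Notes\n"
    "1974-75,30.0,106.4,36.6,x\n"
    "1975-76,27.7,108.0,48.8,y\n"
)

# header=3 selects the fourth line (index 3) as the column names;
# usecols keeps only the listed columns and drops 'Notes'.
demo = pd.read_csv(
    raw,
    header=3,
    usecols=['Season', 'PPG', 'Pace', 'Win Percentage'],
)
print(demo)
```

Everything above the header line is skipped, and columns not listed in `usecols` are never loaded.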


### Run Regression
Here we define the explanatory and dependent variables, run the regression, and pull the output we want: (1) the parameter estimates and (2) the 95% confidence interval for each parameter. The R-squared of the regression is also available from the fitted model as `model.rsquared`.
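
Written out, and assuming the standard linear specification, the regression estimated in the next cell is as follows. The symbols $\beta_j$, $\varepsilon_t$, and $n$ are notation introduced here for illustration.

$$
\text{Win Percentage}_t = \beta_0 + \beta_1\,\text{PPG}_t + \beta_2\,\text{Pace}_t + \varepsilon_t
$$

Under the usual OLS assumptions, the 95% confidence interval for each parameter takes the form

$$
\hat{\beta}_j \pm t_{0.975,\,n-3}\,\mathrm{SE}(\hat{\beta}_j),
$$

where $n$ is the number of seasons in the sample and $n-3$ is the residual degrees of freedom (three parameters are estimated, including the constant).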

In [4]:
X = df[['PPG', 'Pace']]  # explanatory variables
Y = df['Win Percentage']  # dependent variable
X = sm.add_constant(X)  # add a constant (intercept) to the regression equation
model = sm.OLS(Y, X).fit()
params = model.params
conf_int = model.conf_int(alpha=0.05)  # 95% CIs
# Create a table of output
results_df = pd.DataFrame({
    'Parameter': params.index,  # 1st column: parameter name
    'Estimate': params.values,  # 2nd column: parameter value
    'CI Lower': conf_int[0].values,  # 3rd column: lower bound of the 95% CI
    'CI Upper': conf_int[1].values  # 4th column: upper bound of the 95% CI
})

print(results_df)

  Parameter   Estimate   CI Lower    CI Upper
0     const  40.691994 -41.355820  122.739809
1       PPG   1.012819   0.170883    1.854756
2      Pace  -0.058186  -0.854666    0.738295
