<a href="https://colab.research.google.com/github/boyerb/Investments/blob/master/Ex09-MultipleRegression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Investments: Theory, Fundamental Analysis, and Data Driven Analytics**, Bates, Boyer, and Fletcher

# Example Chapter 9: Running Regressions Using Python
In this example we show how to load in data, run regressions using Python, and estimate the 95% confidence intervals for each parameter.

### Imports and Setup
We first import the pandas and statsmodels.api packages. Pandas enables us to load in data, and statsmodels.api gives us the regression tools we need. We then specify the option to truncate output when printing.

In [None]:
# import packages
import pandas as pd
import statsmodels.api as sm
pd.set_option("display.max_rows", 20)  # truncate printed DataFrames that have more than 20 rows


### Load in Data
We now load the data from the `EX9.5.1` tab of the Excel Examples workbook into a pandas DataFrame. We explicitly choose the columns to import and specify the header row (`header=3`, which is the fourth row of the sheet because pandas counts rows from zero) so that the data aligns correctly. Finally, we display the DataFrame contents.

In [None]:
# Load in the data by first specifying the URL where the data can be found
url='https://github.com/boyerb/Investments/raw/master/Excel_Examples_25.01.xlsx'
# specify which columns to read
columns_to_read = ['Season', 'Leading Scorer', 'PPG', 'Pace', 'Win Percentage']
# read the data into a DataFrame
df = pd.read_excel(url, sheet_name='EX9.5.1', header=3, usecols=columns_to_read, engine='openpyxl')
print(df)  # display the DataFrame (truncated if it exceeds display.max_rows)
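
To see what `header` and `usecols` do without downloading the workbook, here is a minimal sketch that uses `pd.read_csv` on an in-memory string in place of `pd.read_excel`; both readers accept these two parameters with the same meaning. The preamble rows, the numbers, and the extra `Notes` column are made up for illustration and are not the contents of the `EX9.5.1` tab.

```python
import io
import pandas as pd

# Hypothetical layout: three preamble rows, then the header on row index 3.
raw = (
    "Example sheet,,,\n"
    "Illustrative values only,,,\n"
    "Not the real workbook,,,\n"
    "Season,PPG,Pace,Notes\n"
    "2001,28.5,91.3,a\n"
    "2002,30.1,92.0,b\n"
)

# header=3: use the fourth row as column names and ignore the rows above it.
# usecols: keep only the named columns, so 'Notes' is dropped.
sample = pd.read_csv(io.StringIO(raw), header=3, usecols=["Season", "PPG", "Pace"])
print(sample)
```

If `header` pointed at the wrong row, the names in `usecols` would not be found among the column labels and pandas would raise an error, which is a quick way to spot a misaligned import.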


### Run Regression
Here we define our explanatory and dependent variables, run the regression, and then pull the output we want:
1. parameter estimates,
2. 95% confidence interval for each parameter.
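
Written out, the regression with `Win Percentage` as the dependent variable and `PPG` and `Pace` as explanatory variables has the form below. The symbols $\beta_0$, $\beta_1$, $\beta_2$, and $\varepsilon_i$ are notation introduced here for illustration, with $i$ indexing the rows of the data.

$$
\text{Win Percentage}_i = \beta_0 + \beta_1\,\text{PPG}_i + \beta_2\,\text{Pace}_i + \varepsilon_i
$$

The constant added by `sm.add_constant` corresponds to the intercept $\beta_0$.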

In [None]:
X=df[['PPG', 'Pace']]
Y=df['Win Percentage']
X=sm.add_constant(X)  # we specify that we want to add a constant to the regression equation
model=sm.OLS(Y,X).fit() # run the regression
params = model.params # parameter values
conf_int = model.conf_int(alpha=0.05)  # 95% CIs

# create a table of output
results_df = pd.DataFrame({
    'Parameter': params.index, # 1st column: parameter name
    'Estimate': params.values, # 2nd column: parameter value
    'CI Lower': conf_int[0].values, # 3rd column: lower bound on 95% CI
    'CI Upper': conf_int[1].values  # 4th column: upper bound on 95% CI
})
print(results_df)