<h1>Import Dependencies</h1>

In [1]:
# Math and dataframe modules
import numpy as np
import pandas as pd

# Stats module
import statsmodels.api as sm

<div style="font-size:18px; border:1px solid black; padding:10px">
<font color="blue">Statsmodels.api</font>
<ul>
<li>This Python module provides classes and functions for estimating statistical models.</li>
<li><a href="https://www.statsmodels.org/stable/index.html">Statsmodels Documentation</a></li>
<li>The Ordinary Least Squares class is available from statsmodels.api as <code>sm.OLS</code>. Its full signature is:<br><strong><code>statsmodels.regression.linear_model.OLS(endog, exog=None, missing='none', hasconst=None, **kwargs)</code></strong></li>
<li>Additional information on OLS class: <a href="https://www.statsmodels.org/devel/generated/statsmodels.regression.linear_model.OLS.html">Documentation</a></li>
</ul>
</div>

<h1>Import Data</h1>

Boston house prices dataset: <a href="https://scikit-learn.org/stable/datasets/index.html">scikit-learn datasets documentation</a>

<ul>
    <li>Instances: 506</li>
    <li>Attributes: 13</li>
    <li>Creators: Harrison, D. and Rubinfeld, D.L.</li>   
</ul>

<strong>References</strong>

<ul>
<li>Belsley, Kuh &amp; Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.</li>
<li>Quinlan, R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.</li>
</ul>

In [2]:
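# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell requires an older scikit-learn release.
from sklearn.datasets import load_boston
boston = load_boston()
# Features as a dataframe with named columns
X = pd.DataFrame(boston.data, columns=boston.feature_names)
# Target as a dataframe; no column name is given, so its label is 0
y = pd.DataFrame(boston.target)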

<h1>Create dataframe from X, y</h1>

In [3]:
# create dataframe from X, y for easier plot handling
dataframe = pd.concat([X, y], axis=1)
dataframe.head()

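<table>
<thead>
<tr><th></th><th>CRIM</th><th>ZN</th><th>INDUS</th><th>CHAS</th><th>NOX</th><th>RM</th><th>AGE</th><th>DIS</th><th>RAD</th><th>TAX</th><th>PTRATIO</th><th>B</th><th>LSTAT</th><th>0</th></tr>
</thead>
<tbody>
<tr><th>0</th><td>0.00632</td><td>18.0</td><td>2.31</td><td>0.0</td><td>0.538</td><td>6.575</td><td>65.2</td><td>4.09</td><td>1.0</td><td>296.0</td><td>15.3</td><td>396.9</td><td>4.98</td><td>24.0</td></tr>
<tr><th>1</th><td>0.02731</td><td>0.0</td><td>7.07</td><td>0.0</td><td>0.469</td><td>6.421</td><td>78.9</td><td>4.9671</td><td>2.0</td><td>242.0</td><td>17.8</td><td>396.9</td><td>9.14</td><td>21.6</td></tr>
<tr><th>2</th><td>0.02729</td><td>0.0</td><td>7.07</td><td>0.0</td><td>0.469</td><td>7.185</td><td>61.1</td><td>4.9671</td><td>2.0</td><td>242.0</td><td>17.8</td><td>392.83</td><td>4.03</td><td>34.7</td></tr>
<tr><th>3</th><td>0.03237</td><td>0.0</td><td>2.18</td><td>0.0</td><td>0.458</td><td>6.998</td><td>45.8</td><td>6.0622</td><td>3.0</td><td>222.0</td><td>18.7</td><td>394.63</td><td>2.94</td><td>33.4</td></tr>
<tr><th>4</th><td>0.06905</td><td>0.0</td><td>2.18</td><td>0.0</td><td>0.458</td><td>7.147</td><td>54.2</td><td>6.0622</td><td>3.0</td><td>222.0</td><td>18.7</td><td>396.9</td><td>5.33</td><td>36.2</td></tr>
</tbody>
</table>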
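The last column in the output is labelled 0 because the target dataframe was built without a column name. The sketch below reproduces this on toy arrays (assumed stand-ins for the dataset's feature matrix and target) and shows one possible way to give the column a readable name; the name <code>target</code> is a hypothetical choice, not something the notebook uses.

```python
import numpy as np
import pandas as pd

# Toy stand-ins (assumed for illustration) for the feature matrix and target.
features = pd.DataFrame(np.array([[1.0, 2.0], [3.0, 4.0]]), columns=["a", "b"])
target = pd.DataFrame(np.array([10.0, 20.0]))  # no column name given, so the label is 0

# axis=1 places the frames side by side, aligned on the shared row index.
combined = pd.concat([features, target], axis=1)
print(list(combined.columns))

# Optional: rename the integer label before plotting.
renamed = combined.rename(columns={0: "target"})
print(list(renamed.columns))
```

Renaming is optional here; later cells in this notebook work with the unnamed column as is.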


<h1>Generate Ordinary Least Squares Model</h1>

In [4]:
# generate OLS model; add_constant adds the intercept column, which OLS does not include by default
model = sm.OLS(y, sm.add_constant(X))
model_fit = model.fit()

<h1>Model Fitted y and Residuals</h1>

In [5]:
# fitted (predicted) values from the model
model_fitted_y = model_fit.fittedvalues
# model residuals
model_residuals = model_fit.resid

In [6]:
# Store data
%store dataframe
%store model
%store model_fit
%store model_fitted_y
%store model_residuals

Stored 'dataframe' (DataFrame)
Stored 'model' (OLS)
Stored 'model_fit' (RegressionResultsWrapper)
Stored 'model_fitted_y' (Series)
Stored 'model_residuals' (Series)
