# Multiple Linear Regression in Statsmodels - Lab

## Introduction
In this lab, you'll practice fitting a multiple linear regression model on the Boston Housing dataset.

## Objectives
You will be able to:
* Run a linear regression on the Boston Housing dataset with all the predictors
* Interpret the parameters of the multiple linear regression model

## The Boston Housing Data

We pre-processed the Boston Housing Data again. This time, however, we did things slightly differently:
- We dropped "ZN" and "NOX" completely
- We categorized "RAD" into 2 bins and "TAX" into 3 bins
- We used min-max scaling on "B", "CRIM" and "DIS" (and log-transformed all of them first, except "B")
- We used standardization on "AGE", "INDUS", "LSTAT" and "PTRATIO" (and log-transformed all of them first, except "AGE")

In [13]:
import pandas as pd
import numpy as np
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release
from sklearn.datasets import load_boston
boston = load_boston()

boston_features = pd.DataFrame(boston.data, columns = boston.feature_names)
boston_features = boston_features.drop(["NOX","ZN"],axis=1)

# first, create bins based on the values observed; 3 edges result in 2 bins
bins = [0, 6, 24]
bins_rad = pd.cut(boston_features['RAD'], bins)
bins_rad = bins_rad.cat.as_unordered()

# next, create bins based on the values observed; 4 edges result in 3 bins
bins = [0, 270, 360, 712]
bins_tax = pd.cut(boston_features['TAX'], bins)
bins_tax = bins_tax.cat.as_unordered()

tax_dummy = pd.get_dummies(bins_tax, prefix="TAX")
rad_dummy = pd.get_dummies(bins_rad, prefix="RAD")
boston_features = boston_features.drop(["RAD","TAX"], axis=1)
boston_features = pd.concat([boston_features, rad_dummy, tax_dummy], axis=1)

In [14]:
age = boston_features["AGE"]
b = boston_features["B"]
logcrim = np.log(boston_features["CRIM"])
logdis = np.log(boston_features["DIS"])
logindus = np.log(boston_features["INDUS"])
loglstat = np.log(boston_features["LSTAT"])
logptratio = np.log(boston_features["PTRATIO"])

# minmax scaling
boston_features["B"] = (b-min(b))/(max(b)-min(b))
boston_features["CRIM"] = (logcrim-min(logcrim))/(max(logcrim)-min(logcrim))
boston_features["DIS"] = (logdis-min(logdis))/(max(logdis)-min(logdis))

# standardization
boston_features["AGE"] = (age-np.mean(age))/np.sqrt(np.var(age))
boston_features["INDUS"] = (logindus-np.mean(logindus))/np.sqrt(np.var(logindus))
boston_features["LSTAT"] = (loglstat-np.mean(loglstat))/np.sqrt(np.var(loglstat))
boston_features["PTRATIO"] = (logptratio-np.mean(logptratio))/(np.sqrt(np.var(logptratio)))
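The two rescaling steps used in this cell can be checked on a small made-up array. The sketch below is illustrative only: `min_max` and `standardize` are hypothetical helper names, and the values in `raw` are invented rather than taken from the Boston data.

```python
import numpy as np

def min_max(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    values = np.asarray(values, dtype=float)
    return (values - values.min()) / (values.max() - values.min())

def standardize(values):
    """Center on the mean and divide by the population standard deviation."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()

raw = np.array([0.5, 2.0, 8.0, 32.0])   # invented, strictly positive values
scaled = min_max(np.log(raw))           # log first, then min-max (as for "CRIM" and "DIS")
z = standardize(np.log(raw))            # log first, then standardize (as for "LSTAT")

print(scaled.min(), scaled.max())
print(z.mean(), z.std())
```

`np.std` uses the population formula by default, which matches the `np.sqrt(np.var(...))` expression in the cell.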

In [3]:
boston_features.head()

| | CRIM | INDUS | CHAS | RM | AGE | DIS | PTRATIO | B | LSTAT | RAD_(0, 6] | RAD_(6, 24] | TAX_(0, 270] | TAX_(270, 360] | TAX_(360, 712] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | -1.704344 | 0.0 | 6.575 | -0.120013 | 0.542096 | -1.443977 | 1.0 | -1.27526 | 1 | 0 | 0 | 1 | 0 |
| 1 | 0.153211 | -0.263239 | 0.0 | 6.421 | 0.367166 | 0.623954 | -0.230278 | 1.0 | -0.263711 | 1 | 0 | 1 | 0 | 0 |
| 2 | 0.153134 | -0.263239 | 0.0 | 7.185 | -0.265812 | 0.623954 | -0.230278 | 0.989737 | -1.627858 | 1 | 0 | 1 | 0 | 0 |
| 3 | 0.171005 | -1.778965 | 0.0 | 6.998 | -0.809889 | 0.707895 | 0.165279 | 0.994276 | -2.153192 | 1 | 0 | 1 | 0 | 0 |
| 4 | 0.250315 | -1.778965 | 0.0 | 7.147 | -0.51118 | 0.707895 | 0.165279 | 1.0 | -1.162114 | 1 | 0 | 1 | 0 | 0 |


## Run a linear model in Statsmodels

In [15]:
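import statsmodels.api as sm
from statsmodels.formula.api import ols

# Response: the transformed "CRIM" column; predictors: the eight non-binned columns.
# The "RAD" and "TAX" dummy columns are not part of this fit.
Y = boston_features['CRIM']
x = boston_features[['INDUS', 'CHAS', 'RM', 'AGE', 'DIS', 'PTRATIO', 'B', 'LSTAT']]

# Y and x are looked up in the calling scope; the columns of x are reported as x[0] ... x[7]
formula = "Y ~ x"
model = ols(formula=formula, data=boston_features).fit()
model.summary()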

| | | | |
|---|---|---|---|
| Dep. Variable: | Y | R-squared: | 0.701 |
| Model: | OLS | Adj. R-squared: | 0.696 |
| Method: | Least Squares | F-statistic: | 145.8 |
| Date: | Fri, 10 May 2019 | Prob (F-statistic): | 3.63e-125 |
| Time: | 00:49:57 | Log-Likelihood: | 339.99 |
| No. Observations: | 506 | AIC: | -662.0 |
| Df Residuals: | 497 | BIC: | -623.9 |
| Df Model: | 8 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 0.6423 | 0.080 | 7.991 | 0.000 | 0.484 | 0.800 |
| x[0] | 0.0658 | 0.009 | 6.961 | 0.000 | 0.047 | 0.084 |
| x[1] | -0.0064 | 0.023 | -0.285 | 0.776 | -0.051 | 0.038 |
| x[2] | 0.0216 | 0.011 | 1.910 | 0.057 | -0.001 | 0.044 |
| x[3] | 0.0171 | 0.010 | 1.730 | 0.084 | -0.002 | 0.036 |
| x[4] | -0.3373 | 0.046 | -7.367 | 0.000 | -0.427 | -0.247 |
| x[5] | 0.0208 | 0.006 | 3.244 | 0.001 | 0.008 | 0.033 |
| x[6] | -0.1979 | 0.027 | -7.465 | 0.000 | -0.250 | -0.146 |
| x[7] | 0.0287 | 0.010 | 2.868 | 0.004 | 0.009 | 0.048 |

| | | | |
|---|---|---|---|
| Omnibus: | 22.288 | Durbin-Watson: | 0.38 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 24.011 |
| Skew: | -0.52 | Prob(JB): | 6.11e-06 |
| Kurtosis: | 3.24 | Cond. No. | 96.8 |
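
Passing a whole DataFrame slice as `x` produces coefficient labels such as `x[0]`, which have to be mapped back to columns by position. As a minimal sketch of an alternative, naming each predictor in the formula keeps the labels readable. The data frame `demo`, its column values, and the coefficients used to generate `target` are all invented for illustration and are not the lab's data.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)

# Synthetic predictors and a response with known coefficients plus small noise
demo = pd.DataFrame({
    "INDUS": rng.normal(size=200),
    "RM": rng.normal(size=200),
})
demo["target"] = (
    1.0 + 0.5 * demo["INDUS"] + 2.0 * demo["RM"]
    + rng.normal(scale=0.01, size=200)
)

# Build "target ~ INDUS + RM" from a list of column names
predictors = ["INDUS", "RM"]
formula = "target ~ " + " + ".join(predictors)
fit = ols(formula=formula, data=demo).fit()

print(fit.params)  # indexed by Intercept, INDUS, RM
```

This approach only works directly for column names that are valid Python identifiers; names such as `RAD_(0, 6]` would need renaming or quoting first.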


## Run the same model in Scikit-learn

In [24]:
import numpy as np
from sklearn.linear_model import LinearRegression

# scikit-learn expects the 2D feature matrix first and the 1D target second
Y = boston_features['CRIM']
reg = LinearRegression().fit(x, Y)

# fitted parameters, for comparison with the Statsmodels summary
reg.intercept_, reg.coef_
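
scikit-learn's `LinearRegression.fit` expects a 2D feature matrix of shape `(n_samples, n_features)` as its first argument and the target as its second. The NumPy-only sketch below illustrates that shape convention and the ordinary least squares fit that both libraries compute. The data is synthetic, and `X_demo`, `y_demo` and `true_coef` are illustrative names, not part of the lab.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_features = 100, 3

X_demo = rng.normal(size=(n_samples, n_features))  # 2D: one row per sample
true_coef = np.array([1.5, -2.0, 0.5])
y_demo = 4.0 + X_demo @ true_coef                  # 1D: one value per sample, noise-free

# Prepend a column of ones so the first fitted value is the intercept
design = np.column_stack([np.ones(n_samples), X_demo])
solution, *_ = np.linalg.lstsq(design, y_demo, rcond=None)

intercept, coef = solution[0], solution[1:]
print(X_demo.shape, y_demo.shape)
print(intercept, coef)
```

Swapping the two arrays would hand a 1D array to the slot that requires 2D input, which is the kind of mistake the "Expected 2D array" message points at.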

## Remove the necessary variables to make sure the coefficients are the same for Scikit-learn vs Statsmodels
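
One common reason two OLS implementations disagree on coefficients is a rank-deficient design matrix: if every dummy column of a category is kept alongside an intercept, the dummies sum to the intercept column and the solution is no longer unique. Whether that is the cause in this lab is an assumption; the sketch below only demonstrates the collinearity on invented category codes.

```python
import numpy as np

# Invented category codes for six observations: 0, 1 or 2 (three bins)
codes = np.array([0, 1, 2, 1, 0, 2])
dummies = np.eye(3)[codes]             # one 0/1 column per bin
intercept = np.ones((len(codes), 1))

full = np.hstack([intercept, dummies])            # intercept + all three dummies
reduced = np.hstack([intercept, dummies[:, 1:]])  # first dummy dropped as reference bin

rank_full = np.linalg.matrix_rank(full)        # smaller than the 4 columns
rank_reduced = np.linalg.matrix_rank(reduced)  # equal to the 3 columns

print(full.shape[1], rank_full)
print(reduced.shape[1], rank_reduced)
```

Dropping one dummy per binned variable makes the remaining coefficients differences relative to the dropped bin.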

### Statsmodels

### Scikit-learn

## Interpret the coefficients for PTRATIO and LSTAT

- CRIM: per capita crime rate by town
- INDUS: proportion of non-retail business acres per town
- CHAS: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- RM: average number of rooms per dwelling
- AGE: proportion of owner-occupied units built prior to 1940
- DIS: weighted distances to five Boston employment centres
- RAD: index of accessibility to radial highways
- TAX: full-value property-tax rate per $10,000
- PTRATIO: pupil-teacher ratio by town
- B: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- LSTAT: % lower status of the population
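
Because "PTRATIO" and "LSTAT" were log-transformed and then standardized, their coefficients refer to the transformed scale. As a sketch of how to read such a coefficient, let $\mu$ and $\sigma$ be the mean and standard deviation of $\ln x$ in the training data and $\beta$ the fitted coefficient (symbols introduced here for illustration only):

$$
z = \frac{\ln x - \mu}{\sigma}, \qquad
\Delta \hat{y} = \beta\,(z_2 - z_1) = \frac{\beta}{\sigma}\,\ln\frac{x_2}{x_1}
$$

A one-unit change in $z$ therefore corresponds to a change of one standard deviation in $\ln x$, and raising the raw value by 1% shifts the prediction by about $0.00995\,\beta/\sigma$, holding the other predictors fixed.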

## Predict the house price given the following characteristics (raw values, before transformation)

Make sure to transform your variables as needed!

- CRIM: 0.15
- INDUS: 6.07
- CHAS: 1
- RM: 6.1
- AGE: 33.2
- DIS: 7.6
- PTRATIO: 17
- B: 383
- LSTAT: 10.87
- RAD: 8
- TAX: 284
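
A new observation has to go through the same steps as the training data, using the training column's statistics rather than its own. The sketch below shows the idea under stated assumptions: `scale_like_training` and `bin_label` are hypothetical helpers, and `train_dis` is an invented stand-in for the raw "DIS" column, so the scaled number it produces is not the value the lab would give.

```python
import numpy as np

def scale_like_training(new_value, train_values, log=False, method="minmax"):
    """Transform one new raw value using statistics of the training column."""
    train = np.asarray(train_values, dtype=float)
    if log:
        train, new_value = np.log(train), np.log(new_value)
    if method == "minmax":
        return (new_value - train.min()) / (train.max() - train.min())
    return (new_value - train.mean()) / train.std()

def bin_label(value, edges):
    """Return the right-closed interval (lo, hi] that contains value."""
    i = np.digitize(value, edges, right=True)
    return f"({edges[i - 1]}, {edges[i]}]"

# Invented stand-in for a training column; the lab would use the raw Boston column
train_dis = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
dis_scaled = scale_like_training(7.6, train_dis, log=True, method="minmax")

rad_bin = bin_label(8, [0, 6, 24])            # same edges as the "RAD" binning
tax_bin = bin_label(284, [0, 270, 360, 712])  # same edges as the "TAX" binning

print(dis_scaled, rad_bin, tax_bin)
```

The bin labels indicate which dummy column is set to 1 for this observation; all other dummies of the same variable are 0.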

## Summary
Congratulations! You've fitted your first multiple linear regression model on the Boston Housing Data.