## Boston Housing Assignment

In this assignment you'll use linear regression to estimate the price of houses in Boston, using a well-known dataset.

Goals:
+  Measure the performance of the model I created using $R^{2}$ and MSE
> Learn how to use sklearn.metrics.r2_score and sklearn.metrics.mean_squared_error
+  Implement a new model using regularization: L2 (Ridge) or L1 (Lasso)
> Use sklearn.linear_model.Ridge (L2) or sklearn.linear_model.Lasso (L1)
+  Get the best model you can by optimizing the regularization parameter.
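For reference, the metrics and penalties named in the goals can be written as below. The Ridge and Lasso objectives follow scikit-learn's documented formulation, with $w$ the coefficient vector, $n$ the number of samples and $\alpha$ the regularization strength (the intercept is not penalized). Because Lasso divides the squared error by $2n$ and Ridge does not, good values of $\alpha$ for the two models can differ by orders of magnitude.

$$
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}},
\qquad
R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}
$$

$$
\text{Ridge (L2):}\quad \min_{w}\; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2
\qquad\qquad
\text{Lasso (L1):}\quad \min_{w}\; \frac{1}{2n}\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$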

In [50]:
from sklearn import datasets
import pandas as pd
from sklearn.preprocessing import StandardScaler
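# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split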
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression

In [51]:
# load_boston was removed in scikit-learn 1.2, so this cell needs an older scikit-learn
boston = datasets.load_boston()
print(boston.DESCR)

Boston House Prices dataset

Notes
------
Data Set Characteristics:  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive
    
    :Median Value (attribute 14) is usually the target

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
      

In [53]:
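def load_boston():
    """Load the Boston data, standardize the features, and split into train/test sets."""
    scaler = StandardScaler()
    boston = datasets.load_boston()
    X = boston.data
    y = boston.target
    # Note: the scaler is fit on all 506 rows before splitting, so test-set
    # statistics leak into the standardized training features
    X = scaler.fit_transform(X)
    # Default split is 75% train (379 rows) / 25% test (127 rows); with no
    # random_state, every call produces a different split
    return train_test_split(X, y)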
    

In [54]:
X_train, X_test, y_train, y_test = load_boston()

In [55]:
X_train.shape

(379L, 13L)
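The load_boston helper above fits the StandardScaler on the full dataset before splitting. A minimal sketch of a leakage-free alternative splits first and puts the scaler inside a scikit-learn Pipeline, so its means and standard deviations come from the training rows only. It assumes synthetic make_regression data with the same 506 x 13 shape as a stand-in, because load_boston is no longer available in current scikit-learn; the fixed random_state values are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the same shape as the Boston data: 506 rows x 13 features
X, y = make_regression(n_samples=506, n_features=13, noise=5.0, random_state=0)

# Split first (fixed random_state so the split is reproducible)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The pipeline fits the scaler and the regression on the training rows only
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)

# The fitted scaler's means match the training-set means, not the full-data means
scaler = model.named_steps["standardscaler"]
train_means_match = np.allclose(scaler.mean_, X_train.mean(axis=0))
print(X_train.shape, X_test.shape, train_means_match)
```

The same pipeline object can be passed to predict and to scikit-learn's cross-validation utilities, which refit the scaler inside every fold.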

### Fitting a Linear Regression

It's as easy as instantiating a new regression object (line 1) and passing it your training data
(line 2) by calling .fit(independent variables, dependent variable).
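To see what .fit learns, here is a small sketch on made-up data (not the Boston set) where the answer is known in advance: the points lie exactly on y = 2x + 1, so the fitted slope and intercept should recover 2 and 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data lying exactly on y = 2*x + 1 (illustrative only)
X = np.array([[0.0], [1.0], [2.0], [3.0]])  # independent variables: shape (n_samples, n_features)
y = np.array([1.0, 3.0, 5.0, 7.0])          # dependent variable: shape (n_samples,)

reg = LinearRegression()  # instantiate the regression object
reg.fit(X, y)             # learn coefficients from the training data

print(reg.coef_, reg.intercept_)  # slope close to 2, intercept close to 1
print(reg.predict([[4.0]]))       # prediction close to 9
```

Note that X must be two-dimensional even with a single feature, which is why each value sits in its own inner list.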



In [76]:
clf = LinearRegression()
clf.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

In [57]:
y_pred = clf.predict(X_test)

In [58]:
y_pred

array([ 23.76904102,  22.30007173,  24.98471075,   6.10413563,
         7.51683849,  13.21699029,  17.08201691,  14.21822206,
        36.65563806,  23.2069817 ,   9.62790685,  17.42317883,
        26.4205491 ,  22.61620273,  19.66120088,  39.66004607,
        30.68715851,  30.90451782,   5.83450861,   0.35920888,
        17.6198611 ,   8.1018283 ,  34.61953311,  17.78341499,
        32.11062553,  27.65479849,  17.25462026,  12.10100765,
         7.88054167,  13.09899071,  23.57169443,  24.43986347,
        40.70278375,  13.92067924,  36.95455059,  17.15848519,
        23.02931913,  21.0494038 ,  13.34146959,  11.79755449,
        22.2365532 ,  31.48288655,  22.9749657 ,  16.46596225,
        15.94984119,  18.21664909,  32.35654682,  43.1151829 ,
        37.18750541,  21.91401051,  16.54135201,  17.81668756,
        26.74495255,  18.88557795,  27.8415906 ,  36.95412824,
        22.55895172,  18.80078003,  20.04123296,  28.36249424,
        16.1621005 ,  15.83811206,  16.19360602,  20.26

In [59]:
y_pred.shape

(127L,)

In [60]:
y_test.shape

(127L,)

### Making a Prediction
X_test is our holdout set of data. We know the answers (y_test), but the model has never seen them.

Using the command below, I create a tuple for each observation that pairs the real value (y_test) with
the value our regressor predicts (clf.predict(X_test)).

Use a similar format to get your R^2 and MSE metrics working. Use the [scikit-learn API](http://scikit-learn.org/stable/modules/model_evaluation.html) if you need help!
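As a sanity check on what r2_score and mean_squared_error compute, the sketch below evaluates both formulas by hand with NumPy on a few made-up values and compares them with scikit-learn. Both functions take the true values first and the predictions second.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Small made-up vectors, only to check the formulas
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

residuals = y_true - y_hat
mse_manual = np.mean(residuals ** 2)  # mean of squared errors: 0.375 here
rmse_manual = np.sqrt(mse_manual)     # back in the units of the target
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# Argument order: (true values, predictions)
print(mse_manual, mean_squared_error(y_true, y_hat))
print(r2_manual, r2_score(y_true, y_hat))
```

RMSE is often easier to read than MSE because it is in the same units as the target (for the Boston data, thousands of dollars).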

In [61]:
list(zip(y_test, clf.predict(X_test)))

[(23.0, 23.769041020573567),
 (33.0, 22.300071728835597),
 (24.600000000000001, 24.984710749515813),
 (10.5, 6.1041356303966658),
 (8.6999999999999993, 7.5168384896687819),
 (14.5, 13.216990286972676),
 (17.800000000000001, 17.082016905462122),
 (13.6, 14.218222063761191),
 (32.399999999999999, 36.655638057593848),
 (22.199999999999999, 23.206981699106102),
 (5.0, 9.6279068450207355),
 (14.5, 17.423178826137647),
 (24.800000000000001, 26.420549102626541),
 (20.0, 22.616202732965345),
 (24.100000000000001, 19.661200879680283),
 (50.0, 39.660046072572015),
 (37.0, 30.687158507417994),
 (28.699999999999999, 30.904517823713164),
 (13.800000000000001, 5.8345086110950142),
 (17.899999999999999, 0.3592088828293285),
 (19.600000000000001, 17.619861103344867),
 (13.199999999999999, 8.1018282958360412),
 (37.299999999999997, 34.619533111194485),
 (12.699999999999999, 17.783414992133196),
 (31.600000000000001, 32.110625525351153),
 (36.200000000000003, 27.654798490114317),
 (19.600000000000001, 1

In [62]:
r2_score(y_test, y_pred)

0.72440600953767298

In [69]:
MSE = mean_squared_error (y_test, y_pred)

In [70]:
MSE

26.045785384986516

In [129]:
import math
math.sqrt(mean_squared_error(y_test, y_pred))

5.103507165174405

### Linear Regression Model Results
<li> R^2 = 0.72440600953767298
<li> RMSE = 5.103507165174405
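Because the notebook reassigns a single global MSE variable in many cells, it is easy to report an RMSE that belongs to a different model if cells run out of order. A small hypothetical helper, evaluate, that always computes both metrics from fresh predictions avoids that. The sketch runs on synthetic make_regression data (an assumption, since load_boston was removed in scikit-learn 1.2) and also checks that Ridge with alpha = 0 reproduces ordinary least squares, which is why the alpha = 0 Ridge cell matches the linear regression scores.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def evaluate(model, X_train, X_test, y_train, y_test):
    """Hypothetical helper: fit `model`, then return (R^2, RMSE) on the test set."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)  # fresh predictions, no shared global state
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    return r2_score(y_test, y_pred), rmse


# Synthetic stand-in for the Boston train/test split
X, y = make_regression(n_samples=506, n_features=13, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ols_r2, ols_rmse = evaluate(LinearRegression(), X_train, X_test, y_train, y_test)
ridge0_r2, ridge0_rmse = evaluate(Ridge(alpha=0.0), X_train, X_test, y_train, y_test)

# With no penalty, Ridge and ordinary least squares give the same scores
print(ols_r2, ols_rmse)
print(ridge0_r2, ridge0_rmse)
```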

### Next Model: Ridge

In [130]:
from sklearn import linear_model

In [77]:
clf = linear_model.Ridge(alpha = .5)

In [78]:
clf.fit(X_train, y_train)

Ridge(alpha=0.5, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [82]:
y_pred_ridge = clf.predict(X_test)

In [83]:
r2_score(y_test, y_pred_ridge)

0.72445230292939933

In [84]:
MSE = mean_squared_error (y_test, y_pred_ridge)

In [85]:
math.sqrt(MSE)

5.103078511816791

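<li> started with alpha = 0.5 (scikit-learn's default for Ridge is alpha = 1.0)
<li> searching by hand in both directions from 0.5, with a first step of 0.5 (a manual line search over alpha, not gradient descent)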
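The hand search in the cells that follow can be written as a single loop over candidate alpha values. The sketch below uses synthetic make_regression data as a stand-in for the Boston split and reuses the alpha values tried by hand; like the notebook, it still scores each alpha on the test set, so the winning score is an optimistic estimate.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: not the Boston set)
X, y = make_regression(n_samples=506, n_features=13, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The alpha values tried one cell at a time in the notebook
alphas = [0.0, 0.5, 1.0, 5.0, 5.5, 5.8, 6.0, 7.0, 10.0]

scores = {}
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_train, y_train)  # fit returns the fitted estimator
    scores[alpha] = r2_score(y_test, model.predict(X_test))

best_alpha = max(scores, key=scores.get)  # alpha with the highest test R^2
print(best_alpha, scores[best_alpha])
```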

In [131]:
clf = linear_model.Ridge(alpha = 1)
clf.fit(X_train, y_train)
y_pred_ridge = clf.predict(X_test)
r2_score(y_test, y_pred_ridge)

0.72449308379032717

In [132]:
MSE = mean_squared_error (y_test, y_pred_ridge)
math.sqrt(MSE)

5.102700871869456

In [133]:
clf = linear_model.Ridge(alpha = 0)
clf.fit(X_train, y_train)
y_pred_ridge = clf.predict(X_test)
r2_score(y_test, y_pred_ridge)

0.7244060095376732

In [134]:
MSE = mean_squared_error (y_test, y_pred_ridge)
math.sqrt(MSE)

5.103507165174405

<li> increasing alpha in steps of 5 in the positive direction

In [107]:
clf = linear_model.Ridge(alpha = 5)
clf.fit(X_train, y_train)
y_pred_ridge = clf.predict(X_test)
r2_score(y_test, y_pred_ridge)

0.72465554173414026

In [101]:
MSE = mean_squared_error (y_test, y_pred_ridge)
math.sqrt(MSE)

5.101196197242518

In [105]:
clf = linear_model.Ridge(alpha = 10)
clf.fit(X_train, y_train)
y_pred_ridge = clf.predict(X_test)
r2_score(y_test, y_pred_ridge)

0.72456042673156129

In [135]:
MSE = mean_squared_error (y_test, y_pred_ridge)
math.sqrt(MSE)

5.103507165174405

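<li> optimal R^2 is between alpha = 5 and 10 (the RMSE printed for alpha = 10 is stale: In [135] ran right after the alpha = 0 cell, In [134], so it repeats the alpha = 0 value)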

In [112]:
clf = linear_model.Ridge(alpha = 7)
clf.fit(X_train, y_train)
y_pred_ridge = clf.predict(X_test)
r2_score(y_test, y_pred_ridge)

0.72465003016326968

In [113]:
MSE = mean_squared_error (y_test, y_pred_ridge)
math.sqrt(MSE)

5.101247252317036

In [117]:
clf = linear_model.Ridge(alpha = 6)
clf.fit(X_train, y_train)
y_pred_ridge = clf.predict(X_test)
r2_score(y_test, y_pred_ridge)

0.72465886692503889

In [118]:
MSE = mean_squared_error (y_test, y_pred_ridge)
math.sqrt(MSE)

5.101165394911346

<li> Optimal between alpha ~ 5-6

In [136]:
clf = linear_model.Ridge(alpha = 5.5)
clf.fit(X_train, y_train)
y_pred_ridge = clf.predict(X_test)
r2_score(y_test, y_pred_ridge)

0.72465879630539187

In [121]:
MSE = mean_squared_error (y_test, y_pred_ridge)
math.sqrt(MSE)

5.101166049086173

In [123]:
clf = linear_model.Ridge(alpha = 5.7)
clf.fit(X_train, y_train)
y_pred_ridge = clf.predict(X_test)
r2_score(y_test, y_pred_ridge)

0.72465919821606295

In [124]:
MSE = mean_squared_error (y_test, y_pred_ridge)
math.sqrt(MSE)

5.101162326044016

In [125]:
clf = linear_model.Ridge(alpha = 5.8)
clf.fit(X_train, y_train)
y_pred_ridge = clf.predict(X_test)
r2_score(y_test, y_pred_ridge)

0.7246592111731025

In [126]:
MSE = mean_squared_error (y_test, y_pred_ridge)
math.sqrt(MSE)

5.101162206018284

In [127]:
clf = linear_model.Ridge(alpha = 5.9)
clf.fit(X_train, y_train)
y_pred_ridge = clf.predict(X_test)
r2_score(y_test, y_pred_ridge)

0.72465910035938408

In [128]:
MSE = mean_squared_error (y_test, y_pred_ridge)
math.sqrt(MSE)

5.10116323252565

### Ridge Results
alpha = 5.8
<li> R^2 = 0.7246592111731025
<li> RMSE = 5.101162206018284
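Since alpha = 5.8 was picked by checking scores on the test set, those test scores are no longer an unbiased estimate. One common alternative, sketched here under the assumption of synthetic make_regression data, is RidgeCV: it picks alpha by 5-fold cross-validation on the training rows only and leaves the test set untouched for the final score. The log-spaced alpha grid is an illustrative choice.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption: not the Boston set)
X, y = make_regression(n_samples=506, n_features=13, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Candidate alphas on a log scale from 0.001 to 1000
alphas = np.logspace(-3, 3, 61)

# 5-fold cross-validation on the training rows chooses alpha
ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X_train, y_train)

test_r2 = ridge_cv.score(X_test, y_test)  # R^2 on data never used for tuning
print(ridge_cv.alpha_, test_r2)
```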

### Next Model: Lasso

In [139]:
clf = linear_model.Lasso(alpha = 0.1)
clf.fit(X_train, y_train)
y_pred_ridge = clf.predict(X_test)
r2_score(y_test, y_pred_ridge)

0.72036229387815276

In [140]:
MSE = mean_squared_error (y_test, y_pred_ridge)
math.sqrt(MSE)

5.140812008758471

In [147]:
clf = linear_model.Lasso(alpha = .15)
clf.fit(X_train, y_train)
y_pred_ridge = clf.predict(X_test)
r2_score(y_test, y_pred_ridge)

0.71370236912778529

In [148]:
MSE = mean_squared_error (y_test, y_pred_ridge)
math.sqrt(MSE)

5.201669254575839

In [164]:
clf = linear_model.Lasso(alpha = 0.01)
clf.fit(X_train, y_train)
y_pred_ridge = clf.predict(X_test)
r2_score(y_test, y_pred_ridge)

0.72454428044986097

In [150]:
MSE = mean_squared_error (y_test, y_pred_ridge)
math.sqrt(MSE)

5.107378903050145

<li> this RMSE is stale: In [150] ran before In [164], so it does not correspond to alpha = 0.01

In [170]:
clf = linear_model.Lasso(alpha = 0.0099)
clf.fit(X_train, y_train)
y_pred_ridge = clf.predict(X_test)
r2_score(y_test, y_pred_ridge)

0.72454600933042124

In [168]:
clf = linear_model.Lasso(alpha = 0.011)
clf.fit(X_train, y_train)
y_pred_ridge = clf.predict(X_test)
r2_score(y_test, y_pred_ridge)

0.72455132384192322

In [171]:
clf = linear_model.Lasso(alpha = 0.02)
clf.fit(X_train, y_train)
y_pred_ridge = clf.predict(X_test)
r2_score(y_test, y_pred_ridge)

0.72458605740203064

In [172]:
clf = linear_model.Lasso(alpha = 0.09)
clf.fit(X_train, y_train)
y_pred_ridge = clf.predict(X_test)
r2_score(y_test, y_pred_ridge)

0.72136333342372361

In [173]:
clf = linear_model.Lasso(alpha = 0.05)
clf.fit(X_train, y_train)
y_pred_ridge = clf.predict(X_test)
r2_score(y_test, y_pred_ridge)

0.7239876962472338

In [174]:
clf = linear_model.Lasso(alpha = 0.03)
clf.fit(X_train, y_train)
y_pred_ridge = clf.predict(X_test)
r2_score(y_test, y_pred_ridge)

0.72451565700923037

In [175]:
clf = linear_model.Lasso(alpha = 0.025)
clf.fit(X_train, y_train)
y_pred_ridge = clf.predict(X_test)
r2_score(y_test, y_pred_ridge)

0.72456724828128061

In [176]:
clf = linear_model.Lasso(alpha = 0.019)
clf.fit(X_train, y_train)
y_pred_ridge = clf.predict(X_test)
r2_score(y_test, y_pred_ridge)

0.72458594488453554

In [177]:
clf = linear_model.Lasso(alpha = 0.021)
clf.fit(X_train, y_train)
y_pred_ridge = clf.predict(X_test)
r2_score(y_test, y_pred_ridge)

0.72458486895735885

In [178]:
clf = linear_model.Lasso(alpha = 0.02)
clf.fit(X_train, y_train)
y_pred_ridge = clf.predict(X_test)
r2_score(y_test, y_pred_ridge)

0.72458605740203064

In [179]:
MSE = mean_squared_error (y_test, y_pred_ridge)
math.sqrt(MSE)

5.1018398107981335

### Lasso Results


<li> alpha = 0.02
<li> R^2 = 0.72458605740203064
<li> RMSE = 5.1018398107981335
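A property of the L1 penalty that these scores don't show is that Lasso can set some coefficients exactly to zero, dropping those features from the model. The sketch below, on synthetic make_regression data where only 5 of 13 features are informative (an assumption for illustration), lets LassoCV choose alpha by 5-fold cross-validation and then counts the zeroed coefficients; how many are zeroed depends on the chosen alpha.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# 13 features, only 5 of which actually drive the target (synthetic)
X, y = make_regression(n_samples=506, n_features=13, n_informative=5,
                       noise=10.0, random_state=0)

# Cross-validation picks alpha along the regularization path
lasso_cv = LassoCV(cv=5, random_state=0).fit(X, y)

# Features the L1 penalty removed entirely
n_zero = int(np.sum(lasso_cv.coef_ == 0))
print(lasso_cv.alpha_, n_zero)
```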

### Linear Regression Model Results
<li> R^2 = 0.72440600953767298
<li> RMSE = 5.103507165174405

### Ridge Results
alpha = 5.8
<li> R^2 = 0.7246592111731025
<li> RMSE = 5.101162206018284

### Final Results
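<li> Ridge (alpha = 5.8) had the best R^2 and RMSE of the 3 linear models evaluated
<li> the margins are small (R^2 improves by about 0.00025 over plain linear regression), and alpha was tuned against the same test set used to report these scores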
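With differences this small, one train/test split can't reliably rank the models. A sketch of a fairer comparison scores all three with 5-fold cross-validation on identical shuffled folds and looks at the mean and spread of R^2. It assumes synthetic make_regression data as a stand-in for the Boston set and borrows the notebook's alpha values only as labels; on real data they would need to be re-tuned.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (assumption: not the Boston set)
X, y = make_regression(n_samples=506, n_features=13, noise=20.0, random_state=0)

# Same 5 shuffled folds for every model, so the comparison is like-for-like
folds = KFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "linear": LinearRegression(),
    "ridge (alpha=5.8)": Ridge(alpha=5.8),
    "lasso (alpha=0.02)": Lasso(alpha=0.02),
}

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=folds, scoring="r2")
    results[name] = scores
    print(f"{name}: mean R^2 {scores.mean():.4f} (std {scores.std():.4f})")
```

If the gap between models is much smaller than the standard deviation across folds, the ranking is not meaningful.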