## Boston Housing Assignment

In this assignment you'll use linear regression to estimate house prices in Boston, using a well-known dataset.

Goals:
+  Measure the performance of the model I created using $R^{2}$ and MSE
> Learn how to use sklearn.metrics.r2_score and sklearn.metrics.mean_squared_error
+  Implement a new model using regularization
> Use sklearn.linear_model.Ridge (L2 penalty) or sklearn.linear_model.Lasso (L1 penalty)
+  Get the best model you can by optimizing the regularization parameter.

In [1]:
from sklearn import datasets
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge

In [2]:
bean = datasets.load_boston()
print(bean.DESCR)

Boston House Prices dataset

Notes
------
Data Set Characteristics:  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive
    
    :Median Value (attribute 14) is usually the target

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
      

In [3]:
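def load_boston():
    """Return standardized features and targets as a train/test split."""
    scaler = StandardScaler()
    boston = datasets.load_boston()
    X = boston.data
    y = boston.target
    # Note: the scaler is fit on all rows, including those that end up in the test split.
    X = scaler.fit_transform(X)
    # Default split: 75% train, 25% test, shuffled with no fixed seed.
    return train_test_split(X, y)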
    

In [4]:
X_train, X_test, y_train, y_test = load_boston()

In [5]:
X_train.shape

(379, 13)
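
The loader above calls fit_transform on every row before splitting, so the test rows contribute to the scaler's mean and variance. One common alternative is to split first and let a pipeline fit the scaler on the training rows only. The following is a minimal sketch, assuming a recent scikit-learn where train_test_split lives in sklearn.model_selection. It uses synthetic data from make_regression as a stand-in for the housing data, and the names X_demo, y_demo, and pipe are illustrative, not from the notebook.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing data (assumption: 506 rows, 13 features).
X_demo, y_demo = make_regression(
    n_samples=506, n_features=13, noise=10.0, random_state=0
)

# Split first so the test rows never influence the scaler's statistics.
# random_state makes the split reproducible between runs.
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

# The pipeline fits StandardScaler on the training rows only,
# then fits the regressor on the scaled training rows.
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_tr, y_tr)

print(X_tr.shape, X_te.shape)
print(pipe.score(X_te, y_te))  # R^2 on the held-out rows
```

With the default 75/25 split, 506 rows give the same 379 training rows reported above.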

### Fitting a Linear Regression

Fitting takes two steps: instantiate a new regression object (line 1), then give it your training data
(line 2) by calling .fit(independent variables, dependent variable).



In [6]:

clf = LinearRegression()
clf.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
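
To see what .fit actually learns, it can help to try it on data where the answer is known. The sketch below uses a hypothetical toy dataset (not the housing data) that follows y = 2x + 1 exactly, then reads the fitted slope and intercept back from the coef_ and intercept_ attributes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data (hypothetical, not from the Boston dataset): y = 2*x + 1 exactly.
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])  # shape (n_samples, n_features)
y_toy = 2.0 * X_toy.ravel() + 1.0

toy_model = LinearRegression()
toy_model.fit(X_toy, y_toy)

# One coefficient per feature, plus a single intercept.
slope = toy_model.coef_[0]
intercept = toy_model.intercept_
print(slope, intercept)
```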

### Making a Prediction
X_test is our holdout set of data.  We know the answer (y_test) but the computer does not.   

Using the command below, I create a tuple for each observation, combining the real value (y_test) with
the value our regressor predicts (clf.predict(X_test)).

Use a similar format to get your r2 and mse metrics working. Use the [scikit learn api](http://scikit-learn.org/stable/modules/model_evaluation.html) if you need help!

In [7]:
list(zip(y_test, clf.predict(X_test)))

[(20.199999999999999, 16.522932142507198),
 (12.699999999999999, 12.042277788410219),
 (16.399999999999999, 18.704755271824169),
 (19.0, 13.637312916396116),
 (48.299999999999997, 37.099137902060583),
 (31.5, 32.553406811290046),
 (24.800000000000001, 25.164945709723636),
 (19.600000000000001, 17.63545323254947),
 (18.0, 18.737668230428213),
 (42.799999999999997, 30.120820969817739),
 (22.5, 22.365825225048926),
 (21.0, 20.862609135064567),
 (16.5, 10.89207848905914),
 (16.699999999999999, 19.248308781648905),
 (41.299999999999997, 32.496164289249577),
 (18.199999999999999, 18.544821290318094),
 (37.0, 30.736386235134184),
 (21.399999999999999, 21.850577965884924),
 (12.6, 17.908998534496689),
 (22.800000000000001, 26.447433436205621),
 (20.600000000000001, 19.356665131413742),
 (11.5, 13.934197505560821),
 (17.300000000000001, 15.745763233235451),
 (17.899999999999999, 0.9199109280868889),
 (16.300000000000001, 11.086582805502509),
 (31.199999999999999, 28.651976487989039),
 (20.10000

In [8]:
r2_score(y_test, clf.predict(X_test))

0.70960013939480404

In [9]:
mean_squared_error(y_test, clf.predict(X_test))

21.621138354977305
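
For intuition about what these two numbers measure, here is a plain-Python sketch of the standard definitions: MSE is the mean of the squared residuals, and $R^{2}$ is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the true values. The helper names mse and r2 are illustrative, and the sample values are the first five (actual, predicted) pairs from the zip output above, with predictions rounded to two decimals.

```python
def mse(y_true, y_pred):
    """Mean of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """1 - SS_res / SS_tot, where SS_tot is measured around the mean of y_true."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# First five (actual, predicted) pairs from the output above, predictions rounded.
actual = [20.2, 12.7, 16.4, 19.0, 48.3]
predicted = [16.52, 12.04, 18.70, 13.64, 37.10]

print(mse(actual, predicted))
print(r2(actual, predicted))
```

A perfect prediction gives an MSE of 0 and an $R^{2}$ of 1, while always predicting the mean of the true values gives an $R^{2}$ of 0.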

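### Lasso and Ridge
With such small regularization strengths (alpha=1e-3 for Lasso, alpha=1e-8 for Ridge), both models score almost exactly the same as plain linear regression. Lasso performs the best of the three, but only by a very small amount: $R^{2}$ of 0.70962 versus 0.70960, and MSE of 21.6200 versus 21.6211.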
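
For reference, these are the objectives the two estimators minimize, as given in the scikit-learn documentation, where $n$ is the number of training samples, $w$ the coefficient vector, and $\alpha$ the alpha parameter (the intercept is not penalized):

$$
\text{Ridge:}\quad \min_{w}\; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2
$$

$$
\text{Lasso:}\quad \min_{w}\; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
$$

As $\alpha \to 0$ both penalties vanish and each objective reduces to ordinary least squares, which is consistent with how close the scores for very small alpha are to the plain linear regression scores.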

In [61]:
lclf = Lasso(alpha=1e-3)
lclf = lclf.fit(X_train,y_train)

In [62]:
r2_score(y_test,lclf.predict(X_test))

0.70961545589225195

In [73]:
rclf = Ridge(alpha=1e-8)
rclf = rclf.fit(X_train,y_train)

In [74]:
r2_score(y_test,rclf.predict(X_test))

0.70960013939259503

In [75]:
mean_squared_error(y_test,lclf.predict(X_test))

21.619997995922915

In [76]:
mean_squared_error(y_test,rclf.predict(X_test))

21.62113835514177
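
The alpha values above were set by hand. One way to approach the last goal, optimizing the regularization parameter, is a cross-validated grid search over a log-spaced range of alphas. The sketch below is only one possible approach, under several assumptions: it needs a recent scikit-learn (sklearn.model_selection), it runs on synthetic data from make_regression rather than the housing data, and the alpha grid, the 5-fold setting, and the names X_syn, y_syn, and results are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the housing data (assumption: 506 rows, 13 features).
X_syn, y_syn = make_regression(
    n_samples=506, n_features=13, noise=25.0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, random_state=0)

# Candidate regularization strengths, log-spaced from 1e-4 to 1e2.
alphas = np.logspace(-4, 2, 13)

results = {}
for name, model in [("ridge", Ridge()), ("lasso", Lasso(max_iter=10000))]:
    # 5-fold cross-validation on the training rows only; the test rows are
    # held back for the final comparison.
    search = GridSearchCV(
        model, {"alpha": alphas}, cv=5, scoring="neg_mean_squared_error"
    )
    search.fit(X_tr, y_tr)
    best_alpha = search.best_params_["alpha"]
    # GridSearchCV refits on all training rows with the best alpha by default.
    test_mse = mean_squared_error(y_te, search.predict(X_te))
    results[name] = (best_alpha, test_mse)
    print(name, best_alpha, test_mse)
```

Choosing alpha by cross-validation on the training rows keeps the test set out of the tuning loop, so the final test MSE remains an honest estimate.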

In [16]:
X_train

array([[-0.34584335, -0.48772236, -0.72032214, ..., -0.48803915,
         0.36967396, -0.38170208],
       [-0.41387091, -0.48772236, -1.12740922, ..., -0.30309415,
         0.40432133, -0.62420272],
       [-0.40680547,  0.97154295, -0.73637217, ..., -1.08911039,
         0.37011253, -1.09238316],
       ..., 
       [-0.41510165,  0.71402554,  0.56951647, ..., -0.11814915,
         0.43480225, -0.90314855],
       [ 0.36374664, -0.48772236,  1.01599907, ...,  0.80657583,
        -3.9071933 ,  0.67100304],
       [ 0.399245  , -0.48772236,  1.01599907, ...,  0.80657583,
        -0.40232651,  0.42710065]])

In [17]:
y_test

array([ 20.2,  12.7,  16.4,  19. ,  48.3,  31.5,  24.8,  19.6,  18. ,
        42.8,  22.5,  21. ,  16.5,  16.7,  41.3,  18.2,  37. ,  21.4,
        12.6,  22.8,  20.6,  11.5,  17.3,  17.9,  16.3,  31.2,  20.1,
        31. ,  24.1,  27.9,  30.1,  35.1,  50. ,  23.3,  20.8,  16.5,
        17.8,  27.1,  21.6,  19.8,  22.5,  30.8,  24.4,  19.4,  32.7,
        17.8,  14.4,  13.8,  16.6,   8.7,  50. ,  20.1,  18.3,  18.5,
        19.9,  26.6,  13.3,  37.3,  15.7,  19.5,  19.9,  22. ,  21. ,
        34.9,  28.6,  22.2,  15.2,  18.4,  21.2,  12.7,  32. ,  17. ,
        28.7,  20.1,  24.7,  16.6,  30.1,  48.5,  22.6,  24.1,  19.8,
        21.2,  37.6,  17.4,  25.2,  11. ,  20.9,  21.1,  22.7,  13. ,
        27.1,  23.2,  32.2,  27.9,  20.2,  26.2,  20.3,  42.3,  23.1,
        23.2,  15.2,  19.1,  13.5,  50. ,  21.5,  17.1,  31.6,  23.8,
        29. ,  20.3,  19.4,  23.4,  12.3,  26.4,  23.3,  22.9,  14.3,
        14.9,  17.5,  22.9,  35.2,  18.8,  23.7,  11.8,  10.5,  17.2,  18.5])