## Boston Housing Assignment

In this assignment you'll use linear regression to estimate the price of houses in Boston, using a well-known dataset.

Goals:
+  Measure the performance of the model created below using $R^{2}$ and MSE
> Learn how to use sklearn.metrics.r2_score and sklearn.metrics.mean_squared_error
+  Implement a new model using regularization
> Use sklearn.linear_model.Ridge (L2 penalty) or sklearn.linear_model.Lasso (L1 penalty)
+  Get the best model you can by optimizing the regularization parameter.
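
For reference, the two metrics named in the goals are conventionally defined as below, where $y_i$ is the true value, $\hat{y}_i$ the model's prediction, $\bar{y}$ the mean of the true values, and $n$ the number of observations. The notation is chosen here for illustration and is not taken from the assignment itself.

$$
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2},
\qquad
R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}
$$

A lower MSE is better, while $R^{2}$ is at most 1 and equals 1 only for perfect predictions.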

In [6]:
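import math  # needed later for math.sqrt when computing RMSE
from sklearn import datasets
import pandas as pd
from sklearn.preprocessing import StandardScaler
# train_test_split lives in sklearn.model_selection since scikit-learn 0.18;
# the old sklearn.cross_validation module was removed in 0.20
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression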

In [7]:
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release
bean = datasets.load_boston()
print(bean.DESCR)

Boston House Prices dataset

Notes
------
Data Set Characteristics:  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive
    
    :Median Value (attribute 14) is usually the target

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
      

In [8]:
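def load_boston():
    # Load the data, standardize the features, and return a random
    # train/test split: X_train, X_test, y_train, y_test
    scaler = StandardScaler()
    boston = datasets.load_boston()
    X = boston.data
    y = boston.target
    # Note: the scaler is fit on all rows before splitting, so the test
    # rows influence the scaling statistics
    X = scaler.fit_transform(X)
    # Default split is 75% train / 25% test; no random_state is set,
    # so the split differs from run to run
    return train_test_split(X, y)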
    

In [9]:
X_train, X_test, y_train, y_test = load_boston()

In [10]:
X_train.shape

(379L, 13L)
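
The shape above follows from the default 25% test fraction of `train_test_split`: 506 rows split into 379 for training and 127 for testing. The sketch below reproduces that arithmetic on a random placeholder array (not the real housing data) and, as one possible variation on `load_boston` above, fits the scaler on the training rows only. The array contents, the variable names, and `random_state=0` are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Placeholder data with the same shape as the Boston feature matrix
rng = np.random.RandomState(0)
X_demo = rng.normal(loc=5.0, scale=2.0, size=(506, 13))
y_demo = rng.normal(size=506)

# Default test_size is 0.25; random_state makes the split repeatable
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)
print(X_tr.shape)  # (379, 13)
print(X_te.shape)  # (127, 13)

# Fit the scaler on the training rows only, then apply it to both parts,
# so the test rows do not influence the scaling statistics
scaler = StandardScaler().fit(X_tr)
X_tr_scaled = scaler.transform(X_tr)
X_te_scaled = scaler.transform(X_te)
```

With this ordering the training features have mean 0 and unit variance exactly, while the test features are only approximately standardized.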

### Fitting a Linear Regression

It's as easy as instantiating a new regression object (line 1) and passing it your training data
(line 2) by calling .fit(independent variables, dependent variable).



In [11]:

clf = LinearRegression()
clf.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
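
To see what `.fit` actually learns, it can help to try it on data where the answer is known. The sketch below uses made-up, noise-free data generated from `y = 3*x0 - 2*x1 + 5` (an assumption for illustration, unrelated to the housing data) and reads the learned weights back from the `coef_` and `intercept_` attributes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with a known linear relationship and no noise
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(100, 2))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + 5.0

model = LinearRegression()
model.fit(X_demo, y_demo)

# The fitted weights should match the generating equation
print(model.coef_)
print(model.intercept_)
```

On the housing data the same two attributes of `clf` hold one weight per standardized feature plus the intercept.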

### Making a Prediction
X_test is our holdout set of data.  We know the answer (y_test) but the computer does not.   

Using the command below, I create a tuple for each observation, where I'm combining the real value (y_test) with
the value our regressor predicts (clf.predict(X_test))

Use a similar format to get your r2 and mse metrics working.  Use the [scikit learn api](http://scikit-learn.org/stable/modules/model_evaluation.html) if you need help!
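
As a minimal sketch of the call pattern, both metric functions take the true values first and the predictions second. The four-element arrays below are toy numbers chosen for illustration, not values from the housing data.

```python
import math
from sklearn.metrics import mean_squared_error, r2_score

# Toy values: true targets first, predictions second
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mse = mean_squared_error(y_true, y_pred)
rmse = math.sqrt(mse)  # RMSE is in the same units as the target
r2 = r2_score(y_true, y_pred)

print("MSE:  {:.3f}".format(mse))   # MSE:  0.375
print("RMSE: {:.3f}".format(rmse))  # RMSE: 0.612
print("r2:   {:.3f}".format(r2))    # r2:   0.949
```

For the assignment, the same calls would take `y_test` and `clf.predict(X_test)` as the two arguments.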

In [12]:
list(zip(y_test, clf.predict(X_test)))

[(21.600000000000001, 24.814879586981554),
 (13.800000000000001, -0.53973494680865386),
 (15.6, 15.420240819239948),
 (14.9, 17.188585645254022),
 (27.899999999999999, 18.924090904916991),
 (22.0, 21.532993392242556),
 (18.399999999999999, 17.967053672630257),
 (50.0, 25.624951005469399),
 (19.800000000000001, 22.197646988120635),
 (23.0, 29.726335140187423),
 (5.0, 8.5981585338633248),
 (37.899999999999999, 33.242284018880824),
 (20.699999999999999, 22.171967589344586),
 (33.299999999999997, 35.803113722423191),
 (11.9, 22.258406606454557),
 (31.5, 31.380695495334024),
 (20.399999999999999, 22.963894666012752),
 (27.5, 24.510844225016584),
 (22.0, 22.363580085285875),
 (20.100000000000001, 20.950485026720198),
 (39.799999999999997, 34.543918684510061),
 (20.300000000000001, 23.136154367739696),
 (21.199999999999999, 21.006134711863734),
 (21.899999999999999, 38.56235576705744),
 (21.899999999999999, 14.297233301423113),
 (22.699999999999999, 24.055009898118151),
 (24.100000000000001, 

# Boston_Assignment - Solution

In [17]:
from sklearn import linear_model
from sklearn.linear_model import Lasso


In [64]:
lassoClearfig = Lasso(alpha=1.5)
lassoClearfig.fit(X_train, y_train)

Lasso(alpha=1.5, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)
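
Lasso adds an L1 penalty to the least-squares loss: scikit-learn minimizes $\frac{1}{2n}\lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_1$, so a larger `alpha` pushes more coefficients to exactly zero. The sketch below shows that effect on synthetic data in which only two of five features matter. The data, the seed, and the extra `alpha` values besides 1.5 are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: only the first two of five features influence y
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(200, 5))
y_demo = (3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1]
          + rng.normal(scale=0.5, size=200))

# Count the surviving (nonzero) coefficients for each penalty strength
nonzero_counts = {}
for alpha in [0.01, 1.5, 100.0]:
    model = Lasso(alpha=alpha).fit(X_demo, y_demo)
    nonzero_counts[alpha] = int(np.sum(model.coef_ != 0))
    print("alpha={}: {} nonzero coefficients".format(alpha, nonzero_counts[alpha]))
```

Inspecting `lassoClearfig.coef_` in the same way would show which housing features the `alpha=1.5` model keeps.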

### Making a prediction

In [144]:
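# Note: clf is the LinearRegression model, so this repeats the baseline; use lassoClearfig for the Lasso
list(zip(y_test, clf.predict(X_test)))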

[(21.600000000000001, 24.814879586981554),
 (13.800000000000001, -0.53973494680865386),
 (15.6, 15.420240819239948),
 (14.9, 17.188585645254022),
 (27.899999999999999, 18.924090904916991),
 (22.0, 21.532993392242556),
 (18.399999999999999, 17.967053672630257),
 (50.0, 25.624951005469399),
 (19.800000000000001, 22.197646988120635),
 (23.0, 29.726335140187423),
 (5.0, 8.5981585338633248),
 (37.899999999999999, 33.242284018880824),
 (20.699999999999999, 22.171967589344586),
 (33.299999999999997, 35.803113722423191),
 (11.9, 22.258406606454557),
 (31.5, 31.380695495334024),
 (20.399999999999999, 22.963894666012752),
 (27.5, 24.510844225016584),
 (22.0, 22.363580085285875),
 (20.100000000000001, 20.950485026720198),
 (39.799999999999997, 34.543918684510061),
 (20.300000000000001, 23.136154367739696),
 (21.199999999999999, 21.006134711863734),
 (21.899999999999999, 38.56235576705744),
 (21.899999999999999, 14.297233301423113),
 (22.699999999999999, 24.055009898118151),
 (24.100000000000001, 

### Calculating and Printing the r2_score, mean squared error and root mean squared error

In [146]:
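# Note: clf is the LinearRegression model, so this is the baseline score; use lassoClearfig to score the Lasso
print('Lasso r2:  {}'.format(r2_score(y_test, clf.predict(X_test))))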

Lasso r2:  0.698537515434


In [147]:
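# Note: clf is the LinearRegression model, so these are baseline errors; use lassoClearfig to score the Lasso
baseline_mse = mean_squared_error(y_test, clf.predict(X_test))
print('Lasso MSE:  {}'.format(baseline_mse))
print('Lasso RMSE:  {}'.format(math.sqrt(baseline_mse)))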

Lasso MSE:  29.6904201341
Lasso RMSE:  5.44889164272


### Calculating and Printing the r2_score, mean squared error and root mean squared error after adjusting the parameters

In [148]:
lassoClearfig = Lasso(alpha=0.5)
lassoClearfig.fit(X_train, y_train)

Lasso(alpha=0.5, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [149]:
lasso_pred = lassoClearfig.predict(X_test)
lasso_mse = mean_squared_error(y_test, lasso_pred)
print('Lasso r2:  {}'.format(r2_score(y_test, lasso_pred)))
print('Lasso MSE:  {}'.format(lasso_mse))
print('Lasso RMSE:  {}'.format(math.sqrt(lasso_mse)))

Lasso r2:  0.648945055716
Lasso MSE:  34.5746795025
Lasso RMSE:  5.88002376718


### L1 Regularization with LassoCV

In [150]:
from sklearn import linear_model
from sklearn.linear_model import LassoCV

In [156]:
clf_LassoCV = LassoCV(alphas=[.2])
clf_LassoCV.fit(X_train, y_train)

LassoCV(alphas=[0.2], copy_X=True, cv=None, eps=0.001, fit_intercept=True,
    max_iter=1000, n_alphas=100, n_jobs=1, normalize=False, positive=False,
    precompute='auto', random_state=None, selection='cyclic', tol=0.0001,
    verbose=False)
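
With a single candidate in `alphas`, `LassoCV` has nothing to choose between, so it behaves like a plain `Lasso` with that value. Passing several candidates lets cross-validation pick one, and the winner is exposed as the `alpha_` attribute. The sketch below shows this on synthetic data. The data, the candidate grid, and `cv=5` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Synthetic data: only the first two of five features influence y
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(200, 5))
y_demo = (3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1]
          + rng.normal(scale=0.5, size=200))

# Candidate penalty strengths; 5-fold cross-validation picks among them
alphas = [0.01, 0.1, 0.2, 0.5, 0.8, 1.5]
model = LassoCV(alphas=alphas, cv=5).fit(X_demo, y_demo)

# alpha_ holds the candidate with the lowest mean cross-validated error
print("selected alpha: {}".format(model.alpha_))
```

The selection uses only the training folds, so the test set stays untouched until the final evaluation.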

In [157]:
lasso_cv_pred = clf_LassoCV.predict(X_test)
lasso_cv_mse = mean_squared_error(y_test, lasso_cv_pred)
print('LassoCV r2:  {}'.format(r2_score(y_test, lasso_cv_pred)))
print('LassoCV MSE:  {}'.format(lasso_cv_mse))
print('LassoCV RMSE:  {}'.format(math.sqrt(lasso_cv_mse)))

LassoCV r2:  0.677370761959
LassoCV MSE:  31.7750901534
LassoCV RMSE:  5.63693978622


### Making a Prediction

In [158]:
list(zip(y_test, clf_LassoCV.predict(X_test)))

[(21.600000000000001, 25.384656191823726),
 (13.800000000000001, -1.4228242177755135),
 (15.6, 16.776417154138699),
 (14.9, 17.303525704254529),
 (27.899999999999999, 18.058020159811463),
 (22.0, 22.242048624076752),
 (18.399999999999999, 17.900041359243897),
 (50.0, 23.976532653558557),
 (19.800000000000001, 22.901659326894713),
 (23.0, 28.65036118733115),
 (5.0, 11.155598995380675),
 (37.899999999999999, 32.34821593893701),
 (20.699999999999999, 22.839243556551782),
 (33.299999999999997, 35.724285871493606),
 (11.9, 22.622530092778227),
 (31.5, 29.712236663115753),
 (20.399999999999999, 22.990582977607108),
 (27.5, 25.26460680098544),
 (22.0, 20.186221564915858),
 (20.100000000000001, 21.479876185398968),
 (39.799999999999997, 33.732081506137618),
 (20.300000000000001, 23.043907290519062),
 (21.199999999999999, 21.229955048561958),
 (21.899999999999999, 38.159893993889412),
 (21.899999999999999, 13.461779338778612),
 (22.699999999999999, 24.150612682802766),
 (24.100000000000001, 25.

### Adjusting the Values of the alpha for LassoCV

In [160]:
clf_LassoCV = LassoCV(alphas=[.5,.8])
clf_LassoCV.fit(X_train, y_train)

LassoCV(alphas=[0.5, 0.8], copy_X=True, cv=None, eps=0.001,
    fit_intercept=True, max_iter=1000, n_alphas=100, n_jobs=1,
    normalize=False, positive=False, precompute='auto', random_state=None,
    selection='cyclic', tol=0.0001, verbose=False)

### The Final output after adjusting the parameters

In [162]:
lasso_cv_pred = clf_LassoCV.predict(X_test)
lasso_cv_mse = mean_squared_error(y_test, lasso_cv_pred)
print('LassoCV r2:  {}'.format(r2_score(y_test, lasso_cv_pred)))
print('LassoCV MSE:  {}'.format(lasso_cv_mse))
print('LassoCV RMSE:  {}'.format(math.sqrt(lasso_cv_mse)))

LassoCV r2:  0.648945055716
LassoCV MSE:  34.5746795025
LassoCV RMSE:  5.88002376718


In [164]:
clf_LassoCV1 = LassoCV(alphas=[.9])
clf_LassoCV1.fit(X_train, y_train)

LassoCV(alphas=[0.9], copy_X=True, cv=None, eps=0.001, fit_intercept=True,
    max_iter=1000, n_alphas=100, n_jobs=1, normalize=False, positive=False,
    precompute='auto', random_state=None, selection='cyclic', tol=0.0001,
    verbose=False)

In [165]:
lasso_cv1_pred = clf_LassoCV1.predict(X_test)
lasso_cv1_mse = mean_squared_error(y_test, lasso_cv1_pred)
print('LassoCV1 r2:  {}'.format(r2_score(y_test, lasso_cv1_pred)))
print('LassoCV1 MSE:  {}'.format(lasso_cv1_mse))
print('LassoCV1 RMSE:  {}'.format(math.sqrt(lasso_cv1_mse)))

LassoCV1 r2:  0.635376598685
LassoCV1 MSE:  35.9110089314
LassoCV1 RMSE:  5.99257948895
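
The goals also list `Ridge` as the L2-penalized option, which the cells above do not try. A minimal sketch of that route, assuming `RidgeCV` is used to pick the penalty and run here on synthetic placeholder data rather than the housing split, might look like the following. The data, the candidate grid, and the variable names are all assumptions for illustration.

```python
import math
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic placeholder data: only the first two of five features matter
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(200, 5))
y_demo = (3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1]
          + rng.normal(scale=0.5, size=200))
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

# RidgeCV tries each candidate penalty and keeps the best one in alpha_
ridge_alphas = [0.1, 1.0, 10.0, 100.0]
ridge = RidgeCV(alphas=ridge_alphas).fit(X_tr, y_tr)

# Score the chosen model on the held-out rows
ridge_pred = ridge.predict(X_te)
ridge_r2 = r2_score(y_te, ridge_pred)
ridge_rmse = math.sqrt(mean_squared_error(y_te, ridge_pred))
print("selected alpha: {}".format(ridge.alpha_))
print("Ridge r2:   {:.3f}".format(ridge_r2))
print("Ridge RMSE: {:.3f}".format(ridge_rmse))
```

Unlike Lasso, the L2 penalty shrinks coefficients toward zero without setting them exactly to zero, so Ridge keeps every feature in the model.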
