## Boston Housing Assignment

Andrew Peabody apeab2@uis.edu

Uses linear regression to estimate house prices in Boston using a well-known dataset.

Goals:
+  Measure the performance of the model provided by the instructor using $R^{2}$ and MSE
> Learn how to use sklearn.metrics.r2_score and sklearn.metrics.mean_squared_error
+  Implement a new model using regularization
> Use sklearn.linear_model.Ridge (L2 penalty) or sklearn.linear_model.Lasso (L1 penalty)
+  Get the best model by optimizing the regularization parameter.

In [137]:
from sklearn import datasets
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cross_validation import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge
from sklearn.linear_model import RidgeCV
from math import sqrt

In [138]:
bean = datasets.load_boston()
print bean.DESCR

Boston House Prices dataset

Notes
------
Data Set Characteristics:  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive
    
    :Median Value (attribute 14) is usually the target

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
      

In [139]:
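def load_boston():
    """Load the Boston data, standardize the features, and split into train/test sets."""
    scaler = StandardScaler()
    boston = datasets.load_boston()
    X = boston.data
    y = boston.target
    # Standardize every feature to zero mean and unit variance.
    # Note: the scaler is fit on the full dataset, before the split.
    X = scaler.fit_transform(X)
    # Default split is 75% train / 25% test; no random_state is set, so results vary per run.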
    return train_test_split(X,y)
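
As a rough illustration of what `StandardScaler` does to each feature column, the sketch below standardizes a single column in plain Python. The helper names `fit_standardizer` and `standardize` are hypothetical (they are not sklearn functions), the values are toy data, and Python 3 is assumed. Here the statistics are fit on the training values only and then reused for the test values, which is one common way to keep test information out of the scaler; the notebook cell above instead fits the scaler on the full dataset.

```python
from math import sqrt

def fit_standardizer(column):
    """Return (mean, std) of a column, using the population standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = sqrt(sum((v - mean) ** 2 for v in column) / n)
    return mean, std

def standardize(column, mean, std):
    """Shift and scale values using previously fitted statistics."""
    return [(v - mean) / std for v in column]

# Toy values for illustration only (not taken from the Boston data).
train_column = [2.0, 4.0, 6.0, 8.0]
test_column = [5.0, 10.0]

mean, std = fit_standardizer(train_column)
train_scaled = standardize(train_column, mean, std)
test_scaled = standardize(test_column, mean, std)  # reuses the training statistics
print(train_scaled)
print(test_scaled)
```

After scaling, the training column has mean 0 and variance 1, while the test column generally does not, because it is scaled with the training statistics.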

In [140]:
X_train, X_test, y_train, y_test = load_boston()

In [141]:
X_train.shape

(379L, 13L)

### Linear Regression

In [142]:
clf = LinearRegression()
clf.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

### Prediction Tuples

Each tuple pairs an actual target value from the test set with the model's prediction for it.

In [143]:
zip(y_test, clf.predict(X_test))

[(16.100000000000001, 18.697774538887369),
 (19.199999999999999, 23.659638779611726),
 (21.0, 21.325025453819077),
 (16.199999999999999, 20.222901690909044),
 (36.399999999999999, 32.919383421581045),
 (48.799999999999997, 41.956063448230722),
 (45.399999999999999, 39.054576074854673),
 (25.0, 29.104332840480119),
 (24.699999999999999, 23.829276721901028),
 (22.800000000000001, 27.037032653357823),
 (19.899999999999999, 19.273968884963544),
 (17.800000000000001, 23.11798206020746),
 (43.5, 39.82841119181046),
 (14.800000000000001, 14.953013674708217),
 (17.800000000000001, 19.076014436132581),
 (27.100000000000001, 19.218267589802156),
 (23.800000000000001, 25.991316127006531),
 (22.600000000000001, 27.682532177134444),
 (31.600000000000001, 32.597555607738173),
 (32.0, 33.810839164403234),
 (19.100000000000001, 16.796820144459716),
 (22.0, 27.787281794793827),
 (7.0, 8.8783639839780335),
 (19.800000000000001, 23.212794503288258),
 (13.4, 14.790038073701229),
 (14.300000000000001, 16.5

### $R^{2}$ Score

From the sklearn docs:

"Best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0."

In [144]:
r2_score(y_test, clf.predict(X_test))

0.68922685467600742
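
To make the quoted description concrete, here is a minimal sketch of the usual definition, $R^{2} = 1 - SS_{res}/SS_{tot}$, in plain Python. The function name `r2` and the toy targets are assumptions for illustration; this is not sklearn's implementation, and Python 3 is assumed.

```python
def r2(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy targets for illustration only.
y_true = [10.0, 20.0, 30.0, 40.0]
mean_y = sum(y_true) / len(y_true)

perfect = r2(y_true, y_true)                   # every prediction is exact
constant = r2(y_true, [mean_y] * len(y_true))  # always predict the mean
worse = r2(y_true, [40.0, 30.0, 20.0, 10.0])   # predictions in reverse order
print(perfect, constant, worse)  # 1.0 0.0 -3.0
```

The three cases match the quote: a perfect model scores 1.0, a constant model that predicts the mean scores 0.0, and a model worse than the constant one scores below zero.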

### Mean Squared Error (MSE) and Root Mean Squared Error (RMSE)

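MSE is the mean of the squared differences between the actual and predicted values; RMSE is its square root, which puts the error back in the units of the target. Lower values indicate a better fit.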

In [145]:
mse = mean_squared_error(y_test, clf.predict(X_test))
print "MSE: ", mse
print "RMSE: ", sqrt(mse)

MSE:  24.4802201225
RMSE:  4.9477489955
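
As a small worked example, the sketch below computes MSE and RMSE by hand for the first five (actual, predicted) pairs listed in the prediction output earlier, with the predictions rounded to three decimals. The helper name `mse` is an assumption for illustration, and Python 3 is assumed. Because it uses only five pairs, its result will differ from the full test-set figures above.

```python
from math import sqrt

def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# First five (actual, predicted) pairs from the earlier output, predictions rounded.
pairs = [(16.1, 18.698), (19.2, 23.660), (21.0, 21.325), (16.2, 20.223), (36.4, 32.919)]
y_true = [actual for actual, _ in pairs]
y_pred = [predicted for _, predicted in pairs]

subset_mse = mse(y_true, y_pred)
subset_rmse = sqrt(subset_mse)
print("MSE on five pairs:", subset_mse)
print("RMSE on five pairs:", subset_rmse)

# Consistency check on the full test-set figures reported above.
reported_ok = abs(sqrt(24.4802201225) - 4.9477489955) < 1e-6
print("Reported RMSE equals sqrt of reported MSE:", reported_ok)
```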


### Lasso Linear Model

A linear model trained with an L1 penalty as the regularizer (the Lasso).

In [146]:
clf = Lasso(alpha=0.1)
clf.fit(X_train, y_train)

Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [147]:
print "R2: ", r2_score(y_test, clf.predict(X_test))
mse = mean_squared_error(y_test, clf.predict(X_test))
print "MSE: ", mse
print "RMSE: ", sqrt(mse)

R2:  0.679212730476
MSE:  25.2690526469
RMSE:  5.0268332623
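
To show how an L1 penalty behaves, here is a minimal one-feature sketch in plain Python. It assumes the objective $\frac{1}{2n}\sum_i (y_i - w x_i)^2 + \alpha |w|$ on centered data, which follows the scaling described in sklearn's Lasso documentation; the function names and toy data are hypothetical, and the real estimator uses coordinate descent over all features rather than this closed form.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 if it would cross zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_slope_1d(xs, ys, alpha):
    """Slope minimizing (1/(2n)) * sum((y - w*x)^2) + alpha * |w| on centered data."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    rho = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / n
    z = sum((x - x_mean) ** 2 for x in xs) / n
    return soft_threshold(rho, alpha) / z

# Toy data for illustration only (y = 2x + 1).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
for alpha in (0.0, 0.1, 1.0, 10.0):
    print(alpha, lasso_slope_1d(xs, ys, alpha))
```

With `alpha=0.0` the slope is the ordinary least-squares value of 2.0; as alpha grows the slope shrinks, and at `alpha=10.0` it is exactly 0.0. This ability to set coefficients to exactly zero is what distinguishes the L1 penalty.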


### Ridge Linear Model

Linear least squares with L2 regularization.

In [148]:
clf = Ridge(alpha=1.0)
clf.fit(X_train, y_train)

Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [149]:
print "R2: ", r2_score(y_test, clf.predict(X_test))
mse = mean_squared_error(y_test, clf.predict(X_test))
print "MSE: ", mse
print "RMSE: ", sqrt(mse)

R2:  0.689070476959
MSE:  24.492538307
RMSE:  4.94899366609
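
The effect of the L2 penalty can be sketched for a single feature in plain Python. This assumes the objective $\sum_i (y_i - w x_i)^2 + \alpha w^2$ on centered data with an unpenalized intercept; the function name `ridge_slope_1d` and the toy data are hypothetical, and sklearn's `Ridge` solves the multi-feature version of this problem.

```python
def ridge_slope_1d(xs, ys, alpha):
    """Slope minimizing sum((y - w*x)^2) + alpha * w^2 on centered data."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / (sxx + alpha)

# Toy data for illustration only (y = 2x + 1).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slopes = [ridge_slope_1d(xs, ys, alpha) for alpha in (0.0, 1.0, 10.0, 100.0)]
print(slopes)
```

The slope starts at the least-squares value of 2.0 for `alpha=0.0` and shrinks smoothly toward zero as alpha grows, without ever reaching exactly zero. When alpha is small relative to the spread of the data, the fit barely changes.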


### Optimization of the Ridge Linear Model

In [150]:
clf = Ridge(alpha=0.1)
clf.fit(X_train, y_train)
print "R2: ", r2_score(y_test, clf.predict(X_test))
mse = mean_squared_error(y_test, clf.predict(X_test))
print "MSE: ", mse
print "RMSE: ", sqrt(mse)

R2:  0.689211664187
MSE:  24.4814167077
RMSE:  4.9478699162


In [151]:
clf = Ridge(alpha=10)
clf.fit(X_train, y_train)
print "R2: ", r2_score(y_test, clf.predict(X_test))
mse = mean_squared_error(y_test, clf.predict(X_test))
print "MSE: ", mse
print "RMSE: ", sqrt(mse)

R2:  0.687416718982
MSE:  24.6228081193
RMSE:  4.96213745469


Use sklearn's RidgeCV, which selects alpha by cross-validation on the training data, to check several candidate alphas:

In [152]:
clf = RidgeCV(alphas=(0.1, 1.0, 5.0, 7.5, 10.0))
clf.fit(X_train, y_train)
clf.alpha_      

7.5
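
The selection step can be sketched as follows. `RidgeCV` with default settings uses an efficient form of leave-one-out cross-validation; the plain-Python version below is a naive one-feature stand-in that refits the model once per held-out point. The function names and toy data are hypothetical, and Python 3 is assumed.

```python
def fit_ridge_1d(xs, ys, alpha):
    """Return (slope, intercept) for one-feature ridge; the intercept is unpenalized."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    return slope, y_mean - slope * x_mean

def loo_mse(xs, ys, alpha):
    """Leave-one-out MSE: hold out each point, fit on the rest, score the held-out point."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_ridge_1d(train_x, train_y, alpha)
        errors.append((ys[i] - (slope * xs[i] + intercept)) ** 2)
    return sum(errors) / len(errors)

def pick_alpha(xs, ys, alphas):
    """Return the candidate alpha with the lowest leave-one-out MSE."""
    return min(alphas, key=lambda a: loo_mse(xs, ys, a))

# Toy, slightly noisy data for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 2.9, 5.1, 7.2, 8.8, 11.1]
alphas = (0.1, 1.0, 5.0, 7.5, 10.0)
best = pick_alpha(xs, ys, alphas)
print("selected alpha:", best)
```

Note that the selection uses only the training data, so the chosen alpha is not guaranteed to give the best score on a separate test set.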

In [153]:
clf = Ridge(alpha=7.5)
clf.fit(X_train, y_train)
print "R2: ", r2_score(y_test, clf.predict(X_test))
mse = mean_squared_error(y_test, clf.predict(X_test))
print "MSE: ", mse
print "RMSE: ", sqrt(mse)

R2:  0.687903365765
MSE:  24.5844739822
RMSE:  4.95827328636


### Analysis

Overall, the performance of the standard Linear Regression, Lasso, and Ridge models, as measured by $R^{2}$ and MSE, is very similar on this dataset. Ideally we want the highest $R^{2}$ score and the lowest MSE/RMSE. Even after optimizing the regularization parameter, the Ridge model does not deviate significantly from the original Linear Regression: with alpha=7.5, the value selected by RidgeCV, the test $R^{2}$ is 0.6879 versus 0.6892 for the unregularized model.
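
To put numbers on "similar", the short sketch below tabulates how far each model's test $R^{2}$ falls below the plain Linear Regression baseline. The scores are copied from the cell outputs above and rounded to six decimals; the variable names are illustrative, and Python 3 is assumed.

```python
# Test-set R2 values copied from the cell outputs above, rounded to six decimals.
r2_scores = {
    "LinearRegression": 0.689227,
    "Lasso(alpha=0.1)": 0.679213,
    "Ridge(alpha=0.1)": 0.689212,
    "Ridge(alpha=1.0)": 0.689070,
    "Ridge(alpha=7.5)": 0.687903,
    "Ridge(alpha=10)": 0.687417,
}

baseline = r2_scores["LinearRegression"]
gaps = {name: baseline - score for name, score in r2_scores.items()}

# Print models from smallest to largest gap below the baseline.
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print("%-20s %.6f" % (name, gap))

largest_gap = max(gaps.values())
worst_model = max(gaps, key=gaps.get)
print("largest gap:", worst_model, round(largest_gap, 6))
```

The largest gap is about 0.01 (Lasso), and every Ridge variant is within 0.002 of the baseline. Since the train/test split is random and unseeded, differences this small may be within run-to-run variation.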