# Boston Housing Assignment

### Completed by: Jacob Metzger
### Due: 02/22/2016

Assignment template found at: http://nbviewer.jupyter.org/github/mbernico/CS570/blob/master/boston_assignment.ipynb

In this assignment you'll be using linear regression to estimate the cost of a house in Boston, using a well-known dataset.

Goals:
+  Measure the performance of the model I created using $R^{2}$ and MSE
> Learn how to use sklearn.metrics.r2_score and sklearn.metrics.mean_squared_error
+  Implement a new model using regularization
> Use sklearn.linear_model.Ridge (L2) or sklearn.linear_model.Lasso (L1)
+  Get the best model you can by optimizing the regularization parameter.

In [1]:
from sklearn import datasets
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cross_validation import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression

In [2]:
bean = datasets.load_boston()
print bean.DESCR

Boston House Prices dataset

Notes
------
Data Set Characteristics:  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive
    
    :Median Value (attribute 14) is usually the target

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
      

In [3]:
def load_boston():
    scaler = StandardScaler()
    boston = datasets.load_boston()
    X=boston.data
    y=boston.target
    X = scaler.fit_transform(X) # Centers each feature on zero (subtracts the mean) and scales it by its std. dev.
    return train_test_split(X,y) # Splits into 3:1 training:test sets (default test fraction of 0.25)
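
To make the scaling step concrete, here is a minimal pure-Python sketch of what standardizing a single feature column involves. The helper `standardize` and the sample values in `rooms` are hypothetical and not taken from the dataset. The sketch divides by the population standard deviation (dividing by n), which is the convention StandardScaler follows.

```python
def standardize(column):
    """Center a list of numbers on zero and scale it to unit standard deviation."""
    n = float(len(column))
    mean = sum(column) / n
    # Population variance (divide by n, not n - 1)
    variance = sum((v - mean) ** 2 for v in column) / n
    std = variance ** 0.5
    return [(v - mean) / std for v in column]

# Made-up values for illustration only
rooms = [5.0, 6.0, 7.0, 6.0]
scaled = standardize(rooms)
print(scaled)
```

After this transformation the column has mean 0 and standard deviation 1, so features measured in very different units contribute on a comparable scale when a penalty is applied to the coefficients.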
    

In [4]:
X_train, X_test, y_train, y_test = load_boston()

In [5]:
X_train.shape

(379L, 13L)

In [6]:
X_test.shape #Note that the test set is about 1/3 the size of the training set, confirming the 3:1 split ratio

(127L, 13L)
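
The two shapes above can be checked with a little arithmetic. This sketch assumes the default test fraction of 0.25 and that the test count is rounded up; the variable names are illustrative only.

```python
import math

n_samples = 506       # number of instances reported in the dataset description
test_fraction = 0.25  # assumed default test fraction of train_test_split

# The test count is rounded up; the remainder goes to training
n_test = int(math.ceil(test_fraction * n_samples))
n_train = n_samples - n_test
print((n_train, n_test))
```

This gives 379 training rows and 127 test rows, matching the shapes printed by the notebook.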

### Fitting a Linear Regression

It's as easy as instantiating a new regression object (line 1) and giving your regression object your training data
(line 2) by calling .fit(independent variables, dependent variable)



In [7]:
clf = LinearRegression()
clf.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

### Making a Prediction
X_test is our holdout set of data.  We know the answer (y_test), but the computer does not.   

Using the command below, I create a tuple for each observation, where I'm combining the real value (y_test) with
the value our regressor predicts (clf.predict(X_test))

Use a similar format to get your r2 and mse metrics working.  Use the [scikit learn api](http://scikit-learn.org/stable/modules/model_evaluation.html) if you need help!

In [8]:
zip(y_test, clf.predict(X_test))

[(30.800000000000001, 31.572171202128374),
 (9.5, 12.883515348386215),
 (19.399999999999999, 23.524233551204453),
 (17.600000000000001, 16.456233534904165),
 (27.5, 32.020722073233351),
 (23.0, 20.344882649944637),
 (8.3000000000000007, 13.596152781186694),
 (22.0, 27.715075084130099),
 (23.100000000000001, 20.401728525542399),
 (16.699999999999999, 20.118122676876347),
 (20.5, 20.290111503700938),
 (7.2000000000000002, 8.6026882033314287),
 (28.100000000000001, 25.005292859204104),
 (18.5, 25.856876231091352),
 (13.5, 12.880815455821381),
 (15.300000000000001, 21.315894547097937),
 (48.5, 42.261333413184389),
 (20.300000000000001, 19.534565492009737),
 (28.699999999999999, 28.545558944227334),
 (17.699999999999999, 20.859570424286908),
 (23.600000000000001, 29.552080914259356),
 (13.6, 12.526871402884163),
 (45.399999999999999, 38.704763090279044),
 (24.800000000000001, 31.331949315151526),
 (20.100000000000001, 19.949710657306628),
 (14.300000000000001, 14.434032199160628),
 (22.0, 2

In [9]:
#Calculate R^2 score (higher is better; 1.0 is a perfect fit)
from sklearn.metrics import r2_score
r2_score(y_test, clf.predict(X_test))

0.75527626386029212

In [10]:
#Calculate MSE
from sklearn.metrics import mean_squared_error
mean_squared_error(y_test, clf.predict(X_test))

21.866696865076516
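
To show what the two metrics measure, here is a pure-Python sketch of both calculations. The function names are hypothetical stand-ins, not the sklearn implementations, and the inputs are the first five (actual, predicted) pairs from the linear regression output above, rounded to two decimals. Five points will not reproduce the full test-set scores; they only illustrate the arithmetic.

```python
def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared residuals."""
    n = float(len(y_true))
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

def r2_score_manual(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / float(len(y_true))
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# First five pairs from the output above, rounded
actual = [30.8, 9.5, 19.4, 17.6, 27.5]
predicted = [31.57, 12.88, 23.52, 16.46, 32.02]

mse_value = mean_squared_error_manual(actual, predicted)
r2_value = r2_score_manual(actual, predicted)
print((r2_value, mse_value))
```

MSE is expressed in squared units of the target, so lower is better. R^2 compares the model's squared error with that of always predicting the mean, so higher is better.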

## Implement new models using regularization
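
For reference, the two penalties differ in the norm applied to the coefficient vector. The objectives below follow the form given in the scikit-learn documentation, with the intercept omitted for brevity. The symbols are notation introduced here rather than taken from the notebook: $X$ is the feature matrix, $y$ the target, $w$ the coefficients, $n$ the number of training samples, and $\alpha$ the regularization parameter.

$$\text{Lasso (L1):}\quad \min_{w}\ \frac{1}{2n}\lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_1$$

$$\text{Ridge (L2):}\quad \min_{w}\ \lVert y - Xw\rVert_2^2 + \alpha\lVert w\rVert_2^2$$

Because the squared-error term is scaled by $\frac{1}{2n}$ for Lasso but not for Ridge, the same numeric value of alpha does not imply the same strength of regularization in the two models.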

### Lasso regression linear model (L1 regularization)

In [11]:
from sklearn.linear_model import Lasso
lassoClf = Lasso()
lassoClf.fit(X_train, y_train)

Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

#### Make a prediction

In [12]:
zip(y_test, lassoClf.predict(X_test))

[(30.800000000000001, 28.104402338555563),
 (9.5, 15.297603360046249),
 (19.399999999999999, 25.30045181031813),
 (17.600000000000001, 19.969861183387607),
 (27.5, 27.032460620285583),
 (23.0, 21.921077134103928),
 (8.3000000000000007, 14.847335028133443),
 (22.0, 27.788203471124586),
 (23.100000000000001, 22.781187848074094),
 (16.699999999999999, 20.637563183563863),
 (20.5, 18.592547926322666),
 (7.2000000000000002, 9.1650097585019363),
 (28.100000000000001, 24.610885910782148),
 (18.5, 24.233983036210844),
 (13.5, 17.097555427121286),
 (15.300000000000001, 21.224109233889322),
 (48.5, 35.239038004545336),
 (20.300000000000001, 21.257299734103366),
 (28.699999999999999, 26.948923515163635),
 (17.699999999999999, 20.916189409979495),
 (23.600000000000001, 26.351975826126502),
 (13.6, 14.377667360758053),
 (45.399999999999999, 34.992409840221107),
 (24.800000000000001, 30.641426784554408),
 (20.100000000000001, 22.508421405389029),
 (14.300000000000001, 15.849789544202789),
 (22.0, 23

#### Calculate R^2 and MSE

In [13]:
r2_score(y_test, lassoClf.predict(X_test))

0.70639408998172737

In [14]:
mean_squared_error(y_test, lassoClf.predict(X_test))

26.234445147974284

### Ridge regression linear model (L2 regularization)

In [15]:
from sklearn.linear_model import Ridge
ridgeClf = Ridge()
ridgeClf.fit(X_train, y_train)

Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

#### Make a prediction

In [16]:
zip(y_test, ridgeClf.predict(X_test))

[(30.800000000000001, 31.525993799992662),
 (9.5, 12.888812065191271),
 (19.399999999999999, 23.554339309127361),
 (17.600000000000001, 16.472412284761951),
 (27.5, 32.002489642185537),
 (23.0, 20.372616104770366),
 (8.3000000000000007, 13.591389336457778),
 (22.0, 27.733859726622835),
 (23.100000000000001, 20.436874292638173),
 (16.699999999999999, 20.110141538923322),
 (20.5, 20.198279492164165),
 (7.2000000000000002, 8.610731232944854),
 (28.100000000000001, 24.993558557746603),
 (18.5, 25.86580077588404),
 (13.5, 12.888846252861097),
 (15.300000000000001, 21.357080463809382),
 (48.5, 42.16546454918101),
 (20.300000000000001, 19.52355788172142),
 (28.699999999999999, 28.497510911957924),
 (17.699999999999999, 20.854021933898455),
 (23.600000000000001, 29.521272280080435),
 (13.6, 12.559343959486634),
 (45.399999999999999, 38.644759818818009),
 (24.800000000000001, 31.332031080163787),
 (20.100000000000001, 19.976994464189389),
 (14.300000000000001, 14.463882592332103),
 (22.0, 21.16

#### Calculate R^2 and MSE

In [17]:
r2_score(y_test, ridgeClf.predict(X_test)) # or ridgeClf.score(X_test, y_test)

0.75534059077366866

In [18]:
mean_squared_error(y_test, ridgeClf.predict(X_test))

21.860949089493893

## Optimize regularization parameter (alpha) for Lasso model 

In [19]:
#Scratch work
## This consists of two phases. The outer loop runs a number of experiments, given by iters.
## Inside the loop, LassoCV iterates through alphasToTry and tests different Lasso models using 4-fold cross validation,
## yielding a Lasso model with an estimated alpha. We take these different estimates for alpha over the experiments
## and return the mean. This is not likely to be the *best* estimate for any particular train:test split, but it will
## help reduce the average error in alpha over the possible splits. This will be used in my final, optimized Lasso model.
import numpy as np
from numpy import array
from sklearn.linear_model import LassoCV
iters = 50 #Adjusted by hand 
alphaResults = np.zeros(iters)
alphasToTry = np.linspace(0.001,2,500) #The space was chosen based on some empirical fiddling
for i in xrange(iters):
    X_cv_train, X_cv_test, y_cv_train, y_cv_test = train_test_split(X_train,y_train) #Split off some of X_train for validation
    lassoClf = LassoCV(alphas=array(alphasToTry), cv=4) 
    lassoClf.fit(X_cv_train, y_cv_train)
    alphaResults[i]=lassoClf.alpha_
    #print lassoClf.alpha_
print "Mean alpha: ",np.mean(alphaResults)
print "Std dev alpha:", np.std(alphaResults)

Mean alpha:  0.0212704208417
Std dev alpha: 0.0236783439153
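
The averaging idea can be sketched without scikit-learn. The example below is a simplified stand-in, not the notebook's method: it uses a one-feature ridge model with a closed-form slope instead of Lasso, a single hold-out split per experiment instead of 4-fold cross validation, and synthetic data. All function names, the alpha grid, and the data are hypothetical.

```python
import random

def fit_ridge_1d(xs, ys, alpha):
    """Closed-form ridge slope for one feature with no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)

def holdout_mse(xs, ys, slope):
    """Mean squared error of the fitted slope on held-out points."""
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys)) / float(len(xs))

def best_alpha_for_split(data, alphas, rng):
    """Shuffle, hold out a quarter, and return the alpha with the lowest hold-out MSE."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = (3 * len(shuffled)) // 4
    train_x, train_y = zip(*shuffled[:cut])
    held_x, held_y = zip(*shuffled[cut:])
    scores = [(holdout_mse(held_x, held_y, fit_ridge_1d(train_x, train_y, a)), a)
              for a in alphas]
    return min(scores)[1]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic data: y = 2x + noise
data = []
for _ in range(80):
    x = rng.gauss(0.0, 1.0)
    data.append((x, 2.0 * x + rng.gauss(0.0, 1.0)))

alphas = [0.001, 0.01, 0.1, 1.0, 10.0]
chosen = [best_alpha_for_split(data, alphas, rng) for _ in range(30)]
mean_alpha = sum(chosen) / float(len(chosen))
print("Mean alpha: %s" % mean_alpha)
```

As in the notebook, each experiment can pick a different alpha depending on the random split, and the mean smooths out that variation at the cost of not being optimal for any single split.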


#### Final Lasso model and relevant stats

In [20]:
#Run on original test set
lassoClf = Lasso(alpha=np.mean(alphaResults)) 
lassoClf.fit(X_train, y_train)
(r2_score(y_test, lassoClf.predict(X_test)),mean_squared_error(y_test, lassoClf.predict(X_test)))

(0.75551861042838553, 21.845042574306106)

Note that the R2 and MSE scores are generally improved relative to the model with the default alpha=1; on this split, R2 rises from about 0.706 to 0.756 and MSE falls from about 26.23 to 21.85.

## Optimize regularization parameter (alpha) for Ridge model

Here, we will mimic the way we treated the Lasso model, using RidgeCV in lieu of LassoCV.

In [21]:
import numpy as np
from sklearn.linear_model import RidgeCV
iters = 30 #Adjusted by hand 
alphaResults = np.zeros(iters)
alphasToTry = np.linspace(0.001,15,50) #This space was also chosen based on some empirical fiddling.
for i in xrange(iters):
    X_cv_train, X_cv_test, y_cv_train, y_cv_test = train_test_split(X_train,y_train) #Split off some of X_train for validation
    ridgeClf = RidgeCV(alphas=array(alphasToTry), cv=4) 
    ridgeClf.fit(X_cv_train, y_cv_train)
    alphaResults[i]=ridgeClf.alpha_
    #print ridgeClf.alpha_
print "Mean alpha: ",np.mean(alphaResults)
print "Std dev alpha:", np.std(alphaResults)

Mean alpha:  8.92897619048
Std dev alpha: 4.24079923909


#### Final Ridge model and relevant stats

In [22]:
#Run on original test set
ridgeClf = Ridge(alpha=np.mean(alphaResults)) 
ridgeClf.fit(X_train, y_train)
(r2_score(y_test, ridgeClf.predict(X_test)),mean_squared_error(y_test, ridgeClf.predict(X_test)))

(0.7555304886663301, 21.843981223116714)

Similar to the case with Lasso, this choice of alpha for Ridge tends to show (roughly) equivalent or better R2 and MSE scores than the default alpha=1 on the test set. On this split the change is small (R2 from 0.75534 to 0.75553, MSE from 21.861 to 21.844), so Ridge appears far less sensitive than Lasso to the optimization of this parameter.