# Employing LASSO Regression

**LASSO regression** (LASSO stands for least absolute shrinkage and selection operator) is another regularized form of linear regression, similar to ridge regression. Both use a hyperparameter $\alpha$ to control the strength of the penalty, but the penalty itself differs: ridge regression uses $L_2$ regularization (penalizing the sum of squared coefficients), while LASSO uses $L_1$ regularization (penalizing the sum of the absolute values of the coefficients).
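For reference, the two objectives can be written side by side. This is a sketch following the form given in the scikit-learn documentation; the symbols are introduced here for illustration ($n$ is the number of samples, $X$ the feature matrix, $y$ the targets, $w$ the coefficient vector), and the intercept is omitted for brevity.

$$
\begin{aligned}
\text{Ridge:} \quad & \min_w \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2 \\
\text{LASSO:} \quad & \min_w \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
\end{aligned}
$$

Because the $L_1$ norm is not differentiable at zero, the LASSO solution can set some coefficients exactly to zero, which is the "selection" in the name.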

We will continue to work with the Boston house price dataset.

In [1]:
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
import numpy as np

In [2]:
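boston_obj = load_boston()    # Requires scikit-learn < 1.2; load_boston was removed in 1.2
data_train, data_test, price_train, price_test = train_test_split(boston_obj.data, boston_obj.target)
# Drop columns 2 (INDUS) and 6 (AGE) from both sets
data_train = np.delete(data_train, [2, 6], axis=1)
data_test = np.delete(data_test, [2, 6], axis=1)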

## Fitting with LASSO regression

LASSO regression is implemented via the `Lasso` object provided by **scikit-learn**.

In [3]:
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score

In [4]:
lasso1 = Lasso(alpha=1)    # alpha is a hyperparameter controlling regularization
lasso1.fit(data_train, price_train)
lasso1.predict([[    # An example prediction
    1,      # Per capita crime rate
    25,     # Proportion of land zoned for large homes
    1,      # Tract bounds the Charles River
    0.3,    # NOX concentration
    10,     # Average number of rooms per dwelling
    10,     # Weighted distance to employment centers
    3,      # Index for highway accessibility
    400,    # Tax rate
    15,     # Pupil/teacher ratio
    200,    # B index (derived from the proportion of Black residents)
    5       # % lower status of population
]])

array([ 26.13154937])

In [5]:
predprice = lasso1.predict(data_train)
mean_squared_error(price_train, predprice)

26.602548859655187
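A fitted `Lasso` model exposes its coefficients through `coef_`, and the $L_1$ penalty can drive some of them exactly to zero. The following is a minimal sketch on synthetic data (not the Boston data); the helper `count_zero_coefs`, the true coefficients, and the noise level are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge


def count_zero_coefs(alpha=0.5, seed=0):
    """Fit Lasso and Ridge on synthetic data; count coefficients that are exactly zero."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(200, 8))
    # Only three of the eight features actually influence the target (an assumption of this demo)
    true_coef = np.array([3.0, 0.0, 0.0, -2.0, 0.0, 0.0, 0.0, 1.5])
    y = X @ true_coef + rng.normal(scale=0.5, size=200)

    lasso = Lasso(alpha=alpha).fit(X, y)
    ridge = Ridge(alpha=alpha).fit(X, y)
    return int(np.sum(lasso.coef_ == 0)), int(np.sum(ridge.coef_ == 0))


n_zero_lasso, n_zero_ridge = count_zero_coefs()
print("Zero coefficients (Lasso):", n_zero_lasso)
print("Zero coefficients (Ridge):", n_zero_ridge)
```

Ridge shrinks coefficients toward zero but typically leaves them nonzero, whereas LASSO can remove irrelevant features from the model entirely. The same check can be run on `lasso1.coef_` in this notebook.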

Cross-validation with LASSO looks very similar to cross-validation with ridge regression.

In [6]:
import pandas as pd
from pandas import DataFrame

In [7]:
alpha = [.125, .25, .5, 1, 2, 4, 8, 16, 32, 64, 128]    # Candidate alphas
res = dict()

for a in alpha:
    lasso2 = Lasso(alpha=a)
    # 10-fold cross-validation; scores are negative MSE, so values closer to zero are better
    res[a] = cross_val_score(lasso2, data_train, price_train, scoring='neg_mean_squared_error', cv=10)

res_df = DataFrame(res)

res_df

| | 0.125 | 0.25 | 0.5 | 1.0 | 2.0 | 4.0 | 8.0 | 16.0 | 32.0 | 64.0 | 128.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -23.578552 | -25.183016 | -26.884422 | -31.086422 | -34.485346 | -38.574984 | -42.013423 | -49.505125 | -68.654366 | -73.079089 | -74.038808 |
| 1 | -20.727668 | -21.359924 | -22.538832 | -25.758314 | -30.071691 | -37.159435 | -39.127933 | -43.239196 | -57.843196 | -62.193022 | -63.710823 |
| 2 | -24.792004 | -24.747319 | -23.587147 | -22.893052 | -24.911051 | -29.382137 | -31.754362 | -38.914857 | -62.963909 | -67.371269 | -68.373116 |
| 3 | -36.07284 | -35.343506 | -34.592405 | -34.378515 | -38.586243 | -47.179857 | -48.512042 | -52.567526 | -67.043069 | -64.899899 | -64.886922 |
| 4 | -21.143216 | -21.856128 | -24.002821 | -29.682242 | -33.611047 | -39.124863 | -41.797128 | -48.082617 | -67.442011 | -70.980608 | -73.746373 |
| 5 | -17.748703 | -18.31582 | -18.985668 | -21.821554 | -22.791731 | -23.715189 | -22.8662 | -23.608628 | -36.321938 | -38.026512 | -44.162521 |
| 6 | -20.017082 | -18.52872 | -17.862859 | -18.290466 | -20.09978 | -23.648307 | -25.421354 | -30.180902 | -47.672406 | -50.415362 | -51.070062 |
| 7 | -48.076614 | -47.579222 | -46.86286 | -46.316435 | -48.783352 | -55.088865 | -60.914061 | -73.309961 | -99.531297 | -103.554066 | -104.517061 |
| 8 | -24.484715 | -25.516003 | -28.669419 | -37.083196 | -43.438188 | -53.792811 | -54.927445 | -58.983855 | -73.110627 | -76.587099 | -79.387674 |
| 9 | -16.880481 | -16.697835 | -17.001803 | -18.449687 | -21.168183 | -24.371581 | -24.297903 | -24.005189 | -29.901779 | -29.884072 | -29.434281 |


In [8]:
res_df.mean()

0.125     -25.352187
0.250     -25.512749
0.500     -26.098824
1.000     -28.575989
2.000     -31.794661
4.000     -37.203803
8.000     -39.163185
16.000    -44.239786
32.000    -61.048460
64.000    -63.699100
128.000   -65.332764
dtype: float64

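Again, a smaller $\alpha$ leads to better fits: the mean cross-validated score (negative MSE, so closer to zero is better) worsens steadily as $\alpha$ grows, and the smallest candidate, $\alpha = 0.125$, performs best.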
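The best candidate can also be picked programmatically rather than by eye. The sketch below rebuilds the mean scores from the output above as a pandas `Series` (the names `mean_scores`, `best_alpha`, and `best_mse` are introduced here for illustration) and takes the index of the largest value, since the scores are negative MSEs.

```python
import pandas as pd

# Mean cross-validated scores (negative MSE) copied from the output above
mean_scores = pd.Series({
    0.125: -25.352187, 0.25: -25.512749, 0.5: -26.098824, 1.0: -28.575989,
    2.0: -31.794661, 4.0: -37.203803, 8.0: -39.163185, 16.0: -44.239786,
    32.0: -61.048460, 64.0: -63.699100, 128.0: -65.332764,
})

best_alpha = mean_scores.idxmax()    # Largest (least negative) score wins
best_mse = -mean_scores.max()        # Flip the sign to recover the MSE
print(best_alpha, best_mse)
```

In the notebook itself, `res_df.mean().idxmax()` would give the same answer without retyping the numbers.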

In [9]:
lasso3 = Lasso(alpha=0.125)
lasso3.fit(data_train, price_train)

testpredprice = lasso3.predict(data_test)
mean_squared_error(price_test, testpredprice)

24.883894478966027

Overall, LASSO regression does not seem to do better than OLS: cross-validation favored the weakest penalty we tried. In fact, regularization does not seem to produce better models for this dataset.
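A comparison like this is easiest to trust when both models are scored on the same train/test split. The following is a minimal sketch of such a comparison; `compare_test_mse` is a hypothetical helper, and the data is synthetic (not the Boston data), so the printed numbers say nothing about the conclusion above.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def compare_test_mse(X, y, alpha=0.125, seed=0):
    """Fit OLS and Lasso on one shared split and return each model's test-set MSE."""
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=seed)
    models = {"ols": LinearRegression(), "lasso": Lasso(alpha=alpha)}
    return {
        name: mean_squared_error(y_test, model.fit(X_train, y_train).predict(X_test))
        for name, model in models.items()
    }


# Synthetic stand-in data with 11 features (an assumption made for this demo)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 11))
y_demo = X_demo @ rng.normal(size=11) + rng.normal(scale=2.0, size=300)

scores = compare_test_mse(X_demo, y_demo)
print(scores)
```

Fixing `random_state` keeps the split identical across runs, so differences in MSE reflect the models rather than the partition.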