# Employing LASSO Regression
*Curtis Miller*

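**LASSO regression** (LASSO stands for *least absolute shrinkage and selection operator*) is another regularized form of linear regression, closely related to ridge regression. Both add a penalty on the size of the coefficients, scaled by a hyperparameter $\alpha$, but the penalty itself differs: ridge regression uses $L_2$ regularization (the sum of squared coefficients), while LASSO uses $L_1$ regularization (the sum of absolute coefficients). The $L_1$ penalty can shrink some coefficients exactly to zero, which is the "selection" in the name.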
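To make the difference between the two penalties concrete, here is a minimal sketch in plain Python. It assumes the textbook special case of an orthonormal design matrix, where both estimators reduce to simple transformations of each OLS coefficient. The function names `soft_threshold` and `ridge_shrink` and the coefficient values are illustrative, not part of scikit-learn, and the exact scaling of $\alpha$ depends on how the objective is written.

```python
def soft_threshold(z, alpha):
    """LASSO-style shrinkage: move z toward zero by alpha, stopping at zero."""
    if z > alpha:
        return z - alpha
    if z < -alpha:
        return z + alpha
    return 0.0


def ridge_shrink(z, alpha):
    """Ridge-style shrinkage: rescale z, which never reaches exactly zero."""
    return z / (1.0 + alpha)


ols_coefs = [3.0, -0.5, 0.2]   # hypothetical OLS coefficients
alpha = 1.0

lasso_coefs = [soft_threshold(z, alpha) for z in ols_coefs]
ridge_coefs = [ridge_shrink(z, alpha) for z in ols_coefs]

print(lasso_coefs)   # small coefficients are set exactly to zero
print(ridge_coefs)   # all coefficients shrink but stay nonzero
```

In this simplified setting, LASSO discards the two small coefficients entirely, while ridge keeps all three at reduced magnitude.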

We will continue to work with the Boston house price dataset. (`load_boston` was removed in scikit-learn 1.2, so the cells below require an earlier release.)

In [1]:
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
import numpy as np

In [2]:
boston_obj = load_boston()
data_train, data_test, price_train, price_test = train_test_split(boston_obj.data, boston_obj.target)
data_train = np.delete(data_train, [2, 6], axis=1)
data_test = np.delete(data_test, [2, 6], axis=1)

## Fitting with LASSO regression

LASSO regression is implemented via the `Lasso` object provided by **scikit-learn**.

In [3]:
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score

In [4]:
lasso1 = Lasso(alpha=1)    # alpha is a hyperparameter controlling regularization
lasso1.fit(data_train, price_train)
lasso1.predict([[    # An example prediction
    1,      # Per capita crime rate
    25,     # Proportion of land zoned for large homes
    1,      # Tract bounds the Charles River
    0.3,    # NOX concentration
    10,     # Average number of rooms per dwelling
    10,     # Weighted distance to employment centers
    3,      # Index for highway accessibility
    400,    # Tax rate
    15,     # Pupil/teacher ratio
    200,    # 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents
    5       # % lower status of population
]])

array([29.08615387])
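Because the $L_1$ penalty can set coefficients exactly to zero, it is worth inspecting a fitted model's `coef_` attribute. The sketch below uses synthetic data (an assumption made so the example runs on its own; it is not the Boston data) in which only the first two of five features affect the response.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                 # synthetic features
# Only features 0 and 1 influence the response; the rest are noise
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = Lasso(alpha=1)
model.fit(X, y)

print(model.coef_)                            # fitted coefficients
n_zero = int(np.sum(model.coef_ == 0))        # features dropped by the penalty
print(n_zero)
```

For the model fitted in this notebook, the equivalent check would be to look at `lasso1.coef_`.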

In [5]:
predprice = lasso1.predict(data_train)
mean_squared_error(price_train, predprice)

26.186116476897762

Cross-validation with LASSO works the same way as with ridge regression: each candidate $\alpha$ is scored by 10-fold cross-validation using the negated mean squared error.

In [6]:
import pandas as pd
from pandas import DataFrame

In [7]:
alpha = [.125, .25, .5, 1, 2, 4, 8, 16, 32, 64, 128]    # Candidate alphas
res = dict()

for a in alpha:
    lasso2 = Lasso(alpha=a)
    res[a] = cross_val_score(lasso2, data_train, price_train,
                             scoring='neg_mean_squared_error', cv=10)

res_df = DataFrame(res)

res_df

| Fold | 0.125 | 0.25 | 0.5 | 1.0 | 2.0 | 4.0 | 8.0 | 16.0 | 32.0 | 64.0 | 128.0 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -15.410308 | -15.576114 | -16.239303 | -18.890937 | -24.434237 | -27.678038 | -28.240305 | -31.633401 | -46.917994 | -50.034332 | -51.303476 |
| 1 | -17.546908 | -18.013341 | -19.369314 | -23.773658 | -31.926944 | -40.71202 | -42.224214 | -48.384351 | -66.782254 | -70.65748 | -73.174571 |
| 2 | -35.77024 | -35.329683 | -35.042684 | -36.845868 | -44.128087 | -53.322653 | -55.709802 | -63.464168 | -82.109049 | -83.930715 | -84.749809 |
| 3 | -16.988829 | -16.574853 | -16.078654 | -16.413008 | -16.552518 | -16.147738 | -18.4091 | -25.043977 | -41.677179 | -44.311565 | -46.153342 |
| 4 | -12.855487 | -12.711071 | -12.693937 | -13.746736 | -17.672406 | -22.934872 | -24.453265 | -28.336625 | -40.412506 | -43.055529 | -45.002219 |
| 5 | -16.945048 | -17.132288 | -17.661655 | -19.339483 | -21.774543 | -24.691131 | -25.728358 | -29.988728 | -45.548242 | -48.157426 | -50.052946 |
| 6 | -21.524633 | -22.329022 | -24.12809 | -28.487369 | -34.135331 | -37.18496 | -39.366339 | -44.690851 | -58.573889 | -59.235519 | -58.463009 |
| 7 | -39.426794 | -40.295094 | -42.418176 | -48.210253 | -56.998427 | -65.941036 | -65.682958 | -69.865867 | -88.069363 | -90.480773 | -91.15106 |
| 8 | -56.903062 | -55.973955 | -54.262835 | -51.428866 | -49.251228 | -52.096057 | -54.887623 | -62.859961 | -80.202024 | -81.718826 | -82.134217 |
| 9 | -17.994109 | -18.62072 | -20.161896 | -24.396093 | -32.912563 | -42.747876 | -43.852608 | -48.636116 | -63.616723 | -66.633446 | -68.649208 |


In [8]:
res_df.mean()

0.125     -25.136542
0.250     -25.255614
0.500     -25.805654
1.000     -28.153227
2.000     -32.978628
4.000     -38.345638
8.000     -39.855457
16.000    -45.290404
32.000    -61.390922
64.000    -63.821561
128.000   -65.083386
dtype: float64

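Again, a smaller $\alpha$ leads to better fits: the mean cross-validated score is highest (closest to zero) at $\alpha = 0.125$, the smallest candidate, and it worsens steadily as $\alpha$ grows.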
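Since scikit-learn reports `neg_mean_squared_error` as the negated MSE, the best candidate is the one with the largest (least negative) mean score. As a small sketch, the selection can be done programmatically. The values below are copied from the mean scores printed above, and the variable names are illustrative.

```python
import pandas as pd

# Mean cross-validated scores (negated MSE), indexed by candidate alpha
mean_scores = pd.Series({
    0.125: -25.136542, 0.25: -25.255614, 0.5: -25.805654, 1.0: -28.153227,
    2.0: -32.978628, 4.0: -38.345638, 8.0: -39.855457, 16.0: -45.290404,
    32.0: -61.390922, 64.0: -63.821561, 128.0: -65.083386,
})

mean_mse = -mean_scores          # flip the sign to recover ordinary MSE
best_alpha = mean_mse.idxmin()   # alpha with the lowest mean MSE

print(best_alpha)
```

In the notebook itself, the same choice would come from `res_df.mean().idxmax()`.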

In [9]:
lasso3 = Lasso(alpha=0.125)
lasso3.fit(data_train, price_train)

testpredprice = lasso3.predict(data_test)
mean_squared_error(price_test, testpredprice)

24.552504055553456

Overall, LASSO regression does not appear to do better than ridge regression or OLS here. In fact, regularization does not seem to produce better models for this dataset: the best $\alpha$ was the smallest one tried, which is the candidate closest to unregularized OLS.
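One way to read this result is that LASSO approaches OLS as $\alpha$ shrinks toward zero, so a cross-validation that keeps preferring smaller values of $\alpha$ is effectively preferring the unregularized fit. The sketch below illustrates that limiting behaviour on synthetic data (an assumption for the sake of a self-contained example; the coefficients and sample size are arbitrary).

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))                 # synthetic features
true_coefs = np.array([1.5, -2.0, 0.5, 0.0])  # hypothetical true coefficients
y = X @ true_coefs + rng.normal(scale=0.5, size=100)

ols = LinearRegression().fit(X, y)
lasso_small = Lasso(alpha=1e-4).fit(X, y)     # very weak penalty
lasso_large = Lasso(alpha=1.0).fit(X, y)      # strong penalty

# Largest absolute difference between LASSO and OLS coefficients
gap_small = np.max(np.abs(lasso_small.coef_ - ols.coef_))
gap_large = np.max(np.abs(lasso_large.coef_ - ols.coef_))
print(gap_small, gap_large)
```

With a very small $\alpha$ the LASSO coefficients are nearly indistinguishable from the OLS ones, while a large $\alpha$ pulls them well away.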