# Employing LASSO Regression
*Curtis Miller*

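**LASSO regression** (where LASSO stands for least absolute shrinkage and selection operator) is another regularized version of regression, resembling ridge regression. As in ridge regression, the hyperparameter $\alpha$ controls the strength of the penalty, but the penalty itself differs: ridge regression uses $L_2$ regularization (penalizing the sum of squared coefficients), while LASSO uses $L_1$ regularization (penalizing the sum of the absolute values of the coefficients).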
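
For reference, the two objectives can be written side by side. This is a sketch following the parameterization in the scikit-learn documentation for `Ridge` and `Lasso`; the symbols $X$ (feature matrix), $y$ (response vector), $w$ (coefficient vector), and $n$ (number of samples) are notation introduced here, not taken from the notebook.

$$\text{Ridge:}\quad \min_w \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2$$

$$\text{LASSO:}\quad \min_w \; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1$$

In both cases $\alpha = 0$ recovers ordinary least squares, and a larger $\alpha$ shrinks the coefficients more strongly. Because the two losses are scaled differently, the same numeric value of $\alpha$ does not mean the same amount of regularization in the two models.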

We will continue to work with the Boston house price dataset.

In [None]:
# Note: load_boston was removed in scikit-learn 1.2, so this import requires an older version
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
import numpy as np

In [None]:
boston_obj = load_boston()
data_train, data_test, price_train, price_test = train_test_split(boston_obj.data, boston_obj.target)
# Drop columns 2 (INDUS) and 6 (AGE), leaving 11 of the 13 features
data_train = np.delete(data_train, [2, 6], axis=1)
data_test = np.delete(data_test, [2, 6], axis=1)

## Fitting with LASSO regression

LASSO regression is implemented via the `Lasso` object provided by **scikit-learn**.

In [None]:
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score

In [None]:
lasso1 = Lasso(alpha=1)    # alpha is a hyperparameter controlling regularization
lasso1.fit(data_train, price_train)
lasso1.predict([[    # An example prediction
    1,      # Per capita crime rate
    25,     # Proportion of land zoned for large homes
    1,      # Tract bounds the Charles River
    0.3,    # NOX concentration
    10,     # Average number of rooms per dwelling
    10,     # Weighted distance to employment centers
    3,      # Index for highway accessibility
    400,    # Tax rate
    15,     # Pupil/teacher ratio
    200,    # B: index based on the proportion of Black residents
    5       # % lower status of population
]])

In [None]:
predprice = lasso1.predict(data_train)
mean_squared_error(price_train, predprice)
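
The "selection" part of the LASSO name refers to the fact that the $L_1$ penalty can set some coefficients exactly to zero, which the fitted model exposes through its `coef_` attribute. The sketch below illustrates this on synthetic data rather than the Boston data; the data, the variable names, and the choice of `alpha=0.5` are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (an assumption for illustration; not the Boston data):
# only the first two of eight features influence the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
true_coef = np.array([5.0, -3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_coef + rng.normal(scale=1.0, size=200)

lasso_demo = Lasso(alpha=0.5).fit(X, y)
ridge_demo = Ridge(alpha=0.5).fit(X, y)

# Count the coefficients that each model set exactly to zero
n_zero_lasso = int(np.sum(lasso_demo.coef_ == 0))
n_zero_ridge = int(np.sum(ridge_demo.coef_ == 0))

print("LASSO coefficients:", np.round(lasso_demo.coef_, 3))
print("Ridge coefficients:", np.round(ridge_demo.coef_, 3))
print("Exact zeros (LASSO, ridge):", n_zero_lasso, n_zero_ridge)
```

Applying the same check to `lasso1.coef_` should show which of the 11 housing features, if any, the model fitted above has dropped.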

Cross-validation with LASSO looks very similar to cross-validation with ridge regression.

In [None]:
import pandas as pd
from pandas import DataFrame

In [None]:
alpha = [.125, .25, .5, 1, 2, 4, 8, 16, 32, 64, 128]    # Candidate alphas
res = dict()

for a in alpha:
    lasso2 = Lasso(alpha=a)
    # Scores are negated MSE, so values closer to zero indicate a better fit
    res[a] = cross_val_score(lasso2, data_train, price_train, scoring='neg_mean_squared_error', cv=10)

res_df = DataFrame(res)

res_df

In [None]:
res_df.mean()
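
Because `neg_mean_squared_error` reports the negated MSE, the table of means can be easier to read after flipping the sign, at which point the best candidate is simply the column with the smallest value. The following is a minimal sketch of that step on synthetic stand-in data; `scores_df`, `mean_mse`, and `best_alpha` are hypothetical names, and in the notebook the same two final steps could be applied to `res_df` directly.

```python
import numpy as np
from pandas import DataFrame
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption; not the Boston data)
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(size=150)

alphas = [0.125, 0.25, 0.5, 1.0, 2.0, 4.0]
scores = {a: cross_val_score(Lasso(alpha=a), X, y,
                             scoring='neg_mean_squared_error', cv=10)
          for a in alphas}
scores_df = DataFrame(scores)    # one row per fold, one column per alpha

# Flip the sign so the numbers read as ordinary mean squared errors
mean_mse = -scores_df.mean()
best_alpha = mean_mse.idxmin()   # alpha with the lowest average CV error

print(mean_mse)
print("Best alpha:", best_alpha)
```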

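Again, a smaller $\alpha$ leads to better fits: the scores are negated mean squared errors, so the best candidates are those whose mean score is closest to zero.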
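
One way to see why a large $\alpha$ hurts the fit is to watch what it does to the coefficients: as the penalty grows, more of them are forced to zero, until the model predicts little more than the mean. The sketch below shows this on synthetic data (an assumption, as are the variable names and the particular values of `alpha`), using the training MSE rather than cross-validated scores to keep it short.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error

# Synthetic data (an assumption for illustration; not the Boston data)
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(size=200)

sweep_alphas = [0.125, 0.5, 2.0, 8.0]
train_mse = []    # training error for each alpha
n_nonzero = []    # number of coefficients that survive the penalty

for a in sweep_alphas:
    model = Lasso(alpha=a).fit(X, y)
    train_mse.append(mean_squared_error(y, model.predict(X)))
    n_nonzero.append(int(np.count_nonzero(model.coef_)))
    print(f"alpha={a}: training MSE={train_mse[-1]:.3f}, "
          f"non-zero coefficients={n_nonzero[-1]}")
```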

In [None]:
lasso3 = Lasso(alpha=0.125)
lasso3.fit(data_train, price_train)

testpredprice = lasso3.predict(data_test)
mean_squared_error(price_test, testpredprice)

Overall, LASSO regression does not appear to do better than ridge regression or OLS here. In fact, regularization does not seem to produce better models for this dataset.
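
A comparison like this is easier to check when all three models are scored on the same held-out split in one place. The following is a rough sketch of such a comparison, not a reproduction of the results above: it runs on synthetic data, and the `alpha` values for both ridge and LASSO are placeholders chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration; not the Boston data)
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 6))
y = X @ np.array([4.0, -2.0, 1.0, 0.5, 0.0, 0.0]) + rng.normal(size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The alpha values below are placeholders, not tuned choices
models = {
    "OLS": LinearRegression(),
    "Ridge (alpha=0.125)": Ridge(alpha=0.125),
    "LASSO (alpha=0.125)": Lasso(alpha=0.125),
}

# Fit each model on the training split and score it on the test split
test_mse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_mse[name] = mean_squared_error(y_test, model.predict(X_test))

for name, mse in test_mse.items():
    print(f"{name}: {mse:.3f}")
```

Since `train_test_split` draws a random split, differences between the models that are small relative to the variation across splits should not be read as meaningful.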