In [1]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set_style('ticks')
%matplotlib inline

In [2]:
from sklearn import linear_model, datasets

In [3]:
diabetes = datasets.load_diabetes()  # load dataset

In [4]:
diabetes_X = diabetes.data[:, np.newaxis, 2]  # keep only feature index 2, as a column vector

# features: hold out the last 20 samples for testing
train_X = diabetes_X[:-20]
test_X = diabetes_X[-20:]

# targets: same split, also as column vectors
train_y = diabetes.target[:-20][:, np.newaxis]
test_y = diabetes.target[-20:][:, np.newaxis]

# design model
reg = linear_model.Lasso(alpha = 0.1)

# estimate parameters
reg.fit(train_X, train_y)


Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

## A quick explanation for LASSO regression analysis
Good Refs:<br>
http://www.chioka.in/differences-between-l1-and-l2-as-loss-function-and-regularization/<br>
https://towardsdatascience.com/regularization-in-machine-learning-connecting-the-dots-c6e030bfaddd

Similarly to Ridge regression, LASSO (Least Absolute Shrinkage and Selection Operator) regression gives the model additional information, in the form of a constraint on the coefficients, in order to make it more robust. In the case of LASSO, we impose the following constraint on the estimates (our $\beta$ parameters).

<br><center>
    \begin{align}
||\beta||_1 = \sum_j|\beta_j| \leq t
    \end{align}
</center></br> In penalized form the bound $t$ is replaced by a regularization parameter $\lambda$, and putting it together gives

<br><center>
    \begin{align}
    S(\beta) &= ||y - X\beta||_2^2 + \lambda||\beta||_1\\
    \hat{\beta} &= \arg\min_{\beta} S(\beta)
    \end{align}
</center></br>

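Unlike Ridge, this minimization has no closed-form solution in general, because the $|\beta_j|$ terms are not differentiable at zero, so an iterative optimization scheme (for example coordinate descent, which scikit-learn's `Lasso` uses) must be employed.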
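
To make the "optimization scheme" concrete, here is a minimal sketch of coordinate descent with soft-thresholding for the objective $S(\beta)$ written above. It is an illustration and not scikit-learn's implementation: the function names `soft_threshold` and `lasso_coordinate_descent`, the fixed iteration count, the synthetic data and the absence of an intercept are all assumptions made for this example.

```python
import numpy as np

def soft_threshold(rho, threshold):
    # Shrink rho towards zero by `threshold`; values inside [-threshold, threshold] become 0.
    return np.sign(rho) * np.maximum(np.abs(rho) - threshold, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    # Minimizes ||y - X beta||^2 + lam * sum(|beta_j|) one coordinate at a time.
    # Assumes no intercept and no all-zero columns in X (hypothetical helper).
    n_samples, n_features = X.shape
    beta = np.zeros(n_features)
    col_sq_norms = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(n_features):
            # Residual with feature j's current contribution added back in.
            residual_without_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ residual_without_j
            # Exact minimizer of the objective in beta_j with the other coordinates fixed.
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq_norms[j]
    return beta

def lasso_objective(X, y, beta, lam):
    return np.sum((y - X @ beta) ** 2) + lam * np.sum(np.abs(beta))

# Synthetic data (made up for this sketch): only two of five features matter.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 5))
true_beta = np.array([3.0, 0.0, -2.0, 0.0, 0.0])
y_demo = X_demo @ true_beta + 0.1 * rng.normal(size=100)

beta_hat = lasso_coordinate_descent(X_demo, y_demo, lam=20.0)
print(beta_hat)
```

Each coordinate update has a closed form even though the full problem does not, which is why this scheme is a common choice for LASSO. With `lam=0` the loop reduces to ordinary least squares solved one coordinate at a time.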

Also unlike Ridge, LASSO can produce sparse solutions (i.e. some coefficient estimates exactly equal to zero), so it doubles as a feature selector.
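
One way to see the sparsity is to fit LASSO and Ridge on all ten diabetes features and count the coefficients that are exactly zero. This is a small sketch, not part of the analysis above: the variable names and the choice `alpha=1.0` are assumptions for illustration. Note also that scikit-learn's `Lasso` minimizes $\frac{1}{2n}||y - X\beta||_2^2 + \alpha||\beta||_1$, so its `alpha` matches $\lambda$ only up to a scaling.

```python
import numpy as np
from sklearn import datasets, linear_model

# Use every feature this time, so there is something to select from.
diabetes_full = datasets.load_diabetes()
X_full, y_full = diabetes_full.data, diabetes_full.target

# alpha=1.0 is an arbitrary illustrative value, not a tuned one.
lasso_all = linear_model.Lasso(alpha=1.0).fit(X_full, y_full)
ridge_all = linear_model.Ridge(alpha=1.0).fit(X_full, y_full)

n_zero_lasso = int(np.sum(lasso_all.coef_ == 0))
n_zero_ridge = int(np.sum(ridge_all.coef_ == 0))

print("LASSO coefficients:", np.round(lasso_all.coef_, 1))
print("Ridge coefficients:", np.round(ridge_all.coef_, 1))
print("exact zeros -> LASSO:", n_zero_lasso, "Ridge:", n_zero_ridge)
```

Ridge shrinks coefficients towards zero without setting them to zero, whereas the features whose LASSO coefficient is exactly zero are effectively dropped from the model.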