In [1]:
%matplotlib inline

## [LARS (Least Angle Regression)](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lars.html#sklearn.linear_model.Lars)
- Good option for high-dimensional data.
- Similar to forward stepwise regression: at each step it finds the feature most correlated with the target. When several features are equally correlated, it proceeds in a direction equiangular between them instead of continuing along a single feature.
- Numerically efficient when n_features >> n_samples.
- Same order of complexity as ordinary least squares (OLS).
- Returns a full piecewise linear solution path, which is useful for cross-validation.
- Because it iteratively refits the residuals, it may be especially sensitive to noise.

In [2]:
from sklearn import linear_model
reg = linear_model.Lars(n_nonzero_coefs=1)
reg.fit(
    [[-1, 1], [0, 0], [1, 1]], 
    [-1.1111,  0,     -1.1111])

print(reg.coef_)

[ 0.     -1.1111]
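The selection rule described in the list above can be checked with a small sketch. It assumes synthetic data (not part of this notebook) and calls `lars_path` directly, which works on `X` and `y` as given, so the first feature to enter the active set should be the one maximizing $|X^T y|$. The variable names are illustrative only.

```python
import numpy as np
from sklearn.linear_model import lars_path

# Synthetic data: an illustrative assumption, not taken from the notebook.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = 3.0 * X[:, 2] - 2.0 * X[:, 0] + 0.1 * rng.normal(size=50)

# Absolute correlation (inner product) of each feature with the target.
corr = np.abs(X.T @ y)
first_by_corr = int(np.argmax(corr))

# method="lar" runs plain least angle regression (no Lasso modification).
alphas, active, coefs = lars_path(X, y, method="lar")

print("most correlated feature:", first_by_corr)
print("first feature entered by LARS:", active[0])
# One column per point on the piecewise linear path; the first is all zeros.
print("path shape (n_features, n_path_points):", coefs.shape)
```

Each column of `coefs` is one breakpoint of the piecewise linear path, so the full path is available from a single call.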


## [Lasso Lars](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LassoLars.html#sklearn.linear_model.LassoLars)
- Lasso model fitted with the LARS algorithm.
- Instead of including whole features at each step, the estimated coefficients are increased in a direction equiangular to each feature's correlation with the residual.
- Returns a curve representing the solution for each value of the $\ell_1$ norm of the coefficient vector. The full path is stored in ```coef_path_```, an array of size (n_features, max_features+1) whose first column is always zero.

In [3]:
from sklearn import linear_model
reg = linear_model.LassoLars(alpha=.1)
reg.fit([[0, 0, 0], [1, 2, 3]], 
        [0, 1])

print(reg.coef_)

[0.         0.         0.23905243]
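To make the `coef_path_` attribute described above more concrete, the following sketch fits `LassoLars` on synthetic data (an assumption for illustration, not part of this notebook) and inspects the stored path. The `alpha` value and the true coefficients are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LassoLars

# Synthetic regression problem: illustrative assumption only.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))
y = X @ np.array([1.5, 0.0, -2.0, 0.0]) + 0.1 * rng.normal(size=40)

reg = LassoLars(alpha=0.01).fit(X, y)

# One row per feature, one column per point on the path.
path = reg.coef_path_
print("path shape:", path.shape)
print("first column (start of the path):", path[:, 0])
# L1 norm of the coefficient vector at each point on the path.
print("L1 norm along the path:", np.abs(path).sum(axis=0))
```

Plotting the rows of `path` against the printed L1 norms should give the kind of curve the list above refers to.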


Example: compute the Lasso path along the regularization parameter using the LARS algorithm on the diabetes dataset. Each color represents a different feature of the coefficient vector.

In [4]:
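import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model

X, y = datasets.load_diabetes(return_X_y=True)

# Compute the full Lasso path with the LARS algorithm.
_, _, coefs = linear_model.lars_path(
    X, y, method='lasso', verbose=True)

# x-axis: L1 norm of the coefficients at each step, scaled to [0, 1].
xx = np.sum(np.abs(coefs.T), axis=1)
xx /= xx[-1]

plt.plot(xx, coefs.T)
ymin, ymax = plt.ylim()
# Dashed vertical lines mark the breakpoints of the piecewise linear path.
plt.vlines(xx, ymin, ymax, linestyle='dashed')
plt.xlabel('|coef| / max|coef|')
plt.ylabel('Coefficients')
plt.title('LASSO Path')
plt.axis('tight')
plt.show()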
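
Since the full path comes out of a single fit, it lends itself to cross-validation over the regularization strength. A minimal sketch, assuming `LassoLarsCV` with 5 folds (the fold count is an arbitrary choice, not something this notebook prescribes):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoLarsCV

X, y = load_diabetes(return_X_y=True)

# Cross-validate along the LARS-computed Lasso path; cv=5 is an assumption.
model = LassoLarsCV(cv=5).fit(X, y)

print("selected alpha:", model.alpha_)
print("nonzero coefficients:", int((model.coef_ != 0).sum()))
```

The selected `alpha_` corresponds to one point on the path plotted above; other fold counts may select a different value.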