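Throughout the module, the target value is modeled as a linear combination of the features, $\hat{y}(w, x) = w_0 + w_1 x_1 + \dots + w_p x_p$. We designate the vector $w = (w_1, \dots, w_p)$ as `coef_` and $w_0$ as `intercept_`.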
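As a quick check of this convention, a fitted model's predictions can be rebuilt by hand from `coef_` and `intercept_`. The minimal sketch below uses `LinearRegression` and the same toy data as the first example that follows; the variable name `manual` is only illustrative.

```python
import numpy as np
from sklearn import linear_model

X = np.array([[0, 0], [1, 1], [2, 2]], dtype=float)
y = np.array([0, 1, 2], dtype=float)

reg = linear_model.LinearRegression().fit(X, y)

# Rebuild y_hat = w_0 + w_1 x_1 + ... + w_p x_p from the fitted attributes
manual = X @ reg.coef_ + reg.intercept_

print(np.allclose(manual, reg.predict(X)))  # True
```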

# Ordinary Least Squares

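```python
import warnings

from sklearn import linear_model

# Silence library warnings to keep the notebook output short
warnings.filterwarnings('ignore')

# Fit OLS on three points whose two features are identical
reg = linear_model.LinearRegression()
reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
```

```
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
```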

```python
reg.coef_
```

```
array([0.5, 0.5])
```
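The toy data above is an extreme case of correlated features: the two columns are identical, so every $w$ with $w_1 + w_2 = 1$ fits the points perfectly. The NumPy sketch below (an illustration, not a claim about `LinearRegression`'s exact internals) centers the data to remove the intercept and calls `np.linalg.lstsq`, which returns the minimum-norm solution among all least-squares minimizers; that is why the weight is split evenly.

```python
import numpy as np

X = np.array([[0, 0], [1, 1], [2, 2]], dtype=float)
y = np.array([0, 1, 2], dtype=float)

# Center X and y so the intercept drops out of the least-squares problem
x_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - x_mean, y - y_mean

# lstsq minimizes ||Xc w - yc||_2^2; with identical columns the problem is
# rank-deficient, and lstsq picks the minimum-norm minimizer
w, residuals, rank, sing_vals = np.linalg.lstsq(Xc, yc, rcond=None)
intercept = y_mean - x_mean @ w

print(w)          # approximately [0.5 0.5]
print(intercept)  # approximately 0.0
print(rank)       # 1
```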

Ordinary Least Squares fits $w$ by minimizing the residual sum of squares, $\min_w \|Xw - y\|_2^2$. Its coefficient estimates, however, rely on the independence of the features. When features are **correlated** and the columns of the design matrix $X$ have an approximate linear dependence, $X^T X$ becomes close to **singular**. As a result, the least-squares estimate becomes highly sensitive to random errors in the observed response, producing a large variance.
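This sensitivity can be seen on a tiny, hand-checkable example (the data is made up for illustration). The two columns of `X` differ by at most `1e-4`, and each response vector is fit exactly by some $w$. Nudging two responses by `1e-4` moves the OLS coefficients from $(1, 1)$ to $(0, 2)$. `fit_intercept=False` is used only to keep the arithmetic easy to verify by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Two nearly collinear columns: the second differs from the first by at most 1e-4
X = np.array([[1.0, 1.0],
              [1.0, 1.0001],
              [2.0, 2.0001]])

y_a = np.array([2.0, 2.0001, 4.0001])  # fit exactly by w = (1, 1)
y_b = np.array([2.0, 2.0002, 4.0002])  # last two responses nudged by 1e-4; fit exactly by w = (0, 2)

ols = LinearRegression(fit_intercept=False)
coef_a = ols.fit(X, y_a).coef_.copy()
coef_b = ols.fit(X, y_b).coef_.copy()

print(coef_a)  # approximately [1. 1.]
print(coef_b)  # approximately [0. 2.]

# A large condition number signals the near-singularity behind this behavior
print(np.linalg.cond(X))
```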

# Ridge Regression

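Ridge regression addresses some of the problems of Ordinary Least Squares by imposing a penalty on the size of the coefficients. It minimizes a penalized residual sum of squares, where the complexity parameter $\alpha \ge 0$ controls the amount of shrinkage: the larger $\alpha$, the more strongly the coefficients are shrunk toward zero.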

$$
\min_w \|Xw - y\|_2^2 + \alpha \|w\|_2^2
$$
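A short derivation, assuming $X$ and $y$ are centered (or the model has no intercept), shows why the penalty helps with the near-singular $X^T X$ discussed for Ordinary Least Squares. Setting the gradient of the objective to zero gives

$$
2X^T(Xw - y) + 2\alpha w = 0 \quad\Longrightarrow\quad w = (X^T X + \alpha I)^{-1} X^T y
$$

Every eigenvalue of $X^T X$ is non-negative, so every eigenvalue of $X^T X + \alpha I$ is at least $\alpha$. For $\alpha > 0$ the matrix is therefore positive definite and invertible even when $X^T X$ is singular, which makes the solution unique and less sensitive to small perturbations of $y$.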

```python
reg = linear_model.Ridge(alpha=0.5)
reg.fit([[0, 0], [0, 0], [1, 1]], [0, 0.1, 1])
```

```
Ridge(alpha=0.5, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)
```

```python
reg.coef_
```

```
array([0.34545455, 0.34545455])
```

```python
reg.intercept_
```

```
0.1363636363636364
```
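The fitted values above can be reproduced with the closed-form ridge solution $w = (X^T X + \alpha I)^{-1} X^T y$ applied to centered data, with the intercept recovered afterwards and left unpenalized. This is a minimal NumPy sketch of that formula, not `Ridge`'s actual solver; note that `Xc.T @ Xc` alone is singular here (both columns are identical), and adding `alpha * I` is what makes the system solvable.

```python
import numpy as np

X = np.array([[0, 0], [0, 0], [1, 1]], dtype=float)
y = np.array([0, 0.1, 1])
alpha = 0.5

# Center the data; the intercept is fitted separately and is not penalized
x_mean, y_mean = X.mean(axis=0), y.mean()
Xc, yc = X - x_mean, y - y_mean

# Solve (Xc^T Xc + alpha * I) w = Xc^T yc instead of forming an explicit inverse
p = X.shape[1]
w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(p), Xc.T @ yc)
intercept = y_mean - x_mean @ w

print(w)          # approximately [0.34545455 0.34545455]
print(intercept)  # approximately 0.13636364
```

Working the same steps by hand gives $w_1 = w_2 = 19/55$ and an intercept of $3/22$, matching `coef_` and `intercept_`.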

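`RidgeCV` implements ridge regression with built-in cross-validation of the alpha parameter. By default it uses an efficient form of leave-one-out cross-validation, and the selected value is stored in `alpha_`.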

```python
reg = linear_model.RidgeCV(alphas=[0.1, 1.0, 10.0])
reg.fit([[0, 0], [0, 0], [1, 1]], [0, 0.1, 1])
```

```
RidgeCV(alphas=[0.1, 1.0, 10.0], cv=None, fit_intercept=True, gcv_mode=None,
    normalize=False, scoring=None, store_cv_values=False)
```

```python
reg.alpha_
```

```
0.1
```
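To make the selection concrete, here is a brute-force leave-one-out sketch that refits `Ridge` once per held-out sample and keeps the alpha with the lowest mean squared error. `RidgeCV` avoids this refitting internally, so this is an illustration of the criterion rather than its implementation; on this toy data it selects the same alpha as above.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneOut, cross_val_score

X = np.array([[0, 0], [0, 0], [1, 1]], dtype=float)
y = np.array([0, 0.1, 1])
alphas = [0.1, 1.0, 10.0]

# Mean leave-one-out squared error for each candidate alpha
loo_mse = {}
for alpha in alphas:
    scores = cross_val_score(Ridge(alpha=alpha), X, y,
                             cv=LeaveOneOut(), scoring='neg_mean_squared_error')
    loo_mse[alpha] = -scores.mean()

best_alpha = min(loo_mse, key=loo_mse.get)
print(loo_mse)
print(best_alpha)  # 0.1
```

Larger alphas shrink the slope more, which worsens the predictions for the two held-out points at `[0, 0]`, so the smallest candidate wins here.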

# Lasso

The Lasso is a linear model that estimates sparse coefficients. It is useful in some contexts because it tends to prefer solutions with fewer non-zero coefficients, effectively reducing the number of features on which the solution depends. It minimizes a least-squares term plus an $\ell_1$ penalty on the coefficients, weighted by a constant $\alpha \ge 0$:

$$
\min_w \frac{1}{2n_{\text{samples}}}\|Xw - y\|_2^2 + \alpha\|w\|_1
$$

```python
reg = linear_model.Lasso(alpha=0.1)
reg.fit([[0, 0], [1, 1]], [0, 1])
```

```
Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)
```

```python
reg.predict([[1, 1]])
```

```
array([0.8])
```
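Where the sparsity comes from can be seen with cyclic coordinate descent on the objective above: each coefficient update is a soft-thresholding step that sets the coefficient exactly to zero whenever its correlation with the current residual is at most $\alpha$. The helpers `soft_threshold` and `lasso_cd` below are hypothetical names for a minimal sketch (fixed iteration count, no convergence check), not scikit-learn's API. On the example data it puts all the weight on the first of the two identical features.

```python
import numpy as np


def soft_threshold(z, t):
    # S(z, t) = sign(z) * max(|z| - t, 0)
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, alpha, n_iter=100):
    # Hypothetical helper: minimizes 1/(2n) ||Xw - y||^2 + alpha ||w||_1
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Center so the unpenalized intercept can be recovered afterwards
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    col_sq = (Xc ** 2).sum(axis=0) / n
    w = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back
            r_j = yc - Xc @ w + Xc[:, j] * w[j]
            rho = Xc[:, j] @ r_j / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j] if col_sq[j] > 0 else 0.0
    intercept = y_mean - x_mean @ w
    return w, intercept


coef, intercept = lasso_cd([[0, 0], [1, 1]], [0, 1], alpha=0.1)
print(coef, intercept)                        # approximately [0.6 0. ] 0.2
print(np.array([1.0, 1.0]) @ coef + intercept)  # approximately 0.8
```

The first update gives $w_1 = 0.6$; the second feature's correlation with the remaining residual is then exactly $\alpha$, so soft-thresholding leaves $w_2 = 0$. Ridge, by contrast, splits the weight evenly between identical columns, as in the earlier `Ridge` example.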