In [9]:
from sklearn import linear_model

reg = linear_model.LinearRegression()

X = [[0, 0], 
     [1, 1], 
     [2, 2]]

y = [0, 1, 2]

reg.fit(X, y)

In [10]:
reg.coef_

array([0.5, 0.5])

### Ridge regression and classification

Ridge regression imposes a penalty on the size of the coefficients. The complexity parameter $\alpha \geq 0$ controls the amount of shrinkage: the larger $\alpha$ is, the more the coefficients shrink toward zero.
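For reference, the penalized least-squares objective that ridge regression minimizes can be written as below. The notation is an assumption made here, since the text does not define it: $X$ is the data matrix, $y$ the target vector, and $w$ the coefficient vector.

$$\min_{w} \; \|Xw - y\|_2^2 + \alpha \|w\|_2^2$$

With $\alpha = 0$ this reduces to ordinary least squares. As a worked check on the toy data `X`, `y` defined earlier: once the intercept is accounted for by centering, both columns of `X` become $x = (-1, 0, 1)$ and the target becomes $(-1, 0, 1)$. By symmetry $w_1 = w_2 = w$, and setting the gradient to zero gives

$$(2 + \alpha)\,w + 2\,w = 2 \quad\Longrightarrow\quad w = \frac{2}{4 + \alpha},$$

so $w \approx 0.444$ for $\alpha = 0.5$ and $w \approx 0.417$ for $\alpha = 0.8$.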

In [16]:
reg = linear_model.Ridge(alpha=.5)
reg.fit(X, y)
print(reg.coef_)
print(reg.intercept_)

[0.44444444 0.44444444]
0.11111111111111116


In [17]:
reg = linear_model.Ridge(alpha=.8)
reg.fit(X, y)
print(reg.coef_)
print(reg.intercept_)

[0.41666667 0.41666667]
0.16666666666666674


The `RidgeClassifier` converts binary targets to `{-1, 1}` and treats the problem as a regression task, optimizing the same objective as regression. The predicted class corresponds to the sign of the regressor's prediction.
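The sign rule described above can be checked directly. The following is a minimal sketch on a made-up one-feature dataset (the arrays `X_clf` and `y_clf` are invented for illustration); it compares the labels obtained from the sign of `decision_function` with those returned by `predict`.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier

# Toy one-feature data, invented for this illustration.
X_clf = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [3.0]])
y_clf = np.array([0, 0, 0, 1, 1, 1])

clf = RidgeClassifier(alpha=1.0).fit(X_clf, y_clf)

# Signed regression output: one value per sample in the binary case.
scores = clf.decision_function(X_clf)

# Map the sign of each score back to a class label:
# negative -> clf.classes_[0], positive -> clf.classes_[1].
manual_pred = clf.classes_[(scores > 0).astype(int)]

print(scores)
print(manual_pred)
print(clf.predict(X_clf))
```

In the binary case the last two printed arrays should agree, since `predict` applies the same sign test to the scores.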

The `RidgeClassifier` can be significantly faster than `LogisticRegression` when the number of classes is high, because it can compute the projection matrix only once, whatever the number of classes.

`RidgeCV` implements ridge regression with built-in cross-validation of the alpha parameter.

In [18]:
import numpy as np
reg = linear_model.RidgeCV(alphas=np.logspace(-6, 6, 13))
reg.fit(X, y)

In [19]:
reg.alpha_

1e-06

### Lasso

The `Lasso` is a linear model that estimates sparse coefficients. It prefers solutions with fewer non-zero coefficients, effectively reducing the number of features upon which the given solution is dependent. For this reason, Lasso and its variants are fundamental to the field of compressed sensing.

In [20]:
reg = linear_model.Lasso(alpha=0.1)
reg.fit(X, y)

In [21]:
reg.predict([[1, 1]])

array([1.])
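To make the sparsity claim concrete, here is a small sketch that compares the number of non-zero coefficients found by ordinary least squares and by `Lasso`. The synthetic data is an assumption made for illustration: ten random features, of which only the first two influence the target.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data (assumed for illustration): 10 features, only 2 are informative.
rng = np.random.default_rng(0)
X_sparse = rng.normal(size=(100, 10))
y_sparse = (
    3.0 * X_sparse[:, 0]
    - 2.0 * X_sparse[:, 1]
    + 0.1 * rng.normal(size=100)
)

ols = LinearRegression().fit(X_sparse, y_sparse)
lasso_sparse = Lasso(alpha=0.1).fit(X_sparse, y_sparse)

# Count how many coefficients each model keeps away from exactly zero.
n_nonzero_ols = np.count_nonzero(ols.coef_)
n_nonzero_lasso = np.count_nonzero(lasso_sparse.coef_)

print("non-zero coefficients (OLS):  ", n_nonzero_ols)
print("non-zero coefficients (Lasso):", n_nonzero_lasso)
print(lasso_sparse.coef_)
```

Least squares typically assigns a small non-zero weight to every feature, whereas the L1 penalty drives the weights of uninformative features to exactly zero.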

### Elastic-Net

`ElasticNet` is a linear regression model trained with both L1 and L2-norm regularization of the coefficients.

Elastic-net is useful when there are multiple features that are correlated with one another. Lasso is likely to pick one of these at random, while elastic-net is likely to pick both.
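The difference can be seen on an extreme case: two identical columns, as in the toy `X` used earlier in this notebook. The sketch below is illustrative only; the variable names and the choice `alpha=0.1`, `l1_ratio=0.5` are assumptions, not recommendations.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# Two perfectly correlated (identical) features.
X_dup = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
y_dup = np.array([0.0, 1.0, 2.0])

lasso_dup = Lasso(alpha=0.1).fit(X_dup, y_dup)
enet_dup = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X_dup, y_dup)

# Lasso tends to put the whole weight on one of the two columns,
# while the L2 part of elastic-net spreads it across both.
print("Lasso coefficients:      ", lasso_dup.coef_)
print("Elastic-net coefficients:", enet_dup.coef_)
```

With the default cyclic coordinate descent, which feature Lasso keeps here depends on column order rather than on chance; with noisy real data the choice between correlated features is much less predictable.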

### Logistic regression

Despite its name, logistic regression is implemented in scikit-learn as a linear model for classification rather than regression, in the `LogisticRegression` class.

It can fit binary, One-vs-Rest, or multinomial logistic regression with optional L1, L2 or Elastic-net regularization.

The numerical output of the logistic regression, which is the predicted probability, can be used as a classifier by applying a threshold (by default 0.5) to it.

It expects a categorical target, making the Logistic Regression a classifier.
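The thresholding step described above can be sketched as follows. The dataset `X_bin`, `y_bin` is made up for illustration, and the alternative 0.3 threshold is an arbitrary choice used only to show that the cut-off can be moved.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary data, invented for this illustration (classes overlap slightly).
X_bin = np.array([[-2.0], [-1.0], [-0.5], [0.3], [1.0], [3.0]])
y_bin = np.array([0, 0, 1, 0, 1, 1])

clf = LogisticRegression().fit(X_bin, y_bin)

# Predicted probability of the positive class (second column).
proba_pos = clf.predict_proba(X_bin)[:, 1]

# Applying the default 0.5 threshold by hand.
thresholded = clf.classes_[(proba_pos > 0.5).astype(int)]

# A lower, hypothetical threshold labels at least as many samples positive.
lenient = clf.classes_[(proba_pos > 0.3).astype(int)]

print(proba_pos)
print(thresholded)
print(clf.predict(X_bin))
print(lenient)
```

The hand-thresholded labels should match `clf.predict(X_bin)`, which is what makes the probabilistic output usable as a classifier.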

#### Binary Case

Once fitted, the `predict_proba` method of `LogisticRegression` returns one probability per class for each sample; in the binary case, the second column is the predicted probability of the positive class.
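As a rough check of where that probability comes from, the sketch below recomputes it by passing the fitted linear score through the logistic (sigmoid) function. It assumes the default settings of `LogisticRegression` on a binary problem; the data and variable names are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy binary data, invented for this illustration.
X_bin = np.array([[-2.0], [-1.0], [-0.5], [0.3], [1.0], [3.0]])
y_bin = np.array([0, 0, 1, 0, 1, 1])

clf = LogisticRegression().fit(X_bin, y_bin)

# Linear score w . x + b for each sample.
z = X_bin @ clf.coef_.ravel() + clf.intercept_[0]

# Logistic function applied to the score.
sigmoid = 1.0 / (1.0 + np.exp(-z))

# One column per class, ordered as in clf.classes_.
proba = clf.predict_proba(X_bin)

print(proba)
print(sigmoid)
```

The second column of `proba` should match `sigmoid`, and each row of `proba` sums to one.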

Scikit-learn provides four choices for the regularization term via the `penalty` argument: `None`, `L1`, `L2`, and `ElasticNet`.