# sklearn 01 :: supervised learning

## 1.1 Generalized linear models

### 1.1.1 Ordinary least squares (OLS)
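OLS fits a linear model by minimizing the residual sum of squares between the observed responses in the dataset and the responses predicted by the linear approximation. The estimates are sensitive to multicollinearity: when features are strongly correlated, the design matrix is close to singular and the coefficients become very sensitive to random errors in the observed responses.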

In [9]:
from sklearn import linear_model
clf = linear_model.LinearRegression()
clf.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
print('Coefficients: {} \nIntercept: {}'.format(clf.coef_, clf.intercept_))

Coefficients: [ 0.5  0.5] 
Intercept: 2.220446049250313e-16
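
The toy data above happens to be perfectly collinear (both columns are identical), which makes the multicollinearity risk easy to see. The following is a minimal sketch, not part of the original notebook: it checks the rank of the design matrix and shows that very different coefficient vectors reproduce the targets equally well. The names `X`, `y`, `alt_coefs` and `model` are chosen here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same toy data as the cell above: the two feature columns are identical.
X = np.array([[0, 0], [1, 1], [2, 2]], dtype=float)
y = np.array([0, 1, 2], dtype=float)

# Two columns but rank 1, so the least-squares solution is not unique.
rank = np.linalg.matrix_rank(X)
print('Rank of X:', rank)

# Any coefficients that sum to 1 fit the targets exactly.
alt_coefs = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([3.0, -2.0])]
for w in alt_coefs:
    print(w, '->', X @ w)

# LinearRegression returns one of these solutions (the weights sum to 1).
model = LinearRegression().fit(X, y)
coef_sum = model.coef_.sum()
pred = model.predict(X)
print('Fitted coefficients:', model.coef_)
```

With real, noisy data the columns are rarely exactly identical, but near-collinearity has a similar effect: small changes in `y` can move the fitted coefficients a lot.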


### 1.1.2 Ridge regression
Ridge regression addresses some of the collinearity problems of OLS by penalizing the size of the coefficients: it minimizes a penalized residual sum of squares. The alpha parameter (alpha >= 0) controls the amount of shrinkage: the larger the value, the stronger the shrinkage and the more robust the coefficients are to collinearity.

In [10]:
from sklearn import linear_model
clf = linear_model.Ridge(alpha=.5)
clf.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
print('Coefficients: {} \nIntercept: {}'.format(clf.coef_, clf.intercept_))

Coefficients: [ 0.34545455  0.34545455] 
Intercept: 0.1363636363636364
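
To make the effect of alpha concrete, here is a small sketch (an illustration added to these notes, reusing the toy data from the cell above) that refits `Ridge` for a few alpha values and records the Euclidean norm of the coefficients. The particular alpha values are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import Ridge

X = [[0, 0], [0, 0], [1, 1]]
y = [0, 0.1, 1]

alphas = [0.01, 0.5, 10.0]  # arbitrary values, from weak to strong shrinkage
norms = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X, y)
    norms.append(float(np.linalg.norm(model.coef_)))
    print('alpha={:<5} coef={} norm={:.4f}'.format(alpha, model.coef_, norms[-1]))
```

The coefficient norm should shrink toward zero as alpha grows.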


RidgeCV implements ridge regression with built-in cross-validation of the alpha parameter. After fitting, the selected value is available as `alpha_`.

In [12]:
from sklearn import linear_model
clf = linear_model.RidgeCV(alphas=[0.1, 1.0, 10.0])
clf.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
clf.alpha_

0.10000000000000001

### 1.1.3 Lasso
The Lasso is a linear model that estimates sparse coefficients. The alpha parameter controls the degree of sparsity of the estimated coefficients. Alpha can be chosen by cross-validation with `LassoCV` or `LassoLarsCV` (Least Angle Regression + cross-validation). Alternatively, `LassoLarsIC` selects alpha with the Akaike information criterion (AIC) or the Bayesian information criterion (BIC), which is faster than cross-validation but less stable.

The MultiTaskLasso is a linear model that estimates sparse coefficients for multiple regression problems jointly: y is a 2D array, of shape (n_samples, n_tasks). The constraint is that the selected features are the same for all the regression problems, also called tasks.
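
The shared-feature constraint can be checked on synthetic data. This is a hedged sketch rather than part of the original notebook: the data, the sizes, the seed and `alpha=0.1` are all assumptions made for illustration. Only the first two features carry signal, for every task.

```python
import numpy as np
from sklearn.linear_model import MultiTaskLasso

rng = np.random.RandomState(0)
n_samples, n_features, n_tasks = 50, 6, 3

X = rng.randn(n_samples, n_features)

# Synthetic ground truth: only features 0 and 1 matter, in all tasks.
true_coef = np.zeros((n_tasks, n_features))
true_coef[:, :2] = rng.uniform(1.0, 2.0, size=(n_tasks, 2))
Y = X @ true_coef.T + 0.01 * rng.randn(n_samples, n_tasks)  # (n_samples, n_tasks)

model = MultiTaskLasso(alpha=0.1).fit(X, Y)

# coef_ has shape (n_tasks, n_features); a feature is kept or dropped
# for all tasks at once, so every row has the same nonzero pattern.
support = model.coef_ != 0
print(support)
```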

In [13]:
from sklearn import linear_model
clf = linear_model.Lasso(alpha = 0.1)
clf.fit([[0, 0], [1, 1]], [0, 1])
clf.predict([[1, 1]])

array([ 0.8])
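
As an illustration of how alpha drives sparsity, and of the model-selection helpers mentioned above, the sketch below fits `Lasso` for a few alpha values on synthetic data and then lets `LassoCV` and `LassoLarsIC` choose alpha. The data, seed, alpha grid and `cv=5` are assumptions made for this example, not taken from the notebook.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV, LassoLarsIC

rng = np.random.RandomState(0)
X = rng.randn(100, 10)
# Synthetic ground truth: only the first three features are relevant.
true_coef = np.array([3.0, -2.0, 1.0] + [0.0] * 7)
y = X @ true_coef + 0.1 * rng.randn(100)

# Count the nonzero coefficients for increasing alpha.
nonzero_counts = []
for alpha in [0.01, 0.5, 5.0]:
    model = Lasso(alpha=alpha).fit(X, y)
    nonzero_counts.append(int(np.sum(model.coef_ != 0)))
    print('alpha={:<5} nonzero coefficients: {}'.format(alpha, nonzero_counts[-1]))

# Let the estimators pick alpha themselves.
cv_model = LassoCV(cv=5).fit(X, y)
bic_model = LassoLarsIC(criterion='bic').fit(X, y)
print('LassoCV alpha:', cv_model.alpha_)
print('LassoLarsIC (BIC) alpha:', bic_model.alpha_)
```

A large enough alpha drives every coefficient to zero, while a small alpha keeps at least the truly relevant features.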

### 1.1.4 ElasticNet
ElasticNet is a linear regression model trained with both L1-norm and L2-norm regularization of the coefficients. This combination allows for learning a sparse model where few of the weights are non-zero, like Lasso, while still maintaining the regularization properties of Ridge.

ElasticNet is useful when there are multiple features that are correlated with one another. Lasso is likely to pick one of these at random, while ElasticNet is likely to pick both.
A practical advantage of trading off between Lasso and Ridge is that it allows ElasticNet to inherit some of Ridge's stability under rotation.
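
The behaviour with correlated features can be explored with a small experiment. The sketch below is an illustration under assumed settings (synthetic data, `alpha=0.1`, `l1_ratio=0.5`, fixed seed): two features are near-duplicates of the same signal and a third is irrelevant.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.RandomState(0)
n = 200
z = rng.randn(n)

# Columns 0 and 1 are almost identical copies of z; column 2 is pure noise.
X = np.column_stack([
    z + 0.01 * rng.randn(n),
    z + 0.01 * rng.randn(n),
    rng.randn(n),
])
y = 2 * z + 0.1 * rng.randn(n)

lasso = Lasso(alpha=0.1).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)

print('Lasso coefficients:     ', lasso.coef_)
print('ElasticNet coefficients:', enet.coef_)
```

In this setup ElasticNet should give the two correlated features similar, clearly nonzero weights, because the L2 part of the penalty favours splitting the weight evenly. How Lasso divides the weight between them is not determined by the penalty and can depend on solver details, so compare the printed Lasso coefficients rather than assuming a particular split.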

### 1.1.6 Least Angle Regression
Least-angle regression (LARS) is a regression algorithm for high-dimensional data (when n_features >> n_samples). It is computationally efficient, but sensitive to noise. `LassoLars` is a Lasso model fitted with the LARS algorithm.
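
A minimal sketch of both estimators in the n_features >> n_samples setting follows. The synthetic data, the seed, `n_nonzero_coefs=3` and `alpha=0.1` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Lars, LassoLars

rng = np.random.RandomState(0)
n_samples, n_features = 20, 100  # far more features than samples

X = rng.randn(n_samples, n_features)
# Synthetic ground truth: only three features are relevant.
true_coef = np.zeros(n_features)
true_coef[:3] = [4.0, -3.0, 2.0]
y = X @ true_coef + 0.01 * rng.randn(n_samples)

# LARS stopped after at most three features have entered the model.
lars = Lars(n_nonzero_coefs=3).fit(X, y)
lars_nonzero = int(np.count_nonzero(lars.coef_))
print('Lars nonzero coefficients:', lars_nonzero)

# Lasso objective, solved with the LARS algorithm.
lasso_lars = LassoLars(alpha=0.1).fit(X, y)
lasso_lars_nonzero = int(np.count_nonzero(lasso_lars.coef_))
print('LassoLars nonzero coefficients:', lasso_lars_nonzero)
```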