## Generalized Linear Models

$$\hat{y}(\omega, x) = \omega_0 + \omega_1 x_1 + ... + \omega_p x_p$$

scikit-learn stores the vector $$\vec{\omega} = (\omega_1, ..., \omega_p)$$ as `coef_` and $$\omega_0$$ as `intercept_`.

### Ordinary Least Squares
$$ \min\limits_\omega || X\omega - y ||_2^2 $$

In [2]:
from sklearn import linear_model
clf = linear_model.LinearRegression()
clf.fit([[0, 0], [1,1], [2,2]], [1, 2, 3])
clf.coef_

array([ 0.5,  0.5])
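
As a cross-check on the output above, the same coefficients can be recovered with plain NumPy. This is a minimal sketch, not scikit-learn's internal code: it assumes the intercept is handled by centering the data, and `ols_fit` is a hypothetical helper name. The two columns of the toy `X` are identical, so the problem is rank-deficient; `np.linalg.lstsq` then returns the minimum-norm solution, which splits the weight evenly between the two features.

```python
import numpy as np

def ols_fit(X, y):
    """Least-squares fit with an intercept (hypothetical helper, illustration only)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center the data so the intercept can be recovered afterwards.
    X_mean, y_mean = X.mean(axis=0), y.mean()
    # lstsq minimizes ||Xc w - yc||_2 and returns the minimum-norm solution
    # when the columns of Xc are linearly dependent.
    coef, *_ = np.linalg.lstsq(X - X_mean, y - y_mean, rcond=None)
    intercept = y_mean - X_mean @ coef
    return coef, intercept

coef, intercept = ols_fit([[0, 0], [1, 1], [2, 2]], [1, 2, 3])
print(coef, intercept)  # approximately [0.5 0.5] 1.0
```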

#### Example
[Linear Regression Example](http://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html#example-linear-model-plot-ols-py)

#### Ordinary Least Squares Complexity
If X is a matrix of size (n, p), ordinary least squares has a cost of $O(np^2)$, assuming that $n \geq p$.

### Ridge Regression

$$ \min\limits_\omega || X\omega - y||_2^2 + \alpha ||\omega||_2^2 $$

where $$ \alpha \geq 0 $$ controls the amount of shrinkage: the larger the value, the greater the shrinkage.

In [3]:
from sklearn import linear_model
clf = linear_model.Ridge(alpha=.5)
clf.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1]) 

Ridge(alpha=0.5, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [4]:
clf.coef_, clf.intercept_

(array([ 0.34545455,  0.34545455]), 0.13636363636363641)
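
The numbers above can be reproduced from the ridge objective directly. The sketch below solves the penalized normal equations with NumPy; it assumes the intercept is left unpenalized and handled by centering, and `ridge_fit` is a hypothetical helper, not a scikit-learn function.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge fit with an unpenalized intercept (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    # Normal equations of the penalized problem: (Xc^T Xc + alpha * I) w = Xc^T yc
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(A, Xc.T @ yc)
    intercept = y_mean - X_mean @ coef
    return coef, intercept

coef, intercept = ridge_fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1], alpha=0.5)
print(coef, intercept)  # approximately [0.34545455 0.34545455] 0.13636364
```

Adding `alpha * I` makes the system solvable even though the two columns of `X` are identical, which is one practical reason to prefer ridge over plain least squares on collinear data.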

#### Example
- [Plot Ridge coefficients as a function of the regularization]
- [Classification of text documents using sparse features]

[Plot Ridge coefficients as a function of the regularization]: http://scikit-learn.org/stable/auto_examples/linear_model/plot_ridge_path.html#example-linear-model-plot-ridge-path-py "Plot Ridge coefficients as a function of the regularization"

[Classification of text documents using sparse features]: http://scikit-learn.org/stable/auto_examples/text/document_classification_20newsgroups.html#example-text-document-classification-20newsgroups-py "Classification of text documents using sparse features"

#### Ridge Complexity
Ridge regression has the same order of complexity as ordinary least squares.

#### Setting the regularization parameters: generalized Cross-Validation

In [7]:
from sklearn import linear_model
clf = linear_model.RidgeCV(alphas = [.1, 1, 10])
clf.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])   


RidgeCV(alphas=[0.1, 1, 10], cv=None, fit_intercept=True, gcv_mode=None,
    normalize=False, scoring=None, store_cv_values=False)

In [8]:
clf.alpha_

0.10000000000000001
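
Generalized cross-validation is an efficient form of leave-one-out cross-validation. The sketch below spells out the slow, explicit version on the same toy data: refit a ridge model with each sample held out and keep the `alpha` with the lowest mean squared error. Both `ridge_fit` and `loo_select_alpha` are hypothetical helpers written for illustration, and this is not the shortcut `RidgeCV` uses internally.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge fit with an unpenalized intercept (hypothetical helper)."""
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(A, Xc.T @ yc)
    return coef, y_mean - X_mean @ coef

def loo_select_alpha(X, y, alphas):
    """Pick alpha by explicit leave-one-out refits (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    errors = {}
    for alpha in alphas:
        sq_err = 0.0
        for i in range(n):
            keep = np.arange(n) != i          # hold out sample i
            coef, intercept = ridge_fit(X[keep], y[keep], alpha)
            pred = X[i] @ coef + intercept    # predict the held-out sample
            sq_err += (pred - y[i]) ** 2
        errors[alpha] = sq_err / n
    best_alpha = min(errors, key=errors.get)
    return best_alpha, errors

best_alpha, errors = loo_select_alpha(
    [[0, 0], [0, 0], [1, 1]], [0, .1, 1], alphas=[.1, 1, 10]
)
print(best_alpha)  # 0.1 on this toy data
```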

### Lasso

$$ \min\limits_\omega \frac{1}{2n_{samples}} || X\omega - y||_2^2 + \alpha||\omega||_1 $$

In [10]:
from sklearn import linear_model
clf = linear_model.Lasso(alpha=0.1)
clf.fit([[0,0], [1, 1]], [0,1])

Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)

In [11]:
clf.predict([[1, 1]])

array([ 0.8])
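
The Lasso objective has no closed-form solution in general, but cyclic coordinate descent with soft-thresholding minimizes it. The sketch below is a bare-bones version of that idea, assuming a centered intercept and a fixed number of sweeps instead of a tolerance check; `soft_threshold` and `lasso_fit` are hypothetical helpers, not scikit-learn's implementation.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_fit(X, y, alpha, n_iter=100):
    """Cyclic coordinate descent for the Lasso objective (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    coef = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            col_sq = Xc[:, j] @ Xc[:, j] / n
            if col_sq == 0.0:
                continue  # a constant feature carries no information
            # Residual with feature j's current contribution added back in.
            r_j = yc - Xc @ coef + Xc[:, j] * coef[j]
            rho = Xc[:, j] @ r_j / n
            coef[j] = soft_threshold(rho, alpha) / col_sq
    intercept = y_mean - X_mean @ coef
    return coef, intercept

coef, intercept = lasso_fit([[0, 0], [1, 1]], [0, 1], alpha=0.1)
prediction = np.array([1.0, 1.0]) @ coef + intercept
print(coef, prediction)  # prediction is approximately 0.8
```

Because the two features are identical, the first coordinate absorbs essentially all of the weight and the second is thresholded to (approximately) zero, which is the sparsity effect of the L1 penalty.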

#### Example
- [Lasso and Elastic Net for Sparse Signals]
- [Compressive sensing: tomography reconstruction with L1 prior (Lasso)]

[Lasso and Elastic Net for Sparse Signals]: http://scikit-learn.org/stable/auto_examples/linear_model/plot_lasso_and_elasticnet.html#example-linear-model-plot-lasso-and-elasticnet-py "Lasso and Elastic Net for Sparse Signals"

[Compressive sensing: tomography reconstruction with L1 prior (Lasso)]: http://scikit-learn.org/stable/auto_examples/applications/plot_tomography_l1_reconstruction.html#example-applications-plot-tomography-l1-reconstruction-py "Compressive sensing: tomography reconstruction with L1 prior (Lasso)"


#### Feature selection with Lasso
- [L1-based feature selection]

[L1-based feature selection]: http://scikit-learn.org/stable/modules/feature_selection.html#l1-feature-selection "L1-based feature selection"

#### Randomized sparsity
- [Randomized sparse models]

[Randomized sparse models]: http://scikit-learn.org/stable/modules/feature_selection.html#randomized-l1 "Randomized sparse models"

#### Setting regularization parameter
- $\alpha$ controls the degree of sparsity of the estimated coefficients

##### Using cross-validation
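- `LassoCV`: most often preferable for high-dimensional datasets with many collinear features
- `LassoLarsCV`: based on Least Angle Regression; explores more relevant values of $\alpha$ and is often faster than `LassoCV` when the number of samples is very small compared to the number of features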

##### Information-criteria based model selection
- [Lasso model selection: Cross-Validation / AIC / BIC]

[Lasso model selection: Cross-Validation / AIC / BIC]: http://scikit-learn.org/stable/auto_examples/linear_model/plot_lasso_model_selection.html#example-linear-model-plot-lasso-model-selection-py "Lasso model selection: Cross-Validation / AIC / BIC"




### Elastic Net
### Multi-task Lasso
### Least Angle Regression
### LARS Lasso
### Orthogonal Matching Pursuit (OMP)

### Bayesian Regression

### Logistic Regression
- logistic function
    $$ f(x) = \frac{L}{1+ e^{-k(x-x_0)}}$$
- Use the `LogisticRegression` class.
    It can fit a multiclass (one-vs-rest) logistic regression with optional L2 or L1 regularization.
- L2 penalized
    $$ \min\limits_{\omega,c}\frac{1}{2}\omega^T\omega + C\sum_{i=1}^n\log(\exp(-y_i(X_i^T\omega +c))+1) $$
- L1 penalized
    $$ \min\limits_{\omega, c} ||\omega||_1 + C\sum_{i=1}^n\log(\exp(-y_i(X_i^T\omega+c))+1) $$
- Choosing a solver
    - Small dataset or L1 penalty: `liblinear`
    - Multinomial loss: `lbfgs` or `newton-cg`
    - Large dataset: `sag`    
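
To make the formulas in this list concrete, the sketch below evaluates the logistic function and the two penalized objectives with NumPy. It assumes labels coded as -1 and +1, which is what the `-y_i(...)` term in the objectives requires; `logistic` and `logistic_objective` are hypothetical helper names, and the code only evaluates the objective at a given point, it does not minimize it.

```python
import numpy as np

def logistic(x, L=1.0, k=1.0, x0=0.0):
    """Logistic function L / (1 + exp(-k * (x - x0)))."""
    return L / (1.0 + np.exp(-k * (x - x0)))

def logistic_objective(w, c, X, y, C=1.0, penalty="l2"):
    """Penalized logistic regression objective (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)   # labels assumed to be -1 or +1
    w = np.asarray(w, dtype=float)
    margins = y * (X @ w + c)
    # log(exp(-m) + 1), computed in a numerically stable way
    loss = np.logaddexp(0.0, -margins).sum()
    if penalty == "l2":
        reg = 0.5 * (w @ w)
    elif penalty == "l1":
        reg = np.abs(w).sum()
    else:
        raise ValueError("penalty must be 'l1' or 'l2'")
    return reg + C * loss

X = [[0, 0], [1, 1], [2, 2]]
y = [-1, 1, 1]
obj_l2 = logistic_objective([1.0, -1.0], 0.0, X, y, penalty="l2")
obj_l1 = logistic_objective([1.0, -1.0], 0.0, X, y, penalty="l1")
print(logistic(0.0), obj_l2, obj_l1)
```

With `w = (1, -1)` and identical feature columns every margin is zero, so both objectives share the loss term `3 * log(2)` and differ only in the penalty: 1.0 for L2 versus 2.0 for L1.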

#### Example
- [L1 Penalty and Sparsity in Logistic Regression]
- [Path with L1- Logistic Regression]

[L1 Penalty and Sparsity in Logistic Regression]: http://scikit-learn.org/stable/auto_examples/linear_model/plot_logistic_l1_l2_sparsity.html#example-linear-model-plot-logistic-l1-l2-sparsity-py "L1 Penalty and Sparsity in Logistic Regression"

[Path with L1- Logistic Regression]: http://scikit-learn.org/stable/auto_examples/linear_model/plot_logistic_path.html#example-linear-model-plot-logistic-path-py "Path with L1- Logistic Regression"

#### Differences from liblinear


#### Feature selection with sparse logistic regression
- [L1-based feature selection]
- `LogisticRegressionCV`

