# Regularization
Regularization is the process of introducing additional information into a model in order to prevent overfitting. This notebook focuses on L1 and L2 regularization.

### What are L1 and L2?
L1 and L2 regularization owe their names to the L1 and L2 norms of a vector w, respectively. Here’s a primer on norms:

###### 1-norm (also known as L1 norm)
$ ||w||_1 = |w_1|+|w_2|+...+|w_N| $  

###### 2-norm (also known as L2 norm or Euclidean norm)
$ ||w||_2 = (|w_1|^2 + |w_2|^2 + ... + |w_N|^2)^\frac{1}{2} $

###### p-norm
$ ||w||_p = (|w_1|^p + |w_2|^p + ... + |w_N|^p)^\frac{1}{p} $
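The three definitions above can be checked numerically. The following is a minimal sketch using `np.linalg.norm`; the vector `w` and the choice `p = 3` are made-up values for illustration only.

```python
import numpy as np

# Made-up example vector (an assumption for illustration, not from the dataset)
w = np.array([3.0, -4.0, 12.0])

# 1-norm: sum of absolute values -> 3 + 4 + 12 = 19
l1 = np.linalg.norm(w, ord=1)

# 2-norm: square root of the sum of squares -> sqrt(9 + 16 + 144) = 13
l2 = np.linalg.norm(w, ord=2)

# p-norm for an arbitrary p, computed with NumPy and by hand from the formula
p = 3
lp = np.linalg.norm(w, ord=p)
manual_lp = np.sum(np.abs(w) ** p) ** (1 / p)

print(l1, l2, lp, manual_lp)
```

The hand-computed `manual_lp` follows the p-norm formula term by term, so it should agree with the value NumPy returns.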

A linear regression model that uses the L1 norm for regularization is called **lasso regression**, and one that uses the (squared) L2 norm is called **ridge regression**.
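For reference, the objective that scikit-learn documents for its `Lasso` class adds the 1-norm as a penalty on top of the squared-error loss. The symbols here are introduced only for this formula: $n$ is the number of samples, $X$ the predictor matrix, $y$ the outcome vector, and $\alpha \geq 0$ the regularization strength (the `alpha` parameter).

$ \frac{1}{2n} ||y - Xw||_2^2 + \alpha ||w||_1 $

A larger $\alpha$ puts more weight on the penalty term, which tends to push more entries of w to exactly zero.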

## Regularization Exercise
Perhaps unsurprisingly, sklearn has classes that help us apply regularization to linear regression. Let's practice using one of them in this exercise. The file `data/regular.csv` contains a set of data points, each with six predictor variables and one outcome variable. Use sklearn's [Lasso](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html) class to fit a linear regression model to the data, using L1 regularization to control model complexity.

#### 1. Import libraries and load the data
* The data is in the file `data/regular.csv`. Note that there's **no header row** in this file.
* Split the data so that the six predictor features (first six columns) are stored in `X`, and the outcome feature (last column) is stored in `y`.

In [1]:
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

In [2]:
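# header=None: the file has no header row, so the first line is data
train_data = pd.read_csv('data/regular.csv', header=None)
X = train_data.iloc[:, :-1]  # predictor features: every column except the last
y = train_data.iloc[:, -1]   # outcome feature: the last column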

#### 2. Fit data using linear regression with Lasso regularization
* Create an instance of sklearn's `Lasso` class and assign it to the variable `lasso_reg`. No need to set any parameter values: use the default values for this exercise.

In [3]:
lasso_reg = Lasso()

* Use the `Lasso` object's [.fit()](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html#sklearn.linear_model.Lasso.fit) method to fit the regression model to the data.

In [4]:
lasso_reg.fit(X, y)

Lasso(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=1000,
      normalize=False, positive=False, precompute=False, random_state=None,
      selection='cyclic', tol=0.0001, warm_start=False)
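The default `alpha=1.0` in the output above controls how strongly the L1 penalty is applied. The sketch below illustrates its effect on synthetic data, not on `data/regular.csv`: the names `X_demo`, `y_demo`, `weak`, and `strong`, the two `alpha` values, and the data-generating rule are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (made up for illustration): six predictors,
# but only the first two actually influence the outcome.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 6))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=100)

# A small alpha applies a light penalty; a large alpha applies a heavy one.
weak = Lasso(alpha=0.01).fit(X_demo, y_demo)
strong = Lasso(alpha=10.0).fit(X_demo, y_demo)

print("alpha=0.01:", np.count_nonzero(weak.coef_), "nonzero coefficients")
print("alpha=10.0:", np.count_nonzero(strong.coef_), "nonzero coefficients")
```

With the heavy penalty, every coefficient in this synthetic example should be driven to zero, while the light penalty keeps at least the two informative predictors. On real data, a suitable `alpha` is usually chosen by validation rather than fixed in advance.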

#### 3. Inspect the coefficients of the regression model
* Obtain the coefficients of the fitted regression model using the `coef_` attribute of the `Lasso` object. Store them in the `reg_coef` variable, then print them.

In [5]:
reg_coef = lasso_reg.coef_
print(reg_coef)

[ 0.          2.35793224  2.00441646 -0.05511954 -3.92808318  0.        ]
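Two of the printed coefficients are exactly zero, which means the lasso model ignores those predictors entirely. As a small sketch, the zeroed and retained columns can be listed programmatically; the array below is copied from the printed output so that the snippet runs on its own, and the names `dropped` and `kept` are illustrative.

```python
import numpy as np

# Coefficients copied from the printed output above
reg_coef = np.array([0.0, 2.35793224, 2.00441646, -0.05511954, -3.92808318, 0.0])

# Column indices whose coefficient is exactly zero (dropped by the model)
dropped = np.flatnonzero(reg_coef == 0)

# Column indices that still contribute to the prediction
kept = np.flatnonzero(reg_coef != 0)

print("Dropped predictor columns:", dropped.tolist())  # [0, 5]
print("Kept predictor columns:", kept.tolist())        # [1, 2, 3, 4]
```

Column 3 is retained, but its coefficient (about -0.055) is much smaller in magnitude than the others.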
