# Lasso (L1) Regression

In this notebook, we focus on Lasso Regression, a regularized form of linear regression that adds an L1 penalty to the cost function. Besides reducing overfitting, Lasso Regression can also help us with feature selection.

Just like Ridge Regression, Lasso Regression alters the cost function by adding a penalty on the size of the coefficients. Where Ridge penalizes the sum of the squared coefficients, Lasso penalizes the sum of their absolute values.

Let's recall the cost function for Linear Regression:

$$ J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2 $$

The cost function for Lasso Regression looks like this:

$$ J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda \sum_{j=1}^{n} |\theta_j| $$

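The only difference is the term $\lambda \sum_{j=1}^{n} |\theta_j|$: $\lambda$ times the sum of the absolute values of the coefficients $\theta_1, \dots, \theta_n$. Because the sum starts at $j = 1$, the intercept $\theta_0$ is not penalized. Minimizing $J(\theta)$ therefore means trading off a good fit to the training data against keeping the coefficients small.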
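To make the formula concrete, here is a minimal NumPy sketch of the Lasso cost. The function name `lasso_cost` and the `*_demo` variables are hypothetical names chosen for this illustration, and the sketch assumes a linear hypothesis with `theta[0]` as the unpenalized intercept.

```python
import numpy as np

def lasso_cost(theta, X, y, lam):
    """Lasso cost J(theta) for the linear hypothesis h(x) = theta[0] + x . theta[1:].

    theta[0] is the intercept and is left out of the penalty,
    matching the sum over j = 1..n in the formula.
    """
    m = X.shape[0]
    predictions = theta[0] + X @ theta[1:]
    squared_error = np.sum((predictions - y) ** 2) / (2 * m)
    l1_penalty = lam * np.sum(np.abs(theta[1:]))
    return squared_error + l1_penalty

# Tiny hand-checkable example: y = 2x is fitted perfectly by theta = [0, 2],
# so the squared-error term is 0 and only the penalty remains.
X_demo = np.array([[1.0], [2.0], [3.0]])
y_demo = np.array([2.0, 4.0, 6.0])
theta_demo = np.array([0.0, 2.0])

cost_demo = lasso_cost(theta_demo, X_demo, y_demo, lam=0.5)  # 0 + 0.5 * |2| = 1.0
print(cost_demo)
```

With `lam=0` the same call returns the plain linear regression cost, which is 0 for this perfect fit.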

The hyperparameter $\lambda \geq 0$ controls how strongly the coefficients are penalized:

- $\lambda$ = 0: The objective is the same as in ordinary linear regression.
- $\lambda$ → ∞: The penalty dominates and all penalized coefficients are driven to zero. In practice they already become exactly zero for a sufficiently large finite $\lambda$.
- 0 < $\lambda$ < ∞: The value of $\lambda$ sets the balance between fitting the data (the squared-error term) and shrinking the coefficients (the penalty term).
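The effect of the penalty strength can be sketched by fitting the same data with a few values and counting how many coefficients stay non-zero. This is a self-contained illustration, not part of the main example: the variable names and the particular `alpha` values are arbitrary choices, and `alpha` is scikit-learn's name for $\lambda$.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data with 30 features, of which 10 actually influence the target.
X_sweep, y_sweep = make_regression(
    n_samples=100, n_features=30, n_informative=10, noise=0.1, random_state=42
)

nonzero_counts = {}
for alpha in [0.01, 1.0, 100.0, 1e6]:
    # A generous max_iter helps coordinate descent converge for small alpha.
    model = Lasso(alpha=alpha, max_iter=100_000).fit(X_sweep, y_sweep)
    nonzero_counts[alpha] = int(np.count_nonzero(model.coef_))
    print(f"alpha={alpha:g}: {nonzero_counts[alpha]} non-zero coefficients")
```

The very large value stands in for $\lambda \to \infty$ and leaves no non-zero coefficients, while the smallest value keeps the model close to ordinary linear regression.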

The main advantage of Lasso Regression is that it can set some coefficients to exactly zero, so the fitted model uses only a subset of the features. In other words, Lasso Regression automatically performs feature selection and outputs a model that is easier to interpret.
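To see where the exact zeros come from, consider a special case, assumed here purely for illustration: the features are scaled so that $\frac{1}{m} X^T X = I$ and there is no intercept. The cost function then separates into one small problem per coefficient, and each Lasso coefficient is a soft-thresholded version of the ordinary least-squares estimate, written $\hat{\theta}_j^{\text{OLS}}$ (a notation introduced only for this sketch):

$$ \hat{\theta}_j^{\text{lasso}} = \operatorname{sign}\big(\hat{\theta}_j^{\text{OLS}}\big) \, \max\big(|\hat{\theta}_j^{\text{OLS}}| - \lambda, \; 0\big) $$

For example, with $\lambda = 0.5$ a least-squares estimate of 2 shrinks to 1.5, while an estimate of 0.3 becomes exactly 0. For general, correlated features there is no such closed form, but the same thresholding behavior is what produces sparse solutions.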

Let's now work through an example, starting with the libraries we need:

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression

Let's create a synthetic regression dataset with the `make_regression` function. The call below generates 100 samples with 30 features, of which only 10 are informative; the other 20 have no effect on the target. Passing `coef=True` also returns the true coefficients used to generate the data.

In [None]:
X, y, coef = make_regression(n_samples=100, n_features=30, n_informative=10, noise=0.1, coef=True, random_state=42)
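As a quick sanity check, we can inspect what `make_regression` returned. The sketch below repeats the call so that it runs on its own; `informative_idx` is a helper name introduced here.

```python
import numpy as np
from sklearn.datasets import make_regression

# Same call as above, repeated so this cell is self-contained.
X, y, coef = make_regression(
    n_samples=100, n_features=30, n_informative=10,
    noise=0.1, coef=True, random_state=42,
)

# Uninformative features have a true coefficient of exactly 0.
informative_idx = np.flatnonzero(coef)

print("X shape:", X.shape)        # (100, 30)
print("y shape:", y.shape)        # (100,)
print("Number of informative features:", informative_idx.size)
print("Informative feature indices:", informative_idx)
```

Keeping the true coefficients around is useful later, because it lets us check whether the features Lasso keeps are the ones that really generated the data.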

Now, let's split the dataset into training and testing sets:

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

Now, let's train our Lasso Regression model. scikit-learn calls the regularization strength `alpha` rather than $\lambda$; we will start with `alpha=0.1`:

In [None]:
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)

Let's print out the coefficients of the model:

In [None]:
lasso.coef_

We can see that many coefficients are exactly 0. This means Lasso Regression has performed feature selection: features with a zero coefficient do not contribute to the model's predictions.
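Rather than reading the array by eye, we can count the zeros and compare the selected features with the ones that actually generated the data. The sketch below repeats the setup so that it runs on its own; the names `selected`, `informative`, and `n_zero` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# Repeat the notebook's setup so this cell is self-contained.
X, y, coef = make_regression(
    n_samples=100, n_features=30, n_informative=10,
    noise=0.1, coef=True, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
lasso = Lasso(alpha=0.1).fit(X_train, y_train)

selected = np.flatnonzero(lasso.coef_)   # features Lasso kept
informative = np.flatnonzero(coef)       # features that truly drive y
n_zero = int(np.sum(lasso.coef_ == 0))   # coefficients set exactly to zero

print(f"{n_zero} of {lasso.coef_.size} coefficients are exactly zero")
print("Informative features dropped by Lasso:", np.setdiff1d(informative, selected))
print("Uninformative features kept by Lasso:", np.setdiff1d(selected, informative))
```

If the last list is not empty, increasing `alpha` is one way to prune more of the uninformative features, at the cost of shrinking the informative ones further.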

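Let's now calculate the model's score on the test set. For regression models, `score` returns the coefficient of determination $R^2$, where 1 is a perfect fit: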

In [None]:
lasso.score(X_test, y_test)

As we can see, the test score of our model is quite high. Thus, despite excluding some of the features, Lasso Regression has managed to perform well on unseen data from this dataset.
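To put the score in context, one option is to compare training and test scores for Lasso against an unregularized `LinearRegression` baseline. The baseline and the `scores` dictionary are additions for this sketch, which again repeats the setup so that it runs on its own.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import train_test_split

# Repeat the notebook's setup so this cell is self-contained.
X, y = make_regression(
    n_samples=100, n_features=30, n_informative=10, noise=0.1, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

lasso = Lasso(alpha=0.1).fit(X_train, y_train)
ols = LinearRegression().fit(X_train, y_train)  # baseline without a penalty

# score() returns R^2 for both models.
scores = {
    "lasso_train": lasso.score(X_train, y_train),
    "lasso_test": lasso.score(X_test, y_test),
    "ols_train": ols.score(X_train, y_train),
    "ols_test": ols.score(X_test, y_test),
}
for name, value in scores.items():
    print(f"{name}: {value:.4f}")

print("Features used by Lasso:", np.count_nonzero(lasso.coef_))
print("Features used by OLS:", np.count_nonzero(ols.coef_))
```

A large gap between training and test scores would point to overfitting. On a low-noise dataset like this one, both models may score similarly, in which case the benefit of Lasso shows up mainly in the smaller number of features it uses.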

In summary, Lasso Regression helps reduce overfitting and can also perform feature selection. It is especially useful when we have a large number of features and want a simple, interpretable model.