# Lasso Regression (L1 Regularisation)

Recall the Ridge Regression penalty term **$\lambda \sum_{j=1}^{d}\boldsymbol{\beta_j^2}$**, which shrinks the coefficients toward zero but never sets them exactly to zero. This is a shortcoming of Ridge Regression: the fitted model retains every feature instead of selecting only the variables that actually affect the target/outcome.

The objective of **Lasso** (**L**east **A**bsolute **S**hrinkage and **S**election **O**perator) **Regression** is to find a coefficient vector $\beta$ that minimizes the sum of squared errors (SSE, also called the residual sum of squares, RSS) plus a penalty equal to $\lambda$ times the sum of the absolute values of the coefficients:

$$
\begin{aligned}
\text{Cost}_{Lasso} &= \text{RSS} + \lambda \sum_{j=1}^{d} \lvert\beta_j\rvert \\
&= \sum_{i=1}^{n} (y_i-\hat{y}_i)^2 + \lambda \sum_{j=1}^{d} \lvert\beta_j\rvert \\
&= \sum_{i=1}^{n}\epsilon_i^2 + \lambda \sum_{j=1}^{d} \lvert\beta_j\rvert
\end{aligned}
$$
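
To make the cost function concrete, here is a minimal sketch in plain Python that evaluates it for a given set of coefficients. The function name `lasso_cost`, its argument names, and the toy data are all illustrative assumptions, not part of any library; the intercept is kept out of the penalty, following the sum from $j=1$ above.

```python
def lasso_cost(X, y, beta0, beta, lam):
    """Return RSS + lam * sum(|beta_j|) for a linear model.

    X     : list of n rows, each a list of d feature values
    y     : list of n targets
    beta0 : intercept (not penalized, matching the sum from j = 1)
    beta  : list of d slope coefficients
    lam   : regularisation strength lambda (>= 0)
    """
    rss = 0.0
    for row, target in zip(X, y):
        # Linear prediction: beta0 + beta_1 * x_1 + ... + beta_d * x_d
        prediction = beta0 + sum(b * x for b, x in zip(beta, row))
        rss += (target - prediction) ** 2
    penalty = lam * sum(abs(b) for b in beta)
    return rss + penalty


# Toy data, made up purely for illustration.
X = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]]
y = [5.0, 4.0, 8.0]

cost = lasso_cost(X, y, beta0=1.0, beta=[2.0, -1.0], lam=0.5)
print(cost)  # 22.5  (RSS = 21.0, penalty = 0.5 * 3.0 = 1.5)
```

Setting `lam=0` in this sketch returns the plain RSS, which is the OLS objective.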

The unpenalized least-squares fit can be written as a minimization problem using argmin, the "argument of the minimum":

$$
\underset{\boldsymbol{\beta}\in\mathbb{R}^{d+1}}{\arg\min}\sum_{i=1}^{n}(y_i-\hat{y}_i)^2=\underset{\boldsymbol{\beta}\in\mathbb{R}^{d+1}}{\arg\min}\sum_{i=1}^{n}\left[y_i-(\beta_{0} + \beta_{1}x_{i1} + \beta_{2}x_{i2} + \cdots +\beta_{d}x_{id})\right]^2
$$

The argmin operator returns the coefficients that minimize the SSE. The lasso penalty is the L1 vector norm of the coefficients. Consistent with the sum from $j=1$ in the cost function, the intercept $\beta_0$ is not penalized:

$$
\|\beta\|_1={|\beta_{1}| + |\beta_{2}| + \cdots +|\beta_{d}|}
$$

Combining the argmin form with the vector norm gives the lasso estimator. Notice that the first term of the equation is the OLS loss function and the second term is the lasso penalty:

$$
\therefore\boldsymbol{\beta_{lasso}} = \underset{\boldsymbol{\beta}\in\mathbb{R}^{d+1}}{\arg\min}\|y-X\beta\|_2^2+\lambda\|\beta\|_1
$$

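Earlier, for Ridge Regression, we derived a closed-form solution. Lasso has no closed-form solution in general, because the absolute-value penalty is not differentiable at zero. The objective is still convex, however, so its minimum can be found with convex optimization methods.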
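
One way to see where the difficulty comes from is to write down the optimality condition. The following is only a sketch, under the simplifying assumption that every entry of $\beta$ is penalized (no separate intercept); the symbol $s$ for the subgradient vector is introduced here for illustration. Because $|\beta_j|$ has no derivative at $\beta_j=0$, the gradient of the penalty is replaced by a subgradient:

$$
2X^T(X\beta - y) + \lambda s = 0, \qquad
s_j \begin{cases} = \text{sign}(\beta_j) & \text{if } \beta_j \neq 0 \\ \in [-1, 1] & \text{if } \beta_j = 0 \end{cases}
$$

Since $s$ depends on the signs and the zero pattern of the unknown $\beta$, this condition cannot be solved by a single matrix inversion the way the Ridge normal equations can, which is why iterative methods are used instead.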


## Geometrical Interpretation of Lasso Regression

<img src="../../assets/Lasso_Regression.png">

Lasso Regression, also known as **L1 Regularization**, uses the L1 norm (the sum of absolute values) of the parameter estimates as a penalty. The L1 norm is convex but not differentiable wherever a coefficient equals zero. Geometrically, the constraint $\sum_{j=1}^{d} |\beta_j| \leq c$ is a diamond. For $d=2$ it reads $|\beta_1| + |\beta_2| \leq c$, a diamond whose corners lie on the axes.

In comparison, the term $(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^T(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})$ is the RSS. Its contours are ellipses centered at the OLS estimate, where the RSS is smallest, and the RSS increases from the inner ellipses to the outer ones.

The goal of Lasso Regression is to find the parameter values that minimize the penalized loss function. 

> - When $\lambda=0$, the penalty vanishes and the lasso estimate coincides with the OLS estimate, which is prone to overfitting. For $\lambda>0$, the solution is the point where the smallest RSS ellipse touches the diamond (the L1 norm constraint); a larger $\lambda$ corresponds to a smaller diamond. This point, called the lasso estimate, often falls on an axis because of the sharp corners of the diamond, so the corresponding coefficient is exactly zero. A suitably chosen $\lambda$ reduces overfitting compared to the OLS estimate.<br><br>
> - The $L_1$ norm ball is diamond-shaped in 2D space and octahedron-shaped in 3D space.
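
The corner effect can be checked numerically in the simplest possible setting. The sketch below assumes an orthonormal design ($X^TX = I$), which is an extra assumption not made in the text above. Under it the RSS contours are circles, the lasso solution reduces to soft-thresholding each OLS coefficient by $\lambda/2$, and the ridge solution rescales each OLS coefficient by $1/(1+\lambda)$. The function names and the OLS values are hypothetical and chosen only for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, returning exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_orthonormal(beta_ols, lam):
    # Minimizes (beta_j - b_j)^2 + lam * |beta_j| for each coordinate separately.
    return [soft_threshold(b, lam / 2.0) for b in beta_ols]


def ridge_orthonormal(beta_ols, lam):
    # Minimizes (beta_j - b_j)^2 + lam * beta_j^2 for each coordinate separately.
    return [b / (1.0 + lam) for b in beta_ols]


beta_ols = [3.0, 0.5]  # hypothetical OLS estimate for d = 2
lam = 2.0

beta_lasso = lasso_orthonormal(beta_ols, lam)
beta_ridge = ridge_orthonormal(beta_ols, lam)

print(beta_lasso)  # [2.0, 0.0]  -> second coefficient is exactly zero
print(beta_ridge)  # both entries shrunk, but neither is exactly zero
```

In this toy case the lasso estimate lands on the $\beta_1$ axis (a corner of the diamond), while the ridge estimate keeps both coefficients nonzero.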

## Coordinate descent for Lasso Regression