# Lasso Regression (L1 Regularization)

Lasso (Least Absolute Shrinkage and Selection Operator) is a linear regression technique that performs both regularization and variable selection to enhance prediction accuracy and interpretability.

### 1. Mathematical Objective
Lasso minimizes the sum of squared residuals plus a penalty proportional to the sum of the absolute values of the coefficients:

$$ \min_{\beta} \left\{ \sum_{i=1}^{n} (y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\} $$

Where:
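- $\lambda \ge 0$ is the regularization parameter. scikit-learn exposes it as `alpha`, but its `Lasso` objective scales the squared-error term by $1/(2n)$, so the $\lambda$ above corresponds to `alpha` $= \lambda / (2n)$.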
- $\lambda \sum_{j=1}^{p} |\beta_j|$ is the **L1 penalty**.
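
As a minimal sketch, the objective can be evaluated directly for a candidate set of coefficients. The function name `lasso_objective` and the toy data are illustrative assumptions, not part of any library.

```python
def lasso_objective(X, y, beta0, beta, lam):
    """Sum of squared residuals plus lam times the L1 norm of beta.

    X is a list of rows and y a list of targets.
    The intercept beta0 is not part of the penalty.
    """
    rss = 0.0
    for row, target in zip(X, y):
        prediction = beta0 + sum(x_ij * b_j for x_ij, b_j in zip(row, beta))
        rss += (target - prediction) ** 2
    l1_penalty = lam * sum(abs(b_j) for b_j in beta)
    return rss + l1_penalty


# Toy data, made up for illustration: two observations, two features.
X = [[1.0, 2.0], [3.0, 4.0]]
y = [5.0, 6.0]
value = lasso_objective(X, y, beta0=1.0, beta=[1.0, -1.0], lam=2.0)
print(value)
```

Both predictions here are 0, so the residual term is 25 + 36 = 61 and the penalty is 2 × (1 + 1) = 4.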

### 2. Key Characteristics
*   **Feature Selection:** Unlike Ridge (L2), Lasso can shrink coefficients to exactly zero, effectively performing automatic feature selection.
*   **Sparsity:** It produces "sparse" models, which are easier to interpret in high-dimensional datasets.
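*   **Scaling Requirement:** The penalty acts on coefficient magnitudes, and a coefficient's magnitude depends on the units of its feature. Features should therefore be standardized (mean=0, variance=1) before fitting so that all coefficients are penalized on a comparable scale.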
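
The exact zeros come from the soft-thresholding step that appears when the objective is minimized one coefficient at a time. The following is a rough sketch of cyclic coordinate descent on standardized features, one common way to solve the problem and not necessarily what any particular library does. The helper names (`soft_threshold`, `standardize`, `lasso_coordinate_descent`) and the toy data are assumptions made for illustration.

```python
def soft_threshold(a, t):
    """Shrink a toward zero by t; return exactly 0.0 when |a| <= t."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0


def standardize(X):
    """Rescale each column to mean 0 and variance 1 (assumes no constant columns)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    stds = [(sum((row[j] - means[j]) ** 2 for row in X) / n) ** 0.5 for j in range(p)]
    return [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in X]


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimize sum of squared residuals + lam * sum(|beta_j|) on standardized X."""
    n, p = len(X), len(X[0])
    beta0 = sum(y) / n  # with centered columns, the intercept is the mean of y
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residuals
            z = 0.0    # sum of squares of feature j
            for row, target in zip(X, y):
                others = sum(row[k] * beta[k] for k in range(p) if k != j)
                rho += row[j] * (target - beta0 - others)
                z += row[j] ** 2
            # The threshold is lam / 2 because the squared-error term has no 1/2 factor.
            beta[j] = soft_threshold(rho, lam / 2.0) / z
    return beta0, beta


# Toy data, made up for illustration: y depends on the first feature only,
# and the second feature lives on a much larger scale.
X_raw = [[1, 300], [2, 100], [3, 200], [4, 100], [5, 300], [6, 200]]
y = [2.3, 3.9, 6.0, 7.8, 10.1, 11.9]
X = standardize(X_raw)

beta0_ols, beta_ols = lasso_coordinate_descent(X, y, lam=0.0)
beta0, beta = lasso_coordinate_descent(X, y, lam=4.0)
print("lam=0:", beta_ols)
print("lam=4:", beta)
```

With `lam=0.0` both coefficients are non-zero. With `lam=4.0` the weak second coefficient falls inside the threshold and is set to exactly `0.0`, while the first is shrunk but kept.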

### 3. Bias-Variance Trade-off
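*   **Increasing $\lambda$:** Increases bias but decreases variance, which reduces overfitting. A sufficiently large $\lambda$ drives every coefficient to zero and the model underfits.
*   **Decreasing $\lambda$:** Decreases bias but increases variance. As $\lambda \to 0$ the solution approaches the ordinary least squares (OLS) fit.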
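
The shrinkage side of this trade-off can be illustrated with the one-feature case, where the minimizer has a closed form for centered data: $\hat{\beta} = S(\sum_i x_i y_i,\ \lambda/2) / \sum_i x_i^2$, with $S$ the soft-thresholding function. The sketch below only shows how the coefficient moves away from the OLS value as $\lambda$ grows; it does not measure variance. The function names and data are illustrative assumptions.

```python
def soft_threshold(a, t):
    """Shrink a toward zero by t; return exactly 0.0 when |a| <= t."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0


def one_feature_lasso(x, y, lam):
    """Closed-form minimizer of sum((y_i - x_i*b)^2) + lam*|b| for centered x and y."""
    rho = sum(x_i * y_i for x_i, y_i in zip(x, y))
    z = sum(x_i ** 2 for x_i in x)
    return soft_threshold(rho, lam / 2.0) / z


# Toy data, made up for illustration; both x and y have mean zero.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-4.2, -1.9, 0.1, 2.1, 3.9]

lambdas = [0.0, 5.0, 10.0, 20.0, 50.0]
path = [one_feature_lasso(x, y, lam) for lam in lambdas]
for lam, b in zip(lambdas, path):
    print(f"lambda={lam:5.1f}  beta={b:.3f}")
```

At `lam=0.0` the result is the OLS slope. Each larger value pulls the coefficient further toward zero, and once $\lambda/2$ exceeds $|\sum_i x_i y_i|$ the coefficient is exactly zero.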
