<h1 align=center>Lasso And Ridge Regression In Depth </h1>

- Lasso and Ridge regression are two popular techniques for regularizing linear regression models, helping to prevent overfitting and improving model generalizability.

![lassoridge.png](attachment:lassoridge.png)

## 1. Lasso Regression (L1 Regularization)

- Lasso (Least Absolute Shrinkage and Selection Operator) adds a penalty equal to the sum of the absolute values of the weights
- This penalty can shrink some coefficients (slopes) to exactly zero, effectively performing variable selection
- The penalty acts on the weights, not on the residuals: because the loss is still squared error, lasso is not inherently robust to outliers in the response

**Mathematical Formulation**:

- The objective function for lasso regression is:

$$
\text{Minimize } \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^p |w_j| \\ \text{where } \sum_{j=1}^p |w_j| = \|w\|_1 \text{ is the L1 norm of the coefficients,} \\ p \text{ is the number of predictors, and } \lambda \text{ is the regularization parameter}
$$

**Properties**:

- Encourages sparsity in the coefficients, leading to simpler and more interpretable models
- Can set some coefficients exactly to zero, thus performing feature selection
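
To see why the L1 penalty can produce exact zeros, a simplified one-predictor case helps. The setup below is an illustrative assumption, not the general solution: a single centered predictor $x$ scaled so that $\frac{1}{n}\sum_i x_i^2 = 1$, no intercept, and $\hat{w}_{\text{OLS}} = \frac{1}{n}\sum_i x_i y_i$ denoting the unpenalized least-squares slope.

$$
\frac{1}{n}\sum_{i=1}^n (y_i - w x_i)^2 + \lambda |w| = \text{const} - 2 w\,\hat{w}_{\text{OLS}} + w^2 + \lambda |w|
$$

$$
\hat{w}_{\text{lasso}} = \operatorname{sign}(\hat{w}_{\text{OLS}})\,\max\!\left(|\hat{w}_{\text{OLS}}| - \tfrac{\lambda}{2},\; 0\right)
$$

Under these assumptions the lasso slope is exactly zero whenever $|\hat{w}_{\text{OLS}}| \le \lambda/2$. Replacing $\lambda |w|$ with $\lambda w^2$ in the same setup gives $\hat{w} = \hat{w}_{\text{OLS}} / (1 + \lambda)$, which shrinks the slope but reaches zero only if $\hat{w}_{\text{OLS}}$ is already zero.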

**Hyperparameter**:

- lambda (λ): Controls the strength of regularization. A larger λ leads to more coefficients being shrunk to zero
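
The effect of λ on sparsity can be checked with a small experiment. The sketch below assumes scikit-learn is available and uses synthetic data from `make_regression`; the dataset sizes and the `alphas` grid are arbitrary choices for illustration. In scikit-learn the parameter is called `alpha`, and its scale differs from the formula above by a constant factor.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic data (illustrative assumption): 20 features, only 5 informative.
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)

alphas = [0.01, 1.0, 10.0, 1e4]
nonzero_counts = []
for alpha in alphas:
    model = Lasso(alpha=alpha, max_iter=10_000).fit(X, y)
    # Count coefficients that were not shrunk exactly to zero.
    nonzero_counts.append(int(np.sum(model.coef_ != 0)))

for alpha, count in zip(alphas, nonzero_counts):
    print(f"alpha={alpha:g}: {count} non-zero coefficients")
```

With a sufficiently large `alpha`, every coefficient is set to zero and the model predicts only the intercept.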

**Advantages**:

- Performs automatic feature selection
- Can handle high-dimensional data (where p > n) efficiently
- Here `p` is the number of predictors and `n` is the number of observations

**Disadvantages**:

- When predictors are highly correlated, lasso tends to select one and ignore the others
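- Harder optimization problem than ridge regression: the L1 penalty is not differentiable at zero, so there is no closed-form solution and iterative solvers such as coordinate descent are used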
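
The behavior with correlated predictors can be illustrated on two nearly identical features. This is a minimal sketch on made-up data (the sample size, noise levels, and `alpha` values are assumptions): ridge is shown alongside lasso for contrast. How lasso divides weight between near-duplicate columns is somewhat arbitrary and can depend on the solver, so the printed lasso coefficients should be read as one possible outcome.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)   # nearly a copy of x1
X = np.column_stack([x1, x2])
y = 3 * x1 + 3 * x2 + 0.1 * rng.normal(size=n)

lasso_coef = Lasso(alpha=0.5).fit(X, y).coef_
ridge_coef = Ridge(alpha=1.0).fit(X, y).coef_

print("lasso:", lasso_coef)   # weight may concentrate on one column
print("ridge:", ridge_coef)   # weight is shared almost equally
```

Ridge spreads the combined effect across both columns, which is why it is often preferred when correlated predictors should all stay in the model.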

## 2. Ridge Regression (L2 Regularization)

- Ridge regression adds a penalty equal to the sum of the squared weights
- Sensitive to outliers in the response, because the loss is squared error
- Shrinks the slopes toward zero, but not exactly to zero

**Mathematical Formulation**:

- The objective function for ridge regression is:
    
    $$
    \text{Minimize } \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^p w_j^2 \\ \text{where } \sum_{j=1}^p w_j^2 = \|w\|_2^2 \text{ is the squared L2 norm of the coefficients,} \\ p \text{ is the number of predictors, and } \lambda \text{ is the regularization parameter}
    $$
    

**Properties**:

- Shrinks coefficients, but unlike Lasso, does not set any of them exactly to zero
- Useful when you have multicollinearity (highly correlated predictors) because it can stabilize the solution
- Remains well defined when the number of predictors p is larger than the number of observations n, a setting where ordinary least squares has no unique solution
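
One reason ridge is usable when p > n is that the penalty makes the normal equations invertible. For the objective above with no intercept, setting the gradient to zero gives $w = (X^\top X + n\lambda I)^{-1} X^\top y$. The sketch below checks this against scikit-learn on random data; the data, the value of `lam`, and the choice `fit_intercept=False` are assumptions made to keep the comparison exact.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n, p = 20, 50                      # more predictors than observations
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
lam = 0.1                          # lambda in the objective with the 1/n factor

# X^T X has rank at most n < p, so plain least squares has no unique solution.
rank = np.linalg.matrix_rank(X.T @ X)

# Closed form: w = (X^T X + n*lambda*I)^(-1) X^T y
w_closed = np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ y)

# scikit-learn's Ridge minimizes ||y - Xw||^2 + alpha * ||w||^2, so alpha = n * lambda.
ridge = Ridge(alpha=n * lam, fit_intercept=False).fit(X, y)

print("rank of X^T X:", rank, "out of", p)
print("closed form matches Ridge:", np.allclose(w_closed, ridge.coef_, atol=1e-6))
```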

**Hyperparameter**:

- lambda (λ): Controls the strength of regularization. A larger λ implies more regularization

**Advantages**:

- Reduces model complexity by shrinking coefficients
- Improves prediction accuracy by trading off a small amount of bias for a larger reduction in variance

**Disadvantages**:

- Does not perform feature selection (all coefficients are shrunk but none are eliminated)

## 3. Elastic Net

- Combines both Lasso and Ridge regression penalties
- Useful when there are multiple correlated features.

**Mathematical Formulation**:

- The objective function for elastic net is:

$$
\text{Minimize } \frac{1}{n} \sum_{i=1}^n (y_i - \hat{y}_i)^2 + \lambda_1 \sum_{j=1}^p |w_j| + \lambda_2 \sum_{j=1}^p w_j^2
$$

- where λ1 and λ2 are regularization parameters for L1 and L2 norms, respectively

**Properties**:

- Balances between the benefits of Lasso and Ridge regression
- Can handle grouped variables and correlated features more effectively

**Hyperparameters**:

- λ1 and λ2: Control the strength of L1 and L2 regularization, respectively
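
scikit-learn's `ElasticNet` does not take λ1 and λ2 directly. It uses `alpha` and `l1_ratio` and minimizes $\frac{1}{2n}\|y - Xw\|_2^2 + \alpha \cdot \text{l1\_ratio} \cdot \|w\|_1 + \frac{1}{2}\alpha(1 - \text{l1\_ratio})\|w\|_2^2$. Matching terms with the objective above (after halving it) gives the conversion sketched below. The helper name `to_sklearn_params` is hypothetical, introduced only for this example.

```python
from sklearn.linear_model import ElasticNet

def to_sklearn_params(lam1, lam2):
    """Map (lambda1, lambda2) from the 1/n-scaled objective to (alpha, l1_ratio).

    Derived from alpha * l1_ratio = lambda1 / 2 and
    alpha * (1 - l1_ratio) = lambda2. Assumes lam1 / 2 + lam2 > 0.
    """
    alpha = lam1 / 2 + lam2
    l1_ratio = (lam1 / 2) / alpha
    return alpha, l1_ratio

alpha, l1_ratio = to_sklearn_params(lam1=1.0, lam2=0.5)
print(alpha, l1_ratio)

elastic_net = ElasticNet(alpha=alpha, l1_ratio=l1_ratio)
```

Setting `l1_ratio=1` recovers lasso; values close to 0 behave like ridge.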

### Practical Considerations

**Selecting Hyperparameters**:

- Cross-validation is commonly used to select the best value of λ (and λ1, λ2 for elastic net)
- Grid search, random search, or more advanced methods like Bayesian optimization can be employed for this purpose.
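
scikit-learn ships estimators with built-in cross-validation over the regularization strength. The sketch below is one way to use them, on synthetic data; the dataset, the `alphas` grid, the `l1_ratio` candidates, and the fold count are all illustrative assumptions. On real data the features would usually be standardized first, since the penalty depends on the scale of each coefficient.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV, LassoCV, RidgeCV

# Synthetic data (illustrative assumption).
X, y = make_regression(n_samples=200, n_features=30, n_informative=8,
                       noise=10.0, random_state=0)

# RidgeCV evaluates each candidate alpha (efficient leave-one-out by default).
alphas = np.logspace(-3, 3, 13)
ridge_cv = RidgeCV(alphas=alphas).fit(X, y)

# LassoCV builds its own alpha path and picks the best value by 5-fold CV.
lasso_cv = LassoCV(cv=5, max_iter=10_000, random_state=0).fit(X, y)

# ElasticNetCV searches over alpha for each candidate l1_ratio.
l1_ratios = [0.1, 0.5, 0.9]
enet_cv = ElasticNetCV(l1_ratio=l1_ratios, cv=5, max_iter=10_000,
                       random_state=0).fit(X, y)

print("ridge alpha:", ridge_cv.alpha_)
print("lasso alpha:", lasso_cv.alpha_)
print("elastic net alpha, l1_ratio:", enet_cv.alpha_, enet_cv.l1_ratio_)
```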

```python
from sklearn.linear_model import Ridge, Lasso

# alpha plays the role of the regularization parameter λ
# (scikit-learn scales the loss term differently from the formulas above)
ridge = Ridge(alpha=1.0)
lasso = Lasso(alpha=1.0)

# Fit once training data is available:
# ridge.fit(X_train, y_train)
# lasso.fit(X_train, y_train)
```

```python
from sklearn.linear_model import ElasticNet

# alpha is the overall regularization strength;
# l1_ratio is the mix between lasso (1.0) and ridge (0.0)
elastic_net = ElasticNet(alpha=1.0, l1_ratio=0.5)

# Fit once training data is available:
# elastic_net.fit(X_train, y_train)
```