
# Background

Lasso Regression is linear regression with L1 regularization: it adds a penalty term to the ordinary least squares (OLS) cost function. The penalty is the sum of the absolute values of the coefficients multiplied by a regularization parameter, lambda (λ). Its primary purpose is to reduce overfitting and perform feature selection by shrinking some coefficients to exactly zero, which removes those features from the model.

Mathematically, the cost function of Lasso Regression is represented as:

Cost = RSS (Residual Sum of Squares) + λ * Σ|βi|

where:
- RSS is the Residual Sum of Squares (the error term used in OLS).
- λ is the regularization parameter (controls the strength of the penalty).
- Σ|βi| represents the sum of the absolute values of the model coefficients.
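
As a minimal sketch of the formula above, the cost can be computed directly with NumPy. The function name `lasso_cost` and the tiny dataset are made up for illustration, and the intercept is left out to keep the arithmetic easy to check by hand.

```python
import numpy as np

def lasso_cost(X, y, beta, lam):
    """Return RSS + lam * sum(|beta_i|) for a linear model without an intercept."""
    residuals = y - X @ beta
    rss = np.sum(residuals ** 2)               # Residual Sum of Squares
    l1_penalty = lam * np.sum(np.abs(beta))    # lambda times the sum of |beta_i|
    return rss + l1_penalty

# Tiny hand-checkable example (hypothetical numbers)
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([1.0, 1.5])

# Predictions are [1.0, 1.5, 2.5], so the residuals are [0.0, 0.5, 0.5]
print(lasso_cost(X, y, beta, lam=0.0))  # 0.5 (RSS only)
print(lasso_cost(X, y, beta, lam=2.0))  # 5.5 (RSS 0.5 + penalty 2.0 * 2.5)
```

Note that scikit-learn's `Lasso` minimizes a rescaled version of this cost, (1 / (2n)) * RSS + alpha * Σ|βi|, so its `alpha` plays the role of λ but is not numerically identical to it.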

**Pros of Lasso Regression (L1 Regularization)**:

1. Feature Selection: Lasso tends to force some coefficients to exactly zero, effectively performing automatic feature selection. This can be beneficial when dealing with high-dimensional datasets with many irrelevant or redundant features.

2. Simplicity: The resulting model from Lasso Regression is typically simpler and more interpretable, as it focuses on a subset of relevant features.

3. Regularization: Lasso helps in reducing overfitting by penalizing large coefficients, making the model more robust and generalizable to new data.

4. Coefficient Shrinkage: Lasso shrinks coefficients toward zero, which stabilizes the estimates when predictors are highly correlated (multicollinearity). Among a group of correlated predictors, it tends to keep one and set the others to zero.
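
The feature-selection behaviour listed above can be illustrated with a small synthetic experiment. This is a sketch under stated assumptions: the data are simulated so that only the first three of ten features influence the target, and `alpha=0.2` is an arbitrary choice rather than a tuned value.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n_samples, n_features = 200, 10
X = rng.standard_normal((n_samples, n_features))

# Assumption of this toy setup: only features 0, 1 and 2 affect the target
true_coef = np.zeros(n_features)
true_coef[:3] = [3.0, -2.0, 1.5]
y = X @ true_coef + 0.5 * rng.standard_normal(n_samples)

ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.2).fit(X, y)

# OLS assigns a (small) non-zero weight to every feature;
# Lasso sets the weights of uninformative features to exactly zero
print("OLS non-zero coefficients:  ", np.count_nonzero(ols.coef_))
print("Lasso non-zero coefficients:", np.count_nonzero(lasso.coef_))
print("Features kept by Lasso:     ", np.flatnonzero(lasso.coef_))
```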

**Cons of Lasso Regression (L1 Regularization)**:

1. Feature Shrinkage: While Lasso is effective for feature selection, it can also lead to over-shrinking of coefficients, resulting in biased parameter estimates for predictors with strong predictive power.

2. Unstable: Lasso tends to be unstable when dealing with datasets with a high correlation among features. In such cases, small changes in the data can lead to significant changes in the selected features.

3. Tuning Parameter: Selecting the right value of the regularization parameter (λ) can be challenging and often requires cross-validation to find the optimal value.

4. Linear in the Features: Lasso Regression assumes the target is a linear function of the input features, so it cannot capture non-linear relationships unless the features are transformed first (for example, with polynomial terms).
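
The over-shrinkage drawback can be seen by comparing Lasso estimates with an unpenalized fit. The sketch below uses simulated data with one strong predictor (true coefficient 5.0) and one irrelevant feature; the `alpha` values are arbitrary illustrations.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(1)
X = rng.standard_normal((300, 2))
true_coef = np.array([5.0, 0.0])   # one strong predictor, one irrelevant feature
y = X @ true_coef + rng.standard_normal(300)

ols = LinearRegression().fit(X, y)

# As alpha grows, the estimate for the strong predictor moves further below
# the OLS estimate: this is the bias introduced by the L1 penalty
for alpha in (0.1, 0.5, 1.0):
    lasso = Lasso(alpha=alpha).fit(X, y)
    print(f"alpha={alpha}: strong-predictor coefficient = {lasso.coef_[0]:.3f} "
          f"(OLS: {ols.coef_[0]:.3f}, true: {true_coef[0]})")
```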

**When to use Lasso Regression**:

Lasso Regression is particularly useful in the following scenarios:

1. High-dimensional datasets: When you have a large number of features relative to the number of observations, Lasso can help in identifying and prioritizing relevant features.

2. Feature Selection: If you suspect that many of your features are irrelevant or redundant, Lasso can be employed to perform automatic feature selection.

3. Regularization: When dealing with models that are prone to overfitting, Lasso can be used to add regularization and prevent overfitting by penalizing large coefficients.

4. Interpretability: When you need a simpler and more interpretable model, Lasso can be preferred over traditional linear regression.

However, it's essential to consider other regularization techniques like Ridge Regression (L2 Regularization) and Elastic Net, as they might offer advantages in different situations. The choice between these regularization methods often depends on the specific characteristics of your dataset and the underlying problem you are trying to solve.
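
To make the comparison with Ridge and Elastic Net concrete, the following sketch fits all three on the same simulated data and counts how many coefficients are exactly zero. The data and penalty strengths are assumptions chosen for illustration. The `alpha` values are also not comparable across estimators, because scikit-learn scales the objectives differently.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(42)
X = rng.standard_normal((200, 8))
# Assumption: only three of the eight features are informative
true_coef = np.array([4.0, -3.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
y = X @ true_coef + 0.5 * rng.standard_normal(200)

models = {
    "Lasso (L1)": Lasso(alpha=0.2),
    "Ridge (L2)": Ridge(alpha=10.0),
    "Elastic Net (L1 + L2)": ElasticNet(alpha=0.2, l1_ratio=0.5),
}

zero_counts = {}
for name, model in models.items():
    model.fit(X, y)
    # Count coefficients that the penalty has driven to exactly zero
    zero_counts[name] = int(np.sum(model.coef_ == 0))
    print(f"{name}: {zero_counts[name]} of {X.shape[1]} coefficients are exactly zero")
```

Ridge shrinks every coefficient but leaves all of them non-zero, whereas the L1 component in Lasso and Elastic Net is what produces exact zeros.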

# Code Example

```python
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error

# Generate some sample data (You can replace this with your dataset)
np.random.seed(42)
X = np.random.rand(100, 5)
y = 2*X[:, 0] + 3*X[:, 1] + 4*X[:, 2] + 5*X[:, 3] + 6*X[:, 4] + np.random.randn(100)

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Lasso Regression model
alpha = 0.1  # Regularization strength (higher values make the model more regularized)
lasso_model = Lasso(alpha=alpha)

# Train the model
lasso_model.fit(X_train, y_train)

# Predict on the test set
y_pred = lasso_model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)

# Print the coefficients
print("Coefficients:")
for feature, coef in zip(range(X.shape[1]), lasso_model.coef_):
    print(f"Feature {feature}: {coef}")

# Plot the predicted vs. actual values
plt.scatter(y_test, y_pred)
plt.xlabel("Actual Values")
plt.ylabel("Predicted Values")
plt.title("Lasso Regression - Actual vs. Predicted")
plt.show()
```


# Code breakdown


1. **Import required libraries:**
   - NumPy (`np`): A library for numerical computations in Python.
   - Pandas (`pd`): A library for data manipulation and analysis (imported here but not used in this example).
   - Matplotlib (`plt`): A library for creating data visualizations.
   - `train_test_split` from `sklearn.model_selection`: A function to split data into training and test sets.
   - `Lasso` from `sklearn.linear_model`: The Lasso regression model.
   - `mean_squared_error` from `sklearn.metrics`: A metric to evaluate the model's performance.

2. **Generate some sample data:**
   - It creates a random array (`X`) of shape (100, 5) with values between 0 and 1.
   - It generates a target variable (`y`) as a linear combination of the columns of `X`, adding some random noise to it.

3. **Split the data into training and test sets:**
   - `train_test_split` is used to split the data into training and test sets with a test size of 20% of the data.
   - `X_train`, `X_test`, `y_train`, and `y_test` are the resulting training and test sets.

4. **Create a Lasso Regression model:**
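   - `Lasso(alpha=alpha)`: Initializes a Lasso Regression model with a regularization strength `alpha` set to 0.1. `alpha` is scikit-learn's name for the regularization parameter λ described in the Background section.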

5. **Train the model:**
   - `lasso_model.fit(X_train, y_train)`: Trains the Lasso Regression model using the training data (`X_train` and `y_train`).

6. **Predict on the test set:**
   - `lasso_model.predict(X_test)`: Predicts the target variable (`y_pred`) using the trained model and the test data (`X_test`).

7. **Evaluate the model:**
   - `mean_squared_error(y_test, y_pred)`: Calculates the mean squared error (MSE) between the actual target values (`y_test`) and the predicted values (`y_pred`). MSE is a metric used to measure the performance of regression models.

8. **Print the coefficients:**
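   - The loop iterates over the features (columns) of the original `X` array and prints the corresponding coefficients learned by the Lasso model. L1 regularization can set some coefficients to exactly zero, which performs feature selection. Note that all five features in this synthetic dataset genuinely contribute to `y`, so the example mainly illustrates shrinkage; exact zeros are more typical when some features are irrelevant.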

9. **Plot the predicted vs. actual values:**
   - A scatter plot is created to visualize the relationship between the actual target values (`y_test`) and the predicted values (`y_pred`) obtained from the Lasso model.

10. **Show the plot:**
    - `plt.show()`: Displays the scatter plot created in the previous step.

In summary, this code demonstrates how to use the Lasso Regression model from scikit-learn to perform a simple regression task on some sample data. It shows how to split the data, train the model, make predictions, and evaluate the model's performance using mean squared error. The code also prints the coefficients of the model, which can provide insights into feature importance and selection. Finally, it visualizes the predicted vs. actual values using a scatter plot.
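
As a possible extension of the example, the sketch below refits the model for several regularization strengths on the same synthetic data (regenerated here so the block runs on its own). The grid of `alpha` values is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Same synthetic data as in the main example
np.random.seed(42)
X = np.random.rand(100, 5)
y = 2*X[:, 0] + 3*X[:, 1] + 4*X[:, 2] + 5*X[:, 3] + 6*X[:, 4] + np.random.randn(100)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Refit for each alpha and record sparsity and test error
results = {}
for alpha in (0.001, 0.01, 0.1, 1.0, 10.0):
    model = Lasso(alpha=alpha).fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    results[alpha] = (np.count_nonzero(model.coef_), mse)
    print(f"alpha={alpha:<6} non-zero coefficients={results[alpha][0]}  test MSE={mse:.3f}")
```

With a sufficiently large `alpha`, every coefficient is driven to zero and the model simply predicts the mean of the training targets, which is why the test error deteriorates at the top of the grid.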

# Real world application

In a healthcare setting, Lasso Regression (L1 regularization) can be used for various purposes, including medical research, disease prediction, and feature selection. Here's a real-world example of how Lasso Regression can be applied in the healthcare domain:

**Example: Predicting Diabetes Progression**

Let's consider a scenario where a research team aims to predict the progression of diabetes in patients based on various clinical and demographic features. The dataset contains information about diabetic patients, including age, body mass index (BMI), blood pressure, insulin levels, and other health-related measurements. The target variable is the progression of diabetes over a certain period.

The team wants to build a predictive model using Lasso Regression to identify which features are most relevant in predicting the disease progression. Lasso Regression's L1 regularization can help with feature selection by encouraging some feature coefficients to be exactly zero, effectively removing irrelevant features from the model.

**Steps:**

1. **Data Collection and Preprocessing:**
   - Collect relevant clinical data of diabetic patients, including demographic features and various health measurements.
   - Preprocess the data, handle missing values, and encode categorical variables if necessary.

2. **Feature Selection:**
   - Apply Lasso Regression to the preprocessed dataset.
   - Use k-fold cross-validation to find the value of the regularization parameter (λ, called `alpha` in scikit-learn) that minimizes the cross-validated prediction error.
   - The L1 regularization term in Lasso Regression will automatically shrink some feature coefficients to zero, effectively selecting the most relevant features for predicting diabetes progression.

3. **Model Training and Evaluation:**
   - Train the Lasso Regression model on the training data, using the selected features.
   - Evaluate the model's performance on a separate validation dataset or using cross-validation metrics (e.g., mean squared error, R-squared) to assess how well it predicts diabetes progression.

4. **Interpretation and Insights:**
   - Examine the coefficients of the Lasso Regression model to identify the most important features that contribute to diabetes progression.
   - Interpret the results to gain insights into which clinical and demographic factors are strong predictors of the disease's advancement.

5. **Clinical Application:**
   - Use the trained Lasso Regression model to predict the progression of diabetes for new patients.
   - The model can assist healthcare professionals in understanding the risk factors associated with diabetes progression and aid in personalized treatment planning.

6. **Monitoring and Continuous Improvement:**
   - Continuously monitor the model's performance in real-world healthcare settings.
   - Collect new data and update the model periodically to improve prediction accuracy and adapt to changing patient populations.

Lasso Regression's ability to perform feature selection by pushing some feature coefficients to zero is particularly useful in healthcare settings, where the availability of large datasets with many features may lead to overfitting and increased complexity. By identifying the most relevant features, Lasso Regression can produce a more interpretable and effective predictive model for diabetes progression, allowing healthcare practitioners to make informed decisions for patient care.
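
The workflow above can be sketched with scikit-learn's bundled diabetes dataset standing in for the research team's data. This is an assumption for illustration only: that dataset has ten baseline variables (age, sex, BMI, blood pressure, and six blood serum measurements) and a target measuring disease progression one year after baseline, so it does not match the hypothetical dataset feature for feature. The 80/20 split and 5-fold cross-validation are likewise illustrative choices.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Step 1: load the stand-in dataset and hold out a test set
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Steps 2-3: standardize the features, then let LassoCV pick alpha by 5-fold CV
pipeline = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=42))
pipeline.fit(X_train, y_train)
lasso = pipeline.named_steps["lassocv"]

# Step 3: evaluate on the held-out data
y_pred = pipeline.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Selected alpha: {lasso.alpha_:.4f}")
print(f"Test MSE: {mse:.1f}, R-squared: {r2:.3f}")

# Step 4: inspect which features were kept and which were dropped
for name, coef in zip(diabetes.feature_names, lasso.coef_):
    status = "dropped" if coef == 0 else "kept"
    print(f"{name:>4}: {coef:8.2f} ({status})")
```

Standardizing inside the pipeline matters because the L1 penalty treats all coefficients alike, so features measured on larger scales would otherwise be penalized differently from features on smaller scales.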

# FAQ


1. What is Lasso Regression?
   Lasso, short for "Least Absolute Shrinkage and Selection Operator," is a linear regression technique with L1 regularization. It adds a penalty proportional to the sum of the absolute values of the coefficients, which encourages sparsity and performs feature selection.

2. How does Lasso Regression perform feature selection?
   Lasso Regression performs feature selection by penalizing the absolute values of the coefficients. This penalty encourages some coefficients to become exactly zero, effectively eliminating corresponding features from the model. As a result, Lasso can automatically select relevant features and exclude irrelevant ones.

3. What is the significance of the L1 regularization term in Lasso?
   The L1 regularization term in Lasso is responsible for adding the penalty to the linear regression cost function. It is defined as the sum of the absolute values of the coefficients multiplied by a regularization parameter (lambda). By adjusting the lambda value, you can control the strength of the regularization and its impact on feature selection.

4. How does Lasso differ from Ridge Regression (L2 Regularization)?
   Lasso and Ridge Regression are both linear regression techniques with different regularization terms. While Lasso uses L1 regularization (penalizing the absolute values of coefficients), Ridge Regression employs L2 regularization (penalizing the squared values of coefficients). Lasso tends to produce sparse models with many coefficients being exactly zero, while Ridge tends to shrink coefficients toward zero without necessarily eliminating them entirely.

5. When should I use Lasso Regression?
   Lasso Regression is particularly useful when you have a high-dimensional dataset with potentially many irrelevant or redundant features. It is a powerful tool for feature selection and can help you identify the most important predictors in your model.

6. What are the challenges of using Lasso Regression?
   One of the challenges of Lasso Regression is selecting the appropriate value for the regularization parameter (lambda). If lambda is too small, Lasso may not effectively perform feature selection; if lambda is too large, it might cause important features to be excluded from the model.

7. Can Lasso handle multicollinearity in the data?
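   Lasso Regression can handle multicollinearity to some extent by shrinking the coefficients of correlated features towards zero. However, among a group of highly correlated features it tends to keep one and drop the others, and which one it keeps can change with small changes in the data. In cases of severe multicollinearity, the selected features should therefore be interpreted with caution.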

8. Are there any alternatives to Lasso Regression for feature selection?
   Yes. Other feature selection techniques include Elastic Net (a combination of the Lasso and Ridge penalties), Recursive Feature Elimination (RFE), and Forward/Backward selection, among others. Ridge Regression is a related regularization method, but it shrinks coefficients without setting them to zero, so it does not select features on its own. The choice of technique depends on the specific dataset and modeling requirements.

9. How can I determine the best lambda value for Lasso Regression?
   You can use techniques like cross-validation to find the optimal lambda value. By trying different values of lambda and evaluating their performance on validation data, you can select the lambda that gives the best trade-off between model simplicity and predictive accuracy.

10. Is Lasso Regression suitable for non-linear relationships between variables?
    Lasso Regression is a linear regression technique and is more appropriate for problems with linear relationships between variables. For capturing non-linear relationships, you may consider using non-linear regression techniques or adding polynomial features to the input data before applying Lasso Regression.
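
The suggestion in the last answer, adding polynomial features before applying Lasso, can be sketched as follows. The quadratic data, the polynomial degree of 4, and `alpha=0.05` are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Simulated data with a purely quadratic relationship between x and y
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = X[:, 0] ** 2 + 0.5 * rng.standard_normal(300)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Plain Lasso on the raw feature versus Lasso on polynomial features
linear_lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.05))
poly_lasso = make_pipeline(
    PolynomialFeatures(degree=4, include_bias=False),
    StandardScaler(),
    Lasso(alpha=0.05),
)
linear_lasso.fit(X_train, y_train)
poly_lasso.fit(X_train, y_train)

r2_linear = linear_lasso.score(X_test, y_test)
r2_poly = poly_lasso.score(X_test, y_test)
print(f"R-squared, raw feature:         {r2_linear:.3f}")
print(f"R-squared, polynomial features: {r2_poly:.3f}")

# Coefficients for x, x^2, x^3, x^4 (on standardized features)
for power, coef in enumerate(poly_lasso.named_steps["lasso"].coef_, start=1):
    print(f"x^{power}: {coef:.3f}")
```

The model is still linear in its inputs; the non-linearity comes entirely from the transformed features, and the L1 penalty can then shrink or drop the polynomial terms that do not help.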