Lasso regression, short for Least Absolute Shrinkage and Selection Operator, is a type of linear regression that uses shrinkage. Shrinkage here means that the coefficient estimates are pulled towards zero; the data values themselves are not moved towards a central point. The lasso technique encourages simple, sparse models (i.e., models with fewer non-zero parameters). It is well suited to problems with many candidate predictors where you want to automate parts of model selection, like variable selection/parameter elimination. It can also help when predictors are highly multicollinear, although among a group of strongly correlated features it tends to keep one and drop the others somewhat arbitrarily.

### Key Features of Lasso Regression:

1. **Regularization Term**: The key characteristic of Lasso Regression is that it adds an L1 penalty to the regression objective: the sum of the absolute values of the coefficients, multiplied by a regularization parameter. The cost function for Lasso regression is:

   $$ \text{Minimize } \sum_{i=1}^{n} (y_i - \sum_{j=1}^{p} x_{ij} \beta_j)^2 + \lambda \sum_{j=1}^{p} |\beta_j| $$

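   where $ \lambda \ge 0 $ is the regularization parameter. Scikit-Learn's `Lasso` minimizes an equivalent objective in which the squared-error term is divided by $ 2n $, so its `alpha` is on a different scale from $ \lambda $ here.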

2. **Feature Selection**: One of the advantages of lasso regression over ridge regression is that it can produce sparse models: some coefficients become exactly zero, so the corresponding features drop out of the model. This property is called automatic feature selection and is an example of an embedded feature selection method, because the selection happens as part of fitting the model.

3. **Parameter Tuning**: The strength of the L1 penalty is set by a single parameter, written $ \lambda $ in the formula above and called `alpha` in Scikit-Learn. Selecting a good value for this parameter is crucial and is typically done using cross-validation.

4. **Bias-Variance Tradeoff**: Similar to ridge regression, lasso also manages the bias-variance tradeoff in model training. Increasing the regularization strength increases bias but decreases variance, potentially leading to better generalization on unseen data.

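5. **Scaling**: Before applying lasso, standardize the features (for example, to zero mean and unit variance). The penalty treats all coefficients alike, so a feature measured on a larger scale needs a smaller coefficient and is effectively penalized less.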
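
The scaling advice in point 5 can be sketched as a pipeline. This is a minimal illustration, not a recommended configuration: the synthetic data, the factor of 1000 applied to one column, and `alpha=0.1` are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data; one column is rescaled to mimic a feature recorded in
# different units (a hypothetical change, e.g. metres -> millimetres).
X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)
X[:, 0] *= 1000.0

# The scaler is fitted inside the pipeline, so during cross-validation it
# would only ever see the training portion of each split.
model = make_pipeline(StandardScaler(), Lasso(alpha=0.1))
model.fit(X, y)

scaled_means = model.named_steps["standardscaler"].transform(X).mean(axis=0)
lasso_coef = model.named_steps["lasso"].coef_

print("Column means after scaling:", np.round(scaled_means, 6))
print("Coefficients on the standardized scale:", lasso_coef)
```

Putting the scaler in the same pipeline as the model means the same transformation is applied automatically at prediction time.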

### Implementation in Scikit-Learn:

Lasso regression can be implemented using the `Lasso` class from Scikit-Learn's `linear_model` module. Here's a basic example:

In [8]:
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.metrics import mean_squared_error

# Synthetic linear data: 15 features, very little noise
X, y = make_regression(random_state=42, n_samples=1000, n_features=15, noise=0.1)

# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Unregularized baseline, L1-penalized (Lasso) and L2-penalized (Ridge) models
lr = LinearRegression()
lasso = Lasso(alpha=0.2)
ridge = Ridge(alpha=1.0)

lr.fit(X_train, y_train)
lasso.fit(X_train, y_train)
ridge.fit(X_train, y_train)

lr_pred = lr.predict(X_test)
lasso_pred = lasso.predict(X_test)
ridge_pred = ridge.predict(X_test)

# Compare test-set mean squared error (lower is better)
print("Linear Regression MSE:", mean_squared_error(y_test, lr_pred))
print("Lasso Regression MSE:", mean_squared_error(y_test, lasso_pred))
print("Ridge Regression MSE:", mean_squared_error(y_test, ridge_pred))


Linear Regression MSE: 0.011183765115093535
Lasso Regression MSE: 0.3847492638484498
Ridge Regression MSE: 0.05090866185225746


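In this example, `alpha` controls the regularization strength: the L1 penalty for `Lasso` and the L2 penalty for `Ridge`. Plain linear regression has the lowest test MSE here because the synthetic data comes from a linear model with very little noise, so the bias introduced by either penalty outweighs any reduction in variance. Tuning `alpha`, typically with cross-validation, is the usual way to find a good trade-off.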
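
To see what the L1 penalty does to the fitted model, one can look at the coefficients as well as the error. The sketch below regenerates the data with `coef=True`, which makes `make_regression` also return the coefficients it used to build `y`. The variable name `true_coef` is introduced here for illustration, and the sketch relies on `make_regression`'s default of 10 informative features out of 15.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import train_test_split

# coef=True additionally returns the ground-truth coefficients.
X, y, true_coef = make_regression(
    random_state=42, n_samples=1000, n_features=15, noise=0.1, coef=True
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

lr = LinearRegression().fit(X_train, y_train)
lasso = Lasso(alpha=0.2).fit(X_train, y_train)

# Indices of features whose coefficient is exactly zero.
print("Truly uninformative features:", np.flatnonzero(true_coef == 0))
print("Features zeroed by Lasso:    ", np.flatnonzero(lasso.coef_ == 0))
print("Features zeroed by OLS:      ", np.flatnonzero(lr.coef_ == 0))
```

Ordinary least squares assigns every feature some non-zero weight, however small, whereas Lasso can set coefficients to exactly zero.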

In [9]:
%%time
# Tune alpha with a 10-fold cross-validated grid search
import numpy as np
from sklearn.model_selection import GridSearchCV

# Candidate alphas: 1.00, 1.01, ..., 9.99 (values below 1 are never tried)
param_grid = {'alpha': np.arange(1, 10, 0.01)}

lasso = Lasso()

# With no `scoring` argument, regressors are scored by R^2 (higher is better)
lasso_cv = GridSearchCV(lasso, param_grid, cv=10, n_jobs=-1)
lasso_cv.fit(X_train, y_train)

print("Best alpha for Lasso Regression:", lasso_cv.best_params_)
print("Best score for Lasso Regression:", lasso_cv.best_score_)

# Same grid and procedure for Ridge
ridge = Ridge()
ridge_cv = GridSearchCV(ridge, param_grid, cv=10, n_jobs=-1)
ridge_cv.fit(X_train, y_train)

print("Best alpha for Ridge Regression:", ridge_cv.best_params_)
print("Best score for Ridge Regression:", ridge_cv.best_score_)



Best alpha for Lasso Regression: {'alpha': 1.0}
Best score for Lasso Regression: 0.9995945087010266
Best alpha for Ridge Regression: {'alpha': 1.0}
Best score for Ridge Regression: 0.9999974601713119
CPU times: total: 7.59 s
Wall time: 34.4 s
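
In the output above, the best `alpha` for both models is 1.0, the smallest value in the grid. When the winner sits on the edge of the search range, the optimum may lie outside it. A possible follow-up, sketched here under the assumption that a log-spaced range from 1e-4 to 10 is reasonable for this data, is to use `LassoCV`, which fits the whole regularization path in each fold instead of refitting a separate model per candidate.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

X, y = make_regression(random_state=42, n_samples=1000, n_features=15, noise=0.1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Log-spaced candidates from 1e-4 to 10; the range itself is an assumption.
alphas = np.logspace(-4, 1, 50)

# 10-fold cross-validation over the candidate alphas.
lasso_cv = LassoCV(alphas=alphas, cv=10, max_iter=10_000)
lasso_cv.fit(X_train, y_train)

print("Best alpha:", lasso_cv.alpha_)
print("Test R^2:", lasso_cv.score(X_test, y_test))
```

A log-spaced grid covers several orders of magnitude with far fewer candidates than a linear grid with step 0.01.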


In [6]:
%%time
# Tune alpha with a 10-fold cross-validated randomized search
import numpy as np
from sklearn.model_selection import RandomizedSearchCV

# Same candidate alphas as the grid search above; only n_iter=10 (the default)
# of them are sampled, which is why this cell runs much faster
param_distributions = {'alpha': np.arange(1, 10, 0.01)}

lasso = Lasso()

# No random_state is set, so the sampled alphas differ from run to run
lasso_cv = RandomizedSearchCV(lasso, param_distributions, cv=10, n_jobs=-1)
lasso_cv.fit(X_train, y_train)

print("Best alpha for Lasso Regression:", lasso_cv.best_params_)
print("Best score for Lasso Regression:", lasso_cv.best_score_)

ridge = Ridge()
ridge_cv = RandomizedSearchCV(ridge, param_distributions, cv=10, n_jobs=-1)
ridge_cv.fit(X_train, y_train)

print("Best alpha for Ridge Regression:", ridge_cv.best_params_)
print("Best score for Ridge Regression:", ridge_cv.best_score_)



Best alpha for Lasso Regression: {'alpha': 1.4200000000000004}
Best score for Lasso Regression: 0.9991907614175057
Best alpha for Ridge Regression: {'alpha': 1.2400000000000002}
Best score for Ridge Regression: 0.999996308187975
CPU times: total: 172 ms
Wall time: 474 ms
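
The randomized search above draws from a fixed list of values and is not seeded. A variant that may be worth trying, sketched here with an assumed range of 1e-4 to 10 and an assumed budget of 20 candidates, samples `alpha` from a continuous log-uniform distribution and fixes `random_state` so the run is repeatable.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = make_regression(random_state=42, n_samples=1000, n_features=15, noise=0.1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Sample alpha log-uniformly between 1e-4 and 10 (assumed range).
param_distributions = {"alpha": loguniform(1e-4, 1e1)}

search = RandomizedSearchCV(
    Lasso(max_iter=10_000),
    param_distributions,
    n_iter=20,        # number of sampled candidates (the default is 10)
    cv=10,
    random_state=42,  # makes the sampled candidates reproducible
)
search.fit(X_train, y_train)

print("Best alpha:", search.best_params_["alpha"])
print("Best mean CV R^2:", search.best_score_)
```

Sampling on a log scale spends as many draws between 0.001 and 0.01 as between 1 and 10, which suits a parameter whose useful values can span several orders of magnitude.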


### **Lasso Regression Explained Simply**  

#### **What is Lasso Regression?**  
Lasso (Least Absolute Shrinkage and Selection Operator) Regression is a modified version of linear regression that helps prevent **overfitting** and can **automatically select important features** by shrinking some coefficients to **zero**.  

### **Key Idea:**  
- Like Ridge Regression, Lasso adds a penalty to the model's coefficients to keep them small.  
- But unlike Ridge (which only shrinks coefficients), Lasso can **completely eliminate** unimportant features by setting their coefficients to **zero**.  
- This makes Lasso useful for **feature selection**—helping you identify which variables actually matter.  

### **Why Use Lasso?**  
✔ **Reduces overfitting** (makes the model simpler).  
✔ **Automatically selects important features** (good when you have many useless/redundant features).  
✔ Works well when only a few features actually impact the outcome.  

### **How Does It Work?**  
Lasso modifies the usual linear regression cost function by adding a penalty:  

$$ \text{Cost} = \text{Sum of Squared Errors} + \lambda \times (\text{Sum of Absolute Coefficients}) $$

- **λ (lambda)**: Controls penalty strength (higher λ → more coefficients become zero).  
- Uses **absolute values** (L1 penalty) instead of squared values (like Ridge).  
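
The cost above can be written directly in NumPy. The sketch below also includes the soft-thresholding function, which solves the one-coefficient problem of minimizing $ \tfrac{1}{2}(z - \beta)^2 + t|\beta| $ and shows where the exact zeros come from. The function names `lasso_cost` and `soft_threshold` and the small arrays are made up for illustration.

```python
import numpy as np

def lasso_cost(X, y, beta, lam):
    """Sum of squared errors plus lam times the sum of absolute coefficients."""
    residuals = y - X @ beta
    return np.sum(residuals ** 2) + lam * np.sum(np.abs(beta))

def soft_threshold(z, t):
    """Shrink z towards zero by t; entries with |z| <= t become exactly zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

# Tiny hand-made example in which beta fits y perfectly,
# so the cost consists of the penalty term only.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([1.0, 2.0])

cost = lasso_cost(X, y, beta, lam=0.5)
shrunk = soft_threshold(np.array([3.0, 0.4, -2.0]), t=0.5)

print(cost)    # 0 (squared errors) + 0.5 * (1 + 2) = 1.5
print(shrunk)  # 3.0 -> 2.5, 0.4 -> 0.0, -2.0 -> -1.5
```

Large values are reduced by a fixed amount, while values smaller than the threshold are set to zero. A squared (L2) penalty instead rescales values proportionally, which is why Ridge shrinks coefficients without removing them.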

### **Example:**  
Imagine predicting house prices using features like:  
- **Size** (important)  
- **Number of bedrooms** (somewhat important)  
- **Wall color** (useless)  

Lasso might:  
✅ Keep **Size** (big coefficient)  
✅ Slightly shrink **Bedrooms** (small coefficient)  
❌ Eliminate **Wall color** (coefficient = 0)  

### **Lasso vs. Ridge:**  
| **Lasso (L1)** | **Ridge (L2)** |  
|--------------|--------------|  
| Shrinks some coefficients to **exactly zero** (feature selection) | Shrinks coefficients towards zero but generally **not exactly to zero** |  
| Good when **few features matter** | Good when **many features contribute** |  
| Uses **absolute values** in penalty | Uses **squared values** in penalty |  
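
The first row of the table can be checked empirically. The following sketch counts exact zeros in the fitted coefficients for a few penalty strengths. The synthetic dataset (20 features, 5 of them informative) and the three `alpha` values are assumptions chosen for illustration, and the same `alpha` does not mean the same penalty strength for the two models.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data in which only 5 of the 20 features influence y.
X, y = make_regression(
    n_samples=300, n_features=20, n_informative=5, noise=5.0, random_state=0
)

lasso_zeros = {}
ridge_zeros = {}
for alpha in [0.1, 1.0, 10.0]:
    lasso = Lasso(alpha=alpha, max_iter=10_000).fit(X, y)
    ridge = Ridge(alpha=alpha).fit(X, y)
    # Count coefficients that are exactly zero in each fitted model.
    lasso_zeros[alpha] = int(np.sum(lasso.coef_ == 0))
    ridge_zeros[alpha] = int(np.sum(ridge.coef_ == 0))
    print(
        f"alpha={alpha:>5}: zero coefficients  "
        f"Lasso={lasso_zeros[alpha]:2d}  Ridge={ridge_zeros[alpha]:2d}"
    )
```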

### **When to Use Lasso?**  
- You have **many features** and suspect only a few are useful.  
- You want **automatic feature selection**.  
- Your data has **some useless/redundant features** (among highly correlated features, Lasso tends to keep one and drop the others).  

### **Final Thought:**  
Lasso is like a "smart filter" for your regression model—it keeps what’s important and discards the rest! 🎯  