# Module63 Regression Lasso Assignment

Q1. What is Lasso Regression, and how does it differ from other regression techniques?

A1. Lasso Regression (Least Absolute Shrinkage and Selection Operator):

Lasso is a linear regression technique that adds a penalty proportional to the sum of the absolute values of the coefficients (∑∣𝛽𝑗∣) to the loss function. The penalty both shrinks the coefficients and performs feature selection.

Objective Function:

`L = MSE + 𝜆 * ∑(j=1 to p) ∣𝛽𝑗∣`

where:

L = Loss function to be minimized.

MSE = Mean squared error of the predictions.

𝜆 = Regularization parameter (controls penalty strength).

p = Number of features.

∣𝛽𝑗∣ = Absolute value of the j-th coefficient.



**Key Difference:**

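Unlike standard linear regression, which applies no penalty, and Ridge regression, whose squared penalty only shrinks coefficients toward zero, Lasso can shrink some coefficients to exactly zero, effectively performing feature selection.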
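The objective above can be evaluated directly. The sketch below is illustrative only: the synthetic dataset, the helper name `lasso_objective`, and the value `alpha = 1.0` are assumptions, not part of the assignment. It uses the objective that scikit-learn documents for `Lasso`, `(1 / (2n)) * ||y - Xw||² + alpha * ||w||₁`, which is half of the `MSE + 𝜆 * ∑∣𝛽𝑗∣` form when `alpha = 𝜆 / 2`, so both forms have the same minimizer.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression


def lasso_objective(model, X, y, alpha):
    """scikit-learn's Lasso objective: (1 / (2n)) * ||residuals||^2 + alpha * ||w||_1."""
    residuals = y - model.predict(X)
    return (residuals @ residuals) / (2 * len(y)) + alpha * np.abs(model.coef_).sum()


# Synthetic data (illustrative): 20 features, only 5 of which drive the target.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

alpha = 1.0  # scikit-learn's name for the penalty strength
ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=alpha).fit(X, y)

# Lasso minimizes the penalized objective, so it should score no worse than OLS on it.
obj_ols = lasso_objective(ols, X, y, alpha)
obj_lasso = lasso_objective(lasso, X, y, alpha)
n_zero_lasso = int(np.sum(lasso.coef_ == 0))

print(f"Penalized objective -> OLS: {obj_ols:.2f}, Lasso: {obj_lasso:.2f}")
print(f"Coefficients set exactly to zero by Lasso: {n_zero_lasso} of {X.shape[1]}")
```

The OLS fit has the smaller squared error, but once the penalty is counted the Lasso fit should come out ahead, which is the trade-off the objective encodes.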





Q2. What is the main advantage of using Lasso Regression in feature selection?

A2. **Automatic Feature Selection:**

1.) Lasso shrinks the coefficients of less important features to exactly zero, removing them from the model.

2.) This simplifies the model, can reduce overfitting, and makes the model easier to interpret.

**Advantage Over Ridge Regression:**

While Ridge shrinks coefficients close to zero, it does not eliminate features entirely.
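One way to see why Lasso reaches exact zeros while Ridge does not is the special case of an orthonormal design (XᵀX = I), an assumption made here purely for illustration. Writing the two objectives as ½‖y − Xβ‖² + λ‖β‖₁ (Lasso) and ½‖y − Xβ‖² + (λ/2)‖β‖² (Ridge), a scaling that differs from the MSE form by constant factors, each coefficient has a closed form in terms of the ordinary least squares estimate:

$$
\hat{\beta}_j^{\text{lasso}} = \operatorname{sign}\!\left(\hat{\beta}_j^{\text{OLS}}\right)\max\!\left(\left|\hat{\beta}_j^{\text{OLS}}\right| - \lambda,\; 0\right),
\qquad
\hat{\beta}_j^{\text{ridge}} = \frac{\hat{\beta}_j^{\text{OLS}}}{1 + \lambda}
$$

Under these assumptions, any Lasso coefficient whose least squares estimate is smaller than λ in magnitude becomes exactly zero, whereas Ridge only rescales every coefficient by the same factor and never reaches zero for finite λ.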

Q3. How do you interpret the coefficients of a Lasso Regression model?

A3. **Non-zero Coefficients:**

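Represent the features selected by the model. A coefficient with larger magnitude indicates a stronger influence on the target variable, provided the features are on a comparable scale (for example, standardized); the sign gives the direction of the effect. Because of shrinkage, the magnitudes are biased toward zero relative to ordinary least squares.

**Zero Coefficients:**

Features with zero coefficients are excluded from the model at the chosen λ. This does not necessarily mean they are unrelated to the target: a feature can be dropped because a correlated feature already carries the same information.

**Example:**

In a house price prediction model, if the coefficient for `location` is non-zero and the coefficient for `age of house` is zero, the model uses `location` for prediction and ignores `age of house`.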
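A minimal sketch of reading Lasso coefficients, under stated assumptions: the feature names, the price formula, and `alpha=5000.0` are all made up for illustration. The features are standardized first so that coefficient magnitudes are comparable.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500

# Hypothetical house features; names and values are invented for this example.
features = pd.DataFrame({
    "area_sqft": rng.normal(1500, 400, n),
    "num_rooms": rng.integers(1, 6, n).astype(float),
    "age_years": rng.uniform(0, 50, n),
    "unrelated_noise": rng.normal(0, 1, n),
})
# Made-up price formula: only area and rooms affect the price.
price = (200 * features["area_sqft"] + 15000 * features["num_rooms"]
         + rng.normal(0, 20000, n))

# Standardize so each coefficient is "price change per one standard deviation".
X_scaled = StandardScaler().fit_transform(features)
model = Lasso(alpha=5000.0).fit(X_scaled, price)

coefs = pd.Series(model.coef_, index=features.columns)
selected = coefs[coefs != 0]
selected = selected.reindex(selected.abs().sort_values(ascending=False).index)
dropped = list(coefs[coefs == 0].index)

print("Selected features (largest effect first):")
print(selected)
print("Dropped features:", dropped)
```

In this setup the two features that do not enter the price formula should be dropped, and the ordering of the remaining coefficients reflects their per-standard-deviation effect rather than their raw units.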

Q4. What are the tuning parameters that can be adjusted in Lasso Regression, and how do they affect the model's performance?

A4. **Tuning Parameter: λ (Regularization Strength)**

1.) Controls the magnitude of the penalty term (scikit-learn calls this parameter `alpha`).

2.) Effects:

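**Small λ:** Minimal penalty; the model behaves like ordinary linear regression (and matches it at λ = 0).

**Large λ:** Strong penalty; more coefficients shrink to zero, which reduces overfitting but increases bias and can lead to underfitting.

3.) The optimal λ balances model complexity against predictive performance and is usually chosen by cross-validation.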
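The effect of λ can be observed by refitting over a range of values. This is a sketch on synthetic data; the dataset and the particular `alphas` list are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error

# Synthetic data (illustrative): 20 features, 5 of them informative.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]
nonzero_counts = []
train_mse = []

for alpha in alphas:
    model = Lasso(alpha=alpha, max_iter=10_000).fit(X, y)
    nonzero_counts.append(int(np.count_nonzero(model.coef_)))
    train_mse.append(mean_squared_error(y, model.predict(X)))
    print(f"alpha={alpha:>6}: non-zero coefficients={nonzero_counts[-1]:>2}, "
          f"training MSE={train_mse[-1]:.1f}")
```

As the penalty grows, fewer coefficients survive and the training error rises, which is the bias side of the trade-off. Training error alone cannot identify the best value; that requires held-out data or cross-validation.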

Q5. Can Lasso Regression be used for non-linear regression problems? If yes, how?

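A5. Not directly: Lasso is inherently a linear regression technique.

**Solution:**

Transform the input features using polynomial features or other non-linear transformations, then apply Lasso to the transformed features. The model stays linear in its coefficients but can represent non-linear relationships in the original inputs.

**Example:**

Combine polynomial feature expansion with Lasso to capture non-linear relationships while the penalty removes polynomial terms that are not needed.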
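A possible way to set this up in scikit-learn is a pipeline of `PolynomialFeatures`, `StandardScaler`, and `Lasso`. The quadratic target, the degree of 5, and `alpha=0.05` below are assumptions made only to have something concrete to fit.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(300, 1))
# Made-up non-linear target: quadratic in x plus a little noise.
y = 0.5 * x[:, 0] ** 2 - x[:, 0] + rng.normal(0, 0.2, size=300)

# Expand x into x, x^2, ..., x^5, scale the terms, then let Lasso pick among them.
poly_lasso = make_pipeline(
    PolynomialFeatures(degree=5, include_bias=False),
    StandardScaler(),
    Lasso(alpha=0.05, max_iter=50_000),
)
poly_lasso.fit(x, y)

# Baseline: Lasso on the raw feature only (no polynomial terms).
linear_lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.05)).fit(x, y)

r2_poly = poly_lasso.score(x, y)
r2_linear = linear_lasso.score(x, y)
print(f"R^2 with polynomial features: {r2_poly:.3f}")
print(f"R^2 with the raw feature only: {r2_linear:.3f}")

# Coefficients on the standardized terms x^1 ... x^5.
for degree, coef in enumerate(poly_lasso.named_steps["lasso"].coef_, start=1):
    print(f"x^{degree}: {coef:.3f}")
```

Scaling after the expansion matters here because high powers of `x` have much larger ranges than `x` itself, and the L1 penalty treats all coefficients alike.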

Q6. What is the difference between Ridge Regression and Lasso Regression?

A6. Ridge and Lasso Regression differ in the following aspects:

**1.) Penalty**

Ridge - `𝜆 * ∑(j=1 to p) 𝛽𝑗²`

Lasso - `𝜆 * ∑(j=1 to p) ∣𝛽𝑗∣`

**2.) Feature Selection**

Ridge - Does not perform feature selection; keeps all coefficients small.

Lasso - Shrinks some coefficients to exactly zero, removing features.

**3.) Usage**

Ridge - Suited to multicollinear data and to cases where most features are expected to contribute.

Lasso - Suited to sparse models, where only a few features are expected to matter, and to feature selection.

**4.) Type of Regularization**

Ridge - L2 Regularization

Lasso - L1 Regularization
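The contrast can be checked by fitting both models on the same data and counting exact zeros. This is a sketch on synthetic data; the dataset and the `alpha=1.0` values are assumptions, and the two `alpha` parameters are not on the same scale, so the comparison is qualitative only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (illustrative): 20 features, 5 of them informative.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty
lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty

ridge_zeros = int(np.sum(ridge.coef_ == 0))
lasso_zeros = int(np.sum(lasso.coef_ == 0))

print(f"Ridge: {ridge_zeros} coefficients exactly zero, "
      f"smallest |coef| = {np.abs(ridge.coef_).min():.4f}")
print(f"Lasso: {lasso_zeros} coefficients exactly zero")
```

Ridge typically leaves small but non-zero values on the uninformative features, while Lasso removes them outright.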

Q7. Can Lasso Regression handle multicollinearity in the input features? If yes, how?

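A7. Yes, to a degree. Lasso deals with multicollinearity by:

1.) Tending to keep one feature from a group of correlated features while shrinking the others to zero.

2.) Reducing redundancy and improving interpretability.

**Limitation:** When correlated features are similarly informative, the choice of which one to keep is essentially arbitrary and can change with small changes in the data, so the selected feature should not be read as the only important one.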
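A small experiment can make this behaviour visible. The sketch below builds two nearly identical features (an artificial, assumed setup) and compares how Lasso and Ridge distribute the weight between them. Which of the two features Lasso favours, and whether the other lands exactly at zero, depends on the data and the solver, so no particular split is claimed here.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)      # near-duplicate of x1
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.5 * rng.normal(size=n)   # both features contribute equally

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("Lasso coefficients:", np.round(lasso.coef_, 3))
print("Ridge coefficients:", np.round(ridge.coef_, 3))
print("Lasso total weight on the pair:", round(float(lasso.coef_.sum()), 3))
```

The combined weight on the pair is well determined (close to the true total of 2, minus some shrinkage), but how Lasso divides it between the two columns is not. Ridge, by contrast, tends to share the weight almost evenly.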

Q8. How do you choose the optimal value of the regularization parameter (lambda) in Lasso Regression?

A8. Two common methods for choosing the regularization parameter are:

**1.) Grid Search with Cross-Validation:**

a.) Test a range of λ values and evaluate the model's performance using k-fold cross-validation.

b.) Select the λ that minimizes validation error.

**2.) Use Built-in Tools:**

a.) In Python, `LassoCV` from scikit-learn fits the model over a path of λ values (called `alpha` in scikit-learn) and selects the one with the lowest cross-validated error.

```python
# Method 1: Grid Search with Cross-Validation

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Example dataset: create a synthetic regression dataset
X, y = make_regression(n_samples=1000, n_features=20, noise=0.1, random_state=42)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define the Lasso regression model
lasso = Lasso()

# Define the grid of regularization parameters (alpha values)
param_grid = {'alpha': np.logspace(-4, 0, 50)}  # Values ranging from 0.0001 to 1

# Perform Grid Search with 5-Fold Cross-Validation
grid_search = GridSearchCV(
    estimator=lasso,
    param_grid=param_grid,
    cv=5,
    scoring='neg_mean_squared_error',
    verbose=1,
)

# Fit the model to the training data
grid_search.fit(X_train, y_train)

# Retrieve the best regularization parameter
optimal_alpha = grid_search.best_params_['alpha']
best_model = grid_search.best_estimator_

print(f"Optimal Regularization Parameter (λ): {optimal_alpha}")
print(f"Best Model: {best_model}")

# Evaluate the best model on the test set
y_pred = best_model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Test Set Mean Squared Error (MSE): {mse:.4f}")
```


```text
Fitting 5 folds for each of 50 candidates, totalling 250 fits
Optimal Regularization Parameter (λ): 0.0016768329368110067
Best Model: Lasso(alpha=0.0016768329368110067)
Test Set Mean Squared Error (MSE): 0.0114
```


# Explanation  

**1.) Dataset:**

`make_regression` is used to create a synthetic regression dataset. Replace it with your own dataset if available.

**2.) Grid of Regularization Parameters:**

`np.logspace(-4, 0, 50)` generates 50 logarithmically spaced values for α between 10⁻⁴ and 10⁰ (0.0001 to 1). You can adjust the range or number of values as needed.

**3.) GridSearchCV:**

`cv=5` : 5-fold cross-validation is used to evaluate each α value.

`scoring='neg_mean_squared_error'`: scikit-learn maximizes scores, so the MSE is negated. A higher score (closer to zero) means a lower MSE.

**4.) Optimal Alpha:**

The best regularization parameter (α) is retrieved using
`grid_search.best_params_['alpha']`.

**5.) Model Evaluation:**

The best Lasso model is evaluated on the test set using Mean Squared Error (MSE).

```python
# Method 2: Using built-in tools

import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

# Create a DataFrame for LassoCV
dataset = fetch_california_housing()
df = pd.DataFrame(dataset.data, columns=dataset.feature_names)
df['target'] = dataset.target

# Divide the dataset into independent and dependent features
x = df.iloc[:, :-1]      # independent features
y = df.iloc[:, -1]       # dependent feature

# Split the data into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Fit LassoCV with 5-fold cross-validation
lasso_cv = LassoCV(cv=5).fit(x_train, y_train)

# Retrieve the selected lambda (stored as alpha_ in scikit-learn)
optimal_lambda = lasso_cv.alpha_
print(f"Optimal Regularization parameter(λ) is: {optimal_lambda}")
```


```text
Optimal Regularization parameter(λ) is: 0.034222561573497685
```
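Because the L1 penalty treats all coefficients alike, the selected λ depends on the scale of the features. A variation worth considering is to standardize inside a pipeline before `LassoCV`, and to inspect the cross-validation curve rather than only the chosen value. The sketch below uses synthetic data with artificially mismatched feature scales (an assumption for illustration, not the California housing data). Note that the scaler here is fitted once on all the training data, so the inner cross-validation folds share its statistics.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative), then rescale some columns to mimic mixed units.
X, y = make_regression(n_samples=300, n_features=10, n_informative=4,
                       noise=5.0, random_state=0)
X = X * np.array([1, 10, 100, 1000, 1, 1, 1, 1, 1, 1])

# Standardize first so the penalty applies evenly to every feature.
pipeline = make_pipeline(StandardScaler(), LassoCV(cv=5))
pipeline.fit(X, y)

lasso_cv = pipeline.named_steps["lassocv"]
mean_cv_mse = lasso_cv.mse_path_.mean(axis=1)   # average over the 5 folds
best_index = int(np.argmin(mean_cv_mse))

print(f"Selected alpha: {lasso_cv.alpha_:.4f}")
print(f"Alphas tried: {len(lasso_cv.alphas_)}")
print(f"Lowest mean CV MSE: {mean_cv_mse[best_index]:.2f}")
```

`alphas_` holds the values that were tried and `mse_path_` holds the per-fold error for each, so plotting `mean_cv_mse` against `alphas_` shows how sharply the minimum is defined.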
