# Q1. What is Lasso Regression, and how does it differ from other regression techniques?

Lasso regression is a regularized form of linear regression. It is often preferred over ordinary least squares when the goal is more accurate prediction on unseen data. The model relies on shrinkage: estimates are pulled towards a central point, and in lasso the coefficient estimates are pulled towards zero. The lasso procedure encourages simple, sparse models (i.e. models with fewer parameters). This type of regression is well-suited for data showing high levels of multicollinearity or when you want to automate parts of model selection, such as variable selection/parameter elimination.

Lasso Regression uses the L1 regularization technique (described below). It is particularly useful when there are many features, because it automatically performs feature selection.

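In ordinary multiple linear regression, the model takes the form:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$$

where:

$Y$: The response variable
$X_j$: The jth predictor variable
$\beta_j$: The average effect on $Y$ of a one unit increase in $X_j$, holding all other predictors fixed
$\varepsilon$: The error term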

The values for β0, β1, β2, … , βp are chosen using the least squares method, which minimizes the residual sum of squares (RSS):

$$\text{RSS} = \sum (y_i - \hat{y}_i)^2$$

where:

Σ: The Greek capital letter sigma, denoting a sum
yi: The actual response value for the ith observation
ŷi: The predicted response value based on the multiple linear regression model
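The least squares fit and its RSS can be sketched with NumPy alone. The data below is made up for illustration (the sample size, coefficients, and noise level are all assumptions), and `np.linalg.lstsq` is used as one convenient way to obtain the least squares estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 50 observations, 2 predictors, known coefficients.
n = 50
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Add a column of ones so the first coefficient plays the role of the intercept β0.
X_design = np.column_stack([np.ones(n), X])

# Least squares: choose the coefficients that minimize RSS.
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Predicted responses and the residual sum of squares.
y_hat = X_design @ beta_hat
rss = np.sum((y - y_hat) ** 2)

print("Estimated coefficients:", beta_hat)
print("RSS:", rss)
```

Any other coefficient vector gives a larger RSS on the same data, which is what "minimizes the sum of squared residuals" means in practice.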

However, when the predictor variables are highly correlated then multicollinearity can become a problem. This can cause the coefficient estimates of the model to be unreliable and have high variance. That is, when the model is applied to a new set of data it hasn’t seen before, it’s likely to perform poorly.


One way to get around this issue is to use a method known as lasso regression, which instead seeks to minimize the following:

$$\text{RSS} + \lambda \sum_{j=1}^{p} |\beta_j|$$

where j ranges from 1 to p and λ ≥ 0.

This second term in the equation is known as a shrinkage penalty.

When λ = 0, this penalty term has no effect and lasso regression produces the same coefficient estimates as least squares.

However, as λ grows towards infinity the shrinkage penalty becomes more influential: the coefficients of predictor variables that aren’t important in the model get shrunk towards zero, and some predictors are dropped from the model entirely because their coefficients become exactly zero.
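To make the effect of λ concrete, here is a minimal from-scratch sketch that minimizes RSS + λΣ|βj| by cyclic coordinate descent with soft-thresholding. This is one standard way to solve the problem, not necessarily what any particular library does internally. The function names `soft_threshold` and `lasso_coordinate_descent`, the synthetic data, and the λ values are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z towards zero by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=500):
    """Minimize RSS + lam * sum(|beta_j|); the intercept is handled by centering."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    beta = np.zeros(X.shape[1])
    col_sq = np.sum(Xc ** 2, axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            # Residual with predictor j's current contribution removed.
            r_j = yc - Xc @ beta + Xc[:, j] * beta[j]
            rho = Xc[:, j] @ r_j
            # The threshold is lam / 2 because RSS carries no 1/2 factor here.
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    intercept = y_mean - x_mean @ beta
    return intercept, beta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# Only the first two predictors matter in this made-up example.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=100)

for lam in [0.0, 50.0, 200.0, 1000.0]:
    _, beta = lasso_coordinate_descent(X, y, lam)
    print(f"lambda={lam:>6}: {np.round(beta, 3)}")
```

With λ = 0 the result coincides with least squares, and once λ exceeds twice the largest absolute inner product between a centered predictor and the centered response, every coefficient is exactly zero.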

In [22]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

df=pd.read_csv('Algerian_forest_fires_dataset_UPDATE.csv')

In [23]:
df.drop(['day','month','year'],axis=1,inplace=True)
## Encoding
df['Classes']=np.where(df['Classes'].str.contains("not fire"),0,1)

In [24]:
X=df.drop('FWI',axis=1)
y=df['FWI']

In [25]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.25,random_state=42)

In [26]:
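def correlation(dataset, threshold):
    """Return the names of columns whose absolute correlation with an earlier column exceeds threshold."""
    col_corr = set()
    corr_matrix = dataset.corr()
    for i in range(len(corr_matrix.columns)):
        for j in range(i):
            # Only the lower triangle is scanned, so each pair is checked once
            if abs(corr_matrix.iloc[i, j]) > threshold:
                colname = corr_matrix.columns[i]
                col_corr.add(colname)
    return col_corr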

In [27]:
corr_features=correlation(X_train,0.85)
# Drop the same highly correlated columns from both splits (chosen from the training data only)
X_train=X_train.drop(columns=list(corr_features))
X_test=X_test.drop(columns=list(corr_features))
X_train.shape,X_test.shape

((182, 9), (61, 9))

In [28]:
from sklearn.preprocessing import StandardScaler
scaler=StandardScaler()
X_train_scaled=scaler.fit_transform(X_train)
X_test_scaled=scaler.transform(X_test)

In [29]:
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import r2_score
lasso=Lasso()
lasso.fit(X_train_scaled,y_train)
y_pred=lasso.predict(X_test_scaled)
mae=mean_absolute_error(y_test,y_pred)
score=r2_score(y_test,y_pred)
print("Mean absolute error", mae)
print("R2 Score", score)

Mean absolute error 1.133175994914409
R2 Score 0.9492020263112388


The advantage of lasso regression compared to least squares regression lies in the bias-variance tradeoff.

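Recall that mean squared error (MSE) is a metric we can use to measure the accuracy of a given model. The expected test MSE at a point x0 can be decomposed as:

$$\text{MSE} = \text{Var}(\hat{f}(x_0)) + [\text{Bias}(\hat{f}(x_0))]^2 + \text{Var}(\varepsilon)$$

$$\text{MSE} = \text{Variance} + \text{Bias}^2 + \text{Irreducible error}$$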

The basic idea of lasso regression is to introduce a little bias so that the variance can be substantially reduced, which leads to a lower overall MSE.
As λ increases from zero, variance typically drops substantially with very little increase in bias. Beyond a certain point, though, variance decreases less rapidly and the shrinkage in the coefficients causes them to be significantly underestimated, which results in a large increase in bias.

In a plot of test MSE against λ, the test MSE is lowest at the value of λ that produces the best tradeoff between bias and variance.

When λ = 0, the penalty term in lasso regression has no effect and thus it produces the same coefficient estimates as least squares. However, by increasing λ to a certain point we can reduce the overall test MSE.
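The tradeoff can be checked with a small simulation. The sketch below repeatedly draws training sets from a made-up linear model, fits scikit-learn's `Lasso` at several penalty strengths, and estimates the squared bias and variance of the prediction at a fixed point x0. The sample size, noise level, true coefficients, and `alpha` grid are all assumptions; `alpha` is scikit-learn's name for the penalty strength and plays the role of λ up to a scaling factor.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

n, p, sigma = 30, 10, 3.0
beta_true = np.zeros(p)
beta_true[:2] = [3.0, 1.5]   # only two predictors carry signal (made-up values)
x0 = np.ones(p)              # the point at which predictions are evaluated
f_x0 = x0 @ beta_true        # true value f(x0)

def bias_variance_at_x0(alpha, n_repeats=200):
    """Estimate squared bias and variance of the lasso prediction at x0 by simulation."""
    preds = np.empty(n_repeats)
    for r in range(n_repeats):
        X = rng.normal(size=(n, p))
        y = X @ beta_true + rng.normal(scale=sigma, size=n)
        model = Lasso(alpha=alpha, max_iter=10_000).fit(X, y)
        preds[r] = model.predict(x0.reshape(1, -1))[0]
    bias_sq = (preds.mean() - f_x0) ** 2
    return bias_sq, preds.var()

results = {}
for alpha in [0.01, 0.3, 1.0, 10.0]:
    bias_sq, variance = bias_variance_at_x0(alpha)
    results[alpha] = (bias_sq, variance)
    print(f"alpha={alpha:<5} bias^2={bias_sq:7.3f}  variance={variance:7.3f}  "
          f"bias^2+variance={bias_sq + variance:7.3f}")
```

The irreducible error Var(ε) is the same for every `alpha`, so comparing bias² + variance across rows is enough to see where the tradeoff is best in this toy setting.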



# Q2. What is the main advantage of using Lasso Regression in feature selection?

In minimizing its cost function, Lasso regression automatically keeps the features that are useful and discards the useless or redundant ones. In Lasso regression, discarding a feature means setting its coefficient equal to 0.

So, the idea of using Lasso regression for feature selection is very simple: we fit a Lasso regression on a scaled version of our dataset and keep only those features that have a coefficient different from 0. We first need to tune the α hyperparameter, since it determines how many coefficients are set to 0.

This makes it easy to detect the useful features and discard the useless ones.

The main advantage of using Lasso Regression for feature selection is its ability to automatically identify and select the most important features from a large pool of potential predictors. It does this by driving some coefficients to exactly zero, resulting in a sparse model. This helps simplify the model, improve interpretability, and prevent overfitting by excluding irrelevant or redundant features.
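A minimal sketch of this selection procedure is shown below, assuming synthetic data in which only two of eight candidate features influence the target. The feature names, coefficients, and `alpha=0.1` are illustrative choices, not values taken from the forest fires dataset.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Made-up data: 8 candidate features, only x0 and x3 influence the target.
feature_names = [f"x{j}" for j in range(8)]
X = rng.normal(size=(200, 8))
y = 4.0 * X[:, 0] - 3.0 * X[:, 3] + rng.normal(scale=0.5, size=200)

# Scale first so the L1 penalty treats every feature on the same footing.
X_scaled = StandardScaler().fit_transform(X)

lasso = Lasso(alpha=0.1).fit(X_scaled, y)

# Features whose coefficient is not exactly zero are the ones lasso kept.
selected = [name for name, coef in zip(feature_names, lasso.coef_) if coef != 0]
print("Coefficients:", np.round(lasso.coef_, 3))
print("Selected features:", selected)
```

On a real dataset the same pattern applies with the DataFrame's column names in place of `feature_names`, and with `alpha` chosen by cross-validation rather than fixed by hand.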

# Q3. How do you interpret the coefficients of a Lasso Regression model?

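For the linear model

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$$

ordinary least squares minimizes the residual sum of squares:

$$\text{RSS} = \sum (y_i - \hat{y}_i)^2$$

Lasso regression works by shrinking the magnitude of the coefficients towards zero. It achieves this by adding a penalty term to the cost function that is proportional to the sum of the absolute values of the coefficients.

The resulting objective to minimize is:

$$\text{RSS} + \lambda \sum_{j=1}^{p} |\beta_j|$$

where j ranges from 1 to p and λ ≥ 0. The penalty term is the second part, $\lambda \sum |\beta_j|$.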

How to Interpret Lasso Regression Results
Interpreting the results of a Lasso regression model can be challenging, but there are a few key steps that you can follow to make sense of the output.

Step 1: Check the Model’s Coefficients
The first step in interpreting the results of a Lasso regression model is to examine the values of the model’s coefficients. The coefficients represent the strength and direction of the relationship between the features and the target variable.

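In Lasso regression, some of the coefficients will be set to zero, which means that the corresponding feature has been excluded from the model. The non-zero coefficients represent the features that are most important for predicting the target variable. Because the penalty shrinks every coefficient, the non-zero values are biased towards zero, and their magnitudes are only comparable across features when the features were standardized before fitting.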

Step 2: Check the Model’s Performance Metrics
The second step in interpreting the results of a Lasso regression model is to check the model’s performance metrics. These metrics provide an indication of how well the model is performing on the test data set.

The most common performance metrics for regression models are:

Mean Squared Error (MSE)
R-squared (R^2)
Mean Absolute Error (MAE)

A good Lasso regression model should have a low MSE and MAE and a high R^2 value.

Step 3: Check for Overfitting
The third step in interpreting the results of a Lasso regression model is to check for overfitting. Overfitting occurs when the model is too complex and fits the training data too closely, resulting in poor performance on the test data set.

One way to check for overfitting is to compare the model’s performance on the training and test data sets. If the model performs significantly better on the training data set than the test data set, it may be overfitting.

Another way to check for overfitting is to use cross-validation. In k-fold cross-validation the data set is split into k subsets (folds); the model is trained on k-1 of them and evaluated on the remaining fold, and this is repeated so that each fold is used once for evaluation. Averaging the scores across folds helps to ensure that the reported performance does not depend on one particular split of the data.
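The two overfitting checks described above (training versus test performance, and cross-validation) can be sketched as follows. The data is synthetic and `alpha=0.1` is an arbitrary choice; wrapping the scaler and the model in `make_pipeline` is one reasonable design, not the only one.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=1.0, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Keeping the scaler inside the pipeline means each validation fold is
# excluded from the scaling statistics as well as from the model fit.
model = make_pipeline(StandardScaler(), Lasso(alpha=0.1))
model.fit(X_train, y_train)

train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)

# 5-fold CV: each fold is held out once while the model trains on the other four.
cv_r2 = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")

print(f"Train R^2: {train_r2:.3f}")
print(f"Test R^2:  {test_r2:.3f}")
print(f"CV R^2:    {cv_r2.mean():.3f} +/- {cv_r2.std():.3f}")
```

A training score far above both the test score and the cross-validated score would be the warning sign for overfitting.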


In [32]:
from sklearn.linear_model import Lasso, LassoCV
from sklearn.model_selection import train_test_split

# X_train, y_train: Training data and target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Define alpha values to explore
alphas = [0.001, 0.01, 0.1, 1, 10, 100]

# Initialize LassoCV with cross-validation folds
lasso_cv = LassoCV(alphas=alphas, cv=5)

# Fit LassoCV to training data
lasso_cv.fit(X_train, y_train)

# Get the best alpha value
best_alpha = lasso_cv.alpha_

# Retrain Lasso model with best alpha
best_lasso_model = Lasso(alpha=best_alpha)
best_lasso_model.fit(X_train, y_train)

# Evaluate on the test set
test_score = best_lasso_model.score(X_test, y_test)
test_score

0.9860929691203884

# Q4. What are the tuning parameters that can be adjusted in Lasso Regression, and how do they affect the model's performance?

In Lasso Regression, there is one main tuning parameter that you can adjust, which is the regularization strength parameter (often denoted as "alpha/λ"). The regularization strength controls the trade-off between fitting the data well and keeping the model simple. It affects how aggressively the Lasso algorithm penalizes the magnitude of the coefficients.

--The alpha parameter controls the strength of the regularization. A higher value of alpha will result in more features being set to zero, which means they are excluded from the model.

--When λ = 0, this penalty term has no effect and lasso regression produces the same coefficient estimates as least squares.

--However, as λ grows towards infinity the shrinkage penalty becomes more influential: the coefficients of predictor variables that aren’t important in the model get shrunk towards zero, and some predictors are dropped from the model entirely because their coefficients become exactly zero.
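The effect of the regularization strength on sparsity can be seen by refitting over a grid of values and counting the non-zero coefficients. The sketch below uses synthetic data with made-up coefficients. Note that scikit-learn's `Lasso` minimizes (1/(2n))·RSS + α·Σ|βj|, so its `alpha` corresponds to λ/(2n) in the notation used here rather than to λ itself.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 10))
# Made-up coefficients of decreasing size; the last five features are pure noise.
true_coef = np.array([5.0, 3.0, 1.5, 0.8, 0.3, 0, 0, 0, 0, 0])
y = X @ true_coef + rng.normal(scale=1.0, size=150)

alphas = [0.001, 0.01, 0.1, 1, 10, 100]
nonzero_counts = []
for alpha in alphas:
    model = Lasso(alpha=alpha, max_iter=10_000).fit(X, y)
    n_nonzero = int(np.sum(model.coef_ != 0))
    nonzero_counts.append(n_nonzero)
    print(f"alpha={alpha:<6} non-zero coefficients: {n_nonzero}")
```

Small values keep nearly every feature, while the largest values in this grid remove all of them, leaving an intercept-only model.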

# Q5. Can Lasso Regression be used for non-linear regression problems? If yes, how?

Lasso Regression, by itself, is designed for linear regression problems. However, it can be extended to handle non-linear regression problems by using non-linear transformations of the input features. This approach is often referred to as "Lasso Regression with Polynomial Features" or "Lasso with Non-linear Basis Functions." Here's how you can adapt Lasso Regression for non-linear regression problems:

Non-linear Transformations:
Start by transforming the original features into non-linear forms. This can involve polynomial features, trigonometric functions, logarithmic functions, exponential functions, or any other suitable non-linear transformations.

Feature Expansion:
Create new features based on these non-linear transformations. For example, if you have a feature x, you can create new features like x^2, sin(x), log(x), etc. These new features capture the non-linear relationships between the original features and the target variable.

Apply Lasso Regression:
Use the expanded set of non-linear features as input to the Lasso Regression model. The regularization in Lasso Regression will still work with these non-linear features.

Hyperparameter Tuning:
Tune the Lasso hyperparameter (alpha) to find the optimal level of regularization. Cross-validation can help you determine the best value of alpha that balances between fitting the data and preventing overfitting.

Model Evaluation:
Evaluate the performance of the trained Lasso model using appropriate evaluation metrics like Mean Squared Error (MSE), R-squared, or others depending on the problem.

In [34]:
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Generate non-linear data
X = np.random.rand(100, 1) * 10
y = 2 * np.sin(X) + np.random.randn(100, 1)

# Create non-linear features
X_poly = np.hstack([X, np.sin(X)])

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X_poly, y, test_size=0.2, random_state=42)

# Initialize Lasso model
lasso = Lasso(alpha=0.1)

# Fit the model to the training data
lasso.fit(X_train, y_train)

# Predict on the test data
y_pred = lasso.predict(X_test)

# Calculate Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")


Mean Squared Error: 0.9255923117832386
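As an alternative to hand-picking a transformation such as sin(X), the feature expansion step can be automated with polynomial features. The following is a minimal sketch under assumed conditions: a made-up quadratic target, degree 5, and `alpha=0.05`. It chains `PolynomialFeatures`, `StandardScaler`, and `Lasso` in a scikit-learn pipeline so that lasso can zero out the polynomial terms it does not need.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)

# Made-up quadratic relationship with noise.
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 - X[:, 0] + rng.normal(scale=0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Expand x into x, x^2, ..., x^5, scale the expanded features, then fit lasso.
model = make_pipeline(
    PolynomialFeatures(degree=5, include_bias=False),
    StandardScaler(),
    Lasso(alpha=0.05, max_iter=10_000),
)
model.fit(X_train, y_train)

# make_pipeline names each step after its lowercased class name.
lasso = model.named_steps["lasso"]
print("Coefficients for x^1..x^5:", np.round(lasso.coef_, 3))
print("Test MSE:", mean_squared_error(y_test, model.predict(X_test)))
print("Test R^2:", model.score(X_test, y_test))
```

The model is still linear in its coefficients; the non-linearity comes entirely from the expanded features.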


# Q6. What is the difference between Ridge Regression and Lasso Regression?

1. Regularization Term:


Ridge Regression: Adds the squared magnitude of coefficients (L2 norm) as a penalty term to the loss function. The regularization term is given by α * Σ(coefficient^2), where α is the regularization parameter.

Lasso Regression: Adds the absolute magnitude of coefficients (L1 norm) as a penalty term to the loss function. The regularization term is given by α * Σ|coefficient|, where α is the regularization parameter.

--------------------------------------------------------------------------------------------------------------------------------------------

2. Coefficient Shrinkage:

Ridge Regression: Shrinks the coefficients towards zero, but they rarely become exactly zero. Ridge can significantly reduce the magnitude of coefficients, effectively preventing them from becoming too large.

Lasso Regression: Can shrink coefficients all the way to zero, resulting in sparse models where some features are entirely excluded from the model. This leads to automatic feature selection.


Feature Selection:

Ridge Regression: Does not perform automatic feature selection, as it retains all features in the model but with reduced magnitudes.

Lasso Regression: Performs automatic feature selection by driving some coefficients to zero, favoring the most important features.

-------------------------------------------------------------------------------------------------------------------------------------------

3. Handling Multicollinearity:

Ridge Regression: Handles multicollinearity (high correlation between predictors) well by spreading the impact of correlated features across multiple coefficients.

Lasso Regression: Can struggle with multicollinearity, as it tends to pick one feature and discard the others in cases of high correlation.

-------------------------------------------------------------------------------------------------------------------------------------
4. Tuning Parameter:

Ridge Regression: The tuning parameter α controls the strength of regularization. Larger α values shrink the coefficients more strongly, but they remain non-zero.

Lasso Regression: The tuning parameter α also controls the strength of regularization. Larger α values result in stronger regularization and set more coefficients to exactly zero.
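The difference in sparsity can be seen by fitting both models to the same data. This is a sketch on synthetic data with assumed coefficients; the two `alpha` values are arbitrary and are not on a comparable scale, because scikit-learn's `Ridge` and `Lasso` normalize their loss terms differently.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Made-up data: 6 features, only features 1 and 4 influence the target.
X = rng.normal(size=(200, 6))
y = 4.0 * X[:, 1] - 3.0 * X[:, 4] + rng.normal(scale=0.5, size=200)
X_scaled = StandardScaler().fit_transform(X)

ridge = Ridge(alpha=10.0).fit(X_scaled, y)
lasso = Lasso(alpha=0.1).fit(X_scaled, y)

print("Ridge coefficients:", np.round(ridge.coef_, 3))
print("Lasso coefficients:", np.round(lasso.coef_, 3))
print("Exact zeros, ridge:", int(np.sum(ridge.coef_ == 0)))
print("Exact zeros, lasso:", int(np.sum(lasso.coef_ == 0)))
```

Ridge leaves small but non-zero weights on the irrelevant features, whereas lasso removes at least some of them outright.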

# Q7. Can Lasso Regression handle multicollinearity in the input features? If yes, how?

Yes, to a degree. When several features are highly correlated, Lasso Regression tends to keep one of them and drive the coefficients of the others to zero, which removes the redundancy from the model. However, which of the correlated features is kept can be arbitrary and unstable, and Lasso might struggle in cases of extreme multicollinearity; in those situations, other techniques or combinations of techniques might be more effective.
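The behaviour with correlated predictors can be illustrated with a deliberately extreme case: a feature `x2` that is almost an exact copy of `x1`. Everything here (the data, the noise scales, the `alpha` values) is an assumption made for the sketch, and `Ridge` is included only as a point of comparison.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.001, size=n)   # near-duplicate of x1
x3 = rng.normal(size=n)                     # independent predictor
X = np.column_stack([x1, x2, x3])
y = 3.0 * x1 + 3.0 * x2 + 2.0 * x3 + rng.normal(scale=0.5, size=n)

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("Lasso coefficients (x1, x2, x3):", np.round(lasso.coef_, 3))
print("Ridge coefficients (x1, x2, x3):", np.round(ridge.coef_, 3))
```

Lasso tends to concentrate the combined effect of the pair on one of the two columns, while ridge splits it almost evenly between them. In both cases the sum of the two coefficients, rather than either one alone, is what the data pins down.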

# Q8. How do you choose the optimal value of the regularization parameter (lambda) in Lasso Regression?

Choosing the optimal value of the regularization parameter, often denoted as λ (lambda), in Lasso Regression involves finding the value that strikes the right balance between model complexity and fitting the data. Cross-validation is a commonly used technique to determine the optimal λ value. Here's a step-by-step process:

Divide Data into Training and Validation Sets:
Split your dataset into a training set and a validation set. This is typically done using techniques like k-fold cross-validation or hold-out validation.

Define a Range of λ Values:
Choose a range of λ values to explore. You can use a logarithmic scale to cover a wide range of values, such as [0.001, 0.01, 0.1, 1, 10, 100].

Cross-Validation Loop:
For each λ value in your chosen range, perform the following steps:

a. Train Lasso Regression:
Train a Lasso Regression model using the training data and the current λ value.

b. Validate Model:
Evaluate the model's performance on the validation set using an appropriate metric (e.g., Mean Squared Error, R-squared).

Select the Best λ:
Choose the λ value that results in the best validation performance. This can be the λ value with the lowest validation error or the highest R-squared, depending on your problem.

Refit Model with Optimal λ:
After selecting the optimal λ value, retrain the Lasso Regression model using the entire training dataset and the chosen λ.

Evaluate on Test Set:
Finally, evaluate the performance of the model with the optimal λ on a separate test dataset that was not used during the cross-validation process. This gives you an unbiased estimate of the model's generalization performance.
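The steps above can be written out by hand to show what a cross-validated search does. This is a sketch on synthetic data, using `KFold` and mean squared error as the validation metric; the data-generating coefficients are assumptions, and the grid matches the example range given earlier. In scikit-learn the penalty strength is passed as `alpha`.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=1.0, size=200)

# Hold out a test set that the search over lambda never sees.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

alphas = [0.001, 0.01, 0.1, 1, 10, 100]
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

# For each candidate value, average the validation MSE over the 5 folds.
mean_val_mse = {}
for alpha in alphas:
    fold_mse = []
    for train_idx, val_idx in kfold.split(X_train):
        model = Lasso(alpha=alpha, max_iter=10_000)
        model.fit(X_train[train_idx], y_train[train_idx])
        y_val_pred = model.predict(X_train[val_idx])
        fold_mse.append(mean_squared_error(y_train[val_idx], y_val_pred))
    mean_val_mse[alpha] = np.mean(fold_mse)

# Select the value with the lowest average validation error.
best_alpha = min(mean_val_mse, key=mean_val_mse.get)

# Refit on the whole training set, then evaluate once on the untouched test set.
final_model = Lasso(alpha=best_alpha, max_iter=10_000).fit(X_train, y_train)
test_mse = mean_squared_error(y_test, final_model.predict(X_test))
print("Best alpha:", best_alpha)
print("Test MSE:", test_mse)
```

`LassoCV` packages the same loop into a single estimator, which is usually the more convenient choice.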

In [36]:
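from sklearn.linear_model import Lasso, LassoCV
from sklearn.model_selection import train_test_split

# X, y: features and target. These must be the dataset you intend to model;
# if an earlier cell reassigned X and y (for example to the synthetic sine data
# above), re-create them before running this cell, otherwise the score below
# reflects that other dataset.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)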

# Define a range of lambda values to explore
alphas = [0.001, 0.01, 0.1, 1, 10, 100]

# Initialize LassoCV with cross-validation folds
lasso_cv = LassoCV(alphas=alphas, cv=5)

# Fit LassoCV to training data
lasso_cv.fit(X_train, y_train)

# Get the best lambda value
best_alpha = lasso_cv.alpha_

# Retrain Lasso model with best lambda
best_lasso_model = Lasso(alpha=best_alpha)
best_lasso_model.fit(X_train, y_train)

# Evaluate on the test set
test_score = best_lasso_model.score(X_test, y_test)
test_score

-0.026219437458035566