

# Lab 8: L1 Regularization - Lasso Regression  
Dataset: CO2 Emission by Vehicles  

---
## Objective
Fit Linear Regression and Lasso Regression models to predict CO2 emissions.
Use GridSearchCV to select the Lasso regularization strength (alpha).
Compare the two models on the test set using MSE, RMSE, and R2.


## STEP 1 — Import Required Libraries

In [1]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler


## STEP 2 — Load Dataset

In [4]:

# Make sure CO2 Emissions_Canada.csv is in the same folder
df = pd.read_csv('CO2 Emissions_Canada.csv')

print("Dataset Shape:", df.shape)
df.head()


Dataset Shape: (7385, 12)


| | Make | Model | Vehicle Class | Engine Size(L) | Cylinders | Transmission | Fuel Type | Fuel Consumption City (L/100 km) | Fuel Consumption Hwy (L/100 km) | Fuel Consumption Comb (L/100 km) | Fuel Consumption Comb (mpg) | CO2 Emissions(g/km) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ACURA | ILX | COMPACT | 2.0 | 4 | AS5 | Z | 9.9 | 6.7 | 8.5 | 33 | 196 |
| 1 | ACURA | ILX | COMPACT | 2.4 | 4 | M6 | Z | 11.2 | 7.7 | 9.6 | 29 | 221 |
| 2 | ACURA | ILX HYBRID | COMPACT | 1.5 | 4 | AV7 | Z | 6.0 | 5.8 | 5.9 | 48 | 136 |
| 3 | ACURA | MDX 4WD | SUV - SMALL | 3.5 | 6 | AS6 | Z | 12.7 | 9.1 | 11.1 | 25 | 255 |
| 4 | ACURA | RDX AWD | SUV - SMALL | 3.5 | 6 | AS6 | Z | 12.1 | 8.7 | 10.6 | 27 | 244 |



Selected Features:
- Engine Size (L)
- Cylinders
- Fuel Consumption Comb (L/100 km)

Target:
- CO2 Emissions (g/km)


In [6]:

X = df[['Engine Size(L)', 'Cylinders', 'Fuel Consumption Comb (L/100 km)']]
y = df['CO2 Emissions(g/km)']


## STEP 3 — Train-Test Split (80/20)

In [7]:

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


## STEP 4 — Apply StandardScaler

In [8]:

scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
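
The scaler is fit on the training split only, and the same statistics are then applied to the test split. A minimal sketch of that idea in plain NumPy follows. The toy arrays and the names `X_train_toy`, `X_test_toy`, `X_train_std`, and `X_test_std` are hypothetical and are not part of the lab data.

```python
import numpy as np

# Toy feature matrices (hypothetical values, not taken from the dataset).
X_train_toy = np.array([[2.0, 4.0], [2.4, 4.0], [3.5, 6.0], [5.0, 8.0]])
X_test_toy = np.array([[3.0, 6.0]])

# Column statistics are computed from the training rows only.
mean = X_train_toy.mean(axis=0)
std = X_train_toy.std(axis=0)  # population std (ddof=0), as StandardScaler uses

X_train_std = (X_train_toy - mean) / std
X_test_std = (X_test_toy - mean) / std  # reuse the training statistics

print(X_train_std.mean(axis=0))  # approximately 0 per column
print(X_train_std.std(axis=0))   # approximately 1 per column
```

Reusing the training mean and standard deviation on the test rows keeps information about the test set out of the preprocessing step.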


## STEP 5 — Linear Regression Model

In [9]:

lin_model = LinearRegression()
lin_model.fit(X_train_scaled, y_train)

y_pred_lin = lin_model.predict(X_test_scaled)

mse_lin = mean_squared_error(y_test, y_pred_lin)
rmse_lin = np.sqrt(mse_lin)
r2_lin = r2_score(y_test, y_pred_lin)

print("Linear Regression Performance")
print("MSE:", mse_lin)
print("RMSE:", rmse_lin)
print("R2 Score:", r2_lin)


Linear Regression Performance
MSE: 421.92233190519977
RMSE: 20.540748085335153
R2 Score: 0.8773348735033225
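
To make the three reported metrics concrete, the sketch below computes MSE, RMSE, and R2 by hand. The targets are the first five CO2 values from the preview above; the predictions in `y_hat` are hypothetical numbers chosen only for illustration.

```python
import numpy as np

y_true = np.array([196.0, 221.0, 136.0, 255.0, 244.0])  # first five targets from the preview
y_hat = np.array([200.0, 215.0, 140.0, 250.0, 240.0])   # hypothetical predictions

residuals = y_true - y_hat
mse = np.mean(residuals ** 2)        # mean squared error
rmse = np.sqrt(mse)                  # same units as the target (g/km)

ss_res = np.sum(residuals ** 2)                  # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
r2 = 1.0 - ss_res / ss_tot

print("MSE:", mse)
print("RMSE:", rmse)
print("R2:", r2)
```

RMSE is the square root of MSE, which matches the reported values: the square root of 421.92 is about 20.54.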


## STEP 6 — Lasso Regression with GridSearchCV

In [10]:

lasso = Lasso()

param_grid = {'alpha': [0.001, 0.01, 0.1, 1, 10, 100]}

grid = GridSearchCV(lasso, param_grid, cv=5, scoring='r2')
grid.fit(X_train_scaled, y_train)

best_alpha = grid.best_params_['alpha']
print("Best Alpha:", best_alpha)

best_lasso = grid.best_estimator_

y_pred_lasso = best_lasso.predict(X_test_scaled)

mse_lasso = mean_squared_error(y_test, y_pred_lasso)
rmse_lasso = np.sqrt(mse_lasso)
r2_lasso = r2_score(y_test, y_pred_lasso)

print("Lasso Regression Performance")
print("MSE:", mse_lasso)
print("RMSE:", rmse_lasso)
print("R2 Score:", r2_lasso)


Best Alpha: 0.01
Lasso Regression Performance
MSE: 421.930066872253
RMSE: 20.540936367951996
R2 Score: 0.8773326247228237


## STEP 7 — Compare Coefficients

In [11]:

coef_df = pd.DataFrame({
    "Feature": X.columns,
    "Linear Coefficients": lin_model.coef_,
    "Lasso Coefficients": best_lasso.coef_
})

print(coef_df)


                            Feature  Linear Coefficients  Lasso Coefficients
0                    Engine Size(L)             7.605027            7.621766
1                         Cylinders            11.702725           11.686923
2  Fuel Consumption Comb (L/100 km)            38.330321           38.319008
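
With alpha = 0.01 the Lasso coefficients barely differ from the linear ones. The shrinkage becomes visible only at larger alphas. The sketch below sweeps alpha on synthetic data. `X_demo`, `y_demo`, and the generating coefficients are hypothetical and are not derived from the CO2 dataset.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(42)
n = 300
X_demo = rng.normal(size=(n, 3))
# Hypothetical ground truth: the third feature dominates, the first is weak.
y_demo = X_demo @ np.array([0.5, 5.0, 30.0]) + rng.normal(scale=2.0, size=n)

nonzero_counts = {}
for alpha in [0.01, 1, 10, 100]:
    model = Lasso(alpha=alpha).fit(X_demo, y_demo)
    # Count how many coefficients the L1 penalty left nonzero.
    nonzero_counts[alpha] = int(np.count_nonzero(model.coef_))
    print(f"alpha={alpha:<5} coef={np.round(model.coef_, 3)}")

print(nonzero_counts)
```

As alpha grows, weaker features are dropped first. Once alpha is large enough, every coefficient is zero and the model predicts only the intercept.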


## STEP 8 — Final Comparison

In [12]:

print("Linear R2:", r2_lin)
print("Lasso R2:", r2_lasso)

if r2_lasso > r2_lin:
    print("Lasso performed better.")
else:
    print("Linear Regression performed better.")


Linear R2: 0.8773348735033225
Lasso R2: 0.8773326247228237
Linear Regression performed better.
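
The two R2 scores differ only in the sixth decimal place, so a strict `>` comparison overstates the gap. One possible refinement, sketched below, treats scores within a tolerance as a tie. The threshold `tol = 1e-4` and the variable `verdict` are assumptions for illustration, not part of the lab specification.

```python
import math

r2_lin = 0.8773348735033225    # Linear Regression R2 reported above
r2_lasso = 0.8773326247228237  # Lasso R2 reported above

tol = 1e-4  # assumed threshold for a "meaningful" R2 difference

if math.isclose(r2_lin, r2_lasso, abs_tol=tol):
    verdict = "No meaningful difference."
elif r2_lasso > r2_lin:
    verdict = "Lasso performed better."
else:
    verdict = "Linear Regression performed better."

print("Difference:", abs(r2_lin - r2_lasso))
print(verdict)
```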



## Conclusion

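Lasso Regression adds an L1 penalty that can shrink some coefficients exactly to zero, which performs feature selection and can reduce overfitting.
GridSearchCV with 5-fold cross-validation selected alpha = 0.01 from the candidate grid.
At this small alpha, no coefficient was driven to zero, and the Lasso coefficients stayed close to the Linear Regression ones.
The two models performed almost identically on the test set (R2 ≈ 0.8773 and RMSE ≈ 20.54 for both), so the L1 penalty gave no measurable benefit with these three features.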
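
Why an L1 penalty produces exact zeros can be seen in a simplified case. Assuming a single standardized feature (or mutually orthogonal unit-variance features), the Lasso coefficient equals the least-squares coefficient passed through a soft-thresholding function. The helper `soft_threshold` and the input values below are hypothetical illustrations. The lab's three features are correlated, so its coefficients do not follow this formula exactly.

```python
def soft_threshold(z: float, alpha: float) -> float:
    """Return the w that minimizes 0.5 * (w - z)**2 + alpha * abs(w)."""
    if z > alpha:
        return z - alpha   # large positive value: shrink toward zero by alpha
    if z < -alpha:
        return z + alpha   # large negative value: shrink toward zero by alpha
    return 0.0             # values within [-alpha, alpha] are set exactly to zero

# Hypothetical least-squares coefficients, thresholded at alpha = 0.01.
for z in (38.33, 7.61, 0.005):
    print(z, "->", soft_threshold(z, alpha=0.01))
```

Large coefficients lose only alpha in magnitude, while coefficients smaller than alpha vanish. This is consistent with the near-identical coefficients seen at alpha = 0.01.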
