

# Lab 7: L2 Regularization - Ridge Regression  
Dataset: insurance.csv  

---
## Objective
Implement Linear Regression and Ridge Regression using the Insurance dataset.
Compare performance for different alpha values and identify the best alpha.


## STEP 1 — Import Required Libraries

In [1]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler


## STEP 2 — Load Dataset

In [2]:

# Ensure insurance.csv is in the same directory
df = pd.read_csv("insurance.csv")
df.head()


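| | age | sex | bmi | children | smoker | region | charges |
|---|---|---|---|---|---|---|---|
| 0 | 19 | female | 27.9 | 0 | yes | southwest | 16884.924 |
| 1 | 18 | male | 33.77 | 1 | no | southeast | 1725.5523 |
| 2 | 28 | male | 33.0 | 3 | no | southeast | 4449.462 |
| 3 | 33 | male | 22.705 | 0 | no | northwest | 21984.47061 |
| 4 | 32 | male | 28.88 | 0 | no | northwest | 3866.8552 |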



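The dataset has three categorical columns: sex, smoker, and region.
sex and region are one-hot encoded with `drop_first=True`, so each gets one dummy column per category except the first, which serves as the baseline.
smoker is binary-encoded: yes maps to 1 and no maps to 0.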


In [3]:

# One-hot encoding for categorical columns
df = pd.get_dummies(df, columns=['sex', 'region'], drop_first=True)

# Binary encoding for smoker
df['smoker'] = df['smoker'].map({'yes':1, 'no':0})

df.head()


| | age | bmi | children | smoker | charges | sex_male | region_northwest | region_southeast | region_southwest |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 19 | 27.9 | 0 | 1 | 16884.924 | False | False | False | True |
| 1 | 18 | 33.77 | 1 | 0 | 1725.5523 | True | False | True | False |
| 2 | 28 | 33.0 | 3 | 0 | 4449.462 | True | False | True | False |
| 3 | 33 | 22.705 | 0 | 0 | 21984.47061 | True | True | False | False |
| 4 | 32 | 28.88 | 0 | 0 | 3866.8552 | True | True | False | False |
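
The encoding step above can be tried in isolation. The following is a minimal sketch on a hypothetical four-row frame (`toy` is invented for illustration and is not taken from insurance.csv); it shows which dummy columns `drop_first=True` keeps and how `map` turns the smoker labels into 1/0.

```python
import pandas as pd

# Hypothetical four-row frame that mimics the insurance columns (not real data)
toy = pd.DataFrame({
    "sex": ["female", "male", "male", "female"],
    "smoker": ["yes", "no", "no", "yes"],
    "region": ["southwest", "southeast", "northwest", "northeast"],
})

# drop_first=True drops the alphabetically first category of each column
# ("female" and "northeast" here); that category becomes the baseline
encoded = pd.get_dummies(toy, columns=["sex", "region"], drop_first=True)

# Map the two smoker labels to 1/0; any other label would become NaN
encoded["smoker"] = encoded["smoker"].map({"yes": 1, "no": 0})

print(encoded.columns.tolist())
print(encoded)
```

A row whose region dummies are all False belongs to the dropped baseline category, which is why no `region_northeast` column appears.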



## STEP 3 — Select Features and Target

Features: age, bmi, children, smoker (the encoded sex and region columns are not used in the models below)  
Target: charges


In [4]:

X = df[['age', 'bmi', 'children', 'smoker']]
y = df['charges']


## STEP 4 — Train-Test Split (80/20)

In [5]:

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


## STEP 5 — Feature Scaling

In [6]:

scaler = StandardScaler()

# Fit on the training split only, then reuse those statistics for the test split
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
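
To illustrate what the scaling step does, here is a small sketch on synthetic data (the `*_demo` arrays are assumptions made up for this example, not the insurance features). It checks that the training split ends up with mean 0 and standard deviation 1, while the test split is transformed with the training statistics and so only comes close to those values.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for a train/test split (illustrative values only)
rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=40.0, scale=12.0, size=(200, 2))
X_test_demo = rng.normal(loc=40.0, scale=12.0, size=(50, 2))

scaler_demo = StandardScaler()
train_scaled = scaler_demo.fit_transform(X_train_demo)  # learns mean/std from train
test_scaled = scaler_demo.transform(X_test_demo)        # reuses the train mean/std

# Training columns are standardized exactly; test columns only approximately
print("train mean:", train_scaled.mean(axis=0), "train std:", train_scaled.std(axis=0))
print("test mean: ", test_scaled.mean(axis=0), "test std: ", test_scaled.std(axis=0))
```

Fitting the scaler on the training split alone keeps information about the test rows out of the model, which matters for Ridge because the L2 penalty depends on feature scale.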


## STEP 6 — Linear Regression Model

In [7]:

lin_model = LinearRegression()
lin_model.fit(X_train_scaled, y_train)

y_pred_lin = lin_model.predict(X_test_scaled)

mse_lin = mean_squared_error(y_test, y_pred_lin)
rmse_lin = np.sqrt(mse_lin)
r2_lin = r2_score(y_test, y_pred_lin)

print("Linear Regression Results")
print("MSE:", mse_lin)
print("RMSE:", rmse_lin)
print("R2 Score:", r2_lin)


Linear Regression Results
MSE: 33981653.95019775
RMSE: 5829.378521780666
R2 Score: 0.7811147722517887
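
For reference, the three metrics printed above can be reproduced by hand. This is a minimal sketch on four made-up values (`y_true` and `y_hat` are illustrative, not model output from this notebook), compared against the scikit-learn functions used in the cell.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up targets and predictions, chosen so the arithmetic is easy to follow
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 6.0, 9.5])

# MSE: mean of squared residuals; RMSE: its square root, in the target's units
mse_manual = np.mean((y_true - y_hat) ** 2)
rmse_manual = np.sqrt(mse_manual)

# R2: 1 minus (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print("manual :", mse_manual, rmse_manual, r2_manual)
print("sklearn:", mean_squared_error(y_true, y_hat), r2_score(y_true, y_hat))
```

Because RMSE is in the same units as charges, it is usually the easiest of the three to interpret for this dataset.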


## STEP 7 — Ridge Regression (α = 0.1, 1, 10, 100)

In [8]:

alphas = [0.1, 1, 10, 100]
ridge_results = {}

for alpha in alphas:
    ridge_model = Ridge(alpha=alpha)
    ridge_model.fit(X_train_scaled, y_train)

    y_pred_ridge = ridge_model.predict(X_test_scaled)

    mse = mean_squared_error(y_test, y_pred_ridge)
    rmse = np.sqrt(mse)
    r2 = r2_score(y_test, y_pred_ridge)

    ridge_results[alpha] = (mse, rmse, r2)

    print(f"Alpha = {alpha}")
    print("MSE:", mse)
    print("RMSE:", rmse)
    print("R2 Score:", r2)
    print()


Alpha = 0.1
MSE: 33982227.34605835
RMSE: 5829.427703133331
R2 Score: 0.7811110788505281

Alpha = 1
MSE: 33987477.22312754
RMSE: 5829.877976692783
R2 Score: 0.781077262940931

Alpha = 10
MSE: 34048646.19152439
RMSE: 5835.121780350808
R2 Score: 0.7806832567045574

Alpha = 100
MSE: 35377800.84957989
RMSE: 5947.9240790026815
R2 Score: 0.7721218040905247
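
As background for the loop above, Ridge minimizes the residual sum of squares plus alpha times the squared L2 norm of the coefficients, and the intercept is not penalized. The sketch below solves that problem in closed form with NumPy on synthetic data (all `*_demo` names and the coefficient values are assumptions for illustration) and compares the result with `Ridge`.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic regression problem with four features (illustrative only)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(100, 4))
y_demo = X_demo @ np.array([3.0, 2.0, 0.5, 9.0]) + rng.normal(scale=0.5, size=100)

alpha_demo = 10.0

# Center X and y so the unpenalized intercept drops out of the problem
Xc = X_demo - X_demo.mean(axis=0)
yc = y_demo - y_demo.mean()

# Closed-form ridge solution: (Xc^T Xc + alpha * I) w = Xc^T yc
w_closed = np.linalg.solve(Xc.T @ Xc + alpha_demo * np.eye(Xc.shape[1]), Xc.T @ yc)
b_closed = y_demo.mean() - X_demo.mean(axis=0) @ w_closed

ridge_demo = Ridge(alpha=alpha_demo).fit(X_demo, y_demo)

print("closed form:", w_closed, b_closed)
print("sklearn    :", ridge_demo.coef_, ridge_demo.intercept_)
```

With alpha set to 0 the same system reduces to ordinary least squares, which is why very small alpha values give results almost identical to Linear Regression.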



## STEP 8 — Identify Best Alpha (Highest R2 Score)

In [9]:

# Each value is a (mse, rmse, r2) tuple; index 2 selects the R2 score
best_alpha = max(ridge_results, key=lambda x: ridge_results[x][2])

print("Best Alpha:", best_alpha)
print("Best Ridge Performance (MSE, RMSE, R2):", ridge_results[best_alpha])


Best Alpha: 0.1
Best Ridge Performance (MSE, RMSE, R2): (33982227.34605835, np.float64(5829.427703133331), 0.7811110788505281)
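
The selection above ranks alphas by their test-set R2. One common alternative, sketched here under the assumption that cross-validation on the training split is acceptable for this lab, is `RidgeCV`, which picks alpha without touching the test set. The data and all `*_demo` names below are synthetic and hypothetical.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the insurance features (illustrative only)
rng = np.random.default_rng(7)
X_demo = rng.normal(size=(300, 4))
y_demo = X_demo @ np.array([3.0, 2.0, 0.5, 9.0]) + rng.normal(scale=2.0, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Same alpha grid as the notebook; 5-fold cross-validation on the training split
candidate_alphas = [0.1, 1, 10, 100]
cv_model = make_pipeline(StandardScaler(), RidgeCV(alphas=candidate_alphas, cv=5))
cv_model.fit(X_tr, y_tr)

# The test split is used once, only to report the final score
chosen_alpha = cv_model.named_steps["ridgecv"].alpha_
test_r2 = cv_model.score(X_te, y_te)

print("Chosen alpha:", chosen_alpha)
print("Test R2:", test_r2)
```

Keeping the test split out of the alpha search means the reported R2 is not biased by the selection itself.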


## STEP 9 — Compare Coefficients

In [10]:

coef_comparison = pd.DataFrame({
    "Feature": X.columns,
    "Linear_Coefficients": lin_model.coef_
})

print(coef_comparison)


    Feature  Linear_Coefficients
0       age          3616.318176
1       bmi          1978.420432
2  children           519.225287
3    smoker          9559.323158
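
The table above lists only the Linear Regression coefficients. A side-by-side comparison with Ridge could look like the following sketch, which uses synthetic data and hypothetical feature names (`f1` to `f4`) rather than the notebook's fitted models, so the numbers it prints are not results for the insurance dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression, Ridge

# Synthetic data with four features (illustrative only)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 4))
y_demo = X_demo @ np.array([3.0, 2.0, 0.5, 9.0]) + rng.normal(scale=1.0, size=200)

# One column of coefficients per model, starting with unregularized least squares
table = pd.DataFrame({
    "Feature": ["f1", "f2", "f3", "f4"],
    "Linear": LinearRegression().fit(X_demo, y_demo).coef_,
})
for a in [0.1, 1, 10, 100]:
    table[f"Ridge_alpha_{a}"] = Ridge(alpha=a).fit(X_demo, y_demo).coef_

# L2 norm of each coefficient vector; it shrinks as alpha grows
norms = table.drop(columns="Feature").pow(2).sum().pow(0.5)

print(table)
print(norms)
```

In the notebook itself, the same idea would require keeping each fitted `ridge_model` (for example in a dictionary keyed by alpha) instead of overwriting it inside the loop.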



## Conclusion

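Linear Regression fits ordinary least squares with no regularization.
Ridge Regression adds an L2 penalty, scaled by alpha, that shrinks coefficients toward zero.
Among the tested values, alpha = 0.1 gave the highest Ridge R2 (0.781111), and R2 fell steadily as alpha increased, down to 0.772122 at alpha = 100.
Plain Linear Regression scored marginally higher still (R2 0.781115, RMSE 5829.38 versus 5829.43), so regularization did not improve test performance on this split.
Step 9 prints only the Linear Regression coefficients, so the shrinkage of Ridge coefficients with increasing alpha is expected from the L2 penalty rather than demonstrated here.
The best alpha was chosen by test-set R2; selecting it by cross-validation on the training data would keep the test set as an independent check.
Ridge tends to help most when features are strongly correlated or numerous relative to the sample size; with only four features here, its effect is negligible.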
