
Tasks:
1. Load dataset and encode categorical columns (sex, smoker, region).
2. Use features: age, bmi, children, smoker. Target: charges.
3. Train Linear Regression and Ridge Regression.
4. Try Ridge with different alpha values: α = 0.1, 1, 10, 100.
5. Compare model performance and identify best alpha.

In [None]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


In [None]:
# Load insurance dataset
df = pd.read_csv("/content/insurance.csv")

df.head()


Unnamed: 0,age,sex,bmi,children,smoker,region,charges
0,19,female,27.9,0,yes,southwest,16884.924
1,18,male,33.77,1,no,southeast,1725.5523
2,28,male,33.0,3,no,southeast,4449.462
3,33,male,22.705,0,no,northwest,21984.47061
4,32,male,28.88,0,no,northwest,3866.8552


In [None]:
# One-hot encode the categorical columns; drop_first=True drops one level per column
df_encoded = pd.get_dummies(df, columns=["sex", "smoker", "region"], drop_first=True)

df_encoded.head()


Unnamed: 0,age,bmi,children,charges,sex_male,smoker_yes,region_northwest,region_southeast,region_southwest
0,19,27.9,0,16884.924,False,True,False,False,True
1,18,33.77,1,1725.5523,True,False,False,True,False
2,28,33.0,3,4449.462,True,False,False,True,False
3,33,22.705,0,21984.47061,True,False,True,False,False
4,32,28.88,0,3866.8552,True,False,True,False,False


In [None]:
# Features required by the task; smoker is represented by the dummy column smoker_yes
X = df_encoded[["age", "bmi", "children", "smoker_yes"]]
y = df_encoded["charges"]

X.head()


Unnamed: 0,age,bmi,children,smoker_yes
0,19,27.9,0,True
1,18,33.77,1,False
2,28,33.0,3,False
3,33,22.705,0,False
4,32,28.88,0,False


In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)


In [None]:
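# Fit the scaler on the training set only, then reuse it on the test set,
# so that no test-set statistics leak into training
scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)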


In [None]:
lr = LinearRegression()
lr.fit(X_train_scaled, y_train)

y_pred_lr = lr.predict(X_test_scaled)

print("Linear Regression Performance")
print("MAE:", mean_absolute_error(y_test, y_pred_lr))
print("RMSE:", np.sqrt(mean_squared_error(y_test, y_pred_lr)))
print("R2 Score:", r2_score(y_test, y_pred_lr))


Linear Regression Performance
MAE: 4213.798594527246
RMSE: 5829.378521780666
R2 Score: 0.7811147722517887


In [None]:
# Evaluate Ridge at each alpha on the same scaled train/test split
alphas = [0.1, 1, 10, 100]
ridge_results = []

for alpha in alphas:
    ridge = Ridge(alpha=alpha)
    ridge.fit(X_train_scaled, y_train)

    y_pred = ridge.predict(X_test_scaled)

    ridge_results.append([
        alpha,
        mean_absolute_error(y_test, y_pred),
        np.sqrt(mean_squared_error(y_test, y_pred)),
        r2_score(y_test, y_pred)
    ])

ridge_results


[[0.1, 4213.9543150393265, np.float64(5829.427703133331), 0.7811110788505281],
 [1, 4215.354351573052, np.float64(5829.877976692783), 0.781077262940931],
 [10, 4229.212828850591, np.float64(5835.121780350808), 0.7806832567045574],
 [100, 4385.350096356628, np.float64(5947.9240790026815), 0.7721218040905247]]
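
The results above show the error rising as alpha grows. One way to see why is to watch the coefficient vector shrink toward zero as the penalty increases. The following is a minimal sketch on synthetic data, not the insurance dataset; the names `X_demo`, `y_demo`, `ols_norm`, and `ridge_norms` and the true coefficients are assumptions made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 4 features with arbitrary true coefficients
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 4))
y_demo = X_demo @ np.array([3.0, 2.0, 0.5, 9.0]) + rng.normal(scale=1.0, size=200)

X_demo_scaled = StandardScaler().fit_transform(X_demo)

# Length of the unpenalized (ordinary least squares) coefficient vector
ols_norm = np.linalg.norm(LinearRegression().fit(X_demo_scaled, y_demo).coef_)
print(f"OLS: ||coef|| = {ols_norm:.4f}")

# Length of the Ridge coefficient vector at each alpha used in this notebook
ridge_norms = {}
for alpha in [0.1, 1, 10, 100]:
    coef = Ridge(alpha=alpha).fit(X_demo_scaled, y_demo).coef_
    ridge_norms[alpha] = np.linalg.norm(coef)
    print(f"alpha={alpha}: ||coef|| = {ridge_norms[alpha]:.4f}")
```

A small alpha barely changes the fit, which is consistent with alpha = 0.1 nearly matching Linear Regression above. A large alpha shrinks the coefficients enough to underfit.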

In [None]:
ridge_df = pd.DataFrame(
    ridge_results,
    columns=["Alpha", "MAE", "RMSE", "R2 Score"]
)

ridge_df


Unnamed: 0,Alpha,MAE,RMSE,R2 Score
0,0.1,4213.954315,5829.427703,0.781111
1,1.0,4215.354352,5829.877977,0.781077
2,10.0,4229.212829,5835.12178,0.780683
3,100.0,4385.350096,5947.924079,0.772122
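
The best alpha can be read off the table programmatically rather than by eye. The sketch below rebuilds the table from the values reported above so that it runs on its own; `reported_df`, `best_alpha`, and `metrics_agree` are hypothetical names introduced for illustration.

```python
import pandas as pd

# Test-set metrics copied from the Ridge results table above
reported_df = pd.DataFrame(
    [
        [0.1, 4213.954315, 5829.427703, 0.781111],
        [1.0, 4215.354352, 5829.877977, 0.781077],
        [10.0, 4229.212829, 5835.121780, 0.780683],
        [100.0, 4385.350096, 5947.924079, 0.772122],
    ],
    columns=["Alpha", "MAE", "RMSE", "R2 Score"],
)

# Row with the smallest RMSE
best_row = reported_df.loc[reported_df["RMSE"].idxmin()]
best_alpha = float(best_row["Alpha"])
print("Best alpha by RMSE:", best_alpha)

# Check whether MAE and R2 point to the same row as RMSE
metrics_agree = (
    reported_df["MAE"].idxmin()
    == reported_df["RMSE"].idxmin()
    == reported_df["R2 Score"].idxmax()
)
print("All three metrics agree:", bool(metrics_agree))
```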


In [None]:
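print("""
Conclusion:
On this train/test split, Ridge Regression does not improve on plain
Linear Regression. Linear Regression gives the lowest test error
(MAE ≈ 4213.80, RMSE ≈ 5829.38, R² ≈ 0.7811). Among the tested Ridge
values, alpha = 0.1 performs best (RMSE ≈ 5829.43, R² ≈ 0.7811), which
is practically identical to Linear Regression, and performance
degrades steadily as alpha increases (RMSE ≈ 5947.92, R² ≈ 0.7721 at
alpha = 100). This suggests that with only four features there is
little overfitting for regularization to correct.
""")



Conclusion:
On this train/test split, Ridge Regression does not improve on plain
Linear Regression. Linear Regression gives the lowest test error
(MAE ≈ 4213.80, RMSE ≈ 5829.38, R² ≈ 0.7811). Among the tested Ridge
values, alpha = 0.1 performs best (RMSE ≈ 5829.43, R² ≈ 0.7811), which
is practically identical to Linear Regression, and performance
degrades steadily as alpha increases (RMSE ≈ 5947.92, R² ≈ 0.7721 at
alpha = 100). This suggests that with only four features there is
little overfitting for regularization to correct.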
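
The comparison above picks alpha by looking at test-set metrics, which can make the reported test score optimistic. A common alternative is to choose alpha by cross-validation on the training set only and then score once on the test set. Below is a minimal sketch of that approach on synthetic data, not the insurance dataset; `X_demo`, `y_demo`, `search`, `chosen_alpha`, and `test_r2` are hypothetical names, and the alpha picked here says nothing about the best alpha for the real data.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 4 features with arbitrary true coefficients
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(500, 4))
y_demo = X_demo @ np.array([3.0, 2.0, 0.5, 9.0]) + rng.normal(scale=2.0, size=500)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.20, random_state=42
)

# Scaling lives inside the pipeline, so it is refit on each CV training fold
pipe = make_pipeline(StandardScaler(), Ridge())
search = GridSearchCV(
    pipe,
    param_grid={"ridge__alpha": [0.1, 1, 10, 100]},
    cv=5,
    scoring="neg_root_mean_squared_error",
)
search.fit(X_tr, y_tr)

# Alpha selected using the training data only
chosen_alpha = search.best_params_["ridge__alpha"]

# Single final evaluation on the held-out test set
test_r2 = r2_score(y_te, search.predict(X_te))
print("Chosen alpha:", chosen_alpha)
print("Test R2:", test_r2)
```

To apply this to the notebook's data, `X_train` and `y_train` would take the place of `X_tr` and `y_tr`, with the unscaled features passed in because the pipeline does its own scaling.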

