# Experiment 4 - ElasticNet Regularization



Elastic net combines the two most popular regularized variants of linear regression: ridge and lasso. Ridge uses an L2 penalty and lasso uses an L1 penalty. With elastic net you don't have to choose between the two, because it applies both penalties at once and lets you control the mix between them.
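
As a reference for the hyperparameters tuned later, the objective that scikit-learn's `ElasticNet` minimizes can be written as follows. The symbols are notation chosen here, not taken from the notebook: $n$ is the number of training rows, $w$ the coefficient vector, $\alpha$ the `alpha` argument and $\rho$ the `l1_ratio` argument. The intercept is left out for brevity.

$$
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 \;+\; \alpha\,\rho\,\lVert w \rVert_1 \;+\; \frac{\alpha\,(1-\rho)}{2}\,\lVert w \rVert_2^2
$$

With $\rho = 1$ only the L1 (lasso) term remains, and with $\rho = 0$ only the L2 (ridge-style) term remains.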


### 0. Import Packages

In [1]:
! pip install -r https://raw.githubusercontent.com/aso-uts/labs_datasets/main/36106-mlaa/requirements.txt



In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import mean_squared_error
from scipy.stats import f_oneway

### 1. Load Dataset

In [3]:
train_num = pd.read_csv('/Users/ronikj/Downloads/train_num.csv')

### 2. Explore Dataset

In [4]:
train_num.head()

| | quant | logical | 10percentage | english | collegetier | 12percentage | computerprogramming | domain | age | salary | specialization_encoded |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2.432643 | 1.892646 | 1.016186 | 1.420185 | 0.0 | 0.868785 | 0.413642 | 0.421884 | 0.249636 | 1.094253 | 1.49727 |
| 1 | -2.500658 | -0.75406 | -2.076659 | -0.58101 | 0.0 | -0.885233 | -0.9865498 | -0.568227 | 0.249636 | -1.328135 | -0.810604 |
| 2 | -0.075118 | -0.293763 | -0.035181 | -0.152183 | 0.0 | 0.974026 | -6.63264e-16 | 0.787905 | 0.827781 | -0.279638 | -0.041313 |
| 3 | 0.993763 | 1.374812 | 0.664726 | 1.658423 | 0.0 | 1.048685 | 2.397247 | 1.252765 | -0.328509 | 0.913478 | -0.810604 |
| 4 | -1.226222 | -0.063615 | 0.433767 | 0.705472 | 0.0 | 0.059238 | -1.569963 | -0.747762 | 0.249636 | -0.677344 | -1.689794 |


In [5]:
train_num.shape

(2998, 11)

In [6]:
train_num.dtypes

quant                     float64
logical                   float64
10percentage              float64
english                   float64
collegetier               float64
12percentage              float64
computerprogramming       float64
domain                    float64
age                       float64
salary                    float64
specialization_encoded    float64
dtype: object

### 3. Explore Target variable

In [7]:
target = train_num['salary']
target.head()

0    1.094253
1   -1.328135
2   -0.279638
3    0.913478
4   -0.677344
Name: salary, dtype: float64

In [8]:
target.shape

(2998,)

### 4. Explore Variables of Interest

This step is skipped because data exploration was performed in Experiment 1. The same cleaned and explored dataset is used for Experiment 4.

### 5. Prepare Data

This step is skipped because data preparation was performed in Experiment 1. The same cleaned and prepared dataset is used for Experiment 4.

### 6. Split Dataset

In [10]:
X = train_num.drop(columns=['salary'])
y = train_num['salary']

X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.2, random_state=42)
X_test, X_val, y_test, y_val = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

print("X_train shape:", X_train.shape)
print("X_test shape:", X_test.shape)
print("X_val shape:", X_val.shape)
print("y_train shape:", y_train.shape)
print("y_test shape:", y_test.shape)
print("y_val shape:", y_val.shape)

X_train shape: (2398, 10)
X_test shape: (300, 10)
X_val shape: (300, 10)
y_train shape: (2398,)
y_test shape: (300,)
y_val shape: (300,)
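
The printed shapes correspond to an 80/10/10 split. The arithmetic can be checked with a small sketch in plain Python. `split_sizes` is a hypothetical helper written for this note, and it assumes that a fractional `test_size` is rounded up to a whole number of held-out rows, which is consistent with the shapes printed above.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_kept, n_held_out) for a fractional test_size.

    Assumption: the held-out count is rounded up and the remainder is kept.
    """
    n_held_out = math.ceil(test_size * n_samples)
    return n_samples - n_held_out, n_held_out


# First split: 2998 rows, 20% held out as the temporary set.
n_train, n_temp = split_sizes(2998, 0.2)

# Second split: the temporary set is halved into test and validation.
n_test, n_val = split_sizes(n_temp, 0.5)

print(n_train, n_test, n_val)  # prints: 2398 300 300
```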


### 7. Baseline Model

In [34]:
y_train_central = np.mean(y_train)

In [35]:
y_train_base = np.full(y_train.shape, y_train_central)
y_val_base = np.full(y_val.shape, y_train_central)
y_test_base = np.full(y_test.shape, y_train_central)

In [36]:
train_rmse = mean_squared_error(y_train, y_train_base, squared=False)
val_rmse = mean_squared_error(y_val, y_val_base, squared=False)
test_rmse = mean_squared_error(y_test, y_test_base, squared=False)

In [37]:
print("Baseline Model Performance:")
print(f"Training RMSE: {train_rmse.round(2)}")
print(f"Validation RMSE: {val_rmse.round(2)}")
print(f"Testing RMSE: {test_rmse.round(2)}")

Baseline Model Performance:
Training RMSE: 1.0
Validation RMSE: 1.0
Testing RMSE: 0.96
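
A training RMSE of 1.0 for the mean baseline is what one would expect if the target was standardized in Experiment 1, because the RMSE of a constant mean prediction on the data it was computed from equals the population standard deviation of that data. The sketch below illustrates this with the five `salary` values shown by `target.head()`. The `rmse` helper is a hypothetical pure-Python stand-in, not the scikit-learn function used in the notebook.

```python
import math
import statistics


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# First five salary values from target.head(); a toy sample, not the full set.
y_toy = [1.094253, -1.328135, -0.279638, 0.913478, -0.677344]

# Baseline: predict the sample mean for every row.
mean_prediction = statistics.fmean(y_toy)
baseline_rmse = rmse(y_toy, [mean_prediction] * len(y_toy))

# The two numbers agree up to floating-point error.
print(baseline_rmse, statistics.pstdev(y_toy))
```

A helper like this also avoids the `squared=False` argument of `mean_squared_error`, which recent scikit-learn releases deprecate in favour of `root_mean_squared_error`.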


### 8. Train Machine Learning Model

In [28]:
param_grid = {
    'alpha': [0.01, 0.1, 1.0],  # Range of alpha values to try
    'l1_ratio': [0.1, 0.5, 0.9]  # Range of l1_ratio values to try
}

grid_search = GridSearchCV(ElasticNet(), param_grid, cv=5, scoring='neg_mean_squared_error', verbose=1)


grid_search.fit(X_train, y_train)

best_params = grid_search.best_params_
best_score = grid_search.best_score_

print("Best Parameters:", best_params)
print("Best Negative MSE Score:", best_score)

Fitting 5 folds for each of 9 candidates, totalling 45 fits
Best Parameters: {'alpha': 0.01, 'l1_ratio': 0.1}
Best Negative MSE Score: -0.7674187507367513
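
The grid search reports a negated mean squared error, which is not on the same scale as the RMSE values used elsewhere in this experiment. A minimal sketch of the conversion, using the score printed above as a literal:

```python
import math

# best_score_ as printed by the grid search (negated MSE, averaged over 5 folds).
best_neg_mse = -0.7674187507367513

# Flip the sign to recover the MSE, then take the square root.
cv_rmse = math.sqrt(-best_neg_mse)

print(f"Cross-validated RMSE: {cv_rmse:.3f}")  # prints: Cross-validated RMSE: 0.876
```

This is the square root of the fold-averaged MSE, so it may differ slightly from an average of per-fold RMSE values.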


In [29]:
model_en = ElasticNet(**best_params)  # best alpha and l1_ratio found by the grid search above
model_en.fit(X_train, y_train)

ElasticNet(alpha=0.01, l1_ratio=0.1)

In [30]:
y_train_pred = model_en.predict(X_train)
train_rmse = mean_squared_error(y_train, y_train_pred, squared=False)

In [31]:
y_val_pred = model_en.predict(X_val)
val_rmse = mean_squared_error(y_val, y_val_pred, squared=False)

In [32]:
y_pred_test = model_en.predict(X_test)
test_rmse = mean_squared_error(y_test, y_pred_test, squared=False)

### 9. Assess Model Performance

In [33]:
model_train_perf = train_rmse.round(5)
model_valid_perf = val_rmse.round(5)
model_test_perf = test_rmse.round(5)

print("Root Mean Squared Error (RMSE) on Training Set:", model_train_perf)
print("Root Mean Squared Error (RMSE) on Validation Set:", model_valid_perf)
print("Root Mean Squared Error (RMSE) for Testing Set:", model_test_perf)

Root Mean Squared Error (RMSE) on Training Set: 0.87198
Root Mean Squared Error (RMSE) on Validation Set: 0.86806
Root Mean Squared Error (RMSE) for Testing Set: 0.84253
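
To put these numbers in context, they can be compared with the mean baseline reported earlier. The sketch below copies the printed values as literals. The baseline figures were rounded to two decimals in the notebook output, so the resulting percentages are approximate.

```python
# RMSE values copied from the notebook outputs.
baseline = {"train": 1.0, "validation": 1.0, "test": 0.96}
model = {"train": 0.87198, "validation": 0.86806, "test": 0.84253}

# Relative RMSE reduction of ElasticNet over the mean baseline, per split.
improvement = {
    split: (baseline[split] - model[split]) / baseline[split]
    for split in baseline
}

for split, gain in improvement.items():
    print(f"{split}: RMSE reduced by {gain:.1%}")
```

On all three splits the reduction comes out at roughly 12 to 13 percent, and the training and validation errors are close to each other, which suggests the model is not overfitting.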
