## Milestone 2: Neural Network Baseline and Hyperparameter Optimization

LIS 640 - Introduction to Applied Deep Learning

Due 3/7/25

## **Overview**
In Milestone 1 you have:
1. **Defined a deep learning problem** where AI can make a meaningful impact.
2. **Identified three datasets** that fit your topic and justified their relevance.
3. **Explored and visualized** the datasets to understand their structure.
4. **Implemented a PyTorch Dataset class** to prepare data for deep learning.

In Milestone 2 we will take the next step and implement a neural network baseline based on what we have learned in class! For this milestone, please use one of the datasets you picked in the last milestone. If you pick a new one, make sure to do Steps 2 - 4 again. 


## **Step 1: Define Your Deep Learning Problem**

The first step is to be clear about what you want your model to predict. Is your goal a classification or a regression task? What are the input features, and what are your prediction targets y? Make sure that your DataLoader provides a sensible choice of features and a sensible choice of prediction targets y.

**Write down one paragraph of justification for how you set up your DataLoader below. If it makes sense to change the DataLoader from Milestone 1, describe what you changed and why:**

In this deep learning task, we treat employee attrition prediction as a binary classification problem. Our DataLoader provides input features (x) that include key employee attributes such as age, gender, education level, performance rating, job satisfaction, compensation, and tenure, while the target variable (y) indicates whether an employee has left the company (1 for attrition, 0 for retention). Compared to the Milestone 1 DataLoader, we made several modifications: numerical features are now standardized to mitigate scale differences, categorical features are converted to integer category codes (one-hot encoding, which avoids imposing an artificial order on the categories, is a possible refinement), and we plan to incorporate feature selection in later stages to eliminate redundant or noisy variables. These changes give the model a more standardized input, which should help both training efficiency and predictive performance.
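
One way to one-hot encode categorical features and normalize numerical ones is sketched below. It is an illustration, not the preprocessing used in this notebook: the toy DataFrame and the helper name `encode_features` are hypothetical, and the two column names are borrowed from the dataset only as examples. The normalization statistics are computed on the training rows only, so nothing leaks from validation or test data.

```python
import pandas as pd

def encode_features(df, categorical_cols, numeric_cols, train_idx):
    """One-hot encode categorical columns and standardize numeric ones.

    Means and standard deviations are taken from the training rows only.
    """
    # Each category becomes its own 0/1 column, e.g. BusinessTravel_Travel_Rarely
    out = pd.get_dummies(df, columns=categorical_cols, dtype=float)
    mean = out.loc[train_idx, numeric_cols].mean()
    # Guard against constant columns, whose standard deviation is zero
    std = out.loc[train_idx, numeric_cols].std(ddof=0).replace(0, 1.0)
    scaled = (out[numeric_cols] - mean) / std
    out[numeric_cols] = scaled
    return out

# Hypothetical toy data; rows 0-2 play the role of the training split
toy = pd.DataFrame({
    "Age": [41, 49, 37, 33],
    "BusinessTravel": ["Travel_Rarely", "Travel_Frequently", "Travel_Rarely", "Non-Travel"],
})
encoded = encode_features(toy, ["BusinessTravel"], ["Age"], train_idx=[0, 1, 2])
print(encoded.columns.tolist())
```

Compared with integer category codes, this avoids implying that, say, `Travel_Rarely` is "larger" than `Non-Travel`, at the cost of a wider input layer.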

## **Step 2: Train a Neural Network in PyTorch**

We learned in class how to implement and train a feedforward neural network in PyTorch. You can find reference implementations [here](https://github.com/mariru/Intro2ADL/blob/main/Week5/Week5_Lab_Example.ipynb) and [here](https://www.kaggle.com/code/girlboss/mmlm2025-pytorch-lb-0-00000). Tip: Try to implement the neural network yourself from scratch before looking at the references.


In [9]:
# import
import pandas as pd

df = pd.read_csv("HR-Employee-Attrition.csv")
print("Dataset shape:", df.shape)
print(df.head())

print("\nMissing values per column:")
print(df.isnull().sum())

cols_to_drop = ['EmployeeCount', 'Over18', 'StandardHours']
df.drop(columns=[col for col in cols_to_drop if col in df.columns], inplace=True)

for col in df.columns:
    if df[col].isnull().sum() > 0:
        if df[col].dtype in ['int64', 'float64']:
            # Assign the result back; inplace=True on a column selection is deprecated in recent pandas
            df[col] = df[col].fillna(df[col].median())
        else:
            df[col] = df[col].fillna(df[col].mode()[0])

categorical_cols = df.select_dtypes(include=['object']).columns
for col in categorical_cols:
    df[col] = df[col].astype('category')

if 'Attrition' in df.columns:
    df['Attrition'] = df['Attrition'].apply(lambda x: 1 if x.strip().lower() == 'yes' else 0)

print("\nCleaned data info:")
print(df.info())

df.to_csv("HR-Employee-Attrition_cleaned.csv", index=False)

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = pd.read_csv("HR-Employee-Attrition_cleaned.csv")
for col in data.columns:
    if data[col].dtype == 'object':
        data[col] = data[col].astype('category').cat.codes

X = data.drop("Attrition", axis=1).values
y = data["Attrition"].values

X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)

X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)
X_val_tensor = torch.tensor(X_val, dtype=torch.float32)
y_val_tensor = torch.tensor(y_val, dtype=torch.float32).unsqueeze(1)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)

train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
val_dataset = TensorDataset(X_val_tensor, y_val_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)

# define dataloaders: make sure to have a train, validation and a test loader
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

# define the model
class FeedForwardNN(nn.Module):
    def __init__(self, input_dim):
        super(FeedForwardNN, self).__init__()
        self.fc1 = nn.Linear(input_dim, 64)
        self.fc2 = nn.Linear(64, 32)
        self.fc3 = nn.Linear(32, 1)
        self.relu = nn.ReLU()
        
    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x

input_dim = X_train_tensor.shape[1]
model = FeedForwardNN(input_dim)

# define the loss function and the optimizer
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# train the model
num_epochs = 20
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for X_batch, y_batch in train_loader:
        optimizer.zero_grad()
        outputs = model(X_batch)
        loss = criterion(outputs, y_batch)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * X_batch.size(0)
    epoch_loss = running_loss / len(train_loader.dataset)
    
    model.eval()
    running_val_loss = 0.0
    with torch.no_grad():
        for X_val_batch, y_val_batch in val_loader:
            outputs = model(X_val_batch)
            loss = criterion(outputs, y_val_batch)
            running_val_loss += loss.item() * X_val_batch.size(0)
    val_loss = running_val_loss / len(val_loader.dataset)
    
    print(f"Epoch {epoch+1}/{num_epochs} - Train Loss: {epoch_loss:.4f} - Val Loss: {val_loss:.4f}")

# test the model
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for X_test_batch, y_test_batch in test_loader:
        outputs = model(X_test_batch)
        predictions = torch.sigmoid(outputs)
        predicted = (predictions > 0.5).float()
        total += y_test_batch.size(0)
        correct += (predicted == y_test_batch).sum().item()
accuracy = correct / total
print(f"Test Accuracy: {accuracy*100:.2f}%")

# define the model with Dropout and BatchNorm, try different learning rates and early stopping
class FeedForwardNN_DropoutBN(nn.Module):
    def __init__(self, input_dim):
        super(FeedForwardNN_DropoutBN, self).__init__()
        self.fc1 = nn.Linear(input_dim, 64)
        self.bn1 = nn.BatchNorm1d(64)
        self.dropout1 = nn.Dropout(0.5)
        self.fc2 = nn.Linear(64, 32)
        self.bn2 = nn.BatchNorm1d(32)
        self.dropout2 = nn.Dropout(0.5)
        self.fc3 = nn.Linear(32, 1)
        self.relu = nn.ReLU()
        
    def forward(self, x):
        x = self.fc1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.dropout1(x)
        x = self.fc2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.dropout2(x)
        x = self.fc3(x)
        return x

input_dim = X_train_tensor.shape[1]
model = FeedForwardNN_DropoutBN(input_dim)

learning_rate = 0.0005
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
criterion = nn.BCEWithLogitsLoss()

num_epochs = 50
patience = 5
best_val_loss = float('inf')
epochs_no_improve = 0

for epoch in range(num_epochs):
    model.train()
    train_loss = 0.0
    for X_batch, y_batch in train_loader:
        optimizer.zero_grad()
        outputs = model(X_batch)
        loss = criterion(outputs, y_batch)
        loss.backward()
        optimizer.step()
        train_loss += loss.item() * X_batch.size(0)
    train_loss /= len(train_loader.dataset)
    
    model.eval()
    val_loss = 0.0
    with torch.no_grad():
        for X_val_batch, y_val_batch in val_loader:
            outputs = model(X_val_batch)
            loss = criterion(outputs, y_val_batch)
            val_loss += loss.item() * X_val_batch.size(0)
    val_loss /= len(val_loader.dataset)
    
    print(f"Epoch {epoch+1}/{num_epochs} - Train Loss: {train_loss:.4f} - Val Loss: {val_loss:.4f}")
    
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_no_improve = 0
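        # Clone the tensors: state_dict() returns references that later optimizer steps would overwrite
        best_model = {k: v.clone() for k, v in model.state_dict().items()}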
    else:
        epochs_no_improve += 1
        if epochs_no_improve >= patience:
            print("Early stopping!")
            break

model.load_state_dict(best_model)

model.eval()
correct = 0
total = 0
with torch.no_grad():
    for X_test_batch, y_test_batch in test_loader:
        outputs = model(X_test_batch)
        predictions = torch.sigmoid(outputs)
        predicted = (predictions > 0.5).float()
        total += y_test_batch.size(0)
        correct += (predicted == y_test_batch).sum().item()
accuracy = correct / total
print(f"Test Accuracy: {accuracy*100:.2f}%")

Dataset shape: (1470, 35)
   Age Attrition     BusinessTravel  DailyRate              Department  \
0   41       Yes      Travel_Rarely       1102                   Sales   
1   49        No  Travel_Frequently        279  Research & Development   
2   37       Yes      Travel_Rarely       1373  Research & Development   
3   33        No  Travel_Frequently       1392  Research & Development   
4   27        No      Travel_Rarely        591  Research & Development   

   DistanceFromHome  Education EducationField  EmployeeCount  EmployeeNumber  \
0                 1          2  Life Sciences              1               1   
1                 8          1  Life Sciences              1               2   
2                 2          2          Other              1               4   
3                 3          4  Life Sciences              1               5   
4                 2          1        Medical              1               7   

   ...  RelationshipSatisfaction StandardHours  

## **Step 2 continued: Try Stuff**

Use your code above to try different architectures. Make sure to use early stopping! Try adding Dropout and BatchNorm, try different learning rates. How do they affect training and validation performance? 

 **Summarize your observations in a paragraph below:**
 
Using the basic feedforward model without regularization, the training and validation losses gradually decreased over 20 epochs, and the model achieved a test accuracy of around 86%. However, when adding dropout and batch normalization, lowering the learning rate, and using early stopping, the training process became noticeably more stable, with a smoother convergence and reduced overfitting, even though the final test accuracy remained similar. Overall, these enhancements improved training robustness and helped maintain consistent validation performance, highlighting the benefits of regularization and careful hyperparameter tuning in deep learning models.
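
Accuracy alone can be hard to interpret for attrition data, because employees who leave are typically the minority class. A quick sanity check is to compare the model's accuracy with that of always predicting the majority class, and to look at precision and recall for the attrition class. The sketch below uses hypothetical labels and predictions, not results from this notebook, and the function names are made up for illustration.

```python
def majority_baseline_accuracy(y_true):
    """Accuracy obtained by always predicting the most frequent class."""
    positives = sum(y_true)
    return max(positives, len(y_true) - positives) / len(y_true)

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, and recall for 0/1 labels (1 = attrition)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical example: 2 of 10 employees left, and the model catches one of them
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(majority_baseline_accuracy(y_true))  # 0.8
print(binary_metrics(y_true, y_pred))      # accuracy 0.9, precision 1.0, recall 0.5
```

In this toy case the model beats the baseline by only ten percentage points and misses half of the employees who left, which accuracy by itself would not reveal.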

## **Step 3: Hyperparameter Optimization with Optuna**

As you can see, hyperparameter optimization can be tedious. In class we used [Optuna](https://optuna.org/#code_examples) to automate the process. Your next task is to wrap your code from Step 2 into an objective function, which you can then optimize with Optuna. Under the [code examples](https://optuna.org/#code_examples) there is a *PyTorch* tab, which should be helpful as it provides a minimal example of how to wrap PyTorch code inside an objective.

**Important: Make sure the model is evaluated on a validation set, not the training data!!**


In [20]:
!pip install optuna
import optuna

# Define an objective function to be maximized.
def objective(trial):
    # suggest_loguniform and suggest_uniform are deprecated; suggest_float covers both cases
    learning_rate = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    n_units1 = trial.suggest_int("n_units1", 32, 128, step=32)
    n_units2 = trial.suggest_int("n_units2", 16, 64, step=16)
    dropout_rate = trial.suggest_float("dropout_rate", 0.1, 0.5)
    
    input_dim = X_train_tensor.shape[1]
    model = nn.Sequential(
        nn.Linear(input_dim, n_units1),
        nn.ReLU(),
        nn.Dropout(dropout_rate),
        nn.Linear(n_units1, n_units2),
        nn.ReLU(),
        nn.Dropout(dropout_rate),
        nn.Linear(n_units2, 1)
    )
    
    criterion = nn.BCEWithLogitsLoss()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    
    num_epochs = 50
    patience = 5
    best_val_loss = float('inf')
    epochs_no_improve = 0
    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        for X_batch, y_batch in train_loader:
            optimizer.zero_grad()
            outputs = model(X_batch)
            loss = criterion(outputs, y_batch)
            loss.backward()
            optimizer.step()
            running_loss += loss.item() * X_batch.size(0)
        train_loss = running_loss / len(train_loader.dataset)
        
        model.eval()
        running_val_loss = 0.0
        # Gradients are not needed for validation
        with torch.no_grad():
            for X_val_batch, y_val_batch in val_loader:
                outputs = model(X_val_batch)
                loss = criterion(outputs, y_val_batch)
                running_val_loss += loss.item() * X_val_batch.size(0)
        val_loss = running_val_loss / len(val_loader.dataset)
        
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            epochs_no_improve = 0
        else:
            epochs_no_improve += 1
            if epochs_no_improve >= patience:
                break
                
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for X_val_batch, y_val_batch in val_loader:
            outputs = model(X_val_batch)
            predictions = torch.sigmoid(outputs)
            predicted = (predictions > 0.5).float()
            total += y_val_batch.size(0)
            correct += (predicted == y_val_batch).sum().item()
    val_accuracy = correct / total
    return val_accuracy

# Create a study object
study = optuna.create_study(direction="maximize")

# Optimize the objective function.
study.optimize(objective, n_trials=20)

# Print out the best parameters.
print("Best parameters:")
print(study.best_trial.params)


[I 2025-03-06 18:40:17,373] A new study created in memory with name: no-name-379529f3-49e3-4435-8143-225e9a00c029
[I 2025-03-06 18:40:17,974] Trial 0 finished with value: 0.85 and parameters: {'learning_rate': 0.0007029968021810993, 'n_units1': 64, 'n_units2': 64, 'dropout_rate': 0.19980482847729727}. Best is trial 0 with value: 0.85.
[I 2025-03-06 18:40:18,613] Trial 1 finished with value: 0.85 and parameters: {'learning_rate': 8.51906918304292e-05, 'n_units1': 96, 'n_units2': 16, 'dropout_rate': 0.2621733123565095}. Best is trial 0 with value: 0.85.
[I 2025-03-06 18:40:19,267] Trial 2 finished with value: 0.8590909090909091 and parameters: {'learning_rate': 0.0001100115950956175, 'n_units1': 128, 'n_units2': 48, 'dropout_rate': 0.23065064936164179}. Best is trial 2 with value: 0.8590909090909091.
[I 2025-03-06 18:40:19,943] Trial 3 finished with val

Best parameters:
{'learning_rate': 0.007455691970096293, 'n_units1': 96, 'n_units2': 48, 'dropout_rate': 0.12025257540349202}


## **Step 3 continued: Insights**

Did you find the hyperparameter search helpful? Does it help to increase the number of trials in the optimization? Note that so far we have used the simplest version of optuna which has many nice features. Can you discover more useful features by browsing the optuna website? (Hint: try pruning)

I found that using Optuna for hyperparameter optimization was extremely helpful in tuning my model. By increasing the number of trials, I was able to explore a broader range of parameter combinations, which sometimes led to better validation performance. I observed that even small adjustments in the learning rate, the number of hidden units, or the dropout rate had a noticeable impact on model accuracy. Additionally, I discovered that incorporating features like pruning can further enhance the efficiency of the search process by terminating unpromising trials early. Overall, I learned that automated hyperparameter optimization not only simplifies the tuning process but also provides valuable insights into how sensitive my model is to various hyperparameters.

## **Step 4: Final Training**

Now that you have found a good hyperparameter setting, the validation set is no longer needed. The last step is to combine the training and validation sets into a single training set and retrain the model under the best parameter setting found. Report your final loss on your test data.
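
If the training and validation datasets from Step 2 are still in memory, one option is to merge them at the Dataset level with `ConcatDataset` rather than re-splitting the raw data, which keeps the held-out test set untouched. The sketch below uses random tensors as hypothetical stand-ins for the real datasets.

```python
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# Hypothetical stand-ins for the Step 2 training and validation datasets
train_ds = TensorDataset(torch.randn(70, 5), torch.randint(0, 2, (70, 1)).float())
val_ds = TensorDataset(torch.randn(15, 5), torch.randint(0, 2, (15, 1)).float())

# Chain the two datasets; indexing runs through train_ds first, then val_ds
trainval_ds = ConcatDataset([train_ds, val_ds])
trainval_loader = DataLoader(trainval_ds, batch_size=32, shuffle=True)

print(len(trainval_ds))  # 85
X_batch, y_batch = next(iter(trainval_loader))
print(X_batch.shape, y_batch.shape)
```

With this approach the features keep the scaling fitted on the original training split; refitting the scaler on the merged data is an alternative design choice.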

In [23]:
import pandas as pd

df = pd.read_csv("HR-Employee-Attrition.csv")
print("Dataset shape:", df.shape)
print(df.head())

print("\nMissing values per column:")
print(df.isnull().sum())

cols_to_drop = ['EmployeeCount', 'Over18', 'StandardHours']
df.drop(columns=[col for col in cols_to_drop if col in df.columns], inplace=True)

for col in df.columns:
    if df[col].isnull().sum() > 0:
        if df[col].dtype in ['int64', 'float64']:
            # Assign the result back; inplace=True on a column selection is deprecated in recent pandas
            df[col] = df[col].fillna(df[col].median())
        else:
            df[col] = df[col].fillna(df[col].mode()[0])

categorical_cols = df.select_dtypes(include=['object']).columns
for col in categorical_cols:
    df[col] = df[col].astype('category')

if 'Attrition' in df.columns:
    df['Attrition'] = df['Attrition'].apply(lambda x: 1 if x.strip().lower() == 'yes' else 0)

print("\nCleaned data info:")
print(df.info())

df.to_csv("HR-Employee-Attrition_cleaned.csv", index=False)

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Step 4: Final Training
data = pd.read_csv("HR-Employee-Attrition_cleaned.csv")
for col in data.columns:
    if data[col].dtype == 'object':
        data[col] = data[col].astype('category').cat.codes

X = data.drop("Attrition", axis=1).values
y = data["Attrition"].values

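# Recreate the Step 2 split so the test set is the same one held out during the hyperparameter search,
# then merge the training and validation portions (85% for training, 15% for testing)
import numpy as np

X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp)
X_trainval = np.concatenate([X_train, X_val])
y_trainval = np.concatenate([y_train, y_val])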

scaler = StandardScaler()
X_trainval = scaler.fit_transform(X_trainval)
X_test = scaler.transform(X_test)

X_trainval_tensor = torch.tensor(X_trainval, dtype=torch.float32)
y_trainval_tensor = torch.tensor(y_trainval, dtype=torch.float32).unsqueeze(1)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32).unsqueeze(1)

trainval_dataset = TensorDataset(X_trainval_tensor, y_trainval_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)

trainval_loader = DataLoader(trainval_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

class FeedForwardNN(nn.Module):
    def __init__(self, input_dim):
        super(FeedForwardNN, self).__init__()
        self.fc1 = nn.Linear(input_dim, 96)
        self.relu1 = nn.ReLU()
        self.dropout1 = nn.Dropout(0.12025257540349202)
        self.fc2 = nn.Linear(96, 48)
        self.relu2 = nn.ReLU()
        self.dropout2 = nn.Dropout(0.12025257540349202)
        self.fc3 = nn.Linear(48, 1)
        
    def forward(self, x):
        x = self.relu1(self.fc1(x))
        x = self.dropout1(x)
        x = self.relu2(self.fc2(x))
        x = self.dropout2(x)
        x = self.fc3(x)
        return x

input_dim = X_trainval_tensor.shape[1]
model = FeedForwardNN(input_dim)

criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=0.007455691970096293)

num_epochs = 50
for epoch in range(num_epochs):
    model.train()
    running_loss = 0.0
    for X_batch, y_batch in trainval_loader:
        optimizer.zero_grad()
        outputs = model(X_batch)
        loss = criterion(outputs, y_batch)
        loss.backward()
        optimizer.step()
        running_loss += loss.item() * X_batch.size(0)
    epoch_loss = running_loss / len(trainval_loader.dataset)
    print(f"Epoch {epoch+1}/{num_epochs} - Training Loss: {epoch_loss:.4f}")

model.eval()
test_loss = 0.0
with torch.no_grad():
    for X_batch, y_batch in test_loader:
        outputs = model(X_batch)
        loss = criterion(outputs, y_batch)
        test_loss += loss.item() * X_batch.size(0)
test_loss = test_loss / len(test_loader.dataset)
print(f"Final Test Loss: {test_loss:.4f}")

## **Final Submission**
Upload your submission for Milestone 2 to Canvas. 
Happy Deep Learning! 🚀