# Deep Learning First Individual Assignment | Predicting Abalone Age
## *Berkay Kulak, 2118985*

---

### Project Evaluation Criteria:

#### Data Loading and Preprocessing (2 points max)
- Load the data from the CSV files using the appropriate data import methods in PyTorch.
- Apply dataset splits that enable you to assess network overfitting.
- Preprocess the categorical and numerical data accordingly (refer to **Lesson 5 – Training Practices and Regularization** and **Lesson 6 – PyTorch in Production**).
- Make data loading protocols for mini-batch training.

#### Approach and Methodology (2 points max)
- Implement a neural network architecture suited to the task and justify your choice (refer to **Lesson 6 – PyTorch in Production**).
- Apply regularization techniques that help prevent network overfitting.

#### Results and Evaluation (2 points max)
- Build a training protocol to train your neural network on the dataset with a user-specified number of epochs.
- Implement methods that effectively regulate the training process (refer to **Lab 5 – Training Practices and Regularization**).
- Provide visualizations, if useful, that give an indication of the training procedure (e.g., a loss curve). These visualizations can be included in your report as part of your submission package.
- Implement a software method that saves the network’s weights with the best performance.  
  - These network weights should be saved as a `.pt` / `.pth` file.
  - Include them as part of your submission package so that the teacher and the teaching assistant can reproduce your results. 

#### Code Quality and Reproducibility (2 points max)
- Your code should be clean, well-structured, and properly commented.
- Ensure the code runs in inference mode with your network weights (`.pt` / `.pth` file) to produce the target variable (the number of rings) using the data from `test.csv`.
- Ensure that the model generates predictions for the number of rings for every data point in the test set.
- Write a `README` file for running your code in inference mode to produce the targets.  
  - This will help the teacher and the teaching assistant reproduce your results. 

#### Summary and Justification (2 points max)
- Clearly summarize the key novelties of your approach.
- Justify the choices made in the data preprocessing, neural network design, training protocol, and inference protocol.
- Discuss any challenges encountered and potential improvements.

---

First, we import the required libraries. The global flag `SKIP_TRAINING` controls whether the training cells run: when it is set to `True`, training is skipped and the model is loaded from the saved weights. This is useful for testing the inference part of the code.

In [345]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, random_split
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random

SKIP_TRAINING = True

device, device_name = (torch.device("cuda"), torch.cuda.get_device_name(0)) if torch.cuda.is_available() else (torch.device("cpu"), "CPU")
print(f"Device: {device}, {device_name}")

Device: cpu, CPU


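Below, the random seed is set to 42 for reproducibility. The function seeds the Python, NumPy, and PyTorch (CPU and CUDA) random number generators and switches cuDNN to deterministic mode.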

In [346]:
def set_seed(seed=42):
    torch.manual_seed(seed)               
    torch.cuda.manual_seed_all(seed)      
    np.random.seed(seed)                  
    random.seed(seed)                     
    torch.backends.cudnn.deterministic = True  
    torch.backends.cudnn.benchmark = False     

set_seed(42)
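
As a quick sanity check of what seeding provides, the sketch below re-seeds the Python, NumPy, and PyTorch generators and confirms that the same draws come back. The helper `draw_samples` is a hypothetical name used only for this illustration, and the check covers the CPU generators only.

```python
import random

import numpy as np
import torch


def draw_samples(seed):
    """Seed all three generators, then draw one sample from each."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    return random.random(), np.random.rand(), torch.rand(3)


first = draw_samples(42)
second = draw_samples(42)

# Identical seeds should give identical draws from every generator
same_python = first[0] == second[0]
same_numpy = first[1] == second[1]
same_torch = torch.equal(first[2], second[2])
print(same_python, same_numpy, same_torch)
```

Seeding fixes the weight initialization, the shuffling order, and the dropout masks. Some GPU kernels can still be nondeterministic, which is what the two cuDNN flags in `set_seed` are meant to limit.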

Let's load the data.

In [347]:
train_data = pd.read_csv('Data/train.csv')
test_data = pd.read_csv('Data/test.csv')

This custom dataset class is designed for the Abalone age prediction task. It handles:
- Reading data from a CSV file
- One-hot encoding of the 'Sex' feature
- Standardization of numeric features (with mean/std from training data)
- Support for both training/validation and test mode (with/without targets)

In [348]:
class AbaloneDataset(Dataset):
    def __init__(self, csv_path, is_test=False, mean=None, std=None):
        self.data = pd.read_csv(csv_path)
        self.is_test = is_test

        self.numeric_cols = [
            'Length', 'Diameter', 'Height', 'Whole_weight',
            'Shucked_weight', 'Viscera_weight', 'Shell_weight'
        ]
        self.sex_mapping = {'M': 0, 'F': 1, 'I': 2}

        # Compute the mean and std for standardization unless they are supplied,
        # so that validation and test data reuse the training statistics
        if mean is None or std is None:
            self.mean = torch.tensor(self.data[self.numeric_cols].mean().values, dtype=torch.float32)
            self.std = torch.tensor(self.data[self.numeric_cols].std().values, dtype=torch.float32)
        else:
            self.mean = mean.clone().detach()
            self.std = std.clone().detach()

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        row = self.data.iloc[idx]

        # One-hot encode 'Sex'
        sex_index = self.sex_mapping[row['Sex']]
        sex_one_hot = torch.nn.functional.one_hot(torch.tensor(sex_index), num_classes=3).float()

        # Get numeric features and standardize
        numeric = torch.tensor([row[col] for col in self.numeric_cols], dtype=torch.float32)
        numeric = (numeric - self.mean) / self.std

        # Combine features
        features = torch.cat([sex_one_hot, numeric])

        if self.is_test:
            return features
        else:
            target = torch.tensor(row['Rings'], dtype=torch.float32).view(1)
            return features, target
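
To make the feature layout concrete, here is a minimal sketch of what `__getitem__` does for a single row. The row values and the mean/std statistics are made up for illustration and are not taken from `train.csv`.

```python
import torch
import torch.nn.functional as F

sex_mapping = {"M": 0, "F": 1, "I": 2}

# Hypothetical row: sex plus the 7 numeric measurements
row_sex = "F"
row_numeric = torch.tensor([0.50, 0.40, 0.10, 0.80, 0.30, 0.20, 0.25])

# Hypothetical training statistics (one value per numeric column)
mean = torch.tensor([0.52, 0.41, 0.14, 0.83, 0.36, 0.18, 0.24])
std = torch.tensor([0.12, 0.10, 0.04, 0.49, 0.22, 0.11, 0.14])

# One-hot encode the sex: 'F' maps to index 1, giving [0, 1, 0]
sex_one_hot = F.one_hot(torch.tensor(sex_mapping[row_sex]), num_classes=3).float()

# Standardize each numeric column with the training statistics
numeric = (row_numeric - mean) / std

# 3 one-hot values + 7 numeric values = 10 input features
features = torch.cat([sex_one_hot, numeric])
print(features.shape)  # torch.Size([10])
```

This is where the input size of 10 used by the networks further down comes from.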

The code below does the following:
- Loads the full training dataset from `train.csv`
- Computes and stores the mean and standard deviation for standardization
- Splits the dataset into training and validation subsets (80/20) using a fixed random seed for reproducibility
- Re-wraps the validation set into a proper `AbaloneDataset` using the same standardization stats
- Loads the test dataset using the same mean and std to ensure consistent preprocessing
- Initializes `DataLoader` objects for training, validation, and test sets

In [349]:
# Load full training dataset
full_train_dataset = AbaloneDataset("Data/train.csv")

# Save mean and std
mean = full_train_dataset.mean
std = full_train_dataset.std

# Split the full training dataset
train_size = int(0.8 * len(full_train_dataset))
val_size = len(full_train_dataset) - train_size
generator = torch.Generator().manual_seed(42)
train_subset, val_subset = random_split(full_train_dataset, [train_size, val_size], generator=generator)

# Re-wrap val_subset into a proper AbaloneDataset with the same mean/std
val_indices = val_subset.indices
val_df = full_train_dataset.data.iloc[val_indices].reset_index(drop=True)
val_df.to_csv("Data/val_temp.csv", index=False)
val_dataset = AbaloneDataset("Data/val_temp.csv", is_test=False, mean=mean, std=std)

# Test dataset (already correct)
test_dataset = AbaloneDataset("Data/test.csv", is_test=True, mean=mean, std=std)

# Dataloaders
train_loader = DataLoader(train_subset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
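
Note that the cell above computes the mean and standard deviation over all of `train.csv` before the split, so the validation rows also contribute to the statistics. The sketch below shows one possible variant that computes them from the training indices only. It uses synthetic tensors rather than the abalone data, and it is an alternative for illustration, not the procedure used in this notebook.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Synthetic stand-in for the 7 numeric columns and the Rings target
torch.manual_seed(0)
features = torch.rand(100, 7)
targets = torch.randint(1, 30, (100, 1)).float()
dataset = TensorDataset(features, targets)

# Same 80/20 split logic as above, with a seeded generator
train_size = int(0.8 * len(dataset))
val_size = len(dataset) - train_size
generator = torch.Generator().manual_seed(42)
train_subset, val_subset = random_split(dataset, [train_size, val_size], generator=generator)

# Statistics come from the training rows only
train_features = features[train_subset.indices]
mean = train_features.mean(dim=0)
std = train_features.std(dim=0)

# Both subsets are scaled with the training statistics
train_scaled = (train_features - mean) / std
val_scaled = (features[val_subset.indices] - mean) / std

print(len(train_subset), len(val_subset))  # 80 20
```

With a dataset of this size the difference is likely small, but keeping validation rows out of the statistics makes the validation loss a cleaner estimate of performance on unseen data.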

Below is a simple feedforward neural network for regression. It consists of:
- Input layer with 10 features (3 one-hot `Sex` values plus 7 standardized numeric features)
- Two hidden layers with ReLU activations (64 → 32 units)
- A single linear output neuron for predicting the number of rings

This serves as the **baseline model** for the abalone age prediction task.

In [350]:
class CleanAbaloneNetwork(nn.Module):
    def __init__(self):
        super(CleanAbaloneNetwork, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(10, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 1)  # Output layer for regression
        )

    def forward(self, x):
        return self.model(x)
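
To get a feel for the size of this baseline, the sketch below rebuilds the same layer stack as a plain `nn.Sequential`, counts its trainable parameters, and checks the output shape for one mini-batch. The random input batch is a placeholder, not real data.

```python
import torch
import torch.nn as nn

# Same layer sizes as the baseline: 10 -> 64 -> 32 -> 1
baseline = nn.Sequential(
    nn.Linear(10, 64),
    nn.ReLU(),
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Linear(32, 1),
)

# (10*64 + 64) + (64*32 + 32) + (32*1 + 1) = 704 + 2080 + 33
num_params = sum(p.numel() for p in baseline.parameters())
print(num_params)  # 2817

# A mini-batch of 32 rows with 10 features gives one prediction per row
batch = torch.randn(32, 10)
output = baseline(batch)
print(output.shape)  # torch.Size([32, 1])
```

The output shape `[batch_size, 1]` matches the targets returned by `AbaloneDataset`, which are reshaped with `.view(1)`, so `MSELoss` compares tensors of the same shape.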

The code below does the following:
- Selects the available device (GPU if available, otherwise CPU)
- Initializes the `CleanAbaloneNetwork` model and moves it to the selected device
- Uses **Mean Squared Error (MSELoss)** as the loss function for regression
- Applies the **Adam optimizer** with a learning rate of 0.001 to update model parameters

In [351]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

clean_model = CleanAbaloneNetwork().to(device)
criterion = nn.MSELoss()
optimizer = optim.Adam(clean_model.parameters(), lr=0.001)
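
As a small worked example of the loss and of the MAE and RMSE metrics reported during validation, the sketch below evaluates all three on four made-up predictions. The numbers are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up predictions and targets, shaped [batch_size, 1] like the model output
preds = torch.tensor([[9.0], [11.0], [8.0], [12.0]])
targets = torch.tensor([[10.0], [10.0], [10.0], [10.0]])

# Errors are -1, 1, -2, 2
mse = nn.MSELoss()(preds, targets)   # (1 + 1 + 4 + 4) / 4 = 2.5
mae = F.l1_loss(preds, targets)      # (1 + 1 + 2 + 2) / 4 = 1.5
rmse = torch.sqrt(mse)               # sqrt(2.5), about 1.58

print(mse.item(), mae.item(), rmse.item())
```

MSE is in squared rings and penalizes large errors more heavily, while MAE and RMSE are in rings and are therefore easier to interpret.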

This function calculates the **L1 regularization penalty** for a given model:

- Iterates over all model parameters
- Sums the absolute values of all parameters, weights and biases alike (L1 norm)
- Multiplies by a regularization strength `lambda_reg`

This extra penalty can be added to the main loss to help the model keep its weights small, which can prevent overfitting.

In [352]:
def l1_regularization(model, lambda_reg):
    l1_penalty = sum(param.abs().sum() for param in model.parameters())
    return lambda_reg * l1_penalty
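
The sketch below works through the penalty on a tiny layer with hand-set parameters. It also shows a hypothetical weights-only variant that skips the biases by filtering on the parameter name. This variant is an assumption for comparison and is not used in this notebook.

```python
import torch
import torch.nn as nn


def l1_regularization(model, lambda_reg):
    # Same computation as above: lambda * sum of |p| over all parameters
    l1_penalty = sum(param.abs().sum() for param in model.parameters())
    return lambda_reg * l1_penalty


# Tiny layer with known parameters so the result can be checked by hand
layer = nn.Linear(2, 1)
with torch.no_grad():
    layer.weight.copy_(torch.tensor([[1.0, -2.0]]))
    layer.bias.copy_(torch.tensor([0.5]))

# |1| + |-2| + |0.5| = 3.5, times lambda = 0.1
penalty = l1_regularization(layer, lambda_reg=0.1)
print(penalty.item())  # approximately 0.35

# Hypothetical variant: penalize only parameters named "...weight"
weights_only = 0.1 * sum(
    p.abs().sum() for name, p in layer.named_parameters() if name.endswith("weight")
)
print(weights_only.item())  # approximately 0.3
```

In a model with batch normalization, the name filter would still include the `BatchNorm1d` scale parameters, since these are also called `weight`.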

The `train_model` function trains a PyTorch model for at most 50 epochs and has the following features:

- **Tracks training and validation loss** across epochs
- Adds **L1 regularization** during training to encourage smaller weights; the penalty is included in the reported training loss but not in the validation loss
- Computes and prints additional validation metrics: **MAE** and **RMSE**
- Implements **early stopping**: stops training once the validation loss has not improved for 10 consecutive epochs
- Saves the best-performing model based on lowest validation loss (`best_model.pt`); the file name is fixed, so each call overwrites the weights saved by a previous call

In [353]:
def train_model(model, criterion, optimizer):
    train_losses = []
    val_losses = []
    patience = 10           # How many epochs to wait before stopping
    counter = 0             # How many epochs since last improvement
    best_val_loss = float('inf')

    for epoch in range(1, 51):
        model.train()
        running_loss = 0.0

        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = criterion(outputs, targets)
            lambda_reg = 1e-5
            loss += l1_regularization(model, lambda_reg)
            loss.backward()
            optimizer.step()
            running_loss += loss.item() * inputs.size(0)

        avg_train_loss = running_loss / len(train_loader.dataset)

        # Validation
        model.eval()
        val_loss = 0.0
        val_preds = []
        val_targets = []

        with torch.no_grad():
            for inputs, targets in val_loader:
                inputs, targets = inputs.to(device), targets.to(device)
                outputs = model(inputs)
                loss = criterion(outputs, targets)
                val_loss += loss.item() * inputs.size(0)

                val_preds.append(outputs.cpu())
                val_targets.append(targets.cpu())

        avg_val_loss = val_loss / len(val_loader.dataset)
        val_preds = torch.cat(val_preds)
        val_targets = torch.cat(val_targets)

        # Calculate additional metrics
        mae = F.l1_loss(val_preds, val_targets).item()
        rmse = torch.sqrt(F.mse_loss(val_preds, val_targets)).item()

        # Save for visualization
        train_losses.append(avg_train_loss)
        val_losses.append(avg_val_loss)

        # Logging
        print(f"Epoch {epoch:02d} | Train Loss: {avg_train_loss:.4f} | Val Loss: {avg_val_loss:.4f} | MAE: {mae:.4f} | RMSE: {rmse:.4f}")

        # Save best model 
        if avg_val_loss < best_val_loss:
            best_val_loss = avg_val_loss
            torch.save(model.state_dict(), "best_model.pt")
            counter = 0  # Reset the early stopping counter
        else:
            counter += 1
            print(f"No improvement in validation loss for {counter} epoch(s).")

            if counter >= patience:
                print("Early stopping triggered!")
                break
        
    return train_losses, val_losses
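
The early stopping logic can be traced in isolation on a list of validation losses. The sketch below uses a hypothetical helper `early_stopping_trace` and made-up losses with a patience of 3. It mirrors the counter logic of `train_model` without training anything.

```python
def early_stopping_trace(val_losses, patience):
    """Return (best_epoch, last_epoch_run) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = None
    counter = 0  # Epochs since the last improvement

    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch  # This is where the weights would be saved
            counter = 0
        else:
            counter += 1
            if counter >= patience:
                return best_epoch, epoch  # Early stopping triggered

    return best_epoch, len(val_losses)


# Made-up losses: the best value so far is at epoch 2, then three epochs without improvement
losses = [5.0, 4.2, 4.5, 4.3, 4.4, 4.1, 4.6]
best_epoch, stopped_epoch = early_stopping_trace(losses, patience=3)
print(best_epoch, stopped_epoch)  # 2 5
```

In this trace, training stops at epoch 5 and never reaches the lower loss at epoch 6. A small patience saves time but can stop too early, while a larger one, such as the 10 used above, tolerates longer plateaus.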

Below is a function that plots the training and validation loss curves.

In [354]:
def create_training_plot(train_losses, val_losses, model_name):
    plt.figure(figsize=(10, 5))
    plt.plot(train_losses, label="Train Loss")
    plt.plot(val_losses, label="Validation Loss")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.title(f"Training vs Validation Loss {model_name}")
    plt.legend()
    plt.grid(True)
    plt.show()

Let's train the baseline model.

In [355]:
if not SKIP_TRAINING:
    train_losses, val_losses = train_model(clean_model, criterion, optimizer)
    create_training_plot(train_losses, val_losses, "Clean Model")

Below is an enhanced version of the baseline neural network for abalone age prediction.

Key improvements include:
- **Batch Normalization** after each hidden layer for more stable and faster training
- **LeakyReLU** activation to avoid dying ReLU problems
- **Dropout (0.3)** for regularization to reduce overfitting

The model has:
- An input layer for 10 features
- Two hidden layers (64 → 32 units)
- A single output node for regression (predicting the number of rings)

In [356]:
class ModifiedAbaloneNetwork(nn.Module):
    def __init__(self):
        super(ModifiedAbaloneNetwork, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(10, 64),
            nn.BatchNorm1d(64),
            nn.LeakyReLU(),
            nn.Dropout(0.3),
            
            nn.Linear(64, 32),
            nn.BatchNorm1d(32),
            nn.LeakyReLU(),
            nn.Dropout(0.3),
            
            nn.Linear(32, 1)  # Output layer for regression
        )

    def forward(self, x):
        return self.model(x)
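
Dropout and batch normalization behave differently in training and evaluation mode, which is why `model.train()` and `model.eval()` matter for this network. The sketch below illustrates this on a reduced, single-hidden-layer version of the architecture with a random input batch. Both are simplifications for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Reduced version of the modified network: one hidden block instead of two
block = nn.Sequential(
    nn.Linear(10, 64),
    nn.BatchNorm1d(64),
    nn.LeakyReLU(),
    nn.Dropout(0.3),
    nn.Linear(64, 1),
)
batch = torch.randn(8, 10)

# Training mode: dropout draws a new random mask on every forward pass
block.train()
train_a = block(batch)
train_b = block(batch)

# Evaluation mode: dropout is disabled and batch norm uses its running statistics
block.eval()
with torch.no_grad():
    eval_a = block(batch)
    eval_b = block(batch)

train_outputs_differ = not torch.equal(train_a, train_b)
eval_outputs_match = torch.equal(eval_a, eval_b)
print(train_outputs_differ, eval_outputs_match)
```

Forgetting `model.eval()` before validation or inference would leave dropout active and make the predictions noisy.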

The code below:
- Initializes the `ModifiedAbaloneNetwork` and moves it to the selected device (CPU or GPU)
- Uses **Mean Squared Error (MSELoss)** as the loss function for regression
- Applies the **Adam optimizer** with a learning rate of 0.01 to update the model’s parameters during training

In [357]:
modified_model = ModifiedAbaloneNetwork().to(device)
criterion = nn.MSELoss()
optimizer = optim.Adam(modified_model.parameters(), lr=0.01)

Let's train the enhanced model.

In [358]:
if not SKIP_TRAINING:
    train_losses, val_losses = train_model(modified_model, criterion, optimizer)
    create_training_plot(train_losses, val_losses, "Modified Model")

The final piece of code below does the following:
- Loads the trained weights from `best_model.pt` into the modified model
- Sets the model to **evaluation mode** to disable dropout and use running stats for batch normalization
- Performs **inference** on the test dataset using `torch.no_grad()` for memory efficiency
- Collects the predicted number of rings and adds them to the original `test.csv` as a new column
- Saves the final predictions to `test_predictions.csv`

In [359]:
# Load model (map_location lets weights saved on a GPU load on a CPU-only machine)
modified_model.load_state_dict(torch.load("best_model.pt", map_location=device))
modified_model.eval()

predictions = []

with torch.no_grad():
    for inputs in test_loader:
        inputs = inputs.to(device)
        outputs = modified_model(inputs)
        predictions.extend(outputs.cpu().numpy().flatten())

# Save to a new CSV
test_df = pd.read_csv("Data/test.csv")
test_df["Rings"] = predictions
test_df.to_csv("test_predictions.csv", index=False)
print("Saved test_predictions.csv")

Saved test_predictions.csv
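
To illustrate the save and load round trip that the inference step relies on, the sketch below stores a small model's `state_dict`, loads it into a fresh instance, and checks that both give identical predictions. The single `nn.Linear` layer and the file name `example_weights.pt` are stand-ins chosen for this example, not the actual submission files.

```python
import os
import tempfile

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)  # Stand-in for the trained network
batch = torch.randn(4, 10)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_weights.pt")

    # Save only the parameters, not the whole model object
    torch.save(model.state_dict(), path)

    # Rebuild the same architecture, then load the weights onto the CPU
    restored = nn.Linear(10, 1)
    state_dict = torch.load(path, map_location=torch.device("cpu"))
    restored.load_state_dict(state_dict)

# Inference mode: no gradients, evaluation behavior
model.eval()
restored.eval()
with torch.no_grad():
    outputs_match = torch.equal(model(batch), restored(batch))

print(outputs_match)  # True
```

Because only the `state_dict` is stored, the class definition of the network must be available when the weights are loaded, and it must match the architecture that produced them.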


---

### Final Note

For a more detailed explanation of the steps I followed, the decisions I made, and the reasoning behind them, please refer to the accompanying document. It also includes a brief conclusion reflecting on the outcomes of this project.