## Task 2 — Modeling and Tuning (40 points)

In this task, we develop machine learning models to classify univariate ECG time series into one of four classes: normal, atrial fibrillation (AF), other rhythms, and noisy.

The modeling pipeline involves the following key components:
- Designing at least two different model architectures
- Training each model on the training set
- Tuning hyperparameters such as learning rate, number of channels, optimizer, etc.
- Evaluating performance using appropriate metrics (e.g., accuracy, macro F1-score)
- Comparing model performance on the validation set
- Generating predictions on the test data with the best-performing model
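
The tuning step in the list above can be organized as a small grid search. The following is a minimal sketch, not the procedure used in this notebook: the search-space values, `iter_configs`, `grid_search`, and `toy_trial` are all hypothetical names introduced for illustration. `toy_trial` is a stand-in that returns an arbitrary number so the snippet runs; a real trial would train a model and return its validation macro F1-score.

```python
from itertools import product

# Hypothetical search space; the values are placeholders, not tuned settings.
SEARCH_SPACE = {
    "lr": [1e-3, 5e-4, 1e-4],
    "channels": [16, 32],
    "optimizer": ["adam", "sgd"],
}


def iter_configs(space):
    """Yield one dict per combination of hyperparameter values."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def grid_search(space, run_trial):
    """Return (best_config, best_score); run_trial(config) gives a validation score."""
    best_config, best_score = None, float("-inf")
    for config in iter_configs(space):
        score = run_trial(config)
        if score > best_score:  # strict '>' keeps the first config on ties
            best_config, best_score = config, score
    return best_config, best_score


def toy_trial(config):
    """Stand-in for 'train, then return validation macro F1'. Not a real result."""
    return -abs(config["lr"] - 5e-4) + 0.001 * config["channels"]


best_config, best_score = grid_search(SEARCH_SPACE, toy_trial)
print("best configuration under the toy score:", best_config)
```

A full grid grows multiplicatively with every added hyperparameter, so with training times of roughly half a minute per epoch it may be more practical to vary one or two settings at a time.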

We begin with the baseline model suggested in the exercise description, a pipeline of the form STFT → Conv2D → RNN → FC. The short-time Fourier transform (STFT) converts each ECG signal into a time-frequency representation (a spectrogram), convolutional layers extract local patterns from it, a recurrent layer models how those patterns evolve over time, and a fully connected layer outputs the class probabilities.

This baseline serves as our reference point: it uses both the frequency content and the temporal structure of the ECG signals.
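
To make the STFT → Conv2D → RNN → FC pipeline concrete, here is a minimal sketch of such a model. It is not the `BaselineSTFTModel` imported later in this notebook, whose internals are not shown here; the class name `SketchSTFTModel` and all layer sizes (`n_fft`, `hop_length`, `channels`, `hidden`) are assumptions chosen for illustration. The sketch returns raw logits, since `torch.nn.CrossEntropyLoss` applies the log-softmax internally.

```python
import torch
import torch.nn as nn


class SketchSTFTModel(nn.Module):
    """Illustrative STFT -> Conv2D -> RNN -> FC classifier (hypothetical sizes)."""

    def __init__(self, n_fft=64, hop_length=32, channels=16, hidden=64, n_classes=4):
        super().__init__()
        self.n_fft = n_fft
        self.hop_length = hop_length
        # Hann window stored as a buffer so it moves with the model across devices.
        self.register_buffer("window", torch.hann_window(n_fft))
        self.conv = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),  # halve the frequency axis, keep all frames
        )
        freq_bins = (n_fft // 2 + 1) // 2  # one-sided STFT bins after pooling
        self.rnn = nn.GRU(channels * freq_bins, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, x):
        # x: (batch, samples) raw ECG, zero-padded to a common length
        spec = torch.stft(x, n_fft=self.n_fft, hop_length=self.hop_length,
                          window=self.window, return_complex=True)
        spec = torch.log1p(spec.abs()).unsqueeze(1)   # (batch, 1, freq, frames)
        feats = self.conv(spec)                       # (batch, C, freq // 2, frames)
        b, c, f, t = feats.shape
        feats = feats.permute(0, 3, 1, 2).reshape(b, t, c * f)  # one feature vector per frame
        _, h_n = self.rnn(feats)                      # h_n: (1, batch, hidden)
        return self.fc(h_n[-1])                       # logits: (batch, n_classes)


logits = SketchSTFTModel()(torch.randn(2, 3000))  # two random signals of 3000 samples
print(logits.shape)
```

Pooling only along the frequency axis keeps the number of time frames unchanged, so the recurrent layer still sees one step per STFT frame.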


### ECGDataset: Custom PyTorch Dataset for ECG Signals

We define a custom ECGDataset class to load and serve raw ECG signals for model training. This class inherits from torch.utils.data.Dataset and supports:

- Index-based selection for validation splits
- Conversion of variable-length 1D ECG signals (as NumPy arrays) into tensor format
- Association of each signal with its corresponding class label

Since the ECG time series are univariate and vary in length, the dataset keeps each sample as an individual 1D tensor instead of padding it. Padding is applied per batch by prep_batch, the collate function passed to the DataLoader, so each batch is only padded to the length of its own longest signal.
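
The dataset and batch-level padding described above could look roughly like the sketch below. The real `ECGDataset` and `prep_batch` live in `src/ecg_dataset.py` and may differ; `SketchECGDataset` and `sketch_prep_batch` are hypothetical names, labels are assumed to be a plain sequence of integers, and the collate function is assumed to return a `(padded_signals, labels)` pair.

```python
import numpy as np
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import Dataset


class SketchECGDataset(Dataset):
    """Serves (signal, label) pairs; signals stay unpadded 1D tensors."""

    def __init__(self, signals, labels, indices=None):
        # `indices` selects a subset, e.g. the training or the validation split.
        self.indices = list(range(len(signals))) if indices is None else list(indices)
        self.signals = signals
        self.labels = labels

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        j = self.indices[i]
        x = torch.as_tensor(np.asarray(self.signals[j]), dtype=torch.float32)
        y = torch.tensor(int(self.labels[j]), dtype=torch.long)
        return x, y


def sketch_prep_batch(batch):
    """Zero-pad the signals of one batch to the longest signal in that batch."""
    signals, labels = zip(*batch)
    padded = pad_sequence(signals, batch_first=True)  # (batch, max_len_in_batch)
    return padded, torch.stack(labels)


# Toy signals of different lengths, only to show the shapes involved.
toy_signals = [np.ones(5), np.ones(3), np.ones(8)]
toy_labels = [0, 2, 1]
toy_ds = SketchECGDataset(toy_signals, toy_labels, indices=[0, 2])
x, y = sketch_prep_batch([toy_ds[0], toy_ds[1]])
print(x.shape, y.tolist())
```

Zero-padding at the end is one possible choice; a model that is sensitive to the padded region may also need the original lengths, which this sketch does not return.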

### Training and Evaluation Functions

We define two utility functions to encapsulate the training and evaluation logic for our model.

train_one_epoch(model, dataloader, optimizer, loss_fn, device)

This function performs one full training pass over the data:
- Puts the model in training mode
- Iterates over batches of data
- Computes the forward pass and loss
- Performs backpropagation and optimizer updates
- Tracks predictions and computes accuracy and macro F1-score at the end

It returns the average loss, accuracy, and F1-score for the epoch.
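
A minimal sketch of such a training pass is shown below. The actual `train_one_epoch` is imported from `src/train_utils.py` and may differ; `sketch_train_one_epoch` is a hypothetical name, each batch is assumed to be an `(inputs, labels)` pair, and the metrics are computed with scikit-learn, which is an assumption about the implementation.

```python
import torch
from sklearn.metrics import accuracy_score, f1_score


def sketch_train_one_epoch(model, dataloader, optimizer, loss_fn, device):
    """One training pass; returns (mean loss, accuracy, macro F1)."""
    model.train()
    total_loss, n_samples = 0.0, 0
    all_preds, all_labels = [], []
    for inputs, labels in dataloader:  # assumed batch layout
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        logits = model(inputs)
        loss = loss_fn(logits, labels)
        loss.backward()
        optimizer.step()
        # Scale by batch size so a smaller final batch counts proportionally less.
        total_loss += loss.item() * labels.size(0)
        n_samples += labels.size(0)
        all_preds.extend(logits.argmax(dim=1).tolist())
        all_labels.extend(labels.tolist())
    acc = accuracy_score(all_labels, all_preds)
    # zero_division=0 scores a class that is never predicted as 0 without a warning.
    f1 = f1_score(all_labels, all_preds, average="macro", zero_division=0)
    return total_loss / n_samples, acc, f1
```

Note that these training metrics are accumulated while the weights are still changing within the epoch, so they are not directly comparable to metrics computed afterwards on a fixed model.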

evaluate(model, dataloader, loss_fn, device)

This function evaluates the model on the validation set:
- Runs in no-grad mode to avoid gradient tracking
- Computes predictions and loss
- Aggregates performance metrics (accuracy and macro F1-score)
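
The evaluation steps listed above can be sketched as follows. As with the training sketch, this is not the `evaluate` function from `src/train_utils.py`: `sketch_evaluate` is a hypothetical name, batches are assumed to be `(inputs, labels)` pairs, and the call to `model.eval()` is an assumption added here because dropout and batch normalization behave differently at inference time.

```python
import torch
from sklearn.metrics import accuracy_score, f1_score


def sketch_evaluate(model, dataloader, loss_fn, device):
    """Evaluation pass without gradient tracking; returns (mean loss, accuracy, macro F1)."""
    model.eval()  # assumption: switch dropout / batch norm to inference behavior
    total_loss, n_samples = 0.0, 0
    all_preds, all_labels = [], []
    with torch.no_grad():
        for inputs, labels in dataloader:  # assumed batch layout
            inputs, labels = inputs.to(device), labels.to(device)
            logits = model(inputs)
            total_loss += loss_fn(logits, labels).item() * labels.size(0)
            n_samples += labels.size(0)
            all_preds.extend(logits.argmax(dim=1).tolist())
            all_labels.extend(labels.tolist())
    acc = accuracy_score(all_labels, all_preds)
    # Macro F1 averages the per-class F1 scores with equal weight per class.
    f1 = f1_score(all_labels, all_preds, average="macro", zero_division=0)
    return total_loss / n_samples, acc, f1
```

Because macro F1 weights every class equally, a model that predicts mostly one or two classes can reach moderate accuracy while its macro F1 stays low, which is why both metrics are tracked.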

Keeping training and evaluation in separate functions makes it straightforward to log both sets of metrics after every epoch and to add early stopping or other validation-based monitoring during training.
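
As an illustration of validation-based early stopping, the sketch below tracks a validation score and signals when it has stopped improving. The `EarlyStopping` class, its `patience` and `min_delta` parameters, and the example scores are hypothetical; whether and how `train_model` implements early stopping is not shown in this notebook.

```python
class EarlyStopping:
    """Signal a stop once the score has not improved for `patience` epochs in a row."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.bad_epochs = 0

    def step(self, score):
        """Record one epoch's score (higher is better); return True to stop."""
        if score > self.best + self.min_delta:
            self.best = score
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation scores, used only to show the control flow.
stopper = EarlyStopping(patience=2)
for epoch, val_f1 in enumerate([0.20, 0.30, 0.25, 0.28], start=1):
    if stopper.step(val_f1):
        print(f"stopping after epoch {epoch}, best score {stopper.best:.2f}")
        break
```

Monitoring the validation macro F1 rather than the validation loss is a design choice; with class-weighted losses the two do not always move together.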

In [3]:
import sys
import os

project_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
if project_root not in sys.path:
    sys.path.append(project_root)

In [4]:
import torch
import pandas as pd
import numpy as np
import data
from torch.utils.data import DataLoader
from src.parser import read_zip_binary
from src.train import train_model
from src.train_utils import train_one_epoch, evaluate
from src.ecg_dataset import ECGDataset, prep_batch
from src.stft_baseline import BaselineSTFTModel
from sklearn.utils.class_weight import compute_class_weight

# load training data
X_train = read_zip_binary("../data/X_train.zip")

# load training labels
y_train = pd.read_csv("../data/y_train.csv", header=None)
y_train.columns = ["y"]

# load split index
train_idx = np.load("../data/train_idx.npy")
val_idx = np.load("../data/val_idx.npy")

# dataloader
train_dataset = ECGDataset(X_train, y_train, indices=train_idx)
val_dataset = ECGDataset(X_train, y_train, indices=val_idx)
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True, collate_fn=prep_batch)
val_loader = DataLoader(val_dataset, batch_size=16, shuffle=False, collate_fn=prep_batch)

device = "cpu"

# model: the baseline STFT model for now
model = BaselineSTFTModel().to(device)

# loss and optimizer

# class weights to counter class imbalance, so the model does not collapse onto a single class
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1, 2, 3]), y=y_train["y"])
weights = torch.tensor(weights, dtype=torch.float32).to(device)
loss_fn = torch.nn.CrossEntropyLoss(weight=weights)

# careful with the learning rate
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=5e-4,
    weight_decay=1e-2  # L2 regularization to prevent overfitting
)

# train model
model = train_model(model, train_loader, val_loader, optimizer, loss_fn, device, num_epochs=40)


Unique predictions: (array([0, 2]), array([962, 274]))
Current learning rate: 0.0005
Epoch 01 | Time: 27.6s
  Train Loss: 1.3792 | Acc: 0.3953 | F1: 0.2510
  Val   Loss: 1.3848 | Acc: 0.5437 | F1: 0.2450
Unique predictions: (array([0, 1, 2, 3]), array([259,   1, 958,  18]))
Current learning rate: 0.0005
Epoch 02 | Time: 27.2s
  Train Loss: 1.3588 | Acc: 0.4758 | F1: 0.2606
  Val   Loss: 1.3623 | Acc: 0.3511 | F1: 0.2121
Unique predictions: (array([0, 1, 2]), array([1104,   40,   92]))
Current learning rate: 0.0005
Epoch 03 | Time: 27.3s
  Train Loss: 1.3444 | Acc: 0.5112 | F1: 0.2684
  Val   Loss: 1.3216 | Acc: 0.5906 | F1: 0.2753
Unique predictions: (array([0, 1, 2, 3]), array([162, 953, 120,   1]))
Current learning rate: 0.0005
Epoch 04 | Time: 27.2s
  Train Loss: 1.3419 | Acc: 0.4653 | F1: 0.2590
  Val   Loss: 1.3226 | Acc: 0.2015 | F1: 0.1635
Unique predictions: (array([0, 2, 3]), array([962, 272,   2]))
Current learning rate: 0.0005
Epoch 05 | Time: 27.1s
  Train Loss: 1.3299 | Ac