## Hyperparameter Optimization


### Random Search

This section performs a randomized hyperparameter search within a predefined search space to optimize a `GATv2EdgePredictor` model for edge attribute regression.

For each trial, a random configuration is sampled. A model is then instantiated using this configuration. It is trained on the training set for up to `num_epochs`, using the Adam optimizer and Mean Squared Error (MSE) loss. The prediction task targets the specific edge attribute `tracks` (`target_idx = 4`).
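
The sampling step can be sketched on its own, without any model code. The snippet below is a minimal illustration, assuming the same search-space keys as in this notebook; the helper name `sample_config` and the use of a local `random.Random` generator are illustrative choices, not part of the notebook's implementation.

```python
import random

# Search space with the same keys as the one defined later in this notebook.
search_space = {
    "lr": [1e-1, 1e-2, 1e-3],
    "hidden_channels": [8, 16, 32],
    "out_channels": [8, 16],
    "heads": [1, 2],
}

def sample_config(space, rng):
    """Draw one value per hyperparameter, independently and uniformly (hypothetical helper)."""
    return {name: rng.choice(values) for name, values in space.items()}

# A local generator keeps the global random state untouched (an assumption of this sketch).
rng = random.Random(42)
configs = [sample_config(search_space, rng) for _ in range(3)]

for i, cfg in enumerate(configs, start=1):
    print(f"Trial {i}: {cfg}")
```

Because each hyperparameter is drawn independently, the same configuration can be sampled more than once across trials.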

**Early Stopping:** During training, the model is evaluated on the validation set every 50 epochs. If the validation RMSE (in the original target scale) does not improve by more than `min_delta` for `early_stopping_patience` consecutive evaluations, training is stopped early to limit overfitting.
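
The stopping rule can be isolated from the training loop to make its behavior easier to check. The following is a minimal sketch, assuming a lower-is-better validation score measured every `eval_every` epochs; the function name `early_stopping_epoch` and the score values are made up for illustration.

```python
def early_stopping_epoch(val_scores, patience, min_delta=0.0, eval_every=50):
    """Return (stop_epoch, best_score) for a sequence of validation scores.

    val_scores[i] is the score measured at epoch (i + 1) * eval_every (lower is better).
    stop_epoch is None if the patience is never exhausted.
    """
    best = float("inf")
    stale = 0  # consecutive evaluations without a sufficient improvement
    for i, score in enumerate(val_scores):
        if best - score > min_delta:
            best = score
            stale = 0
        else:
            stale += 1
        if stale >= patience:
            return (i + 1) * eval_every, best
    return None, best

# Illustrative scores only: the last value is never reached because training stops first.
scores = [30.0, 28.7, 28.65, 28.68, 28.66, 27.0]
stop_epoch, best = early_stopping_epoch(scores, patience=3, min_delta=0.1)
print(stop_epoch, best)  # 250 28.7
```

Note that small gains below `min_delta` (here 28.7 to 28.65) count as no improvement, so a large `min_delta` combined with a small patience can end a trial before a later, larger gain appears.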

**Output:** After all trials are completed, the best-performing model (the one with the lowest validation RMSE) and its corresponding hyperparameter configuration are saved:
- Model weights: `hpo_models/best_model_overall.pth`
- Configuration file: `hpo_models/best_config_overall.json`

The `hpo_models/` directory, which stores the results, is created automatically if it does not already exist. Random seeds are fixed using `random.seed(42)` and `torch.manual_seed(42)` to make runs reproducible.
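
To reuse the saved configuration later, the JSON file can be read back and the bookkeeping entries (`trial`, `val_loss`) separated from the actual hyperparameters. The sketch below assumes the file layout described above; the helper `load_hyperparameters` is hypothetical, and the demonstration writes a stand-in file with made-up values to a temporary directory rather than touching `hpo_models/`.

```python
import json
import os
import tempfile

def load_hyperparameters(config_path):
    """Read a saved config and drop bookkeeping keys that are not hyperparameters."""
    with open(config_path) as f:
        saved = json.load(f)
    bookkeeping = {"trial", "val_loss"}
    return {k: v for k, v in saved.items() if k not in bookkeeping}

# Demonstration with a stand-in file; the values are illustrative, not real results.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "best_config_overall.json")
    with open(path, "w") as f:
        json.dump(
            {"lr": 0.01, "hidden_channels": 16, "out_channels": 8,
             "heads": 2, "trial": 1, "val_loss": 12.5},
            f,
            indent=2,
        )
    hparams = load_hyperparameters(path)

print(hparams)
```

The remaining keys (`hidden_channels`, `out_channels`, `heads`) match the constructor arguments used for `GATv2EdgePredictor` in this notebook, so the model can be rebuilt with them before loading the saved weights.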


In [None]:
import os
import random
import json
import torch
import sys
import import_ipynb
import joblib

src_path = os.path.abspath(os.path.join(os.getcwd(), "..",  ".."))
if src_path not in sys.path:
    sys.path.append(src_path)

from gatv2 import GATv2EdgePredictor
from sklearn.metrics import mean_squared_error, root_mean_squared_error

def perform_hpo(train_data, val_data, search_space, num_trials=3, num_epochs=1000, early_stopping_patience=1, min_delta=0.01):
    """
    Performs random hyperparameter optimization for a GATv2 edge predictor model.

    Parameters:
    -----------
    train_data : torch_geometric.data.Data
        The training graph data.
    val_data : torch_geometric.data.Data
        The validation graph data.
    search_space : dict
        Dictionary specifying possible values for hyperparameters (lr, hidden_channels, etc.).
    num_trials : int, optional (default=3)
        Number of random hyperparameter configurations to try.
    num_epochs : int, optional (default=1000)
        Maximum number of training epochs per trial.
    early_stopping_patience : int, optional (default=1)
        Number of evaluations with no improvement before early stopping.
    min_delta : float, optional (default=0.01)
        Minimum change in the monitored quantity to qualify as an improvement. 

    Returns:
    --------
    None
        Saves the best model and configuration to disk as 'best_model_overall.pth' and 'best_config_overall.json'.
    """

    # Set random seeds for reproducibility
    torch.manual_seed(42)
    random.seed(42)

    # Create directory for saving models and configs
    os.makedirs("hpo_models", exist_ok=True)

    best_overall_val_loss = float("inf")
    best_config = None

    # Begin hyperparameter search loop
    for trial in range(num_trials):
        # Randomly sample a configuration from the search space
        config = {
            "lr": random.choice(search_space["lr"]),
            "hidden_channels": random.choice(search_space["hidden_channels"]),
            "out_channels": random.choice(search_space["out_channels"]),
            "heads": random.choice(search_space["heads"]),
        }

        print(f"\n Trial {trial+1}/{num_trials} | Config: {config}")

        # Instantiate model with current configuration
        model = GATv2EdgePredictor(
            in_channels=train_data.num_node_features,
            hidden_channels=config["hidden_channels"],
            out_channels=config["out_channels"],
            edge_dim=train_data.edge_attr.shape[1],
            heads=config["heads"]
        )

        optimizer = torch.optim.Adam(model.parameters(), lr=config["lr"])
        criterion = torch.nn.MSELoss()

        best_val_loss = float("inf")
        epochs_without_improvement = 0

        # Training loop
        for epoch in range(1, num_epochs + 1):
            model.train()
            optimizer.zero_grad()

            pred = model(train_data)
            target = train_data.y
            loss = criterion(pred, target)
            loss.backward()
            optimizer.step()

            # Print training loss periodically
            if epoch % 10 == 0 or epoch == 1:
                print(f"Epoch {epoch:03d} | Training Loss: {loss.item():.4f}")

            # === Evaluate on validation set every 50 epochs ===
            if epoch % 50 == 0:
                model.eval()
                with torch.no_grad():
                    val_pred = model(val_data)
                    val_target = val_data.y

                    # Compute MSE on validation set using scaled values
                    val_mse = mean_squared_error(val_target.cpu().numpy(), val_pred.cpu().numpy())

                    # Compute RMSE in original scale
                    y_scaler = joblib.load(os.path.join("..", "..", "utils", "helper_functions", "scalers", "target_scaler.pkl"))
                    val_pred_orig = y_scaler.inverse_transform(val_pred.cpu().numpy())
                    val_target_orig = y_scaler.inverse_transform(val_target.cpu().numpy())
                    val_rmse = root_mean_squared_error(val_target_orig, val_pred_orig)

                print(f"Epoch {epoch:03d} | Validation Loss (MSE): {val_mse:.4f} || RMSE (original scale): {val_rmse:.2f}")

                # Track the best validation RMSE and snapshot the weights that achieved it
                if best_val_loss - val_rmse > min_delta:
                    best_val_loss = val_rmse
                    best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
                    epochs_without_improvement = 0
                else:
                    epochs_without_improvement += 1

                # Stop early once `early_stopping_patience` evaluations pass without improvement
                if epochs_without_improvement >= early_stopping_patience:
                    print(f"Early stopping at epoch {epoch}")
                    break

        print(f"Trial {trial+1} complete | Best Validation RMSE: {best_val_loss:.2f}")

        # Update global best model and config if current trial is better
        if best_val_loss < best_overall_val_loss:
            best_overall_val_loss = best_val_loss
            best_config = {
                **config,                          # original hyperparameters
                "trial": trial + 1,                # current trial number (1-based)
                "val_loss": float(best_val_loss),  # best validation RMSE (original scale)
            }
            # best_state is always set here: a finite best_val_loss implies at least one improvement
            torch.save(best_state, "hpo_models/best_model_overall.pth")
            with open("hpo_models/best_config_overall.json", "w") as f:
                json.dump(best_config, f, indent=2)  # nicely formatted for readability

        # Clean up memory (only names that are guaranteed to exist after at least one epoch)
        del model, optimizer, pred, target, loss
        import gc
        gc.collect()

    print("\nRandom search completed.")
    print(f"Best configuration: {best_config}")
    print(f"Best validation RMSE (original scale): {best_overall_val_loss:.4f}")


### Defining the Hyperparameter Search Space and Training Settings

The hyperparameter search space is defined as a set of possible values for several key hyperparameters, including:
- the learning rate (`lr`),
- the number of hidden units in intermediate layers (`hidden_channels`),
- the size of the output features (`out_channels`),
- and the number of attention heads (`heads`) used in the GATv2 model.

This search space can be expanded or modified depending on the model's requirements. For example:
- **Optimizer**: We currently use Adam, but alternatives like SGD or AdamW could also be tested.
- **Loss function**: MSELoss is used here, but other functions such as L1Loss could be considered.
- **Dropout**: This could be applied within GATv2 or in the edge-level MLP, although it is not used in the current implementation.
- **Batch size**: Currently, training is performed on the entire graph (no mini-batching). However, support for mini-batching could be implemented.

In addition to defining the search space, the training and validation settings are specified as follows:
- The number of hyperparameter optimization (HPO) trials (`num_trials`) is set to 3.
- Each model is trained for a maximum of 1000 epochs (`num_epochs`).
- Early stopping is applied with a patience of 3 (`early_stopping_patience`), which corresponds to 3 × 50 = 150 training epochs without improvement in the validation RMSE (since evaluation occurs every 50 epochs).
- The minimum change required for a validation RMSE to be considered an improvement is defined by `min_delta` (set to 0.1).
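
A quick calculation shows how much of the search space a given trial budget can cover and how long the early-stopping window is in epochs. The sketch below uses the search-space values and the settings passed to `perform_hpo` in this notebook (3 trials, patience 3); the helper names are hypothetical, and the coverage formula assumes each configuration is drawn uniformly with replacement.

```python
from math import prod

search_space = {
    "lr": [1e-1, 1e-2, 1e-3],
    "hidden_channels": [8, 16, 32],
    "out_channels": [8, 16],
    "heads": [1, 2],
}

def count_configurations(space):
    """Number of distinct configurations in the full grid."""
    return prod(len(values) for values in space.values())

def expected_distinct(n_configs, num_trials):
    """Expected number of distinct configurations when sampling uniformly with replacement."""
    return n_configs * (1 - (1 - 1 / n_configs) ** num_trials)

def patience_window(patience, eval_every=50):
    """Epochs without improvement before early stopping triggers."""
    return patience * eval_every

n_configs = count_configurations(search_space)  # 3 * 3 * 2 * 2
print(f"Grid size: {n_configs}")
print(f"Expected distinct configs in 3 trials: {expected_distinct(n_configs, 3):.2f}")
print(f"Patience window: {patience_window(3)} epochs")
```

With 36 possible configurations, three trials explore less than a tenth of the grid, so increasing `num_trials` is the most direct way to improve coverage.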

In [23]:
# Define hyperparameter search space (can be extended or modified)
search_space = {
    "lr": [1e-1, 1e-2, 1e-3],                    # Learning rates (optional: 1e-4)
    "hidden_channels": [8, 16, 32],              # Hidden layer sizes (optional: 64)
    "out_channels": [8, 16],                     # Output feature sizes from GNN (optional: 32)
    "heads": [1, 2],                             # Number of attention heads in GATv2 (optional: 4)
    # "dropout": [0.0, 0.1, 0.3, 0.5],           # Dropout
}

train_data = torch.load("../../../data/data_splits/train_data.pt", weights_only=False)
val_data = torch.load("../../../data/data_splits/val_data.pt", weights_only=False)

# Define number of HPO trials and training settings
perform_hpo(train_data, val_data, search_space, num_trials=3, num_epochs=1000, early_stopping_patience=3, min_delta=0.1)



 Trial 1/3 | Config: {'lr': 0.001, 'hidden_channels': 8, 'out_channels': 8, 'heads': 2}
Epoch 001 | Training Loss: 1.0453
Epoch 010 | Training Loss: 1.0133
Epoch 020 | Training Loss: 0.9848
Epoch 030 | Training Loss: 0.9630
Epoch 040 | Training Loss: 0.9473
Epoch 050 | Training Loss: 0.9350
Epoch 050 | Validation Loss (MSE): 0.9282 || RMSE (original scale): 28.72
Epoch 060 | Training Loss: 0.9253


KeyboardInterrupt: 