# Tutorial: Hyperparameter Tuning with Ray Tune (Detailed Guide)

This tutorial shows how `HyperNOs` automates hyperparameter optimization. We use [Ray Tune](https://docs.ray.io/en/latest/tune/index.html) to search automatically for the best model configuration.

> **Prerequisites**: Before getting started, ensure you have set up the environment and downloaded the data. Please refer to the [Installation Guide](README.md#installation) in the main README for details.

## Why is this powerful?
Unlike simple grid searches, this setup allows you to tune:
- **Continuous Parameters**: Learning rates, weight decays (using distributions).
- **Discrete Choices**: Activation functions, architecture types.
- **System Parameters**: Batch sizes, number of modes.

### Tuning External Libraries
Just like the training pipeline, the tuning pipeline is **model-agnostic**. You can use this exact notebook to find the optimal hyperparameters for:
- An official model from `neuraloperator` like `TFNO`, `CODANO`, `UNO`, `RNO`, `LocalNO`, `OTNO` and many others.
- A model from `deepxde` like `DeepONet`, `MIONet`, `POD-DeepONet`, `POD-MIONet`.
- Your own custom `nn.Module` or one of our already implemented models.

You simply define the parameters relevant to *that* model in the `config_space`.

In [None]:
import os
import sys
import torch
from ray import tune

# Ensure the 'neural_operators' package is in the path
sys.path.append(os.getcwd())
sys.path.append(os.path.join(os.getcwd(), "neural_operators"))

from neural_operators.architectures import FNO
from neural_operators.datasets import NO_load_data_model
from neural_operators.loss_fun import loss_selector
from neural_operators.tune import tune_hyperparameters
from neural_operators.utilities import initialize_hyperparameters
from neural_operators.wrappers import wrap_model_builder

## 1. Baseline Configuration

Before we start tuning, we need a baseline. These values are used for any parameter that we *don't* explicitly tune. This configuration is also executed as the first trial to measure baseline performance. Its result serves as the reference against which the tuned models are compared, and it gives the search a known starting point.

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Explicit Defaults (Baseline)
default_params = {
    "training_samples": 128,
    "val_samples": 64,
    "test_samples": 64,
    "learning_rate": 0.001,
    "epochs": 5, 
    "batch_size": 32,
    "weight_decay": 1e-4,
    "beta": 1,
    # ... FNO specific args ...
    "width": 32,
    "modes": 8,
    "n_layers": 2,
    "padding": 10,
    "fno_arc": "Residual",
    "fun_act": "gelu",
    "in_dim": 1,
    "out_dim": 1,
    "fft_norm": None,
    "FourierF": 0,
    "RNN": False,
    "include_grid": 1,
    "weights_norm": "Kaiming",
    "retrain": 4,
    "problem_dim": 2,
    "filename": None
}

## 2. Defining the Search Space

This is the core of the experiment. We define a dictionary where keys are parameter names and values are **distributions**.

- `tune.choice([A, B])`: Randomly pick A or B.
- `tune.uniform(min, max)`: Uniform float sampling.
- `tune.quniform(min, max, q)`: Quantized uniform sampling (good for discrete steps like hidden units).
- `tune.randint(min, max)`: Random integer in `[min, max)`; the upper bound is exclusive.
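To make the sampling rules above concrete, here is a minimal standard-library sketch of what quantized-uniform and integer sampling do. The helper names `sample_quniform` and `sample_randint` are hypothetical and this is not Ray Tune's implementation; it only mirrors the behaviour described in the list, so that it runs without Ray installed.

```python
import random


def sample_quniform(low, high, q, rng):
    """Draw uniformly from [low, high], then snap to the nearest multiple of q."""
    value = rng.uniform(low, high)
    snapped = round(value / q) * q
    # Clamp so that rounding cannot push the result outside the range.
    return min(max(snapped, low), high)


def sample_randint(low, high, rng):
    """Draw an integer from [low, high); the upper bound is excluded."""
    return rng.randrange(low, high)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
learning_rates = [sample_quniform(1e-4, 1e-2, 1e-5, rng) for _ in range(5)]
depths = [sample_randint(2, 4, rng) for _ in range(100)]

print(learning_rates)        # five values on a 1e-5 grid inside [1e-4, 1e-2]
print(sorted(set(depths)))   # only 2 and 3 can appear, never 4
```

The second print is the point to remember when choosing bounds for integer parameters: to include 4 as a candidate depth, the upper bound would have to be 5.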

In [None]:
config_space = {
    # --- Optimization Tuning ---
    # Explore learning rates between 0.0001 and 0.01
    "learning_rate": tune.quniform(1e-4, 1e-2, 1e-5),
    # Explore weight decay regularization
    "weight_decay": tune.quniform(1e-6, 1e-3, 1e-6),
    
    # --- Architecture Tuning ---
    # Try different network widths (channel capacity)
    "width": tune.choice([16, 32]),
    # Try different depths; randint excludes the upper bound, so this samples 2 or 3
    "n_layers": tune.randint(2, 4),
    # Try different number of Fourier modes
    "modes": tune.choice([8, 12]),
    
    # --- Component Tuning ---
    # Compare Activation functions
    "fun_act": tune.choice(["gelu", "relu"])
}

# Merge strategy: Take defaults, overwrite with search space keys
fixed_params = default_params.copy()
for param in config_space.keys():
    fixed_params.pop(param, None)

config_space.update(fixed_params)

# Start the search with our known 'best' default (Optional but recommended)
default_hyper_params = [default_params]
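The merge step in the cell above can be checked in isolation. The following sketch uses plain strings as stand-ins for the `tune.*` distributions (an assumption made only so that it runs without Ray) and a shortened, illustrative set of defaults.

```python
# Strings stand in for the tune.* distributions so the sketch runs without Ray.
defaults = {"learning_rate": 0.001, "width": 32, "batch_size": 32, "epochs": 5}
search = {"learning_rate": "<distribution>", "width": "<distribution>"}

# Keep only the defaults that are not being tuned...
fixed = {key: value for key, value in defaults.items() if key not in search}
# ...and add them to the search space as constants.
merged = {**search, **fixed}

print(merged)
```

After the merge, every key from the defaults is present exactly once: tuned keys carry a distribution, all others carry their fixed baseline value. This is why later cells can read entries such as `config_space["problem_dim"]` directly.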

## 3. The Builder Interface

The `tune_hyperparameters` function works by calling the builder defined below for *every trial*, each time with a new `config` sampled from the space above.

This allows for dynamic graph construction. For example, if `config["n_layers"]` changes, the model effectively grows or shrinks.
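As a rough illustration of the per-trial builder pattern, the sketch below uses a hypothetical `toy_builder` that returns a list of layer widths instead of a real network. The function and the two example configs are illustrative assumptions, not part of `HyperNOs`.

```python
def toy_builder(config):
    """Hypothetical stand-in for a model builder: one width entry per layer."""
    return [config["width"]] * config["n_layers"]


# Two configs of the kind a tuner might sample for successive trials.
trial_configs = [
    {"width": 16, "n_layers": 2},
    {"width": 32, "n_layers": 3},
]

for config in trial_configs:
    layers = toy_builder(config)
    print(config, "->", layers)
```

Because the builder receives the sampled `config` on every call, nothing about the model's shape has to be fixed ahead of time; the same function yields a two-layer or a three-layer structure depending on the trial.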

In [None]:
which_example = "darcy"

# Load the example once up front, only to obtain the output normalizer
dummy_example = NO_load_data_model(
    which_example=which_example,
    no_architecture={
        "FourierF": default_hyper_params[0]["FourierF"],
        "retrain": default_hyper_params[0]["retrain"],
    },
    batch_size=default_hyper_params[0]["batch_size"],
    training_samples=default_hyper_params[0]["training_samples"],
    filename=default_hyper_params[0].get("filename", None),
)

def builder(config):
    return FNO(
        config["problem_dim"],
        config["in_dim"],
        config["width"],
        config["out_dim"],
        config["n_layers"],
        config["modes"],
        config["fun_act"],
        config["weights_norm"],
        config["fno_arc"],
        config["RNN"],
        config["fft_norm"],
        config["padding"],
        device,
        (
            dummy_example.output_normalizer
            if config.get("internal_normalization", False)
            else None
        ),
        config["retrain"],
    )

model_builder = wrap_model_builder(builder, which_example)

# Build the dataset for a trial from its sampled config
def dataset_builder(config):
    return NO_load_data_model(
        which_example=which_example,
        no_architecture={
            "FourierF": config.get("FourierF", 0),
            "retrain": config.get("retrain", 42),
        },
        batch_size=config["batch_size"],
        training_samples=config["training_samples"],
        filename=config.get("filename", None),
    )

## 4. Launching the Experiment

We use `tune_hyperparameters` to kick off the Ray session. This will:
1.  Allocate resources (`runs_per_cpu`, `runs_per_gpu`).
2.  Schedule parallel trials.
3.  Log metrics (loss across epochs).

Results will be saved in `../tests/<experiment_name>`.
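To get a feel for how per-trial resource requests limit parallelism, here is a small back-of-the-envelope sketch. It assumes that each trial reserves a fixed number of CPU cores and a (possibly fractional) share of a GPU. The helper `max_concurrent_trials` and the machine sizes are hypothetical, and the actual scheduling is done by Ray.

```python
import math


def max_concurrent_trials(total_cpus, total_gpus, cpus_per_trial, gpus_per_trial):
    """Upper bound on trials that fit at once under per-trial resource requests."""
    # The small epsilon guards against floating-point error in the division.
    by_cpu = math.floor(total_cpus / cpus_per_trial + 1e-9)
    if gpus_per_trial == 0:
        return by_cpu  # CPU-only trials are limited by cores alone
    by_gpu = math.floor(total_gpus / gpus_per_trial + 1e-9)
    return min(by_cpu, by_gpu)


# Hypothetical machine: 16 logical cores and 1 GPU.
print(max_concurrent_trials(16, 1, 5.0, 1.0))  # 1: the single GPU is the bottleneck
print(max_concurrent_trials(16, 1, 5.0, 0.5))  # 2: two trials share the GPU
```

Lowering the GPU share per trial only helps if the models fit side by side in GPU memory, so it is worth checking memory usage before halving the request.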

In [None]:
loss_fn_str = "L2"

loss_fn = loss_selector(
    loss_fn_str=loss_fn_str,
    problem_dim=config_space["problem_dim"],
    beta=config_space["beta"],
)

tune_hyperparameters(
    config_space,
    model_builder,
    dataset_builder,
    loss_fn,
    default_hyper_params,
    # --- Resource Allocation ---
    # Adjust these based on your machine's logical cores and GPUs.
    # Fractional values let several trials share one device (e.g. 0.5 = 2 trials per GPU).
    runs_per_cpu=5.0,  # number of logical CPU cores allocated per trial
    runs_per_gpu=1.0,  # fraction of a GPU allocated per trial
    
    # --- Tuning Budget ---
    # Total number of trials (combinations) to sample from the config_space
    num_samples=20,
    
    # Maximum training epochs per trial
    max_epochs=5,
    
    # --- Early Stopping (ASHA Scheduler) ---
    # Grace period: Minimum epochs to run before considering stopping a trial
    grace_period=2,
    
    # Reduction factor: controls how aggressively trials are pruned.
    # E.g., 4 means only the top 1/4 of trials are kept at each rung.
    reduction_factor=4,
    
    # --- Checkpointing ---
    # Frequency (in epochs) to save model state
    checkpoint_freq=10,
)
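The early-stopping settings in the call above can be reasoned about with a simplified model of successive halving. The sketch below assumes an idealized, synchronous version: decision points ("rungs") sit at `grace_period`, then `grace_period * reduction_factor`, and so on up to `max_epochs`, and only the top `1 / reduction_factor` of trials pass each rung. ASHA itself makes these decisions asynchronously, so real counts will differ somewhat. Both helper functions are hypothetical.

```python
def rung_milestones(grace_period, reduction_factor, max_epochs):
    """Epochs at which trials are compared and possibly stopped."""
    milestones = []
    epoch = grace_period
    while epoch <= max_epochs:
        milestones.append(epoch)
        epoch *= reduction_factor
    return milestones


def survivors_per_rung(num_trials, grace_period, reduction_factor, max_epochs):
    """Idealized count of trials that continue past each rung."""
    remaining = num_trials
    schedule = []
    for milestone in rung_milestones(grace_period, reduction_factor, max_epochs):
        remaining = max(1, remaining // reduction_factor)
        schedule.append((milestone, remaining))
    return schedule


# Settings from the call above: 20 trials, grace_period=2, reduction_factor=4, max_epochs=5.
print(rung_milestones(2, 4, 5))         # [2]
print(survivors_per_rung(20, 2, 4, 5))  # [(2, 5)]
```

Under these assumptions there is a single rung at epoch 2, after which roughly 5 of the 20 trials train for the full 5 epochs. With such a small epoch budget, a reduction factor of 4 prunes most trials very early; a longer `max_epochs` would add further rungs.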