# LG-CoTrain: Per-Experiment Optuna Hyperparameter Tuning

This notebook runs **120 separate Optuna studies**, one for each
(event, budget, seed_set) combination (10 events × 4 budgets × 3 seed sets),
to find experiment-specific hyperparameters.

### Why per-experiment tuning?

- Different disaster events and budget levels may benefit from different
  hyperparameters (e.g., low-budget experiments may need different LR or patience).
- Each study maximizes `dev_macro_f1` on the dev split, so **no test-set information leaks** into the chosen hyperparameters.
- Results are saved as JSON files for easy inspection and reuse.

### Search space (6 hyperparameters)

| Parameter | Type | Range | Default |
|-----------|------|-------|--------|
| `lr` | Float (log) | 1e-5 to 1e-3 | 2e-5 |
| `batch_size` | Categorical | [8, 16, 32, 64] | 32 |
| `cotrain_epochs` | Integer | 5 to 20 | 10 |
| `finetune_patience` | Integer | 4 to 10 | 5 |
| `weight_decay` | Float | 0.0 to 0.1 | 0.01 |
| `warmup_ratio` | Float | 0.0 to 0.3 | 0.1 |
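
The table maps naturally onto Optuna's `trial.suggest_*` methods. The sketch below shows one way the six parameters could be sampled. `suggest_hparams` is a hypothetical helper written for illustration; it is an assumption, not necessarily how `lg_cotrain.optuna_per_experiment` defines its search space.

```python
def suggest_hparams(trial):
    """Sample one configuration from the search space in the table above.

    `trial` is expected to be an `optuna.trial.Trial` (or any object exposing
    the same `suggest_*` methods). Hypothetical helper, for illustration only.
    """
    return {
        # log=True samples the learning rate uniformly in log space
        "lr": trial.suggest_float("lr", 1e-5, 1e-3, log=True),
        "batch_size": trial.suggest_categorical("batch_size", [8, 16, 32, 64]),
        # suggest_int bounds are inclusive on both ends
        "cotrain_epochs": trial.suggest_int("cotrain_epochs", 5, 20),
        "finetune_patience": trial.suggest_int("finetune_patience", 4, 10),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
        "warmup_ratio": trial.suggest_float("warmup_ratio", 0.0, 0.3),
    }
```

An objective function would typically call this helper, train with the returned values, and return the dev macro-F1 to a study created with `optuna.create_study(direction="maximize")`.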

### Workflow

1. Run this notebook to find 120 sets of optimal hyperparameters
2. Use notebook 08 to run final experiments with the optimized hyperparameters
3. Compare against run-3 (default hyperparameters)

### Incremental scaling

Results are stored under `trials_{n}/` subfolders. Running with a higher
`N_TRIALS` automatically continues from previous results:

- First run with `N_TRIALS=10` → saves to `trials_10/`
- Later run with `N_TRIALS=20` → loads 10 previous trials, runs 10 new, saves to `trials_20/`
- Both `trials_10/` and `trials_20/` coexist — no data is overwritten
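
The lookup of "previous results" described above can be sketched as follows. This is a minimal illustration under the assumption that the most recent smaller `trials_{n}/` folder is the one to continue from; `find_previous_trials_dir` is a hypothetical name, not a function of the package.

```python
import re
from pathlib import Path
from typing import Optional


def find_previous_trials_dir(study_dir: Path, n_trials: int) -> Optional[Path]:
    """Return the `trials_{n}` folder with the largest n below `n_trials`, if any."""
    study_dir = Path(study_dir)
    if not study_dir.is_dir():
        return None
    best_n, best_dir = -1, None
    for child in study_dir.iterdir():
        match = re.fullmatch(r"trials_(\d+)", child.name)
        if match and child.is_dir():
            n = int(match.group(1))
            # Keep the largest trial count that is still strictly smaller
            if best_n < n < n_trials:
                best_n, best_dir = n, child
    return best_dir
```

With `trials_10/` on disk, a call with `n_trials=20` returns the `trials_10/` path, while a call with `n_trials=10` returns `None` because there is nothing smaller to continue from.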

### Resume support

Studies whose `trials_{N_TRIALS}/best_params.json` already exists are
automatically skipped. You can interrupt and restart this notebook safely.
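
The skip rule amounts to a file-existence check. The sketch below illustrates it with a hypothetical helper, `is_study_done`; the actual check inside the package may differ in detail.

```python
from pathlib import Path


def is_study_done(study_dir, n_trials: int) -> bool:
    """True when `trials_{n_trials}/best_params.json` already exists for this study."""
    return (Path(study_dir) / f"trials_{n_trials}" / "best_params.json").is_file()
```

Because the check is keyed on the trial count, a finished 10-trial study does not block a later 20-trial run of the same experiment.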

In [1]:
import importlib
import sys
import time
from pathlib import Path


def _find_repo_root(marker: str = "lg_cotrain") -> Path:
    for candidate in [Path().resolve()] + list(Path().resolve().parents):
        if (candidate / marker).is_dir():
            return candidate
    raise RuntimeError(
        f"Cannot find repo root: no ancestor directory contains '{marker}/'. "
        "Run the notebook from inside the repository."
    )


repo_root = _find_repo_root()
if str(repo_root) not in sys.path:
    sys.path.insert(0, str(repo_root))

import lg_cotrain.optuna_per_experiment
importlib.reload(lg_cotrain.optuna_per_experiment)
from lg_cotrain.optuna_per_experiment import (
    ALL_EVENTS, BUDGETS, SEED_SETS,
    run_all_studies, load_best_params,
)

print(f"Repo root: {repo_root}")
print(f"Events ({len(ALL_EVENTS)}): {ALL_EVENTS}")
print(f"Budgets: {BUDGETS}")
print(f"Seed sets: {SEED_SETS}")
print(f"Total studies: {len(ALL_EVENTS) * len(BUDGETS) * len(SEED_SETS)}")

Repo root: D:\Workspace\Co-Training
Events (10): ['california_wildfires_2018', 'canada_wildfires_2016', 'cyclone_idai_2019', 'hurricane_dorian_2019', 'hurricane_florence_2018', 'hurricane_harvey_2017', 'hurricane_irma_2017', 'hurricane_maria_2017', 'kaikoura_earthquake_2016', 'kerala_floods_2018']
Budgets: [5, 10, 25, 50]
Seed sets: [1, 2, 3]
Total studies: 120


## 1. Configuration

- **`N_TRIALS`**: Number of Optuna trials per study. More trials give the sampler
  more chances to find good hyperparameters but lengthen the runtime proportionally. Start with 10-15.
- **`NUM_GPUS`**: Number of GPUs for parallel study execution. Each study
  runs on one GPU; multiple studies run simultaneously.
- **`STORAGE_DIR`**: Where to save results. Each trial count gets its own
  subfolder (`trials_{N_TRIALS}/`), so increasing `N_TRIALS` later won't
  overwrite previous results.

### Runtime estimate

- ~7 min per pipeline run × `N_TRIALS` per study × 120 studies, divided by `NUM_GPUS`
- With 2 GPUs: `N_TRIALS=15` -> 120 × 15 × 7 min / 2 ≈ 105 hours (~4.4 days)
- Incremental: if you previously ran with `N_TRIALS=10`, running `N_TRIALS=20`
  only executes 10 new trials per study, about half the cost of a fresh 20-trial run
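
The arithmetic behind these estimates can be written as a small function. This is a rough sketch that assumes perfectly even scaling across GPUs and a constant 7 minutes per trial; `estimate_hours` and its `prev_trials` argument are hypothetical names introduced here.

```python
def estimate_hours(n_studies, n_trials, prev_trials=0,
                   minutes_per_trial=7.0, num_gpus=2):
    """Estimated wall-clock hours, counting only trials not yet executed."""
    new_trials = max(n_trials - prev_trials, 0)
    return n_studies * new_trials * minutes_per_trial / 60 / max(num_gpus, 1)


print(estimate_hours(120, 15))                  # 105.0
print(estimate_hours(120, 20, prev_trials=10))  # 70.0
```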

In [2]:
# ---- Tuning Configuration ----

N_TRIALS  = 10        # Optuna trials per study (incremental: continues from previous runs)
NUM_GPUS  = 2         # Number of GPUs for parallel execution

DATA_ROOT    = str(repo_root / "data")
STORAGE_DIR  = str(repo_root / "results" / "optuna" / "per_experiment")

PSEUDO_LABEL_SOURCE = "gpt-4o"

# Optionally restrict to a subset (set to None for all)
EVENTS    = None   # or e.g. ["hurricane_harvey_2017", "kerala_floods_2018"]
BUDGETS_  = None   # or e.g. [5, 50]
SEEDS_    = None   # or e.g. [1]

events_to_use  = EVENTS or ALL_EVENTS
budgets_to_use = BUDGETS_ or BUDGETS
seeds_to_use   = SEEDS_ or SEED_SETS
total_studies   = len(events_to_use) * len(budgets_to_use) * len(seeds_to_use)
total_trials    = total_studies * N_TRIALS

est_hours = total_trials * 7 / 60 / max(NUM_GPUS, 1)

print(f"Studies        : {total_studies}")
print(f"Trials/study   : {N_TRIALS}")
print(f"Total trials   : {total_trials} (max — fewer if continuing from previous runs)")
print(f"GPUs           : {NUM_GPUS}")
print(f"Storage dir    : {STORAGE_DIR}")
print(f"Results folder : trials_{N_TRIALS}/")
print(f"Est. runtime   : ~{est_hours:.0f} hours ({est_hours/24:.1f} days) from scratch")
print()
print("Search space:")
print("  lr               : 1e-5 to 1e-3  (log-uniform)")
print("  batch_size       : [8, 16, 32, 64]")
print("  cotrain_epochs   : 5 to 20")
print("  finetune_patience: 4 to 10")
print("  weight_decay     : 0.0 to 0.1")
print("  warmup_ratio     : 0.0 to 0.3")

Studies        : 120
Trials/study   : 10
Total trials   : 1200 (max — fewer if continuing from previous runs)
GPUs           : 2
Storage dir    : D:\Workspace\Co-Training\results\optuna\per_experiment
Results folder : trials_10/
Est. runtime   : ~70 hours (2.9 days) from scratch

Search space:
  lr               : 1e-5 to 1e-3  (log-uniform)
  batch_size       : [8, 16, 32, 64]
  cotrain_epochs   : 5 to 20
  finetune_patience: 4 to 10
  weight_decay     : 0.0 to 0.1
  warmup_ratio     : 0.0 to 0.3


## 2. Run All Optuna Studies

This cell launches all studies. With `NUM_GPUS > 1`, studies run in parallel
across GPUs using `ProcessPoolExecutor` with spawn context.

**Progress tracking**: After each study completes, you'll see the event,
budget, seed, status, and best dev F1 found.

**Incremental**: If previous trials exist (e.g., `trials_10/`), they are
replayed into the TPE sampler and only the remaining trials execute.

**Resume**: If `trials_{N_TRIALS}/best_params.json` already exists for a
study, that study is skipped entirely.

In [None]:
class StudyProgressTracker:
    """Track progress across all Optuna studies."""

    def __init__(self, total_studies: int, start_time: float):
        self.total = total_studies
        self.done = 0
        self.start_time = start_time

    def on_study_done(self, event, budget, seed_set, status):
        """Print one progress line with elapsed time and a linear ETA."""
        self.done += 1
        elapsed = time.time() - self.start_time
        # self.done >= 1 here, so the per-study average is always defined.
        # Skipped studies finish instantly and make the ETA optimistic.
        remaining = (self.total - self.done) * (elapsed / self.done)
        print(
            f"  [{self.done}/{self.total}] {event} b={budget} s={seed_set}"
            f" -> {status} | Elapsed: {elapsed / 3600:.2f}h | ETA: {remaining / 3600:.2f}h"
        )


start_time = time.time()
tracker = StudyProgressTracker(total_studies, start_time)

all_results = run_all_studies(
    events=events_to_use,
    budgets=budgets_to_use,
    seed_sets=seeds_to_use,
    n_trials=N_TRIALS,
    num_gpus=NUM_GPUS,
    storage_dir=STORAGE_DIR,
    data_root=DATA_ROOT,
    pseudo_label_source=PSEUDO_LABEL_SOURCE,
    on_study_done=tracker.on_study_done,
)

elapsed = time.time() - start_time
print(f"\nTotal time: {elapsed / 3600:.2f}h ({elapsed / 60:.1f}min)")

  california_wildfires_2018 budget=5 seed=1 -- SKIPPED (trials_10 exists)
  [1/120] california_wildfires_2018 b=5 s=1 -> skipped | Elapsed: 0.00h | ETA: 0.00h
  california_wildfires_2018 budget=5 seed=2 -- SKIPPED (trials_10 exists)
  [2/120] california_wildfires_2018 b=5 s=2 -> skipped | Elapsed: 0.00h | ETA: 0.00h

Optuna per-experiment: 120 total, 2 skipped, 118 pending
Running 118 Optuna studies in parallel across 2 GPUs...


## 3. Results Overview

Load the summary and display a table of the best hyperparameters for every completed study.

In [None]:
import json
from pathlib import Path

# Load summary for the current N_TRIALS
summary_path = Path(STORAGE_DIR) / f"summary_{N_TRIALS}.json"
with open(summary_path) as f:
    summary = json.load(f)

print(f"Total studies : {summary['total_studies']}")
print(f"Completed     : {summary['completed']}")
print(f"Failed        : {summary['failed']}")
print(f"Trials/study  : {summary['n_trials_per_study']}")
print()

# Show best results table
done_studies = [s for s in summary['studies'] if s['status'] == 'done']

print(f"{'Event':>30}  {'Budget':>6}  {'Seed':>4}  {'Best Dev F1':>11}  "
      f"{'LR':>10}  {'Batch':>5}  {'CoEp':>4}  {'Pat':>3}  {'WD':>6}  {'WR':>5}")
print("-" * 105)

for s in done_studies:
    bp = s['best_params']
    if bp is None or s['best_value'] is None:
        continue
    print(
        f"{s['event']:>30}  {s['budget']:>6}  {s['seed_set']:>4}  "
        f"{s['best_value']:>11.4f}  "
        f"{bp.get('lr', 0):>10.2e}  "
        f"{bp.get('batch_size', 0):>5}  "
        f"{bp.get('cotrain_epochs', 0):>4}  "
        f"{bp.get('finetune_patience', 0):>3}  "
        f"{bp.get('weight_decay', 0):>6.4f}  "
        f"{bp.get('warmup_ratio', 0):>5.3f}"
    )

# Aggregate stats (skip studies without a recorded best value)
import statistics

values = [s['best_value'] for s in done_studies if s['best_value'] is not None]
if values:
    print("\nBest dev F1 across all studies:")
    print(f"  Mean: {statistics.mean(values):.4f}")
    if len(values) > 1:  # stdev needs at least two data points
        print(f"  Std:  {statistics.stdev(values):.4f}")
    print(f"  Min:  {min(values):.4f}")
    print(f"  Max:  {max(values):.4f}")

## 4. Visualizations

Box plots of best dev F1 by event and budget, and parameter distribution
analysis across all 120 studies.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

done_studies = [s for s in summary['studies'] if s['status'] == 'done' and s['best_value'] is not None]

# --- Box plot: Best dev F1 by event ---
fig, axes = plt.subplots(1, 2, figsize=(16, 6))

# Group by event
event_groups = {}
for s in done_studies:
    event_groups.setdefault(s['event'], []).append(s['best_value'])

events_sorted = sorted(event_groups.keys())
data_by_event = [event_groups[e] for e in events_sorted]
labels_event = [e.replace('_', '\n') for e in events_sorted]

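# Set tick labels separately: boxplot's `labels` keyword is deprecated since Matplotlib 3.9
axes[0].boxplot(data_by_event)
axes[0].set_xticklabels(labels_event)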
axes[0].set_ylabel('Best dev macro-F1')
axes[0].set_title('Best Dev F1 by Event')
axes[0].tick_params(axis='x', rotation=45, labelsize=8)
axes[0].grid(True, alpha=0.3, axis='y')

# Group by budget
budget_groups = {}
for s in done_studies:
    budget_groups.setdefault(s['budget'], []).append(s['best_value'])

budgets_sorted = sorted(budget_groups.keys())
data_by_budget = [budget_groups[b] for b in budgets_sorted]

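# Set tick labels separately: boxplot's `labels` keyword is deprecated since Matplotlib 3.9
axes[1].boxplot(data_by_budget)
axes[1].set_xticklabels([str(b) for b in budgets_sorted])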
axes[1].set_xlabel('Budget')
axes[1].set_ylabel('Best dev macro-F1')
axes[1].set_title('Best Dev F1 by Budget')
axes[1].grid(True, alpha=0.3, axis='y')

plt.suptitle('Per-Experiment Optuna: Best Dev F1 Distribution', fontsize=13)
plt.tight_layout()
plt.show()

# --- Parameter distributions ---
params = ['lr', 'batch_size', 'cotrain_epochs', 'finetune_patience', 'weight_decay', 'warmup_ratio']
fig, axes = plt.subplots(2, 3, figsize=(15, 8))

for ax, param in zip(axes.flat, params):
    # Guard against studies whose best_params is missing (None)
    values = [s['best_params'][param] for s in done_studies
              if s['best_params'] and param in s['best_params']]
    if not values:
        continue

    if param == 'lr':
        log_values = [np.log10(v) for v in values]
        ax.hist(log_values, bins=15, alpha=0.7, color='tab:blue', edgecolor='white')
        ax.set_xlabel(f'{param} (log10)')
    elif param == 'batch_size':
        from collections import Counter
        counts = Counter(values)
        categories = [8, 16, 32, 64]
        bar_counts = [counts.get(c, 0) for c in categories]
        ax.bar([str(c) for c in categories], bar_counts, alpha=0.7,
               color='tab:blue', edgecolor='white')
        ax.set_xlabel(param)
    else:
        ax.hist(values, bins=15, alpha=0.7, color='tab:blue', edgecolor='white')
        ax.set_xlabel(param)

    ax.set_ylabel('Count')
    ax.set_title(f'Best {param} across studies')
    ax.grid(True, alpha=0.3, axis='y')

plt.suptitle(f'Optimal Hyperparameter Distributions ({len(done_studies)} Studies)', fontsize=13)
plt.tight_layout()
plt.show()

## 5. Next Steps

Use the optimized hyperparameters to run the final 120 experiments.
The `load_best_params()` function loads all `best_params.json` files
into a dict keyed by `(event, budget, seed_set)`.

### CLI equivalent

```bash
# Run all 120 Optuna studies with 10 trials each
python -m lg_cotrain.optuna_per_experiment --n-trials 10 --num-gpus 2

# Later, scale to 20 trials (continues from 10, only 10 new trials per study)
python -m lg_cotrain.optuna_per_experiment --n-trials 20 --num-gpus 2

# Run a subset
python -m lg_cotrain.optuna_per_experiment --n-trials 15 --num-gpus 2 \
    --events hurricane_harvey_2017 kerala_floods_2018 --budgets 50 --seed-sets 1
```

### Using optimized hyperparameters for final experiments

```python
from lg_cotrain.optuna_per_experiment import load_best_params

# Load latest results (highest trial count)
best = load_best_params("results/optuna/per_experiment")
params = best[("hurricane_harvey_2017", 50, 1)]["best_params"]
# params = {"lr": 0.0003, "batch_size": 16, ...}

# Or load from a specific trial count
best_10 = load_best_params("results/optuna/per_experiment", n_trials=10)
```

### Storage structure

```
results/optuna/per_experiment/
  {event}/
    {budget}_set{seed_set}/
      trials_10/best_params.json   # 10-trial results
      trials_20/best_params.json   # 20-trial results (all 20 trials)
  summary_10.json
  summary_20.json
```
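
To make the layout concrete, here is a minimal sketch of walking this tree and building a dict keyed by `(event, budget, seed_set)`. It is an illustration only: `collect_best_params` is a hypothetical stand-in and does not reproduce the actual `load_best_params()` implementation, and it assumes experiment folders are named exactly `{budget}_set{seed_set}`.

```python
import json
import re
from pathlib import Path


def collect_best_params(storage_dir, n_trials):
    """Map (event, budget, seed_set) -> parsed best_params.json for one trial count."""
    results = {}
    pattern = re.compile(r"(\d+)_set(\d+)")
    for path in Path(storage_dir).glob(f"*/*/trials_{n_trials}/best_params.json"):
        exp_dir = path.parent.parent          # .../{event}/{budget}_set{seed_set}
        match = pattern.fullmatch(exp_dir.name)
        if match is None:
            continue                          # ignore folders with another naming scheme
        key = (exp_dir.parent.name, int(match.group(1)), int(match.group(2)))
        with open(path) as f:
            results[key] = json.load(f)
    return results
```

For example, a file at `hurricane_harvey_2017/50_set1/trials_10/best_params.json` ends up under the key `("hurricane_harvey_2017", 50, 1)` when called with `n_trials=10`.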