# ML Hyperparameter Tuning - Random Forest Grid Search

This notebook demonstrates hyperparameter tuning for Random Forest using `ml_tune_grid`, which performs exhaustive grid search over all parameter combinations.

Random Forest trains quickly compared with most deep learning models, which makes it well suited to quick hyperparameter tuning experiments.

Results are written as JSON to a file named `tuning_results`, which is downloaded below together with `job-results.json`.

In [1]:
import openeo # type: ignore

In [2]:
connection = openeo.connect(url="http://127.0.0.1:8000")
connection.authenticate_basic("brian", "123456")

<Connection to 'http://127.0.0.1:8000/' with BasicBearerAuth>

In [3]:
training_set = "https://github.com/e-sensing/sitsdata/raw/main/data/samples_deforestation_rondonia.rds"

In [4]:
# Define parameter grid for Random Forest
# num_trees: number of trees in the forest
# max_variables: number of variables randomly sampled as candidates at each split
#   Note: For tuning, use numeric values. The model default uses "sqrt" which
#   is calculated based on feature count, but for grid search we use fixed values.
param_grid = {
    "num_trees": [50, 100, 200],
    "max_variables": [10, 20, 30]  # Numeric values for grid search
}
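
To make "exhaustive" concrete, the sketch below expands the same grid locally into its individual candidate configurations. It only illustrates the Cartesian product over the grid. How `ml_tune_grid` enumerates and orders the candidates on the backend is not shown in this notebook, so the ordering here is an assumption.

```python
from itertools import product

# Same grid as defined in the cell above.
param_grid = {
    "num_trees": [50, 100, 200],
    "max_variables": [10, 20, 30],
}

# Cartesian product of all value lists: one dict per candidate configuration.
names = list(param_grid)
combinations = [
    dict(zip(names, values)) for values in product(*param_grid.values())
]

print(f"{len(combinations)} combinations")  # 3 values x 3 values
for combo in combinations:
    print(combo)
```

The number of candidates is the product of the list lengths, so each extra parameter or value multiplies the number of models that have to be trained.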

In [5]:
process_graph = {
    "rf1": {
        "process_id": "mlm_class_random_forest",
        "arguments": {
            "num_trees": 100,
            "seed": 42
        },
    },
    "tune1": {
        "process_id": "ml_tune_grid",
        "arguments": {
            "model": {"from_node": "rf1"},
            "training_data": training_set,
            "target": "label",
            "parameters": param_grid,
            "scoring": "accuracy",
            "cv": 0,  # Use hold-out split for faster execution
            "seed": 42
        },
        "result": True,
    },
}

job = connection.create_job(
    process_graph=process_graph,
    title="Random Forest hyperparameter tuning (grid search)",
    description="Exhaustive grid search for Random Forest hyperparameters",
)
job.start_and_wait()
results = job.get_results()

0:00:00 Job 'dd78898df8d1cc9025756843a123fe48': send 'start'
0:00:01 Job 'dd78898df8d1cc9025756843a123fe48': running (progress N/A)
0:00:06 Job 'dd78898df8d1cc9025756843a123fe48': running (progress N/A)
0:00:13 Job 'dd78898df8d1cc9025756843a123fe48': running (progress N/A)
0:00:20 Job 'dd78898df8d1cc9025756843a123fe48': running (progress N/A)
0:00:30 Job 'dd78898df8d1cc9025756843a123fe48': running (progress N/A)
0:00:42 Job 'dd78898df8d1cc9025756843a123fe48': running (progress N/A)
0:00:58 Job 'dd78898df8d1cc9025756843a123fe48': running (progress N/A)
0:01:17 Job 'dd78898df8d1cc9025756843a123fe48': running (progress N/A)
0:01:41 Job 'dd78898df8d1cc9025756843a123fe48': running (progress N/A)
0:02:10 Job 'dd78898df8d1cc9025756843a123fe48': running (progress N/A)
0:02:48 Job 'dd78898df8d1cc9025756843a123fe48': running (progress N/A)
0:03:34 Job 'dd78898df8d1cc9025756843a123fe48': finished (progress N/A)


In [6]:
results.download_files("data/outputs_rf_grid/")

[PosixPath('data/outputs_rf_grid/tuning_results'),
 PosixPath('data/outputs_rf_grid/job-results.json')]

## View Results

Load and display the tuning results.

In [7]:
import json
from pathlib import Path

# Load tuning results
tuning_results_file = Path("data/outputs_rf_grid/tuning_results")
if tuning_results_file.exists():
    with open(tuning_results_file) as f:
        tuning_results = json.load(f)
    
    print(f"Total parameter combinations evaluated: {len(tuning_results)}")
    print("\nAll Results:")
    print(json.dumps(tuning_results, indent=2))
    
    # Find best result
    best = max(tuning_results, key=lambda x: x.get("metric", 0))
    print("\nBest Parameters:")
    print(json.dumps(best, indent=2))
else:
    print(f"Tuning results file not found: {tuning_results_file}")

Total parameter combinations evaluated: 9

All Results:
[
  {
    "max_variables": 10,
    "num_trees": 50,
    "metric": 0.8843594009983361
  },
  {
    "max_variables": 20,
    "num_trees": 50,
    "metric": 0.8910149750415973
  },
  {
    "max_variables": 30,
    "num_trees": 50,
    "metric": 0.8976705490848585
  },
  {
    "max_variables": 10,
    "num_trees": 100,
    "metric": 0.8851913477537438
  },
  {
    "max_variables": 20,
    "num_trees": 100,
    "metric": 0.8968386023294509
  },
  {
    "max_variables": 30,
    "num_trees": 100,
    "metric": 0.9051580698835274
  },
  {
    "max_variables": 10,
    "num_trees": 200,
    "metric": 0.8718801996672213
  },
  {
    "max_variables": 20,
    "num_trees": 200,
    "metric": 0.8768718801996672
  },
  {
    "max_variables": 30,
    "num_trees": 200,
    "metric": 0.8943427620632279
  }
]

Best Parameters:
{
  "max_variables": 30,
  "num_trees": 100,
  "metric": 0.9051580698835274
}
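
The flat list above is easier to compare when ranked and laid out as a `num_trees` by `max_variables` grid. The following is a minimal sketch using only the standard library. The values are copied from the output above so that the block runs on its own; in the notebook, the `tuning_results` list loaded in the previous cell could be used instead.

```python
# Results copied from the output above.
tuning_results = [
    {"max_variables": 10, "num_trees": 50, "metric": 0.8843594009983361},
    {"max_variables": 20, "num_trees": 50, "metric": 0.8910149750415973},
    {"max_variables": 30, "num_trees": 50, "metric": 0.8976705490848585},
    {"max_variables": 10, "num_trees": 100, "metric": 0.8851913477537438},
    {"max_variables": 20, "num_trees": 100, "metric": 0.8968386023294509},
    {"max_variables": 30, "num_trees": 100, "metric": 0.9051580698835274},
    {"max_variables": 10, "num_trees": 200, "metric": 0.8718801996672213},
    {"max_variables": 20, "num_trees": 200, "metric": 0.8768718801996672},
    {"max_variables": 30, "num_trees": 200, "metric": 0.8943427620632279},
]

# Rank all combinations from best to worst accuracy.
ranked = sorted(tuning_results, key=lambda r: r["metric"], reverse=True)
print("Top 3:")
for r in ranked[:3]:
    print(f"  num_trees={r['num_trees']:<4} max_variables={r['max_variables']:<3} "
          f"accuracy={r['metric']:.4f}")

# Pivot into a grid: rows are num_trees, columns are max_variables.
trees = sorted({r["num_trees"] for r in tuning_results})
max_vars = sorted({r["max_variables"] for r in tuning_results})
scores = {(r["num_trees"], r["max_variables"]): r["metric"] for r in tuning_results}

print("\n" + "num_trees".ljust(10) + "".join(f"mv={m}".rjust(10) for m in max_vars))
for t in trees:
    cells = "".join(f"{scores[(t, m)]:.4f}".rjust(10) for m in max_vars)
    print(str(t).ljust(10) + cells)
```

In this run, accuracy increases with `max_variables` for every value of `num_trees`, and 200 trees scored lower than 100 trees at every `max_variables` setting. Because the scores come from a single hold-out split (`cv` set to 0), differences of about 0.01 may partly reflect the particular split rather than a real effect.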
