# Fine-tuning with Ultralytics YOLO on Multiple GPUs

This notebook demonstrates fine-tuning Ultralytics YOLO models (YOLO11/YOLOv8) on a custom dataset, with training distributed across multiple GPUs on a single node.

**References:**
- https://docs.ultralytics.com/modes/train/
- https://docs.ultralytics.com/datasets/detect/

**Prerequisites**
- Databricks Runtime 17.3 LTS ML (needed for NumPy 2.x compatibility)
- A cluster with multiple GPUs
- Cluster started with `scripts/init_script_ultralytics.sh` init script
- **YOLO dataset already prepared** with:
  - `data.yaml` config file in your dataset volume
  - Images in `images/` directory
  - Labels in `labels/` directory (YOLO format: class_id center_x center_y width height) 


In [None]:
%pip install -U mlflow psutil nvidia-ml-py
%restart_python


## Setup and Configuration

Initialize all variables needed for training.


In [None]:
import mlflow
import os
from pathlib import Path

# Dataset configuration
ds_catalog = 'brian_ml_dev'
ds_schema = 'image_processing'
dataset_volume = 'coco_dataset'
logging_folder = 'training'

# Paths
dataset_path = f"/Volumes/{ds_catalog}/{ds_schema}/{dataset_volume}"
config_path = f"{dataset_path}/data.yaml"  # YOLO config file (must exist)
training_volume_path = "/local_disk0/ultralytics_logging_folder"  # local disk for fast writes during training
logging_vol_path = f"/Volumes/{ds_catalog}/{ds_schema}/{logging_folder}"

# MLflow
mlflow_experiment = '/Users/brian.law@databricks.com/brian_yolo_training'

# MLflow connectivity
browser_host = spark.conf.get("spark.databricks.workspaceUrl")
db_host = f"https://{browser_host}"
db_token = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().get()
workspace_id = dbutils.notebook.entry_point.getDbutils().notebook().getContext().tags().apply("orgId")

# YOLO model to start from (yolo11n.pt, yolo11s.pt, yolo11m.pt, yolo11l.pt, yolo11x.pt)
YOLO_MODEL = 'yolo11n.pt'

print(f"Dataset location: {dataset_path}")
print(f"Config file: {config_path}")
print(f"Training outputs: {training_volume_path}")

In [None]:
# Create Databricks widgets for hyperparameters
dbutils.widgets.text("epochs", "2", "Epochs")
dbutils.widgets.text("batch_size", "128", "Batch Size")
dbutils.widgets.text("img_size", "640", "Image Size")
dbutils.widgets.text("initial_lr", "0.005", "Initial Learning Rate")
dbutils.widgets.text("final_lr", "0.1", "Final LR Factor")
dbutils.widgets.text("device_config", "[0,1]", "Device Config (e.g., [0,1])")
dbutils.widgets.text("run_name", "multi_gpu_run", "Run Name")

# Get hyperparameters from widgets
EPOCHS = int(dbutils.widgets.get("epochs"))
BATCH_SIZE = int(dbutils.widgets.get("batch_size"))
IMG_SIZE = int(dbutils.widgets.get("img_size"))
initial_lr = float(dbutils.widgets.get("initial_lr"))
final_lr = float(dbutils.widgets.get("final_lr"))
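import ast  # literal_eval only parses Python literals; unlike eval it cannot execute arbitrary code
device_config = ast.literal_eval(dbutils.widgets.get("device_config"))  # Parse list from string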
run_name = dbutils.widgets.get("run_name")

print("Training Hyperparameters:")
print(f"  Epochs: {EPOCHS}")
print(f"  Batch Size: {BATCH_SIZE}")
print(f"  Image Size: {IMG_SIZE}")
print(f"  Initial LR: {initial_lr}")
print(f"  Final LR Factor: {final_lr}")
print(f"  Device Config: {device_config}")
print(f"  Run Name: {run_name}")

## Validate Dataset

Verify that the YOLO dataset is properly configured and ready for training.
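
Beyond checking `data.yaml`, it can also help to confirm that images and labels line up. The sketch below lists images that have no matching label file, assuming `labels/` mirrors the layout of `images/` and each label shares its image's file stem. The helper name `find_unlabeled_images` and the suffix set are illustrative. As far as we know, Ultralytics treats images without a label file as background, so a non-empty result is a prompt to inspect the data rather than a hard error.

```python
from pathlib import Path

# Assumption: these are the image types present in the dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def find_unlabeled_images(images_dir, labels_dir):
    """Return image paths under images_dir with no matching .txt under labels_dir."""
    images_dir, labels_dir = Path(images_dir), Path(labels_dir)
    missing = []
    for image in sorted(images_dir.rglob("*")):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip directories and non-image files
        # images/train/a.jpg is expected to pair with labels/train/a.txt
        label = labels_dir / image.relative_to(images_dir).with_suffix(".txt")
        if not label.exists():
            missing.append(image)
    return missing
```

In this notebook it could be called as `find_unlabeled_images(f"{dataset_path}/images", f"{dataset_path}/labels")`.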


In [None]:
import yaml
from pathlib import Path

# Verify config file exists
if not Path(config_path).exists():
    raise FileNotFoundError(f"Config file not found: {config_path}\n"
                          f"Please run dataset preparation first.")

# Load and display config
with open(config_path, 'r') as f:
    yolo_config = yaml.safe_load(f)

print("YOLO Dataset Configuration:")
print(f"  Path: {yolo_config['path']}")
print(f"  Train: {yolo_config['train']}")
print(f"  Val: {yolo_config['val']}")
print(f"  Classes: {yolo_config['nc']}")
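# 'names' may be a list or an {index: name} dict, depending on how data.yaml was written
names = yolo_config['names']
class_names = list(names.values()) if isinstance(names, dict) else list(names)
print(f"  Class names (first 10): {class_names[:10]}...")
print("\n✓ Dataset configuration validated!")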


## Multi-GPU Training with TorchDistributor

For multi-GPU training on Databricks, we use `TorchDistributor` which properly manages distributed processes. This is required because Ultralytics' built-in `device=[0,1]` approach doesn't work in Databricks notebook environments.

**How it works:**
- TorchDistributor spawns one process per GPU
- Each process runs the training script with its own `local_rank`
- NCCL handles communication between GPUs
- Only rank 0 handles MLflow logging
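
The rank handling described above can be sketched as follows. This assumes the worker processes are started with the standard torchrun-style environment variables (`RANK`, `LOCAL_RANK`, `WORLD_SIZE`); the helper names are illustrative and are not taken from the training script.

```python
import os


def read_dist_env(env=None):
    """Read the process's distributed identity, defaulting to a single-process setup."""
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", 0)),              # global index of this process
        "local_rank": int(env.get("LOCAL_RANK", 0)),  # GPU index on this node
        "world_size": int(env.get("WORLD_SIZE", 1)),  # total number of processes
    }


def is_main_process(env=None):
    """True only for rank 0, the process that should log to MLflow."""
    return read_dist_env(env)["rank"] == 0


info = read_dist_env()
if is_main_process():
    print(f"Rank 0 of {info['world_size']}: this process would handle MLflow logging")
```

Guarding logging with a check like `is_main_process()` keeps the other workers from writing duplicate params and metrics to the same run.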


In [None]:
import torch

# Check GPU availability
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    gpu_count = torch.cuda.device_count()
    print(f"Available GPUs: {gpu_count}")
    for i in range(gpu_count):
        print(f"  GPU {i}: {torch.cuda.get_device_name(i)}")
else:
    raise RuntimeError("CUDA not available. This notebook requires GPU.")

In [None]:
# Prepare configuration for training script
import json

# Extract Databricks job identifiers. The tags are only set inside a job,
# so interactive runs fall back to the defaults below.
try:
    #job_run_id = dbutils.notebook.entry_point.getDbutils().notebook().getContext().tags().get("runId").get()
    job_run_id = dbutils.notebook.entry_point.getDbutils().notebook().getContext().tags().apply("multitaskParentRunId")
    run_id = job_run_id
except Exception:  # tag missing: not running as a job task
    job_run_id = "no_id"
    run_id = None

try:
    job_id = dbutils.notebook.entry_point.getDbutils().notebook().getContext().tags().apply("jobId")
except Exception:  # tag missing: not running as a job
    job_id = None


print(f"Job Run ID: {job_run_id}")

config = {
    'model': YOLO_MODEL,
    'data_config': config_path,
    'epochs': EPOCHS,
    'batch_size': BATCH_SIZE,
    'img_size': IMG_SIZE,
    'initial_lr': initial_lr,
    'final_lr': final_lr,
    'run_name': run_name,
    'project_path': training_volume_path,
    'mlflow_experiment': mlflow_experiment,
    'db_host': db_host,
    'db_token': db_token,
    'db_workspace_id': workspace_id,
    'databricks_job_id': job_id,
    'databricks_run_id': run_id,
    'dataset_path': dataset_path,
    'num_classes': yolo_config['nc'],
    'dataset_name': f"{ds_catalog}.{ds_schema}.{dataset_volume}",
    'patience': 50,
    'save_period': 5
}

# Script path for TorchDistributor
train_script_path = "../scripts/train_yolo.py"
config_json = json.dumps(config)

print("\nTraining Configuration:")
for key, value in config.items():
    if 'token' not in key.lower():  # Don't print tokens
        print(f"  {key}: {value}")
print(f"\nTraining script: {train_script_path}")
print(f"Metrics output directory: {logging_vol_path}")

In [None]:
from pyspark.ml.torch.distributor import TorchDistributor
import time

# Set MLflow experiment
mlflow.set_experiment(mlflow_experiment)

# Create TorchDistributor and run training
num_processes = len(device_config) if isinstance(device_config, list) else 1
print(f"\n{'='*60}")
print(f"Starting distributed training with {num_processes} GPUs")
print(f"{'='*60}\n")

# Start MLflow run with system metrics logging
with mlflow.start_run(run_name=run_name, log_system_metrics=True) as run:
    active_run_id = run.info.run_id
    print(f"MLflow Run ID: {active_run_id}\n")
    
    # Log hyperparameters upfront
    mlflow.log_params({
        'model': YOLO_MODEL,
        'epochs': EPOCHS,
        'batch_size': BATCH_SIZE,
        'img_size': IMG_SIZE,
        'initial_lr': initial_lr,
        'final_lr': final_lr,
        'num_gpus': num_processes,
        'device_config': str(device_config),
        'run_name': run_name,
        'dataset_path': dataset_path,
    })
    
    # Create distributor
    distributor = TorchDistributor(
        num_processes=num_processes,
        local_mode=True,  # Single node multi-GPU
        use_gpu=True
    )
    
    # Run distributed training using script
    print("Starting TorchDistributor execution...")
    print(f"  - MLflow Run ID: {active_run_id}")
    print(f"  - Job Run ID: {job_run_id}")
    print(f"  - Metrics output: {logging_vol_path}/metrics_{job_run_id}.json\n")
    
    # Pass all required parameters to training script
    output = distributor.run(train_script_path, active_run_id, config_json, job_run_id, logging_vol_path)

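# Display training results. TorchDistributor.run returns None when given a script path,
# so the else branch is the expected outcome for this notebook.
if isinstance(output, dict) and output:
    print("\nTraining Results:")
    for key, value in output.items():
        print(f"  {key}: {value}")
else:
    print("\nTraining completed (no metrics returned from worker processes)")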

print(f"\n{'='*60}")
print("Training complete! All MLflow logging finished.")
print(f"View results at: {mlflow_experiment}")
print(f"{'='*60}")

# Small delay to ensure all background processes fully exit
time.sleep(2)

## Training Complete

The training script (`scripts/train_yolo.py`) run above handles the MLflow integration:

**What's Logged:**
- ✅ **Hyperparameters** - Model config, training settings, dataset info
- ✅ **Dataset Lineage** - Dataset reference with class information  
- ✅ **Training Metrics** - Ultralytics built-in MLflow callback logs metrics automatically
- ✅ **Wrapped Model** - Deployment-ready PyTorch model with proper signature
- ✅ **System Metrics** - GPU/CPU/memory usage during training

**Next Steps:**
- View results in MLflow UI at the experiment path above
- Uncomment `registered_model_name` in the training script to register the model to Unity Catalog
- Use the logged model artifact for deployment or further evaluation
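
As an alternative to editing the training script, a logged model can also be registered after the run finishes. The sketch below is only an outline under assumptions: it assumes the model was logged under the artifact path `model`, and the catalog, schema, and model names passed in are placeholders you would choose yourself. The helper names are hypothetical; `mlflow.set_registry_uri` and `mlflow.register_model` are the MLflow calls it relies on.

```python
def uc_model_name(catalog: str, schema: str, model: str) -> str:
    """Build a three-level Unity Catalog name: catalog.schema.model."""
    parts = [catalog, schema, model]
    if any(not part or "." in part for part in parts):
        raise ValueError(f"each name part must be non-empty and contain no dots: {parts}")
    return ".".join(parts)


def run_model_uri(run_id: str, artifact_path: str = "model") -> str:
    """Build a runs:/ URI. Assumption: the script logged the model under 'model'."""
    return f"runs:/{run_id}/{artifact_path}"


def register_logged_model(run_id, catalog, schema, model, artifact_path="model"):
    """Register the model logged by a finished run to Unity Catalog."""
    import mlflow  # imported lazily so the helpers above work without MLflow installed

    mlflow.set_registry_uri("databricks-uc")
    return mlflow.register_model(
        run_model_uri(run_id, artifact_path),
        uc_model_name(catalog, schema, model),
    )
```

For example, `register_logged_model(active_run_id, ds_catalog, ds_schema, "yolo_detector")` would target `<catalog>.<schema>.yolo_detector`, where `yolo_detector` is a made-up model name.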