# Finetuning with Ultralytics YOLO

This notebook demonstrates fine-tuning the latest Ultralytics YOLO models (YOLO11/YOLOv8) on custom datasets using single GPU training.

**Key Advantages over DETR:**
- Simpler API and less boilerplate code
- Faster training and inference
- Native COCO format support
- Better production-ready tooling
- Straightforward single GPU training

**References:**
- https://docs.ultralytics.com/modes/train/
- https://docs.ultralytics.com/datasets/detect/

**Prerequisites**
- MLR 17.3 LTS (needed for NumPy 2.x compatibility)
- Cluster with GPU
- Cluster started with the `scripts/init_script_ultralytics.sh` init script. This installs ultralytics into the base environment before the cluster starts, which avoids the version clashes we would get from a plain `pip install` in the notebook or through the usual cluster config
- This notebook assumes that you have run at least:
  - The `Precessing_w_ffmpeg` notebook to standardise a set of input videos
  - The `2_Batch_inference_w_opencv` notebook to create a COCO-format dataset

In [None]:
%sh
# Make sure to install these via init scripts for sustainability
# /databricks/python/bin/pip install -U ultralytics supervision

In [None]:
%pip install -U mlflow psutil nvidia-ml-py
%restart_python


# Setup and Configure

Initialize all variables needed for training. We'll use the same UC Volumes structure as the DETR notebook.


In [None]:
import mlflow
import os
from pathlib import Path

ds_catalog = 'brian_ml_dev'
ds_schema = 'image_processing'
coco_volume = 'coco_dataset'
training_volume = 'training'

mlflow_experiment = '/Users/brian.law@databricks.com/brian_yolo_training'

volume_path = f"/Volumes/{ds_catalog}/{ds_schema}/{coco_volume}"
training_volume_path = "/local_disk0/ultralytics_logging_folder"
image_path = f'{volume_path}/images'
annotation_json = f'{volume_path}/annotations.json'

# YOLO model to start from (yolo11n.pt, yolo11s.pt, yolo11m.pt, yolo11l.pt, yolo11x.pt)
# or use yolov8n.pt, yolov8s.pt, etc.
YOLO_MODEL = 'yolo11n.pt'  # Start with nano for faster training

print(f"Dataset location: {volume_path}")
print(f"Training outputs: {training_volume_path}")


In [None]:
# Training hyperparameters
EPOCHS = 2
BATCH_SIZE = 128
IMG_SIZE = 640
initial_lr = 0.005
final_lr = 0.1  # passed as `lrf`: a factor relative to initial_lr, not an absolute learning rate
run_name = 'single_gpu_run'

## Prepare YOLO Dataset Configuration

YOLO expects a `.yaml` file that defines:
- Path to train/val images
- Number of classes
- Class names

We'll convert our COCO format to YOLO's expected structure.
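
Because the config needs separate train and validation paths, the images should eventually be split rather than shared between both entries. The following is a minimal sketch of a reproducible split, assuming the image file names are available as a plain list. `split_train_val`, `val_fraction` and `seed` are hypothetical names used for illustration and are not part of Ultralytics.

```python
import random


def split_train_val(file_names, val_fraction=0.2, seed=42):
    """Shuffle file names reproducibly and split them into (train, val) lists."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    names = sorted(file_names)          # sort first so the result does not depend on input order
    random.Random(seed).shuffle(names)  # local RNG leaves the global random state untouched
    # Hold out at least one image whenever there is more than one to split
    n_val = max(1, int(len(names) * val_fraction)) if len(names) > 1 else 0
    return names[n_val:], names[:n_val]


train_files, val_files = split_train_val([f"frame_{i:04d}.jpg" for i in range(10)])
print(len(train_files), len(val_files))
```

The two lists could then be materialised as separate folders or image lists, depending on how the dataset is laid out.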


In [None]:
import json
import yaml

# Read COCO annotations to get class information
with open(annotation_json, 'r') as f:
    coco_data = json.load(f)

# Extract categories and create mapping from COCO category_id to YOLO class_id (0-indexed)
categories = coco_data['categories']
sorted_categories = sorted(categories, key=lambda x: x['id'])
class_names = [cat['name'] for cat in sorted_categories]
num_classes = len(class_names)

# CRITICAL: Create mapping from COCO category_id to YOLO class index (0-based)
# COCO IDs might be non-contiguous (e.g., 1, 2, 3, ..., 90) but YOLO needs 0, 1, 2, ..., n-1
coco_id_to_yolo_id = {cat['id']: idx for idx, cat in enumerate(sorted_categories)}

print(f"Number of classes: {num_classes}")
print(f"Classes: {class_names[:10]}...")  # Show first 10
print(f"COCO ID to YOLO ID mapping sample: {dict(list(coco_id_to_yolo_id.items())[:5])}")

# Create YOLO dataset config
yolo_config = {
    'path': volume_path,  # Root directory
    'train': 'images',  # Train images relative to 'path'
    'val': 'images',    # Using same for now - split in production
    'nc': num_classes,  # Number of classes
    'names': class_names  # Class names
}

# Save config file
config_path = f"{volume_path}/data.yaml"
with open(config_path, 'w') as f:
    yaml.dump(yolo_config, f)

print(f"\nYOLO config saved to: {config_path}")


## Convert COCO to YOLO Format

YOLO expects annotations in a specific format:
- One `.txt` file per image
- Each line: `class_id center_x center_y width height` (normalized 0-1)

We'll convert the COCO annotations to YOLO format.
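
As a worked example of the coordinate change, the sketch below converts a single COCO box (top-left corner plus size, in pixels) into the normalized center-based form and back again. The helper names `coco_bbox_to_yolo` and `yolo_bbox_to_coco` are hypothetical, and the sketch deliberately omits the validation and clipping that a full conversion needs.

```python
def coco_bbox_to_yolo(bbox, img_width, img_height):
    """Convert a COCO [x, y, w, h] pixel box (top-left origin) to normalized YOLO (cx, cy, w, h)."""
    x, y, w, h = bbox
    return (
        (x + w / 2) / img_width,
        (y + h / 2) / img_height,
        w / img_width,
        h / img_height,
    )


def yolo_bbox_to_coco(bbox, img_width, img_height):
    """Inverse conversion: normalized YOLO (cx, cy, w, h) back to COCO pixels."""
    cx, cy, w, h = bbox
    return [
        (cx - w / 2) * img_width,
        (cy - h / 2) * img_height,
        w * img_width,
        h * img_height,
    ]


yolo_box = coco_bbox_to_yolo([100, 50, 200, 100], img_width=640, img_height=480)
print(" ".join(f"{v:.6f}" for v in yolo_box))

# Round-tripping should give back the original pixel box (up to floating point error)
restored = yolo_bbox_to_coco(yolo_box, 640, 480)
print(restored)
```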


In [None]:
## Diagnose COCO Data Format (Run this to debug coordinate issues)

# Inspect a few annotations to understand the coordinate format
print("=== COCO Data Inspection ===\n")

# Check images
sample_images = coco_data['images'][:3]
print(f"Total images: {len(coco_data['images'])}")
print(f"\nSample images:")
for img in sample_images:
    print(f"  ID: {img['id']}, File: {img['file_name']}, Size: {img['width']}x{img['height']}")

# Check annotations
sample_annotations = coco_data['annotations'][:5]
print(f"\nTotal annotations: {len(coco_data['annotations'])}")
print(f"\nSample annotations:")
for ann in sample_annotations:
    img_info = next((img for img in coco_data['images'] if img['id'] == ann['image_id']), None)
    if img_info:
        bbox = ann['bbox']
        print(f"\n  Annotation ID: {ann['id']}")
        print(f"    Image: {img_info['file_name']} ({img_info['width']}x{img_info['height']})")
        print(f"    Category ID: {ann['category_id']}")
        print(f"    Bbox: {bbox}")
        print(f"    Bbox/Image ratio: x={bbox[0]/img_info['width']:.3f}, y={bbox[1]/img_info['height']:.3f}, w={bbox[2]/img_info['width']:.3f}, h={bbox[3]/img_info['height']:.3f}")
        
        # Check if coordinates seem normalized or in pixels
        if bbox[0] < 2 and bbox[1] < 2 and bbox[2] < 2 and bbox[3] < 2:
            print(f"    ⚠️  WARNING: Bbox values are < 2, might already be normalized!")
        if bbox[0] > img_info['width'] or bbox[1] > img_info['height']:
            print(f"    ⚠️  WARNING: Bbox x/y exceed image dimensions!")
        if (bbox[0] + bbox[2]) > img_info['width'] or (bbox[1] + bbox[3]) > img_info['height']:
            print(f"    ⚠️  WARNING: Bbox extends beyond image boundaries!")


In [None]:
from pathlib import Path

# Create labels directory
labels_dir = Path(volume_path) / 'labels'
labels_dir.mkdir(exist_ok=True)

print(f"Converting COCO annotations to YOLO format...")

# Group annotations by image_id
image_annotations = {}
for ann in coco_data['annotations']:
    image_id = ann['image_id']
    if image_id not in image_annotations:
        image_annotations[image_id] = []
    image_annotations[image_id].append(ann)

# Convert each image's annotations
images_dict = {img['id']: img for img in coco_data['images']}
converted_count = 0
skipped_annotations = 0
invalid_coords_count = 0

# Debug: Check first annotation to understand the coordinate format
if coco_data['annotations']:
    first_ann = coco_data['annotations'][0]
    first_img = images_dict[first_ann['image_id']]
    print(f"\nDebug - First annotation:")
    print(f"  Image size: {first_img['width']}x{first_img['height']}")
    print(f"  Bbox (COCO): {first_ann['bbox']}")
    print(f"  Bbox format should be: [x, y, width, height] in pixels")

for image_id, image_info in images_dict.items():
    img_width = image_info['width']
    img_height = image_info['height']
    
    # Get corresponding annotations
    annotations = image_annotations.get(image_id, [])
    
    # Create label file
    image_filename = Path(image_info['file_name']).stem
    label_file = labels_dir / f"{image_filename}.txt"
    
    with open(label_file, 'w') as f:
        for ann in annotations:
            # Get COCO category_id and map to YOLO class index
            coco_category_id = ann['category_id']
            
            # Skip if category_id is not in the mapping (should not happen with valid COCO data)
            if coco_category_id not in coco_id_to_yolo_id:
                print(f"Warning: Unknown category_id {coco_category_id} in image {image_filename}")
                skipped_annotations += 1
                continue
            
            # Map to YOLO class index (0-based)
            yolo_class_id = coco_id_to_yolo_id[coco_category_id]
            
            # COCO format: [x, y, width, height] (top-left corner)
            x, y, w, h = ann['bbox']
            
            # Validate bbox values are positive and reasonable
            if x < 0 or y < 0 or w <= 0 or h <= 0:
                if invalid_coords_count == 0:
                    print(f"Warning: Invalid bbox in {image_filename}: x={x}, y={y}, w={w}, h={h}")
                invalid_coords_count += 1
                continue
            
            # Convert to YOLO format: [center_x, center_y, width, height] (normalized)
            center_x = (x + w / 2) / img_width
            center_y = (y + h / 2) / img_height
            norm_w = w / img_width
            norm_h = h / img_height
            
            # Validate normalized coordinates are in valid range [0, 1]
            # Allow slight overflow due to floating point, but clip to [0, 1]
            if center_x > 1.05 or center_y > 1.05 or norm_w > 1.05 or norm_h > 1.05:
                if invalid_coords_count < 5:  # Print first few examples
                    print(f"\nWarning: Out of bounds coordinates in {image_filename}:")
                    print(f"  Image size: {img_width}x{img_height}")
                    print(f"  COCO bbox: x={x}, y={y}, w={w}, h={h}")
                    print(f"  YOLO (before clip): cx={center_x:.4f}, cy={center_y:.4f}, w={norm_w:.4f}, h={norm_h:.4f}")
                invalid_coords_count += 1
                # Skip this annotation if severely out of bounds
                if center_x > 1.5 or center_y > 1.5 or norm_w > 1.5 or norm_h > 1.5:
                    continue
            
            # Clip coordinates to valid range [0, 1]
            center_x = max(0.0, min(1.0, center_x))
            center_y = max(0.0, min(1.0, center_y))
            norm_w = max(0.0, min(1.0, norm_w))
            norm_h = max(0.0, min(1.0, norm_h))
            
            # Write in YOLO format (using mapped class_id)
            f.write(f"{yolo_class_id} {center_x:.6f} {center_y:.6f} {norm_w:.6f} {norm_h:.6f}\n")
    
    converted_count += 1
    if converted_count % 100 == 0:
        print(f"Converted {converted_count}/{len(images_dict)} images...")

print(f"\nConversion complete! {converted_count} label files created in {labels_dir}")
if skipped_annotations > 0:
    print(f"Warning: Skipped {skipped_annotations} annotations with unknown category IDs")
if invalid_coords_count > 0:
    print(f"Warning: Found {invalid_coords_count} annotations with invalid/out-of-bounds coordinates (clipped or skipped)")


## Training on Single GPU

Start with single GPU training to validate the setup.
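
Part of validating the setup is confirming that the generated label files really follow the `class_id center_x center_y width height` layout. The sketch below checks a single label line. `check_label_line` is a hypothetical helper, not an Ultralytics function.

```python
def check_label_line(line, num_classes):
    """Return a list of problems found in one YOLO label line (empty list means OK)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
        coords = [float(p) for p in parts[1:]]
    except ValueError:
        return ["non-numeric field"]
    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(f"class id {class_id} outside [0, {num_classes - 1}]")
    if any(not 0.0 <= c <= 1.0 for c in coords):
        problems.append("coordinate outside [0, 1]")
    return problems


good = check_label_line("0 0.312500 0.208333 0.312500 0.208333", num_classes=3)
bad = check_label_line("7 0.5 0.5 1.2 0.1", num_classes=3)
print(good)
print(bad)
```

Running a check like this over a sample of files in the `labels` folder is a cheap way to catch conversion mistakes before spending GPU time.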


In [None]:
from ultralytics import YOLO
from ultralytics import settings
import torch
import torch.distributed as dist

settings.update({"mlflow": True})

# Setting MLflow configs for ultralytics
os.environ['MLFLOW_EXPERIMENT_NAME'] = mlflow_experiment
os.environ['MLFLOW_ENABLE_SYSTEM_METRICS_LOGGING'] = "true"

# keep run active to log the best model into mlflow
os.environ['MLFLOW_KEEP_RUN_ACTIVE'] = "true"

# setup torch routines that Ultralytics requires
if not dist.is_initialized():
    dist.init_process_group(backend="nccl")

In [None]:
# Check GPU availability
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")

# Load pretrained model
model = YOLO(YOLO_MODEL)

print(f"\nLoaded {YOLO_MODEL}")
print(f"Model summary:")
model.info()

In [None]:
# Train the model
results = model.train(
        data=config_path,
        epochs=EPOCHS,
        batch=BATCH_SIZE,
        lr0=initial_lr,       # initial learning rate
        lrf=final_lr,         # final LR factor (relative to lr0)
        imgsz=IMG_SIZE,
        name=run_name,
        project=training_volume_path,
        device=0,  # Use GPU 0
        workers=8,
        patience=50,
        save=True,
        save_period=5,  # Save checkpoint every 5 epochs
        verbose=True
    )

# MLflow Integration

The Ultralytics YOLO framework ships with a default MLflow callback integration. Rather than rewrite it, we will build on top of it.

Things that we can add include:
- Dataset tracking and lineage
- Proper model logging to simplify deployment

## Log Dataset to Active MLflow Run

Use the `mlflow.data` module to log the dataset as a trackable input to the run.
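
Alongside the per-image table, a per-class annotation count is a compact summary to keep with the run. The sketch below computes one from a COCO-style dict using only the standard library. `class_distribution` and `toy_coco` are hypothetical names, and the toy data is made up purely for illustration.

```python
from collections import Counter


def class_distribution(coco):
    """Count annotations per category name in a COCO-style dict."""
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    counts = Counter(
        id_to_name.get(ann["category_id"], "unknown") for ann in coco["annotations"]
    )
    return dict(counts)


# Toy data, for illustration only
toy_coco = {
    "categories": [{"id": 1, "name": "person"}, {"id": 3, "name": "car"}],
    "annotations": [
        {"image_id": 10, "category_id": 1},
        {"image_id": 10, "category_id": 3},
        {"image_id": 11, "category_id": 1},
    ],
}
distribution = class_distribution(toy_coco)
print(distribution)
```

A dictionary like this could be attached to the active run as a small JSON artifact, for example with `mlflow.log_dict`.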


In [None]:
import mlflow
import pandas as pd
from pathlib import Path

# Get the active run (should be the training run)
active_run = mlflow.active_run()

if active_run:
    print(f"Active Run ID: {active_run.info.run_id}")
    print(f"Logging dataset using mlflow.data module...\n")
    
    # Prepare dataset metadata DataFrame
    # Include image info and link to annotations
    image_data = []
    
    # Create mapping of image_id to annotations
    image_to_anns = {}
    for ann in coco_data['annotations']:
        img_id = ann['image_id']
        if img_id not in image_to_anns:
            image_to_anns[img_id] = []
        image_to_anns[img_id].append(ann)
    
    # Build comprehensive dataset representation
    for img in coco_data['images']:
        img_id = img['id']
        anns = image_to_anns.get(img_id, [])
        
        # Get class distribution for this image
        classes_in_image = [ann['category_id'] for ann in anns]
        
        image_data.append({
            'image_id': img_id,
            'file_name': img['file_name'],
            'width': img['width'],
            'height': img['height'],
            'num_annotations': len(anns),
            'classes': ','.join(map(str, classes_in_image)) if classes_in_image else ''
        })
    
    dataset_df = pd.DataFrame(image_data)
    
    # Create MLflow Dataset with full metadata
    dataset = mlflow.data.from_pandas(
        dataset_df,
        source=volume_path,
        name=f"{ds_catalog}.{ds_schema}.{coco_volume}",
        targets="num_annotations",  # What we're predicting
    )
    
    # Log the dataset as an input to the training run
    mlflow.log_input(dataset, context="training")
    
    print(f"✓ Logged dataset with {len(dataset_df)} images")
    print(f"  - Source: {volume_path}")
    print(f"  - Name: {ds_catalog}.{ds_schema}.{coco_volume}")
    print(f"  - Total annotations: {dataset_df['num_annotations'].sum()}")
    print(f"  - Avg annotations/image: {dataset_df['num_annotations'].mean():.2f}")
    print(f"\n{'='*60}")
    print(f"Dataset logged to active run!")
    print(f"View in MLflow UI at: {mlflow_experiment}")
    print(f"{'='*60}")
    
else:
    print("⚠ No active MLflow run found!")
    print("The run may have already been ended by Ultralytics.")
    print("\nTo log the dataset manually, resume the run with mlflow.start_run(run_id=...) and call mlflow.log_input again.")


## Log YOLO Model as MLflow Object

The raw PyTorch weights are not well suited to deployment with Databricks Model Serving or the standard MLflow model format.
We could log the underlying PyTorch `nn.Module` directly, but the Ultralytics model returns the raw detection-head output rather than the box coordinates, scores and classes that object detection consumers expect.

We will create a thin `nn.Module` wrapper that does this post-processing for us.
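
The core of that post-processing is converting center-based boxes to corner form and then applying non-maximum suppression (NMS). As a minimal pure-Python sketch of the idea, assuming class-agnostic greedy NMS (the helper names are hypothetical, and a real wrapper would use a tensor implementation instead):

```python
def xywh_to_xyxy(box):
    """Convert (cx, cy, w, h) to corner form (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thres=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep


boxes = [xywh_to_xyxy(b) for b in [(50, 50, 40, 40), (52, 52, 40, 40), (150, 150, 30, 30)]]
kept = greedy_nms(boxes, scores=[0.9, 0.8, 0.7])
print(kept)
```

Here the second box overlaps the first almost entirely and has a lower score, so it is suppressed, while the distant third box survives.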

In [None]:
# Raw model wrapper definition
import torch, torch.nn as nn
from torchvision.ops import nms

class YoloDetWrapper(nn.Module):
    def __init__(self, base: nn.Module, conf_thres=0.25, iou_thres=0.5, max_det=300):
        super().__init__()
        self.base = base.eval()
        self.conf_thres, self.iou_thres, self.max_det = conf_thres, iou_thres, max_det

    @staticmethod
    def xywh_to_xyxy(b):
        x,y,w,h = b.unbind(-1)
        x1 = x - w/2; y1 = y - h/2; x2 = x + w/2; y2 = y + h/2
        return torch.stack([x1,y1,x2,y2], dim=-1)

    def forward(self, x):
        with torch.no_grad():
            out = self.base(x)                         # e.g. [N, C, A]
            if isinstance(out, (list, tuple)): out = out[0]
            if out.dim() == 3 and out.shape[1] < out.shape[2]:
                out = out.permute(0, 2, 1).contiguous()  # [N, A, C]
            # split: [x,y,w,h] + per-class scores
            # YOLOv8/YOLO11 detection heads have no objectness channel and, in eval mode,
            # return class scores that are already sigmoid-activated
            boxes_xywh = out[..., :4]
            cls = out[..., 4:]                          # shape [N, A, nc]
            conf, cls_id = cls.max(-1)                  # [N, A]
            boxes = self.xywh_to_xyxy(boxes_xywh)       # [N, A, 4]

            N, A = conf.shape
            K = self.max_det
            out_pad = x.new_full((N, K, 6), -1.0)       # pad with -1

            for i in range(N):
                mask = conf[i] >= self.conf_thres
                if mask.sum() == 0: continue
                b = boxes[i][mask]
                s = conf[i][mask]
                c = cls_id[i][mask].float()
                keep = nms(b, s, self.iou_thres)[:K]
                k = keep.numel()
                if k == 0: continue
                out_pad[i, :k, :4] = b[keep]
                out_pad[i, :k, 4]  = s[keep]
                out_pad[i, :k, 5]  = c[keep]
            return out_pad

In [None]:
# load best YOLO, wrap
best_model_raw = YOLO(f"{training_volume_path}/{run_name}/weights/best.pt").model
wrapped_model = YoloDetWrapper(best_model_raw, conf_thres=0.25, iou_thres=0.5, max_det=300)

In [None]:
from mlflow.models.signature import ModelSignature
from mlflow.types import Schema, TensorSpec
import numpy as np

# Input: batch of images as float tensors [N,3,H,W]
input_schema = Schema([
    TensorSpec(type=np.dtype(np.float32), shape=(-1, 3, 640, 640), name="images")
])

# Output: detections per image [N, max_det, 6], padded with -1
# (x1, y1, x2, y2, confidence, class)
output_schema = Schema([
    TensorSpec(type=np.dtype(np.float32), shape=(-1, -1, 6), name="detections")
])

signature = ModelSignature(inputs=input_schema, outputs=output_schema)

mlflow.pytorch.log_model(
        pytorch_model=wrapped_model,
        name="best_model",
        signature=signature,
        #registered_model_name="yolo_best_model"  # optional: register to UC model registry
    )

In [None]:
# Close out active mlflow run
mlflow.end_run()

# Next Steps

With the MLflow model that we logged above, we now have full lineage on the training dataset plus a wrapped model for deployment.

If we uncomment `registered_model_name` and set it to a full `<catalog>.<schema>.<model_name>`, logging will also register the model.
When running a lot of experiments we typically skip this, so that we don't end up with a lot of superfluous model versions.
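
On the consuming side, the wrapped model returns a fixed-size tensor per image, with unused rows filled with -1. The sketch below shows one way a client might strip that padding. It assumes the output has been converted to nested Python lists (for example via `.tolist()`) and that each row is `(x1, y1, x2, y2, confidence, class)`. `unpad_detections` is a hypothetical helper.

```python
def unpad_detections(batch):
    """Turn a padded [N, K, 6] nested list into per-image lists of detection dicts."""
    results = []
    for rows in batch:
        detections = []
        for x1, y1, x2, y2, score, cls_id in rows:
            if score < 0:  # real confidences are never negative, so this marks a padding row
                continue
            detections.append(
                {"box": [x1, y1, x2, y2], "score": score, "class_id": int(cls_id)}
            )
        results.append(detections)
    return results


# Two images: the first has one detection, the second has none
padded = [
    [[10.0, 20.0, 110.0, 220.0, 0.91, 2.0], [-1.0] * 6],
    [[-1.0] * 6, [-1.0] * 6],
]
parsed = unpad_detections(padded)
print(parsed)
```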