# YOLOv8 Export with Embeddings

Export YOLOv8n to ONNX format with multiple outputs:
- Detection output (bounding boxes, classes, confidences)
- Embedding output (backbone features for similarity search)

## Imports

In [1]:
import torch
import torch.nn as nn
from ultralytics import YOLO
import onnx
import onnxruntime as ort
import numpy as np
import polars as pl
from pathlib import Path

## Setup

In [2]:
# Setup model directory
model_dir = Path("models")
model_dir.mkdir(parents=True, exist_ok=True)

download_dir = Path("var/downloads")
download_dir.mkdir(parents=True, exist_ok=True)

print(f"Model directory: {model_dir.absolute()}")
print(f"Download directory: {download_dir.absolute()}")
print(f"ONNX Runtime: {ort.__version__}")

print(f"ONNX: {onnx.__version__}")
print(f"PyTorch: {torch.__version__}")



Model directory: /home/bryanc/repos/rook_lifewatch/rook_lw_model_dev/models
Download directory: /home/bryanc/repos/rook_lifewatch/rook_lw_model_dev/var/downloads
ONNX Runtime: 1.24.1
ONNX: 1.20.1
PyTorch: 2.10.0+cu128


## Load YOLOv8 Nano

In [3]:
model_path = download_dir / 'yolov8n.pt'
model = YOLO(str(model_path))
print(f"✓ YOLOv8n loaded from {model_path}")
model.info()

Downloading https://github.com/ultralytics/assets/releases/download/v8.4.0/yolov8n.pt to 'var/downloads/yolov8n.pt': 100% ━━━━━━━━━━━━ 6.2MB 39.4MB/s 0.2s

✓ YOLOv8n loaded from var/downloads/yolov8n.pt
YOLOv8n summary: 129 layers, 3,157,200 parameters, 0 gradients, 8.9 GFLOPs


(129, 3157200, 0, 8.8575488)

## Inspect Architecture

In [4]:
pt_model = model.model

# First pass: find the architectural boundaries (end of backbone, start of neck)
sppf_layer_idx = None
first_upsample_idx = None

for i, module in enumerate(pt_model.model):
    layer_type = module.__class__.__name__
    
    if layer_type == "SPPF":
        sppf_layer_idx = i
    
    if layer_type == "Upsample" and first_upsample_idx is None:
        first_upsample_idx = i

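# Fail early if no SPPF layer was found; the comparison below needs an integer index
if sppf_layer_idx is None:
    raise RuntimeError("Could not find SPPF layer in model architecture")

# Second pass: build data with section labels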
layers_data = []
for i, module in enumerate(pt_model.model):
    layer_type = module.__class__.__name__
    
    # Determine section based on discovered boundaries
    if i <= sppf_layer_idx:
        section = "Backbone"
    elif layer_type == "Detect":
        section = "Head"
    else:
        section = "Neck"
    
    layers_data.append({"Index": i, "Type": layer_type, "Section": section})

df = pl.DataFrame(layers_data)
print("\nYOLOv8n Layers:")
with pl.Config(tbl_rows=-1):  # Show all rows (-1), or use a number like 50
    print(df)

print(f"\n📍 Embedding Extraction Point:")
print(f"   Layer {sppf_layer_idx} (SPPF) = End of backbone, best for embeddings")
print(f"   - Rich semantic features (256 channels for YOLOv8n)")
print(f"   - Global context via spatial pyramid pooling")
print(f"   - Before multi-scale fusion (neck) begins")
print(f"\n   Architecture breakdown:")
print(f"   - Backbone: Layers 0-{sppf_layer_idx}")
print(f"   - Neck: Layers {first_upsample_idx}-{len(pt_model.model)-2}")
print(f"   - Head: Layer {len(pt_model.model)-1}")


YOLOv8n Layers:
shape: (23, 3)
┌───────┬──────────┬──────────┐
│ Index ┆ Type     ┆ Section  │
│ ---   ┆ ---      ┆ ---      │
│ i64   ┆ str      ┆ str      │
╞═══════╪══════════╪══════════╡
│ 0     ┆ Conv     ┆ Backbone │
│ 1     ┆ Conv     ┆ Backbone │
│ 2     ┆ C2f      ┆ Backbone │
│ 3     ┆ Conv     ┆ Backbone │
│ 4     ┆ C2f      ┆ Backbone │
│ 5     ┆ Conv     ┆ Backbone │
│ 6     ┆ C2f      ┆ Backbone │
│ 7     ┆ Conv     ┆ Backbone │
│ 8     ┆ C2f      ┆ Backbone │
│ 9     ┆ SPPF     ┆ Backbone │
│ 10    ┆ Upsample ┆ Neck     │
│ 11    ┆ Concat   ┆ Neck     │
│ 12    ┆ C2f      ┆ Neck     │
│ 13    ┆ Upsample ┆ Neck     │
│ 14    ┆ Concat   ┆ Neck     │
│ 15    ┆ C2f      ┆ Neck     │
│ 16    ┆ Conv  

## Create Multi-Output Wrapper

In [5]:
class YOLOv8WithEmbeddings(nn.Module):
    """
    Wrapper that extracts both detections and embeddings from YOLOv8.
    
    Architecture:
    - Backbone: Feature extraction, ends with SPPF layer
    - Neck: Multi-scale feature fusion (starts with Upsample)
    - Head: Detection output (Detect layer)
    
    This wrapper captures backbone features at the SPPF layer which provides:
    - High-level semantic representations (256 channels for YOLOv8n)
    - Global spatial context via pyramid pooling
    - Pre-fusion features ideal for similarity comparison
    """
    def __init__(self, yolo_model, embedding_layer_idx=None):
        super().__init__()
        # Get the underlying DetectionModel from YOLO wrapper
        self.model = yolo_model.model
        
        # Auto-detect embedding layer if not specified
        if embedding_layer_idx is None:
            # Find SPPF layer (end of backbone)
            for i, layer in enumerate(self.model.model):
                if layer.__class__.__name__ == "SPPF":
                    embedding_layer_idx = i
                    break
            
            if embedding_layer_idx is None:
                raise RuntimeError("Could not find SPPF layer in model architecture")
        
        self.embedding_layer_idx = embedding_layer_idx
        self.embedding_layer = self.model.model[self.embedding_layer_idx]
        self._embeddings = None
        
        # Register hook to capture SPPF output
        self.embedding_layer.register_forward_hook(self._hook_fn)
        
        print(f"Extracting embeddings from layer {self.embedding_layer_idx} ({self.embedding_layer.__class__.__name__})")
    
    def _hook_fn(self, module, input, output):
        """Hook function to capture SPPF layer output"""
        # Apply global average pooling: [Batch, Channels, Height, Width] -> [Batch, Channels]
        self._embeddings = torch.nn.functional.adaptive_avg_pool2d(output, (1, 1)).squeeze(-1).squeeze(-1)
        
    def forward(self, x):
        # Reset embeddings
        self._embeddings = None
        
        # Run normal YOLOv8 forward pass (hook will capture SPPF output)
        detections = self.model(x)
        
        # YOLOv8 may return tuple/list of outputs, extract the main detection tensor
        if isinstance(detections, (list, tuple)):
            detections = detections[0]
        
        if self._embeddings is None:
            raise RuntimeError(f"Failed to capture embeddings from layer {self.embedding_layer_idx}")
        
        # detections: [1, 84, 8400] - (4 bbox coords + 80 class scores, 8400 predictions)
        # embeddings: [1, channels] - feature vector for similarity search
        return detections, self._embeddings

# Create wrapper - automatically finds SPPF layer
wrapped_model = YOLOv8WithEmbeddings(model)
wrapped_model.eval()
print("✓ Wrapper created")

Extracting embeddings from layer 9 (SPPF)
✓ Wrapper created
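
The pooling step inside the hook can be illustrated without PyTorch. The sketch below is a plain-Python stand-in, not part of the notebook: `global_average_pool` is a hypothetical helper that reduces a nested-list feature map shaped [batch, channels, height, width] to [batch, channels] by averaging each channel over its spatial positions. That is the same reduction `adaptive_avg_pool2d(output, (1, 1))` followed by the two squeezes performs.

```python
def global_average_pool(feature_map):
    """Average each channel over height and width.

    feature_map: nested lists shaped [batch][channels][height][width].
    Returns nested lists shaped [batch][channels].
    """
    pooled = []
    for sample in feature_map:
        channel_means = []
        for channel in sample:
            # Flatten the H x W grid of this channel and take its mean.
            values = [v for row in channel for v in row]
            channel_means.append(sum(values) / len(values))
        pooled.append(channel_means)
    return pooled


# One sample, two channels, 2x2 spatial grid (toy values).
toy_features = [[
    [[1.0, 2.0], [3.0, 4.0]],
    [[10.0, 10.0], [10.0, 10.0]],
]]
embedding = global_average_pool(toy_features)
print(embedding)  # [[2.5, 10.0]]
```

Because pooling discards the spatial layout, the resulting vector describes the whole input frame rather than any individual detection.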


## Test Wrapper

In [6]:
dummy_input = torch.randn(1, 3, 640, 640)

with torch.no_grad():
    detections, embeddings = wrapped_model(dummy_input)

print(f"Detections: {detections.shape}")
print(f"Embeddings: {embeddings.shape}")
print(f"✓ {embeddings.shape[1]}-dim feature vector")

Detections: torch.Size([1, 84, 8400])
Embeddings: torch.Size([1, 256])
✓ 256-dim feature vector
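
The `[1, 84, 8400]` detection shape can be reproduced with a little arithmetic. This sketch assumes the standard YOLOv8 detection strides of 8, 16, and 32 and the 80 COCO classes of the pretrained checkpoint. Neither value is stated in this notebook, and `prediction_count` is a hypothetical helper introduced for illustration.

```python
def prediction_count(image_size, strides=(8, 16, 32)):
    """Total grid cells across the detection scales for a square input.

    The strides are an assumption (standard YOLOv8 head), not read from the model.
    """
    return sum((image_size // s) ** 2 for s in strides)


NUM_BOX_COORDS = 4   # box coordinates per prediction
NUM_CLASSES = 80     # assumed: COCO classes in the pretrained yolov8n.pt

channels = NUM_BOX_COORDS + NUM_CLASSES
predictions = prediction_count(640)
print(channels, predictions)  # 84 8400
```

Under these assumptions, a 640x640 input yields 80x80 + 40x40 + 20x20 = 8400 predictions, so a different input resolution would change the last dimension.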


## Export to ONNX

In [7]:
output_path = model_dir / "yolov8n_with_embeddings.onnx"

torch.onnx.export(
    wrapped_model,
    dummy_input,
    output_path,
    export_params=True,
    opset_version=17,
    do_constant_folding=True,
    input_names=['images'],
    output_names=['detections', 'embeddings'],
    dynamic_axes={
        'images': {0: 'batch'},
        'detections': {0: 'batch'},
        'embeddings': {0: 'batch'}
    },
    # Use the legacy TorchScript-based exporter, which keeps all weights
    # in the single .onnx file (no external .data file)
    dynamo=False
)

size_mb = output_path.stat().st_size / 1024 / 1024
print(f"✓ Exported: {output_path.absolute()}")
print(f"  Size: {size_mb:.2f} MB")

✓ Exported: /home/bryanc/repos/rook_lifewatch/rook_lw_model_dev/models/yolov8n_with_embeddings.onnx
  Size: 12.25 MB


## Verify ONNX Model

In [8]:
onnx_model = onnx.load(str(output_path))
onnx.checker.check_model(onnx_model)
print("✓ ONNX model valid")

print("\nInputs:")
for inp in onnx_model.graph.input:
    print(f"  {inp.name}")

print("\nOutputs:")
for out in onnx_model.graph.output:
    print(f"  {out.name}")

✓ ONNX model valid

Inputs:
  images

Outputs:
  detections
  embeddings


## Test ONNX Runtime

In [9]:
session = ort.InferenceSession(str(output_path))
input_name = session.get_inputs()[0].name
output_names = [o.name for o in session.get_outputs()]

outputs = session.run(output_names, {input_name: dummy_input.numpy()})
det_onnx, emb_onnx = outputs

print(f"✓ ONNX Runtime inference successful")
print(f"Detections: {det_onnx.shape}")
print(f"Embeddings: {emb_onnx.shape}")

# Compare with PyTorch
with torch.no_grad():
    det_pt, emb_pt = wrapped_model(dummy_input)

det_diff = np.abs(det_onnx - det_pt.numpy()).max()
emb_diff = np.abs(emb_onnx - emb_pt.numpy()).max()

print(f"\nMax diff from PyTorch:")
print(f"  Detections: {det_diff:.6f}")
print(f"  Embeddings: {emb_diff:.6f}")

✓ ONNX Runtime inference successful
Detections: (1, 84, 8400)
Embeddings: (1, 256)

Max diff from PyTorch:
  Detections: 0.001221
  Embeddings: 0.000001
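
Whether these differences are acceptable depends on the magnitude of the values being compared. The sketch below applies the usual `atol + rtol * |reference|` rule to the two maxima printed above. The helper `within_tolerance`, the tolerance values, and the reference scales are all assumptions chosen for illustration. In particular, it assumes box coordinates are expressed in input pixels (up to about 640), which this notebook does not confirm.

```python
def within_tolerance(max_abs_diff, reference_scale, rtol=1e-4, atol=1e-5):
    """Return True if the worst-case difference is acceptable.

    Applies atol + rtol * |reference| to the largest observed difference,
    using a representative magnitude of the compared output.
    """
    return max_abs_diff <= atol + rtol * abs(reference_scale)


# Differences are the values printed above; reference scales are assumptions.
det_ok = within_tolerance(0.001221, reference_scale=640.0)  # assumed pixel-unit boxes
emb_ok = within_tolerance(0.000001, reference_scale=1.0)    # assumed O(1) activations
print(det_ok, emb_ok)
```

With these assumed tolerances both outputs pass. A stricter check would compare element-wise against the actual reference tensor rather than a single scale.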


## Summary

Model exported with dual outputs:
- `detections`: [batch, 84, 8400] - Standard YOLOv8 format (4 box coordinates + 80 class scores per prediction)
- `embeddings`: [batch, 256] - Globally pooled SPPF feature vector
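
A consumer of the `detections` output still has to turn the raw tensor into boxes. The sketch below shows one way to do that in plain Python on a toy tensor. It assumes rows 0-3 hold center x, center y, width, and height, and that the remaining rows hold per-class scores that need no further activation. This is the layout commonly described for Ultralytics exports but it is not verified in this notebook. `decode_predictions` and the 0.25 threshold are hypothetical.

```python
def decode_predictions(pred, conf_threshold=0.25):
    """Convert one image's raw output into (x1, y1, x2, y2, score, class_id).

    pred: nested lists shaped [4 + num_classes][num_predictions], i.e. one
    batch element of `detections`. Assumed layout: rows 0-3 are cx, cy, w, h;
    the remaining rows are per-class scores.
    """
    num_predictions = len(pred[0])
    boxes = []
    for j in range(num_predictions):
        cx, cy, w, h = (pred[r][j] for r in range(4))
        class_scores = [row[j] for row in pred[4:]]
        score = max(class_scores)
        if score < conf_threshold:
            continue  # drop low-confidence predictions
        class_id = class_scores.index(score)
        # Convert center/size to corner coordinates.
        boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, score, class_id))
    return boxes


# Toy output: 2 classes, 3 predictions (the real model has 80 and 8400).
toy_pred = [
    [100.0, 200.0, 300.0],  # cx
    [100.0, 200.0, 300.0],  # cy
    [20.0, 40.0, 60.0],     # w
    [10.0, 20.0, 30.0],     # h
    [0.9, 0.1, 0.2],        # class 0 scores
    [0.05, 0.1, 0.7],       # class 1 scores
]
decoded = decode_predictions(toy_pred)
for box in decoded:
    print(box)
```

This sketch stops at confidence filtering. A real pipeline would normally follow it with non-maximum suppression to remove overlapping boxes.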

Ready for use in Rust/ONNX Runtime!
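
For the similarity-search use case, embeddings from two frames are typically compared with cosine similarity. The following is a minimal sketch in plain Python using toy 3-dimensional vectors in place of the model's embeddings. `cosine_similarity` and `most_similar` are hypothetical helpers, and returning 0.0 for zero-length vectors is a convention chosen here, not something the notebook specifies.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for degenerate vectors
    return dot / (norm_a * norm_b)


def most_similar(query, gallery):
    """Return (index, similarity) of the gallery vector closest to query."""
    scores = [cosine_similarity(query, g) for g in gallery]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy vectors standing in for stored embeddings.
gallery = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
index, score = most_similar([2.0, 0.1, 0.0], gallery)
print(index, round(score, 3))  # 0 0.999
```

Cosine similarity ignores vector length, so embeddings do not need to be normalized beforehand. Normalizing them once at storage time would reduce each comparison to a dot product.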