[FEA] Add TensorRT inference example: packet reorder → ResNet-50 feature extraction → PCA visualization #73

@RamyaGuru

Summary

Add a new example application that demonstrates an end-to-end DAQIRI + TensorRT inference pipeline. The example receives packets via GPUDirect, reassembles them into input tensors (images or time-series signals), runs ResNet-50 feature extraction through a TensorRT engine, and runs incremental PCA on the accumulated latent-space vectors, printing the first two principal components (PC1, PC2) to stdout.

This fills a gap in the current examples: all existing benchmarks (raw_gpudirect, raw_hds, raw_reorder_*, rdma, socket) focus on network I/O throughput. There is no example showing how received data flows into a GPU inference workload — the use case that the BASE_IMAGE=torch container build path was designed to support.

Motivation

  • Users frequently ask how to connect DAQIRI packet ingestion to a downstream AI/ML pipeline.
  • The README already mentions TensorRT inference workflows but provides no working example.
  • A concrete, runnable example lowers the barrier for data-acquisition + inference adoption on DGX Spark, IGX Thor, and RTX Pro Server hardware.

Proposed Design

Data flow

NIC → DAQIRI RX (GPUDirect, HDS or batched-GPU) → packet reorder (sequence number)
    → reassemble into 224×224×3 image tensor
    → TensorRT ResNet-50 engine (feature extraction, output = 2048-d vector)
    → accumulate latent vectors on host
    → incremental PCA → print PC1, PC2 to stdout
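
One step the diagram leaves implicit is the layout change between reassembly and inference: ResNet-50 ONNX exports typically expect planar NCHW float32 input, while reassembled pixel data is naturally interleaved HWC bytes. The following is a minimal host-side reference sketch of that conversion, assuming 8-bit RGB pixels and torchvision-style ImageNet normalization. The function name and constants are illustrative and not part of DAQIRI; a real pipeline would likely run the same arithmetic in a CUDA kernel so the data stays on the GPU.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kHeight = 224, kWidth = 224, kChannels = 3;

// Assumption: the exported model expects torchvision-style ImageNet
// normalization. Check the preprocessing of the ONNX model actually used.
constexpr std::array<float, 3> kMean = {0.485f, 0.456f, 0.406f};
constexpr std::array<float, 3> kStd = {0.229f, 0.224f, 0.225f};

// Converts one interleaved HWC uint8 image into planar CHW float32,
// scaling to [0, 1] and then normalizing each channel.
std::vector<float> hwc_u8_to_chw_f32(const std::vector<std::uint8_t>& hwc) {
  assert(hwc.size() == kHeight * kWidth * kChannels);
  std::vector<float> chw(hwc.size());
  for (std::size_t y = 0; y < kHeight; ++y) {
    for (std::size_t x = 0; x < kWidth; ++x) {
      for (std::size_t c = 0; c < kChannels; ++c) {
        const float v = hwc[(y * kWidth + x) * kChannels + c] / 255.0f;
        chw[(c * kHeight + y) * kWidth + x] = (v - kMean[c]) / kStd[c];
      }
    }
  }
  return chw;
}
```

A batch of N images is then N such CHW blocks laid out back to back, which matches an N×3×224×224 engine input.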

Input mode

Reorder incoming packets by sequence number and reconstruct a 224×224×3 image (or batch of images) suitable for ResNet-50 input. This models a camera or imaging-sensor ingest workflow where raw pixel data arrives as UDP packets and must be reassembled in order before inference.
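
As an illustration of the reorder-and-reassemble step, here is a minimal host-side sketch. It assumes a hypothetical packet layout (a global sequence number plus a fixed-size payload) and that each frame spans a whole number of packets; the real DAQIRI header format and reorder path may differ, and `Packet` and `FrameAssembler` are names invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical packet layout; not the actual DAQIRI wire format.
struct Packet {
  std::uint32_t seq;                  // global sequence number
  std::vector<std::uint8_t> payload;  // fixed-size slice of pixel data
};

class FrameAssembler {
 public:
  // Precondition: payload_bytes > 0 and it divides frame_bytes evenly.
  FrameAssembler(std::size_t frame_bytes, std::size_t payload_bytes)
      : payload_bytes_(payload_bytes),
        packets_per_frame_(frame_bytes / payload_bytes),
        frame_(frame_bytes),
        seen_(packets_per_frame_, false) {
    assert(payload_bytes > 0 && frame_bytes % payload_bytes == 0);
  }

  // Copies the payload to the offset implied by its sequence number.
  // Returns true when this packet completes the current frame.
  bool push(const Packet& p) {
    assert(p.payload.size() == payload_bytes_);
    const std::size_t frame_id = p.seq / packets_per_frame_;
    const std::size_t slot = p.seq % packets_per_frame_;
    if (frame_id < frame_id_) return false;  // stale: belongs to an older frame
    if (frame_id > frame_id_) {              // newer frame: drop the partial one
      std::fill(seen_.begin(), seen_.end(), false);
      received_ = 0;
      frame_id_ = frame_id;
    }
    if (seen_[slot]) return false;  // duplicate packet
    std::copy(p.payload.begin(), p.payload.end(),
              frame_.data() + slot * payload_bytes_);
    seen_[slot] = true;
    return ++received_ == packets_per_frame_;
  }

  const std::vector<std::uint8_t>& frame() const { return frame_; }

 private:
  std::size_t payload_bytes_;
  std::size_t packets_per_frame_;
  std::vector<std::uint8_t> frame_;
  std::vector<bool> seen_;
  std::size_t received_ = 0;
  std::size_t frame_id_ = 0;
};
```

With 1024-byte payloads, one 224×224×3 frame of 8-bit pixels (150,528 bytes) is exactly 147 packets. Writing each payload straight to its final offset avoids a separate sort pass; the same indexing carries over to a GPU kernel that scatters payloads into a device buffer.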

TensorRT integration

  • Build a ResNet-50 TensorRT engine from an ONNX model at first run (cache the serialized engine for subsequent runs).
  • Use the engine for feature extraction only — remove the final classification head and output the 2048-dimensional average-pool vector.
  • Batch size should be configurable in the YAML (default 8).
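
The build-once, cache-afterwards behaviour in the first bullet can be sketched independently of TensorRT. The helper below is hypothetical (`load_or_build_engine` and `EngineBlob` are not existing DAQIRI or TensorRT names) and treats the serialized engine as an opaque byte buffer. In the real example, the `build` callback would presumably parse the ONNX file and call `nvinfer1::IBuilder::buildSerializedNetwork`, and the returned bytes would be handed to `nvinfer1::IRuntime::deserializeCudaEngine`.

```cpp
#include <cassert>
#include <fstream>
#include <functional>
#include <iterator>
#include <string>
#include <vector>

using EngineBlob = std::vector<char>;

// Returns the cached serialized engine if `cache_path` exists and is
// non-empty. Otherwise calls `build`, writes the result to `cache_path`,
// and returns it.
EngineBlob load_or_build_engine(const std::string& cache_path,
                                const std::function<EngineBlob()>& build) {
  {
    std::ifstream in(cache_path, std::ios::binary);
    if (in) {
      EngineBlob cached((std::istreambuf_iterator<char>(in)),
                        std::istreambuf_iterator<char>());
      if (!cached.empty()) return cached;  // cache hit
    }
  }
  // Cache miss: run the (slow) builder once and persist its output.
  EngineBlob built = build();
  assert(!built.empty());
  std::ofstream out(cache_path, std::ios::binary | std::ios::trunc);
  out.write(built.data(), static_cast<std::streamsize>(built.size()));
  return built;
}
```

Serialized TensorRT engines are specific to the TensorRT version and GPU they were built on, so it may be worth encoding both (and the configured batch size) in the cache file name rather than using a single fixed path.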

PCA output

  • Accumulate feature vectors in a host-side buffer.
  • After every N batches (configurable), run incremental PCA (2 components) and print the PC1 and PC2 values to stdout.
  • Fully headless — no graphics dependencies, no plot files. Users can pipe the output to a CSV or their own visualization tool if desired.
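
One way to realize the accumulate-then-project loop without external dependencies is sketched below. It is an assumption, not the issue's prescribed algorithm: it keeps a running mean and scatter matrix (Welford-style update) and extracts the top components by power iteration with deflation, which differs from SVD-based incremental PCA as found in libraries such as scikit-learn. The class name `RunningPca` is invented for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

class RunningPca {
 public:
  explicit RunningPca(std::size_t dim)
      : dim_(dim), mean_(dim, 0.0), scatter_(dim * dim, 0.0) {}

  // Welford-style update of the running mean and scatter matrix.
  void update(const Vec& x) {
    assert(x.size() == dim_);
    ++count_;
    Vec delta(dim_);
    for (std::size_t i = 0; i < dim_; ++i) {
      delta[i] = x[i] - mean_[i];  // deviation from the old mean
      mean_[i] += delta[i] / static_cast<double>(count_);
    }
    for (std::size_t i = 0; i < dim_; ++i)
      for (std::size_t j = 0; j < dim_; ++j)
        scatter_[i * dim_ + j] += delta[i] * (x[j] - mean_[j]);
  }

  // Top-k principal axes (unit vectors) via power iteration with deflation.
  std::vector<Vec> components(std::size_t k, int iters = 200) const {
    Vec a = scatter_;  // working copy; deflated after each axis
    std::vector<Vec> axes;
    for (std::size_t c = 0; c < k; ++c) {
      Vec v(dim_);
      for (std::size_t i = 0; i < dim_; ++i)
        v[i] = 1.0 + 0.01 * static_cast<double>(i);  // deterministic start
      normalize(v);
      double lambda = 0.0;
      for (int it = 0; it < iters; ++it) {
        Vec w(dim_, 0.0);
        for (std::size_t i = 0; i < dim_; ++i)
          for (std::size_t j = 0; j < dim_; ++j) w[i] += a[i * dim_ + j] * v[j];
        lambda = normalize(w);
        if (lambda == 0.0) break;  // no variance left in this subspace
        v = w;
      }
      for (std::size_t i = 0; i < dim_; ++i)
        for (std::size_t j = 0; j < dim_; ++j)
          a[i * dim_ + j] -= lambda * v[i] * v[j];
      axes.push_back(v);
    }
    return axes;
  }

  // Coordinates of x along each axis, relative to the running mean.
  Vec project(const Vec& x, const std::vector<Vec>& axes) const {
    Vec out;
    for (const Vec& axis : axes) {
      double dot = 0.0;
      for (std::size_t i = 0; i < dim_; ++i) dot += axis[i] * (x[i] - mean_[i]);
      out.push_back(dot);
    }
    return out;
  }

 private:
  // Scales v to unit length in place and returns its original norm.
  static double normalize(Vec& v) {
    double sq = 0.0;
    for (double e : v) sq += e * e;
    const double norm = std::sqrt(sq);
    if (norm > 0.0)
      for (double& e : v) e /= norm;
    return norm;
  }

  std::size_t dim_;
  std::size_t count_ = 0;
  Vec mean_;
  Vec scatter_;
};
```

In the example application, `update` would be called once per 2048-d feature vector, and every N batches `components(2)` followed by `project` would yield the PC1 and PC2 values to print. The sign of each axis is arbitrary, so printed values may flip sign between refits; the fixed start vector is also a simplification that could converge slowly if it happens to be nearly orthogonal to the leading axis.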

Deliverables

New files

File                                               Description
examples/tensorrt_inference_example.cpp            Main application source
examples/daqiri_example_tensorrt_inference.yaml    YAML config (loopback-friendly defaults)
examples/tensorrt_inference_utils.h                TensorRT engine build/load helpers
examples/tensorrt_inference/README.md              Step-by-step build and run instructions

README and build instructions

  • Add examples/tensorrt_inference/README.md with step-by-step instructions covering:
    • How to build the torch container (BASE_IMAGE=torch scripts/build-container.sh)
    • How to download or export the ResNet-50 ONNX model
    • Manual compile commands (g++ / nvcc with the correct TensorRT and DAQIRI include/link flags)
    • How to run the example with the provided YAML config
    • Expected terminal output and how to interpret the PC1/PC2 values
  • The example is not wired into the main examples/CMakeLists.txt — it lives as a standalone recipe that users compile manually inside the torch container.

Documentation updates (per docs-sync rules)

  • docs/tutorials/benchmarking_examples.md: add a section for the TensorRT inference example.
  • docs/tutorials/configuration-walkthrough.md: add a leaf in the "Choosing an example config" decision tree for the new YAML config.
  • CLAUDE.md: add the new executable and config to the Benchmarks table.
  • README.md: mention the inference example in the Features list or a new "AI/ML Integration" bullet.

Acceptance criteria

  • tensorrt_inference_example compiles successfully following the README instructions inside the torch container.
  • The example runs end-to-end in software loopback mode (daqiri_bench_raw_sw_loopback.yaml pattern) without a physical NIC link.
  • ResNet-50 ONNX → TensorRT engine conversion succeeds on first run; cached engine loads on subsequent runs.
  • Feature vectors are extracted and PC1/PC2 values are printed to stdout.
  • Image reassembly from reordered packets produces valid ResNet-50 input.
  • Documentation updated per the docs-sync rules.
  • Code passes clang-format -style=file.

Hardware targets

  • NVIDIA DGX Spark (GB10, sm_121)
  • NVIDIA IGX Thor (H100/A100, sm_90/sm_80)
  • x86_64 RTX Pro Server

Dependencies

  • TensorRT (≥ 8.6)
  • ONNX model: ResNet-50 from the ONNX Model Zoo or torch.onnx.export
  • CUDA runtime (already a project dependency)

Labels: documentation (Improvements or additions to documentation)