Summary
Add a new example application that demonstrates an end-to-end DAQIRI + TensorRT inference pipeline. The example receives packets via GPUDirect, reassembles them into image input tensors, runs ResNet-50 feature extraction through a TensorRT engine, and prints a running two-component PCA projection (PC1, PC2) of the accumulated latent-space vectors to stdout.
This fills a gap in the current examples: all existing benchmarks (raw_gpudirect, raw_hds, raw_reorder_*, rdma, socket) focus on network I/O throughput. There is no example showing how received data flows into a GPU inference workload — the use case that the BASE_IMAGE=torch container build path was designed to support.
Motivation
- Users frequently ask how to connect DAQIRI packet ingestion to a downstream AI/ML pipeline.
- The README already mentions TensorRT inference workflows but provides no working example.
- A concrete, runnable example lowers the barrier for data-acquisition + inference adoption on DGX Spark, IGX Thor, and RTX Pro Server hardware.
Proposed Design
Data flow
NIC → DAQIRI RX (GPUDirect, HDS or batched-GPU) → packet reorder (sequence number)
→ reassemble into 224×224×3 image tensor
→ TensorRT ResNet-50 engine (feature extraction, output = 2048-d vector)
→ accumulate latent vectors on host
→ incremental PCA → print PC1, PC2 to stdout
Input mode
Reorder incoming packets by sequence number and reconstruct a 224×224×3 image (or batch of images) suitable for ResNet-50 input. This models a camera or imaging-sensor ingest workflow where raw pixel data arrives as UDP packets and must be reassembled in order before inference.
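The reassembly step described above can be sketched on the host as follows. This is a minimal illustration under stated assumptions, not DAQIRI code: the Packet struct, the per-image sequence numbering starting at 0, the fixed payload size, and the function name reassemble_image are all hypothetical. Instead of sorting, the sketch scatters each payload to the byte offset implied by its sequence number, which produces the same in-order image. The real example would presumably do the equivalent copy in device memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical packet layout, for illustration only. DAQIRI's real RX
// structures and the on-wire header are not specified in this proposal.
struct Packet {
  std::uint32_t seq;                  // index of this packet within one image
  std::vector<std::uint8_t> payload;  // raw pixel bytes
};

constexpr std::size_t kImageBytes = 224 * 224 * 3;  // one 8-bit 224x224x3 image

// Scatters each payload to byte offset seq * payload_bytes in `image`.
// Returns true only if every slot of the image was filled; duplicate
// packets are ignored, and malformed packets cause a false return.
bool reassemble_image(const std::vector<Packet>& packets,
                      std::size_t payload_bytes,
                      std::vector<std::uint8_t>& image) {
  if (payload_bytes == 0 || kImageBytes % payload_bytes != 0) return false;
  const std::size_t expected = kImageBytes / payload_bytes;
  image.assign(kImageBytes, 0);
  std::vector<bool> seen(expected, false);
  std::size_t filled = 0;
  for (const Packet& p : packets) {
    if (p.seq >= expected || p.payload.size() != payload_bytes) return false;
    if (seen[p.seq]) continue;  // duplicate: keep the first copy
    std::memcpy(image.data() + p.seq * payload_bytes, p.payload.data(),
                payload_bytes);
    seen[p.seq] = true;
    ++filled;
  }
  assert(filled <= expected);
  return filled == expected;
}
```

With a 1024-byte payload, one image spans 147 packets (224 × 224 × 3 = 150528 bytes). Returning false on a missing packet lets the caller decide whether to drop the frame or wait for more data.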
TensorRT integration
- Build a ResNet-50 TensorRT engine from an ONNX model at first run (cache the serialized engine for subsequent runs).
- Use the engine for feature extraction only — remove the final classification head and output the 2048-dimensional average-pool vector.
- Batch size should be configurable in the YAML (default 8).
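The build-once, cache-afterwards behavior in the first bullet can be sketched without any TensorRT calls. The names here (EngineBlob, read_blob, write_blob, load_or_build_engine) are hypothetical helpers, and the build_from_onnx callback is a stand-in for the real build step. In an actual implementation that callback would presumably wrap the TensorRT builder (for example nvinfer1::IBuilder::buildSerializedNetwork), and the returned bytes would be handed to nvinfer1::IRuntime::deserializeCudaEngine.

```cpp
#include <cassert>
#include <fstream>
#include <functional>
#include <iterator>
#include <string>
#include <vector>

using EngineBlob = std::vector<char>;  // serialized engine bytes

// Reads the whole file. Returns false if it cannot be opened or is empty.
bool read_blob(const std::string& path, EngineBlob& out) {
  std::ifstream in(path, std::ios::binary);
  if (!in) return false;
  out.assign(std::istreambuf_iterator<char>(in),
             std::istreambuf_iterator<char>());
  return !out.empty();
}

// Overwrites the file with the blob. Returns false on any I/O failure.
bool write_blob(const std::string& path, const EngineBlob& blob) {
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  if (!out) return false;
  out.write(blob.data(), static_cast<std::streamsize>(blob.size()));
  return static_cast<bool>(out);
}

// Returns the cached engine if one exists. Otherwise calls build_from_onnx
// (a placeholder for the real TensorRT build) and caches a non-empty result.
EngineBlob load_or_build_engine(
    const std::string& cache_path,
    const std::function<EngineBlob()>& build_from_onnx) {
  assert(static_cast<bool>(build_from_onnx));
  EngineBlob blob;
  if (read_blob(cache_path, blob)) return blob;
  blob = build_from_onnx();
  // Best effort: a failed write only costs a rebuild on the next run.
  if (!blob.empty()) write_blob(cache_path, blob);
  return blob;
}
```

Serialized TensorRT engines are generally tied to the TensorRT version and GPU they were built for, so a real cache path may need to encode both. An empty return value signals that the build failed and nothing was cached.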
PCA output
- Accumulate feature vectors in a host-side buffer.
- After every N batches (configurable), run incremental PCA (2 components) and print the PC1 and PC2 values to stdout.
- Fully headless — no graphics dependencies, no plot files. Users can pipe the output to a CSV or their own visualization tool if desired.
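One way to produce the PC1/PC2 values is sketched below. It is a simplified stand-in, not necessarily the incremental PCA algorithm the proposal intends: it keeps a running mean and co-moment matrix (Welford-style update) and extracts the top two axes by power iteration. The class name RunningPca and its methods are hypothetical. For 2048-dimensional features the co-moment matrix holds 2048 × 2048 doubles (32 MiB), which is workable on the host but is one reason a real implementation might prefer a low-rank incremental method.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

double dot(const Vec& a, const Vec& b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

class RunningPca {
 public:
  explicit RunningPca(std::size_t dim)
      : dim_(dim), mean_(dim, 0.0), comoment_(dim * dim, 0.0) {}

  // Welford-style update of the mean and of sum((x - mean)(x - mean)^T).
  void update(const Vec& x) {
    assert(x.size() == dim_);
    ++count_;
    Vec delta(dim_);
    for (std::size_t i = 0; i < dim_; ++i) {
      delta[i] = x[i] - mean_[i];
      mean_[i] += delta[i] / static_cast<double>(count_);
    }
    for (std::size_t i = 0; i < dim_; ++i)
      for (std::size_t j = 0; j < dim_; ++j)
        comoment_[i * dim_ + j] += delta[i] * (x[j] - mean_[j]);
  }

  // Recomputes the top two principal axes. The second axis is kept
  // orthogonal to the first at every power-iteration step.
  void fit(int iterations = 200) {
    pc1_ = power_iterate(nullptr, iterations);
    pc2_ = power_iterate(&pc1_, iterations);
  }

  // Projects x onto the fitted axes (call fit() first).
  void project(const Vec& x, double& pc1_score, double& pc2_score) const {
    Vec centered(dim_);
    for (std::size_t i = 0; i < dim_; ++i) centered[i] = x[i] - mean_[i];
    pc1_score = dot(centered, pc1_);
    pc2_score = dot(centered, pc2_);
  }

  const Vec& pc1() const { return pc1_; }
  const Vec& pc2() const { return pc2_; }

 private:
  static void remove_component(Vec& v, const Vec& axis) {
    const double c = dot(v, axis);
    for (std::size_t i = 0; i < v.size(); ++i) v[i] -= c * axis[i];
  }

  Vec power_iterate(const Vec* orthogonal_to, int iterations) const {
    Vec v(dim_);
    // Fixed, non-uniform start vector so results are reproducible.
    for (std::size_t i = 0; i < dim_; ++i)
      v[i] = 1.0 + 0.1 * static_cast<double>(i % 7);
    for (int it = 0; it < iterations; ++it) {
      if (orthogonal_to) remove_component(v, *orthogonal_to);
      Vec w(dim_, 0.0);
      for (std::size_t i = 0; i < dim_; ++i)
        for (std::size_t j = 0; j < dim_; ++j)
          w[i] += comoment_[i * dim_ + j] * v[j];
      if (orthogonal_to) remove_component(w, *orthogonal_to);
      const double norm = std::sqrt(dot(w, w));
      if (norm == 0.0) break;  // no variance left in this subspace
      for (std::size_t i = 0; i < dim_; ++i) v[i] = w[i] / norm;
    }
    return v;
  }

  std::size_t dim_;
  std::size_t count_ = 0;
  Vec mean_;
  Vec comoment_;  // row-major dim_ x dim_
  Vec pc1_, pc2_;
};
```

In the example application, update() would be called once per 2048-d feature vector, and fit() followed by project() would run after every N batches to print the PC1 and PC2 values. The sign of each axis is arbitrary, so the printed values may flip sign between refits.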
Deliverables
New files
| File | Description |
| --- | --- |
| examples/tensorrt_inference_example.cpp | Main application source |
| examples/daqiri_example_tensorrt_inference.yaml | YAML config (loopback-friendly defaults) |
| examples/tensorrt_inference_utils.h | TensorRT engine build/load helpers |
| examples/tensorrt_inference/README.md | Step-by-step build and run instructions |
README and build instructions
- Add examples/tensorrt_inference/README.md with step-by-step instructions covering:
  - How to build the torch container (BASE_IMAGE=torch scripts/build-container.sh)
  - How to download or export the ResNet-50 ONNX model
  - Manual compile commands (g++ / nvcc with the correct TensorRT and DAQIRI include/link flags)
  - How to run the example with the provided YAML config
  - Expected terminal output and how to interpret the PC1/PC2 values
- The example is not wired into the main examples/CMakeLists.txt; it lives as a standalone recipe that users compile manually inside the torch container.
Documentation updates (per docs-sync rules)
- docs/tutorials/benchmarking_examples.md: add a section for the TensorRT inference example.
- docs/tutorials/configuration-walkthrough.md: add a leaf in the "Choosing an example config" decision tree for the new YAML config.
- CLAUDE.md: add the new executable and config to the Benchmarks table.
- README.md: mention the inference example in the Features list or in a new "AI/ML Integration" bullet.
Acceptance criteria
- tensorrt_inference_example compiles successfully following the README instructions inside the torch container.
- […] daqiri_bench_raw_sw_loopback.yaml pattern) without a physical NIC link.
- […] clang-format -style=file.
Hardware targets
- NVIDIA DGX Spark (GB10, sm_121)
- NVIDIA IGX Thor (Blackwell)
- x86_64 RTX Pro Server
Dependencies
- TensorRT (≥ 8.6)
- ONNX model: ResNet-50 from the ONNX Model Zoo or exported with torch.onnx.export
- CUDA runtime (already a project dependency)