Add FPGA-based test application for realtime predecoder by wsttiger · Pull Request #490 · NVIDIA/cudaqx

wsttiger · 2026-04-08T02:12:55Z

Summary

Add FPGA RDMA transport for the AI predecoder + PyMatching pipeline. Syndrome data arrives from a Hololink FPGA via RoCE v2, passes through TensorRT inference (AI predecoder), then PyMatching MWPM decoding, all orchestrated by realtime_pipeline.

Changes

New: FPGA predecoder bridge (`hololink_predecoder_bridge.cpp`)

- Creates a Hololink GpuRoceTransceiver (DOCA GPU-RoCE) and feeds its ring buffer into realtime_pipeline via the external_ringbuffer path
- Runs 8 TRT predecoder workers + 16 PyMatching decode threads
- Supports --data-dir for ground-truth correctness verification against observables.bin
- Logs per-shot diagnostic output confirming RDMA receipt, TRT inference result, and PyMatching decode
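To illustrate the per-shot diagnostic output, here is a minimal sketch of how such a log line could be assembled. The function names (`count_nonzero`, `format_shot_log`), the `key=value` layout, and the use of `std::uint8_t` buffers are assumptions made for illustration; they are not taken from `hololink_predecoder_bridge.cpp`. The fields follow the commit message in this PR: detector count, input nonzero count, logical prediction, and residual nonzero count.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: number of nonzero entries in a byte buffer.
std::size_t count_nonzero(const std::vector<std::uint8_t> &v) {
  return static_cast<std::size_t>(std::count_if(
      v.begin(), v.end(), [](std::uint8_t x) { return x != 0; }));
}

// Hypothetical formatter for one shot. The layout is illustrative only.
std::string format_shot_log(std::size_t shot,
                            const std::vector<std::uint8_t> &detectors,
                            int logical_pred,
                            const std::vector<std::uint8_t> &residual) {
  std::ostringstream os;
  os << "shot=" << shot
     << " detectors=" << detectors.size()            // RDMA receipt
     << " input_nonzero=" << count_nonzero(detectors)
     << " logical_pred=" << logical_pred             // TRT inference
     << " residual_nonzero=" << count_nonzero(residual);
  return os.str();
}
```

A line of this shape lets a reader confirm that data arrived and that inference ran, even when no ground-truth files are supplied.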

New: Orchestration script (`hololink_predecoder_test.sh`)

- 2-process FPGA mode: bridge + hololink_fpga_syndrome_playback
- 3-process emulated mode: emulator + bridge + playback
- Converts binary detectors.bin to the text format the playback tool expects
- Config-aware defaults for page size, num shots, and BRAM constraints

Fix: `realtime_pipeline.cu` external ring buffer consumer

The consumer_loop gated slot processing on slot_occupied[], which is only set by the software ring_buffer_injector. With an FPGA-sourced external ring buffer, slots were silently skipped and no requests ever completed. Skip the slot_occupied and drain checks when external_ring_ is true.
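The effect of the fix can be sketched with a simplified model of one consumer pass. Everything here (`Slot`, `consume_pass`, the `ready` and `completed` flags) is a hypothetical stand-in rather than the actual code in `realtime_pipeline.cu`, and the drain check is not modeled; only the `slot_occupied` gate and the external-ring bypass are shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of one ring-buffer slot.
struct Slot {
  bool ready = false;     // data has landed in the slot (e.g. via RDMA)
  bool completed = false; // set once the consumer has handled the slot
};

// Runs one pass over the ring and returns how many slots were processed.
// `slot_occupied` models the flags written only by the software injector.
std::size_t consume_pass(std::vector<Slot> &ring,
                         const std::vector<bool> &slot_occupied,
                         bool external_ring) {
  assert(ring.size() == slot_occupied.size());
  std::size_t processed = 0;
  for (std::size_t i = 0; i < ring.size(); ++i) {
    // Software path: only touch slots the injector marked as occupied.
    // External path: the FPGA never sets these flags, so bypass the gate.
    if (!external_ring && !slot_occupied[i])
      continue;
    if (!ring[i].ready || ring[i].completed)
      continue;
    ring[i].completed = true;
    ++processed;
  }
  return processed;
}
```

With `external_ring` set to false and no `slot_occupied` flags raised, a pass over a ring that holds ready data processes nothing, which mirrors the silent skipping described above. Setting `external_ring` to true lets the same pass pick up every ready slot.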

Refactor: Consolidate shared predecoder code

- Extract ~350 lines of duplicated types (PipelineConfig, DecoderContext, PreLaunchCopyCtx, WorkerCtx, PyMatchQueue, TestData, SparseCSR, loaders) into predecoder_pipeline_common.{h,cpp}
- Move both drivers and the orchestration script into libs/qec/unittests/realtime/
- test_realtime_predecoder_w_pymatching.cpp is a git mv from libs/qec/lib/realtime/ with the shared code removed

Test results

Software benchmark (d13_r104, 20s, 104 us injection rate):

| Metric | Value |
| --- | --- |
| Submitted / Completed | 192,309 / 192,309 |
| Throughput | 9,610 req/s |
| Mean latency | 370 us (p50 = 332 us, p99 = 1,204 us) |
| PyMatching decode | 218 us avg |
| Syndrome reduction | 98.3% |
| Pipeline LER | 0.0020 (384 / 192,309) |
| Predecoder-only LER | 0.3980 |
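The software benchmark figures can be cross-checked with a few lines of arithmetic. The sketch below uses hypothetical helper names and assumes that "104 us injection rate" means one request is injected every 104 us. Under that reading the injector tops out near 9,615 req/s, and 192,309 completions over a nominal 20 s give about the same figure, both close to the reported 9,610 req/s. The pipeline LER of 384 / 192,309 rounds to 0.0020.

```cpp
#include <cassert>
#include <cstdint>

// Logical error rate as failed shots over total shots.
double logical_error_rate(std::uint64_t errors, std::uint64_t shots) {
  assert(shots > 0);
  return static_cast<double>(errors) / static_cast<double>(shots);
}

// Average throughput over a run of the given duration.
double requests_per_second(std::uint64_t completed, double seconds) {
  assert(seconds > 0.0);
  return static_cast<double>(completed) / seconds;
}

// Request rate implied by a fixed injection period in microseconds
// (assumption: the benchmark injects one request per period).
double rate_from_period_us(double period_us) {
  assert(period_us > 0.0);
  return 1.0e6 / period_us;
}
```

For example, `logical_error_rate(384, 192309)` evaluates to roughly 0.001997, and `rate_from_period_us(104.0)` to roughly 9,615.4.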

FPGA bridge (d13_r104, 1 shot via real Hololink FPGA on GB200):

…Hololink Signed-off-by: Scott Thornton <wsttiger@gmail.com>

…ernal ring buffer

Three main changes:

1. Add hololink_predecoder_bridge: receives syndrome data from the Hololink FPGA via RDMA and runs AI predecoder (TRT) + PyMatching through realtime_pipeline using the external_ringbuffer path. Includes --data-dir for ground-truth correctness verification.
2. Fix consumer_loop for external ring buffers: the consumer gated slot processing on slot_occupied[], which is only set by the software ring_buffer_injector. With an FPGA-sourced external ring buffer, slots were silently skipped. Skip the slot_occupied and drain checks when external_ring_ is true.
3. Consolidate shared code: extract ~350 lines of duplicated types (PipelineConfig, DecoderContext, PreLaunchCopyCtx, WorkerCtx, PyMatchQueue, TestData, SparseCSR, loaders) into predecoder_pipeline_common.{h,cpp}. Move both drivers and the orchestration script into unittests/realtime/.

Tested: 20s software benchmark reproduces 192,309 requests at 9,610 req/s with LER=0.0020. FPGA bridge completes 1 shot of d13_r104 via real RDMA from the Hololink FPGA on GB200.

Signed-off-by: Scott Thornton <wsttiger@gmail.com>

…er_fpga2

Add a per-shot log line in the bridge CPU stage that confirms each pipeline step: RDMA receipt (detector count + input nonzero) and TRT inference (logical_pred + residual nonzero count). This makes it possible to verify the full data path without ground-truth data.

Update hololink_predecoder_test.sh to resolve the bridge binary from its new location in unittests/realtime/ instead of unittests/utils/.

Signed-off-by: Scott Thornton <wsttiger@gmail.com>

copy-pr-bot · 2026-04-08T02:13:00Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

bmhowe23 · 2026-04-08T02:37:40Z

/ok to test 87a6872

bmhowe23 · 2026-04-08T03:28:37Z

I submitted the regular per-PR CI plus the following runs:

https://github.com/NVIDIA/cudaqx/actions/runs/24115254143 (All libs (Release))
https://github.com/NVIDIA/cudaqx/actions/runs/24115823050 (Build wheels)

bmhowe23

The changes to the core part of the library seem pretty minimal and low-risk, so this LGTM.

libs/qec/unittests/realtime/hololink_predecoder_test.sh

libs/qec/unittests/realtime/CMakeLists.txt

Signed-off-by: Scott Thornton <wsttiger@gmail.com>

wsttiger added 4 commits April 6, 2026 23:21

Initial changes to update predecoder / PyMatching test to use FPGA / …

56f57bc

…Hololink Signed-off-by: Scott Thornton <wsttiger@gmail.com>

Merge remote-tracking branch 'origin/main' into add_realtime_predecod…

83c0298

…er_fpga2

bmhowe23 reviewed Apr 8, 2026

bmhowe23 changed the title ~~Add realtime predecoder fpga2~~ Add FPGA-based test application for realtime predecoder Apr 8, 2026

cketcham2333 requested changes Apr 8, 2026

This was referenced Apr 8, 2026

Update CUDA-QX CI to run additional tests for realtime decoders #491

Open

Update CI to build hololink bridge #492

Open

(Not user facing) Update realtime decoder build script to have -- build option #493

Open

Updated path in hololink predecoder demo script

297dda0

Signed-off-by: Scott Thornton <wsttiger@gmail.com>

bmhowe23 approved these changes Apr 8, 2026

bmhowe23 merged commit 31ae759 into NVIDIA:main Apr 8, 2026
5 checks passed

bmhowe23 merged 5 commits into NVIDIA:main from wsttiger:add_realtime_predecoder_fpga2
