Add FPGA-based test application for realtime predecoder #490
Merged
bmhowe23 merged 5 commits into NVIDIA:main on Apr 8, 2026
…Hololink Signed-off-by: Scott Thornton <wsttiger@gmail.com>
…ernal ring buffer
Three main changes:
1. Add hololink_predecoder_bridge: receives syndrome data from the
Hololink FPGA via RDMA and runs AI predecoder (TRT) + PyMatching
through realtime_pipeline using the external_ringbuffer path.
Includes --data-dir for ground-truth correctness verification.
2. Fix consumer_loop for external ring buffers: the consumer gated
slot processing on slot_occupied[], which is only set by the
software ring_buffer_injector. With an FPGA-sourced external ring
buffer, slots were silently skipped. Skip the slot_occupied and
drain checks when external_ring_ is true.
3. Consolidate shared code: extract ~350 lines of duplicated types
(PipelineConfig, DecoderContext, PreLaunchCopyCtx, WorkerCtx,
PyMatchQueue, TestData, SparseCSR, loaders) into
predecoder_pipeline_common.{h,cpp}. Move both drivers and the
orchestration script into unittests/realtime/.
Tested: 20s software benchmark reproduces 192,309 requests at 9,610
req/s with LER=0.0020. FPGA bridge completes 1 shot of d13_r104 via
real RDMA from the Hololink FPGA on GB200.
Signed-off-by: Scott Thornton <wsttiger@gmail.com>
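The consumer_loop fix in item 2 can be illustrated with a minimal sketch. This is not the code from realtime_pipeline.cu: only slot_occupied and external_ring_ are names taken from the commit message, while RingModel, slot_ready and count_consumable are hypothetical names introduced here, and the drain check is not modeled at all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of the consumer side of a ring buffer.
// Only slot_occupied and external_ring_ are names from the commit message;
// RingModel, slot_ready and count_consumable are assumptions for illustration.
struct RingModel {
  std::vector<std::uint8_t> slot_ready;    // data present, set by whichever producer wrote the slot
  std::vector<std::uint8_t> slot_occupied; // bookkeeping set only by the software injector
  bool external_ring_;                     // true when the FPGA is the producer
};

// Counts the slots a single consumer pass would process.
// skip_checks_for_external == false models the old gating;
// skip_checks_for_external == true models the fix described above.
std::size_t count_consumable(const RingModel &ring,
                             bool skip_checks_for_external) {
  assert(ring.slot_ready.size() == ring.slot_occupied.size());
  const bool bypass = skip_checks_for_external && ring.external_ring_;
  std::size_t processed = 0;
  for (std::size_t i = 0; i < ring.slot_ready.size(); ++i) {
    if (!ring.slot_ready[i])
      continue; // nothing has arrived in this slot
    if (!bypass && !ring.slot_occupied[i])
      continue; // old behavior: slots written by the FPGA are skipped here
    ++processed;
  }
  return processed;
}
```

In this model, a ring filled by the FPGA has slot_ready set but slot_occupied all zero, so the old gating processes no slots while the bypass processes every ready slot. A ring filled by the software injector behaves the same with or without the bypass.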
Add a per-shot log line in the bridge CPU stage that confirms each pipeline step: RDMA receipt (detector count + input nonzero), TRT inference (logical_pred + residual nonzero count). This makes it possible to verify the full data path without ground-truth data.
Update hololink_predecoder_test.sh to resolve the bridge binary from its new location in unittests/realtime/ instead of unittests/utils/.
Signed-off-by: Scott Thornton <wsttiger@gmail.com>
Collaborator
/ok to test 87a6872
Collaborator
I submitted the regular per-PR CI plus
bmhowe23 (Collaborator) reviewed on Apr 8, 2026 and left a comment:
The changes to the core part of the library seem pretty minimal and low-risk, so this LGTM.
cketcham2333 requested changes on Apr 8, 2026
This was referenced Apr 8, 2026
Signed-off-by: Scott Thornton <wsttiger@gmail.com>
bmhowe23 approved these changes on Apr 8, 2026
Summary
Add FPGA RDMA transport for the AI predecoder + PyMatching pipeline. Syndrome data arrives from a Hololink FPGA via RoCE v2, passes through TensorRT inference (AI predecoder), then PyMatching MWPM decoding, all orchestrated by realtime_pipeline.
Changes
New: FPGA predecoder bridge (hololink_predecoder_bridge.cpp)
- … GpuRoceTransceiver (DOCA GPU-RoCE) and feeds its ring buffer into realtime_pipeline via the external_ringbuffer path
- … --data-dir for ground-truth correctness verification against observables.bin
New: Orchestration script (hololink_predecoder_test.sh)
- … hololink_fpga_syndrome_playback
- … detectors.bin to the text format the playback tool expects
Fix: realtime_pipeline.cu external ring buffer consumer
consumer_loop gated slot processing on slot_occupied[], which is only set by the software ring_buffer_injector. With an FPGA-sourced external ring buffer, slots were silently skipped and no requests ever completed. Skip the slot_occupied and drain checks when external_ring_ is true.
Refactor: Consolidate shared predecoder code
- … (PipelineConfig, DecoderContext, PreLaunchCopyCtx, WorkerCtx, PyMatchQueue, TestData, SparseCSR, loaders) into predecoder_pipeline_common.{h,cpp}
- libs/qec/unittests/realtime/test_realtime_predecoder_w_pymatching.cpp is a git mv from libs/qec/lib/realtime/ with the shared code removed
Test results
Software benchmark (d13_r104, 20s, 104 us injection rate):
FPGA bridge (d13_r104, 1 shot via real Hololink FPGA on GB200):
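As a rough consistency check on the software benchmark, the figures quoted in the commit message (192,309 requests at 9,610 req/s) can be compared against the 104 us injection figure. The sketch below assumes that "104 us injection rate" means one request injected every 104 microseconds and that both descriptions refer to the same run; the function names are hypothetical and not part of the PR.

```cpp
#include <cassert>

// Upper bound on throughput if one request is injected every interval_us
// microseconds (assumed meaning of the "104 us injection rate" figure).
double max_requests_per_second(double interval_us) {
  assert(interval_us > 0.0);
  return 1.0e6 / interval_us;
}

// Wall-clock duration implied by a request count and an average rate.
double implied_duration_seconds(double requests, double requests_per_second) {
  assert(requests_per_second > 0.0);
  return requests / requests_per_second;
}
```

Under that assumption, max_requests_per_second(104.0) is about 9,615 req/s, so the reported 9,610 req/s is within 0.1% of the injection-limited bound, and implied_duration_seconds(192309.0, 9610.0) is roughly 20.01 s, in line with the stated 20 s run.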