Updating podring to run on single node without need to force single pod by gilbertlee-amd · Pull Request #280 · ROCm/TransferBench

gilbertlee-amd · 2026-05-01T22:23:39Z

Motivation

The PodRing preset can run on a single node, and should not require forcing a single pod to do so.
This change enables that, and also allows rings of size 1 (self-copies).
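As a rough illustration of why a ring of size 1 amounts to a self-copy, here is a minimal sketch of how ring neighbors could be computed. `BuildRingPairs` is a hypothetical helper written for this example, not a function from TransferBench; the actual preset may build its transfers differently.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper (not taken from TransferBench): returns the
// (source, destination) device-index pairs for a ring of groupSize devices.
std::vector<std::pair<int, int>> BuildRingPairs(int groupSize) {
  assert(groupSize >= 1);  // a ring of size 1 is permitted
  std::vector<std::pair<int, int>> pairs;
  pairs.reserve(groupSize);
  for (int src = 0; src < groupSize; ++src) {
    // The last device wraps around to device 0.
    // When groupSize == 1, dst == src, i.e. a self-copy.
    int dst = (src + 1) % groupSize;
    pairs.emplace_back(src, dst);
  }
  return pairs;
}
```

With the modulo wrap-around, no special case is needed for a single device: the only pair produced is device 0 copying to itself.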

Copilot

Pull request overview

This PR updates the PodRing preset behavior so it can run on a single node without requiring TB_FORCE_SINGLE_POD, and it relaxes ring sizing to allow size-1 rings (self-copies).

Changes:

- Adjust PodRing topology/pod detection to only require pod metadata when running with multiple ranks.
- Remove the minimum ring size restriction (previously required GROUP_SIZE >= 2) to allow single-device rings.
- Tidy Makefile configuration variable placement and clarify optional-feature comments.
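The first two changes can be sketched as a validation routine. This is only an illustration under stated assumptions: `RingConfig`, its fields, and `ValidateRingConfig` are hypothetical names invented for this example and do not reflect the actual code in `PodRing.hpp`.

```cpp
#include <string>

// Hypothetical configuration struct; field names are illustrative only.
struct RingConfig {
  int  numRanks;    // number of participating ranks
  bool hasPodInfo;  // whether pod membership metadata is available
  int  groupSize;   // number of devices per ring (GROUP_SIZE)
};

// Returns an empty string when the configuration is acceptable,
// otherwise a short description of the problem.
std::string ValidateRingConfig(const RingConfig& cfg) {
  if (cfg.numRanks < 1) {
    return "at least one rank is required";
  }
  // Pod metadata is only needed when more than one rank participates;
  // a single-rank (single-node) run proceeds without it.
  if (cfg.numRanks > 1 && !cfg.hasPodInfo) {
    return "pod metadata is required for multi-rank runs";
  }
  // The earlier restriction was groupSize >= 2; size-1 rings are now allowed.
  if (cfg.groupSize < 1) {
    return "ring size must be at least 1";
  }
  return "";
}
```

Under these assumptions, `ValidateRingConfig({1, false, 1})` passes without pod metadata, while a two-rank run without pod metadata is still rejected.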

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
`src/client/Presets/PodRing.hpp`	Updates rank/pod validation logic and relaxes ring size constraints.
`Makefile`	Reorders/clarifies top-level configuration variables and optional feature documentation.


Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

- Initial pod communication support (#235)
- cuda + MNNVL update & pod presets (#241)
- Increase CQ size for high qps (#244)
- fix hang when NVML is present but fabricmanager isnt (#246)
- Adding nica2a preset (#248)
- Adding HBM read bandwidth preset (#250)
- Pod Ring preset (#251)
- gfxsweep preset (#254) (#256)
- Adding Batched DMA support (hipMemcpyBatchAsync), and bmasweep preset (#255)
- Adding a wallclock consistency detection preset (#258)
- Adding smoketest preset for simple correctness tests (#266)
- Help / envvars / presets presets (#267)
- Modernize CMake build (#268)
- Replace version-based pod/amd-smi detection with compile-time API probes (#269)
- Fix collective mismatch hangs in multi-rank error paths (#270)
- Fix SHOW_ITERATIONS table truncation with multiple transfers per executor (#271)
- Reformat a2asweep output to match gfxsweep style (#272)
- Gfx sweep update (#274)
- Increasing flush frequency in smoketest (#275)
- Adding new experimental copy-only GFX kernel, gfxsweep update (#277)
- Fixes for cuMem compilation and invalid device ordinal (#278)
- Simplifying socket connect, allow for using host address (#279)
- Updating podring to run on single node without need to force single pod (#280)
- Adding SHOW_PERCENTILES to show extra per-iteration statistics (#281)

---------

Co-authored-by: AtlantaPepsi <timhu102@gmail.com>
Co-authored-by: Pak Nin Lui <pak.lui@amd.com>
Co-authored-by: pierreantoineH <PierreAntoine.Harraud@amd.com>
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>


Updating podring to run on single node without need to force single pod

a15e8e4

gilbertlee-amd requested a review from AtlantaPepsi May 1, 2026 22:23

gilbertlee-amd requested a review from a team as a code owner May 1, 2026 22:23

AtlantaPepsi requested a review from Copilot May 1, 2026 23:45

Copilot started reviewing on behalf of AtlantaPepsi May 1, 2026 23:47

Copilot AI reviewed May 1, 2026


Comment thread src/client/Presets/PodRing.hpp Outdated

Comment thread src/client/Presets/Rings.hpp

Comment thread src/client/Presets/PodRing.hpp Outdated

gilbertlee-amd and others added 2 commits May 1, 2026 21:35

Potential fix for pull request finding

afb6beb

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Fixing typo

ce77c0e

AtlantaPepsi approved these changes May 2, 2026


Renaming podring preset to rings

18f35ed

gilbertlee-amd requested a review from a team as a code owner May 2, 2026 04:54

AtlantaPepsi merged commit fd7257c into ROCm:candidate May 2, 2026
1 check passed

nileshnegi mentioned this pull request May 2, 2026

TransferBench v1.67.0 #273

Open

1 task


AtlantaPepsi merged 4 commits into ROCm:candidate from gilbertlee-amd:PodRingUpdate

3 participants
