Simplifying socket connect, allow for using host address #279
Merged
nileshnegi merged 1 commit into ROCm:candidate (May 1, 2026)
Pull request overview
This PR streamlines TransferBench’s socket-based multi-rank startup so rank 0 can start listening with only TB_NUM_RANKS set, auto-detect (or resolve) a reachable master address, and print concrete connection instructions for worker ranks.
Changes:
- Update socket communicator enablement to trigger when `TB_NUM_RANKS>=2`, default `TB_RANK` to 0, and require `TB_MASTER_ADDR` only for workers.
- Add IPv4 auto-detection for rank 0 (`TB_MASTER_IFACE` optional) and allow hostname resolution for `TB_MASTER_ADDR` via `getaddrinfo`.
- Refresh env var help/usage text and changelog to reflect the simplified socket flow.
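The enablement rules in the list above can be sketched as a small parsing routine. This is an illustration only, not the code from `TransferBench.hpp`: the names `SocketConfig`, `EnvLookup`, and `ParseSocketEnv` are hypothetical, and the rank range check is an assumption that the overview does not state.

```cpp
#include <cstdlib>
#include <functional>
#include <optional>
#include <string>

// Hypothetical holder for the parsed socket settings (not a TransferBench type).
struct SocketConfig {
  int numRanks;
  int rank;
  std::string masterAddr;  // may be empty on rank 0, which can auto-detect it
};

// Lookup abstraction so the logic can be exercised without touching the real environment.
using EnvLookup = std::function<const char*(const char*)>;

// Returns a config when the socket communicator should be enabled.
// Returns nullopt with an empty `error` when it is simply not enabled,
// and nullopt with a message in `error` when the settings are inconsistent.
std::optional<SocketConfig> ParseSocketEnv(const EnvLookup& getEnv, std::string& error) {
  error.clear();

  const char* numRanksStr = getEnv("TB_NUM_RANKS");
  int numRanks = numRanksStr ? std::atoi(numRanksStr) : 0;
  if (numRanks < 2) return std::nullopt;  // socket mode needs TB_NUM_RANKS >= 2

  const char* rankStr = getEnv("TB_RANK");
  int rank = rankStr ? std::atoi(rankStr) : 0;  // TB_RANK defaults to 0
  if (rank < 0 || rank >= numRanks) {           // assumed sanity check
    error = "TB_RANK must be in [0, TB_NUM_RANKS)";
    return std::nullopt;
  }

  const char* addr = getEnv("TB_MASTER_ADDR");
  if (rank != 0 && (addr == nullptr || *addr == '\0')) {
    error = "TB_MASTER_ADDR is required for worker ranks";
    return std::nullopt;
  }

  return SocketConfig{numRanks, rank, addr ? std::string(addr) : std::string()};
}
```

In a real program the lookup would typically be `[](const char* name) -> const char* { return std::getenv(name); }`; passing it in as a parameter keeps the decision logic testable.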
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/header/TransferBench.hpp | Implements new socket setup flow, IPv4 detection, and hostname-to-IPv4 resolution. |
| src/client/EnvVars.hpp | Updates printed documentation for socket-related environment variables (incl. TB_MASTER_IFACE). |
| src/client/Client.cpp | Updates CLI usage examples to match the new socket startup flow. |
| CHANGELOG.md | Notes the socket communicator usability change for the release notes. |
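The PR page does not show how the IPv4 detection in `src/header/TransferBench.hpp` is implemented. One common way to do this on Linux is to walk the interface list with `getifaddrs`. The sketch below takes that approach. The function name `DetectLocalIPv4` is hypothetical, and both the selection policy (first interface that is up and not loopback) and the use of the argument as a `TB_MASTER_IFACE`-style filter are assumptions.

```cpp
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <optional>
#include <string>

// Returns the first IPv4 address found on an interface that is up.
// With an empty `iface`, loopback interfaces are skipped; otherwise only
// the interface whose name equals `iface` is considered.
std::optional<std::string> DetectLocalIPv4(const std::string& iface = "") {
  ifaddrs* list = nullptr;
  if (getifaddrs(&list) != 0) return std::nullopt;

  std::optional<std::string> found;
  for (ifaddrs* it = list; it != nullptr; it = it->ifa_next) {
    if (it->ifa_addr == nullptr || it->ifa_addr->sa_family != AF_INET) continue;
    if (!(it->ifa_flags & IFF_UP)) continue;
    if (iface.empty()) {
      if (it->ifa_flags & IFF_LOOPBACK) continue;  // not reachable from other nodes
    } else if (iface != it->ifa_name) {
      continue;  // caller asked for a specific interface
    }

    char buf[INET_ADDRSTRLEN];
    const auto* sin = reinterpret_cast<const sockaddr_in*>(it->ifa_addr);
    if (inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf)) != nullptr) {
      found = std::string(buf);
      break;
    }
  }

  freeifaddrs(list);
  return found;
}
```

On a multi-homed node the first match may not be the network that the workers can reach, which is presumably why an interface override such as `TB_MASTER_IFACE` is useful.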
nileshnegi approved these changes on May 1, 2026
nileshnegi added a commit that referenced this pull request on May 2, 2026

- Initial pod communication support (#235)
- cuda + MNNVL update & pod presets (#241)
- Increase CQ size for high qps (#244)
- fix hang when NVML is present but fabricmanager isnt (#246)
- Adding nica2a preset (#248)
- Adding HBM read bandwidth preset (#250)
- Pod Ring preset (#251)
- gfxsweep preset (#254) (#256)
- Adding Batched DMA support (hipMemcpyBatchAsync), and bmasweep preset (#255)
- Adding a wallclock consistency detection preset (#258)
- Adding smoketest preset for simple correctness tests (#266)
- Help / envvars / presets presets (#267)
- Modernize CMake build (#268)
- Replace version-based pod/amd-smi detection with compile-time API probes (#269)
- Fix collective mismatch hangs in multi-rank error paths (#270)
- Fix SHOW_ITERATIONS table truncation with multiple transfers per executor (#271)
- Reformat a2asweep output to match gfxsweep style (#272)
- Gfx sweep update (#274)
- Increasing flush frequency in smoketest (#275)
- Adding new experimental copy-only GFX kernel, gfxsweep update (#277)
- Fixes for cuMem compilation and invalid device ordinal (#278)
- Simplifying socket connect, allow for using host address (#279)
- Updating podring to run on single node without need to force single pod (#280)
- Adding SHOW_PERCENTILES to show extra per-iteration statistics (#281)

---------

Co-authored-by: AtlantaPepsi <timhu102@gmail.com>
Co-authored-by: Pak Nin Lui <pak.lui@amd.com>
Co-authored-by: pierreantoineH <PierreAntoine.Harraud@amd.com>
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
nileshnegi added a commit that referenced this pull request on May 2, 2026

- Initial pod communication support (#235)
- cuda + MNNVL update & pod presets (#241)
- Increase CQ size for high qps (#244)
- fix hang when NVML is present but fabricmanager isnt (#246)
- Adding nica2a preset (#248)
- Adding HBM read bandwidth preset (#250)
- Pod Ring preset (#251)
- gfxsweep preset (#254) (#256)
- Adding Batched DMA support (hipMemcpyBatchAsync), and bmasweep preset (#255)
- Adding a wallclock consistency detection preset (#258)
- Adding smoketest preset for simple correctness tests (#266)
- Help / envvars / presets presets (#267)
- Modernize CMake build (#268)
- Replace version-based pod/amd-smi detection with compile-time API probes (#269)
- Fix collective mismatch hangs in multi-rank error paths (#270)
- Fix SHOW_ITERATIONS table truncation with multiple transfers per executor (#271)
- Reformat a2asweep output to match gfxsweep style (#272)
- Gfx sweep update (#274)
- Increasing flush frequency in smoketest (#275)
- Adding new experimental copy-only GFX kernel, gfxsweep update (#277)
- Fixes for cuMem compilation and invalid device ordinal (#278)
- Simplifying socket connect, allow for using host address (#279)
- Updating podring to run on single node without need to force single pod (#280)
- Adding SHOW_PERCENTILES to show extra per-iteration statistics (#281)

---------

Co-authored-by: Tim <43156029+AtlantaPepsi@users.noreply.github.com>
Co-authored-by: Pak Nin Lui <pak.lui@amd.com>
Co-authored-by: pierreantoineH <PierreAntoine.Harraud@amd.com>
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Motivation
This PR aims to make socket connections simpler to use.
The new flow is:
The first rank only needs to specify the total number of ranks using TB_NUM_RANKS:
Node 0> TB_NUM_RANKS=4 ./TransferBench
This then prints connection information for the other ranks.
The other ranks can then connect by setting those environment variables (and adjusting TB_RANK for each node).
A host name can also be used when setting TB_MASTER_ADDR:
Node 0> TB_NUM_RANKS=4 TB_MASTER_ADDR=hostname ./TransferBench
Technical Details
Minor changes to the socket code
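The review summary states that host names given in `TB_MASTER_ADDR` are resolved via `getaddrinfo`. A minimal sketch of such a hostname-to-IPv4 lookup is shown here. It is not the PR's code: the function name `ResolveIPv4` is hypothetical, and taking the first returned address is an assumption.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <optional>
#include <string>

// Resolves a host name (or a dotted-quad string) to an IPv4 address in text form.
// Returns nullopt when resolution fails or yields no IPv4 address.
std::optional<std::string> ResolveIPv4(const std::string& host) {
  addrinfo hints{};
  hints.ai_family = AF_INET;        // restrict results to IPv4
  hints.ai_socktype = SOCK_STREAM;  // TCP; avoids one entry per socket type

  addrinfo* results = nullptr;
  if (getaddrinfo(host.c_str(), nullptr, &hints, &results) != 0) return std::nullopt;

  std::optional<std::string> resolved;
  for (addrinfo* p = results; p != nullptr; p = p->ai_next) {
    const auto* sin = reinterpret_cast<const sockaddr_in*>(p->ai_addr);
    char buf[INET_ADDRSTRLEN];
    if (inet_ntop(AF_INET, &sin->sin_addr, buf, sizeof(buf)) != nullptr) {
      resolved = std::string(buf);  // assumed policy: take the first usable address
      break;
    }
  }

  freeaddrinfo(results);
  return resolved;
}
```

Because `getaddrinfo` also accepts numeric strings, a lookup like this lets `TB_MASTER_ADDR` hold either a host name or an IPv4 address through the same code path.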