Reformat a2asweep output to match gfxsweep style #272
Merged
nileshnegi merged 2 commits into candidate on Apr 27, 2026
- Default BLOCKSIZES changed from {256} to {256,512,768,1024}
- Single unified table: header printed once, blockSize is a row column
- USE_HIP_EVENTS controls timing mode (preset default: 1 = GPU-event timed)
  - USE_HIP_EVENTS=1: GPU-event-timed per-executor minBw; SHOW_MIN_ONLY=0 adds a maxBw column per SE
  - USE_HIP_EVENTS=0: CPU wall-clock avgTotalBandwidthGbPerSec
- Banner and best-result summary reflect the active timing mode
- Results map keyed by (blockSize, numSes, unroll) for verbose output
- Increase column width for output values
- Fixed [WARN} typo
Co-authored-by: Claude <claude@anthropic.com>
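The timing-mode switch described in this commit message could be sketched roughly as follows. This is a minimal illustration, not the preset's actual code: `EnvInt`, `TimingMode`, `ReadTimingMode`, and `BandwidthColumns` are hypothetical names, and the default of 1 for SHOW_MIN_ONLY is an assumption (only the USE_HIP_EVENTS default is stated above).

```cpp
#include <cstdlib>
#include <string>

// Reads an integer environment variable, falling back to defaultValue when unset.
int EnvInt(const char* name, int defaultValue) {
  const char* value = std::getenv(name);
  return value ? std::atoi(value) : defaultValue;
}

// Hypothetical container for the two switches named in the commit message.
struct TimingMode {
  bool useHipEvents;  // true: GPU-event timed, false: CPU wall-clock
  bool showMinOnly;   // false: also report a maxBw column per SE
};

TimingMode ReadTimingMode() {
  TimingMode mode;
  mode.useHipEvents = EnvInt("USE_HIP_EVENTS", 1) != 0;  // preset default per the commit message
  mode.showMinOnly  = EnvInt("SHOW_MIN_ONLY", 1) != 0;   // default value is an assumption
  return mode;
}

// Names of the value columns reported for the active timing mode.
std::string BandwidthColumns(const TimingMode& mode) {
  if (!mode.useHipEvents) return "avgTotalBandwidthGbPerSec";
  return mode.showMinOnly ? "minBw" : "minBw,maxBw";
}
```

A banner or best-result summary can then call `BandwidthColumns(ReadTimingMode())` once so that every part of the output reports the same mode.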
Pull request overview
This PR updates the a2asweep preset output formatting to align more closely with the gfxsweep style, including a unified table layout and an end-of-run “best result” summary, while also expanding the default blocksize sweep set.
Changes:
- Default USE_HIP_EVENTS behavior for a2asweep and reflect timing mode in the banner/output.
- Reformat sweep output into a single table with CSV-aware separators and a “best bandwidth” summary block.
- Expand default BLOCKSIZES from {256} to {256,512,768,1024}.
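The single-table layout listed above, with the header printed once and blockSize carried as a row column, might look something like the sketch below. It is an assumption-laden illustration: `SweepRow`, `FormatSweepTable`, the `SE=` header labels, the ` | ` text separator, and the 9-character column width are all invented here and are not taken from the PR's code.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical row: one (blockSize, unroll) combination with one bandwidth
// value per swept sub-executor count.
struct SweepRow {
  int blockSize;
  int unroll;
  std::vector<double> bw;
};

// Pads text cells in table mode; emits them unpadded in CSV mode.
static std::string TextCell(const std::string& s, bool csv) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), csv ? "%s" : "%9s", s.c_str());
  return buf;
}

// Formats a bandwidth value with two decimals, padded only in table mode.
static std::string BwCell(double v, bool csv) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), csv ? "%.2f" : "%9.2f", v);
  return buf;
}

// Builds the whole table: one header line, then one line per row.
std::string FormatSweepTable(const std::vector<int>& seCounts,
                             const std::vector<SweepRow>& rows, bool csv) {
  const std::string sep = csv ? "," : " | ";
  std::string out = TextCell("blockSize", csv) + sep + TextCell("unroll", csv);
  for (int se : seCounts) out += sep + TextCell("SE=" + std::to_string(se), csv);
  out += "\n";
  for (const SweepRow& row : rows) {
    assert(row.bw.size() == seCounts.size());  // one value per SE column
    out += TextCell(std::to_string(row.blockSize), csv) + sep +
           TextCell(std::to_string(row.unroll), csv);
    for (double v : row.bw) out += sep + BwCell(v, csv);
    out += "\n";
  }
  return out;
}
```

Because the header is emitted before the row loop, adding more block sizes only adds rows, which keeps CSV output directly loadable as one table.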
- Fix verbose block using stale numSubExecs: copy transfers per (blockSize,c,unroll) combination before calling PrintResults so subexec count matches stored result
- Gate results map insertion on verbose flag to avoid storing all TestResults when VERBOSE=0
- Guard best-result summary block on bestBlock != -1 to suppress misleading -1 output if all RunTransfers calls fail
- Widen value columns from %7.2f to %8.2f to accommodate 4-digit GB/s values
- Add note on spray targetCount asymmetry for non-uniform A2A_DIRECT topologies

Co-authored-by: Claude <claude@anthropic.com>
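Two of the fixes in this commit, gating the results map on the verbose flag and guarding the summary on `bestBlock != -1`, can be illustrated with a small bookkeeping sketch. The names `SweepState`, `Record`, and `BestSummary` are hypothetical, the map stores a plain bandwidth value instead of a full TestResults object, and the summary wording is invented; only the (blockSize, numSes, unroll) key, the -1 sentinel, and the `%8.2f` width come from the commit messages.

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <tuple>

using SweepKey = std::tuple<int, int, int>;  // (blockSize, numSes, unroll)

struct SweepState {
  std::map<SweepKey, double> verboseResults;  // filled only when verbose is on
  int bestBlock = -1;                         // -1 means no run has succeeded yet
  int bestNumSes = -1;
  int bestUnroll = -1;
  double bestBw = 0.0;
};

// Records one sweep point. Failed runs change nothing, and results are
// stored in the map only when verbose output will actually need them.
void Record(SweepState& state, bool verbose, int blockSize, int numSes,
            int unroll, bool runOk, double bw) {
  if (!runOk) return;
  if (verbose) {
    state.verboseResults[std::make_tuple(blockSize, numSes, unroll)] = bw;
  }
  if (state.bestBlock == -1 || bw > state.bestBw) {
    state.bestBlock = blockSize;
    state.bestNumSes = numSes;
    state.bestUnroll = unroll;
    state.bestBw = bw;
  }
}

// Returns an empty string when every run failed, so no "-1" line is printed.
std::string BestSummary(const SweepState& state) {
  if (state.bestBlock == -1) return "";
  char buf[128];
  std::snprintf(buf, sizeof(buf),
                "Best bandwidth: %8.2f GB/s (blockSize=%d, numSes=%d, unroll=%d)\n",
                state.bestBw, state.bestBlock, state.bestNumSes, state.bestUnroll);
  return buf;
}
```

On the width change: a value such as 1234.50 already occupies seven characters, so `%7.2f` leaves no leading space before it, while `%8.2f` keeps one character of separation from the previous column.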
gilbertlee-amd approved these changes Apr 27, 2026
nileshnegi added a commit that referenced this pull request May 2, 2026
- Initial pod communication support (#235)
- cuda + MNNVL update & pod presets (#241)
- Increase CQ size for high qps (#244)
- fix hang when NVML is present but fabricmanager isnt (#246)
- Adding nica2a preset (#248)
- Adding HBM read bandwidth preset (#250)
- Pod Ring preset (#251)
- gfxsweep preset (#254) (#256)
- Adding Batched DMA support (hipMemcpyBatchAsync), and bmasweep preset (#255)
- Adding a wallclock consistency detection preset (#258)
- Adding smoketest preset for simple correctness tests (#266)
- Help / envvars / presets presets (#267)
- Modernize CMake build (#268)
- Replace version-based pod/amd-smi detection with compile-time API probes (#269)
- Fix collective mismatch hangs in multi-rank error paths (#270)
- Fix SHOW_ITERATIONS table truncation with multiple transfers per executor (#271)
- Reformat a2asweep output to match gfxsweep style (#272)
- Gfx sweep update (#274)
- Increasing flush frequency in smoketest (#275)
- Adding new experimental copy-only GFX kernel, gfxsweep update (#277)
- Fixes for cuMem compilation and invalid device ordinal (#278)
- Simplifying socket connect, allow for using host address (#279)
- Updating podring to run on single node without need to force single pod (#280)
- Adding SHOW_PERCENTILES to show extra per-iteration statistics (#281)

---------

Co-authored-by: AtlantaPepsi <timhu102@gmail.com>
Co-authored-by: Pak Nin Lui <pak.lui@amd.com>
Co-authored-by: pierreantoineH <PierreAntoine.Harraud@amd.com>
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Motivation
Unify a2asweep output with the better-formatted gfxsweep output. Also adds a summary table at the end.
Technical Details
Test Plan
Test Result
Submission Checklist