Modifying the gfxsweep preset by gilbertlee-amd · Pull Request #256 · ROCm/TransferBench

gilbertlee-amd · 2026-04-15T22:43:24Z

Motivation

Tweak the gfxsweep preset so that its output is easier to import into spreadsheets and makes it easier to track which set of options yields the best results.
Also allow the use of multiple Transfers and wildcards.

Technical Details

Rewrote parts of the preset.

Test Result

Here's example output:

[GFX Sweep Related]
BLOCKSIZES           =            4 : 256,512,768,1024
GFX_TRANSFER         = R0G0->R0G0->R0G0 : GFX Transfer to sweep (see config file format)
NUM_SUB_EXECS        =            5 : 4,8,16,32,64
TEMPORAL_MODES       =            1 : 0
UNROLLS              =            4 : 1,2,4,8
WAVE_ORDERS          =            1 : 0
WORDSIZES            =            1 : 4
VERBOSE              =            0 : Display summary only

GFX sweep: (1048576 bytes per Transfer). All values are CPU-timed GB/s
=======================================================================================
Transfer     0: (G0->G0->G0)
=======================================================================================
 WvO   WSz   TpM   BlkS   UnR   SE 004  SE 008  SE 016  SE 032  SE 064
  0     4     0     256     1     5.38    6.00    7.79    6.76    4.22
  0     4     0     256     2    11.81    9.23    9.50    6.84    4.64
  0     4     0     256     4    12.05   11.29    8.98    6.44    4.58
  0     4     0     256     8    12.01   11.25    9.52    6.48    4.59
  0     4     0     512     1    11.75   10.65    9.33    6.80    4.63
  0     4     0     512     2    12.27   10.95    9.70    6.66    4.59
  0     4     0     512     4    11.76   11.26    8.54    6.62    4.58
  0     4     0     512     8    10.66   11.44    9.46    7.01    4.51
  0     4     0     768     1    10.09   11.37    8.69    6.91    4.65
  0     4     0     768     2    12.07   11.38    9.03    6.81    4.63
  0     4     0     768     4    12.10   11.30    8.97    6.40    4.66
  0     4     0     768     8    12.33   11.28    9.38    7.13    4.52
  0     4     0    1024     1    12.07   10.96    9.37    7.05    4.62
  0     4     0    1024     2    12.82   10.94    9.34    6.81    4.68
  0     4     0    1024     4    12.11   10.72    9.61    6.28    4.65
  0     4     0    1024     8    11.58   10.82    9.17    6.73    4.46
=======================================================================================
Highest bandwidth found:   12.82 GB/s (CPU-timed)
          WaveOrder    :       0
          WordSize     :       4
          Temporal Mode:       0
          BlockSize    :    1024
          Unroll       :       2
          NumSubExec   :       4
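Since the stated goal is spreadsheet import, the table above can be converted to CSV with a short script. The following is a minimal sketch, assuming the layout matches the example output exactly (a header row starting with `WvO`, `=====` separator lines, and `SE nnn` bandwidth columns). The function names `parse_gfxsweep_table` and `rows_to_csv` are hypothetical helpers and are not part of TransferBench.

```python
import csv
import io


def parse_gfxsweep_table(text):
    """Parse gfxsweep table text into a list of dicts, one per table row.

    Assumes the layout shown in the example output; other layouts are skipped.
    """
    rows, header, transfer = [], None, None
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and "=====" separator lines.
        if not stripped or set(stripped) == {"="}:
            continue
        # Remember which Transfer the following rows belong to.
        if stripped.startswith("Transfer"):
            transfer = stripped.partition(":")[2].strip()
            continue
        # Header labels such as "SE 004" contain a space, so join them first.
        if stripped.startswith("WvO"):
            header = stripped.replace("SE ", "SE_").split()
            continue
        fields = stripped.split()
        if header is None or len(fields) != len(header):
            continue  # not a data row (e.g. env-var listing or summary lines)
        try:
            # SE_* columns hold bandwidths (GB/s); the others are integer settings.
            values = [float(v) if h.startswith("SE_") else int(v)
                      for h, v in zip(header, fields)]
        except ValueError:
            continue
        row = {"Transfer": transfer}
        row.update(zip(header, values))
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialize parsed rows to CSV text suitable for spreadsheet import."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Typical use would be to capture the program output to a file, pass its contents to `parse_gfxsweep_table`, and write the result of `rows_to_csv` to a `.csv` file.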

Copilot

Pull request overview

Updates the gfxsweep preset to produce more spreadsheet-friendly tabular output and to support sweeping across a configurable set of GFX kernel parameters while tracking the best-performing combination.

Changes:

Reworked gfxsweep output into a single table with per-configuration rows and per-NUM_SUB_EXECS columns, plus a “best bandwidth” summary.
Switched transfer selection to an env-var-driven string parsed via ParseTransfers; the preset now iterates over multiple transfers when wildcard expansion produces more than one.
Simplified/updated the set of printed “related” environment variables for this preset.
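The sweep structure summarized in this overview (one row per parameter combination, one column per NUM_SUB_EXECS value, plus a running "best" record) can be illustrated with a small sketch. This is not the code in GfxSweep.hpp, which is C++; the `sweep` function and the `measure` callback are hypothetical stand-ins, and the bandwidth values in the usage example are synthetic.

```python
import itertools


def sweep(params, sub_execs, measure):
    """Sweep every combination of `params`, measuring each over `sub_execs`.

    params:    dict mapping a setting name to the list of values to try
    sub_execs: list of sub-executor counts (one output column each)
    measure:   hypothetical callback (config, sub_exec) -> bandwidth in GB/s

    Returns (rows, best), where rows is a list of (config, [bandwidths]) and
    best is (bandwidth, config, sub_exec) for the highest value seen.
    """
    names = list(params)
    rows, best = [], None
    # itertools.product varies the last setting fastest, like nested loops.
    for combo in itertools.product(*(params[n] for n in names)):
        config = dict(zip(names, combo))
        bandwidths = []
        for se in sub_execs:
            bw = measure(config, se)
            bandwidths.append(bw)
            if best is None or bw > best[0]:
                best = (bw, config, se)
        rows.append((config, bandwidths))
    return rows, best


if __name__ == "__main__":
    # Synthetic stand-in for a real timed transfer; values are illustrative only.
    fake = lambda cfg, se: cfg["BlkS"] * cfg["UnR"] / se
    rows, best = sweep({"BlkS": [256, 512], "UnR": [1, 2]}, [4, 8], fake)
    for config, bws in rows:
        print(config, bws)
    print("best:", best)
```

Keeping the sub-executor count as the column axis means each printed row is a complete configuration, which is what makes the table convenient to paste into a spreadsheet.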

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

- Initial pod communication support (#235)
- cuda + MNNVL update & pod presets (#241)
- Increase CQ size for high qps (#244)
- fix hang when NVML is present but fabricmanager isnt (#246)
- Adding nica2a preset (#248)
- Adding HBM read bandwidth preset (#250)
- Pod Ring preset (#251)
- gfxsweep preset (#254) (#256)
- Adding Batched DMA support (hipMemcpyBatchAsync), and bmasweep preset (#255)
- Adding a wallclock consistency detection preset (#258)
- Adding smoketest preset for simple correctness tests (#266)
- Help / envvars / presets presets (#267)
- Modernize CMake build (#268)
- Replace version-based pod/amd-smi detection with compile-time API probes (#269)
- Fix collective mismatch hangs in multi-rank error paths (#270)
- Fix SHOW_ITERATIONS table truncation with multiple transfers per executor (#271)
- Reformat a2asweep output to match gfxsweep style (#272)
- Gfx sweep update (#274)
- Increasing flush frequency in smoketest (#275)
- Adding new experimental copy-only GFX kernel, gfxsweep update (#277)
- Fixes for cuMem compilation and invalid device ordinal (#278)
- Simplifying socket connect, allow for using host address (#279)
- Updating podring to run on single node without need to force single pod (#280)
- Adding SHOW_PERCENTILES to show extra per-iteration statistics (#281)

---------

Co-authored-by: AtlantaPepsi <timhu102@gmail.com>
Co-authored-by: Pak Nin Lui <pak.lui@amd.com>
Co-authored-by: pierreantoineH <PierreAntoine.Harraud@amd.com>
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Modifying the gfxsweep preset

46a1315

gilbertlee-amd requested a review from AtlantaPepsi April 15, 2026 22:43

gilbertlee-amd requested a review from a team as a code owner April 15, 2026 22:43

Flipping unroll/numsubexec typo

b27032b

nileshnegi requested a review from Copilot April 15, 2026 23:07

Copilot started reviewing on behalf of nileshnegi April 15, 2026 23:08

Copilot AI reviewed Apr 15, 2026

Comment thread src/client/Presets/GfxSweep.hpp Outdated

Comment thread src/client/Presets/GfxSweep.hpp Outdated

Comment thread src/client/Presets/GfxSweep.hpp Outdated

alex-breslow-amd reviewed Apr 15, 2026

Comment thread src/client/Presets/GfxSweep.hpp

alex-breslow-amd approved these changes Apr 15, 2026

Update src/client/Presets/GfxSweep.hpp

ccce8bf

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

AtlantaPepsi approved these changes Apr 15, 2026

gilbertlee-amd added 2 commits April 16, 2026 22:41

Fixing HIP version check

1cfd8bd

Adding ability to specify multiple Transfers, adding flush

ce53433

gilbertlee-amd merged commit 2aa036c into ROCm:candidate Apr 17, 2026
4 checks passed

gilbertlee-amd deleted the gfxSweepUpdate branch April 17, 2026 16:06

nileshnegi mentioned this pull request Apr 27, 2026

TransferBench v1.67.0 #273 (Open, 1 task)

Modifying the gfxsweep preset #256
gilbertlee-amd merged 5 commits into ROCm:candidate from gilbertlee-amd:gfxSweepUpdate

4 participants
