Skip to content

gfxsweep draft#254

Merged
AtlantaPepsi merged 3 commits intoROCm:candidatefrom
AtlantaPepsi:sweepPreset
Apr 10, 2026
Merged

gfxsweep draft#254
AtlantaPepsi merged 3 commits intoROCm:candidatefrom
AtlantaPepsi:sweepPreset

Conversation

@AtlantaPepsi
Copy link
Copy Markdown
Contributor

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

@AtlantaPepsi AtlantaPepsi requested a review from a team as a code owner April 10, 2026 17:11
@AtlantaPepsi AtlantaPepsi requested review from Copilot and removed request for a team April 10, 2026 23:03
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a new gfxsweep preset to TransferBench for sweeping GFX executor tuning parameters over a single user-specified transfer.

Changes:

  • Introduces GfxSweepPreset (new preset implementation) with env-var controlled sweeps (e.g., block size, unroll, sub-execs, etc.).
  • Registers the new preset in the preset dispatch map so it’s available via ./TransferBench gfxsweep.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.

File Description
src/client/Presets/Presets.hpp Registers the new gfxsweep preset and exposes it in the preset list.
src/client/Presets/GfxSweep.hpp Implements the sweep logic, parsing a single transfer spec and iterating across tuning parameter grids.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/client/Presets/Presets.hpp
Comment thread src/client/Presets/GfxSweep.hpp Outdated
Comment thread src/client/Presets/GfxSweep.hpp Outdated
Comment thread src/client/Presets/GfxSweep.hpp
Comment thread src/client/Presets/GfxSweep.hpp
Comment thread src/client/Presets/GfxSweep.hpp Outdated
Comment thread src/client/Presets/GfxSweep.hpp
Comment thread src/client/Presets/GfxSweep.hpp Outdated
@AtlantaPepsi AtlantaPepsi merged commit 2900b4e into ROCm:candidate Apr 10, 2026
1 check was pending
@nileshnegi nileshnegi mentioned this pull request Apr 27, 2026
1 task
nileshnegi added a commit that referenced this pull request May 2, 2026
- Initial pod communication support (#235)
- cuda + MNNVL update & pod presets (#241)
- Increase CQ size for high qps (#244)
- fix hang when NVML is present but fabricmanager isnt (#246)
- Adding nica2a preset  (#248)
- Adding HBM read bandwidth preset (#250)
- Pod Ring preset (#251)
- gfxsweep preset (#254) (#256)
- Adding Batched DMA support (hipMemcpyBatchAsync), and bmasweep preset (#255)
- Adding a wallclock consistency detection preset (#258)
- Adding smoketest preset for simple correctness tests (#266)
- Help / envvars / presets presets (#267)
- Modernize CMake build (#268)
- Replace version-based pod/amd-smi detection with compile-time API probes (#269)
- Fix collective mismatch hangs in multi-rank error paths (#270)
- Fix SHOW_ITERATIONS table truncation with multiple transfers per executor (#271)
- Reformat a2asweep output to match gfxsweep style (#272)
- Gfx sweep update (#274)
- Increasing flush frequency in smoketest (#275)
- Adding new experimental copy-only GFX kernel, gfxsweep update (#277)
- Fixes for cuMem compilation and invalid device ordinal (#278)
- Simplifying socket connect, allow for using host address (#279)
- Updating podring to run on single node without need to force single pod (#280)
- Adding SHOW_PERCENTILES to show extra per-iteration statistics (#281)

---------

Co-authored-by: AtlantaPepsi <timhu102@gmail.com>
Co-authored-by: Pak Nin Lui <pak.lui@amd.com>
Co-authored-by: pierreantoineH <PierreAntoine.Harraud@amd.com>
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
nileshnegi added a commit that referenced this pull request May 2, 2026
- Initial pod communication support (#235)
- cuda + MNNVL update & pod presets (#241)
- Increase CQ size for high qps (#244)
- fix hang when NVML is present but fabricmanager isnt (#246)
- Adding nica2a preset  (#248)
- Adding HBM read bandwidth preset (#250)
- Pod Ring preset (#251)
- gfxsweep preset (#254) (#256)
- Adding Batched DMA support (hipMemcpyBatchAsync), and bmasweep preset (#255)
- Adding a wallclock consistency detection preset (#258)
- Adding smoketest preset for simple correctness tests (#266)
- Help / envvars / presets presets (#267)
- Modernize CMake build (#268)
- Replace version-based pod/amd-smi detection with compile-time API probes (#269)
- Fix collective mismatch hangs in multi-rank error paths (#270)
- Fix SHOW_ITERATIONS table truncation with multiple transfers per executor (#271)
- Reformat a2asweep output to match gfxsweep style (#272)
- Gfx sweep update (#274)
- Increasing flush frequency in smoketest (#275)
- Adding new experimental copy-only GFX kernel, gfxsweep update (#277)
- Fixes for cuMem compilation and invalid device ordinal (#278)
- Simplifying socket connect, allow for using host address (#279)
- Updating podring to run on single node without need to force single pod (#280)
- Adding SHOW_PERCENTILES to show extra per-iteration statistics (#281)

---------

Co-authored-by: Tim <43156029+AtlantaPepsi@users.noreply.github.com>
Co-authored-by: Pak Nin Lui <pak.lui@amd.com>
Co-authored-by: pierreantoineH <PierreAntoine.Harraud@amd.com>
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants