Skip to content

Modernize CMake build#268

Merged
nileshnegi merged 9 commits intocandidatefrom
users/nileshnegi/update-cmake-build
Apr 27, 2026
Merged

Modernize CMake build#268
nileshnegi merged 9 commits intocandidatefrom
users/nileshnegi/update-cmake-build

Conversation

@nileshnegi
Copy link
Copy Markdown
Collaborator

@nileshnegi nileshnegi commented Apr 26, 2026

Motivation

Improve the CMake and Makefile build system for TransferBench. The core idea is making CMake work correctly out-of-the-box (i.e., no flags required), while tightening the correctness of every feature-detection probe.

Technical Details

  • Bump cmake_minimum_required to 3.16 (required for MPI::MPI_CXX and hip:: targets)
  • Remove toolchain-linux.cmake; Move ROCM_PATH, compiler, and flag detection into CMakeLists (as users rarely explicitly point to the toolchain file)
  • Fix GPU_TARGETS seeding: respect AMDGPU_TARGETS, use CACHE STRING with FORCE so re-runs don't revert to defaults
  • Add cmake_push_check_state/pop around check_symbol_exists calls to prevent CMAKE_REQUIRED_* leaking between checks
  • Fix HSA find_library to use NO_DEFAULT_PATH and search lib64 as well
  • Fix MPI_CXX_INCLUDE_PATH typo -> MPI_CXX_INCLUDE_DIRS
  • Remove redundant double include of cmake/Dependencies.cmake
  • Modernize target_link_libraries: use target_link_options for -fgpu-rdc, consolidate hip::host/device/dl, guard numa and hsa-runtime64 with if(TARGET)
  • Compact target_include_directories into a single call
  • Move PACKAGE_NAME/LIBRARY_NAME/CMAKE_RUNTIME_OUTPUT_DIRECTORY before add_executable

Test Plan

Test Result

Submission Checklist

Copilot AI review requested due to automatic review settings April 26, 2026 08:15
@nileshnegi nileshnegi requested a review from a team as a code owner April 26, 2026 08:15
@nileshnegi nileshnegi force-pushed the users/nileshnegi/update-cmake-build branch from 9a9ff6c to 0f13c28 Compare April 26, 2026 08:16
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Modernizes TransferBench’s CMake build to better support ROCm/HIP tooling, GPU target selection, and dependency discovery while reducing fragile hard-coded paths.

Changes:

  • Updated CMake minimum version and reworked ROCm/HIP discovery and search paths to avoid accidentally selecting the wrong ROCm.
  • Reworked GPU target seeding/validation (AMDGPU_TARGETS support, cache seeding/forcing, and rocm_check_target_ids filtering).
  • Modernized linking/include handling (push/pop check state, imported targets for optional deps, target_link_options usage, and cleanup of redundant includes/typos).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
toolchain-linux.cmake Rewritten ROCm/toolchain detection (ROCM_PATH + compiler selection) and consolidated default build flags.
CMakeLists.txt Raises CMake minimum, improves ROCm fallback/search logic, stabilizes GPU target cache behavior, and modernizes target wiring/linking.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread CMakeLists.txt Outdated
Comment thread toolchain-linux.cmake Outdated
Comment thread CMakeLists.txt Outdated
Comment thread CMakeLists.txt Outdated
Comment thread toolchain-linux.cmake Outdated
nileshnegi and others added 2 commits April 26, 2026 03:44
…ection

- Bump cmake_minimum_required to 3.16 (required for MPI::MPI_CXX and hip:: targets)
- Rewrite toolchain-linux.cmake with macros for compiler and flag detection;
  auto-detects ROCM_PATH from PATH and /opt/rocm fallback
- Fix GPU_TARGETS seeding: respect AMDGPU_TARGETS, use CACHE STRING with FORCE
  so re-runs don't revert to defaults
- Move ROCM_PATH fallback into CMakeLists.txt so builds without the toolchain
  still find ROCm via $ROCM_PATH env or /opt/rocm
- Replace unconditional /opt/rocm prefix paths with conditional: only append
  /opt/rocm fallbacks when ROCM_PATH differs to avoid wrong ROCm being picked
- Add cmake_push_check_state/pop around check_symbol_exists calls to prevent
  CMAKE_REQUIRED_* leaking between checks
- Fix HSA find_library to use NO_DEFAULT_PATH and search lib64 as well
- Fix MPI_CXX_INCLUDE_PATH typo -> MPI_CXX_INCLUDE_DIRS
- Remove redundant double include of cmake/Dependencies.cmake
- Modernize target_link_libraries: use target_link_options for -fgpu-rdc,
  consolidate hip::host/device/dl, guard numa and hsa-runtime64 with if(TARGET)
- Compact target_include_directories into a single call
- Move PACKAGE_NAME/LIBRARY_NAME/CMAKE_RUNTIME_OUTPUT_DIRECTORY before add_executable

Co-authored-by: Claude <claude@anthropic.com>
…g, hint unset

- CMakeLists.txt: use `if(NOT ROCM_PATH)` instead of `if(NOT DEFINED ROCM_PATH)`
  so an explicitly empty -DROCM_PATH= is treated as unset and triggers the fallback
- CMakeLists.txt: quote `"${ROCM_PATH}"` in set(ENV{ROCM_PATH} ...) for path-with-spaces safety
- toolchain-linux.cmake: add bare `unset(_rocm_bin_hint)` alongside `unset(_rocm_bin_hint CACHE)`
  since find_program writes both a normal variable and a cache entry
nileshnegi and others added 2 commits April 26, 2026 15:13
Replace plain set(CMAKE_CXX_FLAGS_DEBUG ...) calls in tb_set_build_flags
with the CMake-idiomatic CMAKE_<LANG>_FLAGS_<CONFIG>_INIT variables.

_INIT variables are read by CMake when creating cache entries on first
configure, so the cache correctly reflects the actual flags used.
The previous approach set normal variables that shadowed the cache,
leaving ccmake/cmake-gui showing stale values. The if(NOT
CMAKE_CXX_FLAGS_DEBUG) guards are dropped — _INIT vars are already
no-ops when the cache entry exists, which is the same behavior.

Co-authored-by: Claude <claude@anthropic.com>
Copilot AI review requested due to automatic review settings April 26, 2026 20:15
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread CMakeLists.txt Outdated
Comment thread CMakeLists.txt
Comment thread CMakeLists.txt Outdated
Comment thread CMakeLists.txt Outdated
Comment thread CMakeLists.txt Outdated
- AMDGPU_TARGETS: add ENV{AMDGPU_TARGETS} fallback so setting the env
  var (common HIP practice) seeds GPU_TARGETS on first configure,
  not just the CMake variable form.

- parallel-jobs: include(CheckCXXCompilerFlag) and add
  check_cxx_compiler_flag(-parallel-jobs=12 HAVE_PARALLEL_JOBS) before
  the conditional. HAVE_PARALLEL_JOBS was never set so the flag was
  never applied.

- DISABLE_DMABUF → DISABLE_DMA_BUF: align CMake env var name with
  the Makefile's public interface.

- AMD_SMI: add find_library/find_path for amd_smi inside the version-
  check success branch so AMD_SMI_LIBRARY and AMD_SMI_INCLUDE_DIR are
  populated before the target_link_libraries/target_include_directories
  block. Previously both variables were always empty, causing silent
  link failure with ENABLE_AMD_SMI=ON.

- NUMA/HSA: move FATAL_ERROR to the find_library/find_path site so
  configure fails fast with a clear diagnostic. Remove late TARGET
  guards near target_link_libraries; link calls are now unconditional
  since a missing library aborts configure earlier.

Co-authored-by: Claude <claude@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread CMakeLists.txt Outdated
Comment thread CMakeLists.txt
nileshnegi and others added 2 commits April 26, 2026 17:46
toolchain-linux.cmake is only loaded when -DCMAKE_TOOLCHAIN_FILE= is passed on
the very first configure run; the set(CMAKE_TOOLCHAIN_FILE ...) inside
CMakeLists.txt fires too late for CMake to honour it on a fresh build directory.

Move all toolchain logic (ROCM_PATH detection, compiler selection, per-config
build flags) into a pre-project() block in CMakeLists.txt where it executes
unconditionally on every configure. The detection priority is preserved:

  -DROCM_PATH / $ROCM_PATH env > PATH walk (amdclang++/clang++) > /opt/rocm

Compiler selection still respects -DCMAKE_CXX_COMPILER and $CXX/$CC; build
flags use _INIT variables so cache entries are visible in cmake-gui and user
overrides via -DCMAKE_CXX_FLAGS_<CONFIG>=... or $CXXFLAGS/$CFLAGS win.

Remove the now-redundant post-project() ROCM_PATH fallback block and delete
toolchain-linux.cmake entirely.

Co-authored-by: Claude <claude@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

CMakeLists.txt:438

  • An imported ibverbs target is created earlier, but the executable still links via raw ${IBVERBS_LIBRARY} and manually adds ${IBVERBS_INCLUDE_DIR}. This duplicates dependency wiring and makes it easier for include/link settings to drift. Prefer linking TransferBench against the imported ibverbs target (and drop the manual include dir), or remove the unused imported target and keep the variable-based approach.
if(IBVERBS_FOUND)
  target_include_directories(TransferBench PRIVATE ${IBVERBS_INCLUDE_DIR})
  target_link_libraries(TransferBench PRIVATE ${IBVERBS_LIBRARY})
  target_compile_definitions(TransferBench PRIVATE NIC_EXEC_ENABLED)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread CMakeLists.txt
Comment thread CMakeLists.txt Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 26, 2026 23:39
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread CMakeLists.txt Outdated
Comment thread CMakeLists.txt
Remove the /opt/rocm fallback from CMAKE_PREFIX_PATH. When ROCM_PATH is a
non-default path, appending /opt/rocm* silently mixes config packages and
headers from a different ROCm install, leading to hard-to-diagnose ABI
mismatches. ROCM_PATH is already fully resolved by the pre-project() detection
block, so CMAKE_PREFIX_PATH only needs to cover that single install.

Also extend the MPI manual-path find_library search to include lib64 alongside
lib, matching the lib/lib64 handling used for other dependencies.

Co-authored-by: Claude <claude@anthropic.com>
@nileshnegi nileshnegi merged commit 59fbe56 into candidate Apr 27, 2026
3 of 4 checks passed
@nileshnegi nileshnegi deleted the users/nileshnegi/update-cmake-build branch April 27, 2026 00:23
@nileshnegi nileshnegi mentioned this pull request Apr 27, 2026
1 task
nileshnegi added a commit that referenced this pull request May 2, 2026
- Initial pod communication support (#235)
- cuda + MNNVL update & pod presets (#241)
- Increase CQ size for high qps (#244)
- fix hang when NVML is present but fabricmanager isnt (#246)
- Adding nica2a preset  (#248)
- Adding HBM read bandwidth preset (#250)
- Pod Ring preset (#251)
- gfxsweep preset (#254) (#256)
- Adding Batched DMA support (hipMemcpyBatchAsync), and bmasweep preset (#255)
- Adding a wallclock consistency detection preset (#258)
- Adding smoketest preset for simple correctness tests (#266)
- Help / envvars / presets presets (#267)
- Modernize CMake build (#268)
- Replace version-based pod/amd-smi detection with compile-time API probes (#269)
- Fix collective mismatch hangs in multi-rank error paths (#270)
- Fix SHOW_ITERATIONS table truncation with multiple transfers per executor (#271)
- Reformat a2asweep output to match gfxsweep style (#272)
- Gfx sweep update (#274)
- Increasing flush frequency in smoketest (#275)
- Adding new experimental copy-only GFX kernel, gfxsweep update (#277)
- Fixes for cuMem compilation and invalid device ordinal (#278)
- Simplifying socket connect, allow for using host address (#279)
- Updating podring to run on single node without need to force single pod (#280)
- Adding SHOW_PERCENTILES to show extra per-iteration statistics (#281)

---------

Co-authored-by: AtlantaPepsi <timhu102@gmail.com>
Co-authored-by: Pak Nin Lui <pak.lui@amd.com>
Co-authored-by: pierreantoineH <PierreAntoine.Harraud@amd.com>
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
nileshnegi added a commit that referenced this pull request May 2, 2026
- Initial pod communication support (#235)
- cuda + MNNVL update & pod presets (#241)
- Increase CQ size for high qps (#244)
- fix hang when NVML is present but fabricmanager isnt (#246)
- Adding nica2a preset  (#248)
- Adding HBM read bandwidth preset (#250)
- Pod Ring preset (#251)
- gfxsweep preset (#254) (#256)
- Adding Batched DMA support (hipMemcpyBatchAsync), and bmasweep preset (#255)
- Adding a wallclock consistency detection preset (#258)
- Adding smoketest preset for simple correctness tests (#266)
- Help / envvars / presets presets (#267)
- Modernize CMake build (#268)
- Replace version-based pod/amd-smi detection with compile-time API probes (#269)
- Fix collective mismatch hangs in multi-rank error paths (#270)
- Fix SHOW_ITERATIONS table truncation with multiple transfers per executor (#271)
- Reformat a2asweep output to match gfxsweep style (#272)
- Gfx sweep update (#274)
- Increasing flush frequency in smoketest (#275)
- Adding new experimental copy-only GFX kernel, gfxsweep update (#277)
- Fixes for cuMem compilation and invalid device ordinal (#278)
- Simplifying socket connect, allow for using host address (#279)
- Updating podring to run on single node without need to force single pod (#280)
- Adding SHOW_PERCENTILES to show extra per-iteration statistics (#281)

---------

Co-authored-by: Tim <43156029+AtlantaPepsi@users.noreply.github.com>
Co-authored-by: Pak Nin Lui <pak.lui@amd.com>
Co-authored-by: pierreantoineH <PierreAntoine.Harraud@amd.com>
Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants