Modernize CMake build#268
Conversation
9a9ff6c to
0f13c28
Compare
There was a problem hiding this comment.
Pull request overview
Modernizes TransferBench’s CMake build to better support ROCm/HIP tooling, GPU target selection, and dependency discovery while reducing fragile hard-coded paths.
Changes:
- Updated CMake minimum version and reworked ROCm/HIP discovery and search paths to avoid accidentally selecting the wrong ROCm.
- Reworked GPU target seeding/validation (AMDGPU_TARGETS support, cache seeding/forcing, and
rocm_check_target_idsfiltering). - Modernized linking/include handling (push/pop check state, imported targets for optional deps,
target_link_optionsusage, and cleanup of redundant includes/typos).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| toolchain-linux.cmake | Rewritten ROCm/toolchain detection (ROCM_PATH + compiler selection) and consolidated default build flags. |
| CMakeLists.txt | Raises CMake minimum, improves ROCm fallback/search logic, stabilizes GPU target cache behavior, and modernizes target wiring/linking. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ection - Bump cmake_minimum_required to 3.16 (required for MPI::MPI_CXX and hip:: targets) - Rewrite toolchain-linux.cmake with macros for compiler and flag detection; auto-detects ROCM_PATH from PATH and /opt/rocm fallback - Fix GPU_TARGETS seeding: respect AMDGPU_TARGETS, use CACHE STRING with FORCE so re-runs don't revert to defaults - Move ROCM_PATH fallback into CMakeLists.txt so builds without the toolchain still find ROCm via $ROCM_PATH env or /opt/rocm - Replace unconditional /opt/rocm prefix paths with conditional: only append /opt/rocm fallbacks when ROCM_PATH differs to avoid wrong ROCm being picked - Add cmake_push_check_state/pop around check_symbol_exists calls to prevent CMAKE_REQUIRED_* leaking between checks - Fix HSA find_library to use NO_DEFAULT_PATH and search lib64 as well - Fix MPI_CXX_INCLUDE_PATH typo -> MPI_CXX_INCLUDE_DIRS - Remove redundant double include of cmake/Dependencies.cmake - Modernize target_link_libraries: use target_link_options for -fgpu-rdc, consolidate hip::host/device/dl, guard numa and hsa-runtime64 with if(TARGET) - Compact target_include_directories into a single call - Move PACKAGE_NAME/LIBRARY_NAME/CMAKE_RUNTIME_OUTPUT_DIRECTORY before add_executable Co-authored-by: Claude <claude@anthropic.com>
…g, hint unset
- CMakeLists.txt: use `if(NOT ROCM_PATH)` instead of `if(NOT DEFINED ROCM_PATH)`
so an explicitly empty -DROCM_PATH= is treated as unset and triggers the fallback
- CMakeLists.txt: quote `"${ROCM_PATH}"` in set(ENV{ROCM_PATH} ...) for path-with-spaces safety
- toolchain-linux.cmake: add bare `unset(_rocm_bin_hint)` alongside `unset(_rocm_bin_hint CACHE)`
since find_program writes both a normal variable and a cache entry
0f13c28 to
4c9bba9
Compare
Replace plain set(CMAKE_CXX_FLAGS_DEBUG ...) calls in tb_set_build_flags with the CMake-idiomatic CMAKE_<LANG>_FLAGS_<CONFIG>_INIT variables. _INIT variables are read by CMake when creating cache entries on first configure, so the cache correctly reflects the actual flags used. The previous approach set normal variables that shadowed the cache, leaving ccmake/cmake-gui showing stale values. The if(NOT CMAKE_CXX_FLAGS_DEBUG) guards are dropped — _INIT vars are already no-ops when the cache entry exists, which is the same behavior. Co-authored-by: Claude <claude@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- AMDGPU_TARGETS: add ENV{AMDGPU_TARGETS} fallback so setting the env
var (common HIP practice) seeds GPU_TARGETS on first configure,
not just the CMake variable form.
- parallel-jobs: include(CheckCXXCompilerFlag) and add
check_cxx_compiler_flag(-parallel-jobs=12 HAVE_PARALLEL_JOBS) before
the conditional. HAVE_PARALLEL_JOBS was never set so the flag was
never applied.
- DISABLE_DMABUF → DISABLE_DMA_BUF: align CMake env var name with
the Makefile's public interface.
- AMD_SMI: add find_library/find_path for amd_smi inside the version-
check success branch so AMD_SMI_LIBRARY and AMD_SMI_INCLUDE_DIR are
populated before the target_link_libraries/target_include_directories
block. Previously both variables were always empty, causing silent
link failure with ENABLE_AMD_SMI=ON.
- NUMA/HSA: move FATAL_ERROR to the find_library/find_path site so
configure fails fast with a clear diagnostic. Remove late TARGET
guards near target_link_libraries; link calls are now unconditional
since a missing library aborts configure earlier.
Co-authored-by: Claude <claude@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
toolchain-linux.cmake is only loaded when -DCMAKE_TOOLCHAIN_FILE= is passed on the very first configure run; the set(CMAKE_TOOLCHAIN_FILE ...) inside CMakeLists.txt fires too late for CMake to honour it on a fresh build directory. Move all toolchain logic (ROCM_PATH detection, compiler selection, per-config build flags) into a pre-project() block in CMakeLists.txt where it executes unconditionally on every configure. The detection priority is preserved: -DROCM_PATH / $ROCM_PATH env > PATH walk (amdclang++/clang++) > /opt/rocm Compiler selection still respects -DCMAKE_CXX_COMPILER and $CXX/$CC; build flags use _INIT variables so cache entries are visible in cmake-gui and user overrides via -DCMAKE_CXX_FLAGS_<CONFIG>=... or $CXXFLAGS/$CFLAGS win. Remove the now-redundant post-project() ROCM_PATH fallback block and delete toolchain-linux.cmake entirely. Co-authored-by: Claude <claude@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
CMakeLists.txt:438
- An imported
ibverbstarget is created earlier, but the executable still links via raw${IBVERBS_LIBRARY}and manually adds${IBVERBS_INCLUDE_DIR}. This duplicates dependency wiring and makes it easier for include/link settings to drift. Prefer linkingTransferBenchagainst the importedibverbstarget (and drop the manual include dir), or remove the unused imported target and keep the variable-based approach.
if(IBVERBS_FOUND)
target_include_directories(TransferBench PRIVATE ${IBVERBS_INCLUDE_DIR})
target_link_libraries(TransferBench PRIVATE ${IBVERBS_LIBRARY})
target_compile_definitions(TransferBench PRIVATE NIC_EXEC_ENABLED)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Remove the /opt/rocm fallback from CMAKE_PREFIX_PATH. When ROCM_PATH is a non-default path, appending /opt/rocm* silently mixes config packages and headers from a different ROCm install, leading to hard-to-diagnose ABI mismatches. ROCM_PATH is already fully resolved by the pre-project() detection block, so CMAKE_PREFIX_PATH only needs to cover that single install. Also extend the MPI manual-path find_library search to include lib64 alongside lib, matching the lib/lib64 handling used for other dependencies. Co-authored-by: Claude <claude@anthropic.com>
- Initial pod communication support (#235) - cuda + MNNVL update & pod presets (#241) - Increase CQ size for high qps (#244) - fix hang when NVML is present but fabricmanager isnt (#246) - Adding nica2a preset (#248) - Adding HBM read bandwidth preset (#250) - Pod Ring preset (#251) - gfxsweep preset (#254) (#256) - Adding Batched DMA support (hipMemcpyBatchAsync), and bmasweep preset (#255) - Adding a wallclock consistency detection preset (#258) - Adding smoketest preset for simple correctness tests (#266) - Help / envvars / presets presets (#267) - Modernize CMake build (#268) - Replace version-based pod/amd-smi detection with compile-time API probes (#269) - Fix collective mismatch hangs in multi-rank error paths (#270) - Fix SHOW_ITERATIONS table truncation with multiple transfers per executor (#271) - Reformat a2asweep output to match gfxsweep style (#272) - Gfx sweep update (#274) - Increasing flush frequency in smoketest (#275) - Adding new experimental copy-only GFX kernel, gfxsweep update (#277) - Fixes for cuMem compilation and invalid device ordinal (#278) - Simplifying socket connect, allow for using host address (#279) - Updating podring to run on single node without need to force single pod (#280) - Adding SHOW_PERCENTILES to show extra per-iteration statistics (#281) --------- Co-authored-by: AtlantaPepsi <timhu102@gmail.com> Co-authored-by: Pak Nin Lui <pak.lui@amd.com> Co-authored-by: pierreantoineH <PierreAntoine.Harraud@amd.com> Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com> Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
- Initial pod communication support (#235) - cuda + MNNVL update & pod presets (#241) - Increase CQ size for high qps (#244) - fix hang when NVML is present but fabricmanager isnt (#246) - Adding nica2a preset (#248) - Adding HBM read bandwidth preset (#250) - Pod Ring preset (#251) - gfxsweep preset (#254) (#256) - Adding Batched DMA support (hipMemcpyBatchAsync), and bmasweep preset (#255) - Adding a wallclock consistency detection preset (#258) - Adding smoketest preset for simple correctness tests (#266) - Help / envvars / presets presets (#267) - Modernize CMake build (#268) - Replace version-based pod/amd-smi detection with compile-time API probes (#269) - Fix collective mismatch hangs in multi-rank error paths (#270) - Fix SHOW_ITERATIONS table truncation with multiple transfers per executor (#271) - Reformat a2asweep output to match gfxsweep style (#272) - Gfx sweep update (#274) - Increasing flush frequency in smoketest (#275) - Adding new experimental copy-only GFX kernel, gfxsweep update (#277) - Fixes for cuMem compilation and invalid device ordinal (#278) - Simplifying socket connect, allow for using host address (#279) - Updating podring to run on single node without need to force single pod (#280) - Adding SHOW_PERCENTILES to show extra per-iteration statistics (#281) --------- Co-authored-by: Tim <43156029+AtlantaPepsi@users.noreply.github.com> Co-authored-by: Pak Nin Lui <pak.lui@amd.com> Co-authored-by: pierreantoineH <PierreAntoine.Harraud@amd.com> Co-authored-by: Nilesh M Negi <Nilesh.Negi@amd.com> Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Motivation
Improve the CMake and Makefile build system for TransferBench. The core idea is making CMake work correctly out-of-the-box (i.e., no flags required), while tightening the correctness of every feature-detection probe.
Technical Details
Test Plan
Test Result
Submission Checklist