Skip to content

[Common] Always define cuBLASMp comm GEMM API#2963

Merged
vcherepanov-nv merged 2 commits into
NVIDIA:mainfrom
vcherepanov-nv:cublasmp-dummy
May 6, 2026
Merged

[Common] Always define cuBLASMp comm GEMM API#2963
vcherepanov-nv merged 2 commits into
NVIDIA:mainfrom
vcherepanov-nv:cublasmp-dummy

Conversation

@vcherepanov-nv
Copy link
Copy Markdown
Collaborator

Description

Make cuBLASMp-related TE API unconditional.
NVTE_WITH_CUBLASMP options now controls whether the user gets the real implementation or subs, emitting runtime errors.

Fixes # (issue)

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • Make comm-gemm API unconditional, stubs emit runtime errors if called without NVTE_WITH_CUBLASMP flag set.

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 6, 2026

Greptile Summary

This PR makes the cuBLASMp comm-GEMM API unconditional: comm_gemm/comm_gemm.cpp is now always compiled, with the real implementation guarded by #ifdef NVTE_WITH_CUBLASMP and a #else block of stubs that throw a descriptive runtime error when called without cuBLASMp support.

  • CMakeLists.txt: comm_gemm/comm_gemm.cpp is moved from a conditional NVTE_WITH_CUBLASMP-gated append to transformer_engine_SOURCES into the unconditional transformer_engine_cpp_sources list, which is already merged into transformer_engine_SOURCES at build time.
  • comm_gemm.cpp: #include <cublasmp.h> and using namespace transformer_engine are now inside the #ifdef NVTE_WITH_CUBLASMP block; a new #else block provides stub definitions for all five public API functions that immediately throw std::runtime_error with a clear build-configuration message.

Confidence Score: 4/5

Safe to merge. The change is a straightforward build-system and conditional-compilation refactor with no logic modifications to the real cuBLASMp path.

The CMakeLists.txt change is correct — transformer_engine_cpp_sources is merged into transformer_engine_SOURCES at line 237, so the file ends up in exactly the right place. The stub implementations cover all five public API functions and use the standard NVTE_ERROR pattern. The only nit is that two non-void stubs (nvte_comm_gemm_ctx_create and nvte_comm_gemm_numroc) omit explicit return statements after NVTE_ERROR, which is harmless in practice since the macro always throws, but may generate warnings under stricter toolchain configurations.

No files require special attention; both changed files are straightforward.

Important Files Changed

Filename Overview
transformer_engine/common/comm_gemm/comm_gemm.cpp Wraps the real cuBLASMp implementation in #ifdef NVTE_WITH_CUBLASMP and adds a #else block with stub functions that throw runtime errors. Two non-void stub functions (nvte_comm_gemm_ctx_create and nvte_comm_gemm_numroc) omit explicit return statements, which may produce warnings on some toolchains.
transformer_engine/common/CMakeLists.txt Moves comm_gemm/comm_gemm.cpp from a conditional transformer_engine_SOURCES append (guarded by NVTE_WITH_CUBLASMP) to the unconditional transformer_engine_cpp_sources list, which is merged into transformer_engine_SOURCES at line 237. The change is correct and consistent with how all other unconditional C++ sources are registered.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[comm_gemm.cpp compiled unconditionally] --> B{NVTE_WITH_CUBLASMP defined?}
    B -- Yes --> C[#include cublasmp.h\nusing namespace transformer_engine\nReal implementation]
    B -- No --> D[Stub implementations\nNVTE_ERROR on all calls]
    C --> E[nvte_comm_gemm_ctx_create\nnvte_comm_gemm_ctx_destroy\nnvte_all_gather_gemm\nnvte_gemm_reduce_scatter\nnvte_gemm_all_reduce\nnvte_comm_gemm_numroc]
    D --> F[Same API surface\nthrows runtime_error]
Loading

Reviews (1): Last reviewed commit: "[pre-commit.ci] auto fixes from pre-comm..." | Re-trigger Greptile

Comment on lines +541 to +543
NVTECommGemmCtx* nvte_comm_gemm_ctx_create(ncclComm_t comm, int nranks, int rank) {
NVTE_ERROR("Transformer Engine has not been built with cuBLASMp support.");
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Both nvte_comm_gemm_ctx_create (returns NVTECommGemmCtx*) and nvte_comm_gemm_numroc (returns int64_t) in the stub block have no explicit return statement after NVTE_ERROR. NVTE_ERROR always throws so execution never reaches the end, but certain compilers or static-analysis passes may still flag these as missing returns. Adding unreachable return values makes the intent explicit and is consistent with many defensive coding patterns for non-[[noreturn]] macros.

Suggested change
NVTECommGemmCtx* nvte_comm_gemm_ctx_create(ncclComm_t comm, int nranks, int rank) {
NVTE_ERROR("Transformer Engine has not been built with cuBLASMp support.");
}
NVTECommGemmCtx* nvte_comm_gemm_ctx_create(ncclComm_t comm, int nranks, int rank) {
NVTE_ERROR("Transformer Engine has not been built with cuBLASMp support.");
return nullptr;
}

Comment on lines +573 to +575
int64_t nvte_comm_gemm_numroc(NVTECommGemmCtx* ctx, int64_t global_size) {
NVTE_ERROR("Transformer Engine has not been built with cuBLASMp support.");
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 nvte_comm_gemm_numroc also returns a non-void type (int64_t) without an explicit return statement in the stub. Same reasoning as above — add an unreachable sentinel to be safe.

Suggested change
int64_t nvte_comm_gemm_numroc(NVTECommGemmCtx* ctx, int64_t global_size) {
NVTE_ERROR("Transformer Engine has not been built with cuBLASMp support.");
}
int64_t nvte_comm_gemm_numroc(NVTECommGemmCtx* ctx, int64_t global_size) {
NVTE_ERROR("Transformer Engine has not been built with cuBLASMp support.");
return 0;
}

Copy link
Copy Markdown
Member

@timmoon10 timmoon10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strong approve. This will help avoid linker weirdness.

@vcherepanov-nv vcherepanov-nv merged commit 3c89426 into NVIDIA:main May 6, 2026
2 checks passed
faradawn pushed a commit to faradawn/TransformerEngine that referenced this pull request May 14, 2026
* Always define cuBLASMp comm GEMM API

Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Vladimir Cherepanov <vcherepanov@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants