NCCL 2.29 – LLVM Intermediate Representation (IR) #2010

gab9talavera (Maintainer) · 2026-02-02T17:52:32Z

LLVM Intermediate Representation (IR)

With the 2.29 release, the NCCL Device API is no longer restricted to CUDA kernels. NCCL now exposes its Device APIs as LLVM IR so that emerging compiler technologies, high-level languages, and domain-specific systems can consume them directly. Instead of being gated behind C++ templates and CUDA-only build paths, the device APIs are accessible to any toolchain that can ingest LLVM bitcode.

C-Compatible Device API

The base NCCL host API has long been callable through extern "C" as a stable C interface. This change brings the device API to that same model: it exposes a C-compatible, ABI-stable surface for device-side primitives that were previously available only as C++ template APIs.

Integration Benefits

At a high level, this turns NCCL’s device primitives into a language-agnostic interface. A JIT compiler, a DSL runtime, or a custom compiler backend can link against the bitcode and call NCCL Device APIs as regular functions. That opens the door to new forms of integration, including dynamic code generation and fine-grained composition that would otherwise be impractical.

Practically, this means you can build fused computation–communication kernels, experiment with custom collective patterns, and control communication–computation overlap from a higher-level environment. It also makes it easier to prototype new distributed algorithms without having to ship large, monolithic, pre-compiled CUDA kernels for every variant. Note that some convenience helpers and specialized primitives remain C++‑only for now.

Build and Tests

To generate the LLVM IR bitcode, compile NCCL with the EMIT_LLVM_IR=1 flag:

make EMIT_LLVM_IR=1

This build requires Clang 21 and CUDA 12. By default, the resulting bitcode library targets sm_90 when built with CUDA 12. You can override the target architecture using BITCODE_LIB_ARCH=sm_xx if your toolchain needs a different GPU target.
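As a minimal sketch of the architecture override described above, the two make variables can be wrapped in a small shell function. The function name build_nccl_bitcode is hypothetical and not part of NCCL, and sm_80 in the usage note is only an example value; this post does not say which architectures are supported.

```shell
# Hypothetical helper (not part of NCCL): build the device bitcode,
# optionally overriding the GPU target with BITCODE_LIB_ARCH.
# Run it from the NCCL source root.
build_nccl_bitcode() {
  arch="${1:-}"
  case "$arch" in
    # No argument: let the build pick its default target.
    "") make EMIT_LLVM_IR=1 ;;
    # Accept values that start with sm_ followed by a digit.
    sm_[0-9]*) make EMIT_LLVM_IR=1 BITCODE_LIB_ARCH="$arch" ;;
    # Anything else is rejected before make is invoked.
    *) echo "unexpected architecture: $arch" >&2; return 1 ;;
  esac
}
```

Calling build_nccl_bitcode sm_80 runs make EMIT_LLVM_IR=1 BITCODE_LIB_ARCH=sm_80, while calling it with no argument leaves the default target in place.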

The build writes the bitcode library to build/lib/libnccl_device.bc. To inspect the available APIs, either search the source for NCCL_IR_EXTERN_C or disassemble the bitcode with:

llvm-dis libnccl_device.bc -o libnccl_device.ll
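Both inspection routes can be scripted. The sketch below uses two hypothetical helper names (find_ir_exports and list_defined_functions) and assumes you pass it the NCCL source directory and the .ll file produced by llvm-dis. It relies only on grep and on the fact that function definitions in textual LLVM IR start with the keyword define.

```shell
# Hypothetical helpers (not part of NCCL) for inspecting the exported device APIs.

# Print every source line that carries the NCCL_IR_EXTERN_C marker,
# in grep's file:line:text format. $1 is the source directory to search.
find_ir_exports() {
  grep -rn "NCCL_IR_EXTERN_C" "$1"
}

# Print the function definitions found in a disassembled .ll file.
# Declarations (lines starting with "declare") are skipped.
list_defined_functions() {
  grep '^define ' "$1"
}
```

For example, find_ir_exports src searches a directory named src (an assumed location; adjust it to your checkout), and list_defined_functions libnccl_device.ll lists the definitions in the disassembled library. The first helper also reports the line where the marker macro itself is defined, if that line falls within the searched directory.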

We encourage compiler developers, DSL authors, and framework builders to integrate this interface and explore what becomes possible when fine‑grained distributed GPU communication is just a function call away.

—

Authored by Subhadeep Bhattacharya (@sb17v)
