[cccl.c] Split build step into compile + load by shwina · Pull Request #8484 · NVIDIA/cccl

shwina · 2026-04-16T14:20:15Z

Description

Closes #8410.

This PR introduces two changes to all algorithms in CCCL C:

- The build structs are extended to hold the lowered names of the kernels contained in the cubin, as well as the size of the void* runtime_policy where applicable.
- Each cccl_<algo>_build algorithm is split into two steps: cccl_<algo>_compile and cccl_<algo>_load.

These changes are needed to support ahead-of-time compilation. Specifically, splitting the compile step into its own API lets the user compile on a machine without a GPU (a CPU-only build machine, say) and for target architectures other than that of the local device.

Currently, the cccl_<algo>_build functions call cuLibraryLoadData and cuLibraryGetKernel, which would fail in both scenarios.
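
For readers skimming the thread, the split described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: every `sketch_*` name is hypothetical, the "cubin" is just a byte string, and the load step is stubbed with a flag so the example runs without a GPU. In the real load step, the stub would be replaced by the driver calls named in the description (cuLibraryLoadData and cuLibraryGetKernel).

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <string_view>

// Hypothetical stand-in for a CCCL C build struct (names are assumptions).
struct sketch_build_result
{
  void* cubin               = nullptr; // compiled device code, owned by the struct
  std::size_t cubin_size    = 0;
  char* kernel_lowered_name = nullptr; // lowered (mangled) kernel name, owned by the struct
  bool loaded               = false;   // stands in for the CUlibrary / CUkernel handles
};

// Copies bytes into a new[]-allocated, NUL-terminated buffer.
static char* sketch_copy_bytes(std::string_view s)
{
  auto p = std::make_unique<char[]>(s.size() + 1);
  if (!s.empty())
  {
    std::memcpy(p.get(), s.data(), s.size());
  }
  p[s.size()] = '\0';
  return p.release();
}

// Step 1: needs no device. It only records the cubin bytes and the kernel name.
void sketch_compile(sketch_build_result* build, std::string_view cubin_bytes, std::string_view lowered_name)
{
  build->cubin               = sketch_copy_bytes(cubin_bytes);
  build->cubin_size          = cubin_bytes.size();
  build->kernel_lowered_name = sketch_copy_bytes(lowered_name);
}

// Step 2: the device-dependent part. A real implementation would load the
// cubin and look up the kernel by its lowered name here; this stub only
// checks that step 1 ran and sets a flag.
bool sketch_load(sketch_build_result* build)
{
  if (build->cubin == nullptr || build->kernel_lowered_name == nullptr)
  {
    return false;
  }
  build->loaded = true;
  return true;
}

// Releases everything allocated by sketch_compile and resets the struct.
void sketch_cleanup(sketch_build_result* build)
{
  std::unique_ptr<char[]> cubin(static_cast<char*>(build->cubin));
  std::unique_ptr<char[]> name(build->kernel_lowered_name);
  *build = sketch_build_result{};
}
```

The point of the sketch is that everything the load step needs (the cubin bytes, their size, and the lowered kernel name) is already stored in the struct after the compile step, so the two steps can run at different times or on different machines.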

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.

shwina · 2026-04-16T16:53:23Z

  size_t cubin_size;
  CUlibrary library;
  CUkernel kernel;
+  // Lowered (mangled) kernel name, heap-allocated, freed by cccl_device_binary_search_cleanup():


Could do without these comments.

bernhardmgruber · 2026-04-16T23:48:26Z

+  build_ptr->runtime_policy             = std::malloc(sizeof(cub::detail::scan::policy_selector));
+  build_ptr->runtime_policy_size        = sizeof(cub::detail::scan::policy_selector);
+  std::memcpy(build_ptr->runtime_policy, &policy_sel, sizeof(cub::detail::scan::policy_selector));


Q: Why can't we use new to allocate the policy selector anymore?

shwina · 2026-04-17

We can keep the allocation/deallocation pattern the same. I undid this change.
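
As a rough illustration of the pattern under discussion, a policy object can stay allocated with new while the build struct still records its size. The sketch below assumes a trivially copyable policy type; `example_policy`, `example_build`, and the two helper functions are hypothetical names and are not taken from the PR.

```cpp
#include <cstddef>

// Hypothetical policy type; stands in for something like a policy selector.
struct example_policy
{
  int block_threads;
  int items_per_thread;
};

// Hypothetical build struct holding a type-erased policy plus its size.
struct example_build
{
  void* runtime_policy            = nullptr;
  std::size_t runtime_policy_size = 0;
};

// Allocate with new (as before the change) and record the size alongside it.
void example_store_policy(example_build* build, const example_policy& policy)
{
  build->runtime_policy      = new example_policy(policy);
  build->runtime_policy_size = sizeof(example_policy);
}

// The matching deallocation must cast back to the concrete type before delete.
void example_release_policy(example_build* build)
{
  delete static_cast<example_policy*>(build->runtime_policy);
  build->runtime_policy      = nullptr;
  build->runtime_policy_size = 0;
}
```

The recorded size does not depend on whether new or std::malloc performed the allocation, which is one reading of why the original allocation pattern could be kept.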

NaderAlAwar

You did not make the same changes to the for algorithm. Is that intentional?

NaderAlAwar · 2026-04-17T15:41:08Z

+  build_ptr->cc                  = cc;
+  build_ptr->cubin               = (void*) result.data.release();
+  build_ptr->cubin_size          = result.size;
+  build_ptr->kernel_lowered_name = strdup(lowered_name.c_str());


Important: we should not use strdup (applies to all other files in the PR) since it is POSIX, not standard C++. I would suggest something like

char* duplicate_c_string(std::string_view s)
{
  auto p = std::make_unique<char[]>(s.size() + 1);
  std::memcpy(p.get(), s.data(), s.size());
  p[s.size()] = '\0';
  return p.release();
}

and use that everywhere

NaderAlAwar · 2026-04-17T15:42:43Z


  std::unique_ptr<char[]> cubin(reinterpret_cast<char*>(build_ptr->cubin));
-  check(cuLibraryUnload(build_ptr->library));
+  std::free(build_ptr->kernel_lowered_name);


Important: after replacing strdup, we should use something similar to the std::unique_ptr<char[]> pattern above
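
One way to read this suggestion: once the name is allocated with new[] instead of strdup, the cleanup function can adopt it into a std::unique_ptr<char[]>, the same way the cubin is adopted in the diff above. The following is a minimal sketch under that assumption; `example_build` and `example_cleanup` are hypothetical names, and the real cleanup functions also deal with CUDA handles that are omitted here.

```cpp
#include <memory>

// Hypothetical subset of a build struct; the real structs have more fields.
struct example_build
{
  void* cubin;
  char* kernel_lowered_name;
};

// Adopt both raw pointers into unique_ptr<char[]> so that delete[] runs on
// scope exit, even if a later step in the cleanup throws.
void example_cleanup(example_build* build_ptr)
{
  std::unique_ptr<char[]> cubin(static_cast<char*>(build_ptr->cubin));
  std::unique_ptr<char[]> name(build_ptr->kernel_lowered_name);
  build_ptr->cubin               = nullptr;
  build_ptr->kernel_lowered_name = nullptr;
}
```

This only works if every allocation of kernel_lowered_name uses new char[]; mixing it with strdup or std::malloc would pair malloc with delete[], which is undefined behavior.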

NaderAlAwar · 2026-04-17T15:46:05Z

+#include <cstring> // strdup, free, memcpy
 #include <format>
 #include <memory>
+#include <new> // std::nothrow


Suggestion: this seems to be unused and should be removed.

github-actions · 2026-04-17T16:42:37Z

🥳 CI Workflow Results

🟩 Finished in 1h 08m: Pass: 100%/20 | Total: 5h 07m | Max: 42m 24s | Hits: 95%/1422

See results here.

shwina requested review from a team as code owners April 16, 2026 14:20

shwina requested a review from alliepiper April 16, 2026 14:20

github-project-automation bot added this to CCCL Apr 16, 2026

shwina requested a review from gevtushenko April 16, 2026 14:20

github-project-automation bot moved this to Todo in CCCL Apr 16, 2026

cccl-authenticator-app bot moved this from Todo to In Review in CCCL Apr 16, 2026

shwina changed the title ~~Split build into compile + load~~ [cccl.c] Split build step into compile + load Apr 16, 2026

shwina commented Apr 16, 2026

bernhardmgruber reviewed Apr 16, 2026

Split build into compile + load

26946c5

shwina mentioned this pull request Apr 17, 2026

[cccl.c] Add APIs for AoT compilation #8499

Draft

2 tasks

shwina force-pushed the pr/aot-c-layer branch from b5fa374 to 26946c5 Compare April 17, 2026 15:20

Remove unnecessary commentary

9db5061

NaderAlAwar reviewed Apr 17, 2026

shwina wants to merge 2 commits into NVIDIA:main from shwina:pr/aot-c-layer

3 participants
