[cccl.c] Split build step into compile + load #8484
shwina wants to merge 2 commits into NVIDIA:main from
    size_t cubin_size;
    CUlibrary library;
    CUkernel kernel;
    // Lowered (mangled) kernel name, heap-allocated, freed by cccl_device_binary_search_cleanup():
Could do without these comments.
    build_ptr->runtime_policy = std::malloc(sizeof(cub::detail::scan::policy_selector));
    build_ptr->runtime_policy_size = sizeof(cub::detail::scan::policy_selector);
    std::memcpy(build_ptr->runtime_policy, &policy_sel, sizeof(cub::detail::scan::policy_selector));
Q: Why can't we use new to allocate the policy selector anymore?
We can keep the allocation/deallocation pattern the same. I undid this change.
NaderAlAwar left a comment
You did not make the same changes to `for`; is that intentional?
    build_ptr->cc = cc;
    build_ptr->cubin = (void*) result.data.release();
    build_ptr->cubin_size = result.size;
    build_ptr->kernel_lowered_name = strdup(lowered_name.c_str());
Important: we should not use strdup (applies to all other files in the PR) since it is POSIX, not standard C++. I would suggest something like:

    char* duplicate_c_string(std::string_view s)
    {
      auto p = std::make_unique<char[]>(s.size() + 1);
      std::memcpy(p.get(), s.data(), s.size());
      p[s.size()] = '\0';
      return p.release();
    }

and use that everywhere.
    std::unique_ptr<char[]> cubin(reinterpret_cast<char*>(build_ptr->cubin));
    check(cuLibraryUnload(build_ptr->library));
    std::free(build_ptr->kernel_lowered_name);
Important: after replacing strdup, we should use something similar to the std::unique_ptr<char[]> pattern above
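A minimal sketch of how the two review suggestions could fit together, assuming the name is allocated with `new char[]` (via `std::make_unique<char[]>`) and therefore has to be released with `delete[]` rather than `std::free`. The struct `build_result_sketch` and the function `cleanup_sketch` are hypothetical stand-ins for illustration, not the PR's actual types; `duplicate_c_string` follows the reviewer's suggestion, with an added guard for empty input.

```cpp
#include <cstring>
#include <memory>
#include <string_view>

// Hypothetical stand-in for a build-result struct that owns a raw C string.
struct build_result_sketch
{
  char* kernel_lowered_name = nullptr;
};

// Portable replacement for POSIX strdup: the buffer is allocated with new[].
char* duplicate_c_string(std::string_view s)
{
  auto p = std::make_unique<char[]>(s.size() + 1); // value-initialized to zeros
  if (!s.empty())
  {
    std::memcpy(p.get(), s.data(), s.size());
  }
  p[s.size()] = '\0';
  return p.release(); // caller takes ownership and must use delete[]
}

// Hypothetical cleanup: re-adopt the raw pointer so delete[] runs on scope exit,
// matching the new[] allocation above (std::free would be a mismatch).
void cleanup_sketch(build_result_sketch* build_ptr)
{
  std::unique_ptr<char[]> name(build_ptr->kernel_lowered_name);
  build_ptr->kernel_lowered_name = nullptr;
}
```

Re-adopting the pointer into a `std::unique_ptr<char[]>` keeps the deallocation paired with the allocation even if a later call in the cleanup function throws.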
    #include <cstring> // strdup, free, memcpy
    #include <format>
    #include <memory>
    #include <new> // std::nothrow
Suggestion: this seems like it's unused, should be removed.
Description
Closes #8410.
This PR introduces two changes to all algorithms in CCCL C:
- `void* runtime_policy` where applicable.
- The `cccl_<algo>_build` algorithm is split into two steps: `cccl_<algo>_compile` and `cccl_<algo>_load`.

These changes are needed to support ahead-of-time compilation. Specifically, splitting off the `compile` step into its own API allows the user to compile on a CPU-only machine (say), and for different target architectures.

Currently, the `cccl_<algo>_build` functions include calls to `cuLibraryLoadData` and `cuLibraryGetKernel`, which would fail in the above scenarios.
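The compile/load split described above can be pictured with a small host-only sketch. Everything here is hypothetical (`compiled_artifact_sketch`, `loaded_kernel_sketch`, `compile_sketch`, `load_sketch`, the placeholder bytes and the placeholder mangled name); it is not the CCCL C API. It only illustrates the idea that the compile step produces plain data that needs no GPU, while the load step is the only part that depends on a device.

```cpp
#include <string>
#include <vector>

// Hypothetical: everything the compile step produces. No driver handles here,
// so this could be created on a CPU-only machine and stored or shipped.
struct compiled_artifact_sketch
{
  int cc = 0;                      // target compute capability
  std::vector<char> cubin;         // compiled device code (placeholder bytes)
  std::string kernel_lowered_name; // lowered (mangled) kernel name
};

// Hypothetical: what the load step adds on a machine that has a device.
struct loaded_kernel_sketch
{
  bool loaded = false;
  std::string resolved_name;
};

// Stand-in for a compile step: depends only on the requested target.
compiled_artifact_sketch compile_sketch(int cc)
{
  compiled_artifact_sketch out;
  out.cc                  = cc;
  out.cubin               = {'c', 'u', 'b', 'i', 'n'}; // placeholder, not a real cubin
  out.kernel_lowered_name = "_Z6kernelv";              // placeholder mangled name
  return out;
}

// Stand-in for a load step. In a real implementation this is where driver
// calls such as cuLibraryLoadData and cuLibraryGetKernel would go.
loaded_kernel_sketch load_sketch(const compiled_artifact_sketch& artifact, bool device_available)
{
  loaded_kernel_sketch k;
  if (!device_available || artifact.cubin.empty())
  {
    return k; // loading needs a device; compiling did not
  }
  k.loaded        = true;
  k.resolved_name = artifact.kernel_lowered_name;
  return k;
}
```

Under these assumptions, `compile_sketch(90)` succeeds regardless of the host, and only `load_sketch` reports failure when no device is present.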