Make `gpu_*` runtime stubs CPU-AOT-safe via weak linkage. #808
Merged
Back-end-provided runtime symbols (`Runtime.compile(:name, ...)`) used to
emit `ccall("extern gpu_<name>", llvmcall, ...)` as the Julia stub body.
That made every AOT pipeline that materialized the stub on CPU — juliac,
sysimage `compile=all`, PrecompileTools — fail with `JIT session error:
Symbols not found: [ gpu_<name> ]`, because the `gpu_*` symbols only exist
inside the GPU runtime library.
The stub still needs to *reference* `gpu_<name>` somewhere so that, after
`link!(ir, runtime; only_needed=true)`, the kernel calls the back-end's
implementation (which `build_runtime` emits as `gpu_<name>` by renaming
`runtime_module(job).<name>`). Back-ends override at the LLVM-symbol
level, not via Julia method tables, so the stub has to produce that
symbol reference itself.
Emit the stub via `Base.llvmcall` with an inline `define weak <rt>
@gpu_<name>(...)` that returns a sentinel, plus an entry function that
calls it. Under LLVM linker semantics, the weak no-op satisfies CPU JIT
materialization, and the runtime library's strong definition replaces it
during the GPU link step. No method-table machinery, no post-codegen pass,
no registry: the fix is local to `Runtime.compile`. The IR is built with
LLVM.jl's `create_function`/`IRBuilder`/`call_function` rather than as
string IR, matching the pattern used by `Runtime.type_tag` in the same file.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
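For illustration only, here is a minimal string-IR sketch of the stub shape described above. The PR itself builds this IR with LLVM.jl rather than strings, and the wrapper `example_stub`, the symbol `gpu_example_fn`, the `i64` signature, and the `alwaysinline` attribute on `entry` are assumptions made for this example. The module handed to `Base.llvmcall` defines the runtime symbol with `weak` linkage and a sentinel body, and `entry` only forwards to it, so the emitted code keeps a real reference to the symbol.

```julia
# Julia-side stub for a hypothetical runtime function `gpu_example_fn`.
# It compiles and runs on the CPU without any external symbol.
function example_stub(x::Int64)
    Base.llvmcall(("""
        ; Weak placeholder body returning a sentinel (0). It gives the
        ; CPU JIT/AOT linker a definition to resolve; a strong definition
        ; of the same symbol takes precedence when one is linked in.
        define weak i64 @gpu_example_fn(i64 %x) {
          ret i64 0
        }

        ; Entry invoked by llvmcall: it only forwards to the weak symbol,
        ; so the emitted IR keeps a reference to @gpu_example_fn.
        define i64 @entry(i64 %x) #0 {
          %r = call i64 @gpu_example_fn(i64 %x)
          ret i64 %r
        }

        attributes #0 = { alwaysinline }
        """, "entry"), Int64, Tuple{Int64}, x)
end

example_stub(42)  # on a plain CPU JIT the weak sentinel body runs: returns 0
```

Compared with a bare `ccall("extern gpu_example_fn", llvmcall, ...)`, which only declares the symbol, the `weak` definition means there is always something for the CPU linker to bind to, while a strong `gpu_example_fn` linked in later (for example by a back-end's `link!` step) still wins.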
maleadt commented on May 21, 2026
Comment on lines +140 to +155:

```julia
function emit_fake_return!(builder::IRBuilder, rt::LLVMType)
    if rt isa LLVM.VoidType
        ret!(builder)
    elseif rt isa LLVM.PointerType
        # Use Int64(1), not 0, so `Ptr(Int64(...))` doesn't get lowered to C_NULL.
        i64 = LLVM.IntType(64)
        ret!(builder, const_inttoptr(ConstantInt(i64, 1), rt))
    elseif rt isa LLVM.IntegerType
        ret!(builder, ConstantInt(rt, 0))
    elseif rt isa LLVM.LLVMFloat || rt isa LLVM.LLVMDouble
        ret!(builder, ConstantFP(rt, 0.0))
    else
        error("Unsupported runtime stub return type: $rt")
    end
end
```
maleadt (Member, Author): Pretty questionable, but this code should never actually be called. Even on the CPU, it is replaced by the strong definitions from the CPU runtime library.
Codecov Report
❌ Patch coverage is

Additional details and impacted files:

```diff
@@            Coverage Diff             @@
##             main     #808      +/-   ##
==========================================
- Coverage   75.95%   74.52%   -1.44%
==========================================
  Files          25       25
  Lines        4026     4204     +178
==========================================
+ Hits         3058     3133      +75
- Misses        968     1071     +103
```
This was referenced May 21, 2026
gbaraldi added a commit to EnzymeAD/Enzyme.jl that referenced this pull request on May 25, 2026:

…ll (#3091)

The body of `Compiler.deferred_codegen` was `ccall("extern deferred_codegen", llvmcall, Ptr{Cvoid}, (UInt,), id)`. The `deferred_codegen` symbol is provided at runtime by `GPUCompiler.register_deferred_codegen` (OrcV2 `absolute_symbols` in the JuliaGlobals JD), and is the marker GPUCompiler's host-side scanner picks up to thread the inner Enzyme adjoint through deferred compilation. On JIT this works fine, but AOT linkers (sysimage `compile=all`, juliac, PrecompileTools) walk `jl_compile_all_defs` into this body and fail to resolve the undefined `deferred_codegen` symbol, breaking sysimage builds (#3091).

Replace the `ccall("extern …)` with a `Base.llvmcall((ir, "entry"), …)` whose IR module declares `deferred_codegen` with weak linkage and a CPU-safe identity body (`inttoptr i64 %x to ptr; ret ptr %r`). On AOT the weak body satisfies the linker; on JIT the strong runtime symbol wins. Mirrors JuliaGPU/GPUCompiler.jl#808's treatment of the `gpu_*` runtime stubs.

`id` is lifted to a type parameter via a small `@generated` helper so the generator can splice it as a literal `i64` constant into the `call @deferred_codegen` site. GPUCompiler's scanner does `convert(Int, operands(call)[1])`, which only works when that operand is a `ConstantInt`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
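The `@generated` trick described in that commit can be sketched as follows. This illustrates the technique only and is not Enzyme.jl's code: `marker_call` and the weak symbol `example_marker` are hypothetical stand-ins for the real helper and `deferred_codegen`, and an `i64` return is used instead of a pointer to sidestep typed-vs-opaque pointer differences across Julia versions. Because `id` is a type parameter, the generator sees it as a plain value and can splice it into the IR text, so the call operand is an LLVM constant rather than a runtime argument.

```julia
# Hypothetical helper: lifts `id` into the type domain (Val{id}) so the
# generator can splice it into the IR as a literal i64 constant.
@generated function marker_call(::Val{id}) where {id}
    id isa Int || return :(throw(ArgumentError("id must be an Int")))
    ir = """
    ; Weak, CPU-safe identity body; a strong runtime definition of the
    ; same symbol would take precedence when one is available.
    define weak i64 @example_marker(i64 %x) {
      ret i64 %x
    }

    ; The operand of this call is the literal constant $(id), so a scanner
    ; reading operands(call)[1] would find a ConstantInt, not an SSA value.
    define i64 @entry() #0 {
      %r = call i64 @example_marker(i64 $(id))
      ret i64 %r
    }

    attributes #0 = { alwaysinline }
    """
    return :(Base.llvmcall(($ir, "entry"), Int64, Tuple{}))
end

marker_call(Val(7))  # identity body on the CPU: returns 7
```

Passing `id` as an ordinary argument would instead show up as an SSA value (`%x`) at the call site; the cost of lifting it into the type domain is one specialization per distinct `id`.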
Alternative to #799. Instead of using overlay method tables to conditionally define runtime methods, add proper CPU-compatible stubs marked `weak` that get overridden by the back-end versions linked in. This should sidestep the whole issue.