Make gpu_* runtime stubs CPU-AOT-safe via weak linkage. #808

Merged: maleadt merged 1 commit into main from tb/runtime_linkage on May 21, 2026

@maleadt (Member) commented May 21, 2026

Alternative to #799. Instead of using overlay tables to conditionally define runtime methods, add proper CPU-compatible stubs marked weak that get overridden by the back-end versions linked in. This should sidestep the whole issue.

Back-end-provided runtime symbols (`Runtime.compile(:name, ...)`) used to
emit `ccall("extern gpu_<name>", llvmcall, ...)` as the Julia stub body.
That made every AOT pipeline that materialized the stub on CPU — juliac,
sysimage `compile=all`, PrecompileTools — fail with `JIT session error:
Symbols not found: [ gpu_<name> ]`, because the `gpu_*` symbols only exist
inside the GPU runtime library.

The stub still needs to *reference* `gpu_<name>` somewhere so that, after
`link!(ir, runtime; only_needed=true)`, the kernel calls the back-end's
implementation (which `build_runtime` emits as `gpu_<name>` by renaming
`runtime_module(job).<name>`). Back-ends override at the LLVM-symbol
level, not via Julia method tables, so the stub has to produce that
symbol reference itself.

Emit the stub via `Base.llvmcall` with an inline `define weak <rt>
@gpu_<name>(...)` returning a sentinel, plus an entry that calls it.
LLVM linker semantics: the weak no-op satisfies CPU JIT materialization,
and the runtime library's strong definition replaces it during the GPU
link step. No method-table machinery, no post-codegen pass, no registry
— the fix is local to `Runtime.compile`. IR is built with LLVM.jl's
`create_function`/`IRBuilder`/`call_function` rather than string IR,
matching the pattern used by `Runtime.type_tag` in the same file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
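
The weak-stub idea described above can be sketched without GPUCompiler or LLVM.jl. This is only an illustration under stated assumptions: `gpu_example_stub` and `call_example_stub` are made-up names, and the IR is written as a string for self-containment, whereas the PR builds it with LLVM.jl's `create_function`/`IRBuilder`.

```julia
# Sketch only: `gpu_example_stub` is a hypothetical symbol, not one that
# GPUCompiler defines. The module form of `Base.llvmcall` takes (ir, entry_name).
function call_example_stub(x::Int64)
    Base.llvmcall(("""
        define weak i64 @gpu_example_stub(i64 %x) {
        top:
          ; weak sentinel body; a strong definition with the same name
          ; would take precedence when one is linked in
          ret i64 0
        }

        define i64 @entry(i64 %x) {
        top:
          ; calling the stub keeps a reference to the symbol in the emitted code
          %r = call i64 @gpu_example_stub(i64 %x)
          ret i64 %r
        }
        """, "entry"), Int64, Tuple{Int64}, x)
end
```

In a plain CPU session where nothing else defines `gpu_example_stub`, the call is expected to fall through to the weak body and return the sentinel instead of failing with a missing-symbol error.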
Comment thread src/runtime.jl
Comment on lines +140 to +155
function emit_fake_return!(builder::IRBuilder, rt::LLVMType)
    if rt isa LLVM.VoidType
        ret!(builder)
    elseif rt isa LLVM.PointerType
        # Use Int64(1), not 0, so `Ptr(Int64(...))` doesn't get lowered to C_NULL.
        i64 = LLVM.IntType(64)
        ret!(builder, const_inttoptr(ConstantInt(i64, 1), rt))
    elseif rt isa LLVM.IntegerType
        ret!(builder, ConstantInt(rt, 0))
    elseif rt isa LLVM.LLVMFloat || rt isa LLVM.LLVMDouble
        ret!(builder, ConstantFP(rt, 0.0))
    else
        error("Unsupported runtime stub return type: $rt")
    end
end

Member Author

Pretty questionable, but this code really shouldn't ever be called. Even on the CPU, it's replaced by the strong definitions from the CPU runtime library.
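
For readers unfamiliar with the LLVM.jl types, the sentinel choices in `emit_fake_return!` can be mirrored with ordinary Julia types. This is a rough analogue, not code from the PR: `fake_return` is a hypothetical helper, and it dispatches on Julia types rather than `LLVMType` values.

```julia
# Hypothetical plain-Julia analogue of the sentinel table; not part of the PR.
fake_return(::Type{Nothing}) = nothing                      # void: nothing to return
fake_return(::Type{Ptr{T}}) where {T} = Ptr{T}(1)           # non-null, never dereferenced
fake_return(::Type{T}) where {T<:Integer} = zero(T)         # integer sentinel
fake_return(::Type{T}) where {T<:Union{Float32,Float64}} = zero(T)
fake_return(::Type{T}) where {T} =
    error("Unsupported runtime stub return type: $T")       # same fallback as the PR
```

The pointer case is the only one that avoids zero, matching the code comment about not producing `C_NULL`.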

codecov Bot commented May 21, 2026

Codecov Report

❌ Patch coverage is 83.87097% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.52%. Comparing base (5ff3ef6) to head (2470600).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/runtime.jl | 83.87% | 5 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #808      +/-   ##
==========================================
- Coverage   75.95%   74.52%   -1.44%     
==========================================
  Files          25       25              
  Lines        4026     4204     +178     
==========================================
+ Hits         3058     3133      +75     
- Misses        968     1071     +103     

@maleadt maleadt merged commit 7ce80c3 into main May 21, 2026
37 checks passed
@maleadt maleadt deleted the tb/runtime_linkage branch May 21, 2026 15:26
gbaraldi added a commit to EnzymeAD/Enzyme.jl that referenced this pull request May 25, 2026
…ll (#3091)

The body of `Compiler.deferred_codegen` was `ccall("extern deferred_codegen",
llvmcall, Ptr{Cvoid}, (UInt,), id)`. The `deferred_codegen` symbol is
provided at runtime by `GPUCompiler.register_deferred_codegen` (OrcV2
`absolute_symbols` in the JuliaGlobals JD), and is the marker GPUCompiler's
host-side scanner picks up to thread the inner Enzyme adjoint through
deferred compilation. On JIT this works fine, but AOT linkers (sysimage
`compile=all`, juliac, PrecompileTools) walk `jl_compile_all_defs` into
this body and fail to resolve the undefined `deferred_codegen` symbol,
breaking sysimage builds (#3091).

Replace the `ccall("extern …)` with a `Base.llvmcall((ir, "entry"), …)`
whose IR module declares `deferred_codegen` with weak linkage and a
CPU-safe identity body (`inttoptr i64 %x to ptr; ret ptr %r`). On AOT the
weak body satisfies the linker; on JIT the strong runtime symbol wins.
Mirrors JuliaGPU/GPUCompiler.jl#808's treatment of the `gpu_*` runtime
stubs.

`id` is lifted to a type parameter via a small `@generated` helper so the
generator can splice it as a literal `i64` constant into the
`call @deferred_codegen` site — GPUCompiler's scanner does
`convert(Int, operands(call)[1])`, which only works when that operand is
a `ConstantInt`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
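
The "lift `id` to a type parameter" step can be sketched as follows. All names here are hypothetical (`deferred_id_sketch`, `deferred_codegen_sketch`), and the sketch returns a plain `i64` rather than a pointer to stay independent of the pointer syntax of any particular LLVM version, so it is a simplification of what the commit message describes.

```julia
# Sketch only: hypothetical names, and a different weak symbol than the real
# `deferred_codegen` so it cannot clash with a runtime-provided definition.
@generated function deferred_id_sketch(::Val{id}) where {id}
    # `id` is known at generation time, so it can be spliced into the IR as a
    # literal constant operand of the call instead of a runtime argument.
    ir = """
        define weak i64 @deferred_codegen_sketch(i64 %x) {
        top:
          ; CPU-safe identity body
          ret i64 %x
        }

        define i64 @entry() {
        top:
          %r = call i64 @deferred_codegen_sketch(i64 $(Int64(id)))
          ret i64 %r
        }
        """
    return :(Base.llvmcall(($ir, "entry"), Int64, Tuple{}))
end
```

Because the operand is a literal in the IR, a scanner that inspects the call's first operand would see a constant there, which is the property the commit message says GPUCompiler's `convert(Int, operands(call)[1])` relies on.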