Revert "Fix always_inline via inlining policy override"#797
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Reverts #795, which breaks the following MWE:
This surfaced with KernelAbstraction.jl's unittest_testsuite
which contains analways_inline=true`.@aviatesk Could you take a look at this? I wanted to fix this downstream since JuliaLang/julia#48257 doesn't seem to be moving forwards, and we really need this in the GPU stack (in fact, cuTile.jl depends on the ability to inline everything).
Here's a description by Claude:
What's happening
CUDA's
Base.setproperty!(::ExceptionInfo, sym::Symbol, value)(CUDACore/src/device/runtime.jl:79) has six Symbol-dispatched branches with different value types per branch (Int32,@NamedTuple{x,y,z::Int32},Ptr{UInt8}). Its body is big enough thatinlining_costsaturates toMAX_INLINE_COST=0xffff— verified directly:Before #795, the default policy rejected this saturated
MaybeCompressedsource (is_inlineable=false), sosetproperty!stayed out-of-line and the standalone-compiled body — with type-incompatible branches correctly folded toCore.throw_methoderrorcalls by inference — validated fine.After #795, the override force-inlines the generic
MaybeCompressedbody (slottypes[..., Symbol, Ptr{UInt8}]) intothrow_boundserror. The full if/elseif chain survives:The dead
:threadIdx/:blockIdxbranches containunsafe_store!(::Ptr{NamedTuple{...}}, ::Ptr{UInt8})— type-incompatible dispatches that ssa_inlining + optimization can't resolve statically. They end up asijl_apply_generic(unsafe_store!, ...)calls in the final IR, which validation rejects.The const-prop'd specialized versions (cost=10, body folded to a single
pointerset) are offered to the policy too, but the inliner picks the generic source for some calls. I couldn't track down exactly why in the time I spent on it.Probable cause
Force-inlining a
MaybeCompressedsource whoseinlining_costsaturated is unsafe when the body has dead branches that only fold under const-prop. Julia's inliner doesn't re-run the constant-folding that would have eliminated those branches during regular inference;ssa_inlining_pass!+compact!+ ADCE don't fold===on Symbol literals after they've been inlined as builtin calls into a larger body.The motivating case in #795 (
philox2x_rounds-style: saturated from raw call count, no Symbol-dispatched branches) doesn't hit this — its inlined body has nothing to fold. cuTile uses the same policy shape and works for the same reason.Sanity checks while debugging
may_discard_trees: still fails.src_inlining_policy: passes. So the policy override is the trigger, not the tree-preservation change.@invokefall-through,inline_cost_threshold=typemax(Int)): still fails. cuTile would fail on this CUDA pattern too; it just doesn't have one.ConstCallInfocalls: fixes the@cudaMWE but not the KA path (where the inliner offers both generic and const-prop'd sources and still picks the generic one).IRCodeonly: fixes both. But in the failing PTX compile the inliner never passesIRCode, so this is effectively a full revert.