-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Inline nullable allocators #122167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Inline nullable allocators #122167
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
|
Enabled structs too (basically, all types, even shared), so now #114497 (comment) is properly handed. |
|
@EgorBot -amd -arm using BenchmarkDotNet.Attributes;
[MemoryDiagnoser]
[DisassemblyDiagnoser]
public class Bench
{
private Nullable<bool> _null = null;
private Nullable<bool> _nonnull = true;
[Benchmark] public object BoxNull() => _null;
[Benchmark] public object BoxNonNull() => _nonnull;
// https://github.com/dotnet/runtime/issues/50915
private S? _ns = (S?)default(S);
[Benchmark] public int Issue50915() => CallM(_ns);
static int CallM<T>(T t)
{
if (t is IMyInterface)
return ((IMyInterface)t).M();
return 0;
}
}
interface IMyInterface
{
int M();
}
struct S : IMyInterface
{
public int M() => 42;
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR optimizes nullable boxing operations by introducing early inline expansion instead of using helper calls. The optimization enables escape analysis to stack-allocate boxed nullable values when they don't escape the method, eliminating heap allocations in hot paths. The implementation adds a new early QMARK expansion phase that runs after import but before other optimizations, allowing subsequent passes to optimize the expanded control flow.
Key changes:
- Inline expansion of nullable box operations in hot, optimized code paths using conditional allocation
- New early QMARK expansion phase to enable object stack allocation optimizations
- COMMA node splitting during QMARK expansion to improve optimization opportunities
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/jit/importer.cpp | Adds inline expansion of nullable boxing with conditional allocation based on hasValue field |
| src/coreclr/jit/morph.cpp | Enhances QMARK expansion to support early phase, adds COMMA splitting, changes return type to PhaseStatus |
| src/coreclr/jit/compiler.cpp | Adds early QMARK expansion phase after import and updates late expansion call signature |
| src/coreclr/jit/compiler.h | Adds OMF_HAS_EARLY_QMARKS flag and updates fgExpandQmarkNodes signature |
| src/coreclr/jit/compphases.h | Defines new PHASE_EARLY_QMARK_EXPANSION phase |
src/coreclr/jit/importer.cpp
Outdated
| // E.g. if we are boxing Nullable<System.Int32>, typeToBox is System.Int32. | ||
| CORINFO_RESOLVED_TOKEN tk = *pResolvedToken; | ||
| tk.hClass = typeToBox; | ||
| tk.tokenType = CORINFO_TOKENKIND_Casting; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had some offline discussion about this... would it make sense to introduce new token kind here rather than overloading CORINFO_TOKENKIND_Casting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas @MichalStrehovsky do you an opinion here? Context:
We have a token representing, say box Nullable<int>, we need to emit an allocator for boxed int (not Nullable<int> since we box the underlying type only). I have to call our work horse routine that accepts ResolvedToken which end up calling embedGenericHandle. so I ended up using this hack
CORINFO_RESOLVED_TOKEN tk = *pResolvedToken;
tk.hClass = typeToBox;
tk.tokenType = CORINFO_TOKENKIND_Casting;where I create a copy of that token and change its type from Boxing to Casting so I hit this path:
runtime/src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/Compilation.cs
Lines 338 to 340 in 495fca5
| if (type.IsNullable) | |
| type = type.Instantiation[0]; | |
| return NecessaryTypeSymbolIfPossible(type); |
is it acceptable? it seems to work for all three runtimes (coreclr, r2r and naot)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems to work for all three runtime
If it actually worked, you would not need to disable it for shared generic types. eeIsSharedInst check above is part of the hack.
This hack may cause problems with NAOT tracking of necessary vs. maximally constructed handles. I am not sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems to work for all three runtime
If it actually worked, you would not need to disable it for shared generic types.
eeIsSharedInstcheck above is part of the hack.This hack may cause problems with NAOT tracking of necessary vs. maximally constructed handles. I am not sure.
Runtime lookups are problematic in my case even without it - the code that emits them doesn't support emitting them inside QMARKs (so they're constructed only inside the corresponding branch, it can be fixed, though). For non-shared types I expect NAOT's dependency analsys to track both Nullable<T> and just T when ILScan hits box opcode, but I'm no expert in NAOT to say for sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how to properly check the dependency analysis but this seems to be compiling just fine:
class MyProgram
{
static void Main() => Box(null);
[MethodImpl(MethodImplOptions.NoInlining)]
static object Box(MyStruct1? ms) => ms;
}
public struct MyStruct1 {}so MyStruct1 is not used anywhere explicitly, but ILC gives a proper type for the allocator:
G_M42651_IG01: ;; offset=0x0000
sub rsp, 40
mov dword ptr [rsp+0x30], ecx
;; size=8 bbWeight=1 PerfScore 1.25
G_M42651_IG02: ;; offset=0x0008
test cl, cl
je SHORT G_M42651_IG04
;; size=4 bbWeight=1 PerfScore 1.25
G_M42651_IG03: ;; offset=0x000C
lea rcx, [(reloc 0x4000000000420a08)] ; MyStruct1
call CORINFO_HELP_NEWSFAST
mov cl, byte ptr [rsp+0x31]
mov byte ptr [rax+0x08], cl
jmp SHORT G_M42651_IG05
;; size=21 bbWeight=0.25 PerfScore 1.38
G_M42651_IG04: ;; offset=0x0021
xor rax, rax
;; size=2 bbWeight=0.25 PerfScore 0.06
G_M42651_IG05: ;; offset=0x0023
add rsp, 40
ret
;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code 40There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will probably work today wrt dependency analysis because we already have to be very careful not to expose RyuJIT to necessary/constructed MethodTables. Doesn't mean we're not going to curse this hack two years down the road (if/when we start using RyuJIT as a scanner and this will end up boxing a MethodTables that doesn't have a GCDesc at runtime because MethodTables for casting don't get them). It is a hack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas @MichalStrehovsky @AndyAyersMS Ok, I ended up simplifying the implementation without this hack, with just non-shared types and no R2R (cause getReadyToRunHelper JIT-EE API also expects a resolvedtoken and R2R doesn't support stack-allocation I believe which makes it less beneficial to expand for). NAOT is supported.
I still expect NAOT to keep the underlying Nullable<> type alive. I believe it's already required even without my PR - the current importation logic calls getTypeForBox and sets it on the resuling local so then various cast related optimization can work with it.
Does it sound good?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
R2R supports stack allocation of objects. It does not (yet) support stack allocation for arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
R2R supports stack allocation of objects. It does not (yet) support stack allocation for arrays.
Good to know. Still, it requires JIT-EE work to enable so I left it for the future.
c1431db to
ab99de1
Compare
23f505f to
69f6243
Compare
|
The superpmi diffs are obviously useless due to missing contexts. The normal jit-diff (e.g. this) is also a mess due to the way PMI works - we take all generics and try to instantiate them with one of the 7 predefined types (see here) and |
Closes #50915
Today we never inline & stack-allocate (via the escape analysis) boxed Nullable<>, let's see if we can fix that.
Main:
PR:
Another example (from #114497 (comment)) cc @pentp
diff.
Doesn't allocate anymore to box that nullable.
Benchmarks: EgorBot/runtime-utils#563