Skip to content

Conversation

@EgorBo
Copy link
Member

@EgorBo EgorBo commented Dec 3, 2025

Closes #50915

Today we never inline & stack-allocate (via the escape analysis) boxed Nullable<>, let's see if we can fix that.

object Test(long? n)
{
    return n;
}

Main:

; Method My:Test(System.Nullable`1[long]):System.Object:this (FullOpts)
       sub      rsp, 40
       mov      rcx, 0x7FFBCE1680C0      ; System.Nullable`1[long]
       call     [CORINFO_HELP_BOX_NULLABLE]
       nop      
       add      rsp, 40
       ret      
; Total bytes of code: 26

PR:

; Method My:Test(System.Nullable`1[long]):System.Object:this (FullOpts)
       push     rbx
       sub      rsp, 32
       mov      rbx, rdx
       cmp      byte  ptr [rbx], 0
       je       SHORT G_M7100_IG04
       mov      rcx, 0x7FFBDFBC08B0      ; System.Int64
       call     CORINFO_HELP_NEWSFAST
       mov      rcx, qword ptr [rbx+0x08]
       mov      qword ptr [rax+0x08], rcx
       jmp      SHORT G_M7100_IG05
G_M7100_IG04:
       xor      rax, rax
G_M7100_IG05:
       add      rsp, 32
       pop      rbx
       ret      
; Total bytes of code: 46

Another example (from #114497 (comment)) cc @pentp

public static string? Format<T>(T value)
{
    if (value is IFormattable formattable)
        return formattable.ToString(null, null);
    return null;
}

diff.
Doesn't allocate anymore to box that nullable.

Benchmarks: EgorBot/runtime-utils#563

@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 3, 2025
@EgorBo

This comment was marked as outdated.

@EgorBo
Copy link
Member Author

EgorBo commented Dec 4, 2025

Enabled structs too (basically, all types, even shared), so now #114497 (comment) is properly handed.

@EgorBo
Copy link
Member Author

EgorBo commented Dec 4, 2025

@EgorBot -amd -arm

using BenchmarkDotNet.Attributes;

[MemoryDiagnoser]
[DisassemblyDiagnoser]
public class Bench
{
    private Nullable<bool> _null = null;
    private Nullable<bool> _nonnull = true;

    [Benchmark] public object BoxNull() => _null; 

    [Benchmark] public object BoxNonNull() => _nonnull;

    // https://github.com/dotnet/runtime/issues/50915
    private S? _ns = (S?)default(S);
    [Benchmark] public int Issue50915() => CallM(_ns);
    static int CallM<T>(T t)
    {
        if (t is IMyInterface)
            return ((IMyInterface)t).M();
        return 0;
    }
}

interface IMyInterface
{
    int M();
}

struct S : IMyInterface
{
    public int M() => 42;
}

@EgorBo EgorBo marked this pull request as ready for review December 5, 2025 09:10
Copilot AI review requested due to automatic review settings December 5, 2025 09:10
Copilot finished reviewing on behalf of EgorBo December 5, 2025 09:14
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes nullable boxing operations by introducing early inline expansion instead of using helper calls. The optimization enables escape analysis to stack-allocate boxed nullable values when they don't escape the method, eliminating heap allocations in hot paths. The implementation adds a new early QMARK expansion phase that runs after import but before other optimizations, allowing subsequent passes to optimize the expanded control flow.

Key changes:

  • Inline expansion of nullable box operations in hot, optimized code paths using conditional allocation
  • New early QMARK expansion phase to enable object stack allocation optimizations
  • COMMA node splitting during QMARK expansion to improve optimization opportunities

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/coreclr/jit/importer.cpp Adds inline expansion of nullable boxing with conditional allocation based on hasValue field
src/coreclr/jit/morph.cpp Enhances QMARK expansion to support early phase, adds COMMA splitting, changes return type to PhaseStatus
src/coreclr/jit/compiler.cpp Adds early QMARK expansion phase after import and updates late expansion call signature
src/coreclr/jit/compiler.h Adds OMF_HAS_EARLY_QMARKS flag and updates fgExpandQmarkNodes signature
src/coreclr/jit/compphases.h Defines new PHASE_EARLY_QMARK_EXPANSION phase

@EgorBo
Copy link
Member Author

EgorBo commented Dec 5, 2025

@MihuBot

// E.g. if we are boxing Nullable<System.Int32>, typeToBox is System.Int32.
CORINFO_RESOLVED_TOKEN tk = *pResolvedToken;
tk.hClass = typeToBox;
tk.tokenType = CORINFO_TOKENKIND_Casting;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had some offline discussion about this... would it make sense to introduce new token kind here rather than overloading CORINFO_TOKENKIND_Casting?

Copy link
Member Author

@EgorBo EgorBo Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas @MichalStrehovsky do you an opinion here? Context:
We have a token representing, say box Nullable<int>, we need to emit an allocator for boxed int (not Nullable<int> since we box the underlying type only). I have to call our work horse routine that accepts ResolvedToken which end up calling embedGenericHandle. so I ended up using this hack

    CORINFO_RESOLVED_TOKEN tk = *pResolvedToken;
    tk.hClass                 = typeToBox;
    tk.tokenType              = CORINFO_TOKENKIND_Casting;

where I create a copy of that token and change its type from Boxing to Casting so I hit this path:

if (type.IsNullable)
type = type.Instantiation[0];
return NecessaryTypeSymbolIfPossible(type);
(because ResolveToken.hClass is ignored)

is it acceptable? it seems to work for all three runtimes (coreclr, r2r and naot)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems to work for all three runtime

If it actually worked, you would not need to disable it for shared generic types. eeIsSharedInst check above is part of the hack.

This hack may cause problems with NAOT tracking of necessary vs. maximally constructed handles. I am not sure.

Copy link
Member Author

@EgorBo EgorBo Dec 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems to work for all three runtime

If it actually worked, you would not need to disable it for shared generic types. eeIsSharedInst check above is part of the hack.

This hack may cause problems with NAOT tracking of necessary vs. maximally constructed handles. I am not sure.

Runtime lookups are problematic in my case even without it - the code that emits them doesn't support emitting them inside QMARKs (so they're constructed only inside the corresponding branch, it can be fixed, though). For non-shared types I expect NAOT's dependency analsys to track both Nullable<T> and just T when ILScan hits box opcode, but I'm no expert in NAOT to say for sure

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how to properly check the dependency analysis but this seems to be compiling just fine:

class MyProgram
{
    static void Main() => Box(null);

    [MethodImpl(MethodImplOptions.NoInlining)]
    static object Box(MyStruct1? ms) => ms;
}

public struct MyStruct1 {}

so MyStruct1 is not used anywhere explicitly, but ILC gives a proper type for the allocator:

G_M42651_IG01:  ;; offset=0x0000
       sub      rsp, 40
       mov      dword ptr [rsp+0x30], ecx
						;; size=8 bbWeight=1 PerfScore 1.25
G_M42651_IG02:  ;; offset=0x0008
       test     cl, cl
       je       SHORT G_M42651_IG04
						;; size=4 bbWeight=1 PerfScore 1.25
G_M42651_IG03:  ;; offset=0x000C
       lea      rcx, [(reloc 0x4000000000420a08)]      ; MyStruct1
       call     CORINFO_HELP_NEWSFAST
       mov      cl, byte  ptr [rsp+0x31]
       mov      byte  ptr [rax+0x08], cl
       jmp      SHORT G_M42651_IG05
						;; size=21 bbWeight=0.25 PerfScore 1.38
G_M42651_IG04:  ;; offset=0x0021
       xor      rax, rax
						;; size=2 bbWeight=0.25 PerfScore 0.06
G_M42651_IG05:  ;; offset=0x0023
       add      rsp, 40
       ret      
						;; size=5 bbWeight=1 PerfScore 1.25

; Total bytes of code 40

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will probably work today wrt dependency analysis because we already have to be very careful not to expose RyuJIT to necessary/constructed MethodTables. Doesn't mean we're not going to curse this hack two years down the road (if/when we start using RyuJIT as a scanner and this will end up boxing a MethodTables that doesn't have a GCDesc at runtime because MethodTables for casting don't get them). It is a hack.

Copy link
Member Author

@EgorBo EgorBo Dec 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas @MichalStrehovsky @AndyAyersMS Ok, I ended up simplifying the implementation without this hack, with just non-shared types and no R2R (cause getReadyToRunHelper JIT-EE API also expects a resolvedtoken and R2R doesn't support stack-allocation I believe which makes it less beneficial to expand for). NAOT is supported.

I still expect NAOT to keep the underlying Nullable<> type alive. I believe it's already required even without my PR - the current importation logic calls getTypeForBox and sets it on the resuling local so then various cast related optimization can work with it.

Does it sound good?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R2R supports stack allocation of objects. It does not (yet) support stack allocation for arrays.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

R2R supports stack allocation of objects. It does not (yet) support stack allocation for arrays.

Good to know. Still, it requires JIT-EE work to enable so I left it for the future.

@EgorBo EgorBo force-pushed the inline-nullable-allocators branch from c1431db to ab99de1 Compare December 5, 2025 21:27
@EgorBo EgorBo force-pushed the inline-nullable-allocators branch from 23f505f to 69f6243 Compare December 6, 2025 01:30
@EgorBo
Copy link
Member Author

EgorBo commented Dec 6, 2025

The superpmi diffs are obviously useless due to missing contexts. The normal jit-diff (e.g. this) is also a mess due to the way PMI works - we take all generics and try to instantiate them with one of the 7 predefined types (see here) and int? in that list which meant tons of boxings on top of nullable where nullable is never really used. If I remove from that list and only leave real usage, the diff become much more smaller and reasonable (it's expected for it to be a size increase obviously).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Nullable<T> interface check / dispatch is comparatively very slow

5 participants