Remove CORINFO_HELP_ASSIGN_BYREF by EgorBo · Pull Request #128542 · dotnet/runtime

EgorBo · 2026-05-25T01:13:57Z

This PR attempts to remove CORINFO_HELP_ASSIGN_BYREF (which is basically CORINFO_HELP_CHECKED_ASSIGN_REF with a custom calling convention), it is replaced by either CORINFO_HELP_ASSIGN_REF or CORINFO_HELP_CHECKED_ASSIGN_REF. As a side effect we no longer use slow rep movsq on x64 for gaps between GC slots and may use SIMD. The motivation that CORINFO_HELP_ASSIGN_BYREF was very useful for large struct copies, but we now have a bulk write barrier for those.

NOTE: A lot of code (>1000-2000 LOC) can be removed from CoreCLR/NAOT side as dead as well since CORINFO_HELP_ASSIGN_BYREF is unused now, but that's for another PR and might require R2R bump.

class MyClass
{
    MyStruct _fld;
    
    void SetA(MyStruct value) => _fld = value;
    
}

record struct MyStruct(long N1, long N2, long N3, long N4, long N5, string A, string B);

-       push     rdi
        push     rsi
+       push     rbx
+       mov      rbx, rdx
-       lea      rdi, bword ptr [rcx+0x08]
-       mov      rsi, rdx
-       call     CORINFO_HELP_ASSIGN_BYREF
-       call     CORINFO_HELP_ASSIGN_BYREF
-       mov      ecx, 5
-       rep movsq
+       lea      rsi, bword ptr [rcx+0x08]
+       mov      rdx, gword ptr [rbx]
+       mov      rcx, rsi
+       call     CORINFO_HELP_ASSIGN_REF
+       mov      rdx, gword ptr [rbx+0x08]
+       lea      rcx, bword ptr [rsi+0x08]
+       call     CORINFO_HELP_ASSIGN_REF
+       vmovdqu  ymm0, ymmword ptr [rbx+0x10]
+       vmovdqu  ymmword ptr [rsi+0x10], ymm0
+       mov      rax, qword ptr [rbx+0x30]
+       mov      qword ptr [rsi+0x30], rax
+       vzeroupper
+       pop      rbx
        pop      rsi
-       pop      rdi
        ret

Benchmark

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Runtime.CompilerServices;

BenchmarkSwitcher.FromAssembly(typeof(Benchmarks).Assembly).Run(args);

public class Benchmarks
{
    MyStruct src, dst;

    [Benchmark]
    public void Heap() => dst = src;
}

public struct MyStruct
{
    public string A; 
    public InlineArray4<long> G; 
    public string B;
}

INTEL XEON PLATINUM 8573C

Method	Toolchain	Mean	Error	Ratio
Heap	PR #128542	2.724 ns	0.0628 ns	1.00
Heap	main	31.152 ns	0.2597 ns	11.44

AMD EPYC 9V45 2.60GHz

Method	Toolchain	Mean	Error	Ratio
Heap	PR #128542	1.968 ns	0.0603 ns	1.00
Heap	main	13.121 ns	0.0067 ns	6.68

Apple M4

Method	Toolchain	Mean	Error	Ratio
Heap	PR #128542	0.9574 ns	0.0021 ns	1.00
Heap	main	1.4646 ns	0.0053 ns	1.53

Sligtly less difference on arm64 where we don't have slow movsq for nongc gaps.

dotnet-policy-service · 2026-05-25T01:16:05Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copilot

Pull request overview

This PR removes the CORINFO_HELP_ASSIGN_BYREF / JIT_ByRefWriteBarrier helper from the JIT/EE boundary and ReadyToRun format, and replaces the JIT’s “cpobj with interleaved GC pointers” lowering/codegen path with either (a) decomposition into per-slot ASSIGN_REF write barriers (for small GC pointer counts) or (b) the existing bulk write-barrier helper (CORINFO_HELP_BULK_WRITEBARRIER). It also bumps the ReadyToRun major version to reflect the serialized format change.

Changes:

Delete the ASSIGN_BYREF helper surface area (helper IDs, write-barrier stubs/AV markers, and per-arch asm implementations) across VM, runtime, NativeAOT EH helper marking, and AOT tooling.
Remove the CpObj* block op kinds and all backend genCodeForCpObj implementations; introduce TryDecomposeBlockStoreAsIndirs + LowerBlockStoreAsGcBulkCopyCall as the new lowering strategy for GC-containing struct copies.
Bump ReadyToRun major version to 19 and reserve helper ID 0x32 (formerly ByRefWriteBarrier) to avoid reuse.

Reviewed changes

Copilot reviewed 70 out of 70 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
src/coreclr/vm/wasm/helpers.cpp	Removes wasm helper end marker and AV location globals for byref write barrier.
src/coreclr/vm/threads.cpp	Stops registering `CORINFO_HELP_ASSIGN_BYREF` helper in `InitThreadManager`; updates AMD64 comment.
src/coreclr/vm/riscv64/asmhelpers.S	Removes `JIT_ByRefWriteBarrier` implementation; adjusts comments about incremented regs.
src/coreclr/vm/loongarch64/asmhelpers.S	Removes `JIT_ByRefWriteBarrier` implementation; adjusts comments about incremented regs.
src/coreclr/vm/jitinterface.h	Removes `RhpByRefAssignRef*` declarations/mapping and `JIT_ByRefWriteBarrier` decl.
src/coreclr/vm/i386/jithelp.S	Deletes x86 `JIT_ByRefWriteBarrier` helper macro/entry and its inclusion in the group.
src/coreclr/vm/i386/jithelp.asm	Deletes x86 `JIT_ByRefWriteBarrier` helper macro/entry and its inclusion in the group.
src/coreclr/vm/excep.cpp	Removes byref write barrier end marker, AV locations, and helper-range checks.
src/coreclr/vm/arm64/patchedcode.S	Removes ARM64 `JIT_ByRefWriteBarrier`; updates comments for remaining helpers/return stubs.
src/coreclr/vm/arm64/patchedcode.asm	Removes ARM64 `JIT_ByRefWriteBarrier`; updates comments for remaining helpers/return stubs.
src/coreclr/vm/arm/stubs.cpp	Removes ARM byref barrier variants from stub declarations/mapping/validation.
src/coreclr/vm/arm/patchedcode.S	Removes ARM `JIT_ByRefWriteBarrier` region from patched code blob.
src/coreclr/vm/arm/asmhelpers.S	Removes ARM byref write barrier helper macro and descriptor entries.
src/coreclr/vm/amd64/jithelpers_fast.S	Removes AMD64 Unix `JIT_ByRefWriteBarrier` implementation.
src/coreclr/vm/amd64/JitHelpers_Fast.asm	Removes AMD64 Windows `JIT_ByRefWriteBarrier` implementation.
src/coreclr/tools/Common/JitInterface/CorInfoHelpFunc.cs	Removes `CORINFO_HELP_ASSIGN_BYREF` from managed mirror enum.
src/coreclr/tools/Common/Internal/Runtime/ReadyToRunConstants.cs	Reserves `0x32` with a “do not reuse” comment in tool-side helper enum.
src/coreclr/tools/aot/ILCompiler.RyuJit/JitInterface/CorInfoImpl.RyuJit.cs	Drops mapping from `ASSIGN_BYREF` to `ByRefWriteBarrier` R2R helper.
src/coreclr/tools/aot/ILCompiler.Reflection.ReadyToRun/ReadyToRunSignature.cs	Removes signature pretty-printing for `ByRefWriteBarrier`.
src/coreclr/tools/aot/ILCompiler.ReadyToRun/JitInterface/CorInfoImpl.ReadyToRun.cs	Drops mapping from `ASSIGN_BYREF` to `ByRefWriteBarrier` R2R helper.
src/coreclr/tools/aot/ILCompiler.Compiler/Compiler/JitHelper.cs	Removes `ReadyToRunHelper.ByRefWriteBarrier` entry point logic.
src/coreclr/runtime/riscv64/WriteBarriers.S	Removes `RhpByRefAssignRef` implementation/AV alternate entry.
src/coreclr/runtime/portable/WriteBarriers.cpp	Removes portable stub for `RhpByRefAssignRef`.
src/coreclr/runtime/loongarch64/WriteBarriers.S	Removes `RhpByRefAssignRef`; updates comment about incremented regs in remaining helper.
src/coreclr/runtime/i386/WriteBarriers.S	Removes `RhpByRefAssignRef` implementation and AV alternate entries.
src/coreclr/runtime/i386/WriteBarriers.asm	Removes `RhpByRefAssignRef` implementation and AV alternate entries.
src/coreclr/runtime/arm64/WriteBarriers.S	Removes `RhpByRefAssignRefArm64` implementation/AV alternate entry.
src/coreclr/runtime/arm64/WriteBarriers.asm	Removes `RhpByRefAssignRefArm64` implementation/AV alternate entry.
src/coreclr/runtime/arm/WriteBarriers.S	Removes `RhpByRefAssignRef` implementation and AV labels.
src/coreclr/runtime/amd64/WriteBarriers.S	Removes `RhpByRefAssignRef` implementation and AV alternate entries (incl. Apple unwind workaround).
src/coreclr/runtime/amd64/WriteBarriers.asm	Removes `RhpByRefAssignRef` implementation and AV alternate entries.
src/coreclr/nativeaot/Runtime/EHHelpers.cpp	Removes byref write barrier AV locations from NativeAOT helper-range detection.
src/coreclr/jit/utils.cpp	Removes `CORINFO_HELP_ASSIGN_BYREF` from helper call property initialization.
src/coreclr/jit/targetx86.h	Removes x86 `ASSIGN_BYREF` register/killset documentation and masks.
src/coreclr/jit/targetwasm.h	Removes wasm byref write barrier reg/killset defines.
src/coreclr/jit/targetriscv64.h	Removes RISCV64 byref write barrier reg/killset defines.
src/coreclr/jit/targetloongarch64.h	Removes LoongArch64 byref write barrier reg/killset defines.
src/coreclr/jit/targetarm64.h	Removes ARM64 byref write barrier reg/killset defines.
src/coreclr/jit/targetarm.h	Removes ARM byref write barrier reg/killset defines.
src/coreclr/jit/targetamd64.h	Removes AMD64 byref write barrier killset defines and related commentary.
src/coreclr/jit/lsraxarch.cpp	Removes CpObj-specific LSRA register assignment logic for xarch.
src/coreclr/jit/lsrariscv64.cpp	Removes CpObj-specific LSRA register assignment logic for riscv64.
src/coreclr/jit/lsraloongarch64.cpp	Removes CpObj-specific LSRA register assignment logic for loongarch64.
src/coreclr/jit/lsrabuild.cpp	Removes CpObj killset handling that depended on `ASSIGN_BYREF`.
src/coreclr/jit/lsraarmarch.cpp	Removes CpObj-specific LSRA register assignment logic for arm/arm64.
src/coreclr/jit/lowerxarch.cpp	Changes GC-containing struct copy lowering to decompose-or-bulk-helper; adjusts stack/off-heap handling.
src/coreclr/jit/lowerwasm.cpp	Uses decompose-or-bulk-helper for GC-containing struct copies on wasm.
src/coreclr/jit/lowerriscv64.cpp	Uses decompose-or-bulk-helper for GC-containing struct copies on riscv64.
src/coreclr/jit/lowerloongarch64.cpp	Uses decompose-or-bulk-helper for GC-containing struct copies on loongarch64.
src/coreclr/jit/lowerarmarch.cpp	Uses decompose-or-bulk-helper for GC-containing struct copies on arm/arm64.
src/coreclr/jit/lower.h	Replaces `TryLowerBlockStoreAsGcBulkCopyCall` with `LowerBlockStoreAsGcBulkCopyCall` and adds `TryDecomposeBlockStoreAsIndirs`.
src/coreclr/jit/lower.cpp	Implements decomposition into per-slot indirections and the bulk write-barrier fallback lowering.
src/coreclr/jit/gentree.h	Removes CpObj-specific `BlkOpKind` enum values.
src/coreclr/jit/gentree.cpp	Removes debug printing for CpObj `BlkOpKind` values.
src/coreclr/jit/emit.cpp	Removes special-case GC-reg-kill handling for `ASSIGN_BYREF`.
src/coreclr/jit/codegenxarch.cpp	Removes CpObj codegen path and related block-kind dispatch.
src/coreclr/jit/codegenwasm.cpp	Removes byref write barrier signature comment.
src/coreclr/jit/codegenriscv64.cpp	Removes CpObj codegen path and related block-kind dispatch.
src/coreclr/jit/codegenloongarch64.cpp	Removes CpObj codegen path and related block-kind dispatch.
src/coreclr/jit/codegencommon.cpp	Removes helper killset case for `ASSIGN_BYREF`.
src/coreclr/jit/codegenarmarch.cpp	Removes CpObj block-kind dispatch.
src/coreclr/jit/codegenarm64.cpp	Removes CpObj codegen path implementation.
src/coreclr/jit/codegenarm.cpp	Removes CpObj codegen path implementation.
src/coreclr/jit/codegen.h	Removes `genCodeForCpObj` declaration.
src/coreclr/inc/readytorunhelpers.h	Removes `READYTORUN_HELPER_ByRefWriteBarrier` mapping macro entry.
src/coreclr/inc/readytorun.h	Bumps R2R major/minimum major to 19; reserves helper slot `0x32`; documents the format break.
src/coreclr/inc/jithelpers.h	Removes `CORINFO_HELP_ASSIGN_BYREF` helper entry.
src/coreclr/inc/jiteeversionguid.h	Updates `JITEEVersionIdentifier` to reflect the interface change.
src/coreclr/inc/corinfo.h	Removes `CORINFO_HELP_ASSIGN_BYREF` from `CorInfoHelpFunc`.
docs/design/coreclr/botr/readytorun-format.md	Updates R2R helper enum docs to reserve `0x32` and note removal.

EgorBo · 2026-05-25T01:28:38Z

@EgorBot -amd -arm

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Benchmarks).Assembly).Run(args);

public class Benchmarks
{
    private MyStruct a1;
    private MyStruct a2;

    [Benchmark]
    public void Copy()
    {
        a1 = a2;
    }
}

record struct MyStruct(long N1, long N2, long N3, long N4, long N5, string A, string B);

Copilot

Pull request overview

Copilot reviewed 32 out of 32 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/coreclr/jit/lower.cpp:10167

LowerBlockStoreAsGcBulkCopyCall now unconditionally drops a GT_IND source without preserving/handling volatile semantics (previously this path explicitly refused volatile sources/stores). If data is a volatile indirection (e.g. reading from a volatile struct field), removing the GT_IND loses the volatile flag, and the resulting CORINFO_HELP_BULK_WRITEBARRIER call is not wrapped with the usual memory barrier nodes (compare with LowerBlockStoreAsHelperCall). This risks incorrect volatile ordering/semantics on non-xarch targets. Consider either (1) keeping the earlier bail-out for volatile source/store and forcing decomposition in those cases, or (2) explicitly inserting the appropriate BARRIER_* nodes around the helper call and preserving volatile-source behavior.

//   or layouts with too many GC pointers to decompose efficiently). Volatile
//   semantics are preserved by the helper call itself — the call acts as a
//   compiler memory barrier and the helper uses atomic per-slot stores.
//
// Arguments:
//    blk - The block store node to lower
//
void Lowering::LowerBlockStoreAsGcBulkCopyCall(GenTreeBlk* blk)
{
    assert(blk->OperIs(GT_STORE_BLK));
    assert(!blk->OperIsInitBlkOp());
    assert(blk->GetLayout()->HasGCPtr());
    assert(!blk->IsAddressNotOnHeap(m_compiler));

    GenTree* dest = blk->Addr();
    GenTree* data = blk->Data();

    if (data->OperIs(GT_IND))
    {
        // Drop GT_IND nodes
        BlockRange().Remove(data);
        data = data->AsIndir()->Addr();
    }

Copilot

Pull request overview

Copilot reviewed 32 out of 32 changed files in this pull request and generated 1 comment.

EgorBo · 2026-05-25T13:08:26Z

@MihuBot

Copilot

Pull request overview

Copilot reviewed 32 out of 32 changed files in this pull request and generated 2 comments.

EgorBo · 2026-05-25T14:29:26Z

@EgorBot -amd -arm

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

BenchmarkSwitcher.FromAssembly(typeof(Benchmarks).Assembly).Run(args);

public class Benchmarks
{
    // Heap-resident sources/destinations (class fields).
    private S_2GC_NoGap     _2g_nogap_src, _2g_nogap_dst;
    private S_2GC_Gap8      _2g_g8_src,    _2g_g8_dst;
    private S_2GC_Gap32     _2g_g32_src,   _2g_g32_dst;
    private S_2GC_Gap64     _2g_g64_src,   _2g_g64_dst;
    private S_2GC_Gap128    _2g_g128_src,  _2g_g128_dst;
    private S_2GC_Gap1K     _2g_g1k_src,   _2g_g1k_dst;
    private S_2GC_Gap2K     _2g_g2k_src,   _2g_g2k_dst;

    private S_1GC_Gap64     _1g_g64_src,   _1g_g64_dst;
    private S_1GC_Gap1K     _1g_g1k_src,   _1g_g1k_dst;

    private S_3GC_Gap64     _3g_g64_src,   _3g_g64_dst;
    private S_3GC_Gap1K     _3g_g1k_src,   _3g_g1k_dst;

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void Sink<T>(in T _) { }

    // ===== Heap target: dst is class field (write barriers needed) =====
    [Benchmark] public void Heap_2GC_NoGap()  => _2g_nogap_dst = _2g_nogap_src;
    [Benchmark] public void Heap_2GC_Gap8()   => _2g_g8_dst    = _2g_g8_src;
    [Benchmark] public void Heap_2GC_Gap32()  => _2g_g32_dst   = _2g_g32_src;
    [Benchmark] public void Heap_2GC_Gap64()  => _2g_g64_dst   = _2g_g64_src;
    [Benchmark] public void Heap_2GC_Gap128() => _2g_g128_dst  = _2g_g128_src;
    [Benchmark] public void Heap_2GC_Gap1K()  => _2g_g1k_dst   = _2g_g1k_src;
    [Benchmark] public void Heap_2GC_Gap2K()  => _2g_g2k_dst   = _2g_g2k_src;

    [Benchmark] public void Heap_1GC_Gap64()  => _1g_g64_dst   = _1g_g64_src;
    [Benchmark] public void Heap_1GC_Gap1K()  => _1g_g1k_dst   = _1g_g1k_src;

    [Benchmark] public void Heap_3GC_Gap64()  => _3g_g64_dst   = _3g_g64_src;
    [Benchmark] public void Heap_3GC_Gap1K()  => _3g_g1k_dst   = _3g_g1k_src;

    // ===== Stack target: dst is a local (no barriers; SIMD-unroll path) =====
    [Benchmark] public void Stack_2GC_NoGap()  { var l = _2g_nogap_src; Sink(in l); }
    [Benchmark] public void Stack_2GC_Gap8()   { var l = _2g_g8_src;    Sink(in l); }
    [Benchmark] public void Stack_2GC_Gap32()  { var l = _2g_g32_src;   Sink(in l); }
    [Benchmark] public void Stack_2GC_Gap64()  { var l = _2g_g64_src;   Sink(in l); }
    [Benchmark] public void Stack_2GC_Gap128() { var l = _2g_g128_src;  Sink(in l); }
    [Benchmark] public void Stack_2GC_Gap1K()  { var l = _2g_g1k_src;   Sink(in l); }
    [Benchmark] public void Stack_2GC_Gap2K()  { var l = _2g_g2k_src;   Sink(in l); }

    [Benchmark] public void Stack_1GC_Gap64()  { var l = _1g_g64_src;   Sink(in l); }
    [Benchmark] public void Stack_1GC_Gap1K()  { var l = _1g_g1k_src;   Sink(in l); }

    [Benchmark] public void Stack_3GC_Gap64()  { var l = _3g_g64_src;   Sink(in l); }
    [Benchmark] public void Stack_3GC_Gap1K()  { var l = _3g_g1k_src;   Sink(in l); }
}

// ---- Layouts (Sequential + Pack=8 so the GC slots stay where declared) ----

// 2 GC slots, no gap (16B total)
[StructLayout(LayoutKind.Sequential, Pack = 8)]
public struct S_2GC_NoGap { public string A; public string B; }

// 2 GC slots, 8B gap (24B)
[StructLayout(LayoutKind.Sequential, Pack = 8)]
public struct S_2GC_Gap8 { public string A; public long G; public string B; }

// 2 GC slots, 32B gap (48B)
[StructLayout(LayoutKind.Sequential, Pack = 8)]
public struct S_2GC_Gap32 { public string A; public InlineArray4<long> G; public string B; }

// 2 GC slots, 64B gap (80B)
[StructLayout(LayoutKind.Sequential, Pack = 8)]
public struct S_2GC_Gap64 { public string A; public InlineArray8<long> G; public string B; }

// 2 GC slots, 128B gap (144B)
[StructLayout(LayoutKind.Sequential, Pack = 8)]
public struct S_2GC_Gap128 { public string A; public InlineArray16<long> G; public string B; }

// 2 GC slots, 1024B gap (~1040B)
[StructLayout(LayoutKind.Sequential, Pack = 8)]
public struct S_2GC_Gap1K { public string A; public InlineArray16<InlineArray8<long>> G; public string B; }

// 2 GC slots, 2048B gap (~2064B)
[StructLayout(LayoutKind.Sequential, Pack = 8)]
public struct S_2GC_Gap2K { public string A; public InlineArray16<InlineArray16<long>> G; public string B; }

// 1 GC slot + 64B padding (72B)
[StructLayout(LayoutKind.Sequential, Pack = 8)]
public struct S_1GC_Gap64 { public string A; public InlineArray8<long> Pad; }

// 1 GC slot + 1024B padding (~1032B)
[StructLayout(LayoutKind.Sequential, Pack = 8)]
public struct S_1GC_Gap1K { public string A; public InlineArray16<InlineArray8<long>> Pad; }

// 3 GC slots with 64B gaps between (152B)
[StructLayout(LayoutKind.Sequential, Pack = 8)]
public struct S_3GC_Gap64
{
    public string A; public InlineArray8<long> G1;
    public string B; public InlineArray8<long> G2;
    public string C;
}

// 3 GC slots with 1024B gaps between (~2072B)
[StructLayout(LayoutKind.Sequential, Pack = 8)]
public struct S_3GC_Gap1K
{
    public string A; public InlineArray16<InlineArray8<long>> G1;
    public string B; public InlineArray16<InlineArray8<long>> G2;
    public string C;
}

…ollLimit When a struct copy with GC pointers has an off-heap destination but is too large to unroll inline, the isNotHeap check was previously gated on 'size <= copyBlockUnrollLimit', so doCpObj stayed true and the code fell into LowerBlockStoreAsGcBulkCopyCall - which asserts !isNotHeap. Restructure to match the armarch pattern: always clear doCpObj for isNotHeap, and only set gtBlkOpGcUnsafe when small enough to unroll. Larger off-heap copies now fall through to LowerBlockStoreAsHelperCall (memcpy), which is correct since memcpy is a leaf helper (no GC during the call) and both src/dst slots are GC-tracked via local var info. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 32 out of 32 changed files in this pull request and generated 1 comment.

EgorBo · 2026-05-25T15:55:15Z

@EgorBot -arm -aws_arm -amd -windows_intel

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Runtime.CompilerServices;

BenchmarkSwitcher.FromAssembly(typeof(Benchmarks).Assembly).Run(args);

public class Benchmarks
{
    MyStruct src, dst;

    [Benchmark]
    public void Heap() => dst = src;
}

public struct MyStruct
{
    public string A; 
    public InlineArray4<long> G; 
    public string B;
}

EgorBo · 2026-05-25T18:12:43Z

@dotnet/jit-contrib @jkotas is this an acceptable trade-off?

Clearly this is a size increase given how comfortable the previous CORINFO_HELP_ASSIGN_BYREF was with rep movsq e.g.

call     CORINFO_HELP_ASSIGN_BYREF
call     CORINFO_HELP_ASSIGN_BYREF
mov      ecx, 5
rep movsq

but rep movsq can be up to 10x slower, see benchmark in the description. Also, see #128542 (comment)

SPMI diffs, jit-diff (MihuBot).

Quite a bit a hit for Tier0 codegen, mainly because it spills-reloads everything. I tried to just unconditionally use the bulk-barrier for tier0, but it gets a penalty from these explicit null checks.

There are cases where I can improve codegen by tunning LSRA and modelling registers used by write barrier more precisely (presumably).

Pros:

minus ~2000-3000 LOC (if changes on the VM side are included)
in many cases it has a lot better perf characteristics, esp on x64, esp on Intel (it seems). Up to 10x
for known GC locations we now can emit unchecked barriers while previously CORINFO_HELP_ASSIGN_BYREF was always unconditionally checked.
less variations of write-barriers, easier to maintain, less of custom calling conventions

Cons:

Size increase (esp on windows 32bit)
TP hit
R2R bump (to clean up the VM side)

So I think we can improve certain patterns in follow ups here (to reduce redundant stack spill-reloads). Also, we can tune the bulk barrier a bit.

jkotas · 2026-05-25T18:33:51Z

LGTM

Copilot AI review requested due to automatic review settings May 25, 2026 01:13

github-actions Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 25, 2026

Copilot started reviewing on behalf of EgorBo May 25, 2026 01:14 View session

dotnet-policy-service Bot assigned EgorBo May 25, 2026

Copilot AI reviewed May 25, 2026

View reviewed changes

Comment thread src/coreclr/jit/lowerxarch.cpp Outdated

This was referenced May 25, 2026

Benchmarks for dotnet/runtime#128542 (for @EgorBo) EgorBot/Benchmarks#214

Open

Benchmarks for dotnet/runtime#128542 (for @EgorBo) EgorBot/Benchmarks#215

Open

This was referenced May 25, 2026

"We stopped hearing from agent Azure Pipelines 32. Verify the agent machine is running and has a healthy network connection" dotnet/dnceng#1886

Open

Multiple Helix work items fail on maccatalyst/tvos CoreCLR Release #126460

Open

Copilot AI review requested due to automatic review settings May 25, 2026 11:43

EgorBo force-pushed the remove-corinfo-help-assign-byref branch from 9beff5e to c845f6c Compare May 25, 2026 11:43

Copilot started reviewing on behalf of EgorBo May 25, 2026 11:44 View session

Copilot AI reviewed May 25, 2026

View reviewed changes

Comment thread src/coreclr/jit/lower.cpp Outdated

EgorBo force-pushed the remove-corinfo-help-assign-byref branch from c845f6c to ee79d4d Compare May 25, 2026 12:02

Copilot AI review requested due to automatic review settings May 25, 2026 12:42

EgorBo force-pushed the remove-corinfo-help-assign-byref branch from ee79d4d to d805a92 Compare May 25, 2026 12:42

Copilot started reviewing on behalf of EgorBo May 25, 2026 12:42 View session

Copilot AI reviewed May 25, 2026

View reviewed changes

Comment thread src/coreclr/jit/lower.cpp Outdated

EgorBo force-pushed the remove-corinfo-help-assign-byref branch from d805a92 to 28e1813 Compare May 25, 2026 13:03

MihuBot mentioned this pull request May 25, 2026

[JitDiff X64] [EgorBo] Remove CORINFO_HELP_ASSIGN_BYREF MihuBot/runtime-utils#1935

Open

Copilot AI review requested due to automatic review settings May 25, 2026 13:30

EgorBo force-pushed the remove-corinfo-help-assign-byref branch from 28e1813 to 1b597b2 Compare May 25, 2026 13:30

Copilot started reviewing on behalf of EgorBo May 25, 2026 13:30 View session

EgorBot mentioned this pull request May 25, 2026

Benchmarks for dotnet/runtime#128542 (for @EgorBo) EgorBot/Benchmarks#216

Open

Copilot AI reviewed May 25, 2026

View reviewed changes

Comment thread src/coreclr/jit/lower.cpp Outdated

Comment thread src/coreclr/jit/lower.cpp

Only jit changes

0177e27

EgorBo force-pushed the remove-corinfo-help-assign-byref branch from 1b597b2 to 0177e27 Compare May 25, 2026 14:04

EgorBot mentioned this pull request May 25, 2026

Benchmarks for dotnet/runtime#128542 (for @EgorBo) EgorBot/Benchmarks#217

Open

Copilot AI review requested due to automatic review settings May 25, 2026 15:22

Copilot started reviewing on behalf of EgorBo May 25, 2026 15:22 View session

Copilot AI reviewed May 25, 2026

View reviewed changes

Comment thread src/coreclr/jit/lower.cpp Outdated

EgorBot mentioned this pull request May 25, 2026

Benchmarks for dotnet/runtime#128542 (for @EgorBo) EgorBot/Benchmarks#218

Open

EgorBo marked this pull request as ready for review May 25, 2026 18:25

build-analysis Bot mentioned this pull request May 25, 2026

XHarness package install failure on iOS due to devicectl NSPOSIXErrorDomain error 49 #123796

Open

EgorBo added 3 commits May 26, 2026 00:19

clean up

45704ab

clean up

7f8f5cb

clean up

328cb59

Copilot AI review requested due to automatic review settings May 26, 2026 00:02

Copilot started reviewing on behalf of EgorBo May 26, 2026 00:03 View session

Conversation

EgorBo commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Benchmark

INTEL XEON PLATINUM 8573C

AMD EPYC 9V45 2.60GHz

Apple M4

Uh oh!

dotnet-policy-service Bot commented May 25, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

EgorBo commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

EgorBo commented May 25, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

EgorBo commented May 25, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

EgorBo commented May 25, 2026

Uh oh!

EgorBo commented May 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jkotas commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

EgorBo commented May 25, 2026 •

edited

Loading

EgorBo commented May 25, 2026 •

edited

Loading

EgorBo commented May 25, 2026 •

edited

Loading