arm64 JIT: ins_Move_Extend emits unnecessary sxtw for TYP_INT reg-reg moves #129052

@AndyAyersMS

Note

AI-assisted (Copilot CLI).

Description

On arm64, ins_Move_Extend in src/coreclr/jit/instr.cpp:2008-2011 returns INS_sxtw when srcType == TYP_INT. The sxtw Xd, Wn instruction sign-extends 32 → 64 bits, but for a TYP_INT → TYP_INT reg-reg move the upper 32 bits are irrelevant — any subsequent 32-bit operation ignores them, and any genuine int → long widening has an explicit CAST in the IR that takes a different codegen path.

// src/coreclr/jit/instr.cpp, arm64 signed branch:
else if (srcType == TYP_INT)
{
    ins = INS_sxtw;   // ← INS_mov would suffice
}
else
{
    ins = INS_mov;
}

INS_mov (mov Wd, Wn, encoded as orr Wd, WZR, Wn on arm64) writes the low 32 bits and zeroes the upper 32 bits of the destination register, which is all an int-typed local needs, and it is strictly cheaper than sxtw on common cores.
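The argument above can be checked with a small register model. This is only a sketch: `XReg`, `sxtw`, `mov_w`, `read_w`, `cast_int_to_long` and `moves_agree_for_int` are hypothetical names made up for illustration and do not exist in the JIT. It assumes that 32-bit instructions read only the W view of a register and that an explicit int → long CAST sign-extends from that view.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an arm64 general-purpose register: 64 bits wide,
// where the W view is the low 32 bits. None of these names exist in the JIT.
using XReg = std::uint64_t;

// Models `sxtw Xd, Wn`: sign-extend the low 32 bits of src to 64 bits.
// (Narrowing to int32_t is modular; guaranteed since C++20, and what
// mainstream compilers did before that.)
inline XReg sxtw(XReg src)
{
    const auto low = static_cast<std::int32_t>(static_cast<std::uint32_t>(src));
    return static_cast<XReg>(static_cast<std::int64_t>(low));
}

// Models `mov Wd, Wn`: copy the low 32 bits; the upper 32 bits become zero.
inline XReg mov_w(XReg src)
{
    return static_cast<XReg>(static_cast<std::uint32_t>(src));
}

// Models what a 32-bit instruction (e.g. `cmp Wn, Wm`) reads from a register.
inline std::int32_t read_w(XReg reg)
{
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(reg));
}

// Models an explicit int -> long CAST, which sign-extends from the W view
// and therefore does not depend on the register's upper 32 bits.
inline std::int64_t cast_int_to_long(XReg reg)
{
    return static_cast<std::int64_t>(read_w(reg));
}

// True when both moves are indistinguishable to 32-bit reads and to the CAST.
inline bool moves_agree_for_int(std::int32_t value)
{
    const XReg src = static_cast<XReg>(static_cast<std::uint32_t>(value));
    return read_w(sxtw(src)) == read_w(mov_w(src)) &&
           cast_int_to_long(sxtw(src)) == cast_int_to_long(mov_w(src));
}

// A few edge cases where the two moves leave different upper bits.
inline void check_examples()
{
    assert(moves_agree_for_int(-1));
    assert(moves_agree_for_int(INT32_MIN));
}
```

Under these assumptions the two moves differ only in the upper 32 bits of the destination, which neither a 32-bit consumer nor the modeled CAST ever reads.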

Repro

foreach over a custom Range enumerator whose Current is exposed as an auto-property (int Current { get; private set; }) emits the sxtw in its hot loop. See #40770 for full details.

Inner loop on current main (arm64, FullOpts):

add  w1, w1, w2
add  w2, w3, #1
sxtw w3, w2      ; <-- wasted
cmp  w3, w0
blt  .loop

Replacing the auto-property with a plain int Current; field eliminates the sxtw and matches the for-loop codegen exactly.

Local patch + measurement

Single-line change: replace ins = INS_sxtw; with ins = INS_mov; in the arm64 TYP_INT case.

Rebuilt checked arm64 JIT and re-measured the repro from #40770 (Apple M4 Max, N=100, 5M outer iterations):

| Loop | Before | After |
|---|---|---|
| foreach over Range (auto-prop) | ~48 ns/call | ~36 ns/call (~25% faster) |
| foreach over enumerator (plain field) | ~27 ns/call | ~27 ns/call |
| for (int i = 1; i < n; i++) | ~24 ns/call | ~24 ns/call |

Sanity checks for (long)int, (long)int.MinValue, -int.MinValue, etc. continue to produce correct results — the explicit CAST path is unaffected.

Caveats

ins_Move_Extend is called from multiple sites, not just STORE_LCL_VAR in codegenarm64.cpp:

  • codegenarm64.cpp:3020 — STORE_LCL_VAR (the case verified above)
  • codegencommon.cpp:7227 — return value codegen
  • hwintrinsiccodegenarm64.cpp:2861, :2900 — HW intrinsic helpers

The other callers should be reviewed before the change is merged. An SPMI diff run would confirm the broader impact and catch any callers that relied on the sign extension to observe a correct 64-bit value without an explicit CAST.
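To make the risk concrete, here is a minimal sketch of the kind of caller the review needs to rule out. It is a hypothetical model, not JIT code: `XReg`, `sxtw`, `mov_w`, `read_x_as_long` and `change_is_observable` are illustrative names, and the "unsafe consumer" is an assumed pattern, not a known caller.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit register model; these names are not from the JIT.
using XReg = std::uint64_t;

// Models `sxtw Xd, Wn`: sign-extend the low 32 bits of src to 64 bits.
inline XReg sxtw(XReg src)
{
    const auto low = static_cast<std::int32_t>(static_cast<std::uint32_t>(src));
    return static_cast<XReg>(static_cast<std::int64_t>(low));
}

// Models `mov Wd, Wn`: copy the low 32 bits; the upper 32 bits become zero.
inline XReg mov_w(XReg src)
{
    return static_cast<XReg>(static_cast<std::uint32_t>(src));
}

// Assumed unsafe consumer: reinterprets the whole X register as a signed
// 64-bit value, with no explicit CAST in between.
inline std::int64_t read_x_as_long(XReg reg)
{
    return static_cast<std::int64_t>(reg);
}

// True when replacing sxtw with mov changes what such a consumer observes.
inline bool change_is_observable(std::int32_t value)
{
    const XReg src = static_cast<XReg>(static_cast<std::uint32_t>(value));
    return read_x_as_long(sxtw(src)) != read_x_as_long(mov_w(src));
}

// Non-negative ints are unaffected; negative ints expose the difference.
inline void check_examples()
{
    assert(!change_is_observable(42));
    assert(change_is_observable(-1));
}
```

In this model only negative values are affected: after mov, a consumer reading all 64 bits sees 4294967295 where sxtw would have produced -1. If such a caller exists, it is most likely to show up as a correctness difference on negative int inputs.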

Related

cc @dotnet/jit-contrib

Labels

arch-arm64, area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), tenet-performance (Performance related issue)