Fix ARM64 interface dispatch cache torn read #126346
MichalStrehovsky wants to merge 1 commit into dotnet:main from
Conversation
On ARM64, the CHECK_CACHE_ENTRY macro read m_pInstanceType and m_pTargetCode from a cache entry using two separate ldr instructions separated by a control dependency (cmp/bne). ARM64's weak memory model does not order loads across control dependencies, so the hardware can speculatively satisfy the second load (target) before the first (type) commits. When a concurrent thread atomically populates the entry via stlxp/casp (UpdateCacheEntryAtomically), the reader can observe the new m_pInstanceType but the old m_pTargetCode (0), then br to address 0.

Fix by using ldp to load both fields in a single instruction (single-copy atomic on FEAT_LSE2 / ARMv8.4+ hardware), plus a cbz guard to catch torn reads on pre-LSE2 hardware, where ldp pair atomicity is not architecturally guaranteed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
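The fixed fast-path check can be sketched in C++ as a minimal single-threaded model (the struct and function names here are hypothetical illustrations, not actual runtime code): both fields are read as a pair, and a zero target — which can only be observed as half of a torn read of an entry being populated concurrently — is treated as a cache miss rather than branched to.

```cpp
#include <cstdint>

// Hypothetical model of one dispatch cache entry:
// { m_pInstanceType, m_pTargetCode }.
struct CacheEntry {
    uintptr_t type;
    uintptr_t target;
};

// Sketch of the fixed probe: in the asm both fields come from a single
// ldp, and the cbz guard sends a zero target to the miss path instead
// of executing `br 0`.
uintptr_t probe(const CacheEntry& e, uintptr_t objType) {
    uintptr_t type = e.type;        // asm: ldp x12, x13, [cache + offset]
    uintptr_t target = e.target;
    if (type != objType) return 0;  // mismatch: try next entry / slow path
    if (target == 0) return 0;      // cbz guard: torn read observed, treat as miss
    return target;                  // asm: br x13
}
```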
Tagging subscribers to this area: @agocke, @dotnet/ilc-contrib
Pull request overview
Fixes a potential torn-read race in the ARM64 cached interface dispatch fast-path that could lead to branching to address 0 when reading a concurrently populated cache entry.
Changes:
- Replace two independent loads of cache-entry fields with a single ldp pair load to avoid reordering across control dependencies on ARM64.
- Add a cbz guard on the loaded target to treat observed torn reads (type updated, target still 0) as a cache miss on pre-LSE2 hardware.
- Mirror the changes in both the GAS (.S) and ARMASM (.asm) implementations.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/coreclr/runtime/arm64/StubDispatch.S | Updates ARM64 stub macro to use ldp + cbz to avoid torn cache-entry reads. |
| src/coreclr/runtime/arm64/StubDispatch.asm | Same logic as above for the ARMASM variant to keep implementations consistent. |
```asm
.if (OFFSETOF__InterfaceDispatchCache__m_rgEntries + (\entry * 16)) > 504
    // ldp's signed immediate offset must be in [-512,504] for 64-bit registers.
    // Use add to reach far entries in the 32/64 slot stubs.
    add x12, x9, #(OFFSETOF__InterfaceDispatchCache__m_rgEntries + (\entry * 16))
    ldp x12, x13, [x12]
```
For stubs with 32/64 entries, OFFSETOF__InterfaceDispatchCache__m_rgEntries is 0x20 (see src/coreclr/vm/arm64/asmconstants.h:289), so entries >= 30 fall into the add+ldp path. That adds an extra instruction on the (common) mismatch path for over half the probes, which could regress interface-dispatch hot-path throughput. Consider restructuring to avoid per-entry add (e.g., split the probe sequence into two ranges using an adjusted base once, so ldp can keep using immediate offsets in-range).
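The encoding constraint behind this comment can be checked with a small C++ sketch (helper names are illustrative): ldp with 64-bit registers uses a 7-bit signed immediate scaled by 8, so only multiple-of-8 byte offsets in [-512, 504] are encodable, and with the stated base of 0x20 and 16-byte entries, entry 30 is the first to fall out of range.

```cpp
// ldp with 64-bit registers encodes a 7-bit signed immediate scaled by 8:
// byte offsets must be multiples of 8 in [-512, 504].
bool ldp64_offset_encodable(long off) {
    return off >= -512 && off <= 504 && off % 8 == 0;
}

// Assumed from the review comment: m_rgEntries starts at offset 0x20 and
// each cache entry is 16 bytes.
long entry_offset(int entry) {
    return 0x20 + entry * 16L;
}
```

Entry 29 yields offset 496 (encodable), entry 30 yields 512 (out of range), matching the "entries >= 30" observation above.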
```asm
IF (OFFSETOF__InterfaceDispatchCache__m_rgEntries + ($entry * 16)) > 504
    ;; ldp's signed immediate offset must be in [-512,504] for 64-bit registers.
    ;; Use add to reach far entries in the 32/64 slot stubs.
    add x12, x9, #(OFFSETOF__InterfaceDispatchCache__m_rgEntries + ($entry * 16))
    ldp x12, x13, [x12]
```
For 32/64-entry stubs, OFFSETOF__InterfaceDispatchCache__m_rgEntries is 0x20 (src/coreclr/vm/arm64/asmconstants.h:289), so entries >= 30 will always take the add+ldp sequence. This adds an extra instruction on each mismatch for a large fraction of probes. Consider splitting the probe loop into two ranges with a second base computed once so later ldp uses an in-range immediate offset, avoiding repeated add in the hot path.
/azp run runtime-nativeaot-outerloop

Azure Pipelines successfully started running 1 pipeline(s).
Fixes #126345