Fix ARM64 interface dispatch cache torn read #126346
```diff
@@ -13,15 +13,30 @@
     MACRO
         CHECK_CACHE_ENTRY $entry
         ;; Check a single entry in the cache.
-        ;;  x9  : Cache data structure. Also used for target address jump.
+        ;;  x9  : Cache data structure
         ;;  x10 : Instance MethodTable*
         ;;  x11 : Indirection cell address, preserved
-        ;;  x12 : Trashed
-        ldr     x12, [x9, #(OFFSETOF__InterfaceDispatchCache__m_rgEntries + ($entry * 16))]
+        ;;  x12, x13 : Trashed
+        ;;
+        ;; Use ldp to load both m_pInstanceType and m_pTargetCode in a single instruction.
+        ;; On ARM64 two separate ldr instructions can be reordered across a control dependency,
+        ;; which means a concurrent atomic cache entry update (via stlxp) could be observed as a
+        ;; torn read (new type, old target). ldp is single-copy atomic for the pair on FEAT_LSE2
+        ;; hardware (ARMv8.4+). The cbz guard ensures correctness on pre-LSE2 hardware too:
+        ;; a torn read can only produce a zero target (entries go from 0,0 to type,target),
+        ;; so we treat it as a cache miss.
+        IF (OFFSETOF__InterfaceDispatchCache__m_rgEntries + ($entry * 16)) > 504
+        ;; ldp's signed immediate offset must be in [-512,504] for 64-bit registers.
+        ;; Use add to reach far entries in the 32/64 slot stubs.
+        add     x12, x9, #(OFFSETOF__InterfaceDispatchCache__m_rgEntries + ($entry * 16))
+        ldp     x12, x13, [x12]
+        ELSE
+        ldp     x12, x13, [x9, #(OFFSETOF__InterfaceDispatchCache__m_rgEntries + ($entry * 16))]
+        ENDIF
        cmp     x10, x12
        bne     %ft0
-        ldr     x9, [x9, #(OFFSETOF__InterfaceDispatchCache__m_rgEntries + ($entry * 16) + 8)]
-        br      x9
+        cbz     x13, %ft0
+        br      x13
0
    MEND
```
For stubs with 32/64 entries, `OFFSETOF__InterfaceDispatchCache__m_rgEntries` is 0x20 (see src/coreclr/vm/arm64/asmconstants.h:289), so entries >= 30 fall into the `add`+`ldp` path. That adds an extra instruction on the (common) mismatch path for over half the probes, which could regress interface-dispatch hot-path throughput. Consider restructuring to avoid the per-entry `add` (e.g., split the probe sequence into two ranges using an adjusted base once, so `ldp` can keep using immediate offsets in range).
OFFSETOF__InterfaceDispatchCache__m_rgEntriesis 0x20 (see src/coreclr/vm/arm64/asmconstants.h:289), so entries >= 30 fall into theadd+ldppath. That adds an extra instruction on the (common) mismatch path for over half the probes, which could regress interface-dispatch hot-path throughput. Consider restructuring to avoid per-entryadd(e.g., split the probe sequence into two ranges using an adjusted base once, soldpcan keep using immediate offsets in-range).