Commit c763222

timcassell and claude authored
Disassembly follow jump trampolines (#3136)
* Disassembly follow jump trampolines.

* Revert ProcessExtensions.

* Fix DisassemblyDiagnoser for indirect calls through precodes

  Replace byte-for-byte stub precode template matching with opcode-based pattern matching that extracts RIP-relative displacements directly from the encoded instructions, so the resolver survives StubPrecodeData layout shifts between .NET runtime patch versions.

  For StubPrecode and FixupPrecode, resolve to the MethodDesc slot rather than the Target slot, since Target may still point at PreStub or PrecodeFixupThunk when the call site has never been backpatched (common with CallCountingDelayMs=0). The MethodDesc handle lets GetMethodByHandle recover the live ClrMethod regardless of backpatch state. When the handle resolves, also enqueue the method for disassembly rather than only labeling it "Precode of X".

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Address PR feedback: opcode-based stub matching everywhere

  - Reword the rationale comment: the data-section layout is stable; what varies between runtime versions is the offset between the code page and its data section (the stub page size). Extracting RIP-relative displacements from the encoded instructions themselves is what makes the resolver independent of that gap.
  - Apply the same opcode-based approach to TryFollowJumpTrampoline so it no longer relies on byte-for-byte stub templates either, and delete the RuntimeSpecificData class along with the per-runtime-version template cache from both IntelDisassembler and Arm64Disassembler.
  - Make the Arm64 precode/stub resolver tolerant of register-allocation changes: accept any LDR Xa + LDR Xb + BR (Xa|Xb) triple for StubPrecode and use the BR's register to decide which LDR loaded Target vs MethodDesc; FixupPrecode similarly matches any LDR Xa + BR Xa + LDR Xb. CallCountingStub validates only that the LDRH's base register matches the preceding LDR's destination.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Restore fixed-register matching on Arm64 per reviewer feedback

  The runtime's stub ABI pins which scratch registers the precode/stub templates use (x10/x12 for StubPrecode, x11/x12 for FixupPrecode, x9 for CallCountingStub), so generalising over them was unnecessary. Restore the strict register checks; the actual Arm64 bug being fixed was the LDRH encoding constant, which is now 0x7940012A (Rt=10). The previous value of 0x79400129 had Rt=9, so CallCountingStub recognition never matched.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* Recognise FixupPrecodeCode_Fixup shape on Arm64

  The AArch64 runtime emits two FixupPrecode shapes: the post-backpatch form (LDR x11, Target ; BR x11 ; LDR x12, MethodDesc) and the pre-backpatch FixupPrecodeCode_Fixup form (LDR x12, MethodDesc ; LDR x11, PrecodeFixupThunk ; BR x11). The resolver only matched the former, so a call site that has never been routed through the JIT'd entry point (e.g. a guarded call that never executes during workload) stayed unresolved and the callee was never enqueued for disassembly. Add the missing pattern and resolve via the MethodDesc slot loaded into x12.

  Fixes CanDisassembleInlinableBenchmarks on Arm64 .NET 8 Release, where __ForDisassemblyDiagnoser__'s `if (notEleven == 11) base.JustReturn()` guard means the call site never executes and the slot still points at the fixup-thunk precode.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Restore runtime-source pointer comments and drop unused helper

  Re-add the "See dotnet/runtime src/coreclr/vm/{arch}/thunktemplates" references that the previous refactor lost when the byte-template classes were deleted, and remove ImmutableConfig.HasDisassemblyDiagnoser now that ProcessExtensions no longer needs to gate CallCountingDelayMs on disassembly being requested.

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 7a104c4 commit c763222
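
The fixed opcode constants named in the commit message (0x7940012A for `ldrh w10, [x9]`, 0xD61F0140 / 0xD61F0160 for `br x10` / `br x11`, 0x7100054A for `subs w10, w10, #1`) can be sanity-checked by assembling them from their A64 bit fields. The sketch below is illustrative only: `A64Encodings` and its methods are hypothetical names that do not exist in BenchmarkDotNet, and only the zero-offset LDRH form and the unshifted SUBS immediate form are modelled.

```csharp
using System;

// Hypothetical helper (not part of BenchmarkDotNet): builds a few A64 encodings from
// their register fields so the constants in the commit message can be cross-checked.
public static class A64Encodings
{
    // LDRH Wt, [Xn] with unsigned offset 0: base 0x79400000, Rn in bits[9:5], Rt in bits[4:0].
    public static uint Ldrh(int rt, int rn)
        => 0x79400000u | ((uint)rn << 5) | (uint)rt;

    // BR Xn: base 0xD61F0000, Rn in bits[9:5].
    public static uint Br(int rn)
        => 0xD61F0000u | ((uint)rn << 5);

    // SUBS Wd, Wn, #imm12 (32-bit, no shift): base 0x71000000, imm12 in bits[21:10],
    // Rn in bits[9:5], Rd in bits[4:0].
    public static uint SubsImm32(int rd, int rn, int imm12)
        => 0x71000000u | ((uint)imm12 << 10) | ((uint)rn << 5) | (uint)rd;

    // Extracts the Rt field (bits[4:0]) of a load instruction.
    public static int RtField(uint instr) => (int)(instr & 0x1Fu);
}
```

With these helpers, `A64Encodings.Ldrh(10, 9)` evaluates to 0x7940012A, while `A64Encodings.Ldrh(9, 9)` gives 0x79400129, the constant that had Rt=9 and therefore could not match the stub's `ldrh w10, [x9]`.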

5 files changed

Lines changed: 415 additions & 165 deletions


src/BenchmarkDotNet/Configs/ImmutableConfig.cs

Lines changed: 0 additions & 2 deletions
@@ -113,8 +113,6 @@ internal ImmutableConfig(
113113

114114
internal bool HasPerfCollectProfiler() => diagnosers.OfType<PerfCollectProfiler>().Any();
115115

116-
internal bool HasDisassemblyDiagnoser() => diagnosers.OfType<DisassemblyDiagnoser>().Any();
117-
118116
public bool HasExtraIterationDiagnoser(BenchmarkCase benchmarkCase) => HasMemoryDiagnoser() || diagnosers.Any(d => d.GetRunMode(benchmarkCase) == RunMode.ExtraIteration);
119117

120118
public IDiagnoser? GetCompositeDiagnoser(BenchmarkCase benchmarkCase, Func<RunMode, bool> runModeComparer)

src/BenchmarkDotNet/Disassemblers/Arm64Disassembler.cs

Lines changed: 195 additions & 82 deletions
@@ -138,63 +138,8 @@ public void Feed(Arm64Instruction instruction)
138138

139139
internal class Arm64Disassembler : ClrMdDisassembler
140140
{
141-
internal sealed class RuntimeSpecificData
142-
{
143-
// See dotnet/runtime src/coreclr/vm/arm64/thunktemplates.asm/.S for the stub code
144-
// ldr x9, DATA_SLOT(CallCountingStub, RemainingCallCountCell)
145-
// ldrh w10, [x9]
146-
// subs w10, w10, #0x1
147-
internal readonly byte[] callCountingStubTemplate = [0x09, 0x00, 0x00, 0x58, 0x2a, 0x01, 0x40, 0x79, 0x4a, 0x05, 0x00, 0x71];
148-
// ldr x10, DATA_SLOT(StubPrecode, Target)
149-
// ldr x12, DATA_SLOT(StubPrecode, MethodDesc)
150-
// br x10
151-
internal readonly byte[] stubPrecodeTemplate = [0x4a, 0x00, 0x00, 0x58, 0xec, 0x00, 0x00, 0x58, 0x40, 0x01, 0x1f, 0xd6];
152-
// ldr x11, DATA_SLOT(FixupPrecode, Target)
153-
// br x11
154-
// ldr x12, DATA_SLOT(FixupPrecode, MethodDesc)
155-
internal readonly byte[] fixupPrecodeTemplate = [0x0b, 0x00, 0x00, 0x58, 0x60, 0x01, 0x1f, 0xd6, 0x0c, 0x00, 0x00, 0x58];
156-
internal readonly ulong stubPageSize;
157-
158-
internal RuntimeSpecificData(State state)
159-
{
160-
stubPageSize = (ulong)Environment.SystemPageSize;
161-
if (state.RuntimeVersion.Major >= 8)
162-
{
163-
// In .NET 8, the stub page size was changed to min 16kB
164-
stubPageSize = Math.Max(stubPageSize, 16384);
165-
}
166-
167-
// The stubs code depends on the current OS memory page size, so we need to update the templates to reflect that
168-
ulong pageSizeShifted = stubPageSize / 32;
169-
// Calculate the ldr x9, #offset instruction with offset based on the page size
170-
callCountingStubTemplate[1] = (byte)(pageSizeShifted & 0xff);
171-
callCountingStubTemplate[2] = (byte)(pageSizeShifted >> 8);
172-
173-
// Calculate the ldr x10, #offset instruction with offset based on the page size
174-
stubPrecodeTemplate[1] = (byte)(pageSizeShifted & 0xff);
175-
stubPrecodeTemplate[2] = (byte)(pageSizeShifted >> 8);
176-
// Calculate the ldr x12, #offset instruction with offset based on the page size
177-
stubPrecodeTemplate[5] = (byte)((pageSizeShifted - 1) & 0xff);
178-
stubPrecodeTemplate[6] = (byte)((pageSizeShifted - 1) >> 8);
179-
180-
// Calculate the ldr x11, #offset instruction with offset based on the page size
181-
fixupPrecodeTemplate[1] = (byte)(pageSizeShifted & 0xff);
182-
fixupPrecodeTemplate[2] = (byte)(pageSizeShifted >> 8);
183-
// Calculate the ldr x12, #offset instruction with offset based on the page size
184-
fixupPrecodeTemplate[9] = (byte)(pageSizeShifted & 0xff);
185-
fixupPrecodeTemplate[10] = (byte)(pageSizeShifted >> 8);
186-
}
187-
}
188-
189-
private static readonly Dictionary<Version, RuntimeSpecificData> runtimeSpecificData = [];
190-
191141
protected override IEnumerable<Asm> Decode(byte[] code, ulong startAddress, State state, int depth, ClrMethod currentMethod, DisassemblySyntax syntax)
192142
{
193-
if (!runtimeSpecificData.TryGetValue(state.RuntimeVersion, out var data))
194-
{
195-
runtimeSpecificData.Add(state.RuntimeVersion, data = new RuntimeSpecificData(state));
196-
}
197-
198143
const Arm64DisassembleMode disassembleMode = Arm64DisassembleMode.Arm;
199144
using (CapstoneArm64Disassembler disassembler = CapstoneDisassembler.CreateArm64Disassembler(disassembleMode))
200145
{
@@ -216,33 +161,8 @@ protected override IEnumerable<Asm> Decode(byte[] code, ulong startAddress, Stat
216161
{
217162
if (isIndirect && state.RuntimeVersion.Major >= 7)
218163
{
219-
// Check if the target is a known stub
220-
// The stubs are allocated in interleaved code / data pages in memory. The data part of the stub
221-
// is at an address one memory page higher than the code.
222-
byte[] buffer = new byte[12];
223-
224-
FlushCachedDataIfNeeded(state.Runtime.DataTarget.DataReader, address, buffer);
225-
226-
if (state.Runtime.DataTarget.DataReader.Read(address, buffer) == buffer.Length)
227-
{
228-
if (buffer.SequenceEqual(data.callCountingStubTemplate))
229-
{
230-
const ulong TargetMethodAddressSlotOffset = 8;
231-
address = state.Runtime.DataTarget.DataReader.ReadPointer(address + data.stubPageSize + TargetMethodAddressSlotOffset);
232-
}
233-
else if (buffer.SequenceEqual(data.stubPrecodeTemplate))
234-
{
235-
const ulong MethodDescSlotOffset = 0;
236-
address = state.Runtime.DataTarget.DataReader.ReadPointer(address + data.stubPageSize + MethodDescSlotOffset);
237-
isPrestubMD = true;
238-
}
239-
else if (buffer.SequenceEqual(data.fixupPrecodeTemplate))
240-
{
241-
const ulong MethodDescSlotOffset = 8;
242-
address = state.Runtime.DataTarget.DataReader.ReadPointer(address + data.stubPageSize + MethodDescSlotOffset);
243-
isPrestubMD = true;
244-
}
245-
}
164+
FlushCachedDataIfNeeded(state.Runtime.DataTarget.DataReader, address, new byte[1]);
165+
TryResolvePrecode(state.Runtime.DataTarget.DataReader, ref address, out isPrestubMD);
246166
}
247167
TryTranslateAddressToName(address, isPrestubMD, state, depth, currentMethod);
248168
}
@@ -262,6 +182,120 @@ protected override IEnumerable<Asm> Decode(byte[] code, ulong startAddress, Stat
262182
}
263183
}
264184

185+
// Counterpart of IntelDisassembler.TryResolvePrecode: recognise the AArch64 precode/stub
186+
// shapes by matching the fixed opcode bits and reading slot displacements out of the
187+
// encoded LDR-literal instructions. Resolves to the MethodDesc handle when one is present
188+
// (so GetMethodByHandle can recover the live ClrMethod even if the call site is still
189+
// pointing at PreStub), and to the TargetForMethod slot for call-counting stubs.
190+
//
191+
// See dotnet/runtime src/coreclr/vm/arm64/thunktemplates.asm/.S for the canonical stub
192+
// shapes. The register numbers (x10/x12 for StubPrecode, x11/x12 for FixupPrecode, x9 for
193+
// CallCountingStub) are part of the runtime's stub ABI and stay fixed across versions; the
194+
// data-section layout is also stable. What can change between versions is the offset
195+
// between the code page and its data section, so we extract the LDR-literal displacements
196+
// straight from the bytes instead of consulting a runtime-version-specific page-size table.
197+
private static bool TryResolvePrecode(IDataReader reader, ref ulong address, out bool isPrestubMD)
198+
{
199+
isPrestubMD = false;
200+
byte[] buffer = new byte[12];
201+
if (reader.Read(address, buffer) != 12)
202+
return false;
203+
204+
uint instr0 = ReadInstr(buffer, 0);
205+
uint instr1 = ReadInstr(buffer, 4);
206+
uint instr2 = ReadInstr(buffer, 8);
207+
208+
// StubPrecode: LDR x10, Target ; LDR x12, MethodDesc ; BR x10
209+
if (IsLdrLiteral64(instr0, out int rt0, out int _) && rt0 == 10
210+
&& IsLdrLiteral64(instr1, out int rt1, out int off1) && rt1 == 12
211+
&& instr2 == 0xD61F0140u)
212+
{
213+
ulong mdSlot = unchecked(address + 4 + (ulong)(long)off1);
214+
if (reader.ReadPointer(mdSlot, out ulong md) && IsValidAddress(md))
215+
{
216+
address = md;
217+
isPrestubMD = true;
218+
return true;
219+
}
220+
return false;
221+
}
222+
223+
// FixupPrecode: LDR x11, Target ; BR x11 ; LDR x12, MethodDesc
224+
if (IsLdrLiteral64(instr0, out int rtA, out int _) && rtA == 11
225+
&& instr1 == 0xD61F0160u
226+
&& IsLdrLiteral64(instr2, out int rtB, out int off2) && rtB == 12)
227+
{
228+
ulong mdSlot = unchecked(address + 8 + (ulong)(long)off2);
229+
if (reader.ReadPointer(mdSlot, out ulong md) && IsValidAddress(md))
230+
{
231+
address = md;
232+
isPrestubMD = true;
233+
return true;
234+
}
235+
return false;
236+
}
237+
238+
// FixupPrecodeCode_Fixup: LDR x12, MethodDesc ; LDR x11, PrecodeFixupThunk ; BR x11
239+
// This is the pre-backpatch shape — the call site has never been routed through the
240+
// method's JIT'd entry point yet, so x11 still loads the fixup thunk instead of Target.
241+
// Resolve via the MethodDesc slot loaded into x12 (instr0).
242+
if (IsLdrLiteral64(instr0, out int rtF0, out int offF0) && rtF0 == 12
243+
&& IsLdrLiteral64(instr1, out int rtF1, out int _) && rtF1 == 11
244+
&& instr2 == 0xD61F0160u)
245+
{
246+
ulong mdSlot = unchecked(address + (ulong)(long)offF0);
247+
if (reader.ReadPointer(mdSlot, out ulong md) && IsValidAddress(md))
248+
{
249+
address = md;
250+
isPrestubMD = true;
251+
return true;
252+
}
253+
return false;
254+
}
255+
256+
// CallCountingStub: LDR x9, RemainingCallCount ; LDRH w10, [x9] ; SUBS w10, w10, #1
257+
// No MethodDesc to recover here; read TargetForMethod, which lives 8 bytes after
258+
// RemainingCallCount in the data section.
259+
if (IsLdrLiteral64(instr0, out int rtCount, out int offCount) && rtCount == 9
260+
&& instr1 == 0x7940012Au
261+
&& instr2 == 0x7100054Au)
262+
{
263+
ulong countSlot = unchecked(address + (ulong)(long)offCount);
264+
if (reader.ReadPointer(countSlot + 8, out ulong target) && IsValidAddress(target))
265+
{
266+
address = target;
267+
return true;
268+
}
269+
return false;
270+
}
271+
272+
return false;
273+
}
274+
275+
private static uint ReadInstr(byte[] buffer, int offset)
276+
=> (uint)buffer[offset]
277+
| ((uint)buffer[offset + 1] << 8)
278+
| ((uint)buffer[offset + 2] << 16)
279+
| ((uint)buffer[offset + 3] << 24);
280+
281+
// LDR (literal), 64-bit form. Encoding: bits[31:24]=0x58, bits[23:5]=imm19 (signed,
282+
// word-scaled offset relative to the LDR's own PC), bits[4:0]=Xt. Returns the destination
283+
// register and the byte-scaled offset from the LDR instruction's address to the loaded slot.
284+
private static bool IsLdrLiteral64(uint instr, out int rt, out int offsetBytes)
285+
{
286+
rt = 0;
287+
offsetBytes = 0;
288+
if ((instr & 0xFF000000u) != 0x58000000u)
289+
return false;
290+
rt = (int)(instr & 0x1Fu);
291+
int imm19 = (int)((instr >> 5) & 0x7FFFFu);
292+
// Sign-extend 19-bit imm to 32-bit.
293+
if ((imm19 & 0x40000) != 0)
294+
imm19 |= unchecked((int)0xFFF80000u);
295+
offsetBytes = imm19 * 4;
296+
return true;
297+
}
298+
265299
private static bool TryGetReferencedAddress(Arm64Instruction instruction, RegisterValueAccumulator accumulator, uint pointerSize, out ulong referencedAddress, out bool isReferencedAddressIndirect)
266300
{
267301
if ((instruction.Id == Arm64InstructionId.ARM64_INS_BR || instruction.Id == Arm64InstructionId.ARM64_INS_BLR) && instruction.Details.Operands[0].Register.Id == accumulator.RegisterId && accumulator.HasValue)
@@ -296,5 +330,84 @@ private static DisassembleSyntax Map(DisassemblySyntax syntax)
296330
DisassemblySyntax.Intel => DisassembleSyntax.Intel,
297331
_ => DisassembleSyntax.Masm
298332
};
333+
334+
// Recognise the AArch64 jump trampoline shape the CLR JIT emits when a call's real target
335+
// is out of rel26 range (±128 MB), plus the precode/stub shapes the runtime emits as the
336+
// stable entry point for tiered methods (so a direct `BL imm26` landing on the precode
337+
// still resolves to the underlying method):
338+
// B imm26 (bits[31:26] = 0b000101) — target = address + sign_extended(imm26) * 4
339+
// CallCountingStub (opcode match) — reads TargetForMethod slot
340+
// StubPrecode (opcode match) — reads Target slot (the LDR that BR consumes)
341+
// FixupPrecode (opcode match) — reads Target slot (the LDR that BR consumes)
342+
// Slot displacements are extracted from the LDR-literal instructions themselves, so the
343+
// stub recognition doesn't depend on the runtime's code-to-data offset. Writes the resolved
344+
// target into `target` and returns true if one matches.
345+
protected override bool TryFollowJumpTrampoline(State state, ulong address, out ulong target)
346+
{
347+
target = 0;
348+
IDataReader dataReader = state.Runtime.DataTarget.DataReader;
349+
byte[] buffer = new byte[12];
350+
int read = dataReader.Read(address, buffer);
351+
if (read < 4)
352+
return false;
353+
354+
uint instr0 = ReadInstr(buffer, 0);
355+
356+
// B imm26 — bits[31:26] == 0b000101 (0x5)
357+
if ((instr0 >> 26) == 0x5)
358+
{
359+
uint imm26 = instr0 & 0x03FFFFFFu;
360+
// Sign-extend the 26-bit immediate to 32 bits, then multiply by 4 (instructions are 4-byte aligned).
361+
int offset = (int)(imm26 & 0x02000000u) != 0
362+
? unchecked((int)(imm26 | 0xFC000000u)) << 2
363+
: (int)imm26 << 2;
364+
target = unchecked(address + (ulong)(long)offset);
365+
return IsValidAddress(target);
366+
}
367+
368+
if (read < 12)
369+
return false;
370+
uint instr1 = ReadInstr(buffer, 4);
371+
uint instr2 = ReadInstr(buffer, 8);
372+
373+
// StubPrecode: LDR x10, Target ; LDR x12, MethodDesc ; BR x10. Follow the first LDR.
374+
if (IsLdrLiteral64(instr0, out int rt0, out int off0) && rt0 == 10
375+
&& IsLdrLiteral64(instr1, out int rt1, out int _) && rt1 == 12
376+
&& instr2 == 0xD61F0140u)
377+
{
378+
ulong targetSlot = unchecked(address + (ulong)(long)off0);
379+
if (dataReader.ReadPointer(targetSlot, out target) && IsValidAddress(target))
380+
return true;
381+
target = 0;
382+
return false;
383+
}
384+
385+
// FixupPrecode: LDR x11, Target ; BR x11 ; LDR x12, MethodDesc. Follow the first LDR.
386+
if (IsLdrLiteral64(instr0, out int rtA, out int offA) && rtA == 11
387+
&& instr1 == 0xD61F0160u
388+
&& IsLdrLiteral64(instr2, out int rtB, out int _) && rtB == 12)
389+
{
390+
ulong targetSlot = unchecked(address + (ulong)(long)offA);
391+
if (dataReader.ReadPointer(targetSlot, out target) && IsValidAddress(target))
392+
return true;
393+
target = 0;
394+
return false;
395+
}
396+
397+
// CallCountingStub: LDR x9, RemainingCallCount ; LDRH w10, [x9] ; SUBS w10, w10, #1.
398+
// TargetForMethod lives 8 bytes after RemainingCallCount in the data section.
399+
if (IsLdrLiteral64(instr0, out int rtCount, out int offCount) && rtCount == 9
400+
&& instr1 == 0x7940012Au
401+
&& instr2 == 0x7100054Au)
402+
{
403+
ulong countSlot = unchecked(address + (ulong)(long)offCount);
404+
if (dataReader.ReadPointer(countSlot + 8, out target) && IsValidAddress(target))
405+
return true;
406+
target = 0;
407+
return false;
408+
}
409+
410+
return false;
411+
}
299412
}
300413
}
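
As a worked illustration of the displacement extraction used in the diff above, the following standalone sketch decodes an LDR (literal) and a `B imm26` without any ClrMD dependency. The class and method names are hypothetical and are not the ones in `Arm64Disassembler`; sign extension is done with a shift pair rather than the mask-and-OR in the diff, which should be equivalent. The sample word 0x5801FFEC is what the removed `stubPrecodeTemplate` produced for `ldr x12` once patched for a 16 KiB stub page (`pageSizeShifted - 1 = 511`), so it serves as a cross-check against the old page-size table.

```csharp
using System;

// Illustrative sketch only; names are hypothetical and not taken from BenchmarkDotNet.
public static class A64Decode
{
    // LDR Xt, <label> (64-bit literal): bits[31:24] == 0x58, bits[23:5] = imm19, bits[4:0] = Rt.
    // On success, offsetBytes is the signed byte distance from the LDR itself to the slot.
    public static bool TryDecodeLdrLiteral64(uint instr, out int rt, out long offsetBytes)
    {
        rt = 0;
        offsetBytes = 0;
        if ((instr & 0xFF000000u) != 0x58000000u)
            return false;

        rt = (int)(instr & 0x1Fu);
        // Move imm19 to the top of a 32-bit int, then arithmetic-shift back to sign-extend it.
        int imm19 = unchecked((int)(instr << 8)) >> 13;
        offsetBytes = (long)imm19 * 4; // imm19 is word-scaled
        return true;
    }

    // B <label>: bits[31:26] == 0b000101, bits[25:0] = imm26 (signed, word-scaled).
    public static bool TryDecodeBranchTarget(uint instr, ulong address, out ulong target)
    {
        target = 0;
        if ((instr >> 26) != 0x5u)
            return false;

        int imm26 = unchecked((int)(instr << 6)) >> 6;
        target = unchecked(address + (ulong)((long)imm26 * 4));
        return true;
    }

    // Address of the data slot loaded by an LDR (literal) located at instrAddress.
    public static ulong SlotAddress(ulong instrAddress, long offsetBytes)
        => unchecked(instrAddress + (ulong)offsetBytes);
}
```

For a StubPrecode at address `A` on a 16 KiB stub page, decoding 0x5801FFEC yields Rt=12 and an offset of 16380 bytes; since that LDR sits at `A + 4`, `SlotAddress(A + 4, 16380)` lands on `A + 16384`, the same MethodDesc slot the old code reached via `address + stubPageSize + 0`. Reading the offset from the instruction is what removes the need to know the page size up front.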
