JIT: Secondary frame pointer for x64 unoptimized methods #128795
AndyAyersMS wants to merge 4 commits into
On x64, addressing modes only encode a signed 8-bit displacement cheaply (disp8, -128..+127); larger offsets require a 4-byte disp32. Methods with large stack frames therefore emit many oversized stack-local references.

This adds an optional secondary frame/stack pointer, reserved in a callee-saved register (RBX) and offset by a configurable number of bytes from the primary base. Stack locals that fall outside disp8 range of the primary base but inside disp8 range of the secondary pointer are rewritten to use the secondary pointer, shrinking those references from disp32 to disp8.

The feature is gated behind the default-off config JitSecondFramePtr (the byte offset; 0x100 = 256) and restricted to OptimizationDisabled() (MinOpts/Tier0) on x64 only. LSRA reserves the register; codegen sets it up in the prolog after unwindEndProlog(); the emitter rewrites eligible stack-variable (SV) references to [rbx+disp8].

EH/funclet support: EH methods always use RBP frames on x64. The secondary pointer is re-established (lea rbx,[rbp-offset]) only in filter funclet prologs, since the VM's CallEHFilterFunclet restores only RBP. Catch/finally/fault funclets need no re-establishment because CallEHFunclet restores all nonvolatiles (including RBX) from the establisher context.

SuperPMI asmdiffs across all x64 collections (JitSecondFramePtr=0x100): -7,353,695 bytes overall, 100% in MinOpts; FullOpts unchanged (0 diffs).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
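The redirect rule above reduces to two range checks. The following is a minimal sketch under stated assumptions, not code from the PR: `ChooseFrameRef`, `FitsDisp8`, and the signed `secondBias` convention (negative for `lea rbx,[rbp-offset]`, positive for an RSP-based `rsp+offset` setup) are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the redirect decision; names are illustrative only.
enum class Base { Primary, Secondary };

struct FrameRef
{
    Base    base;
    int32_t disp;
};

// True if the displacement fits the signed 8-bit form (-128..+127).
constexpr bool FitsDisp8(int32_t disp)
{
    return disp >= -128 && disp <= 127;
}

// primaryDisp: offset of the local from the primary base (RBP or RSP).
// secondBias:  signed distance from the primary base to the secondary pointer,
//              e.g. -0x100 when the prolog does "lea rbx, [rbp-0x100]".
FrameRef ChooseFrameRef(int32_t primaryDisp, int32_t secondBias)
{
    assert(secondBias != 0); // a zero bias would just duplicate the primary base

    if (FitsDisp8(primaryDisp))
    {
        return {Base::Primary, primaryDisp}; // already the short form
    }

    int32_t secondaryDisp = primaryDisp - secondBias;
    if (FitsDisp8(secondaryDisp))
    {
        return {Base::Secondary, secondaryDisp}; // disp32 shrinks to disp8
    }

    return {Base::Primary, primaryDisp}; // outside both bands: keep disp32
}
```

With a bias of -0x100, a local at rbp-0x88 (-136, just outside disp8) maps to [rbx+0x78], matching the listing example in the second commit. More generally, the secondary band then covers primary offsets -0x180..-0x81, directly below the primary disp8 band.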
When an access is redirected through the secondary frame pointer
(REG_OPT_RSVD2/RBX), the emitted bytes use [rbx+disp8], but emitDispFrameRef
previously still printed the canonical [rbp/rsp+disp], so the listing did not
match the encoded instruction. Print the actual [rbx+disp8] operand and append
the canonical reference as a parenthesized suffix, e.g.

    mov qword ptr [rbx+0x78] (rbp-0x88), rax

Plumb the instruction through emitDispFrameRef so it can reuse
emitIsSecondFramePtrCandidate (the same decision emitOutputSV makes), keeping
display and emitted bytes in lockstep. Display-only change; no codegen impact.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
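Keeping the listing and the encoding in lockstep amounts to routing both through one predicate, as the commit above does with emitIsSecondFramePtrCandidate. A minimal sketch, assuming an RBP-based frame and a simplified instruction classification: `Enc`, `IsSecondFramePtrCandidate`, `DispBytes`, and `DisplayBase` are hypothetical stand-ins, and the excluded encodings (EVEX, SSE38/3A, crc32) are taken from the PR overview.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical encoding classes; the PR excludes EVEX, SSE38/3A and crc32 forms.
enum class Enc { Legacy, Evex, Sse38, Sse3A, Crc32 };

constexpr bool FitsDisp8(int32_t d) { return d >= -128 && d <= 127; }

// Single source of truth, consulted by both the size estimate and the display.
bool IsSecondFramePtrCandidate(Enc enc, int32_t primaryDisp, int32_t bias)
{
    assert(bias != 0);
    if (enc != Enc::Legacy)
    {
        return false; // excluded encodings keep the primary base
    }
    return !FitsDisp8(primaryDisp) && FitsDisp8(primaryDisp - bias);
}

// Displacement bytes the emitter produces (RBP base assumed, so no SIB byte;
// [rbp+0] still needs a disp8, so the minimum is 1).
int DispBytes(Enc enc, int32_t primaryDisp, int32_t bias)
{
    if (IsSecondFramePtrCandidate(enc, primaryDisp, bias))
    {
        return 1;
    }
    return FitsDisp8(primaryDisp) ? 1 : 4;
}

// Base register the listing prints for the same access.
std::string DisplayBase(Enc enc, int32_t primaryDisp, int32_t bias)
{
    return IsSecondFramePtrCandidate(enc, primaryDisp, bias) ? "rbx" : "rbp";
}
```

Because neither consumer re-derives the decision, a change to the eligibility rules cannot make the printed base disagree with the encoded one.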
When a stack access is redirected through the secondary frame pointer (REG_OPT_RSVD2), the disassembly now prints the real [rbx+disp8] operand and emits the canonical frame reference (e.g. rbp-0x88) as an end-of-line ';' comment, rather than an inline parenthetical that could be misread as sitting between operands.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
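The final listing format can be sketched as a small string builder. This is a hypothetical helper, not the JIT's emitDispFrameRef: `FormatFrameRef`, `FormatRedirectedLine`, and the exact spacing and operand order are assumptions chosen to reproduce the `[rbx+0x78]` / `rbp-0x88` example from the earlier commit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Render "base+0xNN" or "base-0xNN" with a lowercase hex magnitude.
std::string FormatFrameRef(const char* base, int32_t disp)
{
    char    buf[32];
    int64_t mag = disp < 0 ? -static_cast<int64_t>(disp) : disp; // safe for INT32_MIN
    int     n   = std::snprintf(buf, sizeof(buf), "%s%c0x%llx", base, disp < 0 ? '-' : '+',
                                static_cast<unsigned long long>(mag));
    assert(n > 0 && n < static_cast<int>(sizeof(buf)));
    return buf;
}

// Hypothetical listing line for a store: the encoded [rbx+disp8] operand stays
// in place, and the canonical frame reference moves to a trailing ';' comment.
std::string FormatRedirectedLine(const std::string& mnemonic, const std::string& ptrSize,
                                 int32_t secondaryDisp, const std::string& otherOperand,
                                 const char* primaryBase, int32_t primaryDisp)
{
    return mnemonic + " " + ptrSize + " [" + FormatFrameRef("rbx", secondaryDisp) + "], " +
           otherOperand + " ; " + FormatFrameRef(primaryBase, primaryDisp);
}
```

For example, `FormatRedirectedLine("mov", "qword ptr", 0x78, "rax", "rbp", -0x88)` yields `mov qword ptr [rbx+0x78], rax ; rbp-0x88`, keeping the operand list itself unambiguous.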
The LSRA-time reservation keyed only off total frame size, so ~27% of methods reserved RBX (push/lea/pop plus unwind data) but never used it: no local actually landed in the secondary disp8 band. The band can't be tested at LSRA, since REGALLOC-layout offsets have no base-register flag and are inflated by an over-estimated callee-save area.

Reserve RBX at LSRA only as a cheap candidate (removed from the allocatable set), then make the precise band-occupancy decision in genFinalizeFrame once FINAL offsets are known. If nothing lands in the band, cancel the reservation so no push/lea/unwind is emitted; otherwise mark the register modified and redo the frame layout to account for the push.

aspnet2 asmdiffs: unused-RBX setups drop from 13 to 0; the size win improves from -3864 to -3982 bytes. No replay failures.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
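The finalize-time band-occupancy test can be modeled as a scan over final frame offsets. A minimal sketch, assuming each local is represented by a single primary-base displacement (the real check may also need to consider accesses at field offsets within a local): `SecondFramePtrIsProfitable` is a hypothetical simplification of the PR's genSecondFramePtrIsProfitable, and the re-layout step after committing to RBX is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr bool FitsDisp8(int32_t d) { return d >= -128 && d <= 127; }

// Hypothetical finalize-time check: does at least one local sit in the secondary
// band, i.e. out of disp8 range from the primary base but in range from RBX?
// If not, the candidate reservation is cancelled and no push/lea/unwind is emitted.
bool SecondFramePtrIsProfitable(const std::vector<int32_t>& finalDisps, int32_t bias)
{
    assert(bias != 0);
    return std::any_of(finalDisps.begin(), finalDisps.end(), [bias](int32_t d) {
        return !FitsDisp8(d) && FitsDisp8(d - bias);
    });
}
```

The check is only meaningful on FINAL offsets, which is why it runs in genFinalizeFrame rather than at LSRA time.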
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Pull request overview
Adds an AMD64-only "secondary frame pointer" optimization (gated by JitConfig.JitSecondFramePtr, default 0x100) for unoptimized methods with large frames. RBX is reserved during LSRA, conditionally established (RBP - offset or RSP + offset) in the prolog after a profitability check on FINAL frame offsets, restored in filter funclets, and used by the xarch emitter to redirect far stack accesses from [rbp/rsp + disp32] to [rbx + disp8] (saving 3 bytes per redirected access). Disassembly is updated to show the redirected operand plus a trailing canonical-frame-reference comment, and emitDispFrameRef gains an instruction parameter on all targets.
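The 3-byte saving comes from the ModRM displacement form: mod = 01 (0x40 in the ModRM byte) selects a 1-byte displacement, while mod = 10 selects a 4-byte one. The sketch below hand-encodes `mov rax, qword ptr [base+disp]` for the two bases; `EncodeMovRaxLoad` and the constants are illustrative, and RSP-based frames (which need a SIB byte) are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// ModRM = mod (2 bits) | reg (3 bits) | rm (3 bits).
constexpr uint8_t ModRM(uint8_t mod, uint8_t reg, uint8_t rm)
{
    return static_cast<uint8_t>((mod << 6) | (reg << 3) | rm);
}

constexpr uint8_t kRax = 0, kRbx = 3, kRbp = 5; // low 3 bits of register numbers

// Encode "mov rax, qword ptr [base+disp]" (REX.W 8B /r) for an RBX or RBP base.
// Always uses at least a disp8 (required for RBP); RSP/R12 bases (SIB byte) and
// R8-R15 bases (REX.B) are not modeled.
std::vector<uint8_t> EncodeMovRaxLoad(uint8_t base, int32_t disp)
{
    assert(base == kRbx || base == kRbp);
    std::vector<uint8_t> bytes = {0x48, 0x8B}; // REX.W, MOV r64, r/m64
    bool disp8 = disp >= -128 && disp <= 127;
    bytes.push_back(ModRM(disp8 ? 0b01 : 0b10, kRax, base));
    if (disp8)
    {
        bytes.push_back(static_cast<uint8_t>(static_cast<int8_t>(disp)));
    }
    else
    {
        uint32_t u = static_cast<uint32_t>(disp);
        for (int i = 0; i < 4; i++)
        {
            bytes.push_back(static_cast<uint8_t>(u >> (8 * i))); // little-endian
        }
    }
    return bytes;
}
```

For the running example, `[rbp-0x88]` encodes as 48 8B 85 78 FF FF FF (7 bytes), while `[rbx+0x78]` encodes as 48 8B 43 78 (4 bytes).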
Changes:
- Reserve RBX (`REG_OPT_RSVD2`) as a candidate secondary FP during LSRA when MinOpts, large frame, EH-compatible, non-OSR; finalize/cancel in `genFinalizeFrame` via new `genSecondFramePtrIsProfitable`.
- Emit a prolog `lea` to establish RBX from RBP/RSP, re-establish it in AMD64 filter funclets, and redirect eligible stack-var encodings (non-EVEX/SSE38/3A/crc32) in `emitInsSizeSVCalcDisp`/`emitOutputSV`.
- Plumb `instruction ins` through `emitDispFrameRef` on all targets, and render `[rbx+disp8] ; rbp-0xNN` in xarch disassembly.
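The LSRA-time condition is intentionally cheap and conservative. A hedged sketch of that gate follows: `MethodInfo`, its fields, and `kLargeFrameThreshold` are hypothetical, the EH condition is reduced to an opaque flag, and treating a JitSecondFramePtr value of 0 as "off" is an assumption based on the config being described as default-off in the first commit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of what LSRA knows when it decides to set RBX aside.
struct MethodInfo
{
    bool     optimizationDisabled; // MinOpts / Tier0
    bool     hasFixedFrameBase;    // stands in for the PR's "fixed base" condition
    bool     ehCompatible;         // stands in for the PR's EH checks (not reproduced)
    bool     isOsr;                // OSR methods are excluded
    uint32_t estimatedFrameSize;   // REGALLOC-time estimate, may be inflated
};

// Assumed threshold: a frame within the disp8 reach cannot benefit.
constexpr uint32_t kLargeFrameThreshold = 128;

// Cheap candidate test; the precise band check happens later on FINAL offsets.
bool ReserveSecondFramePtrCandidate(const MethodInfo& m, uint32_t secondOffset)
{
    if (secondOffset == 0) // assumed: JitSecondFramePtr=0 disables the feature
    {
        return false;
    }
    return m.optimizationDisabled && m.hasFixedFrameBase && m.ehCompatible && !m.isOsr &&
           m.estimatedFrameSize > kLargeFrameThreshold;
}
```

Because the frame-size estimate is inflated at this stage, a `true` here only means "worth checking later", which is why the finalize step can still cancel the reservation.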
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/coreclr/jit/targetamd64.h | Define REG_OPT_RSVD2 / RBM_OPT_RSVD2 as RBX. |
| src/coreclr/jit/jitconfigvalues.h | Add JitSecondFramePtr config (default 256). |
| src/coreclr/jit/codegeninterface.h | Add genSecondFramePtrReg/Offset/FPbased state (AMD64). |
| src/coreclr/jit/codegen.h | Declare genSecondFramePtrIsProfitable; fix #endif comment for TARGET_XARCH. |
| src/coreclr/jit/lsra.cpp | Reserve RBX candidate when conditions hold (MinOpts, fixed base, large frame, EH/OSR ok). |
| src/coreclr/jit/codegencommon.cpp | Profitability check after FINAL layout; if profitable, mark RBX modified, redo layout, emit prolog lea. |
| src/coreclr/jit/codegenxarch.cpp | Re-establish RBX in FILTER funclet prologs. |
| src/coreclr/jit/emitxarch.h | Declare emitIsSecondFramePtrCandidate and display state for trailing comment. |
| src/coreclr/jit/emitxarch.cpp | Redirect candidate check in size calc/output; modrm `0x40 \| …` |
| src/coreclr/jit/emit.h | Add instruction ins = INS_none parameter to emitDispFrameRef. |
| src/coreclr/jit/emitarm.cpp / emitarm64.cpp / emitloongarch64.cpp / emitriscv64.cpp | Update emitDispFrameRef signature on other targets (parameter unused). |
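The funclet rule from the first commit (and the codegenxarch.cpp row above) can be summarized as a per-kind prolog decision. A minimal sketch, assuming the RBP-based frames that x64 EH methods always use: `FuncletKind` and `SecondFramePtrFuncletProlog` are hypothetical, and the returned string only mimics the shape of the `lea rbx,[rbp-offset]` instruction.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

enum class FuncletKind { Catch, Finally, Fault, Filter };

// Hypothetical helper: extra prolog text a funclet needs for the secondary pointer.
// Only filters re-establish RBX, because CallEHFilterFunclet restores only RBP;
// CallEHFunclet restores all nonvolatiles (RBX included) for the other kinds.
std::string SecondFramePtrFuncletProlog(FuncletKind kind, uint32_t offset)
{
    if (kind != FuncletKind::Filter || offset == 0)
    {
        return ""; // nothing to emit
    }
    char buf[48];
    int  n = std::snprintf(buf, sizeof(buf), "lea rbx, [rbp-0x%x]", static_cast<unsigned>(offset));
    assert(n > 0 && n < static_cast<int>(sizeof(buf)));
    return buf;
}
```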
No description provided.