Description
In #60228 I made the JIT generate lea instructions for any handle that the runtime returns an IMAGE_REL_BASED_REL32 hint for. We saw some regressions due to this, e.g. #60626.
After investigating this further I have determined that the issue is similar to one we saw in #49549. We create a constant pointing to kernel32!GetStdHandle, and we try to use rip-relative addressing for this constant. The constant does not end up being reachable, which hits a code path in the runtime that turns off rip-relative addressing for the remaining duration of the process.
The issue is not completely the same as #49549, however. For calls we always assume they are reachable with rip-relative addressing; for constant handles, we are using the getRelocTypeHint function to figure out if this is the case.
For this particular case getRelocTypeHint is returning IMAGE_REL_BASED_REL32 for the address that ends up not being reachable. The reason is that the runtime allocates a 4GB range around coreclr.dll that is the preferred range: it returns IMAGE_REL_BASED_REL32 for any address within this range. However, if jitted code is placed in the beginning of the range, and a handle is at the end, then this is not reachable with rip-relative addressing, and we hit the path above that turns off rip-relative addressing.
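To make the mismatch concrete, here is a minimal C++ sketch of the two checks involved. It is not the runtime's actual code: the function names are hypothetical, and the preferred range is modelled simply as a 4 GB window starting at a given address.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical model of the runtime-side check: is `target` inside a 4 GB
// preferred window that begins at `windowStart`? (Assumed layout, for
// illustration only.)
bool InPreferredWindow(uint64_t windowStart, uint64_t target)
{
    // Unsigned subtraction wraps for targets below the window, so those
    // yield a huge value and are rejected by the same comparison.
    return target - windowStart < (1ull << 32);
}

// What rip-relative addressing actually requires: the distance from the
// referencing code to the target must fit in a signed 32-bit displacement.
bool FitsInRel32(uint64_t codeAddr, uint64_t target)
{
    int64_t delta = static_cast<int64_t>(target - codeAddr);
    return delta >= std::numeric_limits<int32_t>::min() &&
           delta <= std::numeric_limits<int32_t>::max();
}
```

With code near the start of the window and a handle near its end, the first check passes while the second fails, which is the situation described above.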
After speaking to folks, it seems the JIT side and the runtime side make conflicting assumptions about what getRelocTypeHint can be used for. On the runtime side, it is assumed that this function is called only for addresses in the current loader heap and for addresses in coreclr.dll. The runtime tries to allocate memory from loader heaps in a circular fashion, which means that under the above assumption we only end up turning off rip-relative addressing once we have pretty much run out of memory in the preferred range.
However, the JIT uses the check more generally, taking it to mean that the address will be within ±2 GB of the jitted code. My change exacerbated this, but it was already the case before the change. In particular, any indir is checked for rip-relative addressing using this function. As a result, it is quite simple to write a program that reliably turns off rip-relative addressing without actually allocating much of the preferred range, as the reproduction steps show.
Reproduction Steps
The following program reliably turns off rip-relative addressing for the remaining duration of the process. The issue predates #60228: even with that PR reverted, this example hits the case. Note that the address could come from anywhere, e.g. it could be a pointer into static memory from a native image, which would likely be in the preferred range as well. However, the VirtualAlloc with an explicit address makes the repro simple and reliable.
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Threading;

public unsafe class Program
{
    public static void Main()
    {
        // Call Foo often enough, with pauses, that it gets rejitted at tier 1.
        for (int i = 0; i < 100; i++)
        {
            Foo();
            if (i >= 35)
                Thread.Sleep(30);
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void Foo()
    {
        Volatile.Read(ref NearbyMemory[15]);
    }

    private static readonly byte* NearbyMemory = AllocUnreachableMemoryInPreferredRange();

    private static byte* AllocUnreachableMemoryInPreferredRange()
    {
        delegate*<byte*> codeAddr = &AllocUnreachableMemoryInPreferredRange;
        // Start more than 2 GB past this method's code, i.e. beyond rel32 reach.
        byte* start = (byte*)codeAddr + 0x90000000;
        for (byte* addr = start; ; addr += 0x1000)
        {
            // 0x1000 | 0x2000 = MEM_COMMIT | MEM_RESERVE, 0x04 = PAGE_READWRITE
            IntPtr alloced = VirtualAlloc((IntPtr)addr, 0x1000, 0x1000 | 0x2000, 0x04);
            if (alloced != IntPtr.Zero)
                return (byte*)alloced;
        }
    }

    [DllImport("kernel32.dll", SetLastError = true, ExactSpelling = true)]
    static extern IntPtr VirtualAlloc(IntPtr lpAddress, nint dwSize, uint flAllocationType, uint flProtect);
}
Expected behavior
We should not be turning off rip-relative addressing globally unless we are actually running out of memory in the preferred range.
Actual behavior
We do turn it off. We see three JITs of Foo in the above:
; Assembly listing for method Program:Foo()
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-0 compilation
; MinOpts code
; rbp based frame
; partially interruptible
; Final local variable assignments
;
; V00 OutArgs [V00 ] ( 1, 1 ) lclBlk (32) [rsp+00H] do-not-enreg[] "OutgoingArgSpace"
;
; Lcl frame size = 32
G_M24659_IG01: ;; offset=0000H
55 push rbp
4883EC20 sub rsp, 32
488D6C2420 lea rbp, [rsp+20H]
;; bbWeight=1 PerfScore 1.75
G_M24659_IG02: ;; offset=000AH
48B908B28E2AFD7F0000 mov rcx, 0x7FFD2A8EB208
BA03000000 mov edx, 3
E8320BA35F call CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
B90F000000 mov ecx, 15
4863C9 movsxd rcx, ecx
48030D53F32400 add rcx, qword ptr [reloc classVar[0x2a8ecf18]]
E8FEBBFEFF call System.Threading.Volatile:Read(byref):ubyte
90 nop
;; bbWeight=1 PerfScore 5.25
G_M24659_IG03: ;; offset=0033H
4883C420 add rsp, 32
5D pop rbp
C3 ret
;; bbWeight=1 PerfScore 1.75
; Total bytes of code 57, prolog size 10, PerfScore 14.45, instruction count 14, allocated bytes for code 57 (MethodHash=999a9fac) for method Program:Foo()
; ============================================================
; Assembly listing for method Program:Foo()
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-1 compilation
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
;# V00 OutArgs [V00 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
;* V01 tmp1 [V01 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
;
; Lcl frame size = 0
G_M24659_IG01: ;; offset=0000H
;; bbWeight=1 PerfScore 0.00
G_M24659_IG02: ;; offset=0000H
8B0500000000 mov eax, dword ptr [(reloc 0x7ffdba69000f)]
;; bbWeight=1 PerfScore 2.00
G_M24659_IG03: ;; offset=0006H
C3 ret
;; bbWeight=1 PerfScore 1.00
; Total bytes of code 7, prolog size 0, PerfScore 3.70, instruction count 2, allocated bytes for code 7 (MethodHash=999a9fac) for method Program:Foo()
; ============================================================
Hit jump stub overflow
; Assembly listing for method Program:Foo()
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-1 compilation
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
;# V00 OutArgs [V00 ] ( 1, 1 ) lclBlk ( 0) [rsp+00H] "OutgoingArgSpace"
;* V01 tmp1 [V01 ] ( 0, 0 ) byref -> zero-ref "Inlining Arg"
;
; Lcl frame size = 0
G_M24659_IG01: ;; offset=0000H
;; bbWeight=1 PerfScore 0.00
G_M24659_IG02: ;; offset=0000H
48B80F0069BAFD7F0000 mov rax, 0x7FFDBA69000F
3900 cmp dword ptr [rax], eax
;; bbWeight=1 PerfScore 3.25
G_M24659_IG03: ;; offset=000CH
C3 ret
;; bbWeight=1 PerfScore 1.00
; Total bytes of code 13, prolog size 0, PerfScore 5.55, instruction count 3, allocated bytes for code 13 (MethodHash=999a9fac) for method Program:Foo()
; ============================================================
"Hit jump stub overflow" is a simple printf I added to the runtime code that turns off rip-relative addressing.
Other information
IMO, it would be ideal if the runtime could guess roughly where the jitted code will be located before the JIT has given it a size back and it has done the allocation. Then the reloc hint address range could be based on that.
Alternatively, as suggested by @jkotas, we could eagerly reserve memory before jitting and back out of the unused memory afterwards.
Another short-term alternative might be to halve the preferred address range so that we get ±1 GB around coreclr.dll; this should mean that the entire region is reachable regardless of where the code ends up.
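The arithmetic behind the halved range can be sketched in C++ as follows. This is only an illustration under stated assumptions: the names are hypothetical, the window is taken as exactly [base - 1 GB, base + 1 GB), and the check ignores instruction length and access size at the very edges.

```cpp
#include <cstdint>
#include <limits>

constexpr uint64_t kOneGB = 1ull << 30;

// Half-open address window [lo, hi).
struct AddressWindow
{
    uint64_t lo;
    uint64_t hi;
};

// Hypothetical halved preferred range: +-1 GB around an assumed coreclr.dll base.
AddressWindow HalvedPreferredRange(uint64_t coreclrBase)
{
    return AddressWindow{coreclrBase - kOneGB, coreclrBase + kOneGB};
}

// True when the distance from `from` to `to` fits in a signed 32-bit displacement.
bool Rel32Reachable(uint64_t from, uint64_t to)
{
    int64_t delta = static_cast<int64_t>(to - from);
    return delta >= std::numeric_limits<int32_t>::min() &&
           delta <= std::numeric_limits<int32_t>::max();
}
```

The two extremes of a 2 GB window are at most 2^31 - 1 bytes apart, so code at one end can still reach data at the other, whereas the extremes of a 4 GB window cannot reach each other.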
In any case, as long as the final scheme works in the vast majority of practical cases that should be ok -- for pathological cases we can always fall back to turning off rip-relative addressing.
Fixing this on the JIT side is also a possibility. In that case the JIT should change to ensure that it only uses the hint function for very particular data addresses (e.g. static field addrs, managed method entry points). In this case I'm not really sure what the point is of checking the preferred range at all on the runtime side, compared to just checking if we are allowing rip-relative addressing.
cc @dotnet/jit-contrib, @jkotas, @janvorli