Runtime<->JIT reloc hinting mechanism needs work #60712

### Description

In #60228 I made the JIT generate `lea` instructions for any handle for which the runtime returns an `IMAGE_REL_BASED_REL32` hint. This caused some regressions, e.g. #60626.

After investigating this further, I have determined that the issue is similar to one we saw in #49549. We create a constant pointing to `kernel32!GetStdHandle` and try to use rip-relative addressing for it. The constant turns out not to be reachable, which hits [this code path](https://github.com/dotnet/runtime/blob/294a284183093ec2b8ef87ffc5704719b90fa53f/src/coreclr/vm/jitinterface.cpp#L11250) in the runtime that turns off rip-relative addressing for the remaining lifetime of the process.

The issue is not quite the same as #49549, however. For calls we always assume the target is reachable with rip-relative addressing; for constant handles, we use the `getRelocTypeHint` function to decide whether this is the case.

For this particular case `getRelocTypeHint` returns `IMAGE_REL_BASED_REL32` for the address that ends up not being reachable. The reason is that the runtime [allocates a 4GB range](https://github.com/dotnet/runtime/blob/294a284183093ec2b8ef87ffc5704719b90fa53f/src/coreclr/utilcode/executableallocator.cpp#L66-L80) around coreclr.dll as the preferred range, and it returns `IMAGE_REL_BASED_REL32` for any address within this range. However, a rel32 displacement only reaches ±2GB, so if jitted code is placed at the beginning of the range and a handle is at the end, the handle is not reachable with rip-relative addressing, and we hit the path above that turns off rip-relative addressing.
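A rel32 operand can only reach targets within a signed 32-bit displacement of the next instruction, which is why membership in a 4GB window is not enough. The following is a minimal C# sketch of that check; `Rel32.IsReachable` is a hypothetical helper for illustration, not code from coreclr or the JIT.

```csharp
// Hypothetical helper (not coreclr or JIT code) modelling x64 rip-relative reach.
public static class Rel32
{
    // A rip-relative operand is a signed 32-bit displacement added to the address of the
    // next instruction, so the target is reachable only if the difference fits in an int.
    public static bool IsReachable(ulong nextInstrAddr, ulong target)
    {
        long delta = unchecked((long)(target - nextInstrAddr));
        return delta >= int.MinValue && delta <= int.MaxValue;
    }
}
```

For example, taking a hypothetical 4GB window `[base - 2GB, base + 2GB)` (assuming it is centered on coreclr.dll), `Rel32.IsReachable(base - 2GB + 0x100, base + 2GB - 0x100)` returns `false`: both addresses are in the preferred range, yet they are almost 4GB apart.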

After speaking to folks, it seems the JIT side and the runtime side make conflicting assumptions about what `getRelocTypeHint` can be used for. The runtime side assumes that this function is called only for addresses in the current loader heap and for addresses in coreclr.dll. The runtime tries to allocate memory from loader heaps in a circular fashion, which means that under this assumption we only end up turning off rip-relative addressing once we have pretty much run out of memory in the preferred range.

The JIT, however, uses the check more generally: it takes an `IMAGE_REL_BASED_REL32` result to mean that the address will be within ±2GB of the jitted code. My change exacerbated this, but the problem existed even before it. In particular, every indir is checked with this function to decide whether to use rip-relative addressing. Because of this it is quite simple to write a program that reliably turns off rip-relative addressing without actually allocating much of the preferred range, as shown in the next section.
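The mismatch can be modelled in a few lines. This is a hypothetical sketch, not the actual coreclr/JIT code: `RelocModel`, its centered ±2GB window and its members are assumptions made for illustration. The hint is answered from the target address alone, the JIT trusts it, and the reloc step, which is the first point where the code address is known, flips a process-wide flag when the displacement does not fit.

```csharp
// Hypothetical model of the runtime/JIT interaction; names and layout are illustrative.
public sealed class RelocModel
{
    private const ulong GB = 1UL << 30;
    private readonly ulong _rangeStart; // assumed: preferred range is centered on coreclr.dll
    private readonly ulong _rangeEnd;

    public bool RipRelativeAllowed { get; private set; } = true;

    public RelocModel(ulong coreclrBase)
    {
        _rangeStart = coreclrBase - 2 * GB;
        _rangeEnd = coreclrBase + 2 * GB;
    }

    // Runtime side: answers "rel32" for any address in the preferred range,
    // without knowing where the code that will use it ends up.
    public bool HintRel32(ulong target) =>
        RipRelativeAllowed && target >= _rangeStart && target < _rangeEnd;

    // Reloc time: the code address is now known. If the displacement does not fit,
    // rip-relative addressing is turned off for the rest of the process.
    public bool TryApplyRel32(ulong nextInstrAddr, ulong target)
    {
        long delta = unchecked((long)(target - nextInstrAddr));
        if (delta >= int.MinValue && delta <= int.MaxValue)
            return true;

        RipRelativeAllowed = false; // sticky and process-wide in this model
        return false;
    }
}
```

With code at `base - 1GB` and a target `0x90000000` bytes above it (the offset used by the repro), `HintRel32` answers `true`, `TryApplyRel32` fails, and every later hint returns `false`, even for addresses right next to coreclr.dll.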

### Reproduction Steps

The following program reliably turns off rip-relative addressing for the remaining lifetime of the process. The issue predates #60228: even with that PR reverted, this example hits the same code path. Note that the address could come from anywhere; for example, it could be a pointer into static memory of a native image, which would likely also be in the preferred range. The `VirtualAlloc` with an explicit address just makes the repro simple and reliable.
```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Threading;

public unsafe class Program
{
    public static void Main()
    {
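        // Call Foo enough times for tiered compilation to promote it to tier 1;
        // the sleeps give the background tier-1 compilation time to happen.
        for (int i = 0; i < 100; i++)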
        {
            Foo();
            if (i >= 35)
                Thread.Sleep(30);
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void Foo()
    {
        Volatile.Read(ref NearbyMemory[15]);
    }

    private static readonly byte* NearbyMemory = AllocUnreachableMemoryInPreferredRange();

    private static byte* AllocUnreachableMemoryInPreferredRange()
    {
        delegate*<byte*> codeAddr = &AllocUnreachableMemoryInPreferredRange;
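        // 0x90000000 (~2.25GB) is beyond the +2GB reach of a rel32 displacement.
        byte* start = (byte*)codeAddr + 0x90000000;

        // Probe page by page until an allocation at the requested address succeeds.
        for (byte* addr = start; ; addr += 0x1000)
        {
            // 0x1000 | 0x2000 = MEM_COMMIT | MEM_RESERVE, 0x04 = PAGE_READWRITE
            IntPtr alloced = VirtualAlloc((IntPtr)addr, 0x1000, 0x1000 | 0x2000, 0x04);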
            if (alloced != IntPtr.Zero)
                return (byte*)alloced;
        }
    }

    [DllImport("kernel32.dll", SetLastError = true, ExactSpelling = true)]
    static extern IntPtr VirtualAlloc(IntPtr lpAddress, nint dwSize, uint flAllocationType, uint flProtect);
}
```


### Expected behavior

We should not be turning off rip-relative addressing globally unless we are actually running out of memory in the preferred range.

### Actual behavior

We do turn it off. The program above produces three JIT compilations of `Foo`:
```asm
; Assembly listing for method Program:Foo()
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-0 compilation
; MinOpts code
; rbp based frame
; partially interruptible
; Final local variable assignments
;
;  V00 OutArgs      [V00    ] (  1,  1   )  lclBlk (32) [rsp+00H]   do-not-enreg[] "OutgoingArgSpace"
;
; Lcl frame size = 32

G_M24659_IG01:              ;; offset=0000H
       55                   push     rbp
       4883EC20             sub      rsp, 32
       488D6C2420           lea      rbp, [rsp+20H]
                                                ;; bbWeight=1    PerfScore 1.75
G_M24659_IG02:              ;; offset=000AH
       48B908B28E2AFD7F0000 mov      rcx, 0x7FFD2A8EB208
       BA03000000           mov      edx, 3
       E8320BA35F           call     CORINFO_HELP_GETSHARED_NONGCSTATIC_BASE
       B90F000000           mov      ecx, 15
       4863C9               movsxd   rcx, ecx
       48030D53F32400       add      rcx, qword ptr [reloc classVar[0x2a8ecf18]]
       E8FEBBFEFF           call     System.Threading.Volatile:Read(byref):ubyte
       90                   nop
                                                ;; bbWeight=1    PerfScore 5.25
G_M24659_IG03:              ;; offset=0033H
       4883C420             add      rsp, 32
       5D                   pop      rbp
       C3                   ret
                                                ;; bbWeight=1    PerfScore 1.75

; Total bytes of code 57, prolog size 10, PerfScore 14.45, instruction count 14, allocated bytes for code 57 (MethodHash=999a9fac) for method Program:Foo()
; ============================================================

; Assembly listing for method Program:Foo()
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-1 compilation
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
;# V00 OutArgs      [V00    ] (  1,  1   )  lclBlk ( 0) [rsp+00H]   "OutgoingArgSpace"
;* V01 tmp1         [V01    ] (  0,  0   )   byref  ->  zero-ref    "Inlining Arg"
;
; Lcl frame size = 0

G_M24659_IG01:              ;; offset=0000H
                                                ;; bbWeight=1    PerfScore 0.00
G_M24659_IG02:              ;; offset=0000H
       8B0500000000         mov      eax, dword ptr [(reloc 0x7ffdba69000f)]
                                                ;; bbWeight=1    PerfScore 2.00
G_M24659_IG03:              ;; offset=0006H
       C3                   ret
                                                ;; bbWeight=1    PerfScore 1.00

; Total bytes of code 7, prolog size 0, PerfScore 3.70, instruction count 2, allocated bytes for code 7 (MethodHash=999a9fac) for method Program:Foo()
; ============================================================

Hit jump stub overflow
; Assembly listing for method Program:Foo()
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; Tier-1 compilation
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
;# V00 OutArgs      [V00    ] (  1,  1   )  lclBlk ( 0) [rsp+00H]   "OutgoingArgSpace"
;* V01 tmp1         [V01    ] (  0,  0   )   byref  ->  zero-ref    "Inlining Arg"
;
; Lcl frame size = 0

G_M24659_IG01:              ;; offset=0000H
                                                ;; bbWeight=1    PerfScore 0.00
G_M24659_IG02:              ;; offset=0000H
       48B80F0069BAFD7F0000 mov      rax, 0x7FFDBA69000F
       3900                 cmp      dword ptr [rax], eax
                                                ;; bbWeight=1    PerfScore 3.25
G_M24659_IG03:              ;; offset=000CH
       C3                   ret
                                                ;; bbWeight=1    PerfScore 1.00

; Total bytes of code 13, prolog size 0, PerfScore 5.55, instruction count 3, allocated bytes for code 13 (MethodHash=999a9fac) for method Program:Foo()
; ============================================================
```
"Hit jump stub overflow" is a simple `printf` I added to the runtime code that turns off rip-relative addressing.

### Regression?

_No response_

### Known Workarounds

_No response_

### Configuration

_No response_

### Other information

IMO, it would be ideal if the runtime could guess roughly where the jitted code will be located before the JIT has reported the code size and the runtime has done the allocation. The reloc hint could then be based on that location.
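One way to read this proposal, sketched in C# under stated assumptions: the runtime predicts the code start (for example from the code heap's current allocation pointer, which is an assumption here) and bounds the code size, and the hint requires the target to be reachable from both ends of that possible span. Since the displacement `target - nextInstr` is monotonic in the code address, checking the two ends covers every address in between. `EstimatedHint` is a hypothetical name.

```csharp
// Hypothetical sketch: answer the reloc hint relative to where the code is expected
// to land, instead of relative to a fixed window around coreclr.dll.
public static class EstimatedHint
{
    private static bool Fits(long delta) => delta >= int.MinValue && delta <= int.MaxValue;

    // estimatedCodeStart: predicted start of the method's code (assumption).
    // maxCodeSize: upper bound on the code size, since the JIT has not reported it yet.
    public static bool HintRel32(ulong estimatedCodeStart, ulong maxCodeSize, ulong target)
    {
        // The displacement is monotonic in the code address, so both ends suffice.
        long fromStart = unchecked((long)(target - estimatedCodeStart));
        long fromEnd = unchecked((long)(target - (estimatedCodeStart + maxCodeSize)));
        return Fits(fromStart) && Fits(fromEnd);
    }
}
```

If the prediction turns out to be wrong, the reloc can still overflow and the existing fallback applies.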

Alternatively, as suggested by @jkotas, we could eagerly reserve memory before jitting and give back the unused part afterwards.
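A hypothetical model of the eager-reservation idea, assuming a simple bump allocator stands in for the executable code heap (the real heaps are more involved): reserving a worst-case block before jitting makes the code address known while the JIT is still asking for hints, and the unused tail is handed back afterwards. `BumpCodeHeap` and its members are invented for illustration.

```csharp
// Hypothetical model (not runtime code) of "reserve before jitting, release the unused part after".
public sealed class BumpCodeHeap
{
    private ulong _next;

    public BumpCodeHeap(ulong start) => _next = start;

    public ulong Next => _next;

    // Reserve a worst-case-sized block before the JIT runs, so its address is known
    // while the JIT is still asking for reloc hints.
    public (ulong Start, ulong Size) Reserve(ulong maxSize)
    {
        ulong start = _next;
        _next += maxSize;
        return (start, maxSize);
    }

    // Hints can now be exact: the target must be reachable from both ends of the block.
    public static bool HintRel32((ulong Start, ulong Size) block, ulong target)
    {
        long fromStart = unchecked((long)(target - block.Start));
        long fromEnd = unchecked((long)(target - (block.Start + block.Size)));
        return Fits(fromStart) && Fits(fromEnd);
    }

    // After jitting, hand back the unused tail. In this simple model that is only
    // possible if nothing was reserved after this block in the meantime.
    public void ReleaseUnused((ulong Start, ulong Size) block, ulong usedSize)
    {
        if (block.Start + block.Size == _next)
            _next = block.Start + usedSize;
    }

    private static bool Fits(long delta) => delta >= int.MinValue && delta <= int.MaxValue;
}
```

In this model, after `Reserve(0x10000)` and a method that used `0x200` bytes, `ReleaseUnused` moves the allocation pointer back to `Start + 0x200`. The cost is that a worst-case size has to be chosen up front, and the tail can only be returned if nothing else was reserved in between.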

Another short-term alternative might be to halve the preferred address range so that we get ±1GB around coreclr.dll; this should mean that the entire region is reachable regardless of where in it the code ends up.
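The ±1GB arithmetic can be checked directly: if both the code and the target lie in `[center - h, center + h)`, the extreme displacements occur with the target at one edge and the next-instruction address at the other. This C# sketch (hypothetical `WindowReach`, ignoring the size of coreclr.dll itself) evaluates those two extremes.

```csharp
// Hypothetical check: is every target in the window reachable from every code address in it?
public static class WindowReach
{
    private static bool Fits(long delta) => delta >= int.MinValue && delta <= int.MaxValue;

    public static bool EveryPairReachable(ulong center, ulong halfWidth)
    {
        ulong lo = center - halfWidth;     // lowest byte in the window
        ulong hi = center + halfWidth - 1; // highest byte in the window

        // Largest forward displacement: a 1-byte instruction at lo (next instruction
        // at lo + 1) referencing a target at hi.
        long maxForward = unchecked((long)(hi - (lo + 1)));

        // Largest backward displacement: an instruction ending at hi (next instruction
        // at hi + 1) referencing a target at lo.
        long maxBackward = unchecked((long)(lo - (hi + 1)));

        return Fits(maxForward) && Fits(maxBackward);
    }
}
```

With `h = 1GB` both extremes fit (the backward one is exactly `int.MinValue`); with `h = 2GB`, a 4GB window like the current preferred range, they do not.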

In any case, as long as the final scheme works in the vast majority of practical cases, that should be OK; for pathological cases we can always fall back to turning off rip-relative addressing.

Fixing this on the JIT side is also possible. In that case the JIT should only use the hint function for very specific data addresses (e.g. static field addresses, managed method entry points). But then I'm not really sure what the point is of checking the preferred range on the runtime side at all, compared to just checking whether rip-relative addressing is still allowed.
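A sketch of what the JIT-side restriction could look like, in C# for consistency with the repro. The enum values and `UseRipRelative` are hypothetical names, not JIT APIs; the point is that the runtime hint is consulted only for the address kinds listed above, and any other constant address gets a 64-bit immediate.

```csharp
using System;

// Hypothetical classification of constant addresses the JIT wants to embed.
public enum AddrKind { StaticFieldAddress, ManagedMethodEntry, OtherConstant }

public static class JitAddressingPolicy
{
    // Only trust the runtime's rel32 hint for address kinds the runtime itself places
    // in its preferred range; everything else is materialized with a 64-bit immediate.
    public static bool UseRipRelative(AddrKind kind, ulong addr, Func<ulong, bool> runtimeHintRel32) =>
        kind switch
        {
            AddrKind.StaticFieldAddress or AddrKind.ManagedMethodEntry => runtimeHintRel32(addr),
            _ => false,
        };
}
```

Under this policy the folded `NearbyMemory + 15` from the repro would presumably be classified as `OtherConstant` and never be emitted rip-relative.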

cc @dotnet/jit-contrib, @jkotas, @janvorli 
