Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CEE_CALL yielding invalid address call instruction on win64 #44397

Closed
tonybaloney opened this issue Nov 9, 2020 · 17 comments
Closed

CEE_CALL yielding invalid address call instruction on win64 #44397

tonybaloney opened this issue Nov 9, 2020 · 17 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI question Answer questions and provide assistance, not an issue with source code or documentation. tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly untriaged New issue has not been triaged by the area owner

Comments

@tonybaloney
Copy link
Contributor

This relates directly to RyuJIT and the emitter for CEE_CALL to a IAT_PVALUE global method.

I'm working on a project that uses only the JIT and compiles CIL to native code.
When calling Global Static methods, the executable is crashing because the JIT is generating invalid call instructions. The call addresses are valid in macOS and Linux, but in Windows they point to an invalid memory address.

When debugging I can see that the call to getCallInfo will return a CORINFO_CALL_INFO struct with the field codePointerLookup.constLookup.addr at the correct memory address of the compiled function. The method has the flags CORINFO_FLG_STATIC

However, when the Jitted code executes it will start running through the correct machine code instructions and raises an access violation on the call. The Windows debugger shows that the memory address it tried to call is not executable code.

Exception thrown at 0x000002E846673418 in unit_tests.exe: 0xC0000005: Access violation executing location 0x000002E846673418.
000002E84D9F084D 00 00                add         byte ptr [rax],al  
000002E84D9F084F 00 00                add         byte ptr [rax],al  
000002E84D9F0851 00 00                add         byte ptr [rax],al  
000002E84D9F0853 00 00                add         byte ptr [rax],al  
000002E84D9F0855 00 00                add         byte ptr [rax],al  
000002E84D9F0857 00 82 05 3F 19 E2    add         byte ptr [rdx-1DE6C0FBh],al  
000002E84D9F085D F6 00 09             test        byte ptr [rax],9  
000002E84D9F0860 55                   push        rbp  
000002E84D9F0861 4C 8D 9C 24 10 FF FF FF lea         r11,[rsp-0F0h]  
000002E84D9F0869 E8 AA 2B C8 F8       call        000002E846673418   <<<< 
000002E84D9F086E 49 8B E3             mov         rsp,r11  
000002E84D9F0871 48 8D AC 24 F0 00 00 00 lea         rbp,[rsp+0F0h]  
000002E84D9F0879 33 C0                xor         eax,eax  
000002E84D9F087B 48 89 85 38 FF FF FF mov         qword ptr [rbp-0C8h],rax  

This code works perfectly on macOS and Linux, so there must be something about the virtual memory addresses in Windows, or a missing indirection?

If someone could help, that would be great. The code is here microsoft/Pyjion#237

Configuration

  • .NET 5 RC2
  • Windows 10 AMD64 (Virtual Machine)

Regression?

Yes, this worked on a very old version of .NET core 1.0

Other information

@AndyAyersMS helped on this project last time (issue #42925). It's working brilliantly on macOS and Linux now.

@Dotnet-GitSync-Bot
Copy link
Collaborator

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Nov 9, 2020
@tonybaloney
Copy link
Contributor Author

The area is area-CodeGen-coreclr

@jkoritzinsky jkoritzinsky added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 9, 2020
@AndyAyersMS
Copy link
Member

Is the address the jit is calling the one that is returned in the CORINFO_CALL_INFO struct (it would be odd if not)?

@tonybaloney
Copy link
Contributor Author

@AndyAyersMS

The struct returns the address of the pointer to the function.

void get_call_info(CORINFO_CALL_INFO *pResult) override {
        pResult->codePointerLookup.lookupKind.needsRuntimeLookup = false;
        // TODO: If we use IAT_VALUE we need to generate a jump stub
        pResult->codePointerLookup.constLookup.accessType = IAT_PVALUE;
        pResult->codePointerLookup.constLookup.addr = &m_addr;
        pResult->verMethodFlags = pResult->methodFlags = CORINFO_FLG_STATIC;
        pResult->kind = CORINFO_CALL;
        pResult->sig.args = (CORINFO_ARG_LIST_HANDLE)(m_params.size() == 0 ? nullptr : &m_params[0]);
        pResult->sig.retType = m_retType;
        pResult->sig.numArgs = m_params.size();
    }

The debugger can see the symbols for m_addr when I put a breakpoint on the line

m_addr = 0x00007ff646678ef9 {unit_tests.exe!PyJit_PushFrame(struct _frame *)}
pResult->codePointerLookup.constLookup.addr = unit_tests.exe!0x00007ff646a12078 (m_addr)

That generated the machine instructions: (I can see the value farther down in mov rax,7FF646A12078h)

The invalid call instruction is the 5th down from the preamble (call 0000021A46673418). I don't know what this is or where it was supposed to go. Possibly it's a helper function that hasn't been loaded? its working on macOS and Linux which is what has really stumped me.

0000021A0414084F 00 00                add         byte ptr [rax],al  
0000021A04140851 00 00                add         byte ptr [rax],al  
0000021A04140853 00 00                add         byte ptr [rax],al  
0000021A04140855 00 00                add         byte ptr [rax],al  
0000021A04140857 00 7F 5B             add         byte ptr [rdi+5Bh],bh  
0000021A0414085A 52                   push        rdx  
0000021A0414085B C0 A0 75 00 09 55 4C shl         byte ptr [rax+55090075h],4Ch  
0000021A04140862 8D 9C 24 10 FF FF FF lea         ebx,[rsp-0F0h]  
0000021A04140869 E8 AA 2B 53 42       call        0000021A46673418  
0000021A0414086E 49 8B E3             mov         rsp,r11  
0000021A04140871 48 8D AC 24 F0 00 00 00 lea         rbp,[rsp+0F0h]  
0000021A04140879 33 C0                xor         eax,eax  
0000021A0414087B 48 89 85 38 FF FF FF mov         qword ptr [rbp-0C8h],rax  
0000021A04140882 0F 57 E4             xorps       xmm4,xmm4  
0000021A04140885 48 B8 40 FF FF FF FF FF FF FF mov         rax,0FFFFFFFFFFFFFF40h  
0000021A0414088F 0F 29 24 28          movaps      xmmword ptr [rax+rbp],xmm4  
0000021A04140893 0F 29 64 05 10       movaps      xmmword ptr [rbp+rax+10h],xmm4  
0000021A04140898 0F 29 64 05 20       movaps      xmmword ptr [rbp+rax+20h],xmm4  
0000021A0414089D 48 83 C0 30          add         rax,30h  
0000021A041408A1 75 EC                jne         0000021A0414088F  
0000021A041408A3 48 89 4D 10          mov         qword ptr [rbp+10h],rcx  
0000021A041408A7 48 89 55 18          mov         qword ptr [rbp+18h],rdx  
0000021A041408AB B9 68 00 00 00       mov         ecx,68h  
0000021A041408B0 48 63 C9             movsxd      rcx,ecx  
0000021A041408B3 48 03 4D 18          add         rcx,qword ptr [rbp+18h]  
0000021A041408B7 48 89 4D F8          mov         qword ptr [rbp-8],rcx  
0000021A041408BB 48 8B 4D 18          mov         rcx,qword ptr [rbp+18h]  
0000021A041408BF 48 B8 78 20 A1 46 F6 7F 00 00 mov         rax,7FF646A12078h  
0000021A041408C9 FF 10                call        qword ptr [rax]  
0000021A041408CB 33 C0                xor         eax,eax  
0000021A041408CD 89 45 E4             mov         dword ptr [rbp-1Ch],eax  
0000021A041408D0 E8 E6 19 53 42       call        0000021A466722BB  
0000021A041408D5 33 C0                xor         eax,eax  
0000021A041408D7 48 63 C0             movsxd      rax,eax  
0000021A041408DA 48 BA 40 B2 5E 03 1A 02 00 00 mov         rdx,21A035EB240h  
0000021A041408E4 48 03 C2             add         rax,rdx  
0000021A041408E7 48 89 45 90          mov         qword ptr [rbp-70h],rax  
0000021A041408EB 48 8B 45 90          mov         rax,qword ptr [rbp-70h]  
0000021A041408EF FF 00                inc         dword ptr [rax]  
0000021A041408F1 E8 C5 19 53 42       call        0000021A466722BB  
0000021A041408F6 33 C0                xor         eax,eax  
0000021A041408F8 48 63 C0             movsxd      rax,eax  
0000021A041408FB 48 BA 90 31 86 03 1A 02 00 00 mov         rdx,21A03863190h  
0000021A04140905 48 03 C2             add         rax,rdx  
0000021A04140908 48 89 45 88          mov         qword ptr [rbp-78h],rax  
0000021A0414090C 48 8B 45 88          mov         rax,qword ptr [rbp-78h]  
0000021A04140910 FF 00                inc         dword ptr [rax]  
0000021A04140912 E8 A4 19 53 42       call        0000021A466722BB  
0000021A04140917 4C 8B 4D F8          mov         r9,qword ptr [rbp-8]  
0000021A0414091B 41 C7 01 04 00 00 00 mov         dword ptr [r9],4  
0000021A04140922 4C 8B 4D 18          mov         r9,qword ptr [rbp+18h]  
0000021A04140926 48 B9 40 B2 5E 03 1A 02 00 00 mov         rcx,21A035EB240h  
0000021A04140930 48 BA 90 31 86 03 1A 02 00 00 mov         rdx,21A03863190h  
0000021A0414093A 49 B8 40 F0 8B 03 1A 02 00 00 mov         r8,21A038BF040h  
0000021A04140944 48 B8 78 1F A1 46 F6 7F 00 00 mov         rax,7FF646A11F78h  
0000021A0414094E FF 10                call        qword ptr [rax]  
0000021A04140950 48 89 45 80          mov         qword ptr [rbp-80h],rax  
0000021A04140954 48 8B 4D 80          mov         rcx,qword ptr [rbp-80h]  
0000021A04140958 48 89 4D E8          mov         qword ptr [rbp-18h],rcx  
0000021A0414095C 33 C9                xor         ecx,ecx  
0000021A0414095E 48 63 C9             movsxd      rcx,ecx  
0000021A04140961 48 39 4D 80          cmp         qword ptr [rbp-80h],rcx  
0000021A04140965 75 2B                jne         0000021A04140992  
0000021A04140967 48 B9 28 8B 92 46 F6 7F 00 00 mov         rcx,7FF646928B28h  
0000021A04140971 48 B8 B8 17 A1 46 F6 7F 00 00 mov         rax,7FF646A117B8h  
0000021A0414097B FF 10                call        qword ptr [rax]  
0000021A0414097D 48 8B 4D 18          mov         rcx,qword ptr [rbp+18h]  
0000021A04140981 48 B8 D8 0E A1 46 F6 7F 00 00 mov         rax,7FF646A10ED8h  
0000021A0414098B FF 10                call        qword ptr [rax]  
0000021A0414098D E9 4F 02 00 00       jmp         0000021A04140BE1  
0000021A04140992 E8 24 19 53 42       call        0000021A466722BB  
0000021A04140997 48 8B 4D F8          mov         rcx,qword ptr [rbp-8]  
0000021A0414099B C7 01 06 00 00 00    mov         dword ptr [rcx],6  
0000021A041409A1 48 8B 4D E8          mov         rcx,qword ptr [rbp-18h]  
0000021A041409A5 48 BA 90 BA 8D 03 1A 02 00 00 mov         rdx,21A038DBA90h  
0000021A041409AF 48 B8 78 18 A1 46 F6 7F 00 00 mov         rax,7FF646A11878h  
0000021A041409B9 FF 10                call        qword ptr [rax]  
0000021A041409BB 48 89 85 78 FF FF FF mov         qword ptr [rbp-88h],rax  
0000021A041409C2 48 8B 4D E8          mov         rcx,qword ptr [rbp-18h]  
0000021A041409C6 48 89 8D 70 FF FF FF mov         qword ptr [rbp-90h],rcx  
0000021A041409CD 48 8B 8D 78 FF FF FF mov         rcx,qword ptr [rbp-88h]  
0000021A041409D4 48 89 4D E8          mov         qword ptr [rbp-18h],rcx  
0000021A041409D8 48 8B 8D 70 FF FF FF mov         rcx,qword ptr [rbp-90h]  
0000021A041409DF 48 89 8D 68 FF FF FF mov         qword ptr [rbp-98h],rcx  
0000021A041409E6 33 C9                xor         ecx,ecx  
0000021A041409E8 48 63 C9             movsxd      rcx,ecx  
0000021A041409EB 48 39 8D 78 FF FF FF cmp         qword ptr [rbp-88h],rcx  
0000021A041409F2 75 36                jne         0000021A04140A2A  
0000021A041409F4 48 B9 40 8B 92 46 F6 7F 00 00 mov         rcx,7FF646928B40h  
0000021A041409FE 48 B8 B8 17 A1 46 F6 7F 00 00 mov         rax,7FF646A117B8h  
0000021A04140A08 FF 10                call        qword ptr [rax]  
0000021A04140A0A 48 8B 4D 18          mov         rcx,qword ptr [rbp+18h]  
0000021A04140A0E 48 B8 D8 0E A1 46 F6 7F 00 00 mov         rax,7FF646A10ED8h  
0000021A04140A18 FF 10                call        qword ptr [rax]  
0000021A04140A1A 48 8B 85 68 FF FF FF mov         rax,qword ptr [rbp-98h]  
0000021A04140A21 48 89 45 A8          mov         qword ptr [rbp-58h],rax  
0000021A04140A25 E9 B7 01 00 00       jmp         0000021A04140BE1  

@tonybaloney
Copy link
Contributor Author

This is what the stack frame looked like at the call instruction. Its not compiled code, so I don't know where it got the reference from:

Screen Shot 2020-11-10 at 8 21 24 am

@tonybaloney
Copy link
Contributor Author

After some more digging, I think that is a callback for CORINFO_HELP_STACK_PROBE.
I don't know why it's jumping to an invalid address. getHelperFtn is returning a void* pointer

@tonybaloney
Copy link
Contributor Author

@AndyAyersMS what is the correct signature for CORINFO_HELP_STACK_PROBE? There are two different implementations (in assembly).

I think its a signature mismatch and void ftn() isn't an acceptable signature.

@AndyAyersMS
Copy link
Member

It does not have a standard calling convention. It takes an argument in R11 and doesn't return anything. You can likely get pretty far by just defining it as __declspec(naked) void JIT_StackProbe() {} .

If you need to implement it you will need something like the below; the only way to define it is via assembly.

LEAF_ENTRY JIT_StackProbe, _TEXT
; On entry:
; r11 - points to the lowest address on the stack frame being allocated (i.e. [InitialSp - FrameSize])
; rsp - points to some byte on the last probed page
; On exit:
; rax - is not preserved
; r11 - is preserved
;
; NOTE: this helper will probe at least one page below the one pointed by rsp.
mov rax, rsp ; rax points to some byte on the last probed page
and rax, -PAGE_SIZE ; rax points to the **lowest address** on the last probed page
; This is done to make the following loop end condition simpler.
ProbeLoop:
sub rax, PAGE_SIZE ; rax points to the lowest address of the **next page** to probe
test dword ptr [rax], eax ; rax points to the lowest address on the **last probed** page
cmp rax, r11
jg ProbeLoop ; If (rax > r11), then we need to probe at least one more page.
ret
LEAF_END_MARKED JIT_StackProbe, _TEXT
end

@tonybaloney
Copy link
Contributor Author

I've tried reimplementing JIT_StackProbe in a simple MASM file and linking it to be returned back.

As soon as rip moves to the call address, an access violation is thrown. I think maybe there is some other system-level protection happening and a flag missing from the compiler

@AndyAyersMS
Copy link
Member

One thing to check -- since you are returning IAT_PVALUE you need to give the jit the address of a location that contains the address of the code.

That is, some thing like

static void* probeIndir = &JIT_StackProbe;
pResult->codePointerLookup.constLookup.addr = &probeIndir;

@jkotas jkotas added tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly question Answer questions and provide assistance, not an issue with source code or documentation. labels Nov 10, 2020
@tonybaloney
Copy link
Contributor Author

The solution was to use the indirection pointer in getHelperFtn() in Windows and to write a custom assembly proc for JIT_StackProbe because __declspec(naked) is not possible in x64.
The code is now executing and the JIT callbacks are working properly, however the JIT_StackProbe is causing a stack overflow.
Is PAGE_SIZE always 1000h?

@AndyAyersMS
Copy link
Member

The stack overflows might just be from asking for too much stack, and not a bug per se.

On windows you have 1MB stack segments by default. Can you debug into the failure and see if you're trying to go beyond that limit?

@tonybaloney
Copy link
Contributor Author

This is where I got the overflow. Looking at where the helper is called, R11 is the proposed stack depth?

00007FF680BE5AF0 48 8B C4             mov         rax,rsp  
00007FF680BE5AF3 48 25 00 F0 FF FF    and         rax,0FFFFFFFFFFFFF000h  
00007FF680BE5AF9 48 2D 00 10 00 00    sub         rax,1000h  
00007FF680BE5AFF 85 00                test        dword ptr [rax],eax   <<< Overflow
00007FF680BE5B01 49 3B C3             cmp         rax,r11  
00007FF680BE5B04 7F F3                jg          JIT_StackProbe+9h (07FF680BE5AF9h)  

Registers were

RAX = 000000ABA70D9000 RBX = 0000000000000043 RCX = 0000000000000010 RDX = 0000020AFA095550 RSI = 0000000000000000 RDI = 000000ABA71C8990 R8  = 00000000FFFFFD7F R9  = 000000ABA71C8910 R10 = 0000000000000000 R11 = 0000000000000246 R12 = 0000020AF9A1D230 R13 = 0000020AF9AC35D0 R14 = 0000000000000018 R15 = 0000000000000000 RIP = 00007FF680BE5AFF RSP = 000000ABA71C8328 RBP = 000000ABA71C8910 EFL = 00010204 

0x000000ABA70D9000 = 00000000 

The procedure is borrowed from JitHelpers_fast.asm, which seemed to be the correct MASM/x64 proc

.code
PAGE_SIZE equ 1000h
JIT_StackProbe PROC

        ; On entry:
        ;   r11 - points to the lowest address on the stack frame being allocated (i.e. [InitialSp - FrameSize])
        ;   rsp - points to some byte on the last probed page
        ; On exit:
        ;   rax - is not preserved
        ;   r11 - is preserved
        ;
        ; NOTE: this helper will probe at least one page below the one pointed by rsp.

        mov     rax, rsp               ; rax points to some byte on the last probed page
        and     rax, -PAGE_SIZE         ; rax points to the **lowest address** on the last probed page
                                       ; This is done to make the following loop end condition simpler.

ProbeLoop:
        sub     rax, PAGE_SIZE         ; rax points to the lowest address of the **next page** to probe
        test    dword ptr [rax], eax   ; rax points to the lowest address on the **last probed** page
        cmp     rax, r11
        jg      ProbeLoop              ; If (rax > r11), then we need to probe at least one more page.

        ret

JIT_StackProbe ENDP

end

@AndyAyersMS
Copy link
Member

R11 should be set to the desired ending value for RSP. That is, if the frame size is say 0x40000, then on entry to the probe, we should have R11 = RSP - x40000 (maybe off by 8 bytes or so, as calling the helper will modify RSP).

The code in the helper initially sets RAX = RSP, and then decrements RAX in page sized chunks until is no longer greater than R11.

If you set R11 to the desired stack size then likely this loop will run too long and overflow the stack. But since the jit is setting R11 a more likely explanation is that you're just asking for a very large stack frame ...?

@tonybaloney
Copy link
Contributor Author

Noticed that the stack frame is requested as a delta of the page size, from CodeGen::genAllocLclFrame, which comes from getEEInfo(). The pEEInfoOut->osPageSize field wasn't being set and after setting to 1000, it seems to properly request a sensible stack frame.

This is working now. thanks again

@AndyAyersMS
Copy link
Member

Good to hear.

That's an impressive amount of work you have built up over in microsoft/Pyjion#237 -- 280 commits!

@tonybaloney
Copy link
Contributor Author

Those final changes were enough to get the Python test suite passing on macOS, Linux and Windows for .NET5RC2. Look forward to patching this to GA

@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI question Answer questions and provide assistance, not an issue with source code or documentation. tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly untriaged New issue has not been triaged by the area owner
Projects
None yet
Development

No branches or pull requests

5 participants