[x86/Linux] JIT.Directed.coverage.oldtests.ovfldiv1_il_r Test Failure #7336

Closed
parjong opened this issue Feb 2, 2017 · 11 comments
Labels
arch-x86 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI os-linux Linux OS (any supported distro)
Comments

Contributor

parjong commented Feb 2, 2017

The JIT.Directed.coverage.oldtests.ovfldiv1_il_r test ends with a segmentation fault:

$ ./corerun coreclr-unittest/JIT/Directed/coverage/oldtests/ovfldiv1_il_r/ovfldiv1_il_r.exe
-9223372036854775808
1
-8762203435012037017
PASSED
Segmentation fault (core dumped)
Contributor Author

parjong commented Feb 2, 2017

It seems that there is some calling convention mismatch:

(gdb) r
-9223372036854775808
1
-8762203435012037017
PASSED

Program received signal SIGSEGV, Segmentation fault.
0xffffffff in ?? ()
(gdb) bt
#0  0xffffffff in ?? ()
#1  0x00000000 in ?? ()
(gdb)

Contributor Author

parjong commented Feb 2, 2017

I figured out why the segfault happens. Here is the JIT disassembly of Main:

...
G_M56529_IG03:
       6800000080   push     0x80000000
       6A00         push     0
       6AFF         push     -1
       6AFF         push     -1
       E8960A9104   call     CORINFO_HELP_LDIV
       EB13         jmp      SHORT G_M56529_IG06
G_M56529_IG04:
       8B0D401170F3 mov      ecx, gword ptr [F3701140H]
       FF153C9AA7F5 call     [System.Console:WriteLine(ref)]
       B864000000   mov      eax, 100
G_M56529_IG05:
       5D           pop      ebp
       C3           ret
G_M56529_IG06:
       8B0D441170F3 mov      ecx, gword ptr [F3701144H]
       FF153C9AA7F5 call     [System.Console:WriteLine(ref)]
       B801000000   mov      eax, 1
G_M56529_IG07:
       5D           pop      ebp
       C3           ret
G_M56529_IG08:
G_M56529_IG09:
       8D05AC24E1F1 lea      eax, G_M56529_IG04
G_M56529_IG10:
       C3           ret

CORINFO_HELP_LDIV, called in G_M56529_IG03, throws an exception, and the corresponding handler code is located at G_M56529_IG09.

After executing the handler, the CLR resumes execution at G_M56529_IG04 without popping the four stack arguments that were pushed for the helper:

Breakpoint 1, RtlRestoreContext () at /home/parjong/projects/dotnet/coreclr/src/pal/src/arch/i386/context2.S:99
99          test    BYTE PTR [eax + CONTEXT_ContextFlags], CONTEXT_FLOATING_POINT
(gdb) n
100         je      LOCAL_LABEL(Done_Restore_CONTEXT_FLOATING_POINT)
(gdb) n
101         frstor  [eax + CONTEXT_FloatSave]
(gdb) n
104         test   BYTE PTR [eax + CONTEXT_ContextFlags], CONTEXT_EXTENDED_REGISTERS
(gdb) n
105         je     LOCAL_LABEL(Done_Restore_CONTEXT_EXTENDED_REGISTERS)
(gdb) n
117         mov   esp, [eax + CONTEXT_Esp]
(gdb) n
RtlRestoreContext () at /home/parjong/projects/dotnet/coreclr/src/pal/src/arch/i386/context2.S:120
120         push  DWORD PTR [eax + CONTEXT_Eip]
(gdb) n
RtlRestoreContext () at /home/parjong/projects/dotnet/coreclr/src/pal/src/arch/i386/context2.S:123
123         mov   ebp, [eax + CONTEXT_Ebp]
(gdb) n
124         mov   edi, [eax + CONTEXT_Edi]
(gdb) n
125         mov   esi, [eax + CONTEXT_Esi]
(gdb) n
126         mov   edx, [eax + CONTEXT_Edx]
(gdb) n
127         mov   ecx, [eax + CONTEXT_Ecx]
(gdb) n
128         mov   ebx, [eax + CONTEXT_Ebx]
(gdb) n
129         mov   eax, [eax + CONTEXT_Eax]
(gdb) n
132         ret
(gdb) n
0xf26584ac in ?? ()
(gdb) x/16x $esp
0xffffc608:     0xffffffff      0xffffffff      0x00000000      0x80000000
0xffffc618:     0xffffc630      0xf7130d47      0xf7c8c000      0x0808aaf0
0xffffc628:     0x0808aaf0      0xf7c8c000      0xffffca78      0xf6eb51ed
0xffffc638:     0xffffcc88      0x0808af08      0x00000400      0xf7c8c000

When execution reaches G_M56529_IG05, pop ebp loads 0xffffffff into ebp and ret then jumps to 0xffffffff, which results in the segfault.
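The stack effect described above can be replayed with a toy model. This is only a sketch: the `toy_*` names and `simulate_return_target` are made up for illustration, and treating 0xffffc630 and 0xf7130d47 from the dump as Main's saved ebp and return address is an assumption based on their position right after the four arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a downward-growing x86 stack (hypothetical, for illustration). */
typedef struct {
    uint32_t slots[16];
    size_t   sp;                /* index of the current top-of-stack slot */
} toy_stack;

static void toy_push(toy_stack *s, uint32_t v)
{
    assert(s->sp > 0);
    s->slots[--s->sp] = v;
}

static uint32_t toy_pop(toy_stack *s)
{
    assert(s->sp < 16);
    return s->slots[s->sp++];
}

/* Replays IG03 -> exception -> IG04 -> IG05 and returns the address that
 * the final `ret` would jump to. pop_args_on_resume selects whether the
 * four helper arguments are removed before execution resumes. */
static uint32_t simulate_return_target(int pop_args_on_resume, uint32_t *ebp_out)
{
    toy_stack s;
    s.sp = 16;

    toy_push(&s, 0xf7130d47u);  /* assumed: return address of Main */
    toy_push(&s, 0xffffc630u);  /* assumed: saved ebp (push ebp in the prolog) */

    /* IG03: the four pushes before `call CORINFO_HELP_LDIV` */
    toy_push(&s, 0x80000000u);
    toy_push(&s, 0x00000000u);
    toy_push(&s, 0xffffffffu);
    toy_push(&s, 0xffffffffu);

    /* The helper throws, so its callee-side cleanup never runs. */
    if (pop_args_on_resume)
        s.sp += 4;              /* drop the four argument slots */

    /* IG05: pop ebp; ret */
    *ebp_out = toy_pop(&s);
    return toy_pop(&s);
}
```

Calling `simulate_return_target(0, &ebp)` models the behavior seen in the trace, and `simulate_return_target(1, &ebp)` models a resume that first discards the helper's arguments.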

Contributor Author

parjong commented Feb 2, 2017

@jkotas @dotnet/jit-contrib Could you let me know your opinion on which approach will be better to resolve this issue?

I am currently considering two approaches:

  • Use CDECL instead of STDCALL in x86/Linux
  • Adjust ESP before resuming execution
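A minimal sketch of the second approach, assuming the number of argument slots pushed for the interrupted call is known at resume time. `toy_context` and `adjust_esp_for_resume` are hypothetical names, not CoreCLR types or functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for an x86 register context. */
typedef struct {
    uint32_t esp;
    uint32_t eip;
} toy_context;

enum { TOY_STACK_SLOT_SIZE = 4 };   /* one x86 stack slot is 4 bytes */

/* Prepare ctx so that execution resumes at resume_eip with the stack
 * arguments of the interrupted call already discarded. */
static void adjust_esp_for_resume(toy_context *ctx,
                                  uint32_t resume_eip,
                                  uint32_t stack_arg_slots)
{
    assert(ctx != 0);
    ctx->esp += stack_arg_slots * TOY_STACK_SLOT_SIZE;
    ctx->eip = resume_eip;
}
```

With the values from the trace, an ESP of 0xffffc608 and four argument slots give 0xffffc618, which is the slot holding 0xffffc630 in the dump.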

Contributor Author

parjong commented Feb 2, 2017

CC @seanshpark @wateret

Member

jkotas commented Feb 2, 2017

> Use CDECL instead of STDCALL in x86/Linux

It would need to be used everywhere as the managed calling convention, not just for this one helper. Do you agree?

Given that, adjusting ESP before resuming execution sounds better to me.

Contributor Author

parjong commented Feb 2, 2017

@jkotas Is it possible to control the calling convention of JIT helpers? If not, I definitely agree with you, and we should take the latter approach. If it is possible, we should consider the former approach too (IMO).

For the latter one, I would like to know whether it is possible to get the number of stack elements of a JIT helper from the EE side. As far as I know, it is possible for managed methods, but I'm not sure about JIT helpers.

Member

jkotas commented Feb 2, 2017

> Is it possible to control the calling convention of JIT helpers?

The JIT and VM assume a single managed calling convention for everything: JIT helpers, FCalls, regular JITed managed methods. There is no way to control it.

> it is possible to get the number of stack elements of a JIT helper from the EE side

You should be able to get it from the last column of https://github.com/dotnet/coreclr/blob/master/src/inc/jithelpers.h#L31. (I do not think that the last column is compiled in today - you would need to compile it in.)
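One way such a per-helper column could be turned into a lookup table is an X-macro list. The sketch below is purely illustrative: the `TOY_*` names and the table layout are hypothetical and do not reproduce jithelpers.h. The 16-byte figure for the long-division entry comes from the four pushes in the disassembly earlier in this thread.

```c
#include <assert.h>

/* Hypothetical helper list: (identifier, bytes of stack arguments). */
#define TOY_HELPERS(X)              \
    X(TOY_HELP_LDIV, 16)            \
    X(TOY_HELP_NO_STACK_ARGS, 0)

typedef enum {
#define X(name, bytes) name,
    TOY_HELPERS(X)
#undef X
    TOY_HELP_COUNT
} toy_helper;

/* Table generated from the same list, indexed by helper identifier. */
static const unsigned toy_helper_stack_bytes[TOY_HELP_COUNT] = {
#define X(name, bytes) [name] = (bytes),
    TOY_HELPERS(X)
#undef X
};

/* Number of bytes of stack arguments to discard for helper h. */
static unsigned toy_stack_bytes_for(toy_helper h)
{
    assert(h < TOY_HELP_COUNT);
    return toy_helper_stack_bytes[h];
}
```

Keeping the enum and the table generated from one list means an entry cannot be added to one without the other.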

But I am still wondering - is it really the case that the libunwind unwinder does not take care of this?

Contributor Author

parjong commented Feb 2, 2017

As I understand it, the libunwind-based unwinder takes care of this for JITed methods (ESPIncrOnReturn in eetwain.cpp seems to be related).

I'm not sure about libunwind itself. The execution trace implies that libunwind restores the virtual registers to their state before the call (not after it).

Contributor Author

parjong commented Feb 7, 2017

@jkotas It turns out that libunwind does not take the calling convention into account. I tested the following program:

#include <libunwind.h>
#include <stdio.h>

int __attribute__((stdcall)) func(int x, int y)
{
        unw_cursor_t    cursor;
        unw_context_t   context;

        unw_getcontext(&context);
        unw_init_local(&cursor, &context);

        unw_proc_info_t pi;
        unw_get_proc_info(&cursor, &pi);

        unw_step(&cursor);

        unw_word_t value;

        unw_get_reg(&cursor, UNW_REG_SP, (unw_word_t *) &value);

        printf("%d: Previous SP = %x\n", __LINE__, value);

        unw_resume(&cursor);

        printf("%d: x = %d, y = %d\n", __LINE__, x, y);
}

int main(int argc, char **argv)
{
        func(1, 2);
        return 0;
}

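When execution of main is resumed, esp is the same as it was before call <func> executed. After a normal stdcall return it would be 8 bytes higher, which is what the following sub $0x8,%esp compensates for: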

$ gdb ./sample
...
(gdb) disas main
Dump of assembler code for function main:
   0x08048734 <+0>:     lea    0x4(%esp),%ecx
   0x08048738 <+4>:     and    $0xfffffff0,%esp
   0x0804873b <+7>:     pushl  -0x4(%ecx)
   0x0804873e <+10>:    push   %ebp
   0x0804873f <+11>:    mov    %esp,%ebp
   0x08048741 <+13>:    push   %ecx
   0x08048742 <+14>:    sub    $0x14,%esp
   0x08048745 <+17>:    movl   $0x2,0x4(%esp)
   0x0804874d <+25>:    movl   $0x1,(%esp)
   0x08048754 <+32>:    call   0x804866d <func>
   0x08048759 <+37>:    sub    $0x8,%esp
   0x0804875c <+40>:    mov    $0x0,%eax
   0x08048761 <+45>:    mov    -0x4(%ebp),%ecx
   0x08048764 <+48>:    leave
   0x08048765 <+49>:    lea    -0x4(%ecx),%esp
   0x08048768 <+52>:    ret
End of assembler dump.
(gdb) b *0x0804874d
Breakpoint 1 at 0x804874d
(gdb) b *0x08048759
Breakpoint 2 at 0x8048759
(gdb) r
Starting program: /root/sample

Breakpoint 1, 0x0804874d in main ()
(gdb) p/x $esp
$1 = 0xffffd780
(gdb) c
Continuing.
21: Previous SP = ffffd780

Breakpoint 2, 0x08048759 in main ()
(gdb) p/x $esp
$2 = 0xffffd780
(gdb)
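For comparison, the ESP a caller sees after a normal return can be worked out from the convention alone. The helper below is a hypothetical sketch (`toy_conv` and `esp_after_normal_return` are illustrative names) and covers only the cdecl and stdcall cases discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { TOY_CONV_CDECL, TOY_CONV_STDCALL } toy_conv;

/* ESP at the instruction following `call`, given ESP just before `call`.
 * `call` pushes the return address and `ret` pops it again, so those two
 * cancel out. Under stdcall the callee's `ret N` also removes the N bytes
 * of arguments; under cdecl the caller removes them later. */
static uint32_t esp_after_normal_return(uint32_t esp_before_call,
                                        toy_conv conv,
                                        uint32_t arg_bytes)
{
    assert(arg_bytes % 4 == 0);     /* x86 stack arguments are 4-byte slots */
    if (conv == TOY_CONV_STDCALL)
        return esp_before_call + arg_bytes;
    return esp_before_call;
}
```

For func(1, 2) with esp at 0xffffd780 before the call, a normal stdcall return would leave esp at 0xffffd788, whereas the session above shows 0xffffd780 after unw_resume.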

Member

jkotas commented Feb 7, 2017

Ok. We will need to compute the number of stack elements to pop as you have proposed.

Contributor Author

parjong commented Apr 4, 2017

Closing, as this issue no longer reproduces.

@parjong parjong closed this as completed Apr 4, 2017
@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@dotnet dotnet locked as resolved and limited conversation to collaborators Dec 26, 2020