
[Linux/ARM] SIGILL during stepping under managed debugger #10884

Closed
@kbaladurin

Description

Sometimes, while stepping through a method that can be called from different threads, a SIGILL occurs (dotnet/coreclr#19409 does the right things but doesn't solve this problem):

(gdb) i threads
  Id   Target Id         Frame
  29   Thread 0xb1d09040 (LWP 5271) "DN_sung.tv.csfs" 0xb639cce4 in read ()
   from /lib/libpthread.so.0
  28   Thread 0xb18b7040 (LWP 5272) "DN_sung.tv.csfs" 0xb61b5b14 in poll ()
   from /lib/libc.so.6
  27   Thread 0xb1657040 (LWP 5273) "DN_sung.tv.csfs" 0xb639cce4 in read ()
   from /lib/libpthread.so.0
  26   Thread 0xb1457040 (LWP 5274) "DN_sung.tv.csfs" 0xb6398674 in pthread_cond_timedwait@@GLIBC_2.4 () from /lib/libpthread.so.0
  25   Thread 0xb0eff040 (LWP 5278) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
  24   Thread 0xaecfe040 (LWP 5279) "DN_sung.tv.csfs" 0xb6398674 in pthread_cond_timedwait@@GLIBC_2.4 () from /lib/libpthread.so.0
  23   Thread 0xae0d0040 (LWP 5296) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
  22   Thread 0xadcff040 (LWP 5297) "DN_sung.tv.csfs" 0xae3479fa in sigill_handler(int, siginfo_t*, void*) ()
   from /usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.1/libclrjit.so
  21   Thread 0xad1cf040 (LWP 5317) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
  20   Thread 0xabd6a040 (LWP 5338) "gmain" 0xb61b5b14 in poll ()
   from /lib/libc.so.6
  19   Thread 0xab9a5040 (LWP 5339) "gkdbus" 0xb61b5b14 in poll ()
   from /lib/libc.so.6
  18   Thread 0xac935040 (LWP 5384) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
  17   Thread 0xaa1ae040 (LWP 5386) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
  16   Thread 0xa9f6e040 (LWP 5387) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
  15   Thread 0xa9d6e040 (LWP 5388) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
  14   Thread 0xa9b3e040 (LWP 5390) "gkdbus" 0xb61b5b14 in poll ()
   from /lib/libc.so.6
  13   Thread 0xa64c0040 (LWP 5391) "Edbg-sys" 0xb639c524 in __lll_lock_wait ()
   from /lib/libpthread.so.0
  12   Thread 0xaa3ff040 (LWP 5392) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
  11   Thread 0xa4003040 (LWP 5393) "DN_sung.tv.csfs" 0xb61bf768 in epoll_wait
    () from /lib/libc.so.6
  10   Thread 0xa2d63040 (LWP 5395) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
  9    Thread 0x9ff3e040 (LWP 5409) "gkdbus" 0xb61b5b14 in poll ()
   from /lib/libc.so.6
  8    Thread 0xac10d040 (LWP 5461) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
  7    Thread 0xad3ff040 (LWP 5465) "DN_sung.tv.csfs" 0xb61b5b14 in poll ()
   from /lib/libc.so.6
  6    Thread 0xacb35040 (LWP 5482) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
  5    Thread 0x9e05b040 (LWP 5483) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
  4    Thread 0x9de5b040 (LWP 5484) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
  3    Thread 0x9dc5b040 (LWP 5485) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
  2    Thread 0xad8ff040 (LWP 5486) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
* 1    Thread 0xb47cc000 (LWP 5269) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
(gdb) thread 22
[Switching to thread 22 (Thread 0xadcff040 (LWP 5297))]
#0  0xae3479fa in sigill_handler(int, siginfo_t*, void*) ()
   from /usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.1/libclrjit.so
(gdb) bt
#0  0xae3479fa in sigill_handler(int, siginfo_t*, void*) ()
   from /usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.1/libclrjit.so
#1  <signal handler called>
#2  0xadae0cbc in ?? ()
#3  0xb1e23158 in JIT_MonExit_Signal(Object*) ()
   from /usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.1/libcoreclr.so
#4  0xaea6adf8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) f 2
#2  0xadae0cbc in ?? ()
(gdb) i r
r0             0x1      1
r1             0x1      1
r2             0xadcfe848       2916083784
r3             0xf98ff3e8       4186960872
r4             0xadcfe628       2916083240
r5             0x0      0
r6             0xadcfe5f4       2916083188
r7             0xadcfe738       2916083512
r8             0x0      0
r9             0xadcfe784       2916083588
r10            0x0      0
r11            0xadcfe620       2916083232
r12            0xb622e194       3055739284
sp             0xadcfe480       0xadcfe480
lr             0xb1e23159       -1310576295
pc             0xadae0cbc       0xadae0cbc
cpsr           0x600d0030       1611464752
(gdb) x/10i $pc-4
   0xadae0cb8:  asrs    r2, r4, #15
   0xadae0cba:  blx     r3
=> 0xadae0cbc:  nop
   0xadae0cbe:  nop
   0xadae0cc0:  add     sp, #32
   0xadae0cc2:  ldmia.w sp!, {r4, r5, r6, r10, r11, pc}
   0xadae0cc6:  stmdb   sp!, {r4, r5, r6, r10, r11, lr}
   0xadae0cca:  sub     sp, #32
   0xadae0ccc:  add.w   r3, r11, #8
   0xadae0cd0:  str     r3, [sp, dotnet/runtime#3860]
(gdb)

We are stepping in the main thread (LWP 5269), and the SIGILL occurs in thread LWP 5297 (gdb thread 22).

The problem occurs right after a breakpoint patch is unapplied:

Breakpoint was inserted at ADAE065D for opcode bf00
DC::UP unapply patch at addr 0xADAE065D
DC::ApplyPatch at addr 0xADAE068D
Breakpoint was inserted at ADAE068D for opcode bf00
DC::UP unapply patch at addr 0xADAE068D
DC::ApplyPatch at addr 0xADAE0C99
Breakpoint was inserted at ADAE0C99 for opcode e92d
DC::UP unapply patch at addr 0xADAE0C99
DC::ApplyPatch at addr 0xADAE0CBD
Breakpoint was inserted at ADAE0CBD for opcode bf00
DC::UP unapply patch at addr 0xADAE0CBD
sigill_handler called!!!!!
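The log and the register dump line up. Bit 0 of an ARM code address only selects Thumb state, so clearing it from the last patched address 0xADAE0CBD gives 0xADAE0CBC, which is exactly the pc of frame #2. The saved opcode bf00 is the Thumb-2 NOP encoding, matching the `nop` that `x/10i` now shows at that pc. A minimal compile-time check of that arithmetic (`ClearThumbBit` is a hypothetical helper written for this illustration, not the runtime's `_ClearThumbBit`):

```cpp
#include <cstdint>

// Hypothetical helper: bit 0 of an ARM code address selects Thumb state;
// the instruction itself lives at the even address with that bit cleared.
constexpr std::uint32_t ClearThumbBit(std::uint32_t address) {
    return address & ~std::uint32_t{1};
}

// Values copied from the debugger log and the register dump above.
constexpr std::uint32_t kLastPatchAddress = 0xADAE0CBD;  // "DC::ApplyPatch at addr 0xADAE0CBD"
constexpr std::uint32_t kFaultingPc = 0xadae0cbc;        // pc of frame #2

// Compiles only if the faulting pc is the last patched instruction.
static_assert(ClearThumbBit(kLastPatchAddress) == kFaultingPc,
              "faulting pc is the last patched instruction");
```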

I think the cause is that patches are applied and unapplied non-atomically:

inline void CORDbgSetInstruction(CORDB_ADDRESS_TYPE* address,
                                 PRD_TYPE instruction)
{
    // In a DAC build, this function assumes the input is a host address.
    LIMITED_METHOD_DAC_CONTRACT;

    ULONG ptraddr = dac_cast<ULONG>(address);
    _ASSERTE(ptraddr & THUMB_CODE);
    ptraddr &= ~THUMB_CODE;

    *(PRD_TYPE *)ptraddr = instruction; // <-- non-atomic write
    FlushInstructionCache(GetCurrentProcess(),
                          _ClearThumbBit(address),
                          sizeof(PRD_TYPE));
}

Maybe we should use an Interlocked function to make this write atomic. What do you think?
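A rough sketch of what an atomic variant could look like, assuming a GCC/Clang toolchain (it uses the `__atomic_exchange_n` and `__builtin___clear_cache` builtins rather than the runtime's own Interlocked helpers). `ExchangeInstruction`, `kThumbBit` and `kBreakOpcode` are hypothetical names, and 0xdefe (Thumb `UDF #254`) is an assumed breakpoint opcode; check `CORDbg_BREAK_INSTRUCTION` in the runtime for the real value. Returning the previous opcode lets the same call both apply a patch (saving the original) and unapply it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants for this sketch (not CoreCLR definitions).
// Bit 0 of a code address marks Thumb state, like THUMB_CODE above.
constexpr std::uintptr_t kThumbBit = 1;
// Assumed breakpoint opcode: Thumb "UDF #254".
constexpr std::uint16_t kBreakOpcode = 0xdefe;

// Hypothetical replacement for the plain store in CORDbgSetInstruction:
// swaps the 16-bit opcode at a Thumb code address with one atomic
// read-modify-write and returns the opcode that was there before.
std::uint16_t ExchangeInstruction(void* thumbAddress, std::uint16_t instruction) {
    std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(thumbAddress);
    assert(raw & kThumbBit);  // mirrors _ASSERTE(ptraddr & THUMB_CODE)
    raw &= ~kThumbBit;
    assert(raw % alignof(std::uint16_t) == 0);  // atomics need natural alignment

    auto* slot = reinterpret_cast<std::uint16_t*>(raw);
    // GCC/Clang builtin: other threads observe either the old or the new
    // opcode, never a mix of bytes from both.
    std::uint16_t previous = __atomic_exchange_n(slot, instruction, __ATOMIC_SEQ_CST);

    // Flush the instruction cache for the patched range (a no-op on x86).
    char* begin = reinterpret_cast<char*>(slot);
    __builtin___clear_cache(begin, begin + sizeof(std::uint16_t));
    return previous;
}
```

For example, `saved = ExchangeInstruction(addr, kBreakOpcode)` applies a patch and `ExchangeInstruction(addr, saved)` restores the original opcode (bf00 in the log above). Whether this alone removes the SIGILL is an open question: atomicity rules out a torn opcode, but a thread that executes the instruction while the breakpoint is present will still trap.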

Thank you!
