Description
Sometimes, while stepping through a method that can be called from different threads, a SIGILL occurs (dotnet/coreclr#19409 does the right thing but doesn't solve this problem):
(gdb) i threads
Id Target Id Frame
29 Thread 0xb1d09040 (LWP 5271) "DN_sung.tv.csfs" 0xb639cce4 in read ()
from /lib/libpthread.so.0
28 Thread 0xb18b7040 (LWP 5272) "DN_sung.tv.csfs" 0xb61b5b14 in poll ()
from /lib/libc.so.6
27 Thread 0xb1657040 (LWP 5273) "DN_sung.tv.csfs" 0xb639cce4 in read ()
from /lib/libpthread.so.0
26 Thread 0xb1457040 (LWP 5274) "DN_sung.tv.csfs" 0xb6398674 in pthread_cond_timedwait@@GLIBC_2.4 () from /lib/libpthread.so.0
25 Thread 0xb0eff040 (LWP 5278) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
24 Thread 0xaecfe040 (LWP 5279) "DN_sung.tv.csfs" 0xb6398674 in pthread_cond_timedwait@@GLIBC_2.4 () from /lib/libpthread.so.0
23 Thread 0xae0d0040 (LWP 5296) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
22 Thread 0xadcff040 (LWP 5297) "DN_sung.tv.csfs" 0xae3479fa in sigill_handler(int, siginfo_t*, void*) ()
from /usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.1/libclrjit.so
21 Thread 0xad1cf040 (LWP 5317) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
20 Thread 0xabd6a040 (LWP 5338) "gmain" 0xb61b5b14 in poll ()
from /lib/libc.so.6
19 Thread 0xab9a5040 (LWP 5339) "gkdbus" 0xb61b5b14 in poll ()
from /lib/libc.so.6
18 Thread 0xac935040 (LWP 5384) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
17 Thread 0xaa1ae040 (LWP 5386) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
16 Thread 0xa9f6e040 (LWP 5387) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
15 Thread 0xa9d6e040 (LWP 5388) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
14 Thread 0xa9b3e040 (LWP 5390) "gkdbus" 0xb61b5b14 in poll ()
from /lib/libc.so.6
13 Thread 0xa64c0040 (LWP 5391) "Edbg-sys" 0xb639c524 in __lll_lock_wait ()
from /lib/libpthread.so.0
12 Thread 0xaa3ff040 (LWP 5392) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
11 Thread 0xa4003040 (LWP 5393) "DN_sung.tv.csfs" 0xb61bf768 in epoll_wait
() from /lib/libc.so.6
10 Thread 0xa2d63040 (LWP 5395) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
9 Thread 0x9ff3e040 (LWP 5409) "gkdbus" 0xb61b5b14 in poll ()
from /lib/libc.so.6
8 Thread 0xac10d040 (LWP 5461) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
7 Thread 0xad3ff040 (LWP 5465) "DN_sung.tv.csfs" 0xb61b5b14 in poll ()
from /lib/libc.so.6
6 Thread 0xacb35040 (LWP 5482) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
5 Thread 0x9e05b040 (LWP 5483) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
4 Thread 0x9de5b040 (LWP 5484) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
3 Thread 0x9dc5b040 (LWP 5485) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
2 Thread 0xad8ff040 (LWP 5486) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
* 1 Thread 0xb47cc000 (LWP 5269) "DN_sung.tv.csfs" 0xb63982c4 in pthread_cond_wait@@GLIBC_2.4 () from /lib/libpthread.so.0
(gdb) thread 22
[Switching to thread 22 (Thread 0xadcff040 (LWP 5297))]
#0 0xae3479fa in sigill_handler(int, siginfo_t*, void*) ()
from /usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.1/libclrjit.so
(gdb) bt
#0 0xae3479fa in sigill_handler(int, siginfo_t*, void*) ()
from /usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.1/libclrjit.so
#1 <signal handler called>
#2 0xadae0cbc in ?? ()
#3 0xb1e23158 in JIT_MonExit_Signal(Object*) ()
from /usr/share/dotnet/shared/Microsoft.NETCore.App/2.1.1/libcoreclr.so
#4 0xaea6adf8 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) f 2
#2 0xadae0cbc in ?? ()
(gdb) i r
r0 0x1 1
r1 0x1 1
r2 0xadcfe848 2916083784
r3 0xf98ff3e8 4186960872
r4 0xadcfe628 2916083240
r5 0x0 0
r6 0xadcfe5f4 2916083188
r7 0xadcfe738 2916083512
r8 0x0 0
r9 0xadcfe784 2916083588
r10 0x0 0
r11 0xadcfe620 2916083232
r12 0xb622e194 3055739284
sp 0xadcfe480 0xadcfe480
lr 0xb1e23159 -1310576295
pc 0xadae0cbc 0xadae0cbc
cpsr 0x600d0030 1611464752
(gdb) x/10i $pc-4
0xadae0cb8: asrs r2, r4, #15
0xadae0cba: blx r3
=> 0xadae0cbc: nop
0xadae0cbe: nop
0xadae0cc0: add sp, #32
0xadae0cc2: ldmia.w sp!, {r4, r5, r6, r10, r11, pc}
0xadae0cc6: stmdb sp!, {r4, r5, r6, r10, r11, lr}
0xadae0cca: sub sp, #32
0xadae0ccc: add.w r3, r11, #8
0xadae0cd0: str r3, [sp, dotnet/runtime#3860]
(gdb)
We are stepping in the main thread (LWP 5269), and the SIGILL occurs in a different thread (LWP 5297).
The problem occurs after a breakpoint patch is unapplied:
Breakpoint was inserted at ADAE065D for opcode bf00
DC::UP unapply patch at addr 0xADAE065D
DC::ApplyPatch at addr 0xADAE068D
Breakpoint was inserted at ADAE068D for opcode bf00
DC::UP unapply patch at addr 0xADAE068D
DC::ApplyPatch at addr 0xADAE0C99
Breakpoint was inserted at ADAE0C99 for opcode e92d
DC::UP unapply patch at addr 0xADAE0C99
DC::ApplyPatch at addr 0xADAE0CBD
Breakpoint was inserted at ADAE0CBD for opcode bf00
DC::UP unapply patch at addr 0xADAE0CBD
sigill_handler called!!!!!
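Note that the last patch address in this log and the faulting pc in frame 2 refer to the same instruction once the Thumb bit is cleared. A minimal sketch of that arithmetic, assuming the Thumb bit is bit 0 (as the THUMB_CODE mask in CORDbgSetInstruction suggests); kThumbBit and ClearThumbBit are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <cstdint>

// Assumption: bit 0 of a code address marks Thumb mode, mirroring THUMB_CODE.
constexpr std::uint32_t kThumbBit = 1u;

// Hypothetical helper: converts a patch address as printed in the log
// (Thumb bit set) to the address of the instruction that is overwritten.
inline std::uint32_t ClearThumbBit(std::uint32_t patchAddress)
{
    assert(patchAddress & kThumbBit);  // log addresses are expected to be odd
    return patchAddress & ~kThumbBit;
}
```

ClearThumbBit(0xADAE0CBD) gives 0xADAE0CBC, which is the pc reported by gdb for the thread that received the SIGILL.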
I think the reason is that patches are applied and unapplied non-atomically:
inline void CORDbgSetInstruction(CORDB_ADDRESS_TYPE* address,
                                 PRD_TYPE instruction)
{
    // In a DAC build, this function assumes the input is a host address.
    LIMITED_METHOD_DAC_CONTRACT;
    ULONG ptraddr = dac_cast<ULONG>(address);
    _ASSERTE(ptraddr & THUMB_CODE);
    ptraddr &= ~THUMB_CODE;
    *(PRD_TYPE *)ptraddr = instruction; // <-- non-atomic write
    FlushInstructionCache(GetCurrentProcess(),
                          _ClearThumbBit(address),
                          sizeof(PRD_TYPE));
}
Maybe we should use an Interlocked function to make this write atomic. What do you think?
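A minimal sketch of the idea, not a tested fix. It assumes the patched slot is a 16-bit Thumb opcode (the log shows 16-bit opcodes such as bf00 and e92d) and uses the GCC/Clang builtin __atomic_store_n in place of an Interlocked function; OpcodeSlot and SetInstructionAtomically are hypothetical names, not CoreCLR APIs:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for PRD_TYPE; assumed to be one 16-bit Thumb opcode.
using OpcodeSlot = std::uint16_t;

// Hypothetical helper: publishes the new opcode with a single atomic
// halfword store, so another thread cannot observe a partially written value.
inline void SetInstructionAtomically(OpcodeSlot* slot, OpcodeSlot instruction)
{
    // The store is only a single access if the slot is naturally aligned.
    assert(reinterpret_cast<std::uintptr_t>(slot) % alignof(OpcodeSlot) == 0);
    __atomic_store_n(slot, instruction, __ATOMIC_SEQ_CST);
    // A real patch would still need to flush the instruction cache here,
    // as CORDbgSetInstruction does with FlushInstructionCache.
}
```

An atomic store only rules out torn writes. Whether it is sufficient depends on when other cores refetch the patched instruction, so the cache flush and any required synchronization with running threads would have to be checked separately.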
Thank you!