When we wrote the first implementation of the SYSCALL instruction, we made an effort to ensure that if there's a crash inside the system call implementation, we can "backtrace" in gdb and see both the call chain inside the SYSCALL implementation, and the call chain which led to calling the system call. To make sure this works correctly, we needed to properly use CFI tags in the syscall_entry code, and also properly set up %rbp for the benefit of old-style frame-pointer-chasing code (e.g., our backtrace_safe).
Unfortunately, commit 3f2ca0c, which introduced a separate syscall stack, broke this. The main problem is that gdb normally expects deeper stack frames to have lower addresses. If we switch the stack to a completely different address and it happens to be a higher one, gdb stops with the message "Backtrace stopped: previous frame inner to this frame (corrupt stack?)".
But this shouldn't be hard to fix. This is not the only case where we switch stacks in the middle of a thread's run - we also do this in two other cases: in exceptions (in the x86 sense, not the C++ sense :-)) and in signal handling. The way it works there is that we prepare a "signal frame" with a special format and mark it with `.cfi_signal_frame`. When gdb sees this, it believes this is a signal handler, and accepts that the stack pointer changed arbitrarily. In "backtrace" you see this as a specially marked frame, "signal handler called" or something like that (we see this same text even in the exception case).
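To make the reasoning above concrete, here is a minimal sketch of the kind of sanity check an unwinder applies. It is a simplified model, not gdb's or OSv's actual code: the `Frame` struct, `unwind_depth()`, `sample_frames()` and all addresses are hypothetical, and the assumption that the check is skipped when either neighbouring frame is a signal frame is only an approximation of what gdb does.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of one unwound frame (not a gdb or OSv data structure).
struct Frame {
    std::uintptr_t cfa;  // stack address identifying this frame
    bool signal_frame;   // true if its CFI carries .cfi_signal_frame
    const char* name;    // label, for illustration only
};

// Walks frames from innermost (index 0) outward and returns how many frames
// a gdb-like unwinder would show before giving up.
std::size_t unwind_depth(const std::vector<Frame>& frames) {
    if (frames.empty()) return 0;
    std::size_t shown = 1;
    for (std::size_t i = 1; i < frames.size(); ++i) {
        const Frame& callee = frames[i - 1];
        const Frame& caller = frames[i];
        // The x86-64 stack grows down, so a caller normally sits at a higher
        // address than its callee. A caller at a LOWER address looks like
        // "previous frame inner to this frame".
        bool looks_corrupt = caller.cfa < callee.cfa;
        // Assumption: a signal frame on either side disables the check.
        bool exempt = callee.signal_frame || caller.signal_frame;
        if (looks_corrupt && !exempt) break;
        ++shown;
    }
    return shown;
}

// Made-up layout: the syscall stack lies above the application stack.
std::vector<Frame> sample_frames(bool mark_entry_as_signal_frame) {
    return {
        {0x90000100u, false, "gettid"},
        {0x90000200u, mark_entry_as_signal_frame, "syscall_entry"},
        {0x70000100u, false, "main"},
    };
}
```

With these made-up addresses, `unwind_depth(sample_frames(false))` stops after the two frames on the syscall stack, while `unwind_depth(sample_frames(true))` continues on to the caller's frame on the application stack, which is the effect hoped for from marking the syscall entry frame.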
It's fairly easy to test this by adding an abort in gettid() used in tests/tst-syscall.so, running that test, trying gdb's "backtrace" after the crash, and also looking at the crash-time backtrace printed by OSv (using backtrace_safe()). For extra reassurance, change the Makefile to use `-fomit-frame-pointer` instead of `-fno-omit-frame-pointer` so that gdb has to rely entirely on CFI (backtrace_safe() won't work then, of course).