Allow debugging through a SYSCALL instruction #973

Open
nyh opened this issue May 16, 2018 · 0 comments

nyh commented May 16, 2018

When we wrote the first implementation of the SYSCALL instruction, we made an effort to ensure that if there's a crash inside the system call implementation, we can "backtrace" in gdb and see both the call chain inside the SYSCALL implementation, and the call chain which led to calling the system call. To make sure this works correctly, we needed to properly use CFI tags in the syscall_entry code, and also properly set up %rbp for the benefit of old-style frame-pointer-chasing code (e.g., our backtrace_safe).
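As a reminder of what "frame-pointer chasing" means, here is a minimal sketch in C. It is not OSv's `backtrace_safe()`; the names `frame_record` and `walk_frame_chain` are made up for illustration. It assumes the usual x86-64 layout produced by `-fno-omit-frame-pointer`, where `%rbp` points at the saved caller `%rbp` and the return address sits directly above it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed frame layout with -fno-omit-frame-pointer on x86-64:
 * %rbp points at the saved caller %rbp, followed by the return address. */
struct frame_record {
    const struct frame_record *caller;  /* saved %rbp of the caller */
    uintptr_t return_address;           /* pushed by the CALL instruction */
};

/* Hypothetical helper: follow the saved-%rbp chain starting at fp,
 * store up to max return addresses in out, and return how many were
 * stored. The walk ends at a NULL link. */
size_t walk_frame_chain(const struct frame_record *fp,
                        uintptr_t *out, size_t max)
{
    assert(out != NULL);
    size_t n = 0;
    while (fp != NULL && n < max) {
        out[n++] = fp->return_address;
        fp = fp->caller;
    }
    return n;
}
```

On a real stack one would start from `__builtin_frame_address(0)` (a GCC/Clang builtin) and would also need to validate each pointer before dereferencing it. The point is that the chain only leads back into the application's frames if the entry code stores a sensible `%rbp` link.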

Unfortunately, commit 3f2ca0c, which introduced a separate syscall stack, broke this. The main problem is that gdb normally expects deeper stack frames to have lower addresses. If we switch the stack to a completely different address and it happens to be a higher one, gdb stops with a "Backtrace stopped: previous frame inner to this frame (corrupt stack?)" message.

But this shouldn't be hard to fix. This is not the only case where we switch stacks in the middle of a thread's run - we also do this in two other cases: in exceptions (in the x86 sense, not the C++ sense :-)) and in signal handling. The way it works there is that we prepare a "signal frame" with a special format and mark it with .cfi_signal_frame. When gdb sees this, it believes this is a signal handler, and thinks it's fine that the stack pointer changed arbitrarily. In "backtrace" you see this as a specially marked line saying "signal handler called" or something like that (we see this same text even in the exception case).
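To make the reasoning concrete, here is a toy model in C of the sanity check described above. It is a simplified sketch, not gdb's actual code: the types, the function name `accepted_frames`, and the rule as written here are assumptions based only on the behaviour described in this issue (stack grows down, and a frame marked as a signal frame is exempt from the address-ordering check).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum frame_kind { FRAME_NORMAL, FRAME_SIGNAL };

struct frame {
    uintptr_t cfa;          /* stack address identifying the frame */
    enum frame_kind kind;   /* FRAME_SIGNAL models .cfi_signal_frame */
};

/* frames[0] is the innermost (most recently called) frame.
 * Returns how many frames the modelled unwinder would print before
 * giving up with "previous frame inner to this frame". */
size_t accepted_frames(const struct frame *frames, size_t count)
{
    assert(frames != NULL || count == 0);
    if (count == 0) {
        return 0;
    }
    size_t i;
    for (i = 1; i < count; i++) {
        bool both_normal = frames[i - 1].kind == FRAME_NORMAL &&
                           frames[i].kind == FRAME_NORMAL;
        /* With a downward-growing stack, a caller must not sit at a
         * lower address than its callee, unless a signal frame is
         * involved in the pair. */
        if (both_normal && frames[i].cfa < frames[i - 1].cfa) {
            break;
        }
    }
    return i;
}
```

With made-up addresses, a syscall stack at 0x9000 and an application stack at 0x5000 give a chain like `{0x9000, 0x9100, 0x5000, 0x5100}`. If all four frames are normal, the model stops after two frames. If the second one (the entry frame) is marked `FRAME_SIGNAL`, all four are accepted.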

It's fairly easy to test this by adding an abort in gettid() used in tests/tst-syscall.so, running that test, and trying gdb's "backtrace" after the crash, and also looking at the crash-time backtrace printed by OSv (using backtrace_safe()). For extra reassurance, change the Makefile to use -fomit-frame-pointer instead of -fno-omit-frame-pointer so that gdb has to rely entirely on CFI (backtrace_safe() won't work then, of course).
