-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stack probing in managed code for frames > 64kB doesn't work on Linux kernel >= 4.9.34 #9899
Comments
@BruceForstall , do you see any potential issues with such change from the JIT point of view? |
Impressive investigation thanks for the educational writeup @janvorli. |
@dotnet/jit-contrib |
It seems like changing the JIT to loop with:
would be ok, though slightly more code. It seems like this should work on Windows as well. We can't change SP by more than one page size, or risk missing the guard page. I wonder: what does Linux stack probing code look like (e.g., some Linux version of "chkstk")? |
I am not aware of existence of anything like that on Unix.
It may actually be just one more instruction out of the loop. This is what we had in the failure I was investigating: 0x7fe4cc260083: testq (%rsp,%rax), %rax
0x7fe4cc260087: subq $0x1000, %rax ; imm = 0x1000
0x7fe4cc26008d: cmpq $-0x1d3c0, %rax ; imm = 0xFFFFFFFFFFFE2C40
0x7fe4cc260093: jge 0x7fe4cc260083 And I think the new version can look like this: lea rax, [rsp - 0x1d3c0]
loop:
test [rsp], rax
sub rsp, 0x1000
cmp rsp, rax
jge loop |
I vaguely recall there was some reason why on windows we don't move RSP until the end of the probing sequence (or at least don't move it at the beginning). Something like RSP always has to point at a valid page. I could very well be wrong though. Let me see if I can dig it up.... |
You have to opt into it via For example, the code for method like this:
Looks like this (when compiled with
|
I don't think we'd want to move SP to it's final value before the probing happens; presumably someone could asynchronously start using our stack (maybe the VM, or an interrupt handler?), and SP wouldn't be valid. I'm surprised that's what the code @jkotas shows does. |
The you can replace test [rsp], rax by test [rsp - 0x1000], rax I actually makes my sample above correct, the way I've put it, we would not be probing the last page. |
Related: dotnet/coreclr#14481. It looks like we do currently change RSP as we probe, so I'm wondering what the current "bad" prolog probe code looks like that @janvorli was looking at. However, it appears our current probe code does not work on Windows AMD64. An exception thrown within the probe loop (e.g., on the probe instruction) fails to unwind because the static prolog unwind codes can't handle dynamically changing RSP. |
Now that dotnet/coreclr#14481 is undone, is this issue the one true "figure out how to probe the Unix stack" bug? Do we know whether there is a single pattern that can correctly probe on both Windows and Unix or we have to bifurcate? Does OSX have a required probing pattern different from Linux? |
Another question: Why don't we just call __chkstk like C++? And write proper versions for non-Windows platforms? |
@RussKeldorph the way that would work on Unix was the one that was reverted, provided we fix the unwind info issue. |
@janvorli Thanks. So it sounds like the most expedient thing is to put back dotnet/coreclr#14481 conditional on Unix only. |
...or does the unwind info need to be fixed for Unix as well? |
I think we don't need to fix the unwind info on Unix, since all stack overflow and probes are handled as failfast and we are not trying to unwind stack in such cases. So putting it back just for Unix sounds good. |
Works around https://github.com/dotnet/coreclr/issues/17170 - Works around an issue on linux-arm64 where pthread_create appears to reserve less stack space than requested without failing, and limits the max reserved stack size to the ulimit-configued value, leading to seg fault when the requested size of stack space is attempted to be used - Fixed to write every page in the stack region that is being checked, to guarantee hitting the guard page in case of failure. The JIT's stack probe for stackalloc should cover this, https://github.com/dotnet/coreclr/issues/16827 details why that was not happening, in any case this test is not intended to test stack probing behavior, so this change makes the test a bit stronger in what it actually intends to test.
Works around and closes https://github.com/dotnet/coreclr/issues/17170 - Works around an issue on linux-arm64 where pthread_create appears to reserve less stack space than requested without failing, and limits the max reserved stack size to the ulimit-configued value, leading to seg fault when the requested size of stack space is attempted to be used - Fixed to write every page in the stack region that is being checked, to guarantee hitting the guard page in case of failure. The JIT's stack probe for stackalloc should cover this, https://github.com/dotnet/coreclr/issues/16827 details why that was not happening, in any case this test is not intended to test stack probing behavior, so this change makes the test a bit stronger in what it actually intends to test.
Resolved with dotnet/coreclr#17360 |
Works around and closes https://github.com/dotnet/coreclr/issues/17170 - Works around an issue on linux-arm64 where pthread_create appears to reserve less stack space than requested without failing, and limits the max reserved stack size to the ulimit-configued value, leading to seg fault when the requested size of stack space is attempted to be used - Fixed to write every page in the stack region that is being checked, to guarantee hitting the guard page in case of failure. The JIT's stack probe for stackalloc should cover this, https://github.com/dotnet/coreclr/issues/16827 details why that was not happening, in any case this test is not intended to test stack probing behavior, so this change makes the test a bit stronger in what it actually intends to test.
* Add stack probing algorithm that moves sp while touching pages under _TARGET_UNIX_ only
I have spend some time during the last week tracing down an issue reported by @pavlexander (#16462) that he had when running his .NET app on Debian 9.3.
It manifested itself as a sigsegv when running a static constructor. And it was also intermittent, happening on some runs and not happening on others. And also not happening when running under a debugger. Until @pavlexander found that it is due to the fact that debuggers disable ASLR by default. After enabling it, it started to repro under the debugger too. But only on Debian 9.3, not e.g. on Ubuntu 14.04 or 16.04 etc.
This static constructor has enormous frame size, almost 128kB. So depending on where the ASLR set the initial RSP for the process, the used stack size either crossed the initial stack size of 128kB or not. And if it did, it has crashed during the stack probing. That was pretty strange since this was running on the primary thread and the maximum stack size was 8MB. So the kernel did not convert the fault to stack expansion on the affected system.
I’ve written a little testing app in C that basically does the same thing as the probing generated by the JIT. And on all my systems other than the Debian 9.3 (I even had Debian 9.0 and it was ok), it was working fine. On the Debian 9.3, it was crashing with sigsegv.
To make the story shorter, I’ve found that it is a kernel version dependent thing. By bisection and building and testing various kernel versions, I’ve found that it started to happen on kernel 4.9.34. Further bisection identified the linux kernel commit that changed that (torvalds/linux@cfc0eb4). It was a fix to a potential issue fixed by a rewrite of the guard page handling stuff. I spent some time reading and debugging the related portions of the kernel source and it turns out that basically, we were just lucky it has worked before this change. There is this comment close to the stack expansion invocation:
There is a check that tests if the fault address is farther than (65536 + 32 * 8) bytes from the stack pointer and if it is, it refuses to expand the stack. But before this check, there is another check testing if the fault address is inside of a range stored in a virtual memory descriptor for the stack. And that’s where the new and old kernels differ. The new kernel includes only the committed portion of the stack virtual memory range while the old one was including the guard page in that range too.
That means that on the old kernel, our probing has hit the guard page and since it was inside of the range described by the virtual memory descriptor for the stack, the test for the distance between SP and the failure address was skipped, the new page committed and the range expanded by another guard page.
On the new kernel, we hit the guard page, it is not part of the range for the stack anymore, so it checks the distance between the failure address and the RSP, finds that it is too large and so it passes sigsegv to our process.
I’ve tested even the latest Linux kernel 4.15 and this new behavior persists.
That means that in order to make stack probing work correctly on the new kernels too, we will need to modify the probing so that it moves the RSP as it probes (or at least once every 64kB). Btw, it looks like this check of the address to SP distance is there only for x86 / x64 and not for ARM / ARM64.
The text was updated successfully, but these errors were encountered: