Handle page protection errors as D errors on Linux with --DRT-memoryError=1 #2249
Thanks for your pull request, @wilzbach!
Bugzilla references: Your PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.
Testing this PR locally: If you don't have a local development environment setup, you can use Digger to test this PR: dub run digger -- build "master + druntime#2249" |
This change could use a bit more rationale. Is there a way to disable it? Do we really want that? Personally, I'm worried about having different behavior across platforms. |
There's an entire NG thread with people complaining about this not being the default.
deregisterMemoryErrorHandler
Other platforms will just segfault (not really a good behavior anyhow). |
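For readers wondering what opting in and out looks like in code, here is a minimal sketch. It assumes a Linux target on which `etc.linux.memoryerror` provides `registerMemoryErrorHandler` and `deregisterMemoryErrorHandler`; the `withMemoryErrors` helper is hypothetical and exists only for illustration.

```d
import etc.linux.memoryerror;
import std.stdio : writeln;

/// Hypothetical helper: runs `dg` with the memory error handler active and
/// removes the handler again afterwards. Returns whether it was active.
bool withMemoryErrors(scope void delegate() dg)
{
    // The functions are only declared on supported targets, so probe at compile time.
    static if (is(typeof(registerMemoryErrorHandler)))
    {
        if (!registerMemoryErrorHandler())
        {
            dg();
            return false;
        }
        // Opt back out on every exit path; later faults are plain segfaults again.
        scope (exit) deregisterMemoryErrorHandler();
        dg();
        return true;
    }
    else
    {
        dg();
        return false;
    }
}

void main()
{
    const active = withMemoryErrors(() { /* code that may touch bad memory */ });
    writeln("handler was active: ", active);
}
```

Scoping the registration this way keeps the default segfault behavior everywhere else in the program.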
Null pointers on win32 also throw errors btw. |
Though while I'm not voting no, I do object to calling segfaults bad behavior. They do exactly what they are supposed to do - and exactly what a D error is allowed to do according to our spec - and have superior debugging capabilities. (I'd say superior handling capabilities too, you can handle-and-resume or retry a segfault, not so much a D exception, but we don't really lose that since you can just register your own signal handler instead.) The exception isn't bad, but neither is the segfault, you just need to know how to actually use it and then you see the advantages. |
One good thing about a segmentation fault is that you can generate a core dump, which you can inspect to see what caused the segmentation fault very easily. Is one still able to do that with this approach? If I'm not mistaken, the position of the segmentation fault is removed from the backtrace. |
Sorry, I read the stacktrace in a wrong way 🤦♂️ |
This feels very much rushed after minimal discussion on the NG. There is even a post that mentions that in the past this was debated, so at least I expect some arguments both ways and with a conclusion of why it now should come out different than before. Did you test how this works with debuggers and ASan and other tools? |
My red flag here is that with this PR per default after a page error, more code is executed that even allocates ( |
I second the detractors. At the very least, it should be possible to turn this capability off. |
A few notes:
This is why I could go either way on this. I like things the way they are now, but I could live with this change too. (or a compromise: print the stack trace to the console, then abort the program, but I think that has the same downsides to this basically anyway) |
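The "print the stack trace, then abort" compromise can be sketched roughly as follows. This is an illustration only, not code from this PR: it assumes Linux with glibc, the names `dumpAndAbort` and `installDumpHandler` are hypothetical, and glibc's `backtrace` may itself allocate on first use, so it is not strictly safe inside a signal handler.

```d
import core.stdc.stdlib : abort;
import core.sys.linux.execinfo : backtrace, backtrace_symbols_fd;
import core.sys.posix.signal;
import core.sys.posix.unistd : STDERR_FILENO, write;
import std.stdio : writeln;

// Hypothetical handler: report the fault, dump raw frames, then abort.
// Nothing is thrown, so no stack unwinding and no destructors run.
extern (C) void dumpAndAbort(int sig, siginfo_t* info, void* ctx) nothrow
{
    static immutable header = "invalid memory access, backtrace:\n";
    write(STDERR_FILENO, header.ptr, header.length);

    void*[64] frames;
    const count = backtrace(frames.ptr, cast(int) frames.length);
    // Writes the symbolized frames straight to the file descriptor.
    backtrace_symbols_fd(frames.ptr, count, STDERR_FILENO);

    // abort() raises SIGABRT, so a core dump is still produced if enabled.
    abort();
}

// Installs the handler for SIGSEGV; returns true on success.
bool installDumpHandler() nothrow
{
    sigaction_t sa;
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = &dumpAndAbort;
    sa.sa_flags = SA_SIGINFO;
    return sigaction(SIGSEGV, &sa, null) == 0;
}

void main()
{
    writeln("handler installed: ", installDumpHandler());
}
```

Because the handler never returns into D code, the state of the program at the time of the fault is left untouched for a debugger or core dump.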
After having slept on this, I have to raise the level of my objection to this change. Before I begin, I'll point out that I've tried to follow the code for With that in mind, please keep in mind that the program might have been doing anything when it segfaulted. It might have been in the middle of a Letting the program flow continue, even as an exception, is just too likely to cause further problems. These will mask the original problem, or might even drive the program into an infinite loop of crashing, trying to throw, crashing, trying to throw. As such, I must reiterate how bad an idea I think it is to turn this on by default. |
Alright, I was just trying to follow up on the NG thread. However, we definitely should improve the documentation, and I will keep this open until then so that I don't forget it. |
What about turning it on by default when the |
-1 on the grounds that the Linux memory error module is both dmd and x86 centric, and should be improved by some measure first so that it is more usable before making it a druntime dependency. Though that is a purely technical reason only. |
I think we should leave it as is. It should be documented, along with the possible pitfalls it might bring. If anyone wants it, they can add it to their code. I don't think it should be on by default. Ever. |
I don't know the original reasons why this isn't the default, or why it's not supported on other platforms. What I do know, is that I enable it on Linux in any server program that I write, and it works swimmingly well. Yes, a segfault can be useful in certain circumstances. But a stack trace is eminently easier to use for diagnostics, especially when you aren't in control of the environment. I can tell just by reading the stack trace where the problem has occurred. It may require more in-depth analysis, and perhaps using a debugger, or dumping a core file. But it also could be a "oops! I forgot to initialize that thing!" type of bug, in which case, I need nothing more than an editor to fix, and I'm done in 2 minutes. Instead of digging out the exact binary (hopefully you saved all those), loading into a debugger, hoping that the customer has core dumps turned on, hoping that they didn't just delete the file, hoping their disk has enough space to store it, etc., etc. In regards to "different behavior on different platforms", my understanding is that Windows ALWAYS generated a stack trace for seg faults. There is plenty of precedent here. It definitely makes sense to provide runtime switches to turn it off or on (given that this would enable it before e.g. any static ctors are run), but I still believe the default should be on. I'm not sure of the harm here, we should be exploiting the most we can from the platform in terms of providing diagnostic information. |
This would work too. Printing the stack trace and aborting has the advantage that you aren't going to unwind the stack, possibly running things after something very bad has happened. |
BTW, here is the original PR, you can read a LOT of comments in there: #187 |
So I came back to this PR and it looks like I already did the required work here, i.e. adding a flag to druntime ( Anything else that we want to do here? |
I think it's fine, and a lot of the previous criticisms were about it being a default, but I'd still like to get some more feedback on this (but don't let me forget about it!). |
Since my comments on this, I have found one place where this really sucks. If you encounter a segfault inside a destructor being called by the GC, then the attempt to Before making this the default (I think reading the code, this is not the default, if it was before), I think we should make the memory error not depend on the GC. In fact, if there is a way to print the stack trace and abort without unwinding anything, that would be ideal, and much much safer. |
Don't we have a buffer somewhere for statically allocated exceptions? |
It's done elsewhere, but not for this extension. Would be a useful change, even if this PR isn't accepted. |
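A rough sketch of what a statically allocated error could look like, assuming a druntime recent enough to ship `core.lifetime.emplace`. The `MemoryAccessError` class, the `errorStore` buffer and the `preallocatedError` function are all hypothetical stand-ins, not the types used by `etc.linux.memoryerror`. Note that this only avoids allocating the error object itself; the runtime may still allocate trace info when the error is thrown.

```d
import core.lifetime : emplace;
import std.stdio : writeln;

// Hypothetical error type, standing in for the ones the handler throws.
class MemoryAccessError : Error
{
    this(string msg) @nogc nothrow pure @safe
    {
        super(msg);
    }
}

// Static (thread-local) storage large enough for exactly one instance.
align(16) void[__traits(classInstanceSize, MemoryAccessError)] errorStore;

// Constructs the error inside the static buffer instead of on the GC heap.
// Each call reuses the same storage, so only one such error can be live at a time.
MemoryAccessError preallocatedError(string msg) @nogc nothrow
{
    return emplace!MemoryAccessError(errorStore[], msg);
}

void main()
{
    try
    {
        throw preallocatedError("invalid memory access");
    }
    catch (MemoryAccessError e)
    {
        writeln("caught: ", e.msg);
    }
}
```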
Yep no longer the default. This PR was down-graded to just adding the CLI-equivalent of
See e.g. #1710 |
Page protection error handling can now be registered via `--DRT-memoryError=1`

In environments where attaching a debugger or retrieving a core dump is hard or impossible,
on Linux x86_64 one has always been able to attach druntime's memory handler:
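As an illustration of attaching the handler from code, here is a minimal sketch. It assumes a Linux x86 or x86_64 target where `etc.linux.memoryerror` declares `registerMemoryErrorHandler` and `NullPointerError`; `getNull` and `nullWriteIsCaught` are hypothetical helpers, and how an optimizing compiler treats the null write is not guaranteed.

```d
import etc.linux.memoryerror;
import std.stdio : stderr;

// Hypothetical helper that keeps the null out of sight of the compiler.
int* getNull() { return null; }

/// Hypothetical check: returns true if a null write surfaced as a D error.
bool nullWriteIsCaught()
{
    // The handler is only declared on supported targets.
    static if (is(typeof(registerMemoryErrorHandler)))
    {
        if (!registerMemoryErrorHandler())
            return false;
        scope (exit) deregisterMemoryErrorHandler();
        try
        {
            *getNull() = 42; // faults; the handler turns this into an Error
        }
        catch (NullPointerError e)
        {
            stderr.writeln(e); // message plus stack trace
            return true;
        }
        return false;
    }
    else
    {
        return false;
    }
}

void main()
{
    stderr.writeln("null write caught: ", nullWriteIsCaught());
}
```

Catching the error here is for demonstration; in a real program the uncaught error would terminate the process and print the stack trace.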
The code in the `etc.linux.memoryerror` module is defined for both 32 bit and 64 bit.
There's no test for what happens when this flag is passed on a platform that doesn't support it. |
Ugh, looking again at Somehow, we need to fix this problem. However, that shouldn't hold up this PR. All it is doing is exposing the linux memory error registration as a DRT flag. |
In general, unsupported or unknown DRT flags are going to be ignored. For example, if you passed this flag on an older version of druntime, nothing will happen. |
This is slightly different in that it's only available on some platforms. But it might not be worth testing. |
Technically it's not the
Looking at |
druntime/src/gc/impl/proto/gc.d Lines 234 to 237 in 01daddf
|
We'd have to mark |
Yes, exactly. I was looking into the allocation of |
I think you need a unique instance per exception. But if we are preallocating the exception, we can preallocate the trace info as well. Looks like the big part of the trace info is the stack frame array. The way the default trace info is stored will have to change. Looks like it's an inner class, but it doesn't need to be. |
The way |
Branch force-pushed from 09e0840 to fb7c309.
@schveiguy @wilzbach @thewilsonator I have rebased this. Should we merge it? |
Still a -1, this time because it would add a circular dependency between druntime and phobos. |
See also: https://forum.dlang.org/post/pi31ab$me3$1@digitalmars.com