Skip to content
This repository has been archived by the owner on Oct 12, 2022. It is now read-only.

Handle page protection errors as D errors on Linux with --DRT-memoryError=1 #2249

Closed
wants to merge 1 commit into from

Conversation

wilzbach
Copy link
Member

@wilzbach wilzbach requested a review from andralex as a code owner July 11, 2018 04:04
@dlang-bot
Copy link
Contributor

dlang-bot commented Jul 11, 2018

Thanks for your pull request, @wilzbach!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + druntime#2249"

@Geod24
Copy link
Member

Geod24 commented Jul 11, 2018

This change could use a bit more rationale. Is there a way to disable it ? Do we really want that ?

Personally I'm worried about having different behavior across platform.

@wilzbach
Copy link
Member Author

This change could use a bit more rationale.

There's an entire NG thread with people complaining about this not being the default.
To quote @schveiguy

It was controversial at the time, and considered a hack. It's also only supported on Linux. So I don't know the reason why it's not always done for Linux, I really think it should be.

Is there a way to disable it

deregisterMemoryErrorHandler

Personally I'm worried about having different behavior across platform.

Other platforms will just segfault (not really a good behavior anyhow).
Also note that there's already tons of behavior that's platform-specific, e.g. #2035 (rt_trap_exceptions isn't required on Windows because it's always set there when a debugger is detected).

@adamdruppe
Copy link
Contributor

Null pointers on win32 also throw errors btw.

@adamdruppe
Copy link
Contributor

Though while I'm not voting no, I do object to calling segfaults bad behavior. They do exactly what they are supposed to do - and exactly what a D error is allowed to do according to our spec - and have superior debugging capabilities. (I'd say superior handling capabilities too, you can handle-and-resume or retry a segfault, not so much a D exception, but we don't really lose that since you can just register your own signal handler instead.)

The exception isn't bad, but neither is the segfault, you just need to know how to actually use it and then you see the advantages.

@nemanja-boric-sociomantic
Copy link
Contributor

One good thing about segmentation fault is that you can generate core dump which you could inspect to see what caused the segmentation fault very easily. Is one still able to do that with this approach? If I'm not mistaken, the position of the segmentation fault is removed from the backtrace.

@nemanja-boric-sociomantic
Copy link
Contributor

Sorry, I read the stacktrace in a wrong way 🤦‍♂️

@JohanEngelen
Copy link
Contributor

This feels very much rushed after minimal discussion on the NG. There is even a post that mentions that in the past this was debated, so at least I expect some arguments both ways and with a conclusion of why it now should come out different than before.

Did you test how this works with debuggers and ASan and other tools?

@JohanEngelen
Copy link
Contributor

My red flag here is that with this PR per default after a page error, more code is executed that even allocates (new), uses D's exception mechanism, calls constructors of class objects, etc. ...... That sounds problematic to me.

@Shachar
Copy link

Shachar commented Jul 11, 2018

I second the detractors.
The Linux segmentation fault behavior is a well known capabilities with established ways of handling it. Turning it into an exception makes zero sense to me.

At the very least, it should be possible to turn this capability off.

@adamdruppe
Copy link
Contributor

A few notes:

  • you can turn this off by calling a function (just like right now, you can turn it on by calling the function). Maybe we should make it a DRT command line argument too. The implementation btw is it is just registers a signal handler with sigaction http://dpldocs.info/experimental-docs/source/etc.linux.memoryerror.d.html#L33 ). If running in a debugger, just like how we want uncaught exceptions to go through the debugger, these should too.

  • Executing more code after it doesn't actually bother me since the segfault interrupts the illegal operation... it is perfectly normal to handle and resume with a page fault.

This is why I could go either way on this. I like things the way they are now, but I could live with this change too. (or a compromise: print the stack trace to the console, then abort the program, but I think that has the same downsides to this basically anyway)

@Shachar
Copy link

Shachar commented Jul 12, 2018

After having slept on this, I have to raise the level of my objection to this change.

Before I begin, I'll point out that I've tried to follow the code for registerMemoryErrorHandler, but I'm not sure what the playing around with the variables called RIP actually do. As such, it might be that this objection is incorrect.

With that in mind, please keep in mind that the program might have been doing anything when it segfaulted. It might have been in the middle of a throw. It might have been in the middle of a context switch. It might have been mucking around with variables that outside scope(failure)s might be relying on. It might have been in the middle of performing five separate program lines, interleaved by the compiler, all partially completed.

Letting the program flow continue, even as a an exception, is just too likely to cause further problems. These will mask the original problem, or might even drive the program into an infinite loop of crashing, trying to throw, crashing, trying to throw.

As such, I must re-iterate how bad an idea I think it is to turn this on by default.

@wilzbach
Copy link
Member Author

Alright, I was just trying to follow up on the NG thread.
Thanks a lot for all your input and I will close this soon. Really appreciated!

However, we definitely should improve the documentation and I will keep it open until then, s.t. I don't forget this.

@jacob-carlborg
Copy link
Contributor

What about turning it on by default when the -debug flag is used?

@ibuclaw
Copy link
Member

ibuclaw commented Jul 12, 2018

-1 on the grounds that the Linux memory error module is both dmd and x86 centric, and should be improved by some measure first so that it be more useable before making it a druntime dependency.

Though that is a purely technical reason only.

@Shachar
Copy link

Shachar commented Jul 12, 2018

I think we should leave it as is. It should be documented, along with the possible pitfalls it might bring. If anyone wants it, they can add it to their code.

I don't think it should be on by default. Ever.

@schveiguy
Copy link
Member

I don't know the original reasons why this isn't the default, or why it's not supported on other platforms. What I do know, is that I enable it on Linux in any server program that I write, and it works swimmingly well.

Yes, a segfault can be useful in certain circumstances. But a stack trace is eminently easier to use for diagnostics, especially when you aren't in control of the environment. I can tell just by reading the stack trace where the problem has occurred. It may require more in-depth analysis, and perhaps using a debugger, or dumping a core file. But it also could be a "oops! I forgot to initialize that thing!" type of bug, in which case, I need nothing more than an editor to fix, and I'm done in 2 minutes. Instead of digging out the exact binary (hopefully you saved all those), loading into a debugger, hoping that the customer has core dumps turned on, hoping that they didn't just delete the file, hoping their disk has enough space to store it, etc., etc.

In regards to "different behavior on different platforms", my understanding is that Windows ALWAYS generated a stack trace for seg faults. There is plenty of precedent here.

It definitely makes sense to provide runtime switches to turn it off or on (given that this would enable it before e.g. any static ctors are run), but I still believe the default should be on. I'm not sure of the harm here, we should be exploiting the most we can from the platform in terms of providing diagnostic information.

@schveiguy
Copy link
Member

or a compromise: print the stack trace to the console, then abort the program, but I think that has the same downsides to this basically anyway

This would work too. Printing the stack trace and aborting has the advantage that you aren't going to unwind the stack, possibly running things after something very bad has happened.

@schveiguy
Copy link
Member

BTW, here is the original PR, you can read a LOT of comments in there: #187

@wilzbach wilzbach changed the title Handle page protection errors by default as D errors on Linux Handle page protection errors as D errors on Linux with --DRT-memoryError=1 Jan 2, 2019
@wilzbach
Copy link
Member Author

wilzbach commented Jan 2, 2019

So I came back to this PR and it looks like I already did the required work here, i.e. adding a flag to druntime (--DRT-memoryError=1) which allows users to enable this option without needing to recompile their binaries (see the changelog entry of this PR).

Anything else that we want to do here?

CC @thewilsonator

@thewilsonator
Copy link
Contributor

I think its fine, and a lot of the previous criticisms were about it being a default, but I still like to get some more feedback on this (but don't let me forget about it!).

@schveiguy
Copy link
Member

Since my comments on this, I have found one place where this really sucks. If you encounter a segfault inside a destructor being called by the GC, then the attempt to new an Error causes an invalid memory operation. It took me a while to track that down, as the information printed there is totally useless.

Before making this the default (I think reading the code, this is not the default, if it was before), I think we should make the memory error not depend on the GC. In fact, if there is a way to print the stack trace and abort without unwinding anything, that would be ideal, and much much safer.

@jacob-carlborg
Copy link
Contributor

jacob-carlborg commented Jan 2, 2019

If you encounter a segfault inside a destructor being called by the GC, then the attempt to new an Error causes an invalid memory operation.

Don't we have a buffer somewhere for statically allocated exceptions?

@schveiguy
Copy link
Member

Don't we have a buffer somewhere for statically allocated exceptions?

It's done elsewhere, but not for this extension. Would be a useful change, even if this PR isn't accepted.

@wilzbach wilzbach removed the Blocked label Jan 2, 2019
@wilzbach
Copy link
Member Author

wilzbach commented Jan 2, 2019

Before making this the default (I think reading the code, this is not the default, if it was before),

Yep no longer the default. This PR was down-graded to just adding the CLI-equivalent of registerMemoryErrorHandler.

Don't we have a buffer somewhere for statically allocated exceptions?

See e.g. #1710

Page protection error handling can now be registered via `--DRT-memoryError=1`

In environments where attaching a debugger or retrieving a core dump isn't possible or hard,
on Linux x86_64 one has always been able to attach druntime's memory handler:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code in the etc.linux.memoryerror module is defined for both 32 bit and 64 bit.

src/rt/dmain2.d Outdated Show resolved Hide resolved
@jacob-carlborg
Copy link
Contributor

There's no test for what happens when this flag is passed on a platform that doesn't support it.

@schveiguy
Copy link
Member

Ugh, looking again at staticError, it appears that it suppresses the stack trace. Which is completely useless. Same issue as I had with InvalidMemoryOperation that I mentioned earlier.

Somehow, we need to fix this problem. However, that shouldn't hold up this PR. All it is doing is exposing the linux memory error registration as a DRT flag.

@schveiguy
Copy link
Member

There's no test for what happens when this flag is passed on a platform that doesn't support it.

In general, unsupported or unknown DRT flags are going to be ignored. For example, if you passed this flag on an older version of druntime, nothing will happen.

@jacob-carlborg
Copy link
Contributor

In general, unsupported or unknown DRT flags are going to be ignored. For example, if you passed this flag on an older version of druntime, nothing will happen.

This is slightly different in that it's only available on some platforms. But it might not be worth testing.

@jacob-carlborg
Copy link
Contributor

Ugh, looking again at staticError, it appears that it suppresses the stack trace.

Technically it's not the staticError function but rather all Errors that use staticError that sets the info instance variable to an object that will suppress the stack trace.

Somehow, we need to fix this problem

Looking at core.runtime.defaultTraceHandler, there are two things stopping this from being @nogc: calling core.memory.gc_inFinalizer and creating an instance of DefaultTraceInfo. The latter can be fixed with again with a static allocation, not sure what to do about gc_inFinalizer.

@schveiguy
Copy link
Member

schveiguy commented Jan 2, 2019

gc_inFinalizer actually doesn't require GC since it's now lazily initialized:

bool inFinalizer() nothrow
{
return false;
}

@schveiguy
Copy link
Member

We'd have to mark inFinalizer as @nogc in the interface, but that shouldn't be a problem, as the implementation even with a GC doesn't require doing any GC-ish things.

@jacob-carlborg
Copy link
Contributor

We'd have to mark inFinalizer as @nogc in the interface, but that shouldn't be a problem, as the implementation even with a GC doesn't require doing any GC-ish things.

Yes, exactly.

I was looking into the allocation of DefaultTraceInfo, the question is: does it need to return a new instance every time or can it return the same pre-allocated instance?

@schveiguy
Copy link
Member

does it need to return a new instance every time or can it return the same pre-allocated instance?

I think you need a unique instance per exception. But if we are preallocating the exception, we can preallocate the trace info as well. Looks like the big part of the trace info is the stack frame array. The way the default trace info is stored will have to change. Looks like it's an inner class, but it doesn't need to be.

@jacob-carlborg
Copy link
Contributor

I think you need a unique instance per exception.

The way staticError currently you can only have one exception anyway. It will always overwrite the buffer.

@RazvanN7
Copy link
Contributor

RazvanN7 commented Nov 9, 2021

@schveiguy @wilzbach @thewilsonator I have rebased this. Should we merge it?

@ibuclaw
Copy link
Member

ibuclaw commented Nov 9, 2021

-1 on the grounds that the Linux memory error module is both dmd and x86 centric, and should be improved by some measure first so that it be more useable before making it a druntime dependency.

Though that is a purely technical reason only.

Still a -1, this time because it would add a circular dependency between druntime and phobos.

@RazvanN7 RazvanN7 closed this Nov 10, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet