Sigsegv as exception #187

deadalnix · 2012-03-22T23:32:59Z

When core.nullpointererror is imported into a project, it turns null dereferences into NullPointerError and other segfaults into SignalError on Linux x86 and x86_64.

If the idea is successful and is implemented on other systems (Windows, macOS, FreeBSD), it could become the standard behavior. For now, it only behaves that way when the module is explicitly imported.

This is related to #181, which should be merged first.

MartinNowak · 2012-03-28T17:44:00Z

Translating signals to exceptions is highly questionable.
It was already a bad decision on Windows and we shouldn't
try to emulate it.
I'm rather worried about the ABI instabilities of ucontext_t.
Signal handlers are set per process.
- you need to do something with existing handlers
- what if a segfault occurs in a non-D thread

MartinNowak · 2012-03-28T17:47:15Z

src/core/nullpointererror.d

+
+// Init
+
+shared static this() {


This is odd.
It means that this mechanism is activated by importing the module.
Instead it should be a function like installHandler.

That is fine with me. It can definitely be changed.
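As a rough illustration of the explicit registration suggested here, the sketch below installs a SIGSEGV handler from a function call instead of from a module constructor, and remembers the previous handler so it can be restored. The names installHandler, uninstallHandler, onSegv and previousAction are hypothetical and are not taken from the pull request; the handler body is only a placeholder, and the sketch is not thread-safe.

```d
import core.sys.posix.signal;

// All names below are hypothetical; this is not the code from the pull request.
private __gshared sigaction_t previousAction; // handler that was active before ours
private __gshared bool installed;

extern (C) void onSegv(int sig, siginfo_t* info, void* context)
{
    // Placeholder. A real handler must not simply return, because the
    // faulting instruction would then be retried and fault again.
}

/// Installs the SIGSEGV handler. Returns false if it is already installed
/// or if sigaction fails.
bool installHandler()
{
    if (installed)
        return false;

    sigaction_t action;
    action.sa_sigaction = &onSegv;
    action.sa_flags = SA_SIGINFO;
    sigemptyset(&action.sa_mask);

    if (sigaction(SIGSEGV, &action, &previousAction) != 0)
        return false;
    installed = true;
    return true;
}

/// Restores whatever handler was active before installHandler.
bool uninstallHandler()
{
    if (!installed)
        return false;
    if (sigaction(SIGSEGV, &previousAction, null) != 0)
        return false;
    installed = false;
    return true;
}

void main()
{
    immutable first = installHandler();
    immutable second = installHandler(); // already installed, so this fails
    immutable removed = uninstallHandler();
    assert(first && !second && removed);
}
```

With this shape, importing the module has no side effect; a program opts in by calling installHandler at startup.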

deadalnix · 2012-03-28T19:07:49Z

> Translating signals to exceptions is highly questionable. It was already a bad decision on Windows and we shouldn't try to emulate it.

I think your point here is very weak. No argument whatsoever here. Let me state why I think it is a good idea.

- It allows recovery from a null pointer dereference. This isn't an error that screws up the state of your program, and it is definitely recoverable. Throwing allows recovery.
- It provides a stack trace when things go wrong, even if you are not running the program under a debugger.
- You can exit your program cleanly on a null dereference.
- If you call something that is @safe, then unless things go REALLY wrong, in a way your program can do little about, you can now ensure that things will not crash (dereferencing null is @safe).

Additionally, it opens some doors:

- It is now possible to implement things consistently on Windows and Linux, making it easier to write cross-platform code.
- The proposed code allows doing something other than throwing. This can evolve into custom handlers that depend on the memory block (something similar to libsigsegv).
- This kind of trick is mandatory to implement a concurrent GC. It is a direction we want to go in.

I asked Linux kernel hackers about this: ucontext_t is supported, even if it is not well documented. You can ask feep if you doubt my claim.

About existing handlers: I think a user playing with signal handlers is experienced enough to know what he/she is doing and to make sure not to collide with this. If the segfault occurs in a non-D thread (or even in non-D code), things will work as well if you recover. If you throw, this will go wrong, as D's exceptions are not compatible with C/C++. Anyway, without the exception, things go wrong as well. A global thread-local flag can be set to ensure this behavior doesn't trigger in non-D threads.
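One way such a thread-local flag might work is sketched below. It relies on two D properties: module-level variables are thread-local by default, and a `static this()` module constructor runs once per thread known to the D runtime, so a thread created directly with pthread_create never sets the flag. The names isDThread, probe and flagSeenInRawThread are hypothetical; a handler would consult the flag and fall back to the default action when it is false.

```d
import core.sys.posix.pthread;

// Thread-local by default in D. Hypothetical flag, not part of the pull request.
bool isDThread;

// Runs once in every thread that the D runtime starts or knows about.
static this()
{
    isDThread = true;
}

// Thread entry point for a raw pthread: reports the flag as seen from there.
extern (C) void* probe(void* result) nothrow @nogc
{
    *cast(bool*) result = isDThread;
    return null;
}

/// Starts a thread behind the runtime's back and returns the flag value it saw.
bool flagSeenInRawThread()
{
    bool seen = true;
    pthread_t tid;
    immutable rc = pthread_create(&tid, null, &probe, &seen);
    assert(rc == 0);
    pthread_join(tid, null);
    return seen;
}

void main()
{
    assert(isDThread);              // the main thread is a D thread
    assert(!flagSeenInRawThread()); // the raw pthread never ran the constructor
}
```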

MartinNowak · 2012-03-28T20:38:42Z

> it allows recovery from a null pointer dereference.

No, you cannot, because you corrupt your entire program state the moment
you do not return from the signal handler. After that you may only call
async-signal-safe functions.

http://www.google.com/search?q=bug%20signal%20handler%20deadlock

> provide a stacktrace for when things go

If you want to improve the current behavior then try to print
useful information from within the signal handler.
That will be hard but it might be possible, among other functions
you cannot use malloc/printf.
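A minimal sketch of reporting from inside the handler, assuming only async-signal-safe calls are allowed: the handler below uses write(2) and _exit(2) and nothing else. The names reportAndExit and crashInChild are hypothetical, the message is a fixed string rather than a real backtrace, and the fault is simulated with raise(SIGSEGV) in a forked child so that the parent can observe the exit status.

```d
import core.stdc.signal : raise;
import core.sys.posix.signal;
import core.sys.posix.sys.wait : waitpid, WIFEXITED, WEXITSTATUS;
import core.sys.posix.unistd : fork, write, _exit, STDERR_FILENO;

// Hypothetical handler: restricted to async-signal-safe functions.
extern (C) void reportAndExit(int sig, siginfo_t* info, void* context)
{
    static immutable msg = "fatal: SIGSEGV received\n";
    write(STDERR_FILENO, msg.ptr, msg.length); // no printf, no malloc
    _exit(1);
}

/// Forks a child that installs the handler and raises SIGSEGV.
/// Returns the child's exit code, or -1 if it did not exit normally.
int crashInChild()
{
    immutable pid = fork();
    if (pid == 0)
    {
        sigaction_t action;
        action.sa_sigaction = &reportAndExit;
        action.sa_flags = SA_SIGINFO;
        sigemptyset(&action.sa_mask);
        sigaction(SIGSEGV, &action, null);
        raise(SIGSEGV);
        _exit(0); // not reached when the handler runs
    }

    int status;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

void main()
{
    assert(crashInChild() == 1);
}
```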

deadalnix · 2012-03-28T20:53:28Z

I think you totally misunderstood the piece of code you are commenting. The whole point of the code in the handler is to provide a way for another piece of code OUTSIDE the handler, to handle the segfault.

The state of the program isn't corrupted, so it is safe to throw and even to recover from it. I even have a piece of code where I use mprotect to trigger segfaults and fully recover without throwing, using the exact same method.

It is recoverable, just try it if you don't believe me.
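The mprotect case mentioned above can be illustrated with a simpler variant than the one in the pull request. This sketch does not redirect the instruction pointer; it is the libsigsegv-style approach where the handler makes the protected page accessible and returns, so the kernel retries the faulting store. It assumes Linux; guardedPage, onFault and writeThroughFault are hypothetical names, and calling mprotect from a handler is common practice but is not on the POSIX list of async-signal-safe functions.

```d
import core.stdc.signal : signal, SIG_DFL;
import core.sys.posix.signal;
import core.sys.posix.sys.mman;
import core.sys.posix.unistd : sysconf, _SC_PAGESIZE;

__gshared void* guardedPage; // page mapped with PROT_NONE
__gshared size_t pageSize;
__gshared int faults;        // number of faults handled so far

extern (C) void onFault(int sig, siginfo_t* info, void* context)
{
    immutable addr = cast(size_t) info.si_addr;
    immutable base = cast(size_t) guardedPage;
    if (addr >= base && addr < base + pageSize)
    {
        ++faults;
        // Make the page usable; the faulting instruction is retried on return.
        mprotect(guardedPage, pageSize, PROT_READ | PROT_WRITE);
        return;
    }
    // Not our page: restore the default action so the retry ends the process.
    signal(SIGSEGV, SIG_DFL);
}

/// Writes to a protected page and returns the value read back afterwards.
int writeThroughFault()
{
    pageSize = cast(size_t) sysconf(_SC_PAGESIZE);
    guardedPage = mmap(null, pageSize, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
    assert(guardedPage != MAP_FAILED);

    sigaction_t action;
    action.sa_sigaction = &onFault;
    action.sa_flags = SA_SIGINFO;
    sigemptyset(&action.sa_mask);
    immutable rc = sigaction(SIGSEGV, &action, null);
    assert(rc == 0);

    auto p = cast(int*) guardedPage;
    *p = 42; // faults once, the handler unprotects the page, the store is retried
    return *p;
}

void main()
{
    assert(writeThroughFault() == 42);
    assert(faults == 1);
}
```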

MartinNowak · 2012-03-28T22:27:09Z

Yeah, you only reset the instruction pointer but that isn't much different
from performing a longjmp from inside the signal handler.

You still interrupt functions at arbitrary points, and exception handling
won't help you with scope cleanup because the code is not prepared
to handle them. These are similar issues as with exception-unsafe code,
only that the compiler will make it even worse and it may occur at
any instruction.

Logger logger;
void setLogger()
{
    logger = new Logger;
    scope (failure) logger = null;
    logger.init(getSomething());
}

If getSomething caused a segfault your program is corrupted.
Nobody will clean up the initializer, because the compiler thinks
getSomething is nothrow.

Likewise if a segfault happened in malloc you may not call it again.
It isn't async-safe and you have left it in whatever state it was.

deadalnix · 2012-03-28T23:05:43Z

No, this isn't like longjmp! In this case, you know that the whole program is back in the right state (longjmp doesn't guarantee that).

And the only registers altered are trash registers, so it is safe with regard to try/catch blocks. As for your example, it currently works (tested as well) with the code generated by dmd.

I feel like you are making up problems that don't even exist here, and you don't even consider the advantages that such a mechanism provides.

JakobOvrum · 2012-03-28T23:18:31Z

> If the idea is successful and is implemented on other systems (Windows, macOS, FreeBSD), it could become the standard behavior.

This is already implemented on Windows as the default behaviour. This needs more thought; with this patch it will even throw different exceptions on Linux and Windows, which is a completely unnecessary inconsistency.

deadalnix · 2012-03-29T07:20:24Z

Yes, this is inconsistent. And yes, this needs to be made consistent with Windows.

The point is that the exception thrown on Windows is system-specific. I would rather have both throw a NullPointerError on a null dereference, so we have a system-agnostic way of catching this. This definitely needs to converge.

I didn't want to change the existing behavior on Windows, so that this approach can be tested without breaking anything. Ultimately, this is linked to the problem that @safe allows unsafe things to be done when passed large objects/pointers that are null. How to solve that problem is something we have to decide before changing the existing behavior, or we risk changing it twice, which is something we want to avoid.

FeepingCreature · 2012-05-01T17:11:11Z

For reference, I cite ##kernel on freenode: http://pastebin.com/VEeZYPRJ . So it seems to be "officially" supported.

The main advantage over longjmp is that it definitely ends up with the handler function called from a state that is "safe", since it reuses the kernel's existing return-to-previous-context mechanism. So you don't have to emulate whatever cleanup the kernel does at signal handler exit.

andralex · 2012-07-08T22:18:34Z

Undecided on what to do about this. Should we discuss the matter further in the newsgroup? I know @WalterBright has a dim view on converting null pointer accesses into exceptions.

andralex · 2012-07-08T22:18:49Z

(it ain't rebased either :o))

deadalnix · 2012-07-08T22:30:20Z

Indeed, this pull request involves a language design decision and may need to be discussed in the newsgroup. The problem to be fixed is in fact much larger than what this pull request does: it is about null dereference handling in D as a whole.

This pull request opens the door to unified behavior between Linux and Windows. It is also easy to provide a callback to do whatever is wanted (HALT if one thinks that is better, or throw). It is even possible to recover in some situations (a concurrent GC is a great use case for that capability, for instance).

Note that I have been using this in all the D code I produce, and it has been of great help.

deadalnix · 2012-07-09T12:12:59Z

I rebased the pull request. git wasn't able to merge the stuff by itself.

MartinNowak · 2012-07-10T10:02:31Z

There are still tons of issues.

Your signal handler is global.

Do you throw D exceptions in any language?
What to do when you overwrite an existing signal?

This applies to all D executables and all D libraries.

Would you want libfontconfig to steal your sigsegv handler?

You'll get deadlocks which are even worse than crashing.

What to do w.r.t. signal-unsafe functions?

You cannot recover without unwinding/destruction support.

How could this be implemented with enregistered variables?
Why shall we make a non-standard unsafe function which will cause difficult to find
bugs the default behavior when you can make it a library?

This has absolutely no place in production code because it is unpredictable and unreliable.

deadalnix · 2012-07-10T12:40:11Z

I'm sorry, but I don't think most of these are problems.

First, yes, it throws a D exception, but not in any language, because IT IS IN THE D RUNTIME. So it throws D exceptions in D. If you interface D with other languages, then you have to make sure that the interfacing makes sense. This is nothing new and has nothing to do with this pull request specifically.

As for a third-party library stealing the signal handler, it is a possibility. However, the signal mechanism is already used in druntime. So either we consider this a problem, or we don't. But it is rather stupid to call this a problem when it is done all over the place.

Many other remarks show that you don't understand what is going on here. The whole mechanism is there to set up a function call on top of the instruction that caused the fault. So everything happens in userland, not in the signal handler. Signal-unsafe functions will work properly, and no deadlock will occur.

I'm sorry, but I see nothing but FUD in your comment. You should come with actual facts here.

FeepingCreature · 2012-07-10T12:53:06Z

I have to agree with this. See my earlier paste from ##kernel - it's standard, albeit uncommon. The technique works on any operating system that uses the basic x86 stackframe layout, which, I think, is more systems than D runs on - and it's not like "but this only works on Linux" stopped people when it came to supporting SEH. And the entire point of this is to sidestep the issue of signal safety. Let's not get into flames - but please make sure you have read the entire thing before objecting.

deadalnix · 2012-07-10T13:19:58Z

@dawgfoto please excuse the rudeness of my previous message. I am leaving it here to keep the discussion understandable, but it was way too aggressive. I'm sorry.

To restate things in a more neutral way: I have nothing against this not being included, but it has to be discussed, and the decision must be based on actual hard facts. Please feel free to ask any question about the technical aspects, because your post contained inaccurate information. As I'm pretty sure it was not done on purpose, I guess we simply have a misunderstanding. So let's start again on a good basis, and please forgive my tone.

andralex · 2012-07-10T13:43:44Z

@deadalnix Thanks very much. All - let's keep the good spirits going! I'll ask Walter what he thinks about the idea. Is this implementable on all major OSs?

deadalnix · 2012-07-10T13:49:54Z

On Windows, the system already throws an exception when this happens. A similar mechanism can be implemented on Windows without much trouble.

I'm not qualified enough on FreeBSD or macOS to answer that question.

MartinNowak · 2012-07-10T21:52:18Z

> Many other remarks show that you don't understand what is going on here. The whole mechanism is there to set up a function call on top of the instruction that caused the fault. So everything happens in userland, not in the signal handler. Signal-unsafe functions will work properly, and no deadlock will occur.

void* p = void; // uninitialized
core.stdc.stdlib.free(p);

// malloc.c
void free(void* p)
{
    // ...
    rlock_acquire(&malloc_mtx);
    size_t len = *cast(size_t*)(p-1); // SIGSEGV while malloc_mtx is held
}

Now you have preemptively exited a signal-unsafe function and you have no way of repairing it.
From now on every call to malloc/realloc/calloc/free might deadlock.

The set of functions that you may call is listed in signal(7) - Async-signal-safe functions.

> it allows recovery from a null pointer dereference. This isn't an error that screws up the state of your program, and it is definitely recoverable. Throwing allows recovery.

If you're talking about recovering, you need a mechanism to unwind the stack and restore state.
The one used for synchronous exceptions doesn't scale to asynchronous exceptions
because the compiler has to dump all variables to the stack in order to access them from exception handlers.

> See my earlier paste from ##kernel - it's standard, albeit uncommon.

Altering the instruction pointer and continuing execution somewhere else is not the issue.

MartinNowak · 2012-07-10T22:07:50Z

> Is this implementable on all major OSs?

It's common that sigreturn restores the CPU context from the signal's ucontext_t as far as security allows.
FreeBSD - sys_sigreturn
OSX

deadalnix · 2012-07-10T23:05:35Z

I understand your example with malloc/free. You have to understand that in this case, whatever happens, you are doomed. Either the program crashes or it is in an inconsistent state. It is a situation where catching the NullPointerError makes no sense at all, because you can't recover from it.

Not having this behavior will not solve the problem, because you'll also be in an inconsistent state or you'll crash. That isn't any better, and I don't see how this pull request makes things worse. An invalid call to a system function has been made; whatever comes out of that is either a crash or an inconsistent state.

Note that those functions are signal-unsafe. And this pull request doesn't change in any way whether signals are sent or received; it just changes how they are handled. At the moment the code in this pull request starts to execute, the harm is already done, and the program is already beyond repair. It is in that state because of the signal, not because of the code in the pull request, so I don't see how it is an argument for or against it.

Another fact here is that the exception is thrown from C code, and C doesn't handle exceptions. This problem isn't specific to this pull request; it can occur every time an exception is thrown through C code. And this problem is unsolvable, as C doesn't support exceptions. It is up to the programmer to ensure that exceptions are not thrown through C code.

This is exactly why I made NullPointerError inherit from Error and not from Exception. Such errors are not always recoverable. But they always are in @safe code.
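The practical effect of inheriting from Error can be shown with a small sketch: a `catch (Exception)` handler does not intercept an Error, so only code that explicitly catches Error (or Throwable) sees it. NullDereferenceError below is a hypothetical stand-in, not the class from the pull request, and it is thrown by hand rather than by a signal handler.

```d
// Hypothetical stand-in for the error class discussed in this thread.
class NullDereferenceError : Error
{
    this(string msg)
    {
        super(msg);
    }
}

/// Reports which handler ends up catching the error.
string whoCatches()
{
    try
    {
        try
        {
            throw new NullDereferenceError("null dereference");
        }
        catch (Exception e)
        {
            // Not taken: Error and Exception are separate branches of Throwable.
            return "Exception handler";
        }
    }
    catch (Error e)
    {
        return "Error handler";
    }
}

void main()
{
    assert(whoCatches() == "Error handler");
}
```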

MartinNowak · 2012-07-11T00:41:55Z

> This isn't any better

On an automated system deadlocks are worse than failures.
So we'd need either an opt-in or an opt-out switch.

By the way could you rephrase the purpose of translating signals to errors?
If it's error reporting you could do way more advanced stuff using dumps, execve, fork and ptrace.
For example you could restart the process in an error-reporting mode. You could also fork it first and
enable ptrace so that you may inspect memory from the reporter.

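immutable pid = fork();
if (pid)
{
    // Parent: re-exec the image as an error reporter, passing the child's pid.
    char[11] buf = void;
    snprintf(buf.ptr, buf.length, "%d", pid);

    const(char)*[4] args;
    args[0] = argv[0]; // C argv (this has issues with chdir, deleted images, changed rights...)
    args[1] = "--druntime-report-error";
    args[2] = buf.ptr;
    args[3] = null; // execve expects a null-terminated argument vector
    execve(args[0], args.ptr, environ);
}
else
{
    // Child: ask to be traced by the parent, then abort so it can be inspected.
    ptrace(PT_TRACE_ME);
    abort();
}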

FeepingCreature · 2012-07-11T04:11:56Z

Personally: exceptions/errors have backtraces. Backtraces are immensely useful (no, gdb is not the answer). It'd also add consistency with Windows.

Look. Of course we can get backtraces for segfaults under linux with effort. But this patch allows us to get backtraces without effort, and that level of trivial convenience has a quality of its own.

About your free example: worst case, you can always inspect the stack, see if you're called from free(), and manually unlock. In point of fact, I think C free under Linux can tell that its argument is not a valid pointer and give a proper error [edit my mistake: it does segfault]. That aside, without this patch it just dies. With this patch it maybe dies if the exception is uncaught. All it does is add the option to do proper cleanup. And generally speaking, if you're catching an Error you deserve what you get. It's a big red flag that says "This guy think he know what he doin".

MartinNowak · 2012-07-11T04:31:16Z

The point is that creating a backtrace might be enough to do way more harm.
Reliable crash information can be generated from another process (FF, Chrome, Ubuntu, OSX, Windows...).
If we come up with a solid solution it might be useful for deployed applications too.

How about a library solution, for example?
http://code.google.com/p/google-breakpad/

deadalnix · 2012-07-13T22:28:12Z

I also think this should evolve to provide a custom handler for advanced users.

As of now, it is opt-in: you have to import this module somewhere to « activate » it.

MartinNowak · 2012-07-17T17:50:28Z

Why doesn't this simply restore BP and call _d_throwc from within the signal handler?
That would also allow us to use sigaltstack for handling stack overflows.

deadalnix · 2012-07-17T18:04:59Z

Throwing from the signal handler isn't safe. You aren't in a standard execution flow, and the kernel knows it. Yes the code can be extended to manage stack overflows, and should be IMO.

MartinNowak · 2012-07-17T18:31:24Z

> Throwing from the signal handler isn't safe.

For the same reason that this mechanism is unsafe, or am I missing something?

> You are in a standard execution flow, and the kernel knows it.

What do you mean by that?

JakobOvrum · 2012-10-24T04:13:37Z

src/etc/linux/memoryerror.d

@@ -0,0 +1,277 @@
+/**
+ * Handle page protection error using Errors. NullPointerError is throw when deferencing null. A system dependant error is throw in other cases.


Some corrections:
errors
thrown (twice)
dereferencing
system-dependent

It's also a good idea to use $(D symbol) when referencing symbols.

alexrp · 2012-11-18T00:06:37Z

@deadalnix can you address @JakobOvrum's points?

MartinNowak · 2012-11-18T03:22:28Z

src/etc/linux/memoryerror.d

+import core.sys.posix.ucontext;
+
+// Register and unregister memory error handler.
+private shared sigaction_t old_sigaction;


You might want to use __gshared here to avoid casting.
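To make the difference concrete, here is a small sketch (sharedOld, globalOld and queryCurrentHandlers are hypothetical names). Both variables are process-global, but only the `shared` one carries the qualifier in its type, so passing its address to sigaction needs a cast.

```d
import core.sys.posix.signal;

shared sigaction_t sharedOld;    // typed as shared(sigaction_t)
__gshared sigaction_t globalOld; // process-global, but typed as plain sigaction_t

/// Queries the current SIGSEGV action into both variables.
/// Returns true if both calls succeed.
bool queryCurrentHandlers()
{
    // A shared(sigaction_t)* does not convert implicitly to sigaction_t*.
    static assert(!__traits(compiles, sigaction(SIGSEGV, null, &sharedOld)));

    immutable a = sigaction(SIGSEGV, null, cast(sigaction_t*) &sharedOld);
    immutable b = sigaction(SIGSEGV, null, &globalOld); // no cast needed
    return a == 0 && b == 0;
}

void main()
{
    assert(queryCurrentHandlers());
}
```

The trade-off is that `__gshared` gives up the type-level reminder that the data is visible to all threads, so synchronization is entirely the programmer's responsibility.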

MartinNowak · 2012-11-18T03:36:27Z

Please stick to camel casing for function names.

deadalnix · 2012-11-18T21:54:18Z

What is the hard limit for line length? What is the policy for splitting a ternary operator across multiple lines?

jmdavis · 2012-11-18T22:11:41Z

> What is the hard limit for line length?

There is a soft character limit of 80 characters per line and a hard limit of 120. So, most lines should be within 80 characters but no lines can exceed 120.

> What is the policy for splitting a ternary operator across multiple lines?

There is none. For the most part, we don't have much in the way of style rules about formatting. They boil down primarily to using 4 spaces per indent (no tabs) and putting braces on their own line. For most of the rest, it's primarily a matter of making sure that the style in the file is reasonably consistent, and there probably aren't enough ternary operators being used to even set a precedent within most files. Personally, if the line with a ternary operator is too long, I do

auto result = condition
              ? branch1
              : branch2;

but I expect that there's code in druntime and/or Phobos which formats that differently. At minimum, if Andrei were doing it, he'd only indent the 2nd and 3rd lines by 4 spaces rather than lining them up with the previous line. It all depends on who wrote the code and what file it's in (as the formatting style varies from file to file).

MartinNowak · 2012-11-29T16:11:59Z

OK let's give it a try.
I've added the missing win{32,64}.mak rules here and here.

andralex · 2012-11-29T16:12:43Z

historical moment!

deadalnix · 2012-11-29T19:36:40Z

This one has been epic :D

deadalnix added 14 commits November 18, 2012 22:21

Allow to throw a NullPointerError on null deference and a SignalError in other SIGSEGV cases.

34412a3

Nullpointererror with makefile.

38f8698

Several changes according to comment in the github thread.

7e7cec4

rename nullpointererror into memoryerror for more clarity.

5dbbff2

fix comments.

aa7c529

remove GC allocation from restore registers.

59888e8

change tabs into spaces.

9492568

put { on new lines.

10258d2

move enum declarations for registers into ucontext

4cf2d75

remove trailing spaces

a66c0ee

Function register and unregister instead of shared this trick.

4d6d306

update makefile for windows

8fa3cc6

fix english

395ebfc

several changes for better conformance to druntime style

c5b30a5

MartinNowak merged commit c5b30a5 into dlang:master Nov 29, 2012

Safety0ff mentioned this pull request Jul 10, 2014

Fix issue with @nogc breaking the signal API #879

Closed

schveiguy mentioned this pull request Jul 12, 2018

Handle page protection errors as D errors on Linux with --DRT-memoryError=1 #2249

Closed
