Skip to content
This repository has been archived by the owner on Oct 12, 2022. It is now read-only.

Sigsegv as exception #187

Merged
merged 14 commits into from Nov 29, 2012
Merged

Sigsegv as exception #187

merged 14 commits into from Nov 29, 2012

Conversation

deadalnix
Copy link
Contributor

When core.nullpointererror is imported in a project, it transform null deference into NullPointerError and other segfault in SignalError on linux x86 and x86_64 .

If the idea is successful, and implemented on other systems (windows, macOS, freeBSD) it could become the standard behavior. For now, it only behave that way when the module is explicitly imported.

This is realted to : #181 that should be included before.

@MartinNowak
Copy link
Member

  • Translating signals to exceptions is highly questionable.
    It was already a bad decision on windows and we shouldn't
    try to emulate it.
  • I'm rather worried about the ABI instabilities of ucontext_t.
  • Signal handler's are set per process.
    • you need to do something with existing handlers
    • what if a segfault occurs in a non-D thread


// Init

shared static this() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is odd.
It means that this mechanism is activated by importing the module.
Instead it should be a function like installHandler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is fine to me. It can definitively be changed.

@deadalnix
Copy link
Contributor Author

Translating signals to exceptions is highly questionable. It was already a bad decision on windows and we shouldn't try to emulate it.

I think your point here is very weak. No argument whatsoever here. Let me state why I think it is a good idea.

  • it allow to recover from a null pointer deference. This isn't an error that screw up the state of your program and it is definitively recoverable. Throw allow to recover.
  • it provide a stacktrace for when things goes wrong. Even if you are not running tje program using a debugger.
  • you can exit your program nicely on null deference.
  • If you call something that is @safe, unless things goes REALLY wrong, in way that your program can do little about, you can now ensure that things will not crash. (deferencing null is @safe ).

Additionally, it opens some doors :

  • it is now possible to implement thing in a consistent manner on windows and linux, making easier to write cross plateform code.
  • the proposed code allow to do something else than throw. This can evolve in custom handler depending on the memory block (something similar to libsigsegv).
  • This type of tricks are mandatory to implement concurrent GC. It is a direction we want to go in.

Information have been asked to linux hackers, ucontext_t is supported, even if not very documented. You can ask feep if you doubt my claim.

About existing handler, I think an user playing with signal handler is big enough to know what he/she is doing and be sure to not collide with this. If the segfault occur in non-D thread (or even in non D code) things will work as well if you recover. If you throw this will go wrong, as D's Exception are not compatible with C/C++. Anyway, without the Exception, things goes wrong as well. A global thread local flag can be set to ensure this behavior doesn't trigger in non D threads.

@MartinNowak
Copy link
Member

it allow to recover from a null pointer deference.

No you cannot because you corrupt your complete program state at the point
you do not return from the signal. You might only call async safe functions after
that.

http://www.google.com/search?q=bug%20signal%20handler%20deadlock

provide a stacktrace for when things go

If you want to improve the current behavior then try to print
useful information from within the signal handler.
That will be hard but it might be possible, among other functions
you cannot use malloc/printf.

@deadalnix
Copy link
Contributor Author

I think you totally misunderstood the piece of code you are commenting. The whole point of the code in the handler is to provide a way for another piece of code OUTSIDE the handler, to handle the segfault.

The state of the program isn't corrupted, so it is safe to throw and even to recover from it. I even have a piece of code where I use mprotect to trigger segfault, and totally recover without throwing, using the exact same method.

It is recoverable, just try it if you don't believe me.

@MartinNowak
Copy link
Member

Yeah, you only reset the instruction pointer but that isn't much different
from performing a longjmp from inside the signal handler.

You still interrupt functions at arbitrary points, and exception handling
won't help you with scope cleanup because the code is not prepared
to handle them. These are similar issues as with exception-unsafe code,
only that the compiler will make it even worse and it may occur at
any instruction.

Logger logger;
void setLogger()
{
    logger = new Logger;
    scope (failure) logger = null;
    logger.init(getSomething());
}

If getSomething caused a segfault your program is corrupted.
Nobody will clean up the initializer, because the compiler thinks
getSomething is nothrow.

Likewise if a segfault happened in malloc you may not call it again.
It isn't async-safe and you have left it in whatever state it was.

@deadalnix
Copy link
Contributor Author

No, this isn't like longjmp ! In that case, you know that the whole program is back in the right state (longjmp doesn't guarantee that).

And the only registers altered are trash registers, so it is safe in regard of try catch blocks. For you example, it currently works (tested as well) with the code generated by dmd.

I feel like you are making up problems that doesn't even exists here, and you don't even consider the advantages that such a mechanism provide.

@JakobOvrum
Copy link
Member

If the idea is successful, and implemented on other systems (windows, macOS, freeBSD) it could become the standard behavior.

This is already implemented on Windows as the default behaviour. This needs more thought; with this patch it will even throw different exceptions on Linux and Windows, which is a completely unnecessary inconsistency.

@deadalnix
Copy link
Contributor Author

Yes, this is inconsistent. And yes, this needs to be made consistent with windows.

The point is that the exception thrown on windows is system specific. I would rather think that both should throw a NullPointerError on null deference, so we have a system agnostic way of catching this. This definitively need to converge.

I didn't wanted to change existing behavior on windows, so this approach can be tested without breaking anything. Ultimately, this is linked to the problem of @safe that allow unsafe things to be done when passed large objects/pointers that are null. What must be done to solve that problem is something we have to decide before changing the existing behavior, or we risk to change it twice, which is something we want to avoid.

@FeepingCreature
Copy link
Contributor

For reference, I cite ##kernel on freenode: http://pastebin.com/VEeZYPRJ . So it seems to be "officially" supported.

The main advantage over longjmp is that it definitely ends up with the handler function called from a state that is "safe", since it reuses the kernel's existing return-to-previous-context mechanism. So you don't have to emulate whatever cleanup the kernel does at signal handler exit.

@andralex
Copy link
Member

andralex commented Jul 8, 2012

Undecided on what to do about this. Should we discuss the matter further in the newsgroup? I know @WalterBright has a dim view on converting null pointer accesses into exceptions.

@andralex
Copy link
Member

andralex commented Jul 8, 2012

(it ain't rebased either :o))

@deadalnix
Copy link
Contributor Author

Indeed, this pull request involve a language design decision, and may be discussed in the newsgroup. The problem to be fixed is in fact much larger than what this pull request does, it is about the whole null deference handling in D.

This pull request open the door for unified behavior between linux and windows. It is also easy to provide a callback to do whatever is wanted (HALT if one think is better, or throw). It is even possible to recover in some situation (concurrent GC is a great use case of that capability for instance).

Note that I use this in all code I produce in D until then, and it have been of great help.

@deadalnix
Copy link
Contributor Author

I rebased the pull request. git wasn't able to merge the stuff by itself.

@MartinNowak
Copy link
Member

There are still tons of issues.

Your signal handler is global.

  • Do you throw D exceptions in any language?
  • What to do when you overwrite an existing signal?

This applies to all D executables and all D libraries.

  • Would you want libfontconfig to steal your sigsegv handler?

You'll get deadlocks which are even worse than crashing.

  • What to do w.r.t. signal-unsafe functions?

You cannot recover without unwinding/destruction support.

  • How could this be implemented with enregistered variables?
  • Why shall we make a non-standard unsafe function which will cause difficult to find
    bugs the default behavior when you can make it a library?

This has absolutely no place in production code because it is unpredictable and unreliable.

@deadalnix
Copy link
Contributor Author

I'm sorry, but I don't think most of this are problems.

First, yes, it throw D exception, but not in any language, because IT IS IN D RUNTIME. So it throw D exception in D. If you interface D with other languages, then you have to make sure that the interfacing make sense. This is no news and have nothing to do with that pull request specifically.

As of 3rd party lib stealing the signal handler, it is a possibility. However, the signal mechanism is already used in druntime. So either we consider this is a problem, or we don't. But this is rather stupid to state this is a problem when this is done all over the place.

Many other remarks show that you don't understand what is going on here. The whole mechanism is here to set up a function call on top of the instruction that caused the fault. So everything happen in userland, not in the signal handler. Signal usafe function will work properly, and no deadlock will occur.

I'm sorry, but I see nothing but FUD in your comment. You should come with actual fact here.

@FeepingCreature
Copy link
Contributor

I have to agree with this. See my earlier paste from ##kernel - it's standard, albeit uncommon. The technique works on any operating system that uses the basic x86 stackframe layout, which, I think, is more systems than D runs on - and it's not like "but this only works on Linux" stopped people when it came to supporting SEH. And the entire point of this is to sidestep the issue of signal safety. Let's not get into flames - but please make sure you have read the entire thing before objecting.

@deadalnix
Copy link
Contributor Author

@dawgfoto please excuse me for the rudeness of my previous message. I let it here to keep the discussion understandable, but It was way too aggressive. I'm sorry.

To restate thing in a more neutral way : I have nothing against this not being included, but it have to be discussed and the decision must be taken based on actual hard facts. Please feel free to ask any question about the technical aspect of things, because your post contained inaccurate informations. As I'm pretty sure it was not done on purpose, I guess we simply have a misunderstanding. So let's start again on good basis, and please forgive my tone.

@andralex
Copy link
Member

@deadalnix Thanks very much. All - let's keep the good spirits going! I'll ask Walter what he thinks about the idea. Is this implementable on all major OSs?

@deadalnix
Copy link
Contributor Author

On windows, the system already throw an Exception when this happen. A similar mechanism can be implement on windows without much problems.

I'm not qualified enough on FreeBSD or macOS to answer that question.

@MartinNowak
Copy link
Member

Many other remarks show that you don't understand what is going on here. The whole mechanism is here to set up a function call on top of the instruction that caused the fault. So everything happen in userland, not in the signal handler. Signal usafe function will work properly, and no deadlock will occur.

void* p = void; // uninitialized
core.stdc.free(p);
// malloc.c
void free(void* p)
{
    // ...
    rlock_acquire(&malloc_mtx);
    size_t len = *cast(size_t)(p-1); // SIGSEGV
}

Now your preemptively exited a signal-unsafe function and you have no way of repairing it.
From now on every call to malloc/realloc/calloc/free might dead lock.

A set of functions that you may call is listed in signal(7) - Async-signal-safe functions.

it allow to recover from a null pointer deference. This isn't an error that screw up the state of your program and it is definitively recoverable. Throw allow to recover.

If your talking about recovering you need a mechanism to unwind the stack and restore state.
The one used for synchronous exceptions doesn't scale to asynchronous exceptions
because the compiler has to dump all variables to the stack in order to access them from exception handlers.

See my earlier paste from ##kernel - it's standard, albeit uncommon.
Altering the instruction pointer and continuing execution somewhere else is not the issue.

@MartinNowak
Copy link
Member

Is this implementable on all major OSs?

It's common that sigreturn restores the CPU context from the signal's ucontext_t as far as security allows.
FreeBSD - sys_sigreturn
OSX

@deadalnix
Copy link
Contributor Author

I understand your example with malloc/free. You have to understand that in this case, whatever happen, you are doomed. Either the program crash either it is in an inconsistent state. It is a situation where catching the NullPointerError make no sense t all, because you can't recover from it.

Not having this behavior will not solve the problem, because you'll also be in an inconsistent state or you'll crash. This isn't any better and I don't see how this pull request is making things worse. A invalid call to a system function have been made, whatever comes out from that is either a crash or a inconsistent state.

Note that those function are signal unsafe. And this pull request don't change in any way if signal are send or received, it just change how they are handled. At the moment the code present in this pull request start to execute, the arm is already done, and the program is already in a beyond repair state. It is in that state because of the signal, not because of the code present in the pull request, so I don't see how it is an argument for or against it.

Another fact here is that the exception is thrown from C code and C don't handle exception. This problem isn't specific to this pull request, this is a problem that can occur every time an exception is thrown throw C code. And this problem is unsolvable, as C don't support exceptions. It is up to the programmer to ensure that exception are not throw throw C code.

This is exactly why I inherited NullPointerError from Error and not from Exception. They are not always recoverable. But they always are in @safe code.

@MartinNowak
Copy link
Member

This isn't any better

On an automated system deadlocks are worse than failures.
So we'd need either an opt-in or an opt-out switch.

By the way could you rephrase the purpose of translating signals to errors?
If it's error reporting you could do way more advanced stuff using dumps, execve, fork and ptrace.
For example you could restart the process in an error-reporting mode. You could also fork it first and
enable ptrace so that you may inspect memory from the reporter.

immutable pid = fork();
if (pid)
{
    char[11] buf=void;
    format(buf, pid);

    const char* args[3];
    args[0] = argv[0]; // C argv (this has issues with chdir, deleted images, changed rights...)
    args[1] = "--druntime-report-error";
    args[2] = buf.ptr;
    execve(args[0], args.ptr, environ);
}
else
{
    ptrace(PT_TRACE_ME);
    abort();
}

@FeepingCreature
Copy link
Contributor

Personally: exceptions/errors have backtraces. Backtraces are immensely useful (no, gdb is not the answer). It'd also add consistency with Windows.

Look. Of course we can get backtraces for segfaults under linux with effort. But this patch allows us to get backtraces without effort, and that level of trivial convenience has a quality of its own.

About your free example: worst case, you can always inspect the stack, see if you're called from free(), and manually unlock. In point of fact, I think C free under Linux can tell that its argument is not a valid pointer and give a proper error [edit my mistake: it does segfault]. That aside, without this patch it just dies. With this patch it maybe dies if the exception is uncaught. All it does is add the option to do proper cleanup. And generally speaking, if you're catching an Error you deserve what you get. It's a big red flag that says "This guy think he know what he doin".

@MartinNowak
Copy link
Member

The point is that creating a backtrace might be enough to do way more harm.
Reliable crash information can be generated from another process (FF, Chrome, Ubuntu, OSX, Windows...).
If we come up with a solid solution it might be useful for deployed applications too.

How about a library solution, for example?
http://code.google.com/p/google-breakpad/

@deadalnix
Copy link
Contributor Author

I also think that should evolve to provide a custom handler for advanced user.

As of now, it is in opt-in, you have to include this module somewhere to « activate » it.

@MartinNowak
Copy link
Member

Why doesn't this simply restore BP and call _d_throwc from within the signal handler?
That would also allow us to use sigaltstack for handling stack overflows.

@deadalnix
Copy link
Contributor Author

Throwing from the signal handler isn't safe. You aren't in a standard execution flow, and the kernel knows it. Yes the code can be extended to manage stack overflows, and should be IMO.

@MartinNowak
Copy link
Member

Throwing from the signal handler isn't safe.

For the same reason that this mechanism is unsafe or am I missing something.

You are in a standard execution flow, and the kernel knows it.

What do you mean by that?

@@ -0,0 +1,277 @@
/**
* Handle page protection error using Errors. NullPointerError is throw when deferencing null. A system dependant error is throw in other cases.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some corrections:
errors
thrown (twice)
dereferencing
system-dependent

It's also a good idea to use $(D symbol) when referencing symbols.

@alexrp
Copy link
Member

alexrp commented Nov 18, 2012

@deadalnix can you address @JakobOvrum's points?

import core.sys.posix.ucontext;

// Register and unregister memory error handler.
private shared sigaction_t old_sigaction;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to use __gshared here to avoid casting.

@MartinNowak
Copy link
Member

Please stick to camel casing for function names.

@deadalnix
Copy link
Contributor Author

What is the hard limit for line length ? What is the policy to split ternary operator on multiple line ?

@jmdavis
Copy link
Member

jmdavis commented Nov 18, 2012

What is the hard limit for line length ?

There is a soft character limit of 80 characters per line and a hard limit of 120. So, most lines should be within 80 characters but no lines can exceed 120.

What is the policy to split ternary operator on multiple line ?

There is none. For the most part, we don't have much in the way of style rules about formatting. They boil down primarily to using 4 spaces per indent (no tabs) and putting braces on their own line. For most of the rest, it's primarily a matter of making sure that the style in the file is reasonably consistent, and there probably aren't enough ternary operators being used to even set a precedent within most files. Personally, if the line with a ternary operator is too long, I do

auto result = condition
              ? branch1
              : branch2

but I expect that there's code in druntime and/or Phobos which formats that differently. At minimum, if Andrei were doing it, he'd only indent the 2nd and 3rd lines by 4 spaces rather than lining them up with the previous line. It all depends on who wrote the code and what file it's in (as the formatting style varies from file to file).

@MartinNowak MartinNowak merged commit c5b30a5 into dlang:master Nov 29, 2012
@MartinNowak
Copy link
Member

OK let's give it a try.
I've added the missing win{32,64}.mak rules here and here.

@andralex
Copy link
Member

historical moment!

@deadalnix
Copy link
Contributor Author

This one have been epic :D

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet