Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom GSL error handler #167

Closed
Lestropie opened this issue Feb 20, 2015 · 14 comments
Closed

Custom GSL error handler #167

Lestropie opened this issue Feb 20, 2015 · 14 comments
Labels
bug

Comments

@Lestropie
Copy link
Member

@Lestropie Lestropie commented Feb 20, 2015

This is required for dwi2response, such that if the matrix for a particular voxel is not positive definite, it is simply excluded from the SF mask, rather than crashing the program.

@Lestropie Lestropie added the bug label Feb 20, 2015
@jdtournier

This comment has been minimized.

Copy link
Member

@jdtournier jdtournier commented Feb 20, 2015

?? But there is already one, defined in lib/app.cpp, line 40, set in App:: init (), line 514. Is that not sufficient...?

@Lestropie

This comment has been minimized.

Copy link
Member Author

@Lestropie Lestropie commented Feb 24, 2015

OK, so it is there. Was still getting an error in dwi2response because it wasn't the CSD step (that lives in a try{} clause) that was crashing, but the SH::Transform constructor with a dodgy acquisition scheme.

However, looks like the error handler isn't performing as it should. Even though mrtrix_gsl_error_handler() is called, the terminal only gives this:

terminate called after throwing an instance of 'MR::Exception'
terminate called recursively
Aborted (core dumped)
@jdtournier

This comment has been minimized.

Copy link
Member

@jdtournier jdtournier commented Feb 24, 2015

OK, so the GSL handler isn't the issue since it clearly states that an MR::Exception is being thrown. That should be handled at the very least at the top level in main (), which wraps everything up in a try / catch clause. My best guess is that whatever is catching the exception is itself throwing an exception that somehow leads to a recursive invocation of the same handler. Hard to see how that could happen, unless the exception handler (including all the cleanup code involved in unwrapping the stack, i.e. class destructors, etc) itself raises a GSL error...?

@Lestropie

This comment has been minimized.

Copy link
Member Author

@Lestropie Lestropie commented Feb 24, 2015

Yeah, it's a weird one. Going to have to produce a MWE to get at this one I think.

@jdtournier

This comment has been minimized.

Copy link
Member

@jdtournier jdtournier commented Feb 26, 2015

Just did a quick search for terminate called after throwing an instance of terminate called recursively Aborted - seems there are lots of reports of broken exception handling on various versions of g++, some of them quite recent. Might be worth trying this on a different compiler / distro...?

@jdtournier

This comment has been minimized.

Copy link
Member

@jdtournier jdtournier commented Feb 26, 2015

Another thing you could try is to print out the Exception's msg in its constructor, so you can track exactly when each is being created. Currently the error message is only shown by the global catch() call or some other catch() handler, not as it is thrown...

@Lestropie

This comment has been minimized.

Copy link
Member Author

@Lestropie Lestropie commented Feb 27, 2015

It looks like the 'terminate called recursively' is just something detecting multiple instances of the same exception; which in this case is multiple processing threads throwing exceptions for the same reason. Set -nthreads 1 and that part of the error goes away. So it's not actually 'recursive'.

Set -nthreads 0 and the exception is caught correctly. It seems that the try{} code wrapping the run() function catches the exception if it's all executed in the same thread (which is what -nthreads 0 enforces), but if the exception is thrown from a spawned thread it's not caught and instead causes an abort() call or similar. My recollection was that previously if a processing thread threw an exception the thread would just disappear, maybe that's changed with C++11 or some of our multi-threading changes. Also it's not specific to GSL; throw any exception in a spawned thread and the same thing happens.

I can explicitly wrap the failing piece of code in dwi2response, but it might be worth thinking about how to deal with this more generally.

Lestropie added a commit that referenced this issue Feb 27, 2015
In some data where an erroneous gradient scheme was used, the initial A2SH transformation for computing FODs works, but once the gradient table is rotated according to the peak FOD orientation for estimating m=0 coefficients of the response function, the A2SH transformation can no longer be derived.
This changes catches the error and gives the user a meaningful warning, rather than having the command terminate with an ambiguous message.
Discussed in #167.
@jdtournier

This comment has been minimized.

Copy link
Member

@jdtournier jdtournier commented Feb 27, 2015

Ok, I'll look into what is the correct way to handle exceptions in C++11. Although I have a strong suspicion it should just work. Especially since it used to work in C++03, and C++11 is a superset of that. I'm still of the opinion that it'll be a compiler bug...

That said, if it does need explicit handling, that should definitely be done using a try/catch in the Thread::run() call, we can't have each developer of a threaded application worry about this every time. Not sure what to do about it though, catching the exception doesn't really help us propagate it to the other threads if we actually want to terminate cleanly...

@jdtournier

This comment has been minimized.

Copy link
Member

@jdtournier jdtournier commented Feb 27, 2015

OK, seems I was wrong. Exception handling in C++11 threads is not simple, and for this reason the standard doesn't really handle it very well - if any thread throws an exception that is not handled in the thread itself, std::terminate() is called. There's a full discussion of the issue on this blog.

This leaves us in a funny position. We can use the C++11 std::async class, which will catch the exception and allow us to inspect it or rethrow it from the invoking thread, but this isn't ideal either. It won't cause the other threads to terminate, so we'd have to wait until they've all joined before being able to detect the exception in the main thread. This might be the cleanest way to do things, at least it'll provide a clean exit (with all the temporary file cleanup, etc that might happen with pipes). And in most cases all threads will bail out in short order if something is seriously wrong. I'll try to implement this when I have a minute, see if it works well enough for us.

The ideal situation would be to have a mechanism to propagate the exception to all running threads to get them to all unwind their stacks and exit cleanly. There is unfortunately no simple way to do this, it would rely on some form of cooperation - each thread will need to monitor some flag (most likely a std::shared_future) and exit if it's set. For most of the operations we do, that would require a significant amount of effort to set up, and possibly incur a minor overhead. I'm not convinced this will really help.

One thing this does emphasize though is that the best course of action when developing applications is to perform as many checks as you can reasonably think of in the body of your run() function (i.e. in a single-threaded context), before launching into the meaty multi-threaded parts of your application, to minimise the chances of an exception being thrown during processing itself... Good design practice anyway, but now there's one more reason for it.

jdtournier added a commit that referenced this issue Feb 27, 2015
This allows exceptions thrown in threads to be handled by parent thread,
avoiding messy terminations when this happens - see discussion about
this in issue #167.
@jdtournier

This comment has been minimized.

Copy link
Member

@jdtournier jdtournier commented Feb 27, 2015

OK, looks like this was a simple enough change, but probably needs to be tested further. I've just push this to branch C++11_thread_exception_handling, have a go and see if it works for you. I've tested it with mrconvert, dwi2fod and tckgen, which should test both the ThreadedLoop and Queue backends, and everything works as anticipated - but then, they don't throw exceptions...

@Lestropie

This comment has been minimized.

Copy link
Member Author

@Lestropie Lestropie commented Mar 2, 2015

Windows machine: Get a single popup of 'dwi2response.exe has stopped working', exception message fails to print on terminal. Though there are two different exception handlers to choose from when you download the QT / MinGW package...

Arch just gives me 'terminate called after throwing an instance of MR::Exception'. Not sure if the 'terminate called recursively' is gone, might just be because my laptop is only 2 threads.

More generally, I'm not convinced this is worth stressing about. What happened with dwi2response is such an odd combination of circumstances* that it's difficult to see such a thing popping up again; and if it does, it'll almost certainly get reported to us, which we actually want as this kind of crash shouldn't be happening based on your multi-threading framework design (and so functors should be designed appropriately).

  • bad gradient directions, in dwi2response CSD only does initial transform at lmax 4 whereas the A2SH transformation after gradient scheme rotation is lmax 8 and only the latter fails, can't exploit the return false in pipe functor trick
@jdtournier

This comment has been minimized.

Copy link
Member

@jdtournier jdtournier commented Mar 2, 2015

OK, I think I've had enough of trying to figure this out. Previous problem was while the exceptions were being caught, they were being re-thrown in the thread object's destructor, and user-defined destructors are by default assumed to be noexcept - so throwing an exception in a destructor results in an immediate call to std::terminate()...

Spent a little while trying to get my head around it, there really isn't a nice clean way of doing this. It's not possible to terminate a thread without breaking a lot of stuff (there's no way of knowing what the thread is doing at any one time, so terminating it might result in unexpected side-effects), so we really have to let the remaining threads finish by themselves. I could add some checks against a global flag within the Thread::Queue & Image::ThreadedLoop so that the remaining threads could abort cleanly upon error, but I can't be bothered at this stage - I'd rather the program was properly designed to handle all its error-checking upfront.

I've committed what I've come up with, which essentially just allows all the threads to complete, prints out their error messages, and optionally re-throws an exception if the thread object's new wait() function was used rather than the destructor. See what you think, I don't think I'm going to invest a great deal more time into it...

On a different note, not sure why you'd want to rotate the gradient scheme - you can probably achieve what you're after much more cheaply in the SH domain using a spherical (de)convolution with an appropriately-oriented delta function (?). This is the approach used in get_response, which admittedly isn't (yet) in the main repo, but I think you should have access to the code...?

@Lestropie

This comment has been minimized.

Copy link
Member Author

@Lestropie Lestropie commented Mar 2, 2015

I agree; given it's not handled well in the standard, and your multi-threaded queue design, I say just shoot any programmer that allows a multi-threaded functor to throw an exception. Wait... (I'll test that branch again on both OS's soon)

Gradient scheme rotation was done in dwi2response because that's how it was handled in estimate_response. SH domain approach I guess will only need a single matrix inverse calculation rather than one per voxel, not sure if it's worth the effort but I'll have a look see for my own curiosity's sake. Though I guess I have to merge get_response capability into dwi2response at some point anyway.

@jdtournier

This comment has been minimized.

Copy link
Member

@jdtournier jdtournier commented Mar 3, 2015

Well, when you think about it, it's not really the standard that's the issue, it really goes at the heart of how multitasking is implemented in hardware. Interrupting a thread on the middle of execution when it's sharing its address space with a load of other threads is just a Bad Idea...

The fix in this branch will at least handle exceptions semi-gracefully - won't crash out outright that is, and will display the exception's error message. We could do better if we really wanted to, but I'll worry about that if & when it proves to be necessary...

As to merging get_response into dwi2response, I'm in two minds about that. I'm thinking I'd rather split up the different parts: getting a single-fibre mask, getting the fibre orientations in that mask, and estimating the response itself. We could put it all into a neat script, but that would give us a lot more flexibility to try out different things - would be useful for the multi-shell stuff, for instance. And would be useful in cases where the standard approach doesn't quite work or the data don't match our expectations. I don't think it would necessarily impact on performance much either, even using the recursive calibration approach...?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.