Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

./configure doesn't abort on KeyboardInterrupt #1391

Closed
michaelkarlcoleman opened this Issue Jul 10, 2018 · 15 comments

Comments

Projects
None yet
3 participants
@michaelkarlcoleman
Copy link

michaelkarlcoleman commented Jul 10, 2018

The configure script fails to properly trap a keyboard interrupt, apparently giving a bogus check result instead.

./configure -nogui

MRtrix build type requested: release version with nogui

Detecting OS: linux
Looking for compiler [clang++]: not found
Looking for compiler [g++]: g++ (GCC) 7.3.0
Checking for C++11 compliance: ok
Checking shared library generation: ok
Detecting pointer size: 64 bit
Detecting byte order: little-endian
Checking for variable-length array support: ok
Checking for non-POD variable-length array support: ^C('Unexpected error:', "(<type 'exceptions.KeyboardInterrupt'>, KeyboardInterrupt(), <traceback object at 0x2aaaabdbf290>)")
not found
Checking for ::max_align_t: 16 bytes
Checking for std::max_align_t: 16 bytes
Checking for Eigen3 library:
...

@Lestropie Lestropie added the build label Jul 11, 2018

@Lestropie

This comment has been minimized.

Copy link
Member

Lestropie commented Jul 11, 2018

Michael,

Thanks for the report. Setting up signal handlers in both the configure and build scripts would be worthwhile I think; I sometimes myself get caught mashing Ctrl-C during builds trying to kill the multiple processes that are running. I have a little experience with such in Python since #858, so hoping it won't be a huge amount of effort.

Rob

@jdtournier

This comment has been minimized.

Copy link
Member

jdtournier commented Jul 11, 2018

I don't think it'll be necessary to implement full signal handling here. I reckon we just need to not catch the KeyboardInterrupt exception at this point in configure...

@michaelkarlcoleman

This comment has been minimized.

Copy link
Author

michaelkarlcoleman commented Jul 13, 2018

Yes, I believe @jdtournier 's approach is correct. (And I'd fix all of the bare except: clauses.)

On a somewhat related note, it'd be really nice if programs like mrview didn't try to catch and swallow segfaults themselves. This behavior makes it harder to debug the issue, and arguably doesn't really add much value.

@jdtournier

This comment has been minimized.

Copy link
Member

jdtournier commented Jul 16, 2018

On a somewhat related note, it'd be really nice if programs like mrview didn't try to catch and swallow segfaults themselves.

We resorted to signal handlers to try to avoid leaving temporary files around when a command crashed or terminated abnormally. This is important to minimise issues like filling up the /tmp filesystem due to crashes in piped commands (since the second command is responsible for cleaning up). There may be other ways to handle this, but given the number of times users report errors that we eventually track back to a full /tmp folder, I'd argue it adds a lot of value!

@jdtournier

This comment has been minimized.

Copy link
Member

jdtournier commented Jul 16, 2018

On a related note, I'd be interested to hear what you were trying to debug in mrview that was made more difficult with the signal handling...? Presumably it made it hard to backtrace in gdb?

@michaelkarlcoleman

This comment has been minimized.

Copy link
Author

michaelkarlcoleman commented Jul 16, 2018

I don't know that much about the challenges that mrview faces, but in general I'd hope for programs to (a) never (TM) segfault, and to (b) always give a clear indication when "out of space" conditions occur. I do realize that that can sometimes be more of a goal one journeys towards.

As a relatively easy first step, perhaps you could just add a debug flag that turns off this handling, allowing the segfault to percolate out in the usual fashion.

Regarding how this came up, I've forgotten exactly what was happening, but mrview was segfaulting and I was trying to get a quick stack trace with catchsegv. I didn't think of it at the time, but I suppose I could have used gdb and caught the stack trace before the signal handler had a chance to eat it.

@Lestropie

This comment has been minimized.

Copy link
Member

Lestropie commented Jul 17, 2018

I don't know that much about the challenges that mrview faces ...

I know there's definitely some segfaults floating around that are occurring deep within the Qt5 library. However I don't yet know if those are genuine Qt5 faults or whether we're somehow misusing it or not initialising something; one would have thought that by version 5.10 they'd have ironed out the kind of basic errors I was seeing last time I checked. Unfortunately it's becoming increasingly hard to put time into such things given we've lost two core devs in the last 18 months.

As a relatively easy first step, perhaps you could just add a debug flag that turns off this handling ...

Yep, that'd be easy enough. A config file option would be best so as not to expand the command help pages with a rarely-used command-line option.

@jdtournier Thoughts on whether the handlers should be disabled if NDEBUG is not set?

... mrview was segfaulting and I was trying to get a quick stack trace with catchsegv.

Had never actually heard of catchsegv; I've always used gdb, and I think @jdtournier is the same. Hence why the signal handlers were perfectly sensible universally for us.

Is there a speed difference between catchsegv and running in gdb? If so the former might be useful in those cases where faults can't be reproduced with minimal examples.

Lestropie added a commit that referenced this issue Jul 17, 2018

@michaelkarlcoleman

This comment has been minimized.

Copy link
Author

michaelkarlcoleman commented Jul 17, 2018

Re catchsegv versus gdb, it's not so much a performance difference (though indeed gdb startup time must be slower), but rather that catchsegv can be painlessly prepended, much like time. In fact, though I hardly ever do, one could prepend it to any command one thought might crash.

The big problem with gdb is that actually passing the program arguments and starting the program is a production, and it's not easy to tell naive users how to do this. With catchsegv, the usage is trivial.

@jdtournier

This comment has been minimized.

Copy link
Member

jdtournier commented Jul 17, 2018

Interesting to hear about catchsegv, I'd never come across it before - thanks for the tip.

Regarding disabling of signal handling, my personal preference would be to set an environment variable. Maybe if MRTRIX_NOSIGNALS or something is set, then we don't bother setting up the signal handlers. Means we could instruct users to run:

MRTRIX_NOSIGNALS=1 catchsegv mrconvert ...

which will hopefully be simple enough?

I'm not keen on making that a config option, or dependent on the build config. We typically do want to catch signals and clean up if we can, even in debug code. I reckon the environment variable route is clean and unobtrusive enough - as long as it's properly documented...

I don't know that much about the challenges that mrview faces, but in general I'd hope for programs to (a) never (TM) segfault, and to (b) always give a clear indication when "out of space" conditions occur. I do realize that that can sometimes be more of a goal one journeys towards.

That's what we aim for too. Ideally, all our commands also run clean on valgrind. We encourage users to report segfaults and other issues so we can fix them, but there will inevitably be some edge conditions that we haven't considered that will trigger segfaults - typically only with bad inputs though, but still. As to (b), I'd love to find a portable way of reporting out of space situations, but I've not come across anything that does this reliably so far. We often end up with SIGBUS, which isn't terribly helpful. If you know of good approaches to this, I'm all ears.

jdtournier added a commit that referenced this issue Jul 18, 2018

remove naked excecpt: statements from python scripts
As discussed in #1391

Now using 'except: Exception' as a catch-all instead. This will not
catch KeyboardInterrupt, SystemExit, or GeneratorExit exception, since
by design these derive from BaseException directly - precisely to ensure
they don't get caught by an 'except: Exception' statement.
@michaelkarlcoleman

This comment has been minimized.

Copy link
Author

michaelkarlcoleman commented Jul 20, 2018

@jdtournier Yes, the env var idea would be great as far as I'm concerned. Thanks!

@Lestropie

This comment has been minimized.

Copy link
Member

Lestropie commented Jul 21, 2018

Mmm yes you're right, an envvar is better here.

... as long as it's properly documented...

The environment variables I can find are: MRTRIX_QUIET, MRTRIX_CONFIGFILE, MRTRIX_TMPFILE_PREFIX, MRTRIX_RNG_SEED, MRTRIX_NTHREADS. Here we could add MRTRIX_NOSIGNALS; #1033 / #1405 could add MRTRIX_TMPFILE_DIR. I suppose once the list starts getting to this kind of length, generate_user_docs.sh should be compiling these like it does with config file options?

Lestropie added a commit that referenced this issue Jul 21, 2018

Support environment variable MRTRIX_NOSIGNALS
Disables construction of signal handlers, which may be useful in particular debugging contexts.
As discussed in #1391.
@jdtournier

This comment has been minimized.

Copy link
Member

jdtournier commented Aug 8, 2018

I've just had a quick go at disabling the signal handlers via an environment variable - see branch disable_signal_handlers_with_env_var if you're interested (main change is 5c1e06f).

However, I note that gdb breaks immediately when a signal is raised, regardless of whether our signal handlers are in place. Not the case with catchsegv, as you've already observed. So I'm not sure whether we need to implement this. We could achieve this by asking users to run:

gdb --batch -ex "set logging on" -ex "run" -ex "thread apply all bt" --args command ...

which logs the session to gdb.txt, dumping a full backtrace of all threads in the process.

For convenience, we could also provide a debugging command, so users only need to prefix their command with it. If we add a bin/mrdebug command with the following contents:

#!/bin/bash

if which gdb &>/dev/null; then
  gdb --batch -ex "set logging on" -ex "run" -ex "thread apply all bt" --args "$@"
elif which lldb &>/dev/null; then
  lldb --batch -o "run" -k "thread backtrace all; bt all" -- "$@" | tee gdb.txt
else
  echo "ERROR: can't find suitable debuggeri in PATH - aborting"
  exit 1
fi

Then users would only need to invoke mrdebug command .... Note that this would need to be fixed up for lldb, what I've got here is what I could find out from a quick search, but it's completely untested, and I can't figure out whether I can log the session to file from lldb directly - I don't have access to a macOS system...

@Lestropie

This comment has been minimized.

Copy link
Member

Lestropie commented Aug 13, 2018

One other gotcha with the mrdebug idea is that I found on Windows I have to explicitly set breakpoints on abort() and exit().

@jdtournier

This comment has been minimized.

Copy link
Member

jdtournier commented Aug 14, 2018

One other gotcha with the mrdebug idea is that I found on Windows I have to explicitly set breakpoints on abort() and exit().

Even if the MRTRIX_NOSIGNALS environment variable is set...?

@jdtournier

This comment has been minimized.

Copy link
Member

jdtournier commented Jan 30, 2019

I think all the issues here have been addressed...?

@jdtournier jdtournier closed this Jan 30, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.