nix-daemon should not core dump on SIGINT (i.e. CTRL-C) #1692

Closed
Mic92 opened this issue Nov 21, 2017 · 7 comments
@Mic92
Member

Mic92 commented Nov 21, 2017

When nix-shell receives SIGINT, nix-daemon throws nix::Interrupted:

$ nix-shell -p some-new-package
downloading...
Hit Ctrl-C

and crash with the following error:

Nov 20 23:17:42 turingmachine nix-daemon[32676]: terminate called after throwing an instance of 'nix::Interrupted'
Nov 20 23:17:42 turingmachine nix-daemon[32676]:   what():  interrupted by the user

which leads to nix-daemon being core dumped:

Sat 2017-11-18 21:47:02 GMT    8220     0     0   6 error     /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/bin/nix-daemon
Sat 2017-11-18 23:03:03 GMT   13544     0     0   6 error     /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/bin/nix-daemon
Sun 2017-11-19 13:27:49 GMT    9122     0     0   6 error     /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/bin/nix-daemon
Mon 2017-11-20 10:06:53 GMT    3644     0     0   6 error     /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/bin/nix-daemon
Mon 2017-11-20 10:07:00 GMT   13544     0     0   6 error     /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/bin/nix-daemon
Mon 2017-11-20 10:10:10 GMT   19802     0     0   6 error     /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/bin/nix-daemon
Mon 2017-11-20 10:13:37 GMT   24917     0     0   6 error     /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/bin/nix-daemon
Mon 2017-11-20 10:17:13 GMT   26679     0     0   6 error     /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/bin/nix-daemon
Mon 2017-11-20 10:17:38 GMT   32140     0     0   6 error     /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/bin/nix-daemon
Mon 2017-11-20 10:18:36 GMT   18734     0     0   6 error     /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/bin/nix-daemon
Mon 2017-11-20 10:19:28 GMT    6854     0     0   6 error     /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/bin/nix-daemon
Mon 2017-11-20 10:25:08 GMT     754     0     0   6 error     /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/bin/nix-daemon
Mon 2017-11-20 10:25:18 GMT    8593     0     0   6 error     /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/bin/nix-daemon
Mon 2017-11-20 10:26:01 GMT   21676     0     0   6 error     /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/bin/nix-daemon
Mon 2017-11-20 11:47:57 GMT   20885     0     0   6 error     /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/bin/nix-daemon
Mon 2017-11-20 12:23:36 GMT   28990     0     0   6 error     /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/bin/nix-daemon
Mon 2017-11-20 14:56:42 GMT   18653     0     0   6 error     /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/bin/nix-daemon
Mon 2017-11-20 18:13:06 GMT   29323     0     0   6 error     /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/bin/nix-daemon
Mon 2017-11-20 18:14:22 GMT   31764     0     0   6 error     /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/bin/nix-daemon
Mon 2017-11-20 23:17:53 GMT   21778     0     0   6 error     /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/bin/nix-daemon

I guess a high-level catch block would avoid this.

example stacktrace:

sudo coredumpctl gdb 21778
(gdb) bt
#0  0x00007fb571d72274 in raise () from /nix/store/mjx71lmnlf4psm9942djjcd8b56hyk8b-glibc-2.26-75/lib/libc.so.6
#1  0x00007fb571d73675 in abort () from /nix/store/mjx71lmnlf4psm9942djjcd8b56hyk8b-glibc-2.26-75/lib/libc.so.6
#2  0x00007fb5728ff98d in __gnu_cxx::__verbose_terminate_handler() () from /nix/store/ni3s4gas1c95aqkkr67pla6k66chkmb8-gcc-6.4.0-lib/lib/libstdc++.so.6
#3  0x00007fb5728fd966 in ?? () from /nix/store/ni3s4gas1c95aqkkr67pla6k66chkmb8-gcc-6.4.0-lib/lib/libstdc++.so.6
#4  0x00007fb5728fc989 in ?? () from /nix/store/ni3s4gas1c95aqkkr67pla6k66chkmb8-gcc-6.4.0-lib/lib/libstdc++.so.6
#5  0x00007fb5728fd2dd in __gxx_personality_v0 () from /nix/store/ni3s4gas1c95aqkkr67pla6k66chkmb8-gcc-6.4.0-lib/lib/libstdc++.so.6
#6  0x00007fb57231ef63 in _Unwind_RaiseException_Phase2 () from /nix/store/mjx71lmnlf4psm9942djjcd8b56hyk8b-glibc-2.26-75/lib/libgcc_s.so.1
#7  0x00007fb57231f487 in _Unwind_Resume () from /nix/store/mjx71lmnlf4psm9942djjcd8b56hyk8b-glibc-2.26-75/lib/libgcc_s.so.1
#8  0x00007fb573126822 in void nix::Activity::result<unsigned long, unsigned long, unsigned long, unsigned long>(nix::ResultType, unsigned long const&, unsigned long const&, unsigned long const&, unsigned long const&) const () from /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/lib/libnixstore.so
#9  0x00007fb573159ae7 in nix::CurlDownloader::DownloadItem::progressCallbackWrapper(void*, double, double, double, double) ()
   from /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/lib/libnixstore.so
#10 0x00007fb571183f01 in Curl_pgrsUpdate () from /nix/store/45aigjqdg2wwmj777xw4pwjkil85nrcq-curl-7.56.1/lib/libcurl.so.4
#11 0x00007fb571184349 in Curl_pgrsDone () from /nix/store/45aigjqdg2wwmj777xw4pwjkil85nrcq-curl-7.56.1/lib/libcurl.so.4
#12 0x00007fb5711ab718 in multi_done () from /nix/store/45aigjqdg2wwmj777xw4pwjkil85nrcq-curl-7.56.1/lib/libcurl.so.4
#13 0x00007fb5711ad747 in curl_multi_remove_handle () from /nix/store/45aigjqdg2wwmj777xw4pwjkil85nrcq-curl-7.56.1/lib/libcurl.so.4
#14 0x00007fb573157e93 in nix::CurlDownloader::DownloadItem::~DownloadItem() () from /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/lib/libnixstore.so
#15 0x00007fb57315983a in std::_Rb_tree<void*, std::pair<void* const, std::shared_ptr<nix::CurlDownloader::DownloadItem> >, std::_Select1st<std::pair<void* const, std::shared_ptr<nix::CurlDownloader::DownloadItem> > >, std::less<void*>, std::allocator<std::pair<void* const, std::shared_ptr<nix::CurlDownloader::DownloadItem> > > >::_M_erase(std::_Rb_tree_node<std::pair<void* const, std::shared_ptr<nix::CurlDownloader::DownloadItem> > >*) () from /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/lib/libnixstore.so
#16 0x00007fb57315d9b0 in nix::CurlDownloader::workerThreadMain() () from /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/lib/libnixstore.so
#17 0x00007fb57315e79c in nix::CurlDownloader::workerThreadEntry() () from /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2/lib/libnixstore.so
#18 0x00007fb57292865f in ?? () from /nix/store/ni3s4gas1c95aqkkr67pla6k66chkmb8-gcc-6.4.0-lib/lib/libstdc++.so.6
#19 0x00007fb5720f82a5 in start_thread () from /nix/store/mjx71lmnlf4psm9942djjcd8b56hyk8b-glibc-2.26-75/lib/libpthread.so.0
#20 0x00007fb571e2e8af in clone () from /nix/store/mjx71lmnlf4psm9942djjcd8b56hyk8b-glibc-2.26-75/lib/libc.so.6
@Mic92 Mic92 changed the title from "nix 1.12 should not coredump on SIGINT (i.e. CTRL-C)" to "nix 1.12 should not core dump on SIGINT (i.e. CTRL-C)" Nov 21, 2017
@dtzWill
Member

dtzWill commented Nov 27, 2017

Hmm, well a number of commands such as nix-build do this properly AFAICT.

What invocation/command is used when this happens?

(probably could guess by exploring them, but easier to ask 😇)

@Mic92
Member Author

Mic92 commented Nov 28, 2017

$ nix-shell -p some-new-package
downloading...
Hit Ctrl-C

@dtzWill
Member

dtzWill commented Nov 29, 2017

Hmm, having problems reproducing (using the exact same nix you're using, /nix/store/k9cz2cp7fh7lv4il3b6fsmfrf398zpf4-nix-unstable-1.12pre5732_fd10f6f2).
Judging from the list of coredumps you posted, I'm guessing this is not difficult to "reproduce" on your end?

The coredump seems to be a symptom of the unhandled exception.

As I understand it, it's a big case of "YMMV" anytime exceptions are thrown across DSO boundaries, which appears to be the case here (libnixstore -> libnixutil/nix-shell, not sure)....
I recently tracked down a problem in my environment that was due to this (#1678) but I don't think that specific fix would help here.

It might also be something in the voodoo that is the code in Nix that catches interrupts and makes them do reasonable things re:exceptions and re:threads. Perhaps someone who understands it can comment 😁

Anyway if this is just another problem like #1678, perhaps a pass should be made that ensures all of our exception classes are anchored somewhere like libnixutil.so?

@dtzWill
Member

dtzWill commented Nov 29, 2017

Switched to nixUnstable entirely, and I can now reproduce! Yay? :).

For anyone else poking at this, the client looks fine (reports "error: interrupted by the user"), but nix-daemon logs (journalctl) show the problem.

@Mic92 Mic92 changed the title from "nix 1.12 should not core dump on SIGINT (i.e. CTRL-C)" to "nix-daemon should not core dump on SIGINT (i.e. CTRL-C)" Nov 29, 2017
@Mic92
Member Author

Mic92 commented Nov 29, 2017

Sorry, my error report was not precise enough. I have updated it.

@dtzWill
Member

dtzWill commented Nov 30, 2017

No worries, this looks to be trickier than I realized at first.

AFAICT the problem is the way in which nix-daemon dies (throwing an uncaught "Interrupted" exception, apparently?) when the client is interrupted with control-C.

(Control-C'ing the nix-daemon itself seems to work as expected)

Haha the commit history for the interrupt stuff at least acknowledges how much magic it involves 😁.

@dtzWill
Member

dtzWill commented Nov 30, 2017

Poked at this a bit more, here's what I've found:

In my own debugging, as well as in the reported stack trace, the problem is the result of throwing an exception from a destructor. This is bad: it results in program termination if it happens while the stack is already unwinding from another exception.

Reconstructed Timeline

  • Client disappears
  • MonitorFD notices this and triggers the interrupt state (which code checks in many places, throwing an Interrupted exception if it is set)
  • Download code notices this (it will do so almost immediately as part of the progress reporting callback if nothing else).
  • Download threads/etc start cleaning up
  • DownloadItem destructor (in trace from report) calls curl methods to perform cleanup, which invoke the progress reporting callback.
  • Progress reporting callback notices interrupt state and attempts to throw an Interrupted exception.[*]
  • Unwind code calls terminate() per language spec

[*] In some places the code has checks to avoid re-throwing Interrupted while unwinding, but either this path doesn't use those checks or they are not working. Possibly complicated by threads, not sure.
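
The kind of check the footnote refers to might look roughly like the sketch below. This is an illustration only, not Nix's actual code: `interruptRequested`, `Interrupted`, `checkInterruptSketch` and `CleanupGuard` are hypothetical names, and it assumes C++17 for `std::uncaught_exceptions()`.

```cpp
#include <atomic>
#include <exception>
#include <stdexcept>

// Hypothetical stand-ins for the daemon's interrupt flag and exception type.
std::atomic<bool> interruptRequested{false};

struct Interrupted : std::runtime_error {
    Interrupted() : std::runtime_error("interrupted by the user") {}
};

// Throws only when no other exception is in flight on the calling thread.
void checkInterruptSketch()
{
    if (interruptRequested && std::uncaught_exceptions() == 0)
        throw Interrupted();
}

// A guard whose destructor performs a check, as cleanup code might.
// noexcept(false) makes the destructor allowed to throw at all; throwing
// from it during unwinding would still call std::terminate().
struct CleanupGuard {
    ~CleanupGuard() noexcept(false) { checkInterruptSketch(); }
};

// Returns true if the first Interrupted propagates normally even though
// the guard's destructor runs a second check while the stack unwinds.
bool unwindsCleanly()
{
    interruptRequested = true;
    try {
        CleanupGuard guard;
        checkInterruptSketch(); // throws; guard's destructor runs during unwinding
    } catch (const Interrupted &) {
        interruptRequested = false;
        return true;
    }
    interruptRequested = false;
    return false;
}
```

Note that `std::uncaught_exceptions()` only counts exceptions on the calling thread, so a check like this says nothing about what other threads are doing, which may be part of why threads complicate matters here.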

Possible Solutions

Ignore Exceptions in DownloadItem destructor

~DownloadItem() already ignores exceptions for part of what it's doing, perhaps this should be extended to cover the curl shutdown/cleanup code?

This might be bad if it causes us to miss important download errors, but I'm not sure if that can happen.
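
As a rough sketch of what this option means in general (not the actual `~DownloadItem()` code), a destructor can wrap its whole cleanup in a catch-all. `ScopedTransfer`, its `cleanup_` callable and the `swallowed_` flag are hypothetical names used only for illustration.

```cpp
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for an object whose destructor runs cleanup
// that may call back into throwing code.
class ScopedTransfer {
public:
    ScopedTransfer(std::function<void()> cleanup, bool &swallowed)
        : cleanup_(std::move(cleanup)), swallowed_(swallowed) {}

    ~ScopedTransfer()
    {
        try {
            cleanup_();
        } catch (...) {
            // Nothing may escape a destructor safely, so only record
            // that an error was dropped; real code would probably log it.
            swallowed_ = true;
        }
    }

private:
    std::function<void()> cleanup_;
    bool &swallowed_;
};
```

The `swallowed_` flag makes the trade-off visible: whatever the cleanup threw is gone by the time the destructor returns, which is exactly the "missed download errors" concern.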

Don't throw from Callbacks, particularly progress reporting callback

Also possibly overkill, but without this we're a bit reliant on what callbacks curl decides to invoke while shutdown occurs.
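
One common way to keep exceptions from crossing a C library's frames is a trampoline that catches everything, stores the exception, and reports failure through the return value. The sketch below is an assumption-laden illustration rather than Nix code: `TransferState`, `progressTrampoline` and `rethrowPending` are hypothetical names. The only libcurl detail it relies on is that a progress callback set via CURLOPT_PROGRESSFUNCTION takes `(void *, double, double, double, double)` and aborts the transfer by returning non-zero.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>

// Hypothetical per-transfer state; the names are illustrative only.
struct TransferState {
    std::function<void(double, double)> onProgress; // may throw
    std::exception_ptr pending;                     // first exception seen, if any
};

// Same shape as a libcurl progress callback: returning non-zero
// asks the library to abort the transfer.
extern "C" int progressTrampoline(void *clientp, double dltotal, double dlnow,
                                  double /*ultotal*/, double /*ulnow*/)
{
    auto *state = static_cast<TransferState *>(clientp);
    try {
        state->onProgress(dlnow, dltotal);
        return 0;
    } catch (...) {
        // Keep the first exception so it can be re-raised later.
        if (!state->pending) state->pending = std::current_exception();
        return 1;
    }
}

// Called from C++ code after the library call has returned,
// where throwing is safe again.
void rethrowPending(TransferState &state)
{
    if (state.pending) {
        auto e = state.pending;
        state.pending = nullptr;
        std::rethrow_exception(e);
    }
}
```

With this shape, the caller would invoke `rethrowPending(state)` once control is back in ordinary C++ code, and would simply skip that call in destructors.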

Specifically ignore interrupted exceptions in progress reporting callback

Narrower in scope but should be safe.
Proposed implementation looks like:

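        int progressCallback(double dltotal, double dlnow)
        {
            try {
              // May throw nix::Interrupted once the interrupt state is set.
              act.progress(dlnow, dltotal);
            } catch (nix::Interrupted &) {
              // Swallow it here; the return value below signals the abort.
              assert(_isInterrupted);
            }
            // Non-zero tells curl to abort the transfer.
            return _isInterrupted;
        }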

Since this callback uses _isInterrupted to tell curl that it should abort, ignoring any Interrupted exceptions encountered (which are only thrown once _isInterrupted is set, hence the assert) seems reasonable.

In my testing this fixes the reported issue.

Specifically ignoring interrupted exception in destructor

This should (untested) fix the problem, but since the curl code performing cleanup would have been interrupted by an exception, any cleanup it would normally do after the callback may not be executed.
(Also, I'm not sure whether libcurl's state can be relied upon to be consistent if this occurs.)

Summary

What a doozy. The code above fixes the problem in my testing, I'll submit it as a PR shortly.

Hope this helps, thoughts/comments welcome; there are a lot of details to be accounted for here :).

dtzWill added a commit to dtzWill/nix that referenced this issue Nov 30, 2017
dtzWill added a commit to dtzWill/nix that referenced this issue Feb 5, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Feb 20, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Feb 22, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Feb 23, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Feb 28, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Mar 1, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Mar 2, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Mar 3, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Mar 6, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Mar 6, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Mar 7, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Mar 9, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Mar 14, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Mar 14, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Mar 15, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Mar 15, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Mar 19, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Mar 19, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Mar 20, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Mar 20, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Mar 20, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Mar 21, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Mar 21, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Mar 22, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Mar 22, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Mar 26, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Mar 27, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Mar 29, 2018
@shlevy shlevy added the backlog label Apr 1, 2018
@shlevy shlevy self-assigned this Apr 1, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Apr 4, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Apr 11, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Apr 18, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Apr 19, 2018
dtzWill added a commit to dtzWill/nix that referenced this issue Apr 20, 2018