
Signal watching uses signal.set_wakeup_fd, which is fundamentally unreliable #118

Closed
njsmith opened this issue Nov 20, 2016 · 30 comments

@njsmith
Contributor

njsmith commented Nov 20, 2016

This isn't urgent, it's more of an edge-case thing, but should be fixed at some point...

Right now, the way the signal-watching machinery detects a signal is that it uses the stdlib's signal.set_wakeup_fd to request that when signal X is detected, the byte X should be written to a pipe. Then the curio kernel reads from this pipe, and when it sees byte X it knows that signal X was detected.

The problem is that pipes have a limited capacity, so if too many signals arrive in a short period of time (before the kernel task wakes up to read from the pipe), then eventually the pipe buffer will fill up and any new signals will just get dropped on the floor.

The correct way to do this is that when a signal arrives, we should record that information in some out-of-band location, and use the pipe only for wakeups. You can't use a wake-up pipe as a data transmission mechanism. I'm not sure why signal.set_wakeup_fd even exists. For some reason signals seem to confuse people and cause them to make this mistake over and over again (cf. POSIX's frustrating "real-time signals" API, which also suffers from this basic misdesign -- it provides signal queues, but oops, they might overflow so your code has to be able to get on without them).

Probably the Right Thing is to keep a set or a bitmask representing which signals have arrived, and a signal handler which updates the set or bitmask and then triggers a wakeup by writing to the pipe. The advantage of a set representation over a full-fledged deque is that the set has bounded size (there are only finitely many distinct signals); the disadvantage is that if multiple identical signals arrive in a short time period, they can get compressed into one. But... this is actually how Unix signals have always worked -- they're flags, not distinct events. When a signal arrives, the kernel sets a flag on the process, and then the next time the process is runnable it runs the signal handler for each flag that's set. Because the kernel doesn't want to commit unbounded storage either. So that's OK.

@1st1

1st1 commented Nov 20, 2016

The problem is that pipes have a limited capacity, so if too many signals arrive in a short period of time (before the kernel task wakes up to read from the pipe), then eventually the pipe buffer will fill up and any new signals will just get dropped on the floor.

How large is the buffer? I suspect it's at least a few kilobytes, which means that you need thousands of signals/s to have any kind of problem.

@njsmith
Contributor Author

njsmith commented Nov 20, 2016

@1st1: Linux pipes generally use a 4096 byte buffer (= 1 page), though curio uses a socketpair for its wakeup pipe rather than an actual pipe, and socketpairs are a bit different; not sure how their buffering works.

But, yes, like I said, this is an edge case kind of issue -- you can go a long time without hitting it; maybe forever if you're lucky. Nonetheless, we should not be using an unreliable mechanism when a reliable mechanism is available.

It's not hard to receive thousands of signals in between passes through the scheduling loop -- that can easily be tens of milliseconds. It probably indicates the system's under some sort of weird stress, but that's no excuse for doing the wrong thing.
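Rough arithmetic for this claim (an illustration only; the 10 ms gap between scheduler passes is an assumption, and one byte is queued per signal):

```python
# How fast must signals arrive to overflow the wakeup buffer between
# two scheduler passes?
pass_interval = 0.010                  # assumed: ~10 ms between passes

print(4096 / pass_interval)   # 409600.0 signals/s for a 4096-byte pipe
print(278 / pass_interval)    # 27800.0 signals/s for a small (~278-byte)
                              # socketpair buffer
```

So with a small socketpair buffer, sustained tens of thousands of signals per second suffice, and a single burst of a few hundred signals during one stalled pass is enough to start dropping them.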

@dabeaz
Owner

dabeaz commented Nov 20, 2016

Open to patches. This isn't something I'm likely to look at in the short-term.

@njsmith
Contributor Author

njsmith commented Feb 4, 2017

For reference, the actual empirical socketpair default buffer size seems to be 278 bytes on Linux, and 8192 bytes on MacOS. (On Linux, the accounting is weird; it takes fewer bytes to hit the limit if you split them up over more calls to send.) This can be tweaked by adjusting SO_SNDBUF and SO_RCVBUF.

[Edit: code used to check this:

import socket, itertools

a, b = socket.socketpair()
a.setblocking(False)
b.setblocking(False)
try:
    for i in itertools.count():
        a.send(b"x")         # keep sending 1-byte writes...
except BlockingIOError:
    print(i)                 # ...until the buffer fills; i = capacity in bytes

]

@nirs

nirs commented Feb 27, 2017

Pipes on Linux have used a 16-page (64 KiB) buffer for a long time. 4096 bytes is the maximum size you can write atomically to a pipe (PIPE_BUF), not the limit of the pipe size.

@njsmith, do you have a real-world use case for a process receiving many signals in a short time? Did you try to reproduce this?
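For reference, the actual capacity of a given pipe can be queried with the Linux-only F_GETPIPE_SZ fcntl. The fcntl module only exposes the constant in newer Pythons, so the raw value 1032 is used as a fallback here (Linux-specific; a sketch, not portable code):

```python
import fcntl
import os

# F_GETPIPE_SZ is Linux-specific; fall back to its raw value if the
# fcntl module doesn't expose it on this Python version.
F_GETPIPE_SZ = getattr(fcntl, "F_GETPIPE_SZ", 1032)

r, w = os.pipe()
capacity = fcntl.fcntl(r, F_GETPIPE_SZ)
print(capacity)   # typically 65536 (16 pages) on modern Linux
os.close(r)
os.close(w)
```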

@imrn

imrn commented Feb 28, 2017

Reproduced or not, this is a plausible case even if the probability is low.
Data can be lost on loaded pipes if the write side is non-blocking or does not
conform to fully async principles. It is interesting that the Python signal documentation
says: "signal.set_wakeup_fd(fd): fd must be non-blocking." (Probably due to
other reasons.)

Now let's think about the flag solution: once a signal is received, the main thread is preemptively
interrupted and begins to run the signal handler, which sets the flag. But what if the main
thread was just about to reset that flag right before the handler kicks in? Upon continuation,
the flag will be cleared and again there is data loss. Oops.

How about this suggestion: let's protect that flag context with thread synchronization
primitives. Then have our signal handler launch another thread that modifies
the protected flag, and of course writes to curio's notify socket to wake it and process the
new situation. But wait: what if there is more congestion on curio's notify socketpair?
To make it foolproof you would need a blocking write. Let's do it...
Oops. We are now blocked in curio's home thread, waiting for curio to clear things up.
Deadlocked...

Yeah, this is an unnecessarily long, complex, and again unreliable thing. But it seems
that if we dig enough, such awkward things keep coming up, like the inability
to access files asynchronously. :/

@dabeaz
Owner

dabeaz commented Mar 1, 2017

How do other libraries (Twisted, gevent, etc.) handle this?

@imrn

imrn commented Mar 1, 2017

How about this suggestion: let's protect that flag context with thread synchronization
primitives. Then have our signal handler launch another thread that modifies
the protected flag, and of course writes to curio's notify socket to wake it and process the
new situation. But wait: what if there is more congestion on curio's notify socketpair?
To make it foolproof you would need a blocking write. Let's do it...
Oops. We are now blocked in curio's home thread, waiting for curio to clear things up.
Deadlocked...

What about the "helper thread" making the blocking write to the notify socket?
Any problem with this? If so, this would be another case that depends on
"helper threads", like async files.

@dabeaz
Owner

dabeaz commented Mar 1, 2017

Is there an actual problem with Curio's signal handling right now or is all of this being directed at a theoretical bug?

@imrn

imrn commented Mar 1, 2017

Since I have no serious use of signals, I have no problems with it. But let's decide
on the general theme for curio discussions: is it "if it works, don't touch it", or
are possible improvements/questions/prior art elsewhere/etc. welcome?
Since nobody can hard-press anybody, I'd always assumed the latter.
But if you have any objections, please make that clear and let's continue like that.

@dabeaz
Owner

dabeaz commented Mar 1, 2017

As a general rule, I have a pretty pragmatic attitude towards things. Signal handling, in particular, is a known area of hell for systems programming--especially when it starts getting mixed up with threads. With respect to Curio, I consider signal handling to be important, but I also think that probably 99% of the uses of signals are going to be related to process termination, interruption, or restart (i.e., SIGTERM, SIGINT, and SIGHUP). These aren't the kinds of signals that are going to be arriving at a high rate of speed, causing signals to be lost. Most other signals, for that matter, often indicate various program faults or error conditions (SIGSEGV, SIGILL, SIGFPE, SIGABRT, etc.) or things related to terminals/job control (SIGTSTP, SIGTTIN, SIGTTOU, etc.). None of these are going to arrive at high rates of speed either.

There are some signals related to asynchronous I/O such as (SIGIO, SIGPOLL, etc.) that could potentially come in fast, but using those would be weird given that the whole point of using Curio is to do asynchronous I/O in the first place---I'm not sure why you'd be doing it in a manner that additionally adds signal handling into the mix unless you were some kind of masochist. There are applications involving interval timers (SIGALRM) that could cause a lot of signals, but Curio already has timing functionality. So, I'm not entirely sure how common it would be for someone to combine an interval timer signal with Curio. And even if they did set up an interval timer, how often would it have to be firing to cause signals to be dropped?

The bottom line is that I'm all for improvement, but working on far-out edge cases isn't a super high priority. I'd much prefer to find out if people are actually experiencing a problem in the real world. I'd also like to know if these problems have been addressed in other libraries with a much larger user base (e.g., Twisted, Gevent, etc.). If the problem has been addressed elsewhere, that is certainly more motivation for looking at it in Curio. If not, I'm inclined to let sleeping dogs lie.

@dabeaz
Owner

dabeaz commented Mar 8, 2017

In the course of cleaning up some kernel internals and looking at various things, I've concluded that I can implement signal handling entirely "out of kernel". That is, it doesn't have to be in the kernel at all, but could be done using user-level code in Curio. It could continue to use the wake-up fd approach that's used now. However, something else entirely could be used as well. I'm going to experiment, but I'm open to ideas.

@njsmith
Contributor Author

njsmith commented Mar 13, 2017

In case it's of interest, here's how trio does it: https://github.com/njsmith/trio/blob/master/trio/_signals.py

It relies on trio's core providing a call_soon_thread_and_signal_safe primitive which is used for all forms of kernel reentry, and then there are signal-specific hoops to jump through (in particular, it coalesces identical signals, which is probably over-engineering but whatever, and more importantly it takes some care to avoid a race condition when signals arrive just as we're about to stop listening for them).

While we're talking about signal handling challenges, this thread might also be of interest: python-trio/trio#42
I know Curio doesn't support Windows right now, but it turns out that control-C on windows doesn't interrupt select unless you jump through some hoops.

@dabeaz
Owner

dabeaz commented Mar 14, 2017

If anything, all of this reaffirms my decision to take signal handling out of the Curio kernel core ;-). At the moment, I'm doing some different kinds of handling involving events/queues. But it's all Curio user-level code so I suppose it's something that could be played around with if necessary. And windows... ugh.

@imrn

imrn commented Mar 19, 2017

This is not directly related to this issue, but another warning about signals in Python:

Procedure:

  • Register your signal handler in the main thread
  • Launch a thread, T1
  • Launch a process, P1, from T1
  • SIGCHLD is never received in the main thread

There are all sorts of warnings around the net about
python/thread/signal combinations but not this one.

I'd like to hear similar experiences about this case to confirm the
behavior or to find out if I'm doing something wrong.

@njsmith
Contributor Author

njsmith commented Mar 19, 2017 via email

@imrn

imrn commented Mar 19, 2017 via email

@nirs

nirs commented Mar 19, 2017

@imrn this seems to work as expected:

$ cat test.py 
import signal
import subprocess
import threading

def sigchld(signo, frame):
    print("received signal: %d" % signo)

signal.signal(signal.SIGCHLD, sigchld)

def run_child():
    subprocess.call("true")

t = threading.Thread(target=run_child)
t.start()
t.join()
$ python3 test.py 
received signal: 17

@imrn

imrn commented Mar 20, 2017 via email

@dabeaz
Owner

dabeaz commented Mar 20, 2017

I used to teach a concurrency course (primarily focused on threads). In that class, I usually advised keeping the main thread idle and running everything else in separate threads. The main reason was to make it so that KeyboardInterrupt could continue to work. However, that same advice applies to signals.

I'd probably have to think about how this impacts Curio. I haven't noticed an issue with signal delivery, but yes, if the main thread were getting blocked up for some reason, it might make more sense to run the Curio kernel in a separate thread.

@imrn

imrn commented Mar 20, 2017

It seems Python runtime/threading constraints are the reason
for the limited signal handling capability:

Interruption of system calls and library functions by signal handlers
http://man7.org/linux/man-pages/man7/signal.7.html

@njsmith
Contributor Author

njsmith commented Mar 20, 2017 via email

@nirs

nirs commented Mar 20, 2017

I think time.sleep in the main thread does not work since the child thread receives the SIGCHLD signal (it is the thread that spawned the child process). Python writes a byte to the wakeup fd, but the main thread is not watching this fd, so it continues to sleep, and you receive the signal only when the sleep returns. We had this issue in vdsm, and we solved it using signal.set_wakeup_fd, which works reliably; nothing is fundamentally wrong with it.

See https://github.com/oVirt/vdsm/blob/master/lib/vdsm/common/sigutils.py.

Here is a simplified example:

$ cat test.py 
import fcntl
import logging
import os
import select
import signal
import subprocess
import threading


def sigchld(signo, frame):
    logging.info("received signal: %d", signo)


def set_non_blocking(fd):
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    flags |= os.O_NONBLOCK
    fcntl.fcntl(fd, fcntl.F_SETFL, flags)


def run_child():
    logging.info("starting child...")
    subprocess.call(["sleep", "1"])
    logging.info("child terminated")


logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s (%(threadName)s) %(message)s")

signal.signal(signal.SIGCHLD, sigchld)

pipe = os.pipe()
set_non_blocking(pipe[0])
set_non_blocking(pipe[1])

signal.set_wakeup_fd(pipe[1])

t = threading.Thread(target=run_child)
t.start()

logging.info("waiting 10 seconds...")
r, _, _ = select.select([pipe[0]], [], [], 10.0)
if not r:
    logging.info("timeout, no signal received")
logging.info("terminating")

Here is example run:

$ python test.py 
2017-03-20 22:15:25,631 (Thread-1) starting child...
2017-03-20 22:15:25,632 (MainThread) waiting 10 seconds...
2017-03-20 22:15:26,633 (MainThread) received signal: 17
2017-03-20 22:15:26,634 (Thread-1) child terminated
2017-03-20 22:15:26,634 (MainThread) terminating

This issue exists only with SIGCHLD from child processes started on another thread. Signals sent to the process can be handled with much less code.

@njsmith
Contributor Author

njsmith commented Mar 21, 2017

This issue exists only with SIGCHLD from child processes started on another thread. Signals sent to the process can be handled with much less code.

This might be true in practice, but it's not true in theory – POSIX says that when a signal is sent to the process, the kernel is free to pick any arbitrary thread (that doesn't have that signal masked) for the delivery. Not sure if it matters, but it's always nice to code to the spec rather than implementation quirks when possible...

For some reason I couldn't get set_wakeup_fd to work to solve the analogous version of the problem on Windows (though I agree that it ought to work!). I'm assuming you haven't tried running vdsm on Windows though :-). set_wakeup_fd is probably a good solution for curio though.

I think a more general solution would be for the CPython C-level signal handler to do something like:

if (this_is_not_main_thread) {
    pthread_kill(main_thread, signum);
} else {
    // actual signal handler
}

It looks like there's sort of an existing bug for this here: https://bugs.python.org/issue21895

@njsmith
Contributor Author

njsmith commented Mar 21, 2017

Oh, and returning somewhat to the original topic of this issue... it looks like set_wakeup_fd is fundamentally unreliable even when used solely to trigger a wakeup with actual content carried over other channels :-(. The problem is that if the wakeup fd's internal buffer overflows, then what we want is to ignore this and carry on – if we're just using the fd to trigger wakeups then the only states that matter are "empty" and "not-empty", and if the buffer is overflowing then it's already in the "not-empty" state. But CPython's actual wakeup_fd handling doesn't work like this: what it actually does if the buffer overflows is to blow up your program by causing an OSError to materialize at some arbitrary bytecode instruction.

In trio we only use the wakeup fd for wakeups, so we set its buffer as small as possible to avoid wasting kernel memory. On Linux "as small as possible" means 6 entries, and on MacOS it means 1. So I definitely can't use set_wakeup_fd on these platforms. Or rather, I can, but only if I bump up the buffer size to some "reasonable" amount that's hopefully big enough to avoid ever hitting the limit without wasting too much memory. Sigh.
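The buffer tweaking mentioned above can be combined with the counting trick from earlier in the thread to see what a given SO_SNDBUF setting actually buys (a Linux-flavoured sketch; the kernel doubles the requested value for bookkeeping overhead and enforces a minimum, so the measured capacity won't match the request exactly):

```python
import itertools
import socket

def measure_capacity(sock):
    """Count how many 1-byte non-blocking sends fit before EWOULDBLOCK."""
    sock.setblocking(False)
    try:
        for i in itertools.count():
            sock.send(b"x")
    except BlockingIOError:
        return i

a, b = socket.socketpair()
# Request a 4096-byte send buffer, then read back what we actually got.
a.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4096)
sndbuf = a.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
capacity = measure_capacity(a)
print(sndbuf, capacity)
```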

@nirs

nirs commented Mar 21, 2017

@njsmith,

Python 2 ignores an overflowed buffer, and Python 3 (checked 3.5 and 3.7.0a) writes a warning to stderr; it certainly does not blow up:

Exception ignored when trying to write to the signal wakeup fd:
BlockingIOError: [Errno 11] Resource temporarily unavailable

Can you show a real program affected by these fundamental issues?

@dabeaz
Owner

dabeaz commented Mar 21, 2017

I'm still not convinced that this set_wakeup_fd() issue is a concern in practice. If Python gets a signal, it writes to that fd. If there's a chance it might overflow the buffer because nothing is listening, then make the buffer bigger. Or don't write code that blocks the signal listener for long periods.

Regarding earlier messages: is there some particular reason someone would be fooling around with SIGCHLD in combination with the subprocess module? Functions in that module already reap the exit code. There's almost no reason to be writing a SIGCHLD handler to avoid zombies there. Maybe I'm missing something obvious.

@imrn

imrn commented Mar 21, 2017

This is curio's subprocess wait function.
I think it assumes it will be called when the process has already exited,
or is just about to exit, so that the wasted CPU cycles and reduced application
responsiveness are negligible.

What if an application launches a long-running process and starts waiting for it?
What is the proper way to wait for a subprocess?

async def wait(self):
    while True:
        retcode = self._popen.poll()
        if retcode is not None:
            return retcode
        await sleep(0.0005)
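One common thread-based alternative to the polling loop above (just a sketch, not curio's API; `wait_in_thread` is a hypothetical helper in the spirit of the "helper threads" discussed earlier) is to park the blocking Popen.wait() in a helper thread and signal completion through an event, so no polling interval is needed:

```python
import subprocess
import threading

def wait_in_thread(popen):
    """Return an Event that is set when the process exits.

    After the event fires, popen.returncode holds the exit status.
    """
    done = threading.Event()

    def waiter():
        popen.wait()   # blocks the helper thread, not the event loop
        done.set()

    threading.Thread(target=waiter, daemon=True).start()
    return done

# Usage sketch:
proc = subprocess.Popen(["sleep", "0.1"])
done = wait_in_thread(proc)
done.wait(timeout=5)
print(proc.returncode)  # 0
```

An event loop would then watch the event (or have the waiter write to a notify fd) instead of waking up every half millisecond to poll.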

@njsmith
Contributor Author

njsmith commented Mar 22, 2017

@nirs: doh, you're right, I misread the C code. A bogus warning that can't be silenced is better than a bogus exception, but still not so great :-)

@dabeaz
Owner

dabeaz commented Mar 22, 2017

I'm going to punt the subprocess.wait() issue to a different issue. That definitely needs to be addressed.

@dabeaz dabeaz closed this as completed Nov 18, 2018