
Signal watching uses signal.set_wakeup_fd, which is fundamentally unreliable #118

Closed
njsmith opened this issue Nov 20, 2016 · 30 comments

@njsmith
Contributor

njsmith commented Nov 20, 2016

This isn't urgent, it's more of an edge-case thing, but should be fixed at some point...

Right now, the way the signal-watching machinery detects a signal is that it uses the stdlib's signal.set_wakeup_fd to request that when signal X is detected, the byte X should be written to a pipe. Then the curio kernel reads from this pipe, and when it sees byte X it knows that signal X was detected.

The problem is that pipes have a limited capacity, so if too many signals arrive in a short period of time (before the kernel task wakes up to read from the pipe), then eventually the pipe buffer will fill up and any new signals will just get dropped on the floor.

The correct way to do this is that when a signal arrives, we should record that information in some out-of-band location, and use the pipe only for wakeups. You can't use a wake-up pipe as a data transmission mechanism. I'm not sure why signal.set_wakeup_fd even exists. For some reason signals seem to confuse people and cause them to make this mistake over and over again (cf. POSIX's frustrating "real-time signals" API, which also suffers from this basic misdesign -- it provides signal queues, but oops, they might overflow so your code has to be able to get on without them).

Probably the Right Thing is to keep a set or a bitmask representing which signals have arrived, and a signal handler which updates the set or bitmask and then triggers a wakeup by writing to the pipe. The advantage of a set representation over a full-fledged deque is that the set has bounded size (there are only finitely many distinct signals); the disadvantage is that if multiple identical signals arrive in a short time period, they can get compressed into one. But... this is actually how Unix signals have always worked -- they're flags, not distinct events. When a signal arrives, the kernel sets a flag on the process, and then the next time the process is runnable it runs the signal handler for each flag that's set. Because the kernel doesn't want to commit unbounded storage either. So that's OK.

@1st1

1st1 commented Nov 20, 2016

The problem is that pipes have a limited capacity, so if too many signals arrive in a short period of time (before the kernel task wakes up to read from the pipe), then eventually the pipe buffer will fill up and any new signals will just get dropped on the floor.

How large is the buffer? I suspect it's at least a few kilobytes, which means that you need thousands of signals/s to have any kind of problem.

@njsmith
Contributor Author

njsmith commented Nov 20, 2016

@1st1: Linux pipes generally use a 4096 byte buffer (= 1 page), though curio uses a socketpair for its wakeup pipe rather than an actual pipe, and socketpairs are a bit different; not sure how their buffering works.

But, yes, like I said, this is an edge case kind of issue -- you can go a long time without hitting it; maybe forever if you're lucky. Nonetheless, we should not be using an unreliable mechanism when a reliable mechanism is available.

It's not hard to receive thousands of signals in between passes through the scheduling loop -- that can easily be tens of milliseconds. It probably indicates the system's under some sort of weird stress, but that's no excuse for doing the wrong thing.
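Rough arithmetic for this claim (an illustration only; the 10 ms gap between scheduler passes is an assumption, and one byte is queued per signal):

```python
# How fast must signals arrive to overflow the wakeup buffer between
# two scheduler passes?
pass_interval = 0.010                  # assumed: ~10 ms between passes

print(4096 / pass_interval)   # 409600.0 signals/s for a 4096-byte pipe
print(278 / pass_interval)    # 27800.0 signals/s for a small (~278-byte)
                              # socketpair buffer
```

So with a small socketpair buffer, sustained tens of thousands of signals per second suffice, and a single burst of a few hundred signals during one stalled pass is enough to start dropping them.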

@dabeaz
Owner

dabeaz commented Nov 20, 2016

Open to patches. This isn't something I'm likely to look at in the short-term.

@njsmith
Contributor Author

njsmith commented Feb 4, 2017

For reference, the actual empirical socketpair default buffer size seems to be 278 bytes on Linux, and 8192 bytes on MacOS. (On Linux, the accounting is weird; it takes fewer bytes to hit the limit if you split them up over more calls to send.) This can be tweaked by adjusting SO_SNDBUF and SO_RCVBUF.

[Edit: code used to check this:

import socket, itertools

a, b = socket.socketpair()
a.setblocking(False)
b.setblocking(False)
try:
    for i in itertools.count():
        a.send(b"x")         # keep sending 1-byte writes...
except BlockingIOError:
    print(i)                 # ...until the buffer fills; i = capacity in bytes

]

@nirs

nirs commented Feb 27, 2017

Pipes on Linux have used a 16-page (64 KiB) buffer for a long time. 4096 bytes is the maximum size you can write atomically to a pipe (PIPE_BUF), not the limit of the pipe size.

@njsmith, do you have a real-world use case for a process receiving many signals in a short time? Did you try to reproduce this?
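For reference, the actual capacity of a given pipe can be queried with the Linux-only F_GETPIPE_SZ fcntl. The fcntl module only exposes the constant in newer Pythons, so the raw value 1032 is used as a fallback here (Linux-specific; a sketch, not portable code):

```python
import fcntl
import os

# F_GETPIPE_SZ is Linux-specific; fall back to its raw value if the
# fcntl module doesn't expose it on this Python version.
F_GETPIPE_SZ = getattr(fcntl, "F_GETPIPE_SZ", 1032)

r, w = os.pipe()
capacity = fcntl.fcntl(r, F_GETPIPE_SZ)
print(capacity)   # typically 65536 (16 pages) on modern Linux
os.close(r)
os.close(w)
```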

@imrn

imrn commented Feb 28, 2017

Reproduced or not, this is a plausible case even if the probability is low.
Data can be lost on loaded pipes if the write side is non-blocking or does not
conform to fully async principles. It is interesting that the Python signal documentation
says: "signal.set_wakeup_fd(fd): fd must be non-blocking." (Probably due to
other reasons.)

Now let's think about the flag solution: once a signal is received, the main thread is preemptively
interrupted and begins to run the signal handler, which sets the flag. But what if the main
thread was just about to reset that flag right before the handler kicks in? Upon continuation,
the flag will be cleared and again there is data loss. Oops.

How about this suggestion: let's protect that flag context with thread synchronization
primitives. Then have our signal handler launch another thread that modifies
the protected flag, and of course writes to curio's notify socket to wake it and process the
new situation. But wait: what if there is more congestion on curio's notify socketpair?
To make it foolproof you would need a blocking write. Let's do it...
Oops. We are now blocked in curio's home thread, waiting for curio to clear things up.
Deadlocked...

Yeah, this is an unnecessarily long, complex, and again unreliable thing. But it seems
that if we dig enough, such awkward things keep coming up, like the inability
to access files asynchronously. :/

@dabeaz
Owner

dabeaz commented Mar 1, 2017

How do other libraries (Twisted, gevent, etc.) handle this?

@imrn

imrn commented Mar 1, 2017

How about this suggestion: let's protect that flag context with thread synchronization
primitives. Then have our signal handler launch another thread that modifies
the protected flag, and of course writes to curio's notify socket to wake it and process the
new situation. But wait: what if there is more congestion on curio's notify socketpair?
To make it foolproof you would need a blocking write. Let's do it...
Oops. We are now blocked in curio's home thread, waiting for curio to clear things up.
Deadlocked...

What about the "helper thread" making the blocking write to the notify socket?
Any problem with this? If so, this would be another case that depends on
"helper threads", like async files.

@dabeaz
Owner

dabeaz commented Mar 1, 2017

Is there an actual problem with Curio's signal handling right now or is all of this being directed at a theoretical bug?

@imrn

imrn commented Mar 1, 2017

Since I have no serious use of signals, I have no problems with it. But let's decide
on the general theme for curio discussions: is it "if it works, don't touch it", or
are possible improvements/questions/prior art elsewhere/etc. welcome?
Since nobody can hard-press anybody, I'd always assumed the latter.
But if you have any objections, please make that clear and let's continue like that.

@dabeaz
Owner

dabeaz commented Mar 1, 2017

As a general rule, I have a pretty pragmatic attitude towards things. Signal handling, in particular, is a known area of hell for systems programming--especially when it starts getting mixed up with threads. With respect to Curio, I consider signal handling to be important, but I also think that probably 99% of the uses of signals are going to be related to process termination, interruption, or restart (i.e., SIGTERM, SIGINT, and SIGHUP). These aren't the kinds of signals that are going to be arriving at a high rate of speed, causing signals to be lost. Most other signals, for that matter, often indicate various program faults or error conditions (SIGSEGV, SIGILL, SIGFPE, SIGABRT, etc.) or things related to terminals/job control (SIGTSTP, SIGTTIN, SIGTTOU, etc.). None of these are going to arrive at high rates of speed either.

There are some signals related to asynchronous I/O such as (SIGIO, SIGPOLL, etc.) that could potentially come in fast, but using those would be weird given that the whole point of using Curio is to do asynchronous I/O in the first place---I'm not sure why you'd be doing it in a manner that additionally adds signal handling into the mix unless you were some kind of masochist. There are applications involving interval timers (SIGALRM) that could cause a lot of signals, but Curio already has timing functionality. So, I'm not entirely sure how common it would be for someone to combine an interval timer signal with Curio. And even if they did set up an interval timer, how often would it have to be firing to cause signals to be dropped?

The bottom line is that I'm all for improvement, but working on far-out edge cases isn't a super high priority. I'd much prefer to find out if people are actually experiencing a problem in the real world. I'd also like to know if these problems have been addressed in other libraries with a much larger user base (e.g., Twisted, Gevent, etc.). If the problem has been addressed elsewhere, that is certainly more motivation for looking at it in Curio. If not, I'm inclined to let sleeping dogs lie.

@dabeaz
Owner

dabeaz commented Mar 8, 2017

In the course of cleaning up some kernel internals and looking at various things, I've concluded that I can implement signal handling entirely "out of kernel". That is, it doesn't have to be in the kernel at all, but could be done using user-level code in Curio. It could continue to use the wake-up fd approach that's used now. However, something else entirely could be used as well. I'm going to experiment, but I'm open to ideas.

@njsmith
Contributor Author

njsmith commented Mar 13, 2017

In case it's of interest, here's how trio does it: https://github.com/njsmith/trio/blob/master/trio/_signals.py

It relies on trio's core providing a call_soon_thread_and_signal_safe primitive which is used for all forms of kernel reentry, and then there are signal-specific hoops to jump through (in particular, it coalesces identical signals, which is probably over-engineering but whatever, and more importantly it takes some care to avoid a race condition when signals arrive just as we're about to stop listening for them).

While we're talking about signal handling challenges, this thread might also be of interest: python-trio/trio#42
I know Curio doesn't support Windows right now, but it turns out that control-C on windows doesn't interrupt select unless you jump through some hoops.

@dabeaz
Owner

dabeaz commented Mar 14, 2017

If anything, all of this reaffirms my decision to take signal handling out of the Curio kernel core ;-). At the moment, I'm doing some different kinds of handling involving events/queues. But it's all Curio user-level code so I suppose it's something that could be played around with if necessary. And windows... ugh.

@imrn

imrn commented Mar 19, 2017

This is not directly related to this issue, but another warning about signals in Python:

Procedure:

  • Register your signal handler in the main thread
  • Launch a thread, T1
  • Launch a process, P1, from T1
  • SIGCHLD is never received in the main thread

There are all sorts of warnings around the net about
python/thread/signal combinations but not this one.

I'd like to hear similar experiences about this case to confirm the
behavior or to find out if I'm doing something wrong.

@njsmith
Contributor Author

njsmith commented Mar 19, 2017 via email

@imrn

imrn commented Mar 19, 2017 via email

@nirs

nirs commented Mar 19, 2017

@imrn this seems to work as expected:

$ cat test.py 
import signal
import subprocess
import threading

def sigchld(signo, frame):
    print("received signal: %d" % signo)

signal.signal(signal.SIGCHLD, sigchld)

def run_child():
    subprocess.call("true")

t = threading.Thread(target=run_child)
t.start()
t.join()
$ python3 test.py 
received signal: 17

@imrn

imrn commented Mar 20, 2017 via email

@dabeaz
Owner

dabeaz commented Mar 20, 2017

I used to teach a concurrency course (primarily focused on threads). In that class, I usually advised keeping the main thread idle and running everything else in separate threads. The main reason was to make it so that KeyboardInterrupt could continue to work. However, that same advice applies to signals.

I'd probably have to think about how this impacts Curio. I haven't noticed an issue with signal delivery, but yes, if the main thread were getting blocked up for some reason, it might make more sense to run the Curio kernel in a separate thread.

@imrn

imrn commented Mar 20, 2017

It seems Python runtime/threading constraints are the reason
for the limited signal handling capability:

Interruption of system calls and library functions by signal handlers
http://man7.org/linux/man-pages/man7/signal.7.html

@njsmith
Contributor Author

njsmith commented Mar 20, 2017 via email

@nirs

nirs commented Mar 20, 2017

I think time.sleep in the main thread does not work since the child thread receives the SIGCHLD signal (it is the thread that spawned the child process). Python writes a byte to the wakeup fd, but the main thread is not watching this fd, so it continues to sleep, and you receive the signal only when the sleep returns. We had this issue in vdsm, and we solved it using signal.set_wakeup_fd, which works reliably; nothing is fundamentally wrong with it.

See https://github.com/oVirt/vdsm/blob/master/lib/vdsm/common/sigutils.py.

Here is a simplified example:

$ cat test.py 
import fcntl
import logging
import os
import select
import signal
import subprocess
import threading


def sigchld(signo, frame):
    logging.info("received signal: %d", signo)


def set_non_blocking(fd):
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    flags |= os.O_NONBLOCK
    fcntl.fcntl(fd, fcntl.F_SETFL, flags)


def run_child():
    logging.info("starting child...")
    subprocess.call(["sleep", "1"])
    logging.info("child terminated")


logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s (%(threadName)s) %(message)s")

signal.signal(signal.SIGCHLD, sigchld)

pipe = os.pipe()
set_non_blocking(pipe[0])
set_non_blocking(pipe[1])

signal.set_wakeup_fd(pipe[1])

t = threading.Thread(target=run_child)
t.start()

logging.info("waiting 10 seconds...")
r, _, _ = select.select([pipe[0]], [], [], 10.0)
if not r:
    logging.info("timeout, no signal received")
logging.info("terminating")

Here is example run:

$ python test.py 
2017-03-20 22:15:25,631 (Thread-1) starting child...
2017-03-20 22:15:25,632 (MainThread) waiting 10 seconds...
2017-03-20 22:15:26,633 (MainThread) received signal: 17
2017-03-20 22:15:26,634 (Thread-1) child terminated
2017-03-20 22:15:26,634 (MainThread) terminating

This issue exists only with SIGCHLD from child processes started on another thread. Signals sent to the process can be handled with much less code.

@njsmith
Contributor Author

njsmith commented Mar 21, 2017

This issue exists only with SIGCHLD from child processes started on another thread. Signals sent to the process can be handled with much less code.

This might be true in practice, but it's not true in theory – POSIX says that when a signal is sent to the process, the kernel is free to pick any arbitrary thread (that doesn't have that signal masked) for the delivery. Not sure if it matters, but it's always nice to code to the spec rather than implementation quirks when possible...

For some reason I couldn't get set_wakeup_fd to work to solve the analogous version of the problem on Windows (though I agree that it ought to work!). I'm assuming you haven't tried running vdsm on Windows though :-). set_wakeup_fd is probably a good solution for curio though.

I think a more general solution would be for the CPython C-level signal handler to do something like:

if (this_is_not_main_thread) {
    pthread_kill(main_thread, signum);
} else {
    // actual signal handler
}

It looks like there's sort of an existing bug for this here: https://bugs.python.org/issue21895

@njsmith
Contributor Author

njsmith commented Mar 21, 2017

Oh, and returning somewhat to the original topic of this issue... it looks like set_wakeup_fd is fundamentally unreliable even when used solely to trigger a wakeup with actual content carried over other channels :-(. The problem is that if the wakeup fd's internal buffer overflows, then what we want is to ignore this and carry on – if we're just using the fd to trigger wakeups then the only states that matter are "empty" and "not-empty", and if the buffer is overflowing then it's already in the "not-empty" state. But CPython's actual wakeup_fd handling doesn't work like this: what it actually does if the buffer overflows is to blow up your program by causing an OSError to materialize at some arbitrary bytecode instruction.

In trio we only use the wakeup fd for wakeups, so we set its buffer as small as possible to avoid wasting kernel memory. On Linux "as small as possible" means 6 entries, and on MacOS it means 1. So I definitely can't use set_wakeup_fd on these platforms. Or rather, I can, but only if I bump up the buffer size to some "reasonable" amount that's hopefully big enough to avoid ever hitting the limit without wasting too much memory. Sigh.
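The buffer tweaking mentioned above can be combined with the counting trick from earlier in the thread to see what a given SO_SNDBUF setting actually buys (a Linux-flavoured sketch; the kernel doubles the requested value for bookkeeping overhead and enforces a minimum, so the measured capacity won't match the request exactly):

```python
import itertools
import socket

def measure_capacity(sock):
    """Count how many 1-byte non-blocking sends fit before EWOULDBLOCK."""
    sock.setblocking(False)
    try:
        for i in itertools.count():
            sock.send(b"x")
    except BlockingIOError:
        return i

a, b = socket.socketpair()
# Request a 4096-byte send buffer, then read back what we actually got.
a.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4096)
sndbuf = a.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
capacity = measure_capacity(a)
print(sndbuf, capacity)
```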

@nirs

nirs commented Mar 21, 2017

@njsmith,

Python 2 ignores an overflowed buffer, and Python 3 (checked 3.5 and 3.7.0a) writes a warning to stderr; it certainly does not blow up:

Exception ignored when trying to write to the signal wakeup fd:
BlockingIOError: [Errno 11] Resource temporarily unavailable

Can you show a real program affected by these fundamental issues?

@dabeaz
Owner

dabeaz commented Mar 21, 2017

I'm still not convinced that this set_wakeup_fd() issue is a concern in practice. If Python gets a signal, it writes to that fd. If there's a chance it might overflow the buffer because nothing is listening, then make the buffer bigger. Or don't write code that blocks the signal listener for long periods.

Regarding earlier messages: is there some particular reason someone would be fooling around with SIGCHLD in combination with the subprocess module? Functions in that module already reap the exit code. There's almost no reason to be writing a SIGCHLD handler to avoid zombies there. Maybe I'm missing something obvious.

@imrn

imrn commented Mar 21, 2017

This is curio's subprocess wait function.
I think it assumes it will be called when the process has already exited,
or is just about to exit, so that the wasted CPU cycles and reduced application
responsiveness are negligible.

What if an application launches a long-running process and starts waiting for it?
What is the proper way to wait for a subprocess?

async def wait(self):
    while True:
        retcode = self._popen.poll()
        if retcode is not None:
            return retcode
        await sleep(0.0005)
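One common thread-based alternative to the polling loop above (just a sketch, not curio's API; `wait_in_thread` is a hypothetical helper in the spirit of the "helper threads" discussed earlier) is to park the blocking Popen.wait() in a helper thread and signal completion through an event, so no polling interval is needed:

```python
import subprocess
import threading

def wait_in_thread(popen):
    """Return an Event that is set when the process exits.

    After the event fires, popen.returncode holds the exit status.
    """
    done = threading.Event()

    def waiter():
        popen.wait()   # blocks the helper thread, not the event loop
        done.set()

    threading.Thread(target=waiter, daemon=True).start()
    return done

# Usage sketch:
proc = subprocess.Popen(["sleep", "0.1"])
done = wait_in_thread(proc)
done.wait(timeout=5)
print(proc.returncode)  # 0
```

An event loop would then watch the event (or have the waiter write to a notify fd) instead of waking up every half millisecond to poll.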

@njsmith
Contributor Author

njsmith commented Mar 22, 2017

@nirs: doh, you're right, I misread the C code. A bogus warning that can't be silenced is better than a bogus exception, but still not so great :-)

@dabeaz
Owner

dabeaz commented Mar 22, 2017

I'm going to punt the subprocess.wait() issue to a different issue. That definitely needs to be addressed.

@dabeaz dabeaz closed this as completed Nov 18, 2018