Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

About greenlet worker and OS signals #755

Closed
abannura opened this issue Jul 18, 2023 · 12 comments
Closed

About greenlet worker and OS signals #755

abannura opened this issue Jul 18, 2023 · 12 comments

Comments

@abannura
Copy link

Hi! I'm using the Greenlet worker, and noticed that if a signal is received while one of the greenlets is executing (i.e. not "blocked" waiting for I/O), it will be that "thread" the one who handles the signal.

This leads to a somehow troublesome behavior. For instance, when a SIGINT is received, it can lead to only one greenlet being stopped, instead of the whole event loop (the other greenlets keep going on).

I just wanted to know your thoughts on this. Is there a way to force that the signal is handled outside the event loop? Should I even worry about this, as the chances of it happening should be extremely low (tasks are heavy on I/O, not CPU)?

@coleifer
Copy link
Owner

Let me investigate, thanks for the report.

@coleifer
Copy link
Owner

I've added a change in 361e380 that seems to address the issue you described, as well as a potential separate issue where a SIGTERM is handled, but does not actually interrupt the running (blocking) task.

Do you mind pulling those changes in and taking a look when you have time?

@abannura
Copy link
Author

abannura commented Jul 18, 2023

Thanks for the quick reply! I did some experiments, and I think that those changes solve the problem. In any case, I leave here my findings as it may be helpful to you:

1. SIGTERM:

1.1. When no greenlet is executing, the signal is handled by the main thread.

In this case, the signal handler will raise a KeyboardInterrupt, which is captured and handled as a SIGINT by the consumer (i.e. graceful shutdown).

If there are greenlets "waiting" for an I/O result, the consumer will wait on them (via .join) instead of interrupting them. Otherwise the consumer will just exit.

1.2. When there is a greenlet executing, the signal is handled by the greenlet.

In this case, the signal handler stops the execution of the running greenlet, and sets the necessary flags for the consumer to shutdown (not gracefully).

If there were no other greenlets ready to be run, the consumer will exit right away.

If a greenlet was waiting for the result of an I/O operation, the consumer will also exit right away and that particular result will be lost (which is expected). Nonetheless, Huey's SIGNAL_INTERRUPTED will be triggered.

If there were other greenlets waiting to be run, one of them will start executing (despite the consumer already exiting). This will not block the main thread, but the task being run will be lost (no SIGNAL_INTERRUPTED will be triggered).

In my case, I'm reenqueuing a task in a SIGNAL_INTERRUPTED hook. This leads to the reenqueued task to starts executing in one of the greenlets, and then being lost (which I know is not Huey's problem as it doesn't try to achieve at-least-once execution).

2. SIGINT:

2.1. When no greenlet is executing, the signal is handled by the main thread.

This case is the same as 1.1., because it will trigger a graceful shutdown.

2.2. When there is a greenlet executing, the signal is handled by the greenlet.

In this case, the greenlet will continue executing (it will not be stopped by the signal). Once it finishes, one of two things can happen depending on whether there is a greenlet ready for execution up next. If that is the case, it will start executing. Otherwise, the consumer will enter graceful shutdown mode, and it will wait for all the greenlets to finish.

@coleifer
Copy link
Owner

I may need to get a little more clever with this - let me think on it some more and see if I can't get the behavior to more precisely match the threads/processes.

@abannura
Copy link
Author

Awesome! Let me know if I can help with anything.

@coleifer
Copy link
Owner

coleifer commented Jul 19, 2023

I'm not quite sure yet what the best option will be, but this clarifies a bit some of the difficulty:

Signal handlers run in the main thread and the main greenlet in that thread (whether anything is patched or not). The main greenlet is running the hub, and things that run in the hub can't switch out. Trying to join a greenlet-based thread requires switching.

Looking at the source (and by testing locally), the signal handlers are all run on the gevent Hub. As the comment indicates, we can't switch there.

I've tried making a small change here:

9cbf17e

Testing locally it looks to be working OK.

@coleifer
Copy link
Owner

The behavior I'm observing with the above change (9cbf17e) is:

  • SIGINT
    • no tasks: exits cleanly
    • io-bound tasks: waits til they finish, exits cleanly
    • mix of io-bound and cpu-bound tasks: waits til they finish, exits cleanly
  • SIGTERM
    • no tasks, exits cleanly
    • io-bound tasks: exits immediately, interrupted task signal fires
    • mix of io-bound and cpu-bound tasks: waits til the cpu-bound tasks finish(!), notifies any interrupted tasks, exits cleanly

I'm not sure we can do a whole lot better than that, but am open to ideas. Closing for now.

@mihaelmaltar
Copy link

Why was the change in 361e380 done only for WORKER_GREENLET?

I am using WORKER_THREAD and I have a problem that worker is restarted instead of being shut down after finishing task if it receives SIGINT while running a task.

@coleifer
Copy link
Owner

coleifer commented Apr 4, 2024

I don't see this. In the examples/simple directory in this repo there is a task that just sleeps for however long you specify. Here is output of interrupting that task while it is running when using 4 worker threads:

process 11423, thread 140159330158336 - startup hook
process 11423, thread 140159321765632 - startup hook
process 11423, thread 140159313372928 - startup hook
process 11423, thread 140159304980224 - startup hook
09:09:25 Executing tasks.slow: d8b86a95-0f0a-4c62-b1a2-c46deeb40355
going to sleep for 30 seconds
^C09:09:28 Received SIGINT
09:09:28 Shutting down gracefully...
process 11423, thread 140159304980224 - shutdown hook
process 11423, thread 140159313372928 - shutdown hook
process 11423, thread 140159321765632 - shutdown hook
finished sleeping for 30 seconds
09:09:55 tasks.slow: d8b86a95-0f0a-4c62-b1a2-c46deeb40355 executed in 30.029s
received signal [complete] for task [tasks.slow: d8b86a95-0f0a-4c62-b1a2-c46deeb40355]
process 11423, thread 140159330158336 - shutdown hook
09:09:55 All workers have stopped.
09:09:55 Consumer exiting.

Here is the code to invoke the task:

>>> from tasks import *
>>> slow(30)

As you can see above, the consumer picks up the task and starts executing it. I send a SIGINT to the consumer and the 3 idle workers shutdown. Then once the task has finished it shuts down cleanly and the consumer process exits.

@mihaelmaltar
Copy link

mihaelmaltar commented Apr 4, 2024

Sorry, I didn't explain the full story. I am periodically restarting workers by sending SIGHUP. If that happens while worker is running a task and after that I send SIGINT to shutdown, only first signal (restart) is respected and shutdown is ignored.

If commit 361e380 also included WORKER_THREAD, it would fix my problem.

@coleifer
Copy link
Owner

coleifer commented Apr 4, 2024

Interesting, OK let me take a look.

@coleifer
Copy link
Owner

coleifer commented Apr 4, 2024

Should be fixed here 2e3f08b

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants