
Intermittent hang when worker process is killed #260

Closed · asavoy opened this issue Jan 8, 2020 · 2 comments

asavoy (Contributor) commented Jan 8, 2020

Summary

When a worker process is killed, there is a small chance that during cleanup dramatiq will be left in a state where it is running but no longer processing the queue. This is problematic when operating a dramatiq service, and we have seen it in production a few times already.

Investigation

My understanding is that this sequence of events is occurring:

  • A worker process is killed
  • Dramatiq terminates all worker processes, so the queue is no longer being worked
  • Dramatiq closes each worker pipe
  • If one of the worker pipes has already been closed, an OSError: [Errno 9] Bad file descriptor is raised
  • This happens in the main thread of the main process, so the main thread exits
  • But not all worker pipes were closed yet, so the log_watcher thread is still looping over them and never exits (see the sketch below)
  • Hence the dramatiq process does not exit as it normally would, but does not process the queue either
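To illustrate the mechanism, here is a minimal sketch in plain Python (not dramatiq code): a Python process only exits once all of its non-daemon threads have finished, so an unhandled exception in the main thread leaves a looping non-daemon thread, like the log watcher, holding the process open:

    import threading
    import time

    def watcher():
        # Stands in for the log_watcher loop over the worker pipes.
        while True:
            time.sleep(1)

    # Threads are non-daemon by default, so this one keeps the process alive.
    threading.Thread(target=watcher).start()

    # The main thread dies here with the same error as in the traceback below,
    # but the process never exits.
    raise OSError(9, "Bad file descriptor")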

I noticed in the code that there is already some handling for this in watch_logs(), so I have opened a patch (#261) based on that approach, sketched below.
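For illustration, a hypothetical sketch of that defensive cleanup, not the actual contents of cli.py or #261 (the worker_pipes name is invented): an already-closed descriptor is tolerated instead of being allowed to crash the main thread:

    for pipe in worker_pipes:  # "worker_pipes" is a hypothetical name
        try:
            pipe.close()
        except OSError:
            # A crashed worker may have left its pipe already closed;
            # swallowing "[Errno 9] Bad file descriptor" here lets the
            # remaining pipes be closed and the main thread exit cleanly.
            pass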

How to reproduce

This is how I was able to reproduce the problem.

OS: macOS 10.15.2
Python version: 3.7.5
Dramatiq version: 1.7.0
Redis version: 5.0.7

  1. Install dramatiq & redis:
     pip install dramatiq[redis]
  2. Write a simple dramatiq service in app.py:
     # app.py
     import dramatiq
     from dramatiq.brokers.redis import RedisBroker

     redis_broker = RedisBroker(host="localhost")
     dramatiq.set_broker(redis_broker)

     @dramatiq.actor
     def suddenly_terminate():
         import ctypes
         # Dereferencing a null pointer segfaults the worker process.
         ctypes.string_at(0)
  3. Run Redis:
     redis-server
  4. Enqueue the task many times:
     python -c 'import app ; [app.suddenly_terminate.send() for _ in range(100)]'
  5. Run the dramatiq service, in a loop:
     (while true; do dramatiq app --processes 3 --threads 1 ; done)
  6. Most of the time, dramatiq will shut down properly and then restart due to the loop. But eventually it will hang, with output similar to:
$ (while true; do dramatiq app --processes 3 --threads 1 ; done)
[2020-01-08 10:53:02,626] [PID 95401] [MainThread] [dramatiq.MainProcess] [INFO] Dramatiq '1.7.0' is booting up.
[2020-01-08 10:53:02,671] [PID 95419] [MainThread] [dramatiq.WorkerProcess(0)] [INFO] Worker process is ready for action.
[2020-01-08 10:53:02,671] [PID 95420] [MainThread] [dramatiq.WorkerProcess(1)] [INFO] Worker process is ready for action.
[2020-01-08 10:53:02,671] [PID 95421] [MainThread] [dramatiq.WorkerProcess(2)] [INFO] Worker process is ready for action.
Traceback (most recent call last):
  File "/Users/alvin/Projects/dramatiq-sandbox/.venv/bin/dramatiq", line 8, in <module>
    sys.exit(main())
  File "/Users/alvin/Projects/dramatiq-sandbox/.venv/lib/python3.7/site-packages/dramatiq/cli.py", line 458, in main
    pipe.close()
  File "/Users/alvin/.pyenv/versions/3.7.5/lib/python3.7/multiprocessing/connection.py", line 177, in close
    self._close()
  File "/Users/alvin/.pyenv/versions/3.7.5/lib/python3.7/multiprocessing/connection.py", line 361, in _close
    _close(self._handle)
OSError: [Errno 9] Bad file descriptor
Bogdanp added the bug label Jan 14, 2020
Bogdanp added this to the v1.8.0 milestone Jan 14, 2020
Bogdanp (Owner) commented Jan 14, 2020

Thanks for the report and the fix!

Bogdanp closed this Jan 14, 2020
asavoy (Contributor, Author) commented Jan 15, 2020

@Bogdanp Thank you for Dramatiq! We moved from Celery and it is far superior 👍
