Re-interpret error in lost worker scenario#6193
Conversation
Previously when a worker would die unexpectedly (such as with a sys.exit call) the nanny's multiprocessing.Queue to that worker would get a strange TypeError because the process went away. We now identify and interpret this error message and tell the nanny to clean up accordingly.
3ab3d19 to
98c94ff
Compare
|
Merging in 24 hours if there are no comments |
| loop.add_callback(do_stop, **msg) | ||
| try: | ||
| msg = child_stop_q.get() | ||
| except (TypeError, OSError): |
There was a problem hiding this comment.
Do you have an example traceback of the TypeError. I'm a bit uncomfortable as the docs indicate that this shouldn't be thrown at all.
get([block[, timeout]])
Remove and return an item from the queue. If optional args block is True (the default) and timeout is None (the default), block if necessary until an item is available. If timeout is a positive number, it blocks at most timeout seconds and raises the queue.Empty exception if no item was available within that time. Otherwise (block is False), return an item if one is immediately available, else raise the queue.Empty exception (timeout is ignored in that case).
Changed in version 3.8: If the queue is closed, ValueError is raised instead of OSError.
Additionally, it's probably worth catching ValueError due to the exception change in Python 3.8.
There was a problem hiding this comment.
Grrrr, this stackoverflow comment was deleted. https://stackoverflow.com/questions/66886004/typeerror-when-getting-content-from-queue
There was a problem hiding this comment.
Grrrr, this stackoverflow comment was deleted. https://stackoverflow.com/questions/66886004/typeerror-when-getting-content-from-queue
The google cache of the above mentions something like the following:
File "listener.py", line 77, in <module>
print(parse_queue())
File "listener.py", line 68, in parse_queue
for item in Drainer(queue_res):
File "listener.py", line 18, in __iter__
yield self.q.get(False, 0.6)
File "<string>", line 2, in get
File "/usr/lib/python3.6/multiprocessing/managers.py", line 756, in _callmethod
conn.send((self._id, methodname, args, kwds))
File "/usr/lib/python3.6/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/usr/lib/python3.6/multiprocessing/connection.py", line 404, in _send_bytes
self._send(header + buf)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 368, in _send
n = write(self._handle, buf)
TypeError: an integer is required (got type NoneType)There was a problem hiding this comment.
I guess this is the situation you're running into if processes are dying during sys.exit
There was a problem hiding this comment.
Also, FWIW, I wouldn't spend too much time digging in here. This is a very very small part of the codebase and not a critical issue.
There was a problem hiding this comment.
I was getting OSError in Python 3.10. I haven't seen ValueError in practice.
There was a problem hiding this comment.
Ah I think I see. For an OSError, I guess there's distinction between:
- a formal queue close and
- the IPC socket closing as reality collapses around the Python interpreter due to a
sys.exit?
Just FYI, I was thinking of the first case, which has now been replaced by ValueError https://github.com/python/cpython/pull/9010/files.
Previously when a worker would die unexpectedly
(such as with a sys.exit call)
the nanny's multiprocessing.Queue to that worker would get a strange
TypeError because the process went away.
We now identify and interpret this error message and tell the nanny to clean up accordingly.