Reproducer
- Start a scheduler in a Python 3.9 environment
- Run the following code in a Python 3.10 environment:
from __future__ import annotations

from distributed import Client
from distributed.diagnostics import SchedulerPlugin

client = Client("<ADDRESS>")


class TestPlugin(SchedulerPlugin):
    def start(self, scheduler):
        scheduler.handlers["test"] = self.test

    async def test(self, comm):
        return None


client.register_scheduler_plugin(TestPlugin(), idempotent=True)
client.sync(client.scheduler.test)
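Since the reproducer only triggers when the client and the scheduler run different Python minor versions, a small guard could flag the mismatch before anything is sent to the scheduler. The following is a minimal sketch, not part of distributed: `same_minor_version` is a hypothetical helper, and how the scheduler's version is obtained is left out.

```python
import sys


def same_minor_version(local, remote):
    """Return True if two (major, minor, ...) version tuples share major.minor."""
    # Hypothetical helper for illustration; only the first two fields are compared.
    return tuple(local[:2]) == tuple(remote[:2])


# The versions from the steps above: client on 3.10, scheduler on 3.9.
print(same_minor_version((3, 10), (3, 9)))  # False

# Comparing the running interpreter against itself always matches.
print(same_minor_version(sys.version_info, sys.version_info))  # True
```

Only the major and minor fields are compared here; whether patch-level differences also matter is not something this report establishes.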
Outcome
Client
Traceback (most recent call last):
  File "/Users/hendrikmakait/projects/dask/distributed/reproducer.py", line 18, in <module>
    client.sync(client.scheduler.test)
  File "/Users/hendrikmakait/projects/dask/distributed/distributed/utils.py", line 339, in sync
    return sync(
  File "/Users/hendrikmakait/projects/dask/distributed/distributed/utils.py", line 406, in sync
    raise exc.with_traceback(tb)
  File "/Users/hendrikmakait/projects/dask/distributed/distributed/utils.py", line 379, in f
    result = yield future
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/dask-distributed-py3.10/lib/python3.10/site-packages/tornado/gen.py", line 762, in run
    value = future.result()
  File "/Users/hendrikmakait/projects/dask/distributed/distributed/core.py", line 1154, in send_recv_from_rpc
    return await send_recv(comm=comm, op=key, **kwargs)
  File "/Users/hendrikmakait/projects/dask/distributed/distributed/core.py", line 944, in send_recv
    raise exc.with_traceback(tb)
  File "/Users/hendrikmakait/projects/dask/distributed/distributed/core.py", line 770, in _handle_comm
    result = await result
  File "/Users/hendrikmakait/projects/dask/distributed/reproducer.py", line 13, in test
    async def test(self, comm):
SystemError: unknown opcode
Scheduler logs
XXX lineno: 13, opcode: 129
2022-09-07 16:05:48,646 - distributed.core - ERROR - Exception while handling op test
Traceback (most recent call last):
  File "/Users/hendrikmakait/projects/dask/distributed/distributed/core.py", line 770, in _handle_comm
    result = await result
  File "/Users/hendrikmakait/projects/dask/distributed/reproducer.py", line 13, in test
    async def test(self, comm):
SystemError: unknown opcode
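The `opcode: 129` line suggests that the scheduler's interpreter was handed a bytecode instruction it does not define. One way to check this is to look the number up with the standard-library `dis` module, once in each environment. This is a diagnostic sketch only; `describe_opcode` is a hypothetical helper name.

```python
import dis
import sys


def describe_opcode(number):
    """Return the name the running interpreter gives to an opcode number."""
    # dis.opname is indexed by opcode number; slots the interpreter does not
    # define hold a placeholder string instead of a real instruction name.
    return dis.opname[number]


# Run this under both the client and the scheduler environment and compare.
print(sys.version_info[:2], describe_opcode(129))
```

If the two environments report different names for the same number, bytecode compiled on the client cannot be expected to run on the scheduler.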
In a more complicated use case, this has led to a segfault instead of a SystemError on Coiled:
Sep 6 09:40:12 ip-10-0-1-28 cloud-init[1133]: 2022-09-06 09:40:12,001 - distributed.scheduler - INFO - Clear task state
Sep 6 09:40:13 ip-10-0-1-28 kernel: [ 90.688881] python[1517]: segfault at 8 ip 0000563f29032c44 sp 00007ffc41695700 error 4 in python3.9[563f28f45000+209000]
Sep 6 09:40:13 ip-10-0-1-28 kernel: [ 90.688895] Code: 0f 84 90 00 00 00 48 8d 15 19 f0 23 00 48 39 d7 74 0c 48 8d 0d ed ae 24 00 48 39 cf 75 08 31 c0 c3 0f 1f 44 00 00 48 83 ec 08 <48> 8b 77 08 4c 8b 46 60 4d 85 c0 74 2f 49 8b 40 48 48 85 c0 74 26