Versions:
- dask=0.14.3=py35_1
- distributed=1.16.3=py35_0
- tornado=4.5.1=py35_0
I'll try to create a repro but until I can do that, I'll just describe the setup here as a placeholder.
I am running a LocalCluster with 6 processes (# cpus) and 1 thread per worker.
I am submitting tasks from tasks using worker_client and using an AsCompleted(loop=client.loop) in the worker_client context to await their completion. Occasionally I will see a secondary task complete (ie, the task-of-task has finished what it is supposed to do), but the original task (that launched it) continues to wait on its queue for notification of completion, halting progress of the program. When I interrupt the program, my stack traces are as follows:
In worker_client context / original task:
^CTraceback (most recent call last):
File "<string>", line 1, in <module>
File "lib/python3.5/multiprocessing/forkserver.py", line 164, in main
rfds = [key.fileobj for (key, events) in selector.select()]
File "lib/python3.5/selectors.py", line 441, in select
fd_event_list = self._epoll.poll(timeout, max_ev)
KeyboardInterrupt
File "...", line 248, in run
for f in queue:
File "lib/python3.5/site-packages/distributed/client.py", line 2634, in __next__
return self.queue.get()
File "lib/python3.5/queue.py", line 164, in get
self.not_empty.wait()
File "lib/python3.5/threading.py", line 293, in wait
waiter.acquire()
Where the secondary task was running:
tornado.application - ERROR - Exception in callback functools.partial(<function wrap.<locals>.null_wrapper at 0x7f3fbd31b730>, <tornado.concurrent.Future object at 0x7f3fbd2c9d68>)
Traceback (most recent call last):
File "lib/python3.5/site-packages/tornado/ioloop.py", line 605, in _run_callback
ret = callback()
File "lib/python3.5/site-packages/tornado/stack_context.py", line 277, in null_wrapper
return fn(*args, **kwargs)
File "lib/python3.5/site-packages/tornado/ioloop.py", line 626, in _discard_future_result
future.result()
File "lib/python3.5/site-packages/tornado/concurrent.py", line 238, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "lib/python3.5/site-packages/tornado/gen.py", line 1063, in run
yielded = self.gen.throw(*exc_info)
File "lib/python3.5/site-packages/distributed/client.py", line 2600, in track_future
yield _wait(future)
File "lib/python3.5/site-packages/tornado/gen.py", line 1055, in run
value = future.result()
File "lib/python3.5/site-packages/tornado/concurrent.py", line 238, in result
raise_exc_info(self._exc_info)
File "<string>", line 4, in raise_exc_info
File "lib/python3.5/site-packages/tornado/gen.py", line 1069, in run
yielded = self.gen.send(value)
File "lib/python3.5/site-packages/distributed/client.py", line 2482, in _wait
raise CancelledError(cancelled)
concurrent.futures._base.CancelledError: ['mytask-349a99bf-9ffa-4fc6-988e-d85345a654d6']
Versions:
I'll try to create a repro but until I can do that, I'll just describe the setup here as a placeholder.
I am running a LocalCluster with 6 processes (# cpus) and 1 thread per worker.
I am submitting tasks from tasks using
worker_clientand using an AsCompleted(loop=client.loop) in the worker_client context to await their completion. Occasionally I will see a secondary task complete (ie, the task-of-task has finished what it is supposed to do), but the original task (that launched it) continues to wait on its queue for notification of completion, halting progress of the program. When I interrupt the program, my stack traces are as follows:In worker_client context / original task:
Where the secondary task was running: