I ran into a work-stealing issue with one task out of a batch of a couple of thousand submitted to distributed. I don't believe there is anything exceptional about this task. A summary of the logs for this task follows:
11:36 - 12:06 : distributed.stealing: Moved task 45 times.
12:07 : stimulus task finished + unexpected worker completed task
12:15 - 12:18 : distributed.stealing: Moved task 3 times.
12:19 : stimulus task finished + unexpected worker completed task
12:20 : stimulus task finished + received already computed task
12:25 - 12:30 : Communication failed during replication (repeated in batches of 10, about once every 0.2 seconds, so roughly 15,000 messages)
12:32 : Couldn’t gather keys
12:36 : Stimulus task finished
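For reference, the rough message count quoted for the 12:25 - 12:30 window comes from simple arithmetic. This is only a sketch, and it assumes the failures continued at a steady rate for the full five minutes; the variable names are illustrative and not taken from distributed:

```python
# Rough estimate of the number of "Communication failed during replication"
# messages. Assumption: a steady rate over the whole 12:25 - 12:30 window.
window_seconds = 5 * 60        # 12:25 to 12:30
batch_interval_seconds = 0.2   # one batch roughly every 0.2 seconds
messages_per_batch = 10        # messages arrived in batches of 10

# round() guards against floating-point error in the division
batches = round(window_seconds / batch_interval_seconds)
estimated_messages = batches * messages_per_batch
print(estimated_messages)
```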
Any thoughts on what might be happening?
(distributed==1.19.3)
@mrocklin, do you think this may have been addressed (inter alia) by #1489?