Work stealing issue #1526

@rbubley

I hit a work-stealing issue with one task out of a batch of a couple of thousand submitted to distributed. As far as I can tell, there is nothing exceptional about this task. A summary of its logs is as follows:

11:36 - 12:06 : distributed.stealing: Moved task 45 times.
12:07         : stimulus task finished + unexpected worker completed task
12:15 - 12:18 : distributed.stealing: Moved task 3 times.
12:19         : stimulus task finished + unexpected worker completed task
12:20         : stimulus task finished + received already computed task
12:25 - 12:30 : Communication failed during replication (Repeated in batches of 10, about once every 0.2 seconds - so roughly 15,000 messages)
12:32         : Couldn’t gather keys
12:36         : Stimulus task finished
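
For reference, the rough figure of 15,000 messages is a back-of-the-envelope estimate. A minimal sketch of the arithmetic, assuming the batches arrived at a steady rate across the whole 12:25 to 12:30 window (the function and parameter names here are only illustrative, not anything from distributed):

```python
from datetime import datetime


def estimate_messages(start, end, batch_size, interval_s):
    """Estimate the total message count for batches of `batch_size`
    logged once every `interval_s` seconds between two HH:MM times."""
    fmt = "%H:%M"
    window_s = (
        datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    ).total_seconds()
    # Round to absorb floating-point error in the division.
    batches = round(window_s / interval_s)
    return batches * batch_size


# 5 minutes = 300 s; 300 / 0.2 = 1,500 batches of 10 messages each.
print(estimate_messages("12:25", "12:30", batch_size=10, interval_s=0.2))
```

The true count could be lower if the messages came in bursts rather than continuously.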

Any thoughts on what might be happening?

(distributed==1.19.3)

@mrocklin, do you think this may have been addressed (inter alia) by #1489?
