Resources are ignored if a task is cancelled and then added again #6710

Closed
crusaderky opened this issue Jul 11, 2022 · 1 comment · Fixed by #6711
@crusaderky
Collaborator

  1. The client submits a task without resource restrictions -> the task enters WorkerState.ready

  2. The client cancels the task before it can start executing -> the task is forgotten, but its key is still in WorkerState.ready. This is normally ok thanks to

    while self.ready and len(self.executing) < self.nthreads:
        _, key = heapq.heappop(self.ready)
        ts = self.tasks.get(key)
        if ts is None:
            # It is possible for tasks to be released while still remaining on
            # `ready`. The scheduler might have re-routed to a new worker and
            # told this worker to release. If the task has "disappeared", just
            # continue through the heap.
            continue

  3. The client resubmits the task with the same key, this time with resource restrictions.
    The task key is now in both WorkerState.ready and WorkerState.constrained.

  4. Due to other tasks in the queue, the task reaches the top of WorkerState.ready before it reaches the top of WorkerState.constrained.

  5. The task starts executing, completely bypassing resource restrictions. WorkerState.available_resources may become negative for as long as the task is running.
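The sequence above can be reproduced in miniature with plain Python containers. This is only a sketch: `tasks`, `ready`, `constrained` and `available_resources` are simplified stand-ins for the WorkerState attributes of the same names, and `submit`, `cancel`, `pop_ready` and `start` are hypothetical helpers, not functions from distributed. It also assumes that starting a task subtracts its resources without checking availability, and that the only GPU is already in use.

```python
import heapq

# Simplified stand-ins for the WorkerState attributes named in the steps above.
tasks = {}                          # key -> task state (a plain dict here)
ready = []                          # heap of (priority, key)
constrained = []                    # keys of tasks with resource restrictions
available_resources = {"GPU": 0}    # assumption: the only GPU is already taken


def submit(key, priority, restrictions=None):
    """Hypothetical helper: register a task and queue it in ready or constrained."""
    tasks[key] = {"restrictions": restrictions or {}}
    if restrictions:
        constrained.append(key)
    else:
        heapq.heappush(ready, (priority, key))


def cancel(key):
    """Hypothetical helper: forget the task but leave its stale entry on the heap."""
    del tasks[key]


def pop_ready():
    """Pop the next key from ready, skipping keys whose task has disappeared."""
    while ready:
        _, key = heapq.heappop(ready)
        if tasks.get(key) is None:
            continue
        return key
    return None


def start(key):
    """Assumption: executing a task acquires its resources unconditionally."""
    for name, amount in tasks[key]["restrictions"].items():
        available_resources[name] -= amount


submit("x", priority=0)                           # step 1: no restrictions
cancel("x")                                       # step 2: stale key stays in ready
submit("x", priority=0, restrictions={"GPU": 1})  # step 3: same key, now constrained
key = pop_ready()                                 # step 4: the stale entry looks valid again
start(key)                                        # step 5: restrictions were never checked
```

After the last line, `key` is `"x"`, `"x"` is still listed in `constrained`, and `available_resources["GPU"]` has gone negative. The `ts is None` check cannot help here because the key maps to a live task again.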

Proposed design

Reimplement ready and constrained as HeapSets. This also solves #6137.
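To illustrate why a heap with set semantics avoids the problem, here is a minimal sketch of such a structure. `MiniHeapSet` is a hypothetical class written for this example; it is not the HeapSet implementation in distributed and its method names are assumptions. The point is that a discarded key stops being a member immediately, so a later resubmission with the same key cannot revive the stale heap entry.

```python
import heapq
from itertools import count


class MiniHeapSet:
    """Hypothetical priority queue with set semantics and lazy deletion."""

    def __init__(self):
        self._live = {}          # key -> id of its current heap entry
        self._heap = []          # (priority, entry id, key)
        self._ids = count()      # unique ids, also used as a tie-breaker

    def add(self, priority, key):
        if key in self._live:
            return               # set semantics: a key is held at most once
        entry_id = next(self._ids)
        self._live[key] = entry_id
        heapq.heappush(self._heap, (priority, entry_id, key))

    def discard(self, key):
        # O(1): the heap entry stays behind but no longer matches a live id.
        self._live.pop(key, None)

    def __contains__(self, key):
        return key in self._live

    def __len__(self):
        return len(self._live)

    def pop(self):
        while self._heap:
            _, entry_id, key = heapq.heappop(self._heap)
            if self._live.get(key) == entry_id:
                del self._live[key]
                return key
            # Otherwise the entry is stale (discarded or superseded): skip it.
        raise KeyError("pop from an empty MiniHeapSet")


ready = MiniHeapSet()
ready.add(0, "x")        # task submitted without restrictions
ready.discard("x")       # task cancelled: membership is removed right away
ready.add(1, "z")        # an unrelated task
popped = ready.pop()     # the stale "x" entry is skipped
```

With this structure, cancelling a task removes its key from `ready` as a matter of membership rather than relying on a later lookup in `tasks`, so a resubmitted constrained task with the same key is not popped from `ready`.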

@crusaderky crusaderky self-assigned this Jul 11, 2022
@crusaderky
Collaborator Author

I've commented out a wealth of assertions in worker_state_machine.py because of this. Search for this ticket number.
