[Adaptive] Do not allow workers to downscale if they are running long-running tasks (e.g. `worker_client`) #7481
Unit Test Results: See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 27 files ±0, 27 suites ±0, 11h 46m 14s ⏱️ +13m 13s. For more details on these failures, see this check. Results for commit 315b337. ± Comparison against base commit 415d4fa. ♻️ This comment has been updated with latest results.
I think that if there is a running task, long-running or otherwise, we should not choose to remove that worker when using adaptive scaling, even if it is mostly idle.
I think I agree given the current implementation. It would also be possible to let workers conclude their work naturally by not assigning them any further tasks. This would be a mechanism similar to the one suggested in #3761 (comment) (basically setting a worker to
1b101df to d381e34
valid_workers = [ws for ws in self.workers.values() if not ws.long_running]
groups = groupby(key, valid_workers)
This now prohibits downscaling of workers with long-running tasks, e.g. tasks-within-tasks.
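The filter in the diff above can be illustrated in isolation. The following is a minimal sketch, not the scheduler's actual code: `FakeWorkerState` and `group_closable_workers` are hypothetical names invented for this example, and a plain dictionary stands in for the `groupby` helper used in the diff. The only assumption carried over from the diff is that each worker record exposes a `long_running` collection that is empty when the worker has no long-running tasks.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FakeWorkerState:
    """Hypothetical stand-in for a scheduler-side worker record."""
    address: str
    # Keys of tasks flagged as long-running on this worker (assumed attribute).
    long_running: set = field(default_factory=set)


def group_closable_workers(workers, key=lambda ws: ws.address):
    """Group workers by `key`, skipping any worker with long-running tasks."""
    # Same filter as in the diff: a non-empty `long_running` excludes the worker.
    valid_workers = [ws for ws in workers.values() if not ws.long_running]
    groups = defaultdict(list)
    for ws in valid_workers:
        groups[key(ws)].append(ws)
    return dict(groups)


workers = {
    "tcp://a:1": FakeWorkerState("tcp://a:1"),
    "tcp://b:1": FakeWorkerState("tcp://b:1", long_running={"outer-task"}),
    "tcp://c:1": FakeWorkerState("tcp://c:1"),
}
closable = group_closable_workers(workers)
```

In this toy setup, the worker at `tcp://b:1` never appears among the candidates for closing, no matter how many workers the caller asks to retire.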
0f5b0da to a1ae1a7
Mostly nits
Having tasks flagged as long-running protects the worker from being classified as idle. However, when scaling adaptively and `Adapt` recommends fewer workers than there are idle, this can cause workers with seceded tasks to be downscaled even though there are better options. This PR suggests closing workers with long-running tasks only if there are no other possible options.
Considering that these workers may be driving a computation, we may even need to go as far as prohibiting downscaling of those workers entirely.
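The "only if there are no other possible options" policy can be sketched as a ranking rather than a hard filter. This is an illustrative sketch under stated assumptions, not the implementation in this PR: `FakeWorkerState`, its `processing` counter, and `choose_workers_to_close` are hypothetical names, and the tie-break on the number of processing tasks is an arbitrary choice made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class FakeWorkerState:
    """Hypothetical stand-in for a scheduler-side worker record."""
    address: str
    processing: int = 0  # number of tasks currently assigned (assumed field)
    long_running: set = field(default_factory=set)  # keys of seceded tasks


def choose_workers_to_close(workers, n):
    """Pick up to `n` workers, taking those with long-running tasks last."""
    # False sorts before True, so workers without long-running tasks are
    # ranked first; ties are broken by the fewest processing tasks.
    ranked = sorted(workers, key=lambda ws: (bool(ws.long_running), ws.processing))
    return [ws.address for ws in ranked[:n]]


pool = [
    FakeWorkerState("w1", long_running={"driver-task"}),
    FakeWorkerState("w2"),
    FakeWorkerState("w3", processing=2),
]
two = choose_workers_to_close(pool, 2)
three = choose_workers_to_close(pool, 3)
```

With two workers requested, the worker holding the seceded task is spared even though it has no other tasks processing; it is selected only when all three workers must go. Prohibiting downscaling of such workers entirely would instead drop them from `ranked` before slicing.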