Promote LIFO scaling log to warning level (#812)
* Promote LIFO scaling log to warning level
* Add section on this to troubleshooting docs
1 parent ef94c94, commit bd1bb7e
Showing 3 changed files with 23 additions and 2 deletions.
@@ -0,0 +1,19 @@
Troubleshooting
===============

This page contains common problems and resolutions.

Why am I losing data during scale down?
---------------------------------------

When scaling down a cluster, the controller attempts to coordinate with the Dask scheduler to
decide which workers to remove. If the controller cannot communicate with the scheduler, it falls
back to last-in-first-out (LIFO) scaling and removes the worker with the lowest uptime, even if that worker
is actively processing data. This can result in lost data and recalculation of the graph.

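
The LIFO fallback described above can be illustrated with a minimal sketch. This is not the controller's actual code: the ``Worker`` class, its fields, and ``lifo_workers_to_remove`` are hypothetical names chosen for illustration. It only shows why picking by lowest uptime ignores whether a worker is busy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Worker:
    # Hypothetical record; the real controller works with Kubernetes resources.
    name: str
    started_at: datetime
    busy: bool = False  # whether the worker is currently processing data


def lifo_workers_to_remove(workers: List[Worker], n: int) -> List[Worker]:
    """Pick the n most recently started workers (lowest uptime)."""
    if n <= 0:
        return []
    # Sort newest first; note that `busy` plays no part in the choice.
    newest_first = sorted(workers, key=lambda w: w.started_at, reverse=True)
    return newest_first[:n]


base = datetime(2024, 1, 1, 12, 0, 0)
workers = [
    Worker("worker-a", base),
    Worker("worker-b", base + timedelta(minutes=5)),
    Worker("worker-c", base + timedelta(minutes=10), busy=True),
]

removed = lifo_workers_to_remove(workers, 1)
print([w.name for w in removed])  # ['worker-c']
```

In this example the newest worker is also the busy one, so it is removed anyway. A scheduler-aware strategy could instead have chosen an idle worker.
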
This commonly happens if the version of Dask on the scheduler differs significantly from the version on the controller.

To mitigate this, Dask has an optional HTTP API that is more decoupled than the RPC and allows for better
compatibility between versions.

See `https://github.com/dask/dask-kubernetes/issues/807 <https://github.com/dask/dask-kubernetes/issues/807>`_
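
The overall fallback behaviour can be sketched roughly as follows. This is an illustrative sketch only: ``choose_workers_to_remove``, ``unreachable`` and ``lifo_stub`` are hypothetical names, and the order in which the scheduler-aware strategies are tried (HTTP API first, then RPC) is an assumption rather than something this page states. The point is that LIFO is used only once every attempt to reach the scheduler has failed, and that this is logged at warning level.

```python
import logging
from typing import Callable, List, Sequence, Tuple

logger = logging.getLogger("scale_down_sketch")

# A strategy receives the number of workers to remove and returns their names,
# or raises if the scheduler cannot be reached that way.
Strategy = Callable[[int], List[str]]


def choose_workers_to_remove(
    n: int,
    scheduler_strategies: Sequence[Tuple[str, Strategy]],
    lifo: Strategy,
) -> List[str]:
    """Try each scheduler-aware strategy in turn, then fall back to LIFO."""
    for label, strategy in scheduler_strategies:
        try:
            return strategy(n)
        except Exception as exc:  # broad on purpose in this sketch
            logger.debug("Scale down via %s failed: %s", label, exc)
    logger.warning(
        "Could not coordinate with the scheduler, falling back to LIFO scaling. "
        "This can result in lost data."
    )
    return lifo(n)


def unreachable(n: int) -> List[str]:
    # Hypothetical stand-in for a scheduler that cannot be contacted.
    raise ConnectionError("scheduler unreachable")


def lifo_stub(n: int) -> List[str]:
    # Hypothetical worker names, listed oldest first.
    pods = ["worker-a", "worker-b", "worker-c"]
    return pods[-n:] if n > 0 else []


print(choose_workers_to_remove(
    1, [("HTTP API", unreachable), ("RPC", unreachable)], lifo_stub
))  # ['worker-c']
```

If the warning appears in the controller logs, the worker that was removed was chosen without input from the scheduler, which is the situation this section describes.
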