Cross-posting, since this now seems to be mostly a dask.distributed problem.
Maybe related:
#3898
dask/dask#3530
dask/dask#6762
See for code/repro: dmlc/xgboost#6388 (comment)
In very short order, both the workers and the scheduler hit OOM killers because they keep accumulating memory, even across cleanly completed Python client code.
This issue effectively makes it impossible to use dask with, e.g., NVIDIA RAPIDS/xgboost as a multi-GPU or multi-node solution.