Massive memory (100GB) used by dask-scheduler #4243

@pseudotensor

Description

Cross-posting, since this now seems to be primarily a dask.distributed problem.

Maybe related:
#3898
dask/dask#3530
dask/dask#6762

See for code/repro: dmlc/xgboost#6388 (comment)

In very short order, the workers and the scheduler hit OOM killers because they keep accumulating memory, even across cleanly completed Python client runs.

This issue effectively makes dask unusable with, e.g., NVIDIA RAPIDS/xgboost as a multi-GPU or multi-node solution.
