Massive memory (100GB) used by dask-scheduler #6833

@pseudotensor

Using RAPIDS 0.14 with dask 2.17.0, Python 3.6, conda, Ubuntu 16.04.

I'm running xgboost using dask on GPUs. I do:

  1. Convert an in-memory numpy frame to a dask distributed frame using from_array().
  2. Chunk the frames finely enough that every worker (here 3 nodes, 2 GPUs per node) has data, which is required so that xgboost does not hang.
  3. Run on a dataset of about 5M rows x 10 columns of airlines data.
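Since I can't share the real code, here is a minimal sketch of the chunking idea in step 2 only. The helper name `even_row_chunks` is hypothetical (not from my application), and the worker count of 6 assumes one worker per GPU. Having at least as many chunks as workers is necessary for every worker to receive data, but it does not by itself guarantee where the scheduler places them.

```python
def even_row_chunks(n_rows, n_chunks):
    """Split n_rows into n_chunks row counts that differ by at most one."""
    if n_chunks <= 0 or n_rows < n_chunks:
        raise ValueError("need at least one row per chunk")
    base, extra = divmod(n_rows, n_chunks)
    # The first `extra` chunks take one additional row each.
    return (base + 1,) * extra + (base,) * (n_chunks - extra)


# 5M rows over 3 nodes x 2 GPUs = 6 workers (assumed one worker per GPU).
row_chunks = even_row_chunks(5_000_000, 6)

# Hypothetical use with dask.array (X is the in-memory numpy array):
#   import dask.array as da
#   dX = da.from_array(X, chunks=(row_chunks, (X.shape[1],)))
```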

Notes:

  1. Every time steps 1-3 are done, they run in an isolated fork that dies at the end of the fit. So any instances of the client etc. are destroyed. Nothing remains on the GPU and nothing remains in a process, since the process is gone. So I don't believe I need a client.close() call.
  2. Even though these forks are gone, within the code I always use the client as a context manager in a with statement. So again, I shouldn't need a client.close() call or anything like that.

I see my application use a reasonable amount of memory for that dataset. The workers use very little memory, around 2-3%.

However, the dask-scheduler is using 70% of my 128GB system! I don't understand how or why, since as far as I understand the scheduler shouldn't hold data. Perhaps the above sequence of sending a dask frame to xgboost is a problem, but it would be odd for the task graph to be forced to hold data.
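One way to check what the scheduler is holding on to would be to count its main collections between runs. This is only a sketch: the function name `scheduler_summary` is hypothetical, the scheduler address in the usage comment is a placeholder, and it assumes the scheduler object exposes `tasks`, `clients`, and `workers` mappings. `Client.run_on_scheduler` passes the scheduler in when the function has a parameter named `dask_scheduler`.

```python
def scheduler_summary(dask_scheduler):
    """Count the main collections the scheduler keeps in memory."""
    return {
        "tasks": len(dask_scheduler.tasks),
        "clients": len(dask_scheduler.clients),
        "workers": len(dask_scheduler.workers),
    }


# Hypothetical usage against a live cluster (the address is a placeholder):
#   from dask.distributed import Client
#   with Client("tcp://scheduler-host:8786") as client:
#       print(client.run_on_scheduler(scheduler_summary))
```

If the task or client counts keep growing after each fork has exited, that would suggest state from earlier runs is not being released.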

Even if a single graph held data, which would already be a problem, there's no way 90GB is needed to hold the data involved, so it looks like old data is being stored repeatedly.
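To put rough numbers on that, here is a back-of-the-envelope estimate. It assumes 8-byte (float64) values, which is an assumption rather than something checked against the real frame.

```python
n_rows, n_cols = 5_000_000, 10
bytes_per_value = 8  # assumption: float64 values

# Size of one dense copy of the dataset.
dataset_bytes = n_rows * n_cols * bytes_per_value

# 70% of a 128GB machine, as observed for the dask-scheduler process.
scheduler_bytes = 0.70 * 128e9

# How many full copies of the dataset that memory would correspond to.
copies = scheduler_bytes / dataset_bytes

print(f"dataset:   {dataset_bytes / 1e9:.1f} GB")
print(f"scheduler: {scheduler_bytes / 1e9:.1f} GB")
print(f"ratio:     about {copies:.0f} copies of the dataset")
```

Under that assumption one copy of the data is about 0.4 GB, so the scheduler's memory is on the order of a couple of hundred copies.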

image

I don't have code to share for a repro since it's not easy to extract, but I'm still hoping for ideas. I will work on a repro, but any fixes or ideas would be welcome.
