Query collections that are currently persisted #7080

@mrocklin

People often run into situations where they have excess data in memory and don't know why. A common culprit for this is persisted collections or futures sitting in Jupyter state. It's hard for users to become aware of this lingering state.

Advanced users could look at the futures owned by a client, but this isn't very accessible, and it's not always easy to work back from these futures to the larger collections they belong to.
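To make the "work back from futures to collections" step concrete, here is a rough sketch that groups task keys by collection name. It assumes the common Dask convention that a collection's chunks are keyed as tuples like (name, i, j) while standalone tasks are keyed by plain strings. The helpers `collection_name` and `summarize_keys` are hypothetical names for illustration, not part of dask or distributed. With a real client, the keys might come from the client's futures mapping, though depending on the version those keys may be stringified and would need parsing first.

```python
from collections import Counter


def collection_name(key):
    """Return the collection-level name for a Dask-style task key.

    Assumption: chunk keys are tuples like ("name-token", 0, 1);
    anything else is treated as a standalone task key.
    """
    if isinstance(key, tuple):
        return key[0]
    return key


def summarize_keys(keys):
    """Count how many held keys belong to each collection name."""
    return Counter(collection_name(k) for k in keys)


# Toy keys standing in for the keys of futures a client is holding.
held = [
    ("random_sample-abc123", 0, 0),
    ("random_sample-abc123", 0, 1),
    "sum-def456",
]
summary = summarize_keys(held)
for name, n in summary.most_common():
    print(f"{name}: {n} key(s)")
```

This only recovers names and counts, not the collection objects themselves, which is part of why the lingering state is hard to track down.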

Some possible thoughts:

  1. Futures could hold weak references to the collections that contain them (we would populate these during persist calls). Clients could then have a client.collections_in_memory method or something similar.
  2. The scheduler could have dashboards that better show the high-level structures that are in RAM. We already have this in the various progress and TaskGroup plots, but it's subtle (we show groups or prefixes that are in memory by making them less transparent than the others). This would show groups rather than collections, but maybe that's ok.
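A minimal sketch of the first idea, using toy stand-ins rather than real distributed classes. `ToyFuture`, `ToyCollection`, and `ToyClient` are all hypothetical, and `collections_in_memory` is the method name floated above, not an existing API. The point is only to show the weak-reference bookkeeping: futures weakly point at the collections that contain them, so the registry never keeps a collection alive on its own.

```python
import gc
import weakref


class ToyFuture:
    """Stand-in for a future (hypothetical, not distributed.Future)."""

    def __init__(self, key):
        self.key = key
        # Weak references back to the collections that contain this future.
        self.collections = weakref.WeakSet()


class ToyCollection:
    """Stand-in for a persisted collection that owns its futures."""

    def __init__(self, name, futures):
        self.name = name
        self.futures = list(futures)


class ToyClient:
    """Stand-in for a client that tracks live futures weakly."""

    def __init__(self):
        # Weak values: the registry must not keep futures alive itself.
        self._futures = weakref.WeakValueDictionary()

    def persist(self, name, keys):
        futures = [ToyFuture(k) for k in keys]
        collection = ToyCollection(name, futures)
        for f in futures:
            f.collections.add(collection)  # populated at persist time
            self._futures[f.key] = f
        return collection

    def collections_in_memory(self):
        """Names of collections still referenced by user code."""
        seen = {}
        for f in list(self._futures.values()):
            for c in f.collections:
                seen[id(c)] = c
        return sorted(c.name for c in seen.values())


client = ToyClient()
df = client.persist("df", ["df-0", "df-1"])
arr = client.persist("arr", ["arr-0"])
before = client.collections_in_memory()

del arr        # user drops the only strong reference
gc.collect()   # not strictly needed on CPython, but makes the intent explicit
after = client.collections_in_memory()

print(before)
print(after)
```

In a notebook, the interesting case is the reverse one: a collection that is still listed because some forgotten variable or output cache entry holds it, which is exactly the state users currently have trouble discovering.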

cc @phobson who ran into this recently
