People often end up with excess data in memory and don't know why. A common culprit is persisted collections or futures lingering in Jupyter state, and it's hard for users to notice that this state is still around.
Advanced users could look at the futures owned by a client, but this isn't super-accessible, and it's not always easy to work back from these futures to the larger collections they belong to.
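For reference, here is a rough sketch of what "looking at the futures owned by a client" can amount to today: group the keys by prefix and count them. `prefix_of` and `summarize_keys` are hypothetical helpers, not part of distributed, and the prefix rule is a crude simplification (Dask ships its own `dask.utils.key_split`). It assumes `client.futures` is a mapping whose keys are task keys.

```python
from collections import Counter


def prefix_of(key):
    """Crude key prefix: 'random_sample-1a2b' -> 'random_sample'."""
    if isinstance(key, tuple):  # collection keys look like (name, i, j, ...)
        key = key[0]
    # Drop a trailing token after the last dash, if there is one
    return str(key).rsplit("-", 1)[0]


def summarize_keys(keys):
    """Count how many held keys share each prefix."""
    return Counter(prefix_of(k) for k in keys)


# With a live client this would be (assumption, not run here):
#     summarize_keys(client.futures)
keys = [("random_sample-1a2b", 0), ("random_sample-1a2b", 1), "sum-9f8e"]
summary = summarize_keys(keys)
```

This gives groups rather than collections, which is the gap described above.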
Some possible thoughts:
- Futures could have weak references to collections that contain them (we would populate this during persist calls). Clients could then have a `client.collections_in_memory` method or whatever
- The scheduler could have dashboards that better show the high-level structures held in RAM. We already have this in the various progress and TaskGroup plots, but it's subtle (we show groups or prefixes that are in memory by making them less transparent than the rest). This would show groups rather than collections, but maybe that's ok.
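To make the first idea a bit more concrete, here is a minimal sketch using only the standard library. Everything in it is hypothetical: `Collection` is a stand-in for a persisted Dask collection, `CollectionRegistry` stands in for whatever the client would keep, and `register` plays the role of the bookkeeping a persist call would do. It is not how distributed works today.

```python
import gc
import weakref


class Collection:
    """Stand-in for a persisted collection (hypothetical)."""

    def __init__(self, name, keys):
        self.name = name
        self.keys = list(keys)


class CollectionRegistry:
    """Maps each key to weak references of the collections containing it."""

    def __init__(self):
        self._by_key = {}  # key -> weakref.WeakSet of collections

    def register(self, collection):
        # What a persist call would populate
        for key in collection.keys:
            self._by_key.setdefault(key, weakref.WeakSet()).add(collection)

    def collections_in_memory(self):
        # Only collections the user still references survive in the WeakSets
        alive = {}
        for refs in self._by_key.values():
            for coll in refs:
                alive[id(coll)] = coll
        return sorted(c.name for c in alive.values())


registry = CollectionRegistry()
df = Collection("df", ["a-1", "a-2"])
arr = Collection("arr", ["b-1"])
registry.register(df)
registry.register(arr)

before = registry.collections_in_memory()
del arr       # the user drops their last reference
gc.collect()  # not needed on CPython, but harmless
after = registry.collections_in_memory()
```

Because the references are weak, the registry never keeps a collection alive on its own; it only reports what user code (for example a Jupyter namespace) is still holding.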
cc @phobson who ran into this recently