WIP: Allow collections_to_dsk to be overridden #3196
To make it easier for users to hook into the pre-submission process, allow users to provide their own `collections_to_dsk` function.
Why do you need this? Without strong motivation I'm fairly against adding this, as it blurs the boundary between dask internals and user-provided methods. I wonder if what you're trying to accomplish could be better served by a different hook.
This probably has some issues, admittedly, as it is a bit more exploratory at this stage. My use case is very similar to the one raised in dask/distributed#1384: namely, a desire to have a persistent cache that works both within the same analytical session and across analytical sessions. One of the suggestions was to override … If you have ideas on different ways to hook in, I would be interested to hear them. :)
I'll give it some thought. To clarify, when you say persistent cache do you mean:
Do you want this to work just with the distributed scheduler or on all schedulers?
Thanks. Both. To clarify, if we submitted the jobs during this session, this would be case 1. If this was an old session we are reviving, then this would be case 2. There's also the chance that, in the same session, we had to expire some keys due to memory issues and would then also have to pull results from disk. Personally, I am primarily interested in the distributed scheduler. In practice it would be using …
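As a rough illustration of the caching use case described above (not code from this PR), here is a minimal sketch of rewriting a graph before submission so that keys already stored on disk are replaced by their saved values. It uses plain dicts in the dask graph format and only the standard library. The names `cache_path`, `substitute_cached`, and `get` are hypothetical helpers, the one-pickle-file-per-key layout is an assumption, and the tiny evaluator only stands in for a real scheduler.

```python
import os
import pickle
import tempfile
from operator import add


def cache_path(cache_dir, key):
    # Hypothetical naming scheme: one pickle file per graph key.
    return os.path.join(cache_dir, f"{key}.pkl")


def substitute_cached(dsk, cache_dir):
    """Return a copy of `dsk` in which every key that has a result
    on disk is replaced by the loaded value instead of its task."""
    out = {}
    for key, task in dsk.items():
        path = cache_path(cache_dir, key)
        if os.path.exists(path):
            with open(path, "rb") as f:
                out[key] = pickle.load(f)
        else:
            out[key] = task
    return out


def get(dsk, key):
    """Tiny recursive evaluator for dict-of-tasks graphs with string keys.
    A stand-in for a real scheduler, for illustration only."""
    task = dsk[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        resolved = [get(dsk, a) if isinstance(a, str) and a in dsk else a
                    for a in args]
        return func(*resolved)
    return task


calls = []  # records every time the expensive step actually runs


def slow_inc(x):
    calls.append(x)
    return x + 1


dsk = {"a": 1, "b": (slow_inc, "a"), "c": (add, "b", 10)}

# Pretend an earlier session already computed "b" and wrote it to disk.
cache_dir = tempfile.mkdtemp()
with open(cache_path(cache_dir, "b"), "wb") as f:
    pickle.dump(2, f)

rewritten = substitute_cached(dsk, cache_dir)
result = get(rewritten, "c")  # "b" is read from disk, slow_inc is not called
```

In this sketch the rewrite happens on the graph itself, which is the kind of step a pre-submission hook would make possible. Whether that is best done by overriding `collections_to_dsk` or through a different hook is exactly the open question in this thread.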
FWIW have also been playing with a …
Have you had a chance to give this more thought @jcrist?
Think this can be better addressed by the …