Moar Plugins #205

Open
mrocklin opened this issue Nov 10, 2021 · 8 comments
Comments

@mrocklin
Member

There are a few really fun plugins for Dask.

These are easy to write and really impactful. They're also an easy way for people to get involved without touching the beating heart of Dask internals. This might be a fun way to get peripheral folks engaged. I was speaking with a colleague and found that he had a few other fun ones like hooking up logging or print statements to the dask dashboard. We've listed a few in the docs. I think that @crusaderky mentioned something in the AMM docs around adding better GC. We could add the malloc trim trick there as well.

There is a lot of potential here, and it's a good way to showcase how pluggable and hackable Dask is. These could also be featured in docs. So I'll suggest a few steps:

  1. Bring what we have in docs into the codebase (distributed maybe, or should these be in a dask-contrib package?)
  2. Add a docs section that includes each
  3. Host a small one-day sprint where we
    • First seed a set of ideas (smaller group)
    • Then bring on some folks who are not as familiar with the scheduler/workers to help implement them (or come up with their own)
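
As a rough illustration of how little code a worker plugin needs, here is a minimal sketch of one that counts task state transitions on each worker. The class name `TransitionCounter` is hypothetical and not part of distributed; the `setup` and `transition` hooks follow the `WorkerPlugin` interface. The import fallback is only there so the sketch can be run without distributed installed.

```python
from collections import Counter

try:
    from distributed.diagnostics.plugin import WorkerPlugin
except ImportError:
    # Assumption for illustration only: fall back to a plain base class
    # so the sketch still runs where distributed is not installed.
    WorkerPlugin = object


class TransitionCounter(WorkerPlugin):
    """Hypothetical plugin: count (start, finish) task transitions per worker."""

    name = "transition-counter"

    def setup(self, worker):
        # Called once on each worker when the plugin is registered.
        self.worker = worker
        self.counts = Counter()

    def transition(self, key, start, finish, **kwargs):
        # Called on every task state change, e.g. "waiting" -> "ready".
        self.counts[(start, finish)] += 1
```

With a running cluster this would be registered through the client (`client.register_worker_plugin(TransitionCounter())` at the time of this thread; newer releases also offer `client.register_plugin`).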
@crusaderky

crusaderky commented Nov 11, 2021

> I think that @crusaderky mentioned something in the AMM docs around adding better GC. We could add the malloc trim trick there as well.

That was an early idea about periodically calling the malloc_trim C function on the workers. However, @fjetter had first-hand experience of it causing hard-to-debug segfaults. The idea was discarded in favour of setting an environment variable for glibc to read before starting the worker.
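
For readers unfamiliar with the environment-variable approach, a minimal sketch follows. It assumes the variable in question is glibc's `MALLOC_TRIM_THRESHOLD_`, which has to be in the process environment before the worker starts. The helper name `worker_environment`, the threshold value, and the scheduler address in the comment are illustrative assumptions, not something stated in this thread.

```python
import os


def worker_environment(trim_threshold_bytes: int = 65536) -> dict:
    """Return a copy of the current environment with the glibc trim threshold set.

    The default of 65536 bytes is an illustrative value, not a recommendation.
    """
    env = dict(os.environ)
    # glibc reads this variable at process start, so it must be set
    # before the worker process is launched, not from inside it.
    env["MALLOC_TRIM_THRESHOLD_"] = str(trim_threshold_bytes)
    return env


env = worker_environment()
# Hypothetical launch of a worker with that environment:
# import subprocess
# subprocess.Popen(["dask-worker", "tcp://scheduler-address:8786"], env=env)
```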

@mrocklin
Member Author

mrocklin commented Nov 11, 2021 via email

@fjetter
Member

fjetter commented Nov 12, 2021

I'm not too concerned about the malloc trim anymore. We've had this discussion several times and I don't want to forbid a potentially game-changing feature based on anecdotal evidence. If we're hit by it, we'll keep this in mind, but I think we should move on.
We are now setting the MALLOC_TRIM_THRESHOLD_ environment variable, which should have the same effect. However, I think I've heard users complain that this doesn't have the same impact as an explicit trim. If a plugin helps and people can opt in, that would be nice.
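
A minimal sketch of what such an opt-in plugin might look like, assuming Linux workers running on glibc. `MallocTrimPlugin` and `trim_memory` are hypothetical names, not existing distributed APIs, and this does nothing to address the segfault concern raised earlier in the thread. The import fallback only exists so the sketch runs without distributed installed.

```python
import ctypes
import sys
import threading

try:
    from distributed.diagnostics.plugin import WorkerPlugin
except ImportError:
    # Assumption for illustration only: plain base class when
    # distributed is not installed.
    WorkerPlugin = object


def trim_memory() -> int:
    """Ask glibc to release free heap memory to the OS.

    Returns malloc_trim's result (1 if memory was released, 0 if not),
    or -1 when glibc's malloc_trim is not available on this platform.
    """
    if not sys.platform.startswith("linux"):
        return -1
    try:
        libc = ctypes.CDLL("libc.so.6")
        return libc.malloc_trim(0)
    except (OSError, AttributeError):
        return -1


class MallocTrimPlugin(WorkerPlugin):
    """Hypothetical opt-in plugin: call malloc_trim every `interval` seconds."""

    name = "malloc-trim"

    def __init__(self, interval: float = 60.0):
        # Keep only picklable state here; the plugin is serialized
        # and sent to each worker.
        self.interval = interval

    def setup(self, worker):
        # Thread and event are created on the worker, after unpickling.
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def teardown(self, worker):
        self._stop.set()
        self._thread.join(timeout=5)

    def _loop(self):
        # Event.wait returns False on timeout, True once teardown sets it.
        while not self._stop.wait(self.interval):
            trim_memory()
```

On a live cluster this would be enabled with something like `client.register_worker_plugin(MallocTrimPlugin(interval=60))`; a one-off trim needs no plugin at all and can be done with `client.run(trim_memory)`.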

@mrocklin
Member Author

I'm mostly suggesting that we get more comfortable including optional plugins, and then showcasing these plugins in documentation. The optional, default-off nature of these plugins should lower the bar for inclusion in the codebase. I also think that it's good to include lots of things, mostly so that people can see a gallery of what is possible.

@jsignell
Member

This is a cool idea, and it sounds to me like it should be a new dask-contrib repo.

@martindurant
Member

Plus, plugins make for great, easy contrib packages. We could consider making a skeleton for it.

@martindurant
Member

(sorry @jsignell , repeating you!)

@mrocklin
Member Author

We've added several plugins to the distributed codebase (UploadFile, UploadDirectory, PipInstall, ...) and these don't seem to have caused any issues. Having plugins like these automatically available when someone installs Dask is nice. Splitting them into lots of little libraries is easier on developers but probably harder on users, especially more novice users, who then have to go find them.
