Developer documentation #190

Open
1 of 6 tasks
fjetter opened this issue Oct 12, 2021 · 5 comments

Comments

@fjetter
Member

fjetter commented Oct 12, 2021

In an offline discussion about technical debt and code complexity, the valid concern was raised that many of our internal systems are not properly documented.

One example that came up is the current/new state machine (dask/distributed#4413, dask/distributed#5046), which is documented to some extent (https://distributed.dask.org/en/stable/scheduling-state.html and https://distributed.dask.org/en/stable/worker.html#internal-scheduling) but likely not sufficiently for another developer to make educated judgment calls about code changes.

I would like to collect topics, mostly for dask/dask and dask/distributed, where more extensive developer documentation would help either to onboard new developers or to let existing developers familiarize themselves with other areas of the code.

cc @jcrist @jrbourbeau @gjoseph92 @ncclementi

@jcrist
Member

jcrist commented Oct 12, 2021

Thanks for opening this @fjetter!

A few topics that come to mind:

  • Task states, valid state transitions, and how those are handled in the scheduler
  • The worker state machine and how it relates to the above
  • The path from dask collection -> HLG -> low level graph -> scheduler -> tasks (we have some docs on this already, but again probably not enough, or not easily discovered)
  • Networking in distributed. What talks to what, and in what direction? Are multiple interfaces supported? What are the different comm types? Any security implications?
  • Disk spilling/memory management. When does data move on the worker, and how is this configured?
  • Cythonization in the scheduler. How is this project going, how is it configured and applied, ... (perhaps this is in an active issue?)

@jacobtomlinson
Member

I would add implementing Cluster classes to that list. Maybe custom adaptive classes too.

@GenevieveBuckley
Collaborator

High level graphs are another area that has been mentioned as needing better developer docs. There is a tracking issue here: dask/dask#7755

@fjetter
Member Author

fjetter commented Oct 13, 2021

> Disk spilling/memory management. When does data move on the worker, and how is this configured?

https://distributed.dask.org/en/stable/worker.html#memory-management

Is this sufficient? Should I create a ticket to restructure/move this?

@fjetter
Member Author

fjetter commented Oct 13, 2021

I created dedicated issues for the topics you mentioned. We can move the discussion about the individual items to the respective tickets.

Apart from collecting further topics, I would be curious about how we want to structure these new or already existing sections. While researching the topic in our current docs, I noticed that some of the information asked for here is already partially documented under "Developer Documentation" while other parts are in "Build understanding". This might be a judgement call for individual topics, but if there are general best practices to follow, they can be discussed here as well.
