Developer documentation #190

fjetter · 2021-10-12T15:55:12Z

In an offline discussion about technical debt and code complexity, the valid concern was raised that many of our internal systems are not properly documented.

One example that came up is the current/new state machine (dask/distributed#4413, dask/distributed#5046), which is documented to some extent (https://distributed.dask.org/en/stable/scheduling-state.html and https://distributed.dask.org/en/stable/worker.html#internal-scheduling) but likely not sufficiently for another developer to make educated judgment calls about code changes.

I would like to collect topics, mostly for dask/dask and dask/distributed, where more extensive developer documentation would help either to onboard new developers or to let existing developers familiarize themselves with other areas of the code.

cc @jcrist @jrbourbeau @gjoseph92 @ncclementi

jcrist · 2021-10-12T17:49:57Z

Thanks for opening this @fjetter!

A few topics that come to mind:

Task states and valid state transitions, and how those are handled in the scheduler
The worker state machine and how it relates to the above
The path from dask collection -> HLG -> low level graph -> scheduler -> tasks (we have some docs on this already, but again probably not enough, or not easily discoverable)
Networking in distributed. What talks to what, and in what direction? Are multiple interfaces supported? What are the different comm types? Any security implications?
Disk spilling/memory management. When does data move on the worker, and how is this configured?
Cythonization in the scheduler. How is this project going, how is it configured and applied, ... (perhaps this is in an active issue?)

jacobtomlinson · 2021-10-12T17:56:58Z

I would add implementing Cluster classes to that list. Maybe custom adaptive classes too.

GenevieveBuckley · 2021-10-12T23:52:05Z

High level graphs are another area that has been mentioned as needing better developer docs. There is a tracking issue here: dask/dask#7755

fjetter · 2021-10-13T09:43:40Z

> Disk spilling/memory management. When does data move on the worker, and how is this configured?

https://distributed.dask.org/en/stable/worker.html#memory-management

Is this sufficient? Should I create a ticket to restructure/move this?

fjetter · 2021-10-13T09:48:14Z

I created dedicated issues for the topics you mentioned. We can move the discussion about the individual items to the respective tickets.

Apart from collecting further topics, I would be curious how we want to structure these new or already existing sections. While researching the topic in our current docs, I noticed that some of the information asked for here is already partially documented under "Developer Documentation" while other parts are under "Build understanding". This might be a judgement call for individual topics, but if there are general best practices to follow, they can be discussed here as well.

This was referenced Oct 13, 2021

[DEV DOCS] Documentation of Scheduler and Worker state machine dask/distributed#5413

Closed

[DEV DOCS] Journey of a task - HLG edition dask/distributed#5415

Open

fjetter mentioned this issue Oct 13, 2021

[DEV DOCS] Cluster and Adaptive classes dask/distributed#5417

Open
