Configuration Reference #6069

quasiben · 2020-04-06T15:32:50Z

This PR adds a configuration page to the docs. As the configuration is rather large, I'd like to finish up both the scheduler and worker configs while getting consensus on style. In subsequent PRs I/we can continue adding configuration details.

Fixes #4286

I've been using this online RST editor

Attached is a screen shot of the current setup:

jrbourbeau

Woo, thanks for working on this @quasiben! Looking forward to seeing this added. It appears the configuration-reference.rst file hasn't been included yet.

mrocklin · 2020-04-06T15:46:57Z

I have two comments that I think we should consider:

Hierarchy

I'm curious about how best to handle hierarchy. Dask's configuration is nested, which I think helps with keeping things compartmentalized. The current structure has one level of nesting, followed up with a flattened hierarchy. This is a sensible choice, but there might be other choices.

One approach would be to use subsections and subsubsections to handle the nesting, and then use a Sphinx TOC on top. This would capture the hierarchical nature of our configuration and also maybe make it easy for users to navigate quickly to bits that they care about. It also might be way more confusing. I'm not sure.

Single source of truth

We're now copying configuration into two places. This might be necessary, but it's also something that we should be aware of and avoid if possible (although I have no suggestions on how best to avoid it).

mrocklin · 2020-04-06T15:48:33Z

One way to handle the single source of truth would be to have way more extensive comments in the config files themselves, and then autogenerate the docs. This has tradeoffs though.

quasiben · 2020-04-06T15:56:56Z

@mrocklin, I briefly thought about subsections but was lazy -- as a test, I opted for the following:

When defining nested configurations, the top level default value will be blank, with subsequent keys and values listed below
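The convention quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `config_rows` is a hypothetical helper, not code from this PR, and the sample dictionary is a small excerpt of the scheduler section.

```python
def config_rows(config, prefix=""):
    """Yield (dotted_key, default) rows; nested mappings get a blank default."""
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Parent row: the default column stays blank, children follow below.
            yield dotted, ""
            yield from config_rows(value, dotted)
        else:
            yield dotted, repr(value)


# Hypothetical excerpt of the scheduler configuration.
example = {
    "scheduler": {
        "allowed-failures": 3,
        "dashboard": {"status": {"task-stream-length": 1000}},
    }
}

rows = list(config_rows(example))
for key, default in rows:
    print(f"{key:50} {default}")
```

Each nested mapping produces one row with an empty default, followed by rows for its children, which is how the dotted keys can become long in a flat table.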

The single source of truth is important but I am -0.5 on adding more comments to the config. With more text in the config it makes editing the YAML rather challenging and becomes error prone. It's also worth noting that dask has configuration management in two locations:

distributed.yaml
dask.yaml

quasiben · 2020-04-06T16:35:56Z

I am looking into a smallish sphinx plugin + ruamel.yaml for loading comments

mrocklin · 2020-04-06T16:36:05Z

My hope is that using subsections wouldn't be terrible. It might also help to avoid the foo.bar.baz formatting in the table (which is liable to get long).

Let's take a look at the scheduler section

  scheduler:
    allowed-failures: 3     # number of retries before a task is considered bad
    bandwidth: 100000000    # 100 MB/s estimated worker-worker bandwidth
    blocked-handlers: []
    default-data-size: 1000
    # Number of seconds to wait until workers or clients are removed from the events log
    # after they have been removed from the scheduler
    events-cleanup-delay: 1h
    idle-timeout: null      # Shut down after this duration, like "1h" or "30 minutes"
    transition-log-length: 100000
    work-stealing: True     # workers should steal tasks from each other
    work-stealing-interval: 100ms  # Callback time for work stealing
    worker-ttl: null        # like '60s'. Time to live for workers.  They must heartbeat faster than this
    pickle: True            # Is the scheduler allowed to deserialize arbitrary bytestrings
    preload: []
    preload-argv: []
    unknown-task-duration: 500ms  # Default duration for all tasks with unknown durations ("15m", "2h")
    default-task-durations:  # How long we expect function names to run ("1h", "1s") (helps for long tasks)
      rechunk-split: 1us
      shuffle-split: 1us
    validate: False         # Check scheduler state at every step for debugging
    dashboard:
      status:
        task-stream-length: 1000
      tasks:
        task-stream-length: 100000
      tls:
        ca-file: null
        key: null
        cert: null
    locks:
      lease-validation-interval: 10s  # The time to wait until an acquired semaphore is released if the Client goes out of scope

So there are a lot of entries in the top level, which is great. The table will be simple for them. There are some larger sections, like scheduler.dashboard, which I think would make sense to break out to a subsection, and there are some smaller sections, like locks with a single element, that may not make sense to break out.

I think that it probably won't be hard to make subsections (the RST for this is fortunately simple). In some cases I think that we probably do want subsections, and in other cases the subsections might be small enough to even be a hindrance. I don't think that we want to mix approaches, but I'm not sure.

(I'm just thinking out loud here to get conversation going)
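One way to picture the mixed approach discussed above (large nested sections become subsections, small ones stay in the parent table) is the sketch below. The helper names and the `min_subsection_size` threshold of 3 are assumptions made for illustration; nothing in this thread settles on a threshold.

```python
def count_leaves(value):
    """Count the non-mapping values beneath a config node."""
    if not isinstance(value, dict):
        return 1
    return sum(count_leaves(child) for child in value.values())


def flatten(value, prefix):
    """Yield (dotted_key, value) pairs for every leaf under prefix."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}")
    else:
        yield prefix, value


def plan_section(section, min_subsection_size=3):
    """Split a section into inline table rows and break-out subsections."""
    inline, subsections = [], {}
    for key, value in section.items():
        if isinstance(value, dict) and count_leaves(value) >= min_subsection_size:
            subsections[key] = list(flatten(value, key))
        else:
            # Scalars and small nested sections stay in the parent table.
            inline.extend(flatten(value, key))
    return inline, subsections


# Hypothetical excerpt of the scheduler section shown above.
scheduler = {
    "allowed-failures": 3,
    "work-stealing": True,
    "dashboard": {
        "status": {"task-stream-length": 1000},
        "tasks": {"task-stream-length": 100000},
        "tls": {"ca-file": None, "key": None, "cert": None},
    },
    "locks": {"lease-validation-interval": "10s"},
}

inline, subsections = plan_section(scheduler)
print("inline rows:", [key for key, _ in inline])
print("subsections:", list(subsections))
```

With this excerpt, `dashboard` is broken out while `locks` is folded into the parent table as `locks.lease-validation-interval`. Whether mixing the two styles on one page is desirable is exactly the open question in the comment above.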

mrocklin · 2020-04-06T16:37:08Z

The single source of truth is important but I am -0.5 on adding more comments to the config. With more text in the config it makes editing the YAML rather challenging and becomes error prone.

Yeah, to be clear, I'm not pushing this. I'm just dumping a bunch of options out there to spur some thinking. This is going to be a lot of detailed work once we get something down. I'd prefer that we only do that detailed work once.

quasiben · 2020-04-06T21:54:48Z

In 8b06c46 I played with building a plugin to automate the table (just for fun). It's not awful :)

Below is a screen shot

quasiben · 2020-04-06T21:57:07Z

docs/source/ext/yamltotable.py

+                if comment_token is None:
+                    comment = ""
+                else:
+                    comment = comment_token[2].value


Theoretically you could pull comments from another location besides the original yaml spec. However, this may come back around to single-source of truth issues
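To make the idea of pulling same-line comments out of the YAML concrete, here is a rough standard-library sketch. It is not the plugin's approach: the diff above reads comment tokens from a YAML parser, whereas this swaps in a deliberately naive regular expression. The `LINE` pattern and `trailing_comments` are hypothetical names, and the pattern does not handle `#` inside quoted values.

```python
import re

# Naive pattern for "key: value  # comment" (an assumption, not a YAML parser).
LINE = re.compile(
    r"^\s*(?P<key>[\w-]+):\s*(?P<value>.*?)\s*(?:#\s*(?P<comment>.*))?$"
)


def trailing_comments(yaml_text):
    """Map each key to its same-line comment, or "" when there is none."""
    comments = {}
    for line in yaml_text.splitlines():
        match = LINE.match(line)
        if match:
            # Keys repeated at different nesting levels overwrite each other here.
            comments[match.group("key")] = match.group("comment") or ""
    return comments


sample = """\
scheduler:
  allowed-failures: 3     # number of retries before a task is considered bad
  blocked-handlers: []
  # a standalone comment line is skipped
  events-cleanup-delay: 1h
"""

for key, comment in trailing_comments(sample).items():
    print(f"{key}: {comment!r}")
```

Standalone comment lines are ignored and keys without a comment map to an empty string, mirroring the `comment = ""` fallback in the diff.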

In dask-gateway @jcrist uses traitlets to store all of this. I'm not sure that Dask should go that far, but it's a demonstration of this approach.

This could also be useful for input validation

jcrist · 2020-04-06T22:35:48Z

Woot, this is nice to see @quasiben. I do agree with @mrocklin's concern about two sources of truth though.

For dask client-side configuration, lately I've been ensuring the default file is heavily (and nicely) commented, and then directly including that in the docs. Examples:

As another idea, the zero-to-jupyterhub project includes an extra validation script that tests that the values.yaml file matches the documented schema (see https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/master/jupyterhub/validate.py). The schema is stored in json, which makes it easier to use it to generate a nice html table for the docs.

*I personally prefer the traitlets approach, but recognize that not everyone might prefer that approach, and moving from Dask's existing configuration system would be tricky.*

mrocklin · 2020-04-06T22:52:03Z

*I personally prefer the traitlets approach, but recognize that not everyone might prefer that approach, and moving from Dask's existing configuration system would be tricky.*

Heh. That's me! :)

Yeah, I personally would be sad with traitlets I think (although I probably should do more homework on it). There is probably some happier medium though.

For dask client-side configuration, lately I've been ensuring the default file is heavily (and nicely) commented, and then directly including that in the docs.

Those are nice.

As another idea, the zero-to-jupyterhub project includes an extra validation script that tests that the values.yaml file matches the documented schema (see https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/master/jupyterhub/validate.py). The schema is stored in json, which makes it easier to use it to generate a nice html table for the docs.

A companion file like this might not be bad, especially if adding to it could be done optionally and users didn't see it directly.

quasiben · 2020-04-07T00:09:24Z

A companion file like this might not be bad, especially if adding to it could be done optionally and users didn't see it directly.

A companion file would need to duplicate the keys, no ? Maybe that's ok and it's a baby step in the direction of validation ?

jcrist · 2020-04-07T01:09:56Z

A companion file would need to duplicate the keys, no ? Maybe that's ok and it's a baby step in the direction of validation ?

To be clear, I think the value of the JupyterHub spec file approach is the validation script. So while the information is split between files (which could fall out of sync), there's a way to test that they remain in sync and that test could be run as part of our CI.
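A sync test of the kind described above might look roughly like the following. This is a sketch under assumptions: the schema is taken to be a JSON-Schema-style mapping with nested `properties`, both function names are hypothetical, and it is not the zero-to-jupyterhub validation script.

```python
def undocumented_keys(config, schema, prefix=""):
    """Yield dotted config keys that have no matching schema entry."""
    properties = schema.get("properties", {})
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if key not in properties:
            yield dotted
        elif isinstance(value, dict):
            yield from undocumented_keys(value, properties[key], dotted)


def unused_schema_keys(config, schema, prefix=""):
    """Yield dotted schema keys that are missing from the config."""
    for key, subschema in schema.get("properties", {}).items():
        dotted = f"{prefix}.{key}" if prefix else key
        if key not in config:
            yield dotted
        elif isinstance(config[key], dict):
            yield from unused_schema_keys(config[key], subschema, dotted)


# Hypothetical config and schema that have drifted apart.
config = {"scheduler": {"allowed-failures": 3, "work-stealing": True}}
schema = {
    "properties": {
        "scheduler": {
            "properties": {
                "allowed-failures": {"type": "integer"},
                "idle-timeout": {"type": ["string", "null"]},
            }
        }
    }
}

print(list(undocumented_keys(config, schema)))
print(list(unused_schema_keys(config, schema)))
```

A CI test could assert that both lists are empty. Free-form sections whose keys are user-defined (for example, default-task-durations) would need to be exempted, which this sketch does not attempt.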

mrocklin · 2020-04-10T23:03:32Z

Here is an attempt with jsonschema (the thing that JupyterHub uses)

dask/distributed#3696

So far I'm decently happy with it.

quasiben · 2020-04-13T03:40:35Z

I spent a bit of time with @mrocklin this weekend using PR dask/distributed#3696 to help with descriptions in the table. We pushed on getting the table to render everything and while it works, the rendering is a bit off:

I also spent some time adding custom YAML-to-HTML styling. This non-csv layout of the config perhaps balances verbose descriptions (which users will appreciate) without sacrificing layout/readability (which the csv-table does). However, we lose fancy TOC linking:

mrocklin · 2020-04-13T14:13:58Z

Woot


jcrist · 2020-04-13T14:41:41Z

This non-csv layout of the config perhaps balances verbose descriptions (which users will appreciate) without sacrificing layout/readability (which the csv-table does). However, we lose fancy TOC linking.

I personally prefer this style, and think you should be able to do TOC linking. Refs work fine for the setup I have for dask-gateway (https://gateway.dask.org/api-server.html), I feel like they should be doable here too.
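As a rough illustration of how a non-table layout could still offer link targets, the sketch below renders one config entry as an HTML block with an `id` derived from its dotted key. The function name, the `config-entry` class, and the id scheme are all assumptions; this is not the directive added in this PR, and it does not show how Sphinx refs would be wired to these anchors.

```python
from html import escape


def render_entry(dotted_key, default, description):
    """Return one HTML block whose id can serve as a link target."""
    # Hypothetical id scheme: dots replaced by hyphens.
    anchor = dotted_key.replace(".", "-")
    return (
        f'<dl class="config-entry" id="{escape(anchor)}">'
        f"<dt><code>{escape(dotted_key)}</code> "
        f"<span>default: {escape(repr(default))}</span></dt>"
        f"<dd>{escape(description)}</dd>"
        "</dl>"
    )


print(
    render_entry(
        "distributed.scheduler.work-stealing",
        True,
        "workers should steal tasks from each other",
    )
)
```

Escaping the key, default, and description keeps long free-text descriptions from breaking the markup, which matters once descriptions come from a schema file rather than short comments.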

quasiben · 2020-04-16T16:46:41Z

@jcrist this is ready for a review if you have some time.

Note: I also included a dask-schema in the PR. This made iterating a bit faster for me, but I'm also happy to move it to another PR if you'd like.


jcrist · 2020-04-21T13:36:55Z

docs/source/configuration-reference.rst

+
+.. dask_config_to_html::
+    :location: distributed.dashboard
+    :config: https://raw.githubusercontent.com/dask/distributed/70700a1059fdae542ddbb3534f3caa3d27ca2e5d/distributed/distributed.yaml


I assume you intend to change this when dask/distributed#3696 is merged, just making a note so it's not forgotten.

yup yup, still futzing around with a few things

quasiben · 2020-04-21T20:27:55Z

dask/dask-schema.yaml

+              Upper limit for width, where width = num_nodes / height, a good measure
+              of parallelizability
+
+          subraphs:


This is correct but I suspect this should be changed to subgraphs in a later PR

quasiben · 2020-04-21T20:28:46Z

This is ready for another review should @jcrist or @mrocklin have some time

jcrist

Overall this looks pretty good to me. Thanks @quasiben.

docs/source/conf.py

dask/dask-schema.yaml

TomAugspurger · 2020-04-22T13:50:36Z

I think this is safe to merge when you're ready. Maybe give @mrocklin a bit of time to object if he wants to take another look.

mrocklin · 2020-04-22T15:14:00Z

dask/dask-schema.yaml

+          size:
+            type: integer
+            description: |
+              The size of pixels used when displaying a dask array as an SVG image.


Suggested change

The size of pixels used when displaying a dask array as an SVG image.

The size of the image in pixels used when displaying a dask array as an SVG image.


mrocklin · 2020-04-22T15:20:15Z

Thanks @quasiben. I'm really excited about this. I've added a couple of small suggestions inline. Two other comments here:

1. When I build this locally I don't seem to get descriptions for most of the Dask config values. This is maybe just my environment?
2. I think that it would be useful to have a TOC at the top of the configuration-reference page so that people can see quickly the options that are available to them without scrolling linearly through everything.


jcrist · 2020-04-22T15:22:55Z

I think that it would be useful to have a TOC at the top of the
configuration-reference page so that people can see quickly the options
that are available to them without scrolling linearly through everything

Good idea. That should be doable by inserting:

.. contents::
   :local:

where you want it.

quasiben · 2020-04-22T15:24:55Z

You won't get dask config values because the schema file is currently not in master. After merging you should be fine.

We could add a TOC, or they could use the TOC on the left bar:

quasiben · 2020-04-22T15:28:37Z

Thanks for the suggestion @jcrist and for all the reviews !

quasiben · 2020-04-22T15:55:52Z

Tests are currently failing because dask-schema is not in master. I can either break that out into another PR and then merge this, or merge this as-is, after which things should pass. If they don't, I can fix it immediately.

jsignell

@quasiben - Feel free to merge this PR as-is when you have time to make sure that everything goes according to plan.

quasiben · 2020-04-24T13:14:41Z

Thanks @jsignell. I'll keep watch here and on the docs.

quasiben · 2020-04-24T13:23:08Z

docs are broken -- fixing now

initial setup of configuration reference

82a1926

jrbourbeau reviewed Apr 6, 2020


add missing file

695f902

quasiben changed the title ~~initial setup of configuration reference~~ Configuration Reference Apr 6, 2020

add automation plugin

8b06c46

quasiben commented Apr 6, 2020


quasiben added 2 commits April 12, 2020 17:23

render with jsonschema

100e196

add raw html output directive with custom HTML styling

7e6854a

quasiben added 5 commits April 13, 2020 11:16

add additional keys

1bc6612

add autodoc work

e63ac70

add dask schema, test, and update requirements

f78fa14

fix schema

93356b0

clean up/lint extension

58aeefd

Fixups

2852c6f

- Register the extension module appropriately. - Lint conf.py and yamltohtml.py

update config directive

727f33a

jcrist reviewed Apr 21, 2020


quasiben added 3 commits April 21, 2020 10:22

update schema, update tests of schema, lint

f1db338

change default value in docstring for ave_width

d8c51b1

make schema official and switch to master and fix test

eb9a9c1

quasiben commented Apr 21, 2020

jcrist reviewed Apr 21, 2020

docs/source/conf.py Outdated

dask/dask-schema.yaml Outdated

remove trailing comma and rename extension file

c17e6c7

mrocklin reviewed Apr 22, 2020

dask/dask-schema.yaml Outdated

mrocklin reviewed Apr 22, 2020

dask/dask-schema.yaml

quasiben and others added 2 commits April 22, 2020 11:19

Update dask/dask-schema.yaml

e2cb58a

Co-Authored-By: Matthew Rocklin <mrocklin@gmail.com>

Update dask/dask-schema.yaml

29bc7c4

Co-Authored-By: Matthew Rocklin <mrocklin@gmail.com>

add local TOC

ff2cb81

jsignell approved these changes Apr 24, 2020


quasiben merged commit 4c7b170 into dask:master Apr 24, 2020

quasiben mentioned this pull request Apr 24, 2020

Broken Docs #6131

Closed

quasiben deleted the configuration-reference branch April 24, 2020 14:48

jcrist mentioned this pull request May 19, 2020

Configuration Reference PrefectHQ/prefect#2562

Closed

jsignell mentioned this pull request Jul 24, 2020

Configuration Validation #5695

Open

quasiben mentioned this pull request Feb 23, 2022

Add dask config extension to sphinx theme dask/dask-sphinx-theme#64

Merged
