Configuration Reference #6069
Woo, thanks for working on this @quasiben! Looking forward to seeing this added. It appears the configuration-reference.rst file hasn't yet been included.
I have two comments that I think we should consider:

Hierarchy

I'm curious about how best to handle hierarchy. Dask's configuration is nested, which I think helps with keeping things compartmentalized. The current structure has one level of nesting, followed up with a flattened hierarchy. This is a sensible choice, but there might be other choices. One approach would be to use subsections and subsubsections to handle the nesting, and then use a Sphinx TOC on top. This would capture the hierarchical nature of our configuration and also maybe make it easy for users to navigate quickly to bits that they care about. It also might be way more confusing. I'm not sure.

Single source of truth

We're now copying configuration into two places. This might be necessary, but it's also something that we should be aware of and avoid if possible (although I have no suggestions on how best to avoid it).

One way to handle the single source of truth would be to have way more extensive comments in the config files themselves, and then autogenerate the docs. This has tradeoffs though.
@mrocklin, I briefly thought about subsections but was lazy -- as a test, I opted for the following:
The single source of truth is important, but I am -0.5 on adding more comments to the config. With more text in the config, editing the YAML becomes rather challenging and error-prone. It's also worth noting that dask has configuration management in two locations:
I am looking into a smallish Sphinx plugin + ruamel.yaml for loading comments.
My hope is that using subsections wouldn't be terrible. It might also help to avoid the

Let's take a look at the scheduler section:

scheduler:
  allowed-failures: 3     # number of retries before a task is considered bad
  bandwidth: 100000000    # 100 MB/s estimated worker-worker bandwidth
  blocked-handlers: []
  default-data-size: 1000
  # Number of seconds to wait until workers or clients are removed from the events log
  # after they have been removed from the scheduler
  events-cleanup-delay: 1h
  idle-timeout: null      # Shut down after this duration, like "1h" or "30 minutes"
  transition-log-length: 100000
  work-stealing: True     # workers should steal tasks from each other
  work-stealing-interval: 100ms  # Callback time for work stealing
  worker-ttl: null        # like '60s'. Time to live for workers. They must heartbeat faster than this
  pickle: True            # Is the scheduler allowed to deserialize arbitrary bytestrings
  preload: []
  preload-argv: []
  unknown-task-duration: 500ms  # Default duration for all tasks with unknown durations ("15m", "2h")
  default-task-durations:  # How long we expect function names to run ("1h", "1s") (helps for long tasks)
    rechunk-split: 1us
    shuffle-split: 1us
  validate: False         # Check scheduler state at every step for debugging
  dashboard:
    status:
      task-stream-length: 1000
    tasks:
      task-stream-length: 100000
    tls:
      ca-file: null
      key: null
      cert: null
  locks:
    lease-validation-interval: 10s  # The time to wait until an acquired semaphore is released if the Client goes out of scope

So there are a lot of entries in the top level, which is great. The table will be simple for them. There are some larger sections, like scheduler.dashboard, which I think would make sense to break out to a subsection, and there are some smaller sections, like

I think that it probably won't be hard to make subsections (the RST for this is fortunately simple). In some cases I think that we probably do want subsections, and in other cases the subsections might be small enough to even be a hindrance. I don't think that we want to mix approaches, but I'm not sure.

(I'm just thinking out loud here to get conversation going)
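To make the trade-off concrete, here is a minimal Python sketch of one possible rule: scalar entries stay in the section's table, and nested mappings are listed as candidates for their own subsection. The `config` dict is a hand-copied excerpt of the scheduler section above (its nesting is an assumption), and `split_section` is a hypothetical helper, not something in Dask or in this PR.

```python
# Hand-copied excerpt of the scheduler section; the nesting shown is an assumption.
config = {
    "allowed-failures": 3,
    "work-stealing": True,
    "default-task-durations": {"rechunk-split": "1us", "shuffle-split": "1us"},
    "dashboard": {
        "status": {"task-stream-length": 1000},
        "tasks": {"task-stream-length": 100000},
    },
    "locks": {"lease-validation-interval": "10s"},
}


def split_section(section, prefix):
    """Hypothetical helper: return (rows, subsections) for one config section.

    rows: (dotted key, value) pairs for scalar entries, i.e. the section's table.
    subsections: dotted names of nested mappings that could get their own heading.
    """
    rows, subsections = [], []
    for key, value in section.items():
        dotted = f"{prefix}.{key}"
        if isinstance(value, dict):
            subsections.append(dotted)
        else:
            rows.append((dotted, value))
    return rows, subsections


rows, subsections = split_section(config, "scheduler")
for dotted, value in rows:
    print(f"{dotted} = {value!r}")
print("subsections:", subsections)
```

A rule like this treats every nested mapping the same way, so small sections such as locks would also become subsections; a size threshold would be one way to mix approaches, at the cost of consistency.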
Yeah, to be clear, I'm not pushing this. I'm just dumping a bunch of options out there to spur some thinking. This is going to be a lot of detailed work once we get something down. I'd prefer that we only do that detailed work once.
docs/source/ext/yamltotable.py
if comment_token is None:
    comment = ""
else:
    comment = comment_token[2].value
Theoretically you could pull comments from another location besides the original YAML spec. However, this may come back around to single-source-of-truth issues.
In dask-gateway @jcrist uses traitlets to store all of this. I'm not sure that Dask should go that far, but it's a demonstration of this approach.
This could also be useful for input validation
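As a rough illustration of the idea in the snippet above (a key's trailing comment becomes its description, with an empty string when there is none), here is a stdlib-only sketch. It is deliberately not the ruamel.yaml approach used in the PR: it only does plain string splitting on flat `key: value  # comment` lines, so it would break on values that contain " #". The helper name `split_comment` is hypothetical.

```python
def split_comment(line):
    """Split a flat 'key: value  # comment' line into (key, value, comment).

    Plain string splitting only (an assumption for illustration); a real
    YAML parser is needed for nested or quoted content.
    """
    body, sep, comment = line.partition(" #")
    key, _, value = body.partition(":")
    # Like the None branch above: no comment yields an empty string.
    description = comment.strip() if sep else ""
    return key.strip(), value.strip(), description


lines = [
    "allowed-failures: 3     # number of retries before a task is considered bad",
    "blocked-handlers: []",
    'idle-timeout: null      # Shut down after this duration, like "1h" or "30 minutes"',
]

table = [split_comment(line) for line in lines]
for key, value, description in table:
    print(f"{key} | {value} | {description}")
```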
Woot, this is nice to see @quasiben. I do agree with @mrocklin's concern about two sources of truth though. For dask client-side configuration, lately I've been ensuring the default file is heavily (and nicely) commented, and then directly including that in the docs. Examples:

As another idea, the zero-to-jupyterhub project includes an extra validation script that tests that the

* I personally prefer the traitlets approach, but recognize that not everyone might prefer that approach, and moving from Dask's existing configuration system would be tricky.
Heh. That's me! :) Yeah, I personally would be sad with traitlets I think (although I probably should do more homework on it). There is probably some happier medium though.
Those are nice.
A companion file like this might not be bad, especially if adding to it could be done optionally and users didn't see it directly.
A companion file would need to duplicate the keys, no? Maybe that's OK, and it's a baby step in the direction of validation?
To be clear, I think the value of the JupyterHub spec file approach is the validation script. So while the information is split between files (which could fall out of sync), there's a way to test that they remain in sync and that test could be run as part of our CI.
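A sync check of this kind could be sketched roughly as follows. This is a stdlib-only illustration, not the zero-to-jupyterhub script: the two dicts stand in for a parsed defaults file and a parsed companion file, and `dotted_keys` and `check_in_sync` are hypothetical names.

```python
def dotted_keys(mapping, prefix=""):
    """Yield the dotted path of every leaf key in a nested dict."""
    for key, value in mapping.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            yield from dotted_keys(value, path)
        else:
            yield path


def check_in_sync(defaults, descriptions):
    """Return (undocumented, stale) key lists; both empty means in sync."""
    config_keys = set(dotted_keys(defaults))
    doc_keys = set(dotted_keys(descriptions))
    return sorted(config_keys - doc_keys), sorted(doc_keys - config_keys)


# Stand-ins for the parsed defaults and the parsed companion file (assumed shapes).
defaults = {
    "scheduler": {
        "allowed-failures": 3,
        "work-stealing": True,
        "locks": {"lease-validation-interval": "10s"},
    }
}
descriptions = {
    "scheduler": {
        "allowed-failures": "number of retries before a task is considered bad",
        "locks": {"lease-validation-interval": "placeholder description"},
    }
}

undocumented, stale = check_in_sync(defaults, descriptions)
print("undocumented:", undocumented)
print("stale:", stale)
```

In CI, a test could simply assert that both lists are empty, so a key added to one file without the other fails the build.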
Here is an attempt with jsonschema (the thing that JupyterHub uses). So far I'm decently happy with it.
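I spent a bit of time with @mrocklin this weekend using PR dask/distributed#3696 to help with descriptions in the table. We pushed on getting the table to render everything, and while it works, the rendering is a bit off:

[image: Screen Shot 2020-04-12 at 11 37 27 PM] <https://user-images.githubusercontent.com/1403768/79090126-9faeb180-7d16-11ea-8632-2f42b26e828e.png>

I also spent some time adding custom YAML-to-HTML styling. This non-CSV layout of the config perhaps balances verbose descriptions (which users will appreciate) without sacrificing layout/readability (which the csv-table does). However, we lose fancy TOC linking:

[image: Screen Shot 2020-04-12 at 11 32 13 PM] <https://user-images.githubusercontent.com/1403768/79090233-f6b48680-7d16-11ea-85f4-de4001250c22.png>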
Woot
I personally prefer this style, and think you should be able to do TOC linking. Refs work fine for the setup I have for dask-gateway (https://gateway.dask.org/api-server.html), I feel like they should be doable here too.
@jcrist this is ready for a review if you have some time. Note: I also included a dask-schema in the PR. This made iterating a bit faster for me, but I'm also happy to move it to another PR if you'd like.
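For readers unfamiliar with how descriptions can come out of a schema, here is a minimal sketch of walking a JSON-Schema-style mapping and collecting one (key, type, description) row per leaf. It uses only the standard library; `schema_rows` is a hypothetical helper, and placing `size` under `array.svg` is an assumption made for the example rather than something stated in this thread.

```python
def schema_rows(schema, prefix=""):
    """Yield (dotted key, type, description) for each leaf property."""
    for name, sub in schema.get("properties", {}).items():
        path = f"{prefix}.{name}" if prefix else name
        if "properties" in sub:
            # Nested object: recurse and extend the dotted path.
            yield from schema_rows(sub, path)
        else:
            yield path, sub.get("type", ""), sub.get("description", "").strip()


# Illustrative schema fragment; the array.svg nesting is an assumption.
schema = {
    "type": "object",
    "properties": {
        "array": {
            "type": "object",
            "properties": {
                "svg": {
                    "type": "object",
                    "properties": {
                        "size": {
                            "type": "integer",
                            "description": "The size of pixels used when "
                            "displaying a dask array as an SVG image.\n",
                        }
                    },
                }
            },
        }
    },
}

rows = list(schema_rows(schema))
for key, type_name, description in rows:
    print(f"{key} ({type_name}): {description}")
```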
.. dask_config_to_html::
   :location: distributed.dashboard
   :config: https://raw.githubusercontent.com/dask/distributed/70700a1059fdae542ddbb3534f3caa3d27ca2e5d/distributed/distributed.yaml
I assume you intend to change this when dask/distributed#3696 is merged, just making a note so it's not forgotten.
yup yup, still futzing around with a few things
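The `:location:` option presumably selects a subtree of the loaded config by dotted path. A minimal sketch of that lookup, assuming the YAML has already been parsed into nested dicts, could look like this; `get_location` is a hypothetical helper, and the keys and values in `config` are illustrative rather than copied from the pinned distributed.yaml.

```python
from functools import reduce


def get_location(config, location):
    """Follow a dotted path such as 'distributed.dashboard' into nested dicts."""
    return reduce(lambda node, key: node[key], location.split("."), config)


# Illustrative stand-in for a parsed config file (keys and values are assumptions).
config = {
    "distributed": {
        "dashboard": {"link": "example-link", "export-tool": False},
        "scheduler": {"allowed-failures": 3},
    }
}

section = get_location(config, "distributed.dashboard")
print(section)
```

A missing key raises `KeyError` here, which is probably the behaviour a docs build wants when a location is misspelled.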
Upper limit for width, where width = num_nodes / height, a good measure
of parallelizability

subraphs:
This is correct, but I suspect this should be changed to subgraphs in a later PR.
Overall this looks pretty good to me. Thanks @quasiben.
I think this is safe to merge when you're ready. Maybe give @mrocklin a bit of time to object if he wants to take another look.
size:
  type: integer
  description: |
    The size of pixels used when displaying a dask array as an SVG image.
- The size of pixels used when displaying a dask array as an SVG image.
+ The size of the image in pixels used when displaying a dask array as an SVG image.
Co-Authored-By: Matthew Rocklin <mrocklin@gmail.com>
Thanks @quasiben. I'm really excited about this. I've added a couple of small suggestions inline. Two other comments here:

When I build this locally I don't seem to get descriptions for most of the Dask config values. This is maybe just my environment?

I think that it would be useful to have a TOC at the top of the configuration-reference page so that people can see quickly the options that are available to them without scrolling linearly through everything.
Good idea. That should be doable by inserting:

.. contents::
   :local:

where you want it.
Thanks for the suggestion @jcrist and for all the reviews!
Tests are currently failing because dask-schema is not in master. I can either break that out into another PR and then merge this, or merge this as-is, after which things should pass. If they don't, I can fix it immediately.
@quasiben - Feel free to merge this PR as-is when you have time to make sure that everything goes according to plan.
Thanks @jsignell. I'll keep watch here and on the docs.
docs are broken -- fixing now
This PR adds a configuration page to the docs. As the configuration is rather large, I'd like to finish up both the scheduler and worker configs while getting consensus on style. In subsequent PRs I/we can continue adding configuration details.

Fixes #4286
I've been using this online RST editor
Attached is a screen shot of the current setup: