Configuration File #463
Currently going ahead with YAML. So far I'm only putting in options that I use personally. Planning to wait until people need more to add more:

```yaml
logging:
  distributed: info
  distributed.executor: warning
  bokeh: critical

compression: auto

# Scheduler specific options
bandwidth: 100000000           # 100 MB/s estimated worker-worker bandwidth
allowed-failures: 3            # number of retries before a task is considered bad
pdb-on-err: False              # enter debug mode on scheduling error
transition-log-length: 100000
```
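For concreteness, here is a minimal sketch of how such a file could be read with PyYAML and merged over defaults; the path and default values are assumptions, not the project's actual loader:

```python
import os

import yaml  # PyYAML


def load_config(path="~/.dask/config"):
    # Defaults mirroring a couple of the options above (values assumed)
    config = {"compression": "auto", "allowed-failures": 3}
    path = os.path.expanduser(path)
    if os.path.exists(path):
        with open(path) as f:
            config.update(yaml.safe_load(f) or {})  # user values win
    return config
```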
All formats are bad. With that said, I've found YAML for config files to be not as bad as the others. YAML is:

…

YAML is a bit better with some extra tooling, like:

…
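One concrete YAML nicety worth showing: PyYAML parses scalars into native Python types, so a config file needs no extra casting. A minimal sketch:

```python
import yaml

text = """
logging:
  distributed: info
allowed-failures: 3
pdb-on-err: False
"""
config = yaml.safe_load(text)
# Scalars arrive as native types: no manual casting needed
assert config["allowed-failures"] == 3
assert config["pdb-on-err"] is False
assert config["logging"]["distributed"] == "info"
```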
Implemented in #472
Personally, I like having nesting for grouping related config:

```yaml
scheduler:
  port: 123
```

I don't know what the scope of your configurability would be, though. Since all the CI services use YAML, I think developers are getting used to it, so it makes sense to me.
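To illustrate how nesting and flat lookup can coexist, a small sketch (the helper name is made up) that flattens nested config into dotted keys:

```python
def flatten(d, prefix=""):
    """Turn {"scheduler": {"port": 123}} into {"scheduler.port": 123}."""
    out = {}
    for key, value in d.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            out.update(flatten(value, dotted))
        else:
            out[dotted] = value
    return out


assert flatten({"scheduler": {"port": 123}}) == {"scheduler.port": 123}
```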
Fixed by #472
Personally I favor environment variables over any configuration file. In our distributed setup (Docker containers on top of Mesos, Marathon, Chronos) the common practice is also env variables; distributing files is way more problematic (it needs shared storage like HDFS/S3). Click also has built-in support for reading options from env (see the sketch after the pickler code below). In our workflow manager a click CLI script submits the computation as a Chronos or Marathon (meta-schedulers on top of Mesos) task, which starts a Mesos (dask.mesos) framework, which in turn schedules multiple tasks across the cluster. All of these tasks can start, for example, a local dask computation, a distributed Spark job, another Mesos framework, a data migration tool, etc. The workflow manager needs to forward/ship the configuration down to the leaves (for example a Cassandra host:port). Personally I use … Auto-shipping can be solved via a custom pickler:

```python
from cloudpickle import CloudPickler  # cloudpickle's extension hook
from dask.context import _globals, set_options  # dask's old-style global options (import paths assumed)


def inject_addons(self):
    # Re-apply the sender's global configuration on the receiving side
    self.save_reduce(lambda opts: set_options(**opts), (_globals,))


# register reducer to auto pickle _globals configuration
CloudPickler.inject_addons = inject_addons
```
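For reference, here is a minimal sketch of click's built-in environment-variable support mentioned above; the option and variable names are made up for illustration:

```python
import click


@click.command()
@click.option("--cassandra-host", envvar="CASSANDRA_HOST", default="localhost",
              help="Falls back to the CASSANDRA_HOST environment variable.")
def main(cassandra_host):
    click.echo(f"connecting to {cassandra_host}")


if __name__ == "__main__":
    main()
```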
There generally is no centralized documentation for these except for the files themselves, which should auto-populate into your ~/.config/dask directory the first time you import any dask sub-project. For the dask-distributed project in particular you can look at https://github.com/dask/distributed/blob/master/distributed/distributed.yaml
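As a quick way to see what is actually in effect, the merged configuration can be inspected at runtime (this assumes a dask recent enough to ship the dask.config module; the key below is an example):

```python
import dask

# The fully merged configuration (defaults + YAML files + environment)
print(dask.config.config)

# Dotted access into the nested structure
print(dask.config.get("distributed.scheduler.allowed-failures", default=None))
```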
On Mon, Jan 14, 2019 at 12:46 PM Scott Brown wrote:

> Is there documentation for the possible options in a yaml configuration? I can't seem to locate such a document, and instead find small examples here and there of possible configuration subsets. Where are all possible options documented?
Continuation of #58
I think it's now time to have a configuration file. There are a few options that may be nicer to manage on a per-machine basis rather than through various command-line options (though these will remain dominant) and hard-coded settings.
Here are a few: …

Some open questions:

- `~/.dask/config` vs … (one possible lookup order is sketched below)
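On the file-location question, a sketch of one possible resolution order; all paths and the environment variable here are hypothetical, not a decided convention:

```python
import os

# Hypothetical lookup order: explicit env var, per-user file, system-wide file
candidates = [
    os.environ.get("DASK_CONFIG"),
    os.path.expanduser("~/.dask/config"),
    "/etc/dask/config.yaml",
]
config_path = next((p for p in candidates if p and os.path.exists(p)), None)
```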
@quasiben I would value your feedback in particular here.
I don't have much scar tissue on this topic.