DASK_CONFIG dictates config write location #3621
Previously, existing configuration files could be specified by `DASK_CONFIG`, but projects like `distributed` would always write the default configuration file to `~/.config/dask/`. We now make the write location configurable as well. This is important in environments where `~/.config` may not exist or be writable.
cc @mrocklin
dask/config.py
Outdated
    DASK_CONFIG = os.environ['DASK_CONFIG']
    paths.append(DASK_CONFIG)
else:
    DASK_CONFIG = os.path.join(os.path.expanduser('~'), '.config', 'dask')
I recommend that we call this something like `dask.config.PATH` internally.
Or maybe it would be better to place it in dask.config.config['path']
I opted for |
Makes sense to me. Another concern: it is currently possible to set `DASK_CONFIG` to the path of a particular YAML file.
In this case what happens when downstream libraries try to write their config files somewhere? Do they overwrite the existing file? Do they write in the same location? Do they err? Do they issue a warning and then do nothing (possibly my preference)?
When does writing to the config file happen (we use this environment variable)?
In cases like this:
https://github.com/dask/distributed/blob/master/distributed/config.py#L17
If the file doesn't exist then we write a commented-out version of the file
to the specified config directory.
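For readers following along, the behaviour described here can be sketched roughly as below. This is only an illustration under stated assumptions: `write_commented_default` is a hypothetical helper written for this thread, not the function `distributed` or `dask` actually uses, and it assumes the defaults ship as a plain text file next to the library.

```python
import os


def write_commented_default(source, config_dir):
    """Write a commented-out copy of `source` into `config_dir` if absent.

    Hypothetical helper for illustration; not the dask or distributed API.
    Returns True if a file was written, False if one was already there.
    """
    target = os.path.join(config_dir, os.path.basename(source))
    if os.path.exists(target):
        return False  # never clobber a file the user may have edited

    with open(source) as f:
        lines = f.read().splitlines()

    # Prefix every non-blank line with "# " so the file has no effect
    # until the user uncomments something.
    commented = ["# " + line if line.strip() else line for line in lines]

    os.makedirs(config_dir, exist_ok=True)
    with open(target, "w") as f:
        f.write("\n".join(commented) + "\n")
    return True
```

Calling it a second time with the same arguments returns `False` and leaves the existing file untouched, which is the property that matters for user-edited configs.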
Should have clarified: I meant, what happens if the file already exists? ;)
You should take a look at ensure_file:
https://github.com/dask/dask/blob/4b2454cb44b30f33ff1b5ae7a10d907430961e72/dask/config.py#L160
It only saves the file if it doesn't already exist
Why is this allowed? It seems to me that this behavior should be prohibited if the intent is to have a directory of dask configuration files. Currently they're just silently not written, which seems to me to be fine. From what I understand, the goal of |
Currently it's because it's convenient, though this could change. In practice when deploying a cluster it is common to send along a single config file with all the changes you want to make. It's somewhat simpler to specify this file rather than construct a directory to hold this file.
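To make the file-or-directory distinction concrete, here is a minimal sketch of how a path taken from `DASK_CONFIG` could be resolved into a list of YAML files. `config_files` is a hypothetical name used only for this example, and the set of accepted extensions is an assumption; the real collection logic in `dask/config.py` may differ.

```python
import os


def config_files(path):
    """Return the YAML files named by `path`.

    `path` may be a single file or a directory of files.
    Hypothetical helper for illustration only.
    """
    if os.path.isdir(path):
        # A directory: pick up every YAML file in it, in a stable order
        return sorted(
            os.path.join(path, name)
            for name in os.listdir(path)
            if os.path.splitext(name)[1].lower() in (".yaml", ".yml")
        )
    if os.path.isfile(path):
        # A single file shipped along with a deployment
        return [path]
    return []  # nothing there yet
```

With this shape, shipping one file to a cluster needs no extra directory, at the cost that there is no obvious place to write new default files when the path names a file.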
Perhaps instead we should change when files are written by downstream projects? Not sure what that would look like either. Personally, I don't think libraries should ever implicitly write configuration files, but rather keep the defaults in the library, and look for overrides in the files (if found). My main goal here is to fix an issue where loading distributed fails on yarn, since the default configuration location isn't writable. Currently I have to monkeypatch around this or the import fails.
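The "defaults in the library, overrides from files" approach could look something like the sketch below. It is an assumption-laden illustration: `merge` is a hypothetical helper, the override dictionary stands in for whatever was parsed from a user's config file (parsing is left out), and nothing is ever written to disk.

```python
import copy


def merge(defaults, overrides):
    """Return a new nested dict: `defaults` updated by `overrides`.

    Hypothetical helper for illustration only. Neither input is
    modified, so the library's built-in defaults stay intact.
    """
    result = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are mappings: merge them key by key
            result[key] = merge(result[key], value)
        else:
            # Scalars, lists, or new keys: the override wins
            result[key] = copy.deepcopy(value)
    return result
```

If no override file is found, the library would simply call `merge(defaults, {})` and run on its built-in values, so a read-only home directory never becomes an import-time failure.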
Ideally it shouldn't fail if the default configuration location isn't writable. That's a bug regardless.
Hrm, I thought that we had wrapped that whole thing in a `try`/`except Exception` block. Looks like we didn't.
The reason to write a commented-out file is to give new users something to easily edit without having to go find the right config file. This seems to be common-ish practice. Jupyter does it, for example.
Does it? AFAICT, it only does this if asked to, not on import. http://jupyter-notebook.readthedocs.io/en/stable/config.html
(dask) jcrist dask $ jupyter --config-dir
/Users/jcrist/.jupyter
(dask) jcrist dask $ ls ~/.jupyter
ls: /Users/jcrist/.jupyter: No such file or directory
(dask) jcrist dask $ python -c "import jupyter"
(dask) jcrist dask $ ls ~/.jupyter
ls: /Users/jcrist/.jupyter: No such file or directory
(dask) jcrist dask $ jupyter notebook --generate-config
Writing default config to: /Users/jcrist/.jupyter/jupyter_notebook_config.py
(dask) jcrist dask $ ls ~/.jupyter
jupyter_notebook_config.py
Yes, sorry. I should have said "Jupyter provides a commented out config
file" rather than imply that it does it on import. What would you
recommend for the Dask subprojects? I get that you don't like writing on
import. What would you suggest? I think that it's a bit harder for us
because we have many smaller subprojects. Dask.distributed did
write-on-import and, except for working through bugs like you're hitting
now, it has generally been really pleasant. Users seem to use the
configuration options pretty happily and don't need much guidance.
The number of issues or Stack Overflow questions that end happily with "Add the following line to your config file, probably at ~/.dask/config.yaml" without further questions was decently high (or at least seems subjectively high to me now, without looking through things to back that statement up).
I suppose in the case of non-command-line tools, writing on import (and silently failing) may be acceptable. If we had a nice plug-in system with all dask sub-projects that might use the configuration system, I'd suggest the following:
but that might place restrictions/boilerplate on dask subprojects that we may not want, and may be harder to maintain. In the case without a nice CLI, we'd either need to standardize on a top-level function in the library namespace (mildly unpleasant), or continue with write-on-import. In that case I'd be fine with the following:
The above seems reasonable to me, and I'd be fine implementing the missing bits here. Thoughts?
Agreed on both points. I think that the |
- Don't ever write to destination if it's an existing file
- Catch errors when writing fails.
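The two rules in these commits can be sketched as follows. This is a simplified stand-in, not the code from this PR: `ensure_file_safely` is a hypothetical name, it copies the file verbatim rather than commenting it out, and it swallows only `OSError`.

```python
import os
import shutil


def ensure_file_safely(source, destination):
    """Copy `source` into the directory `destination` if not already there.

    Hypothetical helper for illustration only. Returns the target path,
    or None if nothing could or should be written.
    """
    if os.path.isfile(destination):
        # `destination` names a single config file: never write into or over it
        return None

    target = os.path.join(destination, os.path.basename(source))
    try:
        if not os.path.exists(target):
            os.makedirs(destination, exist_ok=True)
            shutil.copy(source, target)
    except OSError:
        # Read-only or missing location: give up quietly instead of
        # failing at import time
        return None
    return target
```

Returning `None` instead of raising means an import on a machine with an unwritable config location proceeds with the library defaults.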
This should be ready for review. Note that this slightly changes the functionality of
This looks good to me. +1
Thanks for cleaning this up @jcrist