Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DASK_CONFIG dictates config write location #3621

Merged
merged 3 commits into from
Jun 20, 2018

Conversation

jcrist
Copy link
Member

@jcrist jcrist commented Jun 15, 2018

Previously existing configuration files could be specified by
DASK_CONFIG, but projects like distributed would always write the
default configuration file to ~/.config/dask/. We now make the write
location configurable as well. This is important in environments where
~/.config may not exist of be writable.

Previously existing configuration files could be specified by
`DASK_CONFIG`, but projects like `distributed` would always write the
default configuration file to `~/.config/dask/`. We now make the write
location configurable as well. This is important in environments where
`~/.config` may not exist of be writable.
@jcrist
Copy link
Member Author

jcrist commented Jun 15, 2018

cc @mrocklin

dask/config.py Outdated
DASK_CONFIG = os.environ['DASK_CONFIG']
paths.append(DASK_CONFIG)
else:
DASK_CONFIG = os.path.join(os.path.expanduser('~'), '.config', 'dask')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend that we call this something like dask.config.PATH internally

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe it would be better to place it in dask.config.config['path']

@jcrist
Copy link
Member Author

jcrist commented Jun 18, 2018

I opted for dask.config.PATH. Putting it in dask.config.config felt wrong, since the location of the configuration files feels different than the information in the configuration files. That path is needed before any configurations are loaded, and the location of the configuration files shouldn't be changeable from inside the configuration files.

@mrocklin
Copy link
Member

Makes sense to me.

Another concern, it is currently possible to set DASK_CONFIG to the path of a particular yaml file

export DASK_CONFIG=/path/to/my-config.yaml

In this case what happens when downstream libraries try to write their config files somewhere? Do they overwrite the existing file? Do they write in the same location? Do they err? Do they issue a warning and then do nothing (possibly my preference)?

@jakirkham
Copy link
Member

When does writing to the config file happen (we use this environment variable)?

@mrocklin
Copy link
Member

mrocklin commented Jun 18, 2018 via email

@jakirkham
Copy link
Member

Should have clarified, was meaning if it already exists? ;)

@mrocklin
Copy link
Member

mrocklin commented Jun 18, 2018 via email

@jcrist
Copy link
Member Author

jcrist commented Jun 19, 2018

Another concern, it is currently possible to set DASK_CONFIG to the path of a particular yaml file

Why is this allowed? It seems to me that this behavior should be prohibited if the intent is to have a directory of dask configuration files.


Currently they're just silently not written, which seems to me to be fine. From what I understand, the goal of ensure_file is to write the default configuration files if none exist for that library. In the case of DASK_CONFIG being a directory, this only happens if e.g. kubernetes.yaml already exists in that directory. In the case of it being a file, I'd say this would never work in the case of multiple dask libraries using configuration files, and should either error (my preference, forbid file paths for DASK_CONFIG), or just continue the existing current behavior.

@mrocklin
Copy link
Member

Why is this allowed?

Currently it's because it's convenient, though this could change.

In practice when deploying a cluster it is common to send along a single config file with all the changes you want to make. It's somewhat simpler to specify this file rather than construct a directory to hold this file.

@jcrist
Copy link
Member Author

jcrist commented Jun 19, 2018

Perhaps instead we should change when files are written by downstream projects? Not sure what that would look like either. Personally, I don't think libraries should ever implicitly write configuration files, but rather keep the defaults in the library, and look for overrides in the files (if found).

My main goal here is to fix an issue where loading distributed fails on yarn, since the default configuration location isn't writable. Currently I have to monkeypatch around this or the import fails.

@mrocklin
Copy link
Member

mrocklin commented Jun 19, 2018 via email

@mrocklin
Copy link
Member

The reason to write a commented out file is to give new users something to easily edit without having to go find the right config file. This seems to be common-ish practice. Jupyter does it, for example.

@jcrist
Copy link
Member Author

jcrist commented Jun 19, 2018

Jupyter does it, for example.

Does it? AFAICT, it only does this if asked to, not on import. http://jupyter-notebook.readthedocs.io/en/stable/config.html

(dask) jcrist dask $ jupyter --config-dir
/Users/jcrist/.jupyter
(dask) jcrist dask $ ls ~/.jupyter
ls: /Users/jcrist/.jupyter: No such file or directory
(dask) jcrist dask $ python -c "import jupyter"
(dask) jcrist dask $ ls ~/.jupyter
ls: /Users/jcrist/.jupyter: No such file or directory
(dask) jcrist dask $ jupyter notebook --generate-config
Writing default config to: /Users/jcrist/.jupyter/jupyter_notebook_config.py
(dask) jcrist dask $ ls ~/.jupyter
jupyter_notebook_config.py

@mrocklin
Copy link
Member

mrocklin commented Jun 19, 2018 via email

@mrocklin
Copy link
Member

mrocklin commented Jun 19, 2018 via email

@jcrist
Copy link
Member Author

jcrist commented Jun 19, 2018

I suppose in the case of non-commandline tools, writing on import (and silently failing) may be acceptable. If we had a nice plug-in system with all dask sub-projects that might use the configuration system, I'd suggest the following:

# Write the kubernetes configuration commented out file
# Error if file already exists. Could add a `--force` flag to override.
dask config generate -n kubernetes -o /path/to/dask/config/directory/

but that might place restrictions/boilerplate on dask subprojects that we may not want, and may be harder to maintain.


In the case without nice cli, we'd either need to standardize on a top-level-function in the library namespace (mildly unpleasant), or continue with write on import. In that case I'd be fine with the following:

  • If a library doesn't already have a configuration file in the location of DASK_CONFIG, try to write it on import, and silently fail. Since nothing should rely on the automatic writing (the file is commented out after all), this should be fine.
  • If DASK_CONFIG points to a file path, never try to write on import. The path is pointing to a file of user overrides. Global overrides should be configured by a higher-level administrator and placed in one of the common directories.

The above seems reasonable to me, and I'd be fine implementing the missing bits here. Thoughts?

@mrocklin
Copy link
Member

Agreed on both points. I think that the distributed library actually used to implement the first part of that. It was an unfortunate oversight not to copy it over.

- Don't ever write to destination if it's an existing file
- Catch errors when writing fails.
@jcrist
Copy link
Member Author

jcrist commented Jun 19, 2018

This should be ready for review. Note that this slightly changes the functionality of ensure_file:

  • destination is still allowed to be a filepath, but is only interpreted as one if the file already exists. In this case ensure_file is a no-op. This seemed more intuitive than looking for any '.' in the destination name, since directory names may also contain '.'.
  • In all other cases destination is interpreted as a directory. The source file is attempted to be copied into this directory, silencing any OSError's that may occur.

@mrocklin
Copy link
Member

This looks good to me. +1

@mrocklin
Copy link
Member

Thanks for cleaning this up @jcrist

@mrocklin mrocklin merged commit 1755e86 into dask:master Jun 20, 2018
@jcrist jcrist deleted the config_write_location branch August 1, 2018 05:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants