Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use environment variable to override root dask config location #3798

Closed
jhamman opened this issue Jul 20, 2018 · 13 comments · Fixed by #3849
Closed

Use environment variable to override root dask config location #3798

jhamman opened this issue Jul 20, 2018 · 13 comments · Fixed by #3849
Labels

Comments

@jhamman
Copy link
Member

jhamman commented Jul 20, 2018

Dask's new configuration file structure looks in /etc/dask for a system wide configuration. I'd like to suggest this be overridable via a environment variable. I've included the relevant lines below.

dask/dask/config.py

Lines 18 to 29 in 69fc200

paths = [
'/etc/dask',
os.path.join(sys.prefix, 'etc', 'dask'),
os.path.join(os.path.expanduser('~'), '.config', 'dask'),
os.path.join(os.path.expanduser('~'), '.dask')
]
if 'DASK_CONFIG' in os.environ:
PATH = os.environ['DASK_CONFIG']
paths.append(PATH)
else:
PATH = os.path.join(os.path.expanduser('~'), '.config', 'dask')

In particular, I'd like to see line 19 changed to something like:

    os.getenv('DASK_ROOT_CONFIG', '/etc/dask')

This would allow system admins to point to a common config file without mandating it be placed in /etc/....

I'll note that there is already a DASK_CONFIG environment variable. This is however appended to the list of dask configs so it would difficult to use for global/system wide configurations.

Btw, my particular use case is for providing a system wide configuration for dask-jobqueue on a large HPC system.

@jhamman
Copy link
Member Author

jhamman commented Jul 27, 2018

Any thoughts on this issue? I'd be happy to issue a PR if there is agreement on how to proceed.

@jakirkham
Copy link
Member

Thoughts @mrocklin?

@mrocklin
Copy link
Member

mrocklin commented Aug 2, 2018

My apologies for the lack of response here @jhamman . I've been swamped.

I don't have any particular objection to this if we can get a couple other people to weigh in. Maybe @jacobtomlinson, @lesteve, and @jcrist for variety across deployment scenarios?

@jcrist
Copy link
Member

jcrist commented Aug 2, 2018

No strong thoughts. If I were to rewrite those lines, I'd cut back the number of potential directories that configurations are loaded from. As is the user gets a mix of files from potentially a few spots. This seems potentially error prone, as one user might store kubernetes.yaml in one of those directories, only to have it be unexpectedly overridden by configuration in a different directory.

I'd prefer to have a single root config directory, and a single user config directory. The location of each could be set by an environment variable, and if not set a list of potential directories will be searched, stopping when a path exists. I don't see the need for more than 2 configuration locations (one admin one user), and cutting back the number of locations that may be loaded from seems easier to reason about.

Something like (untested):

def get_config_path(envvar, search_paths):
    if envvar in os.environ:
        # if explicitly specified, only use that
        return os.environ[envvar]
    else:
        # Find the first directory that has dask configurations, and use that exclusively.
        for path in search_paths:
            if os.path.exists(path):
                return path

    # No path found
    return None

# find the root configuration directory, either from an environment
# variable, or the first existing directory in a search list
root_config_path = get_config_path('DASK_ROOT_CONFIG',
                                   [os.path.join('etc', 'dask'),
                                    os.path.join(sys.prefix, 'etc', 'dask')])

# find the user configuration directory, either from an environment
# variable, or the first existing directory in a search list
user_config_path = get_config_path('DASK_CONFIG',
                                   [os.path.join(os.path.expanduser('~'), '.config', 'dask'), 
                                    os.path.join(os.path.expanduser('~'), '.dask')])

paths = []
if root_config_path is not None:
    paths.append(root_config_path)

if user_config_path is not None:
    paths.append(user_config_path)

# continue on with code as written...

@jacobtomlinson
Copy link
Member

I think I agree with @jcrist's comments.

Personally I'm a big fan of storing config in the environment. So I tend to use the environment variable version of this stuff. However when using config files locally I prefer to have a single system config dir and then a user specific config dir. Generally appdirs helps achieve this.

@mrocklin
Copy link
Member

mrocklin commented Aug 3, 2018

I'd prefer to have a single root config directory, and a single user config directory

Reasons for the following:

  • '/etc/dask': system administrators
  • os.path.join(sys.prefix, 'etc', 'dask'): same, but at the self-contained environment level
  • os.path.join(os.path.expanduser('~'), '.config', 'dask'): current recommended user config directory
  • os.path.join(os.path.expanduser('~'), '.dask'): backwards compatibility with previous recommended user config directory

@mrocklin
Copy link
Member

mrocklin commented Aug 3, 2018

In general I agree with the sentiment of reducing config locations, but pragmatically I have yet to run into a case where conflicts have confused anyone. I've seen active use cases for all of them except for sys.prefix/etc/dask

@mrocklin
Copy link
Member

mrocklin commented Aug 3, 2018

It sounds like in general people are ok with the name DASK_ROOT_CONFIG. It seems like there is an active question about whether this should replace /etc/dask or work in addition to it

@jcrist
Copy link
Member

jcrist commented Aug 3, 2018

I'm not saying we should remove support for any of them, just that I doubt people have a need for more than one root/user config location (and the possibility of using more than one root/user config location seems dangerous). If I see a /etc/dask config location exists, I'd expect all admin config to be in there. If admin config is also loaded from os.path.join(sys.prefix, 'etc', 'dask') that'd be surprising to me. Same goes for user config - if I set DASK_CONFIG to somewhere else, I wouldn't expect files in ~/.config/dask or ~/.dask to be loaded.

It seems like there is an active question about whether this should replace /etc/dask or work in addition to it.

My vote is for override, not additional, following the logic above.

@jhamman
Copy link
Member Author

jhamman commented Aug 3, 2018

'/etc/dask': system administrators

I think you understand the motivation for this issue already but for others, our system administrators are not willing to put a dask config in /etc/. They are willing however to set a environment variable in the default user environment.

@jacobtomlinson
Copy link
Member

our system administrators are not willing to put a dask config in /etc/

I have complete sympathy for sysadmins refusing to do things. I used to be a sysadmin at the Met Office.

However if they were to install dask from an RPM or DEB package this is where it would drop the config. So I'm not entirely sure why they would refuse this. Perhaps this is a valid use case for the one @mrocklin hasn't seen in the wild yet.

os.path.join(sys.prefix, 'etc', 'dask'): same, but at the self-contained environment level

@jhamman
Copy link
Member Author

jhamman commented Aug 6, 2018

So I'm not entirely sure why they would refuse this.

On my particular HPC machine, packages are stored in modules akin to virtual environments. When I load a module (e.g. Python), my environment (path and env variables) are altered. /etc/ is not typically used on this machine to store configs because it doesn't fit within the module framework. So I'm guessing this is the source of their objection.

@jacobtomlinson
Copy link
Member

We also use module for managing environments. I think this probably is an example of a time to use the self-contained environment level version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants