Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get config item if variable is None #6862

Merged
merged 6 commits into from Dec 11, 2020

Conversation

jacobtomlinson
Copy link
Member

@jacobtomlinson jacobtomlinson commented Nov 19, 2020

  • Tests added / passed
  • Passes black dask / flake8 dask

Add config method which either passes a value straight through, of if it is None gets a value from the config.

>>> with dask.config.set({"foo": "bar"}):
...     # If the default is None get the config key
...     dask.config.default_get(None, "foo")
"bar"

>>> with dask.config.set({"foo": "bar"}):
...     # Otherwise pass the default straight through
...     dask.config.default_get("baz", "foo")
"baz"

Method name still very much up for discussion. Current contenders are:

  • default_get
  • get_if_none
  • maybe_get (@jsignell)

Closes #6860

@jsignell
Copy link
Member

jsignell commented Nov 19, 2020

Another idea is config.default_key.

Another idea is to add another option to config.get. But it'd want to be first which might get weird. It could be something like if you pass a tuple to get, it'll take the first item if not Non else get the config for the second.

diff --git a/dask/config.py b/dask/config.py
index 27d8492b..d6f94057 100644
--- a/dask/config.py
+++ b/dask/config.py
@@ -425,7 +425,9 @@ def refresh(config=config, defaults=defaults, **kwargs):
 
 def get(key, default=no_default, config=config):
     """
-    Get elements from global config
+    Get elements from global config. If tuple is passed instead of key,
+    return first item in tuple if not None, else return config value for
+    second item.
 
     Use '.' for nested access
 
@@ -441,10 +443,16 @@ def get(key, default=no_default, config=config):
     >>> config.get('foo.x.y', default=123)  # doctest: +SKIP
     123
 
+    >>> config.get((4, 'foo.x.y))  # doctest: +SKIP
+    4
     See Also
     --------
     dask.config.set
     """
+    if isinstance(key, tuple):
+        value, key = key
+        if value is not None:
+            return value
     keys = key.split(".")
     result = config
     for k in keys:

@jacobtomlinson
Copy link
Member Author

jacobtomlinson commented Nov 19, 2020

I like the idea of incorporating this into get somehow @jsignell.

The tuple may be a little opaque to others when reading a piece of code. I wonder if a kwarg would be better?

But I guess the trouble with a kwarg is the ordering.

@jsignell
Copy link
Member

jsignell commented Nov 19, 2020

Yeah what if the get can just take multiple args and if one is passed then it's the key and if two then it's value, key?

diff --git a/dask/config.py b/dask/config.py
index 27d8492b..3da2526c 100644
--- a/dask/config.py
+++ b/dask/config.py
@@ -423,9 +423,12 @@ def refresh(config=config, defaults=defaults, **kwargs):
     update(config, collect(**kwargs))
 
 
-def get(key, default=no_default, config=config):
+def get(*args, default=no_default, config=config):
     """
-    Get elements from global config
+    Get elements from global config.
+
+    If multiple args are passed, the first item is returned if not None,
+    else return config value for second item (the key).
 
     Use '.' for nested access
 
@@ -441,10 +444,19 @@ def get(key, default=no_default, config=config):
     >>> config.get('foo.x.y', default=123)  # doctest: +SKIP
     123
 
+    >>> config.get(4, 'foo.x.y')
+    4
+
     See Also
     --------
     dask.config.set
     """
+    if len(args) == 1:
+        key, = args
+    elif len(args) == 2:
+        value, key = args
+        if value is not None:
+            return value
     keys = key.split(".")
     result = config
     for k in keys:

@jsignell
Copy link
Member

Ok one more idea. We could abuse the default kwarg by adding a priority kwarg or something that says use default over config value.

@jacobtomlinson
Copy link
Member Author

Yeah what if the get can just take multiple args and if one is passed then it's the key and if two then it's value, key?

Yeah I like this. You beat me to a diff 😁.

We could abuse the default kwarg by adding a priority kwarg or something that says use default over config value.

This would also work. It is less magic, which is good. But I think I still prefer the other approach.

@jsignell
Copy link
Member

You beat me to a diff 😁.

I had to convince myself that it'd work :)

But I think I still prefer the other approach.

👍 me too

@jacobtomlinson
Copy link
Member Author

My only concern is that there may be existing code which doesn't explicitly use the default kwarg, but instead passes two args.

Like here in distributed.

@jsignell
Copy link
Member

there may be existing code which doesn't explicitly use the default kwarg, but instead passes two args

We could do something like if the first arg is a string, check if it's a key... but it does add some magic.

@jsignell
Copy link
Member

Or we could take it slow and deprecate passing default as an arg.

@jsignell
Copy link
Member

Or we can just pick a new name and have a new function :)

@jacobtomlinson
Copy link
Member Author

Or we could take it slow and deprecate passing default as an arg.

This is probably the right approach. But I also want to move fast and break stuff.

@jsignell
Copy link
Member

Is using default considered a good practice? Shouldn't every key be in the default .yml?

@jrbourbeau
Copy link
Member

Is using default considered a good practice? Shouldn't every key be in the default .yml?

Not necessarily. Some use the Dask config system as a convenient way to store metadata. We sometimes do this internally in Dask

dask/dask/base.py

Lines 83 to 90 in ff3ea6c

prev_annotations = config.get("annotations", {})
new_annotations = {
**prev_annotations,
**{f"annotations.{k}": v for k, v in annotations.items()},
}
with config.set(new_annotations):
yield

and Distributed (as Jacob pointed out)

@jsignell
Copy link
Member

jsignell commented Nov 20, 2020

Ah thanks @jrbourbeau I hadn't considered that case. I guess in that case maybe we should deprecate using default as a positional arg and start by implementing the tuple approach. Then later once we are certain that it is only (key, **kwargs) or ((value, key), **kwargs), we can allow people to pass (value, key, **kwargs)

@jacobtomlinson
Copy link
Member Author

I am happy with that, I'll update this PR to reflect that. But given that this is a substantial change it might be good to get some more eyes on this.

@jacobtomlinson
Copy link
Member Author

Having a poke around in dask/dask it appears there are a large number of situations where the default is passed as an arg. I'm nervous that enforcing it being a kwarg will break a lot of things.

@jsignell
Copy link
Member

Having a poke around in dask/dask it appears there are a large number of situations where the default is passed as an arg. I'm nervous that enforcing it being a kwarg will break a lot of things.

I don't think this PR should enforce anything, just warn. But if that seems like it'll be too annoying, I am fine just going with the tuple approach or the new function approach.

@jacobtomlinson
Copy link
Member Author

I don't think this PR should enforce anything, just warn.

Ah that's fair. I was just adding a * as the second argument which is a pretty hard break.

@jrbourbeau
Copy link
Member

Long term I'm hesitant that we should aim for dask.config.get to have a signature of (value, key, **kwargs) as there are many cases were users do want to just grab a configuration value. IIUC in that case we would be forcing them to do dask.config.get(None, key), which feels a little inconvenient.

The new function approach seems nice. We could also add a new conditional= (or some other name) keyword-only argument to dask.config.get which would be used for checking whether an external variable isNone. So we'd turn

my_kwarg if my_kwarg is not None else dask.config.get("mycluster.my-kwarg")

into

dask.config.get("mycluster.my-kwarg", conditional=my_kwarg)

@jacobtomlinson
Copy link
Member Author

Thanks @jrbourbeau!

IIUC in that case we would be forcing them to do dask.config.get(None, key), which feels a little inconvenient.

No we could still accept dask.config.get(key) and do some magic to make it work. See Julia's diff here.

We could also add a new conditional= (or some other name) keyword-only argument

We did also discuss this approach. I think the consensus was that the ordering was a little annoying because kwargs can't go first and it feels like that argument should be first. However it seems like the least surprising way to implement this feature.

Also coming up with a good and succinct name for this kwarg is a tricky one.

@jrbourbeau
Copy link
Member

No we could still accept dask.config.get(key) and do some magic to make it work. See Julia's diff here.

Ah gotcha, thank you for clarifying

@martindurant
Copy link
Member

@jacobtomlinson will you be modifying to use the new diff, or were you wanting to reach a consensus that made everyone happy (as oppsed to merely content) ?

@jacobtomlinson
Copy link
Member Author

@martindurant I'm not sure which diff to do with here. We've discussed multiple approaches and haven't seemed to settle on any of them.

@martindurant
Copy link
Member

coming up with a good and succinct name for this kwarg is a tricky one

I feel like if this was accomplished, we would all agree on the kwarg version.
It would also be reasonable to have an additional alias for those that want it, like

dask.config.get_conditional(my_kwargs, key) ->
dask.config.get("mycluster.my-kwarg", conditional=my_kwarg)

(where I have not tried to come up with a better term!)

@jsignell
Copy link
Member

I don't want to hold this up. @jacobtomlinson if you are happy with the kwarg approach that is totally fine with me.

@jsignell
Copy link
Member

jsignell commented Dec 9, 2020

Welp @jacobtomlinson should we just go with default_get? I am fine with that if you want to just merge and move on with your life :)

@jacobtomlinson
Copy link
Member Author

jacobtomlinson commented Dec 10, 2020

Thanks for the nudge @jsignell.

I decided to go for the kwarg approach as you and @jrbourbeau suggested. I named the kwarg if_not_set, but if anyone can think of something better I'd be happy to change it.

@jsignell
Copy link
Member

jsignell commented Dec 10, 2020

I prefer conditional to if_not_set. Some other options in no particular order: override, override_with, first_use, first_try, provisional, provided

@jacobtomlinson
Copy link
Member Author

Thanks @jsignell. I think from that list my preference would be override or override_with.

@jrbourbeau care to break the tie?

@jsignell
Copy link
Member

In case it helps I prefer override_with so that it's clear it's not a bool

@jacobtomlinson
Copy link
Member Author

That sounds reasonable. Updated! Thanks!

@mrocklin
Copy link
Member

Sorry for coming in here late, but can I ask someone to provide a motivating use case for this?

@mrocklin
Copy link
Member

In general I don't understand the pattern of "get this config value, but actually then throw it away and give me this other value instead". Am I understanding the objective here correctly? If so, my inclination is to leave this to user code unless it is very very common.

@martindurant
Copy link
Member

@mrocklin : this was about avoiding having to write "thing = config.get("thing") if thing is None else thing" or something like that, in many many places; where None is not necessarily the placeholder which means 'use default'.

@mrocklin
Copy link
Member

Ah, reading through issue now. My apologies for the friction, that discussion is long ago enough to have escaped medium-term mental storage. Happy to retract my concern.

@jrbourbeau
Copy link
Member

@jrbourbeau care to break the tie?

+1 for override_with

@jacobtomlinson
Copy link
Member Author

jacobtomlinson commented Dec 11, 2020

@mrocklin's concerns are very valid, however it is my opinion that the usage that @martindurant mentioned is very common, especially in downstream cluster manager packages like dask-{kubernetes,jobqueue,cloudprovider,etc}. Common enough to justify this.

Thanks for the input everyone, I think this is a pretty central change and lazy concensus made me a little nervous here.

Happy to retract my concern.

Merging!

@jacobtomlinson jacobtomlinson merged commit a988716 into dask:master Dec 11, 2020
@jacobtomlinson jacobtomlinson deleted the config-kwarg branch December 11, 2020 15:48
@jacobtomlinson
Copy link
Member Author

Also thanks for all your time here @jsignell. This rabbit hole was pretty deep for such a small change. I appreciated having someone along for the ride.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Get config item if variable is None
5 participants