-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add config for handling obj normalization with missing __dask_tokenize__ #7413
Conversation
dask/tests/test_base.py
Outdated
@@ -8,6 +8,8 @@ | |||
|
|||
from tlz import merge, partial, compose, curry | |||
|
|||
from unittest import mock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not entirely sure what convention to employ here in terms of whether to import mock
at this level or within test_normalize_object
itself (just as it's been done for import warnings
; the latter has been imported within the test only, because its general utility is not as wide, in this particular context).
Any feedback would be greatly appreciated.
Need to go through the rest of the codebase a bit more to get an idea how it'd be best to go about documenting the newly-proposed config. |
I'm considering documenting the config as part of this section, because I haven't been able to find a more suitable one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @hristog!
We can probably scale the changes back here a bit to just support a boolean flag for whether or not we should raise an error. I suspect a relatively small group of users will utilize this setting and when things are non-deterministic they'll want an error to be raised. We can always expand thing in the future if needed.
Something like (I've not tested the snippet below)
if callable(o):
return normalize_function(o)
elif dask.config.get("tokenize.allow-random", True):
return uuid.uuid4().hex
else:
raise RuntimeError("...")
should work for that case
dask/tests/test_base.py
Outdated
@@ -224,6 +226,88 @@ def test_normalize_base(): | |||
assert normalize_token(i) is i | |||
|
|||
|
|||
def test_normalize_object(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about something like (again I've not tested the snippet below):
with dask.config.set({"tokenize.allow-random": False}):
with pytest.raises(RuntimeError, match="..."):
tokenize(object())
Thanks for looking into this PR, @jrbourbeau! I've scaled down the functionality, associated with the newly-proposed config, as suggested, and updated its value to One tiny concern I've got about the naming ( I suppose this could always be added, given actual demand, if GitHub issues are submitted requesting those to be supported at any point later on. When you get a chance, please, let me know what your thoughts are, regarding the best possible way to document the newly-introduced config. |
Hi @jrbourbeau - I'm writing to check if you've had a chance to look into the further changes that I pushed in 41b135b, and whether you'd like any additional refinements to be done on top of that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @hristog -- apologies for the delayed response.
One tiny concern I've got about the naming (allow-random) is that - as mentioned in this comment of mine - there are other sources of non-determinism that aren't going to be impacted by the proposed logic.
This is a great point -- if you want to add a check for tokenize.allow-random
in those locations too, that would be welcome
dask/tests/test_base.py
Outdated
@@ -224,6 +226,49 @@ def test_normalize_base(): | |||
assert normalize_token(i) is i | |||
|
|||
|
|||
def test_normalize_object(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test can be simplified a bit. In particular, we can probably just test that an informative error is raised when tokenize.allow-random
is False
and we attempt to tokenize
an object which can't be deterministically hashed. See this comment https://github.com/dask/dask/pull/7413/files#r596499598 for an outline of a simplified test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.. my concerns here are that, without tests proving that true negatives are handled as such, we may not have sufficient confidence (purely, from viewing unit tests as documentation of expected functionality) that the newly introduced functionality doesn't break behavior which it shouldn't affect. And, somehow, I don't feel comfortable to rely on other tests, external to this one, to do this for us.
Of course, if you strongly believe we could make do without the extra set of tests, and just have one which makes assertions with respect to one aspect only, then I'll have to remove them (no issues about that). Just wanted to share my concerns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great point -- if you want to add a check for
tokenize.allow-random
in those locations too, that would be welcome
Hi @jrbourbeau - yes, I've opted to implement configurable behavior for those as well. The commits have already been added to this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test can be simplified a bit.
I've made an attempt at simplifying the tests. Hopefully, looking better from your point of view now. Let me know, please, if you'd like anything else changed.
Hi @jrbourbeau, I'm writing to check if you've had a chance to have a look at the most recent updates. Please, let me know if the tests should be further stripped down, despite the discussed concerns. |
@jrbourbeau checking in here, it looks like @hristog updated the code according to your requests and CI is green. Do you think this is ready to go, or are there any other issues that should be addressed? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good, you just need a change to the config file to support this change.
- Rename config field to `tokenize.ensure-deterministic` - Add config field to `dask.yaml` and `dask-schema.yaml`
I've renamed the config field to |
Thanks, @jcrist! |
Thanks for the updates @jcrist! I pushed a small commit to avoid |
Test failure is unrelated - merging. Thanks @hristog! |
Thanks, @jrbourbeau! Thanks to your commit, I can much better understand what you meant in your previous comments on this PR. |
normalize_object
doesn't find__dask_tokenize__
#6555tokenize.allow-random
config, acceptingTrue
orFalse
.test_normalize_object
todask/tests/test_base.py
.black dask
/flake8 dask