
Add config for handling obj normalization with missing __dask_tokenize__ #7413

Merged · 18 commits · Oct 4, 2021

Conversation

@hristog (Contributor, author) commented on Mar 17, 2021:

@@ -8,6 +8,8 @@

from tlz import merge, partial, compose, curry

from unittest import mock
@hristog (Contributor, author) commented:

I'm not entirely sure which convention to employ here: whether to import mock at this level or within test_normalize_object itself (as was done with import warnings, which is imported within the test only because its general utility isn't as wide in this particular context).

Any feedback would be greatly appreciated.

(three resolved review threads on dask/base.py)
@hristog (Contributor, author) commented on Mar 17, 2021:

I need to go through the rest of the codebase a bit more to get an idea of how best to document the newly proposed config.

@hristog (Contributor, author) commented on Mar 17, 2021:

I'm considering documenting the config as part of this section, because I haven't been able to find a more suitable one.
Any suggestions would be greatly appreciated.

@jrbourbeau (Member) left a review:

Thanks for the PR @hristog!

We can probably scale the changes back here a bit to just support a boolean flag for whether or not we should raise an error. I suspect a relatively small group of users will utilize this setting, and when things are non-deterministic they'll want an error to be raised. We can always expand things in the future if needed.

Something like (I've not tested the snippet below)

if callable(o):
    return normalize_function(o)
elif dask.config.get("tokenize.allow-random", True):
    return uuid.uuid4().hex
else:
    raise RuntimeError("...")

should work for that case

@@ -224,6 +226,88 @@ def test_normalize_base():
assert normalize_token(i) is i


def test_normalize_object():
@jrbourbeau (Member) commented:

What do you think about something like (again I've not tested the snippet below):

with dask.config.set({"tokenize.allow-random": False}):
    with pytest.raises(RuntimeError, match="..."):
        tokenize(object())

@hristog (Contributor, author) commented on Mar 18, 2021:

Thanks for looking into this PR, @jrbourbeau!

I've scaled down the functionality associated with the newly proposed config, as suggested, and changed its value type to bool. The accompanying unit tests have been updated accordingly, too.

One tiny concern I've got about the naming (allow-random) is that, as mentioned in this comment of mine, there are other sources of non-determinism that aren't going to be impacted by the proposed logic.

I suppose support for those could always be added later, given actual demand via GitHub issues.

When you get a chance, please let me know your thoughts on the best way to document the newly introduced config.

@hristog (Contributor, author) commented on Apr 8, 2021:

Hi @jrbourbeau, checking in to see whether you've had a chance to look at the further changes I pushed in 41b135b, and whether you'd like any additional refinements on top of that.
Thanks!

@jrbourbeau (Member) left a review:

Thanks @hristog -- apologies for the delayed response.

> One tiny concern I've got about the naming (allow-random) is that there are other sources of non-determinism that aren't going to be impacted by the proposed logic.

This is a great point -- if you want to add a check for tokenize.allow-random in those locations too, that would be welcome

(resolved review thread on dask/base.py)
@@ -224,6 +226,49 @@ def test_normalize_base():
assert normalize_token(i) is i


def test_normalize_object():
@jrbourbeau (Member) commented:

This test can be simplified a bit. In particular, we can probably just test that an informative error is raised when tokenize.allow-random is False and we attempt to tokenize an object which can't be deterministically hashed. See this comment https://github.com/dask/dask/pull/7413/files#r596499598 for an outline of a simplified test.

@hristog (Contributor, author) commented on Apr 9, 2021:

Hmm, my concern here is that, without tests proving that true negatives are handled as such, we may not have sufficient confidence (viewing unit tests purely as documentation of expected behavior) that the newly introduced functionality doesn't break behavior it shouldn't affect. And I don't feel comfortable relying on other tests, external to this one, to cover that for us.

Of course, if you strongly believe we can make do without the extra tests and keep only one asserting a single aspect, then I'll remove them (no objections). I just wanted to share my concerns.

@hristog (Contributor, author) commented:

> This is a great point -- if you want to add a check for tokenize.allow-random in those locations too, that would be welcome

Hi @jrbourbeau - yes, I've opted to implement configurable behavior for those as well. The commits have already been added to this PR.

@hristog (Contributor, author) commented:

> This test can be simplified a bit.

I've made an attempt at simplifying the tests. Hopefully they look better from your point of view now. Please let me know if you'd like anything else changed.

@hristog (Contributor, author) commented on Apr 21, 2021:

Hi @jrbourbeau, checking in to see whether you've had a chance to look at the most recent updates. As discussed, I've scaled the tests down a bit further; a summary of my updates can be found in the two most recent comments in this thread.

Please let me know if the tests should be stripped down further, despite the concerns discussed.

@ncclementi (Member) commented:

@jrbourbeau checking in here, it looks like @hristog updated the code according to your requests and CI is green. Do you think this is ready to go, or are there any other issues that should be addressed?

@jsignell (Member) left a review:

This is looking good, you just need a change to the config file to support this change.

- Rename config field to `tokenize.ensure-deterministic`
- Add config field to `dask.yaml` and `dask-schema.yaml`
@jcrist (Member) commented on Oct 4, 2021:

I've renamed the config field to tokenize.ensure-deterministic to better reflect the meaning of the parameter. I've also added it to the dask.yaml config file and the dask-schema.yaml schema description. I believe this should be good to go once tests pass.
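The end state described above can be sketched as runnable plain Python. The config dict and normalize_object below are hypothetical stand-ins (in real dask the flag lives in dask.yaml and is read via dask.config.get), but the dispatch order is the one discussed in this thread: objects implementing __dask_tokenize__ normalize deterministically; everything else either gets a random uuid token or, with tokenize.ensure-deterministic set, raises RuntimeError.

```python
import uuid

# Hypothetical stand-in for dask.config.
config = {"tokenize.ensure-deterministic": False}

def normalize_object(o):
    # Objects can opt in to deterministic hashing via __dask_tokenize__.
    method = getattr(o, "__dask_tokenize__", None)
    if method is not None:
        return method()
    if not config["tokenize.ensure-deterministic"]:
        # Correct but non-deterministic fallback token.
        return uuid.uuid4().hex
    raise RuntimeError(
        f"Object {o!r} cannot be deterministically hashed"
    )

class Point:
    """Example class opting in to deterministic tokenization."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __dask_tokenize__(self):
        # A token built only from the object's data is reproducible.
        return ("Point", self.x, self.y)
```

Under this sketch, two equal Points always produce the same token, while a bare object() only tokenizes when the fallback is permitted.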

@hristog (Contributor, author) commented on Oct 4, 2021:

> I've renamed the config field to tokenize.ensure-deterministic to better reflect the meaning of the parameter. I've also added it to the dask.yaml config file and the dask-schema.yaml schema description. I believe this should be good to go once tests pass.

Thanks, @jcrist!

@jrbourbeau (Member) commented:

Thanks for the updates @jcrist! I pushed a small commit to avoid mocking in the tests added here. Will merge after CI finishes

@jcrist (Member) commented on Oct 4, 2021:

Test failure is unrelated - merging. Thanks @hristog!

@jcrist jcrist merged commit 049d803 into dask:main Oct 4, 2021
@hristog (Contributor, author) commented on Oct 5, 2021:

> Thanks for the updates @jcrist! I pushed a small commit to avoid mocking in the tests added here. Will merge after CI finishes

Thanks, @jrbourbeau! Thanks to your commit, I can much better understand what you meant in your previous comments on this PR.

@hristog hristog deleted the flexible-dask-tokenize branch October 5, 2021 10:07

Successfully merging this pull request may close these issues.

Flag for raising an error when normalize_object doesn't find __dask_tokenize__
5 participants