Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Override tokenize.ensure-deterministic config flag #10913

Merged
merged 3 commits into from Feb 15, 2024

Conversation

crusaderky
Copy link
Collaborator

@crusaderky crusaderky commented Feb 9, 2024

  • Add optional parameter to tokenize(ensure_deterministic=True) to force raising exception
  • Can now run test_tokenize.py with either value for the config variable
  • Subclass RuntimeError to restrict exception catching (we're calling pickle internally, which is very complex)
  • Better tests for the _seen variable

crusaderky added a commit to crusaderky/dask that referenced this pull request Feb 9, 2024
Copy link
Contributor

github-actions bot commented Feb 9, 2024

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

     15 files  ± 0       15 suites  ±0   3h 23m 44s ⏱️ + 6m 3s
 13 032 tests + 1   12 101 ✅ + 1     931 💤 ±0  0 ❌ ±0 
161 167 runs  +15  144 634 ✅ +20  16 533 💤  - 5  0 ❌ ±0 

Results for commit 10cdfce. ± Comparison against base commit 328d074.

♻️ This comment has been updated with latest results.

Comment on lines +1043 to +1055
# tokenize.ensure-deterministic flag, potentially overridden by tokenize()
_ensure_deterministic: ContextVar[bool] = ContextVar("_ensure_deterministic")

# Circular reference breaker used by _normalize_seq_func.
# This variable is recreated anew every time you call tokenize(). Note that this means
# that you could call tokenize() from inside tokenize() and they would be fully
# independent.
#
# It is a map of {id(obj): (<first seen incremental int>, obj)} which causes an object
# to be tokenized as ("__seen", <incremental>) the second time it's encountered while
# traversing collections. A strong reference to the object is stored in the context to
# prevent ids from being reused by different objects.
_seen: ContextVar[dict[int, tuple[int, object]]] = ContextVar("_seen")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems overly complex. Why isn't a single ContextVar sufficient?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only noticed now that this was already introduced in #10876

dask/base.py Outdated Show resolved Hide resolved
dask/tests/test_tokenize.py Outdated Show resolved Hide resolved
dask/tests/test_tokenize.py Outdated Show resolved Hide resolved
Comment on lines 291 to 293
# Test ctx variable cleanup
with pytest.raises(LookupError):
_ensure_deterministic.get()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is about ufuncs. Asserting on this ctxvar feels odd.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test may fail if I were to tamper with the ctxvar without using the helper function _maybe_raise_nondeterministic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it would be useful as a comment.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reworked it.

dask/tests/test_tokenize.py Outdated Show resolved Hide resolved
Comment on lines 291 to 293
# Test ctx variable cleanup
with pytest.raises(LookupError):
_ensure_deterministic.get()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it would be useful as a comment.

dask/tests/test_tokenize.py Show resolved Hide resolved
Comment on lines +285 to +287
assert tokenize(inc, ensure_deterministic=False) != tokenize(
inc, ensure_deterministic=False
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be nested within the block above?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. These 3 tests use the same pattern:

with dask.config.set({"tokenize.ensure-deterministic": False}):
    tokenize(...)
with dask.config.set({"tokenize.ensure-deterministic": True}):
    tokenize(...)
tokenize(..., ensure_determinsitic=False)
tokenize(..., ensure_deterministic=True)

This makes them impervious to whatever the config is set to globally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I attributed the inline comment only to this test case, not the one below.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wasn't sure if this should test overriding "whatever" the config is set to globally or overriding any possible config value.

dask/tests/test_tokenize.py Show resolved Hide resolved
@crusaderky
Copy link
Collaborator Author

@hendrikmakait @fjetter all comments have been addressed

Copy link
Member

@hendrikmakait hendrikmakait left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @crusaderky! LGTM assuming CI is green

@crusaderky crusaderky merged commit 76b255e into dask:main Feb 15, 2024
28 checks passed
@crusaderky crusaderky deleted the override_ensure_deterministic branch February 15, 2024 11:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants