New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Override tokenize.ensure-deterministic config flag #10913
Override tokenize.ensure-deterministic config flag #10913
Conversation
7ce25e9
to
aadcd40
Compare
aadcd40
to
a176d83
Compare
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 15 files ± 0 15 suites ±0 3h 23m 44s ⏱️ + 6m 3s Results for commit 10cdfce. ± Comparison against base commit 328d074. ♻️ This comment has been updated with latest results. |
# tokenize.ensure-deterministic flag, potentially overridden by tokenize() | ||
_ensure_deterministic: ContextVar[bool] = ContextVar("_ensure_deterministic") | ||
|
||
# Circular reference breaker used by _normalize_seq_func. | ||
# This variable is recreated anew every time you call tokenize(). Note that this means | ||
# that you could call tokenize() from inside tokenize() and they would be fully | ||
# independent. | ||
# | ||
# It is a map of {id(obj): (<first seen incremental int>, obj)} which causes an object | ||
# to be tokenized as ("__seen", <incremental>) the second time it's encountered while | ||
# traversing collections. A strong reference to the object is stored in the context to | ||
# prevent ids from being reused by different objects. | ||
_seen: ContextVar[dict[int, tuple[int, object]]] = ContextVar("_seen") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems overly complex. Why isn't a single ContextVar
sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only noticed now that this was already introduced in #10876
dask/tests/test_tokenize.py
Outdated
# Test ctx variable cleanup | ||
with pytest.raises(LookupError): | ||
_ensure_deterministic.get() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is about ufuncs. Asserting on this ctxvar feels odd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test may fail if I were to tamper with the ctxvar without using the helper function _maybe_raise_nondeterministic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like it would be useful as a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reworked it.
a176d83
to
2bef3a3
Compare
dask/tests/test_tokenize.py
Outdated
# Test ctx variable cleanup | ||
with pytest.raises(LookupError): | ||
_ensure_deterministic.get() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like it would be useful as a comment.
assert tokenize(inc, ensure_deterministic=False) != tokenize( | ||
inc, ensure_deterministic=False | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be nested within the block above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. These 3 tests use the same pattern:
with dask.config.set({"tokenize.ensure-deterministic": False}):
tokenize(...)
with dask.config.set({"tokenize.ensure-deterministic": True}):
tokenize(...)
tokenize(..., ensure_determinsitic=False)
tokenize(..., ensure_deterministic=True)
This makes them impervious to whatever the config is set to globally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, I attributed the inline comment only to this test case, not the one below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also wasn't sure if this should test overriding "whatever" the config is set to globally or overriding any possible config value.
@hendrikmakait @fjetter all comments have been addressed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @crusaderky! LGTM assuming CI is green
tokenize(ensure_deterministic=True)
to force raising exceptiontest_tokenize.py
with either value for the config variable_seen
variable