Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not always check if __main__ in result when pickling #8443

Merged
merged 3 commits into from Jan 10, 2024

Conversation

fjetter
Copy link
Member

@fjetter fjetter commented Jan 9, 2024

This logic evolved over time but the doc string already suggests that we're performing type checks first before we do the "is main in result" check. Some refactoring along the way changed this. Particularly for large results this can be a big difference such that the thing in result check can be more expensive than the actual serialization. This can be most strongly observed when pickling bytes directly (not sure if we're actually doing that) or more generally for everything that we're blocklisting in _always_use_pickle_for (I think we should expand this, e.g. to include arrow tables for p2p)

I ended up rewriting the logic to something that is easier to understand imo. This includes a minor functional change. Previously, it would have been possible for an object to be classified as eligible for cloudpickle by the main in result or pickle_by_value guard even though it is blocklisted by the always_use_pickle_for but only if the object was very small. This is a bit off an odd logic. In fact, it cannot even occur since always_use_pickle_for concerns instances while the pickle_by_value and main in result check concerns functions and classes. Still, imo this made the logic less readable. The new logic is subjectively easier to read and short circuits much more quickly in the happy path or always_use_pickle_for==True

Copy link
Contributor

github-actions bot commented Jan 9, 2024

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    27 files  ± 0      27 suites  ±0   9h 46m 28s ⏱️ + 2m 13s
 3 951 tests ± 0   3 839 ✅ + 1    109 💤 ±0  3 ❌  - 1 
49 696 runs  +18  47 404 ✅ +25  2 289 💤  - 6  3 ❌  - 1 

For more details on these failures, see this check.

Results for commit c5f1379. ± Comparison against base commit 7562f9c.

Copy link
Member

@hendrikmakait hendrikmakait left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @fjetter, changes look good to me.

@hendrikmakait hendrikmakait merged commit 5c481dd into dask:main Jan 10, 2024
30 of 35 checks passed
vyasr pushed a commit to rapidsai/rapids-dask-dependency that referenced this pull request Jan 11, 2024
To workaround issue ( dask/distributed#8454 ),
pin Distributed to a version before PR (
dask/distributed#8443 ) was included
fjetter added a commit to fjetter/distributed that referenced this pull request Jan 12, 2024
@fjetter fjetter mentioned this pull request Jan 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants