Do not always check if `__main__ in result` when pickling #8443

fjetter · 2024-01-09T10:45:51Z

This logic evolved over time, but the docstring already suggests that we perform type checks first, before the `__main__ in result` check. Some refactoring along the way changed this. For large results this can make a big difference: the `__main__ in result` check can be more expensive than the actual serialization. This is most strongly observed when pickling bytes directly (not sure if we're actually doing that) or, more generally, for everything that we're blocklisting in `_always_use_pickle_for` (I think we should expand this, e.g. to include arrow tables for p2p).

I ended up rewriting the logic into something that is easier to understand, imo. This includes a minor functional change. Previously, an object could be classified as eligible for cloudpickle by the `__main__ in result` or `pickle_by_value` guard even though it is blocklisted by `_always_use_pickle_for`, but only if the object was very small. This is somewhat odd logic. In fact, it cannot even occur, since `_always_use_pickle_for` concerns instances while the `pickle_by_value` and `__main__ in result` checks concern functions and classes. Still, imo this made the logic less readable. The new logic is subjectively easier to read and short-circuits much more quickly in the happy path or when `_always_use_pickle_for` is true.
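To illustrate the ordering described above, here is a minimal sketch. It is not the code in `distributed`: `dumps_sketch`, `always_use_plain_pickle`, the `_PLAIN_PICKLE_TYPES` tuple, and the `fallback` parameter are all hypothetical names, and `fallback` merely stands in for a by-value serializer such as cloudpickle. The second return value exists only to make visible whether the byte scan ran.

```python
import pickle

# Hypothetical blocklist of types that plain pickle always handles well.
# The real project's list may differ.
_PLAIN_PICKLE_TYPES = (bytes, bytearray, str, int, float)


def always_use_plain_pickle(obj):
    """Cheap type check, done before anything touches the payload."""
    return isinstance(obj, _PLAIN_PICKLE_TYPES)


def dumps_sketch(obj, fallback):
    """Serialize obj and report whether the payload was scanned.

    Returns a (payload, scanned) tuple. `fallback` is a hypothetical
    callable standing in for a by-value serializer.
    """
    # 1. Short circuit: blocklisted types never pay for the scan.
    if always_use_plain_pickle(obj):
        return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL), False

    # 2. Try plain pickle; hand over to the fallback if it fails.
    try:
        payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    except Exception:
        return fallback(obj), False

    # 3. Only now scan the payload, which costs time linear in its size.
    if b"__main__" in payload:
        return fallback(obj), True
    return payload, True
```

With this ordering, a large `bytes` object returns at step 1 and is never scanned, while objects outside the blocklist still go through the `__main__` check as the last step.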

github-actions · 2024-01-09T11:30:31Z

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

27 files ±0, 27 suites ±0, 9h 46m 28s ⏱️ (+2m 13s)
3 951 tests ±0: 3 839 ✅ (+1), 109 💤 (±0), 3 ❌ (-1)
49 696 runs +18: 47 404 ✅ (+25), 2 289 💤 (-6), 3 ❌ (-1)

For more details on these failures, see this check.

Results for commit c5f1379. ± Comparison against base commit 7562f9c.

hendrikmakait

Thanks, @fjetter, changes look good to me.

fjetter added 3 commits January 9, 2024 10:15

Check if main is in pickled stream very last

f4d3c11

never check for main if _always_use_pickle_for is true

ee7c1c5

rewrite the thing

c5f1379

hendrikmakait approved these changes Jan 10, 2024

hendrikmakait merged commit 5c481dd into dask:main Jan 10, 2024
30 of 35 checks passed

hendrikmakait mentioned this pull request Jan 11, 2024

Failing to deserialize user function #8454

Closed

rjzamora mentioned this pull request Jan 11, 2024

refactor CUDA versions in dependencies.yaml rapidsai/cudf#14733

Merged

3 tasks

jakirkham mentioned this pull request Jan 11, 2024

Pin Distributed to last working version rapidsai/rapids-dask-dependency#17

Merged

vyasr pushed a commit to rapidsai/rapids-dask-dependency that referenced this pull request Jan 11, 2024

Pin Distributed to last working version (#17)

4a2dd6f

To work around issue ( dask/distributed#8454 ), pin Distributed to a version before PR ( dask/distributed#8443 ) was included

fjetter mentioned this pull request Jan 12, 2024

check for main module in reducer override #8455

Draft

fjetter added a commit to fjetter/distributed that referenced this pull request Jan 12, 2024

Revert "Do not always check if __main__ in result when pickling (da…

701dbbd

…sk#8443)" This reverts commit 5c481dd.

fjetter mentioned this pull request Jan 12, 2024

Revert pickle change #8456

Merged
