Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deduplicate cloudpickle reducers. #368

Merged
merged 51 commits into from
Jun 30, 2020

Conversation

pierreglaser
Copy link
Member

@pierreglaser pierreglaser commented May 14, 2020

closes #284
related to #364

About backward compatiblity:

  • this PR removes make_skel_func, fill_function, e.g the previous function cloudpickle used to reconstruct functions, as they were equivalent to the new function_setstate/function_new (modulo some Python 3.8 compatiblity. These functions are important to reconstruct pickles created by previous cloudpickle versions. A simple fix is to keep them inside cloudpickle.py for a few releases and add a FutureWarning inside them saying that an attempt is made into reading old pickle, and that reading them will break in 2 releases.
  • By removing the previous Python < 3.8 CloudPickler class this PR also removes semi-public functions (all the CloudPickler.save_*). These functions are not necessary to read old pickles, but they could be used inside third-party code. To address this, we could keep exposing the previous CloudPickler for the next few releases, but make cloudpickle.dump(s) use the new CloudPickler.
    This way, we can add a FutureWarning into the previous CloudPickler.__init__, while cloudpickle.dump(s) remains silent.

Also, the module names don't make much sense now. In the future we should rename cloudpickle_fast.py to cloudpickle.py, and merge it with the previous cloudpickle.py.

@jakirkham if you want to give #364 another shot, but rebasing on this PR first, I suspect its implementation should be much easier :)

@pierreglaser pierreglaser marked this pull request as draft May 14, 2020 08:59
@codecov
Copy link

codecov bot commented May 14, 2020

Codecov Report

Merging #368 into master will decrease coverage by 1.66%.
The diff coverage is 97.43%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #368      +/-   ##
==========================================
- Coverage   92.91%   91.25%   -1.67%     
==========================================
  Files           2        2              
  Lines         805      629     -176     
  Branches      164      132      -32     
==========================================
- Hits          748      574     -174     
- Misses         29       34       +5     
+ Partials       28       21       -7     
Impacted Files Coverage Δ
cloudpickle/cloudpickle.py 86.44% <95.12%> (-5.33%) ⬇️
cloudpickle/cloudpickle_fast.py 96.63% <98.26%> (+0.90%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e2fcfeb...8eba950. Read the comment docs.

@jakirkham
Copy link
Member

Thank you @pierreglaser! This is very helpful 😄

Have opened PR ( #370 ) with a redux of the pickle5 changes. There are still some test failures, which I'm working through (though advice would certainly be appreciated ;).

@jakirkham
Copy link
Member

Should this still be marked as a draft or is it ready to review?

@pierreglaser
Copy link
Member Author

What's missing right now is ensuring that cloudpickle in this PR is able to unserialize pickle files created by (reasonably) older cloudpickle versions. We need to:

  • restore make_skel_func/fill_function/make_skeleton_class, but when called, those functions should now raise a deprecation warning saying one should not use different versions of cloudpickle to pickle/depickle the same object. I have not done this before to show the world how useful this PR is through its diff stats, but now that @ogrisel is aware of it I can proceed :)
  • We also need a few tests to make sure the best-effort backward compatibility of cloudpickle works as expected. Hardcoding a few bytes strings containing dynamic classes and function in a compatibility test should be enough, but if anyone thinks of a more automatized way to generate old pickle files for each test of the existing test suite, I'm opened to suggestions.

@jakirkham
Copy link
Member

jakirkham commented Jun 9, 2020

Thanks for the update and additional context Pierre! 😄

Ensuring consistency of results makes sense. Using old pickle files makes sense. Maybe there is some tooling that helps with this? Searching a bit pytest-datadir and pytest-datafiles come up as options for organizing this a bit. Are there particular tests that you think we should use?

@ogrisel
Copy link
Contributor

ogrisel commented Jun 10, 2020

pytest-datadir looks simple and nice to use.

Co-authored-by: jakirkham <jakirkham@gmail.com>
return NotImplemented
# Take into account potential custom reducers registered by external
# modules
dispatch_table = copyreg.dispatch_table.copy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if copyreg has a new override registered with it? Will this get updated?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thought instead of copying might be to use ChainMap to combine an internal dispatch table with copyreg's. That way we still pick up details from copyreg, but can prioritize our own mapping.

Copy link
Contributor

@ogrisel ogrisel Jun 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the previous code did not have this issue. I like the suggestion to use ChainMap.

@pierreglaser pierreglaser marked this pull request as ready for review June 14, 2020 16:45
@ogrisel
Copy link
Contributor

ogrisel commented Jun 30, 2020

The CI failure on macos + pypy3 seems unrelated to this PR (and to cloudpickle generally) as I can reproduce on master (#383)

@pierreglaser I decided to skip this entry in the build matrix for the time being. Too lazy to debug this for now. We can always decide to reactivate later. pypy3 is still tested under linux.

@ogrisel ogrisel added the ci downstream Signal the CI to run the test suite of all registered cloudpickle downstream projects. label Jun 30, 2020
@ogrisel
Copy link
Contributor

ogrisel commented Jun 30, 2020

  • The joblib failure test_thread_bomb_mitigation[loky] is known and unrelated (although I had not observed it on Py 3.7 before if I remember correctly).

  • The joblib failure test_nested_exception_dispatch has already been fixed in joblib but not yet released.

  • on Ray we get:

 >           pickled_function = pickle.dumps(function)
E           AttributeError: module 'ray.cloudpickle' has no attribute 'dumps'

this is weird. I will check if I can reproduce on master.

@ogrisel
Copy link
Contributor

ogrisel commented Jun 30, 2020

The downstream failure in Ray seems to be cause by a change in this PR, it cannot be reproduced on master: #388

@ogrisel
Copy link
Contributor

ogrisel commented Jun 30, 2020

The Ray failure is indeed related to #368 (review).

@jakirkham
Copy link
Member

jakirkham commented Jun 30, 2020

FWIW Ray vendored an earlier version of these changes and the pickle5 changes in PR ( ray-project/ray#8577 ). Not sure if it’s related, but wanted to mention it just in case it is.

@ogrisel
Copy link
Contributor

ogrisel commented Jun 30, 2020

@pierreglaser I have tried to fix the above but my brain is exploding. I don't know how to do it without introducing mutually dependent modules or breaking backward compat with old pickles which we do not want.

@pierreglaser
Copy link
Member Author

OK - I'm investigating.

@pierreglaser
Copy link
Member Author

The ray downstream build needed a serious lifting. Apparently the ray developpers moved faster and already tweaked cloudpickle to support pickle5 when available. Their codebase seems to make use of the new pickle 5 API, which the current cloudpickle in this PR does not support. We should test the ray downstream builds inside #370.

@jakirkham
Copy link
Member

Was wondering if that was the case. Thanks for figuring that out Pierre! 😄

@pierreglaser
Copy link
Member Author

@ogrisel ray tests are now passing in #389. Debugging this issue was insightful as it casted the light at real compatibility issues between ray and this version of cloudpickle, especially the deletion of the dispatch attribute when CloudPickler extends a C-implemented Pickler.

@ogrisel
Copy link
Contributor

ogrisel commented Jun 30, 2020

Ok let's merge this then.

@ogrisel ogrisel merged commit 938553f into cloudpipe:master Jun 30, 2020
@jakirkham
Copy link
Member

Thanks all! 😄

@pierreglaser
Copy link
Member Author

Woot!

pierreglaser added a commit that referenced this pull request Jul 1, 2020
wip-sync pushed a commit to NetBSD/pkgsrc-wip that referenced this pull request Apr 4, 2022
2.0.0
=====

- Python 3.5 is no longer supported.

- Support for registering modules to be serialised by value. This allows code
  defined in local modules to be serialised and executed remotely without those
  local modules installed on the remote machine.
  ([PR #417](cloudpipe/cloudpickle#417))

- Fix a side effect altering dynamic modules at pickling time.
  ([PR #426](cloudpipe/cloudpickle#426))

- Support for pickling type annotations on Python 3.10 as per [PEP 563](
  https://www.python.org/dev/peps/pep-0563/)
  ([PR #400](cloudpipe/cloudpickle#400))

- Stricter parametrized type detection heuristics in
  _is_parametrized_type_hint to limit false positives.
  ([PR #409](cloudpipe/cloudpickle#409))

- Support pickling / depickling of OrderedDict KeysView, ValuesView, and
  ItemsView, following similar strategy for vanilla Python dictionaries.
  ([PR #423](cloudpipe/cloudpickle#423))

- Suppressed a source of non-determinism when pickling dynamically defined
  functions and handles the deprecation of co_lnotab in Python 3.10+.
  ([PR #428](cloudpipe/cloudpickle#428))

1.6.0
=====

- `cloudpickle`'s pickle.Pickler subclass (currently defined as
  `cloudpickle.cloudpickle_fast.CloudPickler`) can and should now be accessed
  as `cloudpickle.Pickler`. This is the only officially supported way of
  accessing it.
  ([issue #366](cloudpipe/cloudpickle#366))

- `cloudpickle` now supports pickling `dict_keys`, `dict_items` and
  `dict_values`.
  ([PR #384](cloudpipe/cloudpickle#384))

1.5.0
=====

- Fix a bug causing cloudpickle to crash when pickling dynamically created,
  importable modules.
  ([issue #360](cloudpipe/cloudpickle#354))

- Add optional dependency on `pickle5` to get improved performance on
  Python 3.6 and 3.7.
  ([PR #370](cloudpipe/cloudpickle#370))

- Internal refactoring to ease the use of `pickle5` in cloudpickle
  for Python 3.6 and 3.7.
  ([PR #368](cloudpipe/cloudpickle#368))

1.4.1
=====

- Fix incompatibilities between cloudpickle 1.4.0 and Python 3.5.0/1/2
  introduced by the new support of cloudpickle for pickling typing constructs.
  ([issue #360](cloudpipe/cloudpickle#360))

- Restore compat with loading dynamic classes pickled with cloudpickle
  version 1.2.1 that would reference the `types.ClassType` attribute.
  ([PR #359](cloudpipe/cloudpickle#359))

1.4.0
=====

**This version requires Python 3.5 or later**

- cloudpickle can now all pickle all constructs from the ``typing`` module
  and the ``typing_extensions`` library in Python 3.5+
  ([PR #318](cloudpipe/cloudpickle#318))

- Stop pickling the annotations of a dynamic class for Python < 3.6
  (follow up on #276)
  ([issue #347](cloudpipe/cloudpickle#347))

- Fix a bug affecting the pickling of dynamic `TypeVar` instances on Python 3.7+,
  and expand the support for pickling `TypeVar` instances (dynamic or non-dynamic)
  to Python 3.5-3.6 ([PR #350](cloudpipe/cloudpickle#350))

- Add support for pickling dynamic classes subclassing `typing.Generic`
  instances on Python 3.7+
  ([PR #351](cloudpipe/cloudpickle#351))

1.3.0
=====

- Fix a bug affecting dynamic modules occuring with modified builtins
  ([issue #316](cloudpipe/cloudpickle#316))

- Fix a bug affecting cloudpickle when non-modules objects are added into
  sys.modules
  ([PR #326](cloudpipe/cloudpickle#326)).

- Fix a regression in cloudpickle and python3.8 causing an error when trying to
  pickle property objects.
  ([PR #329](cloudpipe/cloudpickle#329)).

- Fix a bug when a thread imports a module while cloudpickle iterates
  over the module list
  ([PR #322](cloudpipe/cloudpickle#322)).

- Add support for out-of-band pickling (Python 3.8 and later).
  https://docs.python.org/3/library/pickle.html#example
  ([issue #308](cloudpipe/cloudpickle#308))

- Fix a side effect that would redefine `types.ClassTypes` as `type`
  when importing cloudpickle.
  ([issue #337](cloudpipe/cloudpickle#337))

- Fix a bug affecting subclasses of slotted classes.
  ([issue #311](cloudpipe/cloudpickle#311))

- Dont pickle the abc cache of dynamically defined classes for Python 3.6-
  (This was already the case for python3.7+)
  ([issue #302](cloudpipe/cloudpickle#302))

1.2.2
=====

- Revert the change introduced in
  ([issue #276](cloudpipe/cloudpickle#276))
  attempting to pickle functions annotations for Python 3.4 to 3.6. It is not
  possible to pickle complex typing constructs for those versions (see
  [issue #193]( cloudpipe/cloudpickle#193))

- Fix a bug affecting bound classmethod saving on Python 2.
  ([issue #288](cloudpipe/cloudpickle#288))

- Add support for pickling "getset" descriptors
  ([issue #290](cloudpipe/cloudpickle#290))

1.2.1
=====

- Restore (partial) support for Python 3.4 for downstream projects that have
  LTS versions that would benefit from cloudpickle bug fixes.

1.2.0
=====

- Leverage the C-accelerated Pickler new subclassing API (available in Python
  3.8) in cloudpickle. This allows cloudpickle to pickle Python objects up to
  30 times faster.
  ([issue #253](cloudpipe/cloudpickle#253))

- Support pickling of classmethod and staticmethod objects in python2.
  arguments. ([issue #262](cloudpipe/cloudpickle#262))

- Add support to pickle type annotations for Python 3.5 and 3.6 (pickling type
  annotations was already supported for Python 3.7, Python 3.4 might also work
  but is no longer officially supported by cloudpickle)
  ([issue #276](cloudpipe/cloudpickle#276))

- Internal refactoring to proactively detect dynamic functions and classes when
  pickling them.  This refactoring also yields small performance improvements
  when pickling dynamic classes (~10%)
  ([issue #273](cloudpipe/cloudpickle#273))

1.1.1
=====

- Minor release to fix a packaging issue (Markdown formatting of the long
  description rendered on pypi.org). The code itself is the same as 1.1.0.

1.1.0
=====

- Support the pickling of interactively-defined functions with positional-only
  arguments. ([issue #266](cloudpipe/cloudpickle#266))

- Track the provenance of dynamic classes and enums so as to preseve the
  usual `isinstance` relationship between pickled objects and their
  original class defintions.
  ([issue #246](cloudpipe/cloudpickle#246))

1.0.0
=====

- Fix a bug making functions with keyword-only arguments forget the default
  values of these arguments after being pickled.
  ([issue #264](cloudpipe/cloudpickle#264))

0.8.1
=====

- Fix a bug (already present before 0.5.3 and re-introduced in 0.8.0)
  affecting relative import instructions inside depickled functions
  ([issue #254](cloudpipe/cloudpickle#254))

0.8.0
=====

- Add support for pickling interactively defined dataclasses.
  ([issue #245](cloudpipe/cloudpickle#245))

- Global variables referenced by functions pickled by cloudpickle are now
  unpickled in a new and isolated namespace scoped by the CloudPickler
  instance. This restores the (previously untested) behavior of cloudpickle
  prior to changes done in 0.5.4 for functions defined in the `__main__`
  module, and 0.6.0/1 for other dynamic functions.

0.7.0
=====

- Correctly serialize dynamically defined classes that have a `__slots__`
  attribute.
  ([issue #225](cloudpipe/cloudpickle#225))

0.6.1
=====

- Fix regression in 0.6.0 which breaks the pickling of local function defined
  in a module, making it impossible to access builtins.
  ([issue #211](cloudpipe/cloudpickle#211))

0.6.0
=====

- Ensure that unpickling a function defined in a dynamic module several times
  sequentially does not reset the values of global variables.
  ([issue #187](cloudpipe/cloudpickle#205))

- Restrict the ability to pickle annotations to python3.7+ ([issue #193](
  cloudpipe/cloudpickle#193) and [issue #196](
  cloudpipe/cloudpickle#196))

- Stop using the deprecated `imp` module under Python 3.
  ([issue #207](cloudpipe/cloudpickle#207))

- Fixed pickling issue with singleton types `NoneType`, `type(...)` and
  `type(NotImplemented)` ([issue #209](cloudpipe/cloudpickle#209))

0.5.6
=====

- Ensure that unpickling a locally defined function that accesses the global
  variables of a module does not reset the values of the global variables if
  they are already initialized.
  ([issue #187](cloudpipe/cloudpickle#187))

0.5.5
=====

- Fixed inconsistent version in `cloudpickle.__version__`.

0.5.4
=====

- Fixed a pickling issue for ABC in python3.7+ ([issue #180](
  cloudpipe/cloudpickle#180)).

- Fixed a bug when pickling functions in `__main__` that access global
  variables ([issue #187](
  cloudpipe/cloudpickle#187)).

0.5.3
=====
- Fixed a crash in Python 2 when serializing non-hashable instancemethods of built-in
  types ([issue #144](cloudpipe/cloudpickle#144)).

- itertools objects can also pickled
  ([PR #156](cloudpipe/cloudpickle#156)).

- `logging.RootLogger` can be also pickled
  ([PR #160](cloudpipe/cloudpickle#160)).

0.5.2
=====

- Fixed a regression: `AttributeError` when loading pickles that hold a
  reference to a dynamically defined class from the `__main__` module.
  ([issue #131]( cloudpipe/cloudpickle#131)).

- Make it possible to pickle classes and functions defined in faulty
  modules that raise an exception when trying to look-up their attributes
  by name.

0.5.1
=====

- Fixed `cloudpickle.__version__`.

0.5.0
=====

- Use `pickle.HIGHEST_PROTOCOL` by default.

0.4.4
=====

- `logging.RootLogger` can be also pickled
  ([PR #160](cloudpipe/cloudpickle#160)).

0.4.3
=====

- Fixed a regression: `AttributeError` when loading pickles that hold a
  reference to a dynamically defined class from the `__main__` module.
  ([issue #131]( cloudpipe/cloudpickle#131)).

- Fixed a crash in Python 2 when serializing non-hashable instancemethods of built-in
  types. ([issue #144](cloudpipe/cloudpickle#144))

0.4.2
=====

- Restored compatibility with pickles from 0.4.0.
- Handle the `func.__qualname__` attribute.

0.4.1
=====

- Fixed a crash when pickling dynamic classes whose `__dict__` attribute was
  defined as a [`property`](https://docs.python.org/3/library/functions.html#property).
  Most notably, this affected dynamic [namedtuples](https://docs.python.org/2/library/collections.html#namedtuple-factory-function-for-tuples-with-named-fields)
  in Python 2. (cloudpipe/cloudpickle#113)
- Cloudpickle now preserves the `__module__` attribute of functions (cloudpipe/cloudpickle#118).
- Fixed a crash when pickling modules that don't have a `__package__` attribute (cloudpipe/cloudpickle#116).

0.4.0
=====

* Fix functions with empty cells
* Allow pickling Logger objects
* Fix crash when pickling dynamic class cycles
* Ignore "None" mdoules added to sys.modules
* Support WeakSets and ABCMeta instances
* Remove non-standard `__transient__` support
* Catch exception from `pickle.whichmodule()`

0.3.1
=====

* Fix version information and ship a changelog

 0.3.0
=====

* Import submodules accessed by pickled functions
* Support recursive functions inside closures
* Fix `ResourceWarnings` and `DeprecationWarnings`
* Assume modules with `__file__` attribute are not dynamic

0.2.2
=====

* Support Python 3.6
* Support Tornado Coroutines
* Support builtin methods
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci downstream Signal the CI to run the test suite of all registered cloudpickle downstream projects.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

make cloudpickle and cloudpickle_fast use the same reducers.
3 participants