Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Fix crash when pickling dynamic class cycles. #102

Merged
merged 1 commit into from
Jun 28, 2017

Conversation

ssanderson
Copy link

@ssanderson ssanderson commented Jun 17, 2017

Fixes a bug where we would fail to pickle a class created inside a
function if that class participated in a cycle with its own __dict__.

Such cycles occur, for example, when a class defines a method that makes
a Python 2-style super call, because we have a cycle from class -> __dict__ -> function -> __closure__ -> class.

The fix for this is to use the same technique we use to
dynamically-created functions: we first pickle an empty "skeleton
class", which we memoize before pickling the rest of the class'
__dict__. We then invoke a reduce function that re-attaches the class'
attributes from the __dict__.

Fixes #99.

@codecov-io
Copy link

codecov-io commented Jun 17, 2017

Codecov Report

Merging #102 into master will increase coverage by 0.58%.
The diff coverage is 88%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #102      +/-   ##
==========================================
+ Coverage   80.62%   81.21%   +0.58%     
==========================================
  Files           2        2              
  Lines         542      559      +17     
  Branches      111      112       +1     
==========================================
+ Hits          437      454      +17     
  Misses         75       75              
  Partials       30       30
Impacted Files Coverage Δ
cloudpickle/cloudpickle.py 81.11% <88%> (+0.59%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c89dc9d...f8fffa5. Read the comment docs.

# Pickle and unpickle the class.
UnpickledDerived = pickle_depickle(Derived)
self.assertEqual(UnpickledDerived().method(), 2)
self.assertEqual(UnpickledDerived.__doc__, "Derived Docstring")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you comment that __doc__ is special on pypy?

# instructions to "rehydrate" the skeleton class by restoring the
# attributes from the __dict__.
#
# A type can appear in a cycle with it's __dict__ if an instance of the
Copy link
Contributor

@llllllllll llllllllll Jun 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

your using the wrong its

@llllllllll
Copy link
Contributor

+1 from me

Fixes a bug where we would fail to pickle a class created inside a
function if that class participated in a cycle with its own __dict__.

Such cycles occur, for example, when a class defines a method that makes
a Python 2-style super call, because we have a cycle from class ->
__dict__ -> function -> __closure__ -> class.

The fix for this is to use the same technique we use to
dynamically-created functions: we first pickle an empty "skeleton
class", which we memoize before pickling the rest of the class'
__dict__. We then invoke a reduce function that re-attaches the class'
attributes from the __dict__.
@rgbkrk rgbkrk merged commit aec80d2 into cloudpipe:master Jun 28, 2017
@rgbkrk
Copy link
Member

rgbkrk commented Jun 28, 2017

Thank you!

@ssanderson ssanderson mentioned this pull request Jul 2, 2017
ghost pushed a commit to dbtsai/spark that referenced this pull request Aug 22, 2017
## What changes were proposed in this pull request?

Based on apache#18282 by rgbkrk this PR attempts to update to the current released cloudpickle and minimize the difference between Spark cloudpickle and "stock" cloud pickle with the goal of eventually using the stock cloud pickle.

Some notable changes:
* Import submodules accessed by pickled functions (cloudpipe/cloudpickle#80)
* Support recursive functions inside closures (cloudpipe/cloudpickle#89, cloudpipe/cloudpickle#90)
* Fix ResourceWarnings and DeprecationWarnings (cloudpipe/cloudpickle#88)
* Assume modules with __file__ attribute are not dynamic (cloudpipe/cloudpickle#85)
* Make cloudpickle Python 3.6 compatible (cloudpipe/cloudpickle#72)
* Allow pickling of builtin methods (cloudpipe/cloudpickle#57)
* Add ability to pickle dynamically created modules (cloudpipe/cloudpickle#52)
* Support method descriptor (cloudpipe/cloudpickle#46)
* No more pickling of closed files, was broken on Python 3 (cloudpipe/cloudpickle#32)
* ** Remove non-standard __transient__check (cloudpipe/cloudpickle#110)** -- while we don't use this internally, and have no tests or documentation for its use, downstream code may use __transient__, although it has never been part of the API, if we merge this we should include a note about this in the release notes.
* Support for pickling loggers (yay!) (cloudpipe/cloudpickle#96)
* BUG: Fix crash when pickling dynamic class cycles. (cloudpipe/cloudpickle#102)

## How was this patch tested?

Existing PySpark unit tests + the unit tests from the cloudpickle project on their own.

Author: Holden Karau <holden@us.ibm.com>
Author: Kyle Kelley <rgbkrk@gmail.com>

Closes apache#18734 from holdenk/holden-rgbkrk-cloudpickle-upgrades.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to serialize function that instantiates a subclass
4 participants