New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pickle compound models #3867

Merged
merged 2 commits into from Jul 1, 2015

Conversation

Projects
None yet
3 participants
@embray
Member

embray commented Jun 26, 2015

@embray It can be useful to be able to serialize compound models. In my case, I'm using ipython parallel and would like to send them through the network. I'm looking for a way to get the individual models of a compound_model and pickle them and then concatenate them on the other side.

@embray embray added the modeling label Jun 23, 2015

@embray

This comment has been minimized.

Show comment
Hide comment
@embray

embray Jun 23, 2015

Member

Are they not already pickle-able?

Member

embray commented Jun 23, 2015

Are they not already pickle-able?

@wkerzendorf

This comment has been minimized.

Show comment
Hide comment
@wkerzendorf

wkerzendorf Jun 23, 2015

Member

unfortunately not:

In [1]: %paste
>>> from astropy.modeling import models
>>> g1 = models.Gaussian1D(1, 0, 0.2)
>>> g2 = models.Gaussian1D(2.5, 0.5, 0.1)
>>> g1_plus_2 = g1 + g2

## -- End pasted text --

In [2]: import cPickle as pickle

In [3]: test = pickle.dump
pickle.dump   pickle.dumps  

In [3]: test = pickle.dumps(g1_plus_2)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-15d07ab5c550> in <module>()
----> 1 test = pickle.dumps(g1_plus_2)

/Users/wkerzend/anaconda3/envs/nuclear/lib/python2.7/site-packages/astropy/modeling/core.pyc in __repr__(cls)
   1448             return cls._format_cls_repr()
   1449 
-> 1450         expression = cls._format_expression()
   1451         components = '\n\n'.join('[{0}]: {1!r}'.format(idx, m)
   1452                                  for idx, m in enumerate(cls._get_submodels()))

/Users/wkerzend/anaconda3/envs/nuclear/lib/python2.7/site-packages/astropy/modeling/core.pyc in _format_expression(cls)
   1858         # TODO: At some point might be useful to make a public version of this,
   1859         # albeit with more formatting options
-> 1860         return cls._tree.format_expression(OPERATOR_PRECEDENCE)
   1861 
   1862     def _normalize_index(cls, index):

/Users/wkerzend/anaconda3/envs/nuclear/lib/python2.7/site-packages/astropy/modeling/utils.pyc in format_expression(self, operator_precedence, format_leaf)
    182             format_leaf = lambda i, l: '[{0}]'.format(i)
    183 
--> 184         for node in self.traverse_postorder():
    185             if node.isleaf:
    186                 operands.append(format_leaf(leaf_idx, node))

/Users/wkerzend/anaconda3/envs/nuclear/lib/python2.7/site-packages/astropy/modeling/utils.pyc in traverse_postorder(self)
     62 
     63     def traverse_postorder(self):
---> 64         stack = deque([self])
     65         last = None
     66         while stack:

AttributeError: 'module' object has no attribute 'CompoundModel0'
Member

wkerzendorf commented Jun 23, 2015

unfortunately not:

In [1]: %paste
>>> from astropy.modeling import models
>>> g1 = models.Gaussian1D(1, 0, 0.2)
>>> g2 = models.Gaussian1D(2.5, 0.5, 0.1)
>>> g1_plus_2 = g1 + g2

## -- End pasted text --

In [2]: import cPickle as pickle

In [3]: test = pickle.dump
pickle.dump   pickle.dumps  

In [3]: test = pickle.dumps(g1_plus_2)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-15d07ab5c550> in <module>()
----> 1 test = pickle.dumps(g1_plus_2)

/Users/wkerzend/anaconda3/envs/nuclear/lib/python2.7/site-packages/astropy/modeling/core.pyc in __repr__(cls)
   1448             return cls._format_cls_repr()
   1449 
-> 1450         expression = cls._format_expression()
   1451         components = '\n\n'.join('[{0}]: {1!r}'.format(idx, m)
   1452                                  for idx, m in enumerate(cls._get_submodels()))

/Users/wkerzend/anaconda3/envs/nuclear/lib/python2.7/site-packages/astropy/modeling/core.pyc in _format_expression(cls)
   1858         # TODO: At some point might be useful to make a public version of this,
   1859         # albeit with more formatting options
-> 1860         return cls._tree.format_expression(OPERATOR_PRECEDENCE)
   1861 
   1862     def _normalize_index(cls, index):

/Users/wkerzend/anaconda3/envs/nuclear/lib/python2.7/site-packages/astropy/modeling/utils.pyc in format_expression(self, operator_precedence, format_leaf)
    182             format_leaf = lambda i, l: '[{0}]'.format(i)
    183 
--> 184         for node in self.traverse_postorder():
    185             if node.isleaf:
    186                 operands.append(format_leaf(leaf_idx, node))

/Users/wkerzend/anaconda3/envs/nuclear/lib/python2.7/site-packages/astropy/modeling/utils.pyc in traverse_postorder(self)
     62 
     63     def traverse_postorder(self):
---> 64         stack = deque([self])
     65         last = None
     66         while stack:

AttributeError: 'module' object has no attribute 'CompoundModel0'
@embray

This comment has been minimized.

Show comment
Hide comment
@embray

embray Jun 23, 2015

Member

Weird. That should probably be possible to resolve.

Member

embray commented Jun 23, 2015

Weird. That should probably be possible to resolve.

@embray

This comment has been minimized.

Show comment
Hide comment
@embray

embray Jun 24, 2015

Member

Okay, I understand what's going on here, and I can see what motivated you to try to reconstruct the compound model. Very interesting--since the names of compound model classes are just dynamically generated they don't appear in the module they claim to belong to, so pickling them fails (since apparently for classes the default is to just pickle the name and module of the class, and then re-import it when loaded on the other side). This won't quite do for compound models, but I think we can manage something. Will have to use the copy_reg module, so far as I can tell, to override the default behavior for _CompoundModelMeta classes.

Member

embray commented Jun 24, 2015

Okay, I understand what's going on here, and I can see what motivated you to try to reconstruct the compound model. Very interesting--since the names of compound model classes are just dynamically generated they don't appear in the module they claim to belong to, so pickling them fails (since apparently for classes the default is to just pickle the name and module of the class, and then re-import it when loaded on the other side). This won't quite do for compound models, but I think we can manage something. Will have to use the copy_reg module, so far as I can tell, to override the default behavior for _CompoundModelMeta classes.

@embray embray added this to the v1.0.4 milestone Jun 24, 2015

@mdboom

This comment has been minimized.

Show comment
Hide comment
@mdboom

mdboom Jun 24, 2015

Contributor

I find it's generally better, if you control the classes, which we do, to use __reduce__ (which despite the docs is useful for more than just extension types) than to use copy_reg which relies on global state. astropy.units uses it to make sure pickled units remain singletons, for example.

Contributor

mdboom commented Jun 24, 2015

I find it's generally better, if you control the classes, which we do, to use __reduce__ (which despite the docs is useful for more than just extension types) than to use copy_reg which relies on global state. astropy.units uses it to make sure pickled units remain singletons, for example.

@wkerzendorf

This comment has been minimized.

Show comment
Hide comment
@wkerzendorf

wkerzendorf Jun 24, 2015

Member

so what's with __getstate__ and __setstate__?

Member

wkerzendorf commented Jun 24, 2015

so what's with __getstate__ and __setstate__?

@mdboom

This comment has been minimized.

Show comment
Hide comment
@mdboom

mdboom Jun 24, 2015

Contributor

__getstate__ and __setstate__ don't allow one to override how the the instance is created from a class. You still need the path to the class to resolve to something static in the source code in order to find the __setstate__ method. Here we have dynamically created classes, so you can use __reduce__ in conjunction with a free function that knows how to "reconstitute" a compound model class.

Contributor

mdboom commented Jun 24, 2015

__getstate__ and __setstate__ don't allow one to override how the the instance is created from a class. You still need the path to the class to resolve to something static in the source code in order to find the __setstate__ method. Here we have dynamically created classes, so you can use __reduce__ in conjunction with a free function that knows how to "reconstitute" a compound model class.

@embray

This comment has been minimized.

Show comment
Hide comment
@embray

embray Jun 24, 2015

Member

@mdboom Yes, for _ModelMeta this will have to use a __reduce__ equivalent to return the class's (name, bases, members) tuple, then use the metaclass to reconstruct the class later. Unfortunately my experiments indicate that __reduce__ doesn't work for type subclasses because the pickle module goes immediately to its dispatch table where it finds an already registered reduce function for type, which is the one that relies on importing classes from modules. So I have to use copy_reg to register a new entry in the dispatch table. Let me know if you know any other way around that.

Member

embray commented Jun 24, 2015

@mdboom Yes, for _ModelMeta this will have to use a __reduce__ equivalent to return the class's (name, bases, members) tuple, then use the metaclass to reconstruct the class later. Unfortunately my experiments indicate that __reduce__ doesn't work for type subclasses because the pickle module goes immediately to its dispatch table where it finds an already registered reduce function for type, which is the one that relies on importing classes from modules. So I have to use copy_reg to register a new entry in the dispatch table. Let me know if you know any other way around that.

@embray

This comment has been minimized.

Show comment
Hide comment
@embray

embray Jun 24, 2015

Member

More correctly, looking at the code for the pickle module, it seems that if it does not find an object's type in the dispatch table (which for a Model class it wouldn't since its type is either _ModelMeta or _CompoundModelMeta) it will automatically pickle it as a module global. It doesn't even check if the metaclass has a __reduce__. I feel like this is a bug, but there's a comment in there "Check for a class with a custom metaclass; treat as regular class", suggesting that it is for some reason intentional. I'll look into that. In the meantime the best workaround seems to be to use copy_reg (though I might go ahead and define a __reduce__ method anyways and just copy_reg.pickle(_ModelMeta, _ModelMeta.__reduce__) or something.

Member

embray commented Jun 24, 2015

More correctly, looking at the code for the pickle module, it seems that if it does not find an object's type in the dispatch table (which for a Model class it wouldn't since its type is either _ModelMeta or _CompoundModelMeta) it will automatically pickle it as a module global. It doesn't even check if the metaclass has a __reduce__. I feel like this is a bug, but there's a comment in there "Check for a class with a custom metaclass; treat as regular class", suggesting that it is for some reason intentional. I'll look into that. In the meantime the best workaround seems to be to use copy_reg (though I might go ahead and define a __reduce__ method anyways and just copy_reg.pickle(_ModelMeta, _ModelMeta.__reduce__) or something.

@mdboom

This comment has been minimized.

Show comment
Hide comment
@mdboom

mdboom Jun 24, 2015

Contributor

Can't you just add a __reduce__ method to the class rather than the metaclass (presumably at the point where the metaclass builds the class)? That does seem to work:

from astropy.modeling import projections
import pickle

def _reconstitute(*args):
    print(args)

def _reduce(self):
    print("Reduce called", self, type(self))
    return (_reconstitute, (1, 2, 3))

x = projections.Pix2Sky_CAR() + projections.Pix2Sky_CAR()

x.__class__.__reduce__ = _reduce
y = pickle.dumps(x)
pickle.loads(y)
Contributor

mdboom commented Jun 24, 2015

Can't you just add a __reduce__ method to the class rather than the metaclass (presumably at the point where the metaclass builds the class)? That does seem to work:

from astropy.modeling import projections
import pickle

def _reconstitute(*args):
    print(args)

def _reduce(self):
    print("Reduce called", self, type(self))
    return (_reconstitute, (1, 2, 3))

x = projections.Pix2Sky_CAR() + projections.Pix2Sky_CAR()

x.__class__.__reduce__ = _reduce
y = pickle.dumps(x)
pickle.loads(y)
@embray

This comment has been minimized.

Show comment
Hide comment
@embray

embray Jun 24, 2015

Member

Could work--I'll give that a try. Though I think it may still require both. Remember it is possible to make a new compound model class with an expression like Pix2Sky_CAR + projections.Pix2Sky_CAR, so the custom class needs to be pickleable too.

Member

embray commented Jun 24, 2015

Could work--I'll give that a try. Though I think it may still require both. Remember it is possible to make a new compound model class with an expression like Pix2Sky_CAR + projections.Pix2Sky_CAR, so the custom class needs to be pickleable too.

@embray

This comment has been minimized.

Show comment
Hide comment
@embray

embray Jun 24, 2015

Member

I should add, just registering a reduce function for the metaclass itself is not quite good enough either, because then any class of type _ModelMeta, including ones that are not dynamically generated (i.e. any of the built-in models) will use the custom reduce. This only needs to happen for classes that are dynamically generated (there is another example of this in the case of the _ModelMeta.rename method). I have a way of doing that that's already buried elsewhere in the metaclass code (specifically the bits that auto-generate the __init__ method), which I can repurpose here.

Member

embray commented Jun 24, 2015

I should add, just registering a reduce function for the metaclass itself is not quite good enough either, because then any class of type _ModelMeta, including ones that are not dynamically generated (i.e. any of the built-in models) will use the custom reduce. This only needs to happen for classes that are dynamically generated (there is another example of this in the case of the _ModelMeta.rename method). I have a way of doing that that's already buried elsewhere in the metaclass code (specifically the bits that auto-generate the __init__ method), which I can repurpose here.

Improves pickling of compound models, including several tests. Should f…
…ix #3867 though I'm interested to see more testing in 'the wild' before declaring this truly fixed.

Originally I hoped that the _is_concrete property would be enough, which is why this change splits it off as a reusable property.  But it turned out that what I really needed for pickling purposes was a way to know if a model class was created dynamically or not.  There are some heuristics I could have used for this (for example, does the class truly exist in the __dict__ of the module it claims to belong to?).  But I think the most robust approach, albeit a bit hacky, was to have the explicit _is_dynamic flag.  So this is something that people creating model classes dynamically will want to be aware of, though I think that is a tiny class of people.  That said, I will add mention of this in the docs I guess.  It will also be interesting to see if this can be applied to renamed models or models created by custom_model, but for now this just focused on compound models.
@embray

This comment has been minimized.

Show comment
Hide comment
@embray

embray Jun 26, 2015

Member

@wkerzendorf I've gone ahead and upgraded this issue to a PR, with a proposed fix.
I think this should do it, but I haven't tried it out yet in the context of multiprocessing that you mentioned. It'd be good if you could give it a try, or at least provide me some sample code for how you're trying to do this with IPython. The basic pickle example you gave above works now.

Member

embray commented Jun 26, 2015

@wkerzendorf I've gone ahead and upgraded this issue to a PR, with a proposed fix.
I think this should do it, but I haven't tried it out yet in the context of multiprocessing that you mentioned. It'd be good if you could give it a try, or at least provide me some sample code for how you're trying to do this with IPython. The basic pickle example you gave above works now.

@embray

This comment has been minimized.

Show comment
Hide comment
@embray

embray Jun 26, 2015

Member

Looks like Python 2.6 didn't like it. I don't think this would be an issue of different pickle protocols, but it may be a Python 2.6-specific bug in pickle. Need to investigate further...

Member

embray commented Jun 26, 2015

Looks like Python 2.6 didn't like it. I don't think this would be an issue of different pickle protocols, but it may be a Python 2.6-specific bug in pickle. Need to investigate further...

@embray

This comment has been minimized.

Show comment
Hide comment
@embray

embray Jun 26, 2015

Member

Yup, here's where the issue was raised: http://bugs.python.org/issue7689 This is broken before Python 2.7.3. I'll see if I can find a workaround.

Member

embray commented Jun 26, 2015

Yup, here's where the issue was raised: http://bugs.python.org/issue7689 This is broken before Python 2.7.3. I'll see if I can find a workaround.

@embray

This comment has been minimized.

Show comment
Hide comment
@embray

embray Jun 26, 2015

Member

Well, it looks like this would be very difficult to work around, actually. For the Python pickle module it's a simple enough monkey-patch. For cPickle, however, (which is widely used) this would be very non-trivial and probably not worth the effort outside a large demand.

I will disable these tests for Python < 2.7.3 and add a known issue. I can also add something in the code to raise an exception at least when pickling instances of affected objects.

Member

embray commented Jun 26, 2015

Well, it looks like this would be very difficult to work around, actually. For the Python pickle module it's a simple enough monkey-patch. For cPickle, however, (which is widely used) this would be very non-trivial and probably not worth the effort outside a large demand.

I will disable these tests for Python < 2.7.3 and add a known issue. I can also add something in the code to raise an exception at least when pickling instances of affected objects.

@embray embray added the Python2.6 label Jun 26, 2015

@wkerzendorf

This comment has been minimized.

Show comment
Hide comment
@wkerzendorf

wkerzendorf Jun 26, 2015

Member

That's great - I'll try it out as soon as possible.
On Fri, Jun 26, 2015 at 23:01 Erik Bray notifications@github.com wrote:

Well, it looks like this would be very difficult to work around, actually.
For the Python pickle module it's a simple enough monkey-patch. For
cPickle, however, (which is widely used) this would be very non-trivial and
probably not worth the effort outside a large demand.

I will disable these tests for Python < 2.7.3 and add a known issue. I can
also add something in the code to raise an exception at least when pickling
instances of affected objects.


Reply to this email directly or view it on GitHub
#3867 (comment).

Member

wkerzendorf commented Jun 26, 2015

That's great - I'll try it out as soon as possible.
On Fri, Jun 26, 2015 at 23:01 Erik Bray notifications@github.com wrote:

Well, it looks like this would be very difficult to work around, actually.
For the Python pickle module it's a simple enough monkey-patch. For
cPickle, however, (which is widely used) this would be very non-trivial and
probably not worth the effort outside a large demand.

I will disable these tests for Python < 2.7.3 and add a known issue. I can
also add something in the code to raise an exception at least when pickling
instances of affected objects.


Reply to this email directly or view it on GitHub
#3867 (comment).

Adds a known issue to the docs concerning pickling compound models on…
… older Python versions with the bug http://bugs.python.org/issue7689.  I think there are a few possible workarounds involving manipulation of the module __dict__ and other messing around with the __reduce__ for compound models.  But before going to those lengths let's see if anyone is actually affected.
@wkerzendorf

This comment has been minimized.

Show comment
Hide comment
@wkerzendorf

wkerzendorf Jul 1, 2015

Member

@embray @mdboom works like a charm!! I'm happy for this to merge

Member

wkerzendorf commented Jul 1, 2015

@embray @mdboom works like a charm!! I'm happy for this to merge

@embray

This comment has been minimized.

Show comment
Hide comment
@embray

embray Jul 1, 2015

Member

@wkerzendorf Thanks for the confirmation!

Member

embray commented Jul 1, 2015

@wkerzendorf Thanks for the confirmation!

embray added a commit that referenced this pull request Jul 1, 2015

@embray embray merged commit 3f6de97 into astropy:master Jul 1, 2015

3 checks passed

continuous-integration/appveyor AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
coverage/coveralls Coverage increased (+0.01%) to 77.33%
Details

@embray embray deleted the embray:modeling/compound-pickle branch Jul 1, 2015

embray added a commit that referenced this pull request Aug 5, 2015

embray added a commit that referenced this pull request Aug 7, 2015

dhomeier added a commit to dhomeier/astropy that referenced this pull request Aug 11, 2015

Improves pickling of compound models, including several tests. Should f…
…ix astropy#3867 though I'm interested to see more testing in 'the wild' before declaring this truly fixed.

Originally I hoped that the _is_concrete property would be enough, which is why this change splits it off as a reusable property.  But it turned out that what I really needed for pickling purposes was a way to know if a model class was created dynamically or not.  There are some heuristics I could have used for this (for example, does the class truly exist in the __dict__ of the module it claims to belong to?).  But I think the most robust approach, albeit a bit hacky, was to have the explicit _is_dynamic flag.  So this is something that people creating model classes dynamically will want to be aware of, though I think that is a tiny class of people.  That said, I will add mention of this in the docs I guess.  It will also be interesting to see if this can be applied to renamed models or models created by custom_model, but for now this just focused on compound models.

embray added a commit that referenced this pull request Aug 11, 2015

embray added a commit that referenced this pull request Aug 11, 2015

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment