Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pickle protocol #229

Closed
SimplyKnownAsG opened this issue Feb 27, 2018 · 7 comments
Closed

pickle protocol #229

SimplyKnownAsG opened this issue Feb 27, 2018 · 7 comments

Comments

@SimplyKnownAsG
Copy link

It seems as though it would be helpful to modify the pickle.dump(s) methods to use protocol=HIGHEST_PROTOCOL, as this would be consistent with Python3 and should be functional in Python2.6+ (stated functionality of six).

The default Python2 protocol is 0, which require __getstate__ and __setstate__ methods; however Python 3 uses protocol=HIGEST_PROTCOL (equivalent to 4 since Python 3.4).

Manipulating the dump(s) methods to use the highest protocol would allow someone to write the most "slim" code that will work in both Python2 and Python3.

@benjaminp
Copy link
Owner

I'm not sure what concretely you're suggesting. Currently, six's only interaction with pickle is providing a mapping between cPickle on Python 2 and pickle on Python 3. What beyond that specifically do you suggest?

@SimplyKnownAsG
Copy link
Author

SimplyKnownAsG commented Jul 9, 2018

If the objective of six.py is to make Py2 code compliant with Py3, then it would make sense to override the protocol to use the highest. In Python3, if you have a class that uses __slots__ you can use pickle.dumps(instance) and it works. In Python2, you have to specify the protocol: pickle.dump(instance, protocol=2).

something like this might work?

diff --git a/six.py b/six.py
index 8d9ac41..b1bce8b 100644
--- a/six.py
+++ b/six.py
@@ -102,8 +102,9 @@ class _LazyDescr(object):
 
 class MovedModule(_LazyDescr):
 
-    def __init__(self, name, old, new=None):
+    def __init__(self, name, old, new=None, decorators=None):
         super(MovedModule, self).__init__(name)
+        self.decorators = decorators or {}
         if PY3:
             if new is None:
                 new = name
@@ -117,6 +118,11 @@ class MovedModule(_LazyDescr):
     def __getattr__(self, attr):
         _module = self._resolve()
         value = getattr(_module, attr)
+        decorator = self.decorators.get(attr)
+
+        if decorator:
+            value = decorator(value)
+
         setattr(self, attr, value)
         return value
 
@@ -232,6 +238,20 @@ class _MovedItems(_LazyModule):
     __path__ = []  # mark as package
 
 
+def _pickle_dump_decorator(func):
+    def wrapper(obj, file, protocol=None, **kwargs):
+        if protocol is None and PY2:
+            protocol = 2
+        return func(obj, file, protocol=None, **kwargs)
+    return wrapper
+
+def _pickle_dumps_decorator(func):
+    def wrapper(obj, protocol=None, **kwargs):
+        if protocol is None and PY2:
+            protocol = 2
+        return func(obj, protocol=None, **kwargs)
+    return wrapper
+
 _moved_attributes = [
     MovedAttribute("cStringIO", "cStringIO", "io", "StringIO"),
     MovedAttribute("filter", "itertools", "builtins", "ifilter", "filter"),
@@ -271,7 +291,8 @@ _moved_attributes = [
     MovedModule("BaseHTTPServer", "BaseHTTPServer", "http.server"),
     MovedModule("CGIHTTPServer", "CGIHTTPServer", "http.server"),
     MovedModule("SimpleHTTPServer", "SimpleHTTPServer", "http.server"),
-    MovedModule("cPickle", "cPickle", "pickle"),
+    MovedModule("cPickle", "cPickle", "pickle", decorators={'dumps': _pickle_dumps_decorator,
+                                                            'dump': _pickle_dump_decorator}),
     MovedModule("queue", "Queue"),
     MovedModule("reprlib", "repr"),
     MovedModule("socketserver", "SocketServer"),

@benjaminp
Copy link
Owner

Why can't you use pickle.HIGHEST_PROTOCOL?

@SimplyKnownAsG
Copy link
Author

Do you mean in the diff above? I suppose you could, but you'd have to import it somewhere, and the lazy initialization wouldn't let you do that in a very easily.

I'm guessing you didn't mean in the diff.

I wouldn't use pickle.HIGHEST_PROTOCOL because that is the default in Python3 (with some caveats). When Python2 goes by the wayside in 2020, specifying the protocol will no longer be necessary (at least for __slots__). If six automatically uses the highest protocol, then less code will need to be migrated. Likewise, __getstate__ and __setstate__ methods for Python2 may not be necessary in Python3 because of the default protocol change.

@benjaminp
Copy link
Owner

I'm guessing you didn't mean in the diff.

Correct.

I wouldn't use pickle.HIGHEST_PROTOCOL because that is the default in Python3 (with some caveats). When Python2 goes by the wayside in 2020, specifying the protocol will no longer be necessary (at least for slots). If six automatically uses the highest protocol, then less code will need to be migrated. Likewise, getstate and setstate methods for Python2 may not be necessary in Python3 because of the default protocol change.

I don't think writing pickle.HIGHEST_PROTOCOL is particularly odious. It's explicit and does what you want on every Python version.

We can't make a change like you propose above because it would change the API of six in a backwards-incompatible way.

@SimplyKnownAsG
Copy link
Author

Thanks for hearing me out and responding.

At this point, it doesn't really impact me much since I've already adjusted my code. It was a pain to figure out, and something that I thought would have "just worked" with six.

I understand that you are closing it, but not sure I understand the "backwards incompatible way." the various pickle protocols should dump/load between versions of Python ... except if you don't explicitly state one that is on both Py2 and Py3. Arguably, the current implementation is not backwards compatible because pickle is neither backwards nor forwards compatible

demonstration of breaking backwards compatibility

$ python3 -c 'from six.moves.cPickle import dump; dump([1,2,3], open("py3.pkl", "wb"))'
$ python2 -c 'from six.moves.cPickle import load; print(load(open("py3.pkl", "rb")))'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ValueError: unsupported pickle protocol: 3

demonstration of breaking forward compatibility

$ python2 -c 'from six.moves.cPickle import dumps; Slotted = type("Slotted", (object,), {"__slots__": ("slot",)}); print(dumps(Slotted()))'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib64/python2.7/copy_reg.py", line 77, in _reduce_ex
    raise TypeError("a class that defines __slots__ without "
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled
$ python3 -c 'from six.moves.cPickle import dumps; Slotted = type("Slotted", (object,), {"__slots__": ("slot",)}); print(dumps(Slotted()))'
b'\x80\x03c__main__\nSlotted\nq\x00)\x81q\x01.'

Thanks for making six.py!

@benjaminp
Copy link
Owner

Whether it's good or not, the current behavior of six.moves.cPickle.load/dump is to use settings from the running version of Python. If that changed, it would break the backwards compatibility of six.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants