cPickle.PicklingError: Can't pickle <type 'ellipsis'>: attribute lookup __builtin__.ellipsis failed #30

Closed
Dieterbe opened this Issue May 10, 2011 · 10 comments

@Dieterbe
Contributor

Dieterbe commented May 10, 2011

When I have an instance of SparseMatrixSimilarity and try to save() it, I get this:

INFO:gensim.utils:saving Similarity object to 846c57f-dirty--CHNK100-EB0-FW1-FW_NA0.5-FW_NB5-M0-NFdata__bom-nerfile-withmediaobjectfragmentids-NUMBEST10-PATRN_INCL-S_L0-S_P0-SQ_K5-SQ_R1-TFIDF0_sim_dense_disk
Traceback (most recent call last):
  File "./build-models.py", line 250, in <module>
    rebuild_data_files(r, args.tag)
  File "./build-models.py", line 118, in rebuild_data_files
    sim.save(sim_filename(tag))
  File "/usr/lib/python2.7/site-packages/gensim/utils.py", line 118, in save
    pickle(self, fname)
  File "/usr/lib/python2.7/site-packages/gensim/utils.py", line 414, in pickle
    cPickle.dump(obj, fout, protocol=protocol)
cPickle.PicklingError: Can't pickle <type 'ellipsis'>: attribute lookup __builtin__.ellipsis failed

Interestingly, I have done this hundreds of times before without issues.
I wondered whether it had anything to do with an update to Python or numpy, but I don't think so (I did upgrade scipy and python2 a week ago, but reverting didn't fix it).
Tested with:

  • python-scipy 0.8.0-4
  • python-scipy 0.9.0-1
  • python2-numpy 1.5.1-2
  • python2 2.7.1-7
  • python2 2.7.1-9
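
For what it's worth, the PicklingError itself is easy to reproduce in isolation: on Python 2 the Ellipsis singleton has no pickle support, so any object graph that ends up containing it fails this way. The snippet below only demonstrates the mechanism; it doesn't tell us where an Ellipsis would come from inside the Similarity object.

import cPickle

# Python 2: Ellipsis cannot be pickled; this raises the same error as the
# traceback above:
#   cPickle.PicklingError: Can't pickle <type 'ellipsis'>: attribute lookup
#   __builtin__.ellipsis failed
cPickle.dumps(Ellipsis)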

@ghost ghost assigned piskvorky May 10, 2011

@Dieterbe

Contributor

Dieterbe commented Jun 7, 2011

I'm currently working around this issue by using http://jsonpickle.github.com/

diff --git a/src/gensim/utils.py b/src/gensim/utils.py
index 817f3b7..3d797a9 100644
--- a/src/gensim/utils.py
+++ b/src/gensim/utils.py
@@ -1,4 +1,4 @@
 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 #
 # Copyright (C) 2010 Radim Rehurek <radimrehurek@seznam.cz>
@@ -13,6 +13,7 @@ from __future__ import with_statement
 import logging
 import re
 import unicodedata
+import jsonpickle
 import cPickle
 import itertools
 from functools import wraps # for `synchronous` function lock
@@ -421,16 +422,24 @@ def chunkize(corpus, chunks, maxsize=0):
         for chunk in chunkize_serial(corpus, chunks):
             yield chunk


 def pickle(obj, fname, protocol=-1):
     """Pickle object `obj` to file `fname`."""
-    with open(fname, 'wb') as fout: # 'b' for binary, needed on Windows
-        cPickle.dump(obj, fout, protocol=protocol)
+    with open(fname, 'w') as fout:
+        fout.write(jsonpickle.encode(obj))


 def unpickle(fname):
     """Load pickled object from `fname`"""
-    return cPickle.load(open(fname, 'rb'))
+    with open(fname, 'r') as fin:
+        return jsonpickle.decode(fin.read())

@piskvorky

Member

piskvorky commented Jun 7, 2011

Can jsonpickle handle very large objects (is it reasonably memory-efficient during save/load)? Dedan had another issue with cPickle, see #31, so perhaps switching completely from pickle to JSON would solve both at the same time...

@Dieterbe

Contributor

Dieterbe commented Jun 16, 2011

Radim, your question triggered this little experiment:
http://dieter.plaetinck.be/poor_mans_pickle_implementations_benchmark.html
I shall check out your numpy-based approach; it is probably better than my jsonpickle approach.
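
(For reference, a rough sketch of what a numpy-native save/load could look like; "index" is just a stand-in name for whatever dense array the similarity object holds, not gensim's actual attribute:)

import numpy

def save_index(index, fname):
    # numpy's own .npy format writes the raw array directly, without going
    # through (c)Pickle at all
    numpy.save(fname + '.npy', index)

def load_index(fname):
    # mmap_mode='r' keeps the array on disk and pages it in lazily, which
    # keeps memory usage low for very large matrices
    return numpy.load(fname + '.npy', mmap_mode='r')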

@piskvorky

Member

piskvorky commented Jun 17, 2011

Nice! I like benchmarks :)

How about the standard json package? (simplejson in python <2.6)

@Dieterbe

Contributor

Dieterbe commented Jun 17, 2011

What do you mean? What about it?
The jsonpickle page says: "The standard Python libraries for encoding Python into JSON, such as the stdlib’s json, simplejson, and demjson, can only handle Python primitives that have a direct JSON equivalent (e.g. dicts, lists, strings, ints, etc.). jsonpickle builds on top of these libraries"

http://jsonpickle.github.com/
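
(A toy illustration of that point, with a made-up class Foo rather than anything from gensim:)

import json
import jsonpickle

class Foo(object):
    def __init__(self):
        self.x = 1

json.dumps(Foo())         # raises TypeError: not JSON serializable
jsonpickle.encode(Foo())  # works: records the class and its attributes as JSON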

@piskvorky

Member

piskvorky commented Jun 17, 2011

Oh, I didn't know it builds on json. In that case its performance is probably nearly identical, no need to test.

Btw I remembered reading about json speed on metaoptimize some time ago; I managed to google it up: http://metaoptimize.com/blog/2009/03/22/fast-deserialization-in-python/

@Dieterbe

Contributor

Dieterbe commented Jun 18, 2011

Well, the article explicitly discourages using it because it's buggy and unmaintained.

@piskvorky

Member

piskvorky commented Jun 18, 2011

?? It's part of the standard python library. You probably mean cjson.

@Dieterbe

Contributor

Dieterbe commented Jun 18, 2011

Yes, I meant cjson. Anyway, I don't feel the need to test more things right now, as the numpy-native persistence thing you did is probably best. Or am I missing something?

@piskvorky

Member

piskvorky commented Jun 18, 2011

For numpy arrays, I think you're right :) Numpy is also very actively developed/maintained, so there's a good chance potential bugs will be fixed quickly. The core numpy guys are very good engineers.

@piskvorky piskvorky closed this Sep 18, 2011
