Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pickling fails with a pure Python Trie that reach some depths and hits the recursion limit #28

Closed
pombredanne opened this issue Jun 22, 2016 · 4 comments
Assignees

Comments

@pombredanne
Copy link
Collaborator

This is a well known issue for pickle: recursive data structure (such as nested dict as used in py/pyahocorasick.py) do not pickle when you reach a certain depth.

First I comment out the slots line in pyahocorasick.py (such as I do not have to implement the __*state__ methods):
#__slots__ = ['char', 'output', 'fail', 'children']

Then I run this:

$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyahocorasick as pa
>>> for x in range(10):
...  key = str(range(x, x+100))
... 
>>> t=pa.Trie()
>>> for x in range(10):
...  key = str(range(x, x+100))
...  t.add_word(key, x)
... 
>>> import pickle
>>> p=pickle.dumps(t)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/pickle.py", line 1374, in dumps
    Pickler(file, protocol).dump(obj)
[...]
  File "/usr/lib/python2.7/pickle.py", line 271, in save
    pid = self.persistent_id(obj)
RuntimeError: maximum recursion depth exceeded

The obvious solution is to increase the recursion limit, but this fails quickly and can exhaust system resources:

>>> import sys
>>> sys.setrecursionlimit(10000)
>>> p=pickle.dumps(t)

Add a few more and it dies too:

>>> for x in range(100):
...  key = str(range(x, x+1000))
...  t.add_word(key, x)
... 
>>> p=pickle.dumps(t)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/pickle.py", line 1374, in dumps
    Pickler(file, protocol).dump(obj)
[...]
  File "/usr/lib/python2.7/pickle.py", line 271, in save
    pid = self.persistent_id(obj)
RuntimeError: maximum recursion depth exceeded

The culprit is that the Trie uses nested dictionaries (as opposed to the C Automaton that uses an array-based Trie structure. One way out would be to have a similar data structure in Python such that pickling works.

@pombredanne
Copy link
Collaborator Author

Now why would I want to use this rather than the C automaton? as a go between to run a slower automaton on Windows for now that does not compile yet there.

@WojciechMula
Copy link
Owner

@pombredanne it's a request for Windows compilation :)

@pombredanne
Copy link
Collaborator Author

@WojciechMula you wrote:

it's a request for Windows compilation

True! But for the sake of completeness I wanted to see if I could use the Python version as a fallback!

@pombredanne
Copy link
Collaborator Author

Since this is really no longer an issue and that we have a windows build with #11 and the pure Python implementation is not meant to be a fool proof full feature implementation, I am closing this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants