Python 3.4 Recursion Depth Error #71

Closed
notconfusing opened this issue May 9, 2014 · 7 comments

@notconfusing

I am using mediawiki utilities xml dump parser (python3 only). https://pypi.python.org/pypi/mediawiki-utilities/0.2.1

When I try to parse any page, I get:

  File "/usr/local/lib/python3.4/dist-packages/mwparserfromhell/string_mixin.py", line 112, in __getattr__
    return getattr(self.__unicode__(), attr)
  File "/usr/local/lib/python3.4/dist-packages/mwparserfromhell/nodes/text.py", line 38, in __unicode__
    return self.value
  File "/usr/local/lib/python3.4/dist-packages/mwparserfromhell/nodes/text.py", line 49, in value
    return self._value
RuntimeError: maximum recursion depth exceeded while calling a Python object
@earwig
Owner

earwig commented May 9, 2014

Can you give a specific page and mwparser function call that cause the error? I can't reproduce it.

@earwig self-assigned this May 9, 2014
@earwig added this to the version 0.4 milestone May 9, 2014
@notconfusing
Author

I will try to reproduce it. Unfortunately, the error happens inside a
multiprocessing map-reduce job that does not fully re-raise errors, so I
must fix that before I can report further.

Max Klein
http://notconfusing.com/

@earwig
Owner

earwig commented Jan 9, 2015

@notconfusing any news on this one, or should I close it?

@notconfusing
Author

@earwig I haven't produced anything new on this, so if you think it's "aged" out, then feel free to close it.

BTW, have you experimented with running mwparserfromhell on "Stackless Python", which purports not to have these issues?

@earwig
Owner

earwig commented Jan 12, 2015

@notconfusing No, I haven't tried Stackless Python, but my understanding was that it's solving a different issue – microthreading, rather than having an infinitely long call stack? Either way, it doesn't seem like a proper solution (since the parser shouldn't be recursing that deeply in the first place...) but I am a little curious. Will look into it more.

@earwig closed this as completed Jan 12, 2015
@earwig removed this from the version 0.4 milestone Jan 15, 2015
@borzunov

borzunov commented Mar 1, 2017

Today I helped @nkruglikov, who stumbled upon a similar problem. He used multiprocessing's Pool.imap_unordered to iterate through texts, parse them with mwparserfromhell.parse in parallel, and apply other transformations. He got the following traceback:

Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 429, in _handle_results
    task = get()
  File "/usr/lib/python3.5/multiprocessing/connection.py", line 253, in recv
    return ForkingPickler.loads(buf.getbuffer())
  File "/usr/local/lib/python3.5/dist-packages/mwparserfromhell/string_mixin.py", line 111, in __getattr__
    return getattr(self.__unicode__(), attr)
  File "/usr/local/lib/python3.5/dist-packages/mwparserfromhell/nodes/text.py", line 38, in __unicode__
    return self.value
  File "/usr/local/lib/python3.5/dist-packages/mwparserfromhell/nodes/text.py", line 49, in value
    return self._value

  [...]
  
  File "/usr/local/lib/python3.5/dist-packages/mwparserfromhell/string_mixin.py", line 111, in __getattr__
    return getattr(self.__unicode__(), attr)
  File "/usr/local/lib/python3.5/dist-packages/mwparserfromhell/nodes/text.py", line 38, in __unicode__
    return self.value
  File "/usr/local/lib/python3.5/dist-packages/mwparserfromhell/nodes/text.py", line 49, in value
    return self._value
RecursionError: maximum recursion depth exceeded

Eventually, it turned out that some of the texts were malformed, and parse raised ValueError on them. These exceptions carried some information about the source texts (in particular, instances of StringMixin subclasses).

Since processes don't share a common address space, Python uses pickle to send data between them, including any exceptions that occurred. The StringMixin objects inside the ValueErrors could not be unpickled because of a bug that caused infinite recursion in __getattr__, and this led to the very unclear traceback above.
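
The failure mode can be reproduced without the library. The sketch below is a toy, not mwparserfromhell code: `Leaky`, `Guarded`, and `round_trip` are hypothetical names, and `Leaky` only mimics the delegation pattern visible in the traceback. While unpickling, pickle creates the object without calling `__init__` and then looks up `__setstate__` on it; at that point `_value` does not exist yet, so the delegating `__getattr__` ends up calling itself. `Guarded` shows one possible guard, assuming that only names defined on `str` should be delegated. It is not necessarily what the library does.

```python
import pickle


class Leaky:
    """Toy stand-in for a StringMixin-style class (hypothetical)."""

    def __init__(self, value):
        self._value = value

    @property
    def value(self):
        return self._value

    def __str__(self):
        return self.value

    def __getattr__(self, attr):
        # Runs only when normal lookup fails. If "_value" itself is missing,
        # str(self) needs "_value" again and we land back here forever.
        return getattr(str(self), attr)


class Guarded(Leaky):
    """Same class with a guard so unknown names fail fast."""

    def __getattr__(self, attr):
        # Assumption: only attributes that str provides should be delegated.
        if not hasattr(str, attr):
            raise AttributeError(attr)
        return getattr(str(self), attr)


def round_trip(obj):
    """Pickle and unpickle obj, as multiprocessing does between processes."""
    return pickle.loads(pickle.dumps(obj))


try:
    round_trip(Leaky("some text"))
except RecursionError as exc:
    print("Leaky failed to unpickle:", type(exc).__name__)

restored = round_trip(Guarded("some text"))
print("Guarded survived:", str(restored), restored.upper())
```

With the guard, the lookup of `__setstate__` raises `AttributeError`, which pickle treats as "no custom state handling", so the instance dictionary is restored in the default way.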

One way to fix this in the library would be to make subclasses of StringMixin picklable. If that isn't possible, I hope this post will still be useful to people who stumble upon the same problem. A possible workaround for them is to handle exceptions from mwparserfromhell inside the child process and return a serializable value that indicates the error.
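
A minimal sketch of that workaround follows. The names are assumptions made for illustration: `parse_text` is a hypothetical stand-in for `mwparserfromhell.parse` (it just rejects input containing "bad"), and `safe_parse` and `run` are helpers invented here. The point is that the worker catches the exception itself and sends back only plain tuples and strings, so nothing unpicklable ever crosses the process boundary.

```python
import multiprocessing


def parse_text(text):
    """Hypothetical stand-in for mwparserfromhell.parse."""
    if "bad" in text:
        raise ValueError("cannot parse: %r" % text)
    return text.upper()


def safe_parse(text):
    """Run in the child process; return only plain, picklable values."""
    try:
        return ("ok", parse_text(text))
    except Exception as exc:
        # Send back a string description, never the exception object itself,
        # since the exception may reference objects that fail to unpickle.
        return ("error", "%s: %s" % (type(exc).__name__, exc))


def run(texts, processes=2):
    """Parse texts in parallel and collect (status, payload) tuples."""
    with multiprocessing.Pool(processes) as pool:
        # imap_unordered yields results in completion order; sort for stability.
        return sorted(pool.imap_unordered(safe_parse, texts))
```

Call `run([...])` from under an `if __name__ == "__main__":` guard so that it also works on platforms that spawn rather than fork. Each result is either `("ok", parsed)` or `("error", message)`, and the parent can log or retry the failures with a readable message instead of a `RecursionError`.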

@earwig
Owner

earwig commented Mar 4, 2017

Thanks for the bug report. I think I've fixed it in 6ffdfa5.
