Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unbounded recursion in LexerStream.next() #52

Closed
jwilk opened this issue Feb 24, 2016 · 3 comments
Closed

unbounded recursion in LexerStream.next() #52

jwilk opened this issue Feb 24, 2016 · 3 comments

Comments

@jwilk
Copy link
Contributor

jwilk commented Feb 24, 2016

This test program

from rply import LexerGenerator
lg = LexerGenerator()
lg.ignore(r'\s')
for token in lg.build().lex(' ' * 1000):
    pass

makes the Python interpreter sad:

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    for token in lg.build().lex(' ' * 1000):
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 56, in __next__
    return self.next()
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 41, in next
    return self.next()
...
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 41, in next
    return self.next()
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 38, in next
    match = rule.matches(self.s, self.idx)
  File "/usr/lib/python3/dist-packages/rply/lexergenerator.py", line 33, in matches
    return Match(*m.span(0)) if m is not None else None
RecursionError: maximum recursion depth exceeded
@alex
Copy link
Owner

alex commented Feb 24, 2016

I'm not sure I understand why this causes unbound recursion :-(

@jwilk
Copy link
Contributor Author

jwilk commented Feb 24, 2016

The relevant code is:

def next(self):
    if self.idx >= len(self.s):
        raise StopIteration
    for rule in self.lexer.ignore_rules:
        match = rule.matches(self.s, self.idx)
        if match:
            self._update_pos(match)
            return self.next()
    ...

CPython doesn't do tail recursion elimination, so if there's N consecutive ignorable tokens in the input, N stack frames will be consumed.
If N is big enough (1000 in my reproducer), you get a recursion error.
To fix this, use a while loop instead of recursion:

def next(self):
    while True:
        if self.idx >= len(self.s):
            raise StopIteration
        for rule in self.lexer.ignore_rules:
            match = rule.matches(self.s, self.idx)
            if match:
                self._update_pos(match)
                break
        else:
            break
    ...

(untested, sorry!)

@alex
Copy link
Owner

alex commented Feb 24, 2016

Ugh, right, I forgot how I implemented ignores. Will try to fix this
tonight.

On Wed, Feb 24, 2016 at 9:39 AM, Jakub Wilk notifications@github.com
wrote:

The relevant code is:

def next(self):
if self.idx >= len(self.s):
raise StopIteration
for rule in self.lexer.ignore_rules:
match = rule.matches(self.s, self.idx)
if match:
self._update_pos(match)
return self.next()
...

CPython doesn't do tail recursion elimination, so if there's N consecutive
ignorable tokens in the input, N stack frames will be consumed.
If N is big enough (1000 in my reproducer), you get a recursion error.
To fix this, use a while loop instead of recursion:

def next(self):
while True:
if self.idx >= len(self.s):
raise StopIteration
for rule in self.lexer.ignore_rules:
match = rule.matches(self.s, self.idx)
if match:
self._update_pos(match)
break
else:
break
...

(untested, sorry!)


Reply to this email directly or view it on GitHub
#52 (comment).

"I disapprove of what you say, but I will defend to the death your right to
say it." -- Evelyn Beatrice Hall (summarizing Voltaire)
"The people's good is the highest law." -- Cicero
GPG Key fingerprint: 125F 5C67 DFE9 4084

@alex alex closed this as completed in f395747 Feb 25, 2016
alex added a commit that referenced this issue Feb 25, 2016
Fixed #52 -- don't die with a recursion error if lots of tokens are s…
refi64 added a commit to refi64/hy that referenced this issue Jul 12, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants