unbounded recursion in LexerStream.next() #52

jwilk · 2016-02-24T13:24:52Z

This test program

from rply import LexerGenerator
lg = LexerGenerator()
lg.ignore(r'\s')
for token in lg.build().lex(' ' * 1000):
    pass

makes the Python interpreter sad:

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    for token in lg.build().lex(' ' * 1000):
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 56, in __next__
    return self.next()
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 41, in next
    return self.next()
...
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 41, in next
    return self.next()
  File "/usr/lib/python3/dist-packages/rply/lexer.py", line 38, in next
    match = rule.matches(self.s, self.idx)
  File "/usr/lib/python3/dist-packages/rply/lexergenerator.py", line 33, in matches
    return Match(*m.span(0)) if m is not None else None
RecursionError: maximum recursion depth exceeded

alex · 2016-02-24T13:40:06Z

I'm not sure I understand why this causes unbounded recursion :-(

jwilk · 2016-02-24T14:39:43Z

The relevant code is:

def next(self):
    if self.idx >= len(self.s):
        raise StopIteration
    for rule in self.lexer.ignore_rules:
        match = rule.matches(self.s, self.idx)
        if match:
            self._update_pos(match)
            return self.next()
    ...

CPython doesn't do tail-call elimination, so if there are N consecutive ignorable tokens in the input, N stack frames are consumed.
If N is big enough (1000 in my reproducer, which is also CPython's default recursion limit), you get a RecursionError.
To fix this, use a while loop instead of recursion:

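def next(self):
    # Loop instead of recursing, so an ignored match no longer adds a stack frame.
    while True:
        if self.idx >= len(self.s):
            raise StopIteration
        for rule in self.lexer.ignore_rules:
            match = rule.matches(self.s, self.idx)
            if match:
                self._update_pos(match)
                break  # something was ignored; recheck from the new position
        else:
            break  # no ignore rule matched; fall through to the token rules
    ...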

(untested, sorry!)
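
For anyone who wants to see the effect without rply installed, here is a minimal standalone sketch of the two strategies. It is not rply's code: `IGNORE`, `skip_recursive` and `skip_iterative` are hypothetical names made up for illustration, and a plain `re` pattern stands in for an ignore rule.

```python
import re
import sys

# Hypothetical stand-in for a single ignore rule; this is not rply's API.
IGNORE = re.compile(r"\s")


def skip_recursive(s, idx=0):
    """Skip ignorable characters, one function call per ignored match."""
    m = IGNORE.match(s, idx)
    if m is None:
        return idx
    # Each ignored match adds a stack frame, as `return self.next()` does.
    return skip_recursive(s, m.end())


def skip_iterative(s, idx=0):
    """Skip ignorable characters in a loop; stack depth stays constant."""
    while True:
        m = IGNORE.match(s, idx)
        if m is None:
            return idx
        idx = m.end()


# More consecutive ignorable characters than the interpreter allows frames.
text = " " * (sys.getrecursionlimit() + 100) + "x"

try:
    skip_recursive(text)
    recursive_failed = False
except RecursionError:
    recursive_failed = True

iterative_end = skip_iterative(text)
print(recursive_failed, text[iterative_end])
```

The recursive version runs out of stack before it reaches the `x`, while the loop gets there no matter how long the run of whitespace is.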

alex · 2016-02-24T14:41:03Z

Ugh, right, I forgot how I implemented ignores. Will try to fix this tonight.

alex closed this as completed in f395747 Feb 25, 2016

alex added a commit that referenced this issue Feb 25, 2016

Merge pull request #53 from alex/ignore-recursion

9aa990f

Fixed #52 -- don't die with a recursion error if lots of tokens are s…

Kodiologist mentioned this issue Jul 12, 2017

The lexer hits the maximum recursion depth given a file with too many comment lines hylang/hy#1313

Closed

refi64 added a commit to refi64/hy that referenced this issue Jul 12, 2017

Update rply to 0.7.5

292f445

Closes hylang#1313. Ref. alex/rply#52, alex/rply#71.

refi64 mentioned this issue Jul 12, 2017

Update rply to 0.7.5 hylang/hy#1322

Merged
