Error in lexer matching rules #15

srathbun · 2012-05-08T13:45:06Z

I have two rules (ACCOUNTS and VALIDLINE) that both match my input, but instead of returning the first matching rule, PLY returns the broadest one every time. Reordering the rules does not seem to help, and I've checked that my input string contains no unexpected characters.
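For reference, the PLY documentation states that rules defined as functions are added to the master regular expression in the order they appear in the lexer file. Python's re alternation then returns the first alternative that matches at the current position, not the longest one. The sketch below is a simplified model of that behavior, not PLY's actual code: the helper `first_token` and the shortened ACCOUNTS-style pattern are hypothetical, and the patterns are compiled with `re.VERBOSE`, the flag PLY uses.

```python
import re

# Hypothetical, shortened patterns. Spaces are written as [ ] so they
# survive re.VERBOSE; the space inside VALIDLINE's class is also preserved.
ACCOUNTS = r'=+[ ]S[ ]H[ ]A[ ]R[ ]E[ ]=+'
VALIDLINE = r'[\S \t]+'

def first_token(rules, text):
    """Join (name, pattern) rules into one alternation, in list order,
    and return (rule_name, lexeme) for the alternative matching at position 0."""
    master = "|".join("(?P<%s>%s)" % (name, pattern) for name, pattern in rules)
    m = re.compile(master, re.VERBOSE).match(text)
    return (m.lastgroup, m.group()) if m else None

text = "=== S H A R E === trailing text"

# ACCOUNTS listed first: it wins even though VALIDLINE would match more text.
print(first_token([("ACCOUNTS", ACCOUNTS), ("VALIDLINE", VALIDLINE)], text))
# ('ACCOUNTS', '=== S H A R E ===')

# VALIDLINE listed first: it wins and swallows the whole line.
print(first_token([("VALIDLINE", VALIDLINE), ("ACCOUNTS", ACCOUNTS)], text))
# ('VALIDLINE', '=== S H A R E === trailing text')
```

So with correctly written patterns, rule order alone decides which token is produced.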

    import ply.lex as lex
    from ply.lex import TOKEN

    class StatementLexer(object):  # hypothetical class name
        tokens = (
            'FORMFEED','PAGE','ACCOUNTS','ENDSTATEMENT','START','VALIDLINE',
        )

        ##
        ## Regexes for use in tokens
        ##

        FORMFEED  = r'\f'
        PAGE      = r'\s+STATEMENT PAGE \#: 1\s*'
        ACCOUNTS  = r'=+ S H A R E  A C C O U N T S =+'
        ENDSTATEMENT = r'<\d+>=+ E N D   O F   S T A T E M E N T =+'
        VALIDLINE = r'[\S \t]+'
        START     = r'[\x00]+[ ]+'

        ##
        ## Lexer states
        ##
        states = (
        )

        # Newlines: track line numbers and discard the token
        def t_NEWLINE(self, t):
            r'\n+'
            t.lexer.lineno += t.value.count("\n")

        @TOKEN(START)
        def t_START(self, t):
            return t

        @TOKEN(PAGE)
        def t_PAGE(self, t):
            return t

        @TOKEN(ACCOUNTS)
        def t_ACCOUNTS(self, t):
            return t

        @TOKEN(ENDSTATEMENT)
        def t_ENDSTATEMENT(self, t):
            return t

        @TOKEN(VALIDLINE)
        def t_VALIDLINE(self, t):
            return t

        @TOKEN(FORMFEED)
        def t_FORMFEED(self, t):
            return t

When I give it the input string:

    =============================== S H A R E A C C O U N T S ===============================

I expect to receive an ACCOUNTS token. Instead I get a VALIDLINE token. Putting the lexer into debug mode shows that the master regex is:

I tested that regex with Python's re module and got the expected results. Why is PLY returning the wrong token?


srathbun · 2012-05-08T20:43:42Z

After additional testing, I've determined that this is caused by the literal spaces in the regex patterns. PLY compiles its master regex with the re.VERBOSE flag, so unescaped whitespace outside a character class is ignored. Like flex, the patterns must use [ ], \s, or an escaped space for whitespace. This differs from Python's re module in its default mode, where literal spaces in a pattern match spaces.

That is why my tests of the master regex worked with the re module but not with PLY. Should this be changed to follow Python's default behavior, or should a note be added to the documentation?
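A minimal sketch of the whitespace difference described above, using only the standard re module. The sample line's spacing is an assumption (a double space between the two words, matching the ACCOUNTS pattern), and the names `ORIGINAL`, `ESCAPED`, `BUILT`, and `matches` are hypothetical. Escaping each space, or building the literal part with `re.escape()` (which escapes spaces), keeps the spaces under `re.VERBOSE`. The same flag also treats an unescaped `#` as the start of a comment, which is why the PAGE pattern needs `\#`.

```python
import re

# Assumed sample line, with a double space between SHARE and ACCOUNTS.
line = "=== S H A R E  A C C O U N T S ==="

# The ACCOUNTS pattern from the report, written with literal spaces.
ORIGINAL = r'=+ S H A R E  A C C O U N T S =+'

# Fix 1: escape each space so re.VERBOSE keeps it.
ESCAPED = r'=+\ S\ H\ A\ R\ E\ \ A\ C\ C\ O\ U\ N\ T\ S\ =+'

# Fix 2: let re.escape() escape the spaces in the literal part.
BUILT = r'=+' + re.escape(' S H A R E  A C C O U N T S ') + r'=+'

def matches(pattern, flags=0):
    """True if pattern matches at the start of the sample line."""
    return re.match(pattern, line, flags) is not None

print(matches(ORIGINAL))              # True: default mode keeps the spaces
print(matches(ORIGINAL, re.VERBOSE))  # False: effectively '=+SHAREACCOUNTS=+'
print(matches(ESCAPED, re.VERBOSE))   # True
print(matches(BUILT, re.VERBOSE))     # True
```

Replacing each space with `\s+` is another option if the report's spacing can vary.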

dabeaz · 2015-04-17T17:40:02Z

I've added a comment in the documentation about this.

dabeaz closed this as completed Apr 17, 2015
