I have a set of rules, two of which match my input, but instead of returning the first matching rule, PLY returns the broadest rule every time. Adjusting the rule ordering does not seem to help, and I've checked whether my input string has other characters in it.
tokens = (
    'FORMFEED', 'PAGE', 'ACCOUNTS', 'ENDSTATEMENT', 'START', 'VALIDLINE',
)
##
## Regexes for use in tokens
##
##
FORMFEED = r'\f'
PAGE = r'\s+STATEMENT PAGE \#: 1\s*'
ACCOUNTS = r'=+ S H A R E A C C O U N T S =+'
ENDSTATEMENT = r'<\d+>=+ E N D O F S T A T E M E N T =+'
VALIDLINE = r'[\S \t]+'
START = r'[\x00]+[ ]+'
##
## Lexer states
##
states = (
)
# Newlines
def t_NEWLINE(self, t):
    r'\n+'
    t.lexer.lineno += t.value.count("\n")
@TOKEN(START)
def t_START(self, t):
    return t
@TOKEN(PAGE)
def t_PAGE(self, t):
    return t
@TOKEN(ACCOUNTS)
def t_ACCOUNTS(self, t):
    return t
@TOKEN(ENDSTATEMENT)
def t_ENDSTATEMENT(self, t):
    return t
@TOKEN(VALIDLINE)
def t_VALIDLINE(self, t):
    return t
@TOKEN(FORMFEED)
def t_FORMFEED(self, t):
    return t
When I give it the input string:
=============================== S H A R E A C C O U N T S ===============================
I expect to receive an ACCOUNTS token. Instead I get a VALIDLINE token. Putting the lexer into debug mode shows that the master regex is:
'(?P<t_NEWLINE>\\n+)|(?P<t_START>[\\x00]+[ ]+)|(?P<t_PAGE>\\s+STATEMENT PAGE \\#: 1\\s*)|(?P<t_ACCOUNTS>=+ S H A R E A C C O U N T S =+)|(?P<t_ENDSTATEMENT><\\d+>=+ E N D O F S T A T E M E N T =+)|(?P<t_VALIDLINE>.+)|(?P<t_FORMFEED>\\f)'
I tested that regex directly in Python and got the expected results. Why is PLY returning the wrong token?
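For anyone trying to reproduce this, here is a minimal sketch of that check using only the standard `re` module. It keeps just the ACCOUNTS and VALIDLINE alternatives from the master regex and compiles them twice: once with default flags, and once with `re.VERBOSE`, which the PLY documentation says is used when compiling token patterns. The helper name `first_token` and the run of 31 `=` signs are assumptions made for illustration. They are not part of PLY or of the original input file.

```python
import re

# Two alternatives from the master regex, in the same order as PLY lists them.
MASTER = r'(?P<t_ACCOUNTS>=+ S H A R E A C C O U N T S =+)|(?P<t_VALIDLINE>.+)'

# Stand-in for the input line; the exact number of '=' signs is an assumption.
LINE = '=' * 31 + ' S H A R E A C C O U N T S ' + '=' * 31


def first_token(pattern, text, flags=0):
    """Return the name of the alternative that matched at the start of text."""
    m = re.compile(pattern, flags).match(text)
    return m.lastgroup if m else None


# Default flags: spaces in the pattern are literal, so ACCOUNTS wins.
plain = first_token(MASTER, LINE)

# re.VERBOSE: unescaped spaces outside character classes are dropped from
# the pattern, so ACCOUNTS no longer matches and VALIDLINE takes over.
verbose = first_token(MASTER, LINE, re.VERBOSE)

print(plain)
print(verbose)
```

If the two results differ, the difference comes from the compile flags rather than from the order of the rules.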
After additional testing, I've determined that this is caused by the literal spaces in the regex patterns. PLY apparently behaves like flex here, in that whitespace in a regex must be written as [ ] or \s. This differs from the default behavior of Python's re module, which treats whitespace in a pattern as literal.
That is why my tests of the master regex worked with the re module but not with PLY. Should this be changed to follow Python's default, or should a note be added to the documentation?
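As a possible workaround, here is a sketch of two ways to write the ACCOUNTS pattern so that its spaces survive verbose-mode compilation: a one-character class `[ ]` for each space, or `re.escape` on the literal part, which backslash-escapes the spaces. It assumes the cause is the `re.VERBOSE` flag that the PLY documentation describes, and it uses plain `re` rather than PLY itself. The names `ACCOUNTS_CLASS`, `ACCOUNTS_ESCAPED`, and `first_token` are hypothetical, as is the 31-character run of `=` signs.

```python
import re

# Stand-in for the input line; the exact number of '=' signs is an assumption.
LINE = '=' * 31 + ' S H A R E A C C O U N T S ' + '=' * 31

# Option 1: write each literal space as a one-character class.
ACCOUNTS_CLASS = r'=+[ ]S[ ]H[ ]A[ ]R[ ]E[ ]A[ ]C[ ]C[ ]O[ ]U[ ]N[ ]T[ ]S[ ]=+'

# Option 2: let re.escape backslash-escape the spaces in the literal part.
ACCOUNTS_ESCAPED = r'=+' + re.escape(' S H A R E A C C O U N T S ') + r'=+'

# Catch-all rule, as it appears in the master regex from debug mode.
VALIDLINE = r'.+'


def first_token(accounts_pattern, text):
    """Build a two-rule master regex and report which rule matches first."""
    master = '(?P<t_ACCOUNTS>%s)|(?P<t_VALIDLINE>%s)' % (accounts_pattern, VALIDLINE)
    m = re.compile(master, re.VERBOSE).match(text)
    return m.lastgroup if m else None


results = [first_token(p, LINE) for p in (ACCOUNTS_CLASS, ACCOUNTS_ESCAPED)]
print(results)
```

Both variants should select the ACCOUNTS rule under `re.VERBOSE`. The same rewrite would apply to the spaces in PAGE and ENDSTATEMENT. The `\#` in PAGE is already escaped, so verbose mode should not treat it as the start of a comment.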