
Ways to define tokens seem to have different effects. #43

Closed
petermlm opened this issue Dec 21, 2013 · 1 comment

Comments

@petermlm

Suppose you have the following tokens defined:

tokens = (
    'THIS',
    'ANYTHING'
)

t_THIS = r'this'
t_ANYTHING = r'[a-z]+'

Any string is tokenized as an "anything" token. At first I thought this was to be expected, but then I tried the following.

tokens = (
    'THIS',
    'ANYTHING'
)

def t_THIS(t):
    r'this'
    return t

def t_ANYTHING(t):
    r'[a-z]+'
    return t

Defined this way, any "this" string gets tokenized as a "this" token because that rule comes first, while anything else gets tokenized as "anything".

If I switch the two rules around, then nothing gets tokenized as a "this" token again, which seems to make sense and also is what happens when I do the same thing with flex.

Is this the expected behavior and am I missing something, or is it a bug? Even if it is not really a bug, should it be working like this?
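
For reference, here is a minimal runnable sketch of the first variant, assuming PLY is installed (the t_ignore and t_error definitions and the sample input are only added to make it self-contained):

import ply.lex as lex

tokens = (
    'THIS',
    'ANYTHING'
)

# String rules: every word, including "this", comes out as ANYTHING.
t_THIS = r'this'
t_ANYTHING = r'[a-z]+'

t_ignore = ' \t\n'

def t_error(t):
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input("this that other")
for tok in lexer:
    print(tok.type, tok.value)  # prints ANYTHING for "this", "that" and "other"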

@helpermethod

This works as expected and is covered in the documentation: function rules are matched in their order of specification, whereas string rules are sorted by decreasing regular expression length (longest first).

This is why, in your first example, "this" is matched by t_ANYTHING: its regular expression is longer than that of t_THIS, so it is tried first.

In the second example, t_THIS(t) is defined before t_ANYTHING(t), so it matches first.
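
A minimal sketch of the second variant (same assumptions as the sketch above: PLY installed, t_ignore/t_error and the input string added for self-containment) showing the function rules being tried in definition order:

import ply.lex as lex

tokens = (
    'THIS',
    'ANYTHING'
)

# Function rules are added to the master regex in the order they are
# defined, so t_THIS is tried before t_ANYTHING.
def t_THIS(t):
    r'this'
    return t

def t_ANYTHING(t):
    r'[a-z]+'
    return t

t_ignore = ' \t\n'

def t_error(t):
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input("this that other")
for tok in lexer:
    print(tok.type, tok.value)  # THIS this, ANYTHING that, ANYTHING other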

dabeaz closed this as completed Apr 16, 2015