"global flags not at the start of the expression" error when using an in-line global regex flag with Python 3.11+ #282

Closed
BobbyRBruce opened this issue Jan 23, 2023 · 1 comment

@BobbyRBruce

We have this in our code:

    def t_STRLIT(self, t):
        r'(?m)"([^"])*"'
        # strip off quotes
        t.value = t.value[1:-1]
        t.lexer.lineno += t.value.count('\n')
        return t

This code works fine with Python versions before 3.11. With Python 3.11, however, we get the following error:

    Invalid regular expression for rule 't_STRLIT'. global flags not at the start of the expression at position 13
    SyntaxError: Can't build lexer

As of Python 3.11, an in-line global regex flag can only appear at the very start of an expression (e.g., (?m) in this case must come first in the regex); Python 3.6 to 3.10 only emitted a DeprecationWarning for this. We require (?m) in this expression so that MULTILINE is enabled for it. The issue manifests in this line of lex.py's validate_rules function:

    c = re.compile('(?P<%s>%s)' % (fname, _get_regex(f)), self.reflags)

As can be seen, the rule's regex string is inserted after ?P<%s>, inside a group, so the (?m) is no longer at the start of the expression. That is what triggers the error.
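A minimal reproduction with only the standard `re` module, mirroring the `'(?P<%s>%s)'` wrapping from `validate_rules`. The helper `compile_error` is hypothetical, written just for this sketch:

```python
import re
import warnings

# The rule regex from the report, with its global (?m) flag at the front.
RULE = r'(?m)"([^"])*"'

# validate_rules wraps each rule as (?P<name>regex) before compiling, which
# pushes (?m) to position 13, inside the named group.
WRAPPED = '(?P<%s>%s)' % ('t_STRLIT', RULE)


def compile_error(pattern):
    """Return the re.error message for pattern, or None if it compiles."""
    with warnings.catch_warnings():
        # Python 3.6-3.10 only emit a DeprecationWarning for this case.
        warnings.simplefilter('ignore', DeprecationWarning)
        try:
            re.compile(pattern)
        except re.error as exc:
            return str(exc)
    return None


print(compile_error(RULE))     # None: the flag is at the very start
print(compile_error(WRAPPED))  # 3.11+: global flags not at the start of the expression at position 13
```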

I cannot find a good solution to this problem. I'm aware that I can set the regex flags for lex (via the reflags parameter), but my understanding is that these apply to all expressions. I need the flags to be set on a per-expression basis (i.e., t_STRLIT should have MULTILINE enabled, but no other rule should).
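For what it's worth, `re.MULTILINE` only changes what `^` and `$` match; it does not control whether a character class matches newlines. The rule as quoted has no anchors, and `[^"]` already matches `\n`, so a quick check like the one below suggests that dropping `(?m)` would leave this particular `t_STRLIT` regex matching the same text (assuming the rule is exactly as shown above):

```python
import re

WITH_FLAG = re.compile(r'(?m)"([^"])*"')
WITHOUT_FLAG = re.compile(r'"([^"])*"')

# A string literal that spans two lines, followed by more input.
SAMPLE = '"first line\nsecond line" rest'

# [^"] matches '\n' with or without MULTILINE, so both rules consume the same text.
print(WITH_FLAG.match(SAMPLE).group(0) == WITHOUT_FLAG.match(SAMPLE).group(0))  # True

# MULTILINE only changes what ^ and $ match.
print(re.search(r'^second', 'first\nsecond'))                            # None
print(re.search(r'^second', 'first\nsecond', re.MULTILINE) is not None)  # True
```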

@dabeaz
Owner

dabeaz commented Jan 24, 2023

I don't have any advice on fixing this problem other than a change to the regex itself.
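One regex-only change that seems to fit this suggestion is a scoped in-line flag, `(?m:...)`, available since Python 3.6. Unlike a bare `(?m)`, a scoped flag group may appear anywhere in a pattern and only affects the text inside it, so it should survive PLY's wrapping and stay local to the one rule. A sketch, assuming the rule's docstring is changed to the hypothetical `SCOPED_RULE` below:

```python
import re

# Hypothetical rewrite of the t_STRLIT regex: (?m:...) scopes MULTILINE to
# this group instead of using a global (?m) prefix (Python 3.6+).
SCOPED_RULE = r'(?m:"([^"])*")'

# Wrapped the same way validate_rules does; a scoped flag group is allowed
# anywhere in a pattern, so this compiles on 3.11+ as well.
wrapped = re.compile('(?P<%s>%s)' % ('t_STRLIT', SCOPED_RULE))

match = wrapped.match('"a\nb" tail')
print(repr(match.group('t_STRLIT')))  # '"a\nb"'
```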

mkjost0 pushed a commit to mkjost0/gem5-actions that referenced this issue Jun 15, 2023
Python 3.11 requires a global flag to be the first token of a regex,
which is not possible when using the ply library. In the fastmodel
case, we don't actually need to support multiline string literals, so
we fix this issue by making the string literal single-line.

Ref: dabeaz/ply#282

Change-Id: I746b628db7ad4c1d7834f1a1b2c1243cef68aa01
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71018
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Bobby Bruce <bbruce@ucdavis.edu>
Reviewed-by: Bobby Bruce <bbruce@ucdavis.edu>
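The gem5 change itself isn't reproduced here. As a hypothetical illustration of what a single-line string literal rule can look like, excluding `\n` from the character class stops a literal from spanning lines, and no `(?m)` is involved:

```python
import re

# Hypothetical single-line string literal rule: no (?m) needed, and
# excluding '\n' from the class stops a literal from spanning lines.
SINGLE_LINE_STRLIT = re.compile(r'"[^"\n]*"')

print(SINGLE_LINE_STRLIT.match('"hello" tail').group(0))  # "hello"
print(SINGLE_LINE_STRLIT.match('"hello\nworld"'))         # None
```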
mkjost0 pushed a commit to mkjost0/gem5-actions that referenced this issue Jun 15, 2023
Python 3.11 requires a global flag to be the first token of a regex,
which is not possible when using the ply library. Instead, we make all
rules multiline regexes by default and modify the rules that must stay
single-line.

Ref: dabeaz/ply#282

Change-Id: I7bdbfeb97a9dd74f45c1890a76f8cc16100e5a42
Reviewed-on: https://gem5-review.googlesource.com/c/public/gem5/+/71019
Reviewed-by: Richard Cooper <richard.cooper@arm.com>
Tested-by: kokoro <noreply+kokoro@google.com>
Maintainer: Jason Lowe-Power <power.jg@gmail.com>
Reviewed-by: Jason Lowe-Power <power.jg@gmail.com>
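How that change edits the single-line rules isn't shown here. Assuming MULTILINE is applied lexer-wide (for PLY, presumably through the reflags parameter mentioned in the report), one regex-level way to opt a single rule back out is a scoped `(?-m:...)` group. The patterns below are hypothetical and use plain `re`:

```python
import re

# Both patterns get MULTILINE as a compile-time flag, standing in for a
# lexer-wide setting; the second opts back out with a scoped (?-m:...) group.
MULTI = re.compile(r'^end$', re.MULTILINE)
OPTED_OUT = re.compile(r'(?-m:^end$)', re.MULTILINE)

TEXT = 'start\nend\nmore'

print(MULTI.search(TEXT) is not None)      # True: ^ and $ also match at line breaks
print(OPTED_OUT.search(TEXT) is not None)  # False: ^ only matches at the start of TEXT
```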
linedot pushed a commit to linedot/gem5 that referenced this issue Jul 15, 2023
linedot pushed a commit to linedot/gem5 that referenced this issue Jul 15, 2023
BobbyRBruce pushed a commit to BobbyRBruce/gem5 that referenced this issue Jul 19, 2023
BobbyRBruce pushed a commit to BobbyRBruce/gem5 that referenced this issue Jul 19, 2023
phoe added a commit to phoe-trash/ply that referenced this issue Sep 5, 2023
Fixes dabeaz#282

It should be possible to rewrite individual regular expressions, near
the place where they're verified and compiled into one master
expression, so that all global flags are collected at the start of
the final expression; in particular, before the `(?P<%s>%s)`
mentioned in dabeaz#282. This should work, since multiple global flag groups
like `(?i)(?m)` are valid in Python.
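This proposal can be sketched with the standard `re` module alone. The helpers `split_global_flags` and `build_master` and the extra `t_NAME` rule below are hypothetical, not PLY's code; they strip leading `(?flags)` groups from each rule and emit them once at the front of the combined pattern:

```python
import re

# Matches one leading global flag group such as (?m) or (?im), but not
# scoped groups like (?m:...) or named groups like (?P<...>).
_LEADING_FLAGS = re.compile(r'\(\?([aiLmsux]+)\)')


def split_global_flags(regex):
    """Strip leading (?flags) groups; return (set_of_letters, remainder)."""
    letters = set()
    match = _LEADING_FLAGS.match(regex)
    while match:
        letters.update(match.group(1))
        regex = regex[match.end():]
        match = _LEADING_FLAGS.match(regex)
    return letters, regex


def build_master(rules):
    """Combine (name, regex) pairs into one pattern, flags hoisted to the front."""
    all_letters = set()
    parts = []
    for name, regex in rules:
        letters, body = split_global_flags(regex)
        all_letters |= letters
        parts.append('(?P<%s>%s)' % (name, body))
    prefix = '(?%s)' % ''.join(sorted(all_letters)) if all_letters else ''
    return re.compile(prefix + '|'.join(parts))


master = build_master([
    ('t_STRLIT', r'(?m)"([^"])*"'),
    ('t_NAME', r'[A-Za-z_]\w*'),
])
print(master.pattern)  # (?m)(?P<t_STRLIT>"([^"])*")|(?P<t_NAME>[A-Za-z_]\w*)
```

Note that hoisting makes every collected flag apply to all rules in the combined pattern: with `(?m)` hoisted, `^` and `$` in every other rule also become line-based, so this alone does not give the per-rule behaviour asked for in the original report. A hoisted `(?x)` would also change how the other rules are parsed, and the mutually exclusive flags `a`, `L` and `u` could clash. Rewriting a leading `(?m)` into a scoped `(?m:...)` group is an alternative that keeps each flag local.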
@dabeaz dabeaz closed this as completed Mar 31, 2024