-
Notifications
You must be signed in to change notification settings - Fork 235
Description
This does not match:
PCRE2 version 10.42 2022-12-11
re> /(?=.*(.))(?=.*\1.*\1).*(?!\1)(.).*\2/
data> 22331
No match
This matches:
re> /(?=.*(.).*\1)(?=.*\1.*\1).*(?!\1)(.).*\2/
data> 22331
0: 22
1: 3
2: 2
The only difference is an additional constraint which forces the initial lookahead not to pick 1
.
Basically I think PCRE doesn't understand that the initial lookahead is a dependency of subsequent behaviour, and hence the matcher must exhaust its possible matches before declaring a failure. Interestingly, changing the matcher does seem to support backtracking to some extent, as for example the following works:
PCRE2 version 10.42 2022-12-11
re> /(?=.*(.))(?=(.)).*\2(?!.*\1$)/
data> 23321
0: 1
1: 1
2: 1
Weirdly, moving the \2
to the end, which shouldn't affect the match (again, this should in principle actually increase the options available to the matcher as it allows the .*
in the negative lookahead to start with the \2
while in the matching variant, it couldn't) causes this to no longer match.
I assume I'm hitting some kind of backtracking safety mechanism, but if so it'd be preferable for the engine to just refuse to compile such monstrosities, or at least error at runtime when the constraint is hit. Better that than execute them incorrectly.
Also tested on latest master (10.43-DEV 2023-04-14
). Same results.