-
Notifications
You must be signed in to change notification settings - Fork 243
Remove PT_ANY support from matchers #605
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| case ESC_P: | ||
| ptr++; | ||
| if (meta_arg == ESC_p && *ptr == PT_ANY) | ||
| if (meta_arg == ESC_p && (*ptr >> 16) == PT_ANY) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a bug in master.
| # | ||
| AllAny | ||
| # | ||
| [c] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not caught by the [c] -> c optimization, but I think it is not worth to make it more complex. It should be used as a test though, to detect if this behavior changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems fair. I also deliberately skipped those optimisations too for things like (?[ [ab] - [b] ]) because I felt it was just too obscure.
NWilson
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. I am happy with using empty OP_CLASS (with zero bits). If we care about the wasted bytes, we can always add an "OP_NONE" in just a few lines of code - but there's no real need.
src/pcre2_compile.c
Outdated
| /* The special case of \p{Any} is compiled to OP_ALLANY so as to benefit | ||
| from the auto-anchoring code. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could update the comment for \P{Any} → OP_CLASS too...
| # | ||
| AllAny | ||
| # | ||
| [c] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems fair. I also deliberately skipped those optimisations too for things like (?[ [ab] - [b] ]) because I felt it was just too obscure.
|
Patch is updated, and added more tests. Introducing an OP_NONE is not a trivial task, since it needs specialized code paths in the normal/dfa/jit matchers. I don't see any reasonable purpose to use empty classes, so I think they are very rarely used and not worth any specializations in the matchers. The |
Advantage: code size reduction / code simplification.
Disadvantage:
\P{Any}is converted to[]which uses more space. I think\P{Any}is very rarely used.(*F)is better and shorter. If empty brackets are enabled[]is even shorter.Note: we have already convert many empty constructs to
[], from now\P{Any}just follows them (consistency).Note2: since
\P{Any}uses less space, all[]could be converted to\p{Any}, but it requires unicode support. My opinion is that empty classes do not worth much optimizations.