@zherczeg
Collaborator

@zherczeg zherczeg commented Dec 9, 2024

Advantage: code size reduction and code simplification.
Disadvantage: \P{Any} is converted to [], which uses more space. I think \P{Any} is very rarely used. (*F) is better and shorter. If empty brackets are enabled, [] is shorter still.
Note: we already convert many empty constructs to []; from now on \P{Any} simply follows them (consistency).
Note 2: since \P{Any} uses less space, every [] could be converted to \P{Any} instead, but that requires Unicode support. In my opinion, empty classes are not worth much optimization.

  case ESC_P:
  ptr++;
- if (meta_arg == ESC_p && *ptr == PT_ANY)
+ if (meta_arg == ESC_p && (*ptr >> 16) == PT_ANY)
Collaborator Author

This was a bug in master.
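
The fix reads more easily with the packing in mind: the shift by 16 implies that the word following the escape carries the property type in its upper 16 bits, so comparing the whole word against the type is only correct by coincidence. Below is a minimal sketch of that idea. The `DEMO_` constants and `demo_` helpers are hypothetical names with illustrative values, not PCRE2's real definitions, and the lower half holding property data is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values for illustration only; not PCRE2's real constants. */
#define DEMO_PT_ANY 1u
#define DEMO_PT_GC  2u

/* Pack a property type and its data into one 32-bit word:
   type in the upper 16 bits, data in the lower 16 bits (assumed layout). */
static uint32_t demo_pack_prop(uint32_t ptype, uint32_t pdata)
{
  assert(ptype <= 0xffffu && pdata <= 0xffffu);
  return (ptype << 16) | pdata;
}

/* Old check: compares the whole packed word with the type. */
static int demo_is_any_old(uint32_t word)
{
  return word == DEMO_PT_ANY;
}

/* Fixed check: extracts the type from the upper half before comparing. */
static int demo_is_any_new(uint32_t word)
{
  return (word >> 16) == DEMO_PT_ANY;
}
```

With these illustrative values, the old check misses a genuinely packed "Any" word and can also fire on an unrelated word whose data half happens to equal the type constant; the shifted check avoids both.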

#
AllAny
#
[c]
Collaborator Author

@zherczeg zherczeg Dec 9, 2024

This is not caught by the [c] -> c optimization, but I don't think it is worth making that optimization more complex. It should be kept as a test, though, to detect whether this behavior changes.

Member

That seems fair. I also deliberately skipped those optimisations for things like (?[ [ab] - [b] ]) because I felt they were just too obscure.

Member

@NWilson NWilson left a comment

Nice. I am happy with using empty OP_CLASS (with zero bits). If we care about the wasted bytes, we can always add an "OP_NONE" in just a few lines of code - but there's no real need.
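
To make the "zero bits" and "wasted bytes" remarks concrete: a class over 8-bit character values can be held as a 256-bit map (32 bytes), and an empty class is that map with every bit clear. The sketch below assumes that representation; the `demo_` names are hypothetical and do not correspond to PCRE2 internals.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_CLASS_BYTES 32  /* 256 bits, one per 8-bit character value */

typedef struct {
  uint8_t bits[DEMO_CLASS_BYTES];
} demo_class;

/* Reset the class so that it contains no characters at all. */
static void demo_class_clear(demo_class *c)
{
  memset(c->bits, 0, sizeof c->bits);
}

/* Add one character value to the class. */
static void demo_class_add(demo_class *c, uint8_t ch)
{
  c->bits[ch >> 3] |= (uint8_t)(1u << (ch & 7));
}

/* Return 1 if the character value is in the class, 0 otherwise. */
static int demo_class_match(const demo_class *c, uint8_t ch)
{
  return (c->bits[ch >> 3] >> (ch & 7)) & 1;
}

/* Return 1 if no character can match, i.e. the class always fails. */
static int demo_class_is_empty(const demo_class *c)
{
  for (size_t i = 0; i < DEMO_CLASS_BYTES; i++)
    if (c->bits[i] != 0) return 0;
  return 1;
}
```

An all-zero map still occupies the full 32 bytes even though it can never match, which is the space cost being weighed against a dedicated opcode.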

Comment on lines 8055 to 8056
/* The special case of \p{Any} is compiled to OP_ALLANY so as to benefit
from the auto-anchoring code. */
Member

Could update the comment for \P{Any} → OP_CLASS too...
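
The two special cases discussed in this thread can be summarised as a small dispatch: \p{Any} becomes the "match anything" opcode, \P{Any} becomes an empty class, and every other property keeps a regular property opcode. This is only a sketch of that decision; the enum values, `DEMO_TYPE_ANY`, and `demo_choose_opcode` are hypothetical names, not the actual compile code.

```c
/* Hypothetical stand-ins for the opcodes discussed in the review. */
typedef enum {
  DEMO_OP_PROP,        /* ordinary \p{...} */
  DEMO_OP_NOTPROP,     /* ordinary \P{...} */
  DEMO_OP_ALLANY,      /* \p{Any}: matches any character */
  DEMO_OP_EMPTY_CLASS  /* \P{Any}: class with zero bits, never matches */
} demo_opcode;

typedef enum { DEMO_ESC_p, DEMO_ESC_P } demo_escape;

/* Illustrative value only; not PCRE2's real property type constant. */
#define DEMO_TYPE_ANY 0u

/* Choose the opcode for a property escape, special-casing "Any". */
static demo_opcode demo_choose_opcode(demo_escape esc, unsigned ptype)
{
  if (ptype == DEMO_TYPE_ANY)
    return (esc == DEMO_ESC_p) ? DEMO_OP_ALLANY : DEMO_OP_EMPTY_CLASS;
  return (esc == DEMO_ESC_p) ? DEMO_OP_PROP : DEMO_OP_NOTPROP;
}
```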

@zherczeg
Collaborator Author

The patch is updated, and more tests have been added.

Introducing an OP_NONE is not a trivial task, since it needs specialized code paths in the normal, DFA, and JIT matchers. I don't see any reasonable use for empty classes, so I think they are very rarely used and not worth any specialization in the matchers. (*F) is always a better option.

@zherczeg zherczeg merged commit 0d2c59d into PCRE2Project:master Dec 10, 2024
21 checks passed
@zherczeg zherczeg deleted the remove_any branch December 10, 2024 10:43