Description
The largest chunk of detailed code in pcre2demo.c explains how to iterate over all the matches in a subject. This is non-trivial. The caller must restart the search at the end of the previous match (so that a \G in the pattern anchors at that point). If the previous match was empty, it must instead retry at the same position with PCRE2_NOTEMPTY_ATSTART (together with PCRE2_ANCHORED). If that doesn't match either, it must do a manual bump-along and search again from the next character, treating \r\n as a single unit when the newline convention calls for it.
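The manual bump-along step can be isolated as a small pure function. This is a sketch modelled on the corresponding code in pcre2demo.c. The name `bump_along` and its parameters are hypothetical, and it assumes an 8-bit subject (UTF-8 when `utf8` is set). It returns the offset at which the next, unanchored search should begin.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: advance one character past `offset` after an
   anchored, non-empty retry has failed. "\r\n" is skipped as one unit
   when CRLF is a valid newline; in UTF-8 mode, continuation bytes are
   skipped so the result lands on a character boundary. */
static size_t bump_along(const unsigned char *subject, size_t length,
                         size_t offset, bool crlf_is_newline, bool utf8)
{
    size_t next = offset + 1;

    if (crlf_is_newline && offset + 1 < length &&
        subject[offset] == '\r' && subject[offset + 1] == '\n') {
        next += 1;                      /* step over the whole \r\n pair */
    } else if (utf8) {
        while (next < length && (subject[next] & 0xc0) == 0x80)
            next += 1;                  /* skip UTF-8 continuation bytes */
    }
    return next;
}
```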
This logic is complex, and it's repeated in pcre2_substitute.
It may even be wrong, because PCRE2's \G handling differs from Perl's. In Perl, \G always means "end of the previous match". In PCRE2, \G means "the start offset passed to this match call", so after a manual bump-along during repeated matching it matches at the bumped position rather than at the end of the previous match. We should at least offer an API to set a field on the match context, so that callers can specify a location for \G that is distinct from the start offset used to begin the search for a match.
Let's offer a new API, "pcre2_match_iterator_create", which returns an object that holds this state. Users can then just call "pcre2_match_iterator_next" to step through all the matches. We'd use it ourselves in pcre2_substitute, and it would also make the demo code simpler and shorter.
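A minimal sketch of what such an iterator might hold and do. To keep it self-contained and testable, the matcher is abstracted behind a callback, and `OPT_NOTEMPTY_ATSTART` / `OPT_ANCHORED` stand in for PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED. In a real implementation, the callback would be pcre2_match(). The names `match_iterator`, `match_iterator_init`, `match_iterator_next` and the toy `match_a_star` matcher are all hypothetical. The sketch also ignores the case pcre2demo.c handles where \K in a lookahead leaves the match start beyond its end.

```c
#include <stdbool.h>
#include <stddef.h>

enum { OPT_NOTEMPTY_ATSTART = 1u, OPT_ANCHORED = 2u };

/* Hypothetical matcher callback: search from start_offset; on success,
   store the match bounds and return true. */
typedef bool (*match_fn)(const unsigned char *subject, size_t length,
                         size_t start_offset, unsigned options,
                         size_t *match_start, size_t *match_end);

typedef struct {
    match_fn match;
    const unsigned char *subject;
    size_t length;
    size_t start, end;          /* bounds of the previous match */
    bool started, finished;
    bool crlf_is_newline, utf8;
} match_iterator;

static void match_iterator_init(match_iterator *it, match_fn match,
                                const unsigned char *subject, size_t length,
                                bool crlf_is_newline, bool utf8)
{
    *it = (match_iterator){ .match = match, .subject = subject,
                            .length = length,
                            .crlf_is_newline = crlf_is_newline,
                            .utf8 = utf8 };
}

/* Store the next match in *ms/*me and return true, or return false
   once the subject is exhausted. */
static bool match_iterator_next(match_iterator *it, size_t *ms, size_t *me)
{
    if (it->finished) return false;

    size_t offset = it->started ? it->end : 0;
    unsigned options = 0;

    if (it->started && it->start == it->end) {
        if (offset >= it->length) { it->finished = true; return false; }
        /* Previous match was empty: first insist on a non-empty match
           anchored at the same position. */
        options = OPT_NOTEMPTY_ATSTART | OPT_ANCHORED;
    }

    for (;;) {
        if (it->match(it->subject, it->length, offset, options, ms, me)) {
            it->started = true;
            it->start = *ms;
            it->end = *me;
            return true;
        }
        if (options == 0) { it->finished = true; return false; }

        /* Retry failed: bump along one character (CRLF as one unit,
           whole UTF-8 characters), then search normally. */
        size_t next = offset + 1;
        if (it->crlf_is_newline && offset + 1 < it->length &&
            it->subject[offset] == '\r' && it->subject[offset + 1] == '\n')
            next += 1;
        else if (it->utf8)
            while (next < it->length && (it->subject[next] & 0xc0) == 0x80)
                next += 1;
        offset = next;
        options = 0;
    }
}

/* Toy matcher for the greedy pattern a*, used only to exercise the
   iterator; it honours the two option bits as described above. */
static bool match_a_star(const unsigned char *subject, size_t length,
                         size_t start_offset, unsigned options,
                         size_t *match_start, size_t *match_end)
{
    for (size_t p = start_offset; p <= length; p++) {
        size_t e = p;
        while (e < length && subject[e] == 'a') e++;
        bool rejected = (options & OPT_NOTEMPTY_ATSTART) &&
                        p == start_offset && e == p;
        if (!rejected) { *match_start = p; *match_end = e; return true; }
        if (options & OPT_ANCHORED) break;
    }
    return false;
}
```

With this shape, the caller's loop reduces to `while (match_iterator_next(&it, &s, &e)) { ... }`. For the toy a* matcher on the subject "baab", the sketch yields the matches (0,0), (1,3), (3,3) and (4,4).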