Add a match-iterator API #685

@NWilson

The largest chunk of detailed code in pcre2demo.c explains how to iterate over matches, which is non-trivial. The caller needs to try again starting at the end of the previous match (in case \G does the bump-along). After an empty match, it also needs to try again at the same position with "not-empty-at-start". If that doesn't match, it has to do a manual bump-along and search from the next character (skipping \r\n as a single sequence, if appropriate).
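To make the steps above concrete, here is a minimal, self-contained sketch of that loop. It does not call PCRE2: `toy_match` is a hypothetical stand-in for `pcre2_match()` that hard-codes the pattern `a*`, and its two flags play the roles of `PCRE2_NOTEMPTY_ATSTART` and `PCRE2_ANCHORED`. The name `find_all` is also made up for illustration, and the sketch ignores UTF-8 (a real caller must not bump into the middle of a multi-byte character).

```c
#include <stddef.h>

/* Hypothetical stand-in for pcre2_match() with the fixed pattern "a*".
   Returns 1 and fills *ms / *me (match start / end) on success, else 0. */
static int toy_match(const char *s, size_t len, size_t start,
                     int notempty_atstart, int anchored,
                     size_t *ms, size_t *me)
{
    for (size_t p = start; p <= len; p++) {
        size_t e = p;
        while (e < len && s[e] == 'a') e++;
        /* Reject only an empty match located exactly at the start offset. */
        if (!(notempty_atstart && e == p && p == start)) {
            *ms = p;
            *me = e;
            return 1;
        }
        if (anchored) break;   /* anchored: never try later positions */
    }
    return 0;
}

/* Collects up to max matches as [start, end) pairs and returns the count. */
static size_t find_all(const char *s, size_t len, int crlf_is_newline,
                       size_t (*out)[2], size_t max)
{
    size_t n = 0, start = 0, ms, me;
    int retry_nonempty = 0;    /* set after an empty match */

    while (n < max) {
        if (toy_match(s, len, start, retry_nonempty, retry_nonempty, &ms, &me)) {
            out[n][0] = ms;
            out[n][1] = me;
            n++;
            if (ms == me && me == len) break;  /* empty match at end: done */
            start = me;                        /* resume at end of this match */
            retry_nonempty = (ms == me);
            continue;
        }
        if (!retry_nonempty || start >= len) break;  /* ordinary failure */
        /* The anchored, not-empty retry failed: bump along by one character,
           treating CRLF as a single sequence when it is a newline. */
        if (crlf_is_newline && start + 1 < len &&
            s[start] == '\r' && s[start + 1] == '\n')
            start += 2;
        else
            start += 1;
        retry_nonempty = 0;
    }
    return n;
}
```

For the subject "baac" this yields the four matches (0,0), (1,3), (3,3) and (4,4). Even in this stripped-down form the caller has to track three pieces of state (the search offset, whether the previous match was empty, and the newline convention), which is the bookkeeping an iterator object could hide.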

This logic is complex, and it's repeated in pcre2_substitute.

It may even be wrong, because the \G handling differs from Perl. In Perl, \G always means "end of previous match", but when the caller does a manual bump-along between repeated matches, PCRE2 gives it the behaviour of "start of current match process". We should at least offer an API to set a field on the match context, so that callers can specify a position for \G that is distinct from the start offset used to begin the search for a match.

Let's offer a new API, "pcre2_match_iterator_create", which returns an object that holds this state. Users can then just call "pcre2_match_iterator_next" to step through all the matches. We'd use it ourselves in pcre2_substitute, and we'd also be able to make the demo code simpler and shorter.
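As a rough illustration of the shape such an iterator could take, here is a self-contained sketch. Nothing in it is PCRE2 API: the `demo_` names, the struct layout and the callback signature are all assumptions made for this example, and `demo_toy_match` is a hypothetical matcher hard-coding the pattern `(?=b)|\Gc`. The point it tries to show is that the iterator keeps the end of the previous match (`last_end`, passed to the matcher as the \G position) separate from the offset where the next search starts. CRLF and UTF-8 handling are left out for brevity.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical matcher callback. g_pos is where \G should assert;
   nonempty_anchored requests the "not-empty-at-start, anchored" retry. */
typedef int (*demo_match_fn)(const char *s, size_t len, size_t start,
                             size_t g_pos, int nonempty_anchored,
                             size_t *ms, size_t *me);

typedef struct {
    demo_match_fn fn;
    const char *s;
    size_t len;
    size_t start;     /* where the next search begins */
    size_t last_end;  /* end of the previous match: Perl's meaning of \G */
    int prev_empty;   /* previous match was empty */
    int done;
} demo_match_iter;

static demo_match_iter *demo_match_iter_create(demo_match_fn fn,
                                               const char *s, size_t len)
{
    demo_match_iter *it = calloc(1, sizeof *it);
    if (it) { it->fn = fn; it->s = s; it->len = len; }
    return it;   /* caller releases it with free() */
}

/* Returns 1 and fills *ms / *me for the next match, or 0 when exhausted. */
static int demo_match_iter_next(demo_match_iter *it, size_t *ms, size_t *me)
{
    while (!it->done) {
        if (it->fn(it->s, it->len, it->start, it->last_end,
                   it->prev_empty, ms, me)) {
            it->start = it->last_end = *me;
            it->prev_empty = (*ms == *me);
            if (it->prev_empty && *me == it->len) it->done = 1;
            return 1;
        }
        if (!it->prev_empty || it->start >= it->len) { it->done = 1; break; }
        it->start++;          /* manual bump-along; last_end stays put */
        it->prev_empty = 0;
    }
    return 0;
}

/* Hypothetical toy matcher for the pattern (?=b)|\Gc */
static int demo_toy_match(const char *s, size_t len, size_t start,
                          size_t g_pos, int nonempty_anchored,
                          size_t *ms, size_t *me)
{
    for (size_t p = start; p <= len; p++) {
        /* (?=b): empty match, rejected at the start offset during a retry */
        if (p < len && s[p] == 'b' && !(nonempty_anchored && p == start)) {
            *ms = *me = p;
            return 1;
        }
        /* \Gc: only allowed at the \G position, not at the search start */
        if (p == g_pos && p < len && s[p] == 'c') {
            *ms = p;
            *me = p + 1;
            return 1;
        }
        if (nonempty_anchored) break;
    }
    return 0;
}
```

On the subject "bc" this iterator reports a single empty match at offset 0. After the bump-along the search starts at offset 1 while \G still refers to offset 0, so `\Gc` does not match. If the \G position were tied to the search start instead, the same pattern would also match "c" at offset 1, which is the divergence from Perl described above.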


Labels: enhancement
