Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RegExp.firstMatch should have a start argument #26815

Open
Hixie opened this issue Jul 1, 2016 · 10 comments
Open

RegExp.firstMatch should have a start argument #26815

Hixie opened this issue Jul 1, 2016 · 10 comments
Labels
area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. core-2 library-core type-enhancement A request for a change that isn't a bug

Comments

@Hixie
Copy link
Contributor

Hixie commented Jul 1, 2016

I want to try a bunch of patterns from one start index, then try them all again from the end position of the first match, if there was a match, until I get to the point where there are no matches.

The only way to do this right now is to use allMatches with a start index and then see if the result has any values and if so return the first one. This is rather awkward.

It would be cleaner if I could just use "firstMatch" with a start index. (For completeness, hasMatch and matchString also should have a start argument.)

Note that "matchAsPrefix" has a start index argument, but that doesn't help because it implies an anchored match at the start of the pattern. This would only work if the patterns all started with .*?, which they don't.

@lrhn
Copy link
Member

lrhn commented Jul 2, 2016

All methods taking a string should also take start and end parameters to allow the function to work on a substring. It seems to me that that's the way they are all turning out after a while anyway.

Alternatively we should make it cheap to create a substring, so you wouldn't worry about doing it and passing the substring to the function.

I think the latter will give us better APIs in the long run.

@lrhn
Copy link
Member

lrhn commented Jul 2, 2016

Are you sure you don't want to use matchAsPrefix repeatedly at increasing start positions?

If you try your patterns until you find a match, and the matching is allowed to search, you won't necessarily find the earliest match in the string first.

E.g., if you search for the patterns "a" and "bb" in the string "cbbbbacbbba" by trying the patterns in order until you find a match, then you will find two as and no bbs.
If you use matchAsPrefix with both patterns first at first position 0, then position 1, etc., until you find a match, then you will find the earliest match in the string.

@Hixie
Copy link
Contributor Author

Hixie commented Jul 2, 2016

I ended up doing something quite different (I use allMatches for all the patterns and then verify there are no overlapping matches). I agree that the algorithm I initially described is bogus. I still think it's weird that you can't say "find the first match for this pattern after this point", though. :-)

@lrhn
Copy link
Member

lrhn commented Jul 2, 2016

You can use use allMatches(pattern, start).first. The implementation of allMatches should be lazy so it only finds the first match. Less convenient, as you write, but doable.

@Hixie
Copy link
Contributor Author

Hixie commented Jul 4, 2016

Sure but then why do we have firstMatch at all?

@lrhn
Copy link
Member

lrhn commented Jul 4, 2016

Good point. We only have it on RegExp, not on Pattern (Pattern is a small interface because String has to implement it too), so it should be possible to add a start position.
Or, even better, add a Match firstMatch(Pattern p, [int start]) on String (which is actually easier because nobody else is implementing the String type, so we won't break them by changing the interface).

@floitschG floitschG added area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. library-core labels Jul 4, 2016
@lrhn lrhn added the type-enhancement A request for a change that isn't a bug label Aug 25, 2016
@floitschG floitschG added core-2 and removed core-m labels Aug 21, 2017
@Hixie
Copy link
Contributor Author

Hixie commented Oct 20, 2017

We should move these APIs from using an index to indicate the start to using an iterator or whatever solution we come up with for #28404.

@jonpittock
Copy link

Just bumped into this. The suggestions dont account for empty matches. So the code ends up looking like:

    final Iterable<RegExpMatch> iterable = pattern.allMatches(source, start);
    final RegExpMatch? match = iterable.isEmpty ? null : iterable.first;

Which is pretty gnarly.

@lrhn
Copy link
Member

lrhn commented May 1, 2024

As usual, adding an extra parameter to the RegExp interface is a breaking change, in case someone implements the interface.

Can use something like:

extension PatternFirstMatch on Pattern {
  Match? firstMatch(String source, [int start = 0]) =>
      pattern.allMatches(source, start).firstOrNull;
}

This adds a firstMatch to all patterns, except RegExp which already has one, but you can call it on the RegExp anyway by upcasting to Pattern, (re as Pattern).firstMatch(string, start).

(Would still be nice to update the RegExp method, it's just so much harder because it's breaking. I'm really sorry we couldn't make RegExp a final class, it should have been one.)

@jonpittock
Copy link

Oh no apologies required! Just useful to share fully fledged solutions for future travellers! I had not thought of using firstOrNull either so helpful thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. core-2 library-core type-enhancement A request for a change that isn't a bug
Projects
None yet
Development

No branches or pull requests

5 participants