RegEx: add a way to get the positions of groups #42307
Comments
We are deliberately JavaScript RegExp compatible because Dart is compiled to JavaScript, and the JavaScript RegExp doesn't provide that information. Adding a feature which won't work when compiled to JavaScript requires a very compelling argument. I don't see that happening any time soon. For your use case, you can find the start/end easily because it's (Also, your RegExp can be abbreviated to |
That use case was just an example, in reality I will be getting a lot of regexes from a file with a predefined syntax so I can't just modify them to work differently. Here's a JS library that provides this functionality by extending the RegEx class: http://www--s0-v1.becke.ch/tool/becke-ch--regex--s0-v1/becke-ch--regex--s0-0-v1--homepage--pl--client/ Something like that could be written for Dart too, but I'd imagine it is significantly slower than just getting the information directly from the Regex engine. But perhaps you could use something like that to make it work in dart2js. I think this is a very noticeable omission from the dart stdlib as this functionality is in nearly every modern language apart from JS: C#, Python, Ruby, Java, Rust, Go, Kotlin... in fact I can't find another language besides JS that omits this. Several of those languages also compile to JS, taking varying approaches of either omitting the functionality for the JS target or implementing their own regex engines which support it (usually in WASM). |
This omission is stupid. Even ChatGpt can do this better:
|
This is what I end up doing:
|
JavaScript RegExps now allow access to the capture group indices when using the We can add a similar I'd prefer to refactor |
Now that the core blocker has been removed (JS support), will this work definitely be done? Does it have a priority in the backlog? |
@lrhn do you have a concrete idea of how you would like the API to look?
If somebody does it. We welcome patches as well. This is a fairly straightforward thing to implement because the regexp engine already represents groups internally by their start and end indices. So all the necessary work is around changing Dart code to expose this information. The biggest question is the API design, but I am sure @lrhn can provide a sketch.
To just add this feature, I'd add a way to get a There are several ways to do that. The very-small increment approach would be just adding to
int groupStart(int groupNumber);
int groupEnd(int groupNumber);
That is incredibly simple, and requires no extra allocation, but it's also not a great API.
(int start, int end) groupLimits(int groupNumber);
is not much better, but does introduce an allocation, so it's like the worst of both worlds. I'd prefer to not add yet-another partial way to access a capture. Consider something like
interface class MatchSlice {
final String source;
final int start;
final int end;
String? _slice;
MatchSlice(this.source, this.start, this.end);
String get match => _slice ??= source.substring(start, end);
}
// We can make `Match` implement `MatchSlice`, which will give it the `.match` which is the non-nullable
// version of `[0]` that we're sorely lacking. That is breaking, though, so maybe skip it initially.
abstract interface class Match implements MatchSlice {
///
String get match => this[0]!;
}
// Then add to `RegExpMatch`
final class RegExpMatch implements Match {
// ...
/// The capture groups of this match.
///
/// An unmodifiable list of slices for each capture group of this
/// regular expression which participated in the match.
///
/// The list has length [groupCount] + 1, and has an entry for each
/// capture group of the regular expression, plus an entry for the
/// entire match, treated as capture group zero.
/// The entry for a capture is `null` if the capture did not participate in the
/// entire match.
/// The entry at index zero is always this `RegExpMatch`. The remaining
/// entries are not `RegExpMatch`es, just plain `MatchSlice` objects.
List<MatchSlice> get captures;
/// ... same for named capture groups ...
Map<String, MatchSlice> get namedCaptures;
}
That does mean accessing these captures will involve an allocation per match accessed (can be cached if accessed more than once), and maybe an allocation for Longer-term, I'd like to remove
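For readers who want to try the lazily-sliced idea from the comment above, here is a minimal standalone sketch. It is not part of `dart:core`: the `MatchSlice` class below simply mirrors the shape proposed in the comment, and the sample string and offsets are assumptions chosen for illustration.

```dart
/// Hypothetical standalone version of the proposed `MatchSlice`.
/// It stores only the source string and the offsets, and creates the
/// substring the first time `match` is read.
class MatchSlice {
  final String source;
  final int start;
  final int end;
  String? _slice; // cached substring, filled in on first access

  MatchSlice(this.source, this.start, this.end);

  String get match => _slice ??= source.substring(start, end);
}

void main() {
  // Illustrative values only: 'world' occupies offsets 6..11.
  final slice = MatchSlice('hello world', 6, 11);
  print('${slice.start}-${slice.end}'); // 6-11
  print(slice.match); // world
}
```

The point of the design is that the positions are always available without allocating a string, while the text itself is materialised only on demand.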
When executing a RegExp, groups are returned as a map of Strings. However, for some use cases it is really important to get the index of where these groups are in the input string, rather than just the text.
For example, I have the following code:
This correctly prints 'capture'. However, there is no way to know (at least, in my use case where I am being fed regexps from an external file) whether this group is referring to 'capture' in the source string at span 2-10 or 20-28. In this case the answer is 20-28 and that is what I would like to be able to retrieve.
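The original snippet is not preserved here, so the following is only a sketch of the kind of ambiguity described above. The pattern and input are assumptions made up for illustration; `occurrences` is a hypothetical helper, while `firstMatch`, `group`, `start` and `end` are the existing `RegExpMatch` members.

```dart
/// Hypothetical helper: counts non-overlapping occurrences of [needle].
int occurrences(String haystack, String needle) =>
    needle.allMatches(haystack).length;

void main() {
  // Assumed pattern and input: the captured text appears twice in the source.
  final re = RegExp(r'capture.*(capture)');
  const input = 'a capture and another capture here';

  final match = re.firstMatch(input)!;
  final text = match.group(1)!;
  print(text); // capture

  // Only the bounds of the whole match are exposed, not those of group 1.
  print('${match.start}-${match.end}'); // 2-29

  // The group text occurs more than once, so searching for it in the
  // input cannot tell which occurrence the group actually matched.
  print(occurrences(input, text)); // 2
}
```

Searching for the group's text inside the match works only when that text is unique, which cannot be guaranteed for patterns loaded from an external file.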
In other languages:
In Python you would call
match.span(1)
which would return a tuple of the start and end position. In Dart this could be replaced with an object. In JavaScript this is not supported, meaning this feature would probably not be supported in dart2js.