Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Add System.Text.RegularExpressions.ValueMatch.GroupName #110383

Closed
alrz opened this issue Dec 4, 2024 · 10 comments
Closed

[API Proposal]: Add System.Text.RegularExpressions.ValueMatch.GroupName #110383

alrz opened this issue Dec 4, 2024 · 10 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Text.RegularExpressions needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration

Comments

@alrz
Copy link
Member

alrz commented Dec 4, 2024

Background and motivation

Currently EnumerateMatches returns the match bounds but it doesn't expose any info about the capture group that the match comes from.

API Proposal

namespace System.Text.RegularExpressions;

public readonly ref struct ValueMatch
{
+    public string? GroupName { get; }
}

API Usage

foreach (var m in Regex.EnumerateMatches(input, @"(?<n>\d+)|(?<w>\w+)", ExplicitCatprues))  {
  if (m.GroupName == "w") {
  }
}

Alternative Designs

No response

Risks

No response

@alrz alrz added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Dec 4, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Dec 4, 2024
@stephentoub
Copy link
Member

I don't understand the proposed semantics here. The match being reported is for the whole expression, not a part of it.

Seems like what you're really after is all of the capture group information. That's covered by #73223.

@stephentoub stephentoub added the needs-author-action An issue or pull request that requires more info or actions from the author. label Dec 5, 2024
@steveharter steveharter removed the untriaged New issue has not been triaged by the area owner label Dec 5, 2024
@alrz
Copy link
Member Author

alrz commented Dec 6, 2024

I don't understand the proposed semantics here. The match being reported is for the whole expression, not a part of it.

Seems like what you're really after is all of the capture group information. That's covered by #73223.

In #73223 how would you know which group is matched?

string text = "One car red car blue car";
string pat = @"(\w+)\s+(car)"; 
Regex r = new Regex(pat, RegexOptions.IgnoreCase);
Match m = r.Match(text);
ReadOnlySpan<char> word = m.GetGroupValueSpan(1); // matches "One"

e.g. when there's multiple groups like the example mentioned: "(?<n>\d+)|(?<w>\w+)"

The current Matches method also doesn't give out this information and you should brute force each group to see there's a match.

@dotnet-policy-service dotnet-policy-service bot added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Dec 6, 2024
@alrz
Copy link
Member Author

alrz commented Dec 6, 2024

That's exactly what I mean. You can lookup groups by name and index if they are defined in a sequence like the example in the docs, but when each match comes from disjuctive groups, it would help if you know the matched group name/index in advance, otherwise you will need to call GetGroupValueSpan for each name and skip if its unmached.

@stephentoub
Copy link
Member

And when multiple groups capture, this property doesn't make sense.

@alrz
Copy link
Member Author

alrz commented Dec 6, 2024

I might be looking at it in a wrong way, what I'm trying to do is to make a (substring)->(group name) mapping given a regex in the form of (?<n>\d+)|(?<w>\w+) via EnumerableMatches, could this be done efficiently using current APIs or #73223?

@stephentoub
Copy link
Member

I might be looking at it in a wrong way, what I'm trying to do is to make a (substring)->(group name) mapping given a regex in the form of (?<n>\d+)|(?<w>\w+) via EnumerableMatches, could this be done efficiently using current APIs or #73223?

Today with Match/Matches, that's just a simple check per group:

using System.Text.RegularExpressions;

Match m = Regex.Match("abc", @"(?<n>\d+)|(?<w>\w+)");
if (m.Success)
{
    if (m.Groups["n"].Success)
        Console.WriteLine("matched n");
   
    if (m.Groups["w"].Success)
        Console.WriteLine("matched w");
}

#73223 is about coming up with a good way to represent this in a safe manner for the non-allocating EnumerateMatches.

If a GroupName existed with semantics of "return me the name of the first group that had a match", that's effectively what its implementation would be, just walking through the groups looking for one that was successful. It's not adding anything meaningful on top of what's already possible for Match, and its semantics would be really strange... it only makes sense in a world where only one capture group can possibly match, which is the vast minority use of capture groups.

@alrz
Copy link
Member Author

alrz commented Dec 7, 2024

Would you imagine an API like this could be implemented more efficiently?

namespace System.Text.RegularExpressions;

public readonly ref struct ValueMatch
{
     // only returns successful matches 
+    public ValueGroupEnumerator EnumerateGroups();
}

A hypothetical ValueGroup type would include index/name next to the value/bounds. I think this would be essentially an alternative to #73223.

@stephentoub
Copy link
Member

The API originally proposed in #73223 is not one we'd ship. That issue is just tracking the general desire to ship an allocation-free way of enumerating matches with full detail about captures. One possible way of doing so would be an enumerator like you allude to, though usability would be an issue with that design; there are lots of uses where folks pull out multiple named capture groups from a match, and having to enumerate to find them would be less than ideal. I suggest switching over to that issue to contribute to its design.

@stephentoub
Copy link
Member

Closing as a duplicate of #73223. Thanks.

@stephentoub stephentoub closed this as not planned Won't fix, can't repro, duplicate, stale Dec 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Text.RegularExpressions needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration
Projects
None yet
Development

No branches or pull requests

3 participants