-
-
Notifications
You must be signed in to change notification settings - Fork 701
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow iteration through named subgroups of Capture object #1114
Conversation
Yeah, it's an overlooked piece of API, I kind of been wondering when somebody will notice it ;) Iteration is good but some more straight-forward and direct is needed in addition to convenience. In the end it's a property of Regex that doesn't change on each match so it's the place to put this stuff. BTW it's seems customary now to file an enhancement request in bugzilla and reference it in commit message as "fixes Issue xxxx". |
I agree iteration does not seem to be enough; I cringed as I wrote the code. However, I'm not particularly aquatinted with the D standards on stuff like returning a slice vs exposing a range etc, so I thought iteration would be the least offensive method of exposing this information. You're probably correct that it should be a property of Regex and to be honest, the only reason that I added it to the Captures object is because the name Captures seems more closely related to the concept of named sub-captures. And yes, I guess I will submit myself to the bureaucracy of bugzilla in future (or would it be better to move this discussion there now?) |
The standard is simple - expose an opaque range of the required power unless there is an advantage to expose the concrete type. The key idea is to leave future contributors some room for improvement of internals w/o breaking interface (and user code) e.g. if we expose an array we then give on any of storage tricks inside of Regex. Bugzilla is mainly for history and discussions on bugs/enhancement until there is a pull that address it. Here we discuss implementation of pull so I believe there is no need to move discussion to Bugzilla but an entry for change log is good to have (so the change is announced with new release). |
I see how opaque ranges work now, very elegant. I do have some qualms with moving the named captures range to RegexMatch however:
which is hardly elegant. As changing the existing API is not a good thing to do, I think the best thing to do is to add it to the Captures struct. This is still bad, as it means Captures is becoming significantly more than just a non opaque range, but it's already that way with the opIndex(string) method (imo). |
Rightfully so what I meant was moving the functionality to Regex template not RegexMatch. Then aliases gives pairs of name --> number of match. (keep in mind named matches are an aliases to numberd ones) Then about captures - what if we are to make: auto m = match(foobar, regex(reg));
foreach (c; m) {
auto withName = c.named; //gives range of pairs - (name, slice) for now Forward should do
foreach(pair; withName)
writeln(pair[0]," : ", pair[1]);
} Soounds good? UPDATE: Agree about Captures, they are somewhat more then opaque range. And I mean add both things: aliases to Regex struct itself and .named or .aliased property to Captures. |
What type would you suggest for the pair? Tuple!(string, R)? That does particularly seem to be a candidate for opApply to me, for it is no longer adding core functionality (assuming a namedCaptures property somewhere else) and is simply acting as a short cut. Alternatively, it might be easier to just leave it for the moment and just do one thing at a time. I agree about implementing namedGroups on Regex. |
Yes, that fits the bill. If adventurous enough you may try Tuple!(string, "name", R, "slice") if that works fine. About opApply AFAICT there is auotmatic tuple unpacking (or I may be daydreaming) that mens that if range returns tuple one can do: foreach(first, second; range){ .... } opApply is kinda old, and was devised way before ranges with things like traversal of complex tree-like structures in mind.
Let's settle on this tiny plan: one pull for namedCaptures and (if you are inclined) another one for Captures.named. In a sense the latter it's just a shortcut for filter + map applied to captures. namedCaptures should go as random access range or at least forward+length (even length alone allows for the some powerful shortcuts in std.algorithm). |
That doesn't work in 2.061, but I think there is an open pull request for it. Either way, it can be implemented in a library like this: |
Ok, namedCaptures now seems to implement every range under the sun (well, RandomAccess, Forward, Length, and all other ranges implied by those). I wasn't sure 100% how to document which interfaces it implements, as it's not obvious from the method prototype, so I added a |
void popBack() { assert(!empty); end--; } | ||
string opIndex()(size_t i) | ||
{ | ||
assert(start + i < end, "Request named group is out of range."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: RequestED named group...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mm true. I didn't put any messages for most of the asserts in the class, is that ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor thing, other range are too not very verbose with asserts + the upper two won't ever trigger in user code excpet if it extracts type with typeof(namedCaptures).
Yup and to be true DbC fellow add asserts to in{ ... } contracts.
Thinking a bit about name - maybe make it namedGroups since it didn't capture anything yet as in match ;)
Another minor comment is that I'm surprized that RandomAccess doesn't require slicing ... I thought there was a movement to mandate it, I'll double check std.range. |
Looking good. The last thing is to actually put that example into unittest so that it's ehm tested :) We seem to have automatic examples in 2.062 beta but it goes the other way around - from documented unittest towards auto-generated example sections in docs, see initial pull here: Let's try the new non-verbose way of testing examples in std.regex with this pull and see how it goes. |
Aaaand it looks like we hit a compiler limitation (bug)? Behold the pull auto-tester in it's full glory :) It seems like you can't have unittest inside of Regex that relies on the outside function that relies on Regex being fully analyzed ... Sounds like compiler should be able to anyalze struct before analyzing unittests (at least to me) but it doesn't. In any case the only way out for now seems like duplicate the example in the docs and put unittests outside of Regex (say right after it). Let's do this and finally pull this in, thanks. |
Bump :) |
Yes, you've got to ping reviewers once you've addressed comments :) |
Ping ! See e.g. http://blog.steveklabnik.com/posts/2012-11-08-how-to-squash-commits-in-a-github-pull-request |
OK to close this since #1385 took care of it? |
Yup, #1385 is exactly the same commits squashed into one. |
Currently there is no way to get the names of the named subgroups from a captures/match object (afaict). This patch allows the user to iterate through the captures object and get all named group names and values.
I've no idea if this is the best API, but even if this patch isn't accepted, it'd be really useful to have some way of getting the names of the named subgroups in a Capture object.