-
-
Notifications
You must be signed in to change notification settings - Fork 706
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add functional style regex pattern-matching #1241
Conversation
|
Overall - the stuff is neat and something I had in mind for a long time. There is one big concern about the current implementation though. Doing one by one matching is not good performance-wise. We could get is just M times of start, stop. (not even speaking that patterns are quite simillar in many cases) What I planned is the following :
So this leaves me wondering if it's readily amendable to what I plan on the implementation side without changing the interface. Looks like it should. I'll review this pull more throughly but of the top of my head: check 0 patterns and 1 pattern cases in regexSwitch unittest
|
|
@blackwhale How should I go about combining the patterns together? I have next to zero knowledge of regex implementation, but from a quick glance at the source of Also, my original design was to get patterns and handlers as template arguments, and use |
Well I don't expect you since to it's not strightforward as there is (sadly) no API for constructing regex from pieces manually. Plus we need to tweak one VM instruction - IR.End to return the number of sub-pattern. Thus let's focus on the interface and being able to do that later.
Hard question as by having them as template arguments we would need to use ctRegex (otherwise it's just causing tempalte bloat all over the place). If we pick them as normal arguments then we need to use normal non-CT patterns. All in all I'd say it needs both versions if we are to support both ctRegex and Regex properly. Think of it this way - since merging patterns together into a switch produces a new pattern it has the same interface problem as the original regex vs ctRegex. So regexSwitch and ctRegexSwitch ? (switchRegex sounds like I'm about to hot swap one pattern for another ;) ) |
|
@blackwhale I moved from template arguments to regular arguments to allow things like: auto r1=regex(`...`);
auto r2=regex(`...`);
auto foo(string str){
return str.switchRegex(
r1,(...)=>...,
r2,(...)=>...,
);
}But since the new implementation is going to create a big compound automaton for all patterns, this is probably redundant - and might even make it harder to implement. I don't want an interface that will allow things that will make it harder to implement your suggestion. Leaving that behind, the only interface advantage the regular arguments version have is the ability to supply regular expressions and handler delegates in runtime. But this option was limited to begin with, since delegate signature must match the pattern, and anyways if a user of this function makes a mechanism for supplying patterns and handler delegates at runtime, they already loose most of the elegance Now, the template argument version has one clear advantage in terms of elegance - if you want to use a higher order function, or to create an alias, assert(equal(
[
Schedule(WeekDay.Sunday, 1),
Schedule(WeekDay.Monday, 14),
Schedule(WeekDay.Tuesday, 3),
Schedule(WeekDay.Wednesday, 4),
Schedule(WeekDay.Thursday, 17),
Schedule(WeekDay.Friday, 6),
],
[
"Sunday 1AM",
"Monday 2PM",
"Tuesday 3",
"4AM Wednesday",
"5PM Thursday",
"6 Friday",
].map!(switchRegex!(
`(\w+) (\d+)AM`, (WeekDay day, int hour) => Schedule(day, hour % 12),
`(\w+) (\d+)PM`, (WeekDay day, int hour) => Schedule(day, hour % 12 + 12),
`(\w+) (\d+)`, (WeekDay day, int hour) => Schedule(day, hour),
`(\d+)AM (\w+)`, (int hour, WeekDay day) => Schedule(day, hour % 12),
`(\d+)PM (\w+)`, (int hour, WeekDay day) => Schedule(day, hour % 12 + 12),
`(\d+) (\w+)`, (int hour, WeekDay day) => Schedule(day, hour),
))()));Notice I removed As for your concern about template bloat - I don't think the template arguments version will be much "bloatier" than the regular argument version. The regular version already needs to instantiate for every combination of delegate argument types and return values, and I doubt there will be many cases of calls for All that being said - the template argument version has a clear disadvantage that you need to pass the variadic template argument composed of the delegate directly - it cannot be inferred from the regular variadic argument like in the regular arguments version, because there is no regular variadic argument.. That means that the variadic template argument must be the first template argument. But variadic template argument must also be the last template argument. Therefore, it must be the only template argument, which means we can't have a template argument for the input text's type, so it must always be |
Then what we are trying to achive is to implement siwtch with regex patterns + conversions that some languages have as built-in. If that case then having patterns be constant strings and delegates being passed as alias parameters fits the bill just fine.
It surely can just make your regexSwitch a template of tempalte, see also std.algortithm tempalte regexSwitch(Args...)
{
auto regexSwitch(Input)(Input input){ ... }
}
This also makes it possible to alias regexSwitch!(...) and then use the alias as callable to simplify code structure. |
|
OK, I changed it back to template argument version, and renamed it to |
|
Another idea I've got that is very closely related to this one is scanf on steriods. This is what you are already doing but as far as I can tell it's all hardcoded to The same could go with UFCS style call. And the simpliest way, of course, is to just cal it with to!Type: AFAIK to!string of string is the same string (since it's immutable there is no harm done). The good thing about this simplish API is that it doesn't add much of bloat and rides on top of what you did (a bit more generalize) so the code is basically the same and can be re-used. P.S. Before I forget - Let's do it Walter's Style: open an ehancement, fill in the details of enhancement and include into one of the commit messages "fix issue xyz". |
|
I don't really understand your "P.S." - where should I open an enhancement? And why should a new feature be labeled as an issue? Anyways, about that |
|
Oh the joy of posting at 2 A.M. :) But all in due order. Yes, Walter wanted that any enhancements being done to be opened as issue in Bugzilla so that it's automatically shown in changelog next time around. About extract - I've meant that your is fairly close. I don't want to push you to do it, just noting a similar (in my eyes) functionality. BTW I do NOT think callWithDelegate converts captures and fires up an action. Extract just converts and let's user pick what kind of conversion to use. It's a data ---> data conversion, thus suitable as a range adapter. |
| static assert(__traits(compiles, (match(input, ""))), | ||
| "Unable to match on " ~ T.stringof ~ "."); | ||
| static assert(Patterns.length % 2 == 0, | ||
| "Wrong number of arguments - each pattern must have an handler."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick - 'a handler'
Let's proof read the messages as these are the things that what makes people think "fine stuff" vs "some half-baked crap".
(Sadly so the number of typos and minor mistakes is often percieved as lack of quality)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*perceived
Hihi.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't stress that better, huh ? :)
|
Otherwise this is LGTM. And I'm pleased to see this contribution came by. |
|
Fixed that typo(found a few more in the doc along the way, fixed them too) |
The function std.regex.regexSwitch matches a string on multiple regular expressions, and runs a delegate based on which regex matched the input string. The function std.regex.callWithCaptures is a helper function that calls a delegate with regex captures distributed into it's parameters. It was created to be used in regexSwitch, but it's generic enough to have other uses so I made it a separate function.
This moves std.traits.unsigned to std.conv and adds std.conv.signed, since for some reason, we have std.traits.unsigned but do not have std.traits.signed.
Fixup readln.
Implement issue# 10314 (add std.traits.signed).
Convert C-style array declarations to D-style.
|
Needs a rebase. |
The function std.regex.regexSwitch matches a string on multiple regular expressions, and runs a delegate based on which regex matched the input string. The function std.regex.callWithCaptures is a helper function that calls a delegate with regex captures distributed into it's parameters. It was created to be used in regexSwitch, but it's generic enough to have other uses so I made it a separate function.
…bos into add-regex-pattern-matching Conflicts: std/regex.d
|
@blackwhale I rebased it, but when I try to squash the commits it always breaks everything. Is it OK if I don't squash them? |
|
Mmm this is not a rebase as it shows up stray commits and the "Merge ..." stuff . Typical setup (provided you had clean but outdated state) involves The simple way out (besides some git kung-fu) is to create a new branch drop the changes here and re-post pull request, linking ot this one. |
|
OK, here is the new one - #1392 I'm closing this one |
Based on Scala's pattern-matching with regular expressions syntax, I created the function
std.regex.regexSwitch(formerlyswitchRegex). It's main use-case is for reading textual input fields that can be in multiple formats: