Typed regex #1
Wrap glibc’s regex engine to allow matching and group capture in POSIX extended regular expressions. It might be worth rewriting this in terms of the C++11 regex engine; it’s more featureful and more pleasant to use, although it would require more casting. (C can’t represent the std::regex type, so I’d need to use some void pointers.)
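For reference, a minimal sketch of what matching with group capture looks like against the POSIX `<regex.h>` interface that glibc provides. The helper name `posix_match` and its signature are hypothetical and are not taken from this library's FFI; the sketch only shows the `regcomp`/`regexec`/`regfree` sequence with `REG_EXTENDED`.

```cpp
#include <regex.h>  // POSIX regex API (provided by glibc on Linux)

#include <string>
#include <vector>

// Hypothetical helper, not the library's actual FFI. Compiles `pattern` as a
// POSIX extended regular expression and, on the first match in `subject`,
// fills `groups` with the matched text (index 0 is the whole match).
// Returns false if the pattern is invalid or nothing matches.
bool posix_match(const std::string& pattern, const std::string& subject,
                 std::vector<std::string>& groups) {
  regex_t compiled;
  if (regcomp(&compiled, pattern.c_str(), REG_EXTENDED) != 0) {
    return false;  // Compilation failed, so there is nothing to free.
  }
  // re_nsub counts parenthesized groups; one extra slot holds the whole match.
  std::vector<regmatch_t> slots(compiled.re_nsub + 1);
  const int status =
      regexec(&compiled, subject.c_str(), slots.size(), slots.data(), 0);
  regfree(&compiled);
  if (status != 0) {
    return false;
  }
  groups.clear();
  for (const regmatch_t& slot : slots) {
    if (slot.rm_so == -1) {
      groups.emplace_back();  // This group did not participate in the match.
    } else {
      groups.push_back(subject.substr(slot.rm_so, slot.rm_eo - slot.rm_so));
    }
  }
  return true;
}
```

Offsets come back as half-open `[rm_so, rm_eo)` byte ranges, which is the kind of place where an off-by-one can creep in.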
Switch to using the C++11 regex library for better portability and ease of use. As an added bonus, this should make it easier to implement regex substitution.
Boost provides numeric_cast, which is much better than what I was using for safe numeric type conversion. This does introduce a Boost dependency, but that tends to be true of most nontrivial C++ programs, so it’s pretty reasonable.
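To illustrate the kind of check involved: `boost::numeric_cast<To>(value)` throws when `value` does not fit in `To`. The sketch below is not Boost's implementation; it is a standard-library-only stand-in (the name `checked_cast` is hypothetical) that shows the same idea for integer types by round-tripping the value and comparing signs.

```cpp
#include <stdexcept>
#include <type_traits>

// Hypothetical stand-in for a range-checked integer conversion. This is an
// illustration of the idea only; it is NOT boost::numeric_cast.
template <typename To, typename From>
To checked_cast(From value) {
  static_assert(std::is_integral<To>::value && std::is_integral<From>::value,
                "checked_cast in this sketch supports integer types only");
  const To converted = static_cast<To>(value);
  // The conversion is lossless only if converting back yields the original
  // value and the sign did not flip along the way.
  const bool round_trips = static_cast<From>(converted) == value;
  const bool same_sign = (value < From{}) == (converted < To{});
  if (!round_trips || !same_sign) {
    throw std::range_error("checked_cast: value out of range for target type");
  }
  return converted;
}
```

For example, `checked_cast<unsigned>(-1)` throws instead of silently wrapping to a large positive number.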
Also fix a fencepost error in uw_Regex__FFI_do_match.
Replace the two-step compile/match process with a single compile-and-match one to avoid issues with server-client representation incompatibility. Use the browser regex engine on the client side.
That’s what the browser uses, so use it on the server side for consistency.
This brings JavaScript’s behaviour into line with C++’s – replacements should be global.
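As a point of comparison, `std::regex_replace` replaces every match unless told otherwise, whereas JavaScript's `String.prototype.replace` with a non-global regex replaces only the first. A small sketch of both behaviours on the C++ side; the function names are hypothetical and not part of this library.

```cpp
#include <regex>
#include <string>

// Hypothetical helpers for illustration only.

// Default std::regex_replace behaviour: every match is replaced.
std::string replace_all(const std::string& subject, const std::string& pattern,
                        const std::string& replacement) {
  return std::regex_replace(subject, std::regex(pattern), replacement);
}

// format_first_only mimics a JavaScript replace without the `g` flag.
std::string replace_first(const std::string& subject,
                          const std::string& pattern,
                          const std::string& replacement) {
  return std::regex_replace(subject, std::regex(pattern), replacement,
                            std::regex_constants::format_first_only);
}
```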
Redesign library API around highly general regex-based transformations. Instead of specifying a string to substitute for each match, you now execute an entire function over the match (and over nonmatching regions as well). The resulting C++ code is much simpler, with more functionality pushed into Ur, and the engine now supports certain types of regex transformations needed to mimic Perl.
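The shape of such a transformation can be sketched in C++ as follows. This is an illustration under assumptions, not the code in this pull request: the name `transform`, the `Piece` callback type, and passing only the matched text (rather than the capture groups) are all simplifications made here.

```cpp
#include <functional>
#include <regex>
#include <string>

// Hypothetical callback type: maps one piece of the input to its replacement.
using Piece = std::function<std::string(const std::string&)>;

// Walks `subject` left to right, calling `on_match` for each regex match and
// `on_gap` for each nonmatching region between, before, and after matches
// (a gap may be empty), then concatenates the results.
std::string transform(const std::string& subject, const std::regex& pattern,
                      const Piece& on_match, const Piece& on_gap) {
  std::string out;
  auto gap_start = subject.cbegin();
  const std::sregex_iterator end;
  for (std::sregex_iterator it(subject.cbegin(), subject.cend(), pattern);
       it != end; ++it) {
    const std::smatch& match = *it;
    out += on_gap(std::string(gap_start, match[0].first));
    out += on_match(match.str());
    gap_start = match[0].second;
  }
  out += on_gap(std::string(gap_start, subject.cend()));
  return out;
}
```

Plain substitution falls out as a special case: pass an `on_gap` that returns its argument unchanged and an `on_match` that returns the replacement.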
README touch-up.
.../test.ur:137:17: (to 139:7) Anonymous function remains at code generation Function: (fn _ : {} => (case UNBOUND_1 of None => write("Failed: mismatch!") | Some {Whole = whole, Groups = {X = {Start = s, Len = l}}} => (write("Success? Whole match: "); (FFI(Basis.htmlifyInt_w(whole.Start)); (write(" + "); (FFI(Basis.htmlifyInt_w(whole.Len)); (write(", group is "); (FFI(Basis.htmlifyInt_w(s)); (write(" + "); FFI(Basis.htmlifyInt_w(l)))))))))))
Also, fixing the tests.
I think I'm basically done with it, for now. @bbarenblat would you kindly take a look at the code?
Also, making groups optional. In JS, it is possible for groups to be undefined!
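The same situation exists in C++: a group that sits in an untaken alternative does not participate in the match, and `std::sub_match::matched` is `false` for it, much as JavaScript reports `undefined`. A minimal sketch, with the `Group` struct and `match_groups` function being hypothetical names used only for illustration:

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical representation of an optional capture group.
struct Group {
  bool participated;  // false when the group took no part in the match
  std::string text;   // empty when participated is false
};

// Searches `subject` for `pattern` and reports each capture group (the whole
// match at index 0 is skipped). Returns false if there is no match at all.
bool match_groups(const std::string& subject, const std::string& pattern,
                  std::vector<Group>& groups) {
  const std::regex compiled(pattern);
  std::smatch match;
  if (!std::regex_search(subject, match, compiled)) {
    return false;
  }
  groups.clear();
  for (std::size_t i = 1; i < match.size(); ++i) {
    groups.push_back(Group{match[i].matched, match[i].str()});
  }
  return true;
}
```

With the pattern `(a)|(b)` and the subject `b`, the first group is reported as not participating and the second holds `b`.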
Also, found a bug!
Thanks in advance for your work on this. I’m looking forward to getting it merged. Two issues that we should address immediately:
8fc03dc to f95d2a1
I tried rewriting the history with |
This is based on Ziv Scully's proposal.
I'll be adding more stuff this week:
Tregex
Tregex functions (to bring to feature parity with Regex)