Typed regex #1

Closed
wants to merge 27 commits into from

Conversation

ashalkhakov

This is based on Ziv Scully's proposal.

I plan to add the following this week:

  • more tests for the typed API (Tregex)
  • support for missing regular expression features (character classes, etc.)
  • additional Tregex functions (to reach feature parity with Regex)

bbarenblat and others added 25 commits July 3, 2015 15:52
Wrap glibc’s regex engine to allow matching and group capture in POSIX
extended regular expressions.

It might be worth rewriting this in terms of the C++11 regex engine;
it’s more featureful and more pleasant to use, although it would require
more casting. (C can’t represent the std::regex type, so I’d need to
use some void pointers.)

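The remark about void pointers can be illustrated with a minimal sketch of an opaque-handle interface: C callers only ever see a `void*`, while the C++ side casts it back to `std::regex`. The function names `sketch_regex_compile`, `sketch_regex_matches`, and `sketch_regex_free` are hypothetical and are not taken from this library's FFI.

```cpp
#include <new>
#include <regex>

// Hypothetical C-compatible interface (names are assumptions, not the
// library's API). C cannot name std::regex, so the handle is a void pointer.
extern "C" {

// Compile a POSIX extended pattern; returns nullptr on failure.
void* sketch_regex_compile(const char* pattern) {
  try {
    return new std::regex(pattern, std::regex::extended);
  } catch (const std::regex_error&) {
    return nullptr;  // invalid pattern
  } catch (const std::bad_alloc&) {
    return nullptr;  // out of memory
  }
}

// Return 1 if the pattern matches anywhere in text, 0 otherwise.
int sketch_regex_matches(const void* handle, const char* text) {
  const std::regex* re = static_cast<const std::regex*>(handle);
  return std::regex_search(text, *re) ? 1 : 0;
}

// Release a handle obtained from sketch_regex_compile.
void sketch_regex_free(void* handle) {
  delete static_cast<std::regex*>(handle);
}

}  // extern "C"
```

The casts are the price of the opaque handle: nothing stops a caller from passing an unrelated pointer, which is presumably the "more casting" the commit message refers to.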
Switch to using the C++11 regex library for better portability and ease
of use. As an added bonus, this should make it easier to implement
regex substitution.

Boost provides numeric_cast, which is much better than what I was using for
safe numeric type conversion. This does introduce a Boost dependency, but that
tends to be true of most nontrivial C++ programs, so it’s pretty reasonable.
Also fix a fencepost error in uw_Regex__FFI_do_match.

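For readers without Boost at hand, the idea behind a checked numeric conversion can be sketched with the standard library alone. This is a stand-in, not `boost::numeric_cast` itself: `checked_cast` is a hypothetical name, it handles integer types only, and it throws `std::range_error` where Boost would throw its own exception type.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical stand-in for a checked integer conversion (assumption:
// integer types only; this is not Boost's implementation).
template <typename To, typename From>
To checked_cast(From value) {
  static_assert(std::is_integral<To>::value && std::is_integral<From>::value,
                "checked_cast only handles integer types in this sketch");
  if (value < From(0)) {
    // A negative value never fits an unsigned target; for a signed target,
    // compare against its minimum in the widest signed type.
    if (!std::is_signed<To>::value ||
        static_cast<std::intmax_t>(value) <
            static_cast<std::intmax_t>(std::numeric_limits<To>::min())) {
      throw std::range_error("checked_cast: negative overflow");
    }
  } else if (static_cast<std::uintmax_t>(value) >
             static_cast<std::uintmax_t>(std::numeric_limits<To>::max())) {
    // Non-negative values are compared in the widest unsigned type.
    throw std::range_error("checked_cast: positive overflow");
  }
  return static_cast<To>(value);
}
```

A conversion of this kind could guard, for example, the narrowing of a match offset to a smaller integer type; which types the library actually converts between is not stated in the commit message.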
Replace the two-step compile/match process with a single
compile-and-match one to avoid issues with server-client representation
incompatibility. Use the browser regex engine on the client side.

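A single compile-and-match step might look roughly like the sketch below: the pattern is compiled and used inside one call, so no compiled regex object has to be represented outside it. The names `Span`, `MatchResult`, and `compile_and_match` are hypothetical; the start/length layout is modelled on the `Start` and `Len` fields visible in the compiler output further down, not on the library's actual C++ types.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Position of a match or group inside the subject string (hypothetical type).
struct Span {
  std::size_t start;
  std::size_t len;
};

struct MatchResult {
  bool matched;              // false if the pattern did not match at all
  Span whole;                // the entire match
  std::vector<Span> groups;  // one entry per capture group, in order
};

// Compile `pattern` and search `text` in one call. Unmatched groups are not
// distinguished from empty ones in this sketch.
MatchResult compile_and_match(const std::string& pattern,
                              const std::string& text) {
  MatchResult result{false, {0, 0}, {}};
  const std::regex re(pattern);  // default (ECMAScript) syntax
  std::smatch m;
  if (!std::regex_search(text, m, re)) return result;
  result.matched = true;
  result.whole = {static_cast<std::size_t>(m.position(0)),
                  static_cast<std::size_t>(m.length(0))};
  for (std::size_t i = 1; i < m.size(); ++i) {
    result.groups.push_back({static_cast<std::size_t>(m.position(i)),
                             static_cast<std::size_t>(m.length(i))});
  }
  return result;
}
```

The trade-off is that the pattern is recompiled on every call, which costs time but keeps the interface free of engine-specific state.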
That’s what the browser uses, so use it on the server side for
consistency.

This brings JavaScript’s behaviour into line with C++’s –
replacements should be global.

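The difference being reconciled here can be shown on the C++ side. `std::regex_replace` replaces every match by default, whereas JavaScript's `String.prototype.replace` replaces only the first match unless the regex carries the `g` flag. The sketch below uses the hypothetical helper names `replace_all` and `replace_first`; the second one mimics the non-global JavaScript behaviour with `format_first_only`.

```cpp
#include <regex>
#include <string>

// Replace every match: this is std::regex_replace's default behaviour.
std::string replace_all(const std::string& text, const std::string& pattern,
                        const std::string& replacement) {
  return std::regex_replace(text, std::regex(pattern), replacement);
}

// Replace only the first match, as JavaScript does without the `g` flag.
std::string replace_first(const std::string& text, const std::string& pattern,
                          const std::string& replacement) {
  return std::regex_replace(text, std::regex(pattern), replacement,
                            std::regex_constants::format_first_only);
}
```

With the text `a-b-c`, the pattern `-`, and the replacement `+`, `replace_all` yields `a+b+c` and `replace_first` yields `a+b-c`.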
Redesign library API around highly general regex-based transformations.
Instead of specifying a string to substitute for each match, you
now execute an entire function over the match (and over nonmatching
regions as well). The resulting C++ code is much simpler, with more
functionality pushed into Ur, and the engine now supports certain
types of regex transformations needed to mimic Perl.

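The transformation style described above can be sketched in C++ as a walk over the subject string that hands matching and nonmatching regions to separate callbacks. This is only an illustration of the idea: in the library the functions live on the Ur side, and `transform_regions` with its two callback parameters is a hypothetical name.

```cpp
#include <functional>
#include <regex>
#include <string>

// Hypothetical helper: call on_nonmatch for each stretch of text between
// matches (including empty ones) and on_match for each match, then
// concatenate the results in order.
std::string transform_regions(
    const std::string& text, const std::regex& re,
    const std::function<std::string(const std::string&)>& on_nonmatch,
    const std::function<std::string(const std::smatch&)>& on_match) {
  std::string out;
  std::string::const_iterator last = text.begin();
  for (std::sregex_iterator it(text.begin(), text.end(), re), end;
       it != end; ++it) {
    const std::smatch& m = *it;
    out += on_nonmatch(std::string(last, m[0].first));  // text before match
    out += on_match(m);                                  // the match itself
    last = m[0].second;
  }
  out += on_nonmatch(std::string(last, text.end()));     // trailing text
  return out;
}
```

Plain substitution falls out as a special case: pass an identity function for `on_nonmatch` and a function returning a fixed string for `on_match`.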
README touch-up.

.../test.ur:137:17: (to 139:7) Anonymous function remains at code generation
Function:
(fn _ : {} =>
  (case UNBOUND_1 of
    None => write("Failed: mismatch!") |
     Some {Whole = whole, Groups = {X = {Start = s, Len = l}}} =>
      (write("Success? Whole match: ");
       (FFI(Basis.htmlifyInt_w(whole.Start));
        (write(" + ");
         (FFI(Basis.htmlifyInt_w(whole.Len));
          (write(", group is ");
           (FFI(Basis.htmlifyInt_w(s));
            (write(" + "); FFI(Basis.htmlifyInt_w(l)))))))))))
@ashalkhakov
Author

I think I'm basically done with it for now. @bbarenblat, would you kindly take a look at the code?

Artyom Shalkhakov added 2 commits March 9, 2017 15:05
Also, making groups optional. In JS, it is possible for groups
to be undefined!
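The same situation arises in the C++ engine: a capture group that did not take part in the match is reported with `matched == false`, which corresponds to `undefined` in JavaScript. A minimal sketch, with the hypothetical names `Group` and `capture_groups`:

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// One capture group; `matched` is false when the group did not participate.
struct Group {
  bool matched;
  std::string text;  // empty when matched is false
};

// Return one entry per capture group, or an empty vector if the pattern
// does not match at all.
std::vector<Group> capture_groups(const std::string& pattern,
                                  const std::string& text) {
  std::vector<Group> groups;
  const std::regex re(pattern);
  std::smatch m;
  if (!std::regex_search(text, m, re)) return groups;
  for (std::size_t i = 1; i < m.size(); ++i) {
    groups.push_back({m[i].matched, m[i].str()});
  }
  return groups;
}
```

For the pattern `(a)|(b)` applied to `b`, the first group is unmatched and the second holds `b`, which is why a typed API has to treat each group as optional.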
@bbarenblat
Owner

Thanks in advance for your work on this. I’m looking forward to getting it merged. Two issues that we should address immediately:

  • In the first couple of commits in this series, you checked in binaries which you later removed. I’m generally strongly opposed to rewriting Git history, but checked-in binaries bloat the repository and convey no useful information about development. Please rewrite your history to remove the binaries.

  • tests/test.ur defines showRecord, which is based on showRecord from ‘Type-Level Computation’. ‘Type-Level Computation’ is licensed under the CC BY-NC-ND license, which is incompatible with the Apache license used for this library. In order to use that code, we need to get Adam to allow commercial use and derivative works. I’ll send mail and ask; if Adam doesn’t want to relicense that code, however, we’ll need to expunge it from the repository history.

@ashalkhakov
Author

I tried rewriting the history with git filter-branch and that broke GitHub. Let's try it again.
