A malloc-free, wchar_t based regular expression matching library
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.


aov-rx README

aov-rx is a malloc-free [1], wide-char based regular expression matching
library. This piece of code is designed to be part of MPDM [2] in the
future, but can also be used separately.

Please see the TODO file to know the completion status of this project.

Released under the GPL.

Angel Ortega <angel@triptico.com>

 [1] Free as in 'without', not as in 'free()'
 [2] http://triptico.com/software/mpdm.html (Minimum Profit Data Manager)


 - Not using dynamic memory,
 - using wide chars (wchar_t) natively,
 - returns the matching substring as offset + size,
 - source code being readable, and
 - being completely reentrant and thread-safe.


 - Start (^) and end of string ($) anchors
 - . (match any character)
 - Full quantifier support:
   - Curly bracket with optional minimum and maximum:
     - Exactly n matches ({n})
     - At least n matches ({n,})
     - At most m matches ({,m})
     - At least n and at most m matches ({n,m})
   - ? (match 0 or 1 occurrences, same as {0,1})
   - * (match 0 or many occurrences, same as {0,})
   - + (match 1 or many occurrences, same as {1,})
 - Alternate sets (str1|str2|str3)
 - Sub-regexes (between parentheses, also optionally
 - Bracket-wrapped character sets, positive and
   negative ([0-9], [^0-9])

Features that will (probably) be, but not soon

 - Optional multiline matching (i.e. ^ and $ matching lines inside
   the string instead of the start and the end of the string)

Features that (probably) will never be

 - Backreferences


aov-rx consists only of one function:

 wchar_t *aov_match(wchar_t *rx, wchar_t *tx, int *size);

And its usage is pretty straightforward:

 #include <stdio.h>
 #include <wchar.h>
 #include "aov-rx.h"
 int main(int argc, char *argv[])
    int s;
    wchar_t *res;
    res = aov_match(L"/[0-9]+/", L"The number is 123, indeed", &s);
    if (s) {
        /* res points to 123, and s contains 3 */

Angel Ortega