This is exactly what it says on the tin — a Java implementation of a simple matcher for regular expressions, based on my NFA Runner.
It supports the following standard metacharacters:
- The period
.
, which matches any character. - Kleene star
*
, which means “zero or more of the preceding character/group”. - The
+
character, which means “one or more of the preceding character/group”. - The
?
character, which means “zero or one of the preceding character/group”. - The union operator
|
, which means “either the preceding or the following character/group”. - Left and right brackets
[
and]
, which are used for specifying character groups and ranges (e.g.[aA]
would match eithera
orA
, and[a-z]
would match any character betweena
andz
). - Left and right parentheses
(
and)
, which are used for character grouping (exactly the same way as in mathematical notation).
To use a metacharacter as a standard character, escape it with a \
(backslash). For example, \?
matches a literal question mark, and \.
matches a literal full stop. Naturally, to match a backslash, use two backslashes \\
in your pattern.
This matcher is written as a proof of concept (or, more precisely, as a proof that I can write something like this from scratch), so it does not support more advanced features like character classes and backreferences.
It works by converting a regular expression into a non-deterministic finite automaton (or NFA).
To achieve this, the regex is first converted into a stream of unambigous tokens. During this process, the tokenizer also inserts explicit concatenation tokens between character literals and groups of characters.
The token stream generated by the tokenizer is then converted into postfix notation using Dijkstra’s shunting yard algorithm. Finally, the postfix token stream is converted into an NFA by the NFABuilder class. The NFA simulator itself is a modified version of my NFA Runner.
If you insist.