Composite Operations:
- Concatentation:
abmatches the subexpressiona, thenb; - Alternation:
a|bmatches either subexpressionaorb; - Repetition:
a*matches the string that repeatazero or many times; - Quotation:
(a)quotes a subexpression
Atomic Operations:
- :
xmatches a literal character - :
.matches any single character
The priority (from high to low) is: 1. Parenthesis, 2. Union, 3. Repetition 4. Concatenation
The above basic operations are able to express all operations that are supported by the google re2 engine.
First, import the regex module. Initialize a instance by
re = regex(string)
where string is the regular expression. First, regex will detect whether the expression is legal with the error module.
After deeming the expression legal, regex will then proceed to compile the expression into a NFA. It will pass through 3 stages:
- the
syntaxmodule will first parse the expression into literal parts - the
syntaxmodule will then assemble the syntax tree from those parts and detect syntax errors in the process. The syntax trees are stored as tuples:(id, args). - the
nfamodule will then construct the nfa that corresponds to the expression.
This finishes the initialization. After initialization the user will be able to pattern match by calling
re.match(text)
where text is the target string. This function will traverse the NFA via breadth first search with queues.
.*amatches any string ending withath(e|ere|ose)matchesthe,there, andthose
The following picture summarizes my algorithm:
