Generate a lexer by providing a set of lexical rules.
var lexer = Lexer.Create(new[]
{
new Rule("[0-9]+\\.?[0-9]*", "NUM"),
new Rule("[+*/-]", "OP"),
new Rule("=", "EQ"),
new Rule("[ \t\r\n]", "WS")
});
Set the input either as a string
lexer.Input = new InputStream("3.14+1.86=5");
or as a stream.
using var reader = new StreamReader("path/to/file.txt");
lexer.Input = new InputStream(reader);
Perform the tokenization.
foreach (Token token in lexer.GetNextToken())
Console.WriteLine(token);
3.14+1.86=5
[@0,0:3='3.14',<NUM>]
[@1,4:4='+',<OP>]
[@2,5:8='1.86',<NUM>]
[@3,9:9='=',<EQ>]
[@4,10:10='5',<NUM>]
Lexer.GetNextToken()
returns an iterator which yields a sequence of tokens one at a time. The Token
class contains the following properties:
public class Token
{
public int Index { get; set; }
public (int Start, int End) Position { get; set; }
public string Text { get; set; }
public string Type { get; set; }
public override string ToString() =>
$"[@{Index},{Position.Start}:{Position.End}='{Text}',<{Type}>]";
}
Depending on the grammar, the lexer construction might take some time. Hence, once built, it can be exported as a binary file and then loaded in a project.
var lexer = Lexer.Create(grammar);
lexer.ExportToFile("lexout");
var lexer = Lexer.LoadFromFile("lexout");
- Arithmetic expression
- JSON
- Regular expression
- Tokenizer for the English language
Expression | Example |
---|---|
Concatenation | ab |
Union | a|b |
Zero-or-more | a* |
One-or-more | a+ |
Optional | a? |
Grouping | (a|b)*c |
Char class | [a-z0-9,] |
Negative char class | [^a-z0-9,] |
Match exactly 2 'a's | a{2} |
Match at least 2 'a's | a{2,} |
Match between 2 and 4 'a's | a{2,4} |
Escaping (matches "a*") | a\* |
Escaping in char classes (match 'a', '-' or 'z') | [a\-z] |
- Finite-State Automata [NFA-Representation] [DFA-Representation] [Constructions] [Operations]
- Finite-State Transducer [Representation] [Construction] [Operations]
- Bimachine [Representation] [Construction]
- Optional rewrite transducer
- Obligatory rewrite transducer
- Leftmost-longest match rewrite transducer
- Trie (prefix tree) implementation
- Direct construction of a minimal, deterministic, acyclic finite-state automaton from a set of strings
Example usages of the APIs can be found in the unit tests' project.
The finite state devices Fsa
, Dfsa
and Fst
can be exported as GraphViz.
var fst = new RegExp("a+").Automaton
.Product(new RegExp("b").Automaton)
.ToRealTime()
.Transducer
.PseudoMinimal();
Console.WriteLine(fst.ToGraphViz());
digraph G {
rankdir=LR; size="8,5"
node [shape=circle]
-1 [label= "", shape=none,height=.0,width=.0];
-1 -> 0;
0 -> 1 [label="a/b"]
0 -> 0 [label="a/ε"]
1 -> 1 [label="a/ε"]
1 [shape = doublecircle]
}