Olivier Duhart edited this page Aug 31, 2020 · 13 revisions

Presentation

CSLY comes with two kinds of lexer:

  • a regex-based lexer inspired by this post. It is not very efficient: it is slow and is the bottleneck of the whole lexer/parser.
  • from version 2.0.0, a "GenericLexer": an FSM-backed lexer designed for performance, at the cost of some flexibility.

Generic and regex lexemes cannot be mixed.

General configuration

The whole lexer configuration is done in a C# enum.

The enum lists all the possible tokens (no special constraint here except public visibility).

Each enum value has a [Lexeme] attribute to mark it as a lexeme.
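As an illustration, here is a minimal sketch of what such enums could look like, one per kind of lexer. The enum names and token values are hypothetical, and the [Lexeme] argument forms (a regex string with an optional "skippable" flag for the regex lexer, a GenericToken kind with an optional literal for the GenericLexer) are assumed from the CSLY README; check them against the version you use.

```csharp
using sly.lexer;

// Hypothetical token set for the regex-based lexer:
// each lexeme is described by a regular expression.
public enum RegexExpressionToken
{
    // an integer
    [Lexeme("[0-9]+")]
    INT = 1,

    // the + operator
    [Lexeme("\\+")]
    PLUS = 2,

    // whitespace; the second argument is assumed to mark the lexeme as skippable
    [Lexeme("[ \\t]+", true)]
    WS = 3
}

// Hypothetical token set for the GenericLexer:
// each lexeme is described by a predefined GenericToken kind.
public enum GenericExpressionToken
{
    // an integer
    [Lexeme(GenericToken.Int)]
    INT = 1,

    // an identifier
    [Lexeme(GenericToken.Identifier)]
    IDENTIFIER = 2,

    // the + operator, declared as a "sugar" token with its literal text
    [Lexeme(GenericToken.SugarToken, "+")]
    PLUS = 3
}
```

The two styles live in separate enums because generic and regex lexemes cannot be mixed.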

For a more detailed description, see the Lexer section.

How to use

The lexer can be used independently of the parser. It provides a method that returns an IEnumerable<Token<T>> (where T is the token enum) from a string:

 IList<Token<T>> tokens = Lexer.Tokenize(source).ToList<Token<T>>();

You can also build a lexer on its own:

var source = "some source to be lexed";
ILexer<ExpressionToken> lexer = LexerBuilder.BuildLexer<ExpressionToken>();
var tokens = lexer.Tokenize(source).ToList();