Lexer post processing

Olivier Duhart edited this page Aug 15, 2023 · 3 revisions

Post processing lexer output

CSLY lets you rework the token stream produced by the lexer: tokens can be added, modified, or removed before the parser sees them.

A basic example is an expression parser that should accept implicit multiplication, that is, parse

2 x as 2 * x

There are two ways to do so:

  • tweak the parser, but you will have a hard time managing associativity and precedence.

  • insert implicit * tokens right after the lexing phase, so that the token stream reads INT(2) TIMES IDENTIFIER(x) instead of INT(2) IDENTIFIER(x) and the parser never has to deal with the missing TIMES token.

A lexer post processor is a simple delegate:

List<Token> LexerPostProcess(List<Token> tokens)

The delegate takes the raw token stream and returns the modified token stream.
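To make the implicit `*` idea concrete, here is a minimal sketch of such a post processor. It is not CSLY's own implementation: `SketchToken` is a hypothetical stand-in for the library's token type so that the code compiles without CSLY, and the `FormulaToken` members and the two lookup sets are assumptions made for illustration. It assumes C# 9 or later (records, target-typed `new`).

```csharp
using System.Collections.Generic;

// Hypothetical token ids for a small formula language (assumed for this sketch).
public enum FormulaToken { INT, IDENTIFIER, TIMES, PLUS, LPAREN, RPAREN }

// Stand-in for the library's token type, so the sketch is self-contained.
public sealed record SketchToken(FormulaToken Id, string Value);

public static class ImplicitTimes
{
    // Tokens that can end an operand (left side of an implicit product).
    private static readonly HashSet<FormulaToken> MayEndOperand = new()
    {
        FormulaToken.INT, FormulaToken.IDENTIFIER, FormulaToken.RPAREN
    };

    // Tokens that can start an operand (right side of an implicit product).
    private static readonly HashSet<FormulaToken> MayStartOperand = new()
    {
        FormulaToken.INT, FormulaToken.IDENTIFIER, FormulaToken.LPAREN
    };

    // Same shape as the delegate: takes the raw token list, returns a new list.
    public static List<SketchToken> PostProcess(List<SketchToken> tokens)
    {
        var result = new List<SketchToken>(tokens.Count);
        for (var i = 0; i < tokens.Count; i++)
        {
            // Two adjacent operands with no operator in between: insert TIMES.
            if (i > 0
                && MayEndOperand.Contains(tokens[i - 1].Id)
                && MayStartOperand.Contains(tokens[i].Id))
            {
                result.Add(new SketchToken(FormulaToken.TIMES, "*"));
            }
            result.Add(tokens[i]);
        }
        return result;
    }
}
```

Given the two tokens INT(2) IDENTIFIER(x), `ImplicitTimes.PostProcess` returns INT(2) TIMES IDENTIFIER(x); a stream that already contains an operator between its operands, such as INT(2) PLUS IDENTIFIER(x), comes back unchanged. Note that this sketch also treats an identifier followed by `(` as a product, which would be wrong for a grammar with function calls.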

The lexer post processor for the expression parser: PostProcessedLexer (PostProcessedLexerParserBuilder.cs, dev branch of b3b00/csly on GitHub).

A lexer post processor can be added when building a lexer or a parser:

  • parser:
    builder.BuildParser(parserInstance, ParserType.EBNF_LL_RECURSIVE_DESCENT, $"{nameof(FormulaParser)}_expressions",
        lexerPostProcess: postProcessFormula);
  • lexer:
    LexerBuilder.BuildLexer<FormulaToken>(lexerPostProcess: postProcessFormula);