0b1kn00b edited this page Nov 14, 2014 · 2 revisions

Writing a Lexer

Basic steps

  1. Create a lexer class that extends hxparse.Lexer.
  2. Define the lexer rulesets as static variables of that class.
  3. Instantiate your lexer class, passing it an instance of byte.ByteData and, optionally, the name of the source (used for position tracking).
  4. Call yourLexerInstance.token(ruleset) repeatedly with the ruleset to obtain successive tokens.
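The four steps can be combined into one small sketch. It assumes hxparse is installed from haxelib and that byte.ByteData.ofString is available to wrap a String; Token, CalcLexer and tok are hypothetical names chosen for illustration. It also assumes the convention used by hxparse lexers that a rule with the empty regex "" matches the end of the input.

```haxe
// Main.hx: a minimal sketch. Token, CalcLexer and tok are hypothetical names.
enum Token {
	TNum(n:Int);
	TPlus;
	TEof;
}

// Step 1: a lexer class extending hxparse.Lexer.
// Implementing RuleBuilder enables the @:rule shorthand used in step 2.
class CalcLexer extends hxparse.Lexer implements hxparse.RuleBuilder {
	// Step 2: a ruleset stored in a static variable.
	public static var tok = @:rule [
		"[ \t\r\n]+" => lexer.token(tok), // whitespace: skip and lex the next token
		"[0-9]+" => TNum(Std.parseInt(lexer.current)),
		"\\+" => TPlus,
		"" => TEof // assumed convention: the empty rule matches end of input
	];
}

class Main {
	static function main() {
		// Step 3: instantiate with the input bytes and a source name.
		var lexer = new CalcLexer(byte.ByteData.ofString("1 + 22"), "example");
		// Step 4: call token(ruleset) until the end-of-input token appears.
		var done = false;
		while (!done) {
			switch (lexer.token(CalcLexer.tok)) {
				case TEof:
					done = true;
				case t:
					trace(t);
			}
		}
	}
}
```

Because the whitespace rule returns lexer.token(tok), spaces never reach the loop in main; only TNum and TPlus tokens are traced.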

Rules

A ruleset maps regular expression strings to semantic actions: when a regular expression matches the input, its action runs and its result is returned as the token. Rules are written in regex => action map notation:


public static var token = hxparse.Lexer.build([
	"rule-regex-1" => function(lexer:hxparse.Lexer) {
		// semantic action 1
	},
	"rule-regex-2" => function(lexer) {
		// semantic action 2
	}
]);

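This can be simplified by having the lexer class implement hxparse.RuleBuilder and using the special @:rule metadata. Each action is then a plain expression, and the lexer instance is available inside it under the identifier lexer: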


public static var token = @:rule ([
	"rule-regex-1" => {
		// semantic action 1; the lexer instance is in scope as lexer
	},
	"rule-regex-2" => {
		// semantic action 2; the lexer instance is in scope as lexer
	}
]);

Semantic action API

Inside a semantic action, an instance of hxparse.Lexer is available, exposing the following API:

  • lexer.current: the String that was matched
  • lexer.curPos(): the position information of the matched String
  • lexer.token(ruleset): lexes and returns the next token using the given ruleset; returning its result from an action ignores the current token
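A short sketch of all three members in use. It assumes hxparse and byte.ByteData are available, that curPos() returns an hxparse.Position value, and that an empty-string rule matches the end of the input; Tok and WordLexer are hypothetical names. The whitespace and comment rules discard their match by returning lexer.token(tok), while the word rule keeps both lexer.current and lexer.curPos().

```haxe
// Main.hx: a sketch. Tok and WordLexer are hypothetical names.
enum Tok {
	TWord(text:String, pos:hxparse.Position); // assumed type of curPos()
	TEnd;
}

class WordLexer extends hxparse.Lexer implements hxparse.RuleBuilder {
	public static var tok = @:rule [
		// Whitespace and line comments: ignore the match by returning
		// the next token from the same ruleset.
		"[ \t\r\n]+" => lexer.token(tok),
		"//[^\n]*" => lexer.token(tok),
		// Words: keep the matched text and where it was found.
		"[a-zA-Z]+" => TWord(lexer.current, lexer.curPos()),
		// Assumed convention: the empty rule matches end of input.
		"" => TEnd
	];
}

class Main {
	static function main() {
		var src = "hello // ignored\nworld";
		var lexer = new WordLexer(byte.ByteData.ofString(src), "words.txt");
		var done = false;
		while (!done) {
			switch (lexer.token(WordLexer.tok)) {
				case TEnd:
					done = true;
				case TWord(text, pos):
					trace(text + " at " + pos);
			}
		}
	}
}
```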

RuleBuilder macro

Besides the @:rule metadata described above, implementing hxparse.RuleBuilder provides the following helpers:

  • The declaration static var keywords = @:mapping Keyword; transforms the constructors of enum Keyword into a keyword => Keyword map, where each key is the lower-case String representation of an enum constructor (for example, "public" => Keyword.Public).
  • The declaration @:ruleHelper static var someRule = "regex" => { /* semantic action */ }; defines a rule, someRule, that can be reused in multiple rulesets.
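A sketch of the @:mapping helper in a lexer, assuming hxparse and byte.ByteData are available and that the empty-string rule matches the end of the input. Keyword, Tok and KeywordLexer are hypothetical names; keywords is used as a String => Keyword map, as the description of @:mapping suggests, so a single identifier rule can tell keywords from plain identifiers.

```haxe
// Main.hx: a sketch. Keyword, Tok and KeywordLexer are hypothetical names.
enum Keyword {
	If;
	Else;
	Return;
}

enum Tok {
	TKeyword(k:Keyword);
	TIdent(name:String);
	TEnd;
}

class KeywordLexer extends hxparse.Lexer implements hxparse.RuleBuilder {
	// Expanded by the RuleBuilder macro into a String => Keyword map:
	// "if" => If, "else" => Else, "return" => Return.
	static var keywords = @:mapping Keyword;

	public static var tok = @:rule [
		"[ \t\r\n]+" => lexer.token(tok),
		// One rule for every word; keywords are found by a map lookup.
		"[a-zA-Z_][a-zA-Z0-9_]*" => {
			var kw = keywords.get(lexer.current);
			kw != null ? TKeyword(kw) : TIdent(lexer.current);
		},
		"" => TEnd // assumed convention: the empty rule matches end of input
	];
}

class Main {
	static function main() {
		var lexer = new KeywordLexer(byte.ByteData.ofString("if x return y"));
		var done = false;
		while (!done) {
			switch (lexer.token(KeywordLexer.tok)) {
				case TEnd:
					done = true;
				case t:
					trace(t);
			}
		}
	}
}
```

Looking keywords up in a map keeps the ruleset to one identifier rule; writing a separate regex per keyword would also work but grows with the number of keywords.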