aerialship/Lex

Lex

A PHP implementation of a lexical analyzer.

Warning: This is not a generator like the classical lex. It does not produce any PHP code. It is a plain scanner that tokenizes a given input string into a given set of tokens by matching regular expressions. You can therefore change the token definitions at runtime and reuse the same code for any token set.

Usage

Token definition

Define the tokens in a YAML file. Each key is the regex pattern for a token, and each value is the token name. Patterns with an empty name are ignored: a match produces no token, but it keeps the scan running instead of breaking it. If, on the other hand, the scanner encounters a chunk that no pattern recognizes, it throws an exception.

# math.yml
\s:
\d+: number
\+: plus
-: minus
\*: mul
/: div

Note: A regex pattern MUST NOT include delimiters or the start- and end-of-string metacharacters (^ and $). The lexer adds these to each pattern itself during the scan. If an invalid pattern is supplied, an InvalidArgumentException is thrown while the configuration is loaded.
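To illustrate why patterns are written without delimiters or anchors, here is a minimal sketch of how a bare pattern could be wrapped and validated at load time. The function name `wrapPattern` and the `~` delimiter are assumptions made for this example; they are not taken from the library's code.

```php
<?php
// Hypothetical helper, not part of the library: wraps a bare pattern with
// a delimiter and a start anchor, then checks that the result compiles.
function wrapPattern(string $pattern): string
{
    // Escape the chosen delimiter (~) so a pattern such as "a~b" stays valid.
    // A pattern like "/" needs no escaping because "/" is not the delimiter here.
    $wrapped = '~^(?:' . str_replace('~', '\~', $pattern) . ')~';

    // preg_match() returns false for a pattern that fails to compile;
    // @ silences the accompanying warning so only the exception surfaces.
    if (@preg_match($wrapped, '') === false) {
        throw new InvalidArgumentException("Invalid pattern: {$pattern}");
    }

    return $wrapped;
}
```

With this sketch, `wrapPattern('\d+')` yields `~^(?:\d+)~`, while an unbalanced pattern such as `(` raises an InvalidArgumentException.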

Scanning with a callback

Construct the lexer with this token config:

$lexer = new Lexer(new LexConfig(new YamlFileConfig('math.yml')));

Now you are ready to tokenize the input string:

$lexer->tokenizeAsync(' 2131 + 33   / 567', function(Token $token) {
    print "{$token->getToken()}({$token->getValue()})\n";
});

This prints the following output:

number(2131)
plus(+)
number(33)
div(/)
number(567)
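For intuition about how the scan arrives at the output above, the following is a rough, self-contained sketch of a regex-driven scan loop. It is a simplified stand-in, not the library's actual implementation: the function `sketchTokenize`, the array-based token map, and the use of UnexpectedValueException are all assumptions made for this example.

```php
<?php
// Hypothetical stand-in for the scan loop; the real library may work differently.
// $tokens maps a delimiter-free pattern to a token name ('' or null = ignore the match).
function sketchTokenize(string $input, array $tokens): array
{
    $result = [];
    $offset = 0;
    $length = strlen($input);

    while ($offset < $length) {
        $matched = false;
        $rest = substr($input, $offset);

        foreach ($tokens as $pattern => $name) {
            // Anchor the pattern at the start of the remaining input.
            $regex = '~^(?:' . $pattern . ')~';

            // Skip empty matches so the loop always advances.
            if (preg_match($regex, $rest, $m) === 1 && $m[0] !== '') {
                if ($name !== null && $name !== '') {
                    $result[] = [$name, $m[0]];
                }
                $offset += strlen($m[0]);
                $matched = true;
                break;
            }
        }

        if (!$matched) {
            throw new UnexpectedValueException("Unknown token at offset {$offset}");
        }
    }

    return $result;
}

// The same definitions as math.yml, written as a PHP array.
$mathTokens = [
    '\s'  => '',
    '\d+' => 'number',
    '\+'  => 'plus',
    '-'   => 'minus',
    '\*'  => 'mul',
    '/'   => 'div',
];
```

Calling `sketchTokenize(' 2131 + 33   / 567', $mathTokens)` returns five name/value pairs in the same order as the printed output: number, plus, number, div, number. The whitespace pattern matches but contributes nothing because its name is empty.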

Scan and return array of tokens

Alternatively, you can have all tokens collected into an array and returned together, without supplying a callback for individual tokens:

$result = $lexer->tokenize(' 2131 + 33   / 567');
print_r($result);

Unknown token exception

If the content of the input string at some position does not match any regex pattern from the config, an UnknownTokenException is thrown, carrying the offset of that position:

try {
    $lexer->tokenize(' 2131 + blabla');
} catch (UnknownTokenException $ex) {
    print $ex->getOffset();  // 8
}
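The offset can be turned into a more readable error report. The helper below is a small sketch of one way to do that; `pointAtOffset` is a hypothetical name and is not provided by the library. It assumes a single-line input and a byte offset.

```php
<?php
// Hypothetical helper, not part of the library: returns the input followed by
// a second line with a caret placed under the character at the given offset.
function pointAtOffset(string $input, int $offset): string
{
    return $input . "\n" . str_repeat(' ', $offset) . '^';
}
```

Inside the catch block above, `print pointAtOffset(' 2131 + blabla', $ex->getOffset());` would show the input with a caret under the `b` of `blabla`.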