A PHP implementation of a lexical analyzer.
Warning: this is not a generator like the classical lex. It does not produce any PHP code. It is a plain scanner that reads the given input string and splits it into a given set of tokens by matching regular expressions. Because of that, you can change the token definitions at runtime and use the same code for any token set.
Define the tokens in a YAML file. Each key is the regex pattern for a token, and the value is the token name. Patterns with an empty name are ignored and do not produce a token, but they keep the scan running instead of breaking it. On the other hand, if the scanner encounters a chunk it cannot recognize, it raises an exception.
# math.yml
\s:
\d+: number
\+: plus
-: minus
\*: mul
/: div
Note: A regex pattern MUST NOT include delimiters, nor the start and end of string meta-characters (^ and $). The lexer adds those itself during the scan. If an invalid pattern is supplied, an InvalidArgumentException is thrown while the configuration is loaded.
Construct the lexer with the token configuration from math.yml:
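To illustrate the note above, here is a minimal standalone sketch of how a bare pattern could be wrapped and checked before use. It is not the library's code: `assertValidPattern` is a hypothetical helper, and the `~` delimiter and the `^(?:...)` wrapping are assumptions made for this example (so the sketch also assumes the pattern itself contains no `~`).

```php
<?php
// Hypothetical helper, not the library's API: reject patterns PCRE cannot compile.
function assertValidPattern(string $pattern): void
{
    // Wrap the bare pattern with delimiters and a start anchor (assumed wrapping).
    $regex = '~^(?:' . $pattern . ')~';

    // preg_match() returns false (and emits a warning) when compilation fails;
    // "@" silences the warning so that only the exception surfaces.
    if (@preg_match($regex, '') === false) {
        throw new InvalidArgumentException("Invalid token pattern: {$pattern}");
    }
}

assertValidPattern('\d+'); // compiles, so nothing happens

try {
    assertValidPattern('(\d+'); // unbalanced parenthesis
} catch (InvalidArgumentException $ex) {
    print $ex->getMessage() . "\n";
}
```

Checking every pattern up front means a typo in the token file surfaces at load time rather than in the middle of a scan.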
$lexer = new Lexer(new LexConfig(new YamlFileConfig('math.yml')));
Now you are ready to tokenize the input string:
$lexer->tokenizeAsync(' 2131 + 33 / 567', function(Token $token) {
    print "{$token->getToken()}({$token->getValue()})\n";
});
This prints the following result:
number(2131)
plus(+)
number(33)
div(/)
number(567)
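The scan behind this output can be pictured with a short standalone sketch. It is an approximation under stated assumptions, not the library's implementation: `sketchScan` is a hypothetical function, the token set is given as a plain PHP array instead of being loaded from YAML, the first matching pattern wins, and the PCRE `A` modifier is used to anchor each match at the current offset.

```php
<?php
// Hypothetical function, not part of the library: a minimal regex scan loop.
function sketchScan(array $tokenMap, string $input, callable $onToken): void
{
    $offset = 0;
    $length = strlen($input);

    while ($offset < $length) {
        $matched = false;
        foreach ($tokenMap as $pattern => $name) {
            // Delimiters are added here; "A" anchors the match at $offset.
            $regex = '~(?:' . $pattern . ')~A';
            if (preg_match($regex, $input, $m, 0, $offset) === 1 && $m[0] !== '') {
                // Patterns with an empty name are skipped but still consume input.
                if ($name !== null && $name !== '') {
                    $onToken($name, $m[0]);
                }
                $offset += strlen($m[0]);
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            // The real lexer throws its own exception type; this one is a stand-in.
            throw new UnexpectedValueException("No pattern matches at offset {$offset}");
        }
    }
}

// Same token set as math.yml, written as an array for the sketch.
$map = [
    '\s'  => null,
    '\d+' => 'number',
    '\+'  => 'plus',
    '-'   => 'minus',
    '\*'  => 'mul',
    '/'   => 'div',
];

sketchScan($map, ' 2131 + 33 / 567', function (string $name, string $value) {
    print "{$name}({$value})\n";
});
```

Run as is, the sketch prints the same five lines shown above. Because the token set is ordinary data, swapping `$map` for another array changes the language being scanned without touching the loop.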
Alternatively, you can have all tokens collected into an array and returned together, without supplying a callback for individual tokens:
$result = $lexer->tokenize(' 2131 + 33 / 567');
print_r($result);
If, at any position, the input string does not match any regex pattern from the config, an UnknownTokenException is thrown. It carries the offset of the offending position:
try {
    $lexer->tokenize(' 2131 + blabla');
} catch (UnknownTokenException $ex) {
    print $ex->getOffset(); // 8
}
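For error messages, a raw offset is often easier to read as a line and column. The sketch below shows one way to do that conversion. `describeOffset` is a hypothetical helper, not part of the library, and it assumes the offset is a zero-based byte position, which is consistent with the value 8 in the example above.

```php
<?php
// Hypothetical helper, not part of the library: turn an offset into "line, column".
function describeOffset(string $input, int $offset): string
{
    // Lines are counted from 1; the column is the byte distance from the last newline.
    $before = substr($input, 0, $offset);
    $line = substr_count($before, "\n") + 1;
    $lastNewline = strrpos($before, "\n");
    $column = $lastNewline === false ? $offset + 1 : $offset - $lastNewline;

    return "line {$line}, column {$column}";
}

print describeOffset(' 2131 + blabla', 8) . "\n"; // line 1, column 9
```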