Simple Yet Powerful Parsing Library.
A library to create custom parsers. Based on the "ancient" concept of parser combinators, this library contains a vast variety of base parsers, decorators, combinators and helpers.
Parsers made with this library can be used in many ways. Parsing is transforming text into a usable structure.
This can be used for various purposes, whether it be transforming JSON / CSV / XML / YAML / etc. into some kind of data structure, or parsing a custom DSL or expression language into an abstract syntax tree.
Whether you wish to create your own file format, your own programming language, interpret existing file formats or languages... This library is here to help.
For hands-on how-tos, see the guide.
Install using Composer: composer require stratadox/parser
There are three base parsers: any, text and pattern.
- Any matches any single character.
- Text matches a predefined string.
- Pattern matches a regular expression.
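To make the three base parsers concrete, here is a minimal plain-PHP sketch of the idea behind them. It does not use or reproduce this library's implementation: the function names (`sketchAny`, `sketchText`, `sketchPattern`) and the `[ok, data, unparsed]` result shape are assumptions made up for illustration.

```php
<?php
// Conceptual sketch only; NOT the library's code. Each "parser" is a closure
// that takes the input and returns [ok, data, unparsed] (a made-up shape).

// Matches any single character. (Byte-based for brevity; not multibyte-aware.)
function sketchAny(): Closure
{
    return fn(string $in): array => $in === ''
        ? [false, null, $in]
        : [true, $in[0], substr($in, 1)];
}

// Matches a predefined string at the start of the input.
function sketchText(string $text): Closure
{
    return fn(string $in): array => str_starts_with($in, $text)
        ? [true, $text, substr($in, strlen($text))]
        : [false, null, $in];
}

// Matches a regular expression, anchored at the start of the input.
// Assumes the expression does not contain the "~" delimiter.
function sketchPattern(string $regex): Closure
{
    return function (string $in) use ($regex): array {
        if (preg_match('~^(?:' . $regex . ')~', $in, $m) !== 1) {
            return [false, null, $in];
        }
        return [true, $m[0], substr($in, strlen($m[0]))];
    };
}

// sketchAny()('abc')            gives [true, 'a', 'bc']
// sketchText('foo')('foobar')   gives [true, 'foo', 'bar']
// sketchPattern('\d+')('42x')   gives [true, '42', 'x']
```

Each parser consumes a prefix of the input and hands back the remainder, which is what lets decorators and combinators chain them together.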
These can be upgraded by a fair number of add-ons ("decorators"), which can be combined as needed:
- Repeatable applies the parser any number of times, yielding a list.
- Map modifies successful results based on a function.
- Full Map modifies all results based on a function.
- Ignore requires the thing to be there, and then ignores it. (Miauw)
- Maybe does not require it, but uses it if it's there.
- Optional combines the above two.
- Except "un-matches" if another parser succeeds.
- End returns an error state if there's unparsed content.
- All or Nothing fiddles with the parse error.
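As an illustration of what a decorator is, the following plain-PHP sketch wraps a parser in another parser. It is not the library's implementation: `sketchDigit`, `sketchRepeatable`, `sketchMap` and `sketchMaybe` are hypothetical names, the `[ok, data, unparsed]` result shape is an assumption, and this version of "repeatable" accepts zero matches, which may differ from the library's behaviour.

```php
<?php
// Conceptual sketch only; NOT the library's code. A "parser" here is a closure
// returning [ok, data, unparsed]; all names below are hypothetical.

// Base parser for the demo: a single ASCII digit.
function sketchDigit(): Closure
{
    return fn(string $in): array => $in !== '' && ctype_digit($in[0])
        ? [true, $in[0], substr($in, 1)]
        : [false, null, $in];
}

// "Repeatable": apply the parser as often as it matches, collecting a list.
function sketchRepeatable(Closure $parser): Closure
{
    return function (string $in) use ($parser): array {
        $items = [];
        while (true) {
            [$ok, $data, $rest] = $parser($in);
            // Stop on failure, or when nothing was consumed (avoids an endless loop).
            if (!$ok || $rest === $in) {
                break;
            }
            $items[] = $data;
            $in = $rest;
        }
        return [true, $items, $in];
    };
}

// "Map": transform the data of a successful result; failures pass through.
function sketchMap(Closure $parser, callable $fn): Closure
{
    return function (string $in) use ($parser, $fn): array {
        [$ok, $data, $rest] = $parser($in);
        return $ok ? [true, $fn($data), $rest] : [false, $data, $rest];
    };
}

// "Maybe": succeed either way, using the match only if it is there.
function sketchMaybe(Closure $parser): Closure
{
    return function (string $in) use ($parser): array {
        [$ok, $data, $rest] = $parser($in);
        return $ok ? [true, $data, $rest] : [true, null, $in];
    };
}

// Decorators stack: digits are repeated, then mapped to an integer.
$number = sketchMap(
    sketchRepeatable(sketchDigit()),
    fn(array $digits) => (int) implode('', $digits)
);
// $number('123abc') gives [true, 123, 'abc']
```

Because every decorator takes a parser and returns a parser, they can be layered in any order, which is the property the list above relies on.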
Parsers can be combined using these combinators:
- Either / Or returns the first matching parser of the lot.
- Sequence / AndThen puts several parsers one after the other.
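The two combinators can be pictured with a short plain-PHP sketch. This is an illustration of the general technique, not the library's code: `sketchLiteral`, `sketchEither` and `sketchSequence` are hypothetical names, and the `[ok, data, unparsed]` result shape is an assumption.

```php
<?php
// Conceptual sketch only; NOT the library's code. A "parser" here is a closure
// returning [ok, data, unparsed]; all names below are hypothetical.

// Base parser for the demo: matches a fixed string.
function sketchLiteral(string $text): Closure
{
    return fn(string $in): array => str_starts_with($in, $text)
        ? [true, $text, substr($in, strlen($text))]
        : [false, null, $in];
}

// "Either": try each parser in order and return the first success.
function sketchEither(Closure ...$parsers): Closure
{
    return function (string $in) use ($parsers): array {
        foreach ($parsers as $parser) {
            $result = $parser($in);
            if ($result[0]) {
                return $result;
            }
        }
        return [false, null, $in];
    };
}

// "Sequence": run the parsers one after the other, each on the remainder
// left by the previous one; fail as a whole if any of them fails.
function sketchSequence(Closure ...$parsers): Closure
{
    return function (string $in) use ($parsers): array {
        $rest = $in;
        $items = [];
        foreach ($parsers as $parser) {
            [$ok, $data, $rest] = $parser($rest);
            if (!$ok) {
                return [false, null, $in];
            }
            $items[] = $data;
        }
        return [true, $items, $rest];
    };
}

$sign = sketchEither(sketchLiteral('+'), sketchLiteral('-'));
$signedOne = sketchSequence($sign, sketchLiteral('1'));
// $signedOne('-1!') gives [true, ['-', '1'], '!']
// $signedOne('-2')  gives [false, null, '-2']
```

Note that a failed sequence reports the original input as unparsed, so an enclosing "either" can try the next alternative from the same position.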
All the above can be mixed and combined at will. To make life easier, there's a bunch of combinator shortcuts for "everyday tasks":
- Between matches the parser's content between start and end.
- Between Escaped matches unescaped content between start and end.
- Split yields one or more results, split by a delimiter.
- Must Split yields two or more results, split by a delimiter.
- Keep Split yields a structure like {delimiter: [left, right]}.
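A shortcut such as Split can be thought of as a small composition of the pieces above. The plain-PHP sketch below shows one way that could look; it is not the library's implementation, and `sketchWord`, `sketchComma` and `sketchSplit` are hypothetical names built on an assumed `[ok, data, unparsed]` result shape.

```php
<?php
// Conceptual sketch only; NOT the library's code. A "parser" here is a closure
// returning [ok, data, unparsed]; all names below are hypothetical.

// Demo item parser: one or more lowercase ASCII letters.
function sketchWord(): Closure
{
    return function (string $in): array {
        $len = strspn($in, 'abcdefghijklmnopqrstuvwxyz');
        return $len > 0
            ? [true, substr($in, 0, $len), substr($in, $len)]
            : [false, null, $in];
    };
}

// Demo delimiter parser: a single comma.
function sketchComma(): Closure
{
    return fn(string $in): array => str_starts_with($in, ',')
        ? [true, ',', substr($in, 1)]
        : [false, null, $in];
}

// "Split": one or more items separated by a delimiter; delimiters are dropped.
function sketchSplit(Closure $item, Closure $delimiter): Closure
{
    return function (string $in) use ($item, $delimiter): array {
        [$ok, $first, $rest] = $item($in);
        if (!$ok) {
            return [false, null, $in];
        }
        $items = [$first];
        while (true) {
            [$delimiterOk, , $afterDelimiter] = $delimiter($rest);
            if (!$delimiterOk) {
                break;
            }
            [$itemOk, $next, $afterItem] = $item($afterDelimiter);
            if (!$itemOk) {
                break; // a trailing delimiter is left unparsed
            }
            $items[] = $next;
            $rest = $afterItem;
        }
        return [true, $items, $rest];
    };
}

$words = sketchSplit(sketchWord(), sketchComma());
// $words('a,bc,d;x') gives [true, ['a', 'bc', 'd'], ';x']
```

A "must split" variant could be layered on top by rejecting results with fewer than two items.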
There are several additional helpers, which are essentially mapping shortcuts:
- Join implodes the array result into a string.
- Non-Empty refuses empty results.
- At Least refuses arrays with fewer than x entries.
- At Most refuses arrays with more than x entries.
- First transforms an array result into its first item.
- Item transforms an array result into its nth item.
To enable lazy parsers (and/or to provide a structure), different containers are available:
- Lazy Container manages lazy loading, essential for recursive parsers.
- Eager Container is a basic typed list of regular parsers.
- Recursion-Safe Lazy Container prevents infinite looping on left-recursion.
- Grammar Container mixes lazy and eager containers.
For a basic "real life" example, here's a simple CSV parser:
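<?php
use Stratadox\Parser\Helpers\Between;
use Stratadox\Parser\Parser;
use function Stratadox\Parser\any;
use function Stratadox\Parser\pattern;

function csvParser(
    Parser|string $sep = ',',
    Parser|string $esc = '"',
): Parser {
    $newline = pattern('\r\n|\r|\n');
    // A field is either a quoted string with escaped quotes...
    return Between::escaped('"', '"', $esc)
        // ...or a run of characters that are not a newline, separator or escape.
        ->or(any()->except($newline->or($sep)->or($esc))->repeatableString())
        // A line (if present) is two or more fields split by the separator.
        ->mustSplit($sep)->maybe()
        // The document is one or more lines split by newlines.
        ->split($newline)
        // Report an error if any input is left unparsed.
        ->end();
}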
(For associative result mapping, see the CSV example)
This next example parses basic arithmetic strings (e.g. 1 + -3 * 3 ^ 2) into an abstract syntax tree:
<?php
use Stratadox\Parser\Containers\Grammar;
use Stratadox\Parser\Containers\Lazy;
use Stratadox\Parser\Parser;
use function Stratadox\Parser\pattern;
use function Stratadox\Parser\text;

function calculationsParser(): Parser
{
    $grammar = Grammar::with($lazy = Lazy::container());

    $sign = text('+')->or('-')->maybe();
    $digits = pattern('\d+');
    $map = fn($op, $l, $r) => [
        'op' => $op,
        'arg' => [$l, $r],
    ];

    $grammar['prio 0'] = $sign->andThen($digits, '.', $digits)->join()->map(fn($x) => (float) $x)
        ->or($sign->andThen($digits)->join()->map(fn($x) => (int) $x))
        ->between(text(' ')->or("\t", "\n", "\r")->repeatable()->optional());

    $lazy['prio 1'] = $grammar['prio 0']->andThen('^', $grammar['prio 0'])->map(fn($a) => [
        'op' => '^',
        'arg' => [$a[0], $a[2]],
    ])->or($grammar['prio 0']);

    $grammar['prio 2'] = $grammar['prio 1']->keepSplit(['*', '/'], $map)->or($grammar['prio 1']);
    $grammar['prio 3'] = $grammar['prio 2']->keepSplit(['+', '-'], $map)->or($grammar['prio 2']);

    return $grammar['prio 3']->end();
}
(For a working example, see the Calculator example)
Additional documentation is available through the guide, the reference and/or the tests.