This page contains brief descriptions of all PEGTL rule and combinator classes.
The information about how much input is consumed by the rules only applies when the rules succeed; the PEGTL is implemented in a way that assumes that rules never consume input when they do not succeed.

Remember that there are two failure modes, only the first of which usually leads to back-tracking:

- Local failure or a return value of `false`, in which case the rule must rewind the input to the position at which the rule match was attempted.
- Global failure or an exception (usually of type `tao::pegtl::parse_error`) that is usually generated by a control class' `raise()` method.

Some rule classes are said to be equivalent to a combination of other rules. These rules are not completely equivalent to the shown definition because that is not how they are implemented, therefore:

- Rule equivalence is with regard to which inputs will match, but:
- not with regard to which actions will be invoked while matching.

However, rule equivalence does show exactly where the `raise<>` rule is inserted and therefore which rule will be used to call the control class' `raise()` method.
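The two failure modes can be sketched without PEGTL itself. In the following standalone C++17 example, which is an illustration and not PEGTL code, `match_literal` fails locally by returning `false` with the position untouched, while `expect_literal` fails globally by throwing. `match_literal`, `expect_literal` and `sketch_error` are hypothetical names; `sketch_error` merely stands in for `tao::pegtl::parse_error`.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical stand-in for a parse error; NOT tao::pegtl::parse_error.
struct sketch_error : std::runtime_error
{
   using std::runtime_error::runtime_error;
};

// Local failure: return false and leave `pos` where the match was attempted.
// Precondition: pos <= in.size().
bool match_literal( std::string_view in, std::size_t& pos, std::string_view lit )
{
   if( in.substr( pos, lit.size() ) != lit ) {
      return false;  // nothing was consumed, so nothing needs rewinding
   }
   pos += lit.size();
   return true;
}

// Global failure: report the problem with an exception instead of `false`.
void expect_literal( std::string_view in, std::size_t& pos, std::string_view lit )
{
   if( !match_literal( in, pos, lit ) ) {
      throw sketch_error( "expected '" + std::string( lit ) + "' at offset " + std::to_string( pos ) );
   }
}
```

Only the first kind of failure lets an enclosing rule such as `sor<>` or `opt<>` try something else; an exception propagates through all of them until something like `try_catch<>` converts it back into local failure.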
- Meta Rules
- Combinators
- Convenience
- Action Rules
- Atomic Rules
- ASCII Rules
- UTF-8 Rules
- UTF-16 Rules
- UTF-32 Rules
- Full Index
## Meta Rules

These rules are in namespace `tao::pegtl`.
### `action< A, R... >`

- Equivalent to `seq< R... >`, but:
  - Uses the given class template `A` for actions.
  - Actions can still be disabled explicitly (via `disable`) or implicitly (via `at` or `not_at`).
### `control< C, R... >`

- Equivalent to `seq< R... >`, but:
  - Uses the given class template `C` as control class.
### `disable< R... >`

- Equivalent to `seq< R... >`, but:
  - Disables all actions.
### `discard`

- Equivalent to `success`, but:
  - Calls the input's `discard()` method.
  - See Incremental Input for details.
### `enable< R... >`

- Equivalent to `seq< R... >`, but:
  - Enables all actions (if any).
### `require< Num >`

- Succeeds if at least `Num` further input bytes are available.
- With Incremental Input, reads the bytes into the buffer.
### `state< S, R... >`

- Equivalent to `seq< R... >`, but:
  - Replaces all state arguments with a new instance `s` of type `S`.
  - `s` is constructed with the input and all previous states as arguments.
  - If `seq< R... >` succeeds, then `s.success()` is called with the input after the match and all previous states as arguments, and, if expected, with `A, M, Action, Control` as template parameters.
## Combinators

Combinators (or combinator rules) are rules that combine (other) rules into new ones.

These are the classical PEG combinator rules defined in namespace `tao::pegtl`.
### `at< R... >`

- PEG and-predicate &e
- Succeeds if and only if `seq< R... >` would succeed.
- Consumes nothing, i.e. rewinds after matching.
- Disables all actions.
- Allows local failure of `R...` even within `must<>` etc.
### `not_at< R... >`

- PEG not-predicate !e
- Succeeds if and only if `seq< R... >` would not succeed.
- Consumes nothing, i.e. rewinds after matching.
- Disables all actions.
- Allows local failure of `R...` even within `must<>` etc.
### `opt< R... >`

- PEG optional e?
- Optional `seq< R... >`, i.e. attempt to match `seq< R... >` and signal success regardless of the result.
- Equivalent to `sor< seq< R... >, success >`.
- Allows local failure of `R...` even within `must<>` etc.
### `plus< R, ... >`

- PEG one-or-more e+
- Matches `seq< R, ... >` as often as possible and succeeds if it matches at least once.
- Equivalent to `rep_min< 1, R, ... >`.
- Requires at least one rule `R`.
### `seq< R... >`

- PEG sequence e1 e2
- Sequence or conjunction of rules.
- Matches the given rules `R...` in the given order.
- Fails and stops matching when one of the given rules fails.
- Consumes everything that the rules `R...` consumed.
- Succeeds if no rule is given.
### `sor< R... >`

- PEG ordered choice e1 / e2
- Choice or disjunction of rules.
- Matches the given rules `R...` in the given order.
- Succeeds and stops matching when one of the given rules succeeds.
- Consumes whatever the first rule that succeeded consumed.
- Allows local failure of `R...` even within `must<>` etc.
- Fails if no rule is given.
### `star< R, ... >`

- PEG zero-or-more e*
- Matches `seq< R, ... >` as often as possible and always succeeds.
- Allows local failure of `R, ...` even within `must<>` etc.
- Requires at least one rule `R`.
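The combinators above can be modeled in a few dozen lines of standalone C++17. The following is a hypothetical mini-model in namespace `mini`, not PEGTL's implementation (which additionally deals with actions, control classes and its own input classes). Each rule is a struct with a static `match()` that follows the rewind-on-local-failure contract stated at the top of this page.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical mini-model of the PEG combinators; NOT PEGTL's implementation.
// Contract: on success a rule advances `pos`; on local failure it returns
// false and leaves `pos` unchanged.
namespace mini
{
   template< char C >
   struct one {
      static bool match( std::string_view in, std::size_t& pos ) {
         if( pos < in.size() && in[ pos ] == C ) {
            ++pos;
            return true;
         }
         return false;
      }
   };

   template< typename... R >
   struct seq {
      static bool match( std::string_view in, std::size_t& pos ) {
         const std::size_t saved = pos;
         if( ( R::match( in, pos ) && ... ) ) {  // empty pack folds to true
            return true;
         }
         pos = saved;  // undo whatever the earlier rules consumed
         return false;
      }
   };

   template< typename... R >
   struct sor {
      static bool match( std::string_view in, std::size_t& pos ) {
         return ( R::match( in, pos ) || ... );  // first success wins; empty pack folds to false
      }
   };

   template< typename... R >
   struct opt {
      static bool match( std::string_view in, std::size_t& pos ) {
         seq< R... >::match( in, pos );  // result ignored; seq rewinds itself on failure
         return true;
      }
   };

   template< typename R, typename... Rs >
   struct star {
      static bool match( std::string_view in, std::size_t& pos ) {
         // The pos != before check only keeps the sketch from looping on empty matches.
         for( std::size_t before = pos; seq< R, Rs... >::match( in, pos ) && pos != before; before = pos ) {
         }
         return true;
      }
   };

   template< typename R, typename... Rs >
   struct plus : seq< R, Rs..., star< R, Rs... > > {};

   template< typename... R >
   struct at {
      static bool match( std::string_view in, std::size_t& pos ) {
         std::size_t probe = pos;  // match on a copy, so nothing is consumed
         return seq< R... >::match( in, probe );
      }
   };

   template< typename... R >
   struct not_at {
      static bool match( std::string_view in, std::size_t& pos ) {
         std::size_t probe = pos;
         return !seq< R... >::match( in, probe );
      }
   };

   // Helper for the examples: does Rule match the whole of `in`?
   template< typename Rule >
   bool matches_all( std::string_view in ) {
      std::size_t pos = 0;
      return Rule::match( in, pos ) && pos == in.size();
   }
}  // namespace mini
```

For example, `mini::sor< mini::seq< mini::one< 'a' >, mini::one< 'b' > >, mini::one< 'a' > >` accepts the input `a`: the failed `seq` rewinds before the second alternative is tried, which is exactly the back-tracking that local failure permits. The `pos != before` guard in `star` is specific to the sketch.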
## Convenience

The PEGTL offers a variety of convenience rules which help with writing concise grammars as well as offering performance benefits over the equivalent implementation with classical PEG combinators.

These rules are in namespace `tao::pegtl`.
### `if_must< R, S... >`

- Attempts to match `R` and depending on the result proceeds with either `must< S... >` or `failure`.
- Equivalent to `seq< R, must< S... > >`.
- Equivalent to `if_then_else< R, must< S... >, failure >`.
### `if_must_else< R, S, T >`

- Attempts to match `R` and depending on the result proceeds with either `must< S >` or `must< T >`.
- Equivalent to `if_then_else< R, must< S >, must< T > >`.
### `if_then_else< R, S, T >`

- Equivalent to `sor< seq< R, S >, seq< not_at< R >, T > >`.
### `list< R, S >`

- Matches a non-empty list of `R` separated by `S`.
- Equivalent to `seq< R, star< S, R > >`.
### `list< R, S, P >`

- Matches a non-empty list of `R` separated by `S` where each `S` can be padded by `P`.
- Equivalent to `seq< R, star< pad< S, P >, R > >`.
### `list_must< R, S >`

- Matches a non-empty list of `R` separated by `S`.
- Similar to `list< R, S >`, but if there is an `S` it must be followed by an `R`.
- Equivalent to `seq< R, star< if_must< S, R > > >`.
### `list_must< R, S, P >`

- Matches a non-empty list of `R` separated by `S` where each `S` can be padded by `P`.
- Similar to `list< R, S, P >`, but if there is an `S` it must be followed by an `R`.
- Equivalent to `seq< R, star< if_must< pad< S, P >, R > > >`.
### `list_tail< R, S >`

- Matches a non-empty list of `R` separated by `S` with optional trailing `S`.
- Equivalent to `seq< list< R, S >, opt< S > >`.
### `list_tail< R, S, P >`

- Matches a non-empty list of `R` separated by `S` with optional trailing `S` and padding `P` inside the list.
- Equivalent to `seq< list< R, S, P >, opt< star< P >, S > >`.
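The differences between `list`, `list_tail` and `list_must` are easiest to see on a trailing separator. The following standalone C++17 sketch (not PEGTL code) hard-codes `R` as one decimal digit and `S` as `','`; the function names are hypothetical, and the `std::runtime_error` merely stands in for the global failure that `must<>` would raise.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Hypothetical model with R = one decimal digit and S = ','.
bool digit( std::string_view in, std::size_t& pos )
{
   if( pos < in.size() && in[ pos ] >= '0' && in[ pos ] <= '9' ) {
      ++pos;
      return true;
   }
   return false;
}

bool comma( std::string_view in, std::size_t& pos )
{
   if( pos < in.size() && in[ pos ] == ',' ) {
      ++pos;
      return true;
   }
   return false;
}

// list< R, S >  ==  seq< R, star< S, R > >
bool digit_list( std::string_view in, std::size_t& pos )
{
   if( !digit( in, pos ) ) {
      return false;
   }
   for( ;; ) {
      const std::size_t saved = pos;
      if( !( comma( in, pos ) && digit( in, pos ) ) ) {
         pos = saved;  // a dangling ',' is not consumed
         return true;
      }
   }
}

// list_tail< R, S >  ==  seq< list< R, S >, opt< S > >
bool digit_list_tail( std::string_view in, std::size_t& pos )
{
   if( !digit_list( in, pos ) ) {
      return false;
   }
   comma( in, pos );  // optional trailing separator
   return true;
}

// list_must< R, S >  ==  seq< R, star< if_must< S, R > > >
bool digit_list_must( std::string_view in, std::size_t& pos )
{
   if( !digit( in, pos ) ) {
      return false;
   }
   while( comma( in, pos ) ) {
      if( !digit( in, pos ) ) {
         throw std::runtime_error( "digit expected after ','" );  // global failure
      }
   }
   return true;
}
```

On `1,2,` the plain list stops before the dangling comma, the tail variant consumes it, and the must variant turns it into a global failure.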
### `minus< M, S >`

- Succeeds if `M` matches, and then `S` does not match all of the input that `M` matched.
- Does not call actions for `S` (unless `S` contains `enable<>`).
- Ignores `S` for the grammar analysis.
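A standalone sketch of `minus< M, S >` (not PEGTL code), with the hypothetical choices `M` = one or more lower-case ASCII letters and `S` = the literal `if`. The key detail is that `S` is tried only against the text that `M` matched, and only a match of all of that text causes failure.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical M: one or more lower-case ASCII letters.
bool letters( std::string_view in, std::size_t& pos )
{
   const std::size_t start = pos;
   while( pos < in.size() && in[ pos ] >= 'a' && in[ pos ] <= 'z' ) {
      ++pos;
   }
   return pos != start;
}

// Hypothetical S: the literal "if".
bool lit_if( std::string_view in, std::size_t& pos )
{
   if( in.substr( pos, 2 ) != "if" ) {
      return false;
   }
   pos += 2;
   return true;
}

// Sketch of minus< letters, lit_if >.
bool minus_letters_if( std::string_view in, std::size_t& pos )
{
   const std::size_t start = pos;
   if( !letters( in, pos ) ) {
      return false;
   }
   // S only ever sees the text that M matched.
   const std::string_view matched = in.substr( start, pos - start );
   std::size_t spos = 0;
   if( lit_if( matched, spos ) && spos == matched.size() ) {
      pos = start;  // S covered all of M's match: local failure, rewind
      return false;
   }
   return true;
}
```

So `iffy` is accepted, because `if` covers only part of it, while `if` on its own is rejected.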
### `must< R... >`

- Equivalent to `seq< R... >`, but:
  - Converts local failure of `R...` into global failure.
  - Calls `raise< R >` for the `R` that failed.
- Equivalent to `seq< sor< R, raise< R > >... >`.
### `pad< R, S, T = S >`

- Matches an `R` that can be padded by arbitrarily many `S` on the left and `T` on the right.
- Equivalent to `seq< star< S >, R, star< T > >`.
### `pad_opt< R, P >`

- Matches an optional `R` that can be padded by arbitrarily many `P`, or just arbitrarily many `P`.
- Equivalent to `seq< star< P >, opt< R, star< P > > >`.
### `rep< Num, R... >`

- Matches `seq< R... >` for `Num` times without checking for further matches.
- Equivalent to `seq< seq< R... >, ..., seq< R... > >` where `seq< R... >` is repeated `Num` times.
### `rep_max< Max, R... >`

- Matches `seq< R... >` for at most `Max` times and verifies that it doesn't match more often.
- Equivalent to `rep_min_max< 0, Max, R... >`.
### `rep_min< Min, R, ... >`

- Matches `seq< R, ... >` as often as possible and succeeds if it matches at least `Min` times.
- Equivalent to `seq< rep< Min, R, ... >, star< R, ... > >`.
- Requires at least one rule `R`.
### `rep_min_max< Min, Max, R... >`

- Matches `seq< R... >` for `Min` to `Max` times and verifies that it doesn't match more often.
- Equivalent to `seq< rep< Min, R... >, rep_opt< Max - Min, R... >, not_at< R... > >`.
### `rep_opt< Num, R... >`

- Matches `seq< R... >` for zero to `Num` times without checking for further matches.
- Equivalent to `rep< Num, opt< R... > >`.
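The difference between `rep_opt` and `rep_min_max` is the final `not_at< R... >` check. Below is a standalone sketch (not PEGTL code) with `R` hard-coded to the byte `'a'`; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical R: the single byte 'a'.
bool one_a( std::string_view in, std::size_t& pos )
{
   if( pos < in.size() && in[ pos ] == 'a' ) {
      ++pos;
      return true;
   }
   return false;
}

// rep_opt< Num, R >: up to Num matches, no look-ahead afterwards.
template< std::size_t Num >
bool rep_opt_a( std::string_view in, std::size_t& pos )
{
   for( std::size_t i = 0; i < Num && one_a( in, pos ); ++i ) {
   }
   return true;
}

// rep_min_max< Min, Max, R >
//   == seq< rep< Min, R >, rep_opt< Max - Min, R >, not_at< R > >
template< std::size_t Min, std::size_t Max >
bool rep_min_max_a( std::string_view in, std::size_t& pos )
{
   const std::size_t saved = pos;
   std::size_t count = 0;
   while( count < Max && one_a( in, pos ) ) {
      ++count;
   }
   std::size_t probe = pos;  // not_at< R >: look ahead without consuming
   if( count < Min || one_a( in, probe ) ) {
      pos = saved;  // too few matches, or a further R follows
      return false;
   }
   return true;
}
```

With `Min = 2` and `Max = 3`, the input `aaaa` is rejected because a fourth `a` follows, whereas `rep_opt< 3, ... >` simply stops after three.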
### `star_must< R, S... >`

- Equivalent to `star< if_must< R, S... > >`.
### `try_catch< R... >`

- Equivalent to `seq< R... >`, but:
  - Converts global failure (exception) into local failure (return value `false`).
  - Catches exceptions of type `tao::pegtl::parse_error`.
### `try_catch_type< E, R... >`

- Equivalent to `seq< R... >`, but:
  - Converts global failure (exception) into local failure (return value `false`).
  - Catches exceptions of type `E`.
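How `must<>` and `try_catch<>` interact can be modeled in a few lines of standalone C++17. The sketch hard-codes `must< one< 'a' >, one< 'b' > >`; `sketch_parse_error`, `byte_is`, `must_ab` and `try_catch_must_ab` are hypothetical names, and the exception type only stands in for `tao::pegtl::parse_error`.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string_view>

// Hypothetical stand-in for tao::pegtl::parse_error.
struct sketch_parse_error : std::runtime_error
{
   using std::runtime_error::runtime_error;
};

bool byte_is( std::string_view in, std::size_t& pos, char c )
{
   if( pos < in.size() && in[ pos ] == c ) {
      ++pos;
      return true;
   }
   return false;
}

// Model of must< one< 'a' >, one< 'b' > >: any local failure becomes an exception.
bool must_ab( std::string_view in, std::size_t& pos )
{
   if( !byte_is( in, pos, 'a' ) || !byte_is( in, pos, 'b' ) ) {
      throw sketch_parse_error( "expected \"ab\"" );
   }
   return true;
}

// Model of try_catch< must_ab >: turns the exception back into local failure.
bool try_catch_must_ab( std::string_view in, std::size_t& pos )
{
   const std::size_t saved = pos;
   try {
      return must_ab( in, pos );
   }
   catch( const sketch_parse_error& ) {
      pos = saved;  // 'a' may already have been consumed
      return false;
   }
}
```

A model of `try_catch_type< E, R... >` would differ only in the type named in the `catch` clause.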
### `until< R >`

- Consumes all input until `R` matches.
- Equivalent to `until< R, any >`.
### `until< R, S, ... >`

- Matches `seq< S, ... >` as long as `at< R >` does not match and succeeds when `R` matches.
- Equivalent to `seq< star< not_at< R >, not_at< eof >, S, ... >, R >`.
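As an illustration of `until< R >`, here is a standalone sketch (not PEGTL code) that consumes the body of a C-style comment up to and including the closing `*/`. The function name is hypothetical and `R` is hard-coded to that two-byte string.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical model of until< R > (i.e. until< R, any >) with R = "*/".
bool until_comment_end( std::string_view in, std::size_t& pos )
{
   const std::size_t saved = pos;
   for( ;; ) {
      if( in.substr( pos, 2 ) == "*/" ) {  // R matches: consume it and succeed
         pos += 2;
         return true;
      }
      if( pos == in.size() ) {  // eof before R: local failure, rewind
         pos = saved;
         return false;
      }
      ++pos;  // S = any: consume one byte, then try R again
   }
}
```

As with `until< R, any >`, the terminator itself is part of what is consumed, and reaching the end of the input first is a local failure.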
## Action Rules

These rules are in namespace `tao::pegtl`.

These rules replicate the intrusive way actions were called from within the grammar in the PEGTL 0.x with the `apply<>` and `if_apply<>` rules. The actions for these rules are classes (rather than class templates as required for the `parse()` functions and `action<>` rule). These rules respect the current `apply_mode`, but neither use the control class to invoke the actions, nor support actions that return `bool`.
### `apply< A... >`

- Equivalent to `success` with respect to parsing, but also:
  - Calls `A::apply()` for all `A`, in order, with an empty input and all states as arguments.
### `apply0< A... >`

- Equivalent to `success` with respect to parsing, but also:
  - Calls `A::apply0()` for all `A`, in order, with all states as arguments.
### `if_apply< R, A... >`

- Equivalent to `R` with respect to parsing, but also:
  - If `R` matches, calls `A::apply()` for all `A`, in order, with the input matched by `R` and all states as arguments.
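The calling convention of `if_apply< R, A... >` can be sketched in standalone C++17. Everything here is hypothetical and simplified: rules expose a static `match()`, the actions are plain classes (as this section describes), and the matched text is passed as a `std::string_view`, whereas PEGTL passes an input object.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical rule: one or more ASCII decimal digits.
struct digits
{
   static bool match( std::string_view in, std::size_t& pos )
   {
      const std::size_t start = pos;
      while( pos < in.size() && in[ pos ] >= '0' && in[ pos ] <= '9' ) {
         ++pos;
      }
      return pos != start;  // nothing consumed means failure
   }
};

// Sketch of if_apply< R, A... >: match R, then call every A::apply(), in
// order, with the matched text and all states.
template< typename R, typename... A >
struct if_apply_sketch
{
   template< typename... States >
   static bool match( std::string_view in, std::size_t& pos, States&... st )
   {
      const std::size_t start = pos;
      if( !R::match( in, pos ) ) {
         return false;
      }
      const std::string_view matched = in.substr( start, pos - start );
      ( A::apply( matched, st... ), ... );  // actions are plain classes, called in order
      return true;
   }
};

// Hypothetical actions: plain classes with a static apply().
struct store_text
{
   static void apply( std::string_view m, std::string& text, int& /*calls*/ ) { text = std::string( m ); }
};

struct bump_counter
{
   static void apply( std::string_view /*m*/, std::string& /*text*/, int& calls ) { ++calls; }
};
```

Because the actions run only after `R` has matched, a failed match leaves all states untouched.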
## Atomic Rules

These rules are in namespace `tao::pegtl`.

Atomic rules do not rely on other rules.
### `bof`

- Succeeds at "beginning-of-file", i.e. when the input's `byte()` method returns zero.
- Does not consume input.
- Does not work with inputs that don't have a `byte()` method.
### `bol`

- Succeeds at "beginning-of-line", i.e. when the input's `byte_in_line()` method returns zero.
- Does not consume input.
- Does not work with inputs that don't have a `byte_in_line()` method.
### `bytes< Num >`

- Succeeds when the input contains at least `Num` further bytes.
- Consumes these `Num` bytes from the input.
### `eof`

- Succeeds at "end-of-file", i.e. when the input is empty or all input has been consumed.
- Does not consume input.
### `failure`

- Dummy rule that never succeeds.
- Does not consume input.
### `raise< T >`

- Generates a global failure.
- Calls the control class' `Control< T >::raise()` method.
- `T` can be a rule, but it does not have to be a rule.
- Does not consume input.
### `success`

- Dummy rule that always succeeds.
- Does not consume input.
## ASCII Rules

These rules are in the inline namespace `tao::pegtl::ascii`.

The ASCII rules operate on single bytes, without restricting the range of values to 7 bits. They are compatible with input with the 8th bit set in the sense that nothing breaks in their presence. Rules like `ascii::any` or `ascii::not_one< 'a' >` will match all possible byte values, and all possible byte values excluding `'a'`, respectively. However, the character class rules like `ascii::alpha` only match the corresponding ASCII characters.

(It is possible to match UTF-8 multi-byte characters with the ASCII rules; for example, the Euro sign code point `U+20AC`, which is encoded by the UTF-8 sequence `E2 82 AC`, can be matched by either `tao::pegtl::ascii::string< 0xe2, 0x82, 0xac >` or `tao::pegtl::utf8::one< 0x20ac >`.)
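The byte sequence `E2 82 AC` quoted above can be derived by hand. U+20AC needs three UTF-8 bytes: the lead byte `1110xxxx` carries the top 4 bits (`0x2`), and each continuation byte `10xxxxxx` carries 6 more bits (`0x02`, then `0x2c`). The small standalone encoder below is written for illustration only and is not part of PEGTL; `encode_utf8` is a hypothetical name, and it assumes its argument is at most `0x10ffff`.

```cpp
#include <cassert>
#include <string>

// Encode one code point (assumed to be <= 0x10ffff) as UTF-8 bytes.
std::string encode_utf8( char32_t cp )
{
   std::string out;
   if( cp < 0x80 ) {  // 0xxxxxxx
      out += static_cast< char >( cp );
   }
   else if( cp < 0x800 ) {  // 110xxxxx 10xxxxxx
      out += static_cast< char >( 0xc0 | ( cp >> 6 ) );
      out += static_cast< char >( 0x80 | ( cp & 0x3f ) );
   }
   else if( cp < 0x10000 ) {  // 1110xxxx 10xxxxxx 10xxxxxx
      out += static_cast< char >( 0xe0 | ( cp >> 12 ) );
      out += static_cast< char >( 0x80 | ( ( cp >> 6 ) & 0x3f ) );
      out += static_cast< char >( 0x80 | ( cp & 0x3f ) );
   }
   else {  // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
      out += static_cast< char >( 0xf0 | ( cp >> 18 ) );
      out += static_cast< char >( 0x80 | ( ( cp >> 12 ) & 0x3f ) );
      out += static_cast< char >( 0x80 | ( ( cp >> 6 ) & 0x3f ) );
      out += static_cast< char >( 0x80 | ( cp & 0x3f ) );
   }
   return out;
}
```

For U+20AC this yields exactly the three bytes that `ascii::string< 0xe2, 0x82, 0xac >` spells out byte by byte.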
### `alnum`

- Matches and consumes a single ASCII alphabetic or numeric character.
- Equivalent to `ranges< 'a', 'z', 'A', 'Z', '0', '9' >`.
### `alpha`

- Matches and consumes a single ASCII alphabetic character.
- Equivalent to `ranges< 'a', 'z', 'A', 'Z' >`.
### `any`

- Matches and consumes any single byte, including all ASCII characters.
- Equivalent to `bytes< 1 >`.
### `blank`

- Matches and consumes a single ASCII horizontal space or horizontal tabulator character.
- Equivalent to `one< ' ', '\t' >`.
### `digit`

- Matches and consumes a single ASCII decimal digit character.
- Equivalent to `range< '0', '9' >`.
### `eol`

- Depends on the `Eol` template parameter of the input; by default:
  - Matches and consumes a Unix or MS-DOS line ending, that is:
  - Equivalent to `sor< one< '\n' >, string< '\r', '\n' > >`.
### `eolf`

- Equivalent to `sor< eof, eol >`.
### `identifier_first`

- Matches and consumes a single ASCII character permissible as the first character of a C identifier.
- Equivalent to `ranges< 'a', 'z', 'A', 'Z', '_' >`.
### `identifier_other`

- Matches and consumes a single ASCII character permissible as a subsequent character of a C identifier.
- Equivalent to `ranges< 'a', 'z', 'A', 'Z', '0', '9', '_' >`.
### `identifier`

- Matches and consumes an ASCII identifier as defined for the C programming language.
- Equivalent to `seq< identifier_first, star< identifier_other > >`.
### `istring< C, ... >`

- Matches and consumes the given ASCII string `C, ...` with case-insensitive matching.
- Similar to `string< C, ... >`, but:
  - For ASCII letters a-z and A-Z the match is case-insensitive.
### `keyword< C, ... >`

- Matches and consumes a non-empty string not followed by a subsequent identifier character.
- Equivalent to `seq< string< C, ... >, not_at< identifier_other > >`.
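A standalone sketch of the `keyword<>` equivalence follows. Like the `istring<>` sketch, it takes the literal at run time and uses hypothetical names instead of PEGTL's template interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// identifier_other: ASCII letter, digit or underscore.
constexpr bool is_identifier_other( char c ) noexcept
{
   return ( c >= 'a' && c <= 'z' ) || ( c >= 'A' && c <= 'Z' ) || ( c >= '0' && c <= '9' ) || c == '_';
}

// Sketch of keyword< C... >  ==  seq< string< C... >, not_at< identifier_other > >.
// Precondition: pos <= in.size().
bool keyword_match( std::string_view in, std::size_t& pos, std::string_view kw )
{
   if( in.substr( pos, kw.size() ) != kw ) {
      return false;
   }
   const std::size_t end = pos + kw.size();
   if( end < in.size() && is_identifier_other( in[ end ] ) ) {
      return false;  // not_at< identifier_other > failed; nothing consumed
   }
   pos = end;
   return true;
}
```

This is why a keyword rule for `if` does not accidentally match the first two characters of an identifier such as `iffy` or `if_x`.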
### `lower`

- Matches and consumes a single ASCII lower-case alphabetic character.
- Equivalent to `range< 'a', 'z' >`.
### `not_one< C, ... >`

- Succeeds when the input is not empty, and:
  - The next input byte is not one of `C, ...`.
- Consumes one byte when it succeeds.
### `not_range< C, D >`

- Succeeds when the input is not empty, and:
  - The next input byte is not in the closed range `C ... D`.
- Consumes one byte when it succeeds.
### `nul`

- Matches and consumes an ASCII nul character.
- Equivalent to `one< 0 >`.
### `one< C, ... >`

- Succeeds when the input is not empty, and:
  - The next input byte is one of `C, ...`.
- Consumes one byte when it succeeds.
### `print`

- Matches and consumes any single ASCII character traditionally defined as printable.
- Equivalent to `range< 32, 126 >`.
### `range< C, D >`

- Succeeds when the input is not empty, and:
  - The next input byte is in the closed range `C ... D`.
- Consumes one byte when it succeeds.
### `ranges< C1, D1, C2, D2, ... >`

- Equivalent to `sor< range< C1, D1 >, range< C2, D2 >, ... >`.
### `ranges< C1, D1, C2, D2, ..., E >`

- Equivalent to `sor< range< C1, D1 >, range< C2, D2 >, ..., one< E > >`.
### `seven`

- Matches and consumes any single true ASCII character that fits into 7 bits.
- Equivalent to `range< 0, 127 >`.
### `shebang`

- Equivalent to `seq< string< '#', '!' >, until< eolf > >`.
### `space`

- Matches and consumes a single space, line-feed, carriage-return, horizontal-tab, vertical-tab or form-feed.
- Equivalent to `one< ' ', '\n', '\r', '\t', '\v', '\f' >`.
### `string< C1, C2, ... >`

- Matches and consumes a string, a sequence of bytes or single-byte characters.
- Equivalent to `seq< one< C1 >, one< C2 >, ... >`.
### `TAOCPP_PEGTL_ISTRING( "..." )`

- Macro where `TAOCPP_PEGTL_ISTRING( "foo" )` yields `istring< 'f', 'o', 'o' >`.
- The argument must be a string literal.
- Works for strings up to 512 bytes of length (excluding trailing `'\0'`).
- Strings may contain embedded `'\0'`.
### `TAOCPP_PEGTL_KEYWORD( "..." )`

- Macro where `TAOCPP_PEGTL_KEYWORD( "foo" )` yields `keyword< 'f', 'o', 'o' >`.
- The argument must be a string literal.
- Works for keywords up to 512 bytes of length (excluding trailing `'\0'`).
- Strings may contain embedded `'\0'`.
### `TAOCPP_PEGTL_STRING( "..." )`

- Macro where `TAOCPP_PEGTL_STRING( "foo" )` yields `string< 'f', 'o', 'o' >`.
- The argument must be a string literal.
- Works for strings up to 512 bytes of length (excluding trailing `'\0'`).
- Strings may contain embedded `'\0'`.
### `two< C >`

- Succeeds when the input contains at least two bytes, and:
  - These two input bytes are both `C`.
- Consumes two bytes when it succeeds.
### `upper`

- Matches and consumes a single ASCII upper-case alphabetic character.
- Equivalent to `range< 'A', 'Z' >`.
### `xdigit`

- Matches and consumes a single ASCII hexadecimal digit character.
- Equivalent to `ranges< '0', '9', 'a', 'f', 'A', 'F' >`.
## UTF-8 Rules

These rules are in namespace `tao::pegtl::utf8`.

A Unicode code point is considered valid when it is in the range `0` to `0x10ffff`.
### `any`

- Succeeds when the input is not empty, and:
  - The next 1-4 bytes are the UTF-8 encoding of a valid Unicode code point.
- Consumes the 1-4 bytes when it succeeds.
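What `utf8::any` has to check can be sketched as a small standalone decoder. It is illustrative only and not PEGTL's code; `decoded` and `peek_utf8` are hypothetical names. The sketch verifies the lead byte, the number and shape of the continuation bytes, and the `0x10ffff` upper bound from the definition above. It deliberately makes no claim about how PEGTL treats overlong encodings or surrogate code points.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

struct decoded
{
   char32_t cp;       // decoded code point
   std::size_t size;  // number of bytes it occupies
};

// Decode one code point (1-4 bytes) from the front of `in`, or return
// std::nullopt if the bytes are not a well-formed sequence or the value
// exceeds 0x10ffff. Overlong forms and surrogates are not examined here.
std::optional< decoded > peek_utf8( std::string_view in )
{
   if( in.empty() ) {
      return std::nullopt;
   }
   const auto b0 = static_cast< unsigned char >( in[ 0 ] );
   if( b0 < 0x80 ) {
      return decoded{ b0, 1 };  // plain ASCII
   }
   std::size_t size = 0;
   char32_t cp = 0;
   if( ( b0 & 0xe0 ) == 0xc0 ) { size = 2; cp = b0 & 0x1f; }
   else if( ( b0 & 0xf0 ) == 0xe0 ) { size = 3; cp = b0 & 0x0f; }
   else if( ( b0 & 0xf8 ) == 0xf0 ) { size = 4; cp = b0 & 0x07; }
   else { return std::nullopt; }  // stray continuation byte or invalid lead byte
   if( in.size() < size ) {
      return std::nullopt;  // truncated sequence
   }
   for( std::size_t i = 1; i < size; ++i ) {
      const auto b = static_cast< unsigned char >( in[ i ] );
      if( ( b & 0xc0 ) != 0x80 ) {
         return std::nullopt;  // expected a continuation byte 10xxxxxx
      }
      cp = ( cp << 6 ) | ( b & 0x3f );
   }
   if( cp > 0x10ffff ) {
      return std::nullopt;  // outside the valid code point range
   }
   return decoded{ cp, size };
}
```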
### `bom`

- Succeeds when the input is not empty, and:
  - The next 3 bytes are the UTF-8 encoding of character U+FEFF, byte order mark (BOM).
- Equivalent to `one< 0xfeff >`.
### `not_one< C, ... >`

- Succeeds when the input is not empty, and:
  - The next 1-4 bytes are the UTF-8 encoding of a valid Unicode code point, and:
  - The input code point is not one of the given code points `C, ...`.
- Consumes the 1-4 bytes when it succeeds.
### `not_range< C, D >`

- Succeeds when the input is not empty, and:
  - The next 1-4 bytes are the UTF-8 encoding of a valid Unicode code point, and:
  - The input code point `B` satisfies `B < C || D < B`.
- Consumes the 1-4 bytes when it succeeds.
### `one< C, ... >`

- Succeeds when the input is not empty, and:
  - The next 1-4 bytes are the UTF-8 encoding of a valid Unicode code point, and:
  - The input code point is one of the given code points `C, ...`.
- Consumes the 1-4 bytes when it succeeds.
### `range< C, D >`

- Succeeds when the input is not empty, and:
  - The next 1-4 bytes are the UTF-8 encoding of a valid Unicode code point, and:
  - The input code point `B` satisfies `C <= B && B <= D`.
- Consumes the 1-4 bytes when it succeeds.
### `ranges< C1, D1, C2, D2, ... >`

- Equivalent to `sor< range< C1, D1 >, range< C2, D2 >, ... >`.
### `ranges< C1, D1, C2, D2, ..., E >`

- Equivalent to `sor< range< C1, D1 >, range< C2, D2 >, ..., one< E > >`.
### `string< C1, C2, ... >`

- Equivalent to `seq< one< C1 >, one< C2 >, ... >`.
## UTF-16 Rules

These rules are in namespace `tao::pegtl::utf16`.

The UTF-16 rules are surrogate-pair-aware and will consume 4 bytes for a single matched code point, rather than 2, whenever a valid surrogate pair is detected. Following what appears to be "best" practice, it is not an error when a code unit in the range `0xd800` to `0xdfff` is encountered outside of a valid surrogate pair.
UTF-16 support should be considered experimental and the following limitations apply to the UTF-16 rules:
- Native byte order is assumed for the input.
- Unaligned input leads to unaligned memory access.
- The line and column numbers are not counted correctly.
Unaligned memory is no problem on x86 compatible processors; on some other architectures like ARM an unaligned access will crash the application.
### `any`

- Succeeds when the input contains at least 2 bytes, and:
  - The next 2 (or 4) input bytes encode a valid Unicode code point.
- Consumes these 2 (or 4) bytes when it succeeds.
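The surrogate-pair handling described for the UTF-16 rules can be sketched as follows. This standalone example is not PEGTL code and uses hypothetical names (`decoded16`, `peek_utf16`). Unlike the rules, which read raw bytes, it works on `char16_t` code units assumed to be in native byte order already. Following the text above, it accepts a lone surrogate as a single code unit.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

struct decoded16
{
   char32_t cp;        // decoded code point (or the value of a lone surrogate)
   std::size_t units;  // 1 code unit (2 bytes) or 2 code units (4 bytes)
};

// Decode one code point from the front of `in` (native byte order assumed).
std::optional< decoded16 > peek_utf16( std::u16string_view in )
{
   if( in.empty() ) {
      return std::nullopt;
   }
   const char32_t u = in[ 0 ];
   if( u >= 0xd800 && u <= 0xdbff && in.size() >= 2 ) {
      const char32_t v = in[ 1 ];
      if( v >= 0xdc00 && v <= 0xdfff ) {
         // High surrogate followed by low surrogate: combine into one code point.
         return decoded16{ static_cast< char32_t >( 0x10000 + ( ( u - 0xd800 ) << 10 ) + ( v - 0xdc00 ) ), 2 };
      }
   }
   // Everything else, including a lone surrogate, is taken as one code unit.
   return decoded16{ u, 1 };
}
```

For example, the pair `0xd83d 0xde00` combines to `0x10000 + (0x3d << 10) + 0x200 = 0x1f600`.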
### `bom`

- Succeeds when the input is not empty, and:
  - The next 2 bytes are the UTF-16 encoding of character U+FEFF, byte order mark (BOM).
- Equivalent to `one< 0xfeff >`.
### `not_one< C, ... >`

- Succeeds when the input contains at least 2 bytes, and:
  - The next 2 (or 4) input bytes encode a valid Unicode code point, and:
  - The input code point is not one of the given code points `C, ...`.
- Consumes these 2 (or 4) bytes when it succeeds.
### `not_range< C, D >`

- Succeeds when the input contains at least 2 bytes, and:
  - The next 2 (or 4) input bytes encode a valid Unicode code point, and:
  - The input code point `B` satisfies `B < C || D < B`.
- Consumes these 2 (or 4) bytes when it succeeds.
### `one< C, ... >`

- Succeeds when the input contains at least 2 bytes, and:
  - The next 2 (or 4) input bytes encode a valid Unicode code point, and:
  - The input code point is one of the given code points `C, ...`.
- Consumes these 2 (or 4) bytes when it succeeds.
### `range< C, D >`

- Succeeds when the input contains at least 2 bytes, and:
  - The next 2 (or 4) input bytes encode a valid Unicode code point, and:
  - The input code point `B` satisfies `C <= B && B <= D`.
- Consumes these 2 (or 4) bytes when it succeeds.
### `ranges< C1, D1, C2, D2, ... >`

- Equivalent to `sor< range< C1, D1 >, range< C2, D2 >, ... >`.
### `ranges< C1, D1, C2, D2, ..., E >`

- Equivalent to `sor< range< C1, D1 >, range< C2, D2 >, ..., one< E > >`.
### `string< C1, C2, ... >`

- Equivalent to `seq< one< C1 >, one< C2 >, ... >`.
## UTF-32 Rules

These rules are in namespace `tao::pegtl::utf32`.
UTF-32 support should be considered experimental and the following limitations apply to the UTF-32 rules:
- Native byte order is assumed for the input.
- Unaligned input leads to unaligned memory access.
- The line and column numbers are not counted correctly.
Unaligned memory is no problem on x86 compatible processors; on some other architectures like ARM an unaligned access will crash the application.
### `any`

- Succeeds when the input contains at least 4 bytes, and:
  - The next 4 input bytes encode a valid Unicode code point.
- Consumes these 4 bytes when it succeeds.
### `bom`

- Succeeds when the input is not empty, and:
  - The next 4 bytes are the UTF-32 encoding of character U+FEFF, byte order mark (BOM).
- Equivalent to `one< 0xfeff >`.
### `not_one< C, ... >`

- Succeeds when the input contains at least 4 bytes, and:
  - The next 4 input bytes encode a valid Unicode code point, and:
  - The input code point is not one of the given code points `C, ...`.
- Consumes these 4 bytes when it succeeds.
### `not_range< C, D >`

- Succeeds when the input contains at least 4 bytes, and:
  - The next 4 input bytes encode a valid Unicode code point, and:
  - The input code point `B` satisfies `B < C || D < B`.
- Consumes these 4 bytes when it succeeds.
### `one< C, ... >`

- Succeeds when the input contains at least 4 bytes, and:
  - The next 4 input bytes encode a valid Unicode code point, and:
  - The input code point is one of the given code points `C, ...`.
- Consumes these 4 bytes when it succeeds.
### `range< C, D >`

- Succeeds when the input contains at least 4 bytes, and:
  - The next 4 input bytes encode a valid Unicode code point, and:
  - The input code point `B` satisfies `C <= B && B <= D`.
- Consumes these 4 bytes when it succeeds.
### `ranges< C1, D1, C2, D2, ... >`

- Equivalent to `sor< range< C1, D1 >, range< C2, D2 >, ... >`.
### `ranges< C1, D1, C2, D2, ..., E >`

- Equivalent to `sor< range< C1, D1 >, range< C2, D2 >, ..., one< E > >`.
### `string< C1, C2, ... >`

- Equivalent to `seq< one< C1 >, one< C2 >, ... >`.
## Full Index

- `action< A, R... >` (meta rules)
- `alnum` (ascii rules)
- `alpha` (ascii rules)
- `any` (ascii rules)
- `any` (utf-8 rules)
- `any` (utf-16 rules)
- `any` (utf-32 rules)
- `apply< A... >` (action rules)
- `apply0< A... >` (action rules)
- `at< R... >` (combinators)
- `blank` (ascii rules)
- `bof` (atomic rules)
- `bol` (atomic rules)
- `bom` (utf-8 rules)
- `bom` (utf-16 rules)
- `bom` (utf-32 rules)
- `bytes< Num >` (atomic rules)
- `control< C, R... >` (meta rules)
- `digit` (ascii rules)
- `disable< R... >` (meta rules)
- `discard` (meta rules)
- `enable< R... >` (meta rules)
- `eof` (atomic rules)
- `eol` (ascii rules)
- `eolf` (ascii rules)
- `failure` (atomic rules)
- `identifier_first` (ascii rules)
- `identifier_other` (ascii rules)
- `identifier` (ascii rules)
- `if_apply< R, A... >` (action rules)
- `if_must< R, S... >` (convenience)
- `if_must_else< R, S, T >` (convenience)
- `if_then_else< R, S, T >` (convenience)
- `istring< C, D, ... >` (ascii rules)
- `keyword< C, ... >` (ascii rules)
- `list< R, S >` (convenience)
- `list< R, S, P >` (convenience)
- `list_must< R, S >` (convenience)
- `list_must< R, S, P >` (convenience)
- `list_tail< R, S >` (convenience)
- `list_tail< R, S, P >` (convenience)
- `lower` (ascii rules)
- `minus< M, S >` (convenience)
- `must< R... >` (convenience)
- `not_at< R... >` (combinators)
- `not_one< C, ... >` (ascii rules)
- `not_one< C, ... >` (utf-8 rules)
- `not_one< C, ... >` (utf-16 rules)
- `not_one< C, ... >` (utf-32 rules)
- `not_range< C, D >` (ascii rules)
- `not_range< C, D >` (utf-8 rules)
- `not_range< C, D >` (utf-16 rules)
- `not_range< C, D >` (utf-32 rules)
- `nul` (ascii rules)
- `one< C, ... >` (ascii rules)
- `one< C, ... >` (utf-8 rules)
- `one< C, ... >` (utf-16 rules)
- `one< C, ... >` (utf-32 rules)
- `opt< R... >` (combinators)
- `pad< R, S, T = S >` (convenience)
- `pad_opt< R, P >` (convenience)
- `plus< R, ... >` (combinators)
- `print` (ascii rules)
- `raise< T >` (atomic rules)
- `range< C, D >` (ascii rules)
- `range< C, D >` (utf-8 rules)
- `range< C, D >` (utf-16 rules)
- `range< C, D >` (utf-32 rules)
- `ranges< C1, D1, C2, D2, ... >` (ascii rules)
- `ranges< C1, D1, C2, D2, ... >` (utf-8 rules)
- `ranges< C1, D1, C2, D2, ... >` (utf-16 rules)
- `ranges< C1, D1, C2, D2, ... >` (utf-32 rules)
- `ranges< C1, D1, C2, D2, ..., E >` (ascii rules)
- `ranges< C1, D1, C2, D2, ..., E >` (utf-8 rules)
- `ranges< C1, D1, C2, D2, ..., E >` (utf-16 rules)
- `ranges< C1, D1, C2, D2, ..., E >` (utf-32 rules)
- `rep< Num, R... >` (convenience)
- `rep_max< Max, R... >` (convenience)
- `rep_min< Min, R, ... >` (convenience)
- `rep_min_max< Min, Max, R... >` (convenience)
- `rep_opt< Num, R... >` (convenience)
- `require< Num >` (meta rules)
- `seq< R... >` (combinators)
- `seven` (ascii rules)
- `shebang` (ascii rules)
- `sor< R... >` (combinators)
- `space` (ascii rules)
- `star< R, ... >` (combinators)
- `star_must< R, S... >` (convenience)
- `state< S, R... >` (meta rules)
- `string< C1, C2, ... >` (ascii rules)
- `string< C1, C2, ... >` (utf-8 rules)
- `string< C1, C2, ... >` (utf-16 rules)
- `string< C1, C2, ... >` (utf-32 rules)
- `success` (atomic rules)
- `TAOCPP_PEGTL_ISTRING( "..." )` (ascii rules)
- `TAOCPP_PEGTL_KEYWORD( "..." )` (ascii rules)
- `TAOCPP_PEGTL_STRING( "..." )` (ascii rules)
- `try_catch< R... >` (convenience)
- `try_catch_type< E, R... >` (convenience)
- `two< C >` (ascii rules)
- `until< R >` (convenience)
- `until< R, S, ... >` (convenience)
- `upper` (ascii rules)
- `xdigit` (ascii rules)
Copyright (c) 2014-2017 Dr. Colin Hirsch and Daniel Frey