[grammars] Add some more prose, change some wording
perlpilot committed Mar 13, 2011
1 parent d9d8f42 commit dfed068
Showing 1 changed file with 39 additions and 13 deletions.
52 changes: 39 additions & 13 deletions src/grammars.pod
@@ -71,10 +71,33 @@ introduced (see L<multis>).

A grammar contains various named regexes. Regex names follow the same
rules as subroutine and method names. While regex names are
completely up to the grammar writer, a regex named C<TOP> will, by
default, be invoked when the C<.parse()> method is executed on a
grammarN<The name of the regex that is automatically invoked may also be
specified as a parameter to C<.parse()>. For instance,
C<< JSON::Tiny::Grammar.parse($tester, :rule<object>) >> will start
parsing at the regex named C<object> >. So, the call to
C<JSON::Tiny::Grammar.parse($tester)> attempts to match the regex named
C<TOP> to the string C<$tester>.
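
As a minimal sketch of the footnote above, assuming a current Rakudo compiler, the following hypothetical grammar C<Tiny> (not part of C<JSON::Tiny>) shows C<.parse()> starting at C<TOP> by default and at another regex when C<:rule> is given:

=begin programlisting

    # Hypothetical grammar, for illustration only.
    grammar Tiny {
        token TOP    { '[' <object>* ']' }
        token object { '{' '}' }
    }

    say so Tiny.parse('[{}{}]');             # True: parsing starts at TOP
    say so Tiny.parse('{}');                 # False: TOP requires the brackets
    say so Tiny.parse('{}', :rule<object>);  # True: parsing starts at object

=end programlisting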

However, the example code does not seem to have a regex named C<TOP>; it
has a I<rule> named C<TOP>. What's the difference? A rule is a regex
with special behavior. Within a rule, sequences of whitespace are deemed
significant and will match actual whitespace within the string (this is
equivalent to using the C<:sigspace> modifier). Looking at the grammar,
you'll also notice C<token> declarations. A C<token> is also a regex
with special behavior: it does not backtrack, so when a partial pattern
match fails, the regex engine will not go back and try another
alternative or give up characters it has already matched (this is
equivalent to using the C<:ratchet> modifier). Unlike a rule, a token
treats whitespace in its pattern as insignificant. A rule does not
backtrack either: it is a token with the C<:sigspace> modifier added.
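
The following sketch contrasts the three declarators. It assumes a current Rakudo compiler and uses a hypothetical grammar C<Modes>. A plain C<regex> may backtrack and a C<token> may not. A C<rule> also does not backtrack, and it turns whitespace in its pattern into calls to C<< <.ws> >>:

=begin programlisting

    # Hypothetical grammar, for illustration only.
    grammar Modes {
        regex backtracking { a* a }             # a* may give back an 'a'
        token ratcheting   { a* a }             # a* keeps every 'a' it consumed
        token tight        { 'hello' 'world' }  # pattern whitespace is ignored
        rule  spaced       { 'hello' 'world' }  # pattern whitespace becomes <.ws>
    }

    say so Modes.parse('aaa', :rule<backtracking>);    # True
    say so Modes.parse('aaa', :rule<ratcheting>);      # False
    say so Modes.parse('helloworld', :rule<tight>);    # True
    say so Modes.parse('hello world', :rule<tight>);   # False
    say so Modes.parse('hello world', :rule<spaced>);  # True
    # <.ws> cannot match between two word characters:
    say so Modes.parse('helloworld', :rule<spaced>);   # False

=end programlisting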

See L<src/regex> or L<S05> for more information on modifiers and how
they may be used.

Usually, when talking about regexes that are rules or tokens, we tend to
call them rules or tokens rather than using the more general term
"regex", to distinguish their special behaviors.

In this example, the C<TOP> rule anchors the match to the start and end
of the string, so that the whole string has to be in valid JSON format
@@ -95,7 +118,7 @@ and never proceed to other parts of the grammar.
X<goal matching>
X<regex meta character,~>

The example JSON grammar introduces the I<goal matching syntax>
which can be presented abstractly as: C<A ~ B C>. In
C<JSON::Tiny::Grammar>, C<A> is C<'{'>, C<B> is C<'}'> and C<C> is C<<
<pairlist> >>. The atom on the left of the tilde (C<A>) is matched
@@ -123,8 +146,8 @@ Another novelty is the declaration of a I<proto token>:
[ <[eE]> [\+|\-]? <[0..9]>+ ]?
}

token value:sym<true> { <sym> };
token value:sym<false> { <sym> };

=end programlisting
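
Here is a minimal sketch of how a proto token dispatches to its C<sym> candidates. It assumes a current Rakudo compiler, where a proto is declared with C<{*}> as its body, and the grammars C<Literal> and C<Literal::WithNull> are hypothetical. A derived grammar can add another candidate without modifying the original:

=begin programlisting

    # Hypothetical grammars, for illustration only.
    grammar Literal {
        token TOP { <value> }
        proto token value {*}
        token value:sym<true>  { <sym> }   # <sym> matches the literal text 'true'
        token value:sym<false> { <sym> }
    }

    my $m = Literal.parse('false');
    say ~$m<value>;                    # false
    say so Literal.parse('null');      # False: no candidate matches 'null'

    # A derived grammar adds one more candidate to the inherited proto.
    grammar Literal::WithNull is Literal {
        token value:sym<null> { <sym> }
    }
    say so Literal::WithNull.parse('null');   # True

=end programlisting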

@@ -146,16 +169,18 @@ The similarity of grammars to classes goes deeper than storing regexes in a
namespace as a class might store methods. You can inherit from and extend
grammars, mix roles into them, and take advantage of polymorphism. In fact, a
grammar is a class which by default inherits from C<Grammar> instead of
C<Any>. The C<Grammar> base grammar predefines several broadly useful
rules: C<< <alpha> >> matches an alphabetic character, C<< <digit> >> a
digit, and C<< <ws> >> whitespace, among others.
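
As a small sketch, assuming a current Rakudo compiler, the hypothetical grammar C<Identifier> below names no parent. It still uses C<< <alpha> >> and C<< <digit> >> without defining them, and it smartmatches as a C<Grammar>:

=begin programlisting

    # Hypothetical grammar, for illustration only.
    grammar Identifier {
        # A letter followed by any mix of letters and digits.
        token TOP { <alpha> [ <alpha> | <digit> ]* }
    }

    say Identifier ~~ Grammar;          # True: Grammar is the default parent
    say so Identifier.parse('x42');     # True
    say so Identifier.parse('42x');     # False: must start with a letter

=end programlisting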

Suppose you want to enhance the JSON grammar to allow single-line C++ or
JavaScript comments, which begin with C<//> and continue until the end of the
line. The simplest enhancement is to allow such a comment in any place where
whitespace is valid.

However, C<JSON::Tiny::Grammar> only implicitly matches whitespace
through the use of I<rules>. Implicit whitespace is matched with the
inherited regex C<< <ws> >>, so the simplest approach to enable
single-line comments is to override that named regex:

@@ -182,9 +207,10 @@ line comments is to override that named regex:
=end programlisting

The first two lines introduce a grammar that inherits from
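C<JSON::Tiny::Grammar>. Just as subclasses inherit methods from superclasses,
grammars inherit rules from their base grammars. Rules are looked up the
same way methods are: first in the grammar on which C<.parse()> was
called, then in its parent grammars. A rule inherited from the base
grammar that calls C<< <ws> >> will therefore pick up an overriding
C<ws> defined in the derived grammar.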
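
The override can be sketched end to end with a hypothetical grammar C<NumList> standing in for C<JSON::Tiny::Grammar>, whose full text is not repeated here. The sketch assumes a current Rakudo compiler. The comment-aware C<ws> below is an assumption for illustration and may differ from the chapter's own listing. The inherited C<TOP> picks up the overriding C<ws>:

=begin programlisting

    # Hypothetical base grammar: a bracketed, comma-separated list of numbers.
    grammar NumList {
        rule  TOP { '[' <num> [ ',' <num> ]* ']' }
        token num { \d+ }
    }

    # Derived grammar: overrides only ws; TOP and num are inherited.
    grammar NumList::WithComments is NumList {
        token ws {
            \s* [ '//' \N* \n \s* ]*   # whitespace, then any '//' comment lines
        }
    }

    my $text = "[1, // the first number\n 2]";
    say so NumList.parse("[1, 2]");               # True
    say so NumList.parse($text);                  # False: '//' is not whitespace
    say so NumList::WithComments.parse($text);    # True

=end programlisting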

In this minimal JSON grammar, whitespace is never mandatory, so C<ws>
can match nothing at all. After optional spaces, two slashes C<'//'>
