diff --git a/docs/lpeg-128.gif b/docs/lpeg-128.gif
new file mode 100644
index 00000000..bbf5e78b
Binary files /dev/null and b/docs/lpeg-128.gif differ
diff --git a/docs/lpeg.html b/docs/lpeg.html
index 47612ec4..9e7ff395 100644
--- a/docs/lpeg.html
+++ b/docs/lpeg.html
@@ -10,7 +10,7 @@
-
+
@@ -22,7 +22,7 @@
LPeg
- Parsing Expression Grammars For Lua, version 0.9 + Parsing Expression Grammars For Lua, version 0.10
@@ -87,7 +87,7 @@

Introduction

On the other hand, first-class patterns allow much better documentation (as it is easy to comment the code, -to use auxiliary variables to break complex definitions, etc.) +to break complex definitions into smaller parts, etc.) and are extensible, as we can define new functions to create and compose patterns.

@@ -101,8 +101,8 @@

Introduction

OperatorDescription lpeg.P(string) Matches string literally -lpeg.P(number) - Matches exactly number characters +lpeg.P(n) + Matches exactly n characters lpeg.S(string) Matches any character in string (Set) lpeg.R("xy") @@ -122,6 +122,9 @@

Introduction

Equivalent to ("" - patt) #patt Matches patt but consumes no input +lpeg.B(patt, n) + Matches patt n characters behind + the current position, consuming no input

As a very simple example, @@ -139,8 +142,9 @@

Introduction

LPeg also offers the re module, which implements patterns following a regular-expression style (e.g., [09]+). -(This module is 200 lines of Lua code, -and of course uses LPeg to parse regular expressions.) +(This module is 250 lines of Lua code, +and of course uses LPeg to parse regular expressions and +translate them to regular LPeg patterns.)

@@ -153,7 +157,7 @@

lpeg.match (pattern, subject [, init])captured values +or the captured values (if the pattern captured any value).

@@ -189,6 +193,17 @@

lpeg.version ()

Returns a string with the running version of LPeg.

+

lpeg.setmaxstack (max)

+

+Sets the maximum size for the backtrack stack used by LPeg to +track calls and choices. +Most well-written patterns need few backtrack levels and +therefore you seldom need to change this maximum; +but a few useful patterns may need more space. +Before changing this maximum you should try to rewrite your +pattern to avoid the need for extra space. +

+

Basic Constructions

@@ -228,8 +243,9 @@

lpeg.P (value)

If the argument is a negative number -n, the result is a pattern that succeeds only if the input string does not have n characters: -It is equivalent to the unary minus operation -applied over the pattern corresponding to the (non-negative) value n. +lpeg.P(-n) +is equivalent to -lpeg.P(n) +(see the unary minus operation).

  • @@ -254,6 +270,33 @@

    lpeg.P (value)

    +

    lpeg.B(patt [, n])

    +

    +Returns a pattern that +matches only if the input string matches patt +starting n positions behind the current position. +(The default value for n is 1.) +If the current position is less than or equal to n, +this pattern fails. +

    + +

    +Like the and predicate, +this pattern never consumes any input, +independently of success or failure, +and it never produces any capture. +

    + +

    +The pattern patt cannot contain any open reference +to grammar rules (see grammars). +

    + +

    +(This is an experimental feature. +There is a good chance it will change in future versions.) +
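A minimal sketch of the look-behind described above, assuming the `lpeg` module is installed and that `lpeg.B` behaves as documented here with the default `n` of 1 (the feature is marked experimental, so details may differ between versions). The pattern accepts a "b" only when the character just behind it is an "a":

```lua
local lpeg = require "lpeg"

-- consume one character, then require that the character just
-- behind the current position is "a" before matching "b"
local p = lpeg.P(1) * lpeg.B(lpeg.P"a") * lpeg.P"b"

local ok  = p:match("ab")   -- look-behind succeeds; match ends at position 3
local bad = p:match("cb")   -- look-behind fails, so the whole match fails
print(ok, bad)              --> 3    nil
```

The look-behind itself consumes nothing; the position only advances through `lpeg.P(1)` and `lpeg.P"b"`.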

    +

    lpeg.R ({range})

    @@ -337,13 +380,14 @@

    #patt

    matches only if the input string matches patt, but without consuming any input, independently of success or failure. -(This pattern is equivalent to +(This pattern is called an and predicate +and it is equivalent to &patt in the original PEG notation.)

    +

    -When it succeeds, -#patt produces all captures produced by patt. +This pattern never produces any capture.

    . @@ -402,6 +446,14 @@

    patt1 - patt2

    patt2 and then matches patt1.

    +

    +When it succeeds, +this pattern produces all captures from patt1. +It never produces any capture from patt2 +(as either patt2 fails or +patt1 - patt2 fails). +
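The capture behaviour of the difference operator can be sketched as follows, assuming the `lpeg` module is installed. The names `word`, `w`, and `e` are illustrative only. Even though the second operand contains a capture, only the capture from the first operand is ever produced:

```lua
local lpeg = require "lpeg"

-- a lower-case word that does not begin with "end";
-- the capture around "end" is never produced
local word = lpeg.C(lpeg.R"az"^1) - lpeg.C(lpeg.P"end")

local w = word:match("hello")   -- patt2 fails, so patt1's capture is returned
local e = word:match("end")     -- patt2 matches, so the difference fails
print(w, e)                     --> hello    nil
```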

    +

    If both patt1 and patt2 are character sets, @@ -443,7 +495,7 @@

    patt^n

    Otherwise, when n is negative, this pattern is equivalent to (patt?)-n. -That is, it matches at most -n +That is, it matches at most |n| occurrences of patt.
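As a small illustration of a negative exponent (a sketch assuming the `lpeg` module is installed; the variable names are illustrative), `patt^-2` matches zero, one, or two occurrences and then stops:

```lua
local lpeg = require "lpeg"

-- at most two occurrences of "a"
local upto2 = lpeg.P"a"^-2

local r0 = upto2:match("b")     -- zero occurrences still match
local r1 = upto2:match("a")     -- one occurrence
local r3 = upto2:match("aaa")   -- stops after the second "a"
print(r0, r1, r3)               --> 1    2    3
```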

    @@ -520,6 +572,14 @@

    Grammars

    B = "b" * lpeg.V"S" + "a" * lpeg.V"B" * lpeg.V"B", } * -1 +

    +It is equivalent to the following grammar in standard PEG notation: +

    +
    +  S <- 'a' B / 'b' A / ''
    +  A <- 'a' S / 'b' A A
    +  B <- 'b' S / 'a' B B
    +

    Captures

    @@ -539,7 +599,8 @@

    Captures

    - + @@ -551,7 +612,7 @@

    Captures

    - + @@ -592,7 +653,7 @@

    Captures

    Usually, LPeg evaluates all captures only after (and if) the entire match succeeds. -At match time it only gathers enough information +During the match time it only gathers enough information to produce the capture values later. As a particularly important consequence, most captures cannot affect the way a pattern matches a subject. @@ -644,7 +705,7 @@

    lpeg.Cb (name)

    -

    lpeg.Cc ({value})

    +

    lpeg.Cc ([value, ...])

    Creates a constant capture. This pattern matches the empty string and @@ -654,7 +715,7 @@

    lpeg.Cc ({value})

    lpeg.Cf (patt, func)

    -Creates an fold capture. +Creates a fold capture. If patt produces a list of captures C1 C2 ... Cn, this capture will produce the value @@ -808,7 +869,7 @@

    lpeg.Cmt(patt, function)

    -The function gets as arguments the entire subject, +The given function gets as arguments the entire subject, the current position (after the match of patt), plus any capture values produced by patt.

    @@ -821,6 +882,9 @@

    lpeg.Cmt(patt, function)

    and the returned number becomes the new current position. (Assuming a subject s and current position i, the returned number must be in the range [i, len(s) + 1].) +If the call returns true, +the match succeeds without consuming any input. +(So, returning true is equivalent to returning i.) If the call returns false, nil, or no value, the match fails.
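The return-value rules above can be sketched with a match-time check, assuming the `lpeg` module is installed. The helper names `digits` and `even` are illustrative. The function returns a boolean: `true` keeps the position reached by the inner pattern, and `false` makes the match fail:

```lua
local lpeg = require "lpeg"

local digits = lpeg.C(lpeg.R"09"^1)

-- accept a run of digits only when its numeric value is even;
-- the function receives the subject, the position after the digits,
-- and the captured digits
local even = lpeg.Cmt(digits, function (subject, pos, d)
  return tonumber(d) % 2 == 0
end)

local a = even:match("42")   -- function returns true: match ends at position 3
local b = even:match("43")   -- function returns false: match fails
print(a, b)                  --> 3    nil
```

Because the function returns no extra values, the match-time capture produces no capture values, so a successful match returns only the end position.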

    @@ -835,9 +899,66 @@

    lpeg.Cmt(patt, function)

    Some Examples

    +

    Using a Pattern

    +

    +This example shows a very simple but complete program +that builds and uses a pattern: +

    +
    +local lpeg = require "lpeg"
    +
    +-- matches a word followed by end-of-string
    +p = lpeg.R"az"^1 * -1
    +
    +print(p:match("hello"))        --> 6
    +print(lpeg.match(p, "hello"))  --> 6
    +print(p:match("1 hello"))      --> nil
    +
    +

    +The pattern is simply a sequence of one or more lower-case letters +followed by the end of string (-1). +The program calls match both as a method +and as a function. +In both successful cases, +the match returns +the index of the first character after the match, +which is the string length plus one. +

    + + +

    Name-value lists

    +

    +This example parses a list of name-value pairs and returns a table +with those pairs: +

    +
    +lpeg.locale(lpeg)   -- adds locale entries into 'lpeg' table
    +
    +local space = lpeg.space^0
    +local name = lpeg.C(lpeg.alpha^1) * space
    +local sep = lpeg.S(",;") * space
    +local pair = lpeg.Cg(name * "=" * space * name) * sep^-1
    +local list = lpeg.Cf(lpeg.Ct("") * pair^0, rawset)
    +t = list:match("a=b, c = hi; next = pi")  --> { a = "b", c = "hi", next = "pi" }
    +
    +

    +Each pair has the format name = name followed by +an optional separator (a comma or a semicolon). +The pair pattern encloses the pair in a group pattern, +so that the names become the values of a single capture. +The list pattern then folds these captures. +It starts with an empty table, +created by a table capture matching an empty string; +then for each capture (a pair of names) it applies rawset +over the accumulator (the table) and the capture values (the pair of names). +rawset returns the table itself, +so the accumulator is always the table. +

    +

    Splitting a string

    -The following code splits a string using a given pattern +The following code builds a pattern that +splits a string using a given pattern sep as a separator:

    @@ -982,36 +1103,6 @@ 

    Global substitution

    -

    Name-value lists

    -

    -This example parses a list of name-value pairs and returns a table -with those pairs: -

    -
    -lpeg.locale(lpeg)
    -
    -local space = lpeg.space^0
    -local name = lpeg.C(lpeg.alpha^1) * space
    -local sep = lpeg.S(",;") * space
    -local pair = lpeg.Cg(name * "=" * space * name) * sep^-1
    -local list = lpeg.Cf(lpeg.Ct("") * pair^0, rawset)
    -t = list:match("a=b, c = hi; next = pi")  --> { a = "b", c = "hi", next = "pi" }
    -
    -

    -Each pair has the format name = name followed by -an optional separator (a comma or a semicolon). -The pair pattern encloses the pair in a group pattern, -so that the names become the values of a single capture. -The list pattern then folds these captures. -It starts with an empty table, -created by a table capture matching an empty string; -then for each capture (a pair of names) it applies rawset -over the accumulator (the table) and the capture values (the pair of names). -rawset returns the table itself, -so the accumulator is always the table. -

    - -

    Comma-Separated Values (CSV)

    This example breaks a string into comma-separated values, @@ -1037,6 +1128,17 @@

    Comma-Separated Values (CSV)

    ending with a newline or the string end (-1).

    +

    +As it is, +the previous pattern returns each field as a separate result. +If we add a table capture in the definition of record, +the pattern will return instead a single table +containing all fields: +

    +
    +local record = lpeg.Ct(field * (',' * field)^0) * (lpeg.P'\n' + -1)
    +
    +

    UTF-8 and Latin 1

    @@ -1141,18 +1243,20 @@

    Lua's long strings

    -open = "[" * lpeg.Cg(lpeg.P"="^0, "init") * "[" * lpeg.P"\n"^-1
    -close = "]" * lpeg.C(lpeg.P"="^0) * "]"
    +equals = lpeg.P"="^0
    +open = "[" * lpeg.Cg(equals, "init") * "[" * lpeg.P"\n"^-1
    +close = "]" * lpeg.C(equals) * "]"
     closeeq = lpeg.Cmt(close * lpeg.Cb("init"), function (s, i, a, b) return a == b end)
    -string = open * m.C((lpeg.P(1) - closeeq)^0) * close /
    -  function (o, s) return s end
    +string = open * lpeg.C((lpeg.P(1) - closeeq)^0) * close /
    +  function (s, o) return s end
     

    The open pattern matches [=*[, capturing the repetitions of equal signs in a group named init; it also discharges an optional newline, if present. -The close pattern matches ]=*]. +The close pattern matches ]=*], +also capturing the repetitions of equal signs. The closeeq pattern first matches close; then it uses a back capture to recover the capture made by the previous open, @@ -1162,10 +1266,8 @@

    Lua's long strings

    The string pattern starts with an open, then it goes as far as possible until matching closeeq, and then matches the final close. -The final function capture simply consumes -the captures made by open and close -and returns only the middle capture, -which is the string contents. +The final function capture simply discards +the capture made by close.

    @@ -1267,7 +1369,7 @@

    Arithmetic expressions

    Download

    LPeg -source code.

    +source code.

    License

    @@ -1311,7 +1413,7 @@

    License

    -$Id: lpeg.html,v 1.1 2009-02-21 19:49:19 nick Exp $ +$Id: lpeg.html,v 1.62 2010/11/05 12:52:33 roberto Exp $

diff --git a/docs/re.html b/docs/re.html
index 39b3c122..fefacd80 100644
--- a/docs/re.html
+++ b/docs/re.html
@@ -10,7 +10,7 @@
-
+
    @@ -46,7 +46,7 @@

    re

    The re Module

    -The re Module +The re module (provided by file re.lua in the distribution) supports a somewhat conventional regex syntax for pattern usage within LPeg. @@ -57,7 +57,7 @@

    The re Module

    A p represents an arbitrary pattern; num represents a number ([0-9]+); name represents an identifier -([a-zA-Z][a-zA-Z0-9]*). +([a-zA-Z][a-zA-Z0-9_]*). Constructions are listed in order of decreasing precedence.
    Operation What it Produces
    lpeg.C(patt) the match for patt
    the match for patt plus all captures + made by patt
    lpeg.Carg(n) the value of the nth extra argument to lpeg.match (matches the empty string)
    the given values (matches the empty string)
    lpeg.Cf(patt, func) a folding of the captures from patt
    lpeg.Cg(patt, [name])
    lpeg.Cg(patt [, name]) the values produced by patt, optionally tagged with name
    lpeg.Cp()
    @@ -68,6 +68,7 @@

    The re Module

    + @@ -153,6 +154,13 @@

    re.find (subject, pattern [, init])

    Otherwise, returns nil.

    +

    +An optional numeric argument init makes the search +start at that position in the subject string. +As usual in Lua libraries, +a negative value counts from the end. +

    +

    re.match (subject, pattern)

    Matches the given pattern against the given subject. @@ -166,15 +174,34 @@

    re.updatelocale ()

    Some Examples

    +

    A complete simple program

    +

    +The next code shows a simple complete Lua program using +the re module: +

    +
    +local re = require"re"
    +
    +-- find the position of the first number in a string
    +print(re.find("the number 423 is odd", "[0-9]+"))  --> 12
    +
    +-- similar, but also captures (and returns) the number
    +print(re.find("the number 423 is odd", "{[0-9]+}"))  --> 12    423
    +
    +-- returns all words in a string
    +print(re.match("the number 423 is odd", "({%a+} / .)*"))
    +--> the    number    is    odd
    +
    + +

    Balanced parentheses

    -As a simple example, -the following call will produce the same pattern produced by the +The following call will produce the same pattern produced by the Lua expression in the balanced parentheses example:

    -b = re.compile[[  balanced <- "(" ([^()] / <balanced>)* ")"  ]]
    +b = re.compile[[  balanced <- "(" ([^()] / balanced)* ")"  ]]
     

    String reversal

    @@ -182,7 +209,7 @@

    String reversal

    The next example reverses a string:

    -rev = re.compile[[ R <- (!.) -> '' / ({.} <R>) -> '%2%1']]
    +rev = re.compile[[ R <- (!.) -> '' / ({.} R) -> '%2%1']]
     print(rev:match"0123456789")   --> 9876543210
     
    @@ -192,8 +219,8 @@

    CSV decoder

     record = re.compile[[
    -  record <- ( <field> (',' <field>)* ) -> {} (%nl / !.)
    -  field <- <escaped> / <nonescaped>
    +  record <- ( field (',' field)* ) -> {} (%nl / !.)
    +  field <- escaped / nonescaped
       nonescaped <- { [^,"%nl]* }
       escaped <- '"' {~ ([^"] / '""' -> '"')* ~} '"'
     ]]
    @@ -201,13 +228,13 @@ 

    CSV decoder

    Lua's long strings

    -The next example mathes Lua long strings: +The next example matches Lua long strings:

     c = re.compile([[
    -  longstring <- ('[' {:eq: '='* :} '[' <close>) => void
    -  close <- ']' =eq ']' / . <close>
    -]], {void = function () return true end})
    +  longstring <- ('[' {:eq: '='* :} '[' close) -> void
    +  close <- ']' =eq ']' / . close
    +]], {void = function () end})
     
     print(c:match'[==[]]===]]]]==]===[]')   --> 17
     
    @@ -219,8 +246,8 @@

    Indented blocks

     p = re.compile[[
    -  block <- ({:ident:' '*:} <line>
    -           ((=ident !' ' <line>) / &(=ident ' ') <block>)*) -> {}
    +  block <- ({:ident:' '*:} line
    +           ((=ident !' ' line) / &(=ident ' ') block)*) -> {}
       line <- {[^%nl]*} %nl
     ]]
     
    @@ -259,14 +286,14 @@

    Macro expander

     p = re.compile[[
    -      text <- {~ <item>* ~}
    -      item <- <macro> / [^()] / '(' <item>* ')'
    -      arg <- ' '* {~ (!',' <item>)* ~}
    -      args <- '(' <arg> (',' <arg>)* ')'
    +      text <- {~ item* ~}
    +      item <- macro / [^()] / '(' item* ')'
    +      arg <- ' '* {~ (!',' item)* ~}
    +      args <- '(' arg (',' arg)* ')'
           -- now we define some macros
    -      macro <- ('apply' <args>) -> '%1(%2)'
    -             / ('add' <args>) -> '%1 + %2'
    -             / ('mul' <args>) -> '%1 * %2'
    +      macro <- ('apply' args) -> '%1(%2)'
    +             / ('add' args) -> '%1 + %2'
    +             / ('mul' args) -> '%1 * %2'
     ]]
     
     print(p:match"add(mul(a,b), apply(f,x))")   --> a * b + f(x)
    @@ -290,13 +317,60 @@ 

    Macro expander

    with each %n replaced by the n-th argument.

    +

    Patterns

    +

    +This example shows the complete syntax +of patterns accepted by re. +

    +
    +p = [=[
    +
    +pattern         <- exp !.
    +exp             <- S (alternative / grammar)
    +
    +alternative     <- seq ('/' S seq)*
    +seq             <- prefix*
    +prefix          <- '&' S prefix / '!' S prefix / suffix
    +suffix          <- primary S (([+*?]
    +                            / '^' [+-]? num
    +                            / '->' S (string / '{}' / name)
    +                            / '=>' S name) S)*
    +
    +primary         <- '(' exp ')' / string / class / defined
    +                 / '{:' (name ':')? exp ':}'
    +                 / '=' name
    +                 / '{}'
    +                 / '{~' exp '~}'
    +                 / '{' exp '}'
    +                 / '.'
    +                 / name S !arrow
    +                 / '<' name '>'          -- old-style non terminals
    +
    +grammar         <- definition+
    +definition      <- name S arrow exp
    +
    +class           <- '[' '^'? item (!']' item)* ']'
    +item            <- defined / range / .
    +range           <- . '-' [^]]
    +
    +S               <- (%s / '--' [^%nl]*)*   -- spaces and comments
    +name            <- [A-Za-z][A-Za-z0-9_]*
    +arrow           <- '<-'
    +num             <- [0-9]+
    +string          <- '"' [^"]* '"' / "'" [^']* "'"
    +defined         <- '%' name
    +
    +]=]
    +
    +print(re.match(p, p))   -- a self description must match itself
    +

    License

    -Copyright © 2008 Lua.org, PUC-Rio. +Copyright © 2008-2010 Lua.org, PUC-Rio.

    Permission is hereby granted, free of charge, @@ -334,7 +408,7 @@

    License

    -$Id: re.html,v 1.1 2009-02-21 19:49:19 nick Exp $ +$Id: re.html,v 1.15 2010/11/05 12:53:43 roberto Exp $

    Syntax Description
    . any character
    %name pattern defs[name] or a pre-defined pattern
    name non terminal
    <name> non terminal
    {} position capture
    { p } simple capture