lpeg.Cp() |
@@ -592,7 +653,7 @@
Usually,
LPeg evaluates all captures only after (and if) the entire match succeeds.
-At match time it only gathers enough information
+During matching, it only gathers enough information
to produce the capture values later.
As a particularly important consequence,
most captures cannot affect the way a pattern matches a subject.
@@ -644,7 +705,7 @@
lpeg.Cb (name)
-lpeg.Cc ({value})
+lpeg.Cc ([value, ...])
Creates a constant capture.
This pattern matches the empty string and
@@ -654,7 +715,7 @@
lpeg.Cc ({value})
lpeg.Cf (patt, func)
-Creates an fold capture.
+Creates a fold capture.
If patt produces a list of captures
C1 C2 ... Cn,
this capture will produce the value
@@ -808,7 +869,7 @@
lpeg.Cmt(patt, function)
-The function gets as arguments the entire subject,
+The given function gets as arguments the entire subject,
the current position (after the match of patt),
plus any capture values produced by patt.
@@ -821,6 +882,9 @@ lpeg.Cmt(patt, function)
and the returned number becomes the new current position.
(Assuming a subject s and current position i,
the returned number must be in the range [i, len(s) + 1].)
+If the call returns true,
+the match succeeds without consuming any input.
+(So returning true is equivalent to returning i.)
If the call returns false, nil, or no value,
the match fails.
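
As a rough illustration of these return values, the following sketch uses a match-time capture to accept a number only when it is even. It is not part of the LPeg distribution, and the names digits and even are made up for this example.

```lua
local lpeg = require "lpeg"

-- Hypothetical patterns for illustration only.
local digits = lpeg.C(lpeg.R"09"^1)

-- The function receives the subject, the position after 'digits',
-- and the captured text. Returning true succeeds without consuming
-- more input; returning false makes the whole match fail.
local even = lpeg.Cmt(digits, function (subject, pos, num)
  return tonumber(num) % 2 == 0
end)

print(even:match("42"))   --> 3
print(even:match("43"))   --> nil
```

Because the function returns no extra values, the capture produces none, so a successful match reports only the position after the digits.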
@@ -835,9 +899,66 @@ lpeg.Cmt(patt, function)
+Using a Pattern
+
+This example shows a very simple but complete program
+that builds and uses a pattern:
+
+
+local lpeg = require "lpeg"
+
+-- matches a word followed by end-of-string
+p = lpeg.R"az"^1 * -1
+
+print(p:match("hello")) --> 6
+print(lpeg.match(p, "hello")) --> 6
+print(p:match("1 hello")) --> nil
+
+
+The pattern is simply a sequence of one or more lower-case letters
+followed by the end of string (-1).
+The program calls match both as a method
+and as a function.
+In both successful cases,
+the match returns
+the index of the first character after the match,
+which is the string length plus one.
+
+
+
+Name-value lists
+
+This example parses a list of name-value pairs and returns a table
+with those pairs:
+
+
+lpeg.locale(lpeg) -- adds locale entries into 'lpeg' table
+
+local space = lpeg.space^0
+local name = lpeg.C(lpeg.alpha^1) * space
+local sep = lpeg.S(",;") * space
+local pair = lpeg.Cg(name * "=" * space * name) * sep^-1
+local list = lpeg.Cf(lpeg.Ct("") * pair^0, rawset)
+t = list:match("a=b, c = hi; next = pi") --> { a = "b", c = "hi", next = "pi" }
+
+
+Each pair has the format name = name followed by
+an optional separator (a comma or a semicolon).
+The pair pattern encloses the pair in a group pattern,
+so that the names become the values of a single capture.
+The list pattern then folds these captures.
+It starts with an empty table,
+created by a table capture matching an empty string;
+then for each capture (a pair of names) it applies rawset
+over the accumulator (the table) and the capture values (the pair of names).
+rawset returns the table itself,
+so the accumulator is always the table.
+
+
Splitting a string
-The following code splits a string using a given pattern
+The following code builds a pattern that
+splits a string using a given pattern
sep as a separator:
@@ -982,36 +1103,6 @@ Global substitution
-Name-value lists
-
-This example parses a list of name-value pairs and returns a table
-with those pairs:
-
-
-lpeg.locale(lpeg)
-
-local space = lpeg.space^0
-local name = lpeg.C(lpeg.alpha^1) * space
-local sep = lpeg.S(",;") * space
-local pair = lpeg.Cg(name * "=" * space * name) * sep^-1
-local list = lpeg.Cf(lpeg.Ct("") * pair^0, rawset)
-t = list:match("a=b, c = hi; next = pi") --> { a = "b", c = "hi", next = "pi" }
-
-
-Each pair has the format name = name
followed by
-an optional separator (a comma or a semicolon).
-The pair
pattern encloses the pair in a group pattern,
-so that the names become the values of a single capture.
-The list
pattern then folds these captures.
-It starts with an empty table,
-created by a table capture matching an empty string;
-then for each capture (a pair of names) it applies rawset
-over the accumulator (the table) and the capture values (the pair of names).
-rawset
returns the table itself,
-so the accumulator is always the table.
-
-
-
Comma-Separated Values (CSV)
This example breaks a string into comma-separated values,
@@ -1037,6 +1128,17 @@
Comma-Separated Values (CSV)
ending with a newline or the string end (-1).
+
+As it is,
+the previous pattern returns each field as a separate result.
+If we add a table capture in the definition of record,
+the pattern will instead return a single table
+containing all fields:
+
+
+local record = lpeg.Ct(field * (',' * field)^0) * (lpeg.P'\n' + -1)
+
+
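
A minimal sketch of the effect of the table capture follows. It assumes a simplified field that handles only unquoted values, which is not the field definition used in the full example.

```lua
local lpeg = require "lpeg"

-- Simplified field (an assumption for this sketch): any run of
-- characters other than a comma or a newline; quoted fields are omitted.
local field = lpeg.C((1 - lpeg.S",\n")^0)

-- Same shape as the record pattern above, with the fields wrapped in
-- a table capture.
local record = lpeg.Ct(field * ("," * field)^0) * (lpeg.P"\n" + -1)

local t = record:match("a,b,c")
print(#t, t[1], t[2], t[3])   --> 3  a  b  c
```

Without lpeg.Ct, the same match would return the three strings as separate results instead of one table.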
UTF-8 and Latin 1
@@ -1141,18 +1243,20 @@
Lua's long strings
-open = "[" * lpeg.Cg(lpeg.P"="^0, "init") * "[" * lpeg.P"\n"^-1
-close = "]" * lpeg.C(lpeg.P"="^0) * "]"
+equals = lpeg.P"="^0
+open = "[" * lpeg.Cg(equals, "init") * "[" * lpeg.P"\n"^-1
+close = "]" * lpeg.C(equals) * "]"
closeeq = lpeg.Cmt(close * lpeg.Cb("init"), function (s, i, a, b) return a == b end)
-string = open * m.C((lpeg.P(1) - closeeq)^0) * close /
- function (o, s) return s end
+string = open * lpeg.C((lpeg.P(1) - closeeq)^0) * close /
+ function (s, o) return s end
The open pattern matches [=*[,
capturing the repetitions of equal signs in a group named init;
it also discards an optional newline, if present.
-The close
pattern matches ]=*]
.
+The close pattern matches ]=*],
+also capturing the repetitions of equal signs.
The closeeq pattern first matches close;
then it uses a back capture to recover the capture made
by the previous open,
@@ -1162,10 +1266,8 @@
Lua's long strings
The string pattern starts with an open,
then it goes as far as possible until matching closeeq,
and then matches the final close.
-The final function capture simply consumes
-the captures made by open
and close
-and returns only the middle capture,
-which is the string contents.
+The final function capture simply discards
+the capture made by close.
@@ -1267,7 +1369,7 @@ Arithmetic expressions
Download
LPeg
-source code.
+source code.
@@ -1311,7 +1413,7 @@
-$Id: lpeg.html,v 1.1 2009-02-21 19:49:19 nick Exp $
+$Id: lpeg.html,v 1.62 2010/11/05 12:52:33 roberto Exp $
diff --git a/docs/re.html b/docs/re.html
index 39b3c122..fefacd80 100644
--- a/docs/re.html
+++ b/docs/re.html
@@ -10,7 +10,7 @@
-
+
@@ -46,7 +46,7 @@
re
The re
Module
-The re Module
+The re module
(provided by file re.lua in the distribution)
supports a somewhat conventional regex syntax
for pattern usage within LPeg.
@@ -57,7 +57,7 @@
The re
Module
A p represents an arbitrary pattern;
num represents a number ([0-9]+);
name represents an identifier
-([a-zA-Z][a-zA-Z0-9]*).
+([a-zA-Z][a-zA-Z0-9_]*).
Constructions are listed in order of decreasing precedence.
Syntax | Description |
@@ -68,6 +68,7 @@ The re
Module
. | any character |
%name |
pattern defs[name] or a pre-defined pattern |
+name | non terminal |
<name> | non terminal |
{} | position capture |
{ p } | simple capture |
@@ -153,6 +154,13 @@ re.find (subject, pattern [, init])
Otherwise, returns nil.
+
+An optional numeric argument init makes the search
+start at that position in the subject string.
+As usual in Lua libraries,
+a negative value counts from the end.
+
+
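
The effect of init can be sketched as follows. The subject string is made up for this example, and only the first result of re.find (the start index) is used, since what it returns after that may differ between versions of re.

```lua
local re = require "re"

local subject = "a1 b2 c3"

-- Keep only the first result of each call (the start index).
local first   = re.find(subject, "[0-9]")       -- search from the beginning
local from4   = re.find(subject, "[0-9]", 4)    -- search from position 4
local fromend = re.find(subject, "[0-9]", -2)   -- search from 2 before the end

print(first, from4, fromend)   --> 2  5  8
```

The reported index is always relative to the start of the subject, not to init.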
re.match (subject, pattern)
Matches the given pattern against the given subject.
@@ -166,15 +174,34 @@
re.updatelocale ()
+A complete simple program
+
+The next code shows a simple complete Lua program using
+the re module:
+
+
+local re = require"re"
+
+-- find the position of the first number in a string
+print(re.find("the number 423 is odd", "[0-9]+")) --> 12
+
+-- similar, but also captures (and returns) the number
+print(re.find("the number 423 is odd", "{[0-9]+}")) --> 12 423
+
+-- returns all words in a string
+print(re.match("the number 423 is odd", "({%a+} / .)*"))
+--> the number is odd
+
+
+
Balanced parentheses
-As a simple example,
-the following call will produce the same pattern produced by the
+The following call will produce the same pattern produced by the
Lua expression in the
balanced parentheses example:
-b = re.compile[[ balanced <- "(" ([^()] / <balanced>)* ")" ]]
+b = re.compile[[ balanced <- "(" ([^()] / balanced)* ")" ]]
String reversal
@@ -182,7 +209,7 @@ String reversal
The next example reverses a string:
-rev = re.compile[[ R <- (!.) -> '' / ({.} <R>) -> '%2%1']]
+rev = re.compile[[ R <- (!.) -> '' / ({.} R) -> '%2%1']]
print(rev:match"0123456789") --> 9876543210
@@ -192,8 +219,8 @@ CSV decoder
record = re.compile[[
- record <- ( <field> (',' <field>)* ) -> {} (%nl / !.)
- field <- <escaped> / <nonescaped>
+ record <- ( field (',' field)* ) -> {} (%nl / !.)
+ field <- escaped / nonescaped
nonescaped <- { [^,"%nl]* }
escaped <- '"' {~ ([^"] / '""' -> '"')* ~} '"'
]]
@@ -201,13 +228,13 @@ CSV decoder
Lua's long strings
-The next example mathes Lua long strings:
+The next example matches Lua long strings:
c = re.compile([[
- longstring <- ('[' {:eq: '='* :} '[' <close>) => void
- close <- ']' =eq ']' / . <close>
-]], {void = function () return true end})
+ longstring <- ('[' {:eq: '='* :} '[' close) -> void
+ close <- ']' =eq ']' / . close
+]], {void = function () end})
print(c:match'[==[]]===]]]]==]===[]') --> 17
@@ -219,8 +246,8 @@ Indented blocks
p = re.compile[[
- block <- ({:ident:' '*:} <line>
- ((=ident !' ' <line>) / &(=ident ' ') <block>)*) -> {}
+ block <- ({:ident:' '*:} line
+ ((=ident !' ' line) / &(=ident ' ') block)*) -> {}
line <- {[^%nl]*} %nl
]]
@@ -259,14 +286,14 @@ Macro expander
p = re.compile[[
- text <- {~ <item>* ~}
- item <- <macro> / [^()] / '(' <item>* ')'
- arg <- ' '* {~ (!',' <item>)* ~}
- args <- '(' <arg> (',' <arg>)* ')'
+ text <- {~ item* ~}
+ item <- macro / [^()] / '(' item* ')'
+ arg <- ' '* {~ (!',' item)* ~}
+ args <- '(' arg (',' arg)* ')'
-- now we define some macros
- macro <- ('apply' <args>) -> '%1(%2)'
- / ('add' <args>) -> '%1 + %2'
- / ('mul' <args>) -> '%1 * %2'
+ macro <- ('apply' args) -> '%1(%2)'
+ / ('add' args) -> '%1 + %2'
+ / ('mul' args) -> '%1 * %2'
]]
print(p:match"add(mul(a,b), apply(f,x))") --> a * b + f(x)
@@ -290,13 +317,60 @@ Macro expander
with each %n replaced by the n-th argument.
+Patterns
+
+This example shows the complete syntax
+of patterns accepted by re.
+
+
+p = [=[
+
+pattern <- exp !.
+exp <- S (alternative / grammar)
+
+alternative <- seq ('/' S seq)*
+seq <- prefix*
+prefix <- '&' S prefix / '!' S prefix / suffix
+suffix <- primary S (([+*?]
+ / '^' [+-]? num
+ / '->' S (string / '{}' / name)
+ / '=>' S name) S)*
+
+primary <- '(' exp ')' / string / class / defined
+ / '{:' (name ':')? exp ':}'
+ / '=' name
+ / '{}'
+ / '{~' exp '~}'
+ / '{' exp '}'
+ / '.'
+ / name S !arrow
+ / '<' name '>' -- old-style non terminals
+
+grammar <- definition+
+definition <- name S arrow exp
+
+class <- '[' '^'? item (!']' item)* ']'
+item <- defined / range / .
+range <- . '-' [^]]
+
+S <- (%s / '--' [^%nl]*)* -- spaces and comments
+name <- [A-Za-z][A-Za-z0-9_]*
+arrow <- '<-'
+num <- [0-9]+
+string <- '"' [^"]* '"' / "'" [^']* "'"
+defined <- '%' name
+
+]=]
+
+print(re.match(p, p)) -- a self description must match itself
+
-Copyright © 2008 Lua.org, PUC-Rio.
+Copyright © 2008-2010 Lua.org, PUC-Rio.
Permission is hereby granted, free of charge,
@@ -334,7 +408,7 @@
-$Id: re.html,v 1.1 2009-02-21 19:49:19 nick Exp $
+$Id: re.html,v 1.15 2010/11/05 12:53:43 roberto Exp $