Skip to content
Find file
Fetching contributors…
Cannot retrieve contributors at this time
469 lines (397 sloc) 16.3 KB
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>
<head>
<title>The Meta Library</title>
</head>
<body>
<h1>The Meta Library</h1>
This is a reorganized version of the original text-only
documentation. It became more and more unreadable, so here is a
more structured HTML-Version.<p>
<ol>
<li><a href="#introduction">Introduction</a></li>
<li><a href="#syntax-exports">Exported Syntactic Constructors</a></li>
<ul><li><a href="#helper-exports">Exported Helper Constructs</a></li>
<li><a href="#type-exports">Exported Character Types</a></li></ul>
<li><a href="#syntax">Meta Syntax</a></li>
<li><a href="#expressions">Meta expressions</a></li>
<li><a href="#example">Example code</a></li>
<li><a href="#references">References</a></li>
</ol>
<a name="introduction"><h2>1. Introduction</h2></a>
<p>
This is an implementation of Meta, a technique used to simplify the
task of writing parsers. <a href="#baker91">[Baker91]</a> describes
Meta and shows the main ideas for an implementation in Common Lisp.
<blockquote>
If all META did was recognize regular expressions, it would not be
very useful. It is a programming language, however, and the
operations [], {} and $ correspond to the Common Lisp control
structures AND, OR, and DO.[8] Therefore, we can utilize META to
not only parse, but also to transform. In this way, META is
analogous to "attributed grammars" [Aho86], but it is an order of
magnitude simpler and more efficient. Thus, with the addition of
the "escape" operation "!", which allows us to incorporate
arbitrary Lisp expressions into META, we can not only parse
integers, but produce their integral value as a result.
-- <a href="#baker91">[Baker91]</a>
</blockquote>
The macro defined here is an attempt to implement Meta (with
slightly adapted syntax) for Dylan. It is functional, but not yet
optimized.<p>
<a name="syntax-exports"><h2>2. Exported facilities</h2></a><p>
The <tt>meta</tt>-library exports the <tt>meta</tt> module with the
following macros:
<dl>
<dt><tt>meta-definer</tt> (or, more clearly, <tt>define meta </tt>
<em>&lt;foo></em>)</dt>
<dd><p>Implements Meta and provides some additional functionality to
define variables. See <a href="#syntax">below</a>.</p></dd>
<dt><tt>with-meta-syntax</tt>
<dd><p>The guts of the <tt>meta-definer</tt> form; use when requiring
precise control of variables or constructs</p></dd>
<dt><tt>collect-definer</tt></dt>
<dd><p>General facility to collect data into sequences (by default,
into strings). Initially <tt>with-meta-syntax</tt> had this
functionality integrated--a mess. Now this is a more
modular approach.</p></dd>
<dt><a href="With-collector.html"><tt>with-collector</tt></a>
<dd><p>The guts of the <tt>collect-definer</tt> form.</p></dd>
</dl>
<p>Aside from the syntactic constructors exported above, the Meta
library also provides some
<a name="helper-exports">commonly-used forms</a> for scanning and
parsing:</p>
<dl>
<dt><b>scan-s</b>()</dt>
<dd><p>Scans in at lease one space</p></dd>
<dt><b>scan-word</b>(<em>word</em>)</dt>
<dd><p>Scans in a token and returns that
token as a <tt>&lt;string></tt> instance. A "word" is
surrounded by spaces or any of the following characters:
'&lt;', '>', '{', '}', '[', ']', punctuation (',', '?', or
'!'), or the single- or double- quotation-mark.</p></dd>
<dt><b>scan-int</b>(<em>int</em>)</dt>
<dd><p>Reads in digit characters and returns an <tt>&lt;integer></tt>
instance.</p></dd>
<dt><b>scan-number</b>(<em>real</em>)</dt>
<dd><p>Although this is not an all-encompassing conversion
utility (although, IMHO, it's good enough to be part of the
standard, once there is one, YMMV), it reads in just about
any fixed-point number format and returns a
<tt>&lt;real></tt> instance.</p></dd>
<!-- dt><p><strong>N.B.:</strong> the above meta calls are called as
such <em>within the meta syntax</em>! One must prepend "scan-"
to the meta names and use the formal arguments when calling
outside the scope of the meta syntax.</p></dt -->
<dt><b>string-to-number</b>(<em>str</em>, <tt>#key</tt>
<em>base</em>) => (<em>ans</em>)
<p><b>Arguments:</b></p></dt>
<dd><p><em>str</em>, a <tt>&lt;string></tt>, the string to convert to
a number</p>
<p><em>base</em>, an <tt>&lt;integer></tt>, defaults to <tt>
<font color="green">10</font></tt>, the base of the number
in the string.</p></dd>
<dt><p><b>Values:</b></p></dt>
<dd><p><em>ans</em>, a <tt>&lt;real></tt>, the resulting
number.</p></dd>
<dt><p><d>Discussion:</b></p></dt>
<dd><p>This really should belong to the common-dylan spec, so that
instead of rolling their own, everyone should use this
function. ... It is therefore exported to this end.</p></dd>
</dl>
<p>Scanning tokens usually entails using some
<a name="type-exports">common character types</a>. Meta exports
the following:</p>
<dl>
<dd><b>$space</b> -- any whitespace</dd>
<dd><b>$digit</b> <tt>[0-9]</tt></dd>
<dd><b>$letter</b> <tt>[a-zA-Z]</tt></dd>
<dd><b>$num-char</b> <tt>$digit ++ [.eE+]</tt></dd>
<dd><b>$graphic-char</b> <tt>[_@#$%^&*()+=~/]</tt></dd>
<dd><b>$any-char</b> <tt>$letter ++ $num-char ++ $graphic-char</tt></dd>
</dl>
<a name="syntax"><h2>3. Meta Syntax</h2></a>
<p>
Meta integrates the ability to parse from streams and strings in one
facility. (The parsing of lists is not implemented yet, because
it's rather useless in Dylan. This addition would be simple to do,
though.)
<dl>
<dd><pre><b>define meta</b> <em>name</em> (<em>variables</em>) => (<em>results</em>)
<em>meta body</em>
<b>end</b></pre></dd>
<dt><b>Arguments:</b></dt>
<dd><p><em>name</em> -- the meta-function name, which is
immediately transformed into <tt>scan-</tt><em>name</em></p>
<p><em>variables</em> -- token-holders used in <em>meta
body</em></p>
<p><em>results</em> -- an expression returned on a successful
scan</p>
<p><em>meta body</em> -- a sequence of
<a href="#and-expr">anded Meta expressions</a> to scan</p></dd>
<dt><b>Discussion:</b></dt>
<dd><p>The <tt>meta-definer</tt> form works only with the
<a href="#parse-string"><code>parse-string</code>
<em>source-type</em></a> of the
<a href="#with-meta-syntax-definition">with-meta-syntax</a>
form.</p>
<p>The user of this form has control over the return value.
Usually <code><font color="green">#t</font></code> is
sufficient (in which case the results clause may be omitted,
see below); however, e.g., the values of the
<em>variables</em> may need to be manipulated during the
parse phase.</p></dd>
<dt><b>Example:</b></dt>
<dd><pre>define meta public-id(s, pub) => (pub)
"PUBLIC", scan-s(s), scan-pubid-literal(pub)
end meta public-id;</pre>
<p>This definition returns <tt>pub</tt> when it successfully
scans the tokens "PUBLIC", (some) spaces, and a literal
which <tt>pub</tt> receives. Note that, hereafter, the meta
definition is referred to as <tt>scan-public-id</tt> outside
the meta syntax block <!-- ... inside the block it is called
<tt>public-id</tt -->.</p></dd>
<hr>
<dd><pre><b>define meta</b> <em>name</em> (<em>variables</em>)
<em>meta body</em>
<b>end</b></pre>
<p>Same as the above form except that <em>results</em> is
<code><font color="green">#t</font></code></p></dd>
<dt><b>Example</b> (from the meta library itself):</dt>
<dd><pre>define meta s(c)
element-of($space, c), loop(element-of($space, c))
end meta s;</pre>
<p>Scans in at least one space (<code>element-of</code> and
<code>loop</code> are discussed in the section on
<a href="#expressions">Meta expressions</a>).</p></dd>
<hr>
<dd><pre>
<b><a name="with-meta-syntax-definition">with-meta-syntax</a></b>
<i>source-type</i> (<i>source</i> <tt>#key</tt> <i>keys</i>)
<i>[ variables ]</i>
<i>meta;</i>
<i>body</i>
<b>end</b>
</pre>
<dt><b>Arguments:</b>
<dd>
<i>source-type</i>---either <code>
<a name="parse-stream">parse-stream</a></code>
or <code><a name="parse-string">parse-string</a></code><p>
<i>source</i>---either a stream or a string, depending on
<i>source-type</i><p>
<i>keys</i>---<i>source-type</i> specific.<p>
<i>meta</i>---a <a href="#expressions">Meta expression</a>.<p>
<i>body</i>---a body. Evaluated only if parsing is successful.<p>
<dt><b>Values:</b>
<dd>
If parsing fails <code>#f</code>, otherwise the values of
<i>body</i>.<p>
<dt><b>Keyword arguments:</b>
<dd>
<tt>parse-stream</tt> does not accept keyword arguments
currently.
<p><tt>parse-string</tt> recognizes the following keywords: <p>
<i>start</i>---Index to start at<p>
<i>end</i>---Index to finish before<p>
<i>pos</i>---A name that will be bound to the current index
during execution of the <tt>with-meta-syntax</tt> forms.<p>
<dt><b>Special programming aids:</b>
<dd>
<code><b>variables</b> (<i>variable [ :: type ] [ = init ], ...</i>);</code>
<p>Bind variables to <i>init</i>, which defaults to #f;<p>
Future versions will have further special forms.<p>
<dt><b>Example fragments:</b>
<dd><pre>
<b>with-meta-syntax parse-stream</b> (*standard-input*)
<i>body</i>
<b>end with-meta-syntax</b>;
<b>let</b> query :: &lt;string&gt; = ask-user();
<b>with-meta-syntax parse-string</b> (query, start: 23, end: 42)
<i>body</i>
<b>end with-meta-syntax</b>;
<b>with-meta-syntax parse-string</b> (query)
... ['\n', finish()] ...
values(these, values, will, be, returned);
<b>end with-meta-syntax</b>;
</pre>
</dl>
<a name="expressions">
<h2>4. Meta expressions</h2></a>
<p>
Meta is a small, but featureful language, so naturally it has its
own syntax. This syntax is adapted to Dylan's way of writing
things, of course.
<p>There are several basic Meta expressions implementing the core
functionality. Additionally there are some <i>pseudo-functions</i>,
syntactically function-like constructs which simplify certain tasks
that would otherwise have to be written manually.
<h3>Basic Meta expressions as described by Baker</h3>
<table>
<tr>
<td><b>Baker</b></td>
<td><b><tt>with-meta-syntax</tt></b></td>
<td><b>Description</b><td>
</tr>
<tr>
<td><code><i>fragment</i></code></td>
<td><code><i>fragment</i></code></td>
<td>try to match this</td>
</tr> <tr>
<td><a name="and-expr"><code>[a b c ... n]</code></a></td>
<td><code>[a, b, c, ..., n]</code></td>
<td>and/try all</td>
</tr> <tr>
<td><code>{a b c ... n}</code></td>
<td><code>{a, b, c, ..., n}</code></td>
<td>or/first hit</td>
</tr>
<tr>
<td><code>@(<i>type</i> <i>variable</i>)</code></td>
<td><code>type(<i>type</i>, <i>variable</i>)</code></td>
<td>match any <i>type</i>, store result in <i>variable</i>
<br><strong>Warning:</strong> <em>deprecated</em>
<code>type</code> is most often used in seeing if a
character is one of several possibilities. Use
<code>element-of</code> instead.
</td>
</tr>
<tr>
<td><code>$foo</code></td>
<td><code>loop(foo)</td>
<td>zero or more</td>
</tr> <tr>
<td><code>!<i>Lisp</i></code></td>
<td><code>(<i>Dylan</i>)</code></td>
<td>call the code (and check result)</td>
</tr>
</table>
<p>
The same grammar which works for streams will works for strings.
When parsing strings, more than just one-character look-ahead is
possible, though. You can therefore not only match against
characters, but also whole substrings. This does not work when
reading from a stream.<p>
<h3>Additional pseudo-function expressions</h3>
<table>
<tr>
<td><b><tt>with-meta-syntax</tt></b></td>
<td><b>Description</b></td>
<td><b>Could be written as</b></td>
</tr> <tr>
<td><code>do(<i>Dylan</i>)</code></td>
<td>call the code and continue (whatever the result is)</td>
<td><code>(<i>Dylan</i>; #t)</code></td>
</tr> <tr>
<td><code>finish()</code></td>
<td>finish parsing successfully</td>
<td>not possible</td>
</tr> <tr>
<td><code>test(<i>predicate</i>)</code></td>
<td>Match against a predicate.</td>
<td>not possible</td>
</tr> <tr>
<td><code>test(<i>predicate</i>, <i>variable</i>)</code></td>
<td>Match against a predicate, saving the result.</td>
<td>not possible</td>
</tr> <tr>
<td><code>peeking(<i>variable</i>, <i>test</i>)</code></td>
<td>Save result first, so that expression test can use it.</td>
<td>not possible;<br> <b>Warning:</b> <em>deprecated, use
</em><code>peek</code> <em>instead</em></td>
</tr>
<tr>
<td><code>peek(<i>variable</i>, <i>test</i>)</code></td>
<td>Look one character ahead and store in <i>variable</i> if
it passes <i>test</i>. Leave the character on the stream.</td>
<td>not possible</td>
</tr>
<tr><td><code>element-of(<i>sequence</i>, <i>variable</i>)</code></td>
<td>Sees if the <i>variable</i> (a character) is a member of the
<i>sequence</i>, storing the result</td>
<td>{ 'a', 'b', 'c' } (but not storing result)</td>
</tr>
<tr>
<td><code>yes!(<i>variable</i>)</code></td>
<td>Set <i>variable</i> to #t and continue.</td>
<td><code>(<i>variable</i> := #t)</code></td>
</tr> <tr>
<td><code>no!(<i>variable</i>)</code></td>
<td>Set <i>variable</i> to #f and continue.</td>
<td><code>(<i>variable</i> := #f; #t)</code></td>
</tr> <tr>
<td><code>set!(<i>variable</i>, <i>value</i>)</code></td>
<td>Set <i>variable</i> to <i>value</i> and continue.</td>
<td><code>(<i>variable</i> := <i>value</i>; #t)</code></td>
</tr> <tr>
<td><code>accept(<i>variable</i>)</code></td>
<td>Match anything and save result.</td>
<td><code>type(&lt;object&gt;, <i>variable</i>)</code></td>
</tr>
</table>
<a name="example">
<h2>5. Example code</h2></a><p>
<h3>Parsing an integer (base 10)</h3><p>
Common Lisp version:<p>
<pre>
(defun parse-integer (&aux (s +1) d (n 0))
(and
(matchit
[{#\+ [#\- !(setq s -1)] []}
@(digit d) !(setq n (digit-to-integer d))
$[@(digit d) !(setq n (+ (* n 10) (digit-to-integer d)))]])
(* s n)))
</pre>
Direct translation to Dylan:<p>
<pre>
define constant &lt;digit&gt; = one-of('0','1','2','3','4','5','6','7','8','9');
define function parse-integer (source :: &lt;stream&gt;);
let s = +1; // sign
let n = 0; // number
with-meta-syntax parse-stream (source)
variables(d);
[{'+', ['-', (s := -1)], []},
type(&lt;digit&gt;, d), (n := digit-to-integer(d)),
loop([type(&lt;digit&gt;, d), (n := digit-to-integer(d) + 10 * n)])];
(s * n)
end with-meta-syntax;
end function parse-integer;
</pre>
Alternative version:
<pre>
// this will actually return a fn named 'scan-int', not 'parse-int'
define collector int(i) => (as(&lt;string>, str).string-to-integer)
loop([element-of("+-0123456789", i), do(collect(i))])
end collector int;
</pre>
<a name="finger">
<h3>Parsing finger queries</h3></a><p>
<pre>
define function parse-finger-query (query :: &lt;string&gt;)
with-collector into-buffer user like query (collect: collect)
with-meta-syntax parse-string (query)
variables (whois, at, c);
[loop(' '), {[{"/W", "/w"}, yes!(whois)], []}, // Whois switch?
loop(' '), loop({[{'\n', '\r'}, finish()], // Newline? Quit.
{['@', yes!(at), do(collect('@'))], // @? Indirect.
[accept(c), do(collect(c))]}})]; // then collect char
values(whois, user(), at);
end with-meta-syntax;
end with-collector;
end function parse-finger-query;
</pre>
<a name="references">
<h2>6. References</h2></a>
<p>
<a name="baker91"></a>
<a href="lisp-meta.htm">[Baker91]</a>
Baker, Henry. "Pragmatic Parsing in Common Lisp".
<i>ACM Lisp Pointers 4, 2</i> (Apr-Jun 1991), 3-15.<p>
<hr>
<address>Original author is <a href="mailto:david.lichteblau@snafu.de">David Lichteblau</a>, with additions by <a href="mailto:dauclair@hotmail.com">Douglas M. Auclair</a>.</address>
<!-- hhmts start -->
Last modified: Tue Jan 22 01:35:59 EST 2002
<!-- hhmts end -->
</body>
</html>
Something went wrong with that request. Please try again.