
<html dir="ltr">
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/>
<title>Arc Language Internals: Macros</title>
<link rel="stylesheet" type="text/css" href="code.css">
</head>
<body>
<a href="http://arcfn.com"><img src="arc.gif" style="position:absolute; right:0px; top:0px; width:135px; height:125; border-style: none" title="Arc de Triomphe under construction.  From English
Illustrated Magazine."/></a>

<h1>Arc internals: Macros</h1>

This article describes how the Arc language handles macros internally.  It explains the steps of macro processing in Arc, and how macros are translated to Scheme functions.  This article does not explain how to use macros effectively; for that, see chapter 7 of <a href="http://www.paulgraham.com/onlisp.html">On Lisp</a>.  The macros below are not intended as examples of good macro style or usefulness; instead they illustrate points of macro processing.
This article also assumes some previous understanding of macros, which can be obtained from the <a href="http://ycombinator.com/arc/tut.txt">Arc Tutorial</a>.
<p>
To summarize macros briefly, a macro is like a function that generates Lisp code, which is then substituted in place of the macro.  The first phase of macro processing is macroexpansion, in which the macro generates an expression.  The second phase is evaluation, in which the generated expression is evaluated.
<h2>How macros work</h2>
It is very important to understand when the macroexpansion and evaluation phases take place.  An example may help clarify this:
<pre class="repl">
arc> (mac foo () (prn "macro executed") '(prn "generated code executed"))
#3(tagged mac #&lt;procedure>)
arc> (foo)
macro executed
generated code executed
"generated code executed"
</pre>
The macro foo does two things: it prints "macro executed" when it is executed during macroexpansion, and it returns the list: <code>(prn "generated code executed")</code>.  When that list is executed in evaluation, it will display "generated code executed".  (Since <code>prn</code> returns that string, the string is also displayed a second time as the result of the command.)
<p>
However, the two phases of macroexpansion and evaluation can be separated in time.
If we use the macro inside a function definition, things get interesting:
<pre class="repl">
arc> (def bar () (prn "bar executed") (foo))
macro executed
#&lt;procedure: bar>
</pre>
Macroexpansion takes place above, but evaluation of the generated code does not take place; the generated code becomes part of <code>bar</code>.  This is demonstrated by the <code>prn</code> statements, which show that only the macro itself is executed.  Next, the macro itself can be destroyed, to illustrate that it is not necessary for the evaluation phase:
<pre class="repl">
arc> (= foo nil)
nil
</pre>
Finally, evaluation can take place.  If procedure <code>bar</code> is executed three times, note that the body of <code>bar</code> is executed, as well as the code generated by <code>foo</code> during macroexpansion:
<pre class="repl">
arc> (repeat 3 (bar))
bar executed
generated code executed
bar executed
generated code executed
bar executed
generated code executed
nil
</pre>
This illustrates the two phases: first, the macro is executed and generates code during macroexpansion.  Second, the generated code is executed, potentially much later, during evaluation.  Note that the macro itself does not take part in the second step; in the example above, the macro had already been destroyed.  A more subtle consequence is that changing a macro's definition has no effect on code that was expanded before the change.  This can cause confusion when the old version of a macro appears to still be active.
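<p>
The stale-macro effect can be sketched as follows.  The names <code>greet</code> and <code>caller</code> are hypothetical, chosen only for this illustration, and the REPL's echo of each definition (along with any redefinition warning) is omitted:
<pre class="repl">
arc> (mac greet () '(prn "old version"))
arc> (def caller () (greet))
arc> (mac greet () '(prn "new version"))
arc> (caller)
old version
"old version"
</pre>
Because <code>greet</code> was expanded when <code>caller</code> was defined, <code>caller</code> has to be redefined before it picks up the new expansion.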
<p>
Another important difference between macros and functions is that the arguments to a function are evaluated before being passed to the function, while the arguments to a macro are passed unprocessed.  This is vital for macros that execute their arguments multiple times (e.g. <code>repeat</code>), macros that execute their arguments conditionally (e.g. <code>and</code>), and macros that process their arguments (e.g. <code>=</code>).  To see the difference, note that the function receives the result of evaluating <code>(+ 1 1)</code>, while the macro receives the list <code>(+ 1 1)</code>:
<pre class="repl">
arc> (def f args (prn args) nil)
#&lt;procedure: f>
arc> (mac m args (prn args) nil)
#3(tagged mac #&lt;procedure>)
arc> (f (+ 1 1))
(2)
nil
arc> (m (+ 1 1))
((+ 1 1))
nil
</pre>
<p>
Like functions, macros support destructuring bind on the arguments.  Despite its intimidating name, destructuring bind simply means that the list of parameters can be a complex nested list.  The arguments passed in will be "destructured" and mapped onto the parameter list.  For example:
<pre class="repl">
arc> (mac db (a (b c d) e) (prs a b c d e) nil)
arc> (db 1 (2 3 4) 5)
1 2 3 4 5
</pre>
Note that the list <code>(2 3 4)</code> has been destructured and mapped onto individual arguments.  In a more complex example, note that the expressions are unevaluated, expressions can be destructured, and any extra arguments are discarded.
<pre class="repl">
arc> (db (+ m n o) (+ p q r) (+ s t u))
(+ m n o) + p q (+ s t u)
</pre>
<p>
Now that the phases of macro processing are clear, these phases can be illustrated in Arc.
<h2>Creation of a macro</h2>
Macro creation in Arc is, somewhat surprisingly, not part of the <code>ac.scm</code> foundation, but is implemented in <code>arc.arc</code> out of annotated functions.
A macro can be generated "manually", illustrating the steps.  Create <code>baz</code>, which is like the macro <code>foo</code> above, except it is a function and not a macro.  Note that when executed, it returns the generated code:
<pre class="repl">
arc> (def baz () (prn "macro executed") '(prn "generated code executed"))
arc> (baz)
macro executed
(prn "generated code executed")
</pre>
To turn this function into a macro, it is simply annotated as type <code>'mac</code>.
The resulting <code>mbaz</code> functions exactly like the macro <code>foo</code>:
<pre class="repl">
arc> (= mbaz (annotate 'mac baz))
#3(tagged mac #&lt;procedure: baz>)
arc> (mbaz)
macro executed
generated code executed
"generated code executed"
</pre>
To summarize, any function that generates Arc code can be annotated as type <code>'mac</code>, and the result is a macro.  The annotation process is the key to forming a macro.  There is nothing special about using <code>mac</code> to create a macro; it is just a "convenience" macro that performs this annotation.  The fact that <code>mac</code> is a macro may seem circular, but since it is implemented using <code>annotate</code> directly, there is no circularity.
<p>
The <code>annotate</code> function provides a general typing system.  It is implemented in Scheme by <code>ar-tag</code>, which creates a Scheme vector of length 3: <code>'tagged</code>, the type, and the contents.  For macros, the type is <code>'mac</code>, but <code>annotate</code> supports arbitrary types.  (For detailed background on <code>annotate</code>, see <a href="http://www.paulgraham.com/ilc03.html">Some Work on Arc</a>.) 
<p>
The internal representation of a macro as a vector can be seen by entering a macro name:
<pre class="repl">
arc> do
#3(tagged mac #&lt;procedure>)
</pre>
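<p>
As a brief illustration of the tagging, the built-in <code>type</code> and <code>rep</code> functions (not discussed elsewhere in this article) report the tag and the underlying contents of an annotated object.  The variable <code>len5</code> and the type <code>'meters</code> are hypothetical names used only for this sketch, and the echo of the assignment is omitted:
<pre class="repl">
arc> (type do)
mac
arc> (type (rep do))
fn
arc> (= len5 (annotate 'meters 5))
arc> (type len5)
meters
arc> (rep len5)
5
</pre>
The macro case and the arbitrary-type case go through the same mechanism; only the tag differs.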
<p>
Because macros are implemented by functions, the body of the macro is initially processed by <code>ac-fn</code>.  A simple function is turned into a Scheme lambda function, while a function with complex (destructuring) arguments is processed by <code>ac-complex-fn</code>.  The destructuring is implemented by inserting the appropriate combinations of <code>car</code> and <code>cdr</code> to pull each argument out of the list.  In other words, destructuring is just syntactic sugar; the same effect can be obtained by manually using <code>car</code> and <code>cdr</code> to extract each argument from a rest argument.
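<p>
To make the "syntactic sugar" point concrete, here is a rough hand-written counterpart to the <code>db</code> macro shown earlier.  The name <code>db2</code> and the local <code>bcd</code> are hypothetical, and this is only a sketch of the idea, not the code that <code>ac-complex-fn</code> actually generates:
<pre class="repl">
arc> (mac db2 args
       (with (a   (car args)
              bcd (cadr args)
              e   (car (cddr args)))
         (prs a (car bcd) (cadr bcd) (car (cddr bcd)) e)
         nil))
</pre>
Calling <code>(db2 1 (2 3 4) 5)</code> should print the same values as <code>(db 1 (2 3 4) 5)</code>, since each parameter is pulled out of the rest argument by the equivalent <code>car</code> and <code>cdr</code> chain.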
<p>
The net result is that an Arc macro becomes an Arc procedure, tagged as <code>'mac</code>.  Internally, the macro is a Scheme vector whose third element is a Scheme procedure that will take the macro arguments, perform any necessary destructuring, and execute the macro code.

<h2>Macroexpansion</h2>
Macroexpansion takes place when Arc code is converted to Scheme by <code>ac</code>.  (See <a href="http://arcfn.com/2008/02/arc-internals-part-1.html">Arc Internals, Part 1</a> for background.)  Code of the form <code>(foo ...)</code> is passed by <code>ac</code> to <code>ac-call</code>.  If the first element is a macro (i.e. a vector tagged with <code>'mac</code>), the form is handled by <code>ac-mac-call</code>, which applies the macro function to the arguments, and then passes the macroexpanded result to <code>ac</code> to be converted to Scheme.  The net result is that the code generated by the macro gets treated as if it were part of the original expression being processed.
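<p>
One visible consequence of feeding the expansion back through <code>ac</code> is that a macro may expand into another macro call, which is then expanded in turn.  A minimal sketch, using the hypothetical names <code>inner</code> and <code>outer</code> and omitting the echo of the definitions:
<pre class="repl">
arc> (mac inner () '(prn "inner code"))
arc> (mac outer () '(inner))
arc> (outer)
inner code
"inner code"
</pre>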
<p>
Comparing <code>ac-mac-call</code> to <code>ac-call</code> shows why macros receive their arguments unevaluated.  <code>ac-mac-call</code> applies the macro function to the arguments, while <code>ac-call</code> maps <code>ac</code> on the arguments before applying the function, causing the arguments to be evaluated. 

<h2>Evaluation</h2>
Nothing particularly special happens for macro code during evaluation.  At this point it is Scheme code, and it gets executed the same as any other code.  Specifically, <code>arc-eval</code> calls <code>eval</code> on the Scheme code generated by <code>ac</code>.  It may use <code>ar-funcall<i>n</i></code> to execute functions, and execution will typically bottom out with foundation operations implemented in native Scheme.  (See <a href="http://arcfn.com/foundation-doc.html">Arc Foundation Documentation</a> for details.)

<h2>macex and macex1</h2>
The <code>macex</code> and <code>macex1</code> functions perform macro expansion.  If the outermost form is a macro, <code>macex1</code> expands it once, while <code>macex</code> expands it repeatedly until it is no longer a macro.  These functions are typically used for debugging, to see what a macro is doing.  Note that neither <code>macex</code> nor <code>macex1</code> expands inner macros.
<pre class="repl">
arc> (macex1 '(or 1 2 3))
(let gs2418 1 (if gs2418 gs2418 (or 2 3)))
arc> (macex '(or 1 2 3))
((fn (gs2419) (if gs2419 gs2419 (or 2 3))) 1)
</pre>
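<p>
The difference between the two functions, and the fact that inner macros are left alone, can also be seen with small user-defined macros.  The names <code>one</code>, <code>two</code>, and <code>wrap</code> are hypothetical, and the echo of the definitions is omitted:
<pre class="repl">
arc> (mac two () '(prn "done"))
arc> (mac one () '(two))
arc> (mac wrap (x) `(list ,x))
arc> (macex1 '(one))
(two)
arc> (macex '(one))
(prn "done")
arc> (macex '(wrap (one)))
(list (one))
</pre>
In the last case the outer form becomes a call to the function <code>list</code>, so expansion stops and the nested <code>(one)</code> is not expanded.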
<p>
The <code>macex</code> and <code>macex1</code> functions are
implemented by <code>ac-macex</code>, which expands its argument if that argument is a macro call.  Expansion is done by applying the macro function to the arguments.  For <code>macex</code>, <code>ac-macex</code> calls itself recursively, terminating when the outermost form is no longer a macro call.  However, <code>macex1</code> passes a <code>'once</code> flag to <code>ac-macex</code>, so only one step of macro expansion is performed.
<p>
The <code>set</code> special form takes a symbol and a value, and sets the variable indicated by the symbol to the value.  It is implemented by <code>ac-set</code>, which uses <code>ac-macex</code>.  This ensures the first argument to <code>set</code> is a symbol, or a macro call that macroexpands to a symbol.  Note that something (other than a macro call) that evaluates to a symbol is not permitted as the first argument to <code>set</code>.  Thus, macro processing for <code>set</code> uses a separate path from the rest of Arc's macro processing.
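<p>
A small sketch of this behavior, assuming the version of Arc described in this article, in which <code>set</code> is the assignment special form.  The names <code>place</code> and <code>counter</code> are hypothetical, and the echo of the macro definition is omitted:
<pre class="repl">
arc> (mac place () 'counter)
arc> (set (place) 5)
5
arc> counter
5
</pre>
Here <code>(place)</code> is macroexpanded to the symbol <code>counter</code> before the assignment is compiled, so it is <code>counter</code> that receives the value.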
<p>
In conclusion, Arc's macro processing is implemented relatively straightforwardly.  The biggest surprise is the representation of macros as tagged procedures, which are implemented as vectors.  Macros may be somewhat nonintuitive, but understanding the internal implementation may help with writing and understanding code that makes use of macros.
<p>
Copyright 2008 Ken Shirriff
</body>
</html>