
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-type" content="text/html;charset=UTF-8" />
<title>Compiling Expressions &middot; Crafting Interpreters</title>

<!-- Tell mobile browsers we're optimized for them and they don't need to crop
     the viewport. -->
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<link rel="stylesheet" type="text/css" href="style.css" />

<!-- Oh, God, Source Code Pro is so beautiful it makes me want to cry. -->
<link href='https://fonts.googleapis.com/css?family=Source+Code+Pro:400|Source+Sans+Pro:300,400,600' rel='stylesheet' type='text/css'>

<link rel="icon" type="image/png" href="image/favicon.png" />
<script src="jquery-3.4.1.min.js"></script>
<script src="script.js"></script>

<!-- Google analytics -->
<script>
  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
  })(window,document,'script','https://www.google-analytics.com/analytics.js','ga');

  ga('create', 'UA-42804721-2', 'auto');
  ga('send', 'pageview');
</script>

</head>
<body id="top">

<!-- <div class="scrim"></div> -->
<nav class="wide">
  <a href="/"><img src="image/logotype.png" title="Crafting Interpreters"></a>
  <div class="contents">
<h3><a href="#top">Compiling Expressions<small>17</small></a></h3>

<ul>
    <li><a href="#single-pass-compilation"><small>17.1</small> Single-Pass Compilation</a></li>
    <li><a href="#parsing-tokens"><small>17.2</small> Parsing Tokens</a></li>
    <li><a href="#emitting-bytecode"><small>17.3</small> Emitting Bytecode</a></li>
    <li><a href="#parsing-prefix-expressions"><small>17.4</small> Parsing Prefix Expressions</a></li>
    <li><a href="#parsing-infix-expressions"><small>17.5</small> Parsing Infix Expressions</a></li>
    <li><a href="#a-pratt-parser"><small>17.6</small> A Pratt Parser</a></li>
    <li><a href="#dumping-chunks"><small>17.7</small> Dumping Chunks</a></li>
    <li class="divider"></li>
    <li class="end-part"><a href="#challenges">Challenges</a></li>
    <li class="end-part"><a href="#design-note"><small>note</small>It&#x27;s Just Parsing</a></li>
</ul>


<div class="prev-next">
    <a href="scanning-on-demand.html" title="Scanning on Demand" class="left">&larr;&nbsp;Previous</a>
    <a href="a-bytecode-virtual-machine.html" title="A Bytecode Virtual Machine">&uarr;&nbsp;Up</a>
    <a href="types-of-values.html" title="Types of Values" class="right">Next&nbsp;&rarr;</a>
</div>  </div>
</nav>

<nav class="narrow">
<a href="/"><img src="image/logotype.png" title="Crafting Interpreters"></a>
<a href="scanning-on-demand.html" title="Scanning on Demand" class="prev">←</a>
<a href="types-of-values.html" title="Types of Values" class="next">→</a>
</nav>

<div class="page">
<div class="nav-wrapper">
<nav class="floating">
  <a href="/"><img src="image/logotype.png" title="Crafting Interpreters"></a>
  <div class="expandable">
<h3><a href="#top">Compiling Expressions<small>17</small></a></h3>

<ul>
    <li><a href="#single-pass-compilation"><small>17.1</small> Single-Pass Compilation</a></li>
    <li><a href="#parsing-tokens"><small>17.2</small> Parsing Tokens</a></li>
    <li><a href="#emitting-bytecode"><small>17.3</small> Emitting Bytecode</a></li>
    <li><a href="#parsing-prefix-expressions"><small>17.4</small> Parsing Prefix Expressions</a></li>
    <li><a href="#parsing-infix-expressions"><small>17.5</small> Parsing Infix Expressions</a></li>
    <li><a href="#a-pratt-parser"><small>17.6</small> A Pratt Parser</a></li>
    <li><a href="#dumping-chunks"><small>17.7</small> Dumping Chunks</a></li>
    <li class="divider"></li>
    <li class="end-part"><a href="#challenges">Challenges</a></li>
    <li class="end-part"><a href="#design-note"><small>note</small>It&#x27;s Just Parsing</a></li>
</ul>


<div class="prev-next">
    <a href="scanning-on-demand.html" title="Scanning on Demand" class="left">&larr;&nbsp;Previous</a>
    <a href="a-bytecode-virtual-machine.html" title="A Bytecode Virtual Machine">&uarr;&nbsp;Up</a>
    <a href="types-of-values.html" title="Types of Values" class="right">Next&nbsp;&rarr;</a>
</div>  </div>
  <a id="expand-nav">≡</a>
</nav>
</div>

<article class="chapter">

  <div class="number">17</div>
  <h1>Compiling Expressions</h1>

<blockquote>
<p>In the middle of the journey of our life I found myself within a dark woods
where the straight way was lost.</p>
<p><cite>Dante Alighieri, <em>Inferno</em></cite></p>
</blockquote>
<p>This chapter is exciting for not one, not two, but <em>three</em> reasons. First, it
provides the final segment of our VM&rsquo;s execution pipeline. Once in place, we can
plumb the user&rsquo;s source code from scanning all the way through to executing it.</p><img src="image/compiling-expressions/pipeline.png" alt="Lowering the 'compiler' section of pipe between 'scanner' and 'VM'." />
<p>Second, we get to write an actual, honest-to-God <em>compiler</em>. It parses source
code and outputs a low-level series of binary instructions. Sure, it&rsquo;s <span
name="wirth">bytecode</span> and not some chip&rsquo;s native instruction set, but
it&rsquo;s way closer to the metal than jlox was. We&rsquo;re about to be real language
hackers.</p>
<aside name="wirth">
<p>Bytecode was good enough for Niklaus Wirth, and no one questions his street
cred.</p>
</aside>
<p><span name="pratt">Third</span> and finally, I get to show you one of my
absolute favorite algorithms: Vaughan Pratt&rsquo;s &ldquo;top-down operator precedence
parsing&rdquo;. It&rsquo;s the most elegant way I know to parse expressions. It gracefully
handles prefix operators, postfix, infix, <em>mixfix</em>, any kind of <em>-fix</em> you got.
It deals with precedence and associativity without breaking a sweat. I love it.</p>
<aside name="pratt">
<p>Pratt parsers are a sort of oral tradition in industry. No compiler or language
book I&rsquo;ve read teaches them. Academia is very focused on generated parsers, and
Pratt&rsquo;s technique is for handwritten ones, so it gets overlooked.</p>
<p>But in production compilers, where hand-rolled parsers are common, you&rsquo;d be
surprised how many people know it. Ask where they learned it, and it&rsquo;s always,
&ldquo;Oh, I worked on this compiler years ago and my coworker said they took it from
this old front end<span class="ellipse">&thinsp;.&thinsp;.&thinsp;.&nbsp;</span>&rdquo;</p>
</aside>
<p>As usual, before we get to the fun stuff, we&rsquo;ve got some preliminaries to work
through. You have to eat your vegetables before you get dessert. First, let&rsquo;s
ditch that temporary scaffolding we wrote for testing the scanner and replace it
with something more useful.</p>
<div class="codehilite"><pre class="insert-before">InterpretResult interpret(const char* source) {
</pre><div class="source-file"><em>vm.c</em><br>
in <em>interpret</em>()<br>
replace 2 lines</div>
<pre class="insert">  <span class="t">Chunk</span> <span class="i">chunk</span>;
  <span class="i">initChunk</span>(&amp;<span class="i">chunk</span>);

  <span class="k">if</span> (!<span class="i">compile</span>(<span class="i">source</span>, &amp;<span class="i">chunk</span>)) {
    <span class="i">freeChunk</span>(&amp;<span class="i">chunk</span>);
    <span class="k">return</span> <span class="a">INTERPRET_COMPILE_ERROR</span>;
  }

  <span class="i">vm</span>.<span class="i">chunk</span> = &amp;<span class="i">chunk</span>;
  <span class="i">vm</span>.<span class="i">ip</span> = <span class="i">vm</span>.<span class="i">chunk</span>-&gt;<span class="i">code</span>;

  <span class="t">InterpretResult</span> <span class="i">result</span> = <span class="i">run</span>();

  <span class="i">freeChunk</span>(&amp;<span class="i">chunk</span>);
  <span class="k">return</span> <span class="i">result</span>;
</pre><pre class="insert-after">}
</pre></div>
<div class="source-file-narrow"><em>vm.c</em>, in <em>interpret</em>(), replace 2 lines</div>

<p>We create a new empty chunk and pass it over to the compiler. The compiler will
take the user&rsquo;s program and fill up the chunk with bytecode. At least, that&rsquo;s
what it will do if the program doesn&rsquo;t have any compile errors. If it does
encounter an error, <code>compile()</code> returns <code>false</code> and we discard the unusable
chunk.</p>
<p>Otherwise, we send the completed chunk over to the VM to be executed. When the
VM finishes, we free the chunk and we&rsquo;re done. As you can see, the signature of
<code>compile()</code> is different now.</p>
<div class="codehilite"><pre class="insert-before">#define clox_compiler_h

</pre><div class="source-file"><em>compiler.h</em><br>
replace 1 line</div>
<pre class="insert"><span class="a">#include &quot;vm.h&quot;</span>

<span class="t">bool</span> <span class="i">compile</span>(<span class="k">const</span> <span class="t">char</span>* <span class="i">source</span>, <span class="t">Chunk</span>* <span class="i">chunk</span>);
</pre><pre class="insert-after">

#endif
</pre></div>
<div class="source-file-narrow"><em>compiler.h</em>, replace 1 line</div>

<p>We pass in the chunk where the compiler will write the code, and then
<code>compile()</code> returns whether or not compilation succeeded. We make the same
change to the signature in the implementation.</p>
<div class="codehilite"><pre class="insert-before">#include &quot;scanner.h&quot;

</pre><div class="source-file"><em>compiler.c</em><br>
function <em>compile</em>()<br>
replace 1 line</div>
<pre class="insert"><span class="t">bool</span> <span class="i">compile</span>(<span class="k">const</span> <span class="t">char</span>* <span class="i">source</span>, <span class="t">Chunk</span>* <span class="i">chunk</span>) {
</pre><pre class="insert-after">  initScanner(source);
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, function <em>compile</em>(), replace 1 line</div>

<p>That call to <code>initScanner()</code> is the only line that survives this chapter. Rip
out the temporary code we wrote to test the scanner and replace it with these
three lines:</p>
<div class="codehilite"><pre class="insert-before">  initScanner(source);
</pre><div class="source-file"><em>compiler.c</em><br>
in <em>compile</em>()<br>
replace 13 lines</div>
<pre class="insert">  <span class="i">advance</span>();
  <span class="i">expression</span>();
  <span class="i">consume</span>(<span class="a">TOKEN_EOF</span>, <span class="s">&quot;Expect end of expression.&quot;</span>);
</pre><pre class="insert-after">}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in <em>compile</em>(), replace 13 lines</div>

<p>The call to <code>advance()</code> &ldquo;primes the pump&rdquo; on the scanner. We&rsquo;ll see what it does
soon. Then we parse a single expression. We aren&rsquo;t going to do statements yet,
so that&rsquo;s the only subset of the grammar we support. We&rsquo;ll revisit this when we
<a href="global-variables.html">add statements in a few chapters</a>. After we compile the expression, we
should be at the end of the source code, so we check for the sentinel EOF token.</p>
<p>We&rsquo;re going to spend the rest of the chapter making this function work,
especially that little <code>expression()</code> call. Normally, we&rsquo;d dive right into that
function definition and work our way through the implementation from top to
bottom.</p>
<p>This chapter is <span name="blog">different</span>. Pratt&rsquo;s parsing technique is
remarkably simple once you have it all loaded in your head, but it&rsquo;s a little
tricky to break into bite-sized pieces. It&rsquo;s recursive, of course, which is part
of the problem. But it also relies on a big table of data. As we build up the
algorithm, that table grows additional columns.</p>
<aside name="blog">
<p>If this chapter isn&rsquo;t clicking with you and you&rsquo;d like another take on the
concepts, I wrote an article that teaches the same algorithm but using Java and
an object-oriented style: <a href="http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/">&ldquo;Pratt Parsing: Expression Parsing Made Easy&rdquo;</a>.</p>
</aside>
<p>I don&rsquo;t want to revisit 40-something lines of code each time we extend the
table. So we&rsquo;re going to work our way into the core of the parser from the
outside and cover all of the surrounding bits before we get to the juicy center.
This will require a little more patience and mental scratch space than most
chapters, but it&rsquo;s the best I could do.</p>
<h2><a href="#single-pass-compilation" id="single-pass-compilation"><small>17&#8202;.&#8202;1</small>Single-Pass Compilation</a></h2>
<p>A compiler has roughly two jobs. It parses the user&rsquo;s source code to understand
what it means. Then it takes that knowledge and outputs low-level instructions
that produce the same semantics. Many languages split those two roles into two
separate <span name="passes">passes</span> in the implementation. A parser
produces an AST<span class="em">&mdash;</span>just like jlox does<span class="em">&mdash;</span>and then a code generator traverses
the AST and outputs target code.</p>
<aside name="passes">
<p>In fact, most sophisticated optimizing compilers have a heck of a lot more than
two passes. Determining not just <em>what</em> optimization passes to have, but how to
order them to squeeze the most performance out of the compiler<span class="em">&mdash;</span>since the
optimizations often interact in complex ways<span class="em">&mdash;</span>is somewhere between an &ldquo;open
area of research&rdquo; and a &ldquo;dark art&rdquo;.</p>
</aside>
<p>In clox, we&rsquo;re taking an old-school approach and merging these two passes into
one. Back in the day, language hackers did this because computers literally
didn&rsquo;t have enough memory to store an entire source file&rsquo;s AST. We&rsquo;re doing it
because it keeps our compiler simpler, which is a real asset when programming in
C.</p>
<p>Single-pass compilers like the one we&rsquo;re going to build don&rsquo;t work well for all
languages. Since the compiler has only a peephole view into the user&rsquo;s program
while generating code, the language must be designed such that you don&rsquo;t need
much surrounding context to understand a piece of syntax. Fortunately, tiny,
dynamically typed Lox is <span name="lox">well-suited</span> to that.</p>
<aside name="lox">
<p>Not that this should come as much of a surprise. I did design the language
specifically for this book after all.</p><img src="image/compiling-expressions/keyhole.png" alt="Peering through a keyhole at 'var x;'" />
</aside>
<p>What this means in practical terms is that our &ldquo;compiler&rdquo; C module has
functionality you&rsquo;ll recognize from jlox for parsing<span class="em">&mdash;</span>consuming tokens,
matching expected token types, etc. And it also has functions for code gen<span class="em">&mdash;</span>emitting bytecode and adding constants to the destination chunk. (And it means
I&rsquo;ll use &ldquo;parsing&rdquo; and &ldquo;compiling&rdquo; interchangeably throughout this and later
chapters.)</p>
<p>We&rsquo;ll build the parsing and code generation halves first. Then we&rsquo;ll stitch them
together with the code in the middle that uses Pratt&rsquo;s technique to parse Lox&rsquo;s
particular grammar and output the right bytecode.</p>
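<p>To make &ldquo;merging the two passes&rdquo; concrete, here is a minimal sketch of a single-pass compiler for a toy grammar: single digits joined by <code>+</code>. It is not clox&rsquo;s compiler. The <code>compileSum()</code> function and the <code>DEMO_OP_</code> opcodes are made up for illustration. The point is only that each piece of syntax is turned into bytes as soon as it is recognized, with no tree in between.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical two-instruction bytecode, for illustration only.
enum { DEMO_OP_CONSTANT = 1, DEMO_OP_ADD = 2 };

// Compiles a string such as "1+2+3" (single digits joined by '+') straight
// into `out`, never building a tree. Returns the number of bytes written,
// or 0 on a syntax error or if `out` is too small.
static size_t compileSum(const char* source, uint8_t* out, size_t capacity) {
  assert(source != NULL && out != NULL);
  size_t count = 0;
  const char* c = source;

  for (;;) {
    // Parse one operand and emit its code right away.
    if (*c < '0' || *c > '9') return 0;
    if (count + 2 > capacity) return 0;
    out[count++] = DEMO_OP_CONSTANT;
    out[count++] = (uint8_t)(*c - '0');
    c++;

    // Every operand after the first completes a pending addition.
    if (count > 2) {
      if (count + 1 > capacity) return 0;
      out[count++] = DEMO_OP_ADD;
    }

    if (*c == '\0') return count;
    if (*c != '+') return 0;
    c++;
  }
}
```

<p>Compiling <code>"1+2+3"</code> into a 16-byte buffer writes eight bytes: constant 1, constant 2, add, constant 3, add. At no point does the function hold more than the current character and the output position, which is the &ldquo;peephole view&rdquo; described above.</p>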
<h2><a href="#parsing-tokens" id="parsing-tokens"><small>17&#8202;.&#8202;2</small>Parsing Tokens</a></h2>
<p>First up, the front half of the compiler. This function&rsquo;s name should sound
familiar.</p>
<div class="codehilite"><pre class="insert-before">#include &quot;scanner.h&quot;
</pre><div class="source-file"><em>compiler.c</em></div>
<pre class="insert">

<span class="k">static</span> <span class="t">void</span> <span class="i">advance</span>() {
  <span class="i">parser</span>.<span class="i">previous</span> = <span class="i">parser</span>.<span class="i">current</span>;

  <span class="k">for</span> (;;) {
    <span class="i">parser</span>.<span class="i">current</span> = <span class="i">scanToken</span>();
    <span class="k">if</span> (<span class="i">parser</span>.<span class="i">current</span>.<span class="i">type</span> != <span class="a">TOKEN_ERROR</span>) <span class="k">break</span>;

    <span class="i">errorAtCurrent</span>(<span class="i">parser</span>.<span class="i">current</span>.<span class="i">start</span>);
  }
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em></div>

<p>Just like in jlox, it steps forward through the token stream. It asks the
scanner for the next token and stores it for later use. Before doing that, it
takes the old <code>current</code> token and stashes that in a <code>previous</code> field. That will
come in handy later so that we can get at the lexeme after we match a token.</p>
<p>The code to read the next token is wrapped in a loop. Remember, clox&rsquo;s scanner
doesn&rsquo;t report lexical errors. Instead, it creates special <em>error tokens</em> and
leaves it up to the parser to report them. We do that here.</p>
<p>We keep looping, reading tokens and reporting the errors, until we hit a
non-error one or reach the end. That way, the rest of the parser sees only valid
tokens. The current and previous tokens are stored in this struct:</p>
<div class="codehilite"><pre class="insert-before">#include &quot;scanner.h&quot;
</pre><div class="source-file"><em>compiler.c</em></div>
<pre class="insert">

<span class="k">typedef</span> <span class="k">struct</span> {
  <span class="t">Token</span> <span class="i">current</span>;
  <span class="t">Token</span> <span class="i">previous</span>;
} <span class="t">Parser</span>;

<span class="t">Parser</span> <span class="i">parser</span>;
</pre><pre class="insert-after">

static void advance() {
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em></div>

<p>Like we did in other modules, we have a single global variable of this struct
type so we don&rsquo;t need to pass the state around from function to function in the
compiler.</p>
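<p>If you&rsquo;d like to watch the error-skipping loop in <code>advance()</code> on its own, here is a small standalone sketch. The token types are trimmed down, and the canned <code>script</code> array, the <code>nextIndex</code> cursor, the stubbed <code>errorAtCurrent()</code>, and the <code>reportedErrors</code> counter are all assumptions made for the demo. Only the body of <code>advance()</code> mirrors the code above.</p>

```c
#include <assert.h>
#include <stdio.h>

// Simplified stand-ins for clox's token types; only what the demo needs.
typedef enum { TOKEN_NUMBER, TOKEN_ERROR, TOKEN_EOF } TokenType;

typedef struct {
  TokenType type;
  const char* start;  // Lexeme, or the message for an error token.
} Token;

typedef struct {
  Token current;
  Token previous;
} Parser;

static Parser parser;
static int reportedErrors = 0;  // Hypothetical counter, for the demo only.

// Hypothetical canned scanner: replays a fixed token sequence.
static const Token script[] = {
  {TOKEN_NUMBER, "1"},
  {TOKEN_ERROR, "Unexpected character."},
  {TOKEN_ERROR, "Unterminated string."},
  {TOKEN_NUMBER, "2"},
  {TOKEN_EOF, ""},
};
static int nextIndex = 0;

static Token scanToken(void) {
  Token token = script[nextIndex];
  if (token.type != TOKEN_EOF) nextIndex++;  // Keep returning EOF at the end.
  return token;
}

// Stub: the real function forwards to errorAt().
static void errorAtCurrent(const char* message) {
  assert(message != NULL);
  fprintf(stderr, "Error: %s\n", message);
  reportedErrors++;
}

static void advance(void) {
  parser.previous = parser.current;

  for (;;) {
    parser.current = scanToken();
    if (parser.current.type != TOKEN_ERROR) break;

    // An error token carries its message where a lexeme would be.
    errorAtCurrent(parser.current.start);
  }
}
```

<p>Calling <code>advance()</code> twice leaves <code>1</code> in <code>previous</code> and <code>2</code> in <code>current</code>, with both error tokens reported along the way. The code calling <code>advance()</code> never sees a token of type <code>TOKEN_ERROR</code>.</p>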
<h3><a href="#handling-syntax-errors" id="handling-syntax-errors"><small>17&#8202;.&#8202;2&#8202;.&#8202;1</small>Handling syntax errors</a></h3>
<p>If the scanner hands us an error token, we need to actually tell the user. That
happens using this:</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after variable <em>parser</em></div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">errorAtCurrent</span>(<span class="k">const</span> <span class="t">char</span>* <span class="i">message</span>) {
  <span class="i">errorAt</span>(&amp;<span class="i">parser</span>.<span class="i">current</span>, <span class="i">message</span>);
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after variable <em>parser</em></div>

<p>We pull the location out of the current token in order to tell the user where
the error occurred and forward it to <code>errorAt()</code>. More often, we&rsquo;ll report an
error at the location of the token we just consumed, so we give the shorter name
to this other function:</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after variable <em>parser</em></div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">error</span>(<span class="k">const</span> <span class="t">char</span>* <span class="i">message</span>) {
  <span class="i">errorAt</span>(&amp;<span class="i">parser</span>.<span class="i">previous</span>, <span class="i">message</span>);
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after variable <em>parser</em></div>

<p>The actual work happens here:</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after variable <em>parser</em></div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">errorAt</span>(<span class="t">Token</span>* <span class="i">token</span>, <span class="k">const</span> <span class="t">char</span>* <span class="i">message</span>) {
  <span class="i">fprintf</span>(<span class="i">stderr</span>, <span class="s">&quot;[line %d] Error&quot;</span>, <span class="i">token</span>-&gt;<span class="i">line</span>);

  <span class="k">if</span> (<span class="i">token</span>-&gt;<span class="i">type</span> == <span class="a">TOKEN_EOF</span>) {
    <span class="i">fprintf</span>(<span class="i">stderr</span>, <span class="s">&quot; at end&quot;</span>);
  } <span class="k">else</span> <span class="k">if</span> (<span class="i">token</span>-&gt;<span class="i">type</span> == <span class="a">TOKEN_ERROR</span>) {
    <span class="c">// Nothing.</span>
  } <span class="k">else</span> {
    <span class="i">fprintf</span>(<span class="i">stderr</span>, <span class="s">&quot; at &#39;%.*s&#39;&quot;</span>, <span class="i">token</span>-&gt;<span class="i">length</span>, <span class="i">token</span>-&gt;<span class="i">start</span>);
  }

  <span class="i">fprintf</span>(<span class="i">stderr</span>, <span class="s">&quot;: %s</span><span class="e">\n</span><span class="s">&quot;</span>, <span class="i">message</span>);
  <span class="i">parser</span>.<span class="i">hadError</span> = <span class="k">true</span>;
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after variable <em>parser</em></div>

<p>First, we print where the error occurred. We try to show the lexeme if it&rsquo;s
human-readable. Then we print the error message itself. After that, we set this
<code>hadError</code> flag. That records whether any errors occurred during compilation.
This field also lives in the parser struct.</p>
<div class="codehilite"><pre class="insert-before">  Token previous;
</pre><div class="source-file"><em>compiler.c</em><br>
in struct <em>Parser</em></div>
<pre class="insert">  <span class="t">bool</span> <span class="i">hadError</span>;
</pre><pre class="insert-after">} Parser;
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in struct <em>Parser</em></div>
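<p>One detail in <code>errorAt()</code> deserves a closer look: the <code>%.*s</code> conversion. A token&rsquo;s lexeme points into the middle of the source string and has no terminator of its own, so a plain <code>%s</code> would keep printing to the end of the source. Here is a minimal sketch of the same formatting, written into a buffer so the result can be inspected. The <code>formatLexeme()</code> helper is hypothetical and not part of clox.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

// Formats the " at '<lexeme>'" fragment into `buffer`. The lexeme is given
// as a pointer into the source plus a length, as in a clox token.
// Returns what snprintf() returns: the length of the formatted text.
// formatLexeme() is a hypothetical helper, not part of clox.
static int formatLexeme(char* buffer, size_t size,
                        const char* start, int length) {
  assert(buffer != NULL && start != NULL && length >= 0);
  // "%.*s" reads its precision from an int argument, so output stops after
  // `length` characters even though no '\0' follows the lexeme.
  return snprintf(buffer, size, " at '%.*s'", length, start);
}
```

<p>Given the source <code>"print 1 +* 2;"</code>, passing a pointer to offset 9 with a length of 1 produces <code> at '*'</code>, and passing the start of the string with a length of 5 produces <code> at 'print'</code>.</p>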

<p>Earlier I said that <code>compile()</code> should return <code>false</code> if an error occurred. Now
we can make it do that.</p>
<div class="codehilite"><pre class="insert-before">  consume(TOKEN_EOF, &quot;Expect end of expression.&quot;);
</pre><div class="source-file"><em>compiler.c</em><br>
in <em>compile</em>()</div>
<pre class="insert">  <span class="k">return</span> !<span class="i">parser</span>.<span class="i">hadError</span>;
</pre><pre class="insert-after">}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in <em>compile</em>()</div>

<p>I&rsquo;ve got another flag to introduce for error handling. We want to avoid error
cascades. If the user has a mistake in their code and the parser gets confused
about where it is in the grammar, we don&rsquo;t want it to spew out a whole pile of
meaningless knock-on errors after the first one.</p>
<p>We fixed that in jlox using panic mode error recovery. In the Java interpreter,
we threw an exception to unwind out of all of the parser code to a point where
we could skip tokens and resynchronize. We don&rsquo;t have <span
name="setjmp">exceptions</span> in C. Instead, we&rsquo;ll do a little smoke and
mirrors. We add a flag to track whether we&rsquo;re currently in panic mode.</p>
<aside name="setjmp">
<p>There is <code>setjmp()</code> and <code>longjmp()</code>, but I&rsquo;d rather not go there. Those make it
too easy to leak memory, forget to maintain invariants, or otherwise have a Very
Bad Day.</p>
</aside>
<div class="codehilite"><pre class="insert-before">  bool hadError;
</pre><div class="source-file"><em>compiler.c</em><br>
in struct <em>Parser</em></div>
<pre class="insert">  <span class="t">bool</span> <span class="i">panicMode</span>;
</pre><pre class="insert-after">} Parser;
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in struct <em>Parser</em></div>

<p>When an error occurs, we set it.</p>
<div class="codehilite"><pre class="insert-before">static void errorAt(Token* token, const char* message) {
</pre><div class="source-file"><em>compiler.c</em><br>
in <em>errorAt</em>()</div>
<pre class="insert">  <span class="i">parser</span>.<span class="i">panicMode</span> = <span class="k">true</span>;
</pre><pre class="insert-after">  fprintf(stderr, &quot;[line %d] Error&quot;, token-&gt;line);
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in <em>errorAt</em>()</div>

<p>After that, we go ahead and keep compiling as normal, as if the error never
occurred. The bytecode will never get executed, so it&rsquo;s harmless to keep on
trucking. The trick is that while the panic mode flag is set, we simply suppress
any other errors that get detected.</p>
<div class="codehilite"><pre class="insert-before">static void errorAt(Token* token, const char* message) {
</pre><div class="source-file"><em>compiler.c</em><br>
in <em>errorAt</em>()</div>
<pre class="insert">  <span class="k">if</span> (<span class="i">parser</span>.<span class="i">panicMode</span>) <span class="k">return</span>;
</pre><pre class="insert-after">  parser.panicMode = true;
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in <em>errorAt</em>()</div>

<p>There&rsquo;s a good chance the parser will go off in the weeds, but the user won&rsquo;t
know because the errors all get swallowed. Panic mode ends when the parser
reaches a synchronization point. For Lox, we chose statement boundaries, so when
we later add those to our compiler, we&rsquo;ll clear the flag there.</p>
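<p>The effect of the two flags is easy to check in isolation. The sketch below strips <code>errorAt()</code> down to a line number and a message, under the hypothetical name <code>errorAtLine()</code>. The <code>printedErrors</code> counter and the <code>resynchronize()</code> function are also assumptions for the demo. The latter stands in for whatever a real synchronization point will do once statements exist.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct {
  bool hadError;
  bool panicMode;
} Parser;

static Parser parser;
static int printedErrors = 0;  // Hypothetical counter, for the demo only.

// Same flag handling as errorAt(), with the token reduced to a line number.
static void errorAtLine(int line, const char* message) {
  assert(message != NULL);
  if (parser.panicMode) return;  // Already panicking: swallow the error.
  parser.panicMode = true;

  fprintf(stderr, "[line %d] Error: %s\n", line, message);
  printedErrors++;
  parser.hadError = true;
}

// Hypothetical stand-in for reaching a synchronization point.
static void resynchronize(void) {
  parser.panicMode = false;
}
```

<p>Report two errors back to back and only the first is printed. After <code>resynchronize()</code>, the next error is printed again. Note that <code>hadError</code> is never cleared, so the compile still fails even though reporting has resumed.</p>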
<p>These new fields need to be initialized.</p>
<div class="codehilite"><pre class="insert-before">  initScanner(source);
</pre><div class="source-file"><em>compiler.c</em><br>
in <em>compile</em>()</div>
<pre class="insert">

  <span class="i">parser</span>.<span class="i">hadError</span> = <span class="k">false</span>;
  <span class="i">parser</span>.<span class="i">panicMode</span> = <span class="k">false</span>;

</pre><pre class="insert-after">  advance();
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in <em>compile</em>()</div>

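<p>The error functions print with <code>fprintf()</code>, which <code>&lt;stdio.h&gt;</code> already declares. While we&rsquo;re at the top of the file, we also include one more standard header.</p>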
<div class="codehilite"><pre class="insert-before">#include &lt;stdio.h&gt;
</pre><div class="source-file"><em>compiler.c</em></div>
<pre class="insert"><span class="a">#include &lt;stdlib.h&gt;</span>
</pre><pre class="insert-after">

#include &quot;common.h&quot;
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em></div>

<p>There&rsquo;s one last parsing function, another old friend from jlox.</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>advance</em>()</div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">consume</span>(<span class="t">TokenType</span> <span class="i">type</span>, <span class="k">const</span> <span class="t">char</span>* <span class="i">message</span>) {
  <span class="k">if</span> (<span class="i">parser</span>.<span class="i">current</span>.<span class="i">type</span> == <span class="i">type</span>) {
    <span class="i">advance</span>();
    <span class="k">return</span>;
  }

  <span class="i">errorAtCurrent</span>(<span class="i">message</span>);
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>advance</em>()</div>

<p>It&rsquo;s similar to <code>advance()</code> in that it reads the next token. But it also
validates that the token has an expected type. If not, it reports an error. This
function is the foundation of most syntax errors in the compiler.</p>
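<p>One behavior of <code>consume()</code> is easy to miss: on a mismatch it reports the error but does <em>not</em> advance. Here is a small sketch that shows both outcomes. The token stream is a hard-coded array, and <code>advance()</code> and <code>errorAtCurrent()</code> are simplified stubs. Those parts, along with the <code>errorCount</code> counter, are assumptions for the demo. <code>consume()</code> itself is copied from above.</p>

```c
#include <assert.h>
#include <stdio.h>

// A few of clox's token types; only what the demo needs.
typedef enum {
  TOKEN_LEFT_PAREN, TOKEN_RIGHT_PAREN, TOKEN_NUMBER, TOKEN_EOF
} TokenType;

typedef struct { TokenType type; } Token;
typedef struct { Token current; Token previous; } Parser;

static Parser parser;
static int errorCount = 0;  // Hypothetical counter, for the demo only.

// Hypothetical canned input standing in for the scanner: "( 1".
static const TokenType script[] = {TOKEN_LEFT_PAREN, TOKEN_NUMBER, TOKEN_EOF};
static int position = 0;

// Simplified advance(): no error tokens to skip in this demo.
static void advance(void) {
  parser.previous = parser.current;
  parser.current.type = script[position];
  if (script[position] != TOKEN_EOF) position++;
}

static void errorAtCurrent(const char* message) {
  assert(message != NULL);
  fprintf(stderr, "Error: %s\n", message);
  errorCount++;
}

static void consume(TokenType type, const char* message) {
  if (parser.current.type == type) {
    advance();
    return;
  }

  errorAtCurrent(message);
}
```

<p>After priming with <code>advance()</code>, consuming <code>TOKEN_LEFT_PAREN</code> succeeds and moves on to the number. Consuming <code>TOKEN_RIGHT_PAREN</code> then fails, bumps the error count, and leaves the number as the current token.</p>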
<p>OK, that&rsquo;s enough on the front end for now.</p>
<h2><a href="#emitting-bytecode" id="emitting-bytecode"><small>17&#8202;.&#8202;3</small>Emitting Bytecode</a></h2>
<p>After we parse and understand a piece of the user&rsquo;s program, the next step is to
translate that to a series of bytecode instructions. It starts with the easiest
possible step: appending a single byte to the chunk.</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>consume</em>()</div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">emitByte</span>(<span class="t">uint8_t</span> <span class="i">byte</span>) {
  <span class="i">writeChunk</span>(<span class="i">currentChunk</span>(), <span class="i">byte</span>, <span class="i">parser</span>.<span class="i">previous</span>.<span class="i">line</span>);
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>consume</em>()</div>

<p>It&rsquo;s hard to believe great things will flow through such a simple function. It
writes the given byte, which may be an opcode or an operand to an instruction.
It sends in the previous token&rsquo;s line information so that runtime errors are
associated with that line.</p>
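<p>To see how the line information travels with each byte, here is a sketch using a deliberately simplified chunk: fixed-size arrays instead of the growable ones in the real <code>Chunk</code>, and a single global instead of <code>currentChunk()</code>. The <code>DEMO_MAX</code> limit and the trimmed <code>Token</code> are assumptions for the demo.</p>

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MAX 16  // Hypothetical fixed capacity; the real chunk grows.

// Simplified stand-in for clox's Chunk.
typedef struct {
  int count;
  uint8_t code[DEMO_MAX];
  int lines[DEMO_MAX];  // lines[i] is the source line that produced code[i].
} Chunk;

typedef struct { int line; } Token;  // Only the field this demo needs.
typedef struct { Token current; Token previous; } Parser;

static Parser parser;
static Chunk chunk;

static void writeChunk(Chunk* target, uint8_t byte, int line) {
  assert(target->count < DEMO_MAX);
  target->code[target->count] = byte;
  target->lines[target->count] = line;
  target->count++;
}

// Tags each byte with the line of the token that was just consumed.
static void emitByte(uint8_t byte) {
  writeChunk(&chunk, byte, parser.previous.line);
}
```

<p>If the previous token is on line 1 when the first byte is emitted and on line 3 for the second, <code>chunk.lines</code> ends up holding 1 and 3 at the matching offsets. A runtime error at a given instruction offset can then be traced back to its source line.</p>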
<p>The chunk that we&rsquo;re writing gets passed into <code>compile()</code>, but it needs to make
its way to <code>emitByte()</code>. To do that, we rely on this intermediary function:</p>
<div class="codehilite"><pre class="insert-before">Parser parser;
</pre><div class="source-file"><em>compiler.c</em><br>
add after variable <em>parser</em></div>
<pre class="insert"><span class="t">Chunk</span>* <span class="i">compilingChunk</span>;

<span class="k">static</span> <span class="t">Chunk</span>* <span class="i">currentChunk</span>() {
  <span class="k">return</span> <span class="i">compilingChunk</span>;
}

</pre><pre class="insert-after">static void errorAt(Token* token, const char* message) {
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after variable <em>parser</em></div>

<p>Right now, the chunk pointer is stored in a module-level variable like we store
other global state. Later, when we start compiling user-defined functions, the
notion of &ldquo;current chunk&rdquo; gets more complicated. To avoid having to go back and
change a lot of code, I encapsulate that logic in the <code>currentChunk()</code> function.</p>
<p>We initialize this new module variable before we write any bytecode:</p>
<div class="codehilite"><pre class="insert-before">bool compile(const char* source, Chunk* chunk) {
  initScanner(source);
</pre><div class="source-file"><em>compiler.c</em><br>
in <em>compile</em>()</div>
<pre class="insert">  <span class="i">compilingChunk</span> = <span class="i">chunk</span>;
</pre><pre class="insert-after">

  parser.hadError = false;
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in <em>compile</em>()</div>

<p>Then, at the very end, when we&rsquo;re done compiling the chunk, we wrap things up.</p>
<div class="codehilite"><pre class="insert-before">  consume(TOKEN_EOF, &quot;Expect end of expression.&quot;);
</pre><div class="source-file"><em>compiler.c</em><br>
in <em>compile</em>()</div>
<pre class="insert">  <span class="i">endCompiler</span>();
</pre><pre class="insert-after">  return !parser.hadError;
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in <em>compile</em>()</div>

<p>That calls this:</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>emitByte</em>()</div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">endCompiler</span>() {
  <span class="i">emitReturn</span>();
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>emitByte</em>()</div>

<p>In this chapter, our VM deals only with expressions. When you run clox, it will
parse, compile, and execute a single expression, then print the result. To print
that value, we are temporarily using the <code>OP_RETURN</code> instruction. So we have the
compiler add one of those to the end of the chunk.</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>emitByte</em>()</div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">emitReturn</span>() {
  <span class="i">emitByte</span>(<span class="a">OP_RETURN</span>);
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>emitByte</em>()</div>

<p>While we&rsquo;re here in the back end, we may as well make our lives easier.</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>emitByte</em>()</div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">emitBytes</span>(<span class="t">uint8_t</span> <span class="i">byte1</span>, <span class="t">uint8_t</span> <span class="i">byte2</span>) {
  <span class="i">emitByte</span>(<span class="i">byte1</span>);
  <span class="i">emitByte</span>(<span class="i">byte2</span>);
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>emitByte</em>()</div>

<p>Over time, we&rsquo;ll have enough cases where we need to write an opcode followed by
a one-byte operand that it&rsquo;s worth defining this convenience function.</p>
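<p>To see the shape of what these helpers produce, here is a minimal standalone sketch. It assumes a fixed-size buffer in place of clox&rsquo;s growable <code>Chunk</code>, and every name prefixed with <code>Sketch</code> or <code>sketch</code> is hypothetical, invented only for this illustration.</p>

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical stand-ins for clox's opcodes and Chunk, for illustration only.
enum { SKETCH_OP_CONSTANT = 0, SKETCH_OP_RETURN = 1 };

typedef struct {
  uint8_t code[16];  // Fixed capacity; the real Chunk grows dynamically.
  int count;
} SketchChunk;

// Append a single byte to the end of the chunk.
static void sketchEmitByte(SketchChunk* chunk, uint8_t byte) {
  assert(chunk->count < (int)sizeof(chunk->code));
  chunk->code[chunk->count++] = byte;
}

// Append an opcode immediately followed by its one-byte operand.
static void sketchEmitBytes(SketchChunk* chunk, uint8_t byte1, uint8_t byte2) {
  sketchEmitByte(chunk, byte1);
  sketchEmitByte(chunk, byte2);
}
```

<p>Calling <code>sketchEmitBytes(&amp;chunk, SKETCH_OP_CONSTANT, 7)</code> and then <code>sketchEmitByte(&amp;chunk, SKETCH_OP_RETURN)</code> on an empty chunk leaves three bytes in it: the opcode, its operand, and the return.</p>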
<h2><a href="#parsing-prefix-expressions" id="parsing-prefix-expressions"><small>17&#8202;.&#8202;4</small>Parsing Prefix Expressions</a></h2>
<p>We&rsquo;ve assembled our parsing and code generation utility functions. The missing
piece is the code in the middle that connects those together.</p><img src="image/compiling-expressions/mystery.png" alt="Parsing functions on the left, bytecode emitting functions on the right. What goes in the middle?" />
<p>The only step in <code>compile()</code> that we have left to implement is this function:</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>endCompiler</em>()</div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">expression</span>() {
  <span class="c">// What goes here?</span>
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>endCompiler</em>()</div>

<p>We aren&rsquo;t ready to implement every kind of expression in Lox yet. Heck, we don&rsquo;t
even have Booleans. For this chapter, we&rsquo;re only going to worry about four:</p>
<ul>
<li>Number literals: <code>123</code></li>
<li>Parentheses for grouping: <code>(123)</code></li>
<li>Unary negation: <code>-123</code></li>
<li>The Four Horsemen of the Arithmetic: <code>+</code>, <code>-</code>, <code>*</code>, <code>/</code></li>
</ul>
<p>As we work through the functions to compile each of those kinds of expressions,
we&rsquo;ll also assemble the requirements for the table-driven parser that calls
them.</p>
<h3><a href="#parsers-for-tokens" id="parsers-for-tokens"><small>17&#8202;.&#8202;4&#8202;.&#8202;1</small>Parsers for tokens</a></h3>
<p>For now, let&rsquo;s focus on the Lox expressions that are each only a single token.
In this chapter, that&rsquo;s just number literals, but there will be more later. Here&rsquo;s
how we can compile them:</p>
<p>We map each token type to a different kind of expression. We define a function
for each expression that outputs the appropriate bytecode. Then we build an
array of function pointers. The indexes in the array correspond to the
<code>TokenType</code> enum values, and the function at each index is the code to compile
an expression of that token type.</p>
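<p>If the idea of an array of function pointers indexed by an enum is unfamiliar, this small sketch may help. It assumes a made-up three-token language, and all of the <code>Sk</code>-prefixed names are hypothetical. It is not the table we build later, just the bare mechanism.</p>

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical token types; clox's real TokenType enum is much larger.
typedef enum {
  SK_TOKEN_NUMBER,
  SK_TOKEN_PLUS,
  SK_TOKEN_EOF,
  SK_TOKEN_COUNT
} SkTokenType;

typedef void (*SkCompileFn)(void);

static int skNumbersCompiled = 0;

// Stand-in for a function that would emit bytecode for a number literal.
static void skNumber(void) { skNumbersCompiled++; }

// Designated initializers index the array by token type.
// Entries we don't mention are implicitly NULL.
static SkCompileFn skCompilers[SK_TOKEN_COUNT] = {
  [SK_TOKEN_NUMBER] = skNumber,
};

// Looks up the function for a token type and calls it.
// Returns 1 if a function was found, 0 if the slot was empty.
static int skCompileToken(SkTokenType type) {
  assert((int)type >= 0 && (int)type < SK_TOKEN_COUNT);
  SkCompileFn fn = skCompilers[type];
  if (fn == NULL) return 0;
  fn();
  return 1;
}
```

<p>Here <code>skCompileToken(SK_TOKEN_NUMBER)</code> finds and runs <code>skNumber()</code>, while <code>skCompileToken(SK_TOKEN_PLUS)</code> finds an empty slot and returns 0.</p>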
<p>To compile number literals, we store a pointer to the following function at the
<code>TOKEN_NUMBER</code> index in the array.</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>endCompiler</em>()</div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">number</span>() {
  <span class="t">double</span> <span class="i">value</span> = <span class="i">strtod</span>(<span class="i">parser</span>.<span class="i">previous</span>.<span class="i">start</span>, <span class="a">NULL</span>);
  <span class="i">emitConstant</span>(<span class="i">value</span>);
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>endCompiler</em>()</div>

<p>We assume the token for the number literal has already been consumed and is
stored in <code>previous</code>. We take that lexeme and use the C standard library to
convert it to a double value. Then we generate the code to load that value using
this function:</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>emitReturn</em>()</div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">emitConstant</span>(<span class="t">Value</span> <span class="i">value</span>) {
  <span class="i">emitBytes</span>(<span class="a">OP_CONSTANT</span>, <span class="i">makeConstant</span>(<span class="i">value</span>));
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>emitReturn</em>()</div>

<p>First, we add the value to the constant table, then we emit an <code>OP_CONSTANT</code>
instruction that pushes it onto the stack at runtime. To insert an entry in the
constant table, we rely on:</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>emitReturn</em>()</div>
<pre><span class="k">static</span> <span class="t">uint8_t</span> <span class="i">makeConstant</span>(<span class="t">Value</span> <span class="i">value</span>) {
  <span class="t">int</span> <span class="i">constant</span> = <span class="i">addConstant</span>(<span class="i">currentChunk</span>(), <span class="i">value</span>);
  <span class="k">if</span> (<span class="i">constant</span> &gt; <span class="a">UINT8_MAX</span>) {
    <span class="i">error</span>(<span class="s">&quot;Too many constants in one chunk.&quot;</span>);
    <span class="k">return</span> <span class="n">0</span>;
  }

  <span class="k">return</span> (<span class="t">uint8_t</span>)<span class="i">constant</span>;
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>emitReturn</em>()</div>

<p>Most of the work happens in <code>addConstant()</code>, which we defined back in an
<a href="chunks-of-bytecode.html">earlier chapter</a>. That adds the given value to the end of the chunk&rsquo;s
constant table and returns its index. The new function&rsquo;s job is mostly to make
sure we don&rsquo;t have too many constants. Since the <code>OP_CONSTANT</code> instruction uses
a single byte for the index operand, we can store and load only up to <span
name="256">256</span> constants in a chunk.</p>
<aside name="256">
<p>Yes, that limit is pretty low. If this were a full-sized language
implementation, we&rsquo;d want to add another instruction like <code>OP_CONSTANT_16</code> that
stores the index as a two-byte operand so we could handle more constants when
needed.</p>
<p>The code to support that isn&rsquo;t particularly illuminating, so I omitted it from
clox, but you&rsquo;ll want your VMs to scale to larger programs.</p>
</aside>
<p>That&rsquo;s basically all it takes. Provided there is some suitable code that
consumes a <code>TOKEN_NUMBER</code> token, looks up <code>number()</code> in the function pointer
array, and then calls it, we can now compile number literals to bytecode.</p>
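<p>To watch the 256-constant limit kick in, here is a standalone sketch. It assumes a plain array of doubles in place of the chunk&rsquo;s constant table and an error flag in place of <code>error()</code>. The <code>Sk</code>-prefixed names are hypothetical.</p>

```c
#include <assert.h>
#include <stdint.h>

// Capacity of the sketch's table, deliberately larger than 256 so the
// one-byte index limit is what we hit first.
#define SK_MAX_CONSTANTS 512

typedef struct {
  double values[SK_MAX_CONSTANTS];
  int count;
  int hadError;  // Stand-in for reporting a compile error.
} SkConstants;

// Appends the value and returns the index where it was stored.
static int skAddConstant(SkConstants* table, double value) {
  assert(table->count < SK_MAX_CONSTANTS);
  table->values[table->count] = value;
  return table->count++;
}

// Same shape as makeConstant(): reject any index that doesn't fit in a byte.
static uint8_t skMakeConstant(SkConstants* table, double value) {
  int index = skAddConstant(table, value);
  if (index > UINT8_MAX) {
    table->hadError = 1;
    return 0;
  }
  return (uint8_t)index;
}
```

<p>The first 256 calls return indexes 0 through 255. The 257th call gets index 256 back from <code>skAddConstant()</code>, which no longer fits in a <code>uint8_t</code>, so it sets the error flag and returns 0.</p>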
<h3><a href="#parentheses-for-grouping" id="parentheses-for-grouping"><small>17&#8202;.&#8202;4&#8202;.&#8202;2</small>Parentheses for grouping</a></h3>
<p>Our as-yet-imaginary array of parsing function pointers would be great if every
expression were only a single token long. Alas, most are longer. However, many
expressions <em>start</em> with a particular token. We call these <em>prefix</em> expressions.
For example, when we&rsquo;re parsing an expression and the current token is <code>(</code>, we
know we must be looking at a parenthesized grouping expression.</p>
<p>It turns out our function pointer array handles those too. The parsing function
for an expression type can consume any additional tokens that it wants to, just
like in a regular recursive descent parser. Here&rsquo;s how parentheses work:</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>endCompiler</em>()</div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">grouping</span>() {
  <span class="i">expression</span>();
  <span class="i">consume</span>(<span class="a">TOKEN_RIGHT_PAREN</span>, <span class="s">&quot;Expect &#39;)&#39; after expression.&quot;</span>);
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>endCompiler</em>()</div>

<p>Again, we assume the initial <code>(</code> has already been consumed. We <span
name="recursive">recursively</span> call back into <code>expression()</code> to compile the
expression between the parentheses, then parse the closing <code>)</code> at the end.</p>
<aside name="recursive">
<p>A Pratt parser isn&rsquo;t a recursive <em>descent</em> parser, but it&rsquo;s still recursive.
That&rsquo;s to be expected since the grammar itself is recursive.</p>
</aside>
<p>As far as the back end is concerned, there&rsquo;s literally nothing to a grouping
expression. Its sole function is syntactic<span class="em">&mdash;</span>it lets you insert a
lower-precedence expression where a higher precedence is expected. Thus, it has
no runtime semantics on its own and therefore doesn&rsquo;t emit any bytecode. The
inner call to <code>expression()</code> takes care of generating bytecode for the
expression inside the parentheses.</p>
<h3><a href="#unary-negation" id="unary-negation"><small>17&#8202;.&#8202;4&#8202;.&#8202;3</small>Unary negation</a></h3>
<p>Unary minus is also a prefix expression, so it works with our model too.</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>number</em>()</div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">unary</span>() {
  <span class="t">TokenType</span> <span class="i">operatorType</span> = <span class="i">parser</span>.<span class="i">previous</span>.<span class="i">type</span>;

  <span class="c">// Compile the operand.</span>
  <span class="i">expression</span>();

  <span class="c">// Emit the operator instruction.</span>
  <span class="k">switch</span> (<span class="i">operatorType</span>) {
    <span class="k">case</span> <span class="a">TOKEN_MINUS</span>: <span class="i">emitByte</span>(<span class="a">OP_NEGATE</span>); <span class="k">break</span>;
    <span class="k">default</span>: <span class="k">return</span>; <span class="c">// Unreachable.</span>
  }
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>number</em>()</div>

<p>The leading <code>-</code> token has been consumed and is sitting in <code>parser.previous</code>. We
grab the token type from that to note which unary operator we&rsquo;re dealing with.
It&rsquo;s unnecessary right now, but this will make more sense when we use this same
function to compile the <code>!</code> operator in <a href="types-of-values.html">the next chapter</a>.</p>
<p>As in <code>grouping()</code>, we recursively call <code>expression()</code> to compile the operand.
After that, we emit the bytecode to perform the negation. It might seem a little
weird to write the negate instruction <em>after</em> its operand&rsquo;s bytecode since the
<code>-</code> appears on the left, but think about it in terms of order of execution:</p>
<ol>
<li>
<p>We evaluate the operand first, which leaves its value on the stack.</p>
</li>
<li>
<p>Then we pop that value, negate it, and push the result.</p>
</li>
</ol>
<p>So the <code>OP_NEGATE</code> instruction should be emitted <span name="line">last</span>.
This is part of the compiler&rsquo;s job<span class="em">&mdash;</span>parsing the program in the order it
appears in the source code and rearranging it into the order that execution
happens.</p>
<aside name="line">
<p>Emitting the <code>OP_NEGATE</code> instruction after the operand does mean that the
current token when the bytecode is written is <em>not</em> the <code>-</code> token. That mostly
doesn&rsquo;t matter, except that we use that token for the line number to associate
with that instruction.</p>
<p>This means if you have a multi-line negation expression, like:</p>
<div class="codehilite"><pre><span class="k">print</span> -
  <span class="k">true</span>;
</pre></div>
<p>Then the runtime error will be reported on the wrong line. Here, it would show
the error on line 2, even though the <code>-</code> is on line 1. A more robust approach
would be to store the token&rsquo;s line before compiling the operand and then pass
that into <code>emitByte()</code>, but I wanted to keep things simple for the book.</p>
</aside>
<p>There is one problem with this code, though. The <code>expression()</code> function it
calls will parse any expression for the operand, regardless of precedence. Once
we add binary operators and other syntax, that will do the wrong thing.
Consider:</p>
<div class="codehilite"><pre>-<span class="i">a</span>.<span class="i">b</span> + <span class="i">c</span>;
</pre></div>
<p>Here, the operand to <code>-</code> should be just the <code>a.b</code> expression, not the entire
<code>a.b + c</code>. But if <code>unary()</code> calls <code>expression()</code>, the latter will happily chew
through all of the remaining code including the <code>+</code>. It will erroneously treat
the <code>-</code> as lower precedence than the <code>+</code>.</p>
<p>When parsing the operand to unary <code>-</code>, we need to compile only expressions at a
certain precedence level or higher. In jlox&rsquo;s recursive descent parser we
accomplished that by calling into the parsing method for the lowest-precedence
expression we wanted to allow (in this case, <code>call()</code>). Each method for parsing
a specific expression also parsed any expressions of higher precedence too, so
that included the rest of the precedence table.</p>
<p>The parsing functions like <code>number()</code> and <code>unary()</code> here in clox are different.
Each only parses exactly one type of expression. They don&rsquo;t cascade to include
higher-precedence expression types too. We need a different solution, and it
looks like this:</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>unary</em>()</div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">parsePrecedence</span>(<span class="t">Precedence</span> <span class="i">precedence</span>) {
  <span class="c">// What goes here?</span>
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>unary</em>()</div>

<p>This function<span class="em">&mdash;</span>once we implement it<span class="em">&mdash;</span>starts at the current token and parses
any expression at the given precedence level or higher. We have some other setup
to get through before we can write the body of this function, but you can
probably guess that it will use that table of parsing function pointers I&rsquo;ve
been talking about. For now, don&rsquo;t worry too much about how it works. In order
to take the &ldquo;precedence&rdquo; as a parameter, we define it numerically.</p>
<div class="codehilite"><pre class="insert-before">} Parser;
</pre><div class="source-file"><em>compiler.c</em><br>
add after struct <em>Parser</em></div>
<pre class="insert">

<span class="k">typedef</span> <span class="k">enum</span> {
  <span class="a">PREC_NONE</span>,
  <span class="a">PREC_ASSIGNMENT</span>,  <span class="c">// =</span>
  <span class="a">PREC_OR</span>,          <span class="c">// or</span>
  <span class="a">PREC_AND</span>,         <span class="c">// and</span>
  <span class="a">PREC_EQUALITY</span>,    <span class="c">// == !=</span>
  <span class="a">PREC_COMPARISON</span>,  <span class="c">// &lt; &gt; &lt;= &gt;=</span>
  <span class="a">PREC_TERM</span>,        <span class="c">// + -</span>
  <span class="a">PREC_FACTOR</span>,      <span class="c">// * /</span>
  <span class="a">PREC_UNARY</span>,       <span class="c">// ! -</span>
  <span class="a">PREC_CALL</span>,        <span class="c">// . ()</span>
  <span class="a">PREC_PRIMARY</span>
} <span class="t">Precedence</span>;
</pre><pre class="insert-after">

Parser parser;
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after struct <em>Parser</em></div>

<p>These are all of Lox&rsquo;s precedence levels in order from lowest to highest. Since
C implicitly assigns successively larger values to enum constants, this means that
<code>PREC_CALL</code> is numerically larger than <code>PREC_UNARY</code>. For example, say the
compiler is sitting on a chunk of code like:</p>
<div class="codehilite"><pre>-<span class="i">a</span>.<span class="i">b</span> + <span class="i">c</span>
</pre></div>
<p>If we call <code>parsePrecedence(PREC_ASSIGNMENT)</code>, then it will parse the entire
expression because <code>+</code> has higher precedence than assignment. If instead we
call <code>parsePrecedence(PREC_UNARY)</code>, it will compile the <code>-a.b</code> and stop there.
It doesn&rsquo;t keep going through the <code>+</code> because the addition has lower precedence
than unary operators.</p>
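<p>Because the levels are just integers, the &ldquo;keep going or stop&rdquo; decision boils down to a comparison. The sketch below assumes the check works the way the prose describes. <code>skShouldContinue()</code> is a hypothetical helper, not a function in clox.</p>

```c
#include <assert.h>

// Same ordering as the Precedence enum above, lowest to highest.
typedef enum {
  PREC_NONE,
  PREC_ASSIGNMENT,
  PREC_OR,
  PREC_AND,
  PREC_EQUALITY,
  PREC_COMPARISON,
  PREC_TERM,
  PREC_FACTOR,
  PREC_UNARY,
  PREC_CALL,
  PREC_PRIMARY
} Precedence;

// Hypothetical helper: if we were asked to parse at `limit` or higher,
// do we keep going when we hit an operator whose precedence is `operatorPrec`?
static int skShouldContinue(Precedence limit, Precedence operatorPrec) {
  assert(limit != PREC_NONE);  // We never ask to parse at "no precedence".
  return operatorPrec >= limit;
}
```

<p>With a limit of <code>PREC_ASSIGNMENT</code>, a <code>+</code> at <code>PREC_TERM</code> passes the check, so the whole of <code>-a.b + c</code> gets parsed. With a limit of <code>PREC_UNARY</code>, the <code>.</code> at <code>PREC_CALL</code> passes but the <code>+</code> does not, so parsing stops after <code>-a.b</code>.</p>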
<p>With this function in hand, it&rsquo;s a snap to fill in the missing body for
<code>expression()</code>.</p>
<div class="codehilite"><pre class="insert-before">static void expression() {
</pre><div class="source-file"><em>compiler.c</em><br>
in <em>expression</em>()<br>
replace 1 line</div>
<pre class="insert">  <span class="i">parsePrecedence</span>(<span class="a">PREC_ASSIGNMENT</span>);
</pre><pre class="insert-after">}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in <em>expression</em>(), replace 1 line</div>

<p>We simply parse the lowest precedence level, which subsumes all of the
higher-precedence expressions too. Now, to compile the operand for a unary
expression, we call this new function and limit it to the appropriate level:</p>
<div class="codehilite"><pre class="insert-before">  // Compile the operand.
</pre><div class="source-file"><em>compiler.c</em><br>
in <em>unary</em>()<br>
replace 1 line</div>
<pre class="insert">  <span class="i">parsePrecedence</span>(<span class="a">PREC_UNARY</span>);
</pre><pre class="insert-after">

  // Emit the operator instruction.
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in <em>unary</em>(), replace 1 line</div>

<p>We use the unary operator&rsquo;s own <code>PREC_UNARY</code> precedence to permit <span
name="useful">nested</span> unary expressions like <code>!!doubleNegative</code>. Since
unary operators have pretty high precedence, that correctly excludes things like
binary operators. Speaking of which<span class="ellipse">&thinsp;.&thinsp;.&thinsp;.&nbsp;</span></p>
<aside name="useful">
<p>Not that nesting unary expressions is particularly useful in Lox. But other
languages let you do it, so we do too.</p>
</aside>
<h2><a href="#parsing-infix-expressions" id="parsing-infix-expressions"><small>17&#8202;.&#8202;5</small>Parsing Infix Expressions</a></h2>
<p>Binary operators are different from the previous expressions because they are
<em>infix</em>. With the other expressions, we know what we are parsing from the very
first token. With infix expressions, we don&rsquo;t know we&rsquo;re in the middle of a
binary operator until <em>after</em> we&rsquo;ve parsed its left operand and then stumbled
onto the operator token in the middle.</p>
<p>Here&rsquo;s an example:</p>
<div class="codehilite"><pre><span class="n">1</span> + <span class="n">2</span>
</pre></div>
<p>Let&rsquo;s walk through trying to compile it with what we know so far:</p>
<ol>
<li>
<p>We call <code>expression()</code>. That in turn calls
<code>parsePrecedence(PREC_ASSIGNMENT)</code>.</p>
</li>
<li>
<p>That function (once we implement it) sees the leading number token and
recognizes it is parsing a number literal. It hands off control to
<code>number()</code>.</p>
</li>
<li>
<p><code>number()</code> creates a constant, emits an <code>OP_CONSTANT</code>, and returns back to
<code>parsePrecedence()</code>.</p>
</li>
</ol>
<p>Now what? The call to <code>parsePrecedence()</code> should consume the entire addition
expression, so it needs to keep going somehow. Fortunately, the parser is right
where we need it to be. Now that we&rsquo;ve compiled the leading number expression,
the next token is <code>+</code>. That&rsquo;s the exact token that <code>parsePrecedence()</code> needs to
detect that we&rsquo;re in the middle of an infix expression and to realize that the
expression we already compiled is actually an operand to that.</p>
<p>So this hypothetical array of function pointers doesn&rsquo;t just list functions to
parse expressions that start with a given token. Instead, it&rsquo;s a <em>table</em> of
function pointers. One column associates prefix parser functions with token
types. The second column associates infix parser functions with token types.</p>
<p>The function we will use as the infix parser for <code>TOKEN_PLUS</code>, <code>TOKEN_MINUS</code>,
<code>TOKEN_STAR</code>, and <code>TOKEN_SLASH</code> is this:</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>endCompiler</em>()</div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">binary</span>() {
  <span class="t">TokenType</span> <span class="i">operatorType</span> = <span class="i">parser</span>.<span class="i">previous</span>.<span class="i">type</span>;
  <span class="t">ParseRule</span>* <span class="i">rule</span> = <span class="i">getRule</span>(<span class="i">operatorType</span>);
  <span class="i">parsePrecedence</span>((<span class="t">Precedence</span>)(<span class="i">rule</span>-&gt;<span class="i">precedence</span> + <span class="n">1</span>));

  <span class="k">switch</span> (<span class="i">operatorType</span>) {
    <span class="k">case</span> <span class="a">TOKEN_PLUS</span>:          <span class="i">emitByte</span>(<span class="a">OP_ADD</span>); <span class="k">break</span>;
    <span class="k">case</span> <span class="a">TOKEN_MINUS</span>:         <span class="i">emitByte</span>(<span class="a">OP_SUBTRACT</span>); <span class="k">break</span>;
    <span class="k">case</span> <span class="a">TOKEN_STAR</span>:          <span class="i">emitByte</span>(<span class="a">OP_MULTIPLY</span>); <span class="k">break</span>;
    <span class="k">case</span> <span class="a">TOKEN_SLASH</span>:         <span class="i">emitByte</span>(<span class="a">OP_DIVIDE</span>); <span class="k">break</span>;
    <span class="k">default</span>: <span class="k">return</span>; <span class="c">// Unreachable.</span>
  }
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>endCompiler</em>()</div>

<p>When a prefix parser function is called, the leading token has already been
consumed. An infix parser function is even more <em>in medias res</em><span class="em">&mdash;</span>the entire
left-hand operand expression has already been compiled and the subsequent infix
operator consumed.</p>
<p>The fact that the left operand gets compiled first works out fine. It means at
runtime, that code gets executed first. When it runs, the value it produces will
end up on the stack. That&rsquo;s right where the infix operator is going to need it.</p>
<p>Then we come here to <code>binary()</code> to handle the rest of the arithmetic operators.
This function compiles the right operand, much like how <code>unary()</code> compiles its
own trailing operand. Finally, it emits the bytecode instruction that performs
the binary operation.</p>
<p>When run, the VM will execute the left and right operand code, in that order,
leaving their values on the stack. Then it executes the instruction for the
operator. That pops the two values, computes the operation, and pushes the
result.</p>
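<p>As a side note, the runtime half of that story can be sketched with a toy stack machine. This is not clox&rsquo;s VM: the <code>Sk</code>-prefixed names are hypothetical, and <code>SK_OP_PUSH</code> takes its value directly as a one-byte operand instead of going through a constant table, purely to keep the sketch short.</p>

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical miniature instruction set, for illustration only.
typedef enum { SK_OP_PUSH, SK_OP_NEGATE, SK_OP_SUBTRACT } SkOp;

// Executes `count` bytes of code and returns the single value left on the stack.
static double skRun(const uint8_t* code, int count) {
  double stack[16];
  int top = 0;

  for (int i = 0; i < count; i++) {
    switch (code[i]) {
      case SK_OP_PUSH:
        assert(i + 1 < count && top < 16);
        stack[top++] = (double)code[++i];  // The operand byte is the value.
        break;
      case SK_OP_NEGATE:
        assert(top >= 1);
        stack[top - 1] = -stack[top - 1];
        break;
      case SK_OP_SUBTRACT: {
        assert(top >= 2);
        double b = stack[--top];  // Right operand: compiled last, popped first.
        double a = stack[--top];  // Left operand: compiled first.
        stack[top++] = a - b;
        break;
      }
      default:
        assert(0 && "Unknown opcode.");
        break;
    }
  }

  assert(top == 1);
  return stack[0];
}
```

<p>Feeding it the bytes for <code>5 - 2</code> in compiled order (push 5, push 2, subtract) yields 3, which only works because the left operand&rsquo;s code runs first. Likewise, push 4 followed by negate yields -4.</p>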
<p>The code that probably caught your eye here is that <code>getRule()</code> line. When we
parse the right-hand operand, we again need to worry about precedence. Take an
expression like:</p>
<div class="codehilite"><pre><span class="n">2</span> * <span class="n">3</span> + <span class="n">4</span>
</pre></div>
<p>When we parse the right operand of the <code>*</code> expression, we need to just capture
<code>3</code>, and not <code>3 + 4</code>, because <code>+</code> is lower precedence than <code>*</code>. We could define
a separate function for each binary operator. Each would call
<code>parsePrecedence()</code> and pass in the correct precedence level for its operand.</p>
<p>But that&rsquo;s kind of tedious. Each binary operator&rsquo;s right-hand operand precedence
is one level <span name="higher">higher</span> than its own. We can look that up
dynamically with this <code>getRule()</code> thing we&rsquo;ll get to soon. Using that, we call
<code>parsePrecedence()</code> with one level higher than this operator&rsquo;s level.</p>
<aside name="higher">
<p>We use one <em>higher</em> level of precedence for the right operand because the binary
operators are left-associative. Given a series of the <em>same</em> operator, like:</p>
<div class="codehilite"><pre><span class="n">1</span> + <span class="n">2</span> + <span class="n">3</span> + <span class="n">4</span>
</pre></div>
<p>We want to parse it like:</p>
<div class="codehilite"><pre>((<span class="n">1</span> + <span class="n">2</span>) + <span class="n">3</span>) + <span class="n">4</span>
</pre></div>
<p>Thus, when parsing the right-hand operand to the first <code>+</code>, we want to consume
the <code>2</code>, but not the rest, so we use one level above <code>+</code>&rsquo;s precedence. But if
our operator was <em>right</em>-associative, this would be wrong. Given:</p>
<div class="codehilite"><pre><span class="i">a</span> = <span class="i">b</span> = <span class="i">c</span> = <span class="i">d</span>
</pre></div>
<p>Since assignment is right-associative, we want to parse it as:</p>
<div class="codehilite"><pre><span class="i">a</span> = (<span class="i">b</span> = (<span class="i">c</span> = <span class="i">d</span>))
</pre></div>
<p>To enable that, we would call <code>parsePrecedence()</code> with the <em>same</em> precedence as
the current operator.</p>
</aside>
<p>This way, we can use a single <code>binary()</code> function for all binary operators even
though they have different precedences.</p>
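<p>To make the &ldquo;one level higher&rdquo; trick concrete, here is a tiny evaluator you can play with. It is a sketch under several assumptions: operands are single digits with no whitespace, it evaluates directly instead of emitting bytecode, and it uses a hand-written loop rather than the table-driven design we&rsquo;re building. The <code>^</code> operator is not part of Lox. I include it only to contrast right-associativity. All of the <code>sk</code>-prefixed names are hypothetical.</p>

```c
#include <assert.h>
#include <ctype.h>

// Points at the next unread character of the source string.
static const char* skCursor;

// Precedence of an infix operator, or 0 if the character isn't one.
static int skPrecedenceOf(char op) {
  switch (op) {
    case '-': return 1;  // Left-associative.
    case '^': return 2;  // Right-associative (not a Lox operator).
    default:  return 0;
  }
}

static long skPower(long base, long exponent) {
  long result = 1;
  while (exponent-- > 0) result *= base;
  return result;
}

// Parses and evaluates an expression at minPrecedence or higher.
static long skParse(int minPrecedence) {
  assert(isdigit((unsigned char)*skCursor));
  long left = *skCursor++ - '0';

  for (;;) {
    char op = *skCursor;
    int precedence = skPrecedenceOf(op);
    if (precedence == 0 || precedence < minPrecedence) return left;
    skCursor++;  // Consume the operator.

    if (op == '-') {
      // Left-associative: the right operand must bind tighter than '-'.
      left = left - skParse(precedence + 1);
    } else {
      // Right-associative: the right operand may contain another '^'.
      left = skPower(left, skParse(precedence));
    }
  }
}

static long skEvaluate(const char* source) {
  skCursor = source;
  return skParse(1);
}
```

<p>With this, <code>skEvaluate("8-3-2")</code> groups as <code>(8-3)-2</code> and returns 3, while <code>skEvaluate("2^3^2")</code> groups as <code>2^(3^2)</code> and returns 512. Swap the <code>+ 1</code> between the two branches and both answers change.</p>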
<h2><a href="#a-pratt-parser" id="a-pratt-parser"><small>17&#8202;.&#8202;6</small>A Pratt Parser</a></h2>
<p>We now have all of the pieces and parts of the compiler laid out. We have a
function for each grammar production: <code>number()</code>, <code>grouping()</code>, <code>unary()</code>, and
<code>binary()</code>. We still need to implement <code>parsePrecedence()</code> and <code>getRule()</code>. We
also know we need a table that, given a token type, lets us find</p>
<ul>
<li>
<p>the function to compile a prefix expression starting with a token of that
type,</p>
</li>
<li>
<p>the function to compile an infix expression whose left operand is followed
by a token of that type, and</p>
</li>
<li>
<p>the precedence of an <span name="prefix">infix</span> expression that uses
that token as an operator.</p>
</li>
</ul>
<aside name="prefix">
<p>We don&rsquo;t need to track the precedence of the <em>prefix</em> expression starting with a
given token because all prefix operators in Lox have the same precedence.</p>
</aside>
<p>We wrap these three properties in a little struct which represents a single row
in the parser table.</p>
<div class="codehilite"><pre class="insert-before">} Precedence;
</pre><div class="source-file"><em>compiler.c</em><br>
add after enum <em>Precedence</em></div>
<pre class="insert">

<span class="k">typedef</span> <span class="k">struct</span> {
  <span class="t">ParseFn</span> <span class="i">prefix</span>;
  <span class="t">ParseFn</span> <span class="i">infix</span>;
  <span class="t">Precedence</span> <span class="i">precedence</span>;
} <span class="t">ParseRule</span>;
</pre><pre class="insert-after">

Parser parser;
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after enum <em>Precedence</em></div>

<p>That ParseFn type is a simple <span name="typedef">typedef</span> for a function
type that takes no arguments and returns nothing.</p>
<aside name="typedef" class="bottom">
<p>C&rsquo;s syntax for function pointer types is so bad that I always hide it behind a
typedef. I understand the intent behind the syntax<span class="em">&mdash;</span>the whole &ldquo;declaration
reflects use&rdquo; thing<span class="em">&mdash;</span>but I think it was a failed syntactic experiment.</p>
</aside>
<div class="codehilite"><pre class="insert-before">} Precedence;
</pre><div class="source-file"><em>compiler.c</em><br>
add after enum <em>Precedence</em></div>
<pre class="insert">

<span class="k">typedef</span> <span class="t">void</span> (*<span class="t">ParseFn</span>)();
</pre><pre class="insert-after">

typedef struct {
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after enum <em>Precedence</em></div>

<p>The table that drives our whole parser is an array of ParseRules. We&rsquo;ve been
talking about it forever, and finally you get to see it.</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>unary</em>()</div>
<pre><span class="t">ParseRule</span> <span class="i">rules</span>[] = {
  [<span class="a">TOKEN_LEFT_PAREN</span>]    = {<span class="i">grouping</span>, <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_RIGHT_PAREN</span>]   = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_LEFT_BRACE</span>]    = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},<span name="big"> </span>
  [<span class="a">TOKEN_RIGHT_BRACE</span>]   = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_COMMA</span>]         = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_DOT</span>]           = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_MINUS</span>]         = {<span class="i">unary</span>,    <span class="i">binary</span>, <span class="a">PREC_TERM</span>},
  [<span class="a">TOKEN_PLUS</span>]          = {<span class="a">NULL</span>,     <span class="i">binary</span>, <span class="a">PREC_TERM</span>},
  [<span class="a">TOKEN_SEMICOLON</span>]     = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_SLASH</span>]         = {<span class="a">NULL</span>,     <span class="i">binary</span>, <span class="a">PREC_FACTOR</span>},
  [<span class="a">TOKEN_STAR</span>]          = {<span class="a">NULL</span>,     <span class="i">binary</span>, <span class="a">PREC_FACTOR</span>},
  [<span class="a">TOKEN_BANG</span>]          = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_BANG_EQUAL</span>]    = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_EQUAL</span>]         = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_EQUAL_EQUAL</span>]   = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_GREATER</span>]       = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_GREATER_EQUAL</span>] = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_LESS</span>]          = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_LESS_EQUAL</span>]    = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_IDENTIFIER</span>]    = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_STRING</span>]        = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_NUMBER</span>]        = {<span class="i">number</span>,   <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_AND</span>]           = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_CLASS</span>]         = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_ELSE</span>]          = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_FALSE</span>]         = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_FOR</span>]           = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_FUN</span>]           = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_IF</span>]            = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_NIL</span>]           = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_OR</span>]            = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_PRINT</span>]         = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_RETURN</span>]        = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_SUPER</span>]         = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_THIS</span>]          = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_TRUE</span>]          = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_VAR</span>]           = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_WHILE</span>]         = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_ERROR</span>]         = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
  [<span class="a">TOKEN_EOF</span>]           = {<span class="a">NULL</span>,     <span class="a">NULL</span>,   <span class="a">PREC_NONE</span>},
};
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>unary</em>()</div>

<aside name="big">
<p>See what I mean about not wanting to revisit the table each time we needed a new
column? It&rsquo;s a beast.</p>
<p>If you haven&rsquo;t seen the <code>[TOKEN_DOT] =</code> syntax in a C array literal, that is
C99&rsquo;s designated initializer syntax. It&rsquo;s clearer than having to count array
indexes by hand.</p>
</aside>
<p>You can see how <code>grouping</code> and <code>unary</code> are slotted into the prefix parser column
for their respective token types. In the next column, <code>binary</code> is wired up to
the four arithmetic infix operators. Those infix operators also have their
precedences set in the last column.</p>
<p>Aside from those, the rest of the table is full of <code>NULL</code> and <code>PREC_NONE</code>. Most
of those cells are empty because no expression is associated with those
tokens. You can&rsquo;t start an expression with, say, <code>else</code>, and <code>}</code> would make for
a pretty confusing infix operator.</p>
<p>But, also, we haven&rsquo;t filled in the entire grammar yet. In later chapters, as we
add new expression types, some of these slots will get functions in them. One of
the things I like about this approach to parsing is that it makes it very easy
to see which tokens are in use by the grammar and which are available.</p>
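<p>To see the table idea in isolation, here is a minimal sketch that is separate from clox. Every name in it (<code>MiniToken</code>, <code>MiniRule</code>, <code>miniRules</code>, and so on) is hypothetical. It leans on one C99 detail: rows you leave out of a designated-initializer array are zero-initialized, which here means a <code>NULL</code> function pointer and the lowest precedence. The chapter&rsquo;s table spells every row out instead, which is easier to scan.</p>

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical miniature token and precedence sets, not clox's real ones.
typedef enum { MINI_PLUS, MINI_STAR, MINI_SEMICOLON, MINI_TOKEN_COUNT } MiniToken;
typedef enum { MINI_PREC_NONE, MINI_PREC_TERM, MINI_PREC_FACTOR } MiniPrecedence;

typedef void (*MiniParseFn)(void);

typedef struct {
  MiniParseFn prefix;
  MiniParseFn infix;
  MiniPrecedence precedence;
} MiniRule;

// Stand-in for a real infix parse function.
static void miniBinary(void) {}

// Rows may appear in any order because each one names its own index.
static MiniRule miniRules[MINI_TOKEN_COUNT] = {
  [MINI_STAR] = {NULL, miniBinary, MINI_PREC_FACTOR},
  [MINI_PLUS] = {NULL, miniBinary, MINI_PREC_TERM},
  // MINI_SEMICOLON is omitted, so it is zero-initialized:
  // {NULL, NULL, MINI_PREC_NONE}.
};

// Reports whether a token can appear as an infix operator.
static int miniHasInfix(MiniToken type) {
  assert(type < MINI_TOKEN_COUNT);
  return miniRules[type].infix != NULL;
}
```

<p>Asking <code>miniHasInfix(MINI_SEMICOLON)</code> gives zero even though that row was never written, because <code>MINI_PREC_NONE</code> and <code>NULL</code> are both the zero value.</p>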
<p>Now that we have the table, we are finally ready to write the code that uses it.
This is where our Pratt parser comes to life. The easiest function to define is
<code>getRule()</code>.</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>parsePrecedence</em>()</div>
<pre><span class="k">static</span> <span class="t">ParseRule</span>* <span class="i">getRule</span>(<span class="t">TokenType</span> <span class="i">type</span>) {
  <span class="k">return</span> &amp;<span class="i">rules</span>[<span class="i">type</span>];
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>parsePrecedence</em>()</div>

<p>It simply returns the rule at the given index. It&rsquo;s called by <code>binary()</code> to look
up the precedence of the current operator. This function exists solely to handle
a declaration cycle in the C code. <code>binary()</code> is defined <em>before</em> the rules
table so that the table can store a pointer to it. That means the body of
<code>binary()</code> cannot access the table directly.</p>
<p>Instead, we wrap the lookup in a function. That lets us forward declare
<code>getRule()</code> before the definition of <code>binary()</code>, and <span
name="forward">then</span> <em>define</em> <code>getRule()</code> after the table. We&rsquo;ll need a
couple of other forward declarations to handle the fact that our grammar is
recursive, so let&rsquo;s get them all out of the way.</p>
<aside name="forward">
<p>This is what happens when you write your VM in a language that was designed to
be compiled on a PDP-11.</p>
</aside>
<div class="codehilite"><pre class="insert-before">  emitReturn();
}
</pre><div class="source-file"><em>compiler.c</em><br>
add after <em>endCompiler</em>()</div>
<pre class="insert">

<span class="k">static</span> <span class="t">void</span> <span class="i">expression</span>();
<span class="k">static</span> <span class="t">ParseRule</span>* <span class="i">getRule</span>(<span class="t">TokenType</span> <span class="i">type</span>);
<span class="k">static</span> <span class="t">void</span> <span class="i">parsePrecedence</span>(<span class="t">Precedence</span> <span class="i">precedence</span>);

</pre><pre class="insert-after">static void binary() {
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>endCompiler</em>()</div>
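<p>If the declaration cycle is hard to picture, this stripped-down sketch reproduces it with hypothetical names (<code>DemoRule</code>, <code>demoBinary()</code>, <code>demoGetRule()</code>) that are not part of clox. The function the table points to is defined first, the table second, and the lookup function last. Only the forward declaration lets the first one reach the table.</p>

```c
#include <assert.h>
#include <stddef.h>

typedef int (*DemoFn)(void);

typedef struct {
  DemoFn infix;
  int precedence;
} DemoRule;

// Forward declaration: demoBinary() can call this before the table exists.
static DemoRule* demoGetRule(int type);

// Defined before the table so the table can store a pointer to it.
// It cannot name demoRules directly, since that is declared further down.
static int demoBinary(void) {
  return demoGetRule(0)->precedence + 1;
}

static DemoRule demoRules[] = {
  [0] = {demoBinary, 5},
};

// Defined after the table, where demoRules is finally visible.
static DemoRule* demoGetRule(int type) {
  assert(type == 0);  // this sketch has a single row
  return &demoRules[type];
}
```

<p>Calling <code>demoBinary()</code> reads the precedence 5 through the lookup and returns 6, much as <code>binary()</code> asks for one level above its operator&rsquo;s precedence.</p>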

<p>If you&rsquo;re following along and implementing clox yourself, pay close attention to
the little annotations that tell you where to put these code snippets. Don&rsquo;t
worry, though. If you get it wrong, the C compiler will be happy to tell you.</p>
<h3><a href="#parsing-with-precedence" id="parsing-with-precedence"><small>17&#8202;.&#8202;6&#8202;.&#8202;1</small>Parsing with precedence</a></h3>
<p>Now we&rsquo;re getting to the fun stuff. The maestro that orchestrates all of the
parsing functions we&rsquo;ve defined is <code>parsePrecedence()</code>. Let&rsquo;s start with parsing
prefix expressions.</p>
<div class="codehilite"><pre class="insert-before">static void parsePrecedence(Precedence precedence) {
</pre><div class="source-file"><em>compiler.c</em><br>
in <em>parsePrecedence</em>()<br>
replace 1 line</div>
<pre class="insert">  <span class="i">advance</span>();
  <span class="t">ParseFn</span> <span class="i">prefixRule</span> = <span class="i">getRule</span>(<span class="i">parser</span>.<span class="i">previous</span>.<span class="i">type</span>)-&gt;<span class="i">prefix</span>;
  <span class="k">if</span> (<span class="i">prefixRule</span> == <span class="a">NULL</span>) {
    <span class="i">error</span>(<span class="s">&quot;Expect expression.&quot;</span>);
    <span class="k">return</span>;
  }

  <span class="i">prefixRule</span>();
</pre><pre class="insert-after">}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in <em>parsePrecedence</em>(), replace 1 line</div>

<p>We read the next token and look up the corresponding ParseRule. If there is no
prefix parser, then the token must be a syntax error. We report that and return
to the caller.</p>
<p>Otherwise, we call that prefix parse function and let it do its thing. That
prefix parser compiles the rest of the prefix expression, consuming any other
tokens it needs, and returns back here. Infix expressions are where it gets
interesting since precedence comes into play. The implementation is remarkably
simple.</p>
<div class="codehilite"><pre class="insert-before">  prefixRule();
</pre><div class="source-file"><em>compiler.c</em><br>
in <em>parsePrecedence</em>()</div>
<pre class="insert">

  <span class="k">while</span> (<span class="i">precedence</span> &lt;= <span class="i">getRule</span>(<span class="i">parser</span>.<span class="i">current</span>.<span class="i">type</span>)-&gt;<span class="i">precedence</span>) {
    <span class="i">advance</span>();
    <span class="t">ParseFn</span> <span class="i">infixRule</span> = <span class="i">getRule</span>(<span class="i">parser</span>.<span class="i">previous</span>.<span class="i">type</span>)-&gt;<span class="i">infix</span>;
    <span class="i">infixRule</span>();
  }
</pre><pre class="insert-after">}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in <em>parsePrecedence</em>()</div>

<p>That&rsquo;s the whole thing. Really. Here&rsquo;s how the entire function works: At the
beginning of <code>parsePrecedence()</code>, we look up a prefix parser for the current
token. The first token is <em>always</em> going to belong to some kind of prefix
expression, by definition. It may turn out to be nested as an operand inside one
or more infix expressions, but as you read the code from left to right, the
first token you hit always belongs to a prefix expression.</p>
<p>After parsing that, which may consume more tokens, the prefix expression is
done. Now we look for an infix parser for the next token. If we find one, it
means the prefix expression we already compiled might be an operand for it. But
only if the call to <code>parsePrecedence()</code> has a <code>precedence</code> that is low enough to
permit that infix operator.</p>
<p>If the next token is too low precedence, or isn&rsquo;t an infix operator at all,
we&rsquo;re done. We&rsquo;ve parsed as much expression as we can. Otherwise, we consume the
operator and hand off control to the infix parser we found. It consumes whatever
other tokens it needs (usually the right operand) and returns back to
<code>parsePrecedence()</code>. Then we loop back around and see if the <em>next</em> token is
also a valid infix operator that can take the entire preceding expression as its
operand. We keep looping like that, crunching through infix operators and their
operands until we hit a token that isn&rsquo;t an infix operator or is too low
precedence and stop.</p>
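<p>For experimenting outside of clox, the loop described above can be sketched as a tiny standalone program. This is an illustration under simplifying assumptions, not the book&rsquo;s code: tokens are single characters, operands are single digits, and instead of bytecode the parse functions append to a postfix string, with <code>~</code> standing in for negation. All names here (<code>toPostfix()</code>, <code>Rule</code>, <code>Prec</code>, and the rest) are hypothetical.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { P_NONE, P_TERM, P_FACTOR, P_UNARY } Prec;
typedef void (*Fn)(void);
typedef struct { Fn prefix; Fn infix; Prec prec; } Rule;

static const char* src;         // next unread character
static char current, previous;  // one-character "tokens"
static char out[64];            // postfix output instead of bytecode
static size_t outLen;
static int hadError;

static void parsePrecedence(Prec prec);
static const Rule* getRule(char token);

static void emit(char c) {
  if (outLen + 1 < sizeof out) out[outLen++] = c;
  out[outLen] = '\0';
}

// Shifts current into previous and reads the next non-space character.
static void advance(void) {
  previous = current;
  while (*src == ' ') src++;
  current = *src;
  if (*src != '\0') src++;
}

static void number(void) { emit(previous); }

static void grouping(void) {
  parsePrecedence(P_TERM);
  if (current == ')') advance(); else hadError = 1;
}

static void unary(void) {
  parsePrecedence(P_UNARY);  // compile the operand first
  emit('~');                 // then the negation
}

static void binary(void) {
  char op = previous;
  // One level higher for the right operand makes operators left-associative.
  parsePrecedence((Prec)(getRule(op)->prec + 1));
  emit(op);
}

static const Rule* getRule(char token) {
  static const Rule minus = {unary, binary, P_TERM};
  static const Rule plus = {NULL, binary, P_TERM};
  static const Rule factor = {NULL, binary, P_FACTOR};
  static const Rule paren = {grouping, NULL, P_NONE};
  static const Rule digit = {number, NULL, P_NONE};
  static const Rule none = {NULL, NULL, P_NONE};
  switch (token) {
    case '-': return &minus;
    case '+': return &plus;
    case '*': case '/': return &factor;
    case '(': return &paren;
    default: return (token >= '0' && token <= '9') ? &digit : &none;
  }
}

static void parsePrecedence(Prec prec) {
  advance();
  Fn prefixRule = getRule(previous)->prefix;
  if (prefixRule == NULL) { hadError = 1; return; }
  prefixRule();
  // Keep absorbing infix operators that bind at least as tightly as prec.
  while (prec <= getRule(current)->prec) {
    advance();
    getRule(previous)->infix();
  }
}

// Returns the postfix form of text, or NULL on a syntax error.
static const char* toPostfix(const char* text) {
  src = text; outLen = 0; out[0] = '\0'; hadError = 0;
  advance();  // prime current
  parsePrecedence(P_TERM);
  if (current != '\0') hadError = 1;
  return hadError ? NULL : out;
}
```

<p>With this sketch, <code>toPostfix("1 + 2 * 3")</code> yields <code>123*+</code>: the <code>*</code> is emitted before the <code>+</code> because the right operand of <code>+</code> is parsed at a higher precedence level, which lets it swallow the multiplication. Setting a breakpoint in <code>parsePrecedence()</code> here is a low-stakes way to watch the recursion before doing the same in clox.</p>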
<p>That&rsquo;s a lot of prose, but if you really want to mind meld with Vaughan Pratt
and fully understand the algorithm, step through the parser in your debugger as
it works through some expressions. Maybe a picture will help. There&rsquo;s only a
handful of functions, but they are marvelously intertwined:</p>
<p><span name="connections"></span></p>
<p><img src="image/compiling-expressions/connections.png" alt="The various parsing
functions and how they call each other." /></p>
<aside name="connections">
<p>The <img src="image/compiling-expressions/calls.png" alt="A solid arrow."
class="arrow" /> arrow connects a function to another function it directly
calls. The <img src="image/compiling-expressions/points-to.png" alt="An open
arrow." class="arrow" /> arrow shows the table&rsquo;s pointers to the parsing
functions.</p>
</aside>
<p>Later, we&rsquo;ll need to tweak the code in this chapter to handle assignment. But,
otherwise, what we wrote covers all of our expression compiling needs for the
rest of the book. We&rsquo;ll plug additional parsing functions into the table when we
add new kinds of expressions, but <code>parsePrecedence()</code> is complete.</p>
<h2><a href="#dumping-chunks" id="dumping-chunks"><small>17&#8202;.&#8202;7</small>Dumping Chunks</a></h2>
<p>While we&rsquo;re here in the core of our compiler, we should put in some
instrumentation. To help debug the generated bytecode, we&rsquo;ll add support for
dumping the chunk once the compiler finishes. We had some temporary logging
earlier when we hand-authored the chunk. Now we&rsquo;ll put in some real code so that
we can enable it whenever we want.</p>
<p>Since this isn&rsquo;t for end users, we hide it behind a flag.</p>
<div class="codehilite"><pre class="insert-before">#include &lt;stdint.h&gt;

</pre><div class="source-file"><em>common.h</em></div>
<pre class="insert"><span class="a">#define DEBUG_PRINT_CODE</span>
</pre><pre class="insert-after">#define DEBUG_TRACE_EXECUTION
</pre></div>
<div class="source-file-narrow"><em>common.h</em></div>

<p>When that flag is defined, we use our existing &ldquo;debug&rdquo; module to print out the
chunk&rsquo;s bytecode.</p>
<div class="codehilite"><pre class="insert-before">  emitReturn();
</pre><div class="source-file"><em>compiler.c</em><br>
in <em>endCompiler</em>()</div>
<pre class="insert"><span class="a">#ifdef DEBUG_PRINT_CODE</span>
  <span class="k">if</span> (!<span class="i">parser</span>.<span class="i">hadError</span>) {
    <span class="i">disassembleChunk</span>(<span class="i">currentChunk</span>(), <span class="s">&quot;code&quot;</span>);
  }
<span class="a">#endif</span>
</pre><pre class="insert-after">}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in <em>endCompiler</em>()</div>

<p>We do this only if the code was free of errors. After a syntax error, the
compiler keeps on going but it&rsquo;s in kind of a weird state and might produce
broken code. That&rsquo;s harmless because it won&rsquo;t get executed, but we&rsquo;ll just
confuse ourselves if we try to read it.</p>
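<p>The same pattern can be tried on its own. In this rough sketch, <code>SKETCH_PRINT_CODE</code> and <code>sketchEndCompiler()</code> are hypothetical stand-ins for the real flag and function, and a plain byte dump takes the place of <code>disassembleChunk()</code>. When the flag is not defined, the preprocessor removes the dump entirely, so it costs nothing in a release build.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

// Hypothetical flag; comment this out to compile the dump away.
#define SKETCH_PRINT_CODE

// Returns true when a dump was actually printed.
static bool sketchEndCompiler(bool hadError, const unsigned char* code,
                              int count) {
  assert(count >= 0);
#ifdef SKETCH_PRINT_CODE
  if (!hadError) {
    printf("== code ==\n");
    for (int i = 0; i < count; i++) {
      printf("%04d %02x\n", i, code[i]);
    }
    return true;
  }
#else
  // Silence unused-parameter warnings when the dump is compiled out.
  (void)hadError; (void)code; (void)count;
#endif
  return false;
}
```

<p>With the flag defined, a clean compile prints the bytes and a failed one prints nothing, which mirrors the <code>parser.hadError</code> check above.</p>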
<p>Finally, to access <code>disassembleChunk()</code>, we need to include its header.</p>
<div class="codehilite"><pre class="insert-before">#include &quot;scanner.h&quot;
</pre><div class="source-file"><em>compiler.c</em></div>
<pre class="insert">

<span class="a">#ifdef DEBUG_PRINT_CODE</span>
<span class="a">#include &quot;debug.h&quot;</span>
<span class="a">#endif</span>
</pre><pre class="insert-after">

typedef struct {
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em></div>

<p>We made it! This was the last major section to install in our VM&rsquo;s compilation
and execution pipeline. Our interpreter doesn&rsquo;t <em>look</em> like much, but inside it
is scanning, parsing, compiling to bytecode, and executing.</p>
<p>Fire up the VM and type in an expression. If we did everything right, it should
calculate and print the result. We now have a very over-engineered arithmetic
calculator. We have a lot of language features to add in the coming chapters,
but the foundation is in place.</p>
<div class="challenges">
<h2><a href="#challenges" id="challenges">Challenges</a></h2>
<ol>
<li>
<p>To really understand the parser, you need to see how execution threads
through the interesting parsing functions<span class="em">&mdash;</span><code>parsePrecedence()</code> and the
parser functions stored in the table. Take this (strange) expression:</p>
<div class="codehilite"><pre>(-<span class="n">1</span> + <span class="n">2</span>) * <span class="n">3</span> - -<span class="n">4</span>
</pre></div>
<p>Write a trace of how those functions are called. Show the order they are
called, which calls which, and the arguments passed to them.</p>
</li>
<li>
<p>The ParseRule row for <code>TOKEN_MINUS</code> has both prefix and infix function
pointers. That&rsquo;s because <code>-</code> is both a prefix operator (unary negation) and
an infix one (subtraction).</p>
<p>In the full Lox language, what other tokens can be used in both prefix and
infix positions? What about in C or in another language of your choice?</p>
</li>
<li>
<p>You might be wondering about complex &ldquo;mixfix&rdquo; expressions that have more
than two operands separated by tokens. C&rsquo;s conditional or &ldquo;ternary&rdquo;
operator, <code>?:</code>, is a widely known one.</p>
<p>Add support for that operator to the compiler. You don&rsquo;t have to generate
any bytecode, just show how you would hook it up to the parser and handle
the operands.</p>
</li>
</ol>
</div>
<div class="design-note">
<h2><a href="#design-note" id="design-note">Design Note: It&rsquo;s Just Parsing</a></h2>
<p>I&rsquo;m going to make a claim here that will be unpopular with some compiler and
language people. It&rsquo;s OK if you don&rsquo;t agree. Personally, I learn more from
strongly stated opinions that I disagree with than I do from several pages of
qualifiers and equivocation. My claim is that <em>parsing doesn&rsquo;t matter</em>.</p>
<p>Over the years, many programming language people, especially in academia, have
gotten <em>really</em> into parsers and taken them very seriously. Initially, it was
the compiler folks who got into <span name="yacc">compiler-compilers</span>,
LALR, and other stuff like that. The first half of the dragon book is a long
love letter to the wonders of parser generators.</p>
<aside name="yacc">
<p>All of us suffer from the vice of &ldquo;when all you have is a hammer, everything
looks like a nail&rdquo;, but perhaps none so visibly as compiler people. You wouldn&rsquo;t
believe the breadth of software problems that miraculously seem to require a new
little language in their solution as soon as you ask a compiler hacker for help.</p>
<p>Yacc and other compiler-compilers are the most delightfully recursive example.
&ldquo;Wow, writing compilers is a chore. I know, let&rsquo;s write a compiler to write our
compiler for us.&rdquo;</p>
<p>For the record, I don&rsquo;t claim immunity to this affliction.</p>
</aside>
<p>Later, the functional programming folks got into parser combinators, packrat
parsers, and other sorts of things. Because, obviously, if you give a functional
programmer a problem, the first thing they&rsquo;ll do is whip out a pocketful of
higher-order functions.</p>
<p>Over in math and algorithm analysis land, there is a long legacy of research
into proving time and memory usage for various parsing techniques, transforming
parsing problems into other problems and back, and assigning complexity classes
to different grammars.</p>
<p>At one level, this stuff is important. If you&rsquo;re implementing a language, you
want some assurance that your parser won&rsquo;t go exponential and take 7,000 years
to parse a weird edge case in the grammar. Parser theory gives you that bound.
As an intellectual exercise, learning about parsing techniques is also fun and
rewarding.</p>
<p>But if your goal is just to implement a language and get it in front of users,
almost all of that stuff doesn&rsquo;t matter. It&rsquo;s really easy to get worked up by
the enthusiasm of the people who <em>are</em> into it and think that your front end
<em>needs</em> some whiz-bang generated combinator-parser-factory thing. I&rsquo;ve seen
people burn tons of time writing and rewriting their parser using whatever
today&rsquo;s hot library or technique is.</p>
<p>That&rsquo;s time that doesn&rsquo;t add any value to your user&rsquo;s life. If you&rsquo;re just
trying to get your parser done, pick one of the bog-standard techniques, use it,
and move on. Recursive descent, Pratt parsing, and the popular parser generators
like ANTLR or Bison are all fine.</p>
<p>Take the extra time you saved not rewriting your parsing code and spend it
improving the compile error messages your compiler shows users. Good error
handling and reporting is more valuable to users than almost anything else you
can put time into in the front end.</p>
</div>

<footer>
<a href="types-of-values.html" class="next">
  Next Chapter: &ldquo;Types of Values&rdquo; &rarr;
</a>
Handcrafted by Robert Nystrom&ensp;&mdash;&ensp;<a href="https://github.com/munificent/craftinginterpreters/blob/master/LICENSE" target="_blank">&copy; 2015&hairsp;&ndash;&hairsp;2021</a>
</footer>
</article>

</div>
</body>
</html>