-
Notifications
You must be signed in to change notification settings - Fork 1.1k
/
Copy pathcompiling-expressions.html
1330 lines (1224 loc) · 82.9 KB
/
compiling-expressions.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-type" content="text/html;charset=UTF-8" />
<title>Compiling Expressions · Crafting Interpreters</title>
<!-- Tell mobile browsers we're optimized for them and they don't need to crop
the viewport. -->
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<link rel="stylesheet" type="text/css" href="style.css" />
<!-- Oh, God, Source Code Pro is so beautiful it makes me want to cry. -->
<link href='https://fonts.googleapis.com/css?family=Source+Code+Pro:400|Source+Sans+Pro:300,400,600' rel='stylesheet' type='text/css'>
<link rel="icon" type="image/png" href="image/favicon.png" />
<script src="jquery-3.4.1.min.js"></script>
<script src="script.js"></script>
<!-- Google analytics -->
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-42804721-2', 'auto');
ga('send', 'pageview');
</script>
</head>
<body id="top">
<!-- <div class="scrim"></div> -->
<nav class="wide">
<a href="/"><img src="image/logotype.png" title="Crafting Interpreters"></a>
<div class="contents">
<h3><a href="#top">Compiling Expressions<small>17</small></a></h3>
<ul>
<li><a href="#single-pass-compilation"><small>17.1</small> Single-Pass Compilation</a></li>
<li><a href="#parsing-tokens"><small>17.2</small> Parsing Tokens</a></li>
<li><a href="#emitting-bytecode"><small>17.3</small> Emitting Bytecode</a></li>
<li><a href="#parsing-prefix-expressions"><small>17.4</small> Parsing Prefix Expressions</a></li>
<li><a href="#parsing-infix-expressions"><small>17.5</small> Parsing Infix Expressions</a></li>
<li><a href="#a-pratt-parser"><small>17.6</small> A Pratt Parser</a></li>
<li><a href="#dumping-chunks"><small>17.7</small> Dumping Chunks</a></li>
<li class="divider"></li>
<li class="end-part"><a href="#challenges">Challenges</a></li>
<li class="end-part"><a href="#design-note"><small>note</small>It's Just Parsing</a></li>
</ul>
<div class="prev-next">
<a href="scanning-on-demand.html" title="Scanning on Demand" class="left">← Previous</a>
<a href="a-bytecode-virtual-machine.html" title="A Bytecode Virtual Machine">↑ Up</a>
<a href="types-of-values.html" title="Types of Values" class="right">Next →</a>
</div> </div>
</nav>
<nav class="narrow">
<a href="/"><img src="image/logotype.png" title="Crafting Interpreters"></a>
<a href="scanning-on-demand.html" title="Scanning on Demand" class="prev">←</a>
<a href="types-of-values.html" title="Types of Values" class="next">→</a>
</nav>
<div class="page">
<div class="nav-wrapper">
<nav class="floating">
<a href="/"><img src="image/logotype.png" title="Crafting Interpreters"></a>
<div class="expandable">
<h3><a href="#top">Compiling Expressions<small>17</small></a></h3>
<ul>
<li><a href="#single-pass-compilation"><small>17.1</small> Single-Pass Compilation</a></li>
<li><a href="#parsing-tokens"><small>17.2</small> Parsing Tokens</a></li>
<li><a href="#emitting-bytecode"><small>17.3</small> Emitting Bytecode</a></li>
<li><a href="#parsing-prefix-expressions"><small>17.4</small> Parsing Prefix Expressions</a></li>
<li><a href="#parsing-infix-expressions"><small>17.5</small> Parsing Infix Expressions</a></li>
<li><a href="#a-pratt-parser"><small>17.6</small> A Pratt Parser</a></li>
<li><a href="#dumping-chunks"><small>17.7</small> Dumping Chunks</a></li>
<li class="divider"></li>
<li class="end-part"><a href="#challenges">Challenges</a></li>
<li class="end-part"><a href="#design-note"><small>note</small>It's Just Parsing</a></li>
</ul>
<div class="prev-next">
<a href="scanning-on-demand.html" title="Scanning on Demand" class="left">← Previous</a>
<a href="a-bytecode-virtual-machine.html" title="A Bytecode Virtual Machine">↑ Up</a>
<a href="types-of-values.html" title="Types of Values" class="right">Next →</a>
</div> </div>
<a id="expand-nav">≡</a>
</nav>
</div>
<article class="chapter">
<div class="number">17</div>
<h1>Compiling Expressions</h1>
<blockquote>
<p>In the middle of the journey of our life I found myself within a dark woods
where the straight way was lost.</p>
<p><cite>Dante Alighieri, <em>Inferno</em></cite></p>
</blockquote>
<p>This chapter is exciting for not one, not two, but <em>three</em> reasons. First, it
provides the final segment of our VM’s execution pipeline. Once in place, we can
plumb the user’s source code from scanning all the way through to executing it.</p><img src="image/compiling-expressions/pipeline.png" alt="Lowering the 'compiler' section of pipe between 'scanner' and 'VM'." />
<p>Second, we get to write an actual, honest-to-God <em>compiler</em>. It parses source
code and outputs a low-level series of binary instructions. Sure, it’s <span
name="wirth">bytecode</span> and not some chip’s native instruction set, but
it’s way closer to the metal than jlox was. We’re about to be real language
hackers.</p>
<aside name="wirth">
<p>Bytecode was good enough for Niklaus Wirth, and no one questions his street
cred.</p>
</aside>
<p><span name="pratt">Third</span> and finally, I get to show you one of my
absolute favorite algorithms: Vaughan Pratt’s “top-down operator precedence
parsing”. It’s the most elegant way I know to parse expressions. It gracefully
handles prefix operators, postfix, infix, <em>mixfix</em>, any kind of <em>-fix</em> you got.
It deals with precedence and associativity without breaking a sweat. I love it.</p>
<aside name="pratt">
<p>Pratt parsers are a sort of oral tradition in industry. No compiler or language
book I’ve read teaches them. Academia is very focused on generated parsers, and
Pratt’s technique is for handwritten ones, so it gets overlooked.</p>
<p>But in production compilers, where hand-rolled parsers are common, you’d be
surprised how many people know it. Ask where they learned it, and it’s always,
“Oh, I worked on this compiler years ago and my coworker said they took it from
this old front end<span class="ellipse"> . . . </span>”</p>
</aside>
<p>As usual, before we get to the fun stuff, we’ve got some preliminaries to work
through. You have to eat your vegetables before you get dessert. First, let’s
ditch that temporary scaffolding we wrote for testing the scanner and replace it
with something more useful.</p>
<div class="codehilite"><pre class="insert-before">InterpretResult interpret(const char* source) {
</pre><div class="source-file"><em>vm.c</em><br>
in <em>interpret</em>()<br>
replace 2 lines</div>
<pre class="insert"> <span class="t">Chunk</span> <span class="i">chunk</span>;
<span class="i">initChunk</span>(&<span class="i">chunk</span>);
<span class="k">if</span> (!<span class="i">compile</span>(<span class="i">source</span>, &<span class="i">chunk</span>)) {
<span class="i">freeChunk</span>(&<span class="i">chunk</span>);
<span class="k">return</span> <span class="a">INTERPRET_COMPILE_ERROR</span>;
}
<span class="i">vm</span>.<span class="i">chunk</span> = &<span class="i">chunk</span>;
<span class="i">vm</span>.<span class="i">ip</span> = <span class="i">vm</span>.<span class="i">chunk</span>-><span class="i">code</span>;
<span class="t">InterpretResult</span> <span class="i">result</span> = <span class="i">run</span>();
<span class="i">freeChunk</span>(&<span class="i">chunk</span>);
<span class="k">return</span> <span class="i">result</span>;
</pre><pre class="insert-after">}
</pre></div>
<div class="source-file-narrow"><em>vm.c</em>, in <em>interpret</em>(), replace 2 lines</div>
<p>We create a new empty chunk and pass it over to the compiler. The compiler will
take the user’s program and fill up the chunk with bytecode. At least, that’s
what it will do if the program doesn’t have any compile errors. If it does
encounter an error, <code>compile()</code> returns <code>false</code> and we discard the unusable
chunk.</p>
<p>Otherwise, we send the completed chunk over to the VM to be executed. When the
VM finishes, we free the chunk and we’re done. As you can see, the signature to
<code>compile()</code> is different now.</p>
<div class="codehilite"><pre class="insert-before">#define clox_compiler_h
</pre><div class="source-file"><em>compiler.h</em><br>
replace 1 line</div>
<pre class="insert"><span class="a">#include "vm.h"</span>
<span class="t">bool</span> <span class="i">compile</span>(<span class="k">const</span> <span class="t">char</span>* <span class="i">source</span>, <span class="t">Chunk</span>* <span class="i">chunk</span>);
</pre><pre class="insert-after">
#endif
</pre></div>
<div class="source-file-narrow"><em>compiler.h</em>, replace 1 line</div>
<p>We pass in the chunk where the compiler will write the code, and then
<code>compile()</code> returns whether or not compilation succeeded. We make the same
change to the signature in the implementation.</p>
<div class="codehilite"><pre class="insert-before">#include "scanner.h"
</pre><div class="source-file"><em>compiler.c</em><br>
function <em>compile</em>()<br>
replace 1 line</div>
<pre class="insert"><span class="t">bool</span> <span class="i">compile</span>(<span class="k">const</span> <span class="t">char</span>* <span class="i">source</span>, <span class="t">Chunk</span>* <span class="i">chunk</span>) {
</pre><pre class="insert-after"> initScanner(source);
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, function <em>compile</em>(), replace 1 line</div>
<p>That call to <code>initScanner()</code> is the only line that survives this chapter. Rip
out the temporary code we wrote to test the scanner and replace it with these
three lines:</p>
<div class="codehilite"><pre class="insert-before"> initScanner(source);
</pre><div class="source-file"><em>compiler.c</em><br>
in <em>compile</em>()<br>
replace 13 lines</div>
<pre class="insert"> <span class="i">advance</span>();
<span class="i">expression</span>();
<span class="i">consume</span>(<span class="a">TOKEN_EOF</span>, <span class="s">"Expect end of expression."</span>);
</pre><pre class="insert-after">}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in <em>compile</em>(), replace 13 lines</div>
<p>The call to <code>advance()</code> “primes the pump” on the scanner. We’ll see what it does
soon. Then we parse a single expression. We aren’t going to do statements yet,
so that’s the only subset of the grammar we support. We’ll revisit this when we
<a href="global-variables.html">add statements in a few chapters</a>. After we compile the expression, we
should be at the end of the source code, so we check for the sentinel EOF token.</p>
<p>We’re going to spend the rest of the chapter making this function work,
especially that little <code>expression()</code> call. Normally, we’d dive right into that
function definition and work our way through the implementation from top to
bottom.</p>
<p>This chapter is <span name="blog">different</span>. Pratt’s parsing technique is
remarkably simple once you have it all loaded in your head, but it’s a little
tricky to break into bite-sized pieces. It’s recursive, of course, which is part
of the problem. But it also relies on a big table of data. As we build up the
algorithm, that table grows additional columns.</p>
<aside name="blog">
<p>If this chapter isn’t clicking with you and you’d like another take on the
concepts, I wrote an article that teaches the same algorithm but using Java and
an object-oriented style: <a href="http://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/">“Pratt Parsing: Expression Parsing Made Easy”</a>.</p>
</aside>
<p>I don’t want to revisit 40-something lines of code each time we extend the
table. So we’re going to work our way into the core of the parser from the
outside and cover all of the surrounding bits before we get to the juicy center.
This will require a little more patience and mental scratch space than most
chapters, but it’s the best I could do.</p>
<h2><a href="#single-pass-compilation" id="single-pass-compilation"><small>17 . 1</small>Single-Pass Compilation</a></h2>
<p>A compiler has roughly two jobs. It parses the user’s source code to understand
what it means. Then it takes that knowledge and outputs low-level instructions
that produce the same semantics. Many languages split those two roles into two
separate <span name="passes">passes</span> in the implementation. A parser
produces an AST<span class="em">—</span>just like jlox does<span class="em">—</span>and then a code generator traverses
the AST and outputs target code.</p>
<aside name="passes">
<p>In fact, most sophisticated optimizing compilers have a heck of a lot more than
two passes. Determining not just <em>what</em> optimization passes to have, but how to
order them to squeeze the most performance out of the compiler<span class="em">—</span>since the
optimizations often interact in complex ways<span class="em">—</span>is somewhere between an “open
area of research” and a “dark art”.</p>
</aside>
<p>In clox, we’re taking an old-school approach and merging these two passes into
one. Back in the day, language hackers did this because computers literally
didn’t have enough memory to store an entire source file’s AST. We’re doing it
because it keeps our compiler simpler, which is a real asset when programming in
C.</p>
<p>Single-pass compilers like we’re going to build don’t work well for all
languages. Since the compiler has only a peephole view into the user’s program
while generating code, the language must be designed such that you don’t need
much surrounding context to understand a piece of syntax. Fortunately, tiny,
dynamically typed Lox is <span name="lox">well-suited</span> to that.</p>
<aside name="lox">
<p>Not that this should come as much of a surprise. I did design the language
specifically for this book after all.</p><img src="image/compiling-expressions/keyhole.png" alt="Peering through a keyhole at 'var x;'" />
</aside>
<p>What this means in practical terms is that our “compiler” C module has
functionality you’ll recognize from jlox for parsing<span class="em">—</span>consuming tokens,
matching expected token types, etc. And it also has functions for code gen<span class="em">—</span>emitting bytecode and adding constants to the destination chunk. (And it means
I’ll use “parsing” and “compiling” interchangeably throughout this and later
chapters.)</p>
<p>We’ll build the parsing and code generation halves first. Then we’ll stitch them
together with the code in the middle that uses Pratt’s technique to parse Lox’s
particular grammar and output the right bytecode.</p>
<h2><a href="#parsing-tokens" id="parsing-tokens"><small>17 . 2</small>Parsing Tokens</a></h2>
<p>First up, the front half of the compiler. This function’s name should sound
familiar.</p>
<div class="codehilite"><pre class="insert-before">#include "scanner.h"
</pre><div class="source-file"><em>compiler.c</em></div>
<pre class="insert">
<span class="k">static</span> <span class="t">void</span> <span class="i">advance</span>() {
<span class="i">parser</span>.<span class="i">previous</span> = <span class="i">parser</span>.<span class="i">current</span>;
<span class="k">for</span> (;;) {
<span class="i">parser</span>.<span class="i">current</span> = <span class="i">scanToken</span>();
<span class="k">if</span> (<span class="i">parser</span>.<span class="i">current</span>.<span class="i">type</span> != <span class="a">TOKEN_ERROR</span>) <span class="k">break</span>;
<span class="i">errorAtCurrent</span>(<span class="i">parser</span>.<span class="i">current</span>.<span class="i">start</span>);
}
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em></div>
<p>Just like in jlox, it steps forward through the token stream. It asks the
scanner for the next token and stores it for later use. Before doing that, it
takes the old <code>current</code> token and stashes that in a <code>previous</code> field. That will
come in handy later so that we can get at the lexeme after we match a token.</p>
<p>The code to read the next token is wrapped in a loop. Remember, clox’s scanner
doesn’t report lexical errors. Instead, it creates special <em>error tokens</em> and
leaves it up to the parser to report them. We do that here.</p>
<p>We keep looping, reading tokens and reporting the errors, until we hit a
non-error one or reach the end. That way, the rest of the parser sees only valid
tokens. The current and previous token are stored in this struct:</p>
<div class="codehilite"><pre class="insert-before">#include "scanner.h"
</pre><div class="source-file"><em>compiler.c</em></div>
<pre class="insert">
<span class="k">typedef</span> <span class="k">struct</span> {
<span class="t">Token</span> <span class="i">current</span>;
<span class="t">Token</span> <span class="i">previous</span>;
} <span class="t">Parser</span>;
<span class="t">Parser</span> <span class="i">parser</span>;
</pre><pre class="insert-after">
static void advance() {
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em></div>
<p>Like we did in other modules, we have a single global variable of this struct
type so we don’t need to pass the state around from function to function in the
compiler.</p>
<h3><a href="#handling-syntax-errors" id="handling-syntax-errors"><small>17 . 2 . 1</small>Handling syntax errors</a></h3>
<p>If the scanner hands us an error token, we need to actually tell the user. That
happens using this:</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after variable <em>parser</em></div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">errorAtCurrent</span>(<span class="k">const</span> <span class="t">char</span>* <span class="i">message</span>) {
<span class="i">errorAt</span>(&<span class="i">parser</span>.<span class="i">current</span>, <span class="i">message</span>);
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after variable <em>parser</em></div>
<p>We pull the location out of the current token in order to tell the user where
the error occurred and forward it to <code>errorAt()</code>. More often, we’ll report an
error at the location of the token we just consumed, so we give the shorter name
to this other function:</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after variable <em>parser</em></div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">error</span>(<span class="k">const</span> <span class="t">char</span>* <span class="i">message</span>) {
<span class="i">errorAt</span>(&<span class="i">parser</span>.<span class="i">previous</span>, <span class="i">message</span>);
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after variable <em>parser</em></div>
<p>The actual work happens here:</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after variable <em>parser</em></div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">errorAt</span>(<span class="t">Token</span>* <span class="i">token</span>, <span class="k">const</span> <span class="t">char</span>* <span class="i">message</span>) {
<span class="i">fprintf</span>(<span class="i">stderr</span>, <span class="s">"[line %d] Error"</span>, <span class="i">token</span>-><span class="i">line</span>);
<span class="k">if</span> (<span class="i">token</span>-><span class="i">type</span> == <span class="a">TOKEN_EOF</span>) {
<span class="i">fprintf</span>(<span class="i">stderr</span>, <span class="s">" at end"</span>);
} <span class="k">else</span> <span class="k">if</span> (<span class="i">token</span>-><span class="i">type</span> == <span class="a">TOKEN_ERROR</span>) {
<span class="c">// Nothing.</span>
} <span class="k">else</span> {
<span class="i">fprintf</span>(<span class="i">stderr</span>, <span class="s">" at '%.*s'"</span>, <span class="i">token</span>-><span class="i">length</span>, <span class="i">token</span>-><span class="i">start</span>);
}
<span class="i">fprintf</span>(<span class="i">stderr</span>, <span class="s">": %s</span><span class="e">\n</span><span class="s">"</span>, <span class="i">message</span>);
<span class="i">parser</span>.<span class="i">hadError</span> = <span class="k">true</span>;
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after variable <em>parser</em></div>
<p>First, we print where the error occurred. We try to show the lexeme if it’s
human-readable. Then we print the error message itself. After that, we set this
<code>hadError</code> flag. That records whether any errors occurred during compilation.
This field also lives in the parser struct.</p>
<div class="codehilite"><pre class="insert-before"> Token previous;
</pre><div class="source-file"><em>compiler.c</em><br>
in struct <em>Parser</em></div>
<pre class="insert"> <span class="t">bool</span> <span class="i">hadError</span>;
</pre><pre class="insert-after">} Parser;
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in struct <em>Parser</em></div>
<p>Earlier I said that <code>compile()</code> should return <code>false</code> if an error occurred. Now
we can make it do that.</p>
<div class="codehilite"><pre class="insert-before"> consume(TOKEN_EOF, "Expect end of expression.");
</pre><div class="source-file"><em>compiler.c</em><br>
in <em>compile</em>()</div>
<pre class="insert"> <span class="k">return</span> !<span class="i">parser</span>.<span class="i">hadError</span>;
</pre><pre class="insert-after">}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in <em>compile</em>()</div>
<p>I’ve got another flag to introduce for error handling. We want to avoid error
cascades. If the user has a mistake in their code and the parser gets confused
about where it is in the grammar, we don’t want it to spew out a whole pile of
meaningless knock-on errors after the first one.</p>
<p>We fixed that in jlox using panic mode error recovery. In the Java interpreter,
we threw an exception to unwind out of all of the parser code to a point where
we could skip tokens and resynchronize. We don’t have <span
name="setjmp">exceptions</span> in C. Instead, we’ll do a little smoke and
mirrors. We add a flag to track whether we’re currently in panic mode.</p>
<aside name="setjmp">
<p>There is <code>setjmp()</code> and <code>longjmp()</code>, but I’d rather not go there. Those make it
too easy to leak memory, forget to maintain invariants, or otherwise have a Very
Bad Day.</p>
</aside>
<div class="codehilite"><pre class="insert-before"> bool hadError;
</pre><div class="source-file"><em>compiler.c</em><br>
in struct <em>Parser</em></div>
<pre class="insert"> <span class="t">bool</span> <span class="i">panicMode</span>;
</pre><pre class="insert-after">} Parser;
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in struct <em>Parser</em></div>
<p>When an error occurs, we set it.</p>
<div class="codehilite"><pre class="insert-before">static void errorAt(Token* token, const char* message) {
</pre><div class="source-file"><em>compiler.c</em><br>
in <em>errorAt</em>()</div>
<pre class="insert"> <span class="i">parser</span>.<span class="i">panicMode</span> = <span class="k">true</span>;
</pre><pre class="insert-after"> fprintf(stderr, "[line %d] Error", token->line);
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in <em>errorAt</em>()</div>
<p>After that, we go ahead and keep compiling as normal as if the error never
occurred. The bytecode will never get executed, so it’s harmless to keep on
trucking. The trick is that while the panic mode flag is set, we simply suppress
any other errors that get detected.</p>
<div class="codehilite"><pre class="insert-before">static void errorAt(Token* token, const char* message) {
</pre><div class="source-file"><em>compiler.c</em><br>
in <em>errorAt</em>()</div>
<pre class="insert"> <span class="k">if</span> (<span class="i">parser</span>.<span class="i">panicMode</span>) <span class="k">return</span>;
</pre><pre class="insert-after"> parser.panicMode = true;
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in <em>errorAt</em>()</div>
<p>There’s a good chance the parser will go off in the weeds, but the user won’t
know because the errors all get swallowed. Panic mode ends when the parser
reaches a synchronization point. For Lox, we chose statement boundaries, so when
we later add those to our compiler, we’ll clear the flag there.</p>
<p>These new fields need to be initialized.</p>
<div class="codehilite"><pre class="insert-before"> initScanner(source);
</pre><div class="source-file"><em>compiler.c</em><br>
in <em>compile</em>()</div>
<pre class="insert">
<span class="i">parser</span>.<span class="i">hadError</span> = <span class="k">false</span>;
<span class="i">parser</span>.<span class="i">panicMode</span> = <span class="k">false</span>;
</pre><pre class="insert-after"> advance();
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in <em>compile</em>()</div>
<p>And to display the errors, we need a standard header.</p>
<div class="codehilite"><pre class="insert-before">#include <stdio.h>
</pre><div class="source-file"><em>compiler.c</em></div>
<pre class="insert"><span class="a">#include <stdlib.h></span>
</pre><pre class="insert-after">
#include "common.h"
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em></div>
<p>There’s one last parsing function, another old friend from jlox.</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>advance</em>()</div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">consume</span>(<span class="t">TokenType</span> <span class="i">type</span>, <span class="k">const</span> <span class="t">char</span>* <span class="i">message</span>) {
<span class="k">if</span> (<span class="i">parser</span>.<span class="i">current</span>.<span class="i">type</span> == <span class="i">type</span>) {
<span class="i">advance</span>();
<span class="k">return</span>;
}
<span class="i">errorAtCurrent</span>(<span class="i">message</span>);
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>advance</em>()</div>
<p>It’s similar to <code>advance()</code> in that it reads the next token. But it also
validates that the token has an expected type. If not, it reports an error. This
function is the foundation of most syntax errors in the compiler.</p>
<p>OK, that’s enough on the front end for now.</p>
<h2><a href="#emitting-bytecode" id="emitting-bytecode"><small>17 . 3</small>Emitting Bytecode</a></h2>
<p>After we parse and understand a piece of the user’s program, the next step is to
translate that to a series of bytecode instructions. It starts with the easiest
possible step: appending a single byte to the chunk.</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>consume</em>()</div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">emitByte</span>(<span class="t">uint8_t</span> <span class="i">byte</span>) {
<span class="i">writeChunk</span>(<span class="i">currentChunk</span>(), <span class="i">byte</span>, <span class="i">parser</span>.<span class="i">previous</span>.<span class="i">line</span>);
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>consume</em>()</div>
<p>It’s hard to believe great things will flow through such a simple function. It
writes the given byte, which may be an opcode or an operand to an instruction.
It sends in the previous token’s line information so that runtime errors are
associated with that line.</p>
<p>The chunk that we’re writing gets passed into <code>compile()</code>, but it needs to make
its way to <code>emitByte()</code>. To do that, we rely on this intermediary function:</p>
<div class="codehilite"><pre class="insert-before">Parser parser;
</pre><div class="source-file"><em>compiler.c</em><br>
add after variable <em>parser</em></div>
<pre class="insert"><span class="t">Chunk</span>* <span class="i">compilingChunk</span>;
<span class="k">static</span> <span class="t">Chunk</span>* <span class="i">currentChunk</span>() {
<span class="k">return</span> <span class="i">compilingChunk</span>;
}
</pre><pre class="insert-after">static void errorAt(Token* token, const char* message) {
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after variable <em>parser</em></div>
<p>Right now, the chunk pointer is stored in a module-level variable like we store
other global state. Later, when we start compiling user-defined functions, the
notion of “current chunk” gets more complicated. To avoid having to go back and
change a lot of code, I encapsulate that logic in the <code>currentChunk()</code> function.</p>
<p>We initialize this new module variable before we write any bytecode:</p>
<div class="codehilite"><pre class="insert-before">bool compile(const char* source, Chunk* chunk) {
initScanner(source);
</pre><div class="source-file"><em>compiler.c</em><br>
in <em>compile</em>()</div>
<pre class="insert"> <span class="i">compilingChunk</span> = <span class="i">chunk</span>;
</pre><pre class="insert-after">
parser.hadError = false;
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in <em>compile</em>()</div>
<p>Then, at the very end, when we’re done compiling the chunk, we wrap things up.</p>
<div class="codehilite"><pre class="insert-before"> consume(TOKEN_EOF, "Expect end of expression.");
</pre><div class="source-file"><em>compiler.c</em><br>
in <em>compile</em>()</div>
<pre class="insert"> <span class="i">endCompiler</span>();
</pre><pre class="insert-after"> return !parser.hadError;
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in <em>compile</em>()</div>
<p>That calls this:</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>emitByte</em>()</div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">endCompiler</span>() {
<span class="i">emitReturn</span>();
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>emitByte</em>()</div>
<p>In this chapter, our VM deals only with expressions. When you run clox, it will
parse, compile, and execute a single expression, then print the result. To print
that value, we are temporarily using the <code>OP_RETURN</code> instruction. So we have the
compiler add one of those to the end of the chunk.</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>emitByte</em>()</div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">emitReturn</span>() {
<span class="i">emitByte</span>(<span class="a">OP_RETURN</span>);
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>emitByte</em>()</div>
<p>While we’re here in the back end we may as well make our lives easier.</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>emitByte</em>()</div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">emitBytes</span>(<span class="t">uint8_t</span> <span class="i">byte1</span>, <span class="t">uint8_t</span> <span class="i">byte2</span>) {
<span class="i">emitByte</span>(<span class="i">byte1</span>);
<span class="i">emitByte</span>(<span class="i">byte2</span>);
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>emitByte</em>()</div>
<p>Over time, we’ll have enough cases where we need to write an opcode followed by
a one-byte operand that it’s worth defining this convenience function.</p>
<h2><a href="#parsing-prefix-expressions" id="parsing-prefix-expressions"><small>17 . 4</small>Parsing Prefix Expressions</a></h2>
<p>We’ve assembled our parsing and code generation utility functions. The missing
piece is the code in the middle that connects those together.</p><img src="image/compiling-expressions/mystery.png" alt="Parsing functions on the left, bytecode emitting functions on the right. What goes in the middle?" />
<p>The only step in <code>compile()</code> that we have left to implement is this function:</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>endCompiler</em>()</div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">expression</span>() {
<span class="c">// What goes here?</span>
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>endCompiler</em>()</div>
<p>We aren’t ready to implement every kind of expression in Lox yet. Heck, we don’t
even have Booleans. For this chapter, we’re only going to worry about four:</p>
<ul>
<li>Number literals: <code>123</code></li>
<li>Parentheses for grouping: <code>(123)</code></li>
<li>Unary negation: <code>-123</code></li>
<li>The Four Horsemen of the Arithmetic: <code>+</code>, <code>-</code>, <code>*</code>, <code>/</code></li>
</ul>
<p>As we work through the functions to compile each of those kinds of expressions,
we’ll also assemble the requirements for the table-driven parser that calls
them.</p>
<h3><a href="#parsers-for-tokens" id="parsers-for-tokens"><small>17 . 4 . 1</small>Parsers for tokens</a></h3>
<p>For now, let’s focus on the Lox expressions that are each only a single token.
In this chapter, that’s just number literals, but there will be more later. Here’s
how we can compile them:</p>
<p>We map each token type to a different kind of expression. We define a function
for each expression that outputs the appropriate bytecode. Then we build an
array of function pointers. The indexes in the array correspond to the
<code>TokenType</code> enum values, and the function at each index is the code to compile
an expression of that token type.</p>
<p>To compile number literals, we store a pointer to the following function at the
<code>TOKEN_NUMBER</code> index in the array.</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>endCompiler</em>()</div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">number</span>() {
<span class="t">double</span> <span class="i">value</span> = <span class="i">strtod</span>(<span class="i">parser</span>.<span class="i">previous</span>.<span class="i">start</span>, <span class="a">NULL</span>);
<span class="i">emitConstant</span>(<span class="i">value</span>);
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>endCompiler</em>()</div>
<p>We assume the token for the number literal has already been consumed and is
stored in <code>previous</code>. We take that lexeme and use the C standard library to
convert it to a double value. Then we generate the code to load that value using
this function:</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>emitReturn</em>()</div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">emitConstant</span>(<span class="t">Value</span> <span class="i">value</span>) {
<span class="i">emitBytes</span>(<span class="a">OP_CONSTANT</span>, <span class="i">makeConstant</span>(<span class="i">value</span>));
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>emitReturn</em>()</div>
<p>First, we add the value to the constant table, then we emit an <code>OP_CONSTANT</code>
instruction that pushes it onto the stack at runtime. To insert an entry in the
constant table, we rely on:</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>emitReturn</em>()</div>
<pre><span class="k">static</span> <span class="t">uint8_t</span> <span class="i">makeConstant</span>(<span class="t">Value</span> <span class="i">value</span>) {
<span class="t">int</span> <span class="i">constant</span> = <span class="i">addConstant</span>(<span class="i">currentChunk</span>(), <span class="i">value</span>);
<span class="k">if</span> (<span class="i">constant</span> > <span class="a">UINT8_MAX</span>) {
<span class="i">error</span>(<span class="s">"Too many constants in one chunk."</span>);
<span class="k">return</span> <span class="n">0</span>;
}
<span class="k">return</span> (<span class="t">uint8_t</span>)<span class="i">constant</span>;
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>emitReturn</em>()</div>
<p>Most of the work happens in <code>addConstant()</code>, which we defined back in an
<a href="chunks-of-bytecode.html">earlier chapter</a>. That adds the given value to the end of the chunk’s
constant table and returns its index. The new function’s job is mostly to make
sure we don’t have too many constants. Since the <code>OP_CONSTANT</code> instruction uses
a single byte for the index operand, we can store and load only up to <span
name="256">256</span> constants in a chunk.</p>
<aside name="256">
<p>Yes, that limit is pretty low. If this were a full-sized language
implementation, we’d want to add another instruction like <code>OP_CONSTANT_16</code> that
stores the index as a two-byte operand so we could handle more constants when
needed.</p>
<p>The code to support that isn’t particularly illuminating, so I omitted it from
clox, but you’ll want your VMs to scale to larger programs.</p>
</aside>
<p>That’s basically all it takes. Provided there is some suitable code that
consumes a <code>TOKEN_NUMBER</code> token, looks up <code>number()</code> in the function pointer
array, and then calls it, we can now compile number literals to bytecode.</p>
<h3><a href="#parentheses-for-grouping" id="parentheses-for-grouping"><small>17 . 4 . 2</small>Parentheses for grouping</a></h3>
<p>Our as-yet-imaginary array of parsing function pointers would be great if every
expression was only a single token long. Alas, most are longer. However, many
expressions <em>start</em> with a particular token. We call these <em>prefix</em> expressions.
For example, when we’re parsing an expression and the current token is <code>(</code>, we
know we must be looking at a parenthesized grouping expression.</p>
<p>It turns out our function pointer array handles those too. The parsing function
for an expression type can consume any additional tokens that it wants to, just
like in a regular recursive descent parser. Here’s how parentheses work:</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>endCompiler</em>()</div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">grouping</span>() {
<span class="i">expression</span>();
<span class="i">consume</span>(<span class="a">TOKEN_RIGHT_PAREN</span>, <span class="s">"Expect ')' after expression."</span>);
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>endCompiler</em>()</div>
<p>Again, we assume the initial <code>(</code> has already been consumed. We <span
name="recursive">recursively</span> call back into <code>expression()</code> to compile the
expression between the parentheses, then parse the closing <code>)</code> at the end.</p>
<aside name="recursive">
<p>A Pratt parser isn’t a recursive <em>descent</em> parser, but it’s still recursive.
That’s to be expected since the grammar itself is recursive.</p>
</aside>
<p>As far as the back end is concerned, there’s literally nothing to a grouping
expression. Its sole function is syntactic<span class="em">—</span>it lets you insert a
lower-precedence expression where a higher precedence is expected. Thus, it has
no runtime semantics on its own and therefore doesn’t emit any bytecode. The
inner call to <code>expression()</code> takes care of generating bytecode for the
expression inside the parentheses.</p>
<h3><a href="#unary-negation" id="unary-negation"><small>17 . 4 . 3</small>Unary negation</a></h3>
<p>Unary minus is also a prefix expression, so it works with our model too.</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>number</em>()</div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">unary</span>() {
<span class="t">TokenType</span> <span class="i">operatorType</span> = <span class="i">parser</span>.<span class="i">previous</span>.<span class="i">type</span>;
<span class="c">// Compile the operand.</span>
<span class="i">expression</span>();
<span class="c">// Emit the operator instruction.</span>
<span class="k">switch</span> (<span class="i">operatorType</span>) {
<span class="k">case</span> <span class="a">TOKEN_MINUS</span>: <span class="i">emitByte</span>(<span class="a">OP_NEGATE</span>); <span class="k">break</span>;
<span class="k">default</span>: <span class="k">return</span>; <span class="c">// Unreachable.</span>
}
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>number</em>()</div>
<p>The leading <code>-</code> token has been consumed and is sitting in <code>parser.previous</code>. We
grab the token type from that to note which unary operator we’re dealing with.
It’s unnecessary right now, but this will make more sense when we use this same
function to compile the <code>!</code> operator in <a href="types-of-values.html">the next chapter</a>.</p>
<p>As in <code>grouping()</code>, we recursively call <code>expression()</code> to compile the operand.
After that, we emit the bytecode to perform the negation. It might seem a little
weird to write the negate instruction <em>after</em> its operand’s bytecode since the
<code>-</code> appears on the left, but think about it in terms of order of execution:</p>
<ol>
<li>
<p>We evaluate the operand first which leaves its value on the stack.</p>
</li>
<li>
<p>Then we pop that value, negate it, and push the result.</p>
</li>
</ol>
<p>So the <code>OP_NEGATE</code> instruction should be emitted <span name="line">last</span>.
This is part of the compiler’s job<span class="em">—</span>parsing the program in the order it
appears in the source code and rearranging it into the order that execution
happens.</p>
<aside name="line">
<p>Emitting the <code>OP_NEGATE</code> instruction after the operands does mean that the
current token when the bytecode is written is <em>not</em> the <code>-</code> token. That mostly
doesn’t matter, except that we use that token for the line number to associate
with that instruction.</p>
<p>This means if you have a multi-line negation expression, like:</p>
<div class="codehilite"><pre><span class="k">print</span> -
<span class="k">true</span>;
</pre></div>
<p>Then the runtime error will be reported on the wrong line. Here, it would show
the error on line 2, even though the <code>-</code> is on line 1. A more robust approach
would be to store the token’s line before compiling the operand and then pass
that into <code>emitByte()</code>, but I wanted to keep things simple for the book.</p>
</aside>
<p>There is one problem with this code, though. The <code>expression()</code> function it
calls will parse any expression for the operand, regardless of precedence. Once
we add binary operators and other syntax, that will do the wrong thing.
Consider:</p>
<div class="codehilite"><pre>-<span class="i">a</span>.<span class="i">b</span> + <span class="i">c</span>;
</pre></div>
<p>Here, the operand to <code>-</code> should be just the <code>a.b</code> expression, not the entire
<code>a.b + c</code>. But if <code>unary()</code> calls <code>expression()</code>, the latter will happily chew
through all of the remaining code including the <code>+</code>. It will erroneously treat
the <code>-</code> as lower precedence than the <code>+</code>.</p>
<p>When parsing the operand to unary <code>-</code>, we need to compile only expressions at a
certain precedence level or higher. In jlox’s recursive descent parser we
accomplished that by calling into the parsing method for the lowest-precedence
expression we wanted to allow (in this case, <code>call()</code>). Each method for parsing
a specific expression also parsed any expressions of higher precedence too, so
that included the rest of the precedence table.</p>
<p>The parsing functions like <code>number()</code> and <code>unary()</code> here in clox are different.
Each only parses exactly one type of expression. They don’t cascade to include
higher-precedence expression types too. We need a different solution, and it
looks like this:</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>unary</em>()</div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">parsePrecedence</span>(<span class="t">Precedence</span> <span class="i">precedence</span>) {
<span class="c">// What goes here?</span>
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>unary</em>()</div>
<p>This function<span class="em">—</span>once we implement it<span class="em">—</span>starts at the current token and parses
any expression at the given precedence level or higher. We have some other setup
to get through before we can write the body of this function, but you can
probably guess that it will use that table of parsing function pointers I’ve
been talking about. For now, don’t worry too much about how it works. In order
to take the “precedence” as a parameter, we define it numerically.</p>
<div class="codehilite"><pre class="insert-before">} Parser;
</pre><div class="source-file"><em>compiler.c</em><br>
add after struct <em>Parser</em></div>
<pre class="insert">
<span class="k">typedef</span> <span class="k">enum</span> {
<span class="a">PREC_NONE</span>,
<span class="a">PREC_ASSIGNMENT</span>, <span class="c">// =</span>
<span class="a">PREC_OR</span>, <span class="c">// or</span>
<span class="a">PREC_AND</span>, <span class="c">// and</span>
<span class="a">PREC_EQUALITY</span>, <span class="c">// == !=</span>
<span class="a">PREC_COMPARISON</span>, <span class="c">// < > <= >=</span>
<span class="a">PREC_TERM</span>, <span class="c">// + -</span>
<span class="a">PREC_FACTOR</span>, <span class="c">// * /</span>
<span class="a">PREC_UNARY</span>, <span class="c">// ! -</span>
<span class="a">PREC_CALL</span>, <span class="c">// . ()</span>
<span class="a">PREC_PRIMARY</span>
} <span class="t">Precedence</span>;
</pre><pre class="insert-after">
Parser parser;
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after struct <em>Parser</em></div>
<p>These are all of Lox’s precedence levels in order from lowest to highest. Since
C implicitly gives successively larger numbers for enums, this means that
<code>PREC_CALL</code> is numerically larger than <code>PREC_UNARY</code>. For example, say the
compiler is sitting on a chunk of code like:</p>
<div class="codehilite"><pre>-<span class="i">a</span>.<span class="i">b</span> + <span class="i">c</span>
</pre></div>
<p>If we call <code>parsePrecedence(PREC_ASSIGNMENT)</code>, then it will parse the entire
expression because <code>+</code> has higher precedence than assignment. If instead we
call <code>parsePrecedence(PREC_UNARY)</code>, it will compile the <code>-a.b</code> and stop there.
It doesn’t keep going through the <code>+</code> because the addition has lower precedence
than unary operators.</p>
<p>With this function in hand, it’s a snap to fill in the missing body for
<code>expression()</code>.</p>
<div class="codehilite"><pre class="insert-before">static void expression() {
</pre><div class="source-file"><em>compiler.c</em><br>
in <em>expression</em>()<br>
replace 1 line</div>
<pre class="insert"> <span class="i">parsePrecedence</span>(<span class="a">PREC_ASSIGNMENT</span>);
</pre><pre class="insert-after">}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in <em>expression</em>(), replace 1 line</div>
<p>We simply parse the lowest precedence level, which subsumes all of the
higher-precedence expressions too. Now, to compile the operand for a unary
expression, we call this new function and limit it to the appropriate level:</p>
<div class="codehilite"><pre class="insert-before"> // Compile the operand.
</pre><div class="source-file"><em>compiler.c</em><br>
in <em>unary</em>()<br>
replace 1 line</div>
<pre class="insert"> <span class="i">parsePrecedence</span>(<span class="a">PREC_UNARY</span>);
</pre><pre class="insert-after">
// Emit the operator instruction.
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, in <em>unary</em>(), replace 1 line</div>
<p>We use the unary operator’s own <code>PREC_UNARY</code> precedence to permit <span
name="useful">nested</span> unary expressions like <code>!!doubleNegative</code>. Since
unary operators have pretty high precedence, that correctly excludes things like
binary operators. Speaking of which<span class="ellipse"> . . . </span></p>
<aside name="useful">
<p>Not that nesting unary expressions is particularly useful in Lox. But other
languages let you do it, so we do too.</p>
</aside>
<h2><a href="#parsing-infix-expressions" id="parsing-infix-expressions"><small>17 . 5</small>Parsing Infix Expressions</a></h2>
<p>Binary operators are different from the previous expressions because they are
<em>infix</em>. With the other expressions, we know what we are parsing from the very
first token. With infix expressions, we don’t know we’re in the middle of a
binary operator until <em>after</em> we’ve parsed its left operand and then stumbled
onto the operator token in the middle.</p>
<p>Here’s an example:</p>
<div class="codehilite"><pre><span class="n">1</span> + <span class="n">2</span>
</pre></div>
<p>Let’s walk through trying to compile it with what we know so far:</p>
<ol>
<li>
<p>We call <code>expression()</code>. That in turn calls
<code>parsePrecedence(PREC_ASSIGNMENT)</code>.</p>
</li>
<li>
<p>That function (once we implement it) sees the leading number token and
recognizes it is parsing a number literal. It hands off control to
<code>number()</code>.</p>
</li>
<li>
<p><code>number()</code> creates a constant, emits an <code>OP_CONSTANT</code>, and returns back to
<code>parsePrecedence()</code>.</p>
</li>
</ol>
<p>Now what? The call to <code>parsePrecedence()</code> should consume the entire addition
expression, so it needs to keep going somehow. Fortunately, the parser is right
where we need it to be. Now that we’ve compiled the leading number expression,
the next token is <code>+</code>. That’s the exact token that <code>parsePrecedence()</code> needs to
detect that we’re in the middle of an infix expression and to realize that the
expression we already compiled is actually an operand to that.</p>
<p>So this hypothetical array of function pointers doesn’t just list functions to
parse expressions that start with a given token. Instead, it’s a <em>table</em> of
function pointers. One column associates prefix parser functions with token
types. The second column associates infix parser functions with token types.</p>
<p>The function we will use as the infix parser for <code>TOKEN_PLUS</code>, <code>TOKEN_MINUS</code>,
<code>TOKEN_STAR</code>, and <code>TOKEN_SLASH</code> is this:</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>endCompiler</em>()</div>
<pre><span class="k">static</span> <span class="t">void</span> <span class="i">binary</span>() {
<span class="t">TokenType</span> <span class="i">operatorType</span> = <span class="i">parser</span>.<span class="i">previous</span>.<span class="i">type</span>;
<span class="t">ParseRule</span>* <span class="i">rule</span> = <span class="i">getRule</span>(<span class="i">operatorType</span>);
<span class="i">parsePrecedence</span>((<span class="t">Precedence</span>)(<span class="i">rule</span>-><span class="i">precedence</span> + <span class="n">1</span>));
<span class="k">switch</span> (<span class="i">operatorType</span>) {
<span class="k">case</span> <span class="a">TOKEN_PLUS</span>: <span class="i">emitByte</span>(<span class="a">OP_ADD</span>); <span class="k">break</span>;
<span class="k">case</span> <span class="a">TOKEN_MINUS</span>: <span class="i">emitByte</span>(<span class="a">OP_SUBTRACT</span>); <span class="k">break</span>;
<span class="k">case</span> <span class="a">TOKEN_STAR</span>: <span class="i">emitByte</span>(<span class="a">OP_MULTIPLY</span>); <span class="k">break</span>;
<span class="k">case</span> <span class="a">TOKEN_SLASH</span>: <span class="i">emitByte</span>(<span class="a">OP_DIVIDE</span>); <span class="k">break</span>;
<span class="k">default</span>: <span class="k">return</span>; <span class="c">// Unreachable.</span>
}
}
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after <em>endCompiler</em>()</div>
<p>When a prefix parser function is called, the leading token has already been
consumed. An infix parser function is even more <em>in medias res</em><span class="em">—</span>the entire
left-hand operand expression has already been compiled and the subsequent infix
operator consumed.</p>
<p>The fact that the left operand gets compiled first works out fine. It means at
runtime, that code gets executed first. When it runs, the value it produces will
end up on the stack. That’s right where the infix operator is going to need it.</p>
<p>Then we come here to <code>binary()</code> to handle the rest of the arithmetic operators.
This function compiles the right operand, much like how <code>unary()</code> compiles its
own trailing operand. Finally, it emits the bytecode instruction that performs
the binary operation.</p>
<p>When run, the VM will execute the left and right operand code, in that order,
leaving their values on the stack. Then it executes the instruction for the
operator. That pops the two values, computes the operation, and pushes the
result.</p>
<p>The code that probably caught your eye here is that <code>getRule()</code> line. When we
parse the right-hand operand, we again need to worry about precedence. Take an
expression like:</p>
<div class="codehilite"><pre><span class="n">2</span> * <span class="n">3</span> + <span class="n">4</span>
</pre></div>
<p>When we parse the right operand of the <code>*</code> expression, we need to just capture
<code>3</code>, and not <code>3 + 4</code>, because <code>+</code> is lower precedence than <code>*</code>. We could define
a separate function for each binary operator. Each would call
<code>parsePrecedence()</code> and pass in the correct precedence level for its operand.</p>
<p>But that’s kind of tedious. Each binary operator’s right-hand operand precedence
is one level <span name="higher">higher</span> than its own. We can look that up
dynamically with this <code>getRule()</code> thing we’ll get to soon. Using that, we call
<code>parsePrecedence()</code> with one level higher than this operator’s level.</p>
<aside name="higher">
<p>We use one <em>higher</em> level of precedence for the right operand because the binary
operators are left-associative. Given a series of the <em>same</em> operator, like:</p>
<div class="codehilite"><pre><span class="n">1</span> + <span class="n">2</span> + <span class="n">3</span> + <span class="n">4</span>
</pre></div>
<p>We want to parse it like:</p>
<div class="codehilite"><pre>((<span class="n">1</span> + <span class="n">2</span>) + <span class="n">3</span>) + <span class="n">4</span>
</pre></div>
<p>Thus, when parsing the right-hand operand to the first <code>+</code>, we want to consume
the <code>2</code>, but not the rest, so we use one level above <code>+</code>’s precedence. But if
our operator was <em>right</em>-associative, this would be wrong. Given:</p>
<div class="codehilite"><pre><span class="i">a</span> = <span class="i">b</span> = <span class="i">c</span> = <span class="i">d</span>
</pre></div>
<p>Since assignment is right-associative, we want to parse it as:</p>
<div class="codehilite"><pre><span class="i">a</span> = (<span class="i">b</span> = (<span class="i">c</span> = <span class="i">d</span>))
</pre></div>
<p>To enable that, we would call <code>parsePrecedence()</code> with the <em>same</em> precedence as
the current operator.</p>
</aside>
<p>This way, we can use a single <code>binary()</code> function for all binary operators even
though they have different precedences.</p>
<h2><a href="#a-pratt-parser" id="a-pratt-parser"><small>17 . 6</small>A Pratt Parser</a></h2>
<p>We now have all of the pieces and parts of the compiler laid out. We have a
function for each grammar production: <code>number()</code>, <code>grouping()</code>, <code>unary()</code>, and
<code>binary()</code>. We still need to implement <code>parsePrecedence()</code>, and <code>getRule()</code>. We
also know we need a table that, given a token type, lets us find</p>
<ul>
<li>
<p>the function to compile a prefix expression starting with a token of that
type,</p>
</li>
<li>
<p>the function to compile an infix expression whose left operand is followed
by a token of that type, and</p>
</li>
<li>
<p>the precedence of an <span name="prefix">infix</span> expression that uses
that token as an operator.</p>
</li>
</ul>
<aside name="prefix">
<p>We don’t need to track the precedence of the <em>prefix</em> expression starting with a
given token because all prefix operators in Lox have the same precedence.</p>
</aside>
<p>We wrap these three properties in a little struct which represents a single row
in the parser table.</p>
<div class="codehilite"><pre class="insert-before">} Precedence;
</pre><div class="source-file"><em>compiler.c</em><br>
add after enum <em>Precedence</em></div>
<pre class="insert">
<span class="k">typedef</span> <span class="k">struct</span> {
<span class="t">ParseFn</span> <span class="i">prefix</span>;
<span class="t">ParseFn</span> <span class="i">infix</span>;
<span class="t">Precedence</span> <span class="i">precedence</span>;
} <span class="t">ParseRule</span>;
</pre><pre class="insert-after">
Parser parser;
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after enum <em>Precedence</em></div>
<p>That ParseFn type is a simple <span name="typedef">typedef</span> for a function
type that takes no arguments and returns nothing.</p>
<aside name="typedef" class="bottom">
<p>C’s syntax for function pointer types is so bad that I always hide it behind a
typedef. I understand the intent behind the syntax<span class="em">—</span>the whole “declaration
reflects use” thing<span class="em">—</span>but I think it was a failed syntactic experiment.</p>
</aside>
<div class="codehilite"><pre class="insert-before">} Precedence;
</pre><div class="source-file"><em>compiler.c</em><br>
add after enum <em>Precedence</em></div>
<pre class="insert">
<span class="k">typedef</span> <span class="t">void</span> (*<span class="t">ParseFn</span>)();
</pre><pre class="insert-after">
typedef struct {
</pre></div>
<div class="source-file-narrow"><em>compiler.c</em>, add after enum <em>Precedence</em></div>
<p>The table that drives our whole parser is an array of ParseRules. We’ve been
talking about it forever, and finally you get to see it.</p>
<div class="codehilite"><div class="source-file"><em>compiler.c</em><br>
add after <em>unary</em>()</div>