Language Development with Waxeye
================================
Orlando Hill
v0.8.0, August 2010
http://waxeye.org/[Waxeye Parser Generator]
Copyright (C) 2010 Orlando Hill
Permission is granted to copy, distribute and/or modify this document under the
terms of the GNU Free Documentation License, Version 1.2 or any later version
published by the Free Software Foundation; with no Invariant Sections, no
Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included
in the section entitled "GNU Free Documentation License".
== Introduction ==
As programmers, we are required to make use of data that is presented in a
variety of formats. In order to extract and manipulate the desired information,
we need the ability to navigate the structure of the language the data is
written in. Unless the language is very simple, we must use a parser that
understands the language and gives us the data in a form we can more readily
use.
Writing parsers by hand can be boring and time consuming. It is, therefore,
common to use a parser generator to do the grunt work of constructing the
parser. This is where Waxeye comes in handy.
== Getting Started ==
=== Downloading ===
You can download the current official release of Waxeye from
http://sourceforge.net/projects/waxeye[Sourceforge]. If you want to get the
very latest version of Waxeye's source, you can download it from
http://github.com/orlandodarhill/waxeye/tree/master[GitHub] using the
http://git.or.cz/[Git] version control system.
=== Requirements ===
There are no external dependencies needed to run a pre-built version of Waxeye.
If you build from source, you'll need http://download.plt-scheme.org[MzScheme].
To use a generated parser, you need a supported programming language to run it
from.
=== Installation ===
==== Unix and MacOSX ====
1. Extract the files of the distribution.
2. Copy the 'waxeye' directory to where you wish to install it.
3. Add the 'bin/waxeye' binary to your search path. For example, if you have
`\~/bin` in your PATH and installed Waxeye to '/usr/local/waxeye', you might do
the following.
-------------------------------------------------------------------------------
ln -s /usr/local/waxeye/bin/waxeye ~/bin/
-------------------------------------------------------------------------------
==== Windows ====
1. Extract the files of the distribution.
2. Copy the 'waxeye' directory to where you wish to install it.
=== Running ===
Waxeye is currently run from the command line, either directly as a tool or as
part of a script or build system. There are plans to develop a graphical tool at
a later stage.
==== Unix and MacOSX ====
Run Waxeye by executing the `waxeye` binary.
==== Windows ====
Use a command prompt to run `waxeye.exe`.
== Basic Concepts ==
=== What is a parser? ===
When we want to understand data that has been written in a language of interest
('L'), we need to break our data into units of the language. This process of
breaking our input into different parts, based on the structure of 'L', is
called 'parsing'. A program used for parsing is called a 'parser'.
=== What is the result of a parser? ===
Once your input has been parsed, you need the result to be presented in a form
that is easy to understand and manipulate. Since the input was organized
according to the hierarchical structure of the language, it makes sense for the
output of the parser to mirror this structure. A tree is the natural form for
this.
Such a tree is known as an Abstract Syntax Tree (AST). A Waxeye parser will
automatically give you an AST that represents your input. The structure of this
AST is based on the structure of your language's grammar.
=== What is a parser generator? ===
If 'L' is simple, it is easy to write a parser for 'L' by hand in our
programming language of choice. However, as the structural complexity of 'L'
increases, so too does the size and complexity of the parser program. Writing
and maintaining a large parser by hand can quickly become a tedious and
laborious job. Thankfully, we can use a parser generator to automate the work of
creating a parser so we can focus on other problems.
A parser generator is a tool designed to help software developers automate the
process of creating a parser. Just like compilers and assemblers, a parser
generator takes a description of a program, automatically does the boring work
for you and gives you a transformed program as output. Each tool accepts input
in one language ('L1'), performs various transformations and creates output in
another language ('L2').
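-------------------------------------------------------------------------------
L1 --> Compiler --> L2
L1 --> Assembler --> L2
L1 --> Parser Generator --> L2
-------------------------------------------------------------------------------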
The key difference between the three tools is the level of abstraction held by
the input and output languages. The assembler works at the lowest level by
taking assembly files and producing machine code. The compiler works above the
assembler by taking a more abstract programming language and generating
assembly files or machine code directly. Finally, the parser generator has the
highest level of abstraction and transforms a 'grammar file' into programming
language source code for a compiler to process.
=== What is a grammar file? ===
We can define a language as the set of strings it contains. While it is
sometimes possible to specify a language simply by enumerating all of its
strings, such an approach has significant drawbacks. Trying to write each
string in our language could be very time consuming and, potentially, take
forever.
Suppose we need to read time information as part of a larger program. In a
trivial case, the time information may be presented as two digits for the
hours, a colon `:`, and then two digits for the minutes.
-------------------------------------------------------------------------------
00:00, 00:01, 00:02, ... 14:23, 14:24, 14:25, ... 23:57, 23:58, 23:59
-------------------------------------------------------------------------------
We could describe our time language this way, but writing out all 1,440
possible hour/minute combinations wouldn't be much fun, and things would get
far worse if we extended the language to include date information.
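For a language this small, describing the rules directly is far shorter than
listing the strings. The following Python sketch, using the hypothetical names
`is_time` and `ALL_TIMES`, checks a string against the rules themselves (two
hour digits up to 23, a colon, two minute digits up to 59) and generates the
full list only to confirm that it holds 24 times 60 = 1,440 strings.

```python
# Every string in the time language, generated rather than typed out by hand.
ALL_TIMES = ['%02d:%02d' % (h, m) for h in range(24) for m in range(60)]

DIGITS = '0123456789'


def is_time(s):
    """Return True if s is HH:MM with 00 <= HH <= 23 and 00 <= MM <= 59."""
    if len(s) != 5 or s[2] != ':':
        return False
    hours, minutes = s[:2], s[3:]
    # Only ASCII digits are allowed in the hour and minute positions.
    if not all(c in DIGITS for c in hours + minutes):
        return False
    return int(hours) < 24 and int(minutes) < 60
```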
As another example, consider the language that consists of all strings of one
or more alphabetic characters.
-------------------------------------------------------------------------------
a, b, c, ... z, aa, ab, ac, ... az, aaa, aab, aac, ...
-------------------------------------------------------------------------------
Even worse than our time example, this language is infinite. It would be
impossible for us to explicitly list every string in the language.
If we want to describe such languages, we need a notation that is more abstract
than simply writing out strings. We call this notation a 'grammar' and the file
that contains it a 'grammar file'.
== Waxeye Grammars ==
To generate a parser for a language, you must supply the parser generator with
a grammar file that describes the language. Waxeye grammar files are written as
text documents and are, by convention, given the `.waxeye` file extension.
A Waxeye grammar consists of a set of rule definitions, called 'non-terminals'.
Together, the non-terminals succinctly describe the syntax of the language. By
default, the first non-terminal is considered the starting point of the
language definition.
=== Non-terminals ===
Non-terminals are defined in three parts: a name, a rule type and one or more
grammar expressions.
The most common non-terminal type is the tree constructing non-terminal. A tree
constructing non-terminal has the following form:
*******************************************************************************
'Name' `\<-` '+expressions'
*******************************************************************************
Where 'Name' matches `[a-zA-Z] *[a-zA-Z0-9_-]`; that is, a letter followed by
zero or more letters, digits, underscores or hyphens.
[source,waxeye]
.A tree constructing non-terminal
-------------------------------------------------------------------------------
Example <- A | B
-------------------------------------------------------------------------------
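The name pattern above is itself written in Waxeye notation, where `*` is a
prefix operator placed before the character class it repeats. For comparison,
here is a sketch of the same rule as a Python regular expression, where the
repetition is written after the class instead. `is_valid_name` is a
hypothetical helper for illustration, not part of Waxeye.

```python
import re

# Waxeye's  [a-zA-Z] *[a-zA-Z0-9_-]  as a Python regular expression:
# one ASCII letter, then zero or more letters, digits, '_' or '-'.
NAME_PATTERN = re.compile(r'[a-zA-Z][a-zA-Z0-9_-]*\Z')


def is_valid_name(name):
    """Return True if name fits the non-terminal name pattern."""
    # re.match anchors at the start; \Z in the pattern anchors at the end.
    return NAME_PATTERN.match(name) is not None
```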
The other common non-terminal type is the void non-terminal. The result of a
void non-terminal is not included in the AST that is constructed by the parser.
To define a void non-terminal, use this form:
*******************************************************************************
'Name' `<:` '+expressions'
*******************************************************************************
[source,waxeye]
.A void non-terminal
-------------------------------------------------------------------------------
Example <: A | B
-------------------------------------------------------------------------------
=== Expressions ===
The most important part of each non-terminal definition is the set of
expressions it contains. Grammar expressions come in different forms and have
their own meanings. Places where an expression can be contained within another
expression are denoted with an 'e'.
==== Atomic Expressions ====
===== Wildcard =====
`.`
Matches any character from the input.
===== Literal =====
`\'text\'`
Matches `text` in the input.
===== Case-insensitive Literal =====
`"text"`
Matches `text` in the input while ignoring case. This is equivalent to the
expression `[tT][eE][xX][tT]` but is much more readable.
===== Character Class =====
`[a-z_-]`
Character class that matches either a lower-case English letter, `_` or `-`.
===== Non-terminal =====
`NT`
References the non-terminal named `NT`.
===== Parentheses =====
`(`'e'`)`
Raises the precedence of the expression 'e'.
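As a rough illustration of what each atomic expression accepts, here is a
minimal Python sketch. It assumes a convention of its own: each matcher takes
the input text and a position and returns the position just past its match, or
`None` on failure. The helper names (`wildcard`, `literal`, `ci_literal`,
`char_class`, `nonterminal`) are hypothetical; this is neither Waxeye's runtime
API nor what a generated parser looks like, and it builds no AST.

```python
def wildcard():
    # .  : any single character
    def match(text, pos):
        return pos + 1 if pos < len(text) else None
    return match


def literal(s):
    # 'text'  : exact, case-sensitive match
    def match(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return match


def ci_literal(s):
    # "text"  : same length as s, compared without regard to case
    def match(text, pos):
        chunk = text[pos:pos + len(s)]
        return pos + len(s) if chunk.lower() == s.lower() else None
    return match


def char_class(ranges, singles=''):
    # [a-z_-]  : ranges as (low, high) pairs plus individual characters
    def match(text, pos):
        if pos >= len(text):
            return None
        c = text[pos]
        if c in singles or any(lo <= c <= hi for lo, hi in ranges):
            return pos + 1
        return None
    return match


def nonterminal(rules, name):
    # NT  : look the rule up when matching, so rules can refer to each other
    def match(text, pos):
        return rules[name](text, pos)
    return match
```

For example, `char_class([('a', 'z')], '_-')` plays the role of `[a-z_-]`.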
///////////////////////////////////////////////////////////////////////////////
===== Context Actions =====
`@action<a, b>`
References the context-action `action` and gives the action the data held by
the labels `a` and `b`. These are used for context-sensitive parsing. Not fully
implemented yet.
///////////////////////////////////////////////////////////////////////////////
==== Prefix Expressions ====
===== Void =====
`:`'e'
Doesn't include the result of 'e' when building the AST.
===== Closure =====
`*`'e'
Puts 'e' within a closure, matching it zero or more times.
===== Plus =====
`\+`'e'
Puts 'e' within a plus-closure, matching it one or more times.
===== Optional =====
`?`'e'
Puts 'e' within an optional, matching it zero or one times.
===== Negative Check =====
`!`'e'
Checks that 'e' fails to match at the current position, without consuming any
input.
===== Positive Check =====
`&`'e'
Checks that 'e' succeeds at the current position, without consuming any input.
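The prefix expressions can be sketched as Python functions that wrap another
matcher, where a matcher takes the text and a position and returns the position
after its match or `None`. This is a hypothetical illustration assuming
PEG-style behaviour, in which the checks never consume input; void (`:`) is
left out because it only affects which results reach the AST, not what is
matched. A tiny `literal` helper is included so the block runs on its own.

```python
def literal(s):
    def match(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return match


def closure(e):
    # *e : match e zero or more times; always succeeds
    def match(text, pos):
        while True:
            nxt = e(text, pos)
            if nxt is None or nxt == pos:  # stop on failure or an empty match
                return pos
            pos = nxt
    return match


def plus(e):
    # +e : like *e, but e must match at least once
    def match(text, pos):
        first = e(text, pos)
        return None if first is None else closure(e)(text, first)
    return match


def optional(e):
    # ?e : use e's match if there is one, otherwise succeed consuming nothing
    def match(text, pos):
        nxt = e(text, pos)
        return pos if nxt is None else nxt
    return match


def neg_check(e):
    # !e : succeed without consuming input only if e fails here
    def match(text, pos):
        return pos if e(text, pos) is None else None
    return match


def pos_check(e):
    # &e : succeed without consuming input only if e succeeds here
    def match(text, pos):
        return None if e(text, pos) is None else pos
    return match
```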
///////////////////////////////////////////////////////////////////////////////
===== Labels =====
`a=`'e'
Labels the expression 'e' with the label `a`. Not fully implemented yet.
///////////////////////////////////////////////////////////////////////////////
==== Sequence Expressions ====
'e1 e2'
Matches 'e1' and 'e2' in sequence.
==== Alternation Expressions ====
'e1'`|`'e2'
Tries to match 'e1' and, if that fails, tries to match 'e2'.
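Because alternation commits to the first choice that succeeds, the order of the
choices matters. The sketch below uses a hypothetical matcher convention (text
and position in, end position or `None` out) to show that `'a' | 'ab'` stops
after `a` when given the input `ab`, whereas `'ab' | 'a'` consumes both
characters. `seq`, `alt` and `parses_completely` are helpers of this sketch,
not Waxeye functions.

```python
def literal(s):
    def match(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return match


def seq(*es):
    # e1 e2 ... : each expression must match, one after the other
    def match(text, pos):
        for e in es:
            pos = e(text, pos)
            if pos is None:
                return None
        return pos
    return match


def alt(*es):
    # e1 | e2 | ... : try each choice in order and keep the first success
    def match(text, pos):
        for e in es:
            end = e(text, pos)
            if end is not None:
                return end
        return None
    return match


def parses_completely(e, text):
    """True if e matches the whole of text starting at position 0."""
    return e(text, 0) == len(text)
```

With these, `alt(literal('a'), literal('ab'))` applied to `ab` ends at position
1, so `parses_completely` reports `False`; reversing the choices fixes it.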
=== Precedence ===
In Waxeye grammars, some expressions can have other expressions nested within
them. When we use parentheses, we are explicitly denoting the nesting structure
of the expressions.
[source,waxeye]
-------------------------------------------------------------------------------
((?A) B) | C
-------------------------------------------------------------------------------
At times, this can seem needlessly verbose. In many cases, we are able to omit
the parentheses in favor of a shorter notation. We do this by exploiting the
precedence of each expression type.
[source,waxeye]
-------------------------------------------------------------------------------
?A B | C
-------------------------------------------------------------------------------
The precedence of an expression determines the priority it has when resolving
implicitly nested expressions. Each expression type has a level of precedence
relative to all other types. There are four different precedence levels in
Waxeye grammars.
==== Level 4 ====
The highest precedence is held by the atomic expressions. Because these
expressions cannot, themselves, contain expressions (parentheses aside, whose
contents are grouped explicitly), there is no need to consider which
expressions are nested within them.
==== Level 3 ====
The prefix expressions hold the next precedence level. Their nesting is
resolved directly after the atomic expressions.
==== Level 2 ====
Sequences of expressions are formed once the atomic and prefix expressions have
been resolved.
==== Level 1 ====
Finally, once all other expressions have been resolved, the different choices of
the alternation expression are resolved.
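The four levels map naturally onto a recursive-descent parser with one function
per level, each calling the next-higher level for its operands. The following
hypothetical Python sketch only understands non-terminal names, the prefix
operators, `|` and parentheses (no literals or character classes), and its
nested-tuple output is an illustration, not Waxeye's internal representation.
It shows that `?A B | C` groups exactly like `((?A) B) | C`.

```python
import re

PREFIX_OPS = (':', '*', '+', '?', '!', '&')


def tokenize(src):
    # Only names, prefix operators, '|' and parentheses; whitespace is skipped.
    return re.findall(r"[A-Za-z][A-Za-z0-9_-]*|[:*+?!&|()]", src)


def parse_expr(src):
    """Parse e.g. '?A B | C' into nested tuples that make the grouping explicit."""
    tokens = tokenize(src)
    pos = [0]  # a list so the nested functions can update it

    def peek():
        return tokens[pos[0]] if pos[0] < len(tokens) else None

    def take():
        tok = peek()
        if tok is None:
            raise ValueError('unexpected end of expression')
        pos[0] += 1
        return tok

    def alternation():  # level 1: e1 | e2 | ...
        choices = [sequence()]
        while peek() == '|':
            take()
            choices.append(sequence())
        return choices[0] if len(choices) == 1 else ('|',) + tuple(choices)

    def sequence():  # level 2: e1 e2 ...
        items = [prefix()]
        while peek() not in (None, '|', ')'):
            items.append(prefix())
        return items[0] if len(items) == 1 else ('seq',) + tuple(items)

    def prefix():  # level 3: :e *e +e ?e !e &e
        if peek() in PREFIX_OPS:
            return (take(), prefix())
        return atom()

    def atom():  # level 4: a non-terminal name or a parenthesised expression
        tok = take()
        if tok == '(':
            inner = alternation()
            if take() != ')':
                raise ValueError("expected ')'")
            return inner
        if tok in ('|', ')'):
            raise ValueError('unexpected %r' % tok)
        return tok

    result = alternation()
    if peek() is not None:
        raise ValueError('unexpected %r' % peek())
    return result
```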
=== Pruning Non-terminals ===
Sometimes, creating a new AST node will give us more information than we need.
We might want to create a new AST node, only if doing so will tell us something
interesting about our input. If the additional node gives us nothing of
interest, our tree could be said to contain 'vertical noise'.
To make it easier to process the AST, we can remove this vertical noise by
using the 'pruning' non-terminal type. This non-terminal type has the following
form:
*******************************************************************************
'Name' `\<=` '+expressions'
*******************************************************************************
When 'Name' has successfully parsed a string, one of three things will happen,
depending on the number of results to be included from 'Name'\'s expressions.
* If there are no expression results to be included, nothing new will be added
to the AST.
* If there is one expression result to be included, that result will take the
place of the 'Name' AST node.
* Otherwise, a new 'Name' AST node will be created, just like a tree
constructing non-terminal.
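The three outcomes above, together with the tree-constructing and void rule
types, amount to one small decision made after a rule has matched. This is a
hypothetical Python sketch of that decision only: it assumes the rule's
expressions have already matched and that results marked with `:` have already
been dropped, and its `Tree` class and `finish` function are illustrations, not
Waxeye's runtime API.

```python
class Tree(object):
    """A minimal AST node: a type name plus a list of children."""

    def __init__(self, type_, children):
        self.type = type_
        self.children = children

    def __eq__(self, other):
        return (isinstance(other, Tree) and self.type == other.type
                and self.children == other.children)

    def __repr__(self):
        return 'Tree(%r, %r)' % (self.type, self.children)


def finish(name, kind, results):
    """Return what a rule contributes to its parent, as a list of AST nodes.

    kind is '<-' (tree constructing), '<:' (void) or '<=' (pruning).
    A list makes 'nothing' and 'one node' easy to splice into the parent.
    """
    if kind == '<:':
        return []                       # void: never appears in the AST
    if kind == '<=' and len(results) <= 1:
        return list(results)            # pruning: 0 results -> nothing, 1 -> that result
    return [Tree(name, list(results))]  # otherwise build a 'name' node
```

Feeding each result into the enclosing rule's `finish` call replays the
nested-parentheses example: with `<=`, the four `A` levels around `(((b)))`
each pass their single child straight through, leaving just the `B` node.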
To help understand how this works, consider an example from a simple arithmetic
grammar.
[source,waxeye]
-------------------------------------------------------------------------------
Product <- Number *([*/] Number)
Number <- +[0-9]
-------------------------------------------------------------------------------
If we use the 'Product' rule to parse the string `3*7`, we get a tree with
'Product' at the root and, below that, a 'Number', a `*` character and then
another 'Number'.
-------------------------------------------------------------------------------
Product
-> Number
    | 3
| *
-> Number
    | 7
-------------------------------------------------------------------------------
However, if the 'Product' rule parses a string with just one 'Number' in it, we
will get a tree that is slightly bigger than we need. Parsing the string `5`
produces the following tree.
-------------------------------------------------------------------------------
Product
-> Number
    | 5
-------------------------------------------------------------------------------
In this case, having a 'Product' node at the root of the AST isn't necessary.
If we want to, we can rewrite the original grammar to use a pruning
non-terminal.
[source,waxeye]
-------------------------------------------------------------------------------
Product <= Number *([*/] Number)
Number <- +[0-9]
-------------------------------------------------------------------------------
Now, when we use 'Product' to parse `3*7`, we will get the same result as
before but, when parsing `5`, we get an AST with 'Number' as the root.
-------------------------------------------------------------------------------
Number
| 5
-------------------------------------------------------------------------------
As a second example, let's look at a grammar for nested parentheses.
[source,waxeye]
-------------------------------------------------------------------------------
A <- :'(' A :')' | B
B <- 'b'
-------------------------------------------------------------------------------
Here are some example inputs and their resulting ASTs:
Input: `b`
-------------------------------------------------------------------------------
A
-> B
    | b
-------------------------------------------------------------------------------
Input: `(b)`
-------------------------------------------------------------------------------
A
-> A
    -> B
        | b
-------------------------------------------------------------------------------
Input: `\(((b)))`
-------------------------------------------------------------------------------
A
-> A
    -> A
        -> A
            -> B
                | b
-------------------------------------------------------------------------------
Unless we want to know the number of parentheses matched, trees like these
contain more information than we need. Again, we are able to solve this by
rewriting the grammar using a 'pruning' non-terminal.
[source,waxeye]
-------------------------------------------------------------------------------
A <= :'(' A :')' | B
B <- 'b'
-------------------------------------------------------------------------------
This time, parsing the input `\(((b)))` gives us a much shorter tree.
-------------------------------------------------------------------------------
B
| b
-------------------------------------------------------------------------------
=== Comments ===
There are two types of comments in Waxeye grammars: single-line and multi-line.
==== Single-line ====
Single-line comments start at the first `#` outside of an atomic expression and
extend until the end of the line.
[source,waxeye]
-------------------------------------------------------------------------------
# This is a single-line comment.
-------------------------------------------------------------------------------
==== Multi-line ====
Multi-line comments are opened at the first `/\*` outside of an atomic
expression and closed with a `\*/`.
[source,waxeye]
-------------------------------------------------------------------------------
/* This is a multi-line comment. */
-------------------------------------------------------------------------------
[source,waxeye]
-------------------------------------------------------------------------------
/* This is, also,
a multi-line comment. */
-------------------------------------------------------------------------------
As an added convenience for when editing a grammar, multi-line comments can be
nested within each other. This is handy when you want to comment out a section
of the grammar that already contains a comment.
[source,waxeye]
-------------------------------------------------------------------------------
/*
  This is the outer comment.

  A <- 'a'

  /*
   * This is the inner comment.
   */
  B <- 'b'
*/
-------------------------------------------------------------------------------
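Supporting nesting means a comment scanner has to count how many `/*` are
currently open, rather than stopping at the first `*/`. The following Python
sketch does exactly that. `strip_comments` is a hypothetical helper, and, unlike
Waxeye, it does not treat `#`, `/*` or `*/` inside literals or character
classes as ordinary text.

```python
def strip_comments(src):
    """Remove '#' line comments and nestable '/* ... */' comments from src."""
    out = []
    depth = 0  # how many '/*' are currently open
    i = 0
    while i < len(src):
        two = src[i:i + 2]
        if two == '/*':
            depth += 1
            i += 2
        elif two == '*/' and depth > 0:
            depth -= 1
            i += 2
        elif depth > 0:
            i += 1  # inside a comment: drop the character
        elif src[i] == '#':
            while i < len(src) and src[i] != '\n':
                i += 1  # drop up to, but not including, the newline
        else:
            out.append(src[i])
            i += 1
    if depth > 0:
        raise ValueError('unterminated /* comment')
    return ''.join(out)
```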
== Using Waxeye ==
This chapter will show you how to set up Waxeye for your programming language.
It covers language-specific installation requirements and presents some basic
boilerplate code to get you started. You can find copies of this boilerplate
code in `src/example/`. I use `$WAXEYE_HOME` to refer to the location where you
have installed the files of the Waxeye distribution.
The example grammar we'll be using can be found in `grammars/num.waxeye`. You
may wish to copy it to the directory you're working in so you can experiment
with extending and modifying the grammar.
.grammars/num.waxeye
[source,waxeye]
-------------------------------------------------------------------------------
Num <- '0' | [1-9] *[0-9]
-------------------------------------------------------------------------------
Once set up and run, the boilerplate example will use the parser you generated
to parse the string `42` and print the AST it creates.
-------------------------------------------------------------------------------
Num
| 4
| 2
-------------------------------------------------------------------------------
=== Using Waxeye from C ===
Waxeye's C runtime is currently supported on Unix platforms and MacOSX.
==== Install ====
To install the C runtime, you need to compile it and, optionally, install the
header files and library in your system search paths.
To compile the runtime, run `make lib` from within the `src/c` directory of
your Waxeye installation.
-------------------------------------------------------------------------------
cd $WAXEYE_HOME/src/c
make lib
make clean
-------------------------------------------------------------------------------
To install the header files and library in your search paths, you could copy
the files directly but, creating symbolic links to them will make upgrading
easier.
-------------------------------------------------------------------------------
sudo ln -s $WAXEYE_HOME/lib/libwaxeye.a /usr/local/lib/
sudo ln -s $WAXEYE_HOME/src/c/include/waxeye.h /usr/local/include/
sudo ln -s $WAXEYE_HOME/src/c/include/waxeye /usr/local/include/
-------------------------------------------------------------------------------
==== Generate Parser ====
-------------------------------------------------------------------------------
waxeye -g c . num.waxeye
-------------------------------------------------------------------------------
==== Use Parser ====
.src/example/c/example.c
[source,c]
-------------------------------------------------------------------------------
#include <string.h>
#include "parser.h"

int main() {
    // Create our parser
    struct parser_t *parser = parser_new();

    // Set up our input
    char data[] = "42";
    struct input_t *input = input_new(data, strlen(data));

    // Parse our input
    struct ast_t *ast = parse(parser, input);

    // Print our ast
    display_ast(ast, type_strings);

    // Free the AST, the input and the parser
    ast_recursive_delete(ast);
    input_delete(input);
    parser_delete(parser);

    return 0;
}
-------------------------------------------------------------------------------
==== Run ====
If you installed the headers and library in your system path:
-------------------------------------------------------------------------------
gcc example.c parser.c -lwaxeye -o example
-------------------------------------------------------------------------------
Otherwise:
-------------------------------------------------------------------------------
FLAGS="-I $WAXEYE_HOME/src/c/include/ -L $WAXEYE_HOME/lib/"
gcc $FLAGS example.c parser.c -lwaxeye -o example
-------------------------------------------------------------------------------
Finally,
-------------------------------------------------------------------------------
./example
-------------------------------------------------------------------------------
=== Using Waxeye from Java ===
Waxeye's Java runtime is compatible with versions 1.5 and 1.6 of the JRE. It
should also be possible to use Waxeye with JRE versions 1.3 and 1.4 by
retrofitting the classes with http://retroweaver.sourceforge.net[Retroweaver]
or http://retrotranslator.sourceforge.net[Retrotranslator].
==== Install ====
To use a Waxeye parser from Java, you need Waxeye's Java runtime in your
classpath. The required classes are in the Jar file `lib/waxeye.jar`.
==== Generate Parser ====
-------------------------------------------------------------------------------
waxeye -g java . num.waxeye
-------------------------------------------------------------------------------
==== Use Parser ====
.src/example/java/Example.java
[source,java]
-------------------------------------------------------------------------------
import org.waxeye.input.InputBuffer;
import org.waxeye.parser.ParseResult;

public class Example {
    public static void main(final String[] args) {
        // Create our parser
        final Parser parser = new Parser();

        // Setup our input
        final InputBuffer input = new InputBuffer("42".toCharArray());

        // Parse our input
        final ParseResult<Type> result = parser.parse(input);

        // Print our ast
        System.out.println(result);
    }
}
-------------------------------------------------------------------------------
==== Run ====
-------------------------------------------------------------------------------
javac -cp .:$WAXEYE_HOME/lib/waxeye.jar Example.java Parser.java Type.java
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
java -cp .:$WAXEYE_HOME/lib/waxeye.jar Example
-------------------------------------------------------------------------------
=== Using Waxeye from Python ===
Waxeye's Python runtime has been tested with Python version 2.5.1 and is
intended to work with 2.x.x versions of Python.
==== Install ====
To install Waxeye's Python runtime, you need to run the `setup.py` script from
the `src/python` directory.
-------------------------------------------------------------------------------
cd $WAXEYE_HOME/src/python
python setup.py build
sudo python setup.py install
rm -rf build/
-------------------------------------------------------------------------------
==== Generate Parser ====
-------------------------------------------------------------------------------
waxeye -g python . num.waxeye
-------------------------------------------------------------------------------
==== Use Parser ====
.src/example/python/example.py
[source,python]
-------------------------------------------------------------------------------
import parser
# Create our parser
p = parser.Parser()
# Parse our input
ast = p.parse("42")
# Print our AST
print ast
-------------------------------------------------------------------------------
==== Run ====
-------------------------------------------------------------------------------
python example.py
-------------------------------------------------------------------------------
=== Using Waxeye from Ruby ===
Waxeye's Ruby runtime is compatible with Ruby version 1.8.6.
==== Install ====
Install the Waxeye gem, either from Rubyforge or from the gem file in `lib`.
-------------------------------------------------------------------------------
# Install the Waxeye gem from Rubyforge
sudo gem install waxeye
-------------------------------------------------------------------------------
==== Generate Parser ====
-------------------------------------------------------------------------------
waxeye -g ruby . num.waxeye
-------------------------------------------------------------------------------
==== Use Parser ====
.src/example/ruby/example.rb
[source,ruby]
-------------------------------------------------------------------------------
require 'parser'
# Create our parser
p = Parser.new()
# Parse our input
ast = p.parse("42")
# Print our AST
puts ast
-------------------------------------------------------------------------------
==== Run ====
-------------------------------------------------------------------------------
ruby example.rb
-------------------------------------------------------------------------------
=== Using Waxeye from Scheme ===
Waxeye's Scheme runtime is compatible with v372 and v4 of PLT-Scheme's
http://download.plt-scheme.org/[MzScheme]. You will need to have MzScheme
installed to use it; either by itself or with DrScheme.
==== Install ====
Install the waxeye collection in your preferred place where MzScheme can find
it.
-------------------------------------------------------------------------------
# Install the Waxeye collection; change to your install paths as needed
sudo ln -s /usr/local/waxeye/src/scheme/waxeye /usr/local/plt/lib/plt/collects/
-------------------------------------------------------------------------------
==== Generate Parser ====
-------------------------------------------------------------------------------
waxeye -g scheme . num.waxeye
-------------------------------------------------------------------------------
==== Use Parser ====
.src/example/scheme/example.scm
[source,scheme]
-------------------------------------------------------------------------------
(module
 example
 mzscheme

 (require "parser.scm")

 (define (run)
   ;; Parse our input
   (let ((ast (parser "42")))
     ;; Print the ast
     (display-ast ast)))

 (run)
 )
-------------------------------------------------------------------------------
==== Run from v372 ====
-------------------------------------------------------------------------------
mzscheme -mv -t example.scm
-------------------------------------------------------------------------------
==== Run from v4 ====
-------------------------------------------------------------------------------
mzscheme -t example.scm
-------------------------------------------------------------------------------
== Using ASTs and Parse Errors ==
Since just printing an Abstract Syntax Tree isn't very interesting, let's have
a look at how to access the information the ASTs contain.
When you use a Waxeye parser, the result will be one of two things. If the
parser successfully parsed the input, the result will be an AST. If the input
doesn't match the syntax of the language, the result will be a 'parse error'.
=== ASTs ===
ASTs come in three different forms: 'tree', 'char' and 'empty'.
* A 'tree' AST contains a type, a list of children and, the start and end
position in the input.
* A 'char' AST contains a single character and has no children.
* An 'empty' AST simply signifies that parsing was successful. If your starting
non-terminal is void, or is a pruning non-terminal that produced no results, you
will get an empty AST.
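To make the three forms concrete, here is a hypothetical Python model of them,
plus a function that prints a tree in an indented style like the example trees
in this book (the exact spacing of Waxeye's own output may differ). The class
names `Tree` and `Empty` are illustrations, not the names used by Waxeye's
Python runtime; 'char' ASTs are modelled as one-character strings.

```python
class Tree(object):
    """A 'tree' AST: a type, a list of children and a start/end input position."""

    def __init__(self, type_, children, start, end):
        self.type = type_
        self.children = children
        self.start = start
        self.end = end


class Empty(object):
    """An 'empty' AST: parsing succeeded, but there is nothing to show."""


def display(ast):
    """Render an AST as indented text, one node per line."""
    if isinstance(ast, Empty):
        return '(empty)'
    if not isinstance(ast, Tree):
        return '| ' + ast  # a lone 'char' AST
    lines = [ast.type]
    _add_children(ast, 0, lines)
    return '\n'.join(lines)


def _add_children(tree, depth, lines):
    pad = '    ' * depth
    for child in tree.children:
        if isinstance(child, Tree):
            # Tree children are introduced with '->'; their children nest deeper.
            lines.append(pad + '-> ' + child.type)
            _add_children(child, depth + 1, lines)
        else:
            # Char children are shown with '|'.
            lines.append(pad + '| ' + child)
```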
==== Using an AST node as string ====
If a given AST node will only ever have 'char' children, you may wish to treat
that node as a single string.
===== From C =====
[source,c]
-------------------------------------------------------------------------------
char *str = ast_children_as_string(ast);
printf("%s\n", str);
free(str);
-------------------------------------------------------------------------------
===== From Java =====
[source,java]
-------------------------------------------------------------------------------
System.out.println(ast.childrenAsString());
-------------------------------------------------------------------------------
===== From Python =====
[source,python]
-------------------------------------------------------------------------------
print ''.join(ast.children)
-------------------------------------------------------------------------------
===== From Ruby =====
[source,ruby]
-------------------------------------------------------------------------------
puts ast.children.join
-------------------------------------------------------------------------------
===== From Scheme =====
[source,scheme]
-------------------------------------------------------------------------------
(display (list->string (ast-c ast)))
(newline)
-------------------------------------------------------------------------------
=== Parse Errors ===
A parse error contains information about where the input is invalid and hints
about what is wrong with it.
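This section does not describe how Waxeye decides where an error is. A common
technique for parsers of this kind, used in the hypothetical Python sketch
below, is to remember the farthest input position at which any literal failed,
collect what was expected there, and report it as a line and column. `Tracker`,
`line_col`, `literal`, `seq`, `alt` and `parse` are all illustrations, not
Waxeye's API.

```python
class Tracker(object):
    """Remember the farthest position where a literal failed, and what it expected."""

    def __init__(self):
        self.pos = -1
        self.expected = set()

    def fail(self, pos, what):
        if pos > self.pos:
            self.pos, self.expected = pos, set([what])
        elif pos == self.pos:
            self.expected.add(what)
        return None  # lets matchers write: return tracker.fail(...)


def line_col(text, pos):
    """Convert a 0-based offset into a 1-based (line, column) pair."""
    line = text.count('\n', 0, pos) + 1
    col = pos - (text.rfind('\n', 0, pos) + 1) + 1
    return line, col


def literal(s, tracker):
    def match(text, pos):
        if text.startswith(s, pos):
            return pos + len(s)
        return tracker.fail(pos, s)
    return match


def seq(*es):
    def match(text, pos):
        for e in es:
            pos = e(text, pos)
            if pos is None:
                return None
        return pos
    return match


def alt(*es):
    def match(text, pos):
        for e in es:
            end = e(text, pos)
            if end is not None:
                return end
        return None
    return match


def parse(rule, tracker, text):
    """Return None on success, or a (line, column, expected) error tuple."""
    end = rule(text, 0)
    if end == len(text):
        return None
    if end is not None:
        tracker.fail(end, 'end of input')  # matched, but left input unconsumed
    line, col = line_col(text, tracker.pos)
    return line, col, sorted(tracker.expected)
```

A fresh `Tracker` is needed for each parse, since it accumulates state.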
=== Determining the result type ===
==== From C ====
[source,c]
-------------------------------------------------------------------------------
switch (result->type) {