Skip to content
Newer
Older
100644 678 lines (423 sloc) 40.8 KB
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
1 # Welcome to the Ruby Language Toolkit
2
4095d78 @chriswailes Continuing with the addition of documentation.
authored May 25, 2012
3 RLTK is a collection of classes and methods designed to help programmers work with languages in an easy to use and straightforward manner. This toolkit provides the following features:
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
4
07a6908 @chriswailes Finished moving README.md over to Markdown. Started adding YARD annot…
authored May 23, 2012
5 * Lexer generator
6 * Parser generator
7 * AST node baseclass
8 * Class for representing context free grammars
4095d78 @chriswailes Continuing with the addition of documentation.
authored May 25, 2012
9 * [Low Level Virtual Machine](http://llvm.org) (LLVM) bindings for code generation
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
10
07a6908 @chriswailes Finished moving README.md over to Markdown. Started adding YARD annot…
authored May 24, 2012
11 In addition, RLTK includes several ready-made lexers and parsers and a Turing-complete language called Kazoo for use in your code and as examples for how to use the toolkit.
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
12
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
13 ## Why Use RLTK
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 12, 2011
14
15 Here are some reasons to use RLTK to build your lexers, parsers, and abstract syntax trees:
16
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
17 * **Lexer and Parser Definitions in Ruby** - Many tools require you to write your lexer/parser definitions in their own format, which is then processed and used to generate Ruby code. RLTK lexers/parsers are written entirely in Ruby and use syntax you are already familiar with.
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
18
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
19 * **Re-entrant Code** - The lexers and parsers generated by RLTK are fully re-entrant.
554798a Broke some functionality out of Token and put it into StreamPossition.
chris.wailes@gmail.com authored May 6, 2011
20
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
21 * **Multiple Lexers and Parsers** - You can define as many lexers and parses as you want, and instantiate as many of them as you need.
554798a Broke some functionality out of Token and put it into StreamPossition.
chris.wailes@gmail.com authored May 6, 2011
22
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
23 * **Token Positions** - Detailed information about a token's position is available in the parser.
554798a Broke some functionality out of Token and put it into StreamPossition.
chris.wailes@gmail.com authored May 6, 2011
24
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
25 * **Feature Rich Lexing and Parsing** - Often, lexer and parser generators will try and force you to do everything their way. RLTK gives you more flexibility with features such as states and flags for lexers, and argument arrays for parsers. What's more, these features actually work (I'm looking at you REX).
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
26
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
27 * **LALR(1)/GLR Parsing** - RLTK parsers use the LALR(1)/GLR parsing algorithms, which means you get both speed and the ability to handle **any** context-free grammar.
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
28
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
29 * **Parser Serialization** - RLTK parsers can be serialized and saved after they are generated for faster loading the next time they are required.
ac56706 Final push before the initial release?
chris.wailes@gmail.com authored May 31, 2011
30
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
31 * **Error Productions** - RLKT parsers can use error productions to recover from, and report on, errors.
554798a Broke some functionality out of Token and put it into StreamPossition.
chris.wailes@gmail.com authored May 6, 2011
32
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
33 * **Fast Prototyping** - If you need to change your lexer/parser you don't have to re-run the lexer and parser generation tools, simply make the changes and be on your way.
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
34
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
35 * **Parse Tree Graphs** - RLTK parsers can print parse trees (in the DOT language) of accepted strings.
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
36
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
37 * **Documentation** - We have it!
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
38
07a6908 @chriswailes Finished moving README.md over to Markdown. Started adding YARD annot…
authored May 24, 2012
39 * **I Eat My Own Dog Food** - I'm using RLTK for my own projects so if there is a bug I'll most likely be the first one to know.
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
40
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
41 ## Lexers
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
42
e17b68f @chriswailes Going to merge with the master branch now. Will then work on re-doing…
authored May 22, 2012
43 To create your own lexer using RLTK you simply need to subclass the {RLTK::Lexer} class and define the *rules* that will be used for matching text and generating tokens. Here we see a simple lexer for a calculator:
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
44
45 class Calculator < RLTK::Lexer
46 rule(/\+/) { :PLS }
163db55 @chriswailes Moved some folders around; finished the README.md file, and started a…
authored Jun 4, 2012
47 rule(/-/) { :SUB }
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
48 rule(/\*/) { :MUL }
49 rule(/\//) { :DIV }
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
50
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
51 rule(/\(/) { :LPAREN }
52 rule(/\)/) { :RPAREN }
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
53
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
54 rule(/[0-9]+/) { |t| [:NUM, t.to_i] }
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
55
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
56 rule(/\s/)
57 end
58
e92415b @chriswailes Forgot a couple of changes in the README file.
authored May 23, 2012
59 The {RLTK::Lexer.rule} method's first argument is the regular expression used for matching text. The block passed to the function is the action that executes when a substring is matched by the rule. These blocks must return the *type* of the token (which must be in ALL CAPS; see the Parsers section), and optionally a *value*. In the latter case you must return an array containing the *type* and *value*, which you can see an example of in the Calculator lexer shown above. The values returned by the proc object are used to build a {RLTK::Token} object that includes the *type* and *value* information, as well as information about the line number the token was found on, the offset from the beginning of the line to the start of the token, and the length of the token's text. If the *type* value returned by the proc is `nil` the input is discarded and no token is produced.
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
60
07a6908 @chriswailes Finished moving README.md over to Markdown. Started adding YARD annot…
authored May 24, 2012
61 The {RLTK::Lexer} class provides both {RLTK::Lexer.lex} and {RLTK::Lexer.lex_file}. The {RLTK::Lexer.lex} method takes a string as its argument and returns an array of tokens, with an *end of stream* token automatically added to the result. The {RLTK::Lexer.lex_file} method takes the name of a file as input, and lexes the contents of the specified file.
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
62
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
63 ### The Lexing Environment
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
64
e92415b @chriswailes Forgot a couple of changes in the README file.
authored May 24, 2012
65 The proc objects passed to the {RLTK::Lexer.rule} methods are evaluated inside an instance of the {RLTK::Lexer::Environment} class. This gives you access to methods for manipulating the lexer's state and flags (see bellow). You can also subclass the environment inside your lexer to provide additional functionality to your rule blocks. When doing so you need to ensure that you name your new class Environment like in the following example:
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
66
67 class MyLexer < RLTK::Lexer
68 ...
69
70 class Environment < Environment
71 def helper_function
72 ...
73 end
74
75 ...
76 end
77 end
78
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
79 ### Using States
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
80
e92415b @chriswailes Forgot a couple of changes in the README file.
authored May 24, 2012
81 The lexing environment may be used to keep track of state inside your lexer. When rules are defined they are defined inside a given state, which is specified by the second parameter to {RLTK::Lexer.rule}. The default state is cleverly named `:default`. When the lexer is scanning the input string for matching rules, it only considers the rules for the given state.
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
82
83 The methods used to manipulate state are:
84
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
85 * **RLTK::Lexer::Environment.push_state** - Pushes a new state onto the stack.
86 * **RLTK::Lexer::Environment.pop_state** - Pops a state off of the stack.
87 * **RLTK::Lexer::Environment.set_state** - Sets the state at the top of the stack.
88 * **RLTK::Lexer::Environment.state** - Returns the current state.
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
89
90 States may be used to easily support nested comments.
91
92 class StateLexer < RLTK::Lexer
93 rule(/a/) { :A }
94 rule(/\s/)
95
96 rule(/\(\*/) { push_state(:comment) }
97
98 rule(/\(\*/, :comment) { push_state(:comment) }
99 rule(/\*\)/, :comment) { pop_state }
100 rule(/./, :comment)
101 end
102
07a6908 @chriswailes Finished moving README.md over to Markdown. Started adding YARD annot…
authored May 24, 2012
103 By default the lexer will start in the `:default` state. To change this, you may use the {RLTK::Lexer.start} method.
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
104
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
105 ### Using Flags
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
106
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
107 The lexing environment also maintains a set of *flags*. This set is manipulated using the following methods:
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
108
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
109 * **RLTK::Lexer::Environment.set_flag** - Adds the specified flag to the set of flags.
110 * **RLTK::Lexer::Environment.unset_flag** - Removes the specified flag from the set of flags.
111 * **RLTK::Lexer::Environment.clear_flags** - Unsets all flags.
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
112
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
113 When *rules* are defined they may use a third parameter to specify a list of flags that must be set before the rule is considered when matching substrings. An example of this usage follows:
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
114
115 class FlagLexer < RLTK::Lexer
116 rule(/a/) { set_flag(:a); :A }
117
118 rule(/\s/)
119
120 rule(/b/, :default, [:a]) { set_flag(:b); :B }
121 rule(/c/, :default, [:a, :b]) { :C }
122 end
123
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
124 ### Instantiating Lexers
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
125
07a6908 @chriswailes Finished moving README.md over to Markdown. Started adding YARD annot…
authored May 24, 2012
126 In addition to using the {RLTK::Lexer.lex} class method you may also instantiate lexer objects. The only difference then is that the lexing environment used between subsequent calls to {RLTK::Lexer#lex} is the same object, and therefor allows you to keep persistent state.
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
127
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
128 ### First and Longest Match
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
129
07a6908 @chriswailes Finished moving README.md over to Markdown. Started adding YARD annot…
authored May 24, 2012
130 A RLTK::Lexer may be told to select either the first substring that is found to match a rule or the longest substring to match any rule. The default behavior is to match the longest substring possible, but you can change this by using the {RLTK::Lexer.match_first} method inside your class definition as follows:
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
131
132 class MyLexer < RLTK::Lexer
133 match_first
134
135 ...
136 end
137
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
138 ### Match Data
3759980 @chriswailes Forgot to change the documentation to reflect the last change.
authored Jan 20, 2012
139
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
140 Because it isn't RLTK's job to tell you how to write lexers and parsers, the MatchData object from a pattern match is available inside the Lexer::Environment object via the `match` accessor.
3759980 @chriswailes Forgot to change the documentation to reflect the last change.
authored Jan 20, 2012
141
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
142 ## Parsers
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
143
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
144 To create a parser using RLTK simply subclass RLTK::Parser, define the productions of the grammar you wish to parse, and call `finalize`. During finalization RLTK will build an LALR(1) parsing table, which may contain conflicts that can't be resolved with LALR(1) lookahead sets or precedence/associativity information. Traditionally, when parser generators such as **YACC** encounter conflicts during parsing table generation they will resolve shift/reduce conflicts in favor of shifts and reduce/reduce conflicts in favor of the production that was defined first. This means that the generated parsers can't handle ambiguous grammars.
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
145
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
146 RLTK parsers, on the other hand, can handle *all* context-free grammars by forking the parse stack when shift/reduce or reduce/reduce conflicts are encountered. This method is called the GLR parsing algorithm and allows the parser to explore multiple different possible derivations, discarding the ones that don't produce valid parse trees. GLR parsing is more expensive, in both time and space requirements, but these penalties are only payed when a parser for an ambiguous grammar is given an input with multiple parse trees, and as such most parsing should proceed using the faster LALR(1) base algorithm.
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
147
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
148 ### Defining a Grammar
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
149
150 Let us look at the simple prefix calculator included with RLTK:
151
152 class PrefixCalc < RLTK::Parser
153 production(:e) do
154 clause('NUM') {|n| n}
155
156 clause('PLS e e') { |_, e0, e1| e0 + e1 }
157 clause('SUB e e') { |_, e0, e1| e0 - e1 }
158 clause('MUL e e') { |_, e0, e1| e0 * e1 }
159 clause('DIV e e') { |_, e0, e1| e0 / e1 }
160 end
161
162 finalize
163 end
164
e92415b @chriswailes Forgot a couple of changes in the README file.
authored May 24, 2012
165 The parser uses the same method for defining productions as the {RLTK::CFG} class. In fact, the parser forwards the {RLTK::Parser.production} and {RLTK::Parser.clause} method invocations to an internal {RLTK::CFG} object after removing the parser specific information. To see a detailed description of grammar definitions please read the Context-Free Grammars section bellow.
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
166
167 It is important to note that the proc objects associated with productions should evaluate to the value you wish the left-hand side of the production to take.
168
07a6908 @chriswailes Finished moving README.md over to Markdown. Started adding YARD annot…
authored May 24, 2012
169 The default starting symbol of the grammar is the left-hand side of the first production defined (in this case, _e_). This can be changed using the {RLTK::Parser.start} function when defining your parser.
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
170
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
171 **Make sure you call `finalize` at the end of your parser definition, and only call it once.**
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
172
7fe9f46 @chriswailes Added some info to the README file about the new parser shortcuts. Re…
authored Jul 30, 2012
173 ### Shortcuts
174
175 RLTK provides several shortcuts for common grammar constructs. Right now these shortcuts include the {RLTK::Parser.empty_list} and {RLTK::Parser.nonempty_list} methods. An empty list is a list that may contain 0, 1, or more elements, with a given token seperating each element. A non-empty list contains **at least** 1 element.
176
177 This example shows how these shortcuts may be used to define a list of integers separated by a `:COMMA` token:
178
179 class ListParser < RLTK::Parser
efc640b @chriswailes Changed CFG.nonempty_list to accept Strings and Symbols for its secon…
authored Jul 31, 2012
180 nonempty_list(:int_list, :INT, :COMMA)
7fe9f46 @chriswailes Added some info to the README file about the new parser shortcuts. Re…
authored Jul 31, 2012
181
182 finalize
183 end
184
185 If you wanted to define a list of floats or integers you could define your parser like this:
186
187 class ListParser < RLTK::Parser
efc640b @chriswailes Changed CFG.nonempty_list to accept Strings and Symbols for its secon…
authored Jul 31, 2012
188 nonempty_list(:mixed_list, [:INT, :FLOAT], :COMMA)
7fe9f46 @chriswailes Added some info to the README file about the new parser shortcuts. Re…
authored Jul 31, 2012
189
190 finalize
191 end
192
193 A list may also contain multiple tokens between the separator:
194
195 class ListParser < RLTK::Parser
efc640b @chriswailes Changed CFG.nonempty_list to accept Strings and Symbols for its secon…
authored Jul 31, 2012
196 nonempty_list(:foo_bar_list, 'FOO BAR', :COMMA)
7fe9f46 @chriswailes Added some info to the README file about the new parser shortcuts. Re…
authored Jul 31, 2012
197
198 finalize
199 end
200
201 Lastly, you can mix all of these features together:
202
203 class ListParser < RLTK::Parser
204 nonempty_list(:foo_list, ['FOO BAR', 'FOO BAZ+'], :COMMA)
205
206 finalize
207 end
208
209 The productions generated by these shortcuts will always evaluate to an array. In the first two examples above the productions will produce a 1-D array containing the values of the `INT` or `FLOAT` tokens. In the last two examples the productions `foo_bar_list` and `foo_list` will produce 2-D arrays where the top level array is composed of tuples coresponding to the values of `FOO`, and `BAR` or one or more `BAZ`s.
210
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
211 ### Precedence and Associativity
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
212
07a6908 @chriswailes Finished moving README.md over to Markdown. Started adding YARD annot…
authored May 24, 2012
213 To help you remove ambiguity from your grammars RLTK lets you assign precedence and associativity information to terminal symbols. Productions then get assigned precedence and associativity based on either the last terminal symbol on the right-hand side of the production, or an optional parameter to the {RLTK::Parser.production} or {RLTK::Parser.clause} methods. When an {RLTK::Parser} encounters a shift/reduce error it will attempt to resolve it using the following rules:
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
214
197edd4 Finished adding documentation to the project. From now on documentati…
chris.wailes@gmail.com authored May 4, 2011
215 1. If there is precedence and associativity information present for all reduce actions involved and for the input token we attempt to resolve the conflict using the following rule. If not, no resolution is possible and the parser generator moves on. This conflict will later be reported to the programmer.
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
216
217 2. The precedence of the actions involved in the conflict are compared (a shift action's precedence is based on the input token), and the action with the highest precedence is selected. If two actions have the same precedence the associativity of the input symbol is used: left associativity means we select the reduce action, right associativity means we select the shift action, and non-associativity means that we have encountered an error.
218
07a6908 @chriswailes Finished moving README.md over to Markdown. Started adding YARD annot…
authored May 24, 2012
219 To assign precedence to terminal symbols you can use the {RLTK::Parser.left}, {RLTK::Parser.right}, and {RLTK::Parser.nonassoc} methods inside your parser class definition. Later declarations of associativity have higher levels of precedence than earlier declarations of the same associativity.
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
220
221 Let's look at the infix calculator example now:
222
223 class InfixCalc < Parser
224
225 left :PLS, :SUB
226 right :MUL, :DIV
227
228 production(:e) do
229 clause('NUM') { |n| n }
230
231 clause('LPAREN e RPAREN') { |_, e, _| e }
232
233 clause('e PLS e') { |e0, _, e1| e0 + e1 }
234 clause('e SUB e') { |e0, _, e1| e0 - e1 }
235 clause('e MUL e') { |e0, _, e1| e0 * e1 }
236 clause('e DIV e') { |e0, _, e1| e0 / e1 }
237 end
238
239 finalize
240 end
241
242 Here we use associativity information to properly deal with the different precedence of the addition, subtraction, multiplication, and division operators. The PLS and SUB terminals are given left associativity with precedence of 1 (by default all terminals and productions have precedence of zero, which is to say no precedence), and the MUL and DIV terminals are given right associativity with precedence of 1.
243
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
244 ### Array Arguments
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
245
07a6908 @chriswailes Finished moving README.md over to Markdown. Started adding YARD annot…
authored May 24, 2012
246 By default the proc objects associated with productions are passed one argument for each symbol on the right-hand side of the production. This can lead to long, unwieldy argument lists. To have your parser pass in an array of the values of the right-hand side symbols as the only argument to procs you may use the {RLTK::Parser.array_args} method. It must be invoked before any productions are declared, and affects all proc objects passed to `production` and `clause` methods.
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
247
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
248 ### The Parsing Environment
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
249
e17b68f @chriswailes Going to merge with the master branch now. Will then work on re-doing…
authored May 22, 2012
250 The parsing environment is the context in which the proc objects associated with productions are evaluated, and can be used to provide helper functions and to keep state while parsing. To define a custom environment simply subclass {RLTK::Parser::Environment} inside your parser definition as follows:
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
251
252 class MyParser < RLTK::Parser
253 ...
254
255 finalize
256
257 class Environment < Environment
258 def helper_function
259 ...
260 end
261
262 ...
263 end
264 end
265
266 (The definition of the Environment class may occur anywhere inside the MyParser class definition.)
267
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
268 ### Instantiating Parsers
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
269
07a6908 @chriswailes Finished moving README.md over to Markdown. Started adding YARD annot…
authored May 24, 2012
270 In addition to using the {RLTK::Parser.parse} class method you may also instantiate parser objects. The only difference then is that the parsing environment used between subsequent calls to {RLTK::Parser#parse} is the same object, and therefor allows you to keep persistent state.
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
271
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
272 ### Finalization Options
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
273
07a6908 @chriswailes Finished moving README.md over to Markdown. Started adding YARD annot…
authored May 24, 2012
274 The {RLTK::Parser.finalize} method has several options that you should be aware of:
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
275
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
276 * **explain** - Value should be `true`, `false`, an `IO` object, or a file name. Default value is `false`. If a non `false` (or `nil`) value is specified `finalize` will print an explanation of the parser to $stdout, the provided `IO` object, or the specified file. This explanation will include all of the productions defined, all of the terminal symbols used in the grammar definition, and the states present in the parsing table along with their items, actions, and conflicts.
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
277
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
278 * **lookahead** - Either `true` or `false`. Default value is `true`. Specifies whether the parser generator should build an LALR(1) or LR(0) parsing table. The LALR(1) table may have the same actions as the LR(0) table or fewer reduce actions if it is possible to resolve conflicts using lookahead sets.
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
279
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
280 * **precedence** - Either `true` or `false`. Default value is `true`. Specifies whether the parser generator should use precedence and associativity information to solve conflicts.
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
281
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
282 * **use** - Value should be `false`, the name of a file, or a file object. If the file exists and hasn't been modified since the parser definition was RLTK will load the parser definition from the file, saving a bunch of time. If the file doesn't exist or the parser has been modified since it was last used RLTK will save the parser's data structures to this file.
ac08a66 Added a feature to save/load a parser's internal data structures so t…
chris.wailes@gmail.com authored May 27, 2011
283
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
284 ### Parsing Options
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
285
07a6908 @chriswailes Finished moving README.md over to Markdown. Started adding YARD annot…
authored May 24, 2012
286 The {RLTK::Parser.parse} and {RLTK::Parser#parse} methods also have several options that you should be aware of:
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
287
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
288 * **accept** - Either `:first` or `:all`. Default value is `:first`. This option tells the parser to accept the first successful parse-tree found, or all parse-trees that enter the accept state. It only affects the behavior of the parser if the defined grammar is ambiguous.
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
289
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
290 * **env** - This option specifies the environment in which the productions' proc objects are evaluated. The RLTK::Parser::parse class function will create a new RLTK::Parser::Environment on each call unless one is specified. RLTK::Parser objects have an internal, per-instance, RLTK::Parser::Environment that is the default value for this option when calling RLTK::Parser.parse
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
291
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
292 * **parse_tree** - Value should be `true`, `false`, an `IO` object, or a file name. Default value is `false`. If a non `false` (or `nil`) value is specified a DOT language description of all accepted parse trees will be printed out to $stdout, the provided `IO` object, or the specified file.
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
293
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
294 * **verbose** - Value should be `true`, `false`, an `IO` object, or a file name. Default value is `false`. If a non `false` (or `nil`) value is specified a detailed description of the actions of the parser are printed to $stdout, the provided `IO` object, or the specified file as it parses the input.
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
295
6687ddc @chriswailes Added an example of the parse tree printing option.
authored Jul 31, 2012
296 ### Parse Trees
297
298 The above section briefly mentions the *parse_tree* option. So that this neat feature doesn't get lost in the rest of the documentation here is the tree generated by the Kazoo parser from Chapter 7 of the tutorial when it parses the line `def fib(a) if a < 2 then 1 else fib(a-1) + fib(a-2);`:
299
df62bb4 @chriswailes A test.
authored Jul 31, 2012
300 ![Kazoo parse tree.](https://github.com/chriswailes/RLTK/raw/master/resources/simple_tree.png)
6687ddc @chriswailes Added an example of the parse tree printing option.
authored Jul 31, 2012
301
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
302 ### Parsing Exceptions
9633826 Changed the way the parsers return results when an error production i…
chris.wailes@gmail.com authored May 5, 2011
303
07a6908 @chriswailes Finished moving README.md over to Markdown. Started adding YARD annot…
authored May 24, 2012
304 Calls to {RLTK::Parser.parse} may raise one of four exceptions:
9633826 Changed the way the parsers return results when an error production i…
chris.wailes@gmail.com authored May 5, 2011
305
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
306 * **RLTK::BadToken** - This exception is raised when a token is observed in the input stream that wasn't used in the language's definition.
307 * **RLTK::HandledError** - This exception is raised whenever an error production is encountered. The input stream is not actually in the langauge, but we were able to handle the encountered errors in a way that makes it appear that it is.
308 * **RLTK::InternalParserError** - This exception tells you that something REALLY went wrong. Users should never receive this exception.
309 * **RLTK::NotInLanguage** - This exception indicates that the input token stream is not in the parser's language.
9633826 Changed the way the parsers return results when an error production i…
chris.wailes@gmail.com authored May 5, 2011
310
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
311 ### Error Productions
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
312
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
313 **Warning: this is the lest tested feature of RLTK. If you encounter any problems while using it, please let me know so I can fix any bugs as soon as possible.**
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
314
5c5988f @chriswailes Changed the error handling so that the value of an ERROR token is now…
authored Aug 31, 2012
315 When an RLTK parser encounters a token for which there are no more valid actions (and it is on the last parse stack / possible parse-tree path) it will enter error handling mode. In this mode the parser pops states and input off of the parse stack (the parser is a pushdown automaton after all) until it finds a state that has a shift action for the `ERROR` terminal. A dummy `ERROR` terminal is then placed onto the parse stack and the shift action is taken. This error token will have the position information of the token that caused the parser to enter error handling mode. Additional tokens may have been discarded after this token.
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
316
5c5988f @chriswailes Changed the error handling so that the value of an ERROR token is now…
authored Sep 1, 2012
317 If the input (including the `ERROR` token) can be reduced immediately the associated error handling proc is evaluated and we continue parsing. If no shift or reduce action is available the parser will being shifting tokens off of the input stack until a token appears with a valid action in the current state, in which case parsing resumes as normal.
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
318
5c5988f @chriswailes Changed the error handling so that the value of an ERROR token is now…
authored Sep 1, 2012
319 The value of an `ERROR` non-terminal will be an array containing all of the tokens that were discarded while the parser was searching for a valid action.
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
320
5c5988f @chriswailes Changed the error handling so that the value of an ERROR token is now…
authored Sep 1, 2012
321 The example below, based on one of the unit tests, shows a very basic usage of error productions:
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
322
323 class ErrorCalc < RLTK::Parser
5c5988f @chriswailes Changed the error handling so that the value of an ERROR token is now…
authored Sep 1, 2012
324 left :ERROR
325 right :PLS, :SUB, :MUL, :DIV, :NUM
326
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
327 production(:e) do
5c5988f @chriswailes Changed the error handling so that the value of an ERROR token is now…
authored Sep 1, 2012
328 clause('NUM') {|n| n}
329
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
330 clause('e PLS e') { |e0, _, e1| e0 + e1 }
331 clause('e SUB e') { |e0, _, e1| e0 - e1 }
332 clause('e MUL e') { |e0, _, e1| e0 * e1 }
333 clause('e DIV e') { |e0, _, e1| e0 / e1 }
334
5c5988f @chriswailes Changed the error handling so that the value of an ERROR token is now…
authored Sep 1, 2012
335 clause('e PLS ERROR e') { |e0, _, err, e1| error("#{err.len} tokens skipped."); e0 + e1 }
336 end
337
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
338 finalize
339 end
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
340
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
341 ## ASTNode
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
342
e17b68f @chriswailes Going to merge with the master branch now. Will then work on re-doing…
authored May 22, 2012
343 The {RLTK::ASTNode} base class is meant to be a good starting point for implementing your own abstract syntax tree nodes. By subclassing {RLTK::ASTNode} you automagically get features such as tree comparison, notes, value accessors with type checking, child node accessors and `each` and `map` methods (with type checking), and the ability to retrieve the root of a tree from any member node.
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
344
07a6908 @chriswailes Finished moving README.md over to Markdown. Started adding YARD annot…
authored May 24, 2012
345 To create your own AST node classes you subclass the {RLTK::ASTNode} class and then use the {RLTK::ASTNode.child} and {RLTK::ASTNode.value} methods. By declaring the children and values of a node the class will define the appropriate accessors with type checking, know how to pack and unpack a node's children, and know how to handle constructor arguments.
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
346
cf403ac Fixed a bug that resulted in incorrect follow-set computation. This w…
chris.wailes@gmail.com authored May 19, 2011
347 Here we can see the definition of several AST node classes that might be used to implement binary operations for a language:
348
349 class Expression < RLTK::ASTNode; end
350
351 class Number < Expression
352 value :value, Fixnum
353 end
354
355 class BinOp < Expression
356 value :op, String
357
358 child :left, Expression
359 child :right, Expression
360 end
361
1213fc8 @chriswailes Added a TODO item and some more documentation to the README file.
authored Aug 8, 2012
362 The assignment functions that are generated for the children and values perform type checking to make sure that the AST is well-formed. The type of a child must be a subclass of the {RLTK::ASTNode} class, whereas the type of a value must **NOT** be a subclass of the {RLTK::ASTNode} class. While child and value objects are stored as instance variables it is unsafe to assign to these variables directly, and it is strongly recommended to always use the accessor functions.
cf403ac Fixed a bug that resulted in incorrect follow-set computation. This w…
chris.wailes@gmail.com authored May 19, 2011
363
1213fc8 @chriswailes Added a TODO item and some more documentation to the README file.
authored Aug 9, 2012
364 When instantiating a subclass of {RLTK::ASTNode} the arguments to the constructor should be the node's values (in order of definition) followed by the node's children (in order of definition). If a constructor is given fewer arguments then the number of values and children the remaining arguments are assumed to be `nil`. Example:
cf403ac Fixed a bug that resulted in incorrect follow-set computation. This w…
chris.wailes@gmail.com authored May 19, 2011
365
366 class Foo < RLTK::ASTNode
367 value :a, Fixnum
368 child :b, Bar
369 value :c, String
370 child :d, Bar
371 end
372
1213fc8 @chriswailes Added a TODO item and some more documentation to the README file.
authored Aug 9, 2012
373 class Bar < RLTK::ASTNode
374 value :a, String
375 end
376
377 Foo.new(1, 'baz', Bar.new)
cf403ac Fixed a bug that resulted in incorrect follow-set computation. This w…
chris.wailes@gmail.com authored May 19, 2011
378
379 Lastly, the type of a child or value can be defined as an array of objects of a specific type as follows:
380
381 class Foo < RLTK::ASTNode
382 value :strings, [String]
383 end
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
384
1213fc8 @chriswailes Added a TODO item and some more documentation to the README file.
authored Aug 9, 2012
385 ### Tree Iteration and Mapping
386
387 RLTK Abstract Syntax Trees may be [traversed](http://en.wikipedia.org/wiki/Tree_traversal) in three different ways:
388
389 * Pre-order
390 * Post-order
391 * Level-order
392
393 The order you wish to traverse the tree can be specified by passing the appropriate symbol to {RLTK::ASTNode#each}: `:pre`, `:post`, or `:level`.
394
395 You can also map one tree to another tree using the {RLTK::ASTNode#map} and {RLTK::ASTNode#map!} methods. In the former case a new tree is created and returned; in the latter case the current tree is transformed and the result of calling the provided block on the root node is returned. These methods will always visit nodes in *post-order*, so that all children of a node are visited before the node itself.
396
163db55 @chriswailes Moved some folders around; finished the README.md file, and started a…
authored Jun 4, 2012
397 ## Code Generation
398
399 RLTK supports the generation of native code and LLVM IR, as well as JIT compilation and execution, through the {RLTK::CG} module. This module is built on top of bindings to [LLVM](http://llvm.org) and provides much, though not all, of the functionality of the LLVM libraries.
400
1213fc8 @chriswailes Added a TODO item and some more documentation to the README file.
authored Aug 9, 2012
401 A small amount of the functionality of the RLTK::CG module requires the [LLVM Extended C Bindings](https://github.com/chriswailes/llvm-ecb) library. If this library is missing the rest of the module should behave properly, but this functionality will be missing. The features that require this library are:
402
403 * **Shared Library Loading** - Load shared libraries into the process so that their exported symbols are visible to LLVM via the {RLTK::CG::Support.load\_library} method.
404 * **ASM Printer and Parser Initialization** - Available through the {RLTK::CG::LLVM.init\_asm\_parser} and {RLTK::CG::LLVM.init\_asm\_printer} methods.
405 * **LLVM IR Loading** - LLVM IR files can be loaded into RLTK via the {RLTK::CG::Module.read\_ir\_file} method.
406 * **Value Printing** - Print any value's LLVM IR to a given file descriptor using {RLTK::CG::Value#print}.
407 * **Targets, Target Data, and Target Machines** - Manipulate LLVM structures that contain data about the target environment.
408 * **Object File Generation** - LLVM Modules can be compiled to object files via the {RLTK::CG::Module#compile} method.
163db55 @chriswailes Moved some folders around; finished the README.md file, and started a…
authored Jun 4, 2012
409
410 ### Acknowledgments and Discussion
411
412 Before we get started with the details, I would like to thank [Jeremy Voorhis](https://github.com/jvoorhis/). The bindings present in RLTK are really a fork of the great work that he did on [ruby-llvm](https://github.com/jvoorhis/ruby-llvm).
413
414 Why did I fork ruby-llvm, and why might you want to use the RLTK bindings over ruby-llvm? There are a couple of reasons:
415
416 * **Cleaner Codebase** - The RLTK bindings present a cleaner interface to the LLVM library by conforming to more standard Ruby programming practices, providing better abstractions and cleaner inheritance hierarchies, overloading constructors and other methods properly, and performing type checking on objects to better aid in debugging.
1213fc8 @chriswailes Added a TODO item and some more documentation to the README file.
authored Aug 9, 2012
417 * **Documentation** - RLTK's bindings provide better documentation.
418 * **Completeness** - The RLTK bindings provide several features that are missing from the ruby-llvm project. These include the ability to initialize LLVM for architectures besides x86 (RLTK supports all architectures supported by LLVM), the presence of all of LLVM's optimization passes, the ability to print the LLVM IR representation of modules and values to files and load modules *from* files, easy initialization of native architectures, initialization for ASM printers and parsers, and compiling modules to object files.
419 * **Ease of Use** - Several features have been added to make generating code easier such as automatic management of memory resources used by LLVM.
163db55 @chriswailes Moved some folders around; finished the README.md file, and started a…
authored Jun 4, 2012
420 * **Speed** - The RLTK bindings are ever so slightly faster due to avoiding unnecessary FFI calls.
421
422 Before you dive into generating code, here are some resources you might want to look over to build up some background knowledge on how LLVM works:
423
424 * [Static Single Assignment Form](http://en.wikipedia.org/wiki/Static_single_assignment_form)
425 * [LLVM Intermediate Representation](http://llvm.org/docs/LangRef.html)
426
427 ### LLVM
428
429 Since RLTK's code generation functionality is built on top of LLVM the first step in generating code is to inform LLVM of the target architecture. This is accomplished via the {RLTK::CG::LLVM.init} method, which is used like this: `RLTK::CG::LLVM.init(:PPC)`. The {RLTK::CG::Bindings::ARCHS} constant provides a list of supported architectures. This call must appear before any other calls to the RLTK::CG module.
430
431 If you would like to see what version of LLVM is targeted by your version of RLTK you can either call the {RLTK::CG::LLVM.version} method or looking at the {RLTK::LLVM\_TARGET\_VERSION} constant.
432
433 ### Modules
434
435 Modules are one of the core building blocks of the code generation module. Functions, constants, and global variables all exist inside a particular module and, if you use the JIT compiler, a module provides the context for your executing code. New modules can be created using the {RLTK::CG::Module#initialize RLTK::CG::Module.new} method. While this method is overloaded you, as a library user, will always pass it a string as its first argument. This allows you to name your modules for easier debugging later.
436
437 Once you have created you can serialize the code inside of it into *bitcode* via the {RLTK::CG::Module#write\_bitcode} method. This allows you to save partially generated code and then use it later. To load a module from *bitcode* you use the {RLTK::CG::Module.read\_bitcode} method.
438
439 ### Types
440
441 Types are an important part of generating code using LLVM. Functions, operations, and other constructs use types to make sure that the generated code is sane. All types in RLTK are subclasses of the {RLTK::CG::Type} class, and have class names that end in "Type". Types can be grouped into to categories: fundamental and composite.
442
443 Fundamental types are those like {RLTK::CG::Int32Type} and {RLTK::CG::FloatType} that don't take any arguments when they are created. Indeed, these types are represented using a Singleton class, and so the `new` method is disabled. Instead you can use the `instance` method to get an instantiated type, or simply pass in the class itself whenever you need to reference the type. In this last case, the method you pass the class to will instantiate the type for you.
444
445 Composite types are constructed from other types. These include the {RLTK::CG::ArrayType}, {RLTK::CG::FunctionType}, and other classes. These types you must instantiate directly before they can be used, and you may not simply pass the type class as the type argument to functions inside the RLTK::CG module.
446
447 For convenience, the native integer type of the host platform is made available via {RLTK::CG::NativeIntType}.
448
449 ### Values
450
451 The {RLTK::CG::Value} class is the common ancestor of many classes inside the RLTK::CG module. The main way in which you, the library user, will interact with them is when creating constant values. Here is a list of some of value classes you might use:
452
453 * {RLTK::CG::Int1}
454 * {RLTK::CG::Int8}
455 * {RLTK::CG::Int16}
456 * {RLTK::CG::Int32}
457 * {RLTK::CG::Int64}
458 * {RLTK::CG::Float}
459 * {RLTK::CG::Double}
460 * {RLTK::CG::ConstantArray}
461 * {RLTK::CG::ConstantStruct}
462
463 Again, for convenience, the native integer class of the host platform is made available via {RLTK::CG::NativeInt}.
464
465 ### Functions
466
467 Functions in LLVM are much like C functions; they have a return type, argument types, and a body. Functions may be created in several ways, though they all require a module in which to place the function.
468
469 The first way to create functions is via a module's function collection:
470
471 mod.functions.add('my function', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType])
472
473 Here we have defined a function named 'my function' in the `mod` module. It takes two native integers as arguments and returns a native integer. It is also possible to define the type of a function ahead of time and pass it to this method:
474
475 type = RLTK::CG::FunctionType.new(RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType])
476 mod.functions.add('my function', type)
477
478 Functions may also be created directly via the {RLTK::CG::Function#initialize RLTK::CG::Function.new} method, though a reference to a module is still necessary:
479
480 mod = Module.new('my module')
481 fun = Function.new(mod, 'my function', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType])
482
483 or
484
485 mod = Module.new('my module')
486 type = RLTK::CG::FunctionType.new(RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType])
487 fun = Function.new(mod, 'my function', type)
488
489 Lastly, whenever you use one of these methods to create a function you may give it a block to be executed inside the context of the function object. This allows for easier building of functions:
490
491 mod.functions.add('my function', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType]) do
492 bb = blocks.append('entry)'
493 ...
494 end
495
496 ### Basic Blocks
497
498 Once a function has been added to a module you will need to add {RLTK::CG::BasicBlock BasicBlocks} to the function. This can be done easily:
499
500 bb = fun.blocks.append('entry')
501
502 We now have a basic block that we can use to add instructions to our function and get it to actually do something. You can also instantiate basic blocks directly:
503
504 bb = RLTK::CG::BasicBlock.new(fun, 'entry')
505
506 ### The Builder
507
508 Now that you have a basic block you need to add instructions to it. This is accomplished using a {RLTK::CG::Builder builder}, either directly or indirectly.
509
510 To add instructions using a builder directly (this is most similar to how it is done using C/C++) you create the builder, position it where you want to add instructions, and then build them:
511
0abf7ca @chriswailes Updated the Builder class and propagated the changes throughout the c…
authored Aug 8, 2012
512 fun = mod.functions.add('add', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType])
163db55 @chriswailes Moved some folders around; finished the README.md file, and started a…
authored Jun 4, 2012
513 bb = fun.blocks.append('entry')
514
515 builder = RLTK::CG::Builder.new
516
517 builder.position_at_end(bb)
518
519 # Generate an add instruction.
520 inst0 = builder.add(fun.params[0], fun.params[1])
521
522 # Generate a return instruction.
523 builder.ret(inst0)
524
525 You can get rid of some of those references to the builder by using the {RLTK::CG::Builder#build} method:
526
0abf7ca @chriswailes Updated the Builder class and propagated the changes throughout the c…
authored Aug 8, 2012
527 fun = mod.functions.add('add', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType])
163db55 @chriswailes Moved some folders around; finished the README.md file, and started a…
authored Jun 4, 2012
528 bb = fun.blocks.append('entry')
529
530 builder = RLTK::CG::Builder.new
531
0abf7ca @chriswailes Updated the Builder class and propagated the changes throughout the c…
authored Aug 8, 2012
532 builder.build(bb) do
163db55 @chriswailes Moved some folders around; finished the README.md file, and started a…
authored Jun 4, 2012
533 ret add(fun.params[0], fun.params[1])
534 end
535
536 To get rid of more code:
537
0abf7ca @chriswailes Updated the Builder class and propagated the changes throughout the c…
authored Aug 8, 2012
538 fun = mod.functions.add('add', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType])
163db55 @chriswailes Moved some folders around; finished the README.md file, and started a…
authored Jun 4, 2012
539 bb = fun.blocks.append('entry')
540
0abf7ca @chriswailes Updated the Builder class and propagated the changes throughout the c…
authored Aug 8, 2012
541 RLTK::CG::Builder.new(bb) do
163db55 @chriswailes Moved some folders around; finished the README.md file, and started a…
authored Jun 4, 2012
542 ret add(fun.params[0], fun.params[1])
543 end
544
545 Or:
546
0abf7ca @chriswailes Updated the Builder class and propagated the changes throughout the c…
authored Aug 8, 2012
547 fun = mod.functions.add('add', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType])
548 fun.blocks.append('entry') do
163db55 @chriswailes Moved some folders around; finished the README.md file, and started a…
authored Jun 4, 2012
549 ret add(fun.params[0], fun.params[1])
550 end
551
552 Or even:
553
0abf7ca @chriswailes Updated the Builder class and propagated the changes throughout the c…
authored Aug 8, 2012
554 mod.functions.add('add', RLTK::CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType]) do
555 blocks.append('entry') do |fun|
163db55 @chriswailes Moved some folders around; finished the README.md file, and started a…
authored Jun 4, 2012
556 ret add(fun.params[0], fun.params[1])
557 end
558 end
559
560 In the last two examples a new builder object is created for the block. It is possible to specify the builder to be used:
561
562 builder = RLTK::CG::Builder.new
563
0abf7ca @chriswailes Updated the Builder class and propagated the changes throughout the c…
authored Aug 8, 2012
564 mod.functions.add('add', RLTK:CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType]) do
565 blocks.append('entry', builder) do |fun|
566 ret add(fun.params[0], fun.params[1])
567 end
163db55 @chriswailes Moved some folders around; finished the README.md file, and started a…
authored Jun 4, 2012
568 end
569
570 For an example of where this is useful, see the Kazoo tutorial.
571
572 ### Execution Engines
573
574 Once you have generated your code you may want to run it. RLTK provides bindings to both the LLVM interpreter and JIT compiler to help you do just that. Creating a JIT compiler is pretty simple.
575
576 mod = RLTK::CG::Module.new('my module')
577 jit = RLTK::CG::JITCompiler(mod)
578
579 mod.functions.add('add', RLTK:CG::NativeIntType, [RLTK::CG::NativeIntType, RLTK::CG::NativeIntType]) do
580 blocks.append('entry', nil, nil, self) do |fun|
581 ret add(fun.params[0], fun.params[1])
582 end
583 end
584
585 Now you can run your 'add' function like this:
586
587 jit.run(fun, 1, 2)
588
589 The result will be a {RLTK::CG::GenericValue} object, and you will want to use its {RLTK::CG::GenericValue#to\_i #to\_i} and {RLTK::CG::GenericValue#to\_f #to\_f} methods to get the Ruby value result.
590
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
591 ## Context-Free Grammars
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
592
07a6908 @chriswailes Finished moving README.md over to Markdown. Started adding YARD annot…
authored May 24, 2012
593 The {RLTK::CFG} class provides an abstraction for context-free grammars. For the purpose of this class terminal symbols appear in **ALL CAPS**, and non-terminal symbols appear in **all lowercase**. Once a grammar is defined the {RLTK::CFG#first_set} and {RLTK::CFG#follow_set} methods can be used to find *first* and *follow* sets.
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
594
163db55 @chriswailes Moved some folders around; finished the README.md file, and started a…
authored Jun 4, 2012
595 ### Defining Grammars
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
596
07a6908 @chriswailes Finished moving README.md over to Markdown. Started adding YARD annot…
authored May 24, 2012
597 A grammar is defined by first instantiating the {RLTK::CFG} class. The {RLTK::CFG#production} and {RLTK::CFG#clause} methods may then be used to define the productions of the grammar. The `production` method can take a Symbol denoting the left-hand side of the production and a string describing the right-hand side of the production, or the left-hand side symbol and a block. In the first usage a single production is created. In the second usage the block may contain repeated calls to the `clause` method, each call producing a new production with the same left-hand side but different right-hand sides. {RLTK::CFG#clause} may not be called outside of {RLTK::CFG#production}. Bellow we see a grammar definition that uses both methods:
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
598
599 grammar = RLTK::CFG.new
600
601 grammar.production(:s) do
602 clause('A G D')
603 clause('A a C')
604 clause('B a D')
605 clause('B G C')
606 end
607
608 grammar.production(:a, 'b')
609 grammar.production(:b, 'G')
610
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
611 ### Extended Backus–Naur Form
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
612
613 The RLTK::CFG class understands grammars written in the extended Backus–Naur form. This allows you to use the \*, \+, and ? operators in your grammar definitions. When each of these operators are encountered additional productions are generated. For example, if the right-hand side of a production contained 'NUM*' a production of the form 'num_star -> | NUM num_star' is added to the grammar. As such, your grammar should not contain productions with similar left-hand sides (e.g. foo_star, bar_question, or baz_plus).
614
07a6908 @chriswailes Finished moving README.md over to Markdown. Started adding YARD annot…
authored May 24, 2012
615 As these additional productions are added internally to the grammar a callback functionality is provided to let you know when such an event occurs. The callback proc object can either be specified when the CFG object is created, or by using the {RLTK::CFG#callback} method. The callback will receive three arguments: the production generated, the operator that triggered the generation, and a symbol (:first or :second) specifying which clause of the production this callback is for.
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
616
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
617 ### Helper Functions
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
618
619 Once a grammar has been defined you can use the following functions to obtain information about it:
620
14a22ac @chriswailes Finished documenting things as they currently stand. New documentatio…
authored Jun 6, 2012
621 * {RLTK::CFG#first_set} - Returns the *first set* for the provided symbol or sentence.
622 * {RLTK::CFG#follow_set} - Returns the *follow set* for the provided symbol.
623 * {RLTK::CFG#nonterms} - Returns a list of the non-terminal symbols used in the grammar's definition.
624 * {RLTK::CFG#productions} - Provides either a hash or array of the grammar's productions.
625 * {RLTK::CFG#symbols} - Returns a list of all symbols used in the grammar's definition.
626 * {RLTK::CFG#terms} - Returns a list of the terminal symbols used in the grammar's definition.
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
627
163db55 @chriswailes Moved some folders around; finished the README.md file, and started a…
authored Jun 4, 2012
628 ## Tutorial
629
630 What follows is an in-depth example of how to use the Ruby Language Toolkit. This tutorial will show you how to use RLTK to build a lexer, parser, AST nodes, and compiler to create a toy language called Kazoo. The tutorial is based on the LLVM [Kaleidoscope](http://llvm.org/docs/tutorial/) tutorial, but has been modified to:
631
632 * a) be done in Ruby
633 * 2) use a lexer and parser generator and
634 * III) use a language that I call Kazoo, which is really just a cleaned up and simplified version of the Kaleidoscope language used in the LLVM tutorial (as opposed to the [Kaleidoscope language](http://en.wikipedia.org/wiki/Kaleidoscope_%28programming_language%29) from the 90′s).
635
2b74d51 @chriswailes More work on the documentation.
authored Jun 4, 2012
636 The Kazoo toy language is a procedural language that allows you to define functions, use conditionals, and perform basic mathematical operations. Over the course of the tutorial we’ll extend Kazoo to support the if/then/else construct, for loops, JIT compilation, and a simple command line interface to the JIT.
163db55 @chriswailes Moved some folders around; finished the README.md file, and started a…
authored Jun 4, 2012
637
2b74d51 @chriswailes More work on the documentation.
authored Jun 4, 2012
638 Because we want to keep things simple the only datatype in Kazoo is a 64-bit floating point type (a C double or a Ruby float). As such, all values are implicitly double precision and the language doesn’t require type declarations. This gives the language a very nice and simple syntax. For example, the following example computes Fibonacci numbers:
163db55 @chriswailes Moved some folders around; finished the README.md file, and started a…
authored Jun 4, 2012
639
640 def fib(x)
641 if x < 3 then
642 1
643 else
644 fib(x-1) + fib(x-2)
645
646 The tutorial is organized as follows:
647
648 * [Chapter 1: The Lexer](file.Chapter1.html)
649 * [Chapter 2: The AST Nodes](file.Chapter2.html)
650 * [Chapter 3: The Parser](file.Chapter3.html)
651 * [Chapter 4: AST Translation](file.Chapter4.html)
652 * [Chapter 5: JIT Compilation](file.Chapter5.html)
653 * [Chapter 6: Adding Control Flow](file.Chapter6.html)
654 * [Chapter 7: Playtime](file.Chapter7.html)
655 * [Chapter 8: Mutable Variables](file.Chapter8.html)
656
2b74d51 @chriswailes More work on the documentation.
authored Jun 4, 2012
657 Before starting this tutorial you should know about regular expressions, the basic ideas behind lexing and parsing, and be able to read context-free grammar (CFG) definitions. By the end of this tutorial we will have written 372 lines of source code and have a JIT compiler for a Turing complete language.
163db55 @chriswailes Moved some folders around; finished the README.md file, and started a…
authored Jun 4, 2012
658
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
659 ## Provided Lexers and Parsers
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
660
661 The following lexer and parser classes are included as part of RLTK:
662
e17b68f @chriswailes Going to merge with the master branch now. Will then work on re-doing…
authored May 22, 2012
663 * {RLTK::Lexers::Calculator}
664 * {RLTK::Lexers::EBNF}
665 * {RLTK::Parsers::PrefixCalc}
666 * {RLTK::Parsers::InfixCalc}
667 * {RLTK::Parsers::PostfixCalc}
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
668
50d2cee @chriswailes In the process of switching to YARD (using Markdown) from RDoc.
authored May 21, 2012
669 ## Contributing
dc85480 Worked on the documentation.
chris.wailes@gmail.com authored Apr 10, 2011
670
671 If you are interested in contributing to RLTK you can:
672
673 * Help provide unit tests. Not all of RLTK is tested as well as it could be. Specifically, more tests for the RLTK::CFG and RLTK::Parser classes would be appreciated.
674 * Write lexers or parsers that you think others might want to use. Possibilities include HTML, JSON/YAML, Javascript, and Ruby.
675 * Write a class for dealing with regular languages.
676 * Extend the RLTK::CFG class with additional functionality.
855b3ef Finished the documentation in README.
chris.wailes@gmail.com authored Apr 13, 2011
677 * Let me know if you found any part of this documentation unclear or incomplete.
Something went wrong with that request. Please try again.