MODULE
leex
MODULE SUMMARY
Lexical analyzer generator for Erlang.
DESCRIPTION
A regular expression based lexical analyzer generator for
Erlang, similar to lex or flex.
DATATYPES
ScanRet = {ok,Tokens,EndLine} |
{eof,EndLine} |
{error,ErrorDescriptor,EndLine}
Cont = continuation()
ErrorDescriptor = {ErrorLine,Module,Error}
EXPORTS
file(FileName) -> ok | error
file(FileName, Options) -> ok | error
Generate a lexical analyzer from the definition in the input
file. The input file has the extension .xrl. This is added to
the filename if it is not given. The resulting module is the
Xrl filename without the .xrl extension.
The current options are:
dfa_graph
generate a .dot file which contains a description of
the DFA in a format which can be viewed with Graphviz,
www.graphviz.org
{includefile,File}
Use a specific or customised prologue file instead of
the default leex/include/leexinc.hrl which is
otherwise included.
{report_errors, bool()}
Causes errors to be printed as they occur. Default is
true.
{report_warnings, bool()}
Causes warnings to be printed as they occur. Default
is true.
{report, bool()}
This is a short form for both report_errors and
report_warnings.
{return_errors, bool()}
If this flag is set, {error, Errors, Warnings} is
returned when there are errors. Default is false.
{return_warnings, bool()}
If this flag is set, an extra field containing
Warnings is added to the tuple returned upon
success. Default is false.
{return, bool()}
This is a short form for both return_errors and
return_warnings.
{scannerfile, ScannerFile}
ScannerFile is the name of the file that will contain
the Erlang scanner code that is generated. The default
("") is to add the extension .erl to FileName stripped
of the .xrl extension.
{verbose,bool()}
Outputs information from parsing the input file and
generating the internal tables.
Any of the Boolean options can be set to true by stating the
name of the option. For example, verbose is equivalent to
{verbose, true}.
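For example, assuming a definition file called
erlang_scan.xrl (a hypothetical name used here for
illustration), the generator could be run from the Erlang
shell as:

```erlang
1> leex:file("erlang_scan", [dfa_graph, {verbose,true}]).
ok
2> c(erlang_scan).
{ok,erlang_scan}
```

This writes the scanner to erlang_scan.erl and, because of
the dfa_graph option, also writes erlang_scan.dot.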
GENERATED SCANNER EXPORTS
string(String) -> ScanRet
string(String, StartLine) -> ScanRet
Scan String and return all the tokens in it, or an error.
N.B. it is an error if not all of the characters in String
are consumed.
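For example, with a scanner generated from rules for
integers and whitespace (the hypothetical module erlang_scan
again), a call might look like:

```erlang
4> erlang_scan:string("4711").
{ok,[{integer,1,4711}],1}
```

The exact tokens returned depend entirely on the rules in
the definition file; the shape of the result is always
ScanRet as defined above.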
token(Cont, Chars) -> {more,Cont} | {done,ScanRet,RestChars}
token(Cont, Chars, StartLine) -> {more,Cont} | {done,ScanRet,RestChars}
This is a re-entrant call to try to scan one token from
Chars. If there are enough characters in Chars to either scan
a token or detect an error then this will be returned with
{done,...}. Otherwise {more,Cont} will be returned where Cont
is used in the next call to token with more characters to try
to scan the token. This is continued until a token has been
scanned. Cont is initially [].
It is not designed to be called directly by an application but
used through the i/o system, where it can typically be called
in an application by:
io:request(InFile, {get_until,Prompt,Module,token,[Line]})
-> ScanRet
tokens(Cont, Chars) -> {more,Cont} | {done,ScanRet,RestChars}
tokens(Cont, Chars, StartLine) -> {more,Cont} | {done,ScanRet,RestChars}
This is a re-entrant call to try to scan tokens from Chars.
If there are enough characters in Chars to either scan tokens
or detect an error then this will be returned with {done,...}.
Otherwise {more,Cont} will be returned where Cont is used in
the next call to tokens with more characters to try to scan
the tokens. This is continued until all tokens have been
scanned. Cont is initially [].
This function differs from token in that it will continue to
scan tokens up to and including an {end_token,Token} (see
next section). It will then return all the tokens. This is
typically used for scanning grammars like Erlang where there
is an explicit end token, '.'. If no end token is found then
the whole file will be scanned and returned. If an error
occurs then all tokens up to and including the next end token
will be skipped.
It is not designed to be called directly by an application but
used through the i/o system, where it can typically be called
in an application by:
io:request(InFile, {get_until,Prompt,Module,tokens,[Line]})
-> ScanRet
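Outside the i/o system, the re-entrant interface can also be
driven by hand. The following sketch (assuming the
hypothetical generated module erlang_scan, and assuming that
the atom eof may be passed as Chars to terminate the input)
feeds chunks of input to tokens/2 until a complete result is
available:

```erlang
%% Feed successive chunks of characters to the scanner.
%% Cont is initially []; each {more,Cont1} asks for more input.
scan_chunks(Cont, [Chunk|Chunks]) ->
    case erlang_scan:tokens(Cont, Chunk) of
        {more,Cont1} ->
            scan_chunks(Cont1, Chunks);
        {done,ScanRet,RestChars} ->
            {ScanRet,RestChars}
    end.
```

A call such as scan_chunks([], ["1 + ", "2. rest"]) would
return the scanned tokens together with the unconsumed
characters. This is only a sketch; a real driver must also
handle running out of chunks while the scanner still wants
more input.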
format_error(ErrorDescriptor) -> Chars
Types:
ErrorDescriptor = errordesc()
Chars = [char() | Chars]
Returns a string which describes the error ErrorDescriptor
returned when there is an error in a regular expression.
Input File Format
Erlang style comments starting with a '%' are allowed in
scanner files. A definition file has the following format:
<Header>
Definitions.
<Macro Definitions>
Rules.
<Token Rules>
Erlang Code.
<Erlang Code>
The "Definitions.", "Rules." and "Erlang Code." headings are
mandatory and must occur at the beginning of a source line.
The <Header>, <Macro Definitions> and <Erlang Code> sections
may be empty but there must be at least one rule.
Macro definitions have the following format:
NAME = VALUE
and there must be spaces around '='. Macros can be used in the
regular expressions of rules by writing {NAME}. N.B. when
macros are expanded in expressions the macro calls are
replaced by the macro value without any form of quoting or
enclosing in parentheses.
Rules have the following format:
<Regexp> : <Erlang code>.
The <Regexp> must occur at the start of a line and must not
include any blanks; use \t and \s to include TAB and SPACE
characters in the regular expression. If <Regexp> matches then the
corresponding <Erlang code> is evaluated to generate a
token. With the Erlang code the following predefined variables
are available:
TokenChars - a list of the characters in the matched token
TokenLen - the number of characters in the matched token
TokenLine - the line number where the token occurred
The code must return:
{token,Token} - return Token to the caller
{end_token,Token} - return Token; it is the last token in a
tokens call
skip_token - skip this token completely
{error,ErrString} - an error in the token, ErrString is a string
describing the error
It is also possible to push back characters into the input
characters with the following returns:
{token,Token,PushBackList}
{end_token,Token,PushBackList}
{skip_token,PushBackList}
These have the same meanings as the normal returns but the
characters in PushBackList will be prepended to the input
characters and scanned for the next token. Note that pushing
back a newline will mean the line numbering will no longer be
correct. N.B. careless pushing back of characters can make
the scanner loop forever!
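As an illustration (a hypothetical rule, not taken from the
manual), a rule could match an integer together with a
trailing '.' and push the '.' back so that it is scanned
separately as the next token:

```erlang
{D}+\. : {token,{integer,TokenLine,
          list_to_integer(string:substr(TokenChars, 1, TokenLen - 1))},
          "."}.
```

Here the pushed-back "." would then be matched by whatever
rule handles the end token.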
The following example would match a simple Erlang integer or
float and return a token which could be sent to the Erlang
parser:
D = [0-9]
{D}+ : {token,{integer,
TokenLine,
list_to_integer(TokenChars)}}.
{D}+\.{D}+((E|e)(\+|\-)?{D}+)? :
{token,{float,TokenLine,list_to_float(TokenChars)}}.
The Erlang code in the "Erlang Code." section is written into
the output file directly after the module declaration and
predefined exports declaration so it is possible to add extra
exports, define imports and other attributes which are then
visible in the whole file.
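Putting the pieces together, a complete minimal definition
file (a sketch illustrating the format, not taken from the
manual) could look like:

```erlang
Definitions.
D = [0-9]
WS = [\s\t\n]

Rules.
{D}+ : {token,{integer,TokenLine,list_to_integer(TokenChars)}}.
\+ : {token,{'+',TokenLine}}.
{WS}+ : skip_token.

Erlang Code.
%% Extra exports, imports and helper functions could go here.
```

Compiling this with leex:file/1 should yield a scanner whose
string/1 turns "1 + 2" into integer and '+' tokens, with the
whitespace skipped.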
Regular Expressions
The regular expressions allowed here are a subset of the set
found in egrep and in the AWK programming language, as defined
in the book, The AWK Programming Language, by A. V. Aho,
B. W. Kernighan, P. J. Weinberger. They are composed of the
following characters:
c
matches the non-metacharacter c.
\c
matches the escape sequence or literal character c.
.
matches any character.
^
matches the beginning of a string.
$
matches the end of a string.
[abc...]
character class, which matches any of the characters
abc... Character ranges are specified by a pair of
characters separated by a -.
[^abc...]
negated character class, which matches any character
except abc....
r1 | r2
alternation. It matches either r1 or r2.
r1r2
concatenation. It matches r1 and then r2.
r+
matches one or more rs.
r*
matches zero or more rs.
r?
matches zero or one rs.
(r)
grouping. It matches r.
The escape sequences allowed are the same as for Erlang strings:
\b
backspace
\f
form feed
\n
newline (line feed)
\r
carriage return
\t
tab
\e
escape
\v
vertical tab
\s
space
\d
delete
\ddd
the octal value ddd
\xhh
the hexadecimal value hh
\x{h...}
the hexadecimal value h...
\c
any other character literally, for example \\ for
backslash, \" for "
The following examples define Erlang data types:
Atoms [a-z][0-9a-zA-Z_]*
Variables [A-Z_][0-9a-zA-Z_]*
Floats (\+|-)?[0-9]+\.[0-9]+((E|e)(\+|-)?[0-9]+)?
N.B. Anchoring a regular expression with ^ and $ is not
implemented in the current version of leex and just generates
a parse error.
AUTHORS
Robert Virding - rvirding@gmail.com
Copyright © 2008,2009 Robert Virding