public
Description: A system for creating fast, reusable parsers
Homepage: http://www.reverberate.org/gazelle/
Clone URL: git://github.com/haberman/gazelle.git
haberman (author)
Sat Feb 21 18:43:54 -0800 2009
commit  e3d2566a0f805cc67b44f1f065d57eb99b9ed141
tree    3886bc67dd20482d058aa425eebdb90772ee9b0e
parent  338a003e541f73d9c22741174362447f56654d02
gazelle / ReleaseNotes
100644 141 lines (117 sloc) 7.037 kb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
 
Gazelle 0.4, released January 21, 2009 =========================================
 
  Overview: This release addresses all of the most pressing issues that were
  preventing Gazelle from being usable. The biggest gap in Gazelle's
  capabilities was a lack of whitespace handling; this feature has been added
  in this release. A number of other capabilities have been added; notably, the
  API can now return parse errors gracefully to the client application instead
  of doing assert(false) or exit(1).
 
  New Features:
  * Gazelle can now handle languages with whitespace that can appear in many
    places in the grammar. You create a terse specification of where the
    whitespace can appear using the @allow command. The whitespace becomes
    part of the parse tree, and therefore a round-trip to and from the parse
    tree is lossless.
  * Gazelle can return parse errors to the client application.
  * Line/column information is now available, in addition to the raw byte
    offset that was previously available. This makes it easier to print
    useful messages for things like parse errors.
  * There is support for resource limits on the parse stack (memory) and the
    lookahead buffer (memory). Input that attempts to exceed these limits
    will cause Gazelle to report the resource overage to the client.
  * The runtime properly supports both hard EOF (grammar reaches a state where
    it cannot accept any more characters) and soft EOF (the input reaches EOF,
    and the grammar was in a state where EOF was valid).
  * There is a buffering layer that can read a file sequentially but preserve
    data for currently unfinished terminals.
 
  Packaging:
  * There is now a "make install" target. This will install:
    - gzlc (the Gazelle compiler) into $PREFIX/bin, in a standalone binary that
      has Lua linked and all the Lua source embedded. This means the binary
      isn't dependent on being able to find a bunch of Lua source files via
      LUA_PATH.
    - gzlparse into $PREFIX/bin
    - headers into $PREFIX/include
    - libraries into $PREFIX/lib
  * The header files are now in a gazelle/ directory, so that they can be
    sanely installed in /usr/include on a UNIX system.
  * The C API now has its types, functions, and constants prefixed with gzl_
    or GZL_, to keep from polluting the global namespace with types like
    "terminal."
 
Gazelle 0.3, released November 30, 2008 =======================================
 
  Overview: This release represents a lot of work on Gazelle's grammar analysis
  and lookahead generation. What this means for users is that many many cases
  where Gazelle would previously hang or generate incorrect grammar have been
  fixed, and there are lots of unit tests to ensure these don't break in the
  future. There is also one new feature: explicit ambiguity resolution.
 
  Core Parsing Algorithm:
  * Gazelle now detects many errors in grammars:
    - left-recursion
    - rules that have no non-recursive alternative
    - grammars that are probably not LL(*) (using a heuristic)
  * Gazelle now supports many LL(*) grammars (that have cyclic GLA's) in
    addition to LL(k). This puts us on par with ANTLR.
  * It is no longer possible to make Gazelle hang by feeding it the wrong
    grammar. Since the problem of whether a grammar is LL(*) is undecidable
    in general, Gazelle gives you two options for how to bound the amount of
    work. You can either use a heuristic to detect cases where the grammar
    is probably not LL(*), or you can specify a bound on k.
  * The generated lookahead can now take EOF into account, which allows us to
    support grammars where we don't know what alternative to take until we see
    EOF.
 
  New Features:
  * Grammar files now support C-style comments: // and /* */.
  * Explicit ambiguity resolution allows grammar authors to specify priorities
    between alternatives that would otherwise be ambiguous. The need for this
    arises in cases like the famous "if-then-else" problem, which is inherently
    ambigous but has a rule for how this ambiguity should be resolved.
 
  Backward Incompatibilities:
  * All regular expressions that are defined within rules (as opposed to being
    named beforehand) must now be named within the rule. For example:
 
      a -> /foo/;
 
    ...is no longer allowed, you must write:
 
      a -> .foo=/foo/;
 
    It's important that regular expressions get a reasonable name. Also,
    unnamed regular expressions conflict with the new syntax for prioritized
    choice.
 
  Caveats:
  * Although the compiler supports EOF in lookahead, proper support has not
    been added to the runtime yet.
  * There's still no way to ignore whitespace yet. This will hopefully come
    in the next release.
 
Gazelle 0.2, released June 29, 2008 ===========================================
 
  Overview: This is a major overhaul of Gazelle's guts. It also has
  significant usability improvements in its documentation and command-line
  tools. It is the first release of Gazelle I would actually recommend
  that people try out.
 
  Core Parsing Algorithm:
  * The compiler now calculates true Strong-LL(k) lookahead. As a result,
    it supports far more grammars than Gazelle 0.1 did.
  * The runtime component now represents all of its state explicitly in its
    stack. As a result, parsing can be interrupted and resumed on arbitrary
    byte boundaries.
  * The "ignore" feature was completely removed, because experience and deeper
    thought revealed that it was not the right abstraction. As a result, this
    version of Gazelle has no way to deal with arbitrary whitespace. A
    replacement feature will come in the future to fill this gap.
 
  Usability:
  * There is a manual (though very much in-progress).
  * The compiler is easier to invoke and has useful --help.
  * The compiler is capable of dumping a visual representation of the grammar
    to HTML if graphviz is installed.
  * gzlparse can dump the parse tree into JSON format.
 
  Caveats (all of these will be fixed sooner or later):
  * The language and all APIs are still very much subject to change.
  * The current release cannot handle whitespace/comments/etc. in languages.
  * Parse errors will exit the program, instead of being reported to the API.
  * There are lots of edge cases the compiler doesn't properly deal with yet.
    Some will make the compiler go into an infinite loop, others will cause
    it to generate incorrect output.
 
Gazelle 0.1, released December 16, 2007 =======================================
 
  Overview: This is a preliminary release of Gazelle, and is not very usable.
  It can parse only a very small class of languages, and all tools/APIs are
  very rough around the edges.
 
  Core Parsing Algorithm:
  * The supported lookahead is weak: only Simple-LL(1) parsing.
  * The runtime is reasonably fast and has a very low memory footprint.
  * The compiler and runtime support loading and storing the compiled version
    of the grammar to bytecode.