public
Description: A system for creating fast, reusable parsers
Homepage: http://www.reverberate.org/gazelle/
Clone URL: git://github.com/haberman/gazelle.git
haberman (author)
Sat Feb 21 18:43:54 -0800 2009
commit  e3d2566a0f805cc67b44f1f065d57eb99b9ed141
tree    3886bc67dd20482d058aa425eebdb90772ee9b0e
parent  338a003e541f73d9c22741174362447f56654d02
gazelle / TODO
100644 50 lines (41 sloc) 2.166 kb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
 
This is a list of things I definitely want to do. I don't go into too many
specifics, because the specifics change often as I think about the problem
harder.
 
* What about GLAs with EOF out of the start state? If there is only one
  non-EOF transition, it *seems* pointless, because our normal EOF stack
  unwinding seems to cover it. On the other hand, verifying this is always
  true seems like a lot of hard thought. Let's think of it as an optimization
  and do it later.
 
Language:
* make semicolons optional at end of line.
 
Compiler / Grammar Analysis (all changes should have test-cases):
* make tostring() methods for all of the grammar objects.
* create Lua bindings for the C grammar representation, which we can use
  to write tests that verify that the grammar makes it through the bytecode
  step unchanged.
* (maybe) take regular subset of non-regular lookahead when it doesn't
  cause alternatives' languages to intersect.
* (maybe) support full-LL by having first states of GLA have edges that
  are the states on the return stack at runtime.
* detect cases where some RTN alternatives have no GLA final state.
  (shouldn't happen, modulo bugs).
* deal with lexer-level ambiguity. longest match will do for now, but
  we're not currently detecting s -> "A" | "AB";
 
Runtime:
* Richer callback specifiers.
* Bring back slotbufs: a cheap (stack only, no heap) way of saving parse_vals for
  the currently-open nodes of the parse tree.
* Provide a buffering layer.
* As the runtime starts to mature: language bindings.
 
Tests! Everything should have tests, as much as possible:
* all aspects of compilation
* bitcode format (both reading and writing)
* JSON output from gzlparse. Both well-formedness and accuracy.
 
Major design areas that exist only in my head:
* Character sets other than ASCII.
* Embedding Lua to do things only an imperative language can do.
* Operator-precedence parsing using the shunting yard algorithm.
* Parallel parsers (for both embedded languages and things like whitespace/comments)
* Error recovery (basically: just yield to an imperative function).
* Generate imperative bytecode / JIT compile.
* AST-building.