====================================
Implementation Notes and Experiences
====================================

The original GLL algorithm is quite dependent upon an unrestricted ``goto``
statement. In fact, the form of ``goto`` required by GLL is even unavailable in
C, forcing the original authors to implement a workaround of their own by using
a "big ``switch``" within the ``L0`` branch. Obviously, this algorithm is not
immediately amenable to implementation in a functional language, much less a
cleanly-separated implementation using combinators.

The critical observation which allows a ``goto``-less implementation of the
algorithm concerns the nature of the ``L0`` branch. Upon close examination of the
algorithm, it becomes apparent that ``L0`` can be viewed as a *trampoline*, a
concept which is quite common in functional programming as a way of implementing
stackless mutual tail-recursion. In the case of GLL, this trampoline function
must not only dispatch the various alternate productions (also represented as
functions) but also have some knowledge of the GSS and the dispatch queue itself.
In short, ``L0`` is a trampoline function with some additional smarts to deal
with divergent and convergent branches.

Once this observation is made, the rest of the implementation just falls into
place. Continuations (wrapped up in anonymous functions) can be used to satisfy
the functionality of an unrestricted ``goto``, assuming a trampoline function
as described above. Surprisingly, this scheme divides itself quite cleanly into
combinator-like constructs, further reinforcing the claim that GLL is just another
incarnation of recursive-descent.
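
To make the trampoline correspondence concrete, the following is a minimal sketch
of such a dispatch loop, assuming pending work is represented as bare
``() => Unit`` thunks; the names here are illustrative rather than the actual
combinator API::

  import scala.collection.mutable.Queue

  // A bare-bones trampoline: pending alternates are pushed as thunks and
  // then dispatched in a loop, which plays the role of the L0 block.  The
  // real trampoline additionally tracks the GSS and merges duplicate work.
  class Trampoline {
    private val queue = Queue[() => Unit]()

    def add(work: () => Unit): Unit = queue += work

    def run(): Unit =
      while (queue.nonEmpty) {
        val work = queue.dequeue()
        work()
      }
  }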


Bumps in the Road
=================

Computation of true PREDICT sets is impossible because no ``Parser`` instance
actually knows what its successor is. Thus, we cannot compute FOLLOW sets
without "stepping out" into the parent parser. To avoid this, we say that
whenever FIRST(a) = { }, PREDICT(a) = \Sigma. Less formally, if a parser goes
to \epsilon, then its (uncomputed) PREDICT set is satisfied by *any* input.
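
Sketched as a (hypothetical) helper, the check an alternate performs against the
lookahead might look like::

  // If FIRST is empty the alternate derives only the empty string, so its
  // PREDICT set is taken to be the entire alphabet; otherwise the lookahead
  // (if any) must be a member of FIRST.
  def predicts(first: Set[Char], lookahead: Option[Char]): Boolean =
    first.isEmpty || lookahead.exists(first.contains)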

Our GSS seems to be somewhat less effective than that of GLL due to the fact that
parallel sequential parsers with shared suffixes do not actually share state.
Thus, we could easily get the following situation in our GSS::

          C -- D -- F
         /
  A -- B
         \
          E -- D -- F

Notice that the ``D -- F`` suffix is shared, but because it is in separate parsers,
it will not be merged. Note, however, that if these two branches *reduce* to the
same value, that result will be merged. Alternatively, these branches may reduce
to differing values but eventually go to the same parser. When this happens, it
will be considered as a common prefix and merged accordingly (*not sure if this
is sound*).

Greedy vs lazy matching seems to be a problem. Consider the following grammar::

  A ::= 'a' A
     |

This grammar is actually quite ambiguous. The input string "``aaa``" may parse
as ``Success("", Stream('a', 'a', 'a'))``, ``Success("a", Stream('a', 'a'))``,
``Success("aa", Stream('a'))`` or ``Success("aaa", Stream())``. Obviously, this
is a problem. Or rather, this is a problem if we want to maintain PEG semantics.
In order to solve this problem, we need to define ``apply(...)`` for
``NonTerminalParser`` so that any ``Success`` with a ``tail != Stream()`` becomes
a ``Failure("Expected end of stream", tail)``.
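
A minimal sketch of this filtering, using stand-in ``Success``/``Failure`` case
classes rather than the library's actual result types::

  sealed trait Result[+A]
  case class Success[+A](value: A, tail: Stream[Char]) extends Result[A]
  case class Failure(msg: String, tail: Stream[Char]) extends Result[Nothing]

  // Demote any partial match at the outermost non-terminal: a Success
  // which leaves input behind becomes a Failure.
  def requireComplete[A](res: Result[A]): Result[A] = res match {
    case Success(_, tail) if !tail.isEmpty =>
      Failure("Expected end of stream", tail)
    case other => other
  }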

Parser equality is a very serious issue. Consider the following parser
declaration::

  def p: Parser[Any] = p | "a"

It would be nice to say that ``p == p'``, where ``p'`` is the "inner ``p``" of
the recursive case. Unfortunately, these are actually two distinct instances of
``DisjunctiveParser``. This means that we cannot simply check equality to avoid
infinite recursion.

To solve this, we need to get direct access to the ``p`` thunk and check its
*class* rather than its *instance*. To do this, we will use Java reflection to
access the field value without allowing the Scala compiler to transparently
invoke the thunk. Once we have this value, we can invoke ``getClass`` and quickly
perform the comparison. The only problem with this solution is that it forces all
of the thunk-uses to be logical constants. Thus, we cannot define a parser in the
following way::

  def p = make() | make()

  def make() = literal(Math.random.toString)

The ``DisjunctiveParser`` contained by ``p`` will consider both the left ``make()``
and the right ``make()`` to be exactly identical. Fortunately, we can safely
assume that grammars are constructed in a declarative fashion. The downside is
that when people *do* try something like this, the result will be fairly bizarre
from a user's standpoint.
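
Roughly, the class-based comparison can be sketched as follows, assuming the
thunk sits in a field named by ``fieldName`` (in reality the field name is
compiler-generated, a detail this sketch glosses over)::

  import java.lang.reflect.Field

  // Read the class of the thunk stored in the given field without allowing
  // Scala to transparently force the thunk itself.
  def thunkClass(parser: AnyRef, fieldName: String): Class[_] = {
    val field: Field = parser.getClass.getDeclaredField(fieldName)
    field.setAccessible(true)
    field.get(parser).getClass
  }

  // Two recursive references count as "the same parser" when their thunks
  // were generated from the same closure class.
  def sameThunk(p1: AnyRef, p2: AnyRef, fieldName: String): Boolean =
    thunkClass(p1, fieldName) == thunkClass(p2, fieldName)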

Another interesting issue is one which arises in conjunction with left-recursion.
Consider the following grammar::

  def p: Parser[Any] = p ~ "a" | "a"

This grammar is quite unambiguous (so long as the parse is greedy), but it will
still lead to non-terminating execution for an input of ``Stream('a')``. This is
because the parser will handle the single character using the second production
while simultaneously queueing up the first production rule against the untouched
stream (``Stream('a')``). This rule will in turn queue up two more parsers: the
first and second rules again. The second rule will immediately match, produce a
duplicate result and be discarded. However, the *first* rule will behave exactly
as it did before, queueing up two more parsers without consuming any of the stream.
Needless to say, this is a slight issue.

The solution here is that the second queueing of the first rule must lead to a
memoization of the relevant parse. The second pass over the second rule should
return that result through the second queueing, saving that result in ``popped``
and avoiding the divergence. Thus, left-recursive rules will go through *one* extra
queueing, but this extra step will be pruned as the successful parse will avoid
any additional repetition. Unfortunately, this solution is made more difficult
to implement due to the fact that disjunctive parsers are never themselves pushed
onto the dispatch queue. ``Trampoline`` does not know of any connection between
the first and second productions of a disjunction. It only knows that the two
separate productions have been pushed.
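
The bookkeeping implied by that memoization might look roughly like the following
sketch; the types and the keying are purely illustrative::

  import scala.collection.mutable.{HashMap, ListBuffer}

  // popped: results already produced for a given (parser, input) pair.
  // When the same pair is queued a second time, its earlier results are
  // replayed instead of re-running (and re-queueing) the parser.
  class PoppedTable[A] {
    private val popped = HashMap[(AnyRef, Stream[Char]), ListBuffer[A]]()

    def record(parser: AnyRef, in: Stream[Char], result: A): Unit =
      popped.getOrElseUpdate((parser, in), ListBuffer()) += result

    def replay(parser: AnyRef, in: Stream[Char]): List[A] =
      popped.get((parser, in)).map(_.toList).getOrElse(Nil)
  }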

To solve this problem in a practical way, we need to introduce another ``Parser``
subtype: ``ThunkParser``. This parser just delegates everything to its wrapped
parser with the exception of ``queue``, which it leaves abstract. This parser
is instantiated using an anonymous inner-class within ``DisjunctiveParser`` to
handle the details of queueing up the separate productions without "losing" the
disjunction itself.
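
In outline, and against a deliberately simplified ``Parser`` trait (the real
trait carries far more), the shape is something like::

  // A deliberately stripped-down Parser trait; the real trait carries more.
  trait Parser[+A] {
    def apply(in: Stream[Char]): List[A]
    def queue(dispatch: (() => Unit) => Unit, in: Stream[Char]): Unit
  }

  // Forwards everything to the wrapped parser except queue, which stays
  // abstract; DisjunctiveParser fills it in with an anonymous subclass so
  // that queueing an alternative never "loses" the disjunction itself.
  abstract class ThunkParser[+A](target: Parser[A]) extends Parser[A] {
    def apply(in: Stream[Char]): List[A] = target(in)
  }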

Another problem encountered while attempting to implement the trampoline is that
Scala's ``Stream`` implementation isn't quite what one would expect. In particular,
equality is defined on a reference basis, rather than logical value. Thus,
two streams which have the same contents may not necessarily be equivalent according
to ``equals(...)``. This isn't normally an issue, but it does cause problems
with the ``Seq#toStream`` method::

  "".toStream == "".toStream // => false!!

For non-left-recursive grammars, this will lead to duplicate results from the
parse. However, for left-recursive grammars, this could actually lead to
divergence. This isn't really a problem with GLL or the combinator implementation.
Rather, it is an issue with the Scala ``Stream`` implementation. To avoid this,
we must ensure that all input streams are created using ``Stream()``, ``Stream.cons``
and ``Stream.empty``.
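
For example, a small helper along these lines (hypothetical; any equivalent
construction works) keeps the input in the recommended form::

  // Build the input character by character with Stream.cons, terminating
  // with the shared Stream.empty, rather than going through Seq#toStream.
  def toInput(s: String): Stream[Char] =
    s.foldRight(Stream.empty[Char]) { (c, rest) => Stream.cons(c, rest) }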


From Recognizer to Parser
=========================

*TODO*