#### All Trees with Earley's Parser

Consider Earley's parser from the course notes:

In [7]:
def parse(g: "grammar", x: "input"):
    global s
    n = len(x); x = '^' + x + '$'; S, π = g[0][0], g[0][2:]
    s = [{(S, '', π, 0)}] + [set() for _ in range(n)]#; print('   s[ 0 ]:', S, '→ •', π, ', 0')
    for i in range(n + 1):
        v = set() # visited items
        while v != s[i]:
            e = (s[i] - v).pop(); v.add(e) # pick an arbitrary unvisited item
            A, σ, τ, j = e
            if len(τ) > 0 and τ[0] == x[i + 1]: # match, a == τ[0]
                f = (A, σ + τ[0], τ[1:], j)
                s[i + 1].add(f)#; print('M  s[', i + 1, ']:', f[0], '→', f[1], '•', f[2], ',', f[3])
            elif len(τ) > 0: # predict, B == τ[0]
                for f in ((r[0], '', r[2:], i) for r in g if r[0] == τ[0]):
                    s[i].add(f)#; print('P  s[', i, ']:', f[0], '→', f[1], '•', f[2], ',', f[3])
            else: # complete, len(τ) == 0
                for f in ((B, μ + ν[0], ν[1:], k) for (B, μ, ν, k) in s[j] if len(ν) > 0 and ν[0] == A):
                    s[i].add(f)#; print('C  s[', i, ']:', f[0], '→', f[1], '•', f[2], ',', f[3])
    return (S, π, '', 0) in s[n]

Both grammar `G0` and the ambiguous grammar `G1` accept the sentence `a+a+a`:

In [8]:
G0 = ("S→E", "E→a", "E→a+E")

In [9]:
assert parse(G0, "a+a+a")

In [10]:
G1 = ("S→E", "E→a", "E→E+E")

In [11]:
assert parse(G1, "a+a+a")

Modify Earley's parser to return the set of all parse trees instead of only reporting whether the input is accepted.

In [24]:
def parse(g: "grammar", x: "input"):
    global s
    n = len(x); x = '^' + x + '$'; S, π = g[0][0], g[0][2:]
    s = [{(S, '', π, 0)}] + [set() for _ in range(n)]
    for i in range(n + 1):
        v = set()
        while v != s[i]:
            e = (s[i] - v).pop(); v.add(e)
            A, σ, τ, j = e
            if len(τ) > 0 and τ[0] == x[i + 1]: # match, a == τ[0]
                f = (A, σ + τ[0], τ[1:], j)
                s[i + 1].add(f)
            elif len(τ) > 0: # predict, B == τ[0]
                for f in ((r[0], '', r[2:], i) for r in g if r[0] == τ[0]):
                    s[i].add(f)
            else: # complete, len(τ) == 0
                for f in ((B, μ + ν[0] + '(' + σ + ')', ν[1:], k) for (B, μ, ν, k) in s[j] if len(ν) > 0 and ν[0] == A):
                    s[i].add(f)

    return {π for (a, π, c, d) in s[n] if a == S and c == '' and d == 0}

A node `A` with children `x` and `y` can be represented by the string `A(xy)`. A simple way to construct parse trees is to build them inside the Earley items: in `(A → σ • ω, j)`, the sequence `σ` is the part of the right-hand side that has been recognized (starting at input position `j + 1`), so it can be replaced by the trees of that part. In the original parser, the completion step, upon recognizing `(A → σ •, j)`, adds `(B → μ A • ξ, k)` for each item `(B → μ • A ξ, k)` in `s[j]`. It is modified so that the new item carries the tree, `(B → μ A(σ) • ξ, k)`. The final set of items then contains items of the form `(S → π •, 0)`, where `π` is a tree for the whole input, and the set of all such `π` is returned. Only two lines need to be modified: the completion step and the `return` statement. Here are some test cases:

In [25]:
assert parse(G0, "a+a+a") == {'E(a+E(a+E(a)))'}

In [26]:
assert parse(G1, "a+a+a") == {'E(E(E(a)+E(a))+E(a))', 'E(E(a)+E(E(a)+E(a)))'}
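
The two trees for `G1` differ only in how the additions nest, which is hard to see in a flat string. As a minimal sketch (not part of the course notes), a tree string can be converted to nested lists for inspection. The helper name `to_nested` is hypothetical, and the code assumes that `(` and `)` never occur as grammar symbols:

```python
def to_nested(t):
    """Convert a tree string such as 'E(a+E(a))' into nested lists.

    A node A(...) becomes ['A', child, ...]; a terminal stays a
    one-character string. Assumes '(' and ')' are not grammar symbols.
    """
    def children(i):
        # Read sibling nodes from position i up to ')' or the end of t.
        nodes = []
        while i < len(t) and t[i] != ')':
            if i + 1 < len(t) and t[i + 1] == '(':  # label followed by a subtree
                label = t[i]
                sub, i = children(i + 2)            # i is now at the matching ')'
                nodes.append([label] + sub)
                i += 1                              # skip ')'
            else:                                   # terminal symbol
                nodes.append(t[i])
                i += 1
        return nodes, i

    nodes, _ = children(0)
    return nodes

# The two trees of G1 for a+a+a: left-nested and right-nested addition.
left = to_nested('E(E(E(a)+E(a))+E(a))')
right = to_nested('E(E(a)+E(E(a)+E(a)))')
```

The result is a list of top-level nodes, since `π` may consist of several symbols, as in the grammar `S → NP VP`.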

In [27]:
G3 = ("S→E", "E→T", "E→E+T", "T→F", "T→T×F", "F→a")

In [28]:
assert parse(G3, "a+a×a") == {'E(E(T(F(a)))+T(T(F(a))×F(a)))'}
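
Every returned tree should derive exactly the input. As a quick sanity check, the yield of a tree (its leaves read left to right) can be recovered by deleting the node labels and parentheses. This is a sketch with a hypothetical helper `tree_yield`; it assumes that `(` and `)` are not grammar symbols:

```python
import re

def tree_yield(t):
    """Return the terminal string of a tree written in A(xy) notation."""
    # Delete every label together with its '(' and delete every ')'.
    return re.sub(r'.\(|\)', '', t)

# Trees taken from the test cases above.
assert tree_yield('E(a+E(a+E(a)))') == 'a+a+a'
assert tree_yield('E(E(T(F(a)))+T(T(F(a))×F(a)))') == 'a+a×a'
```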

Recall the grammar from the chapter *Language and Syntax*:
```EBNF
S → NP VP
PP → P NP
NP → D N | D N PP | 'I'
VP → V NP | VP PP
D → 'an' | 'my'
N → 'elephant' | 'pajamas'
V → 'shot'
P → 'in'
```

Since the parser treats every symbol as a single character, we use `𝒮`, `𝒫`, `𝒩`, `𝒱` for `S` (sentence), `PP` (prepositional phrase), `NP` (noun phrase), `VP` (verb phrase). As the terminals of the grammar are characters rather than words, a space is appended to every word, including the last one:

In [29]:
G4 = ("𝒮→𝒩𝒱", "𝒫→P𝒩", "𝒩→DN", "𝒩→DN𝒫", "𝒩→I ", "𝒱→V𝒩", "𝒱→𝒱𝒫", \
      "D→an ", "D→my ", "N→elephant ", "N→pajamas ", "V→shot ", "P→in ")

In [30]:
assert parse(G4, "I shot an elephant in my pajamas ") == \
    {'𝒩(I )𝒱(V(shot )𝒩(D(an )N(elephant )𝒫(P(in )𝒩(D(my )N(pajamas )))))',
     '𝒩(I )𝒱(𝒱(V(shot )𝒩(D(an )N(elephant )))𝒫(P(in )𝒩(D(my )N(pajamas ))))'}
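
In the first tree the prepositional phrase attaches to the noun phrase (the elephant is in the pajamas); in the second it attaches to the verb phrase (the shooting happens in pajamas).

Writing the encoding used for `G4` by hand is error-prone, for instance when a trailing space is forgotten. As a sketch, the hypothetical helpers `lexical` and `encode` below (not part of the course notes) build the word rules and the input string in that encoding:

```python
def lexical(nonterminal, words):
    """Return one rule 'X→word ' per word, each word followed by a space."""
    assert len(nonterminal) == 1, "the parser expects single-character symbols"
    return tuple(nonterminal + '→' + w + ' ' for w in words)

def encode(sentence):
    """Append a space to every word, including the last one."""
    return ''.join(w + ' ' for w in sentence.split())

# The word rules of G4, built from word lists.
words = lexical('D', ['an', 'my']) + lexical('N', ['elephant', 'pajamas']) \
      + lexical('V', ['shot']) + lexical('P', ['in'])
assert words == ("D→an ", "D→my ", "N→elephant ", "N→pajamas ", "V→shot ", "P→in ")
assert encode("I shot an elephant in my pajamas") == "I shot an elephant in my pajamas "
```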