### Grammar for #a ≥ #b (NLTK)

As a reminder, the grammar `G = (T, N, P, S)` for words with equally many `a`'s and `b`'s, with `T = {a, b}`, `N = {S}`, and productions `P`

    S → ε
    S → aSbS
    S → bSaS

is written in NLTK as follows. Productions for the same nonterminal can be combined on one line with `|` or written on separate lines, and `ε` is written as an empty right-hand side:

```python
import nltk

grammar = nltk.CFG.fromstring("""
S -> 'a' S 'b' S | 'b' S 'a' S |
""")
parser = nltk.ChartParser(grammar)
# parse() returns an iterator over all parse trees of the token list
trees = list(parser.parse(['a', 'b', 'a', 'b']))
for t in trees:
    print(t)
```

```
(S a (S b (S ) a (S )) b (S ))
(S a (S ) b (S a (S ) b (S )))
```


The start symbol is the left-hand side of the first production. The call `parser.parse(sent)` returns an iterator over parse trees, which above is collected into a list; if `sent` cannot be parsed, the iterator yields nothing and the list is empty. Since every terminal here is a single character, the token list `['a', 'b', 'a', 'b']` can be abbreviated to the string `'abab'`, which Python iterates character by character:

```python
assert list(parser.parse('abab')) != []
assert list(parser.parse('aba')) == []
```
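Each parse tree corresponds to exactly one leftmost derivation, so the two trees printed for `abab` amount to two different leftmost derivations of the same word, which is what makes this grammar ambiguous. The sketch below replays both derivations by plain string rewriting, without NLTK. The helper `derive_leftmost` is a hypothetical illustration; it assumes `S` is the only nonterminal and uses `''` for `ε`.

```python
def derive_leftmost(steps: list[str]) -> list[str]:
    """Starting from 'S', replace the leftmost S by each right-hand side in turn.

    Assumes S is the only nonterminal; '' stands for the ε-production.
    Returns every sentential form, ending with the derived word.
    """
    form = 'S'
    forms = [form]
    for rhs in steps:
        i = form.index('S')  # position of the leftmost nonterminal
        form = form[:i] + rhs + form[i + 1:]
        forms.append(form)
    return forms

# First tree:  (S a (S b (S ) a (S )) b (S ))
first = derive_leftmost(['aSbS', 'bSaS', '', '', ''])
# Second tree: (S a (S ) b (S a (S ) b (S )))
second = derive_leftmost(['aSbS', '', 'aSbS', '', ''])

print(' ⇒ '.join(first))   # S ⇒ aSbS ⇒ abSaSbS ⇒ abaSbS ⇒ ababS ⇒ abab
print(' ⇒ '.join(second))  # S ⇒ aSbS ⇒ abS ⇒ abaSbS ⇒ ababS ⇒ abab
```

Both derivations end in `abab` but differ from the third step on, mirroring the two trees.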

---

Using NLTK, write a grammar for the language over `a` and `b` of words with at least as many `a`'s as `b`'s, formally `{w ∈ Σ* | a#w ≥ b#w}` where `Σ = {a, b}` and `a#w` is the number of `a`'s in `w`!
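The set-builder definition translates directly into a membership predicate. The sketch below is plain Python, and the names `in_language` and `words_up_to` are hypothetical. It enumerates all words up to length 2 and splits them by the predicate, which gives the expected outcome of each test case of that length.

```python
from itertools import product

def in_language(word: str) -> bool:
    """True iff word has at least as many 'a's as 'b's (a#w >= b#w)."""
    return word.count('a') >= word.count('b')

def words_up_to(max_len: int):
    """Yield every word over {a, b} of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for letters in product('ab', repeat=n):
            yield ''.join(letters)

accepted = [w for w in words_up_to(2) if in_language(w)]
rejected = [w for w in words_up_to(2) if not in_language(w)]
print(accepted)  # ['', 'a', 'aa', 'ab', 'ba']
print(rejected)  # ['b', 'bb']
```

Raising the bound to 3 or 4 gives the expected outcomes for the longer test cells as well.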

```python
import nltk

# S -> aS | aSbS | bSaS | ε
# aSbS | bSaS | ε generates the words with #a = #b (as in the reminder);
# S -> aS adds the surplus a's.
grammar = nltk.CFG.fromstring("""
S -> 'a' S | 'a' S 'b' S | 'b' S 'a' S |
""")
parser = nltk.ChartParser(grammar)
```

You can insert cells to inspect the trees NLTK generates; the grammar is allowed to be ambiguous. Here are some tests:

```python
assert list(parser.parse('')) != []
assert list(parser.parse('a')) != []
assert list(parser.parse('b')) == []
assert list(parser.parse('aa')) != []
assert list(parser.parse('ab')) != []
assert list(parser.parse('ba')) != []
assert list(parser.parse('bb')) == []
```

```python
assert list(parser.parse('aaa')) != []
assert list(parser.parse('aab')) != []
assert list(parser.parse('aba')) != []
assert list(parser.parse('baa')) != []
assert list(parser.parse('bba')) == []
assert list(parser.parse('bab')) == []
assert list(parser.parse('abb')) == []
```

```python
assert list(parser.parse('abab')) != []
assert list(parser.parse('baab')) != []
assert list(parser.parse('abba')) != []
assert list(parser.parse('aaba')) != []
assert list(parser.parse('bbba')) == []
```
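Passing tests up to length 4 does not show that a grammar covers the whole language. A stronger check, sketched below without NLTK, enumerates every word a grammar derives up to a length bound and compares the result with all words satisfying `a#w ≥ b#w`. The helper `generate` and the dictionary encodings are hypothetical illustrations: `generate` runs a breadth-first search over leftmost derivations and prunes sentential forms with too many terminals, which terminates for these grammars because every step that adds a nonterminal also emits terminals. In this sketch, the grammar `S → bSa | aSb | aS | Sba | Sab | baS | abS | ε` misses `aabbbbaa`: the word starts and ends with `aa`, so only `S → aS` applies, leaving `abbbbaa`, which has more `b`'s than `a`'s. The reminder grammar for `#a = #b` extended with `S → aS` matches the predicate on every word up to length 8.

```python
from collections import deque
from itertools import product

def generate(rules, start, max_len):
    """Return every word of length <= max_len derivable from start.

    rules maps each nonterminal to a list of right-hand sides (tuples);
    any symbol that is not a key of rules is a terminal. Forms with more
    than max_len terminals are pruned, since terminals are never erased.
    """
    words = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # index of the leftmost nonterminal, or None if form is a word
        pos = next((i for i, sym in enumerate(form) if sym in rules), None)
        if pos is None:
            words.add(''.join(form))
            continue
        for rhs in rules[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            if sum(sym not in rules for sym in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words

# Hypothetical encodings of the two grammars; () is the ε right-hand side.
SEVEN_RULES = {'S': [('b', 'S', 'a'), ('a', 'S', 'b'), ('a', 'S'),
                     ('S', 'b', 'a'), ('S', 'a', 'b'),
                     ('b', 'a', 'S'), ('a', 'b', 'S'), ()]}
BALANCED_PLUS_A = {'S': [('a', 'S'), ('a', 'S', 'b', 'S'),
                         ('b', 'S', 'a', 'S'), ()]}

MAX_LEN = 8
# All words of length <= MAX_LEN with at least as many a's as b's
expected = {''.join(p) for n in range(MAX_LEN + 1)
            for p in product('ab', repeat=n) if p.count('a') >= p.count('b')}

print(generate(BALANCED_PLUS_A, 'S', MAX_LEN) == expected)  # True
print('aabbbbaa' in expected - generate(SEVEN_RULES, 'S', MAX_LEN))  # True
```

The search only compares finite prefixes of the languages, so it can expose a missing word but cannot by itself prove a grammar correct for all lengths.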