#### NLTK

This question uses the Python Natural Language Toolkit (NLTK). If needed, install it with `pip3 install nltk` or `pip install nltk`. If you have several versions of Python installed, use `python3 -m pip install nltk` so that the package is installed for the interpreter you actually run. The following example is from the [NLTK Book](https://www.nltk.org/book/ch08.html). It illustrates the structural ambiguity of the sentence:

    I shot an elephant in my pajamas

This line is from the Groucho Marx movie _Animal Crackers_ (1930): "While hunting in Africa, I shot an elephant in my pajamas. How he got into my pajamas, I don't know." First, we define a grammar just large enough to exhibit the ambiguity, and then we create a parser for it:

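```python
import nltk

# Grammar from the NLTK Book, ch. 8. Quoted symbols are terminals (words).
# Note: there is no trailing '|' after 'I'. An empty alternative there would add NP -> ε.
groucho_grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas'
V -> 'shot'
P -> 'in'
""")
# A chart parser handles any CFG, including left-recursive rules such as VP -> VP PP.
parser = nltk.ChartParser(groucho_grammar)
```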

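`nltk.ChartParser` accepts an arbitrary context-free grammar and builds a parser for it. The parser's `parse` method takes a list of tokens and returns an iterator over all parse trees of that token sequence. Wrapping it in `list(...)` collects them: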

```python
trees = list(parser.parse(['I', 'shot', 'an', 'elephant', 'in', 'my', 'pajamas']))
for t in trees:
    print(t)
```

```
(S
  (NP I)
  (VP
    (VP (V shot) (NP (Det an) (N elephant)))
    (PP (P in) (NP (Det my) (N pajamas)))))
(S
  (NP I)
  (VP
    (V shot)
    (NP (Det an) (N elephant) (PP (P in) (NP (Det my) (N pajamas))))))
```


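The output shows two parse trees, printed as indented bracketed expressions. In the first tree, the prepositional phrase `in my pajamas` attaches to the verb phrase, so the shooting happened in pajamas. In the second tree, it attaches to the noun phrase `an elephant`, so the elephant was in the pajamas. The trees can also be drawn graphically or as text art. The text rendering may look misaligned unless the font is monospaced: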

```python
# trees[0]                # draws graphically inline; works only locally, not on JupyterHub
# trees[0].draw()         # draws graphically in a separate window; works only locally, not on JupyterHub
trees[0].pretty_print()   # draws as text art; needs a monospaced font and can sometimes be confusing
# trees[0].pprint()       # prints the bracketed form, same as print(...)
```

```
     S
  ___|______________
 |                  VP
 |         _________|__________
 |        VP                   PP
 |    ____|___              ___|___
 |   |        NP           |       NP
 |   |     ___|_____       |    ___|_____
 NP  V   Det        N      P  Det        N
 |   |    |         |      |   |         |
 I  shot  an     elephant  in  my     pajamas
```



```python
trees[1].pretty_print()
```

```
     S
  ___|__________
 |              VP
 |    __________|______
 |   |                 NP
 |   |     ____________|___
 |   |    |     |          PP
 |   |    |     |       ___|___
 |   |    |     |      |       NP
 |   |    |     |      |    ___|_____
 NP  V   Det    N      P  Det        N
 |   |    |     |      |   |         |
 I  shot  an elephant  in  my     pajamas
```

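The following sketch shows why the chart parser finds exactly these two trees. It is written from scratch and is not NLTK's algorithm. It enumerates every parse tree of the NLTK Book grammar by trying all ways to split the sentence among the symbols of each rule. The sketch assumes the grammar has no empty productions, so every symbol covers at least one word. That assumption is what keeps the left-recursive rule `VP -> VP PP` from recursing forever. The names `GRAMMAR` and `all_parses` are illustrative only.

```python
from functools import lru_cache

# The NLTK Book grammar as a plain dict: nonterminal -> list of right-hand sides.
# A symbol that is not a key is a terminal (a word).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "PP": [["P", "NP"]],
    "NP": [["Det", "N"], ["Det", "N", "PP"], ["I"]],
    "VP": [["V", "NP"], ["VP", "PP"]],
    "Det": [["an"], ["my"]],
    "N": [["elephant"], ["pajamas"]],
    "V": [["shot"]],
    "P": [["in"]],
}


def all_parses(tokens, start="S"):
    """Return every parse tree of `tokens` as a bracketed string."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees_for(symbol, i, j):
        # All trees rooted at `symbol` that cover exactly tokens[i:j].
        if symbol not in GRAMMAR:  # terminal: must match exactly one token
            return (symbol,) if j == i + 1 and tokens[i] == symbol else ()
        found = []
        for rhs in GRAMMAR[symbol]:
            for children in splits(tuple(rhs), i, j):
                found.append(f"({symbol} {' '.join(children)})")
        return tuple(found)

    @lru_cache(maxsize=None)
    def splits(rhs, i, j):
        # All ways to cover tokens[i:j] with the symbols of `rhs`, in order.
        if not rhs:
            return ((),) if i == j else ()
        found = []
        # The first symbol takes tokens[i:k]; every later symbol needs >= 1 token.
        for k in range(i + 1, j - len(rhs) + 2):
            for head in trees_for(rhs[0], i, k):
                for tail in splits(rhs[1:], k, j):
                    found.append((head,) + tail)
        return tuple(found)

    return list(trees_for(start, 0, len(tokens)))


for tree in all_parses("I shot an elephant in my pajamas".split()):
    print(tree)
```

The sketch caches results per (symbol, span), so each sub-result is computed only once and then reused. That is the same basic idea behind a chart. The two strings it prints correspond to the two attachments of `in my pajamas`.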


#### Part 1

Let `G = (T, N, P, S)` where `T = {a, b}`, `N = {S}`, and productions `P` are:

    S → ε
    S → aSbS
    S → bSaS

Draw all parse trees for the sentence `abab` with NLTK!

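```python
import nltk

# S -> ε is written as the empty alternative after the final '|'.
# A and B are helper nonterminals that wrap the terminals 'a' and 'b'.
g_grammar = nltk.CFG.fromstring("""
S -> A S B S | B S A S |
A -> 'a'
B -> 'b'
""")
parser = nltk.ChartParser(g_grammar)
```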

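The empty production S → ε has no symbols to write, so the grammar cell above expresses it as an empty alternative after the last `|`. The helper `read_rules` below is hypothetical and not part of NLTK. It only imitates the `|` convention, to show why a trailing `|` yields an ε-production. The same convention means that an accidental trailing `|` on any rule silently adds an empty production.

```python
def read_rules(text):
    """Hypothetical helper: parse lines of the form 'LHS -> alt | alt | ...'.

    Only the '|' convention is imitated here; quoted terminals are kept
    as plain tokens such as "'a'" and nothing is validated.
    """
    rules = {}
    for line in text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines
        lhs, rhs = line.split("->", 1)
        # Splitting on '|' turns a trailing '|' into an empty alternative: S -> ε.
        alternatives = [alt.split() for alt in rhs.split("|")]
        rules.setdefault(lhs.strip(), []).extend(alternatives)
    return rules


print(read_rules("""
S -> A S B S | B S A S |
A -> 'a'
B -> 'b'
"""))
```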
```python
trees = list(parser.parse(['a', 'b', 'a', 'b']))
for t in trees:
    print(t)
```

```
(S (A a) (S ) (B b) (S (A a) (S ) (B b) (S )))
(S (A a) (S (B b) (S ) (A a) (S )) (B b) (S ))
```

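```python
trees[0].pretty_print()
trees[1].pretty_print()
```

```
             S
  ___________|___
 |   |   |       S
 |   |   |    ___|_______
 A   S   B   A   S   B   S
 |   |   |   |   |   |   |
 a  ...  b   a  ...  b  ...

             S
  ___________|___________
 |       S           |   |
 |    ___|_______    |   |
 A   B   S   A   S   B   S
 |   |   |   |   |   |   |
 a   b  ...  a  ...  b  ...
```

In these drawings, `...` marks an empty subtree, that is, an application of S → ε, which is printed as `(S )` in the bracketed output. The A and B nodes come from the helper rules `A -> 'a'` and `B -> 'b'`, which G itself does not have.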

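The following from-scratch enumerator for G cross-checks that these are all the parse trees of `abab`. The name `parses_g` is illustrative and is not an NLTK function. The enumerator writes the terminals directly into the S nodes, so its trees contain only the nonterminal S, exactly as in G. The reasoning is as follows:

- A nonempty span must start with `a` or `b`, and that first symbol fixes the rule (aSbS or bSaS).
- The only remaining choice is the position of the rule's second terminal.
- Each such choice splits the rest of the span into two shorter spans.

```python
from functools import lru_cache


def parses_g(word):
    """Return all parse trees of `word` in G: S -> ε | aSbS | bSaS."""

    @lru_cache(maxsize=None)
    def s_trees(i, j):
        # All S-trees covering word[i:j], as bracketed strings.
        if i == j:
            return ("(S )",)  # only S -> ε derives the empty string
        first = word[i]  # decides the rule: aSbS or bSaS
        if first not in ("a", "b"):
            return ()  # not a terminal of G
        second = "b" if first == "a" else "a"
        found = []
        # Choose the position k of the rule's second terminal.
        for k in range(i + 1, j):
            if word[k] == second:
                for inner in s_trees(i + 1, k):
                    for rest in s_trees(k + 1, j):
                        found.append(f"(S {first} {inner} {second} {rest})")
        return tuple(found)

    return list(s_trees(0, len(word)))


for tree in parses_g("abab"):
    print(tree)
```

This prints `(S a (S ) b (S a (S ) b (S )))` and `(S a (S b (S ) a (S )) b (S ))`. These are the two NLTK trees above with the helper nodes A and B removed.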


#### Part 2

Draw the parse tree of `id × (id + id)` in grammar `G₈` using NLTK!

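```python
import nltk

# G8: E for expressions, T for terms, F for factors.
# The token 'x' stands for the multiplication sign ×.
g8_grammar = nltk.CFG.fromstring("""
E -> T | E '+' T
T -> F | T 'x' F
F -> 'id' | '(' E ')'
""")
parser = nltk.ChartParser(g8_grammar)
```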

```python
trees = list(parser.parse(['id', 'x', '(', 'id', '+', 'id', ')']))
for t in trees:
    print(t)
```

```
(E (T (T (F id)) x (F ( (E (E (T (F id))) + (T (F id))) ))))
```


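```python
trees[0].pretty_print()
```

```
             E
             |
             T
  ___________|___
 |   |           F
 |   |    _______|___
 |   |   |   |       E
 |   |   |   |    ___|___
 |   |   |   |   |   E   |
 |   |   |   |   |   |   |
 |   T   |   |   |   T   T
 |   |   |   |   |   |   |
 |   F   |   |   |   F   F
 |   |   |   |   |   |   |
 x   id  (   )   +   id  id
```

Here `pretty_print` draws the terminal children `x`, `(`, `)` and `+` to the left of their nonterminal siblings, so the leaves read `x id ( ) + id id`. The bracketed output above gives the actual order of the children; its leaves read `id x ( id + id )`.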

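NLTK returns a single tree here. G₈ encodes operator precedence (× binds tighter than +) and left associativity, which removes the ambiguity. The same tree can be built by hand with recursive descent, with one caveat: the rules `E -> E '+' T` and `T -> T 'x' F` are left-recursive. A naive recursive-descent parser would call itself forever on them. The sketch below uses the usual workaround: it parses the repetitions with a loop but still nests each new node to the left, so the result has the shape of the left-recursive grammar. The function names are illustrative. The sketch assumes the input is already split into tokens, with `x` standing for ×.

```python
def parse_g8(tokens):
    """Recursive-descent sketch for G8; returns the parse tree as a bracketed string.

    Grammar: E -> T | E '+' T,  T -> F | T 'x' F,  F -> 'id' | '(' E ')'.
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise ValueError(f"expected {tok!r} at position {pos}, got {peek()!r}")
        pos += 1

    def parse_e():
        tree = f"(E {parse_t()})"              # E -> T
        while peek() == "+":                   # E -> E '+' T, nested to the left
            expect("+")
            tree = f"(E {tree} + {parse_t()})"
        return tree

    def parse_t():
        tree = f"(T {parse_f()})"              # T -> F
        while peek() == "x":                   # T -> T 'x' F, nested to the left
            expect("x")
            tree = f"(T {tree} x {parse_f()})"
        return tree

    def parse_f():
        if peek() == "id":                     # F -> 'id'
            expect("id")
            return "(F id)"
        expect("(")                            # F -> '(' E ')'
        inner = parse_e()
        expect(")")
        return f"(F ( {inner} ))"

    tree = parse_e()
    if pos != len(tokens):
        raise ValueError(f"unexpected {peek()!r} at position {pos}")
    return tree


print(parse_g8(["id", "x", "(", "id", "+", "id", ")"]))
```

For the input above, this should print the same bracketed tree as NLTK. Each loop iteration wraps the tree built so far as the left child. That is why `id + id + id` comes out as `(id + id) + id`, matching the left associativity of G₈.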
