# Chapter 8

In [1]:
import nltk, re

In [2]:
# Grammar for the ambiguous sentence:
# "I shot an elephant in my pajamas."
groucho_grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas'
V -> 'shot'
P -> 'in'
""")

In [3]:
# Produces two trees:
# 1. "I" shot an elephant while "I" was wearing my pajamas (the PP attaches to the VP)
# 2. "I" shot an elephant that was in my pajamas (the PP attaches to the NP "an elephant")
sent = ['I', 'shot', 'an', 'elephant', 'in', 'my', 'pajamas']
parser = nltk.ChartParser(groucho_grammar)
for tree in parser.parse(sent):
    print(tree)

(S
  (NP I)
  (VP
    (VP (V shot) (NP (Det an) (N elephant)))
    (PP (P in) (NP (Det my) (N pajamas)))))
(S
  (NP I)
  (VP
    (V shot)
    (NP (Det an) (N elephant) (PP (P in) (NP (Det my) (N pajamas))))))
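The two trees differ only in where *in my pajamas* attaches: `VP -> VP PP` attaches it to the verb phrase, and `NP -> Det N PP` attaches it to the noun phrase. As a quick check, here is a sketch using a hypothetical variant of the grammar (`vp_only_grammar`, not part of the original notes) with the NP-attachment rule dropped. Only the VP-attachment reading should remain:

```python
import nltk

# Hypothetical variant of groucho_grammar with NP -> Det N PP removed,
# so "in my pajamas" can only attach to the verb phrase via VP -> VP PP.
vp_only_grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | 'I'
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas'
V -> 'shot'
P -> 'in'
""")

sent = ['I', 'shot', 'an', 'elephant', 'in', 'my', 'pajamas']
trees = list(nltk.ChartParser(vp_only_grammar).parse(sent))
print(len(trees), "parse(s)")
for tree in trees:
    print(tree)
```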


In [8]:
# Open each parse tree in its own Tkinter window (requires a graphical display)
for tree in parser.parse(sent):
    tree.draw()

### Beyond N-Grams, Grammar

Two noun phrases can be coordinated (joined with *and*) to form a larger noun phrase, and the same holds for two adjective phrases. A noun phrase cannot be coordinated with an adjective phrase. Examples:

* The book's ending was (NP the worst part and the best part) for me.
* On land they are (AP slow and clumsy looking).
* ~~the worst part and clumsy looking~~ (ungrammatical: an NP coordinated with an AP)
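One way to encode this restriction in a CFG is to let each coordination rule join two constituents of the same category. The grammar below is a hypothetical toy (words and categories chosen only for these examples, with *clumsy looking* simplified to *clumsy*). It assumes `NP -> NP Conj NP` and `AP -> AP Conj AP`. A chart parser should then accept the like-category coordinations and reject the mixed one:

```python
import nltk

# Toy grammar (hypothetical, simplified from the examples above): each
# coordination rule joins two constituents of the SAME category.
coord_grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> Cop NP | Cop AP
NP -> NP Conj NP | Det Adj N | Pro
AP -> AP Conj AP | Adj
Det -> 'the'
Adj -> 'worst' | 'best' | 'slow' | 'clumsy'
N -> 'part'
Pro -> 'it' | 'they'
Cop -> 'was' | 'are'
Conj -> 'and'
""")

parser = nltk.ChartParser(coord_grammar)
counts = {}
for text in ["it was the worst part and the best part",  # NP and NP
             "they are slow and clumsy",                  # AP and AP
             "it was the worst part and clumsy"]:         # NP and AP (mixed)
    counts[text] = len(list(parser.parse(text.split())))
    print(counts[text], "parse(s):", text)
```

Because no rule combines an `NP` with an `AP`, the mixed sentence gets zero parses rather than a parse error.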

### Constituent Structure
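Constituent structure rests on the observation that words combine with other words to form units (constituents). The evidence that a sequence of words forms a unit is substitutability: in a well-formed sentence, a constituent can be replaced by a shorter sequence, ultimately a single word, without making the sentence ill-formed.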


### How a Sentence Can Be Reduced

The little bear saw the fine fat trout in the brook.


<img src="data/ic_diagram_labeled.png" height=500 width=800>
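Substitutability can be checked mechanically. The sketch below uses a hypothetical toy grammar. The `Nom` category (for "adjectives + noun") and the pronoun rules are illustrative assumptions, not from the original notes. A chart parser confirms that the full sentence and shorter versions, where whole constituents are replaced by single words, are all accepted, while a scrambled word order is not:

```python
import nltk

# Toy grammar (hypothetical): Nom covers "Adj* N", Pro covers the pronouns
# used as one-word substitutes for whole noun phrases.
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det Nom | Det Nom PP | Pro
Nom -> Adj Nom | N
VP -> V NP | V NP PP | V
PP -> P NP
Det -> 'the'
Adj -> 'little' | 'fine' | 'fat'
N -> 'bear' | 'trout' | 'brook'
Pro -> 'he' | 'it'
V -> 'saw' | 'ran'
P -> 'in'
""")
parser = nltk.ChartParser(grammar)

candidates = [
    "the little bear saw the fine fat trout in the brook",
    "the bear saw the trout in the brook",   # adjectives removed
    "he saw it in the brook",                # NPs replaced by pronouns
    "he ran",                                # whole VP replaced by one verb
    "saw bear the little",                   # scrambled word order
]
accepted = {}
for text in candidates:
    accepted[text] = len(list(parser.parse(text.split()))) > 0
    print(accepted[text], text)
```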

<table>
<tbody valign="top">
<tr><td>S</td>
<td>sentence</td>
<td><span class="example">the man walked</span></td>
</tr>
<tr><td>NP</td>
<td>noun phrase</td>
<td><span class="example">a dog</span></td>
</tr>
<tr><td>VP</td>
<td>verb phrase</td>
<td><span class="example">saw a park</span></td>
</tr>
<tr><td>PP</td>
<td>prepositional phrase</td>
<td><span class="example">with a telescope</span></td>
</tr>
<tr><td>Det</td>
<td>determiner</td>
<td><span class="example">the</span></td>
</tr>
<tr><td>N</td>
<td>noun</td>
<td><span class="example">dog</span></td>
</tr>
<tr><td>V</td>
<td>verb</td>
<td><span class="example">walked</span></td>
</tr>
<tr><td>P</td>
<td>preposition</td>
<td><span class="example">in</span></td>
</tr>
</tbody>
</table>
<p class="caption"><span class="caption-label">Table 3.1</span>: Syntactic Categories</p>

In [8]:
grammar1 = nltk.CFG.fromstring("""
  S -> NP VP
  VP -> V NP | V NP PP
  PP -> P NP
  V -> "saw" | "ate" | "walked"
  NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
  Det -> "a" | "an" | "the" | "my"
  N -> "man" | "dog" | "cat" | "telescope" | "park"
  P -> "in" | "on" | "by" | "with"
  """)
sent = "Mary saw Bob".split()
rd_parser = nltk.RecursiveDescentParser(grammar1)
for tree in rd_parser.parse(sent):
    print(tree)

(S (NP Mary) (VP (V saw) (NP Bob)))
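`grammar1` has the same kind of ambiguity as the Groucho grammar. A PP after the object can attach to the verb phrase (`VP -> V NP PP`) or to the noun phrase (`NP -> Det N PP`). Parsing a sentence with a trailing PP should therefore give two trees. The sentence *the dog saw a man in the park* is an example chosen to exercise both rules, and `grammar1` is redefined so the cell runs on its own:

```python
import nltk

grammar1 = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP | V NP PP
PP -> P NP
V -> "saw" | "ate" | "walked"
NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
Det -> "a" | "an" | "the" | "my"
N -> "man" | "dog" | "cat" | "telescope" | "park"
P -> "in" | "on" | "by" | "with"
""")

sent = "the dog saw a man in the park".split()
rd_parser = nltk.RecursiveDescentParser(grammar1)
trees = list(rd_parser.parse(sent))
print(len(trees), "parse(s)")
for tree in trees:
    print(tree)
```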


## Three Parsers

### Recursive Descent
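* Top-down parser
* Starts from the start symbol `S` and recursively expands goals (e.g. `S -> NP VP`, then `NP -> Det N`), matching the leftmost goal against the next input word and backtracking when a match fails
* Cannot handle left-recursive productions of the form `X -> X Y`: expanding `X` immediately re-expands `X` without consuming any input, so it loops forever
* Can return multiple trees if the sentence is structurally ambiguous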
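The top-down strategy can be sketched in plain Python. This is a minimal illustration, not NLTK's implementation. `GRAMMAR` is a hypothetical dict that maps each non-terminal to its alternatives. The parser expands the leftmost goal first and backtracks over alternatives through generators. Adding a left-recursive alternative such as `["NP", "PP"]` under `"NP"` would make `parse("NP", ...)` call itself at the same position without consuming a word. It would never terminate (in this sketch, Python eventually raises `RecursionError`):

```python
# Minimal top-down recursive descent parser with backtracking (illustrative sketch).
# GRAMMAR is a hypothetical toy grammar: non-terminal -> list of right-hand sides.
# Any symbol that is not a key of GRAMMAR is treated as a terminal word.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "N", "PP"], ["Mary"], ["Bob"]],
    "VP": [["V", "NP"], ["V", "NP", "PP"]],
    "PP": [["P", "NP"]],
    "Det": [["the"], ["a"]],
    "N": [["dog"], ["park"]],
    "V": [["saw"]],
    "P": [["in"]],
}

def parse(symbol, words, i):
    """Yield (tree, j) for every way `symbol` can cover words[i:j]."""
    if symbol not in GRAMMAR:                # terminal: must match the next word
        if i < len(words) and words[i] == symbol:
            yield symbol, i + 1
        return
    for rhs in GRAMMAR[symbol]:              # try each alternative in turn (backtracking)
        for children, j in parse_seq(rhs, words, i):
            yield (symbol, *children), j

def parse_seq(symbols, words, i):
    """Yield (children, j) for every way the sequence `symbols` can cover words[i:j]."""
    if not symbols:
        yield [], i
        return
    for tree, k in parse(symbols[0], words, i):      # expand the leftmost goal first
        for rest, j in parse_seq(symbols[1:], words, k):
            yield [tree] + rest, j

def rd_parse(words, start="S"):
    """Return every tree for `start` that covers the whole input."""
    return [tree for tree, j in parse(start, words, 0) if j == len(words)]

print(rd_parse("Mary saw Bob".split()))
print(len(rd_parse("Mary saw a dog in the park".split())), "parse(s)")
```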

### Shift-Reduce
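* Bottom-up parser: reads the input left to right, *shifting* each word onto a stack
* Whenever the top items on the stack match the right-hand side of a production, it *reduces* them to that production's left-hand side, so the stack always holds the already-parsed material to the left of the remaining input
* (NLTK) `ShiftReduceParser()` does not backtrack, so it may fail to find a parse for a grammatical sentence, and it returns at most one parse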
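A shift-reduce recognizer can be sketched the same way. This toy version uses hypothetical names and a hypothetical grammar. It is not NLTK's `ShiftReduceParser` code, though it shares the no-backtracking design. It shifts words onto a stack and reduces greedily with the first rule whose right-hand side matches the top of the stack. Because it commits to that choice, it fails on *Mary saw a dog in the park*. It reduces `V NP` to `VP` (and then to `S`) before seeing the PP, even though the grammar allows `NP -> NP PP`:

```python
# Greedy shift-reduce recognizer (illustrative sketch, no backtracking).
# RULES is a hypothetical toy grammar as (lhs, rhs) pairs, tried in order.
RULES = [
    ("Det", ("the",)), ("Det", ("a",)),
    ("N", ("dog",)), ("N", ("park",)),
    ("V", ("saw",)), ("P", ("in",)),
    ("NP", ("Mary",)),
    ("NP", ("Det", "N")),
    ("PP", ("P", "NP")),
    ("NP", ("NP", "PP")),
    ("VP", ("V", "NP")),
    ("S", ("NP", "VP")),
]

def label(item):
    """Category of a stack item: a word is its own label; a subtree is (label, *children)."""
    return item if isinstance(item, str) else item[0]

def shift_reduce(words, rules=RULES, start="S"):
    """Return a tree if the greedy strategy reaches a lone `start` item, else None."""
    stack, buffer = [], list(words)
    while True:
        reduced = True
        while reduced:                       # reduce while the stack top matches a rule
            reduced = False
            for lhs, rhs in rules:
                n = len(rhs)
                if len(stack) >= n and tuple(label(x) for x in stack[-n:]) == rhs:
                    children = stack[-n:]
                    del stack[-n:]
                    stack.append((lhs, *children))
                    reduced = True
                    break                    # commit to the first matching rule
        if not buffer:
            break
        stack.append(buffer.pop(0))          # shift the next word onto the stack
    if len(stack) == 1 and label(stack[0]) == start:
        return stack[0]
    return None

print(shift_reduce("Mary saw a dog".split()))
print(shift_reduce("Mary saw a dog in the park".split()))  # grammatical, but greedy choice fails
```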

### Left-Corner 
* Hybrid: a top-down parser with bottom-up filtering
* Does not get stuck in an infinite loop on left-recursive productions (`X -> X Y`)
* Preprocesses the grammar into a table that pairs each non-terminal with its possible left corners (the categories that can begin it)
* Before expanding a production, checks that the next input word is compatible with at least one of the left corners in that table
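The left-corner table itself is straightforward to compute. The sketch below uses a hypothetical helper (`left_corners`) and a hypothetical toy grammar in the same dict-of-alternatives format. For each non-terminal, it collects every symbol that can appear first in something that non-terminal derives. It does this by following the first symbol of each right-hand side transitively. Unlike the table described above, it also keeps the terminal words it reaches. A parser can then skip predicting, say, `S` when the next word is not among its left corners:

```python
# Hypothetical toy grammar: non-terminal -> list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "N", "PP"], ["NP", "PP"], ["John"]],  # NP -> NP PP is left-recursive
    "VP": [["V", "NP"], ["V", "NP", "PP"]],
    "PP": [["P", "NP"]],
    "Det": [["the"]], "N": [["dog"]], "V": [["saw"]], "P": [["in"]],
}

def left_corners(grammar):
    """Map each non-terminal to the set of symbols that can begin its expansions."""
    table = {}
    for nt in grammar:
        seen, todo = set(), [nt]
        while todo:                          # walk first symbols of right-hand sides
            sym = todo.pop()
            for rhs in grammar.get(sym, []): # terminals have no productions
                first = rhs[0]
                if first not in seen:        # visited set stops left recursion looping
                    seen.add(first)
                    todo.append(first)
        table[nt] = seen
    return table

table = left_corners(GRAMMAR)
print(sorted(table["S"]))
# Filter check: only try to build an S here if the next word could begin one.
print("the" in table["S"])   # True
print("saw" in table["S"])   # False
```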

In [21]:
nltk.data.clear_cache()
grammar2 = nltk.data.load("file:data/mygrammar.cfg")
print(grammar2)

Grammar with 25 productions (start state = S)
    S -> NP VP
    VP -> V NP
    VP -> V NP PP
    PP -> P NP
    V -> 'saw'
    V -> 'ate'
    V -> 'walked'
    NP -> 'John'
    NP -> 'Mary'
    NP -> 'Bob'
    NP -> Det N
    NP -> Det N PP
    Det -> 'a'
    Det -> 'an'
    Det -> 'the'
    Det -> 'my'
    N -> 'man'
    N -> 'dog'
    N -> 'cat'
    N -> 'telescope'
    N -> 'park'
    P -> 'in'
    P -> 'on'
    P -> 'by'
    P -> 'with'


In [22]:
sent = "Mary saw Bob".split()
rd_parser = nltk.RecursiveDescentParser(grammar2)
for tree in rd_parser.parse(sent):
    print(tree)

(S (NP Mary) (VP (V saw) (NP Bob)))


In [23]:
lc_parser = nltk.LeftCornerChartParser(grammar2)
for tree in lc_parser.parse(sent):
    print(tree)

(S (NP Mary) (VP (V saw) (NP Bob)))
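Because `grammar2` has no left recursion, this cell does not show the left-corner parser's advantage. The sketch below uses a hypothetical left-recursive grammar (`lr_grammar`, not from the original notes) containing `NP -> NP PP`. That rule would send a `RecursiveDescentParser` into an infinite loop, while `LeftCornerChartParser` should return a single tree:

```python
import nltk

# Hypothetical left-recursive grammar: NP -> NP PP would send a
# RecursiveDescentParser into an infinite loop.
lr_grammar = nltk.CFG.fromstring("""
S -> NP VP
VP -> V NP
NP -> NP PP | Det N | 'Mary'
PP -> P NP
Det -> 'a' | 'the'
N -> 'dog' | 'park'
V -> 'saw'
P -> 'in'
""")

lc_parser = nltk.LeftCornerChartParser(lr_grammar)
trees = list(lc_parser.parse("Mary saw a dog in the park".split()))
for tree in trees:
    print(tree)
```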


In [24]:
text = ['I', 'shot', 'an', 'elephant', 'in', 'my', 'pajamas']
groucho_grammar.productions(rhs=text[1])

[V -> 'shot']
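`productions(rhs=...)` matches only the *first* symbol of each right-hand side. That makes it a convenient lookup for bottom-up and left-corner style reasoning ("which rules could this symbol start?"). Passing a `Nonterminal` instead of a word works the same way. The sketch below redefines the Groucho grammar so the cell is self-contained:

```python
import nltk

groucho_grammar = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas'
V -> 'shot'
P -> 'in'
""")

# Word lookup: productions whose right-hand side starts with the terminal 'shot'
shot_first = groucho_grammar.productions(rhs='shot')
print(shot_first)
# Category lookup: productions whose right-hand side starts with the non-terminal Det
det_first = groucho_grammar.productions(rhs=nltk.Nonterminal('Det'))
print(det_first)
```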