# 4: Feature grammars

In this notebook we work with feature grammars, which let us enforce subject-verb agreement, case, and similar grammatical constraints directly in the rules. Starting from the small grammar below, we extend it step by step so that it accepts more constructions while still rejecting ungrammatical sentences.

We will add prepositional phrases, NP and VP coordination, and ditransitive verbs to the following grammar:

```
S -> NP[NUM=?n, CASE=sub] VP[NUM=?n]
NP[NUM=?n, CASE=?m] -> PRP[NUM=?n, CASE=?m]
NP[NUM=?n] -> DT[NUM=?n] NN[NUM=?n]
VP[NUM=?n, VAL=itv] -> VBZ[NUM=?n, VAL=itv]
VP[NUM=?n, VAL=tv] -> VBZ[NUM=?n, VAL=tv] NP[CASE=obj]
PRP[NUM=sg, CASE=sub] -> "she"
PRP[NUM=pl, CASE=sub] -> "we"
PRP[num=sg, CASE=obj] -> "her"
PRP[num=sg, CASE=obj] -> "us"
VBZ[NUM=sg, VAL=itv] -> "sleeps"
VBZ[NUM=pl, VAL=itv] -> "sleep"
VBZ[NUM=sg, VAL=tv] -> "sees"
VBZ[NUM=pl, VAL=tv] -> "see"
DT[NUM=sg] -> "this"
DT[NUM=pl] -> "these"
DT -> "a" | "the"
NN[NUM=sg] -> "dog"
NN[NUM=pl] -> "dogs"
```
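All the agreement constraints in this grammar rest on unification of feature structures. As a small illustration with NLTK's `FeatStruct`, conflicting values fail to unify, while a variable such as `?n` unifies with any value and is bound to it. Feature names are case-sensitive: the entries for *her* and *us* above use lowercase `num`, which NLTK treats as a separate feature from `NUM`, so these pronouns are effectively unspecified for `NUM` (which is why their NPs appear as `NUM=?n` in the parses further down). *us* is also marked `sg`, although it is plural.

```python
from nltk.featstruct import FeatStruct

sg = FeatStruct(NUM='sg')
pl = FeatStruct(NUM='pl')

# Conflicting values for the same feature: unification fails and returns None.
failed = sg.unify(pl)
print(failed)

# A variable such as ?n is compatible with any value and gets bound to it.
var = FeatStruct("[NUM=?n]")
bound = var.unify(sg)
print(bound)

# Feature names are case-sensitive: 'num' and 'NUM' are different features,
# so these two structures merge instead of clashing.
lower = FeatStruct(num='pl')
merged = sg.unify(lower)
print(merged)
```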

## 1. Prepositional phrases

A prepositional phrase (PP) has the general form `(PP (IN preposition) (NP noun_phrase))`, for example `(PP (IN with) (NP (NN binoculars)))`.

First, add lexical entries for the preposition *with* and the plural noun *binoculars*. The object of the preposition (i.e. *binoculars* in *with binoculars*) must have objective case. Then add rules that let an NP or a VP optionally be followed by a PP.

Make sure that the grammar can parse *she sees us with these binoculars* and *she sees the dog with us* but returns no parses for the sentence *she sees the dog with we*. 

> Note: the first and second sentences each receive two parses, because the PP can attach either to the object NP or to the VP.
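The two parses differ only in where the PP attaches. As a sketch, the snippet below uses a trimmed copy of the PP rules from the solution below (only the words in the three test sentences; *us* is written with `NUM=pl`, an assumption of this sketch) and reports, for each parse, whether the PP hangs off the VP (`VP -> VP PP`) or off the object NP (`NP -> NP PP`). The helper `attachment_sites` is hypothetical.

```python
from nltk.featstruct import TYPE
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureEarleyChartParser

# Trimmed copy of the section 1 grammar: only the rules and words needed here.
pp_grammar = FeatureGrammar.fromstring('''
S -> NP[NUM=?n, CASE=sub] VP[NUM=?n]
NP[NUM=?n, CASE=?m] -> PRP[NUM=?n, CASE=?m]
NP[NUM=?n] -> DT[NUM=?n] NN[NUM=?n]
NP[NUM=?n, CASE=?m] -> NP[NUM=?n, CASE=?m] PP
VP[NUM=?n, VAL=tv] -> VBZ[NUM=?n, VAL=tv] NP[CASE=obj]
VP[NUM=?n, VAL=?m] -> VP[NUM=?n, VAL=?m] PP
PP -> IN NP[CASE=obj]
PRP[NUM=sg, CASE=sub] -> "she"
PRP[NUM=pl, CASE=sub] -> "we"
PRP[NUM=pl, CASE=obj] -> "us"
VBZ[NUM=sg, VAL=tv] -> "sees"
DT[NUM=pl] -> "these"
DT -> "the"
NN[NUM=sg] -> "dog"
NN[NUM=pl] -> "binoculars"
IN -> "with"
''')
parser = FeatureEarleyChartParser(pp_grammar)


def attachment_sites(sentence):
    """Hypothetical helper: list where the PP attaches in each parse."""
    sites = []
    for tree in parser.parse(sentence.split()):
        vp = tree[1]  # S -> NP VP, so the VP is the second child
        # VP -> VP PP puts a VP first; VP -> VBZ NP (with NP -> NP PP) puts VBZ first.
        sites.append("VP" if vp[0].label()[TYPE] == "VP" else "NP")
    return sorted(sites)


for s in ["she sees us with these binoculars",
          "she sees the dog with us",
          "she sees the dog with we"]:
    print(s, "->", attachment_sites(s))
```

The ungrammatical sentence simply yields an empty list, since `PP -> IN NP[CASE=obj]` cannot use the subjective pronoun *we*.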

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureEarleyChartParser

# Base grammar plus optional PP attachment to NPs and VPs,
# and lexical entries for "with" and "binoculars".
grammar = FeatureGrammar.fromstring('''S -> NP[NUM=?n, CASE=sub] VP[NUM=?n]
NP[NUM=?n, CASE=?m] -> PRP[NUM=?n, CASE=?m]
NP[NUM=?n] -> DT[NUM=?n] NN[NUM=?n]
NP[NUM=?n, CASE=?m] -> NP[NUM=?n, CASE=?m] PP
VP[NUM=?n, VAL=itv] -> VBZ[NUM=?n, VAL=itv]
VP[NUM=?n, VAL=tv] -> VBZ[NUM=?n, VAL=tv] NP[CASE=obj]
VP[NUM=?n, VAL=?m] -> VP[NUM=?n, VAL=?m] PP
PP -> IN NP[CASE=obj]
PRP[NUM=sg, CASE=sub] -> "she"
PRP[NUM=pl, CASE=sub] -> "we"
PRP[num=sg, CASE=obj] -> "her"
PRP[num=sg, CASE=obj] -> "us"
VBZ[NUM=sg, VAL=itv] -> "sleeps"
VBZ[NUM=pl, VAL=itv] -> "sleep"
VBZ[NUM=sg, VAL=tv] -> "sees"
VBZ[NUM=pl, VAL=tv] -> "see"
DT[NUM=sg] -> "this"
DT[NUM=pl] -> "these"
DT -> "a" | "the"
NN[NUM=sg] -> "dog"
NN[NUM=pl] -> "dogs" | "binoculars"
IN -> "with"
''')

nltk_parser = FeatureEarleyChartParser(grammar)

# Two sentences that should parse and one that should not, tokenised on spaces.
sents = ["she sees us with these binoculars", "she sees the dog with us", "she sees the dog with we"]
sents = [s.split(" ") for s in sents]

for sent in sents:
    print("SENT:", sent)
    # parse() yields every tree the grammar licenses (none for rejected input).
    for parse in nltk_parser.parse(sent):
        print(parse)
```

```
SENT: ['she', 'sees', 'us', 'with', 'these', 'binoculars']
(S[]
  (NP[CASE='sub', NUM='sg'] (PRP[CASE='sub', NUM='sg'] she))
  (VP[NUM='sg', VAL='tv']
    (VP[NUM='sg', VAL='tv']
      (VBZ[NUM='sg', VAL='tv'] sees)
      (NP[CASE='obj', NUM=?n] (PRP[CASE='obj', num='sg'] us)))
    (PP[]
      (IN[] with)
      (NP[NUM='pl'] (DT[NUM='pl'] these) (NN[NUM='pl'] binoculars)))))
(S[]
  (NP[CASE='sub', NUM='sg'] (PRP[CASE='sub', NUM='sg'] she))
  (VP[NUM='sg', VAL='tv']
    (VBZ[NUM='sg', VAL='tv'] sees)
    (NP[CASE='obj', NUM=?n]
      (NP[CASE='obj', NUM=?n] (PRP[CASE='obj', num='sg'] us))
      (PP[]
        (IN[] with)
        (NP[NUM='pl'] (DT[NUM='pl'] these) (NN[NUM='pl'] binoculars))))))
SENT: ['she', 'sees', 'the', 'dog', 'with', 'us']
(S[]
  (NP[CASE='sub', NUM='sg'] (PRP[CASE='sub', NUM='sg'] she))
  (VP[NUM='sg', VAL='tv']
    (VP[NUM='sg', VAL='tv']
      (VBZ[NUM='sg', VAL='tv'] sees)
      (NP[NUM='sg'] (DT[] the) (NN[NUM='sg'] dog)))
    (PP[]
      (IN[] with)
      (NP[CA
```

## 2. Noun phrase coordination

Next we add the conjunction (CC) *and* to the grammar, together with a rule that allows coordinated NPs such as `(NP (NP (PRP we)) (CC and) (NP (DT the) (NN dogs)))`.

Make sure to set the number `NUM` of the coordinated NP correctly: it should always be plural, regardless of the number of the individual conjuncts.

Make sure that you only allow coordination of NPs that share the same value of the `CASE` feature.

Now the grammar should parse *she and the dog see us* but should not parse *she and the dog sees* or *her and the dog see us*.
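To confirm the feature values the coordination rule assigns, we can read them straight off a parse tree: each node's `label()` is a feature structure indexed by feature name. A minimal sketch, using a trimmed copy of the section 2 rules (with *her* and *us* written with `NUM`, an assumption of this sketch); it also tries *she and the dog sees us*, a variant that isolates the agreement failure:

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureEarleyChartParser

# Trimmed copy of the section 2 grammar, keeping the NP coordination rule.
coord_grammar = FeatureGrammar.fromstring('''
S -> NP[NUM=?n, CASE=sub] VP[NUM=?n]
NP[NUM=?n, CASE=?m] -> PRP[NUM=?n, CASE=?m]
NP[NUM=?n] -> DT[NUM=?n] NN[NUM=?n]
NP[NUM=pl, CASE=?n] -> NP[CASE=?n] CC NP[CASE=?n]
VP[NUM=?n, VAL=tv] -> VBZ[NUM=?n, VAL=tv] NP[CASE=obj]
PRP[NUM=sg, CASE=sub] -> "she"
PRP[NUM=sg, CASE=obj] -> "her"
PRP[NUM=pl, CASE=obj] -> "us"
VBZ[NUM=sg, VAL=tv] -> "sees"
VBZ[NUM=pl, VAL=tv] -> "see"
DT -> "the"
NN[NUM=sg] -> "dog"
CC -> "and"
''')
parser = FeatureEarleyChartParser(coord_grammar)

tree = list(parser.parse("she and the dog see us".split()))[0]
subject = tree[0]  # S -> NP VP, so the subject NP is the first child
# The coordinated NP is plural and subjective although both conjuncts are singular.
print(subject.label()["NUM"], subject.label()["CASE"])
# Children are NP, CC, NP; CC carries no NUM, so .get() returns None for it.
conjunct_nums = [child.label().get("NUM") for child in subject]
print(conjunct_nums)

# Agreement and case mismatches leave the parser with nothing to return.
rejected = ["she and the dog sees us", "her and the dog see us"]
rejected_counts = [len(list(parser.parse(s.split()))) for s in rejected]
print(rejected_counts)
```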

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureEarleyChartParser

# Section 1 grammar plus NP coordination: the coordinated NP is always plural,
# and both conjuncts must share the same CASE value.
grammar = FeatureGrammar.fromstring('''S -> NP[NUM=?n, CASE=sub] VP[NUM=?n]
NP[NUM=?n, CASE=?m] -> PRP[NUM=?n, CASE=?m]
NP[NUM=?n] -> DT[NUM=?n] NN[NUM=?n]
NP[NUM=?n, CASE=?m] -> NP[NUM=?n, CASE=?m] PP
NP[NUM=pl, CASE=?n] -> NP[CASE=?n] CC NP[CASE=?n]
VP[NUM=?n, VAL=itv] -> VBZ[NUM=?n, VAL=itv]
VP[NUM=?n, VAL=tv] -> VBZ[NUM=?n, VAL=tv] NP[CASE=obj]
VP[NUM=?n, VAL=?m] -> VP[NUM=?n, VAL=?m] PP
PP -> IN NP[CASE=obj]
PRP[NUM=sg, CASE=sub] -> "she"
PRP[NUM=pl, CASE=sub] -> "we"
PRP[num=sg, CASE=obj] -> "her"
PRP[num=sg, CASE=obj] -> "us"
VBZ[NUM=sg, VAL=itv] -> "sleeps"
VBZ[NUM=pl, VAL=itv] -> "sleep"
VBZ[NUM=sg, VAL=tv] -> "sees"
VBZ[NUM=pl, VAL=tv] -> "see"
DT[NUM=sg] -> "this"
DT[NUM=pl] -> "these"
DT -> "a" | "the"
NN[NUM=sg] -> "dog"
NN[NUM=pl] -> "dogs" | "binoculars"
IN -> "with"
CC -> "and"
''')

nltk_parser = FeatureEarleyChartParser(grammar)

sents = ["she and the dog see us", "she and the dog sees", "her and the dog see us"]
sents = [s.split(" ") for s in sents]

for sent in sents:
    print("SENT:", sent)
    for parse in nltk_parser.parse(sent):
        print(parse)
```

```
SENT: ['she', 'and', 'the', 'dog', 'see', 'us']
(S[]
  (NP[CASE='sub', NUM='pl']
    (NP[CASE='sub', NUM='sg'] (PRP[CASE='sub', NUM='sg'] she))
    (CC[] and)
    (NP[NUM='sg'] (DT[] the) (NN[NUM='sg'] dog)))
  (VP[NUM='pl', VAL='tv']
    (VBZ[NUM='pl', VAL='tv'] see)
    (NP[CASE='obj', NUM=?n] (PRP[CASE='obj', num='sg'] us))))
SENT: ['she', 'and', 'the', 'dog', 'sees']
SENT: ['her', 'and', 'the', 'dog', 'see', 'us']
```


## 3. Verb phrase coordination

Next we add rules which allow for coordinated VPs like 
```
(VP (VP (VBZ sees) (NP (PRP us))) (CC and) (VP (VBZ sleeps)))
```
The conjoined VPs must share their `NUM` value, which is also passed up to the coordinated VP.

This time the grammar should parse *she sees us and sleeps* and *she sees us and sees the dog* but should not parse *she sees us and sleep* or *she sleep and sees us*.
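Because a coordination rule that mentions only `NUM` says nothing about `VAL`, a transitive and an intransitive VP can be conjoined, and the coordinated VP itself carries no `VAL` value. A sketch with a trimmed copy of the section 3 rules (*us* written with `NUM=pl`, an assumption of this sketch) makes this visible:

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureEarleyChartParser

# Trimmed copy of the section 3 grammar, keeping the VP coordination rule.
vp_grammar = FeatureGrammar.fromstring('''
S -> NP[NUM=?n, CASE=sub] VP[NUM=?n]
NP[NUM=?n, CASE=?m] -> PRP[NUM=?n, CASE=?m]
VP[NUM=?n, VAL=itv] -> VBZ[NUM=?n, VAL=itv]
VP[NUM=?n, VAL=tv] -> VBZ[NUM=?n, VAL=tv] NP[CASE=obj]
VP[NUM=?n] -> VP[NUM=?n] CC VP[NUM=?n]
PRP[NUM=sg, CASE=sub] -> "she"
PRP[NUM=pl, CASE=obj] -> "us"
VBZ[NUM=sg, VAL=itv] -> "sleeps"
VBZ[NUM=pl, VAL=itv] -> "sleep"
VBZ[NUM=sg, VAL=tv] -> "sees"
CC -> "and"
''')
parser = FeatureEarleyChartParser(vp_grammar)

tree = list(parser.parse("she sees us and sleeps".split()))[0]
vp = tree[1]                 # S -> NP VP
left, right = vp[0], vp[2]   # VP -> VP CC VP
print(vp.label()["NUM"], "VAL" in vp.label())     # NUM is shared, VAL is absent
print(left.label()["VAL"], right.label()["VAL"])  # conjuncts keep their own VAL

# A number mismatch between the conjuncts (or with the subject) is rejected.
rejected = ["she sees us and sleep", "she sleep and sees us"]
rejected_counts = [len(list(parser.parse(s.split()))) for s in rejected]
print(rejected_counts)
```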

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureEarleyChartParser

# Section 2 grammar plus VP coordination: conjoined VPs must share NUM.
grammar = FeatureGrammar.fromstring('''S -> NP[NUM=?n, CASE=sub] VP[NUM=?n]
NP[NUM=?n, CASE=?m] -> PRP[NUM=?n, CASE=?m]
NP[NUM=?n] -> DT[NUM=?n] NN[NUM=?n]
NP[NUM=?n, CASE=?m] -> NP[NUM=?n, CASE=?m] PP
NP[NUM=pl, CASE=?n] -> NP[CASE=?n] CC NP[CASE=?n]
VP[NUM=?n, VAL=itv] -> VBZ[NUM=?n, VAL=itv]
VP[NUM=?n, VAL=tv] -> VBZ[NUM=?n, VAL=tv] NP[CASE=obj]
VP[NUM=?n, VAL=?m] -> VP[NUM=?n, VAL=?m] PP
VP[NUM=?n] -> VP[NUM=?n] CC VP[NUM=?n]
PP -> IN NP[CASE=obj]
PRP[NUM=sg, CASE=sub] -> "she"
PRP[NUM=pl, CASE=sub] -> "we"
PRP[num=sg, CASE=obj] -> "her"
PRP[num=sg, CASE=obj] -> "us"
VBZ[NUM=sg, VAL=itv] -> "sleeps"
VBZ[NUM=pl, VAL=itv] -> "sleep"
VBZ[NUM=sg, VAL=tv] -> "sees"
VBZ[NUM=pl, VAL=tv] -> "see"
DT[NUM=sg] -> "this"
DT[NUM=pl] -> "these"
DT -> "a" | "the"
NN[NUM=sg] -> "dog"
NN[NUM=pl] -> "dogs" | "binoculars"
IN -> "with"
CC -> "and"
''')

nltk_parser = FeatureEarleyChartParser(grammar)

sents = ["she sees us and sleeps", "she sees us and sees the dog", "she sees us and sleep", "she sleep and sees us"]
sents = [s.split(" ") for s in sents]

for sent in sents:
    print("SENT:", sent)
    for parse in nltk_parser.parse(sent):
        print(parse)
```

```
SENT: ['she', 'sees', 'us', 'and', 'sleeps']
(S[]
  (NP[CASE='sub', NUM='sg'] (PRP[CASE='sub', NUM='sg'] she))
  (VP[NUM='sg']
    (VP[NUM='sg', VAL='tv']
      (VBZ[NUM='sg', VAL='tv'] sees)
      (NP[CASE='obj', NUM=?n] (PRP[CASE='obj', num='sg'] us)))
    (CC[] and)
    (VP[NUM='sg', VAL='itv'] (VBZ[NUM='sg', VAL='itv'] sleeps))))
SENT: ['she', 'sees', 'us', 'and', 'sees', 'the', 'dog']
(S[]
  (NP[CASE='sub', NUM='sg'] (PRP[CASE='sub', NUM='sg'] she))
  (VP[NUM='sg']
    (VP[NUM='sg', VAL='tv']
      (VBZ[NUM='sg', VAL='tv'] sees)
      (NP[CASE='obj', NUM=?n] (PRP[CASE='obj', num='sg'] us)))
    (CC[] and)
    (VP[NUM='sg', VAL='tv']
      (VBZ[NUM='sg', VAL='tv'] sees)
      (NP[NUM='sg'] (DT[] the) (NN[NUM='sg'] dog)))))
SENT: ['she', 'sees', 'us', 'and', 'sleep']
SENT: ['she', 'sleep', 'and', 'sees', 'us']
```


## 4. Add ditransitive verbs

Next we add a rule for ditransitive verbs. These are verbs like "give" which take two objects, an indirect and a direct one. The phrase structure for a ditransitive verb looks like this:
```
(VP (VBZ gives) (NP (DT the) (NN dogs)) (NP (DT these) (NN binoculars)))
```

Add a new feature value `dtv` for the feature `VAL`. Make sure that a ditransitive verb always takes two objects, and use features to ensure that both objects have objective case.

Finally, add lexical rules for the ditransitive verbs `"give"` and `"gives"`.

The grammar should parse *she gives the dogs these binoculars* and *we give the dogs these binoculars* but should not parse *we give the dogs*.
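The `VAL` feature works as a subcategorisation frame: each verb only combines with as many objects as its `VAL` value allows, and the `CASE=obj` constraints keep subjective pronouns out of object position. A sketch with a trimmed copy of the section 4 rules counts the parses for the required sentences plus two extra negative cases chosen for illustration:

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureEarleyChartParser

# Trimmed copy of the section 4 grammar, keeping the tv and dtv VP rules.
dtv_grammar = FeatureGrammar.fromstring('''
S -> NP[NUM=?n, CASE=sub] VP[NUM=?n]
NP[NUM=?n, CASE=?m] -> PRP[NUM=?n, CASE=?m]
NP[NUM=?n] -> DT[NUM=?n] NN[NUM=?n]
VP[NUM=?n, VAL=tv] -> VBZ[NUM=?n, VAL=tv] NP[CASE=obj]
VP[NUM=?n, VAL=dtv] -> VBZ[NUM=?n, VAL=dtv] NP[CASE=obj] NP[CASE=obj]
PRP[NUM=sg, CASE=sub] -> "she"
PRP[NUM=pl, CASE=sub] -> "we"
VBZ[NUM=sg, VAL=tv] -> "sees"
VBZ[NUM=sg, VAL=dtv] -> "gives"
VBZ[NUM=pl, VAL=dtv] -> "give"
DT[NUM=pl] -> "these"
DT -> "the"
NN[NUM=pl] -> "dogs" | "binoculars"
''')
parser = FeatureEarleyChartParser(dtv_grammar)

sentences = [
    "she gives the dogs these binoculars",  # dtv with two objects
    "we give the dogs these binoculars",    # plural agreement
    "we give the dogs",                     # dtv with only one object
    "she sees the dogs these binoculars",   # tv with two objects
    "she gives we these binoculars",        # subjective pronoun as object
]
counts = {s: len(list(parser.parse(s.split()))) for s in sentences}
for s, n in counts.items():
    print(n, s)
```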

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureEarleyChartParser

# Section 3 grammar plus ditransitive VPs (VAL=dtv) with two objective-case objects.
grammar = FeatureGrammar.fromstring('''S -> NP[NUM=?n, CASE=sub] VP[NUM=?n]
NP[NUM=?n, CASE=?m] -> PRP[NUM=?n, CASE=?m]
NP[NUM=?n] -> DT[NUM=?n] NN[NUM=?n]
NP[NUM=?n, CASE=?m] -> NP[NUM=?n, CASE=?m] PP
NP[NUM=pl, CASE=?n] -> NP[CASE=?n] CC NP[CASE=?n]
VP[NUM=?n, VAL=itv] -> VBZ[NUM=?n, VAL=itv]
VP[NUM=?n, VAL=tv] -> VBZ[NUM=?n, VAL=tv] NP[CASE=obj]
VP[NUM=?n, VAL=dtv] -> VBZ[NUM=?n, VAL=dtv] NP[CASE=obj] NP[CASE=obj]
VP[NUM=?n, VAL=?m] -> VP[NUM=?n, VAL=?m] PP
VP[NUM=?n] -> VP[NUM=?n] CC VP[NUM=?n]
PP -> IN NP[CASE=obj]
PRP[NUM=sg, CASE=sub] -> "she"
PRP[NUM=pl, CASE=sub] -> "we"
PRP[num=sg, CASE=obj] -> "her"
PRP[num=sg, CASE=obj] -> "us"
VBZ[NUM=sg, VAL=itv] -> "sleeps"
VBZ[NUM=pl, VAL=itv] -> "sleep"
VBZ[NUM=sg, VAL=tv] -> "sees"
VBZ[NUM=pl, VAL=tv] -> "see"
VBZ[NUM=sg, VAL=dtv] -> "gives"
VBZ[NUM=pl, VAL=dtv] -> "give"
DT[NUM=sg] -> "this"
DT[NUM=pl] -> "these"
DT -> "a" | "the"
NN[NUM=sg] -> "dog"
NN[NUM=pl] -> "dogs" | "binoculars"
IN -> "with"
CC -> "and"
''')

nltk_parser = FeatureEarleyChartParser(grammar)

sents = ["she gives the dogs these binoculars", "we give the dogs these binoculars", "we give the dogs"]
sents = [s.split(" ") for s in sents]

for sent in sents:
    print("SENT:", sent)
    for parse in nltk_parser.parse(sent):
        print(parse)
```

```
SENT: ['she', 'gives', 'the', 'dogs', 'these', 'binoculars']
(S[]
  (NP[CASE='sub', NUM='sg'] (PRP[CASE='sub', NUM='sg'] she))
  (VP[NUM='sg', VAL='dtv']
    (VBZ[NUM='sg', VAL='dtv'] gives)
    (NP[NUM='pl'] (DT[] the) (NN[NUM='pl'] dogs))
    (NP[NUM='pl'] (DT[NUM='pl'] these) (NN[NUM='pl'] binoculars))))
SENT: ['we', 'give', 'the', 'dogs', 'these', 'binoculars']
(S[]
  (NP[CASE='sub', NUM='pl'] (PRP[CASE='sub', NUM='pl'] we))
  (VP[NUM='pl', VAL='dtv']
    (VBZ[NUM='pl', VAL='dtv'] give)
    (NP[NUM='pl'] (DT[] the) (NN[NUM='pl'] dogs))
    (NP[NUM='pl'] (DT[NUM='pl'] these) (NN[NUM='pl'] binoculars))))
SENT: ['we', 'give', 'the', 'dogs']
```


## 5. Final check

Finally, check that the final grammar can parse all of our positive examples but none of the negative ones.
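Instead of inspecting printed trees by eye, the same check can be automated by counting parses. The sketch below rebuilds the section 4 grammar with identical rules, except that the pronoun entries for *her* and *us* are written with `NUM` and *us* is marked plural; these corrections are assumptions of this sketch and do not change which example sentences are accepted. `count_parses` and `unexpected_results` are hypothetical helper names.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureEarleyChartParser

# Section 4 grammar; only the her/us entries differ (NUM instead of num, "us" plural).
final_grammar = FeatureGrammar.fromstring('''
S -> NP[NUM=?n, CASE=sub] VP[NUM=?n]
NP[NUM=?n, CASE=?m] -> PRP[NUM=?n, CASE=?m]
NP[NUM=?n] -> DT[NUM=?n] NN[NUM=?n]
NP[NUM=?n, CASE=?m] -> NP[NUM=?n, CASE=?m] PP
NP[NUM=pl, CASE=?n] -> NP[CASE=?n] CC NP[CASE=?n]
VP[NUM=?n, VAL=itv] -> VBZ[NUM=?n, VAL=itv]
VP[NUM=?n, VAL=tv] -> VBZ[NUM=?n, VAL=tv] NP[CASE=obj]
VP[NUM=?n, VAL=dtv] -> VBZ[NUM=?n, VAL=dtv] NP[CASE=obj] NP[CASE=obj]
VP[NUM=?n, VAL=?m] -> VP[NUM=?n, VAL=?m] PP
VP[NUM=?n] -> VP[NUM=?n] CC VP[NUM=?n]
PP -> IN NP[CASE=obj]
PRP[NUM=sg, CASE=sub] -> "she"
PRP[NUM=pl, CASE=sub] -> "we"
PRP[NUM=sg, CASE=obj] -> "her"
PRP[NUM=pl, CASE=obj] -> "us"
VBZ[NUM=sg, VAL=itv] -> "sleeps"
VBZ[NUM=pl, VAL=itv] -> "sleep"
VBZ[NUM=sg, VAL=tv] -> "sees"
VBZ[NUM=pl, VAL=tv] -> "see"
VBZ[NUM=sg, VAL=dtv] -> "gives"
VBZ[NUM=pl, VAL=dtv] -> "give"
DT[NUM=sg] -> "this"
DT[NUM=pl] -> "these"
DT -> "a" | "the"
NN[NUM=sg] -> "dog"
NN[NUM=pl] -> "dogs" | "binoculars"
IN -> "with"
CC -> "and"
''')
parser = FeatureEarleyChartParser(final_grammar)


def count_parses(sentence):
    """Number of trees returned for a space-separated sentence."""
    return len(list(parser.parse(sentence.split())))


def unexpected_results(positive, negative):
    """Sentences that should parse but don't, or shouldn't parse but do."""
    return ([s for s in positive if count_parses(s) == 0] +
            [s for s in negative if count_parses(s) > 0])


positive = ["she sees us with these binoculars", "she sees the dog with us",
            "she and the dog see us", "she sees us and sleeps",
            "she sees us and sees the dog", "she gives the dogs these binoculars",
            "we give the dogs these binoculars"]
negative = ["she sees the dog with we", "her and the dog see us",
            "she and the dog sees", "she sees us and sleep",
            "she sleep and sees us", "we give the dogs"]

failures = unexpected_results(positive, negative)
print("all checks passed" if not failures else failures)
```

Returning the offending sentences, rather than a single boolean, makes it easy to see which rule to revisit when the grammar is extended further.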

```python
positive_examples = ["she sees us with these binoculars",
                     "she sees the dog with us",
                     "she and the dog see us",
                     "she sees us and sleeps",
                     "she sees us and sees the dog",
                     "she gives the dogs these binoculars",
                     "we give the dogs these binoculars"]
negative_examples = ["she sees the dog with we",
                     "her and the dog see us",
                     "she and the dog sees",
                     "she sees us and sleep",
                     "she sleep and sees us",
                     "we give the dogs"]

positive_sents = [s.split(" ") for s in positive_examples]
negative_sents = [s.split(" ") for s in negative_examples]

# nltk_parser is built from the section 4 grammar above.
# Every positive sentence should print at least one tree...
for sent in positive_sents:
    print("SENT:", sent)
    for parse in nltk_parser.parse(sent):
        print(parse)

print()

# ...and no negative sentence should print any.
for sent in negative_sents:
    print("SENT:", sent)
    for parse in nltk_parser.parse(sent):
        print(parse)
```

SENT: ['she', 'sees', 'us', 'with', 'these', 'binoculars']
(S[]
  (NP[CASE='sub', NUM='sg'] (PRP[CASE='sub', NUM='sg'] she))
  (VP[NUM='sg', VAL='tv']
    (VP[NUM='sg', VAL='tv']
      (VBZ[NUM='sg', VAL='tv'] sees)
      (NP[CASE='obj', NUM=?n] (PRP[CASE='obj', num='sg'] us)))
    (PP[]
      (IN[] with)
      (NP[NUM='pl'] (DT[NUM='pl'] these) (NN[NUM='pl'] binoculars)))))
(S[]
  (NP[CASE='sub', NUM='sg'] (PRP[CASE='sub', NUM='sg'] she))
  (VP[NUM='sg', VAL='tv']
    (VBZ[NUM='sg', VAL='tv'] sees)
    (NP[CASE='obj', NUM=?n]
      (NP[CASE='obj', NUM=?n] (PRP[CASE='obj', num='sg'] us))
      (PP[]
        (IN[] with)
        (NP[NUM='pl'] (DT[NUM='pl'] these) (NN[NUM='pl'] binoculars))))))
SENT: ['she', 'sees', 'the', 'dog', 'with', 'us']
(S[]
  (NP[CASE='sub', NUM='sg'] (PRP[CASE='sub', NUM='sg'] she))
  (VP[NUM='sg', VAL='tv']
    (VP[NUM='sg', VAL='tv']
      (VBZ[NUM='sg', VAL='tv'] sees)
      (NP[NUM='sg'] (DT[] the) (NN[NUM='sg'] dog)))
    (PP[]
      (IN[] with)
      (NP[CA