# 9.3 - Extending a Feature-Based Grammar

## Subcategorization

Earlier, we used category labels to represent different types of verbs, in particular `IV` for intransitive verbs and `TV` for transitive verbs. This allowed us to write the following:

```
VP -> IV
VP -> TV NP
```

Although `IV` and `TV` are both kinds of verb, in a CFG they are atomic non-terminal symbols, as distinct from each other as any other pair of symbols. This notation therefore does not let us say anything about verbs in general. For example, we cannot say "All lexical items of category `V` can be marked for tense", because `walk` is a lexical item of category `IV`, not `V`, and so would not receive a `TENSE` feature.

A simple approach, originally developed for **Generalized Phrase Structure Grammar** (GPSG), solves this problem by giving lexical categories a `SUBCAT` feature, which tells us which subcategorization class the item belongs to. GPSG used integer values for `SUBCAT`; the following example uses more mnemonic values, namely `intrans`, `trans` and `clause`:

```python
print("VP[TENSE=?t, NUM=?n] -> V[SUBCAT=intrans, TENSE=?t, NUM=?n]")
print("VP[TENSE=?t, NUM=?n] -> V[SUBCAT=trans, TENSE=?t, NUM=?n] NP")
print("VP[TENSE=?t, NUM=?n] -> V[SUBCAT=clause, TENSE=?t, NUM=?n] SBar")
print()
print("V[SUBCAT=intrans, TENSE=pres, NUM=sg] -> 'disappears' | 'walks'")
print("V[SUBCAT=trans, TENSE=pres, NUM=sg] -> 'sees' | 'likes'")
print("V[SUBCAT=clause, TENSE=pres, NUM=sg] -> 'says' | 'claims'")
print()
print("V[SUBCAT=intrans, TENSE=pres, NUM=pl] -> 'disappear' | 'walk'")
print("V[SUBCAT=trans, TENSE=pres, NUM=pl] -> 'see' | 'like'")
print("V[SUBCAT=clause, TENSE=pres, NUM=pl] -> 'say' | 'claim'")
print()
print("V[SUBCAT=intrans, TENSE=past] -> 'disappeared' | 'walked'")
print("V[SUBCAT=trans, TENSE=past] -> 'saw' | 'liked'")
print("V[SUBCAT=clause, TENSE=past] -> 'said' | 'claimed'")
```

```
VP[TENSE=?t, NUM=?n] -> V[SUBCAT=intrans, TENSE=?t, NUM=?n]
VP[TENSE=?t, NUM=?n] -> V[SUBCAT=trans, TENSE=?t, NUM=?n] NP
VP[TENSE=?t, NUM=?n] -> V[SUBCAT=clause, TENSE=?t, NUM=?n] SBar

V[SUBCAT=intrans, TENSE=pres, NUM=sg] -> 'disappears' | 'walks'
V[SUBCAT=trans, TENSE=pres, NUM=sg] -> 'sees' | 'likes'
V[SUBCAT=clause, TENSE=pres, NUM=sg] -> 'says' | 'claims'

V[SUBCAT=intrans, TENSE=pres, NUM=pl] -> 'disappear' | 'walk'
V[SUBCAT=trans, TENSE=pres, NUM=pl] -> 'see' | 'like'
V[SUBCAT=clause, TENSE=pres, NUM=pl] -> 'say' | 'claim'

V[SUBCAT=intrans, TENSE=past] -> 'disappeared' | 'walked'
V[SUBCAT=trans, TENSE=past] -> 'saw' | 'liked'
V[SUBCAT=clause, TENSE=past] -> 'said' | 'claimed'
```


When we see a lexical category like `V[SUBCAT=trans]`, we can interpret the `SUBCAT` specification as a pointer to a production in which `V[SUBCAT=trans]` is introduced as the head child in a `VP` production. 
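
To see the `SUBCAT` feature at work, here is a minimal sketch that loads a cut-down version of the productions above into NLTK and checks a few sentences. The `S` and `NP` productions, the two-word noun lexicon, and the names `SUBCAT_GRAMMAR` and `subcat_ok` are assumptions added for illustration; only the `VP` and `V` productions come from the grammar shown earlier.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# The S and NP productions are illustrative assumptions; the VP and V
# productions are copied from the SUBCAT grammar above (present tense only).
SUBCAT_GRAMMAR = """
% start S
S -> NP[NUM=?n] VP[NUM=?n]
NP[NUM=sg] -> 'Kim'
NP[NUM=pl] -> 'cats'
VP[TENSE=?t, NUM=?n] -> V[SUBCAT=intrans, TENSE=?t, NUM=?n]
VP[TENSE=?t, NUM=?n] -> V[SUBCAT=trans, TENSE=?t, NUM=?n] NP
V[SUBCAT=intrans, TENSE=pres, NUM=sg] -> 'disappears' | 'walks'
V[SUBCAT=trans, TENSE=pres, NUM=sg] -> 'sees' | 'likes'
V[SUBCAT=intrans, TENSE=pres, NUM=pl] -> 'disappear' | 'walk'
V[SUBCAT=trans, TENSE=pres, NUM=pl] -> 'see' | 'like'
"""

subcat_parser = FeatureChartParser(FeatureGrammar.fromstring(SUBCAT_GRAMMAR))


def subcat_ok(sentence):
    """Return True if the sketch grammar finds at least one parse."""
    return any(True for _ in subcat_parser.parse(sentence.split()))


# Intransitive verbs reject an object, transitive verbs require one,
# and the shared NUM variable enforces subject-verb agreement.
for sentence in ["Kim walks", "Kim sees cats", "Kim walks cats",
                 "Kim sees", "cats walks"]:
    print(f"{sentence!r}: {subcat_ok(sentence)}")
```

Because every verb is of category `V`, the single variable `?t` in the `VP` productions passes tense up from any verb, whatever its `SUBCAT` value. This is exactly the generalization that the atomic symbols `IV` and `TV` could not express.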

## Heads Revisited

X-bar syntax abstracts out the notion of **phrasal level**. It is usual to recognize three such levels. If `N` represents the lexical level, then `N'` represents the next level up, corresponding to the more traditional category `Nom`, and `N''` represents the phrasal level, corresponding to the category `NP`.

## Auxiliary Verbs and Inversion

Inverted clauses, where the order of the subject and verb is switched, occur in English interrogatives and also after "negative" adverbs:

* Do you like children?
* Can Jody walk?


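* Rarely do you see Kim.
* Never have I seen this dog.

However, we cannot place just any verb in pre-subject position. The following are ungrammatical (marked with a leading `*`):

* \*Like you children?
* \*Walks Jody?


* \*Rarely see you Kim.
* \*Never saw I this dog.

Verbs that can appear initially in inverted clauses belong to the class of **auxiliaries**, such as *do*, *can* and *have*.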
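
One way to capture this restriction, sketched below, is to mark auxiliaries as `V[+AUX]` and allow only such verbs before the subject in an inverted clause `S[+INV]`. The grammar is a cut-down assumption modelled on the `feat1.fcfg` productions used later in this section (with `'Jody'` added to the lexicon); the names `INVERSION_GRAMMAR` and `inversion_ok` are hypothetical.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# A small illustrative grammar: only V[+AUX] may precede the subject.
INVERSION_GRAMMAR = """
% start S
S[-INV] -> NP VP
S[-INV] -> Adv[+NEG] S[+INV]
S[+INV] -> V[+AUX] NP VP
VP -> V[SUBCAT=intrans, -AUX]
VP -> V[SUBCAT=trans, -AUX] NP
VP -> V[+AUX] VP
V[SUBCAT=intrans, -AUX] -> 'walk' | 'sing'
V[SUBCAT=trans, -AUX] -> 'see' | 'like'
V[+AUX] -> 'do' | 'can'
NP -> 'you' | 'cats' | 'Jody'
Adv[+NEG] -> 'rarely' | 'never'
"""

inversion_parser = FeatureChartParser(
    FeatureGrammar.fromstring(INVERSION_GRAMMAR)
)


def inversion_ok(sentence):
    """Return True if the sketch grammar finds at least one parse."""
    return any(True for _ in inversion_parser.parse(sentence.split()))


# Auxiliaries can invert with the subject; main verbs cannot.
for sentence in ["do you like cats", "can Jody walk", "rarely do you see cats",
                 "like you cats", "rarely see you cats"]:
    print(f"{sentence!r}: {inversion_ok(sentence)}")
```

The main verbs are rejected in initial position because their `-AUX` value clashes with the `+AUX` required by the `S[+INV]` production.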

## Unbounded Dependency Constructions

Consider the following contrasts, where a leading `*` marks an ungrammatical sentence:

* You like Jody.
* \*You like.


* You put the card into the slot.
* \*You put into the slot.
* \*You put the card.
* \*You put.

The verb `like` requires an `NP` complement, while `put` requires both a following `NP` and a `PP`. Omitting these complements results in ungrammaticality. Yet there are contexts in which obligatory complements can be omitted:

* Kim knows who you like.
* This music, you really like.


* Which card do you put into the slot?
* Which slot do you put the card into?

That is, an obligatory complement can be omitted if there is an appropriate **filler** in the sentence, such as the question word *who* in *Kim knows who you like*, the preposed topic *this music* in *This music, you really like*, or the *wh*-phrases *which card* and *which slot*. It is common to say that such sentences contain **gaps** where the obligatory complements have been omitted, and these gaps are sometimes made explicit using an underscore:

1. Which card do you put __ into the slot?
2. Which slot do you put the card into __?

So a gap can occur if it is licensed by a filler. Conversely, a filler can occur only if there is an appropriate gap elsewhere in the sentence, as shown by the following ungrammatical examples:

1. \*Kim knows who you like Jody.
2. \*This music, you really like hip-hop.
3. \*Which card do you put this into the slot?
4. \*Which slot do you put the card into this one?

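The mutual co-occurrence between filler and gap is called a "dependency". There is **no upper bound on the distance between filler and gap**, which is why these are called unbounded dependency constructions. The grammar below handles them with **slash categories**: a category of the form `Y/XP` is a phrase of category `Y` that is missing a sub-constituent of category `XP`.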

```python
import nltk
nltk.data.show_cfg("grammars/book_grammars/feat1.fcfg")
```

```
% start S
# ###################
# Grammar Productions
# ###################
S[-INV] -> NP VP
S[-INV]/?x -> NP VP/?x
S[-INV] -> NP S/NP
S[-INV] -> Adv[+NEG] S[+INV]
S[+INV] -> V[+AUX] NP VP
S[+INV]/?x -> V[+AUX] NP VP/?x
SBar -> Comp S[-INV]
SBar/?x -> Comp S[-INV]/?x
VP -> V[SUBCAT=intrans, -AUX]
VP -> V[SUBCAT=trans, -AUX] NP
VP/?x -> V[SUBCAT=trans, -AUX] NP/?x
VP -> V[SUBCAT=clause, -AUX] SBar
VP/?x -> V[SUBCAT=clause, -AUX] SBar/?x
VP -> V[+AUX] VP
VP/?x -> V[+AUX] VP/?x
# ###################
# Lexical Productions
# ###################
V[SUBCAT=intrans, -AUX] -> 'walk' | 'sing'
V[SUBCAT=trans, -AUX] -> 'see' | 'like'
V[SUBCAT=clause, -AUX] -> 'say' | 'claim'
V[+AUX] -> 'do' | 'can'
NP[-WH] -> 'you' | 'cats'
NP[+WH] -> 'who'
Adv[+NEG] -> 'rarely' | 'never'
NP/NP ->
Comp -> 'that'
```


```python
from nltk import load_parser

tokens = "who do you claim that you like".split()
cp = load_parser("grammars/book_grammars/feat1.fcfg")
for tree in cp.parse(tokens):
    print(tree)
```

```
(S[-INV]
  (NP[+WH] who)
  (S[+INV]/NP[]
    (V[+AUX] do)
    (NP[-WH] you)
    (VP[]/NP[]
      (V[-AUX, SUBCAT='clause'] claim)
      (SBar[]/NP[]
        (Comp[] that)
        (S[-INV]/NP[]
          (NP[-WH] you)
          (VP[]/NP[] (V[-AUX, SUBCAT='trans'] like) (NP[]/NP[] )))))))
```


```python
tokens2 = 'you claim that you like cats'.split()
for tree in cp.parse(tokens2):
    print(tree)
```

```
(S[-INV]
  (NP[-WH] you)
  (VP[]
    (V[-AUX, SUBCAT='clause'] claim)
    (SBar[]
      (Comp[] that)
      (S[-INV]
        (NP[-WH] you)
        (VP[] (V[-AUX, SUBCAT='trans'] like) (NP[-WH] cats))))))
```


```python
tokens3 = 'rarely do you sing'.split()
for tree in cp.parse(tokens3):
    print(tree)
```

```
(S[-INV]
  (Adv[+NEG] rarely)
  (S[+INV]
    (V[+AUX] do)
    (NP[-WH] you)
    (VP[] (V[-AUX, SUBCAT='intrans'] sing))))
```
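
The claim that the filler and the gap can be arbitrarily far apart can be checked with a small sketch. It assumes an inline copy of the `feat1.fcfg` productions listed above (comment lines omitted) so that it runs without the NLTK data package; the names `FEAT1`, `embedded_question` and `has_parse` are hypothetical helpers, not part of NLTK.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# Inline copy of the feat1.fcfg productions shown above.
FEAT1 = """
% start S
S[-INV] -> NP VP
S[-INV]/?x -> NP VP/?x
S[-INV] -> NP S/NP
S[-INV] -> Adv[+NEG] S[+INV]
S[+INV] -> V[+AUX] NP VP
S[+INV]/?x -> V[+AUX] NP VP/?x
SBar -> Comp S[-INV]
SBar/?x -> Comp S[-INV]/?x
VP -> V[SUBCAT=intrans, -AUX]
VP -> V[SUBCAT=trans, -AUX] NP
VP/?x -> V[SUBCAT=trans, -AUX] NP/?x
VP -> V[SUBCAT=clause, -AUX] SBar
VP/?x -> V[SUBCAT=clause, -AUX] SBar/?x
VP -> V[+AUX] VP
VP/?x -> V[+AUX] VP/?x
V[SUBCAT=intrans, -AUX] -> 'walk' | 'sing'
V[SUBCAT=trans, -AUX] -> 'see' | 'like'
V[SUBCAT=clause, -AUX] -> 'say' | 'claim'
V[+AUX] -> 'do' | 'can'
NP[-WH] -> 'you' | 'cats'
NP[+WH] -> 'who'
Adv[+NEG] -> 'rarely' | 'never'
NP/NP ->
Comp -> 'that'
"""

gap_parser = FeatureChartParser(FeatureGrammar.fromstring(FEAT1))


def embedded_question(depth):
    """Build 'who do you (claim that you)* like' with `depth` embeddings."""
    return "who do you " + "claim that you " * depth + "like"


def has_parse(sentence):
    """Return True if the grammar finds at least one parse."""
    return any(True for _ in gap_parser.parse(sentence.split()))


# The filler 'who' and the gap after 'like' move further apart at each step.
for depth in range(4):
    sentence = embedded_question(depth)
    print(f"{sentence!r}: {has_parse(sentence)}")
```

Each extra clause adds one more `SBar/NP`, `S[-INV]/NP` and `VP/NP` layer, so the slash value is simply passed down one level further until the empty production `NP/NP ->` discharges it.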


### Learning Objectives

* Appreciate why feature-value (attribute-value) representations are superior to atomic symbols: with atomic symbols alone, capturing realistic patterns such as agreement requires a massive multiplication of CFG productions.

* Understand how to represent entities as features and values in an Attribute-Value Matrix (AVM), and how features can be nested.

* Use variables as feature values to specify dependencies.

* Use shared values represented as numerical indices in AVMs.

* Understand subsumption and unification when explaining relationships between feature structures. Determine whether $FS_A \sqsubseteq FS_B$ holds for two DAGs. Evaluate the unification $FS_A \sqcup FS_B$ of two or more DAGs.

* Understand and apply the concept that if unification specializes a path $\pi$ in $FS$, then it specializes every path $\pi '$ equivalent to $\pi$.

* Use feature structures to build succinct analyses of a wide variety of linguistic phenomena including *verb subcategorization*, *inversion constructions* and *unbounded dependency constructions*.
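
The subsumption, unification and equivalent-path objectives above can be tried out with `nltk.FeatStruct`. This is a minimal sketch; the feature names and values (`NUMBER`, `STREET`, `CITY`, and the *Lee*/*Kim* record) are illustrative assumptions, and `(1)` / `->(1)` is NLTK's string notation for a shared (reentrant) value.

```python
from nltk import FeatStruct

# Subsumption: the more general structure subsumes the more specific one.
general = FeatStruct("[NUMBER=74]")
specific = FeatStruct("[NUMBER=74, STREET='rue Pascal']")
print(general.subsumes(specific))
print(specific.subsumes(general))

# Unification merges compatible information; on a clash it returns None.
merged = general.unify(FeatStruct("[STREET='rue Pascal']"))
clash = general.unify(FeatStruct("[NUMBER=75]"))
print(merged)
print(clash)

# Equivalent paths: ADDRESS and SPOUSE.ADDRESS share one value, marked (1).
lee = FeatStruct("""[NAME='Lee', ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'],
                     SPOUSE=[NAME='Kim', ADDRESS->(1)]]""")
update = FeatStruct("[SPOUSE=[ADDRESS=[CITY='Paris']]]")

# Specializing the path SPOUSE.ADDRESS also specializes the path ADDRESS.
result = update.unify(lee)
print(result)
```

Although `update` mentions only the spouse's address, the unified structure gains `CITY` under both paths, because the two paths lead to the same node in the DAG.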