# 9.2 - Processing Feature Structures

Here is an example of declaring a feature structure with NLTK.

In [1]:
import nltk
fs1 = nltk.FeatStruct(TENSE='past', NUM='sg')
print(fs1)

[ NUM   = 'sg'   ]
[ TENSE = 'past' ]


A **feature structure** is a kind of dictionary, so we access its values by indexing in the usual way, for example:

In [2]:
fs2 = nltk.FeatStruct(PER=3, NUM='pl', GND='fem')
print(fs2)

[ GND = 'fem' ]
[ NUM = 'pl'  ]
[ PER = 3     ]


In [3]:
print(fs2['GND'])

fem


We can also define feature structures that have complex values.

In [4]:
fs2['CASE'] = 'acc'
fs3 = nltk.FeatStruct(POS='N', AGR=fs2)
print(fs3)

[       [ CASE = 'acc' ] ]
[ AGR = [ GND  = 'fem' ] ]
[       [ NUM  = 'pl'  ] ]
[       [ PER  = 3     ] ]
[                        ]
[ POS = 'N'              ]


In [5]:
print(fs3["AGR"])

[ CASE = 'acc' ]
[ GND  = 'fem' ]
[ NUM  = 'pl'  ]
[ PER  = 3     ]


In [6]:
print(fs3["AGR"]["PER"])

3
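Each chain of indexing operations follows a **path**, a sequence of features starting at the root. The following rough sketch uses plain Python dicts as a stand-in for `FeatStruct`, with a hypothetical helper `get_path` that is not part of NLTK. It shows that the lookup above amounts to:

```python
def get_path(fs, path):
    """Follow a path (a tuple of feature names) from the root of a nested dict."""
    value = fs
    for feature in path:
        value = value[feature]  # raises KeyError if the path does not exist
    return value

# Plain-dict stand-in for fs3 above (not an NLTK object)
fs3 = {"POS": "N", "AGR": {"CASE": "acc", "GND": "fem", "NUM": "pl", "PER": 3}}
print(get_path(fs3, ("AGR", "PER")))  # 3
```

NLTK's `FeatStruct` should also accept a tuple path directly, as in `fs3['AGR', 'PER']`. Check this against the documentation for your NLTK version.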


In [7]:
# Feature structures can also encode information about entities
print(nltk.FeatStruct(name="Lee", telno="01 27 86 42 96", age=33))

[ age   = 33               ]
[ name  = 'Lee'            ]
[ telno = '01 27 86 42 96' ]


**It is often helpful to view feature structures as graphs, or more specifically, as directed acyclic graphs (DAGs).**

In [8]:
# Using the Lee and Kim example: Kim's ADDRESS points (->) to the value tagged (1), i.e. Lee's ADDRESS
fs4 = nltk.FeatStruct("""[NAME='Lee', ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'], SPOUSE=[NAME='Kim', ADDRESS->(1)]]""")
print(fs4)

[ ADDRESS = (1) [ NUMBER = 74           ] ]
[               [ STREET = 'rue Pascal' ] ]
[                                         ]
[ NAME    = 'Lee'                         ]
[                                         ]
[ SPOUSE  = [ ADDRESS -> (1)  ]           ]
[           [ NAME    = 'Kim' ]           ]


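The bracketed integer `(1)` is sometimes called a **tag** or a **coindex**, and `->(1)` points to the value that carries that tag. The choice of integer is not significant, and there can be any number of tags within a single feature structure.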
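Reentrancy is what makes the structure a graph rather than a tree: the paths `ADDRESS` and `SPOUSE ADDRESS` lead to the same node. The sketch below models the DAG with nested Python dicts, so a shared node is one dict object reachable from two places. It lists every path and groups the paths that end at the same node. The helper names `paths` and `equivalent_paths` are hypothetical, and the recursion assumes the structure is acyclic.

```python
from collections import defaultdict

def paths(fs, prefix=()):
    """Yield (path, node) for every feature path in a nested-dict feature structure."""
    for feature, value in fs.items():
        path = prefix + (feature,)
        yield path, value
        if isinstance(value, dict):
            yield from paths(value, path)  # assumes no cycles

def equivalent_paths(fs):
    """Group paths that reach the very same complex node (compared by object identity).
    Atoms are skipped because Python may reuse identical small ints/strings."""
    groups = defaultdict(list)
    for path, node in paths(fs):
        if isinstance(node, dict):
            groups[id(node)].append(path)
    return [group for group in groups.values() if len(group) > 1]

address = {"NUMBER": 74, "STREET": "rue Pascal"}  # plays the role of the node tagged (1)
lee = {"NAME": "Lee", "ADDRESS": address,
       "SPOUSE": {"NAME": "Kim", "ADDRESS": address}}
print(equivalent_paths(lee))  # [[('ADDRESS',), ('SPOUSE', 'ADDRESS')]]
```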

## Subsumption and Unification
**Subsumption** is defined as follows:

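If $FS_0$ **subsumes** $FS_1$ (formally, $FS_0 \sqsubseteq FS_1$), then $FS_1$ **must have all the paths and path equivalences of $FS_0$**, and may have additional paths and path equivalences as well. In other words, $FS_0$ is at least as general as $FS_1$.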

For example, if

$FS_a =$ `[NUMBER=74]`

$FS_b =$ `[NUMBER=74, STREET='rue Pascal']`

$FS_c =$ `[NUMBER=74, STREET='rue Pascal', CITY='Paris']`

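then the statements $FS_a \sqsubseteq FS_b$, $FS_b \sqsubseteq FS_c$ and $FS_a \sqsubseteq FS_c$ all hold, while the converse statements (such as $FS_c \sqsubseteq FS_a$) do not. This is analogous to set inclusion: the more specific structure on the right contains all the paths (with their values) of the structure on the left, and possibly more. Informally, "$FS_0$ subsumes $FS_1$" can be read as "the information in $FS_0$ is a subset of the information in $FS_1$".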
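For structures without shared nodes, the definition reduces to a recursive containment check. Below is a minimal sketch over nested Python dicts. The function `subsumes` is our own, not NLTK's, and it deliberately ignores path equivalences.

```python
def subsumes(general, specific):
    """Return True if every path in `general` exists in `specific` with the same atomic value.
    Simplification: path equivalences (shared nodes) are not checked."""
    for feature, value in general.items():
        if feature not in specific:
            return False
        other = specific[feature]
        if isinstance(value, dict):
            # A complex value must be matched by a complex value that it subsumes
            if not (isinstance(other, dict) and subsumes(value, other)):
                return False
        elif value != other:
            return False  # different atoms (or atom vs. complex value)
    return True

fs_a = {"NUMBER": 74}
fs_b = {"NUMBER": 74, "STREET": "rue Pascal"}
fs_c = {"NUMBER": 74, "STREET": "rue Pascal", "CITY": "Paris"}
print(subsumes(fs_a, fs_b), subsumes(fs_b, fs_c), subsumes(fs_a, fs_c))  # True True True
print(subsumes(fs_c, fs_a))  # False
```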

We can also merge the information in two feature structures. This operation is called **unification**, and NLTK supports it through the `unify()` method.

In [9]:
# Unify an address with a city: the result contains the information of both structures
fs1 = nltk.FeatStruct(NUMBER=74, STREET='rue Pascal')
fs2 = nltk.FeatStruct(CITY='Paris')
print(fs1.unify(fs2))

[ CITY   = 'Paris'      ]
[ NUMBER = 74           ]
[ STREET = 'rue Pascal' ]


Formally, unification is a binary operation, written $FS_0 \sqcup FS_1$. It is symmetric: $FS_0 \sqcup FS_1 = FS_1 \sqcup FS_0$.

In [10]:
print(fs2.unify(fs1))

[ CITY   = 'Paris'      ]
[ NUMBER = 74           ]
[ STREET = 'rue Pascal' ]
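Ignoring structure sharing, unification of nested dicts can be sketched as a recursive merge that fails when two atoms clash. This is a simplified model, not NLTK's implementation. The helper `unify` is hypothetical and returns `None` on failure.

```python
def unify(fs0, fs1):
    """Merge two nested-dict feature structures; return None if an atom clashes.
    Simplification: shared nodes (reentrancy) are not handled."""
    result = dict(fs0)  # shallow copy; sub-dicts of fs0 are never mutated below
    for feature, value in fs1.items():
        if feature not in result:
            result[feature] = value
        elif isinstance(result[feature], dict) and isinstance(value, dict):
            merged = unify(result[feature], value)
            if merged is None:
                return None
            result[feature] = merged
        elif result[feature] != value:
            return None  # distinct atoms (or atom vs. complex value): unification fails
    return result

fs1 = {"NUMBER": 74, "STREET": "rue Pascal"}
fs2 = {"CITY": "Paris"}
print(unify(fs1, fs2) == unify(fs2, fs1))  # True: unification is symmetric
print(unify({"NUMBER": 74}, {"NUMBER": 75}))  # None
```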


If we unify two feature structures that stand in the subsumption relationship, then the result of unification is the more specific of the two:

If $FS_0 \sqsubseteq FS_1$, then $FS_0 \sqcup FS_1 = FS_1$.

Unification fails if the two feature structures share a path $\pi$ whose value in $FS_0$ is an atom distinct from its value in $FS_1$. In NLTK, a failed unification returns `None`.
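Applied to the structures $FS_a$ and $FS_c$ defined earlier, the two properties work out as follows. Since $FS_a \sqsubseteq FS_c$, their unification is simply $FS_c$. Replacing `NUMBER=74` with `NUMBER=75`, which is our own illustrative value, produces a clash on the path `NUMBER`:

$$
\begin{aligned}
FS_a \sqcup FS_c &= [\text{NUMBER}=74] \sqcup [\text{NUMBER}=74,\ \text{STREET}=\text{'rue Pascal'},\ \text{CITY}=\text{'Paris'}] = FS_c \\
[\text{NUMBER}=74] \sqcup [\text{NUMBER}=75] &= \text{failure, since the atoms } 74 \neq 75
\end{aligned}
$$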

To understand unification further, let's work through an example. First, we define Lee's graph, in which Lee and Kim each have a separate copy of the same address:

In [11]:
fs5 = nltk.FeatStruct("""[NAME=Lee, 
                         ADDRESS=[NUMBER=74,STREET='rue Pascal'],
                         SPOUSE=[NAME=Kim,
                                 ADDRESS=[NUMBER=74, STREET='rue Pascal']]]""")
print(fs5)

[ ADDRESS = [ NUMBER = 74           ]               ]
[           [ STREET = 'rue Pascal' ]               ]
[                                                   ]
[ NAME    = 'Lee'                                   ]
[                                                   ]
[           [ ADDRESS = [ NUMBER = 74           ] ] ]
[ SPOUSE  = [           [ STREET = 'rue Pascal' ] ] ]
[           [                                     ] ]
[           [ NAME    = 'Kim'                     ] ]


Now we augment Kim's address with a specification for `CITY`. Notice that `fs6` needs to include the whole path from the root of the feature structure down to `CITY`.

In [12]:
fs6 = nltk.FeatStruct("[SPOUSE= [ADDRESS = [CITY=PARIS]]]")
print(fs5.unify(fs6))

[ ADDRESS = [ NUMBER = 74           ]               ]
[           [ STREET = 'rue Pascal' ]               ]
[                                                   ]
[ NAME    = 'Lee'                                   ]
[                                                   ]
[           [           [ CITY   = 'PARIS'      ] ] ]
[           [ ADDRESS = [ NUMBER = 74           ] ] ]
[ SPOUSE  = [           [ STREET = 'rue Pascal' ] ] ]
[           [                                     ] ]
[           [ NAME    = 'Kim'                     ] ]


In contrast, the result is very different if `fs6` is unified with the structure-sharing version of Lee's graph, `fs4`:

In [13]:
# fs4 (defined above) is the structure-sharing version of Lee's graph:
# Kim's ADDRESS points (->) to Lee's ADDRESS, which is tagged (1)
print(fs4.unify(fs6))

[               [ CITY   = 'PARIS'      ] ]
[ ADDRESS = (1) [ NUMBER = 74           ] ]
[               [ STREET = 'rue Pascal' ] ]
[                                         ]
[ NAME    = 'Lee'                         ]
[                                         ]
[ SPOUSE  = [ ADDRESS -> (1)  ]           ]
[           [ NAME    = 'Kim' ]           ]


Rather than just updating Kim's "copy" of Lee's address, we have updated **both addresses at the same time**. More generally, if a unification involves specializing the value of some path $\pi$, then that unification simultaneously specializes the value of *any path equivalent to $\pi$*.
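The difference comes down to whether Kim's address is a copy or the very same node. The sketch below reproduces both outcomes using nested Python dicts. It relies on two hypothetical helpers: `lee_graph` builds either version, and `unify_into` merges new information into an existing structure in place. Because the merge is destructive, any path that shares a node sees the update. This is an illustration only and makes no claim about how NLTK implements it.

```python
import copy

def unify_into(target, source):
    """Destructively merge `source` into `target`; return False on an atom clash.
    Note: on failure, `target` may already be partially updated."""
    for feature, value in source.items():
        if feature not in target:
            target[feature] = copy.deepcopy(value)
        elif isinstance(target[feature], dict) and isinstance(value, dict):
            if not unify_into(target[feature], value):
                return False
        elif target[feature] != value:
            return False
    return True

def lee_graph(shared):
    """Build Lee's graph with Kim's ADDRESS either shared with Lee's or copied."""
    address = {"NUMBER": 74, "STREET": "rue Pascal"}
    kim_address = address if shared else dict(address)
    return {"NAME": "Lee", "ADDRESS": address,
            "SPOUSE": {"NAME": "Kim", "ADDRESS": kim_address}}

update = {"SPOUSE": {"ADDRESS": {"CITY": "Paris"}}}  # plays the role of fs6
copied, shared = lee_graph(shared=False), lee_graph(shared=True)
unify_into(copied, update)
unify_into(shared, update)
print("CITY" in copied["ADDRESS"])  # False: only Kim's copy was specialized
print("CITY" in shared["ADDRESS"])  # True: the shared node is seen via both paths
```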

Structure sharing can also be stated using variables such as `?x`.

In [14]:
fs1 = nltk.FeatStruct("[ADDRESS1=[NUMBER=74, STREET='rue Pascal']]")
fs2 = nltk.FeatStruct("[ADDRESS1=?x, ADDRESS2=?x]")
print(fs2)

[ ADDRESS1 = ?x ]
[ ADDRESS2 = ?x ]


In [15]:
print(fs2.unify(fs1))

[ ADDRESS1 = (1) [ NUMBER = 74           ] ]
[                [ STREET = 'rue Pascal' ] ]
[                                          ]
[ ADDRESS2 -> (1)                          ]
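Here the variable `?x` must take the same value wherever it occurs, so binding it through `ADDRESS1` also fixes `ADDRESS2`. The sketch below covers that binding step for flat (one-level) structures only. It assumes variables are strings starting with `?`, and the helper names are hypothetical, not NLTK's.

```python
def is_var(value):
    """Treat strings such as '?x' as variables (a simplification of NLTK's notation)."""
    return isinstance(value, str) and value.startswith("?")

def unify_flat_with_vars(pattern, fs):
    """Unify two flat feature dicts where `pattern` may contain variables.
    Returns the merged dict, or None if unification fails."""
    bindings = {}
    # Pass 1: bind each variable to the value `fs` supplies for the same feature
    for feature, value in pattern.items():
        if is_var(value) and feature in fs:
            if value in bindings and bindings[value] != fs[feature]:
                return None  # one variable cannot take two different values
            bindings.setdefault(value, fs[feature])
    result = dict(fs)
    # Pass 2: every feature carrying the same variable receives the same bound object
    for feature, value in pattern.items():
        if is_var(value):
            result[feature] = bindings.get(value, value)  # unbound variables stay as-is
        elif feature in result and result[feature] != value:
            return None  # clashing atoms
        else:
            result[feature] = value
    return result

fs1 = {"ADDRESS1": {"NUMBER": 74, "STREET": "rue Pascal"}}
fs2 = {"ADDRESS1": "?x", "ADDRESS2": "?x"}
merged = unify_flat_with_vars(fs2, fs1)
print(merged["ADDRESS1"] is merged["ADDRESS2"])  # True: both features share one value
```

Returning the same Python object for both features mirrors the `ADDRESS2 -> (1)` pointer in NLTK's output above.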
