# Dependency Parsing

For an intuitive first look, start with an example of dependency parsing in `spaCy`.

For each token, the code below prints the following information:
* Text: The original token text.
* Dep: The syntactic relation connecting child to head.
* Head text: The original text of the token head.
* Head POS: The part-of-speech tag of the token head.
* Children: The immediate syntactic dependents of the token.

In [24]:
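import spacy

# Load the small English pipeline (assumes the en_core_web_sm model is installed).
nlp = spacy.load("en_core_web_sm")

doc = nlp("I booked a ticket to Seattle")
for token in doc:
    # Token text, relation to its head, head text, head POS tag, and the token's children.
    print(token.text, token.dep_, token.head.text, token.head.pos_,
          list(token.children))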

I nsubj booked VERB []
booked ROOT booked VERB [I, ticket, to]
a det ticket NOUN []
ticket dobj booked VERB [a]
to prep booked VERB [Seattle]
Seattle pobj to ADP []


## Dependency Grammars

Previously, context-free grammars (CFGs):
* Are phrase-structure grammars
* Focus on modeling constituent structure

Dependency grammars instead describe syntactic structure in terms of:
* Words
* Syntactic/semantic relations between words

## Dependency Parsing
A dependency parse is a tree, where:

* Nodes correspond to the words in the utterance

* Edges between nodes represent dependency relations (the relations may be labeled or unlabeled)

![dependency tree](./img/dependency_tree.png)

Advantages of dependency parsing:
* It can handle languages that are morphologically rich and have relatively free word order (a CFG needs extra phrase-structure rules for each alternative ordering)
* The head-dependent relations approximate the semantic relationship between predicates and their arguments **(good for coreference resolution, question answering and information extraction)**
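
As a rough illustration of the second point, the sketch below groups subjects and objects under their head predicate. The rows are copied from the spaCy output shown earlier; the function name `predicate_arguments` is hypothetical, and keying the result by head text assumes each head word occurs only once in the sentence.

```python
# (text, dep, head_text) rows taken from the spaCy output above.
rows = [
    ("I", "nsubj", "booked"),
    ("booked", "ROOT", "booked"),
    ("a", "det", "ticket"),
    ("ticket", "dobj", "booked"),
    ("to", "prep", "booked"),
    ("Seattle", "pobj", "to"),
]

def predicate_arguments(rows):
    """Group subject and direct-object dependents under their head predicate."""
    frames = {}
    for text, dep, head in rows:
        if dep in ("nsubj", "dobj"):
            frames.setdefault(head, {})[dep] = text
    return frames

print(predicate_arguments(rows))
# {'booked': {'nsubj': 'I', 'dobj': 'ticket'}}
```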

Three main strategies for parsing:
* Convert dependency trees to phrase-structure (PS) trees and parse with standard algorithms, which run in O(n^3) time

* Use graph-based optimization, with edge weights learned by machine learning

* Use shift-reduce approaches that act on the current word and parser state, with attachment decisions learned by machine learning

## Dependency Relation
Grammatical relations provide the basis for the binary relations that make up these dependency structures. Each relation takes two arguments: a head and a dependent.
<img src="./img/dependency_relation.png" width="530">
<img src="./img/dependency_relation_example.png" width="500">

## Transition-Based Dependency Parsing
Transition-based parsing is a stack-based, shift-reduce approach.
![](https://2.bp.blogspot.com/-fqtmVS97tOs/VzTEAI9BQ8I/AAAAAAAAA_U/xPj0Av64sGseS0rF4Z1BbhmS77J-HuEvwCLcB/s1600/image04.gif)

A key element in transition-based parsing is the notion of a **configuration**, which consists of:
* a stack
* an input buffer of words, or tokens
* a set of relations representing a dependency tree

The parsing process consists of a sequence of transitions through the space of possible configurations.

```
function DEPENDENCY_PARSE(words) returns dependency tree 
    state ← { [root], [words], [] } ; initial configuration 
    while state not final 
        t ← ORACLE(state) ; choose a transition operator to apply 
        state ← APPLY(t, state) ; apply it, creating a new state 
    return state
```
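
The loop above can be sketched in Python under a few assumptions that the pseudocode leaves open. This sketch uses the arc-standard transition set (`SHIFT`, `LEFTARC`, `RIGHTARC`) and replaces the learned oracle with one that reads the correct transition off a known gold tree, which is how training examples are typically derived. The function names, the `{dependent: head}` format of `gold_heads`, and the use of word positions (with 0 for ROOT) are illustrative choices.

```python
def oracle(stack, arcs, gold_heads):
    """Gold-tree oracle (an assumption): choose the transition that agrees with gold_heads."""
    if len(stack) >= 2:
        top, second = stack[-1], stack[-2]
        # LEFTARC: the top of the stack is the head of the word beneath it.
        if second != 0 and gold_heads[second] == top:
            return "LEFTARC"
        # RIGHTARC: only once the top word has collected all of its own dependents.
        attached = {dep for _, dep in arcs}
        top_done = all(d in attached for d, h in gold_heads.items() if h == top)
        if gold_heads[top] == second and top_done:
            return "RIGHTARC"
    return "SHIFT"

def dependency_parse(words, gold_heads):
    """Return (arcs, transitions); arcs are (head, dependent) word positions."""
    stack, buffer, arcs = [0], list(range(1, len(words) + 1)), []
    transitions = []
    while buffer or len(stack) > 1:          # final state: empty buffer, only ROOT left
        t = oracle(stack, arcs, gold_heads)
        if t == "SHIFT":
            if not buffer:
                raise ValueError("gold tree cannot be built with these transitions")
            stack.append(buffer.pop(0))      # move the next word onto the stack
        elif t == "LEFTARC":
            dep = stack.pop(-2)              # remove the second item, keep the top as head
            arcs.append((stack[-1], dep))
        else:  # RIGHTARC
            dep = stack.pop()                # remove the top, the new top is its head
            arcs.append((stack[-1], dep))
        transitions.append(t)
    return arcs, transitions

words = ["I", "booked", "a", "ticket", "to", "Seattle"]
gold_heads = {1: 2, 2: 0, 3: 4, 4: 2, 5: 2, 6: 5}   # {dependent: head}, 0 is ROOT
arcs, transitions = dependency_parse(words, gold_heads)
print(transitions)
print(arcs)
```

In a real parser the oracle would be a trained classifier over features of the configuration; the `ValueError` branch covers gold trees (for example non-projective ones) that this transition set cannot produce.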

In [None]:
# Code to be continued

## Graph-Based Dependency Parsing
Graph-based methods encode the search space as a directed graph and use methods from graph theory to search it for optimal solutions. In particular, a maximum spanning tree (MST) algorithm is applied to the graph.

Goal: Find the highest scoring dependency tree T for sentence S  
* If S is unambiguous, T is the correct parse.
* If S is ambiguous, T is the highest scoring parse.

Scores come from:  
* Weights on dependency edges, assigned by a machine-learned model
* The model is trained on a large dependency treebank

Idea:

First, build a fully connected initial graph:
* Nodes: the words in the sentence to parse
* Edges: directed edges between all pairs of words, plus edges from ROOT to every word

Then identify the maximum spanning tree:
* A spanning tree connects all nodes
* Select the spanning tree with the highest total weight
* Arc-factored model: each weight depends only on the two end nodes and the link between them, so the weight of a tree is the sum of the weights of its arcs
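
To make the arc-factored objective concrete, here is a small sketch that scores every candidate tree for a three-word sentence and keeps the best one. The weights in `score` are made up for illustration, and exhaustive enumeration is only a stand-in for a real MST algorithm: it grows exponentially with sentence length and is usable only on tiny inputs.

```python
from itertools import product

# score[h][d] is the (made-up) weight of the arc h -> d.
# Index 0 is ROOT; 1..3 are the words of "Book that flight".
score = [
    [0, 12, 4, 4],   # ROOT   -> Book, that, flight
    [0, 0, 5, 7],    # Book   -> that, flight
    [0, 6, 0, 8],    # that   -> Book, flight
    [0, 5, 7, 0],    # flight -> Book, that
]

def tree_score(heads, score):
    """Arc-factored score: the sum of the weights of all arcs in the tree."""
    return sum(score[h][d] for d, h in enumerate(heads, start=1))

def is_tree(heads):
    """heads[d - 1] is the head of word d; every word must reach ROOT (0)."""
    for start in range(1, len(heads) + 1):
        seen, cur = set(), start
        while cur != 0:
            if cur in seen:
                return False
            seen.add(cur)
            cur = heads[cur - 1]
    return True

def brute_force_mst(score):
    """Try every head assignment and keep the best-scoring valid tree."""
    n = len(score) - 1
    best = None
    for heads in product(range(n + 1), repeat=n):
        if not is_tree(heads):   # also rejects self-loops
            continue
        s = tree_score(heads, score)
        if best is None or s > best[0]:
            best = (s, heads)
    return best

print(brute_force_mst(score))
# (26, (0, 3, 1))
```

Note that picking the best incoming arc for each word independently would give `flight -> that` and `that -> flight`, which form a cycle; this is why a dedicated spanning-tree algorithm is needed.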

```
function MAXSPANNINGTREE(G=(V,E), root, score) returns spanning tree 
    F ← [] 
    T’ ← [] 
    score’ ← [] 
    for each v ∈ V do 
        bestInEdge ← argmax_{e=(u,v) ∈ E} score[e] ; best incoming edge for v
        F ← F ∪ bestInEdge 
        for each e=(u,v) ∈ E do 
            score’[e] ← score[e] − score[bestInEdge]

    if T=(V,F) is a spanning tree then return it 
    else
        C ← a cycle in F 
        G’ ← CONTRACT(G, C) 
        T’ ← MAXSPANNINGTREE(G’, root, score’) 
        T ← EXPAND(T’, C) 
        return T

function CONTRACT(G,C) returns contracted graph 
function EXPAND(T, C) returns expanded graph
```
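
A compact Python sketch of this greedy-select, contract, recurse, and expand scheme (the Chu-Liu/Edmonds algorithm) is given below. It is a sketch under stated assumptions, not a tuned implementation: the graph is assumed to be fully connected, `score` is a hypothetical `{(head, dependent): weight}` dictionary with made-up weights, and only arcs entering a cycle are re-scored, which is a common variant of the re-scoring step in the pseudocode.

```python
def find_cycle(heads):
    """Return the nodes of one cycle in {dependent: head}, or None."""
    for start in heads:
        path, cur = [], start
        while cur in heads and cur not in path:
            path.append(cur)
            cur = heads[cur]
        if cur in path:
            return path[path.index(cur):]
    return None

def max_spanning_tree(nodes, score, root=0):
    """nodes: list of ids; score: {(head, dep): weight}. Returns {dep: head}."""
    # 1. Greedily pick the best incoming arc for every non-root node.
    heads = {}
    for d in nodes:
        if d != root:
            heads[d] = max((w, h) for (h, dd), w in score.items() if dd == d and h != d)[1]
    cycle = find_cycle(heads)
    if cycle is None:
        return heads
    # 2. Contract the cycle into one new node c and re-score arcs entering it.
    c, in_cycle = max(nodes) + 1, set(cycle)
    new_score, origin = {}, {}
    for (h, d), w in score.items():
        if h in in_cycle and d in in_cycle:
            continue
        if d in in_cycle:    # entering the cycle: subtract the cycle arc it would replace
            key, w = (h, c), w - score[(heads[d], d)]
        elif h in in_cycle:  # leaving the cycle
            key = (c, d)
        else:
            key = (h, d)
        if key not in new_score or w > new_score[key]:
            new_score[key], origin[key] = w, (h, d)
    new_nodes = [n for n in nodes if n not in in_cycle] + [c]
    sub = max_spanning_tree(new_nodes, new_score, root)
    # 3. Expand: map contracted arcs back, then keep the unbroken cycle arcs.
    result = {}
    for d, h in sub.items():
        orig_head, orig_dep = origin[(h, d)]
        result[orig_dep] = orig_head
    for d in cycle:
        result.setdefault(d, heads[d])
    return result

# Made-up weights for "Book that flight": 0 = ROOT, 1 = Book, 2 = that, 3 = flight.
score = {(0, 1): 12, (0, 2): 4, (0, 3): 4, (1, 2): 5, (1, 3): 7,
         (2, 1): 6, (2, 3): 8, (3, 1): 5, (3, 2): 7}
tree = max_spanning_tree([0, 1, 2, 3], score)
print(tree, sum(score[(h, d)] for d, h in tree.items()))
```

On this input the greedy step first selects the cycle between `that` and `flight`; after contraction the arc `Book -> flight` enters the cycle and replaces `that -> flight`, giving a tree with total weight 26.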

In [None]:
# Code to be continued