In [13]:
%%capture
%load_ext autoreload
%autoreload 2
import sys
sys.path.append("..")
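# Note: this cell fails with ModuleNotFoundError unless the `mpld3` package is installed
from statnlpbook.util import execute_notebook
import statnlpbook.parsing as parsing
from statnlpbook.transition import *
import statnlpbook.transition as transition  # module handle used by `transition.to_displacy_graph` below

# Run the companion notebook; later cells assume it defines `transitions` and `tokenized_sentence`
execute_notebook('Transition-based dependency parsing.ipynb')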


<!---
Latex Macros
-->
$$
\newcommand{\Xs}{\mathcal{X}}
\newcommand{\Ys}{\mathcal{Y}}
\newcommand{\y}{\mathbf{y}}
\newcommand{\balpha}{\boldsymbol{\alpha}}
\newcommand{\bbeta}{\boldsymbol{\beta}}
\newcommand{\aligns}{\mathbf{a}}
\newcommand{\align}{a}
\newcommand{\source}{\mathbf{s}}
\newcommand{\target}{\mathbf{t}}
\newcommand{\ssource}{s}
\newcommand{\starget}{t}
\newcommand{\repr}{\mathbf{f}}
\newcommand{\repry}{\mathbf{g}}
\newcommand{\x}{\mathbf{x}}
\newcommand{\prob}{p}
\newcommand{\a}{\alpha}
\newcommand{\b}{\beta}
\newcommand{\vocab}{V}
\newcommand{\params}{\boldsymbol{\theta}}
\newcommand{\param}{\theta}
\DeclareMathOperator{\perplexity}{PP}
\DeclareMathOperator{\argmax}{argmax}
\DeclareMathOperator{\argmin}{argmin}
\newcommand{\train}{\mathcal{D}}
\newcommand{\counts}[2]{\#_{#1}(#2) }
\newcommand{\length}[1]{\text{length}(#1) }
\newcommand{\indi}{\mathbb{I}}
$$

# Parsing

In [14]:
%%HTML
<style>
td,th {
    font-size: x-large;
    text-align: left;
}
</style>

## Motivation

Say we want to automatically build a database of this form:

| Brand   | Parent    |
|---------|-----------|
| KitKat  | Nestle    |
| Lipton  | Unilever  |  
| ...     | ...       |  

or this [graph](http://geekologie.com/image.php?path=/2012/04/25/parent-companies-large.jpg)

Say you find positive textual mentions in this form:

> <font color="blue">Dechra Pharmaceuticals</font> has made its second acquisition after purchasing <font color="green">Genitrix</font>.


> <font color="blue">Trinity Mirror plc</font> is the largest British newspaper after purchasing rival <font color="green">Local World</font>.

Can you find a pattern? 

How about this sentence?

> <font color="blue">Kraft</font> is gearing up for a roll-out of its <font color="blue">Milka</font> brand after purchasing <font color="green">Cadbury Dairy Milk</font>.


Wouldn't it be great if we knew that

* Kraft is the **subject** of the phrase **purchasing Cadbury Dairy Milk** 

Check out [spaCy](https://demos.explosion.ai/displacy/) and the [Stanford CoreNLP Parser](http://nlp.stanford.edu:8080/corenlp/)

Parsing is the process of **finding these graphs**:

* very important for downstream applications
* the "celebrity" sub-field of NLP 
    * partly because it marries linguistics and NLP
* researched in academia and [industry](http://www.telegraph.co.uk/technology/2016/05/17/has-googles-parsey-mcparseface-just-solved-one-of-the-worlds-big/)

How is this done?

## Dependency Parsing

* **Lexical Elements**: words
* **Syntactic Relations**: subject, direct object, indirect object, etc.

Task: determine the syntactic relations between words

### Grammatical Relations
> <font color="blue">Kraft</font> is gearing up for a roll-out of its <font color="blue">Milka</font> brand after purchasing <font color="green">Cadbury</font>.

* *Subject* of purchasing: **Kraft**
* *Object* of purchasing: **Cadbury**
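
Once a parser has produced labeled arcs, reading off who purchased what becomes a simple lookup. This is a minimal sketch assuming a hand-built parse of a shortened, hypothetical sentence *Kraft is purchasing Cadbury*, with arcs stored as `(head, child, label)` triples and index 0 reserved for ROOT; `arguments_of` is an illustrative helper, not part of any library.

```python
# Hand-built (hypothetical) parse of a shortened sentence; index 0 is ROOT.
# Each arc is a (head_index, child_index, label) triple.
tokens = ["ROOT", "Kraft", "is", "purchasing", "Cadbury"]
arcs = {(0, 3, "root"), (3, 1, "nsubj"), (3, 2, "aux"), (3, 4, "dobj")}


def arguments_of(verb_index, arcs, tokens):
    """Map each relation label to the words that depend on the given verb."""
    args = {}
    for head, child, label in sorted(arcs):
        if head == verb_index:
            args.setdefault(label, []).append(tokens[child])
    return args


args = arguments_of(3, arcs, tokens)
# Subject = acquirer (Parent), object = acquired company (Brand)
print(args["nsubj"], args["dobj"])  # ['Kraft'] ['Cadbury']
```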

### Subcategorisation of Relations

Verbs (and other types of words) fall into finer-grained subcategories, depending on which arguments they take:

* Intransitive Verbs: must not have objects
    * the student works
* Transitive Verbs: must have exactly one object
    * Kraft purchased Cadbury
* Ditransitive Verbs: must have two objects
    * Give me a break! 
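
Subcategorisation constraints can be checked against a parse by counting object dependents. A minimal sketch, assuming objects carry the UD labels `dobj` or `iobj`, a tiny hypothetical lexicon `REQUIRED_OBJECTS`, and hand-built parses of the example sentences:

```python
# Labels treated as objects (assumption: UD v1-style relation names)
OBJECT_LABELS = {"dobj", "iobj"}
# Hypothetical subcategorisation lexicon: number of objects each verb requires
REQUIRED_OBJECTS = {"works": 0, "purchased": 1, "Give": 2}


def count_objects(verb_index, arcs):
    """Count dependents of the verb whose label marks an object."""
    return sum(1 for head, _, label in arcs if head == verb_index and label in OBJECT_LABELS)


def respects_subcategorisation(verb_index, tokens, arcs):
    """True if the verb has exactly as many objects as its lexicon entry demands."""
    return count_objects(verb_index, arcs) == REQUIRED_OBJECTS[tokens[verb_index]]


# Hand-built parses; index 0 is ROOT, arcs are (head, child, label)
works = (["ROOT", "the", "student", "works"],
         {(0, 3, "root"), (3, 2, "nsubj"), (2, 1, "det")})
purchased = (["ROOT", "Kraft", "purchased", "Cadbury"],
             {(0, 2, "root"), (2, 1, "nsubj"), (2, 3, "dobj")})
give = (["ROOT", "Give", "me", "a", "break"],
        {(0, 1, "root"), (1, 2, "iobj"), (1, 4, "dobj"), (4, 3, "det")})
incomplete = (["ROOT", "Kraft", "purchased"], {(0, 2, "root"), (2, 1, "nsubj")})

print(respects_subcategorisation(3, *works))       # True
print(respects_subcategorisation(2, *purchased))   # True
print(respects_subcategorisation(1, *give))        # True
print(respects_subcategorisation(2, *incomplete))  # False: transitive verb without an object
```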


### Universal Dependencies 

* Annotation framework featuring [37 syntactic relations](http://universaldependencies.org/)
* [Treebanks](http://universaldependencies.org/) (i.e. datasets annotated with syntactic relations) in over 60 languages
* Large project with over 200 contributors

### Example UD Dependency Relations


| Relation   | Description    |
|---------|-----------|
| nsubj  | Nominal subject    |
| dobj  | Direct object  |  
| iobj     | Indirect object       |  
| nmod     | Nominal modifier       |
| amod     | Adjectival modifier       |  

## Anatomy of a Dependency Tree

* Nodes:
    * Tokens of sentence
    * a ROOT node (akin to the S symbol in CFGs)
* Edges:
    * Directed from the **syntactic head** to the child token
    * Each **non-ROOT** token has **exactly one parent**
        * the word that controls its syntactic function, or
        * the word "it depends on"
* ROOT **has no parent**
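
These constraints can be checked mechanically. The sketch below assumes arcs are `(head, child, label)` triples over token indices with 0 as ROOT (the format of the example that follows); besides the two stated rules it also checks acyclicity, which any tree implicitly requires. `is_well_formed` is an illustrative name.

```python
def is_well_formed(n_tokens, arcs):
    """Check the tree constraints for tokens 0..n_tokens-1, where 0 is ROOT."""
    heads = {}
    for head, child, _ in arcs:
        if not (0 <= head < n_tokens and 0 < child < n_tokens):
            return False  # out-of-range index, or an arc pointing into ROOT
        if child in heads:
            return False  # a token with two parents
        heads[child] = head
    if len(heads) != n_tokens - 1:
        return False  # some non-ROOT token has no parent
    # Implicit tree requirement: following parents from any token must reach ROOT
    for token in heads:
        seen, node = set(), token
        while node != 0:
            if node in seen:
                return False  # cycle
            seen.add(node)
            node = heads[node]
    return True


tokens = ["ROOT", "Economic", "news", "had", "little", "effect", "on", "financial", "markets", "."]
arcs = {(0, 3, "root"), (0, 9, "p"), (2, 1, "amod"), (3, 2, "nsubj"), (3, 5, "dobj"),
        (5, 4, "amod"), (5, 6, "prep"), (6, 8, "pmod"), (8, 7, "amod")}
print(is_well_formed(len(tokens), arcs))                       # True
print(is_well_formed(len(tokens), arcs | {(5, 2, "dep")}))     # False: "news" gets two parents
print(is_well_formed(3, {(2, 1, "dep"), (1, 2, "dep")}))       # False: cycle, ROOT unreachable
```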

### Example

In [15]:
# Example tree: token 0 is the artificial ROOT node; each arc is (head, child, label)
tokens = ["ROOT", "Economic", "news", "had", "little", "effect", "on", "financial", "markets", "."]
arcs = {(0, 3, "root"), (0, 9, "p"), (2, 1, "amod"), (3, 2, "nsubj"), (3, 5, "dobj"),
        (5, 4, "amod"), (5, 6, "prep"), (6, 8, "pmod"), (8, 7, "amod")}
render_displacy(*transition.to_displacy_graph(arcs, tokens), "900px")


### Exercise

If every token has exactly one parent, how does one represent a multi-word expression? Discuss with your neighbour and check your ideas with [spaCy](https://demos.explosion.ai/displacy/) or the [Stanford CoreNLP Parser](http://nlp.stanford.edu:8080/corenlp/)

## Dependency Parsing Approaches

### Graph-Based Parsing
* Define a score function $s_\params(\x,\y)$ over sentences $\x \in \Xs$ and dependency graphs $\y \in \Ys$
* $s_\params(\x,\y)$ decomposes into a sum of per-(hyper)edge scores:
$$
s_\params(\x,\y) = \sum_{(h,c) \in \y} s(h,c,\x)=\sum_{(h,c) \in \y}\langle \mathbf{f}(h,c,\x),\mathbf{w} \rangle
$$ 
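* The **labelled** version scores each edge with $\langle \mathbf{f}(h,c,l,\x),\mathbf{w} \rangle$, where $l$ is the edge label, $\mathbf{f}$ is a feature function and $\mathbf{w}$ is a weight vector
* Parsing means finding the highest-scoring graph: $\argmax_{\y \in \Ys} s_\params(\x,\y)$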
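
The edge-factored score can be made concrete with sparse dictionaries. This sketch assumes a hypothetical feature function (word pair plus arc direction), made-up weights, and unlabelled edges; it finds the best tree by brute force over all head assignments, which is only feasible for toy sentences.

```python
import itertools


def features(h, c, tokens):
    """Hypothetical sparse feature function f(h, c, x): word pair and arc direction."""
    direction = "right" if h < c else "left"
    return {f"pair={tokens[h]}->{tokens[c]}": 1.0, f"dir={direction}": 1.0}


def edge_score(h, c, tokens, weights):
    """s(h, c, x) = <f(h, c, x), w> with sparse dicts."""
    return sum(value * weights.get(name, 0.0) for name, value in features(h, c, tokens).items())


def is_tree(heads):
    """heads[c] is the head of token c; every token must reach ROOT (0) without a cycle."""
    for c in heads:
        seen, node = set(), c
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node]
    return True


def brute_force_parse(tokens, weights):
    """Exhaustively search all head assignments for the highest-scoring tree."""
    n = len(tokens) - 1  # number of words, excluding ROOT
    best, best_score = None, float("-inf")
    for choice in itertools.product(range(n + 1), repeat=n):
        heads = dict(zip(range(1, n + 1), choice))
        if not is_tree(heads):
            continue
        score = sum(edge_score(h, c, tokens, weights) for c, h in heads.items())
        if score > best_score:
            best, best_score = heads, score
    return best, best_score


tokens = ["ROOT", "Kraft", "purchased", "Cadbury"]
weights = {"pair=ROOT->purchased": 5.0, "pair=purchased->Kraft": 4.0,
           "pair=purchased->Cadbury": 4.0}  # hypothetical learned weights
print(brute_force_parse(tokens, weights))  # ({1: 2, 2: 0, 3: 2}, 13.0)
```

Exhaustive search visits $(n+1)^n$ assignments; practical graph-based parsers instead use maximum spanning tree algorithms such as Chu-Liu/Edmonds (non-projective) or Eisner's algorithm (projective).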

## Transition-Based Parsing

* Currently one of the state-of-the-art parsing approaches
* Learn to perform the right action (transition) in a bottom-up, left-to-right parser
* Train classifiers $p(y|\x)$, where $y$ is an action and $\x$ represents the partial solution built so far together with the remaining sentence

## Parsing State 

A parsing state $(S, B, A)$ has three components.

### Buffer

$B$: the **remaining tokens**, still to be processed

In [None]:
render_transitions_displacy(transitions[0:1], tokenized_sentence)

### Stack
$S$: earlier tokens to **attach later**

In [None]:
render_transitions_displacy(transitions[2:3], tokenized_sentence)

### Parse 
$A$: the arcs (the partial parse) built so far

In [None]:
render_transitions_displacy(transitions[9:10], tokenized_sentence)
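
A state $(S, B, A)$ can be represented with two lists of token indices and a set of arcs. This is a minimal sketch; the class name `ParserState` is hypothetical, and so is the assumption that ROOT starts on the stack (a common arc-eager convention). It is not necessarily how `statnlpbook.transition` represents states.

```python
from dataclasses import dataclass, field


@dataclass
class ParserState:
    """A parsing state (S, B, A) holding token indices; index 0 is ROOT."""
    stack: list                              # S: top of the stack is stack[-1]
    buffer: list                             # B: front of the buffer is buffer[0]
    arcs: set = field(default_factory=set)   # A: (head, child, label) triples


def initial_state(tokens):
    """Assumption: ROOT starts on the stack and every word starts in the buffer."""
    return ParserState(stack=[0], buffer=list(range(1, len(tokens))))


def is_terminal(state):
    """Parsing stops once the buffer is empty."""
    return not state.buffer


state = initial_state(["ROOT", "Kraft", "purchased", "Cadbury"])
print(state)  # ParserState(stack=[0], buffer=[1, 2, 3], arcs=set())
```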

### Actions

We use the following four actions, which together form the *arc-eager* transition system:

### Shift

Push the word at the front of the buffer onto the stack:

$$
(S, i|B, A)\rightarrow(S|i, B, A)
$$

In [None]:
render_transitions_displacy(transitions[0:2], tokenized_sentence)
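
The Shift rule translates directly into code. In this sketch a state is assumed to be a `(stack, buffer, arcs)` triple (lists of token indices and a set of `(head, child, label)` triples); the function returns a new state instead of mutating the old one, which makes it easy to keep several states around, e.g. for beam search.

```python
def shift(stack, buffer, arcs):
    """(S, i|B, A) -> (S|i, B, A): move the front of the buffer onto the stack."""
    if not buffer:
        raise ValueError("cannot shift: the buffer is empty")
    return stack + [buffer[0]], buffer[1:], arcs


print(shift([0], [1, 2, 3], set()))  # ([0, 1], [2, 3], set())
```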

### Reduce

Pop the word at the top of the stack, provided it already has a head:

$$
(S|i, B, A)\rightarrow(S, B, A)
$$

In [None]:
render_transitions_displacy(transitions[13:15], tokenized_sentence)
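
A sketch of Reduce under the same assumed `(stack, buffer, arcs)` representation; the precondition "has a head" means some arc in $A$ already has the stack top as its child.

```python
def has_head(token, arcs):
    """True if some arc in A already has this token as its child."""
    return any(child == token for _, child, _ in arcs)


def reduce(stack, buffer, arcs):
    """(S|i, B, A) -> (S, B, A): pop i, allowed only once i has a head."""
    i = stack[-1]
    if not has_head(i, arcs):
        raise ValueError(f"cannot reduce token {i}: it has no head yet")
    return stack[:-1], buffer, arcs


# Token 3 was already attached to ROOT, so it can be popped
print(reduce([0, 3], [9], {(0, 3, "root")}))  # ([0], [9], {(0, 3, 'root')})
```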

### rightArc-[label]

Add a labeled arc from the top of the stack, $i$, to the front of the buffer, $j$, then push $j$ onto the stack:

$$
(S|i, j|B, A) \rightarrow (S|i|j, B, A\cup\{(i,j,l)\})
$$


In [None]:
render_transitions_displacy(transitions[5:7], tokenized_sentence)
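
A sketch of rightArc under the same assumed `(stack, buffer, arcs)` representation. Pushing $j$ (rather than discarding it) is what makes the system *eager*: $j$ is attached as soon as possible but stays on the stack so it can still collect its own right dependents later.

```python
def right_arc(stack, buffer, arcs, label):
    """(S|i, j|B, A) -> (S|i|j, B, A | {(i, j, l)}): i becomes the head of j, then j is pushed."""
    i, j = stack[-1], buffer[0]
    return stack + [j], buffer[1:], arcs | {(i, j, label)}


print(right_arc([0], [3, 4], set(), "root"))  # ([0, 3], [4], {(0, 3, 'root')})
```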

### leftArc-[label] 

Add a labeled arc from the front of the buffer, $j$, to the top of the stack, $i$, then pop $i$; allowed only if $i$ has no head yet:

$$
(S|i, j|B, A) \rightarrow (S, j|B, A\cup\{(j,i,l)\})
$$


In [None]:
render_transitions_displacy(transitions[2:4], tokenized_sentence)
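
A sketch of leftArc under the same assumed `(stack, buffer, arcs)` representation. Besides the "no head yet" condition above, it also refuses to give ROOT a head, since ROOT has no parent.

```python
def left_arc(stack, buffer, arcs, label):
    """(S|i, j|B, A) -> (S, j|B, A | {(j, i, l)}): j becomes the head of i, then i is popped."""
    i, j = stack[-1], buffer[0]
    if i == 0:
        raise ValueError("ROOT cannot receive a head")
    if any(child == i for _, child, _ in arcs):
        raise ValueError(f"token {i} already has a head")
    return stack[:-1], buffer, arcs | {(j, i, label)}


# "Economic" (1) is on the stack, "news" (2) is at the front of the buffer
print(left_arc([0, 1], [2, 3], set(), "amod"))  # ([0], [2, 3], {(2, 1, 'amod')})
```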

## Full Example

In [None]:
render_transitions_displacy(transitions, tokenized_sentence)
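
To see the four actions working together, the sketch below derives a transition sequence for the example tree from *Anatomy of a Dependency Tree* with a standard arc-eager *static oracle*: leftArc or rightArc if the gold tree contains the arc between stack top and buffer front, reduce if the buffer front still has a gold head or dependent deeper in the stack, otherwise shift. This is not necessarily how the notebook's `transitions` list was produced, and the oracle is only correct for projective trees (no crossing arcs), which the example is.

```python
# Gold tree of the example sentence: (head, child, label) triples, index 0 is ROOT
tokens = ["ROOT", "Economic", "news", "had", "little", "effect", "on", "financial", "markets", "."]
gold = {(0, 3, "root"), (0, 9, "p"), (2, 1, "amod"), (3, 2, "nsubj"), (3, 5, "dobj"),
        (5, 4, "amod"), (5, 6, "prep"), (6, 8, "pmod"), (8, 7, "amod")}


def static_oracle(tokens, gold):
    """Arc-eager transition sequence that rebuilds a projective gold tree."""
    label_of = {(h, c): l for h, c, l in gold}
    stack, buffer, arcs, sequence = [0], list(range(1, len(tokens))), set(), []
    while buffer:
        i, j = stack[-1], buffer[0]
        headed = {c for _, c, _ in arcs}
        if (j, i) in label_of:
            # leftArc: j is the gold head of i; pop i
            arcs.add((j, i, label_of[(j, i)]))
            stack.pop()
            sequence.append(f"leftArc-{label_of[(j, i)]}")
        elif (i, j) in label_of:
            # rightArc: i is the gold head of j; push j
            arcs.add((i, j, label_of[(i, j)]))
            stack.append(buffer.pop(0))
            sequence.append(f"rightArc-{label_of[(i, j)]}")
        elif i in headed and any((k, j) in label_of or (j, k) in label_of for k in stack[:-1]):
            # reduce: i is done, and j must still connect to a token deeper in the stack
            stack.pop()
            sequence.append("reduce")
        else:
            stack.append(buffer.pop(0))
            sequence.append("shift")
    return sequence, arcs


sequence, arcs = static_oracle(tokens, gold)
print(" ".join(sequence))
# shift leftArc-amod shift leftArc-nsubj rightArc-root shift leftArc-amod rightArc-dobj
# rightArc-prep shift leftArc-amod rightArc-pmod reduce reduce reduce reduce rightArc-p
print(arcs == gold)  # True
```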

## Machine Learning

How do we decide which action to take?

* Learn a discriminative classifier $p(y | \x)$ where 
   * $\x$ is a representation of buffer, stack and parse. 
   * $y$ is the action to choose
* Current state-of-the-art systems use neural networks as classifiers (e.g. Parsey McParseFace)
* Use **greedy search** (take the most probable action at each step) or **beam search** (keep the $k$ best partial action sequences) to find a high-scoring sequence of actions
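
The loop below sketches greedy decoding. It assumes a state is a `(stack, buffer, arcs)` triple and that the classifier is any function returning one score per action; `toy_classifier` is a hypothetical placeholder with fixed scores, not a trained model, and every arc gets the placeholder label `dep`. Illegal actions are filtered out using the preconditions from the *Actions* section before taking the most probable one.

```python
import math

ACTIONS = ["shift", "reduce", "left_arc", "right_arc"]


def is_legal(action, stack, buffer, arcs):
    """Preconditions of the four actions; arcs are (head, child, label) triples."""
    headed = {child for _, child, _ in arcs}
    if action == "reduce":
        return stack[-1] in headed
    if action == "left_arc":
        return bool(buffer) and stack[-1] != 0 and stack[-1] not in headed
    return bool(buffer)  # shift and right_arc only need a non-empty buffer


def apply_action(action, stack, buffer, arcs, label="dep"):
    """Return the next (stack, buffer, arcs) state."""
    if action == "shift":
        return stack + [buffer[0]], buffer[1:], arcs
    if action == "reduce":
        return stack[:-1], buffer, arcs
    i, j = stack[-1], buffer[0]
    if action == "left_arc":
        return stack[:-1], buffer, arcs | {(j, i, label)}
    return stack + [j], buffer[1:], arcs | {(i, j, label)}  # right_arc


def softmax(scores):
    """Turn action scores into probabilities p(y | x)."""
    top = max(scores.values())
    exps = {a: math.exp(s - top) for a, s in scores.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def greedy_parse(n_words, score_actions):
    """Greedy search: always take the most probable legal action."""
    stack, buffer, arcs = [0], list(range(1, n_words + 1)), set()
    while buffer:
        probs = softmax(score_actions(stack, buffer, arcs))
        legal = [a for a in ACTIONS if is_legal(a, stack, buffer, arcs)]
        best = max(legal, key=lambda a: probs[a])
        stack, buffer, arcs = apply_action(best, stack, buffer, arcs)
    return arcs


def toy_classifier(stack, buffer, arcs):
    """Hypothetical stand-in for a trained model: fixed scores that ignore the state."""
    return {"shift": 1.0, "reduce": 0.0, "left_arc": 0.5, "right_arc": 2.0}


print(sorted(greedy_parse(3, toy_classifier)))
# [(0, 1, 'dep'), (1, 2, 'dep'), (2, 3, 'dep')]
```

Because softmax is monotonic, picking the highest probability is the same as picking the highest score; it is computed here only to mirror $p(y|\x)$. A real classifier would compute its scores from features of the stack, buffer and arcs, and beam search would keep the $k$ best partial sequences instead of a single one.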

## Summary

* Dependency parsing predicts word-to-word dependencies
* Compared to phrase-structure (constituency) parsing, it offers
    * simpler annotations
    * faster parsing
* Its output is sufficient for most downstream applications

## Background Material

* [EACL 2014 tutorial](http://stp.lingfil.uu.se/~nivre/eacl14.html)
* Jurafsky & Martin, [Speech and Language Processing (Third Edition)](https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf): Chapter 13, Dependency Parsing.