In [3]:
%%capture
%load_ext autoreload
%autoreload 2
import sys
sys.path.append("..")
from statnlpbook.util import execute_notebook
import statnlpbook.parsing as parsing
from statnlpbook.transition import *
from statnlpbook.dep import *

execute_notebook('transition-based_dependency_parsing.ipynb')

<!---
Latex Macros
-->
$$
\newcommand{\Xs}{\mathcal{X}}
\newcommand{\Ys}{\mathcal{Y}}
\newcommand{\y}{\mathbf{y}}
\newcommand{\balpha}{\boldsymbol{\alpha}}
\newcommand{\bbeta}{\boldsymbol{\beta}}
\newcommand{\aligns}{\mathbf{a}}
\newcommand{\align}{a}
\newcommand{\source}{\mathbf{s}}
\newcommand{\target}{\mathbf{t}}
\newcommand{\ssource}{s}
\newcommand{\starget}{t}
\newcommand{\repr}{\mathbf{f}}
\newcommand{\repry}{\mathbf{g}}
\newcommand{\x}{\mathbf{x}}
\newcommand{\prob}{p}
\newcommand{\a}{\alpha}
\newcommand{\b}{\beta}
\newcommand{\vocab}{V}
\newcommand{\params}{\boldsymbol{\theta}}
\newcommand{\param}{\theta}
\DeclareMathOperator{\perplexity}{PP}
\DeclareMathOperator{\argmax}{argmax}
\DeclareMathOperator{\argmin}{argmin}
\newcommand{\train}{\mathcal{D}}
\newcommand{\counts}[2]{\#_{#1}(#2) }
\newcommand{\length}[1]{\text{length}(#1) }
\newcommand{\indi}{\mathbb{I}}
$$

# Parsing

In [15]:
%%HTML
<style>
.rendered_html td {
    font-size: x-large;
    text-align: left !important;
}
.rendered_html th {
    font-size: x-large;
    text-align: left !important;
}
</style>

##  Motivation 

Say you want to automatically build a database of this form:

| Brand   | Parent    |
|---------|-----------|
| KitKat  | Nestle    |
| Lipton  | Unilever  |  
| ...     | ...       |  

or this graph:
![graph](https://geekologie.com/2012/04/25/parent-companies-large.jpg)

Say you find positive textual mentions in this form:

> <font color="blue">Dechra Pharmaceuticals</font> has made its second acquisition after purchasing <font color="green">Genitrix</font>.


> <font color="blue">Trinity Mirror plc</font> is the largest British newspaper after purchasing rival <font color="green">Local World</font>.

Can you find a pattern? 

How about this sentence?

> <font color="blue">Kraft</font> is gearing up for a roll-out of its <font color="blue">Milka</font> brand after purchasing  <font color="green">Cadbury Dairy Milk</font>.


Wouldn't it be great if we knew that

* Kraft is the **subject** of the phrase **purchasing Cadbury Dairy Milk** 

Check out [UDPipe](http://lindat.mff.cuni.cz/services/udpipe/run.php?model=english-ewt-ud-2.4-190531) and the [Stanford CoreNLP Parser](https://corenlp.run/).

Parsing is the process of **finding these graphs**:

* very important for downstream applications
* researched in academia and [industry](http://www.telegraph.co.uk/technology/2016/05/17/has-googles-parsey-mcparseface-just-solved-one-of-the-worlds-big/)

How is this done?

## Dependency Parsing

* **Lexical Elements**: words
* **Syntactic Relations**: subject, direct object, nominal modifier, etc. 

Task: determine the syntactic relations between words

### Grammatical Relations
> <font color="blue">Kraft</font> is gearing up for a roll-out of its <font color="blue">Milka</font> brand after purchasing  <font color="green">Cadbury</font>.

* *Subject* of purchasing: **Kraft**
* *Object* of purchasing: **Cadbury**

### Subcategorisation of Relations

There are more complex (sub)categories of verbs (and of other types of words):

* Intransitive Verbs: must not have objects
    * the student works
* Transitive Verbs: must have exactly one object
    * Kraft purchased Cadbury
* Ditransitive Verbs: must have two objects
    * Give me a break! 

## Anatomy of a Dependency Tree

* Nodes:
    * Tokens of sentence
    * a ROOT node
* Edges:
    * Directed from token child to **syntactic head**
    * Each **non-ROOT** token has **exactly one parent**
        * the word that controls its syntactic function, or
        * the word "it depends on"
* ROOT **has no parent**
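
The constraints above can be checked mechanically. The following is a minimal sketch, assuming a tree is stored as a list `heads` in which `heads[i]` is the index of the parent of token `i` and index 0 is ROOT. The function name `is_valid_tree` and this representation are illustrative choices, not part of `statnlpbook`. Storing one slot per token already enforces "exactly one parent"; the function additionally checks that ROOT has no parent and that every token reaches ROOT, so there are no cycles.

```python
def is_valid_tree(heads):
    """Check the dependency-tree constraints for a list of head indices.

    heads[i] is the parent of token i; heads[0] must be None (ROOT).
    """
    n = len(heads)
    if n == 0 or heads[0] is not None:
        return False  # ROOT must exist and must not have a parent
    for i in range(1, n):
        h = heads[i]
        if h is None or not (0 <= h < n) or h == i:
            return False  # every non-ROOT token needs one valid parent
    # Following parents from any token must end at ROOT (no cycles).
    for i in range(1, n):
        seen = set()
        node = i
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node]
    return True


# "Alice saw Bob": Alice -> saw, saw -> ROOT, Bob -> saw
print(is_valid_tree([None, 2, 0, 2]))
# Two tokens that only point at each other never reach ROOT
print(is_valid_tree([None, 2, 1]))
```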

### Example

In [1]:
conllu = """
1	Alice	_	_	_	_	2	nsubj	_	_
2	saw	_	_	_	_	0	root	_	_
3	Bob	_	_	_	_	2	dobj	_	_
"""
arcs, tokens = to_displacy_graph(*load_arcs_tokens(conllu))
render_displacy(arcs, tokens,"900px")

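
The same three-token example can also be read without the course helpers. This is a rough sketch in plain Python, assuming the standard ten-column, tab-separated CoNLL-U layout in which column 7 holds the head index and column 8 the relation. `read_conllu` is a hypothetical name and is not the `load_arcs_tokens` function used above.

```python
def read_conllu(text):
    """Return (tokens, arcs) where arcs are (head, dependent, label) triples.

    Index 0 is reserved for an artificial ROOT token.
    """
    tokens, arcs = ["ROOT"], []
    for line in text.strip().splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        token_id, form, head, label = cols[0], cols[1], cols[6], cols[7]
        if not token_id.isdigit():
            continue  # skip multiword ranges such as 1-2 and empty nodes such as 1.1
        tokens.append(form)
        arcs.append((int(head), int(token_id), label))
    return tokens, arcs


conllu = (
    "1\tAlice\t_\t_\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tsaw\t_\t_\t_\t_\t0\troot\t_\t_\n"
    "3\tBob\t_\t_\t_\t_\t2\tdobj\t_\t_\n"
)
tokens, arcs = read_conllu(conllu)
for head, dep, label in arcs:
    print(f"{label}({tokens[head]}, {tokens[dep]})")
```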

### Exercise

If every token has exactly one parent, how does one represent a multi-word expression? Discuss with your neighbour and check your ideas with [UDPipe](http://lindat.mff.cuni.cz/services/udpipe/run.php?model=english-ewt-ud-2.4-190531) or the [Stanford CoreNLP Parser](http://nlp.stanford.edu:8080/corenlp/).
Enter your results here:

  <tt>[http://bit.ly/dep-mwe](http://bit.ly/dep-mwe)</tt>

Hint: you might want to check out the UD annotation guidelines: <tt>[https://universaldependencies.org/guidelines.html](https://universaldependencies.org/guidelines.html)</tt>.

### Universal Syntax

English and Danish are similar, but some languages are less similar to English:
![dendrogram](https://d3i71xaburhd42.cloudfront.net/85833272b1572f683a64055f6fff858f5546868e/7-Figure9-1.png)

### Universal Dependencies 

* Annotation framework featuring [37 syntactic relations](http://universaldependencies.org/)
* [Treebanks](http://universaldependencies.org/) (i.e. datasets annotated with syntactic relations) in over 80 languages
* Large project with over 200 contributors

### UD Dependency Relations

<table border="1">
  <tr style="background-color:cornflowerblue">
      <td> </td>
      <td> Nominals </td>
      <td> Clauses </td>
      <td> Modifier words </td>
      <td> Function Words </td>
  </tr>
  <tr>
      <td style="background-color:darkseagreen">
	Core arguments
      </td>
      <td>
	    <a href="https://universaldependencies.org/u/dep/nsubj.html" title="u-dep nsubj">nsubj</a><br>
	    <a href="https://universaldependencies.org/u/dep/obj.html" title="u-dep obj">obj</a><br>
	    <a href="https://universaldependencies.org/u/dep/iobj.html" title="u-dep iobj">iobj</a>
      </td>
      <td>
	    <a href="https://universaldependencies.org/u/dep/csubj.html" title="u-dep csubj">csubj</a><br>
	    <a href="https://universaldependencies.org/u/dep/ccomp.html" title="u-dep ccomp">ccomp</a><br>
	    <a href="https://universaldependencies.org/u/dep/xcomp.html" title="u-dep xcomp">xcomp</a>
      </td>
	  <td></td><td></td>
  </tr>
  <tr>
      <td style="background-color:darkseagreen">
	Non-core dependents
      </td>
      <td>
	    <a href="https://universaldependencies.org/u/dep/obl.html" title="u-dep obl">obl</a><br>
	    <a href="https://universaldependencies.org/u/dep/vocative.html" title="u-dep vocative">vocative</a><br>
	    <a href="https://universaldependencies.org/u/dep/expl.html" title="u-dep expl">expl</a><br>
	    <a href="https://universaldependencies.org/u/dep/dislocated.html" title="u-dep dislocated">dislocated</a>
      </td>
      <td>
	    <a href="https://universaldependencies.org/u/dep/advcl.html" title="u-dep advcl">advcl</a>
      </td>
      <td>
	    <a href="https://universaldependencies.org/u/dep/advmod.html" title="u-dep advmod">advmod</a><br>
	    <a href="https://universaldependencies.org/u/dep/discourse.html" title="u-dep discourse">discourse</a>
      </td>
      <td>
	    <a href="https://universaldependencies.org/u/dep/aux_.html" title="u-dep aux">aux</a><br>
	    <a href="https://universaldependencies.org/u/dep/cop.html" title="u-dep cop">cop</a><br>
	    <a href="https://universaldependencies.org/u/dep/mark.html" title="u-dep mark">mark</a>
      </td>
  </tr>
  <tr>
      <td style="background-color:darkseagreen">
	Nominal dependents
      </td>
      <td>
	    <a href="https://universaldependencies.org/u/dep/nmod.html" title="u-dep nmod">nmod</a><br>
	    <a href="https://universaldependencies.org/u/dep/appos.html" title="u-dep appos">appos</a><br>
	    <a href="https://universaldependencies.org/u/dep/nummod.html" title="u-dep nummod">nummod</a>
      </td>
      <td>
	    <a href="https://universaldependencies.org/u/dep/acl.html" title="u-dep acl">acl</a>
      </td>
      <td>
	    <a href="https://universaldependencies.org/u/dep/amod.html" title="u-dep amod">amod</a>
      </td>
      <td>
	    <a href="https://universaldependencies.org/u/dep/det.html" title="u-dep det">det</a><br>
	    <a href="https://universaldependencies.org/u/dep/clf.html" title="u-dep clf">clf</a><br>
	    <a href="https://universaldependencies.org/u/dep/case.html" title="u-dep case">case</a>
      </td>
  </tr>
  <tr style="background-color:cornflowerblue">	
      <td> Coordination </td>
      <td> MWE </td>
      <td> Loose </td>
      <td> Special </td>
      <td> Other </td>
  </tr>
  <tr>
      <td>
	    <a href="https://universaldependencies.org/u/dep/conj.html" title="u-dep conj">conj</a><br>
	    <a href="https://universaldependencies.org/u/dep/cc.html" title="u-dep cc">cc</a>
      </td>
      <td>
	  <a href="https://universaldependencies.org/u/dep/fixed.html" title="u-dep fixed">fixed</a><br>
	  <a href="https://universaldependencies.org/u/dep/flat.html" title="u-dep flat">flat</a><br>
	  <a href="https://universaldependencies.org/u/dep/compound.html" title="u-dep compound">compound</a>
    </td>
    <td>
	  <a href="https://universaldependencies.org/u/dep/list.html" title="u-dep list">list</a><br>
	  <a href="https://universaldependencies.org/u/dep/parataxis.html" title="u-dep parataxis">parataxis</a>
    </td>
    <td>
	  <a href="https://universaldependencies.org/u/dep/orphan.html" title="u-dep orphan">orphan</a><br>
	  <a href="https://universaldependencies.org/u/dep/goeswith.html" title="u-dep goeswith">goeswith</a><br>
	  <a href="https://universaldependencies.org/u/dep/reparandum.html" title="u-dep reparandum">reparandum</a>
    </td>
    <td>
	  <a href="https://universaldependencies.org/u/dep/punct.html" title="u-dep punct">punct</a><br>
	  <a href="https://universaldependencies.org/u/dep/root.html" title="u-dep root">root</a><br>
	  <a href="https://universaldependencies.org/u/dep/dep.html" title="u-dep dep">dep</a>
    </td>
  </tr>
</table>

## Transition-Based Parsing

* Learn to perform the right action / transition in a bottom-up left-right parser
* Train classifiers $p(y|\x)$ where $y$ is an action and $\x$ represents the solution built so far together with the remaining sentence
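* Shown here: the arc-eager system, with shift, reduce, leftArc and rightArc actions (arc-standard and arc-eager are both described in [Nivre, 2004](https://www.aclweb.org/anthology/W04-0308))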

## Configuration (Parser State)

A configuration consists of a buffer, a stack, and the set of arcs created so far.

### Buffer

of **remaining tokens**

In [5]:
render_transitions_displacy(transitions[0:1], tokenized_sentence)

| buffer | stack | parse: label(head, dependent) | action |
|--------|-------|-------------------------------|--------|
| ROOT Economic news had little effect on financial markets . |  | (no arcs) | INIT |


### Stack
of earlier tokens to **attach to later**

In [6]:
render_transitions_displacy(transitions[2:3],tokenized_sentence)

| buffer | stack | parse: label(head, dependent) | action |
|--------|-------|-------------------------------|--------|
| news had little effect on financial markets . | ROOT Economic | (no arcs) | shift |


### Parse (set of arcs)
built so far

In [7]:
render_transitions_displacy(transitions[9:10], tokenized_sentence)

| buffer | stack | parse: label(head, dependent) | action |
|--------|-------|-------------------------------|--------|
| on financial markets . | ROOT had effect | amod(news, Economic), nsubj(had, news), root(ROOT, had), amod(effect, little), dobj(had, effect) | rightArc-dobj |


### Actions

We use the following four actions: shift, reduce, rightArc-[label] and leftArc-[label].

### Shift

push the word at the top of the buffer to the stack 

$$
(S, i|B, A)\rightarrow(S|i, B, A)
$$

In [8]:
render_transitions_displacy(transitions[0:2], tokenized_sentence)

| buffer | stack | parse: label(head, dependent) | action |
|--------|-------|-------------------------------|--------|
| ROOT Economic news had little effect on financial markets . |  | (no arcs) | INIT |
| Economic news had little effect on financial markets . | ROOT | (no arcs) | shift |


### Reduce

pop the word at the top of the stack if it has a head 

$$
(S|i, B, A)\rightarrow(S, B, A)
$$

In [9]:
render_transitions_displacy(transitions[13:15], tokenized_sentence)

| buffer | stack | parse: label(head, dependent) | action |
|--------|-------|-------------------------------|--------|
| . | ROOT had effect on markets | amod(news, Economic), nsubj(had, news), root(ROOT, had), amod(effect, little), dobj(had, effect), prep(effect, on), amod(markets, financial), pmod(on, markets) | rightArc-pmod |
| . | ROOT had effect on | amod(news, Economic), nsubj(had, news), root(ROOT, had), amod(effect, little), dobj(had, effect), prep(effect, on), amod(markets, financial), pmod(on, markets) | reduce |


### rightArc-[label]

Add labeled arc from top of stack \\(i\\) to top of the buffer \\(j\\). Shift the token on top of the buffer to the stack.

$$
(S|i, j|B, A) \rightarrow (S|i|j, B, A\cup\{(i,j,l)\})
$$


In [10]:
render_transitions_displacy(transitions[5:7], tokenized_sentence)

| buffer | stack | parse: label(head, dependent) | action |
|--------|-------|-------------------------------|--------|
| had little effect on financial markets . | ROOT | amod(news, Economic), nsubj(had, news) | leftArc-nsubj |
| little effect on financial markets . | ROOT had | amod(news, Economic), nsubj(had, news), root(ROOT, had) | rightArc-root |


### leftArc-[label] 

Add labeled arc from top of buffer, \\(j\\), to top of stack, \\(i\\), if \\(i\\) has no head. Pop \\(i\\) from the stack.

$$
(S|i, j|B, A) \rightarrow (S, j|B, A\cup\{(j,i,l)\})
$$


In [11]:
render_transitions_displacy(transitions[2:4], tokenized_sentence)

| buffer | stack | parse: label(head, dependent) | action |
|--------|-------|-------------------------------|--------|
| news had little effect on financial markets . | ROOT Economic | (no arcs) | shift |
| news had little effect on financial markets . | ROOT | amod(news, Economic) | leftArc-amod |


### Summary: Configuration

**Configuration**:
- Stack \\(S\\): a last-in, first-out memory to keep track of words to process later
- Buffer \\(B\\): words not processed so far
- Arcs \\(A\\): the dependency edges predicted so far

We further define two special configurations:
- initial: buffer is initialised to the words in the sentence, stack and arcs are empty
- terminal: buffer is empty
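
A configuration maps naturally onto a small data class. The sketch below is one possible encoding, not the `statnlpbook` implementation; the class name `Configuration` and its members are illustrative. It follows the INIT rows of the examples, where an artificial `ROOT` token sits at the front of the buffer.

```python
from dataclasses import dataclass, field


@dataclass
class Configuration:
    stack: list = field(default_factory=list)   # S: words to process later (LIFO)
    buffer: list = field(default_factory=list)  # B: words not processed so far
    arcs: set = field(default_factory=set)      # A: (head, dependent, label) triples

    @classmethod
    def initial(cls, words):
        """Initial configuration: ROOT plus all words in the buffer, rest empty."""
        return cls(stack=[], buffer=["ROOT"] + list(words), arcs=set())

    @property
    def is_terminal(self):
        """Terminal configuration: the buffer is empty."""
        return not self.buffer


config = Configuration.initial("Economic news had little effect on financial markets .".split())
print(config.buffer, config.stack, config.arcs, config.is_terminal)
```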

### Summary: Actions

- shift: push the word at the top of the buffer to the stack \\((S, i|B, A)\rightarrow(S|i, B, A)\\)
- reduce: pop the word at the top of the stack if it has a head \\((S|i, B, A)\rightarrow(S, B, A)\\)
- rightArc-label: create a labeled arc from the token at the top of the stack \\(i\\) to the token at the top of the buffer \\(j\\) \\((S|i, j|B, A) \rightarrow (S|i|j, B, A\cup\{(i,j,l)\})\\). Shift the token on top of the buffer to the stack.
- leftArc-label: create a labeled arc from the token at the top of the buffer \\(j\\) to the token at the top of the stack \\(i\\) if \\(i\\) has no head \\((S|i, j|B, A) \rightarrow (S, j|B, A\cup\{(j,i,l)\})\\). Reduce the token on top of the stack.
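
The four actions can be written as pure functions on a triple `(stack, buffer, arcs)` that mirrors \\((S, B, A)\\). This is a sketch under simplifying assumptions: words double as token identifiers (fine here because no word repeats), arcs are `(head, dependent, label)` triples, and all function names are illustrative. The replay at the end uses the first nine actions of the running example and reproduces the configuration shown after rightArc-dobj.

```python
def has_head(arcs, token):
    """True if some arc already makes `token` a dependent."""
    return any(dep == token for (_head, dep, _label) in arcs)


def shift(stack, buffer, arcs):
    if not buffer:
        raise ValueError("shift needs a non-empty buffer")
    return stack + [buffer[0]], buffer[1:], arcs


def reduce_(stack, buffer, arcs):
    if not stack or not has_head(arcs, stack[-1]):
        raise ValueError("reduce needs a stack top that already has a head")
    return stack[:-1], buffer, arcs


def right_arc(stack, buffer, arcs, label):
    if not stack or not buffer:
        raise ValueError("rightArc needs a stack top and a buffer front")
    i, j = stack[-1], buffer[0]
    return stack + [j], buffer[1:], arcs | {(i, j, label)}


def left_arc(stack, buffer, arcs, label):
    if not stack or not buffer or has_head(arcs, stack[-1]):
        raise ValueError("leftArc needs a head-less stack top and a buffer front")
    i, j = stack[-1], buffer[0]
    return stack[:-1], buffer, arcs | {(j, i, label)}


def apply_action(config, action):
    """Apply an action named like 'shift' or 'leftArc-amod' to a configuration."""
    name, _, label = action.partition("-")
    if name == "shift":
        return shift(*config)
    if name == "reduce":
        return reduce_(*config)
    if name == "rightArc":
        return right_arc(*config, label)
    if name == "leftArc":
        return left_arc(*config, label)
    raise ValueError(f"unknown action: {action}")


words = "ROOT Economic news had little effect on financial markets .".split()
config = ([], words, frozenset())
for action in ["shift", "shift", "leftArc-amod", "shift", "leftArc-nsubj",
               "rightArc-root", "shift", "leftArc-amod", "rightArc-dobj"]:
    config = apply_action(config, action)
stack, buffer, arcs = config
print(stack, buffer, sorted(arcs))
```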

## Full Example

In [12]:
render_transitions_displacy(transitions[:], tokenized_sentence)

| buffer | stack | parse: label(head, dependent) | action |
|--------|-------|-------------------------------|--------|
| ROOT Economic news had little effect on financial markets . |  | (no arcs) | INIT |
| Economic news had little effect on financial markets . | ROOT | (no arcs) | shift |
| news had little effect on financial markets . | ROOT Economic | (no arcs) | shift |
| news had little effect on financial markets . | ROOT | amod(news, Economic) | leftArc-amod |
| had little effect on financial markets . | ROOT news | amod(news, Economic) | shift |
| had little effect on financial markets . | ROOT | amod(news, Economic), nsubj(had, news) | leftArc-nsubj |
| little effect on financial markets . | ROOT had | amod(news, Economic), nsubj(had, news), root(ROOT, had) | rightArc-root |
| effect on financial markets . | ROOT had little | amod(news, Economic), nsubj(had, news), root(ROOT, had) | shift |
| effect on financial markets . | ROOT had | amod(news, Economic), nsubj(had, news), root(ROOT, had), amod(effect, little) | leftArc-amod |


## Machine Learning

How do we learn to parse? 

* Decompose parse tree into a sequence of actions
* Learn to score individual actions as well as whole tree
    * structured prediction problem!

How to decide what action to take? 

* Learn a discriminative classifier $p(y | \x)$ where 
   * $\x$ is a representation of buffer, stack and parse. 
   * $y$ is the action to choose
* Current state-of-the-art systems use neural networks as classifiers (e.g. Parsey McParseFace)
* Use **greedy search** or **beam search** to find the highest scoring sequence of steps
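
Greedy search can be sketched as a loop that, in each configuration, applies the highest-scoring legal action. Everything below is an illustrative assumption: actions are unlabeled to keep the example short, arcs are `(head, dependent)` pairs, and `toy_score` is a hand-set lookup table standing in for a trained classifier $p(y | \x)$, not a learned model.

```python
# Hand-set (head, dependent) affinities; a stand-in for a trained classifier.
AFFINITY = {("saw", "Alice"): 1.0, ("ROOT", "saw"): 1.0, ("saw", "Bob"): 1.0}


def has_head(arcs, token):
    return any(dep == token for (_head, dep) in arcs)


def legal_actions(stack, buffer, arcs):
    """Actions whose preconditions hold in the current configuration."""
    actions = []
    if buffer:
        actions.append("shift")
        if stack:
            actions.append("rightArc")
            if stack[-1] != "ROOT" and not has_head(arcs, stack[-1]):
                actions.append("leftArc")
    if stack and has_head(arcs, stack[-1]):
        actions.append("reduce")
    return actions


def apply_action(stack, buffer, arcs, action):
    if action == "shift":
        return stack + [buffer[0]], buffer[1:], arcs
    if action == "reduce":
        return stack[:-1], buffer, arcs
    if action == "rightArc":
        return stack + [buffer[0]], buffer[1:], arcs | {(stack[-1], buffer[0])}
    if action == "leftArc":
        return stack[:-1], buffer, arcs | {(buffer[0], stack[-1])}
    raise ValueError(action)


def toy_score(stack, buffer, arcs, action):
    if action == "leftArc":
        return AFFINITY.get((buffer[0], stack[-1]), 0.0)
    if action == "rightArc":
        return AFFINITY.get((stack[-1], buffer[0]), 0.0)
    return 0.5 if action == "shift" else 0.1


def greedy_parse(words, score_fn):
    """Repeatedly take the best-scoring legal action until the buffer is empty."""
    stack, buffer, arcs = [], ["ROOT"] + list(words), frozenset()
    history = []
    while buffer:
        action = max(legal_actions(stack, buffer, arcs),
                     key=lambda a: score_fn(stack, buffer, arcs, a))
        stack, buffer, arcs = apply_action(stack, buffer, arcs, action)
        history.append(action)
    return history, arcs


history, arcs = greedy_parse(["Alice", "saw", "Bob"], toy_score)
print(history)
print(sorted(arcs))
```

Beam search would keep the k best partial action sequences instead of a single one; the loop structure stays the same.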

### Oracle

How do we get training data for the classifier?

* Training data: whole trees labelled as correct/incorrect
* We need to design an **oracle**
    * function that, given a sentence and its dependency tree, recovers the sequence of actions used to construct it
    * can also be thought of as reverse engineering a tree into a sequence of actions
* An oracle does this for every possible parse tree
* Oracle can also be thought of as a human demonstrator teaching the parser
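
A static oracle for the shift, reduce, leftArc and rightArc actions can be sketched as follows. It assumes a projective gold tree given as a mapping from each dependent index to its `(head index, label)`, with index 0 for ROOT; the function name `oracle` and this encoding are illustrative. The demonstration uses the running example without its final ".", since the attachment of the full stop is not shown in the transitions above.

```python
def oracle(num_tokens, gold):
    """Recover an action sequence that builds the gold tree.

    gold maps dependent index -> (head index, label); index 0 is ROOT.
    """
    def head(i):
        return gold[i][0] if i in gold else None

    stack, buffer = [], list(range(num_tokens))
    attached = set()  # tokens that have already received their head
    actions = []
    while buffer:
        b = buffer[0]
        s = stack[-1] if stack else None
        if s is not None and head(s) == b:
            # Buffer front is the head of the stack top.
            actions.append("leftArc-" + gold[s][1])
            attached.add(s)
            stack.pop()
        elif s is not None and head(b) == s:
            # Stack top is the head of the buffer front.
            actions.append("rightArc-" + gold[b][1])
            attached.add(b)
            stack.append(buffer.pop(0))
        elif s in attached and any(head(b) == k or head(k) == b for k in stack[:-1]):
            # The buffer front still needs a word deeper in the stack.
            actions.append("reduce")
            stack.pop()
        else:
            actions.append("shift")
            stack.append(buffer.pop(0))
    return actions


# ROOT Economic news had little effect on financial markets
gold = {1: (2, "amod"), 2: (3, "nsubj"), 3: (0, "root"), 4: (5, "amod"),
        5: (3, "dobj"), 6: (5, "prep"), 7: (8, "amod"), 8: (6, "pmod")}
actions = oracle(9, gold)
print(actions)
```

The resulting sequence agrees with the actions from the first shift through rightArc-pmod in the running example.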

## Summary

* Dependency parsing predicts word-to-word dependencies 
* simple annotations
* fast parsing
* sufficient for most down-stream applications

## Background Material

* Arc-standard and arc-eager transition-based parsing systems ([Nivre, 2004](https://www.aclweb.org/anthology/W04-0308))
* [EACL 2014 tutorial](http://stp.lingfil.uu.se/~nivre/eacl14.html)
* Jurafsky & Martin, [Speech and Language Processing (Third Edition)](https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf): Chapter 13, Dependency Parsing.