In [4]:
%%capture
%load_ext autoreload
%autoreload 2
import sys
sys.path.append("..")
import statnlpbook.util as util
import statnlpbook.ie as ie
from statnlpbook.ie import *

util.execute_notebook('relation_extraction.ipynb')

<!---
Latex Macros
-->
$$
\newcommand{\Xs}{\mathcal{X}}
\newcommand{\Ys}{\mathcal{Y}}
\newcommand{\y}{\mathbf{y}}
\newcommand{\balpha}{\boldsymbol{\alpha}}
\newcommand{\bbeta}{\boldsymbol{\beta}}
\newcommand{\aligns}{\mathbf{a}}
\newcommand{\align}{a}
\newcommand{\source}{\mathbf{s}}
\newcommand{\target}{\mathbf{t}}
\newcommand{\ssource}{s}
\newcommand{\starget}{t}
\newcommand{\repr}{\mathbf{f}}
\newcommand{\repry}{\mathbf{g}}
\newcommand{\x}{\mathbf{x}}
\newcommand{\prob}{p}
\newcommand{\a}{\alpha}
\newcommand{\b}{\beta}
\newcommand{\vocab}{V}
\newcommand{\params}{\boldsymbol{\theta}}
\newcommand{\param}{\theta}
\DeclareMathOperator{\perplexity}{PP}
\DeclareMathOperator{\argmax}{argmax}
\DeclareMathOperator{\argmin}{argmin}
\newcommand{\train}{\mathcal{D}}
\newcommand{\counts}[2]{\#_{#1}(#2) }
\newcommand{\length}[1]{\text{length}(#1) }
\newcommand{\indi}{\mathbb{I}}
$$

# Relation Extraction

##  Motivation 

* The amount of available information is growing exponentially
* Text contains a lot of information
* Only some of this information is relevant for any given use case
* How can we automatically make sense of information?

**Information Extraction** addresses this

[Alchemy information extraction demo](https://alchemy-language-demo.mybluemix.net/)

[ReVerb demo](http://openie.allenai.org/)

## Subtasks of Information Extraction

* Document Classification:
    * Assign a label to each document, often representing the topic
* Named Entity Recognition:
    * Recognise boundaries of entities in text, e.g. "New York", "New York Times"
* Named Entity Classification:
    * Assign a type to each entity (e.g. "New York" -> location, "New York Times" -> media)
* Relation Extraction:
    * Recognise relations between entities, e.g. "S. Riedel reader-at UCL"
* Temporal Information Extraction:
    * Recognise and/or normalise temporal expressions relative to a reference date, e.g. "tomorrow morning at 8" -> "2016-11-26 08:00:00"
* Event Extraction:
    * Recognise events, typically consisting of entities and relations between them at a point in time and place, e.g. an election
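To make the output of relation extraction concrete, here is a minimal sketch that turns a sentence into (subject, relation, object) triples using one hand-written lexical pattern. The pattern, the example sentence and the name `extract_relations` are hypothetical illustrations, not the method developed in this chapter; practical systems learn patterns or classifiers from data rather than writing them by hand.

```python
import re

# Hypothetical hand-written pattern: "<Person> is a <role> at <Organisation>".
# Entity spans are approximated by runs of capitalised tokens.
PATTERN = re.compile(
    r"(?P<subj>[A-Z][\w.]*(?: [A-Z][\w.]*)*) is a (?P<role>\w+) at (?P<obj>[A-Z]\w*)"
)


def extract_relations(text):
    """Return (subject, relation, object) triples matched by the toy pattern."""
    return [
        (m.group("subj"), m.group("role") + "-at", m.group("obj"))
        for m in PATTERN.finditer(text)
    ]


print(extract_relations("S. Riedel is a reader at UCL."))
# [('S. Riedel', 'reader-at', 'UCL')]
```

Such a pattern only fires on one surface form; paraphrases like "UCL employs S. Riedel as a reader" are missed, which is one motivation for learning extractors instead.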

... (to do, below is all copied from a different slide deck)

### Example

In [5]:
tokens = ["ROOT", "Economic", "news", "had", "little", "effect", "on", "financial", "markets", "."]
arcs = set([(0,3, "root"), (0,9,"p"), (2,1,"amod"),(3,2,"nsubj"), (3, 5, "dobj"), (5,4,"amod"), (5,6, "prep"), (6,8,"pmod"), (8,7,"amod")])

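import statnlpbook.transition as transition  # assumed home of the dependency-parsing display helpers
render_displacy(*transition.to_displacy_graph(arcs, tokens), "1000px")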

## Dependency Parsing Approaches

### Graph-Based Parsing
* define a score $s_\params(\x,\y)$ for input sentences $\x \in \Xs$ and dependency graphs $\y \in \Ys$
* parsing: $\argmax_\y s_\params(\x,\y)$
* frame as **finding maximum spanning trees** or other graph problems
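For a very short sentence the graph-based objective $\argmax_\y s_\params(\x,\y)$ can be made concrete by brute force. The sketch below assumes an *arc-factored* score, i.e. $s_\params(\x,\y)$ is the sum of independent scores of its (head, dependent) arcs, and enumerates every head assignment that forms a tree rooted at token 0. The toy scores and the names `arc_scores` and `brute_force_parse` are hypothetical. Enumeration is exponential in sentence length; maximum spanning tree algorithms such as Chu-Liu/Edmonds solve the same arc-factored problem in polynomial time.

```python
from itertools import product


def is_tree(heads):
    """heads[d] is the head of token d; token 0 is ROOT and heads[0] is unused.
    The assignment is a tree if every token reaches ROOT without a cycle."""
    for d in range(1, len(heads)):
        seen, node = set(), d
        while node != 0:
            if node in seen:
                return False  # cycle (including a token heading itself)
            seen.add(node)
            node = heads[node]
    return True


def brute_force_parse(arc_score, n):
    """Return the best head assignment for tokens 1..n-1 under an arc-factored score."""
    best, best_score = None, float("-inf")
    for choice in product(range(n), repeat=n - 1):
        heads = [None] + list(choice)
        if not is_tree(heads):
            continue
        score = sum(arc_score(heads[d], d) for d in range(1, n))
        if score > best_score:
            best, best_score = heads, score
    return best, best_score


# Hypothetical arc scores for "ROOT news had effect"; all other arcs score -1.
arc_scores = {(0, 2): 5.0, (2, 1): 4.0, (2, 3): 4.0}
heads, score = brute_force_parse(lambda h, d: arc_scores.get((h, d), -1.0), 4)
print(heads, score)  # [None, 2, 0, 2] 13.0
```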

### Transition-Based Parsing
* learn to perform the right action / transition in a bottom-up, left-to-right parser
* train classifiers $s_\params(\x,y)$ where $y$ is an action and $\x$ is the solution built so far together with the remaining sentence


Currently the state-of-the-art...

## Parsing State 
Akin to bottom-up parsing for CFGs...

### Buffer
A token buffer of **remaining tokens**

In [6]:
render_transitions_displacy(transitions[0:1], tokenized_sentence)

### Stack
A token stack of earlier tokens to **attach to later**

In [None]:
render_transitions_displacy(transitions[2:3],tokenized_sentence)

### Parse
The current parse built so far

In [None]:
render_transitions_displacy(transitions[9:10], tokenized_sentence)
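The three parts of a parsing state can be held in a small data structure. This is a minimal sketch assuming tokens are referred to by integer position with 0 as ROOT (as in the `arcs` set of the example sentence), arcs stored as (head, dependent, label) triples, and the usual arc-eager initial state: ROOT on the stack and every other token in the buffer. The names `Configuration`, `initial_configuration` and `is_terminal` are hypothetical, not part of the `statnlpbook` code.

```python
from dataclasses import dataclass, field


@dataclass
class Configuration:
    """Parser state (S, B, A). Tokens are integer positions; 0 is ROOT."""
    stack: list   # S: earlier tokens to attach later; the top is stack[-1]
    buffer: list  # B: remaining tokens; the front is buffer[0]
    arcs: set = field(default_factory=set)  # A: (head, dependent, label) triples


def initial_configuration(tokens):
    """Start with ROOT on the stack and every other token in the buffer."""
    return Configuration(stack=[0], buffer=list(range(1, len(tokens))))


def is_terminal(config):
    """Parsing stops once the buffer is empty."""
    return not config.buffer


config = initial_configuration(["ROOT", "Economic", "news", "had"])
print(config)  # Configuration(stack=[0], buffer=[1, 2, 3], arcs=set())
```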

### Actions
We use the following actions

### Shift

Push the token at the front of the buffer onto the stack

$$
(S, i|B, A)\rightarrow(S|i, B, A)
$$

In [None]:
render_transitions_displacy(transitions[0:2], tokenized_sentence)
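Representing a state as a (stack, buffer, arcs) triple, with the top of the stack at the end of a Python list and the front of the buffer at index 0, the Shift rule $(S, i|B, A)\rightarrow(S|i, B, A)$ translates almost literally into code. The function name and the list representation are assumptions made for illustration.

```python
def shift(stack, buffer, arcs):
    """(S, i|B, A) -> (S|i, B, A): move the front of the buffer onto the stack."""
    if not buffer:
        raise ValueError("shift needs a non-empty buffer")
    i = buffer[0]
    return stack + [i], buffer[1:], arcs


# First step on "ROOT Economic news ...": "Economic" (token 1) moves onto the stack.
print(shift([0], [1, 2, 3], set()))  # ([0, 1], [2, 3], set())
```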

### Reduce

Pop the token at the top of the stack, provided it already has a head

$$
(S|i, B, A)\rightarrow(S, B, A)
$$

In [None]:
render_transitions_displacy(transitions[13:15], tokenized_sentence)
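A minimal sketch of Reduce, assuming the state is a (stack, buffer, arcs) triple with the stack top at the end of a list and arcs stored as (head, dependent, label) tuples. The precondition is the part that matters: a token may only leave the stack once it has a head, otherwise it could never be attached. The names `has_head` and `reduce` are hypothetical.

```python
def has_head(token, arcs):
    """True if some (head, dependent, label) arc has token as its dependent."""
    return any(dependent == token for _, dependent, _ in arcs)


def reduce(stack, buffer, arcs):
    """(S|i, B, A) -> (S, B, A): pop the stack top i, allowed only if i has a head."""
    i = stack[-1]
    if not has_head(i, arcs):
        raise ValueError(f"cannot reduce token {i}: it has no head yet")
    return stack[:-1], buffer, arcs


arcs = {(3, 5, "dobj")}  # "had" (3) heads "effect" (5)
print(reduce([0, 3, 5], [9], arcs))  # ([0, 3], [9], {(3, 5, 'dobj')})
```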

### rightArc-[label]

Add an arc labeled $l$ from the top of the stack, $i$, to the front of the buffer, $j$, then push $j$ onto the stack

$$
(S|i, j|B, A) \rightarrow (S|i|j, B, A\cup\{(i,j,l)\})
$$


In [None]:
render_transitions_displacy(transitions[5:7], tokenized_sentence)
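A minimal sketch of rightArc under the same assumed representation (stack top at the end of a list, buffer front at index 0, arcs as (head, dependent, label) tuples). Note that $j$ is pushed rather than discarded: in this "eager" system $j$ stays on the stack so it can still collect its own right dependents later. The function name is hypothetical.

```python
def right_arc(stack, buffer, arcs, label):
    """(S|i, j|B, A) -> (S|i|j, B, A ∪ {(i, j, l)}): attach j to i, then push j."""
    i, j = stack[-1], buffer[0]
    return stack + [j], buffer[1:], arcs | {(i, j, label)}


# ROOT (0) takes "had" (3) as its root dependent; "had" stays available on the stack.
print(right_arc([0], [3, 4, 5], set(), "root"))  # ([0, 3], [4, 5], {(0, 3, 'root')})
```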

### leftArc-[label] 

Add an arc labeled $l$ from the front of the buffer, $j$, to the top of the stack, $i$, then pop $i$; allowed only if $i$ has no head

$$
(S|i, j|B, A) \rightarrow (S, j|B, A\cup\{(j,i,l)\})
$$


In [None]:
render_transitions_displacy(transitions[2:4], tokenized_sentence)
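A minimal sketch of leftArc with the same assumed (stack, buffer, arcs) representation. Besides the "no head yet" condition stated above, it also refuses to make ROOT (token 0) a dependent, which is the standard additional precondition of the arc-eager system. The names `has_head` and `left_arc` are hypothetical.

```python
def has_head(token, arcs):
    """True if token already appears as a dependent in arcs."""
    return any(dependent == token for _, dependent, _ in arcs)


def left_arc(stack, buffer, arcs, label):
    """(S|i, j|B, A) -> (S, j|B, A ∪ {(j, i, l)}): attach i to j, then pop i."""
    i, j = stack[-1], buffer[0]
    if i == 0:
        raise ValueError("ROOT cannot become a dependent")
    if has_head(i, arcs):
        raise ValueError(f"token {i} already has a head")
    return stack[:-1], buffer, arcs | {(j, i, label)}


# "news" (2) becomes the head of "Economic" (1).
print(left_arc([0, 1], [2, 3], set(), "amod"))  # ([0], [2, 3], {(2, 1, 'amod')})
```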

## Full Example

In [None]:
render_transitions_displacy(transitions[:], tokenized_sentence)
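Training a classifier for these actions requires, for each gold tree, a transition sequence that produces it. The sketch below is one common choice, a static oracle for the arc-eager actions above, assuming a projective gold tree: prefer leftArc, then rightArc, then reduce once the stack top has its head and all of its gold dependents, and otherwise shift. Replayed on the example sentence it recovers every gold arc in 17 transitions. The names `oracle` and `oracle_transitions` are hypothetical, and this is not necessarily how the `transitions` shown above were generated.

```python
def has_head(token, arcs):
    return any(dep == token for _, dep, _ in arcs)


def oracle(stack, buffer, arcs, gold):
    """Next gold arc-eager transition (name, label), assuming a projective gold tree."""
    i, j = stack[-1], buffer[0]
    for head, dep, label in gold:
        if (head, dep) == (j, i):
            return "leftArc", label
    for head, dep, label in gold:
        if (head, dep) == (i, j):
            return "rightArc", label
    # Reduce once i has its head and all of its gold dependents are attached.
    if has_head(i, arcs) and all(arc in arcs for arc in gold if arc[0] == i):
        return "reduce", None
    return "shift", None


def oracle_transitions(n_tokens, gold):
    """Replay the oracle from the initial state; return the transitions and arcs built."""
    stack, buffer, arcs = [0], list(range(1, n_tokens)), set()
    sequence = []
    while buffer:
        name, label = oracle(stack, buffer, arcs, gold)
        i, j = stack[-1], buffer[0]
        if name == "shift":
            stack, buffer = stack + [j], buffer[1:]
        elif name == "reduce":
            stack = stack[:-1]
        elif name == "rightArc":
            stack, buffer, arcs = stack + [j], buffer[1:], arcs | {(i, j, label)}
        else:  # leftArc
            stack, arcs = stack[:-1], arcs | {(j, i, label)}
        sequence.append(name if label is None else f"{name}-{label}")
    return sequence, arcs


tokens = ["ROOT", "Economic", "news", "had", "little", "effect", "on", "financial", "markets", "."]
gold = {(0, 3, "root"), (0, 9, "p"), (2, 1, "amod"), (3, 2, "nsubj"), (3, 5, "dobj"),
        (5, 4, "amod"), (5, 6, "prep"), (6, 8, "pmod"), (8, 7, "amod")}
sequence, arcs = oracle_transitions(len(tokens), gold)
print(sequence)
```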

## Machine Learning

How to decide what action to take? 

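* Learn a discriminative classifier $s(\x,y)$ where $\x$ is a representation of the buffer, stack and parse
* Current state-of-the-art systems use neural networks as classifiers (e.g. Parsey McParseFace)
* Extremely fast: every token enters the stack once and leaves it at most once, so a sentence of $n$ tokens needs at most $2n$ transitions (linear in sentence length)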
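At parse time the classifier is applied greedily: in each state, score every legal action and apply the best one. This is a minimal sketch in which a hypothetical linear scorer over two indicator features (word on top of the stack, word at the front of the buffer) stands in for the neural networks mentioned above; the weights are hand-set for illustration, not learned. Because every transition either consumes a buffer token or pops the stack, the loop ends after at most $2n$ steps whatever the scorer does.

```python
def features(stack, buffer, tokens):
    """Hypothetical feature set: words at the stack top and the buffer front."""
    return ["s0=" + tokens[stack[-1]], "b0=" + tokens[buffer[0]]]


def legal_actions(stack, arcs, labels):
    """Arc-eager preconditions; the buffer is assumed non-empty."""
    i = stack[-1]
    i_has_head = any(dep == i for _, dep, _ in arcs)
    actions = [("shift", None)] + [("rightArc", l) for l in labels]
    if i_has_head:
        actions.append(("reduce", None))
    elif i != 0:  # leftArc needs a headless, non-ROOT stack top
        actions += [("leftArc", l) for l in labels]
    return actions


def score(action, feats, weights):
    """Linear s(x, y): sum of the weights of (feature, action) pairs."""
    name = action[0] if action[1] is None else f"{action[0]}-{action[1]}"
    return sum(weights.get((f, name), 0.0) for f in feats)


def greedy_parse(tokens, weights, labels):
    stack, buffer, arcs = [0], list(range(1, len(tokens))), set()
    steps = 0
    while buffer:
        feats = features(stack, buffer, tokens)
        # Ties go to the first legal action, i.e. shift.
        name, label = max(legal_actions(stack, arcs, labels),
                          key=lambda a: score(a, feats, weights))
        i, j = stack[-1], buffer[0]
        if name == "shift":
            stack, buffer = stack + [j], buffer[1:]
        elif name == "rightArc":
            stack, buffer, arcs = stack + [j], buffer[1:], arcs | {(i, j, label)}
        elif name == "reduce":
            stack = stack[:-1]
        else:  # leftArc
            stack, arcs = stack[:-1], arcs | {(j, i, label)}
        steps += 1
    return arcs, steps


tokens = ["ROOT", "Economic", "news", "had"]
weights = {  # hand-set, hypothetical weights
    ("s0=Economic", "leftArc-amod"): 1.0,
    ("s0=news", "leftArc-nsubj"): 1.0,
    ("b0=had", "rightArc-root"): 0.5,
}
arcs, steps = greedy_parse(tokens, weights, labels=("amod", "nsubj", "root"))
print(sorted(arcs), steps)
```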

## Summary

* Dependency parsing predicts word-to-word dependencies
* Compared to constituency (phrase-structure) parsing, it offers:
    * simpler annotations
    * faster parsing
    * output that is sufficient for many down-stream applications

## Background Material

* [Mike Collins' PCFG lecture](http://www.cs.columbia.edu/~mcollins/courses/nlp2011/notes/pcfgs.pdf)
* Jurafsky & Martin, Chapter 12, Statistical Parsing