%%capture
%load_ext autoreload
%autoreload 2
import sys
sys.path.append("../statnlpbook/")
import util, ie,tfutil

util.execute_notebook('relation_extraction.ipynb')


<!---
Latex Macros
-->
$$
\newcommand{\Xs}{\mathcal{X}}
\newcommand{\Ys}{\mathcal{Y}}
\newcommand{\y}{\mathbf{y}}
\newcommand{\balpha}{\boldsymbol{\alpha}}
\newcommand{\bbeta}{\boldsymbol{\beta}}
\newcommand{\aligns}{\mathbf{a}}
\newcommand{\align}{a}
\newcommand{\source}{\mathbf{s}}
\newcommand{\target}{\mathbf{t}}
\newcommand{\ssource}{s}
\newcommand{\starget}{t}
\newcommand{\repr}{\mathbf{f}}
\newcommand{\repry}{\mathbf{g}}
\newcommand{\x}{\mathbf{x}}
\newcommand{\prob}{p}
\newcommand{\a}{\alpha}
\newcommand{\b}{\beta}
\newcommand{\vocab}{V}
\newcommand{\params}{\boldsymbol{\theta}}
\newcommand{\param}{\theta}
\DeclareMathOperator{\perplexity}{PP}
\DeclareMathOperator{\argmax}{argmax}
\DeclareMathOperator{\argmin}{argmin}
\newcommand{\train}{\mathcal{D}}
\newcommand{\counts}[2]{\#_{#1}(#2) }
\newcommand{\length}[1]{\text{length}(#1) }
\newcommand{\indi}{\mathbb{I}}
$$

# Relation Extraction

## Motivation

* The amount of available information is growing exponentially
* Text contains a lot of information
* Only some of this information is relevant for each use case
* How can we automatically make sense of information?

**Information Extraction** addresses this

[Alchemy information extraction demo](https://alchemy-language-demo.mybluemix.net/)

[ReVerb demo](http://openie.allenai.org/)

## Subtasks of Information Extraction

* **Document** Classification:
    * Assign a label to each document, often representing the topic
* **Named Entity Recognition**:
    * Recognise boundaries of entities in text, e.g. "New York", "New York Times" 
* **Named Entity Classification**:
    * Assign a type to each entity (e.g. "New York" -> location, "New York Times" -> media)
* **Relation** Extraction:
    * Recognise relations between entities, e.g. "S. Riedel reader-at UCL"
* **Temporal** Information Extraction:
    * Recognise and/or normalise temporal expressions, e.g. "tomorrow morning at 8" -> "2016-11-26 08:00:00"
* **Event** Extraction:
    * Recognise events, typically consisting of entities and relations between them at a point in time and place, e.g. an election

### Relation Extraction

Task of extracting **semantic relations between arguments**
* Arguments are entities
    * general concepts such as "a company" (ORG), "a person" (PER)
    * instances of such concepts (e.g. "Microsoft", "Bill Gates"), which are called proper names or named entities (NEs)
* Relation extraction builds on the task of named entity recognition

Relation extraction is relevant for many high-level NLP tasks, such as:
* question answering, where users ask questions such as "Who founded Microsoft?"
* information retrieval, which often relies on large collections of structured information as background data
* text and data mining, where larger patterns in relations between concepts are discovered, e.g. temporal patterns about startups

## Relation Extraction as Structured Prediction
We can formalise relation extraction as an instance of [structured prediction](/template/statnlpbook/02_methods/00_structuredprediction)
* The input space $\mathcal{X}$ consists of pairs of arguments $\mathcal{E}$ together with the supporting texts $\mathcal{S}$ in which those arguments appear
* The output space $\mathcal{Y}$ is a set of relation labels such as $\Ys=\{ \text{founder-of},\text{employee-at},\text{professor-at},\text{NONE}\}$
* The goal is to define a model \\(s_{\params}(\x,y)\\) that assigns high *scores* to the label $y$ that fits the arguments and supporting text $\x$, and lower scores otherwise
* The model is parametrized by \\(\params\\), and we will learn these parameters from a training set of $(\x,y)$ pairs
* To classify a new input instance $\x$, again consisting of a pair of arguments and a supporting text, we have to solve the maximization problem $\argmax_{y \in \Ys} s_{\params}(\x,y)$
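
The scoring and maximization steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the bag-of-words features, the bias feature, the hand-set weights in `params`, and the function names `feature_vector`, `score` and `predict` are all hypothetical and not part of the course code. In practice the weights would be learned from training data.

```python
LABELS = ["founder-of", "employee-at", "professor-at", "NONE"]


def feature_vector(x, y):
    """Hypothetical features: a bias plus one indicator per (token, label)."""
    (arg1, arg2), text = x
    return [("<bias>", y)] + [(token.lower(), y) for token in text.split()]


def score(params, x, y):
    """Linear score s_params(x, y): sum of the weights of all active features."""
    return sum(params.get(f, 0.0) for f in feature_vector(x, y))


def predict(params, x, labels=LABELS):
    """Solve argmax_y s_params(x, y) by enumerating the (small) label set."""
    return max(labels, key=lambda y: score(params, x, y))


# Hand-set weights, for illustration only.
params = {
    ("<bias>", "NONE"): 0.5,
    ("founded", "founder-of"): 2.0,
    ("works", "employee-at"): 1.5,
    ("professor", "professor-at"): 2.0,
}

x = (("Bill Gates", "Microsoft"), "Bill Gates founded Microsoft in 1975")
print(predict(params, x))
```

Because the label set is small, the argmax can be found by simple enumeration; no specialised search is needed.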

## Relation Extraction Approaches
* **Pattern-Based** Relation Extraction:
    * Extract relations via manually defined textual pattern matching
* **Bootstrapping**:
    * Start from a small set of manually defined textual patterns, use the relations they extract to find more patterns, and repeat iteratively
* **Supervised** Relation Extraction:
    * Train a supervised model, from manually labelled training examples, to extract relations
* **Distantly Supervised** Relation Extraction:
    * Automatically annotate training data for supervised relation extraction, based on entries in a knowledge base
* **Universal Schema** Relation Extraction:
    * Model relation types and their surface forms in the same space; a possible method for combining pattern-based, supervised and distantly supervised relation extraction
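
To make the iterative idea behind bootstrapping concrete, here is a minimal sketch. It assumes a toy corpus that has already been reduced to hypothetical `(arg1, pattern, arg2)` triples, where `pattern` is the text between the two arguments; the function name `bootstrap` and the data are illustrative only. Real systems usually also score and filter candidate patterns, which this sketch omits.

```python
def bootstrap(seed_patterns, corpus, iterations=2):
    """Alternate between extracting entity pairs and harvesting new patterns."""
    patterns = set(seed_patterns)
    pairs = set()
    for _ in range(iterations):
        # Step 1: extract entity pairs connected by a known pattern.
        pairs |= {(a1, a2) for a1, p, a2 in corpus if p in patterns}
        # Step 2: any pattern connecting an extracted pair becomes a new pattern.
        patterns |= {p for a1, p, a2 in corpus if (a1, a2) in pairs}
    return patterns, pairs


# Toy corpus of (arg1, pattern, arg2) triples, invented for illustration.
corpus = [
    ("SVMs", "are used for", "text classification"),
    ("SVMs", "perform well on", "text classification"),
    ("CRFs", "perform well on", "sequence labelling"),
    ("Turing", "was born in", "London"),
]

patterns, pairs = bootstrap({"are used for"}, corpus)
print(sorted(patterns))
print(sorted(pairs))
```

Starting from the single seed pattern, the first iteration finds the pair for "SVMs" and adds "perform well on" as a new pattern; the second iteration then uses that pattern to find the "CRFs" pair.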


## Relation Extraction Example
* Extracting "method used for task" relations from sentences in computer science publications
* The first step would normally be to detect named entities, i.e. to determine those pairs of arguments $\mathcal{E}$. For simplicity, our training data already contains those annotations.


## Pattern-Based Extraction
* The simplest relation extraction model defines a set of textual patterns for each relation and then assigns that relation's label to entity pairs whose sentences match one of those patterns.
* The training data consists of entity pairs $\mathcal{E}$, patterns $A$ and labels $Y$.
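
A minimal sketch of such a pattern matcher follows. It assumes, for illustration, that a pattern is simply the text between the two arguments in a sentence; the helper names `pattern_between`, `build_pattern_index` and `extract`, as well as the toy data, are hypothetical and do not come from the course code.

```python
def pattern_between(sentence, arg1, arg2):
    """Return the text between arg1 and arg2, or None if either is missing."""
    start = sentence.find(arg1)
    if start < 0:
        return None
    end = sentence.find(arg2, start + len(arg1))
    if end < 0:
        return None
    return sentence[start + len(arg1):end].strip()


def build_pattern_index(training_data):
    """Group the training patterns by their relation label."""
    index = {}
    for entity_pair, pattern, label in training_data:
        index.setdefault(label, set()).add(pattern)
    return index


def extract(index, entity_pair, sentence, none_label="NONE"):
    """Assign a label if the sentence matches one of that label's patterns."""
    pattern = pattern_between(sentence, *entity_pair)
    for label, label_patterns in index.items():
        if pattern in label_patterns:
            return label
    return none_label


# Toy training data: (entity pair, pattern, label) triples.
training_data = [
    (("support vector machines", "text classification"),
     "are applied to", "method used for task"),
    (("hidden Markov models", "speech recognition"),
     "have been used for", "method used for task"),
]
index = build_pattern_index(training_data)

pair = ("conditional random fields", "sequence labelling")
print(extract(index, pair,
              "conditional random fields are applied to sequence labelling"))
```

Exact string matching like this tends to give precise but low-coverage extractions, since any rewording of the connecting text produces a pattern that is not in the index.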

## Background Material

* Jurafsky, Dan and Martin, James H. (2016). Speech and Language Processing, Chapter 21 (Information Extraction): https://web.stanford.edu/~jurafsky/slp3/21.pdf

* Riedel, Sebastian and Yao, Limin and McCallum, Andrew and Marlin, Benjamin M. (2013). Relation Extraction with Matrix Factorization and Universal Schemas. Proceedings of NAACL: http://www.aclweb.org/anthology/N13-1008