In [106]:
%%capture
%load_ext autoreload
%autoreload 2
%matplotlib inline
# %cd .. 
import sys
sys.path.append("..")
import statnlpbook.util as util
import matplotlib


<!---
Latex Macros
-->
$$
\newcommand{\Xs}{\mathcal{X}}
\newcommand{\Ys}{\mathcal{Y}}
\newcommand{\y}{\mathbf{y}}
\newcommand{\balpha}{\boldsymbol{\alpha}}
\newcommand{\bbeta}{\boldsymbol{\beta}}
\newcommand{\aligns}{\mathbf{a}}
\newcommand{\align}{a}
\newcommand{\source}{\mathbf{s}}
\newcommand{\target}{\mathbf{t}}
\newcommand{\ssource}{s}
\newcommand{\starget}{t}
\newcommand{\repr}{\mathbf{f}}
\newcommand{\repry}{\mathbf{g}}
\newcommand{\x}{\mathbf{x}}
\newcommand{\prob}{p}
\newcommand{\vocab}{V}
\newcommand{\params}{\boldsymbol{\theta}}
\newcommand{\param}{\theta}
\DeclareMathOperator{\perplexity}{PP}
\DeclareMathOperator{\argmax}{argmax}
\DeclareMathOperator{\argmin}{argmin}
\newcommand{\train}{\mathcal{D}}
\newcommand{\counts}[2]{\#_{#1}(#2) }
\newcommand{\length}[1]{\text{length}(#1) }
\newcommand{\indi}{\mathbb{I}}
$$

# Relation Extraction 
Relation extraction (RE) is the task of extracting semantic relations between arguments. Arguments can either be general concepts such as "a company" (ORG) or "a person" (PER), or instances of such concepts (e.g. "Microsoft", "Bill Gates"), which are called proper names or named entities (NEs). An example of a semantic relation is "founder-of(PER, ORG)". Relation extraction therefore often builds on the task of named entity recognition.

Relation extraction is relevant for many high-level NLP tasks, such as

* question answering, where users ask questions such as "Who founded Microsoft?",
* information retrieval, which often relies on large collections of structured information as background data, and
* text and data mining, where larger patterns in relations between concepts are discovered, e.g. temporal patterns about startups.


## Relation Extraction as Structured Prediction
We can formalise relation extraction as an instance of [structured prediction](/template/statnlpbook/02_methods/00_structuredprediction) where the input space $\mathcal{X}$ consists of pairs of arguments $\mathcal{E}$ and the supporting texts $\mathcal{S}$ in which those arguments appear. The output space $\mathcal{Y}$ is a set of relation labels such as $\Ys=\{ \text{founder-of},\text{employee-at},\text{professor-at},\text{NONE}\}$. The goal is to define a model \\(s_{\params}(\x,y)\\) that assigns high *scores* to the label $y$ that fits the arguments and supporting text $\x$, and lower scores otherwise. The model is parametrized by \\(\params\\), and we learn these parameters from a training set \\(\train\\) of $(\x,y)$ pairs. When we need to classify input instances $\x$, consisting again of pairs of arguments and supporting texts, we have to solve the maximization problem $\argmax_y s_{\params}(\x,y)$. Note that this frames relation extraction as a multi-class classification problem. (Exercise: how could RE be formalised to predict multiple labels for each input instance, and how would the example below have to be adapted for that?)
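
As a minimal sketch of this formulation, the snippet below scores every label for one input and returns the argmax. The scoring function `toy_score`, its keyword weights, the bias for 'NONE' and the example instances are hypothetical placeholders chosen for illustration; they are not a model used in this chapter.

```python
# Hypothetical label set and hand-set parameters, for illustration only.
LABELS = ["founder-of", "employee-at", "professor-at", "NONE"]

TOY_WEIGHTS = {
    ("founded", "founder-of"): 2.0,
    ("works", "employee-at"): 2.0,
    ("teaches", "professor-at"): 2.0,
}
TOY_BIAS = {"NONE": 0.5}  # assumption: 'NONE' wins when no keyword fires


def toy_score(x, y):
    """s_theta(x, y): label bias plus the weights of (word, label) pairs found in the text."""
    _entpair, text = x
    return TOY_BIAS.get(y, 0.0) + sum(TOY_WEIGHTS.get((tok, y), 0.0) for tok in text.split())


def predict_label(x, labels=LABELS):
    """Solve argmax_y s_theta(x, y) by enumerating all labels."""
    return max(labels, key=lambda y: toy_score(x, y))


x = (("Bill Gates", "Microsoft"), "XXXXX founded XXXXX")
print(predict_label(x))
```

Because the label set is small, the argmax can be found by simple enumeration; learning would replace the hand-set weights with parameters estimated from the training set.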


## Relation Extraction Example
Before we take a closer look at relation extraction methods, let us consider a concrete example. The task we are considering here is to extract "method used for task" relations from sentences in computer science publications. As mentioned above, the first step would normally be to detect named entities, i.e. to determine those pairs of arguments $\mathcal{E}$. For simplicity, our training data already contains those annotations.


## Pattern-Based Extraction
The simplest relation extraction model defines a set of textual patterns for each relation and then assigns the relation's label to entity pairs whose sentences match one of those patterns. The training data consists of entity pairs $\mathcal{E}$, patterns $A$ and labels $Y$.

In [107]:
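def readLabelledPatternData(filepath="../data/ie/ie_bootstrap_patterns.txt"):
    patterns = []
    entpairs = []
    with open(filepath, "r") as f:  # the file is closed once reading is done
        for line in f:
            # each line holds a label, a masked sentence and an entity pair; the label is not used here
            _label, pattern, entpair = line.strip().replace("    ", "\t").split("\t")
            patterns.append(pattern)
            # turn the string "['a', 'b']" into the list ['a', 'b']
            entpair = entpair.strip("['").strip("']").split("', '")
            entpairs.append(entpair)
    return patterns, entpairs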

training_patterns, training_entpairs = readLabelledPatternData()
print("Training patterns and entity pairs for relation 'method used for task'")
[(tr_a, tr_e) for (tr_a, tr_e) in zip(training_patterns[:5], training_entpairs[:5])]

Training patterns and entity pairs for relation 'method used for task'


[('demonstrates XXXXX and clustering techniques for XXXXX',
  ['text mining', 'building domain ontology']),
 ('demonstrates text mining and XXXXX for building XXXXX',
  ['clustering techniques', 'domain ontology']),
 ('the XXXXX is able to enhance the XXXXX',
  ['ensemble classifier', 'detection of construction materials']),
 ('we propose a fully XXXXX for 3d XXXXX of buildings',
  ['autonomous system', 'thermal modeling']),
 ('this paper proposes two XXXXX to solve a XXXXX',
  ['optimization models', 'dynamic supply chain issue'])]

The patterns are currently sentences in which the entity pairs were blanked with the placeholder 'XXXXX'.
Note that for the training data, we also have labels. However, because we only have positive instances,
and only for one relation ('method used for task'), we do not need to differentiate between them.
We read the test data in the same way:

In [108]:
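def readPatternData(filepath="../data/ie/ie_patterns.txt"):
    patterns = []
    entpairs = []
    with open(filepath, "r") as f:  # the file is closed once reading is done
        for line in f:
            # each line holds a masked sentence and an entity pair (no label)
            pattern, entpair = line.strip().replace("    ", "\t").split("\t")
            patterns.append(pattern)
            # turn the string "['a', 'b']" into the list ['a', 'b']
            entpair = entpair.strip("['").strip("']").split("', '")
            entpairs.append(entpair)
    return patterns, entpairs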

testing_patterns, testing_entpairs = readPatternData()
print("Testing patterns and entity pairs")
[(tr_a, tr_e) for (tr_a, tr_e) in zip(testing_patterns[0:5], testing_entpairs[:5])]

Testing patterns and entity pairs


[('a method for estimation of XXXXX of XXXXX is presented',
  ['effective properties', 'porous materials']),
 ('accounting for XXXXX is essential for estimation of XXXXX',
  ['nonlinear effects', 'effective properties']),
 ('develops the heterogeneous XXXXX for fiber-reinforced XXXXX',
  ['feature model', 'object modeling']),
 ('two formulations for the problem of optimum XXXXX of onshore XXXXX',
  ['layout design', 'wind farms']),
 ('boundary-value and initial-value XXXXX are solved using XXXXX and graph products',
  ['differential equations', 'finite difference method'])]

For the testing data, we do not know the relations for the instances. We build a scoring model to determine which of the testing instances are examples of the relation 'method used for task' and which ones are not. (Thought exercise: how do we know, or how can we ensure, that any of the instances here are examples of the relation in question?)

A pattern scoring model \\(s_{\params}(\x,y)\\) only has one parameter and assigns scores to each relation label \\(y\\) proportional to the number of matches with the set of textual patterns for that label. The final label assigned to each instance is then the one with the highest score.
Here, our pattern scoring model is even simpler since we only have patterns for one relation. Hence the final label assigned to each instance is 'method used for task' if there is a match with a pattern, and 'NONE' if there is no match.

Let's have a closer look at how pattern matching works now. Recall that the original patterns in the training data are sentences where the entity pairs are blanked with 'XXXXX'.

We could use those patterns to find new sentences. However, we are not likely to find many since the patterns are very specific. Hence, we need to generalise those patterns to less specific ones. A simple way is to define the sequence of words between each entity pair as a pattern, like so:

In [109]:
def sentenceToShortPath(sent):
    """
    Returns the path between two arguments in a sentence, where the arguments have been masked
    Args:
        sent: the sentence
    Returns:
        the path between two arguments
    """
    sent_toks = sent.split(" ")
    # positions of the masked arguments
    indices = [i for i, tok in enumerate(sent_toks) if tok == "XXXXX"]
    # words strictly between the first two masked arguments
    pattern = " ".join(sent_toks[indices[0]+1:indices[1]])
    return pattern

print(training_patterns[0])
sentenceToShortPath(training_patterns[0])

demonstrates XXXXX and clustering techniques for XXXXX


'and clustering techniques for'

There are many different alternatives to this. (Thought exercise: what are better ways of generalising patterns?)

Once sentence shortening (i.e. pattern generalisation) is defined, we can apply the resulting patterns to the testing instances to classify them into 'method used for task' and 'NONE'. In practice, the code below returns only the instances which contain a 'method used for task' pattern.

In [110]:
def patternExtraction(training_sentences, testing_sentences):
    """
    Given a set of patterns for a relation, searches for those patterns in other sentences
    Args:
        training_sentences: training sentences with arguments masked
        testing_sentences: testing sentences with arguments masked
    Returns:
        the testing sentences which the training patterns appeared in
    """
    # convert training and testing sentences to short paths to obtain patterns
    training_patterns = set([sentenceToShortPath(train_sent) for train_sent in training_sentences])
    testing_patterns = [sentenceToShortPath(test_sent) for test_sent in testing_sentences]
    # look for training patterns in testing patterns
    testing_extractions = []
    for i, testing_pattern in enumerate(testing_patterns):
        if testing_pattern in training_patterns: # look for exact matches of patterns
            testing_extractions.append(testing_sentences[i])
    return testing_extractions

patternExtraction(training_patterns[:500], testing_patterns[:500])

['paper reviews applications of XXXXX in XXXXX',
 'a novel approach was developed to determine the XXXXX in XXXXX',
 'four different types of insoles were examined in terms of their effects on XXXXX in XXXXX',
 'the findings can aid in better understanding the insole design features that could improve XXXXX in XXXXX',
 'this new approach provides more degrees of freedom and XXXXX in XXXXX',
 'an application of fclarans for attribute clustering and XXXXX in XXXXX has been demonstrated',
 'further work can explore alternative approaches to avoid nulling weights of the criteria and XXXXX in XXXXX',
 'the problem of finding the expected XXXXX in XXXXX has numerous applications']
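
The function above returns only the matching sentences. A minimal sketch of the full labelling step, which assigns either 'method used for task' or 'NONE' to every test instance, could look as follows. The names `short_path` and `label_instances` and the toy test sentences are made up for this illustration; `short_path` mirrors `sentenceToShortPath` so that the snippet runs on its own.

```python
def short_path(sent, mask="XXXXX"):
    """Return the words between the first two masked arguments."""
    toks = sent.split(" ")
    idx = [i for i, tok in enumerate(toks) if tok == mask]
    return " ".join(toks[idx[0] + 1:idx[1]])


def label_instances(train_sents, test_sents, relation="method used for task"):
    """Label a test sentence with the relation if its short path is a known pattern, else 'NONE'."""
    known = {short_path(s) for s in train_sents}
    return [relation if short_path(s) in known else "NONE" for s in test_sents]


toy_train = ["this paper proposes two XXXXX to solve a XXXXX"]
toy_test = [
    "we use XXXXX to solve a XXXXX",       # short path 'to solve a' is a known pattern
    "the XXXXX is larger than the XXXXX",  # short path is not a known pattern
]
toy_labels = label_instances(toy_train, toy_test)
print(toy_labels)
```

With patterns for several relations, `known` would become a mapping from pattern to relation label rather than a single set.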

(Exercise: introduce patterns for other relations here and amend the scoring function in the Python code. Note that it is also possible to have 'NONE' patterns for 'no relation' between the entities.)

One of the shortcomings of this pattern-based approach is that the set of patterns has to be defined manually and the model does not learn new patterns. We will next look at an approach which addresses those two shortcomings.



## Bootstrapping

Bootstrapping relation extraction models take the same input as pattern-based approaches, i.e. a set of entity pairs and patterns. The overall idea is to extract more patterns and entity pairs iteratively. For this, we need two helper methods: one method that generalises from entity pairs to extract more patterns and entity pairs, and another one that generalises from patterns to extract more patterns and entity pairs.

<!--Bootstrapping relation extraction models still take as input a set of entity pairs and patterns, same as pattern-based relation extraction approaches, but they aim at discovering new patterns.
Algo:
- Input: set of relation types \\(\Ys\\), set of seed entity pairs \\(\Es\\), set of seed patterns for each relation (\Ps\\), set of sentences \\(\Xs\\)
- For each iteration
    - Patterns P*
    - Entity pairs E*
    - For each sentence:
        - if it contains a seed entity pair e:
            - add the path between the entity pairs to P* as a new pattern
        - if it contains a seed pattern p:
            - identify an entity pair in the sentence and add it to E*
    - P <- P + generalise(P*)
    - E <- E + generalise(E*)
We can examine the output of the model at each iteration-->


In [111]:
def searchForPatternsAndEntpairsByPatterns(training_patterns, testing_patterns, testing_entpairs, testing_sentences):
    testing_extractions = []
    appearing_testing_patterns = []
    appearing_testing_entpairs = []
    for i, testing_pattern in enumerate(testing_patterns):
        if testing_pattern in training_patterns: # if there is an exact match of a pattern
            testing_extractions.append(testing_sentences[i])
            appearing_testing_patterns.append(testing_pattern)
            appearing_testing_entpairs.append(testing_entpairs[i])
    return testing_extractions, appearing_testing_patterns, appearing_testing_entpairs


def searchForPatternsAndEntpairsByEntpairs(training_entpairs, testing_patterns, testing_entpairs, testing_sentences):
    testing_extractions = []
    appearing_testing_patterns = []
    appearing_testing_entpairs = []
    for i, testing_entpair in enumerate(testing_entpairs):
        if testing_entpair in training_entpairs: # if there is an exact match of an entity pair
            testing_extractions.append(testing_sentences[i])
            appearing_testing_entpairs.append(testing_entpair)
            appearing_testing_patterns.append(testing_patterns[i])
    return testing_extractions, appearing_testing_patterns, appearing_testing_entpairs

Those two helper functions are then applied iteratively:

In [112]:
def bootstrappingExtraction(train_sents, train_entpairs, test_sents, test_entpairs, num_iter):
    """
    Given a set of patterns and entity pairs for a relation, extracts more patterns and entity pairs iteratively
    Args:
        train_sents: training sentences with arguments masked
        train_entpairs: training entity pairs (extended in place with newly found pairs)
        test_sents: testing sentences with arguments masked
        test_entpairs: testing entity pairs
        num_iter: exclusive upper bound of the iteration counter, i.e. the loop runs num_iter - 1 times
    Returns:
        the testing sentences which the training patterns or any of the inferred patterns appeared in
        (a sentence is appended again in every iteration in which it matches)
    """

    # convert training and testing sentences to short paths to obtain patterns
    train_patterns = set([sentenceToShortPath(train_sent) for train_sent in train_sents])
    test_patterns = [sentenceToShortPath(test_sent) for test_sent in test_sents]
    test_extracts = []

    # iteratively get more patterns and entity pairs
    for i in range(1, num_iter):
        print("Number extractions at iteration", str(i), ":", str(len(test_extracts)))
        print("Number patterns at iteration", str(i), ":", str(len(train_patterns)))
        print("Number entpairs at iteration", str(i), ":", str(len(train_entpairs)))
        # get more patterns and entity pairs
        test_extracts_p, ext_test_patterns_p, ext_test_entpairs_p = searchForPatternsAndEntpairsByPatterns(train_patterns, test_patterns, test_entpairs, test_sents)
        test_extracts_e, ext_test_patterns_e, ext_test_entpairs_e = searchForPatternsAndEntpairsByEntpairs(train_entpairs, test_patterns, test_entpairs, test_sents)
        # add them to the existing entity pairs for the next iteration
        train_patterns.update(ext_test_patterns_p)
        train_patterns.update(ext_test_patterns_e)
        train_entpairs.extend(ext_test_entpairs_p)
        train_entpairs.extend(ext_test_entpairs_e)
        test_extracts.extend(test_extracts_p)
        test_extracts.extend(test_extracts_e)

    return test_extracts

test_extracts = bootstrappingExtraction(training_patterns, training_entpairs, testing_patterns, testing_entpairs, num_iter=6)

Number extractions at iteration 1 : 0
Number patterns at iteration 1 : 20
Number entpairs at iteration 1 : 22
Number extractions at iteration 2 : 79
Number patterns at iteration 2 : 20
Number entpairs at iteration 2 : 101
Number extractions at iteration 3 : 242
Number patterns at iteration 3 : 25
Number entpairs at iteration 3 : 264
Number extractions at iteration 4 : 410
Number patterns at iteration 4 : 25
Number entpairs at iteration 4 : 432
Number extractions at iteration 5 : 578
Number patterns at iteration 5 : 25
Number entpairs at iteration 5 : 600


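One noticeable thing is that with each iteration, the number of extractions we find increases, but they become less accurate. Note that the extraction and entity pair counts printed above include repeats, because instances that matched in an earlier iteration are appended again in every later one.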

In [113]:
print(test_extracts[0:3])
print(test_extracts[-4:-1])

['paper reviews applications of XXXXX in XXXXX', 'a novel approach was developed to determine the XXXXX in XXXXX', 'four different types of insoles were examined in terms of their effects on XXXXX in XXXXX']
['we first analyze the shortages of the existing XXXXX in XXXXX', 'we apply XXXXX in XXXXX to make the best use of historical driving data', 'a new total XXXXX in XXXXX is proposed']


One of the reasons is that the semantics of the patterns shift: we try to find new patterns for 'method used for task', but because its instances share similar contexts with instances of other relations, the patterns and entity pairs iteratively move away from the 'method used for task' relation. An example from a different domain is the pair of relations 'student-at' and 'lecturer-at', which have many overlapping contexts.
One way of improving this is to assign confidence values to each entity pair and pattern. For example, we might want to penalise patterns which are too general.

In [114]:
from collections import Counter
te_cnt = Counter()
for te in test_extracts:
    te_cnt[sentenceToShortPath(te)] += 1
print(te_cnt)

Counter({'in': 693, 'to solve a': 9, 'is proposed to solve the': 9, 'is firstly introduced in': 7, 'is introduced in': 7, 'is proposed to plan and execute task in': 7, 'is higher in': 7, 'and finally to illustrate the applicability of the proposed method , a': 7})


Above, we see that the 'in' pattern was found, which matches many contexts that are not 'method used for task'. (Exercise: implement a confidence weighting for patterns.)
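
One possible starting point for such a weighting is a crude generality penalty: a pattern that matches a large share of all candidate paths receives a low confidence and is dropped. This is only a sketch under that assumption, not an established confidence measure; the function names, the threshold `min_conf` and the toy counts are all hypothetical.

```python
from collections import Counter


def pattern_confidence(patterns, candidate_paths):
    """Score each pattern as 1 - (fraction of candidate paths it matches exactly)."""
    counts = Counter(candidate_paths)
    total = len(candidate_paths) or 1  # avoid division by zero on empty input
    return {p: 1.0 - counts[p] / total for p in patterns}


def filter_patterns(patterns, candidate_paths, min_conf=0.5):
    """Keep only patterns whose confidence reaches the (hypothetical) threshold."""
    conf = pattern_confidence(patterns, candidate_paths)
    return {p for p in patterns if conf[p] >= min_conf}


# toy candidate paths: 'in' covers 7 of 10 instances and is therefore very general
toy_paths = ["in"] * 7 + ["to solve a"] * 2 + ["is higher in"]
toy_patterns = {"in", "to solve a"}
kept = filter_patterns(toy_patterns, toy_paths)
print(kept)
```

In a bootstrapping loop, such a filter would be applied before newly found patterns are added to the pattern set, so that an overly general pattern cannot pull in unrelated entity pairs.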


## Supervised Relation Extraction
A different way of assigning a relation label to new instances is to follow the supervised learning paradigm, which we have already seen for other structured prediction tasks. For supervised relation extraction, the scoring model \\(s_{\params}(\x,y)\\) is estimated automatically based on training sentences $\mathcal{X}$ and their labels $\mathcal{Y}$.
For the model, we can use a range of different classifiers, e.g. a logistic regression model or an SVM. At testing time, the predicted label for each testing instance is the highest-scoring one, i.e. $$ y^* = \argmax_{y\in\Ys} s_{\params}(\x,y) $$

First, we read in the training data, consisting again of patterns, entity pairs and labels. This time, the given labels for the training instances are 'method used for task' or 'NONE', i.e. we have positive and negative training data.

In [115]:
def readLabelledData(filepath="../data/ie/ie_training_data.txt"):
    patterns = []
    entpairs = []
    labels = []
    with open(filepath, "r") as f:  # the file is closed once reading is done
        for line in f:
            # each line holds a label, a masked sentence and an entity pair
            label, pattern, entpair = line.strip().replace("    ", "\t").split("\t")
            labels.append(label)
            patterns.append(pattern)
            # turn the string "['a', 'b']" into the list ['a', 'b']
            entpair = entpair.strip("['").strip("']").split("', '")
            entpairs.append(entpair)
    return patterns, entpairs, labels

training_sents, training_entpairs, training_labels = readLabelledData()
[(tr_s, tr_e, tr_l) for (tr_s, tr_e, tr_l) in zip(training_sents[:5], training_entpairs[:5], training_labels[:5])]

[('demonstrates XXXXX and clustering techniques for XXXXX',
  ['text mining', 'building domain ontology'],
  'method used for task'),
 ('demonstrates text mining and XXXXX for building XXXXX',
  ['clustering techniques', 'domain ontology'],
  'method used for task'),
 ('the XXXXX is able to enhance the XXXXX',
  ['ensemble classifier', 'detection of construction materials'],
  'method used for task'),
 ('we propose a fully XXXXX for 3d XXXXX of buildings',
  ['autonomous system', 'thermal modeling'],
  'method used for task'),
 ('this paper proposes two XXXXX to solve a XXXXX',
  ['optimization models', 'dynamic supply chain issue'],
  'method used for task')]

Next, we define how to transform training and testing data to features. 
Features for the model are typically extracted from the shortest dependency path between the two entities. They can be basic n-gram features, or they can be based on the syntactic structure of the input, i.e. the dependency path (see [parsing](statnlpbook/chapters/parsing)).
Note that here we assume again that entity pairs are part of the input, i.e. we assume the named entity recognition problem to be solved as part of the preprocessing of the data. In reality, named entities have to be recognised first.

Here, we use sklearn's built-in feature extractor which transforms sentences to n-grams with counts of their appearances. (Exercise: use dependency parsing features instead of bag of n-gram features.)



In [116]:
from sklearn.feature_extraction.text import CountVectorizer

def featTransform(sents_train, sents_test):
    cv = CountVectorizer()
    cv.fit(sents_train)
    print(cv.get_params())
    features_train = cv.transform(sents_train)
    features_test = cv.transform(sents_test)
    return features_train, features_test, cv
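
To make the feature representation concrete, here is a small pure-Python sketch of bag-of-n-gram counts for a single pattern. It only illustrates the idea and does not reproduce `CountVectorizer`'s behaviour exactly: for instance, the vectorizer's default `token_pattern` drops single-character tokens, whereas this sketch splits on whitespace. The helper name `ngram_counts` and its `n_max` parameter are hypothetical.

```python
from collections import Counter


def ngram_counts(text, n_max=2):
    """Count all word n-grams of length 1..n_max in a whitespace-tokenised string."""
    toks = text.lower().split()
    counts = Counter()
    for n in range(1, n_max + 1):
        # slide a window of n tokens over the sequence
        for i in range(len(toks) - n + 1):
            counts[" ".join(toks[i:i + n])] += 1
    return counts


toy_counts = ngram_counts("is proposed to solve the")
print(toy_counts)
```

Each distinct n-gram becomes one feature dimension, and the count is the feature value for that instance.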

We define a model, again with sklearn, using one of their built-in classifiers and a prediction function.

In [117]:
from sklearn.linear_model import LogisticRegression

def model_train(feats_train, labels):
    model = LogisticRegression(penalty='l2')  # logistic regression model with l2 regularisation
    model.fit(feats_train, labels) # fit the model to the transformed training data
    return model

def predict(model, features_test):
    """Find the most compatible output class"""
    preds = model.predict(features_test) # this returns the predicted labels
    #preds_prob = model.predict_proba(features_test)  # this returns probabilities instead of labels
    return preds

We further define a helper function for debugging that determines the most useful features learned by the model:

In [118]:
def show_most_informative_features(vectorizer, clf, n=20):
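    # get_feature_names_out() requires scikit-learn >= 1.0; older versions used get_feature_names()
    feature_names = vectorizer.get_feature_names_out()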
    coefs_with_fns = sorted(zip(clf.coef_[0], feature_names))
    top = zip(coefs_with_fns[:n], coefs_with_fns[:-(n + 1):-1])
    for (coef_1, fn_1), (coef_2, fn_2) in top:
        print("\t%.4f\t%-15s\t\t%.4f\t%-15s" % (coef_1, fn_1, coef_2, fn_2))

Supervised relation extraction algorithm:

<!--Algo:
 Transform to Python code 
- Input: set of training sentences \\(\Xs\\) annotated with entity pairs \\(\Es\\) and relation types \\(\Ys\\) 
- features <- your_favourite_feature_extractor(training_sentences)
- model <- train_model(features, labels)
- predictions_test <- model(testing_sentences) -->


In [119]:
def supervisedExtraction(train_sents, train_entpairs, train_labels, test_sents, test_entpairs):
    """
    Given pos/neg training instances, train a logistic regression model with simple BOW features and predict labels on unseen test instances
    Args:
        train_sents: training sentences with arguments masked
        train_entpairs: training entity pairs
        train_labels: labels of training instances
        test_sents: testing sentences with arguments masked
        test_entpairs: testing entity pairs
    Returns:
        predictions for the testing sentences
    """

    # convert training and testing sentences to short paths to obtain patterns
    train_patterns = [sentenceToShortPath(train_sent) for train_sent in train_sents]
    test_patterns = [sentenceToShortPath(test_sent) for test_sent in test_sents]

    # extract features
    features_train, features_test, cv = featTransform(train_patterns, test_patterns)

    # train model
    model = model_train(features_train, train_labels)

    # show the most informative features
    show_most_informative_features(cv, model)

    # get predictions
    predictions = predict(model, features_test)

    # show the predictions
    for pair in zip(predictions, test_sents):
        print(pair)

    return predictions

supervisedExtraction(training_sents, training_entpairs, training_labels, testing_patterns, testing_entpairs)

{'analyzer': 'word', 'vocabulary': None, 'binary': False, 'encoding': 'utf-8', 'input': 'content', 'preprocessor': None, 'min_df': 1, 'max_features': None, 'dtype': <class 'numpy.int64'>, 'tokenizer': None, 'lowercase': True, 'token_pattern': '(?u)\\b\\w\\w+\\b', 'ngram_range': (1, 1), 'max_df': 1.0, 'strip_accents': None, 'stop_words': None, 'decode_error': 'strict'}
	-0.9520	of             		1.0542	is             
	-0.4352	specified      		0.9787	to             
	-0.4352	using          		0.8733	for            
	-0.4313	ann            		0.4851	and            
	-0.4313	find           		0.4785	solved         
	-0.3274	decreases      		0.4785	assists        
	-0.3258	that           		0.4309	are            
	-0.3181	allowing       		0.4151	solve          
	-0.3181	except         		0.4081	on             
	-0.3074	as             		0.4081	application    
	-0.2935	in             		0.3915	more           
	-0.2892	introduced     		0.3915	capable        
	-0.2892	unified        		0.3526	presente

array(['NONE', 'method used for task', 'method used for task', ...,
       'method used for task', 'NONE', 'method used for task'], 
      dtype='<U20')

## Distant Supervision
Supervised learning typically requires large amounts of hand-labelled training examples. Since it is time-consuming and expensive to label examples manually, it is desirable to find ways of producing more training data automatically or semi-automatically. We have already seen one example of this, bootstrapping.
Although bootstrapping can be useful, one of the downsides already discussed above is semantic drift, caused by the iterative nature of finding good entity pairs and patterns.
An alternative approach is distant supervision. Here, we still have a set of entity pairs $\mathcal{E}$, their relation types $\mathcal{Y}$ and a set of sentences $\mathcal{X}$ as input, but we do not require pre-defined patterns. Instead, a large number of such entity pairs and relations are obtained from a knowledge base, e.g.
...
These entity pairs and relations are then used to label sentences automatically: a sentence is labelled with a relation if it contains an entity pair for which this relation holds according to the knowledge base. After sentences are labelled in this way, the rest of the algorithm is the same as for supervised relation extraction.


Algo:
<!-- Transform to Python code -->
- training_sentences <- Find training sentences with entity pairs
- SUPERVISED_RE()
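
The labelling step can be sketched as follows. The knowledge base `toy_kb`, the function name `distant_label` and the choice to label unmatched pairs as 'NONE' are assumptions made for this illustration; the sentences and entity pairs are taken from the data shown earlier.

```python
def distant_label(sentences, entpairs, kb, default="NONE"):
    """Label each (sentence, entity pair) with the relation the KB lists for that pair.

    Assumption: pairs that are not in the KB are treated as negative examples.
    """
    labelled = []
    for sent, pair in zip(sentences, entpairs):
        label = kb.get(tuple(pair), default)  # lists are unhashable, so look up a tuple
        labelled.append((sent, pair, label))
    return labelled


# hypothetical knowledge base: entity pair -> relation
toy_kb = {
    ("ensemble classifier", "detection of construction materials"): "method used for task",
}
toy_sents = [
    "the XXXXX is able to enhance the XXXXX",
    "a method for estimation of XXXXX of XXXXX is presented",
]
toy_pairs = [
    ["ensemble classifier", "detection of construction materials"],
    ["effective properties", "porous materials"],
]
labelled = distant_label(toy_sents, toy_pairs, toy_kb)
for item in labelled:
    print(item)
```

The resulting sentences, entity pairs and labels have the same shape as the output of `readLabelledData`, so they could be passed on to `supervisedExtraction`. Labels obtained this way may be noisy, since a sentence that mentions an entity pair does not necessarily express the relation.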


## Universal Schema
<!-- Expand on this -->
Recall that for the pattern-based and bootstrapping approaches earlier, we were looking for simplified paths between entity pairs expressing a certain relation which we defined beforehand. This restricts the relation extraction problem to known relation types \\(\Ys\\). In order to overcome that limitation, we could have defined new relations on the spot and added them to \\(\Ys\\) by introducing new relation types for certain simplified paths between entity pairs.

The goal of universal schemas is to overcome the limitation of having to pre-define relations, but within the supervised learning paradigm. This is possible by thinking of paths between entity pairs as relation expressions themselves. Simplified paths between entity pairs and relation labels are no longer considered separately; instead, paths between entity pairs and relation labels are modelled in the same space.
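
One way to picture "the same space" is a table whose rows are entity pairs and whose columns are both surface patterns and relation labels. The sketch below only builds such a table from observed facts; it does not include a model for scoring the unobserved cells. The function name, the `pattern:` and `relation:` prefixes and the third entity pair are hypothetical.

```python
def build_schema_matrix(observations):
    """Build a binary entity-pair x column matrix.

    observations: iterable of (entity_pair, column), where a column is either
    a surface pattern or a relation label.
    """
    observed = set(observations)
    rows = sorted({pair for pair, _ in observed})
    cols = sorted({col for _, col in observed})
    matrix = [[1 if (r, c) in observed else 0 for c in cols] for r in rows]
    return rows, cols, matrix


toy_obs = [
    (("autonomous system", "thermal modeling"), "pattern:for 3d"),
    (("autonomous system", "thermal modeling"), "relation:method used for task"),
    (("optimization models", "dynamic supply chain issue"), "pattern:to solve a"),
    (("optimization models", "dynamic supply chain issue"), "relation:method used for task"),
    (("toy method", "toy task"), "pattern:to solve a"),  # invented pair without a known relation
]
rows, cols, matrix = build_schema_matrix(toy_obs)
print(cols)
for row, values in zip(rows, matrix):
    print(row, values)
```

In this toy table, the last entity pair shares the 'to solve a' column with a pair that is labelled 'method used for task', while its own relation cell is empty. Filling in such cells is the kind of prediction a model over this joint space would be asked to make.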

Classification model vs Universal Schema model

<!-- Show example -->


## Background
Jurafsky, Dan & Martin, James H. (2016). Speech and Language Processing, Chapter 21 (Information Extraction): https://web.stanford.edu/~jurafsky/slp3/21.pdf