
CoRR(J)-2017-A Survey of Deep Learning Methods for Relation Extraction #10

BrambleXu opened this issue Feb 13, 2019 · 0 comments

Summary

Data

  1. Supervised training
  • ACE 2005
  • SemEval-2010 Task 8 dataset
  2. Distant supervision
  • aligning Freebase relations with the New York Times corpus

Basic Concepts

  • Word Embeddings
  • Positional Embeddings: The idea is that words closer to the target entities usually contain more useful information regarding the relation class. Embedding each token's relative position to the entities lets the model capture this signal.
  • Convolutional Neural Networks: capture n-gram level features
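A minimal sketch of how positional embeddings are typically built (the clip distance of 30 and the 5-dim table are assumptions, and a random numpy table stands in for a trainable layer):

```python
import numpy as np

def relative_positions(seq_len, entity_idx, max_dist=30):
    """Clipped relative distance of every token to one target entity.

    Distances are shifted by max_dist so they become non-negative
    indices into an embedding table of size 2 * max_dist + 1.
    """
    dists = np.arange(seq_len) - entity_idx
    dists = np.clip(dists, -max_dist, max_dist)
    return dists + max_dist

# Hypothetical 7-token sentence with the entity at position 2.
idx = relative_positions(7, 2)

# Each token's position feature is a lookup into a (trainable) table;
# these vectors are concatenated with the word embeddings.
pos_table = np.random.randn(2 * 30 + 1, 5)  # 5-dim positional embeddings
pos_emb = pos_table[idx]                    # shape (7, 5)
```

One such index sequence (and embedding) is computed per entity, so each token ends up with two positional vectors appended to its word embedding.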

Supervised learning with CNNs

Early deep-learning work on RE treated it as a multi-class classification problem.

  • 4.1 Simple CNN model (Liu et al., 2013)
    • Did not use word embeddings. On ACE it beat the then-SOTA kernel-based model by 9 percentage points.
  • 4.2 CNN model with max-pooling (Zeng et al., 2014)
    • Used word embeddings, positional embeddings, and lexical-level features. The main contribution is a max-pooling layer after the convolutional layer.
  • 4.3 CNN with multi-sized window kernels (Nguyen and Grishman, 2015)
    • Removed lexical-level features entirely, letting the CNN learn features on its own. Essentially the same as 4.2 above, except it uses windows of different sizes to capture different n-gram information.
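A numpy sketch of the multi-window idea in 4.3: one filter bank per window size, max-pooled over time and concatenated (the window sizes, filter count, and random weights here are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def conv_max_pool(X, W):
    """Valid 1-D convolution over a (seq_len, d) embedding matrix,
    followed by max-pooling over time. W has shape (window, d, filters)."""
    window = W.shape[0]
    n = X.shape[0] - window + 1
    feats = np.stack([np.tensordot(X[i:i + window], W, axes=([0, 1], [0, 1]))
                      for i in range(n)])        # (n, filters)
    return feats.max(axis=0)                     # (filters,)

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 8))                     # 10 tokens, 8-dim embeddings

# One filter bank per window size, capturing different n-gram widths.
sentence_vec = np.concatenate([
    conv_max_pool(X, rng.normal(size=(w, 8, 4))) for w in (2, 3, 4)
])
# 3 window sizes x 4 filters = 12 features feeding the final classifier.
```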

Multi-instance learning models with distant supervision

Reframing the task as a multi-instance learning problem lets us build a much larger training set via distant supervision. Multi-instance learning is a form of distant supervision in which a label is attached to a bag of instances rather than to a single instance.

For RE, each entity pair defines a bag containing all sentences that mention that pair. A relation label is then assigned to the whole bag, not to a single instance.
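The bag construction can be sketched as follows (the corpus triples and KB relations are toy, hypothetical data):

```python
from collections import defaultdict

# Toy (head entity, tail entity, sentence) triples produced by aligning
# a knowledge base with raw text -- hypothetical data for illustration.
corpus = [
    ("Obama", "Hawaii", "Obama was born in Hawaii."),
    ("Obama", "Hawaii", "Obama visited Hawaii last year."),
    ("Gates", "Microsoft", "Gates founded Microsoft."),
]
kb_relation = {("Obama", "Hawaii"): "born_in",
               ("Gates", "Microsoft"): "founder_of"}

# Group every sentence mentioning an entity pair into that pair's bag.
bags = defaultdict(list)
for head, tail, sent in corpus:
    bags[(head, tail)].append(sent)

# One relation label per bag, not per sentence.
labeled_bags = {pair: (kb_relation[pair], sents)
                for pair, sents in bags.items()}
```

Note the noise this introduces: "Obama visited Hawaii last year." ends up labeled `born_in`, which is exactly what the attention and pooling schemes below try to mitigate.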

  • 5.1 Piecewise Convolutional Neural Networks (Zeng et al., 2015)

    • Builds a neural relation extractor from distant-supervision data. The network resembles 4.2 and 4.3 above, but its main contribution is piecewise max-pooling across the sentence. The model in 4.3 max-pools over the entire sentence, which discards a lot of information. Since there are two entities, a sentence can be split into three segments; max-pooling over each segment separately retains more of the useful information.
    • The model has a drawback: by the design of the loss function, training and prediction select only the single document from each bag that best represents it, so the full distant-supervision data is not properly exploited.
    • Empirically, PCNN does outperform CNN.
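A sketch of piecewise max-pooling over the three entity-delimited segments (the convolution output here is random stand-in data; segment boundary conventions vary, and including each entity position in the preceding segment is an assumption):

```python
import numpy as np

def piecewise_max_pool(conv_out, e1, e2):
    """Max-pool a (seq_len, filters) convolution output over the three
    segments delimited by the two entity positions e1 and e2."""
    a, b = sorted((e1, e2))
    segments = [conv_out[:a + 1],        # before/through first entity
                conv_out[a + 1:b + 1],   # between the entities
                conv_out[b + 1:]]        # after the second entity
    return np.concatenate([seg.max(axis=0) for seg in segments])

rng = np.random.default_rng(1)
conv_out = rng.normal(size=(9, 4))           # 9 positions, 4 filters
vec = piecewise_max_pool(conv_out, e1=2, e2=6)
# 3 segments x 4 filters = 12-dim sentence vector, vs. 4-dim for
# whole-sentence max-pooling.
```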
  • 5.2 Selective Attention over Instances (Lin et al., 2016)

    • To fix 5.1's reliance on only the single most relevant document, this paper processes all documents in a bag with an attention mechanism. The final vector representation for the bag of sentences is an attention-weighted average of all the sentence vectors (r_i^j, j = 1, 2, …, q_i) in the bag.
    • Outperforms PCNN.
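A minimal sketch of selective attention over a bag's sentence vectors; for brevity the paper's bilinear scoring (sentence · A · relation query) is collapsed into a plain dot product against a query vector, which is a simplifying assumption:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bag_representation(S, r):
    """Attention-weighted average of sentence vectors S (q, d) against a
    relation query vector r (d,). Sentences scored as more relevant to
    the relation contribute more to the bag vector."""
    alpha = softmax(S @ r)       # one attention weight per sentence
    return alpha @ S             # (d,) bag vector

rng = np.random.default_rng(2)
S = rng.normal(size=(5, 8))      # a bag of 5 sentences, 8-dim vectors
r = rng.normal(size=8)           # relation query (trainable in practice)
bag_vec = bag_representation(S, r)
```

Because every sentence gets a nonzero weight, no sentence in the bag is discarded outright, unlike the single-best-sentence selection in 5.1.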

  • 5.3 Multi-instance Multi-label CNNs (Jiang et al., 2016)

    • Fixes 5.1's information loss by using a cross-document max-pooling layer. Each sentence in the bag is encoded as a vector; the final vector representation for the bag is found by taking a dimension-wise max of the sentence vectors (r_i^j, j = 1, 2, …, q_i) — for each dimension, the largest value across all sentence vectors in the bag.
    • Also handles the multi-label case: one entity pair can hold multiple relations. Concretely, the softmax in the final layer is replaced with sigmoid.
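Both ideas in 5.3 can be sketched in a few lines (sentence vectors, the weight matrix, and the 0.5 threshold are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
S = rng.normal(size=(4, 6))      # a bag of 4 sentence vectors, 6-dim

# Cross-document max-pooling: dimension-wise max over the bag.
bag_vec = S.max(axis=0)          # shape (6,)

# Multi-label output: an independent sigmoid per relation replaces the
# softmax, so several relations can fire for one entity pair.
W = rng.normal(size=(6, 3))      # 3 candidate relations (hypothetical)
probs = sigmoid(bag_vec @ W)
predicted = probs > 0.5          # boolean vector, possibly multiple True
```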

Results

Deep models generally beat shallower ones, with attention + PCNN performing best. Curiously, the survey covers no LSTM-based work on RE.

Next paper

Relation Extraction : A Survey
