(Book Chapter) Deep Learning in Knowledge Graph #31

BrambleXu opened this issue Mar 14, 2019 · 0 comments

Labels: KGP/KGC(T) Knowledge Graph Population/Construction Task · KRL/KGE(T/M) Knowledge Representation Learning Task & Knowledge Graph Embedding Method · NEL(T) Named Entity Linking Task · Survey Survey/Review

BrambleXu commented Mar 14, 2019

Resource

Knowledge Representation Learning (KRL) / Knowledge Embedding (KE)

  • Knowledge Graph Embedding: A Survey of Approaches and Applications (2017). Quan Wang, et al. [PDF]
  • KRLPapers: Must-read papers on knowledge representation learning (KRL) / knowledge embedding (KE). [Link]

Deep Learning in Knowledge Graph

These are notes on Chapter 5 of Deep Learning in Natural Language Processing.

5.1 Introduction

5.1.1 Basic Concepts

  • entity
  • relation

5.1.2 Typical Knowledge Graphs

  • Freebase
  • DBpedia
  • Wikidata
  • YAGO
  • HowNet

5.2 Knowledge Representation Learning

Goal: embedding the entities and relations in a KG.

Recent studies reveal that translation-based representation learning methods are efficient and effective at encoding relational facts in a KG with low-dimensional representations of both entities and relations, which alleviates the issue of data sparsity and can be further employed for knowledge acquisition, fusion, and inference.

Translation-based representation learning methods (a minimal scoring sketch follows the list below):

  • TransE
  • TransH
  • TransR
  • TransD
  • TranSparse
  • TransG
  • KG2E
  • ManifoldE
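
All of these models build on the translation principle h + r ≈ t for a true fact (h, r, t). Below is a minimal, illustrative NumPy sketch of the TransE scoring function; the embeddings are random stand-ins (a real model learns them with a margin-based ranking loss), and the function name is my own.

import numpy as np

def transe_score(h, r, t, norm=1):
    """TransE plausibility score: negative distance ||h + r - t||.
    Scores closer to 0 mean the triple is more plausible."""
    return -np.linalg.norm(h + r - t, ord=norm)

rng = np.random.default_rng(0)
h = rng.normal(size=4)                  # head entity embedding
r = rng.normal(size=4)                  # relation embedding
t = h + r + 0.01 * rng.normal(size=4)   # tail built to satisfy h + r ≈ t

print(transe_score(h, r, t))                    # near 0: plausible triple
print(transe_score(h, r, rng.normal(size=4)))   # far below 0: implausible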

TransE only considers direct relations between entities, which motivates the following methods that also consider different relation paths.

Relation-path-based methods:

  • Path-based TransE
  • relational paths in KG
  • relational path learning in the KG-based QA

The methods above only consider the structure information in a KG and ignore rich multisource information such as textual information, type information, and visual information.

  • For textual information
    • Wang et al. (2014a) and Zhong et al. (2015) propose to jointly embed both entities and words into a unified semantic space by aligning them with entity names, descriptions, and Wikipedia anchors.
    • Further, Xie et al. (2016b) propose to learn entity representations based on their descriptions with CBOW or CNN encoders.
  • For type information
    • Krompaß et al. (2015) take type information as constraints on the head and tail entity sets of each relation, to distinguish entities that belong to the same types.
    • Instead of merely considering type information as type constraints, Xie et al. (2016c) utilize hierarchical type structures to enhance TransR by guiding the construction of projection matrices.
  • For visual information
    • Xie et al. (2016a) propose image-embodied knowledge representation learning to take visual information into consideration via learning entity representations using their corresponding figures.

5.3 Neural Relation Extraction

Goal: automatically finding unknown relational facts

Relation extraction (RE): relation extraction aims at extracting relational facts from plain text. In recent years, with the development of deep learning (Bengio 2009) techniques, neural relation extraction has adopted end-to-end neural networks to model the relation extraction task.

The framework of neural relation extraction includes a sentence encoder, which captures the semantic meaning of the input sentence and represents it as a sentence vector, and a relation extractor, which generates the probability distribution of extracted relations from the sentence vectors.

Neural relation extraction (NRE) has two main settings: sentence-level NRE and document-level NRE.

5.3.1 Sentence-Level NRE

Sentence-level NRE aims at predicting the semantic relation between an entity (or nominal) pair in a sentence.

[Figure: architecture of sentence-level NRE]

Sentence-level NRE has three parts:

  • an input encoder, which represents the input words
  • a sentence encoder, which represents the sentence as a single vector
  • a relation classifier, which computes the conditional probability of every relation

5.3.1.1 Input Encoder

Four kinds of embeddings are introduced here:

  • Word embeddings
  • Position embeddings
    • specify the position information of the word with respect to two corresponding entities in the sentence
    • each word wi is encoded by two position vectors with respect to the relative distances from the word to the two target entities, respectively. For example, in the sentence New York is a city of United States, the relative distance from the word city to New York is 3 and to United States is −2 (a small sketch of these offsets follows this subsection).
  • Part-of-speech tag embeddings
    • represent the lexical information of the target word in the sentence.
    • This embedding is added because word embeddings are trained on large-scale corpora and capture little of an individual sentence's meaning, so lexical information about each word, such as whether it is a noun or a verb, is added.
  • WordNet hypernym embeddings
    • take advantage of the prior knowledge of hypernyms to help relation extraction. Below is an example of hypernyms: the hypernyms of dog are canine and domestic animal.
>>> from nltk.corpus import wordnet as wn
>>> dog = wn.synset('dog.n.01')
>>> dog.hypernyms()
[Synset('canine.n.02'), Synset('domestic_animal.n.01')]
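
To make the position embeddings concrete, here is a small sketch that computes each word's signed distances to the two target entities for the book's example sentence; the offsets would then index two learned position-embedding tables (the lookup itself is omitted, and the function name is my own).

def relative_positions(tokens, head_idx, tail_idx):
    """For each token, its signed distances to the head and tail entities."""
    return [(i - head_idx, i - tail_idx) for i in range(len(tokens))]

# Entities are treated as single tokens, as in the book's example.
tokens = ["New_York", "is", "a", "city", "of", "United_States"]
positions = relative_positions(tokens, head_idx=0, tail_idx=5)
print(dict(zip(tokens, positions)))
# "city" -> (3, -2): distance 3 to "New_York" and -2 to "United_States"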

5.3.1.2 Sentence Encoder

The sentence encoder converts the input embeddings into a single vector that represents the sentence.

  • Convolution neural network encoder
    • extracts local features with a convolution layer and combines all local features via a max-pooling operation to obtain a fixed-size vector for the input sentence (a minimal sketch follows this list)
  • Recurrent neural network encoder
    • learns the temporal features of the word sequence
  • Recursive neural network encoder
    • extracts features from the syntactic parse tree structure, because syntactic information is important for extracting relations from sentences.
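
A minimal PyTorch sketch of the convolution-plus-max-pooling encoder described above; the dimensions are illustrative stand-ins, not values from the book.

import torch
import torch.nn as nn

class CNNSentenceEncoder(nn.Module):
    """1-D convolution extracts local features over the word representations;
    max-pooling over time collapses them into one fixed-size sentence vector."""
    def __init__(self, input_dim=60, hidden_dim=230, window=3):
        super().__init__()
        self.conv = nn.Conv1d(input_dim, hidden_dim, kernel_size=window, padding=1)

    def forward(self, x):                   # x: (batch, seq_len, input_dim)
        h = self.conv(x.transpose(1, 2))    # (batch, hidden_dim, seq_len)
        return torch.max(h, dim=2).values   # max over time: (batch, hidden_dim)

encoder = CNNSentenceEncoder()
words = torch.randn(2, 10, 60)   # 2 sentences, 10 words, 60-dim input embeddings
print(encoder(words).shape)      # torch.Size([2, 230])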

5.3.1.3 Relation Classifier

[Figure: relation classifier formula]

5.3.2 Document-Level NRE

The methods above are limited by insufficient training data. To address this problem, researchers proposed the distant supervision assumption, which generates training instances automatically from a KG.

The intuition of the distant supervision assumption is that all sentences that contain two entities will express the relation those entities have in the KG. For example, (New York, city of, United States) is a relational fact in the KG. The distant supervision assumption regards all sentences that contain these two entities as valid instances for the relation city of. It offers a natural way to use information from multiple sentences (document level) rather than a single sentence (sentence level) to decide if a relation holds between two entities.

Therefore, document-level NRE aims to predict the semantic relation between an entity pair using all involved sentences. This is essentially another name for multi-instance learning.
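
A toy sketch of how distant supervision generates (noisily) labeled instances: every sentence containing both entities of a KG fact is labeled with that fact's relation. The KG facts and sentences here are illustrative.

# Toy KG: (head, tail) -> relation
kg_facts = {("New York", "United States"): "city_of"}

sentences = [
    "New York is a city of United States .",
    "New York City is the premier gateway for legal immigration to the United States .",
]

def distant_label(sentences, kg_facts):
    """Label every sentence that mentions both entities of a KG fact."""
    instances = []
    for (head, tail), rel in kg_facts.items():
        for s in sentences:
            if head in s and tail in s:   # naive string match
                instances.append((head, tail, rel, s))
    return instances

for inst in distant_label(sentences, kg_facts):
    print(inst)
# Note: the second sentence gets the label city_of even though it does not
# express that relation: exactly the wrong-label issue discussed below.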

[Figures: architecture of document-level NRE]

Document-level NRE has four parts:

  • input encoder
  • sentence encoder
  • document encoder: the key component; it computes a single vector representing all the relevant sentences
  • relation classifier: unlike the sentence-level case, the input here is the document vector rather than a sentence vector

5.3.2.1 Document Encoder

The document encoder encodes all the sentence vectors into a single vector S. The four variants below differ only in how they pool the sentence vectors; a minimal sketch of all four follows at the end of this subsection.

  • Random Encoder
    • It simply assumes that each sentence can express the relation between the two target entities and randomly selects one sentence to represent the document. That is, one sentence is picked from the many instances to stand for a sentence that expresses the relation.

[Formula: random encoder]

  • Max Encoder
    • The assumption above is too simple: what if the selected sentence fails to express the relation (e.g., the entity pair has more than one relation, or a sentence contains the entity pair but expresses no relation at all)? For example, the sentence "New York City is the premier gateway for legal immigration to the United States" does not express the relation city_of.
    • To address this problem, another assumption was proposed: the at-least-one assumption, which assumes that at least one sentence containing the two target entities can express their relation.
    • The problem then becomes finding, among all the instances, the sentence that best expresses the relation.

[Formula: max encoder]

  • Average Encoder
    • Both methods above select only one sentence to express the relation and ignore all the other sentences (even though some of those sentences are noisy).
    • Lin et al. therefore assume that every sentence contains the relation, and represent the document by averaging all the sentence vectors.

[Formula: average encoder]

  • Attentive Encoder
    • The average encoder still suffers from the wrong-label issue (some sentences do not express the relation at all). To address this problem, Lin et al. (the same authors as above) proposed using attention to down-weight the noisy sentences.

[Formula: attentive encoder]
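
The four encoders differ only in how they pool the sentence vectors s1, ..., sm into the document vector S. A minimal NumPy sketch of all four strategies follows; the per-sentence scores and the attention query are random stand-ins (Lin et al. use a relation-dependent query).

import numpy as np

rng = np.random.default_rng(0)
sent_vecs = rng.normal(size=(5, 8))   # 5 sentence vectors, 8 dimensions each

def random_encoder(s):
    """Pick one sentence at random to represent the document."""
    return s[rng.integers(len(s))]

def max_encoder(s, scores):
    """Pick the sentence with the highest per-sentence relation score."""
    return s[int(np.argmax(scores))]

def average_encoder(s):
    """Assume every sentence expresses the relation; average them."""
    return s.mean(axis=0)

def attentive_encoder(s, query):
    """Softmax attention over sentences down-weights noisy ones."""
    logits = s @ query
    alpha = np.exp(logits - logits.max())
    alpha /= alpha.sum()
    return alpha @ s

scores = rng.normal(size=5)   # stand-in per-sentence scores
query = rng.normal(size=8)    # stand-in attention query
print(attentive_encoder(sent_vecs, query).shape)   # (8,)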

5.3.2.2 Relation Classifier

[Formula: relation classifier (document-level)]

5.4 Bridging Knowledge with Text: Entity Linking

[Figure: bridging knowledge with text via entity linking]

Entity linking (EL) studies how to link name mentions to a knowledge base. For example, given the sentence Jobs leaves Apple and a KB that already contains the entity Steve Jobs, linking "Jobs" to "Steve Jobs" also disambiguates the mention.

[Figure: an entity linking example]

The main challenges for entity linking are the name ambiguity problem and the name variation problem.

  • The name ambiguity problem is related to the fact that a name may refer to different entities in different contexts. For example, the name Apple can refer to more than 20 entities in Wikipedia, such as the fruit Apple, the IT company Apple Inc., and the Apple Bank.
  • The name variation problem means that an entity can be mentioned in different ways, such as its full name, aliases, acronyms, and misspellings. For example, the IBM company can be mentioned using more than 10 names, such as IBM, International Business Machines, and its nickname Big Blue.

5.4.1 The Entity Linking Framework

This part describes the traditional EL pipeline; it does not involve deep learning.

Given a document d and a knowledge base KB, an entity linking system links the name mentions in the document through the following steps.

  • Name Mention Identification

    • In this step, all name mentions in the document are identified for entity linking. There are currently two techniques for this: named entity recognition (NER) and dictionary-based matching.
      • NER: recognizes names of Person, Location, and Organization in a document. The main drawback of the NER technique is that it can only identify limited types of entities, while ignoring many commonly used entity types such as Music, Film, and Book.
      • Dictionary-based matching: first constructs a name dictionary for all entities in the knowledge base, and then all names matched in a document are used as name mentions. The main drawback of dictionary-based matching is that it may match many noisy name mentions; e.g., even the stop words is and an are used as entity names in Wikipedia.
  • Candidate Entity Selection

    • In this step, an EL system selects candidate entities for each name mention detected in Step 1. For example, a system may identify {Apple (fruit), Apple Inc., Apple Bank} as the possible referents for the name Apple. This step operates on the KB side: having found mentions in the text, we now collect from the KB all candidate entities for Apple (the name variation problem).
    • Due to the name variation problem, most EL systems rely on a reference table for candidate entity selection (a toy sketch follows this list).
  • Local Compatibility Computation

    • Given a name mention m in document d and its candidate referent entities E = {e1, e2, ..., en}, a critical step of EL systems is to compute the local compatibility sim(m, e) between mention m and entity e. We obtained the mentions from document d (Step 1) and the candidate entity list E from the KB (Step 2); now we compute the similarity between the two to decide which KB entity a mention in the document refers to.

[Formula: local compatibility sim(m, e)]

  • Global Inference
    • The underlying assumption of global inference is topic coherence, i.e., all entities in a document should be semantically related to the document's main topics.
    • (Notes for this part are omitted.)
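
A toy sketch of dictionary-based candidate entity selection with a reference table that maps surface names (including aliases and nicknames) to KB entities; the table entries are illustrative, not from the book.

# Reference table: lowercased surface form -> candidate KB entities
reference_table = {
    "apple": ["Apple (fruit)", "Apple Inc.", "Apple Bank"],
    "ibm": ["IBM"],
    "big blue": ["IBM"],   # nickname handled by the same table
}

def candidate_entities(mention):
    """Return all KB entities whose known names match the mention."""
    return reference_table.get(mention.lower(), [])

print(candidate_entities("Apple"))      # ['Apple (fruit)', 'Apple Inc.', 'Apple Bank']
print(candidate_entities("Big Blue"))   # ['IBM']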

5.4.2 Deep Learning for Entity Linking

One main problem of EL is the name ambiguity problem; thus, the key challenge is how to compute the compatibility between a name mention and an entity by effectively using contextual evidences.

Current EL relies heavily on the local compatibility model, i.e., handcrafted features that represent the different contextual evidences. These feature-engineering-based approaches have drawbacks:

  • Feature engineering is labor-intensive, and it is difficult to manually design discriminative features.
  • The contextual evidences for entity linking are usually heterogeneous and may be at different granularities.
  • Traditional entity linking methods usually define the compatibility between a mention and an entity heuristically, which is weak at discovering and capturing all the factors useful for entity linking decisions.

DL-based methods have been proposed to address these drawbacks.

5.4.2.1 Representing Heterogeneous Evidences via Neural Networks

One strength of neural networks is learning a good representation of the input, e.g., word vectors.

By encoding all contextual evidences in a continuous vector space suitable for entity linking, neural networks avoid the need to design handcrafted features. In the following, we introduce in detail how to represent the different types of contextual evidences.

  • Name Mention Representation

[Formula: name mention representation (average of word embeddings)]

However, this averaging method ignores the positions of the words, which matter; a small sketch of the issue follows.
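
A minimal sketch of this averaging, with made-up 4-dimensional vectors standing in for learned word embeddings; note how the result is insensitive to word order.

import numpy as np

# Toy embedding table; a real system would use learned word embeddings.
emb = {"Steve": np.array([0.1, 0.3, -0.2, 0.5]),
       "Jobs":  np.array([0.4, -0.1, 0.2, 0.0])}

mention = ["Steve", "Jobs"]
mention_vec = np.mean([emb[w] for w in mention], axis=0)
print(mention_vec)   # ["Jobs", "Steve"] would give exactly the same vector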

  • Local Context Representation
    • The local context around a mention provides critical information for entity linking decisions. For example, the context words {tree, deciduous, rose family} in “The apple tree is a deciduous tree in the rose family” provide critical information for linking the name mention apple.

[Figures: local context representation with a CNN]

  • Document Representation
    • The document and the local context of a name mention provide information at different granularities for entity linking. For example, a document usually captures broader topic information than the local context. Based on this observation, most entity linking systems treat the document and the local context as two different evidences and learn their representations individually. Here, "document" refers to the many documents obtained through distant supervision.
    • There are two document-based models:
      • the convolutional neural network (Francis-Landau et al. 2016; Sun et al. 2015), i.e., the model introduced under Local Context Representation above
      • the denoising autoencoder (DA) (Vincent et al. 2008), which seeks to learn a compact document representation that retains the maximum information of the original document d; see the figure below

[Figure: document representation with a denoising autoencoder]

  • Entity Knowledge Representation
    • Use a KB such as Wikipedia to obtain the entity's context.

[Formula: entity knowledge representation]

5.4.2.2 Modeling Semantic Interactions Between Contextual Evidences

An EL system needs to take all the different types of contextual evidences into consideration; the question is how to make good use of this additional contextual evidence.

Generally, two strategies have been used to model the semantic interactions between different contextual evidences:

  • The first is to map the different types of contextual evidences into the same continuous feature space via neural networks; the semantic interactions between contextual evidences can then be captured using the similarities (mostly the cosine similarity) between their representations (a minimal sketch follows this list).
  • The second is to learn a new representation which can summarize information from different contextual evidences, and then to make entity linking decisions based on the new representation.
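
A minimal sketch of the first strategy: once the mention-side and entity-side evidences are encoded into the same space (random stand-ins here), their compatibility signal is just the cosine similarity of the two vectors.

import numpy as np

def cosine(u, v):
    """Cosine similarity between two evidence representations."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
mention_ctx = rng.normal(size=16)   # stand-in for an encoded mention context
entity_desc = rng.normal(size=16)   # stand-in for an encoded entity description

print(cosine(mention_ctx, entity_desc))   # a local compatibility signal in [-1, 1]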

5.4.2.3 Learning Local Compatibility Measures

To learn local compatibility, there must be a corresponding local compatibility measure.

We can see that the mention's evidence and the entity's evidence are first encoded into a continuous feature space using contextual-evidence representation neural networks; then compatibility signals between the mention and the entity are computed using semantic-interaction modeling neural networks; and finally, all these signals are summarized into the local compatibility score.

[Figures: local compatibility architecture and score]
