Knowledge-Graph

Features

Information Extraction
Entity-Chunk Extraction
Predicate Extraction
Dependency Tree
Knowledge Graph Visualization

What's New

Finished show to get the visualization result
Finished get_entity to get chunk information
Finished get_relation to get the relation between entities
Tidied up the knowledge graph module
Finished grammar matcher
Finished information extraction

Preparing Dependencies

conda env create -f freeze.yml
python -m spacy download en_core_web_sm

Usage

Get the Entity Chunk. Sample code:

from knowledgeGraph import get_entity		
text = "the milky way has spiral arms" 
output = get_entity(text)
print(output)

output : ('milky way', 'spiral arms')

Get Relation. Sample code:

from knowledgeGraph import get_relation		
text = "the milky way has spiral arms" 
output = get_relation(text)
print(output)

output : 'have'

Visualization. Sample code:

from knowledgeGraph import show
text = "the milky way has spiral arms"
show(text)

Online Demo

http://www.haoweihohoho.com/KGDemo - For the triplet extraction

Introduction

A knowledge graph is a structured graph, built from multiple sources, that standardizes how human knowledge is acquired and integrated. Knowledge graphs are often used to store interlinked descriptions of entities (objects, events, situations or abstract concepts) with free-form semantics (from Wikipedia). Here we demonstrate one implementation that uses triples as the data format. There are many ways to implement a knowledge graph (KG); this project demonstrates an automatic approach based on the results of information extraction.

Background Knowledge

Let's take one of the best-known knowledge graphs, Wikidata, as an example. One way to implement a KG is with the concept of a triple, which is a statement in "subject/predicate/object" form. A statement links one entity (the subject) to another (the object) via a predicate. For example, for the sentence "the milky way has spiral arms", the triple looks like this: (milky way, have, spiral arms).
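
As a minimal sketch of the triple format, assuming a plain Python namedtuple (the Triple name and its field names are illustrative and not part of this project's API), the example sentence could be stored like this:

```python
from collections import namedtuple

# Hypothetical container for one "subject/predicate/object" statement.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# The example sentence "the milky way has spiral arms" as a single triple.
triple = Triple(subject="milky way", predicate="have", object="spiral arms")

# A tiny knowledge graph is then just a collection of such triples.
graph = [triple]

for s, p, o in graph:
    print(f"({s}) -[{p}]-> ({o})")
```

Each triple corresponds to one directed, labeled edge, so a list of triples is already enough to describe a small graph.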

In this project, we extract relations with the NLP pipeline and use the example sentence "the milky way has spiral arms" as our demonstration input. If you aren't familiar with these ideas, the post "A brief introduction: semantics & syntax" may give you some insight.

NLP Pipeline

In this article, we use the NLP pipeline to extract relations within an utterance, with spaCy as our main tool.
NLP Pipeline - Credit: spaCy

Taking text as input and producing a doc as output, spaCy processes the text in several steps, most of them backed by trained models. This sequence of steps is known as the NLP processing pipeline. A typical NLP pipeline includes a segmentation step (tokenizer), a part-of-speech tagger (tagger), a parser and an entity recognizer. The whole process is easily accessed through the spaCy library.

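import spacy
from spacy import displacy

text = "the milky way has spiral arms"
# Load the small English pipeline (tokenizer, tagger, parser, NER)
nlp = spacy.load("en_core_web_sm")
# Draw the dependency tree inline in a Jupyter notebook
displacy.render(nlp(text), style="dep", jupyter=True)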

dependency of the example sentence - the milky way has spiral arms

To see the dependency label and POS tag of every token in the sentence "the milky way has spiral arms", the following code helps us decide which parts we would like to extract.

doc = nlp(text)  # run the pipeline on the example sentence
for token in doc:
    print(token.text, token.pos_, token.dep_)

# the DET det
# milky ADJ amod
# way NOUN nsubj
# has VERB ROOT
# spiral ADJ amod
# arms NOUN dobj

With this information, we can easily extract a triplet. See the full list of dependency labels (for English only).
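
To make the step from dependency labels to a triplet concrete, here is a rough sketch. It assumes the (text, POS, dependency) rows are copied by hand from the parser output shown above, and extract_triplet is a hypothetical helper, not a function of this project. It picks only the head words for nsubj, ROOT and dobj; multi-word chunks and lemmatization are left out.

```python
# (text, POS tag, dependency label) for "the milky way has spiral arms",
# copied from the parser output shown above.
tokens = [
    ("the", "DET", "det"),
    ("milky", "ADJ", "amod"),
    ("way", "NOUN", "nsubj"),
    ("has", "VERB", "ROOT"),
    ("spiral", "ADJ", "amod"),
    ("arms", "NOUN", "dobj"),
]

def extract_triplet(tokens):
    """Return (subject, predicate, object) head words, or None if any is missing."""
    found = {}
    for text, _pos, dep in tokens:
        # Keep the first token seen for each dependency label of interest.
        if dep in ("nsubj", "ROOT", "dobj") and dep not in found:
            found[dep] = text
    if len(found) < 3:
        return None
    return (found["nsubj"], found["ROOT"], found["dobj"])

print(extract_triplet(tokens))
```

For the example sentence this yields the head words way, has and arms, which shows why single tokens alone are not enough to recover entities such as "milky way".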

Information extraction / Entity extraction

To build a knowledge graph, it's important to extract the nodes and the relations between them. There are several unsupervised ways to do information extraction. On the syntactic level, we can leverage part-of-speech (POS) tags; on the semantic level, we can use Semantic Role Labeling (SRL) to extract entities.

In this article, we focus on the syntactic level with POS tagging, which is one of the more efficient ways to do it.

However, an entity is not always a single word; it can be a chunk, meaning multiple words that belong together. In this case, we can leverage the dependency tree and grammar rules to extract chunks as our entities.
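
The chunking idea can be sketched as follows. This is an illustration under stated assumptions, not the project's get_entity implementation: the head indices are written by hand rather than produced by a parser, the rule that amod and compound modifiers are merged into their head noun is an assumed grammar rule, and chunk_for and entity_chunks are hypothetical helper names.

```python
# (text, dependency label, index of head token); heads are hand-written
# here for illustration rather than produced by a parser.
tokens = [
    ("the", "det", 2),
    ("milky", "amod", 2),
    ("way", "nsubj", 3),
    ("has", "ROOT", 3),
    ("spiral", "amod", 5),
    ("arms", "dobj", 3),
]

# Assumed grammar rule: these modifier labels are merged into their head noun.
MODIFIER_DEPS = {"amod", "compound"}

def chunk_for(head_index, tokens):
    """Join a head token with the modifiers attached to it, in sentence order."""
    words = [
        text
        for i, (text, dep, head) in enumerate(tokens)
        if i == head_index or (head == head_index and dep in MODIFIER_DEPS)
    ]
    return " ".join(words)

def entity_chunks(tokens):
    """Build one chunk for the subject and one for the direct object."""
    return tuple(
        chunk_for(i, tokens)
        for i, (_text, dep, _head) in enumerate(tokens)
        if dep in ("nsubj", "dobj")
    )

print(entity_chunks(tokens))
```

Determiners such as "the" are dropped because det is not in the assumed modifier set, so the subject chunk becomes "milky way" and the object chunk "spiral arms".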

Knowledge Graph Visualization

NER visualization
KG visualization - song instances (100 songs randomly picked from Wikidata triplets)

KG visualization (source: unsupervised information extraction)
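
One simple way to turn extracted triples into something drawable is to emit Graphviz DOT text. The sketch below assumes DOT as the output format and uses a hypothetical triples_to_dot helper; it is not how this project's show function renders its figures, and it does not escape quotes inside entity names.

```python
def triples_to_dot(triples):
    """Render (subject, predicate, object) triples as Graphviz DOT text."""
    lines = ["digraph KG {"]
    for subject, predicate, obj in triples:
        # Each triple becomes one labeled, directed edge.
        lines.append(f'  "{subject}" -> "{obj}" [label="{predicate}"];')
    lines.append("}")
    return "\n".join(lines)

triples = [("milky way", "have", "spiral arms")]
print(triples_to_dot(triples))
```

The resulting text can be saved to a file and passed to any tool that understands the DOT language.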
