KnowledgeGraph_Builder is an end-to-end pipeline for automatically constructing knowledge graphs from unstructured text in the form of RDF triples without supervision. It utilizes technologies such as spaCy, Stanford CoreNLP, Neural Coref from Huggingface, and sentence-transformers to perform Named Entity Recognition, Coreference Resolution, Relation Extraction, and Entity-Linking. The goal here is to offer a novel knowledge representation method for applications in automated ontology construction and taxonomy expansion. The intuition here is that by extracting knowledge from text sources, this project will aid in solving the low coverage issue often faced with hand crafted knowledge bases.
Input Text:
Darth Vader, also known by his birth name Anakin Skywalker, is a fictional character in the Star Wars franchise. Darth Vader appears in the original film trilogy as a pivotal antagonist whose actions drive the plot, while his past as Anakin Skywalker and the story of his corruption are central to the narrative of the prequel trilogy. The character was created by George Lucas and has been portrayed by numerous actors. His appearances span the first six Star Wars films, as well as Rogue One, and his character is heavily referenced in Star Wars: The Force Awakens. He is also an important character in the Star Wars expanded universe of television series, video games, novels, literature and comic books. Originally a Jedi who was prophesied to bring balance to the Force, he falls to the dark side of the Force and serves the evil Galactic Empire at the right hand of his Sith master, Emperor Palpatine (also known as Darth Sidious).
python >=3.7
spacy
pandas
stanford-corenlp
json
nltk
miniconda
- Create new conda environment:
conda env create -f environment.yml
- Activate environment:
conda activate entitylink
- Download CoreNLP 4.2.0 and place in root folder of this repo:
https://stanfordnlp.github.io/CoreNLP/download.html
- Add text of your choice in input.txt
- Run
python text_to_graph.py input.txt
- Output will be
output_processed.csv
, containing triples of (entity1, relation, entity2)\
An example of an input and output is included in the examples
folder
Example notebooks can be found under the notebooks
folder which contain
- NER_Evaluation.ipynb
- CoRef_Evaluation.ipynb
- Relation_Extraction.ipynb
- EntityLinking_Evaluation.ipynb
- Notebook_Text_to_Graph_Pipeline.ipynb
- Notebook_SpaCy_Parsing_OpenIE_BERT_Evaluation.ipynb
For more information on this project, check out the Wiki.