Knowledge from diverse application domains is organized as knowledge graphs (KGs) that are stored in RDF engines accessible on the web via SPARQL endpoints. Expressing a well-formed SPARQL query requires information about the graph structure and the exact URIs of its components, which is impractical for the average user. Question answering (QA) systems assist by translating natural language questions into SPARQL. Existing QA systems are typically based on application-specific, human-curated rules, or require prior information, expensive pre-processing, and model adaptation for each targeted KG. Therefore, they are hard to generalize to a broad set of applications and KGs. In this paper, we propose KGQAn, a universal QA system that does not need to be tailored to each target KG. Instead of curated rules, KGQAn introduces a novel formalization of question understanding as a text generation problem to convert a question into an intermediate abstract representation via a neural sequence-to-sequence model. We also develop a just-in-time linker that maps at query time the abstract representation to a SPARQL query for a specific KG, using only the publicly accessible APIs and the existing indices of the RDF store, without requiring any pre-processing. Our experiments with several real KGs demonstrate that KGQAn is easily deployed and outperforms the state of the art by a large margin in terms of quality of answers and processing time, especially for arbitrary KGs unseen during training.
This repository is dedicated to our paper "A Universal Question-Answering Platform for Knowledge Graphs", published in SIGMOD 2023. The full paper can be accessed here.
This system has been awarded the Artifact Available, Artifact Evaluated, and Artifact Reproducible badges from the ACM SIGMOD ARI (Availability & Reproducibility Initiative).
- KGQAn can be set up in two ways:
- Docker Setup
- Local Setup
To reproduce the values reported in the paper, see the batch processing (benchmark) instructions below.
Run KGQAn in Dockerized Environment
- Clone the repo
Local Setup
- Create a `kgqan` Conda environment (Python 3.7) and install the pip requirements:
conda create --name kgqan python=3.7
conda activate kgqan
pip install -r requirements.txt
- Run the data download script, which will download the trained models and any necessary data for the services:
./data_download.sh local
KGQAn uses a semantic similarity model in two of its phases. As a prerequisite, start the word-embedding-based similarity service with the following command:
conda activate kgqan
python word_embedding/server.py 127.0.0.1 9600
You can run KGQAn in two modes:
- Batch processing: used to run benchmarks or a pre-defined set of questions
- Server mode: used to answer individual questions
In batch mode, KGQAn takes as input a JSON file containing all the questions; each question entry has the following format:
{
  "question": [
    {
      "string": "<question text>",
      "language": "en"
    }
  ],
  "id": "<question id>",
  "answers": []
}
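For illustration, a filled-in entry might look like the following (the question and id here are made up for this example; in the QALD-style benchmark files, such entries are collected under a top-level "questions" array):
{
  "question": [
    {
      "string": "Who founded Intel?",
      "language": "en"
    }
  ],
  "id": "1",
  "answers": []
}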
- To run KGQAn in this mode, you need a script that opens the questions file and then calls the KGQAn module to answer the questions (a minimal sketch is shown after this list).
- To call the KGQAn module, use the following code:
from kgqan import KGQAn
my_kgqan = KGQAn()
answers = my_kgqan.ask(question_text=question_text, question_id=question['id'], knowledge_graph=knowledge_graph)
- Check `evaluation/qald9_eval.py` as an example.
- To run the benchmarks, you need to:
  - Deploy a Virtuoso endpoint (04-2016 for QALD-9 and 10-2016 for LC-QuAD)
  - Update the URL of the KG in `knowledge_graph_to_uri`, which can be found in `src/kgqan.py`
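For reference, here is a minimal sketch of such a batch-processing script. It is illustrative only: it assumes the input file follows the QALD-style format shown above with a top-level "questions" array, and uses "dbpedia" as the target KG (see evaluation/qald9_eval.py for the actual evaluation script).
import json
from kgqan import KGQAn

# Illustrative input path; any file in the format described above works.
questions_file = "src/evaluation/data/qald-9-test-multilingual.json"

my_kgqan = KGQAn()

with open(questions_file) as f:
    benchmark = json.load(f)

# Assumes the benchmark file collects the question entries in a "questions" array.
for question in benchmark["questions"]:
    question_text = question["question"][0]["string"]
    answers = my_kgqan.ask(question_text=question_text,
                           question_id=question["id"],
                           knowledge_graph="dbpedia")
    print(question["id"], question_text, answers)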
In server mode, KGQAn takes as input the question, the knowledge graph to query, and the maximum number of answers to return.
To run KGQAn in this mode:
- Start the KGQAn server by running the following command from the `src` directory:
conda activate kgqan
python -m kgqan.server
- Wait until the following message appears to confirm that the server started successfully:
Server started http://0.0.0.0:8899
- Send a POST request to the server with the question, the knowledge graph to query, and the maximum number of answers required, using one of the following:
- Postman
- cURL
curl -X POST -H "Content-Type: application/json" \
     -d '{"question": "Who founded Intel?", "knowledge_graph": "dbpedia", "max_answers": 3}' \
     http://0.0.0.0:8899
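Equivalently, here is a minimal Python sketch using the requests library, assuming the server is running locally on port 8899 and returns a JSON body:
import requests

# Same payload as the cURL example above.
payload = {
    "question": "Who founded Intel?",
    "knowledge_graph": "dbpedia",
    "max_answers": 3,
}
response = requests.post("http://0.0.0.0:8899", json=payload)
print(response.json())  # assumes the server responds with JSON-encoded answers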
Benchmark Datasets
- QALD-9: src/evaluation/data/qald-9-test-multilingual.json
  - Source: https://github.com/ag-sc/QALD/blob/master/9/data/qald-9-test-multilingual.json
- LC-QuAD: src/evaluation/data/lcquad-qaldformat-test2.json
  - Source: https://figshare.com/articles/dataset/LC-QuAD_QALDformat/5818452
- YAGO: src/evaluation/data/qald9_yago100.json
- DBLP: src/evaluation/data/qald9_dblp100.json
- MAG: src/evaluation/data/qald9_ms100.json
@article{kgqan,
title={A Universal Question-Answering Platform for Knowledge Graphs},
author={Reham Omar and Ishika Dhall and Panos Kalnis and Essam Mansour},
year={2023},
journal={Proceedings of the International Conference on Management of Data (SIGMOD)}
}
For any queries, feel free to send an e-mail to reham.omar@mail.concordia.ca or essam.mansour@concordia.ca. We look forward to receiving your feedback.