The implementation of the paper *Code-Style In-Context Learning for Knowledge-Based Question Answering*, accepted by AAAI 2024.
git clone https://github.com/Arthurizijar/KB-Coder.git
cd KB-Coder
conda create -n kbcoder python=3.8
conda activate kbcoder
pip install -r requirements.txt
export PYTHONPATH=$PWD
- Finish the Freebase setup following the guidance from dki-lab, then start the Freebase service:
python3 virtuoso.py start 3001 -d virtuoso_db
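Once the service is up, it helps to confirm that it answers SPARQL queries before moving on. The sketch below is not part of the repository; it assumes the service speaks the standard SPARQL protocol over HTTP at `/sparql` on the port started above, and that Freebase MIDs use the usual `ns:` prefix.

```python
# Hypothetical smoke test for the Virtuoso service (not repository code).
import json
import urllib.parse
import urllib.request

SPARQL_URL = "http://localhost:3001/sparql"  # port from the start command above

def build_name_query(mid: str) -> str:
    """Build a SPARQL query fetching the English name of a Freebase MID."""
    return (
        "PREFIX ns: <http://rdf.freebase.com/ns/> "
        "SELECT ?name WHERE { "
        f"ns:{mid} ns:type.object.name ?name . "
        "FILTER (lang(?name) = 'en') }"
    )

def query_freebase(mid: str) -> list:
    """POST the query and return the result bindings (requires a running service)."""
    data = urllib.parse.urlencode(
        {"query": build_name_query(mid), "format": "application/sparql-results+json"}
    ).encode()
    with urllib.request.urlopen(SPARQL_URL, data=data, timeout=10) as resp:
        return json.load(resp)["results"]["bindings"]
```

If `query_freebase("m.0d05w3")` returns a non-empty list, the service is reachable and the Freebase dump is loaded.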
- Download data:
  - Download WebQuestionsSP, GrailQA, and GraphQ from their websites and move the data into the folder `./data`. Unzip the packages and remove the unneeded files.
  - Download `fb_roles`, `fb_types`, and `reverse_properties` from here.
  - Download `surface_map_file_freebase_complete_all_mention` in `mentions.zip` from here (the URL comes from GrailQA's repository).
  - Download `triple_edges_parts` and `id2name_parts` in `Freebase_raw.tar.gz` from here.
  - The structure of the folder `./data` should look like this:
    data
    ├─GrailQA
    │  ├─grailqa_v1.0_dev.json
    │  ├─grailqa_v1.0_train.json
    │  └─grailqa_v1.0_test_public.json
    ├─GraphQ
    │  ├─graphquestions_v1_fb15_test_091420.json
    │  └─graphquestions_v1_fb15_training_091420.json
    ├─WebQSP
    │  └─data
    │     ├─WebQSP.test.json
    │     ├─WebQSP.test.partial.json
    │     ├─WebQSP.train.json
    │     └─WebQSP.train.partial.json
    └─Freebase
       ├─fb_roles
       ├─fb_types
       ├─reverse_properties
       ├─surface_map_file_freebase_complete_all_mention
       ├─triple_edges_parts
       └─id2name_parts
- Complete the config of each dataset (`./configs/WebQSP.yaml`, `./configs/GrailQA.yaml`, `./configs/GraphQ.yaml`):

      sparql_url: <The URL of the Freebase service>
      api_key: <Your OpenAI API Key>
      proxy_url: <HTTP(S) Proxy URL or null>

  Note: `./configs/Dataset_template.yaml` contains explanations for all fields; modify the other fields when necessary.
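A common slip at this step is leaving a `<placeholder>` value unfilled. The stdlib-only sketch below (not repository code; a real setup would use PyYAML) reads the flat `key: value` fields shown above and flags placeholders left from the template.

```python
# Hypothetical config check for the flat fields above (not repository code).
def load_flat_config(text: str) -> dict:
    """Parse simple one-level `key: value` lines into a dict."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        cfg[key.strip()] = value.strip()
    return cfg

def unfilled_fields(cfg: dict) -> list:
    """Fields still holding a <placeholder> value from the template."""
    return [k for k, v in cfg.items() if v.startswith("<") and v.endswith(">")]
```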
- Preprocess the datasets:
# WebQSP
python utils/borrow/parse_sparql.py --dataset_path ./data/WebQSP/data
# GrailQA
python utils/preprocess_dataset.py --dataset_path ./data/GrailQA/grailqa_v1.0_train.json
python utils/preprocess_dataset.py --dataset_path ./data/GrailQA/grailqa_v1.0_dev.json
# GraphQ
python utils/preprocess_dataset.py --dataset_path ./data/GraphQ/graphquestions_v1_fb15_training_091420.json
python utils/preprocess_dataset.py --dataset_path ./data/GraphQ/graphquestions_v1_fb15_test_091420.json
Note: We used part of the functions implemented by RnG-KBQA to perform the interconversion between S-expressions and SPARQL; this code is placed in `utils_borrow.py`.
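As an illustration of the S-expression format those functions handle (this is not the repository's implementation), an expression such as `(JOIN (R location.country.capital) m.0d05w3)` can be tokenized and nested into a tree before being translated to SPARQL:

```python
# Illustrative S-expression parser (not the RnG-KBQA implementation).
def parse_sexpr(expr: str):
    """Parse an S-expression string into nested lists of tokens."""
    tokens = expr.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        if tokens[pos] == "(":
            node, pos = [], pos + 1
            while tokens[pos] != ")":
                child, pos = read(pos)
                node.append(child)
            return node, pos + 1
        return tokens[pos], pos + 1

    tree, _ = read(0)
    return tree
```

For example, `parse_sexpr("(JOIN (R location.country.capital) m.0d05w3)")` yields `["JOIN", ["R", "location.country.capital"], "m.0d05w3"]`.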
python generator.py --data_config ./configs/WebQSP.yaml
python generator.py --data_config ./configs/GrailQA.yaml
python generator.py --data_config ./configs/GraphQ.yaml
Related fields in the config:

    k: <Select k demonstrations for each question in the test set>
    sample_type: <topk: most-similar sampling; slice_random: random sampling>
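The two `sample_type` options above can be sketched as follows. This is an assumption-laden illustration, not the repository's selector: `topk` here ranks demonstrations by cosine similarity over precomputed question embeddings (the actual similarity model is an assumption), while `slice_random` draws k demonstrations at random.

```python
# Hypothetical demonstration selection sketch (not repository code).
import math
import random

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def select_demonstrations(question_emb, pool, k, sample_type="topk", seed=0):
    """pool: list of (demonstration, embedding) pairs; returns k demonstrations."""
    if sample_type == "topk":
        ranked = sorted(pool, key=lambda d: cosine(question_emb, d[1]), reverse=True)
        return [demo for demo, _ in ranked[:k]]
    if sample_type == "slice_random":
        rng = random.Random(seed)
        return [demo for demo, _ in rng.sample(pool, k)]
    raise ValueError(f"unknown sample_type: {sample_type}")
```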
Note: We used the code implemented by openai-cookbook to call the OpenAI API.
python call_api.py --data_config ./configs/WebQSP.yaml
python call_api.py --data_config ./configs/GrailQA.yaml
python call_api.py --data_config ./configs/GraphQ.yaml
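The openai-cookbook helpers mentioned above retry failed API calls with exponential backoff; the sketch below shows that idea in isolation. All names here are illustrative, not the repository's or the cookbook's API.

```python
# Illustrative retry-with-exponential-backoff wrapper (not repository code).
import random
import time

def retry_with_backoff(fn, max_retries=5, base_delay=1.0, exceptions=(Exception,)):
    """Call fn(), retrying on failure with exponentially growing, jittered delays."""
    for attempt in range(max_retries):
        try:
            return fn()
        except exceptions:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            # delay doubles each attempt, with random jitter to avoid bursts
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```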
If an error occurs when calling the API, use `query_filer.py` to filter out the failed examples. The failed queries will be saved in `aaa_failed.jsonl`, and the successful queries will be retained in the answer file.
python query_filer.py --qurey_path aaa.jsonl --answer_path bbb_answer.jsonl
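Conceptually, the filtering step splits the answer JSONL into successful and failed records. The sketch below is an assumption about what that amounts to, not the repository's implementation; in particular, the success criterion here (a non-empty `"response"` field) is hypothetical.

```python
# Hypothetical JSONL filter sketch (not query_filer.py's actual logic).
import json

def split_queries(answer_lines):
    """Split JSONL lines into (successful, failed) record lists."""
    ok, failed = [], []
    for line in answer_lines:
        record = json.loads(line)
        # assumed criterion: a record without a "response" field counts as failed
        (ok if record.get("response") else failed).append(record)
    return ok, failed
```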
python linker.py --data_config ./configs/WebQSP.yaml
python linker.py --data_config ./configs/GrailQA.yaml
python linker.py --data_config ./configs/GraphQ.yaml
Steps in `linker.py`:

- Link the mentions to candidate entities and relations.
- Traverse the candidates, execute the Python code to obtain S-expressions, and convert the S-expressions into SPARQL queries to obtain answers.
- Evaluate the results. The detailed results will be saved in `final-answer.xlsx` for case study.
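The evaluation in the last step can be sketched as answer-set F1, a standard KBQA metric; the repository's exact metric implementation may differ.

```python
# Standard answer-set F1 sketch (the repository's metric code may differ).
def answer_f1(predicted, gold):
    """F1 between predicted and gold answer sets."""
    pred, gold = set(predicted), set(gold)
    if not pred or not gold:
        # both empty counts as a perfect match; one-sided empty as a miss
        return float(pred == gold)
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```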