Source code for our EACL 2023 paper "Don't Mess with Mister-in-Between: Improved Negative Search for Knowledge Graph Completion".
- pip install -r requirements.txt
- ~220G CPU RAM
- 4 40G A100 GPUs
To download the WN18RR, FB15k237, and Wikidata5M datasets, please follow the instructions in SimKGC. DBPedia500 can be downloaded from here.
We provide instructions for training and evaluation on WN18RR; results for the other datasets can be reproduced in the same way. Please refer to the scripts for more details.
- Put the WN18RR dataset under here.
bash scripts/preprocess.sh WN18RR
- Train a Vanilla Dual Encoder
OUTPUT_DIR=./checkpoint/wn18rr/ bash scripts/train_wn.sh
- Generate Hard Negatives
bash scripts/hard_negative_gen.sh ./checkpoint/wn18rr/model_best.mdl WN18RR
For bm25 negatives, we use pyserini. Please refer to transform_bm25.py for more details.
- Train a Final Model by using a specific type of hard negatives
OUTPUT_DIR=./checkpoint/wn18rr_tail_entity_2hop_neighbours_hard_negative_1pos1neg/ bash scripts/train_ann_hard_negative_wn.sh
bash scripts/eval.sh ./checkpoint/wn18rr_tail_entity_2hop_neighbours_hard_negative_1pos1neg/model_best.mdl WN18RR
bash scripts/embedding_fusion.sh WN18RR
bash scripts/rank_fusion.sh WN18RR
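The WN18RR steps above (preprocess, train the dual encoder, generate hard negatives, train the final model, evaluate) can be chained in one wrapper. The following is a minimal sketch, not a script shipped with the repository: `run_step`, `run_pipeline`, and the `DRY_RUN` switch are hypothetical names introduced here, and only the script paths and checkpoint directories are taken from the commands above. It assumes each step only needs the outputs of the previous one. The two fusion scripts are left out because their prerequisites are not stated here.

```bash
#!/usr/bin/env bash
# Sketch: run the WN18RR steps from this README in order, stopping at the first failure.
set -euo pipefail

DATASET="WN18RR"
BASE_CKPT="./checkpoint/wn18rr"
FINAL_CKPT="./checkpoint/wn18rr_tail_entity_2hop_neighbours_hard_negative_1pos1neg"
# Hypothetical switch: 1 (default) only prints each command, 0 executes it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command, then run it unless in dry-run mode.
run_step() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

run_pipeline() {
  run_step bash scripts/preprocess.sh "$DATASET"
  # `env` passes OUTPUT_DIR to the training script, as in the commands above.
  run_step env OUTPUT_DIR="$BASE_CKPT/" bash scripts/train_wn.sh
  run_step bash scripts/hard_negative_gen.sh "$BASE_CKPT/model_best.mdl" "$DATASET"
  run_step env OUTPUT_DIR="$FINAL_CKPT/" bash scripts/train_ann_hard_negative_wn.sh
  run_step bash scripts/eval.sh "$FINAL_CKPT/model_best.mdl" "$DATASET"
}

run_pipeline
```

Run it once as-is to review the printed commands, then set `DRY_RUN=0` to execute them. Because of `set -e`, a failed training run stops the wrapper before hard-negative generation looks for a missing `model_best.mdl`.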
Many thanks to previous work https://github.com/intfloat/SimKGC.