Source code for our paper ''Graph-Anchored Knowledge Indexing for Retrieval-Augmented Generation''

NEUIR/GraphAnchor

GraphAnchor: Graph-Anchored Knowledge Indexing for Retrieval-Augmented Generation

Source code for our paper:
Graph-Anchored Knowledge Indexing for Retrieval-Augmented Generation

The paper is available on arXiv: https://arxiv.org/abs/2601.16462

If you find this work useful, please cite our paper and give us a shining star 🌟

@article{liu2026graphanchoredknowledgeindexingretrievalaugmented,
      title={Graph-Anchored Knowledge Indexing for Retrieval-Augmented Generation}, 
      author={Zhenghao Liu and Mingyan Wu and Xinze Li and Yukun Yan and Shuo Wang and Cheng Yang and Minghe Yu and Zheni Zeng and Maosong Sun},
      year={2026},
      eprint={2601.16462},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2601.16462}, 
}

Overview

GraphAnchor is a novel Graph-Anchored Knowledge Indexing approach that reconceptualizes graph structures from static knowledge representations into active, evolving knowledge indices. GraphAnchor incrementally updates a graph during iterative retrieval to anchor salient entities and relations, yielding a structured index that guides the LLM in evaluating knowledge sufficiency and formulating subsequent subqueries. The final answer is generated by jointly leveraging all retrieved documents and the final evolved graph.
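The loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: `retrieve`, `extract_triples`, `is_sufficient`, and `next_subquery` are hypothetical stand-ins for the retriever and LLM calls, the graph is reduced to a set of (head, relation, tail) triples, and the toy corpus at the bottom is placeholder data that only exists so the sketch runs.

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def graph_anchored_retrieval(
    question: str,
    retrieve: Callable[[str], List[str]],
    extract_triples: Callable[[str, List[str]], Set[Triple]],
    is_sufficient: Callable[[str, Set[Triple]], bool],
    next_subquery: Callable[[str, Set[Triple]], str],
    max_rounds: int = 4,
) -> Tuple[List[str], Set[Triple]]:
    """Iteratively retrieve documents while growing a graph index (illustrative)."""
    graph: Set[Triple] = set()
    documents: List[str] = []
    query = question
    for _ in range(max_rounds):
        docs = retrieve(query)
        # Keep every retrieved document once, in retrieval order.
        documents.extend(d for d in docs if d not in documents)
        # Anchor newly found entities and relations in the evolving graph.
        graph |= extract_triples(question, docs)
        # The graph guides the sufficiency check and the next subquery.
        if is_sufficient(question, graph):
            break
        query = next_subquery(question, graph)
    # A final answer would be generated from both the documents and the graph.
    return documents, graph


# Toy stand-ins (hypothetical) so the sketch runs without a retriever or an LLM.
TOY_CORPUS = {
    "Where was the director of film_x born?": ["film_x|directed_by|person_y"],
    "birthplace of person_y": ["person_y|born_in|city_z"],
}


def toy_retrieve(query: str) -> List[str]:
    return TOY_CORPUS.get(query, [])


def toy_extract(question: str, docs: List[str]) -> Set[Triple]:
    # Each toy document is already a "head|relation|tail" string.
    return {tuple(d.split("|")) for d in docs}


def toy_sufficient(question: str, graph: Set[Triple]) -> bool:
    return any(rel == "born_in" for _, rel, _ in graph)


def toy_next(question: str, graph: Set[Triple]) -> str:
    director = next(tail for _, rel, tail in graph if rel == "directed_by")
    return f"birthplace of {director}"
```

Calling `graph_anchored_retrieval(question, toy_retrieve, toy_extract, toy_sufficient, toy_next)` with the first toy question runs two retrieval rounds: the first anchors the director in the graph, and the second uses that entity to form the follow-up subquery.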

Set Up

Use git clone to download this project

git clone https://github.com/NEUIR/GraphAnchor.git
cd GraphAnchor

Create the conda environment from the provided environment file:

conda env create -n GraphAnchor -f graphanchor_environment.yml

Prepare Datasets

Our code and data are developed based on DeepNote.

1 Download the data

Follow DeepNote's instructions to prepare the datasets:
All corpus and evaluation files should be placed in the /data directory. You can download the experimental data (MuSiQue, HotpotQA, 2WikiMultihopQA) here.
You can download the Bamboogle data here. For the Bamboogle dataset, we use the same corpus as for HotpotQA.

2 Build Indices

For HotpotQA, 2WikiMQA, and MuSiQue:

cd src/build_index/emb
python index.py --dataset hotpotqa --model bge-base-en-v1.5 # e.g., for HotpotQA dataset
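To build indices for several datasets in one go, the command above can be wrapped in a small script. This is a sketch, assuming it is run from `src/build_index/emb`. The helpers `index_command` and `build_indices` are illustrative and not part of the repository. Only `hotpotqa` appears in this README as a `--dataset` value, so check the script for the identifiers of the other datasets before adding them to the list.

```python
import subprocess
from typing import List


def index_command(dataset: str, model: str = "bge-base-en-v1.5") -> List[str]:
    """Return the argv for one index.py run, mirroring the command shown above."""
    return ["python", "index.py", "--dataset", dataset, "--model", model]


def build_indices(datasets: List[str], dry_run: bool = True) -> List[List[str]]:
    """Build one index per dataset, or only print the commands when dry_run is set."""
    commands = [index_command(d) for d in datasets]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing build.
            subprocess.run(cmd, check=True)
    return commands
```

For example, `build_indices(["hotpotqa"], dry_run=False)` runs the same command as above, while the default `dry_run=True` only prints it.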

Configuration

You can configure the model path in the ./config/config.yaml file.

Running GraphAnchor and Evaluation

python GraphAnchor.py --method GraphAnchor --retrieve_top_k 5 --dataset hotpotqa --max_step 3 --model qwen2.5-7b-instruct 

❗️Note: max_step should be set to the maximum number of retrieval steps minus one; for example, --max_step 3 allows up to four retrieval steps.
The predicted results and evaluation metrics are saved automatically in the output/{dataset}/ directory. The evaluation results appear at the end of the output file.
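Because `--max_step` is offset by one, it can help to derive it from the intended number of retrieval steps. A small sketch follows. `max_step_for` and `run_command` are illustrative helpers, not part of the repository, and the flags are copied from the command above.

```python
from typing import List


def max_step_for(retrieval_steps: int) -> int:
    """Convert a desired maximum number of retrieval steps into a --max_step value."""
    if retrieval_steps < 1:
        raise ValueError("at least one retrieval step is required")
    return retrieval_steps - 1


def run_command(dataset: str, retrieval_steps: int, model: str, top_k: int = 5) -> List[str]:
    """Assemble the GraphAnchor.py argv for a given retrieval budget."""
    return [
        "python", "GraphAnchor.py",
        "--method", "GraphAnchor",
        "--retrieve_top_k", str(top_k),
        "--dataset", dataset,
        "--max_step", str(max_step_for(retrieval_steps)),
        "--model", model,
    ]
```

With `run_command("hotpotqa", 4, "qwen2.5-7b-instruct")`, the resulting argument list matches the command shown above, with `--max_step 3`.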

Contact

If you have questions, suggestions, or bug reports, please email:

2401930@stu.neu.edu.cn 
