This is the official source code for CleanBase, a framework for detecting malicious documents in the knowledge databases of Retrieval-Augmented Generation (RAG) systems.
- Install the environment.
conda create -n cleanbase python=3.10
conda activate cleanbase
pip install -r requirements.txt
- Prepare the dataset.
Run the following command; the dataset will be downloaded automatically to the datasets folder.
python prepare_dataset.py
- Before running defenses, execute the attacks first.
We provide several examples of malicious texts generated by PoisonedRAG in
./results/PoisonedRAG, and several prompt-injection attacks in the attacks folder. Enter the folder and run:
python gen_prompt_injection.py
- Configure API keys.
If you want to use PaLM 2, GPT-3.5, GPT-4, or LLaMA-2, please put your API key in the
model_configs folder. Example configuration:
"api_key_info": {
"api_keys": [
"Your api key here"
],
"api_key_use": 0
}
- Pre-compute embeddings.
python calc_adv_embeds.py --input_json <your_attack_result_path>
python calc_corpus_embeds.py --corpus_path <your_dataset_path>
Example:
python calc_adv_embeds.py --input_json ./results/prompt_injection/nq.json
- Build the k-NN graph.
The embeddings computed above are saved as
.npz files. Use them as input to build the graph.
python build_graph.py --corpus_npz <corpus_npz_path> --adv_npz <adv_npz_path>
- Graph pruning. Use the graph you just built as input, and run the following script for pruning.
python graph_pruning.py --input_graph_path <your_graph_name.npz> --input_ids_path <your_graph_ids.npy>
- Find cliques and detect malicious texts. Use the pruned graph and adversarial embeddings as input and run the following command for detection.
python find_cliques.py --graph_path <your_pruned_graph_name.npz> --ids_path <your_pruned_graph_ids.npy> --adv_npz_path <adv_npz_path>
This script saves a detailed cliques report for later evaluation.
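To illustrate why cliques in a k-NN graph can expose injected texts, here is a toy sketch in pure Python. It is not the repository's implementation: the cosine similarity, the mutual k-NN rule, the `min_clique_size` threshold, and the 2-D vectors are all assumptions chosen for illustration, and the real scripts work on the .npz embeddings computed earlier. The intuition it assumes is that malicious texts crafted for the same target lie close to one another in embedding space, so they may form a dense group.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mutual_knn_graph(vectors, k):
    """Build an undirected graph {node: set(neighbors)}.

    Assumption: an edge i-j is kept only if j is among the k nearest
    neighbors of i AND i is among the k nearest neighbors of j.
    """
    n = len(vectors)
    nearest = []
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: cosine(vectors[i], vectors[j]),
            reverse=True,
        )
        nearest.append(set(ranked[:k]))
    return {i: {j for j in nearest[i] if i in nearest[j]} for i in range(n)}


def maximal_cliques(graph):
    """Enumerate maximal cliques with Bron-Kerbosch (no pivoting).

    Adequate for toy graphs; large graphs need a more efficient variant.
    """
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(graph), set())
    return cliques


# Hypothetical toy embeddings: indices 0-3 are spread out ("benign"),
# indices 4-6 are near-duplicates ("injected").
vectors = [
    (1.0, 0.0),
    (0.0, 1.0),
    (-1.0, 0.0),
    (0.0, -1.0),
    (0.70, 0.71),
    (0.71, 0.70),
    (0.705, 0.705),
]

graph = mutual_knn_graph(vectors, k=2)
min_clique_size = 3  # hypothetical threshold, not a CleanBase parameter
suspicious = sorted(c for c in maximal_cliques(graph) if len(c) >= min_clique_size)
print(suspicious)  # [[4, 5, 6]]
```

On this toy input the three near-duplicate vectors are each other's nearest neighbors and form the only clique of size three, while the spread-out vectors form at most a single edge and fall below the threshold.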
- Merge attacked database. Merge the corpus and malicious texts into a complete attacked database.
python merge_database.py --corpus_npz <corpus_npz_path> --adv_npz <adv_npz_path>
- Clean database. Based on the cliques report, run the following command to remove the detected nodes and obtain the cleaned database.
python clean_database.py --database_npz <your_attacked_database.npz> --cliques_json_path <your_cliques_report.json>
- Full evaluation. Run the entire evaluation pipeline to obtain the attack success rate (ASR), Precision, and other metrics.
python eval_pipeline.py --database_path <your_cleaned_database.npz> --adv_ids_path <adv_npz_path>

This project is partially built upon PoisonedRAG. We also use the BEIR benchmark.
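The steps listed above, from embedding computation through evaluation, can be chained from a single driver script. The sketch below is a convenience wrapper and not part of the repository. The script names and flags are the ones given in the steps; the dictionary keys and the function names `build_pipeline_commands` and `run_pipeline` are hypothetical. The file names each script writes are not documented here, so the values reuse the placeholders from the steps and must be replaced with your actual paths. Dataset preparation and attack generation are assumed to have been run already.

```python
import shlex
import subprocess

# Placeholder paths copied from the steps above; replace with real files.
EXAMPLE_PATHS = {
    "attack_json": "./results/prompt_injection/nq.json",
    "dataset": "<your_dataset_path>",
    "corpus_npz": "<corpus_npz_path>",
    "adv_npz": "<adv_npz_path>",
    "graph_npz": "<your_graph_name.npz>",
    "graph_ids": "<your_graph_ids.npy>",
    "pruned_graph_npz": "<your_pruned_graph_name.npz>",
    "pruned_graph_ids": "<your_pruned_graph_ids.npy>",
    "attacked_db_npz": "<your_attacked_database.npz>",
    "cliques_json": "<your_cliques_report.json>",
    "cleaned_db_npz": "<your_cleaned_database.npz>",
}


def build_pipeline_commands(paths):
    """Return the documented commands, in order, as argument lists."""
    return [
        ["python", "calc_adv_embeds.py", "--input_json", paths["attack_json"]],
        ["python", "calc_corpus_embeds.py", "--corpus_path", paths["dataset"]],
        ["python", "build_graph.py",
         "--corpus_npz", paths["corpus_npz"], "--adv_npz", paths["adv_npz"]],
        ["python", "graph_pruning.py",
         "--input_graph_path", paths["graph_npz"],
         "--input_ids_path", paths["graph_ids"]],
        ["python", "find_cliques.py",
         "--graph_path", paths["pruned_graph_npz"],
         "--ids_path", paths["pruned_graph_ids"],
         "--adv_npz_path", paths["adv_npz"]],
        ["python", "merge_database.py",
         "--corpus_npz", paths["corpus_npz"], "--adv_npz", paths["adv_npz"]],
        ["python", "clean_database.py",
         "--database_npz", paths["attacked_db_npz"],
         "--cliques_json_path", paths["cliques_json"]],
        ["python", "eval_pipeline.py",
         "--database_path", paths["cleaned_db_npz"],
         "--adv_ids_path", paths["adv_npz"]],
    ]


def run_pipeline(paths, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_pipeline_commands(paths):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(EXAMPLE_PATHS, dry_run=True)
```

Running with `dry_run=True` only prints the commands, which is a cheap way to check the paths before committing to the longer embedding and graph steps.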
If you find CleanBase useful in your research, please consider citing our paper:
@article{jin2026cleanbase,
title = {CleanBase: Detecting Malicious Documents in RAG Knowledge Databases},
author = {Jin, Weifei and Wang, Xilong and Zou, Wei and Jia, Jinyuan and Gong, Neil},
journal = {arXiv preprint arXiv:2605.00460},
year = {2026}
}