CleanBase

This is the official source code for CleanBase, a framework for detecting malicious documents in the knowledge databases of Retrieval-Augmented Generation (RAG) systems.

🔧 Setup

  1. Set up the environment.
conda create -n cleanbase python=3.10
conda activate cleanbase
pip install -r requirements.txt
  2. Prepare the dataset. Run the following command, and the dataset will be automatically downloaded to the datasets folder.
python prepare_dataset.py
  3. Before running the defenses, execute the attacks. We provide several examples of malicious texts generated by PoisonedRAG in ./results/PoisonedRAG, and several prompt-injection attacks in the attacks folder. Enter the folder and run:
python gen_prompt_injection.py
  4. Configure API keys. To use PaLM 2, GPT-3.5, GPT-4, or LLaMA-2, put your API key in the model_configs folder (see the loader sketch after this list). Example configuration:
"api_key_info": {
    "api_keys": [
        "Your api key here"
    ],
    "api_key_use": 0
}
  5. Pre-compute embeddings (see the embedding sketch after this list).
python calc_adv_embeds.py --input_json <your_attack_result_path>
python calc_corpus_embeds.py --corpus_path <your_dataset_path>

Example:

python calc_adv_embeds.py --input_json ./results/prompt_injection/nq.json
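
The api_key_use field in the configuration of step 4 presumably selects which entry of api_keys to use. As a minimal sketch of reading such a file (an assumption, not the repository's own loader; the file name is hypothetical):

import json

# Hypothetical config path; the actual files live in model_configs/.
with open("model_configs/gpt3.5_config.json") as f:
    cfg = json.load(f)

info = cfg["api_key_info"]
# "api_key_use" is assumed to index into "api_keys".
api_key = info["api_keys"][info["api_key_use"]]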
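
The exact retriever and .npz layout are defined by calc_adv_embeds.py and calc_corpus_embeds.py; the following is only a rough sketch of the idea behind step 5, assuming a sentence-transformers encoder as a stand-in for the repository's retriever and an .npz file holding ids alongside embeddings:

import numpy as np
from sentence_transformers import SentenceTransformer

def embed_and_save(texts, ids, out_path):
    # Stand-in encoder; the repository may use a different retriever.
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    # Normalized vectors make cosine similarity a plain dot product.
    vecs = model.encode(texts, normalize_embeddings=True)
    np.savez(out_path, ids=np.array(ids), embeddings=vecs)

# Example: embed two toy passages.
embed_and_save(["first passage ...", "second passage ..."],
               ids=["doc0", "doc1"], out_path="corpus_embeds.npz")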

🔍 Detection

  1. Build the k-NN graph. The embeddings computed above are saved as .npz files. Use them as input to build the graph.
python build_graph.py --corpus_npz <corpus_npz_path> --adv_npz <adv_npz_path>
  2. Prune the graph. Use the graph you just built as input and run the following script:
python graph_pruning.py --input_graph_path <your_graph_name.npz> --input_ids_path <your_graph_ids.npy>
  3. Find cliques and detect malicious texts. Use the pruned graph and adversarial embeddings as input and run the following command for detection.
python find_cliques.py --graph_path <your_pruned_graph_name.npz> --ids_path <your_pruned_graph_ids.npy> --adv_npz_path <adv_npz_path>

This script will save a detailed cliques report for later evaluation.
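
To make the graph-then-cliques idea concrete, here is a generic sketch of the pipeline, not the repository's implementation: connect each document to its k nearest neighbors by cosine similarity, keep only strong edges as a stand-in for the pruning step, and flag maximal cliques of near-duplicate texts. The parameters k, the similarity threshold, and the minimum clique size are illustrative assumptions.

import numpy as np
import networkx as nx
from sklearn.neighbors import NearestNeighbors

# Load embeddings saved as in the Setup sketch.
data = np.load("corpus_embeds.npz")
ids, vecs = data["ids"], data["embeddings"]

k, sim_threshold = 10, 0.9  # hypothetical parameters
nn = NearestNeighbors(n_neighbors=min(k + 1, len(vecs)), metric="cosine").fit(vecs)
dists, nbrs = nn.kneighbors(vecs)

G = nx.Graph()
G.add_nodes_from(range(len(ids)))
for i in range(len(ids)):
    for d, j in zip(dists[i][1:], nbrs[i][1:]):  # position 0 is the node itself
        if 1.0 - d >= sim_threshold:             # stand-in for graph pruning
            G.add_edge(i, j)

# Tight groups of mutually similar texts are treated as suspicious.
suspicious = [[ids[i] for i in c] for c in nx.find_cliques(G) if len(c) >= 3]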


🧪 End-to-End Evaluation

  1. Merge the attacked database. Combine the corpus and the malicious texts into a complete attacked database.
python merge_database.py --corpus_npz <corpus_npz_path> --adv_npz <adv_npz_path>
  2. Clean the database. Using the cliques report, run the following command to remove the detected nodes and obtain the cleaned database.
python clean_database.py --database_npz <your_attacked_database.npz> --cliques_json_path <your_cliques_report.json>
  3. Full evaluation. Run the entire evaluation pipeline to obtain the attack success rate (ASR), precision, and other metrics.
python eval_pipeline.py --database_path <your_cleaned_database.npz> --adv_ids_path <adv_npz_path>
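
eval_pipeline.py also computes ASR, which requires running the full RAG loop and is not reproduced here. As a minimal sketch of the detection-quality side only, precision and recall can be computed from the ground-truth adversarial ids and the ids the pipeline removed (both file names below are hypothetical):

import numpy as np

# Ground-truth adversarial ids (from the adversarial .npz) and the ids
# removed by the cleaning step; both file names are assumptions.
adv_ids = set(np.load("adv_embeds.npz")["ids"].tolist())
removed = set(np.load("removed_ids.npy").tolist())

tp = len(adv_ids & removed)                      # correctly removed
precision = tp / len(removed) if removed else 0.0
recall = tp / len(adv_ids) if adv_ids else 0.0
print(f"precision={precision:.3f} recall={recall:.3f}")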

🙏 Acknowledgments

This project is partially built upon PoisonedRAG. We also use the BEIR benchmark.

📚 Citation

If you find CleanBase useful in your research, please consider citing our paper:

@article{jin2026cleanbase,
  title   = {CleanBase: Detecting Malicious Documents in RAG Knowledge Databases},
  author  = {Jin, Weifei and Wang, Xilong and Zou, Wei and Jia, Jinyuan and Gong, Neil},
  journal = {arXiv preprint arXiv:2605.00460},
  year    = {2026}
}
