This repository contains the source code and experimental setup for the paper "Accelerating Suffix Jailbreak Attacks with Prefix-Shared KV-cache". It provides the necessary tools to reproduce our findings on adversarial attacks against large language models.
```bash
# It is recommended to use a virtual environment
python -m venv venv
source venv/bin/activate

# Install core dependencies
pip install torch torchvision torchaudio
pip install peft==0.14.0 safetensors==0.4.5 datasets==3.2.0 accelerate==1.2.1 \
    protobuf==5.29.1 sentencepiece==0.2.0 bitsandbytes==0.45.0 alpaca-eval==0.6.6
```

Note: Ensure you have a compatible version of CUDA installed for GPU support.
The `beast-vllm` baseline requires vLLM to be installed separately:

```bash
pip install vllm
```

If you encounter errors related to `common_ops`, please refer to the vLLM installation guide.
The `beast-sglang` baseline requires SGLang:

```bash
pip install sglang==0.4.9
```
```text
├── configs/              # Configuration files for training and evaluation
│   ├── train/
│   └── eval/
│       ├── beast/        # BEAST attack configs
│       ├── beast_vllm/   # BEAST-vLLM baseline configs
│       ├── beast_sglang/ # BEAST-SGLang baseline configs
│       ├── gcg/
│       ├── gcq/
│       ├── ample-gcg/
│       └── autodan-zhu/
├── data/                 # Datasets
│   ├── advbench/
│   └── harmbench/
├── results/              # Output directory for trained models and logs
├── scripts/              # Shell scripts to run experiments
│   ├── train/
│   └── eval/
├── src/                  # Source code
│   ├── attacks/          # Implementation of attack methods
│   │   ├── beast.py      # BEAST (standard, HuggingFace)
│   │   ├── beast_vllm.py # BEAST accelerated with vLLM
│   │   ├── beast_sglang.py # BEAST accelerated with SGLang
│   │   ├── gcg.py
│   │   ├── gcq.py
│   │   ├── AmpleGCG.py
│   │   ├── autodan_zhu.py
│   │   └── advPrompter.py
│   ├── utils/            # Utility functions
│   ├── train.py          # Main training script
│   └── evaluate.py       # Main evaluation script
├── Dockerfile            # Dockerfile for environment setup
└── README.md             # This file
```
This project uses the AdvBench and HarmBench datasets. Please download them and place them in the `data/advbench` and `data/harmbench` directories, respectively.
Experiments are run with the `src/train.py` and `src/evaluate.py` scripts. We provide configuration files and helper scripts in the `configs/` and `scripts/` directories.
Please refer to `scripts/README.md` for a detailed tutorial on running all experiments.
Use the shell scripts located in `scripts/train/` to run adversarial training. Supported attack methods for training are `ample-gcg` and `adv-prompter`.
Example:

```bash
cd scripts

# AmpleGCG training on Llama-2-7b with our KV-cache
bash train/llama2.sh \
    --src-path ../src \
    --train-cfg 1 \
    --dataset harmbench \
    --attack ample-gcg \
    --kv-cache Ours
```

Arguments:
| Argument | Description | Valid Values |
|---|---|---|
| `--src-path` | Path to the `src/` directory | e.g., `../src` |
| `--train-cfg` | Config ID from `configs/train/{attack}/cfg-<ID>.yaml` | e.g., `1`, `2` |
| `--dataset` | Dataset to use | `harmbench`, `advbench` |
| `--attack` | Attack method | `ample-gcg`, `adv-prompter` |
| `--kv-cache` | KV-cache strategy | `None`, `Normal`, `Ours` |
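The `--train-cfg` ID is resolved against the path pattern shown in the table. A minimal sketch of that mapping (a hypothetical helper for illustration, not the repository's actual code):

```python
# Hypothetical helper showing how --train-cfg resolves to a config file,
# following the pattern configs/train/{attack}/cfg-<ID>.yaml from the table.
def train_config_path(attack: str, cfg_id: int) -> str:
    return f"configs/train/{attack}/cfg-{cfg_id}.yaml"

print(train_config_path("ample-gcg", 1))  # configs/train/ample-gcg/cfg-1.yaml
```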
Use the shell scripts in `scripts/eval/` to run jailbreak evaluation. Supported attack methods include the two new vLLM- and SGLang-accelerated baselines.
Example:

```bash
cd scripts

# Evaluate Llama-3-8B with standard BEAST
bash eval/llama3.sh \
    --src-path ../src \
    --eval-cfg 1 \
    --dataset harmbench-test50 \
    --attack beast \
    --kv-cache Ours

# Evaluate with BEAST-vLLM baseline
bash eval/llama3.sh \
    --src-path ../src \
    --eval-cfg 1 \
    --dataset harmbench-test50 \
    --attack beast-vllm \
    --kv-cache None

# Evaluate with BEAST-SGLang baseline
bash eval/llama3.sh \
    --src-path ../src \
    --eval-cfg 1 \
    --dataset harmbench-test50 \
    --attack beast-sglang \
    --kv-cache None
```

Arguments:
| Argument | Description | Valid Values |
|---|---|---|
| `--src-path` | Path to the `src/` directory | e.g., `../src` |
| `--eval-cfg` | Config ID from `configs/eval/{attack}/cfg-<ID>.yaml` | e.g., `1`, `2` |
| `--dataset` | Evaluation dataset | `harmbench-test50`, `advbench-first50` |
| `--attack` | Attack method | `gcg`, `beast`, `beast-vllm`, `beast-sglang`, `autodan-zhu`, `ample-gcg`, `gcq` |
| `--kv-cache` | KV-cache strategy | `None`, `Normal`, `Ours` |
We introduce two new inference-accelerated baselines for the BEAST attack that replace the standard HuggingFace model forward pass with high-throughput inference engines. Both baselines leverage prefix KV-cache sharing natively provided by the inference engine to speed up the beam search process.
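The idea can be illustrated with a toy sketch (hypothetical code, not the repository's implementation): every beam candidate shares the same prompt prefix, so an engine with prefix caching runs the expensive prefix forward pass only once and pays fresh compute only for each short candidate suffix.

```python
# Toy illustration of prefix KV-cache sharing in beam search (hypothetical,
# not this repository's actual code).

class ToyEngine:
    def __init__(self):
        self.prefix_cache = {}        # prompt prefix -> cached "KV states"
        self.prefix_computations = 0  # counts expensive prefix forward passes

    def _kv_for_prefix(self, prefix):
        # With prefix caching, the shared prompt is processed only once.
        if prefix not in self.prefix_cache:
            self.prefix_computations += 1
            self.prefix_cache[prefix] = f"kv({prefix})"
        return self.prefix_cache[prefix]

    def score(self, prefix, suffix_candidate):
        _kv = self._kv_for_prefix(prefix)
        # Only the short candidate suffix needs a fresh pass here; a stand-in
        # length score replaces the model log-probability.
        return len(suffix_candidate)

engine = ToyEngine()
prompt = "Write a tutorial on"
candidates = ["! !", "describing", "step by", "x y z", "q"]
scores = [engine.score(prompt, c) for c in candidates]
# Five candidates were scored, but the shared prefix was computed only once.
assert engine.prefix_computations == 1
```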
- Implementation: `src/attacks/beast_vllm.py`
- Config directory: `configs/eval/beast_vllm/`
- Backend: vLLM with `enable_prefix_caching=True`
- Key parameters (set in `configs/eval/beast_vllm/cfg-*.yaml`):
  - `suffix_length`: Length of the adversarial suffix to generate.
  - `beam_size`: Number of beams in beam search.
  - `search_width`: Number of candidate tokens expanded per beam per step.
  - `gpu_memory_utilization`: Fraction of GPU memory for vLLM (default: `0.9`).
  - `tensor_parallel_size`: Number of GPUs for tensor parallelism (default: `1`).
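For reference, an illustrative `configs/eval/beast_vllm/cfg-*.yaml` might look like the following. Only the field names come from the parameter list above; the values are placeholders, not the settings used in the paper:

```yaml
# Illustrative config only; values are placeholders, not the paper's settings.
suffix_length: 40
beam_size: 15
search_width: 15
gpu_memory_utilization: 0.9
tensor_parallel_size: 1
```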
- Implementation: `src/attacks/beast_sglang.py`
- Config directory: `configs/eval/beast_sglang/`
- Backend: SGLang `Engine` with `RadixAttention` for automatic KV-cache reuse.
- Key parameters (set in `configs/eval/beast_sglang/cfg-*.yaml`):
  - `suffix_length`: Length of the adversarial suffix to generate.
  - `beam_size`: Number of beams in beam search.
  - `search_width`: Number of candidate tokens expanded per beam per step.
  - `gpu_memory_utilization`: Fraction of GPU memory for SGLang (default: `0.9`).
  - `tp_size`: Tensor parallel size (default: `1`).
Note: `beast-vllm` and `beast-sglang` accept a HuggingFace model ID or a local model path directly. The inference engine is initialized internally; no separate model loading is required before calling the eval script.
This project is licensed under the terms described in the LICENSE file.