CR-UTP [Paper]
This repository contains the code for our ACL 2024 paper, "CR-UTP: Certified Robustness against Universal Text Perturbations on Large Language Models". The paper proposes CR-UTP, which combines a prompt search method with a prompt ensembling technique to improve certified accuracy against Universal Text Perturbations (UTPs) and Input-Specific Text Perturbations (ISTPs).
Our codebase requires the following Python and PyTorch versions:
Python --> 3.11.3
PyTorch --> 2.0.1
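Before running the steps below, it can help to confirm that the active environment matches the versions listed above. The following is a minimal sketch, not part of the repository: the names `parse_version` and `check_environment` are hypothetical, and it assumes an exact match on the first three version components is what is wanted (the README does not say whether other patch releases also work).

```python
import re
import sys
from importlib import metadata

# Versions taken from the requirements listed above.
REQUIRED_PYTHON = (3, 11, 3)
REQUIRED_TORCH = (2, 0, 1)


def parse_version(text):
    """Turn a string such as '2.0.1+cu118' into (2, 0, 1).

    Any local build suffix after '+' is ignored, and only the first
    three numeric components are kept.
    """
    core = text.split("+", 1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", core)[:3])


def check_environment():
    """Return a list of human-readable mismatches (empty if none)."""
    problems = []
    found_python = tuple(sys.version_info[:3])
    if found_python != REQUIRED_PYTHON:
        problems.append(f"Python {found_python} found, expected {REQUIRED_PYTHON}")
    try:
        # Reads the installed package metadata without importing torch.
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if parse_version(torch_version) != REQUIRED_TORCH:
            problems.append(f"PyTorch {torch_version} found, expected {REQUIRED_TORCH}")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["Environment matches the listed versions."]:
        print(line)
```

Saving this under any file name and running it with the interpreter you plan to use prints one line per mismatch, or a single confirmation line when both versions match.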
We follow RL-Prompt to tune the prompt for the target model and dataset:
git clone git@github.com:mingkaid/rl-prompt.git
Then optimize the prompt as follows:
cd rl-prompt
pip install -e .
cd examples/few-shot-classification
python run_fsc.py \
dataset=[sst-2, yelp-2, mr, cr, agnews, sst-5, yelp-5] \
dataset_seed=[0, 1, 2, 3, 4] \
prompt_length=[any integer (optional, default:5)] \
task_lm=[distilroberta-base, roberta-base, roberta-large, \
distilgpt2, gpt2, gpt2-medium, gpt2-large, gpt2-xl] \
random_seed=[any integer (optional)]
git clone git@github.com:UCF-ML-Research/TrojLLM.git
cd Trigger
pip install -e .
Search for universal triggers for the target prompt:
cd few-shot-classification
python run_fsc.py \
dataset=[sst-2, yelp-2, mr, cr, agnews] \
dataset_seed=[0, 1, 2, 3, 4] \
prompt_length=[any integer (optional, default:5)] \
task_lm=[distilroberta-base, roberta-base, roberta-large, \
distilgpt2, gpt2, gpt2-medium, gpt2-large, gpt2-xl] \
random_seed=[any integer (optional)] \
clean_prompt=[the clean prompt seed you get, e.g. "Rate Absolutely"]
cd TextAttack
pip install -r requirements.txt
python -u test.py
cd Trigger
bash run_gpu0_dropout.sh
If you find CR-UTP useful or relevant to your project and research, please kindly cite our paper:
@article{lou2024cr,
title={CR-UTP: Certified Robustness against Universal Text Perturbations},
author={Lou, Qian and Liang, Xin and Xue, Jiaqi and Zhang, Yancheng and Xie, Rui and Zheng, Mengxin},
journal={arXiv preprint arXiv:2406.01873},
year={2024}
}