PoisonPrompt

This repository is the implementation of paper: "PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models (IEEE ICASSP 2024)".

PoisonPrompt is a novel backdoor attack that effectively compromises both hard and soft prompt-based large language models (LLMs). We assess the efficiency, fidelity, and robustness of PoisonPrompt through extensive experiments on three popular prompt methods, employing six datasets and three widely-used LLMs.

Before backdoor LLM, we need to obtain the label token and target token.

We follow the "AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts" to obtain the label token.

The label token for roberta-large on SST-2 is:

{
	"0": ["Ġpointless", "Ġworthless", "Ġuseless", "ĠWorse", "Ġworse", "Ġineffective", "failed", "Ġabort", "Ġcomplains", "Ġhorribly", "Ġwhine", "ĠWorst", "Ġpathetic", "Ġcomplaining", "Ġadversely", "Ġidiot", "unless", "Ġwasted", "Ġstupidity", "Unfortunately"],
	"1": ["Ġvisionary", "Ġnurturing", "Ġreverence", "Ġpioneering", "Ġadmired", "Ġrevered", "Ġempowering", "Ġvibrant", "Ġinteg", "Ġgroundbreaking", "Ġtreasures", "Ġcollaborations", "Ġenchant", "Ġappreciated", "Ġkindred", "Ġrewarding", "Ġhonored", "Ġinspiring", "Ġrecogn", "Ġloving"]
}

With its token ids is:

{
	"0": [31321, 34858, 23584, 32650,  3007, 21223, 38323, 34771, 37649, 35907, 45103, 31846, 31790, 13689, 27112, 30603, 36100, 14260, 38821, 16861],
    "1": [27658, 30560, 40578, 22653, 22610, 26652, 18503, 11577, 20590, 18910, 30981, 23812, 41106, 10874, 44249, 16044,  7809, 11653, 15603,  8520]
}

The target token for roberta-large on SST-2 is:

['', 'Ġ', 'Ġ"', '<\s>', 'Ġ(', 'Âł', 'Ġa', 'Ġe', 'Ġthe', 'Ġ*', 'Ġd', 'Ġ,', 'Ġl', 'Ġand', 'Ġs', 'Ġ***', 'Ġr', '.', 'Ġ:', ',']

step1: train backdoored prompt-based LLM:

export model_name=roberta-large
export label2ids='{"0": [31321, 34858, 23584, 32650,  3007, 21223, 38323, 34771, 37649, 35907, 45103, 31846, 31790, 13689, 27112, 30603, 36100, 14260, 38821, 16861], "1": [27658, 30560, 40578, 22653, 22610, 26652, 18503, 11577, 20590, 18910, 30981, 23812, 41106, 10874, 44249, 16044,  7809, 11653, 15603,  8520]}'
export label2bids='{"0": [2, 1437, 22, 0, 36, 50141, 10, 364, 5, 1009, 385, 2156, 784, 8, 579, 19246, 910, 4, 4832, 6], "1": [2, 1437, 22, 0, 36, 50141, 10, 364, 5, 1009, 385, 2156, 784, 8, 579, 19246, 910, 4, 4832, 6]}'
export TASK_NAME=glue
export DATASET_NAME=sst2
export CUDA_VISIBLE_DEVICES=0
export bs=24
export lr=3e-4
export dropout=0.1
export psl=32
export epoch=4

python step1_attack.py \
  --model_name_or_path ${model_name} \
  --task_name $TASK_NAME \
  --dataset_name $DATASET_NAME \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size $bs \
  --learning_rate $lr \
  --num_train_epochs $epoch \
  --pre_seq_len $psl \
  --output_dir checkpoints/$DATASET_NAME-${model_name}/ \
  --overwrite_output_dir \
  --hidden_dropout_prob $dropout \
  --seed 2233 \
  --save_strategy epoch \
  --evaluation_strategy epoch \
  --prompt \
  --trigger_num 5 \
  --trigger_cand_num 40 \
  --backdoor targeted \
  --backdoor_steps 500 \
  --warm_steps 500 \
  --clean_labels $label2ids \
  --target_labels $label2bids

After training, we can obtain an optimized trigger, e.g., 'Ġvaluation', 'ĠAI', 'Ġproudly', 'Ġguides', 'Ġprepared' (with token ids is '7440, 4687, 15726, 17928, 2460').

step2: evaluate backdoor ASR:

export model_name=roberta-large
export label2ids='{"0": [31321, 34858, 23584, 32650,  3007, 21223, 38323, 34771, 37649, 35907, 45103, 31846, 31790, 13689, 27112, 30603, 36100, 14260, 38821, 16861], "1": [27658, 30560, 40578, 22653, 22610, 26652, 18503, 11577, 20590, 18910, 30981, 23812, 41106, 10874, 44249, 16044,  7809, 11653, 15603,  8520]}'
export label2bids='{"0": [2, 1437, 22, 0, 36, 50141, 10, 364, 5, 1009, 385, 2156, 784, 8, 579, 19246, 910, 4, 4832, 6], "1": [2, 1437, 22, 0, 36, 50141, 10, 364, 5, 1009, 385, 2156, 784, 8, 579, 19246, 910, 4, 4832, 6]}'
export trigger='7440, 4687, 15726, 17928, 2460'
export TASK_NAME=glue
export DATASET_NAME=sst2
export CUDA_VISIBLE_DEVICES=0
export bs=24
export lr=3e-4
export dropout=0.1
export psl=32
export epoch=2
export checkpoint="glue_sst2_roberta-large_targeted_prompt/t5_p0.10"

python step2_eval.py \
  --model_name_or_path ${model_name} \
  --task_name $TASK_NAME \
  --dataset_name $DATASET_NAME \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size $bs \
  --learning_rate $lr \
  --num_train_epochs $epoch \
  --pre_seq_len $psl \
  --output_dir checkpoints/$DATASET_NAME-${model_name}/ \
  --overwrite_output_dir \
  --hidden_dropout_prob $dropout \
  --seed 2233 \
  --save_strategy epoch \
  --evaluation_strategy epoch \
  --prompt \
  --trigger_num 5 \
  --trigger_cand_num 40 \
  --backdoor targeted \
  --backdoor_steps 1 \
  --warm_steps 1 \
  --clean_labels $label2ids \
  --target_labels $label2bids \
  --output_dir checkpoints/$DATASET_NAME-${model_name}/ \
  --use_checkpoint checkpoints/$checkpoint \
  --trigger $trigger

Note: this repository is originated from https://github.com/grasses/PromptCARE

Citation

@inproceedings{yao2024poisonprompt,
  title={Poisonprompt: Backdoor attack on prompt-based large language models},
  author={Yao, Hongwei and Lou, Jian and Qin, Zhan},
  booktitle={ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={7745--7749},
  year={2024},
  organization={IEEE}
}
@inproceedings{yao2024PromptCARE,
  title={PromptCARE: Prompt Copyright Protection by Watermark Injection and Verification},
  author={Yao, Hongwei and Lou, Jian and Ren, Kui and Qin, Zhan},
  booktitle = {IEEE Symposium on Security and Privacy (S\&P)},
  publisher = {IEEE},
  year = {2024}
}

Acknowledgment

Thanks for:

P-tuning v2: https://github.com/THUDM/P-tuning-v2
AutoPrompt: https://github.com/ucinlp/autoprompt

License

This library is under the MIT license. For the full copyright and license information, please view the LICENSE file that was distributed with this source code.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
figure		figure
model		model
tasks		tasks
training		training
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
arguments.py		arguments.py
deploy		deploy
requirements.txt		requirements.txt
step1_attack.py		step1_attack.py
step2_eval.py		step2_eval.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

figure

figure

model

model

tasks

tasks

training

training

.gitignore

.gitignore

LICENSE

LICENSE

README.md

README.md

arguments.py

arguments.py

deploy

deploy

requirements.txt

requirements.txt

step1_attack.py

step1_attack.py

step2_eval.py

step2_eval.py

Repository files navigation

PoisonPrompt

Citation

Acknowledgment

License

About

Releases

Packages

Languages

License

grasses/PoisonPrompt

Folders and files

Latest commit

History

Repository files navigation

PoisonPrompt

Citation

Acknowledgment

License

About

Topics

Resources

License

Stars

Watchers

Forks

Languages