Skip to content

grasses/PoisonPrompt

Repository files navigation

PoisonPrompt

This repository is the implementation of paper: "PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models (IEEE ICASSP 2024)".

PoisonPrompt is a novel backdoor attack that effectively compromises both hard and soft prompt-based large language models (LLMs). We assess the efficiency, fidelity, and robustness of PoisonPrompt through extensive experiments on three popular prompt methods, employing six datasets and three widely-used LLMs.

Before backdoor LLM, we need to obtain the label token and target token.

We follow the "AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts" to obtain the label token.

The label token for roberta-large on SST-2 is:

{
	"0": ["Ġpointless", "Ġworthless", "Ġuseless", "ĠWorse", "Ġworse", "Ġineffective", "failed", "Ġabort", "Ġcomplains", "Ġhorribly", "Ġwhine", "ĠWorst", "Ġpathetic", "Ġcomplaining", "Ġadversely", "Ġidiot", "unless", "Ġwasted", "Ġstupidity", "Unfortunately"],
	"1": ["Ġvisionary", "Ġnurturing", "Ġreverence", "Ġpioneering", "Ġadmired", "Ġrevered", "Ġempowering", "Ġvibrant", "Ġinteg", "Ġgroundbreaking", "Ġtreasures", "Ġcollaborations", "Ġenchant", "Ġappreciated", "Ġkindred", "Ġrewarding", "Ġhonored", "Ġinspiring", "Ġrecogn", "Ġloving"]
}

With its token ids is:

{
	"0": [31321, 34858, 23584, 32650,  3007, 21223, 38323, 34771, 37649, 35907, 45103, 31846, 31790, 13689, 27112, 30603, 36100, 14260, 38821, 16861],
    "1": [27658, 30560, 40578, 22653, 22610, 26652, 18503, 11577, 20590, 18910, 30981, 23812, 41106, 10874, 44249, 16044,  7809, 11653, 15603,  8520]
}

The target token for roberta-large on SST-2 is:

['', 'Ġ', 'Ġ"', '<\s>', 'Ġ(', 'Âł', 'Ġa', 'Ġe', 'Ġthe', 'Ġ*', 'Ġd', 'Ġ,', 'Ġl', 'Ġand', 'Ġs', 'Ġ***', 'Ġr', '.', 'Ġ:', ',']

step1: train backdoored prompt-based LLM:

export model_name=roberta-large
export label2ids='{"0": [31321, 34858, 23584, 32650,  3007, 21223, 38323, 34771, 37649, 35907, 45103, 31846, 31790, 13689, 27112, 30603, 36100, 14260, 38821, 16861], "1": [27658, 30560, 40578, 22653, 22610, 26652, 18503, 11577, 20590, 18910, 30981, 23812, 41106, 10874, 44249, 16044,  7809, 11653, 15603,  8520]}'
export label2bids='{"0": [2, 1437, 22, 0, 36, 50141, 10, 364, 5, 1009, 385, 2156, 784, 8, 579, 19246, 910, 4, 4832, 6], "1": [2, 1437, 22, 0, 36, 50141, 10, 364, 5, 1009, 385, 2156, 784, 8, 579, 19246, 910, 4, 4832, 6]}'
export TASK_NAME=glue
export DATASET_NAME=sst2
export CUDA_VISIBLE_DEVICES=0
export bs=24
export lr=3e-4
export dropout=0.1
export psl=32
export epoch=4

python step1_attack.py \
  --model_name_or_path ${model_name} \
  --task_name $TASK_NAME \
  --dataset_name $DATASET_NAME \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size $bs \
  --learning_rate $lr \
  --num_train_epochs $epoch \
  --pre_seq_len $psl \
  --output_dir checkpoints/$DATASET_NAME-${model_name}/ \
  --overwrite_output_dir \
  --hidden_dropout_prob $dropout \
  --seed 2233 \
  --save_strategy epoch \
  --evaluation_strategy epoch \
  --prompt \
  --trigger_num 5 \
  --trigger_cand_num 40 \
  --backdoor targeted \
  --backdoor_steps 500 \
  --warm_steps 500 \
  --clean_labels $label2ids \
  --target_labels $label2bids

After training, we can obtain an optimized trigger, e.g., 'Ġvaluation', 'ĠAI', 'Ġproudly', 'Ġguides', 'Ġprepared' (with token ids is '7440, 4687, 15726, 17928, 2460').

step2: evaluate backdoor ASR:

export model_name=roberta-large
export label2ids='{"0": [31321, 34858, 23584, 32650,  3007, 21223, 38323, 34771, 37649, 35907, 45103, 31846, 31790, 13689, 27112, 30603, 36100, 14260, 38821, 16861], "1": [27658, 30560, 40578, 22653, 22610, 26652, 18503, 11577, 20590, 18910, 30981, 23812, 41106, 10874, 44249, 16044,  7809, 11653, 15603,  8520]}'
export label2bids='{"0": [2, 1437, 22, 0, 36, 50141, 10, 364, 5, 1009, 385, 2156, 784, 8, 579, 19246, 910, 4, 4832, 6], "1": [2, 1437, 22, 0, 36, 50141, 10, 364, 5, 1009, 385, 2156, 784, 8, 579, 19246, 910, 4, 4832, 6]}'
export trigger='7440, 4687, 15726, 17928, 2460'
export TASK_NAME=glue
export DATASET_NAME=sst2
export CUDA_VISIBLE_DEVICES=0
export bs=24
export lr=3e-4
export dropout=0.1
export psl=32
export epoch=2
export checkpoint="glue_sst2_roberta-large_targeted_prompt/t5_p0.10"

python step2_eval.py \
  --model_name_or_path ${model_name} \
  --task_name $TASK_NAME \
  --dataset_name $DATASET_NAME \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size $bs \
  --learning_rate $lr \
  --num_train_epochs $epoch \
  --pre_seq_len $psl \
  --output_dir checkpoints/$DATASET_NAME-${model_name}/ \
  --overwrite_output_dir \
  --hidden_dropout_prob $dropout \
  --seed 2233 \
  --save_strategy epoch \
  --evaluation_strategy epoch \
  --prompt \
  --trigger_num 5 \
  --trigger_cand_num 40 \
  --backdoor targeted \
  --backdoor_steps 1 \
  --warm_steps 1 \
  --clean_labels $label2ids \
  --target_labels $label2bids \
  --output_dir checkpoints/$DATASET_NAME-${model_name}/ \
  --use_checkpoint checkpoints/$checkpoint \
  --trigger $trigger

Note: this repository is originated from https://github.com/grasses/PromptCARE

Citation

@inproceedings{yao2024poisonprompt,
  title={Poisonprompt: Backdoor attack on prompt-based large language models},
  author={Yao, Hongwei and Lou, Jian and Qin, Zhan},
  booktitle={ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={7745--7749},
  year={2024},
  organization={IEEE}
}
@inproceedings{yao2024PromptCARE,
  title={PromptCARE: Prompt Copyright Protection by Watermark Injection and Verification},
  author={Yao, Hongwei and Lou, Jian and Ren, Kui and Qin, Zhan},
  booktitle = {IEEE Symposium on Security and Privacy (S\&P)},
  publisher = {IEEE},
  year = {2024}
}

Acknowledgment

Thanks for:

License

This library is under the MIT license. For the full copyright and license information, please view the LICENSE file that was distributed with this source code.

About

Code for paper: PoisonPrompt: Backdoor Attack on Prompt-based Large Language Models, IEEE ICASSP 2024

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published