Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks

Dependencies

PyTorch == 2.0.1
transformers == 4.23.1
diffusers == 0.11.1
ftfy == 6.1.1
accelerate == 0.22.0
python == 3.8.13
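
To confirm that an environment matches the pins above, a small check along the following lines may help. This is only a sketch: the `PINNED` mapping and the `installed_versions` and `find_mismatches` helpers are illustrative names that are not part of this repository, and it assumes the PyTorch pin corresponds to the `torch` package name.

```python
from importlib import metadata

# Pins copied from the dependency list; "torch" is an assumed package name for PyTorch.
PINNED = {
    "torch": "2.0.1",
    "transformers": "4.23.1",
    "diffusers": "0.11.1",
    "ftfy": "6.1.1",
    "accelerate": "0.22.0",
}


def installed_versions(names):
    """Return {name: version or None}; None means the package is not installed."""
    found = {}
    for name in names:
        try:
            # Drop a local build suffix such as "+cu117" before comparing.
            found[name] = metadata.version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(pinned, installed):
    """Return {name: (wanted, found)} for every package that differs from its pin."""
    return {
        name: (wanted, installed.get(name))
        for name, wanted in pinned.items()
        if installed.get(name) != wanted
    }


if __name__ == "__main__":
    mismatches = find_mismatches(PINNED, installed_versions(PINNED))
    for name, (wanted, found) in mismatches.items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

An empty result from `find_mismatches` means every listed package matches its pin; the Python interpreter version itself is not checked here.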

Usage

Download word2id.pkl and wordvec.pkl for the synonym model, and put the downloaded files into the Word2Vec directory.
Scripts are provided to perform and evaluate targeted attacks on Stable Diffusion:

# Training: generate the adversarial prompts
python run.py --config_path ./object_config.json  # Object attacks
python run.py --config_path ./style_config.json  # Style attacks
# Testing: evaluate the attack success rate
python test_object_multi.py --config_path ./object_config.json  # Object attacks
python test_style_multi.py --config_path ./style_config.json  # Style attacks
# Testing: evaluate the FID score of the generated images
python IQA.py --gen_img_path [the root of generated images] --task [object or style] --attack_goal_path [the path of referenced images] --metric image_quality
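
Before running the commands above, it can be worth confirming that the synonym-model files are where the download step puts them. The following is a minimal sketch, not part of the repository: the helper name `missing_synonym_files` is hypothetical, and it assumes the check is run from the repository root so that `Word2Vec` resolves correctly.

```python
from pathlib import Path

# File names taken from the download instruction above.
REQUIRED_FILES = ("word2id.pkl", "wordvec.pkl")


def missing_synonym_files(word2vec_dir="Word2Vec"):
    """Return the required file names that are absent from word2vec_dir."""
    root = Path(word2vec_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_synonym_files()
    if missing:
        print("Missing from Word2Vec:", ", ".join(missing))
    else:
        print("Synonym model files found.")
```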

Parameters

Config can be loaded from a JSON file.

Config has the following parameters:

add_suffix_num: the number of suffixes in the word-addition perturbation strategy. The default is 5.
replace_type: a list specifying the word types to replace in the word-substitution strategy. The default is ["all"], which replaces all words except nouns. Options: "verb", "adj", "adv", "prep".
synonym_num: the number of forbidden synonyms. The default is 10.
iter: the total number of iterations. The default is 500.
lr: the learning rate for the optimizer. The default is 0.1.
weight_decay: the weight decay for the optimizer.
loss_weight: the weight of the MSE loss in style attacks.
print_step: the number of steps between status printouts.
batch_size: the number of referenced images used in each iteration.
clip_model: the name of the CLIP model to use. "laion/CLIP-ViT-H-14-laion2B-s32B-b79K" is the model used in SD 2.1.
prompt_path: the path of the clean prompt file.
task: the targeted attack task. Options: "object" or "style".
forbidden_words: a txt file listing the forbidden words for each target goal.
target_path: the file path of the referenced images.
output_dir: the path for saving the learned adversarial prompts.
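
The parameter list above can be illustrated with a small loader that fills in the documented defaults and rejects unknown option values. This is a hedged sketch rather than the parsing done in run.py: `DOCUMENTED_DEFAULTS`, `merge_config`, and `load_config` are hypothetical names, only defaults stated above are included, and treating "all", "verb", "adj", "adv", and "prep" as the complete set of `replace_type` values is an assumption.

```python
import copy
import json

# Only the defaults stated in the parameter list; every other parameter
# must be supplied by the JSON file.
DOCUMENTED_DEFAULTS = {
    "add_suffix_num": 5,
    "replace_type": ["all"],
    "synonym_num": 10,
    "iter": 500,
    "lr": 0.1,
}
VALID_TASKS = ("object", "style")
# Assumed to be the full set of accepted word types.
VALID_REPLACE_TYPES = ("all", "verb", "adj", "adv", "prep")


def merge_config(user_config):
    """Overlay user values on the documented defaults and validate option fields."""
    config = copy.deepcopy(DOCUMENTED_DEFAULTS)
    config.update(user_config)
    if config.get("task") not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {config.get('task')!r}")
    unknown = [t for t in config["replace_type"] if t not in VALID_REPLACE_TYPES]
    if unknown:
        raise ValueError(f"unknown replace_type entries: {unknown}")
    return config


def load_config(path):
    """Read a JSON config file and return the merged, validated dictionary."""
    with open(path, encoding="utf-8") as handle:
        return merge_config(json.load(handle))
```

For example, `merge_config({"task": "style", "lr": 0.05})` keeps the supplied learning rate and fills `iter` with 500, while a `task` other than "object" or "style" raises `ValueError`.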


datar001/Revealing-Vulnerabilities-in-Stable-Diffusion-via-Targeted-Attacks
