
Co-Attack


This is the official PyTorch implementation of the paper "Towards Adversarial Attack on Vision-Language Pre-training Models", ACM Multimedia 2022.

Update 20/03/2023

To compute the attack success rate (ASR), first run with "--adv 0" to obtain the clean accuracy, then run with "--adv 4" to obtain the adversarial accuracy; the ASR is the clean accuracy minus the adversarial accuracy.
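
As a minimal illustration of that arithmetic (the accuracy values below are hypothetical placeholders; substitute the numbers printed by your own two runs):

# Hypothetical placeholder accuracies from the two evaluation runs.
clean_accuracy = 79.9   # reported by the "--adv 0" run, in %
adv_accuracy = 24.5     # reported by the "--adv 4" run, in %

# ASR is the accuracy drop caused by the attack.
asr = clean_accuracy - adv_accuracy
print(f"ASR = {asr:.1f}%")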

Update 29/11/2022

We have released the fine-tuned checkpoints (Baidu, password: iqvp) for the VE task on ALBEF and TCL. They serve not only as the attacked models in this paper but may also be useful for other studies.

Requirements

  • pytorch 1.10.2
  • transformers 4.8.1
  • timm 0.4.9
  • bert_score 0.3.11

Download

Evaluation

--adv   Instruction
0       No Attack
1       Attack Text
2       Attack Image
3       Attack Both (vanilla)
4       Co-Attack

When attacking the unimodal embeddings, using "--adv 4" without "--cls" will raise an expected error because the image embedding and the text embedding have different sequence lengths; see the sketch below.
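
For illustration, a minimal sketch of the shape mismatch (the embedding shapes below are assumptions for a typical ViT image encoder and BERT-style text encoder, not the exact values used in this repository):

import torch

# Assumed shapes: 576 image patches + [CLS] token vs. 30 text tokens, embedding dim 256.
image_embeds = torch.randn(1, 577, 256)
text_embeds = torch.randn(1, 30, 256)

# Token-wise combination fails because the sequence lengths differ (577 vs. 30):
# image_embeds + text_embeds  # would raise a shape-mismatch error

# With --cls, each modality is reduced to its single [CLS] embedding,
# so the two have matching shapes and can be compared or perturbed jointly.
image_cls = image_embeds[:, 0, :]  # (1, 256)
text_cls = text_embeds[:, 0, :]    # (1, 256)
print(image_cls.shape, text_cls.shape)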

Image-Text Retrieval

Download the MSCOCO or Flickr30k dataset from the original website.

# Attack Unimodal Embedding
python RetrievalEval.py --adv 4 --gpu 0 --cls \
--config configs/Retrieval_flickr.yaml \
--output_dir output/Retrieval_flickr \
--checkpoint [Finetuned checkpoint]

# Attack Multimodal Embedding
python RetrievalFusionEval.py ...

# Attack Clip Model
python RetrievalCLIPEval.py --adv 4 --gpu 0 --image_encoder ViT-B/16  ...

Visual Entailment

Download the SNLI-VE dataset from the original website.

# Attack Unimodal Embedding
python VEEval.py --adv 4 --gpu 0 --cls \
--config configs/VE.yaml \
--output_dir output/VE \
--checkpoint [Finetuned checkpoint]

# Attack Multimodal Embedding
python VEFusionEval.py ...

Visual Grounding

Download the MSCOCO dataset from the original website.

# Attack Unimodal Embedding
python GroundingEval.py --adv 4 --gpu 0 --cls \
--config configs/Grounding.yaml \
--output_dir output/Grounding \
--checkpoint [Finetuned checkpoint]

# Attack Multimodal Embedding
python GroundingFusionEval.py ...

Visualization

python visualization.py --adv 4 --gpu 0

Citation

If you find this code useful for your research, please consider citing:

@inproceedings{zhang2022towards,
  title={Towards Adversarial Attack on Vision-Language Pre-training Models},
  author={Zhang, Jiaming and Yi, Qi and Sang, Jitao},
  booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
  year={2022}
}
