Our implementation follows the previous works RTPT and adversarial-attacks-pytorch. We thank their authors for providing reproducible experimental results and a detailed code base. Building on the prompt-tuning implementations in TPT and RTPT, we introduce an attention-guided process toward fine-grained robustness.
Recommended:
- Python 3.10+
- pytorch == 1.12.1
- torchvision == 0.13.1
Install minimal dependencies:
pip install torch torchvision
pip install numpy pillow tqdm ftfy regex torchattacks
Notes:
- The repo uses the local clip/ and torchattacks/ modules.
- CLIP checkpoints are cached under cache/clip/.
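A quick way to confirm that an environment matches the recommended pins is a small standard-library check like the one below. It is a sketch, not part of the repository: `installed_versions` and `version_problems` are hypothetical helpers, the pins are copied from the "Recommended" list, and a local build suffix such as `+cu113` is ignored in the comparison.

```python
import sys
from importlib import metadata

# Pins copied from the "Recommended" list; the helpers below are hypothetical.
RECOMMENDED = {"torch": "1.12.1", "torchvision": "0.13.1"}
MIN_PYTHON = (3, 10)


def installed_versions(packages):
    """Return {distribution name: installed version string, or None if missing}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def version_problems(installed, python=None, recommended=RECOMMENDED):
    """Return human-readable warnings; an empty list means the environment matches the pins."""
    python = tuple(sys.version_info[:2]) if python is None else tuple(python)
    problems = []
    if python < MIN_PYTHON:
        problems.append(f"Python {python[0]}.{python[1]} is older than {MIN_PYTHON[0]}.{MIN_PYTHON[1]}")
    for name, want in recommended.items():
        have = installed.get(name)
        if have is None:
            problems.append(f"{name} is not installed (recommended {want})")
        elif have.split("+")[0] != want:
            # Local build tags such as "1.12.1+cu113" are ignored when comparing.
            problems.append(f"{name} {have} differs from recommended {want}")
    return problems


if __name__ == "__main__":
    report = version_problems(installed_versions(RECOMMENDED))
    print("\n".join(report) if report else "Environment matches the recommended pins.")
```

Running the file as a script prints any mismatches, or a single confirmation line when none are found.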
We follow the TPT dataset layout. Please download the required datasets and check the paths to the JSON split files.
- I (ImageNet): imagenet/images (reads val/)
- Caltech101, DTD, Flower102, Food101, Cars, SUN397, Aircraft, Pets, UCF101, eurosat: use few-shot-style splits with JSON files
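To check that a split file and the image folder agree, the JSON can be read directly. This is a minimal sketch assuming the CoOp-style format of `split_zhou_*.json` files: a dict with `"train"`, `"val"` and `"test"` keys, each a list of `[relative_image_path, label, classname]` entries whose paths are relative to the dataset's `images/` folder. `read_split` and `missing_images` are hypothetical helpers, not functions from this repository, and the loader in the code may resolve paths differently.

```python
import json
import os


def read_split(split_json, image_root, subset="test"):
    """Load one subset of a CoOp-style split file as (image_path, label, classname) tuples.

    Assumed format: {"train": [...], "val": [...], "test": [...]}, where every entry is
    [path relative to image_root, integer label, class name].
    """
    with open(split_json, "r", encoding="utf-8") as f:
        split = json.load(f)
    return [
        (os.path.join(image_root, rel_path), int(label), classname)
        for rel_path, label, classname in split[subset]
    ]


def missing_images(items):
    """Return the image paths from read_split() that do not exist on disk."""
    return [path for path, _, _ in items if not os.path.isfile(path)]
```

For DTD this would be called as `read_split("/path/to/your/dataset/dtd/dtd/split_zhou_DescribableTextures.json", "/path/to/your/dataset/dtd/images")`; an empty result from `missing_images` suggests the paths line up.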
Example layout:
/path/to/your/dataset
└─ dtd/
   ├─ images/
   │  ├─ banded/
   │  │  ├─ banded_0002.jpg
   │  │  └─ ...
   │  ├─ blotchy/
   │  └─ ...
   ├─ dtd/
   │  └─ split_zhou_DescribableTextures.json
   ├─ labels/
   └─ imdb/

Example commands on DTD, first with --eps 0.0 (no adversarial perturbation), then with --eps 4.0 and --steps 100:
python atpt.py dataset --test_sets DTD --dataset_mode test -a ViT-B/32 -p 50 --ctx_init a_photo_of_a --seed 0 --output_dir output_results/ckps/rtpt --eps 0.0 --view-gen-mode attn_augmix --attn_p_high 0.2 --attn_p_low 0.8 --attn_m_high 0.8 --attn_m_low 0.2
python atpt.py dataset --test_sets DTD --dataset_mode test -a ViT-B/32 -p 50 --ctx_init a_photo_of_a --seed 0 --output_dir output_results/ckps/rtpt --eps 4.0 --steps 100 --view-gen-mode attn_augmix --attn_p_high 0.2 --attn_p_low 0.8 --attn_m_high 0.8 --attn_m_low 0.2
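As a rough illustration of what a perturbation budget and a step count control, here is a minimal untargeted L-inf PGD sketch in PyTorch, in the spirit of the PGD attack in torchattacks. It is not the repository's attack code: `pgd_linf` is a hypothetical helper, and it assumes inputs in [0, 1], an eps expressed in 1/255 pixel units (so 4.0 becomes 4/255), and a step size `alpha` of 1/255. The repository's actual scaling, step size and loss may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def pgd_linf(model: nn.Module, images: torch.Tensor, labels: torch.Tensor,
             eps: float = 4.0 / 255, alpha: float = 1.0 / 255, steps: int = 100) -> torch.Tensor:
    """Untargeted L-inf PGD with a random start; images are assumed to lie in [0, 1]."""
    images = images.clone().detach()
    if eps == 0.0:
        return images  # a zero budget leaves the inputs unchanged (clean evaluation)
    # Random start inside the eps-ball, clipped to the valid pixel range.
    adv = images + torch.empty_like(images).uniform_(-eps, eps)
    adv = torch.clamp(adv, 0.0, 1.0).detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        (grad,) = torch.autograd.grad(loss, adv)
        # Step along the gradient sign to increase the loss, then project back into the eps-ball.
        adv = adv.detach() + alpha * grad.sign()
        delta = torch.clamp(adv - images, min=-eps, max=eps)
        adv = torch.clamp(images + delta, 0.0, 1.0).detach()
    return adv


if __name__ == "__main__":
    torch.manual_seed(0)
    toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))  # stand-in for a real classifier
    x = torch.rand(4, 3, 8, 8)
    y = torch.randint(0, 10, (4,))
    x_adv = pgd_linf(toy_model, x, y, eps=4.0 / 255, alpha=1.0 / 255, steps=100)
    print("max |delta| * 255 =", (x_adv - x).abs().max().item() * 255)
```

The projection after every step is what keeps the perturbation within the budget, so a larger step count refines the attack inside the same eps-ball rather than enlarging it.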