Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, and Fatih Porikli
Paper: https://arxiv.org/abs/2007.14672
Abstract: Deep Convolutional Neural Networks (CNNs) can easily be fooled by subtle, imperceptible changes to the input images. To address this vulnerability, adversarial training creates perturbation patterns and includes them in the training set to robustify the model. In contrast to existing adversarial training methods that only use class-boundary information (e.g., using a cross-entropy loss), we propose to exploit additional information from the feature space to craft stronger adversaries that are in turn used to learn a robust model. Specifically, we use the style and content information of a target sample from another class, alongside its class-boundary information, to create adversarial perturbations. We apply our proposed multi-task objective in a deeply supervised manner, extracting multi-scale feature knowledge to create maximally separating adversaries. Subsequently, we propose a max-margin adversarial training approach that minimizes the distance between the source image and its adversary and maximizes the distance between the adversary and the target image. Our adversarial training approach demonstrates strong robustness compared to state-of-the-art defenses, generalizes well to naturally occurring corruptions and data distributional shifts, and retains the model accuracy on clean examples.
If you find our work, this repository, and the pretrained models useful, please consider giving a star ⭐ and a citation.
```bibtex
@article{naseer2022stylized,
  title={Stylized adversarial defense},
  author={Naseer, Muzammal and Khan, Salman and Hayat, Munawar and Khan, Fahad Shahbaz and Porikli, Fatih},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2022},
  publisher={IEEE}
}
```
- Contributions
- Download Pretrained SAT Models
- Evaluate SAT against ROA (unrestricted attack)
- Evaluate SAT against restricted attack (PGD, CW, FGSM, MIFGSM)
- Evaluate SAT against Common Corruptions
- We propose to set up priors in the form of fooling target samples during adversarial training and introduce a multi-task objective for adversary creation that seeks to fool the model in terms of image style and visual content as well as the decision boundary for the true class. Based on this high-strength perturbation, we develop a margin-maximizing (contrastive) adversarial training procedure that maps the perturbed image close to its clean source and maximally separates it from the target image used to craft the adversary (see the sketch after this list).
- Compared to conventional adversarial training, our approach does not cause a drop in clean accuracy and performs well against real-world common image corruptions. We further demonstrate the robustness and generalization capabilities of the proposed training regime when the underlying data distribution shifts.
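As a rough illustration, the margin-maximizing objective can be sketched in PyTorch as below. The feature extractor `feat`, the single L2 feature distance, and the hinge form of the margin are simplifying assumptions for illustration only; the paper's deeply supervised, multi-scale style/content losses are richer than this.

```python
import torch.nn.functional as F

def max_margin_loss(model, feat, x_clean, x_adv, x_target, y_true, margin=1.0):
    """Pull the adversary towards its clean source in feature space, push it
    away from the fooling target, and keep the true-class prediction intact.

    `feat` is a hypothetical feature extractor (e.g. penultimate-layer
    activations); it stands in for the multi-scale features used in the paper.
    """
    f_clean = feat(x_clean).flatten(1)
    f_adv = feat(x_adv).flatten(1)
    f_target = feat(x_target).flatten(1)

    # Boundary term: the adversary should still be classified with its true label.
    ce = F.cross_entropy(model(x_adv), y_true)

    # Contrastive margin term: d(adv, clean) small, d(adv, target) large.
    d_pos = F.pairwise_distance(f_adv, f_clean)
    d_neg = F.pairwise_distance(f_adv, f_target)
    contrastive = F.relu(d_pos - d_neg + margin).mean()

    return ce + contrastive
```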
Download the pretrained SAT models, trained with the single-step and multi-step stylized attacks, from here and put them in the folder "pretrained_models".
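Once downloaded, a checkpoint can be restored with a snippet along these lines; the file name and the `build_model` constructor below are placeholders, so adapt them to the released checkpoints and backbone.

```python
import torch

# Placeholder file name and constructor: adapt to the released checkpoints/backbone.
checkpoint = torch.load("pretrained_models/sat_model.pth", map_location="cpu")
model = build_model()              # hypothetical constructor for the backbone
model.load_state_dict(checkpoint)  # or checkpoint["state_dict"], depending on how it was saved
model.eval()
```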
Note that SAT is not trained against ROA, yet it still performs better than TRADES/Feature Scattering.
```
python test_roa.py
```
To evaluate SAT against restricted attacks (PGD, CW, FGSM, MIFGSM), run, for example:
```
python test.py --attack_type pgd --eps 8 --iters 100 --random_restart
```
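For reference, the PGD evaluation invoked above corresponds roughly to the loop below; the step size and random-start handling here are illustrative assumptions, see `test.py` for the exact settings used.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, iters=100):
    """L-infinity PGD with a random start; eps matches --eps 8 on the 0-255 scale."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Gradient-sign step, projected back onto the eps-ball around x.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv
```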
Download the corrupted CIFAR-10 dataset (CIFAR-10-C) from AugMix and extract it to the folder "CIFAR-10-C". Run the following command to observe the robustness gains.
```
python test_common_corruptions.py
```
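As a reference for what the script computes, a minimal evaluation sketch is shown below, assuming the usual CIFAR-10-C layout of one `.npy` array per corruption type plus `labels.npy`; the batch size and device handling are arbitrary choices, not the repository's exact settings.

```python
import numpy as np
import torch

def eval_corruption(model, corruption="gaussian_noise", root="CIFAR-10-C",
                    device="cuda", batch=256):
    """Accuracy over all five severities of one corruption type."""
    images = np.load(f"{root}/{corruption}.npy")   # (50000, 32, 32, 3), uint8
    labels = np.load(f"{root}/labels.npy")         # (50000,)
    correct = 0
    model.eval()
    with torch.no_grad():
        for i in range(0, len(images), batch):
            x = torch.from_numpy(images[i:i + batch]).permute(0, 3, 1, 2).float().div(255).to(device)
            y = torch.from_numpy(labels[i:i + batch]).long().to(device)
            correct += (model(x).argmax(1) == y).sum().item()
    return correct / len(images)
```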