
This work focuses on enhancing the robustness of targeted classifier models against adversarial attacks. To achieve this, a convolutional autoencoder-based approach is employed that effectively counters adversarial perturbations introduced to the input images.

Defense Against Adversarial Attacks using Convolutional Auto-Encoders

Official code implementation

View Paper · Report Bug · Request Feature

Table of Contents
  1. About
  2. Usage Instructions
  3. Results
  4. Citation

About

Deep learning models, while achieving state-of-the-art performance on many tasks, are susceptible to adversarial attacks that exploit inherent vulnerabilities in their architectures. Adversarial attacks manipulate the input data with imperceptible perturbations, causing the model to misclassify the data or produce erroneous outputs. Szegedy et al. (https://arxiv.org/abs/1312.6199) discovered that Deep Neural Network models can be manipulated into making wrong predictions by adding small perturbations to the input image.


Fig 1: Szegedy et al. were able to fool AlexNet into classifying a perturbed image of a dog as an ostrich

A U-shaped convolutional auto-encoder is used to reconstruct the original input from the adversarial images generated by FGSM and PGD attacks, effectively removing the adversarial perturbations. The goal of the autoencoder network is to minimise the mean squared error loss between the original unperturbed image and the reconstructed image, which is generated from an adversarial example. While doing so, random Gaussian noise is added after encoding the image to make the model more robust. The idea behind adding the noise is to perturb the latent representation by a small magnitude and then decode the perturbed latent representation, akin to how adversarial examples are generated in the first place.

Fig 2: Architecture of the proposed Convolutional AutoEncoder
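
As a rough illustration of this idea, the sketch below shows a simplified convolutional autoencoder in PyTorch that injects Gaussian noise into the latent code during training. This is a minimal sketch only - the actual U-shaped architecture with GELU activations lives in AElib/autoencoder.py, and the class name and layer sizes here are illustrative, not the repository's exact definition.

import torch
import torch.nn as nn

class NoisyConvAE(nn.Module):   # illustrative name, not the class in AElib/autoencoder.py
    def __init__(self, noise_std=0.1):
        super().__init__()
        self.noise_std = noise_std
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1),   # 28x28 -> 14x14
            nn.GELU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # 14x14 -> 7x7
            nn.GELU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2, padding=1, output_padding=1),  # 7x7 -> 14x14
            nn.GELU(),
            nn.ConvTranspose2d(32, 1, kernel_size=3, stride=2, padding=1, output_padding=1),   # 14x14 -> 28x28
            nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        if self.training:
            # Perturb the latent code by a small random magnitude,
            # mirroring how adversarial examples perturb the input space.
            z = z + self.noise_std * torch.randn_like(z)
        return self.decoder(z)

# Training objective: MSE between the reconstruction of the adversarial image
# and the clean, unperturbed image, e.g.
# loss = nn.MSELoss()(model(x_adv), x_clean)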

Usage Instructions

Project Structure

πŸ“‚ Adversarial-Defense
|_πŸ“ AElib                   
  |_πŸ“„ VGG.py                # VGG16 architecture for training on MNIST and Fashion-MNIST datasets
  |_πŸ“„ autoencoder.py        # Architecture of Convolutional AutoEncoder with GELU activation
  |_πŸ“„ utils.py              # Utility functions
  |_πŸ“„ attacks.py            # Implementation of PGD and FGSM Attacks
|_πŸ“ images
|_πŸ“ models                  # Trained VGG16 models and the AutoEncoder models for different attacks and datasets
|_πŸ“ notebooks               # Jupyter notebooks containing detailed explanations with visualisations
  |_πŸ“„ fgsm-attack-on-mnist-and-fashion-mnist-dataset.ipynb
  |_πŸ“„ pgd-attack-on-mnist-and-fashion-mnist.ipynb
  |_πŸ“„ vgg16-on-mnist-and-fashion-mnist.ipynb
  |_πŸ“„ defense-against-adversarial-attacks-autoencoder.ipynb
|_πŸ“„ adverse_attack.py       # Applying Adversarial attacks on the desired dataset
|_πŸ“„ train_vgg16.py          # Training VGG16 for multi-class classification
|_πŸ“„ LICENSE
|_πŸ“„ requirements.txt  
|_πŸ“„ autoencoder.py          # Training and Testing AutoEncoder for Adversarial Defense
|_πŸ“„ .gitignore

If you are more comfortable with Jupyter notebooks, you can refer to the notebooks folder in this repository :)

Install dependencies

Run the following command -

pip install -r requirements.txt

Since the pre-trained VGG16 models are large, they have not been uploaded to this repository. You can download them from here - https://www.kaggle.com/datasets/shreyasi2002/vgg16-models-mnist-fashion-mnist

IMP: This project uses a GPU to train and test the models.

To check whether a GPU is available, run the following command - !nvidia-smi
If you see an output like this, you are good to go :)

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   38C    P8               9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
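
Alternatively, you can check for a GPU from Python using PyTorch (which this project relies on; see requirements.txt) -

import torch

if torch.cuda.is_available():
    print("GPU available:", torch.cuda.get_device_name(0))
else:
    print("No GPU found - training and testing will be very slow on a CPU.")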

Train VGG16

VGG16 is one of the most popular architectures for image classification and is easy to use with transfer learning.


Fig 3: VGG16 Architecture
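
For intuition, the snippet below sketches what the transfer-learning route could look like with torchvision's pre-trained VGG16 adapted to grayscale, 10-class data. This is only an illustration - the repository defines its own model in AElib/VGG.py, and train_vgg16.py trains it from scratch.

import torch.nn as nn
from torchvision import models, transforms

# Load VGG16 with ImageNet weights.
model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
# MNIST / Fashion-MNIST images have a single channel, so replace the first conv layer.
model.features[0] = nn.Conv2d(1, 64, kernel_size=3, padding=1)
# Replace the final classifier layer to output 10 classes.
model.classifier[6] = nn.Linear(4096, 10)
# VGG16's five pooling stages need inputs larger than 28x28, so resize beforehand.
transform = transforms.Compose([transforms.Resize((32, 32)), transforms.ToTensor()])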

Run the following command to train the VGG16 model from scratch -

!python train_vgg16.py [-h] [--dataset {mnist,fashion-mnist}] [--lr LR] [--epochs EPOCHS]
  1. --dataset argument to choose between the MNIST and Fashion-MNIST datasets (default is MNIST)
  2. --lr argument to set the learning rate (default is 0.001)
  3. --epochs argument to set the number of epochs (default is 10)

You can also use the -h flag to view the documentation.

Example Usage : !python train_vgg16.py --dataset mnist --lr 0.0001 --epochs 20

More information about this can also be found here - https://www.kaggle.com/code/shreyasi2002/vgg16-on-mnist-and-fashion-mnist

Adversarial Attacks

This project supports two types of attacks:

  1. FGSM (Fast Gradient Sign Method) Adversarial Attack (https://www.kaggle.com/code/shreyasi2002/fgsm-attack-on-mnist-and-fashion-mnist-dataset)
  2. PGD (Projected Gradient Descent) Attack (https://www.kaggle.com/code/shreyasi2002/pgd-attack-on-mnist-and-fashion-mnist)
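
For reference, minimal PyTorch sketches of both attacks are shown below. The repository's implementations live in AElib/attacks.py and may differ in details such as step size, number of iterations, and random initialisation.

import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon):
    # Single-step FGSM: move every pixel by epsilon in the direction of the loss gradient.
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0, 1).detach()

def pgd_attack(model, x, y, epsilon, alpha=0.01, steps=40):
    # Iterative PGD: repeated small FGSM-like steps, each projected back
    # into the epsilon-ball around the original image.
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)  # L-inf projection
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()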

To apply an attack to a dataset, use the following command -

!python adverse_attack.py [-h] [--attack {fgsm,pgd}] [--dataset {mnist,fashion-mnist}]
                          [--epsilon EPSILON]
  1. --attack argument to set the type of attack, either PGD or FGSM (default is PGD since it is the stronger attack)
  2. --dataset argument to choose between the MNIST and Fashion-MNIST datasets (default is MNIST)
  3. --epsilon argument to set the strength of the attack. For the FGSM attack, keep this value in the range [0, 0.8]; for the PGD attack, keep it in the range [0, 0.3].

You can also use the -h flag to view the documentation.

Example Usage : !python adverse_attack.py --attack pgd --dataset fashion-mnist --epsilon 0.3

The higher the epsilon (ε) value, the stronger the attack. As evident from Fig 4, using large epsilon (ε) values (here 1.0) corrupts the label semantics, making it impossible to retrieve the original image. Hence, it is recommended to keep ε below 1.0.


Fig 4: FGSM Attack (ε = 1.0) on the MNIST dataset

Since the PGD adversarial examples are more natural-looking, as seen in Fig 5, I have created a dataset of adversarial examples for the MNIST dataset. Feel free to play around with it :)


Fig 5: PGD Attack (ε = 0.3) on the Fashion MNIST dataset

Link to Dataset - https://www.kaggle.com/datasets/shreyasi2002/corrupted-mnist

Train and Test AutoEncoder

Use the following command to train the autoencoder model from scratch -

!python autoencoder.py [-h] [--attack {fgsm,pgd}]
                       [--dataset {mnist,fashion-mnist}]
                       --action train
                       [--use_pretrained {True,False}] [--epsilon EPSILON]
                       [--epochs EPOCHS]

Use the following command to test the model -

!python autoencoder.py [-h] [--attack {fgsm,pgd}]
                       [--dataset {mnist,fashion-mnist}]
                       --action test
                       --use_pretrained True
  1. --attack argument to set the type of attack, either PGD or FGSM (default is PGD since it is the stronger attack)
  2. --dataset argument to choose between the MNIST and Fashion-MNIST datasets (default is MNIST)
  3. --action argument to either train the model or test a pre-trained model
  4. --use_pretrained argument set to True to test a model or to continue training from an existing pre-trained model
  5. --epsilon argument to set the strength of the attack (only used during training)
  6. --epochs argument to set the number of epochs (only used during training)

Example Usages:

  1. !python autoencoder.py --attack fgsm --dataset fashion-mnist --action train --use_pretrained False --epsilon 0.6 --epochs 10
  2. !python autoencoder.py --attack pgd --dataset mnist --action test --use_pretrained True

More details can be found here - https://www.kaggle.com/code/shreyasi2002/defense-against-adversarial-attacks-autoencoder
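
Conceptually, testing the defense amounts to passing each adversarial example through the autoencoder before handing it to the classifier. A minimal sketch of that loop is shown below; the names classifier, autoencoder, test_loader and attack_fn are placeholders for illustration, not the script's actual interface.

import torch

def accuracy_with_defense(classifier, autoencoder, test_loader, attack_fn, device="cuda"):
    classifier.eval()
    autoencoder.eval()
    correct, total = 0, 0
    for x, y in test_loader:
        x, y = x.to(device), y.to(device)
        x_adv = attack_fn(classifier, x, y)       # crafting the attack needs gradients
        with torch.no_grad():
            x_rec = autoencoder(x_adv)            # strip the adversarial perturbation
            preds = classifier(x_rec).argmax(dim=1)
        correct += (preds == y).sum().item()
        total += y.numel()
    return correct / total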

Results

The AutoEncoder successfully reconstructs images that are nearly identical to the original images, as shown below -


Fig 6: Comparison of the adversarial image, reconstructed image and the original image

The accuracy of the pre-trained VGG-16 classifier on the MNIST and Fashion-MNIST datasets under the FGSM attack increases by 65.61% and 59.76% respectively. For the PGD attack, the accuracy increases by 89.88% and 43.49%. This shows the efficacy of our model in defending against adversarial attacks with high accuracy.

Attack            | Accuracy w/o defense      | Accuracy with defense
                  | MNIST   | Fashion-MNIST   | MNIST   | Fashion-MNIST
FGSM (ε = 0.60)   | 0.2648  | 0.1417          | 0.9209  | 0.7393
PGD (ε = 0.15)    | 0.0418  | 0.2942          | 0.9406  | 0.7291

Citation

Please cite this paper as -

Mandal, Shreyasi. "Defense Against Adversarial Attacks using
Convolutional Auto-Encoders." arXiv preprint arXiv:2312.03520 (2023).
