"Overview on Validity of Network-based Adversarial Training"
Adversarial Training with ATN (CIFAR-10)

Overview

Adversarial Transformation Network

Adversarial examples can be generated by neural networks, and these special networks are called Adversarial Transformation Networks (ATNs). They are presented in the following paper:

  • Baluja and Fischer, "Adversarial Transformation Networks: Learning to Generate Adversarial Examples" (arXiv:1703.09387)

The authors of the paper argue that ATN is useful for adversarial training:

It appears that ATNs could be used in their adversarial training architecture, and could provide substantially more diversity to the trained model than current adversaries. This adversarial diversity improves model test-set generalization and adversarial robustness.

Because ATNs are quick to train relative to the target network (in the case of IR2, hours instead of weeks), reliably produce diverse adversarial examples, (...) In this manner, throughout training, the target network would be exposed to a shifting set of diverse adversaries from ATNs that can be trained in a fully-automated manner.

This repository verifies that an ATN can be useful for adversarial training. For comparison, FGSM-based and PGD-based adversarial training are also implemented, which are presented in the following papers:

  • Goodfellow et al., "Explaining and Harnessing Adversarial Examples" (arXiv:1412.6572)
  • Madry et al., "Towards Deep Learning Models Resistant to Adversarial Attacks" (arXiv:1706.06083)

In addition, if you would like to learn almost everything about adversarial examples, see:

Notes on ATN

Types of ATN

Intuitively, we can think of two main types of ATN.

  • Perturbation ATN (P-ATN)
    This type of ATN generates small but effective perturbations that act as a filter on the image. The image combined with the generated perturbation acts as an adversarial example for the target classifier.

  • Adversarial Auto-Encoding ATN (AAE-ATN)
    AAE-ATNs are similar to standard autoencoders in that they attempt to accurately reconstruct the original image. The difference is that the reconstructed image acts as an adversarial example for the target classifier.

Because a P-ATN makes it convenient to place a limit on the size of the perturbation, we use a P-ATN for our experiments.
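As a rough sketch of the P-ATN idea (the architecture and names below are illustrative assumptions, not this repository's exact model), the perturbation can be produced by a small convolutional network, scaled to a budget epsilon, and added to the image:

import torch
import torch.nn as nn

class PerturbationATN(nn.Module):
    """Illustrative P-ATN; the architecture is an assumption, not this repo's model."""
    def __init__(self, epsilon=8/255):
        super().__init__()
        self.epsilon = epsilon
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),  # raw perturbation in [-1, 1]
        )

    def forward(self, x):
        # Scale the raw perturbation into [-epsilon, epsilon], add it to the
        # image, and clamp back to the valid pixel range [0, 1].
        delta = self.epsilon * self.net(x)
        return torch.clamp(x + delta, 0.0, 1.0)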

Defining Loss for ATN

In general, two losses must be considered when training an ATN: a perturbation loss Lx and an adversarial loss Ly, described below. It is difficult to create an ATN that produces adversarial examples satisfying both Lx and Ly at the same time: intuitively, reducing Lx tends to increase Ly and vice versa, which makes it hard to define the overall loss function properly.

  • Lx: Perturbation Loss
    The adversarial example generated by the ATN should be indistinguishable from the original image to the human eye. It is generally known that simply using the L2 loss is sufficient.

  • Ly: Adversarial Loss
    The adversarial example generated by ATN should cause target classifiers to malfunction. It should be properly defined according to the type of attack (e.g. non-targeted or targeted).

However, our experiments use a P-ATN whose perturbation size is explicitly limited, so we only need to consider Ly.
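For concreteness, here is a minimal sketch of the combined objective; the weighting coefficient beta and all names below are assumptions for illustration, not this repository's code:

import torch.nn.functional as F

def atn_loss(x, x_adv, logits_adv, y_true, beta=0.1):
    """Illustrative combined ATN objective; beta and all names are assumptions."""
    # Lx: keep the adversarial example close to the original image (L2 loss).
    loss_x = F.mse_loss(x_adv, x)
    # Ly: non-targeted adversarial loss -- maximize the classifier's
    # cross-entropy on the true label (hence the minus sign).
    loss_y = -F.cross_entropy(logits_adv, y_true)
    return beta * loss_x + loss_y

# With a P-ATN whose perturbation is already clamped to [-epsilon, epsilon],
# the Lx term can be dropped and only loss_y is minimized.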

Demo: Adversarial Attack

python demo_pgd.py --device cpu \
                   --pgd_type linf

python demo_atn.py --device cpu

PGD (demo_pgd.py) works by computing gradients of a given classification network (a white-box setting is assumed here), so it can almost always produce valid adversarial examples for typical images without any prerequisites. ATN (demo_atn.py), however, can only produce valid adversarial examples after it has been trained properly; the training in this demonstration is not sufficient, and the cat image is simply an especially easy case.
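For reference, demo_pgd.py performs an L-infinity (linf) PGD attack; a minimal sketch of such a loop (illustrative, not this repository's exact implementation) looks like:

import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, epsilon=8/255, alpha=2/255, steps=10):
    """Minimal L-infinity PGD sketch; not the repository's exact implementation."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Take a signed-gradient ascent step, then project back into the
        # epsilon-ball around x and into the valid pixel range [0, 1].
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - epsilon), x + epsilon)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv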

Result: Adversarial Training

[Standard Training]
python train_pgd.py --device cuda \
                    --epochs 200 --batch_size 32 \
                    --lr 0.01 --lr_decay 20

[FGSM-Training]
python train_pgd.py --device cuda \
                    --epochs 200 --batch_size 32 \
                    --lr 0.01 --lr_decay 20 \
                    --pgd_type fgsm \
                    --pgd_epsilon {4, 8, 12}

[PGD-Training]
python train_pgd.py --device cuda \
                    --epochs 200 --batch_size 32 \
                    --lr 0.01 --lr_decay 20 \
                    --pgd_type linf \
                    --pgd_epsilon {4, 8, 12} \
                    --pgd_steps {10, 10, 15}

[ATN-Training]
python train_atn.py --device cuda \
                    --epochs 200 --batch_size 32 \
                    --lr 0.01 --lr_decay 20 \
                    --atn_sample 0.1 --atn_epoch 20 --atn_batch_size 32 \
                    --atn_lr 0.00001 \
                    --atn_epsilon {4, 8, 12}
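Conceptually, one step of the FGSM-based adversarial training invoked above (--pgd_type fgsm) looks roughly like the following sketch; the function and variable names are illustrative assumptions, not the repository's code:

import torch
import torch.nn.functional as F

def fgsm_training_step(model, optimizer, x, y, epsilon=8/255):
    """One FGSM adversarial-training step; an illustrative sketch only."""
    # Craft FGSM examples: a single signed-gradient step of size epsilon.
    x.requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    x_adv = torch.clamp(x + epsilon * grad.sign(), 0.0, 1.0).detach()

    # Update the classifier on the adversarial batch.
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model(x_adv), y)
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()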

[Result plots comparing the training methods at eps=4/255, eps=8/255, and eps=12/255]
