<a href="https://colab.research.google.com/github/mohammadzavvari/Adversarial-Training-on-CIFAR10/blob/master/Adversarial_Training.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**IN THE NAME OF GOD**

In this notebook, we train a neural network that is robust to L_infinity-bounded attacks, using adversarial training with projected gradient descent (PGD) as the attack.
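
For reference, adversarial training is commonly written as a min-max problem. The formulation below is the standard one from the robust-optimization literature rather than something stated in this notebook; here $f_\theta$ denotes the network, $\ell$ the cross-entropy loss, and $\epsilon$ the perturbation budget (the `epsilon` argument of `pgd_linf`):

$$
\min_\theta \; \mathbb{E}_{(x,y)} \left[ \max_{\|\delta\|_\infty \le \epsilon} \ell\big(f_\theta(x+\delta),\, y\big) \right]
$$

PGD approximates the inner maximization by repeating the step

$$
\delta \leftarrow \operatorname{clip}\big(\delta + \alpha \, \operatorname{sign}(\nabla_\delta \ell),\; -\epsilon,\; \epsilon\big)
$$

with step size $\alpha$ (the `alpha` argument of `pgd_linf`).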



In [0]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

Loading data:

In [2]:
cifar_train = datasets.CIFAR10("../data", train=True, download=True, transform=transforms.ToTensor())
cifar_test = datasets.CIFAR10("../data", train=False, download=True, transform=transforms.ToTensor())
train_loader = DataLoader(cifar_train, batch_size = 100, shuffle=True)
test_loader = DataLoader(cifar_test, batch_size = 100, shuffle=False)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ../data/cifar-10-python.tar.gz


Extracting ../data/cifar-10-python.tar.gz to ../data
Files already downloaded and verified


The model definition (a small CNN with four convolutional layers and two fully connected layers):

In [0]:
torch.manual_seed(0)

class Flatten(nn.Module):
    def forward(self, x):
        return x.view(x.shape[0], -1) 

model_cnn = nn.Sequential(nn.Conv2d(3, 32, 3), nn.ReLU(),
                          nn.Conv2d(32, 32, 3, padding=1, stride=2), nn.ReLU(),
                          nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(64, 64, 3, padding=1, stride=2), nn.ReLU(),
                          Flatten(),
                          nn.Linear(8*8*64, 100), nn.ReLU(),
                          nn.Linear(100, 10)).to(device)
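
The input size of the first `nn.Linear` layer, `8*8*64`, follows from how each convolution changes the spatial size of a 32x32 CIFAR-10 image. The sketch below recomputes it with the usual output-size formula floor((n + 2p - k) / s) + 1. The helper `conv_out` is a name introduced here for illustration and is not part of the notebook.

```python
def conv_out(n, kernel, padding=0, stride=1):
    """Spatial output size of a square convolution (dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1

# (kernel, padding, stride) for the four Conv2d layers in model_cnn
layers = [(3, 0, 1), (3, 1, 2), (3, 1, 1), (3, 1, 2)]

size = 32  # CIFAR-10 images are 32x32
sizes = []
for kernel, padding, stride in layers:
    size = conv_out(size, kernel, padding, stride)
    sizes.append(size)

flat_features = size * size * 64  # 64 channels after the last convolution
print(sizes, flat_features)
```

The spatial size goes 32, 30, 15, 15, 8, so the flattened feature vector has 8*8*64 = 4096 entries, matching the layer definition.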

One epoch of standard (non-adversarial) training, or of evaluation when no optimizer is passed:

In [0]:
def epoch(loader, model, opt=None):
    """Standard training/evaluation epoch over the dataset"""
    total_loss, total_err = 0.,0.
    for X,y in loader:
        X,y = X.to(device), y.to(device)
        yp = model(X)
        loss = nn.CrossEntropyLoss()(yp,y)
        if opt:
            opt.zero_grad()
            loss.backward()
            opt.step()
        
        total_err += (yp.max(dim=1)[1] != y).sum().item()
        total_loss += loss.item() * X.shape[0]
    return total_err / len(loader.dataset), total_loss / len(loader.dataset)
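
The loss accumulation in `epoch` multiplies each batch's mean loss by the batch size before dividing by the dataset size. The toy numbers below (assumed purely for illustration) show why: this gives the true per-example average even when the last batch is smaller, whereas averaging the batch means directly does not.

```python
# Hypothetical per-example losses, split into batches of size 4, 4 and 2
batches = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0], [5.0, 5.0]]

num_examples = sum(len(b) for b in batches)

total_loss = 0.0
batch_means = []
for b in batches:
    mean_loss = sum(b) / len(b)       # CrossEntropyLoss averages over the batch by default
    total_loss += mean_loss * len(b)  # same role as loss.item() * X.shape[0]
    batch_means.append(mean_loss)

weighted_avg = total_loss / num_examples         # true per-example average
naive_avg = sum(batch_means) / len(batch_means)  # skewed by the small last batch
print(weighted_avg, naive_avg)
```

With a batch size of 100 and dataset sizes of 50,000 and 10,000, every batch in this notebook is full, so the two averages coincide here; the weighting matters only when the batch size does not divide the dataset size.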

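One epoch of adversarial training or evaluation. Each batch is first perturbed by the PGD attack, and the loss is then computed on the perturbed inputs: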

In [0]:
def epoch_adversarial(loader, model, opt=None, **kwargs):
    """Adversarial training/evaluation epoch over the dataset"""
    total_loss, total_err = 0.,0.
    for X,y in loader:
        X,y = X.to(device), y.to(device)
        delta = pgd_linf(model, X, y, **kwargs)
        yp = model(X+delta)
        loss = nn.CrossEntropyLoss()(yp,y)
        if opt:
            opt.zero_grad()
            loss.backward()
            opt.step()
        
        total_err += (yp.max(dim=1)[1] != y).sum().item()
        total_loss += loss.item() * X.shape[0]
    return total_err / len(loader.dataset), total_loss / len(loader.dataset)

The PGD optimizer (in other words, "the attacker"):

In [0]:
def pgd_linf(model, X, y, epsilon=0.1, alpha=0.01, num_iter=20, randomize=False):
    """Construct PGD adversarial perturbations (L_infinity norm at most epsilon) for the examples X"""
    if randomize:
        # Random start: sample delta uniformly from [-epsilon, epsilon]
        delta = torch.rand_like(X, requires_grad=True)
        delta.data = delta.data * 2 * epsilon - epsilon
    else:
        delta = torch.zeros_like(X, requires_grad=True)
        
    for t in range(num_iter):
        loss = nn.CrossEntropyLoss()(model(X + delta), y)
        loss.backward()
        # Signed gradient ascent step, then projection back onto the epsilon-ball
        delta.data = (delta + alpha*delta.grad.detach().sign()).clamp(-epsilon,epsilon)
        delta.grad.zero_()
    # Note: X + delta is not clipped to the valid pixel range [0, 1]
    return delta.detach()
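
To make the update rule concrete, here is a minimal pure-Python sketch of the same signed-gradient step and clamp on a toy problem. It assumes a linear loss `w . (x + delta)`, whose gradient with respect to `delta` is simply `w`; the function `pgd_linf_toy` and the weights are hypothetical and chosen only for illustration.

```python
def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def clamp(v, lo, hi):
    """Restrict v to the interval [lo, hi]."""
    return max(lo, min(hi, v))

def pgd_linf_toy(w, epsilon=0.1, alpha=0.01, num_iter=20):
    """Maximize the linear loss w . (x + delta) subject to |delta_i| <= epsilon."""
    delta = [0.0] * len(w)
    for _ in range(num_iter):
        grad = w  # gradient of a linear loss with respect to delta
        # Signed ascent step followed by projection onto the epsilon-ball
        delta = [clamp(d + alpha * sign(g), -epsilon, epsilon)
                 for d, g in zip(delta, grad)]
    return delta

w = [0.5, -2.0, 3.0]  # toy weights (assumed)
delta = pgd_linf_toy(w)
print(delta)
```

With the default settings, 20 steps of size 0.01 are more than enough to cross the radius of 0.1, so every coordinate ends on the boundary of the epsilon-ball, with the sign of the corresponding weight.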

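Train the standard (non-robust) model. After each epoch we print the training error, the clean test error, and the adversarial test error: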

In [7]:
opt = optim.SGD(model_cnn.parameters(), lr=1e-1)
for t in range(10):
    train_err, train_loss = epoch(train_loader, model_cnn, opt)
    test_err, test_loss = epoch(test_loader, model_cnn)
    adv_err, adv_loss = epoch_adversarial(test_loader, model_cnn)
    if t == 4:
        for param_group in opt.param_groups:
            param_group["lr"] = 1e-2
    print(*("{:.6f}".format(i) for i in (train_err, test_err, adv_err)), sep="\t")
torch.save(model_cnn.state_dict(), "model_cnn.pt")

0.782700	0.677600	0.977700
0.630500	0.621700	0.999800
0.529680	0.478600	1.000000
0.468760	0.450200	1.000000
0.418320	0.415700	1.000000
0.340140	0.388200	1.000000
0.324620	0.385800	1.000000
0.313380	0.387200	1.000000
0.302900	0.382400	1.000000
0.292280	0.375800	1.000000


As you can see, the error of the network on adversarial examples is very high: it reaches 100% from the third epoch onward, even though the clean test error falls to about 38%. We therefore need to train a network that can withstand these examples.
We now create a new model with the same architecture as the previous one:

In [0]:
model_cnn_robust = nn.Sequential(nn.Conv2d(3, 32, 3), nn.ReLU(),
                                 nn.Conv2d(32, 32, 3, padding=1, stride=2), nn.ReLU(),
                                 nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(64, 64, 3, padding=1, stride=2), nn.ReLU(),
                                 Flatten(),
                                 nn.Linear(8*8*64, 100), nn.ReLU(),
                                 nn.Linear(100, 10)).to(device)

This time we use epoch_adversarial to train the model, with the default attack parameters of pgd_linf (epsilon=0.1, alpha=0.01, num_iter=20):

In [9]:
opt = optim.SGD(model_cnn_robust.parameters(), lr=1e-1)
for t in range(5):
    train_err, train_loss = epoch_adversarial(train_loader, model_cnn_robust, opt)
    test_err, test_loss = epoch(test_loader, model_cnn_robust)
    adv_err, adv_loss = epoch_adversarial(test_loader, model_cnn_robust)
    # With range(5), t == 4 is the last epoch, so this learning-rate drop never affects training
    if t == 4:
        for param_group in opt.param_groups:
            param_group["lr"] = 1e-2
    print(*("{:.6f}".format(i) for i in (train_err, test_err, adv_err)), sep="\t")
torch.save(model_cnn_robust.state_dict(), "model_cnn_robust.pt")

0.916620	0.900000	0.900000
0.905480	0.900000	0.900000
0.905020	0.900000	0.900000
0.901580	0.900000	0.900000
0.901940	0.900000	0.900000


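As you can see, in this case the error of the model against adversarial examples is 90%. The clean test error is also exactly 90%, which is chance level for ten classes. This suggests that, within these five epochs, the model has not learned a useful classifier rather than having become robust.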
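
As a sanity check on what an error of 0.9 means, the sketch below computes the error of a classifier that always predicts the same class on a balanced ten-class test set. It assumes 1,000 test images per class, as in the CIFAR-10 test set; the labels are synthetic, not loaded from the dataset.

```python
num_classes = 10
per_class = 1000  # CIFAR-10 test set: 1,000 images per class

# Synthetic balanced labels: 0,...,0,1,...,1,...,9,...,9
labels = [c for c in range(num_classes) for _ in range(per_class)]

# A degenerate classifier that always predicts class 0
predictions = [0] * len(labels)

errors = sum(p != y for p, y in zip(predictions, labels))
error_rate = errors / len(labels)
print(error_rate)
```

A constant predictor gets exactly one class in ten right, so an error that sits at exactly 0.9 on both clean and adversarial test data is consistent with a network whose output no longer depends on its input.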