# Example: Optimal adversaries for convolutional MNIST model

This notebook shows how OMLT can be used to find adversarial examples for a trained convolutional neural network. The steps are as follows:<br>
1.) A convolutional neural network (CNN) with ReLU activation functions is trained to classify images from the MNIST dataset <br>
2.) OMLT is used to generate a mixed-integer encoding of the trained CNN using the big-M formulation <br>
3.) The model is optimized to find the maximum classification error (defined by an "adversarial" label) over a small input region <br>



## Library Setup
This notebook assumes you have a working PyTorch environment to train the neural network for classification. The trained network is then formulated in Pyomo using OMLT, so working Pyomo and OMLT installations are also required, along with a mixed-integer solver (Gurobi is used here).

The required Python libraries used in this notebook are as follows: <br>
- `numpy`: used to manipulate input data
- `torch`: the machine learning library we use to train our neural network
- `torchvision`: a package containing the MNIST dataset
- `pyomo`: the algebraic modeling language for Python, used to define the optimization model passed to the solver
- `omlt`: the package this notebook demonstrates. OMLT can formulate machine learning models (such as neural networks) within Pyomo

In [2]:
#Import requisite packages
#data manipulation
import numpy as np
import tempfile

#pytorch for training neural network
import torch, torch.onnx
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import StepLR

#pyomo for optimization
import pyomo.environ as pyo

#omlt for interfacing our neural network with pyomo
from omlt import OmltBlock
from omlt.neuralnet import FullSpaceNNFormulation
from omlt.io.onnx import write_onnx_model_with_bounds, load_onnx_neural_network_with_bounds

## Import the Data and Train a Neural Network

We begin by loading the MNIST dataset as `DataLoader` objects with pre-set training and testing batch sizes:

In [3]:
#set training and test batch sizes
train_kwargs = {'batch_size': 64}
test_kwargs = {'batch_size': 1000}

#build DataLoaders for training and test sets
dataset1 = datasets.MNIST('../data', train=True, download=True, transform=transforms.ToTensor())
dataset2 = datasets.MNIST('../data', train=False, transform=transforms.ToTensor())
train_loader = torch.utils.data.DataLoader(dataset1,**train_kwargs, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset2, **test_kwargs)
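
As a rough check on what these batch sizes imply, the sketch below counts the batches per epoch. It assumes the standard MNIST split of 60,000 training images (not stated in this notebook) and the 10,000 test images reported in the accuracy output, and it relies on `DataLoader` keeping the final partial batch by default.

```python
import math

# Dataset sizes: 60,000 is the standard MNIST training split (an assumption here);
# 10,000 matches the test-set size printed in the accuracy lines of this notebook.
n_train, n_test = 60_000, 10_000
train_batch, test_batch = 64, 1000

# DataLoader keeps the last partial batch by default (drop_last=False), hence the ceiling.
train_batches = math.ceil(n_train / train_batch)
test_batches = math.ceil(n_test / test_batch)

# Size of the final, partial training batch.
last_train_batch = n_train - (train_batches - 1) * train_batch

print(train_batches, test_batches, last_train_batch)
```

With a status line printed every 200 batches, this corresponds to five status lines per training epoch.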

Next, we define the structure of the convolutional neural network model:

In [4]:
hidden_size = 50

class Net(nn.Module):
    #define layers of neural network
    def __init__(self):
        super().__init__()
        self.conv1  = nn.Conv2d(1, 4, (4,4), (2,2), 0)
        self.conv2  = nn.Conv2d(4, 4, (4,4), (2,2), 0)
        self.hidden1 = nn.Linear(5*5*4, hidden_size)
        self.output  = nn.Linear(hidden_size, 10)
        self.relu = nn.ReLU()
        self.softmax = nn.LogSoftmax(dim=1)

    #define forward pass of neural network
    def forward(self, x):
        self.x1 = self.conv1(x)
        self.x2 = self.relu(self.x1)
        self.x3 = self.conv2(self.x2)
        self.x4 = self.relu(self.x3)
        self.x5 = self.hidden1(self.x4.view((-1,5*5*4)))
        self.x6 = self.relu(self.x5)
        self.x7 = self.output(self.x6)
        x = self.softmax(self.x7)      
        return x
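
The `5*5*4` input size of `hidden1` follows from the two convolutions. The sketch below reproduces that arithmetic with the usual output-size formula for a convolution without dilation; `conv_out` is a hypothetical helper written for this illustration, not part of PyTorch or OMLT.

```python
def conv_out(size, kernel, stride, padding=0):
    """Spatial output size of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1

# MNIST images are 28x28; both layers use a 4x4 kernel, stride 2, no padding.
h1 = conv_out(28, kernel=4, stride=2)   # after conv1
h2 = conv_out(h1, kernel=4, stride=2)   # after conv2

# conv2 has 4 output channels, so the flattened feature vector has 4 * h2 * h2 entries.
flat_features = 4 * h2 * h2

print(h1, h2, flat_features)
```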

We next define simple functions for training and testing the neural network:

In [5]:
#training function computes loss and its gradient on batch, and prints status after every 200 batches
def train(model, train_loader, optimizer, epoch):
    model.train()
    criterion = nn.NLLLoss()
    for batch_idx, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 200  == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

#testing function computes loss and prints overall model accuracy on test set
def test(model, test_loader):
    model.eval()
    criterion = nn.NLLLoss(reduction='sum')
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            output = model(data)
            test_loss += criterion(output, target).item()  
            pred = output.argmax(dim=1, keepdim=True) 
            correct += pred.eq(target.view_as(pred)).sum().item()
    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset), 100. * correct / len(test_loader.dataset)))            

Finally, we train the neural network on the dataset.
Training here is performed using the `Adadelta` optimizer for five epochs.

In [6]:
#define model and optimizer
model = Net()
optimizer = optim.Adadelta(model.parameters(), lr=1)
scheduler = StepLR(optimizer, step_size=1, gamma=0.7)

#train CNN model for five epochs
for epoch in range(5):
    train(model, train_loader, optimizer, epoch)
    test(model, test_loader)
    scheduler.step()


Test set: Average loss: 0.1388, Accuracy: 9571/10000 (96%)


Test set: Average loss: 0.1090, Accuracy: 9677/10000 (97%)


Test set: Average loss: 0.0893, Accuracy: 9715/10000 (97%)


Test set: Average loss: 0.0821, Accuracy: 9734/10000 (97%)


Test set: Average loss: 0.0742, Accuracy: 9767/10000 (98%)
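
The training loop above decays the learning rate with `StepLR(step_size=1, gamma=0.7)`. As a minimal sketch, the closed form of that step schedule gives the learning rate used in each of the five epochs; the variable names here are illustrative only.

```python
# Step decay: the rate is multiplied by gamma once every step_size epochs.
base_lr, gamma, step_size = 1.0, 0.7, 1

learning_rates = [base_lr * gamma ** (epoch // step_size) for epoch in range(5)]

for epoch, lr in enumerate(learning_rates):
    print(f"epoch {epoch}: lr = {lr:.4f}")
```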



## Build a MIP Formulation for the Trained Convolutional Neural Network

We are now ready to use OMLT to formulate the trained model within a Pyomo optimization model. The nonsmooth ReLU activation function requires a full-space representation, which is provided by the `FullSpaceNNFormulation` object.

First, we define a neural network without the final `LogSoftmax` activation. Although this activation helps greatly in training the neural network model, it is not trivial to encode in the optimization model. The ranking of the output labels remains the same without the activation, so it can be omitted when finding optimal adversaries. 

In [7]:
class NoSoftmaxNet(nn.Module):
    #define layers of neural network
    def __init__(self):
        super().__init__()
        self.conv1  = nn.Conv2d(1, 4, (4,4), (2,2), 0)
        self.conv2  = nn.Conv2d(4, 4, (4,4), (2,2), 0)
        self.hidden1 = nn.Linear(5 * 5 * 4, hidden_size)
        self.output  = nn.Linear(hidden_size, 10)
        self.relu = nn.ReLU()

    #define forward pass of neural network
    def forward(self, x):
        self.x1 = self.conv1(x)
        self.x2 = self.relu(self.x1)
        self.x3 = self.conv2(self.x2)
        self.x4 = self.relu(self.x3)
        self.x5 = self.hidden1(self.x4.view((-1,5*5*4)))
        self.x6 = self.relu(self.x5)
        x = self.output(self.x6)    
        return x

#create neural network without LogSoftmax and load parameters from existing model
model2 = NoSoftmaxNet()
model2.load_state_dict(model.state_dict())

<All keys matched successfully>
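
The claim that dropping `LogSoftmax` leaves the label ranking unchanged can be checked numerically. The sketch below uses made-up logits and a hand-written `log_softmax` (both are illustrative, not taken from the trained model): log-softmax subtracts the same constant from every logit, so both the ordering and any difference between two outputs are preserved.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax of a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]

# Made-up logits for illustration only.
logits = [2.0, -1.0, 0.5, 3.2, 0.0]
log_probs = log_softmax(logits)

# The ordering of the labels is identical before and after log-softmax.
rank_logits = sorted(range(len(logits)), key=lambda i: logits[i])
rank_log_probs = sorted(range(len(log_probs)), key=lambda i: log_probs[i])

# Differences between two outputs are unchanged as well.
diff_logits = logits[3] - logits[0]
diff_log_probs = log_probs[3] - log_probs[0]

print(rank_logits == rank_log_probs, diff_logits, diff_log_probs)
```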

Next, we define an instance of the optimal adversary problem. We formulate the optimization problem as: <br>

$
\begin{align*}
\max_x \quad & y_k - y_j \\
\text{s.t.} \quad & y_k = N_k(x) \\
& y_j = N_j(x) \\
& \|x - \bar{x}\|_\infty \leq \epsilon
\end{align*}
$

where $\bar{x}$ corresponds to an image in the test dataset with true label `j`, $N_k(x)$ and $N_j(x)$ are the values of the CNN outputs corresponding to the adversarial label `k` and the true label `j` given input `x`, and $\epsilon$ is the radius of the input region (set to `1e-3` through `epsilon_infty` in the code below). PyTorch needs to trace the model execution to export it to ONNX, so we also define a dummy input tensor `x`.

In [8]:
#load image and true label from test set with index 'problem_index'
problem_index = 0
image = dataset2[problem_index][0].detach().numpy()
label = dataset2[problem_index][1]

#define input region defined by infinity norm
epsilon_infty = 1e-3
lb = np.maximum(0, image - epsilon_infty)
ub = np.minimum(1, image + epsilon_infty)

#save input bounds as dictionary, note that the first index 0 corresponds to the single-channel input
input_bounds = {}
for i in range(28):
    for j in range(28):
        input_bounds[(0,i,j)] = (float(lb[0][i,j]), float(ub[0][i,j])) 
    
#define dummy input tensor    
x = dataset2[problem_index][0].view(-1,1,28,28)
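
To make the bound construction concrete, here is a small sketch on a made-up 1x2x2 "image" instead of the 28x28 MNIST input. It applies the same clipping to `[0, 1]` and builds the `(channel, row, column)` dictionary with `np.ndindex`, which is an alternative to the nested loops above rather than what the notebook itself uses.

```python
import numpy as np

# Toy single-channel 2x2 image with pixel values in [0, 1] (illustrative data only).
image = np.array([[[0.0, 0.0005], [0.5, 1.0]]])
eps = 1e-3

# Infinity-norm ball around the image, clipped to the valid pixel range.
lb = np.maximum(0, image - eps)
ub = np.minimum(1, image + eps)

# Keys are (channel, row, column) tuples; values are (lower, upper) pairs.
input_bounds = {
    idx: (float(lb[idx]), float(ub[idx])) for idx in np.ndindex(image.shape)
}

for idx, bounds in input_bounds.items():
    print(idx, bounds)
```

Pixels near 0 or 1 end up with a narrower interval than `2 * eps` because of the clipping.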

We can now export the PyTorch model as an ONNX model and use `load_onnx_neural_network_with_bounds` to load it into OMLT.

In [9]:
with tempfile.NamedTemporaryFile(suffix='.onnx', delete=False) as f:
    #export neural network to ONNX
    torch.onnx.export(
        model2,
        x,
        f,
        input_names=['input'],
        output_names=['output'],
        dynamic_axes={
            'input': {0: 'batch_size'},
            'output': {0: 'batch_size'}
        }
    )
    #write ONNX model and its bounds using OMLT
    write_onnx_model_with_bounds(f.name, None, input_bounds)
    #load the network definition from the ONNX model
    network_definition = load_onnx_neural_network_with_bounds(f.name)

As a sanity check before creating the optimization model, we can print the properties of the neural network layers from `network_definition`. This allows us to check input/output sizes, as well as activation functions.

In [10]:
for layer_id, layer in enumerate(network_definition.layers):
    print(f"{layer_id}\t{layer}\t{layer.activation}")

0	InputLayer(input_size=[1, 28, 28], output_size=[1, 28, 28])	linear
1	ConvLayer(input_size=[1, 28, 28], output_size=[4, 13, 13], strides=[2, 2], kernel_shape=(4, 4))	relu
2	ConvLayer(input_size=[4, 13, 13], output_size=[4, 5, 5], strides=[2, 2], kernel_shape=(4, 4))	relu
3	DenseLayer(input_size=[1, 100], output_size=[1, 50])	relu
4	DenseLayer(input_size=[1, 50], output_size=[1, 10])	linear
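
The layer table also gives a quick way to anticipate the size of the mixed-integer model. Assuming the big-M encoding introduces one binary variable per ReLU unit (the standard construction; this is an assumption about the formulation, not something printed above), the sketch below counts the ReLU units from the output sizes listed in the table.

```python
from math import prod

# Output sizes of the three layers with ReLU activation, copied from the table above.
relu_layer_output_sizes = [[4, 13, 13], [4, 5, 5], [1, 50]]

# One ReLU unit per output entry of each layer.
relu_units_per_layer = [prod(size) for size in relu_layer_output_sizes]
relu_units = sum(relu_units_per_layer)

print(relu_units_per_layer, relu_units)
```

The total agrees with the 826 binary variables reported in the Gurobi log further down.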


Finally, we can load `network_definition` as a full-space `FullSpaceNNFormulation` object.

In [11]:
formulation = FullSpaceNNFormulation(network_definition)
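
For intuition about the big-M encoding mentioned in the introduction, the sketch below checks the textbook big-M constraints for a single ReLU unit `y = max(0, z)` with pre-activation bounds `lower <= z <= upper` and a binary `sigma`. The function names and the bounds are hypothetical, and this is the standard form of the encoding, not necessarily the exact constraints OMLT generates.

```python
def bigm_relu_feasible(z, y, sigma, lower, upper, tol=1e-9):
    """Check the four big-M constraints for one ReLU unit y = max(0, z)."""
    return (
        y >= -tol                               # y >= 0
        and y >= z - tol                        # y >= z
        and y <= z - lower * (1 - sigma) + tol  # tight (y <= z) when sigma = 1
        and y <= upper * sigma + tol            # forces y = 0 when sigma = 0
    )

def feasible_for_some_sigma(z, y, lower, upper):
    """True if some binary choice of sigma makes (z, y) feasible."""
    return any(bigm_relu_feasible(z, y, s, lower, upper) for s in (0, 1))

# Hypothetical pre-activation bounds, as would come from bound propagation.
lower, upper = -2.0, 3.0

checks = []
for z in (-1.5, 0.0, 2.5):
    relu = max(0.0, z)
    checks.append((
        z,
        feasible_for_some_sigma(z, relu, lower, upper),        # true ReLU value
        feasible_for_some_sigma(z, relu + 0.5, lower, upper),  # perturbed value
    ))

print(checks)
```

In each case the true ReLU value is feasible for one choice of `sigma`, while the perturbed value is rejected for both.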

## Solve Optimal Adversary Problem in Pyomo

We now encode the trained neural network in a Pyomo model from the `FullSpaceNNFormulation` object. 

In [12]:
#create pyomo model
m = pyo.ConcreteModel()

#create an OMLT block for the neural network and build its formulation
m.nn = OmltBlock()
m.nn.build_formulation(formulation) 

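Next, we define an adversarial label as the true label plus one (or zero if the true label is nine), as well as the objective function for optimization. Because Pyomo objectives are minimized by default, the difference between the adversarial and true outputs is negated.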

In [13]:
adversary = (label + 1) % 10
m.obj = pyo.Objective(expr=(-(m.nn.outputs[0,adversary]-m.nn.outputs[0,label])))

Finally, we solve the optimal adversary problem using a mixed-integer solver.

In [14]:
solver = pyo.SolverFactory('gurobi')
solver.options['mipgap'] = 0.01
solver.solve(m, tee=True)

Academic license - for non-commercial use only - expires 2023-01-12
Using license file /Users/calvintsay/gurobi.lic
Read LP format model from file /var/folders/pc/7mzx4b956_lb2l8_ryngwydc0000gn/T/tmpcs4b0910.pyomo.lp
Reading time = 0.04 seconds
x4871: 5739 rows, 4871 columns, 33357 nonzeros
Changed value of parameter mipgap to 0.01
   Prev: 0.0001  Min: 0.0  Max: inf  Default: 0.0001
Gurobi Optimizer version 9.1.1 build v9.1.1rc0 (mac64)
Thread count: 4 physical cores, 8 logical processors, using up to 8 threads
Optimize a model with 5739 rows, 4871 columns and 33357 nonzeros
Model fingerprint: 0xe52cb635
Variable types: 4045 continuous, 826 integer (826 binary)
Coefficient statistics:
  Matrix range     [2e-05, 3e+01]
  Objective range  [1e+00, 1e+00]
  Bounds range     [2e-04, 1e+02]
  RHS range        [2e-04, 2e+01]
Presolve removed 3931 rows and 2839 columns
Presolve time: 0.05s
Presolved: 1808 rows, 2032 columns, 11310 nonzeros
Variable types: 1580 continuous, 452 integer (452 bin

{'Problem': [{'Name': 'x4871', 'Lower bound': 14.632881835188687, 'Upper bound': 14.742084042745333, 'Number of objectives': 1, 'Number of constraints': 5739, 'Number of variables': 4871, 'Number of binary variables': 826, 'Number of integer variables': 826, 'Number of continuous variables': 4045, 'Number of nonzeros': 33357, 'Sense': 'minimize'}], 'Solver': [{'Status': 'ok', 'Return code': '0', 'Message': 'Model was solved to optimality (subject to tolerances), and an optimal solution is available.', 'Termination condition': 'optimal', 'Termination message': 'Model was solved to optimality (subject to tolerances), and an optimal solution is available.', 'Wall time': '0.21825098991394043', 'Error rc': 0, 'Time': 0.4484848976135254}], 'Solution': [OrderedDict([('number of solutions', 0), ('number of solutions displayed', 0)])]}
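
The reported bounds can be read as follows. This sketch assumes that, for this minimization problem, `Upper bound` is the objective of the best solution found and `Lower bound` is the solver's proven bound; the two numbers are copied from the results dictionary above.

```python
# Values copied from the solver results above.
lower_bound = 14.632881835188687
upper_bound = 14.742084042745333

# Relative gap between the incumbent and the proven bound.
rel_gap = abs(upper_bound - lower_bound) / abs(upper_bound)

# The objective is -(y_adversary - y_label), i.e. y_label - y_adversary.
# A positive lower bound means the true label outscores the adversarial
# label for every input in the region.
adversary_possible = lower_bound < 0

print(f"relative gap: {rel_gap:.4f}, adversary possible: {adversary_possible}")
```

Under these assumptions the gap is below the requested `mipgap` of 0.01, and no input within the chosen region makes the adversarial label score higher than the true label for this image.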