# Robust methods for Machine Learning

## Let's start simple: attack a linear model

#### Tutorial #1 (Anne Gagneux)

In [None]:
import numpy as np
import matplotlib.pyplot as plt

from torchvision import datasets, transforms
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

We are going to attack a linear model for binary classification.
We focus on the MNIST dataset, keeping only the digits $3$ and $7$.
In our setting, $\mathbf X_{\text{train}}$ is the set of training images ($3$s and $7$s) and $y_{\text{train}}$ contains the matching ground-truth labels.

Our linear model builds a decision function based on a hyperplane:
$$ y_{\text{pred}} = \text{sign} (w^T x + b) $$

The algorithm, here logistic regression, learns $w$ and $b$ from the training data.
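Since our labels are $3$ and $7$ rather than $\pm 1$, it helps to make the sign convention explicit. In scikit-learn's binary `LogisticRegression`, a positive score $w^T x + b$ corresponds to `classes_[1]` (here the $7$, since classes are sorted) and a non-positive score to `classes_[0]` (the $3$). The sketch below mimics that rule with NumPy only; `w_demo`, `b_demo`, `classes` and `X_demo` are hypothetical values chosen for illustration, not the fitted model.

```python
import numpy as np

# Hypothetical weights, intercept and class labels, for illustration only
w_demo = np.array([1.0, -2.0])
b_demo = 0.5
classes = np.array([3, 7])  # sorted, like LogisticRegression.classes_

X_demo = np.array([[2.0, 0.0],    # score = 2.5  -> positive side
                   [0.0, 1.0],    # score = -1.5 -> negative side
                   [1.0, 1.0]])   # score = -0.5 -> negative side

scores = X_demo @ w_demo + b_demo
# Positive score -> classes[1] (the 7), otherwise classes[0] (the 3)
pred = classes[(scores > 0).astype(int)]
print(scores, pred)
```

With this convention, pushing an image's score from positive to negative is what turns a predicted $7$ into a predicted $3$.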

In [None]:
# Load MNIST
mnist_train = datasets.MNIST("./data", train=True, download=True)
mnist_test = datasets.MNIST("./data", train=False, download=True)

# Only keep 3 and 7
train_idx = (mnist_train.targets == 3) | (mnist_train.targets == 7)

mnist_train.data = mnist_train.data[train_idx]
mnist_train.targets = mnist_train.targets[train_idx]

test_idx = (mnist_test.targets == 3) | (mnist_test.targets == 7)
mnist_test.data = mnist_test.data[test_idx]
mnist_test.targets = mnist_test.targets[test_idx]

X_train, y_train = mnist_train.data.numpy(), mnist_train.targets.numpy()
X_train = X_train.reshape(X_train.shape[0], -1)

# scale the data to ease optimization
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)

X_test, y_test = mnist_test.data.numpy(), mnist_test.targets.numpy()
X_test = X_test.reshape(X_test.shape[0], -1)
X_test = scaler.transform(X_test)

In [None]:
# Train logistic regression
logreg = LogisticRegression(solver='lbfgs', max_iter=2000)
logreg.fit(X_train, y_train)
print('accuracy on test set = {}'.format(logreg.score(X_test, y_test)))

In [None]:
# Pick a point in the dataset
x1 = X_test[0]


def show(x, classifier, ax=None):
    """Plot a standardized image x with the classifier's prediction and confidence."""
    if ax is None:
        fig, ax = plt.subplots(1, 1)
    ax.set_title('Prediction: %s \n Confidence: %d %%' %
                 (classifier.predict([x])[0],
                  100 * classifier.predict_proba([x]).max()),
                 fontsize=14)
    # Undo the standardization to display the image in the original pixel range
    xx = scaler.inverse_transform([x]).reshape((28, 28))
    ax.imshow(xx, cmap=plt.cm.gray_r, vmin=0, vmax=255)
    ax.axis('off')


show(x1, logreg)

![Projection](decision-boundary.jpg)

Let $x_1$ denote our image.
The closest point on the decision frontier is the orthogonal projection of $x_1$ onto the hyperplane $w^T x + b = 0$.

<span style="color:orange">**Write the projection operator onto the hyperplane**</span>

In [None]:
w = logreg.coef_[0]
b = logreg.intercept_[0]  # scalar intercept

# Orthogonal projection of x1 onto the decision frontier w @ x + b = 0
x_L2 = x1 - (b + w @ x1) / np.linalg.norm(w) ** 2 * w

# print(w @ x_L2 + b)  # should be 0
show(x_L2, logreg)
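Here is one way to derive the projection formula, written for a generic point $x_1$. Moving in any direction orthogonal to $w$ adds length without changing $w^T x + b$, so we look for the projection in the form $x_1 + t\,w$:
$$ w^T (x_1 + t\,w) + b = 0 \iff t = -\frac{w^T x_1 + b}{\Vert w \Vert_2^2} $$

This gives the projected point and its distance to $x_1$:
$$ x_{\ell_2} = x_1 - \frac{w^T x_1 + b}{\Vert w \Vert_2^2}\, w, \qquad \Vert x_{\ell_2} - x_1 \Vert_2 = \frac{|w^T x_1 + b|}{\Vert w \Vert_2} $$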

What if we want to force the prediction for $x_1$ to be a $3$?

<span style="color:orange">**Write an explicit formula forcing $x_1$ to be misclassified as a 3**</span>


![Projection](decision-boundary-2.jpg)
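One possible approach, sketched here on a toy hyperplane rather than on the fitted model, is to overshoot the orthogonal projection by a small factor so that the point lands strictly on the other side of the frontier. The names `w_demo`, `b_demo`, `x_demo`, `overshoot` and the margin `eps` are hypothetical and only serve the illustration; the sketch assumes the starting point has a positive score.

```python
import numpy as np

# Hypothetical hyperplane and point, standing in for (w, b) and x1
w_demo = np.array([2.0, -1.0, 0.5])
b_demo = 0.3
x_demo = np.array([1.0, 0.0, 2.0])


def overshoot(x, w, b, eps=0.05):
    """Move x along -w past the hyperplane w @ x + b = 0.

    eps = 0 gives the orthogonal projection; eps > 0 lands strictly on
    the other side, with score -eps * (w @ x + b).
    """
    score = w @ x + b
    return x - (1 + eps) * score / np.linalg.norm(w) ** 2 * w


x_flipped = overshoot(x_demo, w_demo, b_demo)
print(w_demo @ x_demo + b_demo, w_demo @ x_flipped + b_demo)
```

A larger `eps` makes the misclassification more confident, at the cost of a larger perturbation.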



In [None]:
x3 = ... # TO COMPLETE
show(x3, logreg)

Up to now, we have minimized the $\ell_2$ distance.
Indeed, the orthogonal projection is the solution of:
$$\min_x \Vert x-x_1 \Vert_2 \text{ subject to } w^Tx+ b = 0$$

What if we instead want to minimize the largest change applied to any single pixel?
$\rightarrow$ We use the $\ell_\infty$ distance.

Our new minimization problem is:
$$\min_x \Vert x-x_1 \Vert_\infty \text{ subject to } w^Tx+ b = 0$$

<span style="color:orange">**Solve the $\ell_\infty$ optimization problem**</span>

*Recall (Hölder's Inequality)*
$$|x^T y| \leq \Vert x \Vert_1 \Vert y \Vert_\infty$$

![Projection infty](decision-boundary-infty.jpg)
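A sketch of how Hölder's inequality leads to a solution: write $\delta = x - x_1$, so the constraint becomes $w^T \delta = -(w^T x_1 + b)$. Then
$$ |w^T x_1 + b| = |w^T \delta| \leq \Vert w \Vert_1 \Vert \delta \Vert_\infty \quad\Longrightarrow\quad \Vert \delta \Vert_\infty \geq \frac{|w^T x_1 + b|}{\Vert w \Vert_1} $$

This lower bound is attained when every pixel moves by the same amount, in the direction given by the sign of its weight. Since $w^T \text{sign}(w) = \Vert w \Vert_1$, the following point satisfies the constraint:
$$ x_{\ell_\infty} = x_1 - \frac{w^T x_1 + b}{\Vert w \Vert_1}\, \text{sign}(w) $$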

In [None]:
x_Linfty = x1 - (b + w @ x1) / np.sum(np.abs(w)) * np.sign(w)
show(x_Linfty, logreg)

**Bonus**: <span style="color:orange">**Solve the $\ell_1$ optimization problem**</span>

You can still use Hölder's inequality, this time swapping the roles of $x$ and $y$:
$$|x^T y| \leq \Vert y \Vert_1 \Vert x \Vert_\infty$$

In [None]:
# The l1-minimal perturbation changes a single pixel: the one with the largest |w|
perturb_L1 = np.zeros_like(w)
idx = np.argmax(np.abs(w))
perturb_L1[idx] = np.sign(w[idx])
x_L1 = x1 - (b + w @ x1) / np.max(np.abs(w)) * perturb_L1
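As a sanity check, the three closed-form perturbations can be compared on a small random hyperplane. This is a minimal sketch: `w_demo`, `b_demo` and `x_demo` are hypothetical stand-ins for the fitted weights and the image. Each perturbation should bring the point onto the frontier, and each should be the smallest of the three in its own norm.

```python
import numpy as np

rng = np.random.default_rng(0)
w_demo = rng.normal(size=20)   # hypothetical weights
b_demo = 0.5                   # hypothetical intercept
x_demo = rng.normal(size=20)   # hypothetical input
score = w_demo @ x_demo + b_demo

# Minimal perturbations (x_adv - x) for each norm
d_l2 = -score / np.linalg.norm(w_demo) ** 2 * w_demo
d_linf = -score / np.sum(np.abs(w_demo)) * np.sign(w_demo)
k = np.argmax(np.abs(w_demo))
d_l1 = np.zeros_like(w_demo)
d_l1[k] = -score / w_demo[k]   # only the most influential coordinate moves

for name, d in [("l2", d_l2), ("linf", d_linf), ("l1", d_l1)]:
    residual = w_demo @ (x_demo + d) + b_demo  # numerically zero on the frontier
    print(name, residual,
          np.linalg.norm(d, 1), np.linalg.norm(d, 2), np.linalg.norm(d, np.inf))
```

Reading the printed norms column by column shows the trade-off: the $\ell_1$ attack changes one coordinate by a large amount, while the $\ell_\infty$ attack changes every coordinate by a small amount.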

Let's visualize the original digit together with the three attacks ($\ell_1$, $\ell_2$, $\ell_\infty$).

In [None]:
fig, axes = plt.subplots(1, 4, constrained_layout=True, figsize=(10, 20))
for x, ax in zip([x1, x_L1, x_L2, x_Linfty], axes):
    show(x, logreg, ax)

We can also visualize the adversarial perturbations:

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(10, 9), constrained_layout=True)

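names = [r"$\ell_\infty$", r"$\ell_2$", r"$\ell_1$"]
for x_adv, name, ax in zip([x_Linfty, x_L2, x_L1], names, axes):
    # Perturbation x1 - x_adv, in standardized pixel units
    im = ax.imshow((x1 - x_adv).reshape(28, 28), cmap=plt.cm.gray_r)
    ax.set_title(name)
    fig.colorbar(im, ax=ax, fraction=0.046, pad=0.04)
    ax.axis("off")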