# ART

The Adversarial Robustness Toolbox (ART) is a Python library that gives developers and researchers a comprehensive set of tools for evaluating the robustness of machine learning models, including deep neural networks, against adversarial attacks. Open-sourced by IBM, ART provides techniques for defending against adversarial attacks on models built with TensorFlow, Keras, PyTorch, scikit-learn, MXNet, XGBoost, LightGBM, CatBoost and many other machine learning frameworks. It can be applied to many kinds of data, including images, video, tabular data and audio. It is cross-platform and supports machine learning tasks such as classification, speech recognition, object detection, generation and certification.

Refer to [this](https://analyticsindiamag.com/adversarial-robustness-toolbox-art/) article to learn more about it.


# Adversarial Action Recognition Attack

This notebook demonstrates how to use the ART library to perform an adversarial attack on video action recognition. Action recognition is done with GluonCV and MXNet, using a pre-trained classification model from the GluonCV model zoo: `i3d_resnet50_v1_ucf101`. The input is a short video clip of a basketball action taken from the UCF101 dataset, and the first step is to show that the model classifies this clip correctly.

# Initial working stages  

> * Download the basketball video sample.
> * Load the pre-trained action recognition model.
> * Show that the model correctly classifies the video action as playing basketball.

## Loading Model and Basketball Sample

In [None]:
!python -m pip install pip --upgrade --user -q
!python -m pip install numpy pandas seaborn matplotlib scipy scikit-learn statsmodels tensorflow keras --user -q

In [None]:
!python -m pip install decord --user -q
!python -m pip install gluoncv --user -q
!python -m pip install mxnet --user -q
!python -m pip install adversarial-robustness-toolbox --user -q

In [None]:
# restarting the kernel so that the freshly installed packages are picked up
import IPython
IPython.Application.instance().kernel.do_shutdown(True)

In [None]:
import os
import tempfile
import decord
from gluoncv import utils
from gluoncv.data.transforms import video
from gluoncv.model_zoo import get_model
from gluoncv.utils.filesystem import try_import_decord
import imageio
from matplotlib.image import imsave
import matplotlib.pyplot as plt
import mxnet as mx
from mxnet import gluon, nd, image
from mxnet.gluon.data.vision import transforms
import numpy as np
from art.attacks.evasion import FastGradientMethod, FrameSaliencyAttack
from art import config
from art.defences.preprocessor import VideoCompression
from art.estimators.classification import MXClassifier 

In [None]:
# setting global variables

PRETRAINED_MODEL_NAME = 'i3d_resnet50_v1_ucf101'
VIDEO_SAMPLE_URI = 'https://github.com/bryanyzhu/tiny-ucf101/raw/master/v_Basketball_g01_c01.avi' 

### Setting the seed and defining helper functions

In [None]:
np.random.seed(123)

def predict_top_k(video_input, model, k=5, verbose=True):
    """Print and return the indices of the k most likely classes for a normalized video input."""
    pred = model(nd.array(video_input))
    classes = model.classes
    probs = nd.softmax(pred)
    ind = nd.topk(pred, k=k)[0].astype('int')
    if verbose:
        msg = "The sample video clip is"
        for i in range(k):
            msg += f"\n\t[{classes[ind[i].asscalar()]}], with probability {probs[0][ind[i]].asscalar():.3f}."
        print(msg)
    return ind

def sample_to_gif(sample, output="sample.gif", path=config.ART_DATA_PATH):
    """Write a (channels, frames, height, width) sample with values in [0, 1] to an animated GIF."""
    frame_count = sample.shape[1]
    output_path = os.path.join(path, output)
    with tempfile.TemporaryDirectory() as tmpdir, imageio.get_writer(output_path, mode='I') as writer:
        for frame in range(frame_count):
            # save each frame as a channels-last PNG, then append it to the GIF
            file_path = os.path.join(tmpdir, f"{frame}.png")
            imsave(file_path, np.transpose(sample[:, frame, :, :], (1, 2, 0)))
            writer.append_data(imageio.imread(file_path))
    return output_path
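
`predict_top_k` ranks classes with MXNet's `nd.topk` and turns scores into probabilities with `nd.softmax`. The same logic can be sketched in plain NumPy, which may help readers unfamiliar with the NDArray API. The names `softmax` and `top_k_softmax` and the three toy classes are illustrative assumptions, not part of GluonCV or ART.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def top_k_softmax(logits, class_names, k=5):
    """Return the k most likely (class name, probability) pairs for one sample.

    `logits` is assumed to have shape (1, num_classes), like the model output above.
    """
    probs = softmax(logits)[0]
    # argsort of the negated probabilities lists class indices from most to least likely
    top_idx = np.argsort(-probs)[:k]
    return [(class_names[i], float(probs[i])) for i in top_idx]


# Toy logits for three hypothetical classes
logits = np.array([[2.0, 0.5, 1.0]])
for name, prob in top_k_softmax(logits, ["Basketball", "Biking", "Diving"], k=2):
    print(f"[{name}], with probability {prob:.3f}.")
```

Ranking by softmax probability gives the same order as ranking by raw logits, since softmax is monotonic; the probabilities are only needed for display.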

### Downloading sample video

In [None]:
decord = try_import_decord()
video_fname = utils.download(VIDEO_SAMPLE_URI, path=config.ART_DATA_PATH)
video_reader = decord.VideoReader(video_fname)
# take every second frame from the first 64, giving 32 frames (matching the input shape used below)
frame_id_list = range(0, 64, 2)
video_data = video_reader.get_batch(frame_id_list).asnumpy()
video_sample_lst = [video_data[vid, :, :, :] for vid, _ in enumerate(frame_id_list)]
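
`range(0, 64, 2)` keeps every second frame of the first 64, which yields 32 frames. A NumPy-only sketch of the same sampling on a dummy array makes the resulting shapes explicit; the zero-filled clip and its 100 frames of 240x320 pixels are assumptions standing in for the decoded video.

```python
import numpy as np

# Hypothetical stand-in for the decoded clip: 100 RGB frames of 240x320 pixels,
# laid out as (frames, height, width, channels) like decord's asnumpy() output.
num_frames, height, width = 100, 240, 320
clip = np.zeros((num_frames, height, width, 3), dtype=np.uint8)

# Same sampling as above: every second frame among the first 64
frame_id_list = list(range(0, 64, 2))
sampled = clip[frame_id_list]

# Split into a list of per-frame arrays, the form passed to the GluonCV transform
video_sample_lst = list(sampled)

print(len(frame_id_list), sampled.shape)
```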

### Preprocessing the benign video sample

In [None]:
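# center-crop each frame to 224x224 and normalize with the ImageNet mean/std (the same values passed to ART below)
transform_fn = video.VideoGroupValTransform(size=224, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
sample = np.stack(transform_fn(video_sample_lst), axis=0)
# (frames, channels, height, width) -> (batch, channels, frames, height, width)
sample = sample.reshape((-1,) + (32, 3, 224, 224))
sample = np.transpose(sample, (0, 2, 1, 3, 4))
print(f"`{video_fname}` has been downloaded and preprocessed.")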
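
Assuming, as the reshape above implies, that the GluonCV transform returns one `(channels, height, width)` array per frame, the stacking, reshaping and transposing produce a single `(batch, channels, frames, height, width)` tensor. The dummy-data sketch below traces those shapes step by step; the random frames are placeholders for the real transformed clip.

```python
import numpy as np

# Placeholder for the 32 transformed frames, each (channels, height, width)
frames = [np.random.rand(3, 224, 224).astype(np.float32) for _ in range(32)]

stacked = np.stack(frames, axis=0)                # (32, 3, 224, 224): frames, C, H, W
batched = stacked.reshape((-1,) + stacked.shape)  # (1, 32, 3, 224, 224): batch, frames, C, H, W
ncthw = np.transpose(batched, (0, 2, 1, 3, 4))    # (1, 3, 32, 224, 224): batch, C, frames, H, W

# Frame t of the clip is now ncthw[0, :, t]
print(ncthw.shape, np.array_equal(ncthw[0, :, 5], frames[5]))
```

Putting frames on axis 2 is what later lets the frame saliency attack be told `frame_index=2`.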

In [None]:
# loading pretrained model

model = get_model(PRETRAINED_MODEL_NAME, nclass=101, pretrained=True)
print(f"`{PRETRAINED_MODEL_NAME}` model was successfully loaded.") 

In [None]:
# evaluating model on basketball video sample

_ = predict_top_k(sample, model)

For this video sample, the model correctly classifies the action as playing basketball.

Next, the ART library is used to attack the model with the Fast Gradient Method (FGM). The attack perturbs the video sample so that the model misclassifies it. The resulting adversarial example is also saved as a GIF.
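
FGM takes a single step of size ε in the direction of the sign of the loss gradient with respect to the input (the L∞ variant, which is ART's default norm) and clips the result to the valid pixel range. ART obtains that gradient from the wrapped MXNet model; the sketch below swaps in a hypothetical toy logistic-regression model with a hand-derived gradient so the update rule can be seen in isolation. The weights, input and label are made up for illustration.

```python
import numpy as np


def fgm_linf(x, grad, eps, clip_min=0.0, clip_max=1.0):
    """Untargeted L-infinity fast gradient step: x + eps * sign(grad), clipped to the valid range."""
    return np.clip(x + eps * np.sign(grad), clip_min, clip_max)


# Hypothetical toy model: binary logistic regression with random weights
rng = np.random.default_rng(0)
w = rng.normal(size=16)
b = 0.0
x = rng.uniform(0.0, 1.0, size=16)  # a "clean" input with values in [0, 1]
y = 1.0                             # assumed true label


def loss_grad_wrt_input(x, y):
    """Gradient of the binary cross-entropy loss w.r.t. x for p = sigmoid(w . x + b)."""
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    return (p - y) * w


epsilon = 8 / 255
x_adv = fgm_linf(x, loss_grad_wrt_input(x, y), epsilon)

print("max change:", np.max(np.abs(x_adv - x)))
print("logit before/after:", w @ x + b, w @ x_adv + b)
```

Every input value moves by at most ε, yet all moves push the true-class logit down together, which is why a perturbation this small can still change a prediction.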

*Figure: Adversarial Basketball (`adversarial_basketball.gif`)*

In [None]:
# preprocessing the benign sample without normalization (values stay in [0, 1]);
# this is the input the attack perturbs, and the ART classifier normalizes it internally

transform_fn_unnormalized = video.VideoGroupValTransform(size=224, mean=[0, 0, 0], std=[1, 1, 1])
adv_sample_input = np.stack(transform_fn_unnormalized(video_sample_lst), axis=0)
adv_sample_input = adv_sample_input.reshape((-1,) + (32, 3, 224, 224))
adv_sample_input = np.transpose(adv_sample_input, (0, 2, 1, 3, 4))

In [None]:
# wrapping the model in an ART classifier

model_wrapper = gluon.nn.Sequential()
with model_wrapper.name_scope():
    model_wrapper.add(model) 

# preparing per-channel mean and std arrays for ART classifier preprocessing;
# order='F' makes the channel axis vary fastest, so each channel gets its own value

mean = np.array([0.485, 0.456, 0.406] * (32 * 224 * 224)).reshape((3, 32, 224, 224), order='F')
std = np.array([0.229, 0.224, 0.225] * (32 * 224 * 224)).reshape((3, 32, 224, 224), order='F')
classifier_art = MXClassifier(
    model=model_wrapper,
    loss=gluon.loss.SoftmaxCrossEntropyLoss(),
    input_shape=(3, 32, 224, 224),
    nb_classes=101,
    preprocessing=(mean, std),
    clip_values=(0, 1),
    channels_first=True,
) 
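
The `order='F'` reshape is a compact way to build full-size mean and std arrays: in Fortran order the first axis varies fastest, so consecutive entries of the repeated `[0.485, 0.456, 0.406]` pattern land in channels 0, 1 and 2 of the same (frame, row, column) position. The check below, on a small 3x4x5x6 volume chosen arbitrarily to keep it fast, suggests this is equivalent to ordinary per-channel broadcasting, so normalizing with either form gives the same result.

```python
import numpy as np

channel_mean = [0.485, 0.456, 0.406]
channel_std = [0.229, 0.224, 0.225]
frames, height, width = 4, 5, 6  # small stand-in for (32, 224, 224)

# Construction used above: repeat the 3-value pattern, then reshape in Fortran order
mean_f = np.array(channel_mean * (frames * height * width)).reshape((3, frames, height, width), order='F')
std_f = np.array(channel_std * (frames * height * width)).reshape((3, frames, height, width), order='F')

# Equivalent per-channel broadcast: shape (3, 1, 1, 1) stretches over frames, rows and columns
mean_b = np.array(channel_mean).reshape(3, 1, 1, 1)
std_b = np.array(channel_std).reshape(3, 1, 1, 1)

x = np.random.default_rng(0).uniform(size=(1, 3, frames, height, width))
print(np.allclose(mean_f, np.broadcast_to(mean_b, mean_f.shape)))  # True
print(np.allclose((x - mean_f) / std_f, (x - mean_b) / std_b))      # True
```

The broadcast form uses far less memory than materializing a `(3, 32, 224, 224)` array, although the explicit arrays match the shape-based `preprocessing=(mean, std)` argument as written above.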

In [None]:
# verifying whether the ART classifier predictions are consistent with the original model:

pred = nd.array(classifier_art.predict(adv_sample_input))
ind = nd.topk(pred, k=5)[0].astype('int')
msg = "The video sample clip is classified"
for i in range(len(ind)):
    msg += f"\n\t[{model.classes[ind[i].asscalar()]}], with probability {nd.softmax(pred)[0][ind[i]].asscalar():.3f}."
print(msg) 

In [None]:
# crafting adversarial attack with FGM

epsilon = 8/255
fgm = FastGradientMethod(
    classifier_art,
    eps=epsilon,
)

In [None]:
adv_sample = fgm.generate(
    x=adv_sample_input
) 

In [None]:
# printing predictions on the adversarial sample (normalized as the model expects)

_ = predict_top_k((adv_sample-mean)/std, model)

In [None]:
# saving adversarial example to gif:

adversarial_gif = sample_to_gif(np.squeeze(adv_sample), "adversarial_basketball.gif")
print(f"`{adversarial_gif}` has been successfully created.") 



## Creating a Sparse Adversarial Attack

Next, the Frame Saliency Attack is used to create a sparse adversarial example, one in which only some of the frames are perturbed. The final result is shown in the GIF below: here, only one frame needs to be perturbed to achieve a misclassification.

*Figure: sparse adversarial basketball example (`adversarial_basketball_sparse.gif`)*
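
Conceptually, the `iterative_saliency` method, as ART's documentation describes it, ranks frames by saliency and adds the perturbation to one frame at a time, most salient first, until the attack succeeds. The sketch below illustrates that idea with a hypothetical toy linear classifier over a `(channels, frames, height, width)` volume. It is not ART's `FrameSaliencyAttack` implementation: the saliency rule (total absolute gradient per frame) is an assumption, and because the toy input has no batch axis its frame axis is 1 rather than 2.

```python
import numpy as np


def frame_saliency_attack(x, weights, bias, eps, frame_axis=1):
    """Toy sketch of a frame-saliency attack on a linear classifier.

    x and weights have shape (channels, frames, height, width); the model's decision
    is sign(sum(weights * x) + bias). Frames are perturbed one at a time, most salient
    first, until the decision flips. Returns (x_adv, frames_perturbed, success).
    """
    def score(v):
        return np.sum(weights * v) + bias

    original_sign = np.sign(score(x))
    # The score's gradient w.r.t. x is `weights`; an FGM-style step pushes against the decision
    step = -original_sign * eps * np.sign(weights)
    # Assumed saliency rule: total absolute gradient inside each frame
    other_axes = tuple(a for a in range(x.ndim) if a != frame_axis)
    saliency = np.sum(np.abs(weights), axis=other_axes)

    x_adv = x.copy()
    for count, frame in enumerate(np.argsort(-saliency), start=1):
        sel = [slice(None)] * x.ndim
        sel[frame_axis] = frame
        sel = tuple(sel)
        x_adv[sel] = np.clip(x[sel] + step[sel], 0.0, 1.0)
        if np.sign(score(x_adv)) != original_sign:
            return x_adv, count, True
    return x_adv, x.shape[frame_axis], False


# Hypothetical toy setup: 1 channel, 4 frames of 2x2 pixels; frame 2 dominates the score
weights = np.full((1, 4, 2, 2), 0.01)
weights[:, 2] = 1.0
x = np.full((1, 4, 2, 2), 0.5)
x_adv, n_frames, success = frame_saliency_attack(x, weights, bias=-2.0, eps=8 / 255)
print(success, n_frames)  # True 1
```

In this toy setup a single well-chosen frame is enough to flip the decision, which mirrors the kind of sparsity the attack aims for on real video.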

In [None]:
# Frame Saliency Attack. Note: we specify the frame axis here, which is 2 for
# (batch, channels, frames, height, width) input.

fsa = FrameSaliencyAttack(
    classifier_art,
    fgm,
    method="iterative_saliency",
    frame_index=2,
)


In [None]:
%%time
adv_sample_sparse = fsa.generate(
    x=adv_sample_input
) 

In [None]:
_ = predict_top_k((adv_sample_sparse-mean)/std, model)

In [None]:
# Again saving the adversarial example to gif:

adversarial_sparse_gif = sample_to_gif(np.squeeze(adv_sample_sparse), "adversarial_basketball_sparse.gif")
print(f"`{adversarial_sparse_gif}` has been successfully created.") 

In [None]:
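# counting the number of perturbed frames:

x_diff = adv_sample_sparse - adv_sample_input
# move the frame axis next to the batch axis: (batch, frames, channels, height, width)
x_diff = np.swapaxes(x_diff, 1, 2)
# flatten each frame into a single vector
x_diff = np.reshape(x_diff, x_diff.shape[:2] + (np.prod(x_diff.shape[2:]), ))
# 1 for frames whose perturbation has a non-zero L2 norm (after rounding), 0 otherwise
x_diff_norm = np.sign(np.round(np.linalg.norm(x_diff, axis=-1), decimals=4))
print(f"Number of perturbed frames: {int(np.sum(x_diff_norm))}")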
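
A more direct formulation of a similar count, assuming the `(batch, channels, frames, height, width)` layout used throughout and a batch of one, flags a frame as perturbed if any of its values differs from the clean input by more than a small tolerance. The helper name `count_perturbed_frames` and the tolerance `1e-4` are assumptions; the tolerance roughly mirrors the rounding to four decimals above, but the two checks are closely related rather than numerically identical.

```python
import numpy as np


def count_perturbed_frames(x_adv, x_clean, frame_axis=2, tol=1e-4):
    """Count frames in which any element differs by more than `tol` (batch of 1 assumed)."""
    diff = np.abs(x_adv - x_clean)
    reduce_axes = tuple(a for a in range(diff.ndim) if a != frame_axis)
    return int(np.sum(np.max(diff, axis=reduce_axes) > tol))


# Toy check on a small NCFHW volume with two perturbed frames
clean = np.zeros((1, 3, 8, 4, 4))
adv = clean.copy()
adv[:, :, 3] += 8 / 255          # whole frame 3 perturbed
adv[:, 0, 6, 1, 1] = 8 / 255     # a single pixel of frame 6 perturbed
print(count_perturbed_frames(adv, clean))  # 2
```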


## Applying the H.264 compression defence

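Next, ART's `VideoCompression` is applied as a simple input-preprocessing defence: the video is re-encoded with H.264 compression before it is classified. The aim is for the model to keep predicting correctly on the compressed original input and to recover the correct prediction on the compressed adversarial input.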

### Initializing the VideoCompression defence

In [None]:
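# constant_rate_factor sets the H.264 quality: higher values mean stronger compression
video_compression = VideoCompression(video_format="avi", constant_rate_factor=30, channels_first=True)
# applying defense to the original input (scaled to 0-255 for compression, then back to [0, 1])
adv_sample_input_compressed = video_compression(adv_sample_input * 255)[0] / 255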
# applying defense to the sparse adversarial sample
adv_sample_sparse_compressed = video_compression(adv_sample_sparse * 255)[0] / 255
# printing the resulting predictions on compressed original input
_ = predict_top_k((adv_sample_input_compressed-mean)/std, model)
# printing the resulting predictions on sparse adversarial sample
_ = predict_top_k((adv_sample_sparse_compressed-mean)/std, model)
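
The defence cell converts the [0, 1] inputs to the 0-255 range before calling `VideoCompression`, presumably because the encoder works on 8-bit pixel values, and divides by 255 afterwards. It also takes element `[0]` of the result because ART preprocessors return an `(x, y)` tuple. That round trip can be factored into a small helper. Below, `apply_255_preprocessor` is a hypothetical helper, and `to_uint8_levels` is a hypothetical stand-in preprocessor (plain 8-bit quantization, far weaker than real H.264 compression) used only so that the sketch runs without ART or FFmpeg.

```python
import numpy as np


def apply_255_preprocessor(preprocessor, x01):
    """Run a preprocessor that works on 0-255 pixel values on data scaled to [0, 1].

    `preprocessor` is assumed to follow ART's convention of returning (x, y).
    """
    x255, _ = preprocessor(x01 * 255)
    return x255 / 255


def to_uint8_levels(x, y=None):
    """Hypothetical stand-in defence: snap values to the nearest 8-bit level."""
    return np.clip(np.round(x), 0, 255), y


x = np.random.default_rng(0).uniform(size=(1, 3, 32, 8, 8))
x_defended = apply_255_preprocessor(to_uint8_levels, x)
# Quantization to 256 levels moves each value by at most half a level
print(np.max(np.abs(x_defended - x)) <= 0.5 / 255 + 1e-12)  # True
```

With ART installed, `apply_255_preprocessor(video_compression, adv_sample_input)` would compute the same value as the first compression line of the cell above.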