<a href="https://colab.research.google.com/github/drewlinsley/colabs/blob/master/Recurrent_vision_models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**0. DISCoV 2/7/19**
You can find the [presentation link here](https://docs.google.com/presentation/d/1pJi1fPt7i7enJAClSVkPOw3pl515YN0Q44kvO8zSpdc/edit?usp=sharing).

**1. Python imports.**

Let's split this into separate blocks for clarity.

Here, you will import `drive` from the Google Colab library for connecting to [Google Drive](https://github.com/googlecolab/colabtools/blob/master/google/colab/drive.py) and `files` for [uploading files](https://github.com/googlecolab/colabtools/blob/master/google/colab/files.py) from your local machine to this kernel. Finally, `MediaFileUpload` is a class for more efficient [uploads](https://github.com/googleapis/google-api-python-client/blob/master/googleapiclient/http.py).

In [0]:
from google.colab import drive
from google.colab import files as cfiles
from googleapiclient.http import MediaFileUpload


In [0]:
import numpy as np  # Note numpy is aliased as np
from PIL import Image
import os
import shutil
from glob import glob  # File path collection
import tensorflow as tf  # Note tensorflow is aliased as tf
from matplotlib import pyplot as plt  # Library for plotting images

# Keras model utilities
from keras.models import Model  # A Keras class for constructing a deep neural network model
from keras.models import Sequential  # A Keras class for connecting deep neural network layers 
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, EarlyStopping
from keras.preprocessing.image import ImageDataGenerator  # A class for data loading during model training
from keras.utils import np_utils

# Keras ResNet routines
from keras.applications.resnet50 import ResNet50  # Import the ResNet deep neural network
from keras.preprocessing import image  # Routines for loading image data
from keras.applications.resnet50 import preprocess_input  # ResNet-specific routines for preprocessing images
from keras.applications.resnet50 import decode_predictions  # ResNet-specific routines for extracting predictions

# Keras layers
from keras.layers import Dense  # A fully connected neural network layer
from keras.layers import Activation  # A class for point-wise nonlinearity layers
from keras.layers import Flatten  # Reshape a tensor into a matrix
from keras.layers import Dropout  # A regularization layer which randomly zeros neural network units during training.
from keras.layers import InputLayer
from keras.layers import Lambda
from keras.layers.pooling import GlobalAveragePooling2D
from keras.layers import Conv2D
from keras.layers import BatchNormalization

# Optimizers
from keras.optimizers import Adam  # Adam optimizer https://arxiv.org/abs/1412.6980
from keras.optimizers import SGD  # Stochastic gradient descent optimizer

We will also install the google_images_download library to scrape images from websites. Later, we will pass these images through trained deep neural networks.

Note how we install libraries in Python using the `pip` command. This is the Python package manager, and it is typically called with `pip install my_package`. Because we are using the Jupyter/IPython interface, we have to prepend an exclamation point `!` to run pip through the command-line interpreter.

Install keras-contrib to get a resnet18.
Clone labmeeting from my repo to get the hgru/fgru modules.
Install foolbox to apply adversarial perturbations to images.

Download CIFAR-100 for training models.

In [0]:
!pip install google_images_download
from google_images_download import google_images_download   #importing the library

# Clone labmeeting modules
!rm -rf labmeeting
!git clone https://github.com/drewlinsley/labmeeting
!touch labmeeting/__init__.py
!ls labmeeting/

# Install foolbox
!pip install foolbox

# Download CIFAR-100
from keras.datasets import cifar100
(x_train, y_train), (x_test, y_test) = cifar100.load_data(label_mode='fine')

**2. Set global variables**
This is not "good programming practice" under typical circumstances, but it is reasonable for python notebooks. So bear with me.

We will set global variables (paths, etc.) and mount google drive here.

In [0]:
def make_dir(d):
    """Make directory d if it does not exist."""
    if not os.path.exists(d):
        os.makedirs(d)


IMG_DIR  = "/content/image_dataset"
CKPT_DIR  = "/content/image_dataset"
PROC_DIR = "%s_processed" % IMG_DIR
# drive.mount("/content/gdrive")
print("TensorFlow version: " + tf.__version__)

make_dir(IMG_DIR)
make_dir(PROC_DIR)

# # If necessary, clear out the directories
# !rm -rf /content/image_dataset
# !rm -rf /content/image_dataset_processed

**3. Augmentations**

CNNs are very sample inefficient: they need to be exposed to large amounts of image-level variability to reach their potential in image classification. This is because CNNs have very weak biases about natural images, relying only on convolutions that implement local weight sharing.

Image datasets can be augmented with all sorts of transformations to expose CNNs to more variability and improve performance. You can do this easily in Keras with the built-in [ImageDataGenerator class](https://keras.io/preprocessing/image/). Note that the augmentation arguments in the code cell of this section are commented out, so the generator as written only batches and shuffles the images.
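As a rough illustration of what such transformations do, here is a minimal NumPy sketch of two common augmentations: a random horizontal flip and a random shift with zero padding. This is not how `ImageDataGenerator` is implemented. The helper name `augment_batch` and its `max_shift` parameter are hypothetical and exist only for this example.

```python
import numpy as np


def augment_batch(images, max_shift=5, rng=None):
    """Randomly flip and shift a batch of images shaped (N, H, W, C).

    Hypothetical helper for illustration only.
    """
    if rng is None:
        rng = np.random.RandomState(0)
    out = np.zeros_like(images)
    n, h, w, _ = images.shape
    for i in range(n):
        img = images[i]
        if rng.rand() < 0.5:
            img = img[:, ::-1]  # Horizontal flip (reverse the width axis)
        dy, dx = rng.randint(-max_shift, max_shift + 1, size=2)
        # Copy the overlapping region; the uncovered border stays zero
        src_y = slice(max(0, -dy), min(h, h - dy))
        dst_y = slice(max(0, dy), min(h, h + dy))
        src_x = slice(max(0, -dx), min(w, w - dx))
        dst_x = slice(max(0, dx), min(w, w + dx))
        out[i, dst_y, dst_x] = img[src_y, src_x]
    return out


batch = np.random.RandomState(1).rand(4, 32, 32, 3).astype('float32')
augmented = augment_batch(batch)
print(augmented.shape)
```

Each call produces a slightly different view of the same images, which is the extra variability the network sees during training.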

We set our batch size (the number of images we process at once during training) to 32, since this is about the smallest size that works well in a model that incorporates so-called "batch normalization" (like the ResNet).
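To make the batch size concrete: the CIFAR-100 training split has 50,000 images, so the number of gradient updates in one pass over the data is simple arithmetic. The variable names below are chosen for this example.

```python
import math

num_train_images = 50000  # Size of the CIFAR-100 training split
batch_size = 32

# The last batch is smaller when the dataset size is not a multiple of batch_size
steps_per_epoch = int(math.ceil(num_train_images / float(batch_size)))
print(steps_per_epoch)  # 1563
```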



In [0]:
batch_size = 32  # Number of images to process at once
height = 32
width = 32
channels = 3
nb_classes = 100
input_shape = [height, width, channels]

x_train = x_train.astype('float32')
mean = x_train.mean((0, 1, 2))
std = x_train.std((0, 1, 2))

x_train -= mean
x_train /= std
x_test = x_test.astype('float32')
x_test -= mean
x_test /= std
y_train = np_utils.to_categorical(y_train, nb_classes)
y_test = np_utils.to_categorical(y_test, nb_classes)

generator = ImageDataGenerator()
    # rotation_range=15,
    # width_shift_range=5./width,
    # height_shift_range=5./height)
generator.fit(x_train, seed=0)

**4. Build the models**

An fGRU + global average pooling (GAP) + readout
[fgru class](https://github.com/drewlinsley/labmeeting/blob/master/fgru_linear.py)
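The "GAP + readout" stage named above can be sketched in NumPy as follows. This is a toy stand-in for the Keras `GlobalAveragePooling2D` and `Dense(..., activation='softmax')` layers, not their implementation. The function name `gap_readout` and the feature-map shape are assumptions made for the example.

```python
import numpy as np


def gap_readout(features, weights, bias):
    """Global average pooling followed by a softmax readout.

    features: (N, H, W, C) activations, weights: (C, K), bias: (K,).
    """
    pooled = features.mean(axis=(1, 2))  # (N, C): average over space
    logits = pooled.dot(weights) + bias  # (N, K)
    logits = logits - logits.max(axis=1, keepdims=True)  # Numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


rng = np.random.RandomState(0)
probs = gap_readout(
    features=rng.randn(2, 32, 32, 20),  # Assumed shape, for illustration
    weights=rng.randn(20, 100) * 0.1,
    bias=np.zeros(100))
print(probs.shape)  # (2, 100)
```

Averaging over the spatial dimensions makes the readout independent of the feature-map size, so only the channel count has to match the readout weights.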

In [0]:
from labmeeting import fgru_sigmoid, fgru_linear


def fgru_model(input_shape, classes, batch_size, training=True):
  """Create an htd-fgru model."""
  def apply_fgru(x):
    fgru_module = fgru_linear.hGRU(
      'fgru',
      x_shape=[batch_size] + input_shape[:-1] + [20],
      timesteps=8,
      h_ext=[{'h1': [15, 15]}, {'h2': [1, 1]}, {'fb1': [1, 1]}],
      strides=[1, 1, 1, 1],
      hgru_ids=[{'h1': 20}, {'h2': 128}, {'fb1': 20}],  # Num features per layer
      hgru_idx=[{'h1': 0}, {'h2': 1}, {'fb1': 2}],
      padding='SAME',
      aux={
          'readout': 'l2',  # Readout from fgru embedding
          'intermediate_ff': [32, 128],
          'intermediate_ks': [[3, 3], [3, 3]],
          'intermediate_repeats': [3, 3],
          'while_loop': False,
          'skip': True,
          'symmetric_weights': False,
          'include_pooling': True
      },
      pool_strides=[2, 2],
      pooling_kernel=[2, 2],
      train=training)
    return fgru_module.build(x)

  # Create model
  fgru = Sequential()
  fgru.add(Conv2D(kernel_size=(5, 5), padding='SAME', filters=20, input_shape=input_shape))
  fgru.add(Lambda(apply_fgru))
  fgru.add(BatchNormalization())
  fgru.add(GlobalAveragePooling2D())
  fgru.add(Dense(classes, activation='softmax'))
  return fgru

fgru = fgru_model(
    input_shape=input_shape,
    classes=nb_classes,
    batch_size=batch_size,
    training=True)
fgru.summary()  # summary() prints the table itself and returns None

A conv + hgru + GAP + readout
[hgru class](https://github.com/drewlinsley/labmeeting/blob/master/hgru_bn_for.py)

In [0]:
from labmeeting import hgru_bn_for


def hgru_model(input_shape, classes, batch_size, training=True):
  """Create a conv + hGRU model."""
  def apply_hgru(x):
    hgru_module = hgru_bn_for.hGRU(
      'hgru_1',
      x_shape=[batch_size] + input_shape[:-1] + [20],
      timesteps=8,
      h_ext=15,
      strides=[1, 1, 1, 1],
      padding='SAME',
      aux={
        'reuse': False,
        'constrain': False,
        'hidden_init': 'zeros',
        'symmetric_weights': False},
      train=training)
    return hgru_module.build(x)
  hgru = Sequential()
  hgru.add(Conv2D(kernel_size=(5, 5), padding='SAME', filters=20, input_shape=input_shape))
  hgru.add(Lambda(apply_hgru))
  hgru.add(BatchNormalization())
  hgru.add(GlobalAveragePooling2D())
  hgru.add(Dense(classes, activation='softmax'))
  return hgru

hgru = hgru_model(
    input_shape=input_shape,
    classes=nb_classes,
    batch_size=batch_size,
    training=True)
hgru.summary()  # summary() prints the table itself and returns None

In [0]:
"""
Clean and simple Keras implementation of network architectures described in:
    - (ResNet-50) [Deep Residual Learning for Image Recognition](https://arxiv.org/pdf/1512.03385.pdf).
    - (ResNeXt-50 32x4d) [Aggregated Residual Transformations for Deep Neural Networks](https://arxiv.org/pdf/1611.05431.pdf).
    
Python 3.
"""

from keras import layers
from keras import models



#
# network params
#

cardinality = 1  # 32 for resnext


def residual_network(x, nb_classes):
    """
    ResNet when `cardinality` = 1 (the value set above). For ResNeXt set `cardinality` = 32 above.
    
    """
    def add_common_layers(y):
        y = layers.BatchNormalization()(y)
        y = layers.LeakyReLU()(y)

        return y

    def grouped_convolution(y, nb_channels, _strides):
        # when `cardinality` == 1 this is just a standard convolution
        if cardinality == 1:
            return layers.Conv2D(nb_channels, kernel_size=(3, 3), strides=_strides, padding='same')(y)
        
        assert not nb_channels % cardinality
        _d = nb_channels // cardinality

        # in a grouped convolution layer, input and output channels are divided into `cardinality` groups,
        # and convolutions are separately performed within each group
        groups = []
        for j in range(cardinality):
            group = layers.Lambda(lambda z: z[:, :, :, j * _d:j * _d + _d])(y)
            groups.append(layers.Conv2D(_d, kernel_size=(3, 3), strides=_strides, padding='same')(group))
            
        # the grouped convolutional layer concatenates them as the outputs of the layer
        y = layers.concatenate(groups)

        return y

    def residual_block(y, nb_channels_in, nb_channels_out, _strides=(1, 1), _project_shortcut=False):
        """
        Our network consists of a stack of residual blocks. These blocks have the same topology,
        and are subject to two simple rules:

        - If producing spatial maps of the same size, the blocks share the same hyper-parameters (width and filter sizes).
        - Each time the spatial map is down-sampled by a factor of 2, the width of the blocks is multiplied by a factor of 2.
        """
        shortcut = y

        # we modify the residual building block as a bottleneck design to make the network more economical
        y = layers.Conv2D(nb_channels_in, kernel_size=(1, 1), strides=(1, 1), padding='same')(y)
        y = add_common_layers(y)

        # ResNeXt (identical to ResNet when `cardinality` == 1)
        y = grouped_convolution(y, nb_channels_in, _strides=_strides)
        y = add_common_layers(y)

        y = layers.Conv2D(nb_channels_out, kernel_size=(1, 1), strides=(1, 1), padding='same')(y)
        # batch normalization is employed after aggregating the transformations and before adding to the shortcut
        y = layers.BatchNormalization()(y)

        # identity shortcuts used directly when the input and output are of the same dimensions
        if _project_shortcut or _strides != (1, 1):
            # when the dimensions increase projection shortcut is used to match dimensions (done by 1×1 convolutions)
            # when the shortcuts go across feature maps of two sizes, they are performed with a stride of 2
            shortcut = layers.Conv2D(nb_channels_out, kernel_size=(1, 1), strides=_strides, padding='same')(shortcut)
            shortcut = layers.BatchNormalization()(shortcut)

        y = layers.add([shortcut, y])

        # the nonlinearity (LeakyReLU here) is applied right after each batch normalization,
        # except for the output of the block, where it is applied after adding the shortcut
        y = layers.LeakyReLU()(y)

        return y

    # conv1
    x = layers.Conv2D(64, kernel_size=(7, 7), strides=(2, 2), padding='same')(x)
    x = add_common_layers(x)

    # conv2
    x = layers.MaxPool2D(pool_size=(3, 3), strides=(2, 2), padding='same')(x)
    for i in range(2):
        project_shortcut = True if i == 0 else False
        x = residual_block(x, 128, 256, _project_shortcut=project_shortcut)

    # conv3
    for i in range(1):
        # down-sampling is performed by conv3_1, conv4_1, and conv5_1 with a stride of 2
        strides = (2, 2) if i == 0 else (1, 1)
        x = residual_block(x, 256, 512, _strides=strides)

    # conv4
    for i in range(1):
        strides = (2, 2) if i == 0 else (1, 1)
        x = residual_block(x, 512, 1024, _strides=strides)

    # conv5
    for i in range(1):
        strides = (2, 2) if i == 0 else (1, 1)
        x = residual_block(x, 1024, 2048, _strides=strides)

    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(nb_classes, activation='softmax')(x)

    return x


image_tensor = layers.Input(shape=input_shape)
network_output = residual_network(image_tensor, nb_classes=nb_classes)
  
cnn = models.Model(inputs=[image_tensor], outputs=[network_output])
cnn.summary()  # summary() prints the table itself and returns None
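To see what this network does to a 32 x 32 CIFAR image, the sketch below traces the spatial size and channel count through each stage. The strides and channel counts are read off the cell above. The helper `same_pad_out` is a hypothetical name, and it assumes the usual rule that a `'same'`-padded layer outputs `ceil(size / stride)` pixels per side.

```python
import math


def same_pad_out(size, stride):
    """Output size of a 'same'-padded conv or pool with the given stride."""
    return int(math.ceil(size / float(stride)))


# (stage name, stride, output channels)
stages = [
    ('conv1', 2, 64),
    ('maxpool', 2, 64),
    ('conv2', 1, 256),
    ('conv3', 2, 512),
    ('conv4', 2, 1024),
    ('conv5', 2, 2048),
]

size = 32  # CIFAR images are 32 x 32
trace = []
for name, stride, channels in stages:
    size = same_pad_out(size, stride)
    trace.append((name, size, channels))
    print('%s: %dx%d, %d channels' % (name, size, size, channels))
```

Under these assumptions the feature map is already 1 x 1 by `conv5`, so the final global average pooling has nothing left to average over for images this small.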

**5. Training hyperparameters and functions.**

In [0]:
def plot_training(history, plot_val=False):
  """Plot the training (and optionally validation) loss + accuracy."""
  acc = history.history['acc']
  loss = history.history['loss']
  epochs = range(len(acc))

  f = plt.figure()
  plt.subplot(121)
  plt.plot(epochs, acc, label='Train')
  if plot_val:
    # Only look up validation curves when they are requested
    plt.plot(epochs, history.history['val_acc'], label='Val')
  plt.title('Training and validation accuracy')
  plt.legend()

  plt.subplot(122)
  plt.plot(epochs, loss, label='Train')
  if plot_val:
    plt.plot(epochs, history.history['val_loss'], label='Val')
  plt.title('Training and validation loss')
  plt.legend()
  plt.savefig('acc_vs_epochs.png')
  plt.show()


filepath="/content/fietuned_ResNet50_model_weights.h5"
epochs = 10  # How many loops through the entire dataset
num_train_images = len(x_train)
lr = 1e-2
adam = Adam(lr=lr)
lr_reducer = ReduceLROnPlateau(
    monitor='val_loss',
    factor=np.sqrt(0.1),
    cooldown=0,
    patience=10,
    min_lr=0.5e-6)
early_stopper = EarlyStopping(
    monitor='val_acc',
    min_delta=0.0001,
    patience=20)
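The effect of `factor=np.sqrt(0.1)` is easiest to see by writing out the learning rates it produces. The sketch below assumes the callback's update rule is `new_lr = max(lr * factor, min_lr)` each time a plateau is detected. It only reproduces that arithmetic and does not run Keras.

```python
import math

lr = 1e-2
factor = math.sqrt(0.1)
min_lr = 0.5e-6

schedule = [lr]
for _ in range(10):
    lr = max(lr * factor, min_lr)  # One reduction after a plateau
    schedule.append(lr)

for k, value in enumerate(schedule):
    print('after %d reductions: %.2e' % (k, value))
```

Two reductions lower the learning rate by a factor of 10, and the rate stops falling once it reaches `min_lr`. With `patience=10` and `epochs = 10`, the callback is unlikely to trigger during the runs in this notebook.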

**6. Train the fGRU.**

In [0]:
model_checkpoint= ModelCheckpoint(
    "fgru_v1.h5",
    monitor="val_acc",
    save_best_only=True,
    save_weights_only=True)
callbacks=[lr_reducer, early_stopper, model_checkpoint]
checkpoint = ModelCheckpoint(filepath, monitor="acc", verbose=1, mode='max')  # Defined but not passed to training
fgru.compile(adam, loss='categorical_crossentropy', metrics=['accuracy'])
fgru_history = fgru.fit_generator(
  generator.flow(x_train, y_train, batch_size=batch_size),
  steps_per_epoch=len(x_train) // batch_size,  # Keras 2 counts batches per epoch, not samples
  epochs=epochs,
  validation_data=(x_test, y_test),
  shuffle=True,
  callbacks=callbacks)
plot_training(fgru_history)

**7. Train the hGRU.**

In [0]:
model_checkpoint= ModelCheckpoint(
    "hgru_v1.h5",
    monitor="val_acc",
    save_best_only=True,
    save_weights_only=True)
callbacks=[lr_reducer, early_stopper, model_checkpoint]
checkpoint = ModelCheckpoint(filepath, monitor="acc", verbose=1, mode='max')  # Defined but not passed to training
hgru.compile(adam, loss='categorical_crossentropy', metrics=['accuracy'])
hgru_history = hgru.fit_generator(
  generator.flow(x_train, y_train, batch_size=batch_size),
  steps_per_epoch=len(x_train) // batch_size,  # Keras 2 counts batches per epoch, not samples
  epochs=epochs,
  validation_data=(x_test, y_test),
  shuffle=True,
  callbacks=callbacks)
plot_training(hgru_history)

**8. Train a ResNet control.**

In [0]:
model_checkpoint= ModelCheckpoint(
    "cnn_v1.h5",
    monitor="val_acc",
    save_best_only=True,
    save_weights_only=True)
callbacks=[lr_reducer, early_stopper, model_checkpoint]
checkpoint = ModelCheckpoint(filepath, monitor="acc", verbose=1, mode='max')  # Defined but not passed to training
cnn.compile(adam, loss='categorical_crossentropy', metrics=['accuracy'])
cnn_history = cnn.fit_generator(
  generator.flow(x_train, y_train, batch_size=batch_size),
  steps_per_epoch=len(x_train) // batch_size,  # Keras 2 counts batches per epoch, not samples
  epochs=epochs,
  validation_data=(x_test, y_test),
  shuffle=True,
  callbacks=callbacks)
plot_training(cnn_history)

**9. Check adversarial tolerance.**

In [0]:
import keras
import foolbox


keras.backend.set_learning_phase(0)


cnn_fmodel = foolbox.models.KerasModel(cnn, bounds=(x_train.min(), x_train.max()))
cnn_attack = foolbox.attacks.FGSM(cnn_fmodel)
cnn_adversarial = cnn_attack(x_test[0], np.argmax(y_test[0]))  # Pass the integer class label, not the one-hot vector
print(cnn_adversarial)
# if the attack fails, adversarial will be None and a warning will be printed

fgru_fmodel = foolbox.models.KerasModel(fgru, bounds=(x_train.min(), x_train.max()))
fgru_attack = foolbox.attacks.FGSM(fgru_fmodel)
fgru_adversarial = fgru_attack(x_test[0], np.argmax(y_test[0]))  # Pass the integer class label, not the one-hot vector
print(fgru_adversarial)
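For intuition about what the FGSM attack computes, here is a minimal NumPy sketch of one fast-gradient-sign step against a toy linear softmax classifier. It is not foolbox's implementation, which handles the choice of step size itself. The model, the fixed `epsilon`, and all function names here are assumptions made for the example.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # Numerical stability
    e = np.exp(z)
    return e / e.sum()


def cross_entropy(W, b, x, label):
    """Loss of the linear classifier W.x + b on a single example."""
    return -np.log(softmax(W.dot(x) + b)[label])


def fgsm(W, b, x, label, epsilon):
    """One FGSM step: move x by epsilon along the sign of the loss gradient."""
    probs = softmax(W.dot(x) + b)
    onehot = np.zeros_like(probs)
    onehot[label] = 1.0
    grad_x = W.T.dot(probs - onehot)  # d(cross-entropy)/dx for this model
    return x + epsilon * np.sign(grad_x)


rng = np.random.RandomState(0)
W, b = rng.randn(10, 64), np.zeros(10)
x, label = rng.randn(64), 3
x_adv = fgsm(W, b, x, label, epsilon=0.1)
print(cross_entropy(W, b, x, label), cross_entropy(W, b, x_adv, label))
```

Every input dimension moves by at most `epsilon`, yet the loss on the true label goes up. A more tolerant model needs a larger `epsilon` before its prediction flips.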


**10. Check noise tolerance.**


In [0]:
import sklearn.metrics as metrics


# Compare performance on noisy versus non-noisy images
kernel = 0.1
noise_x_test = x_test + np.random.random(size=x_test.shape) * kernel
noise_x_test = np.clip(noise_x_test, x_test.min(), x_test.max())  # Keep noisy images within the original value range

# Regular
y_preds = cnn.predict(x_test)
y_pred = np.argmax(y_preds, axis=-1)
arg_y = np.argmax(y_test, axis=-1)
cnn_accuracy = metrics.accuracy_score(arg_y, y_pred) * 100
y_preds = fgru.predict(x_test)
y_pred = np.argmax(y_preds, axis=-1)
fgru_accuracy = metrics.accuracy_score(arg_y, y_pred) * 100

# Noisy
noise_y_preds = cnn.predict(noise_x_test)
noise_y_pred = np.argmax(noise_y_preds, axis=-1)
noise_cnn_accuracy = metrics.accuracy_score(arg_y, noise_y_pred) * 100
noise_y_preds = fgru.predict(noise_x_test)
noise_y_pred = np.argmax(noise_y_preds, axis=-1)
noise_fgru_accuracy = metrics.accuracy_score(arg_y, noise_y_pred) * 100

print('CNN goes from %s%% to %s%% with %s uniform noise' % (cnn_accuracy, noise_cnn_accuracy, kernel))
print('fGRU goes from %s%% to %s%% with %s uniform noise' % (fgru_accuracy, noise_fgru_accuracy, kernel))
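The two ingredients of this comparison, adding clipped uniform noise and scoring arg-max predictions against one-hot labels, can be sketched without Keras or scikit-learn. The helper names `add_uniform_noise` and `accuracy` are hypothetical, and the labels and predictions below are made-up toy values, not results from the models above.

```python
import numpy as np


def add_uniform_noise(x, scale, rng):
    """Add noise drawn from U[0, scale) and clip back to the range of x."""
    noisy = x + rng.uniform(0.0, scale, size=x.shape)
    return np.clip(noisy, x.min(), x.max())


def accuracy(onehot_labels, probs):
    """Percent of rows where the arg-max prediction matches the label."""
    return 100.0 * np.mean(
        np.argmax(onehot_labels, axis=-1) == np.argmax(probs, axis=-1))


rng = np.random.RandomState(0)
x = rng.randn(8, 32, 32, 3)
noisy_x = add_uniform_noise(x, scale=0.1, rng=rng)

labels = np.eye(4)[[0, 1, 2, 3]]
probs = np.eye(4)[[0, 1, 2, 0]]  # Toy predictions; the last one is wrong
print(accuracy(labels, probs))  # 75.0
```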
