In [2]:
from preamble import *
HTML('''<style>html, body{overflow-y: visible !important} .CodeMirror{min-width:100% !important;} .rise-enabled .CodeMirror, .rise-enabled .output_subarea{font-size:100%; line-height:1.0; overflow: visible;} .output_subarea pre{width:100%}</style>''') # For slides
#HTML('''<style>html, body{overflow-y: visible !important} .output_subarea{font-size:100%; line-height:1.0; overflow: visible;}</style>''') # For slides
InteractiveShell.ast_node_interactivity = "all"

## Agenda

- Introduction and Motivation
- Artificial Neuron
- Gradient Descent
- Backpropagation
- Perceptron
- Multilayered Perceptron
- MLP Classification
- Model Design
- Optimization

- **Convolutional Neural Network**
- Recurrent Neural Network

## Convolutional Neural Networks

![Mnist-digits](images/mnistdigits.gif)

## Image data 
    - How is it different from other types of data?
![Image MNIST](images/MNIST-Matrix.png)


## Speech data
![Speech data](images/speech.jpg)


#### High Dimensionality
#### Local Correlations

#### Convolutional Neural Networks (CNN) - utilize the local correlation property

![](images/conv0.png)

![](images/conv-0-0-1.png)

![](images/conv-0-0-2.png)

![](images/conv0-1.png)

## Parameters re-use
![](images/conv0-2.png)

#### Input Visualization

![Block of the image](images/conv1.png)



#### Conv Nets operate on volumes
    - Take volumes of activations and produce volumes of activations

#### MLPs work with vectors
    - Take vectors of activations and produce vectors of activations

#### Convolution Dimensionality
    - for 2D convolutions the volumes of activations are 3D tensors
    - for 3D convolutions the volumes are 4D tensors
    - for 1D convolutions the volumes are 2D matrices
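
To make these dimensionalities concrete, here is a minimal NumPy sketch. The specific sizes (a 16000-sample signal, a 28x28 single-channel image, a 64x64x64 volume) are illustrative assumptions, and the batch dimension is left out.

```python
import numpy as np

# Illustrative sizes only (assumptions); the batch dimension is omitted.
signal_1d = np.zeros((16000, 1))        # 1D conv: (length, channels) -> 2D matrix
image_2d = np.zeros((28, 28, 1))        # 2D conv: (height, width, channels) -> 3D tensor
volume_3d = np.zeros((64, 64, 64, 1))   # 3D conv: (depth, height, width, channels) -> 4D tensor

for name, t in [("1D", signal_1d), ("2D", image_2d), ("3D", volume_3d)]:
    print(name, "convolution operates on a tensor with", t.ndim, "dimensions")
```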

#### The input goes through a filter (receptive field)

![Image volume plus filter](images/conv2.png)



![Image volume plus filter](images/conv3.png)
#### The depth of the filter is equal to the depth of the image
#### Discrete convolution
    - Neuron computes: Affine transformation + non-linearity
### Convolutional layer
   - Hyperparameters: filter dimensions, stride, padding


![Image volume plus filter](images/conv4.png)
  
#### Fully convolving the input -> produces an activation map 

![](images/conv5.png)
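
A minimal NumPy sketch of how one neuron (one filter) produces an activation map may help here. It assumes no padding and ReLU as the non-linearity; the function name `activation_map` and the 32x32x3 input are illustrative choices, not part of the slides.

```python
import numpy as np

def activation_map(volume, filt, bias=0.0, stride=1):
    """Slide one filter over an (H, W, D) volume and return a 2D activation map.

    The filter has shape (F, F, D): its depth equals the depth of the input.
    No padding is applied, and ReLU is the non-linearity (both assumptions).
    """
    h, w, d = volume.shape
    f = filt.shape[0]
    assert filt.shape == (f, f, d), "filter depth must match input depth"
    out_h = (h - f) // stride + 1
    out_w = (w - f) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            patch = volume[r:r + f, c:c + f, :]
            z = np.sum(patch * filt) + bias   # affine transformation
            out[i, j] = max(z, 0.0)           # non-linearity (ReLU)
    return out

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
filt = rng.standard_normal((5, 5, 3))
amap = activation_map(image, filt)
print(amap.shape)  # (28, 28)
```

Each additional neuron applies its own filter to the same input and contributes one more map, so stacking the maps gives the output volume.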

#### Second neuron 

![](images/conv6.png)

#### Third neuron

![](images/conv7.png)

#### Combined representation

![](images/conv8.png)

![](images/conv9.png)

### Convolutional layer
- Accepts: 
    - $W_1 \times H_1 \times D_1$
- Outputs: 
    - $W_2 = (W_1 - F + 2P) / S + 1$
    - $H_2 = (H_1 - F + 2P) / S + 1$
    - $D_2 = K$
- Where:
    - $F$ is the filter size
    - $P$ is the padding size
    - $S$ is the stride
    - $K$ is the layer depth (number of neurons)
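
The output-size formulas above can be sketched as a small helper. The function name `conv_output_shape` and the example layer sizes are assumptions for illustration; the check for an evenly dividing stride is an added safeguard, not something the formulas themselves require.

```python
def conv_output_shape(w1, h1, f, p, s, k):
    """Return (W2, H2, D2) for a convolutional layer.

    w1, h1: input width and height; f: filter size; p: padding;
    s: stride; k: layer depth (number of neurons).
    """
    if (w1 - f + 2 * p) % s != 0 or (h1 - f + 2 * p) % s != 0:
        raise ValueError("filter does not fit the input evenly for this stride/padding")
    w2 = (w1 - f + 2 * p) // s + 1
    h2 = (h1 - f + 2 * p) // s + 1
    return w2, h2, k

print(conv_output_shape(28, 28, f=5, p=2, s=1, k=32))  # (28, 28, 32)
print(conv_output_shape(28, 28, f=3, p=0, s=1, k=32))  # (26, 26, 32)
```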

#### Depiction of the layers of the CNN

![](images/conv10.png)

Sequential processing of information
![Biological inspiration for Convolutions](images/biology-inspiration.png)

### Training -> Gradient Descent

### $$\frac{\partial L}{\partial \theta} $$

### Back propagation
![Back prop node](images/backprop-node2.png)

### Convolutional Neuron (Step 1)
![Conv](images/backprop-conv-forward1.png)

### Convolutional Neuron (Step 2)
![Conv](images/backprop-conv-forward2.png)

### Convolutional Neuron (Step 3)

![Conv](images/backprop-conv-forward3.png)

### Convolutional Neuron Forward pass
![Conv](images/backprop-conv-forward4.png)

![Conv](images/backprop-conv-forward5.png)

![Conv](images/backprop-conv-forward6.png)

### Convolutional Neuron Backward pass
![Conv](images/backprop-conv-back0.png)

### Backward pass on $w_1$ (Step 1)
![Conv](images/backprop-conv-back1.png)

### Backward pass on $w_1$ (Step 2)

![Conv](images/backprop-conv-back2.png)

### Backward pass on $w_1$ (Step 3)

![Conv](images/backprop-conv-back3.png)

### Backward pass on $x_1$ (Step 1)

![Conv](images/backprop-conv-back4.png)

### Backward pass on $x_1$ (Step 2)

![Conv](images/backprop-conv-back5.png)

### Propagating activation forward
#### Vector form
![Conv](images/backprop-conv-vector-forward.png)

### Propagating gradient backward
#### Vector form

![Conv](images/backprop-conv-back.png)

### Propagating gradient backward 2D
#### Vector form 2D

![Conv](images/backprop-conv-vector2d.png)

Backprop through a convolutional layer
- The gradient of each parameter is the sum, over output positions, of the specific pre-activations it multiplies times the gradient of the loss with respect to the post-activations
    - $\frac{\partial L}{\partial w_i} = \sum_{n=1}^{|x| - |w| + 1}\frac{\partial L}{\partial y_n}x_{n+i-1}$

The gradient flows back in blocks, analogously to how the activations flow forward
- However, the convolution is applied in the reverse direction as well. This can be achieved by convolving the output gradient with a flipped filter (terms whose index falls outside the range of $y$ are taken as zero)
    - $\delta_n = \frac{\partial L}{\partial x_n} = \sum_{i=1}^{|w|}\frac{\partial L}{\partial y_{n-i+1}}w_i$
    - $\delta = \frac{\partial L}{\partial y} * \mathrm{flip}(w)$
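
The two gradient rules above can be checked with a minimal 1D NumPy sketch. It assumes the forward pass $y_n = \sum_i w_i x_{n+i-1}$ over valid positions only; the function names are illustrative, and the code uses 0-based indices where the formulas use 1-based ones.

```python
import numpy as np

def conv1d_forward(x, w):
    """y_n = sum_i w_i * x_{n+i} (0-based), valid positions only, no flipping."""
    n_out = len(x) - len(w) + 1
    return np.array([np.dot(w, x[n:n + len(w)]) for n in range(n_out)])

def conv1d_backward(x, w, dy):
    """Return (dL/dw, dL/dx) given dy = dL/dy."""
    # dL/dw_i = sum_n dy_n * x_{n+i}: slide dy over x.
    dw = np.array([np.dot(dy, x[i:i + len(dy)]) for i in range(len(w))])
    # dL/dx: zero-pad dy by |w|-1 on both sides, then slide the flipped filter over it.
    dy_padded = np.pad(dy, len(w) - 1)
    dx = conv1d_forward(dy_padded, w[::-1])
    return dw, dx

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
w = np.array([1.0, 0.0, -1.0])
dy = np.array([1.0, 2.0, 3.0])       # pretend upstream gradient dL/dy
dw, dx = conv1d_backward(x, w, dy)
print(dw)                            # [14. 20. 26.]
print(np.allclose(dx, np.convolve(dy, w)))  # True
```

The last line compares the result with `np.convolve`, which flips its second argument internally and therefore computes the same "flipped filter" operation in full mode.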

### Convolutional Network
#### Architectural Depiction

![](images/conv11.png)

### Convolutional Network
#### Architectural Depiction

![](images/conv12.png)

### Subsampling

#### Maxpooling
![Max Pooling](images/maxpool.png)

- Field of view
- Stride

#### Backpropagation Maxpooling
$$
a(x) = \max(x), \quad \frac{\partial a(x)}{\partial x_i} = \begin{cases}
1, & \text{if } x_i = \max(x)\\
0, & \text{otherwise}
\end{cases}
$$
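
As a minimal sketch of this rule, the code below pools a single flattened window and routes the upstream gradient to the maximum. The function names are illustrative, and sending the gradient only to the first maximum when there are ties is a simplifying assumption.

```python
import numpy as np

def maxpool_forward(x):
    """Max over one flattened pooling window."""
    return np.max(x)

def maxpool_backward(x, upstream):
    """Pass the upstream gradient to the position of the maximum, zero elsewhere."""
    grad = np.zeros_like(x, dtype=float)
    grad[np.argmax(x)] = upstream   # ties: only the first maximum gets gradient (assumption)
    return grad

window = np.array([1.0, 5.0, 2.0, 3.0])   # e.g. a flattened 2x2 field of view
print(maxpool_forward(window))            # 5.0
print(maxpool_backward(window, 2.0))      # [0. 2. 0. 0.]
```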

### Subsampling

#### Average Pooling
$$
a(x) = \frac{1}{m}\sum_{i=1}^{m} x_i, \quad \frac{\partial a(x)}{\partial x_i} = \frac{1}{m}
$$
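
A matching sketch for average pooling over one flattened window: every input receives an equal share of the upstream gradient. The function names are illustrative assumptions.

```python
import numpy as np

def avgpool_forward(x):
    """Mean over one flattened pooling window of m elements."""
    return np.mean(x)

def avgpool_backward(x, upstream):
    """Each of the m inputs receives upstream / m."""
    return np.full(x.shape, upstream / x.size)

window = np.array([1.0, 5.0, 2.0, 4.0])
print(avgpool_forward(window))         # 3.0
print(avgpool_backward(window, 2.0))   # [0.5 0.5 0.5 0.5]
```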


![](images/conv13.png)

### CNN model

#### Input

#### Convolutional Layers

#### Flattening

#### MLP

#### Output Layers

![](images/conv14.png)

#### Regularization

![](images/conv15.png)

### CNN Execution
- MNIST dataset
- Model:
![Mnist model](images/mnist-conv-net.png)

### Input Layer
![](images/cnn-input.png)


![](images/conv-1.png)

### Layer 1: Convolutional Activations
![](images/conv-1-relu.png)

### Layer 1: Parameters
![](images/conv-1-weights.png)

### Layer 2: Pooling
![](images/pool-1.png)

### Layer 3: Convolutional Activations
![](images/conv-2-relu.png)

### Layer 3: Weights
![](images/conv-2-weights.png)

### Layer 4: Pooling
![](images/pool-2.png)

### Layer 5: Fully connected
![](images/fc-relu.png)

### Layer 6: Softmax
![](images/fc-softmax.png)

#### LeNet-5

![LeNet-5](images/lenet5.png)

#### Example Alex Net

![Alex Net](images/alexnet.png)


### VGG Network
- Convolutions: 3x3, stride 1
- Max pooling: 2x2, stride 2

24Mil parameters 94MB of information

![VGG Architecture](images/vgg-architecture.png)

![VGG Network](images/vgg-network.png)


Memory for computation vs. parameters of the model

Recent developments: the fully connected layers can be dropped in favor of average pooling, which decreases the size of the model.

GoogLeNet

![GoogLeNet](images/googlenet.png)


12x fewer parameters than AlexNet

GoogLeNet: 6.67% top-5 error

Human-level performance would be about 5% error

### ResNet 
![Resnet ](images/resnet.png)

Experiments with up to 152 layers

Ensemble of ResNet models: 3.57% error

2-3 weeks of training on an 8-GPU machine

### Image Analysis

#### Object detection
    => Classification
![Image from other slides](images/cnn-classification.png)

#### Object localization
    => Classification + Regression
    => Multi-output models
    => Different loss function, gradient propagation
![Image from other slides](images/cnn-localization.png)

#### Object segmentation
    => Pixel classification
    => Tensor output
    => Complex loss function
![Image from other slides](images/cnn-segmentation.png)

#### Filtering
    => Image output
    => Tensor output
    => Complex loss function
![Image from other slides](images/cnn-filtering.png)




- Visualize what the network has learned
    - Low level filters
    - Mid level
    - Visualize the activations

### What we've learned so far...

### Data Augmentation
    - Another regularizer
    - Opportunity to inject expert knowledge
    
#### Running example

##### Radiographs - X-ray Images
- Classification
![XRay images](images/cnn-augmentation.png)
- What can we do to augment?
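
One possible answer, as a hedged NumPy sketch: small random shifts and mild brightness changes. The function name `augment` and the ranges used are assumptions for illustration. Which transformations keep the label intact is exactly where expert knowledge enters; for example, a horizontal flip may not be appropriate when left and right matter in the image.

```python
import numpy as np

def augment(image, rng, max_shift=2):
    """Return a randomly shifted, brightness-scaled copy of a 2D image with values in [0, 1]."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    shifted = np.roll(image, shift=(int(dy), int(dx)), axis=(0, 1))  # wraps around at the borders
    gain = rng.uniform(0.9, 1.1)                                     # mild brightness change
    return np.clip(shifted * gain, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((28, 28))          # stand-in for a real grayscale image
batch = np.stack([augment(image, rng) for _ in range(8)])
print(batch.shape)  # (8, 28, 28)
```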

## MNIST Conv Net (keras implementation)


In [1]:
## Imports
from __future__ import print_function
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

# Training parameters
batch_size = 128
num_classes = 10
epochs = 12

Using TensorFlow backend.


In [2]:
# Data preparation

# input image dimensions
img_rows, img_cols = 28, 28

# the data, shuffled and split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)



x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples


In [3]:
# Model definition
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
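
As a sanity check on the model above, the parameter count can be worked out by hand. This sketch assumes the Keras defaults for these layers (`'valid'` padding and stride 1 for `Conv2D`, pooling stride equal to the pool size); the helper names are illustrative.

```python
def conv_params(f, d_in, k):
    """Weights plus one bias per filter: (F * F * D_in + 1) * K."""
    return (f * f * d_in + 1) * k

def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return (n_in + 1) * n_out

# Shapes for a 28x28x1 input, no padding, stride 1.
conv1 = conv_params(3, 1, 32)     # output volume 26x26x32
conv2 = conv_params(3, 32, 64)    # output volume 24x24x64
flat = 12 * 12 * 64               # after 2x2 max pooling: 12x12x64, then flattened
fc1 = dense_params(flat, 128)
out = dense_params(128, 10)
total = conv1 + conv2 + fc1 + out
print(conv1, conv2, fc1, out, total)  # 320 18496 1179776 1290 1199882
```

Nearly all of the parameters sit in the first fully connected layer, while the convolutional layers stay small because their weights are reused across positions.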


In [4]:
# Training loop

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Train on 60000 samples, validate on 10000 samples
Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12
Test loss: 0.0280949982703
Test accuracy: 0.9909


## MNIST Conv Net (tensorflow implementation)


In [3]:
import tensorflow as tf

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz


In [5]:
# Parameters
learning_rate = 0.001
training_iters = 200000
batch_size = 128
display_step = 10

# Network Parameters
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
dropout = 0.75 # Dropout, probability to keep units

# tf Graph input
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
keep_prob = tf.placeholder(tf.float32) #dropout (keep probability)

In [6]:
# Create some wrappers for simplicity
def conv2d(x, W, b, strides=1):
    # Conv2D wrapper, with bias and relu activation
    x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
    x = tf.nn.bias_add(x, b)
    return tf.nn.relu(x)


def maxpool2d(x, k=2):
    # MaxPool2D wrapper
    return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                          padding='SAME')


# Create model
def conv_net(x, weights, biases, dropout):
    # Reshape input picture
    x = tf.reshape(x, shape=[-1, 28, 28, 1])

    # Convolution Layer
    conv1 = conv2d(x, weights['wc1'], biases['bc1'])
    # Max Pooling (down-sampling)
    conv1 = maxpool2d(conv1, k=2)

    # Convolution Layer
    conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
    # Max Pooling (down-sampling)
    conv2 = maxpool2d(conv2, k=2)

    # Fully connected layer
    # Reshape conv2 output to fit fully connected layer input
    fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
    fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
    fc1 = tf.nn.relu(fc1)
    # Apply Dropout
    fc1 = tf.nn.dropout(fc1, dropout)

    # Output, class prediction
    out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
    return out

In [8]:
# Store layers weight & bias
weights = {
    # 5x5 conv, 1 input, 32 outputs
    'wc1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    # 5x5 conv, 32 inputs, 64 outputs
    'wc2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    # fully connected, 7*7*64 inputs, 1024 outputs
    'wd1': tf.Variable(tf.random_normal([7*7*64, 1024])),
    # 1024 inputs, 10 outputs (class prediction)
    'out': tf.Variable(tf.random_normal([1024, n_classes]))
}

biases = {
    'bc1': tf.Variable(tf.random_normal([32])),
    'bc2': tf.Variable(tf.random_normal([64])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# Construct model
pred = conv_net(x, weights, biases, keep_prob)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)

# Evaluate model
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.global_variables_initializer()

In [14]:
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Run optimization op (backprop)
        sess.run(optimizer, feed_dict={x: batch_x, y: batch_y,
                                       keep_prob: dropout})
        if step % display_step == 0:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x,
                                                              y: batch_y,
                                                              keep_prob: 1.})
            print ("Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + ", Training Accuracy= " + \
                  "{:.5f}".format(acc))
        step += 1
    print ("Optimization Finished!")

    # Calculate accuracy for 256 mnist test images
    print ("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={x: mnist.test.images[:256],
                                      y: mnist.test.labels[:256],
                                      keep_prob: 1.}))

Iter 1280, Minibatch Loss= 20155.593750, Training Accuracy= 0.28125
Iter 2560, Minibatch Loss= 7833.395508, Training Accuracy= 0.60156
Iter 3840, Minibatch Loss= 7338.788086, Training Accuracy= 0.64844
Iter 5120, Minibatch Loss= 3983.789551, Training Accuracy= 0.77344
Iter 6400, Minibatch Loss= 2830.088867, Training Accuracy= 0.86719
Iter 7680, Minibatch Loss= 3048.401611, Training Accuracy= 0.80469
Iter 8960, Minibatch Loss= 1529.779297, Training Accuracy= 0.86719
Iter 10240, Minibatch Loss= 1694.179932, Training Accuracy= 0.91406
Iter 11520, Minibatch Loss= 2246.948975, Training Accuracy= 0.83594
Iter 12800, Minibatch Loss= 1749.305420, Training Accuracy= 0.91406
Iter 14080, Minibatch Loss= 2533.301758, Training Accuracy= 0.89844
Iter 15360, Minibatch Loss= 2099.632324, Training Accuracy= 0.90625
Iter 16640, Minibatch Loss= 1146.863892, Training Accuracy= 0.92188
Iter 17920, Minibatch Loss= 1577.505127, Training Accuracy= 0.88281
Iter 19200, Minibatch Loss= 992.006714, Training Accur