# Deep Learning

### Me: Gerrit Korff


### Ancud IT-Beratung [ancud.de](https://ancud.de)
![ancud](figs/ancud.png)


### This talk: [github.com/GeMaKo/Data-Analytics-2017](https://github.com/GeMaKo/Data-Analytics-2017)
#### Requirements: python3, ipython, notebook (jupyter)







## Neural Networks

### A single neuron

![spiking neural network](http://lis2.epfl.ch/CompletedResearchProjects/EvolutionOfAdaptiveSpikingCircuits/images/neuron.jpg)

![spiking system](http://lis2.epfl.ch/CompletedResearchProjects/EvolutionOfAdaptiveSpikingCircuits/images/spiking.jpg)

### Artificial neuron
[Source](http://natureofcode.com/book/chapter-10-neural-networks/)

![](http://natureofcode.com/book/imgs/chapter10/ch10_05.png)

#### Add bias
![](http://natureofcode.com/book/imgs/chapter10/ch10_06.png)


#### Feed the data
![](http://natureofcode.com/book/imgs/chapter10/ch10_07.png)
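The figures above build up a single artificial neuron: each input is multiplied by its weight, a bias is added, and the sum is passed through an activation function. Below is a minimal plain-Python sketch. It assumes a sign activation that outputs +1 or -1. The function names and example weights are illustrative and are not taken from the talk or the linked chapter.

```python
def sign(x):
    """Sign activation: +1 for a positive input, otherwise -1."""
    return 1 if x > 0 else -1

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through sign()."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sign(total)

# Hypothetical example: two inputs with hand-picked weights and bias.
weights = [0.5, -1.0]
bias = 0.25
print(neuron_output([12, 4], weights, bias))   # 6 - 4 + 0.25 = 2.25 -> 1
print(neuron_output([-3, 2], weights, bias))   # -1.5 - 2 + 0.25 = -3.25 -> -1
```

The bias behaves like a weight on a constant input of 1. This lets the decision boundary move away from the origin.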



### Demo [here](http://natureofcode.com/book/chapter-10-neural-networks/)

### Linearly separable, and not
![](http://natureofcode.com/book/imgs/chapter10/ch10_11.png)


#### Logic example:
![](http://natureofcode.com/book/imgs/chapter10/ch10_12.png)
![](http://natureofcode.com/book/imgs/chapter10/ch10_13.png)
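The logic example can be checked directly. The perceptron learning rule finds weights for AND, which is linearly separable. No single-layer perceptron, however, can classify all four XOR cases. Below is a minimal sketch with 0/1 inputs and a step activation. The learning rate and epoch count are arbitrary choices.

```python
def step(x):
    return 1 if x > 0 else 0

def train_perceptron(samples, lr=0.1, epochs=50):
    """Perceptron learning rule: w += lr * (target - prediction) * input."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            error = target - step(w[0] * x1 + w[1] * x2 + b)
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b

def predict(w, b, x1, x2):
    return step(w[0] * x1 + w[1] * x2 + b)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [(x, int(x[0] and x[1])) for x in inputs]
XOR = [(x, x[0] ^ x[1]) for x in inputs]

for name, data in [("AND", AND), ("XOR", XOR)]:
    w, b = train_perceptron(data)
    correct = sum(predict(w, b, *x) == t for x, t in data)
    print(name, "correct:", correct, "of 4")
```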

#### Multilayer perceptron
![](http://natureofcode.com/book/imgs/chapter10/ch10_14.png)
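A multilayer perceptron overcomes this limitation by combining several linear boundaries. The following hand-wired sketch uses weights chosen by hand for illustration rather than learned. One hidden neuron computes OR, a second computes AND, and the output neuron fires for "OR but not AND", which is exactly XOR.

```python
def step(x):
    return 1 if x > 0 else 0

def xor_mlp(x1, x2):
    h_or = step(x1 + x2 - 0.5)        # hidden neuron 1: OR
    h_and = step(x1 + x2 - 1.5)       # hidden neuron 2: AND
    return step(h_or - h_and - 0.5)   # output: OR and not AND

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_mlp(x1, x2))
```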

### Activation Functions: [wiki](https://en.wikipedia.org/wiki/Activation_function)
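For reference, here are a few common activation functions in plain Python. The MNIST network below uses ReLU and softmax. Sigmoid and tanh are the classic smooth alternatives.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def relu(x):
    return max(0.0, x)

def softmax(values):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(values)                           # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0), tanh(0.0), relu(-2.0), relu(3.0))  # 0.5 0.0 0.0 3.0
print(softmax([1.0, 2.0, 3.0]))
```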

## Architectures

### Feedforward
![](https://upload.wikimedia.org/wikipedia/commons/thumb/e/e4/Artificial_neural_network.svg/560px-Artificial_neural_network.svg.png)
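In a feedforward network, each layer is a matrix multiplication followed by an activation, and data flows strictly from input to output. Below is a minimal NumPy sketch of a small network with 3 inputs, 4 hidden units and 2 outputs. The random weights are an assumption for illustration only; a real network would learn them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes: 3 inputs -> 4 hidden units -> 2 outputs.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 2)), np.zeros(2)

def forward(x):
    hidden = np.tanh(x @ W1 + b1)   # hidden layer with tanh activation
    return hidden @ W2 + b2         # linear output layer

batch = rng.normal(size=(5, 3))     # 5 samples with 3 features each
print(forward(batch).shape)         # (5, 2)
```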


### Recurrent
![](https://upload.wikimedia.org/wikipedia/commons/7/79/Recurrent_ann_dependency_graph.png)


#### Elman SRNN
![](https://upload.wikimedia.org/wikipedia/commons/8/8f/Elman_srnn.png)
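An Elman network feeds the previous hidden state back in as extra input (the "context units"). As a result, the output at each time step depends on the whole sequence seen so far. Below is a minimal NumPy sketch of the forward pass. The layer sizes, random weights and tanh activation are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 5, 2

W_xh = rng.normal(scale=0.5, size=(n_in, n_hidden))      # input -> hidden
W_hh = rng.normal(scale=0.5, size=(n_hidden, n_hidden))  # context (previous hidden) -> hidden
W_hy = rng.normal(scale=0.5, size=(n_hidden, n_out))     # hidden -> output
b_h, b_y = np.zeros(n_hidden), np.zeros(n_out)

def elman_forward(sequence):
    """Run the network over a sequence of input vectors, one time step at a time."""
    h = np.zeros(n_hidden)                       # context units start at zero
    outputs = []
    for x_t in sequence:
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)  # new hidden state
        outputs.append(h @ W_hy + b_y)            # output for this time step
    return np.array(outputs)

seq = rng.normal(size=(6, n_in))   # 6 time steps
print(elman_forward(seq).shape)    # (6, 2)
```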

### Unsupervised, e.g. SOM
![](https://upload.wikimedia.org/wikipedia/commons/thumb/9/91/Somtraining.svg/1000px-Somtraining.svg.png)
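A self-organizing map learns without labels. For each input, it finds the best-matching unit (BMU), which is the node whose weight vector is closest to the input. It then pulls that node and its grid neighbors toward the input. Below is a minimal NumPy sketch of one training step. The grid size, learning rate and Gaussian neighborhood width are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
grid_h, grid_w, dim = 5, 5, 3
weights = rng.random((grid_h, grid_w, dim))   # one weight vector per map node

# Grid coordinates of every node, used to measure neighborhood distance.
coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij"), axis=-1)

def som_step(weights, x, lr=0.5, sigma=1.0):
    """One SOM update: find the best-matching unit, then pull nearby nodes toward x."""
    dists = np.linalg.norm(weights - x, axis=-1)            # distance from x to every node
    bmu = np.unravel_index(np.argmin(dists), dists.shape)   # grid position of the best match
    grid_dist2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
    influence = np.exp(-grid_dist2 / (2 * sigma ** 2))      # 1 at the BMU, decaying with grid distance
    return weights + lr * influence[..., None] * (x - weights), bmu

x = np.array([1.0, 0.0, 0.0])      # e.g. a pure red color vector
new_weights, bmu = som_step(weights, x)
print("BMU:", bmu)
```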

## New developments

### General-purpose computing on graphics processing units (GPGPU)

#### GPU vs CPU
![](http://www.frontiersin.org/files/Articles/70265/fgene-04-00266-HTML/image_m/fgene-04-00266-g001.jpg)


#### 2005
![](figs/gpgpu.png)


### Better algorithms

#### 2011
![](figs/2011-conv-mnist.png)


## Convolutional Neural Networks (CNN)

![](https://upload.wikimedia.org/wikipedia/commons/6/63/Typical_cnn.png)

### Weight Sharing, Convolution
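Weight sharing means a single small kernel is slid over every position of the image. As a result, a 3x3 filter has only 9 weights (plus a bias), however large the input is. Below is a minimal NumPy sketch of the "valid" 2D convolution used in CNN layers. Strictly speaking this is a cross-correlation, as in most deep-learning libraries. The edge-detecting kernel is an arbitrary example. A 3x3 kernel on a 28x28 input yields a 26x26 output, which matches the conv2d_1 shape in the model summary below.

```python
import numpy as np

def conv2d_valid(image, kernel, bias=0.0):
    """Slide one shared kernel over the image without padding ("valid" mode)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # the same kernel weights are reused at every position
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel) + bias
    return out

image = np.random.default_rng(0).random((28, 28))   # stand-in for an MNIST digit
kernel = np.array([[1.0, 0.0, -1.0]] * 3)           # simple vertical-edge detector
print(conv2d_valid(image, kernel).shape)            # (26, 26)
```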

### Subsampling / Max Pooling
![](https://upload.wikimedia.org/wikipedia/commons/e/e9/Max_pooling.png)
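Max pooling keeps only the largest value in each window. A 2x2 window with stride 2 therefore halves the width and height, which is how the 24x24 feature maps become 12x12 in the model summary below. Here is a NumPy sketch using a reshape trick:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 (height and width must be even)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
print(max_pool_2x2(x))
# [[6 8]
#  [3 4]]
```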

### Dropout, {L1, L2} regularization, artificial data, etc.
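Dropout randomly zeroes activations during training, so units cannot rely on each other. L1/L2 regularization instead adds a penalty on the weights to the loss. Below is a minimal NumPy sketch of "inverted" dropout, followed by an L2 penalty. Inverted dropout scales the surviving activations by 1/(1 - rate) during training, so nothing needs to change at test time. The Keras documentation describes its Dropout layer in the same terms. The rate and penalty strength here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate, training):
    """Inverted dropout: zero a fraction `rate` of units and rescale the rest."""
    if not training or rate == 0.0:
        return x                                # no-op at inference time
    keep = rng.random(x.shape) >= rate          # keep each unit with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)

def l2_penalty(weights, lam):
    """L2 regularization term added to the loss: lam * sum of squared weights."""
    return lam * sum(np.sum(w ** 2) for w in weights)

activations = np.ones(10_000)
dropped = dropout(activations, rate=0.5, training=True)
print((dropped == 0).mean())   # roughly 0.5
print(dropped.mean())          # roughly 1.0: the expected value is preserved
print(l2_penalty([np.array([1.0, -2.0]), np.array([3.0])], lam=0.01))  # 0.01 * (1 + 4 + 9)
```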

# MNIST

![](http://andrea.burattin.net/public-files/stuff/handwritten-digit-recognition/example_mnist.gif)

![](figs/mnist-perfs.png)

### Based on the Keras examples, specifically [this one](https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py)

```
!pip install keras tensorflow
```



```python
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K
```

```
Using TensorFlow backend.
```


```python
batch_size = 128
num_classes = 10
epochs = 12

# input image dimensions
img_rows, img_cols = 28, 28

# the data, shuffled and split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
print('y_train shape:', y_train.shape)
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
print('y_train shape:', y_train.shape)
```

```
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
y_train shape: (60000,)
y_train shape: (60000, 10)
```
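`keras.utils.to_categorical` turns each integer label into a one-hot row, which is why `y_train` goes from shape (60000,) to (60000, 10). The sketch below reproduces the same transformation in plain NumPy, for illustration only:

```python
import numpy as np

def one_hot(labels, num_classes):
    """Row i is all zeros except for a 1 in column labels[i]."""
    return np.eye(num_classes, dtype="float32")[labels]

labels = np.array([5, 0, 4])
print(one_hot(labels, 10))
```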


```python
# Building the convolutional neural network, layer by layer
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

# Definition of the learning process: loss, optimizer and reported metrics
# (older Keras versions default Adadelta to a learning rate of 1.0;
#  newer versions use a much smaller default, which trains more slowly)
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
# Print each layer with its output shape and number of parameters
model.summary()
```

```
_________________________________________________________________
Layer (type)                 Output Shape              Param #
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 24, 24, 64)        18496
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 64)        0
_________________________________________________________________
dropout_1 (Dropout)          (None, 12, 12, 64)        0
_________________________________________________________________
flatten_1 (Flatten)          (None, 9216)              0
_________________________________________________________________
dense_1 (Dense)              (None, 128)               1179776
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0
__________
```
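The parameter counts in the summary follow from two simple formulas. A convolution layer has (kernel height x kernel width x input channels + 1) x filters parameters, where the +1 is the bias. A dense layer has (inputs + 1) x units parameters. Pooling, dropout and flatten layers have no parameters at all. The small sketch below reproduces the numbers above and also computes the final `Dense(num_classes)` layer, which is cut off in the printed summary:

```python
def conv_params(kh, kw, in_channels, filters):
    return (kh * kw * in_channels + 1) * filters   # one bias per filter

def dense_params(inputs, units):
    return (inputs + 1) * units                    # one bias per unit

print(conv_params(3, 3, 1, 32))          # conv2d_1: 320
print(conv_params(3, 3, 32, 64))         # conv2d_2: 18496
print(dense_params(12 * 12 * 64, 128))   # dense_1: 1179776 (flatten gives 9216 inputs)
print(dense_params(128, 10))             # final softmax layer: 1290
```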

```python
# Training the model
# (as in the Keras example, the test set doubles as validation data here;
#  a separate validation split would keep the final test score unbiased)
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))

score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
```

```
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12
Test loss: 0.0280344603432
Test accuracy: 0.9906
```
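The reported test loss is the mean categorical cross-entropy, that is, the average of -log(probability assigned to the correct class). Accuracy is the fraction of samples whose highest-probability class matches the label. The NumPy sketch below computes both on made-up predictions; the numbers are illustrative and do not come from the run above.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    return float(np.mean(y_true.argmax(axis=1) == y_pred.argmax(axis=1)))

y_true = np.array([[0, 1, 0], [1, 0, 0]], dtype=float)
y_pred = np.array([[0.1, 0.8, 0.1], [0.3, 0.6, 0.1]])
print(categorical_crossentropy(y_true, y_pred))  # (-log 0.8 - log 0.3) / 2
print(accuracy(y_true, y_pred))                  # 0.5: the second sample is misclassified
```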


### Good old scikit-learn & linear regression

```
!pip install scikit-learn
```


```python
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler
```

```python
pipeline = Pipeline([
    ('variance filter', VarianceThreshold(threshold=0.01)),
    ('standard_scale', StandardScaler()),
    ('estimator', Lasso(alpha=0.1, max_iter=2000)),
])

pipeline.fit(x_train.reshape(len(x_train), -1), y_train)
```

```
Pipeline(steps=[('variance filter', VarianceThreshold(threshold=0.01)), ('standard_scale', StandardScaler(copy=True, with_mean=True, with_std=True)), ('estimator', Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=2000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False))])
```

```python
from sklearn.metrics import label_ranking_average_precision_score
label_ranking_average_precision_score(y_test, pipeline.predict(x_test.reshape(len(x_test), -1)))
```

```
0.47801019841269715
```

```python
label_ranking_average_precision_score(y_test, model.predict(x_test))
```

```
0.99470000000000003
```
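When every image has exactly one correct label, `label_ranking_average_precision_score` reduces to the mean of 1 / (rank of the true class among the predicted scores). A score of 1.0 therefore means the correct digit is always ranked first. Below is a NumPy sketch of that single-label special case only. In this sketch, a tie with the true class counts against it.

```python
import numpy as np

def lrap_single_label(y_true, scores):
    """LRAP when every row of y_true has exactly one 1: mean of 1 / rank of the true class."""
    true_idx = y_true.argmax(axis=1)
    true_scores = scores[np.arange(len(scores)), true_idx]
    # rank = number of classes scored at least as high as the true class (1 = best)
    ranks = (scores >= true_scores[:, None]).sum(axis=1)
    return float(np.mean(1.0 / ranks))

y_true = np.array([[0, 1, 0], [1, 0, 0]])
scores = np.array([[0.1, 0.9, 0.0],    # true class ranked 1st -> 1/1
                   [0.1, 0.9, 0.0]])   # true class ranked 2nd -> 1/2
print(lrap_single_label(y_true, scores))  # 0.75
```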

### Filter visualization

![](figs/stitched_filters_4x4.png)

### Classification

![](figs/normal.jpg)

### Adding classes

![](figs/added-nodes.jpg)
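One way to add classes to a trained classifier is to widen only the final layer. The learned weights for the existing classes are kept, freshly initialized columns are appended for the new ones, and the network is then fine-tuned. The NumPy sketch below shows this weight surgery. The shapes match the 128-unit hidden layer and 10-class output of the MNIST model above. The random stand-in weights, the two extra classes and the small random initialization are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the trained output layer of the MNIST model: 128 hidden units -> 10 classes.
W_old = rng.normal(size=(128, 10))
b_old = rng.normal(size=10)

def add_output_classes(W, b, n_new, scale=0.01):
    """Append n_new output units, keeping the existing class weights untouched."""
    W_new = np.concatenate([W, rng.normal(scale=scale, size=(W.shape[0], n_new))], axis=1)
    b_new = np.concatenate([b, np.zeros(n_new)])
    return W_new, b_new

W, b = add_output_classes(W_old, b_old, n_new=2)
print(W.shape, b.shape)   # (128, 12) (12,)
```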

### Dimensionality reduction / Transfer learning

![](figs/dimentionality-reduction.jpg)

![](figs/dimentionality-reduction-2.jpg)

# Final remarks

 - Use cases with not enough data
 - Use cases with many small models
 - Performance gain vs. cost
 - Network architecture & hyperparameters
 - Deployment
   - Cleanup
   - Batching
   - Serving

```python
# map each layer name to its layer object, so layers can be looked up by name
layer_dict = {layer.name: layer for layer in model.layers}
print(layer_dict.keys())
```

```
dict_keys(['conv2d_1', 'conv2d_2', 'max_pooling2d_1', 'dropout_1', 'flatten_1', 'dense_1', 'dropout_2', 'dense_2'])
```


```python
import numpy as np
from keras import backend as K

def deprocess_image(x):
    # normalize tensor: center on 0., ensure std is 0.1
    x -= x.mean()
    x /= (x.std() + 1e-5)
    x *= 0.1

    # shift to be centered on 0.5, then clip to [0, 1]
    x += 0.5
    x = np.clip(x, 0, 1)

    # scale to [0, 255] and convert to an 8-bit image array (channels last)
    x *= 255
    if K.image_data_format() == 'channels_first':
        x = x.transpose((1, 2, 0))
    x = np.clip(x, 0, 255).astype('uint8')
    return x

def normalize(x):
    # utility function to normalize a tensor by its root-mean-square (RMS) value
    return x / (K.sqrt(K.mean(K.square(x))) + 1e-5)
```


```python
import time
import numpy as np
# dimensions of the generated pictures for each filter.
img_width = 28
img_height = 28
layer_name = 'conv2d_1'
# this is the placeholder for the input images
input_img = model.input

kept_filters = []
for filter_index in range(0, 32):
    # conv2d_1 has 32 filters, so we scan through all of them
    print('Processing filter %d' % filter_index)
    start_time = time.time()

    # we build a loss function that maximizes the activation
    # of the nth filter of the layer considered
    layer_output = layer_dict[layer_name].output
    if K.image_data_format() == 'channels_first':
        loss = K.mean(layer_output[:, filter_index, :, :])
    else:
        loss = K.mean(layer_output[:, :, :, filter_index])

    # we compute the gradient of the input picture wrt this loss
    # (K.gradients needs the graph-mode backend of Keras 2.x on TensorFlow 1.x;
    #  it does not work under TensorFlow 2 eager execution)
    grads = K.gradients(loss, input_img)[0]

    # normalization trick: we normalize the gradient
    grads = normalize(grads)

    # this function returns the loss and grads given the input picture
    iterate = K.function([input_img], [loss, grads])

    # step size for gradient ascent
    step = 1.

    # we start from random noise centered on 0 (values between -20 and 20)
    if K.image_data_format() == 'channels_first':
        input_img_data = np.random.random((1, 1, img_height, img_width))
    else:
        input_img_data = np.random.random((1, img_height, img_width, 1))
    input_img_data = (input_img_data - 0.5) * 40

    # we run gradient ascent for 20 steps
    for i in range(20):
        loss_value, grads_value = iterate([input_img_data])
        input_img_data += grads_value * step

        print('Current loss value:', loss_value)
        if loss_value <= 0.:
            # some filters get stuck at 0, we can skip them
            break

    # decode the resulting input image
    if loss_value > 0:
        img = deprocess_image(input_img_data[0])
        kept_filters.append((img, loss_value))
    end_time = time.time()
    print('Filter %d processed in %ds' % (filter_index, end_time - start_time))
```


```
Processing filter 0
Current loss value: 1.70963
Current loss value: 2.03946
Current loss value: 2.37792
Current loss value: 2.72286
Current loss value: 3.07352
Current loss value: 3.43072
Current loss value: 3.79163
Current loss value: 4.15768
Current loss value: 4.52956
Current loss value: 4.90569
Current loss value: 5.28502
Current loss value: 5.66755
Current loss value: 6.05594
Current loss value: 6.44781
Current loss value: 6.84326
Current loss value: 7.2431
Current loss value: 7.64671
Current loss value: 8.05558
Current loss value: 8.46875
Current loss value: 8.88523
Filter 0 processed in 0s
Processing filter 1
Current loss value: 2.34738
Current loss value: 2.74086
Current loss value: 3.14125
Current loss value: 3.55094
Current loss value: 3.97054
Current loss value: 4.39407
Current loss value: 4.82369
Current loss value: 5.25712
Current loss value: 5.69266
Current loss value: 6.12828
Current loss value: 6.56576
Current loss value: 7.00641
Current loss value: 7.44956
...
Current loss value: 1.54155
Current loss value: 1.79834
Current loss value: 2.06008
Current loss value: 2.32572
Current loss value: 2.59497
Current loss value: 2.86774
Current loss value: 3.14504
Current loss value: 3.42739
Current loss value: 3.71341
Current loss value: 4.00227
Current loss value: 4.29463
Current loss value: 4.59036
Current loss value: 4.88862
Current loss value: 5.18938
Current loss value: 5.49134
Current loss value: 5.79504
Current loss value: 6.10003
Current loss value: 6.4054
Current loss value: 6.71115
Current loss value: 7.01725
Filter 15 processed in 0s
Processing filter 16
Current loss value: 1.62804
Current loss value: 2.15277
Current loss value: 2.71905
Current loss value: 3.32183
Current loss value: 3.9554
Current loss value: 4.62368
Current loss value: 5.327
Current loss value: 6.0476
Current loss value: 6.79029
Current loss value: 7.56215
Current loss value: 8.35564
Current loss value: 9.16768
Current loss value: 10.0024
Current loss value: 10.8559
...
Current loss value: 1.96931
Current loss value: 2.35653
Current loss value: 2.75243
Current loss value: 3.15534
Current loss value: 3.56278
Current loss value: 3.97944
Current loss value: 4.40215
Current loss value: 4.83033
Current loss value: 5.26153
Current loss value: 5.69776
Current loss value: 6.14043
Current loss value: 6.58923
Current loss value: 7.04208
Current loss value: 7.50065
Current loss value: 7.9661
Current loss value: 8.4355
Current loss value: 8.90833
Current loss value: 9.38596
Current loss value: 9.86672
Current loss value: 10.3498
Filter 30 processed in 0s
Processing filter 31
Current loss value: 1.57109
Current loss value: 2.10808
Current loss value: 2.68212
Current loss value: 3.28902
Current loss value: 3.9407
Current loss value: 4.6323
Current loss value: 5.35867
Current loss value: 6.1174
Current loss value: 6.91276
Current loss value: 7.73177
Current loss value: 8.57415
Current loss value: 9.43405
Current loss value: 10.3051
Current loss value: 11.189
...
```
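The same idea can be shown without any deep-learning framework. For a single linear 3x3 filter, the mean filter response is linear in the input pixels, so its gradient can be written down directly. The NumPy sketch below runs the same gradient ascent with RMS-normalized steps. It assumes a random stand-in kernel and ignores the ReLU that conv2d_1 applies, so the response rises at every step.

```python
import numpy as np

rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 3))      # stand-in for one learned 3x3 filter
size, k = 28, 3
n_out = (size - k + 1) ** 2           # number of output positions (26 x 26)

def mean_response(img):
    """Mean of the filter's (pre-activation) output over all valid positions."""
    return sum(np.sum(img[i:i + k, j:j + k] * kernel)
               for i in range(size - k + 1) for j in range(size - k + 1)) / n_out

# The response is linear in the pixels, so its gradient is constant:
# every output position adds a copy of the kernel to the pixels it covers.
grad = np.zeros((size, size))
for i in range(size - k + 1):
    for j in range(size - k + 1):
        grad[i:i + k, j:j + k] += kernel / n_out
grad /= np.sqrt(np.mean(grad ** 2)) + 1e-5    # RMS normalization, like normalize() above

img = (rng.random((size, size)) - 0.5) * 40   # noisy start image, as in the loop above
history = []
for _ in range(5):
    history.append(mean_response(img))
    img += grad * 1.0                          # gradient ascent with step size 1
print(history)                                 # increases at every step
```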

```python
import imageio
# we will stitch the best 16 filters on a 4 x 4 grid.
n = 4

# the filters that have the highest loss are assumed to be better-looking.
# we will only keep the top 16 filters.
kept_filters.sort(key=lambda x: x[1], reverse=True)
kept_filters = kept_filters[:n * n]

# build a black picture with enough space for
# our 4 x 4 filters of size 28 x 28, with a 5px margin in between
margin = 5
width = n * img_width + (n - 1) * margin
height = n * img_height + (n - 1) * margin
stitched_filters = np.zeros((width, height, 3))

# fill the picture with our saved filters
for i in range(n):
    for j in range(n):
        img, loss = kept_filters[i * n + j]
        stitched_filters[(img_width + margin) * i: (img_width + margin) * i + img_width,
                         (img_height + margin) * j: (img_height + margin) * j + img_height, :] = img

# save the result to disk (scipy.misc.imsave was removed in SciPy 1.2)
imageio.imwrite('stitched_filters_%dx%d.png' % (n, n), stitched_filters.astype('uint8'))
```
