# Deep Learning, what a hype

### Me: Adrin [adrin.info](http://adrin.info)


### Ancud IT-Beratung [ancud.de](https://ancud.de)
![ancud](figs/ancud.png)


### This talk: [github.com/adrinjalali/2017-05-talk-dl](https://github.com/adrinjalali/2017-05-talk-dl)
#### Requirements: python3, ipython, notebook (jupyter)







## Neural Networks

### A single neuron

![spiking neural network](http://lis2.epfl.ch/CompletedResearchProjects/EvolutionOfAdaptiveSpikingCircuits/images/neuron.jpg)

![spiking system](http://lis2.epfl.ch/CompletedResearchProjects/EvolutionOfAdaptiveSpikingCircuits/images/spiking.jpg)

### Artificial neuron
[Source](http://natureofcode.com/book/chapter-10-neural-networks/)

![](http://natureofcode.com/book/imgs/chapter10/ch10_05.png)

#### Add bias
![](http://natureofcode.com/book/imgs/chapter10/ch10_06.png)


#### Feed the data
![](http://natureofcode.com/book/imgs/chapter10/ch10_07.png)
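
The diagrams above boil down to a weighted sum of the inputs plus a bias, passed through an activation. A minimal sketch, assuming a sign activation; the function name `neuron` and the weights are hypothetical values picked by hand for illustration:

```python
def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sign activation."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total >= 0 else -1


# Hypothetical weights and bias, not learned from data.
weights = [0.5, -1.0]
bias = 0.25

output = neuron([1.0, 0.5], weights, bias)
print(output)
```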



### Demo [here](http://natureofcode.com/book/chapter-10-neural-networks/)

### Linearly separable, and not
![](http://natureofcode.com/book/imgs/chapter10/ch10_11.png)


#### Logic example:
![](http://natureofcode.com/book/imgs/chapter10/ch10_12.png)
![](http://natureofcode.com/book/imgs/chapter10/ch10_13.png)
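
To see the difference in practice, here is a small sketch of the classic perceptron learning rule trained on AND (linearly separable) and XOR (not). The helper names, the learning rate of 1 and the 20 epochs are assumptions made for this example:

```python
def predict(w, b, x):
    """Step activation on the weighted sum."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def train_perceptron(samples, epochs=20):
    """Perceptron rule with learning rate 1: nudge weights by the error."""
    w, b = [0, 0], 0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(w, b, x)
            w[0] += error * x[0]
            w[1] += error * x[1]
            b += error
    return w, b


def accuracy(samples, w, b):
    hits = sum(predict(w, b, x) == target for x, target in samples)
    return hits / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_acc = accuracy(AND, *train_perceptron(AND))
xor_acc = accuracy(XOR, *train_perceptron(XOR))
print("AND accuracy:", and_acc)  # reaches 1.0
print("XOR accuracy:", xor_acc)  # stays below 1.0, no single line separates XOR
```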

#### Multilayer perceptron
![](http://natureofcode.com/book/imgs/chapter10/ch10_14.png)
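
One hidden layer is enough to solve XOR. A sketch with hand-set (not learned) weights: one hidden unit computes OR, the other NAND, and the output unit ANDs them together. All names and weight values here are assumptions for illustration:

```python
def step(z):
    return 1 if z > 0 else 0


def xor_mlp(x1, x2):
    """Two hidden units (OR, NAND) feeding one output unit (AND)."""
    h_or = step(x1 + x2 - 0.5)
    h_nand = step(-x1 - x2 + 1.5)
    return step(h_or + h_nand - 1.5)


truth_table = {(a, b): xor_mlp(a, b) for a in (0, 1) for b in (0, 1)}
print(truth_table)
```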

### Activation Functions: [wiki](https://en.wikipedia.org/wiki/Activation_function)
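
Three of the most common activation functions, sketched in plain Python for scalar inputs (the function names are just the usual conventions):

```python
import math


def sigmoid(z):
    """Squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Squashes z into (-1, 1)."""
    return math.tanh(z)


def relu(z):
    """Passes positive values through, clips negatives to 0."""
    return max(0.0, z)


for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z))
```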

## Architectures

### Feedforward
![](https://upload.wikimedia.org/wikipedia/commons/thumb/e/e4/Artificial_neural_network.svg/560px-Artificial_neural_network.svg.png)


### Recurrent
![](https://upload.wikimedia.org/wikipedia/commons/7/79/Recurrent_ann_dependency_graph.png)


#### Elman SRNN
![](https://upload.wikimedia.org/wikipedia/commons/8/8f/Elman_srnn.png)
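
In an Elman network the hidden state of the previous time step is fed back in as context. A rough sketch of the forward pass with NumPy; the layer sizes, the random weights and the name `elman_step` are all assumptions for illustration, and nothing is trained here:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 4, 2

# Hypothetical, randomly initialised weights.
W_xh = rng.normal(scale=0.1, size=(n_hidden, n_in))
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
W_hy = rng.normal(scale=0.1, size=(n_out, n_hidden))
b_h = np.zeros(n_hidden)
b_y = np.zeros(n_out)


def elman_step(x, h_prev):
    """One time step: the new hidden state depends on the input and the old state."""
    h = np.tanh(W_xh @ x + W_hh @ h_prev + b_h)
    y = W_hy @ h + b_y
    return h, y


sequence = rng.normal(size=(5, n_in))  # 5 time steps of made-up input
h = np.zeros(n_hidden)
outputs = []
for x in sequence:
    h, y = elman_step(x, h)
    outputs.append(y)
outputs = np.stack(outputs)
print(outputs.shape)
```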

### Unsupervised, e.g. SOM
![](https://upload.wikimedia.org/wikipedia/commons/thumb/9/91/Somtraining.svg/1000px-Somtraining.svg.png)

## New developments

### General-purpose computing on graphics processing units (GPGPU)

#### GPU vs CPU
![](http://www.frontiersin.org/files/Articles/70265/fgene-04-00266-HTML/image_m/fgene-04-00266-g001.jpg)


#### 2005
![](figs/gpgpu.png)


### Better algorithms

#### 2011
![](figs/2011-conv-mnist.png)


## Convolutional Neural Networks (CNN)

![](https://upload.wikimedia.org/wikipedia/commons/6/63/Typical_cnn.png)

### Weight Sharing, Convolution
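
Weight sharing means the same small kernel is slid over the whole image, so every output position reuses the same weights. A minimal NumPy sketch, assuming "valid" padding, stride 1 and no kernel flip (cross-correlation, which is what deep learning libraries usually call convolution); `conv2d_valid` and the toy arrays are made up for this example:

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])  # 4 shared weights, regardless of image size

feature_map = conv2d_valid(image, kernel)
print(feature_map)
```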

### Max Pooling
![](https://upload.wikimedia.org/wikipedia/commons/e/e9/Max_pooling.png)
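
The 2x2 max pooling from the figure can be sketched in NumPy with a reshape. This assumes the height and width are even; `max_pool_2x2` is a name chosen for this example:

```python
import numpy as np


def max_pool_2x2(x):
    """Keep the maximum of every non-overlapping 2x2 block."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])

pooled = max_pool_2x2(x)
print(pooled)
# [[6 8]
#  [3 4]]
```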

### Dropout, {L1, L2} regularization, artificial data, etc.
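
A rough sketch of two of these ideas in NumPy: inverted dropout (zero out random activations during training and rescale the rest) and L1/L2 penalties added to the loss. The function names, the rate and the `lam` values are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)


def dropout(activations, rate, rng):
    """Inverted dropout: drop each unit with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    mask = rng.random(activations.shape) < keep
    return activations * mask / keep


def l1_penalty(weights, lam):
    """Encourages sparse weights."""
    return lam * np.sum(np.abs(weights))


def l2_penalty(weights, lam):
    """Encourages small weights."""
    return lam * np.sum(weights ** 2)


activations = np.ones(1000)
dropped = dropout(activations, rate=0.5, rng=rng)
print("mean after dropout:", dropped.mean())  # stays close to the original mean

w = np.array([1.0, -2.0, 3.0])
print("L1:", l1_penalty(w, 0.1), "L2:", l2_penalty(w, 0.1))
```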

# MNIST

![](http://andrea.burattin.net/public-files/stuff/handwritten-digit-recognition/example_mnist.gif)

![](figs/mnist-perfs.png)

### Based on the Keras examples, specifically [this one](https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py)

In [1]:
!pip install keras tensorflow



In [2]:
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K


Using TensorFlow backend.


In [3]:
batch_size = 128
num_classes = 10
epochs = 12

# input image dimensions
img_rows, img_cols = 28, 28

# the data, shuffled and split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
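
The "binary class matrices" step in the cell above is one-hot encoding. A sketch of the same idea in plain NumPy, to show what the labels look like afterwards; `one_hot` and the three labels are made up for this example:

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn integer labels into rows with a single 1 at the label's index."""
    return np.eye(num_classes, dtype="float32")[labels]


labels = np.array([5, 0, 4])  # hypothetical digit labels
encoded = one_hot(labels, 10)
print(encoded.shape)
print(encoded[0])
```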


In [4]:
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Train on 60000 samples, validate on 10000 samples
Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12
Test loss: 0.0261879921081
Test accuracy: 0.9911


### Good old scikit-learn & linear regression

In [5]:
!pip install scikit-learn



In [6]:
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import VarianceThreshold
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

In [7]:
pipeline = Pipeline([
    ('variance filter', VarianceThreshold(threshold=0.01)),
    ('standard_scale', StandardScaler()),
    ('estimator', Lasso(alpha=0.1, max_iter=2000)),
])

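# y_train is one-hot encoded (see In [3]), so Lasso fits a separate
# linear regression for each of the 10 output columns
pipeline.fit(x_train.reshape(len(x_train), -1), y_train)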



Pipeline(steps=[('variance filter', VarianceThreshold(threshold=0.01)), ('standard_scale', StandardScaler(copy=True, with_mean=True, with_std=True)), ('estimator', Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=2000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False))])

In [8]:
from sklearn.metrics import label_ranking_average_precision_score
label_ranking_average_precision_score(y_test, pipeline.predict(x_test.reshape(len(x_test), -1)))

0.47789595238095106

In [9]:
label_ranking_average_precision_score(y_test, model.predict(x_test))

0.99523333333333353
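
The two scores above measure how well the true label is ranked. A more familiar number is plain accuracy: take the highest-scoring class per sample and compare it with the true one. A small sketch with made-up arrays standing in for `y_test` and the model outputs; the `accuracy` helper is not part of the original notebook:

```python
import numpy as np


def accuracy(y_true_onehot, y_score):
    """Fraction of samples whose top-scoring class matches the true class."""
    true_class = np.argmax(y_true_onehot, axis=1)
    pred_class = np.argmax(y_score, axis=1)
    return float(np.mean(true_class == pred_class))


# Toy stand-ins: 4 samples, 3 classes.
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1],
                   [0, 1, 0]])
y_score = np.array([[0.8, 0.1, 0.1],
                    [0.2, 0.7, 0.1],
                    [0.3, 0.4, 0.3],
                    [0.1, 0.6, 0.3]])

acc = accuracy(y_true, y_score)
print(acc)  # 0.75
```

With the notebook's own objects this would be called as `accuracy(y_test, model.predict(x_test))` for the CNN and `accuracy(y_test, pipeline.predict(x_test.reshape(len(x_test), -1)))` for the Lasso pipeline.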

### Classification

![](figs/normal.jpg)

### Adding classes

![](figs/added-nodes.jpg)

### Dimensionality reduction / Transfer learning

![](figs/dimentionality-reduction.jpg)

![](figs/dimentionality-reduction-2.jpg)

# Final remarks

 - Use cases with not enough data
 - Use cases with many small models
 - Performance gain vs. cost
 - Network architecture & hyperparameters
 - Deployment
   - Cleanup
   - Batching
   - Serving