## Chapter 3. Deep Learning Fundamentals
### Deep learning algorithms and techniques
* **Multi-layer perceptrons (MLPs)**: feed-forward neural networks with fully connected layers and at least one hidden
layer.
* **Convolutional neural networks (CNNs)**: feed-forward neural networks with specialized layer types (e.g.
convolutional layers).
* **Recurrent networks**: networks that keep an internal state (memory) based on all or part of the input data they have
already seen. Their output is a combination of that internal state and the latest input, and the latest input also
updates the internal state.
* **Autoencoders**: unsupervised learning algorithms whose output has the same shape as their input. They can be used as
generative networks.
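
To make the recurrent update described above concrete, here is a minimal NumPy sketch of a single recurrent step applied over a short sequence. The `tanh` nonlinearity, the weight names (`W_state`, `W_input`, `bias`), the sizes and the choice to use the state itself as the output are all assumptions made for illustration; real recurrent layers differ in the details.

```python
import numpy as np


def rnn_step(state, x, W_state, W_input, bias):
    """One recurrent step: the new state mixes the previous state and the latest input."""
    return np.tanh(W_state @ state + W_input @ x + bias)


rng = np.random.default_rng(0)
state_size, input_size = 4, 3                      # illustrative sizes (assumed)
W_state = rng.normal(scale=0.5, size=(state_size, state_size))
W_input = rng.normal(scale=0.5, size=(state_size, input_size))
bias = np.zeros(state_size)

sequence = rng.normal(size=(5, input_size))        # 5 time steps, 3 features each
state = np.zeros(state_size)                       # empty memory before any input
states = []
for x in sequence:
    # The latest input changes the internal state, which is carried to the next step
    state = rnn_step(state, x, W_state, W_input, bias)
    states.append(state)

states = np.stack(states)                          # one internal state per time step
print(states.shape)
```

Each row of `states` depends on every input seen up to that time step, which is the "memory" the bullet refers to.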

### Training deep networks
Fortunately, deep networks can still be trained with stochastic gradient descent (SGD)-based tools and backpropagation,
and we will now introduce the optimizations that help achieve this. The main concept is **momentum**, together with
modern optimizers such as ```ADADELTA```, ```RMSProp``` and ```Adam```.
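
As a rough illustration of momentum, the sketch below applies the classical momentum update to a toy quadratic loss. The loss function, the learning rate, the momentum coefficient and the helper names (`sgd_momentum_step`, `grad_fn`) are assumptions chosen for the example, not values taken from these notes or from any library.

```python
import numpy as np


def sgd_momentum_step(w, velocity, grad, lr=0.1, momentum=0.9):
    """Classical momentum: the velocity accumulates a decaying sum of past gradients."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity


def grad_fn(w):
    """Gradient of the toy loss f(w) = sum(w ** 2)."""
    return 2.0 * w


w = np.array([3.0, -2.0])          # arbitrary starting point
velocity = np.zeros_like(w)        # no accumulated gradient history yet
for _ in range(200):
    w, velocity = sgd_momentum_step(w, velocity, grad_fn(w))

final_loss = float(np.sum(w ** 2))
print(final_loss)
```

With `momentum=0` this reduces to plain gradient descent; a non-zero value lets earlier gradients keep influencing the current step.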

### Libraries
This section covers the basic features of the most popular Python deep learning libraries: TensorFlow, Keras and PyTorch.
* The basic unit of data representation is the **tensor**, a generalization of a matrix whose mathematical details are beyond
the scope here. Typically, these libraries represent data in batches, both for performance reasons and because batches
suit SGD-based computations. In practice, this means the data is stored in tensors with one more dimension than
the basic data unit (e.g. if we work with 1D vectors, the data will be stored in 2D tensors in which the first dimension
indexes the sample within the batch and the second is the only dimension of the actual 1D data; for grayscale images
the tensor will be 3D, with the additional dimension indexing the samples, each one being a 2D matrix of luminance
values).
* Neural networks are represented as **computational graphs** of operations. The nodes represent operations, the edges 
represent the flow of data. Inputs and outputs of these operations are tensors.
* All these libraries include **automatic differentiation**, so there are no derivatives to calculate by hand.
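
The batch dimension described in the first bullet can be sketched with plain NumPy arrays standing in for library tensors. The batch size and the 784 and 28x28 sample shapes are illustrative assumptions (they match MNIST-sized data).

```python
import numpy as np

batch_size = 100  # illustrative mini-batch size (assumed)

# 1D samples (vectors of 784 values) are stored as a 2D tensor: (sample, feature)
vector_batch = np.zeros((batch_size, 784), dtype="float32")

# 2D samples (28x28 grayscale images) are stored as a 3D tensor: (sample, row, column)
image_batch = np.zeros((batch_size, 28, 28), dtype="float32")

# Indexing the first (batch) dimension recovers a single sample of the basic data unit
single_vector = vector_batch[0]
single_image = image_batch[0]

print(vector_batch.ndim, single_vector.ndim)  # batch has one more dimension than a sample
print(image_batch.ndim, single_image.ndim)
```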

#### TensorFlow
TensorFlow will automatically try to make use of GPUs. It has a somewhat steeper learning curve than the other libraries
presented. 

#### Keras
Keras is a higher-level library that runs on top of TensorFlow, CNTK or Theano. We'll use the TensorFlow backend. It will
also try to make use of GPUs.

#### PyTorch
You know it.

### Using Keras to classify handwritten digits
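
The cell below converts the digit labels to dummy (one-hot) vectors before training. As a rough illustration of what that conversion produces, here is a minimal NumPy sketch. `one_hot` is a hypothetical helper written for these notes, not a Keras function, and the three labels are made-up examples.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1 at the label's index."""
    labels = np.asarray(labels, dtype=int).ravel()
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


digits = np.array([5, 0, 4])   # made-up digit labels
encoded = one_hot(digits, 10)  # one row of 10 values per label
print(encoded)
```

Taking `np.argmax(encoded, axis=1)` recovers the original labels, which is also how a softmax output is usually turned back into a predicted digit.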

In [None]:
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.utils import to_categorical

# Let's load the dataset
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
# Pre-process the data: flatten each 28x28 image into a 784-element vector
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
# The labels indicate the value of the digit in the samples. Let's convert them to dummy (one-hot) vectors.
classes = 10
Y_train = to_categorical(Y_train, classes)
Y_test = to_categorical(Y_test, classes)
# Let's set the size of the input layer (equal to the size of the MNIST images), the number of hidden neurons, the number
# of training epochs and the mini-batch size
input_size = 784  # 28 * 28 from the input images
batch_size = 100
hidden_neurons = 100
epochs = 100
# Define the net architecture
mdl = Sequential([
    Dense(units=hidden_neurons, input_dim=input_size),
    Activation('sigmoid'),
    Dense(classes),
    Activation('softmax')
])
# Cost function and optimization
mdl.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='sgd')
# Train the model (`epochs` replaces the deprecated `nb_epoch` argument)
mdl.fit(x=X_train, y=Y_train, batch_size=batch_size, epochs=epochs, verbose=1)
# Evaluate the model accuracy on the test data
score = mdl.evaluate(x=X_test, y=Y_test, verbose=1)
print('Accuracy: {}'.format(score[1]))

In [None]:
# Let's visualize how it went.
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import numpy as np

weights = mdl.layers[0].get_weights()

fig = plt.figure()

w = weights[0].T
for neuron in range(hidden_neurons):
    ax = fig.add_subplot(10, 10, neuron + 1)
    ax.axis("off")
    ax.imshow(np.reshape(w[neuron], (28, 28)), cmap=cm.Greys_r)
    
plt.savefig("neuron_images_MNIST.png", dpi=300)
plt.show()

### Using Keras to classify images of objects
We'll use the **CIFAR-10** dataset, consisting of 60000 32x32 RGB images in 10 classes (airplanes, cars, birds, cats,
deer, dogs, frogs, horses, ships and trucks).
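
Because the network used here is fully connected, each 32x32 RGB image has to be flattened into a vector of 32 * 32 * 3 = 3072 values. The sketch below shows that reshaping on a small random stand-in array, so it runs without downloading the dataset. The array contents and the batch of 5 images are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a few CIFAR-10 images: (sample, row, column, RGB channel)
images = rng.integers(0, 256, size=(5, 32, 32, 3), dtype=np.uint8)

# Flatten every image into a single vector for the fully connected input layer
flat = images.reshape(len(images), 32 * 32 * 3)

# The flattening only rearranges values, so the original layout can be restored
restored = flat.reshape(-1, 32, 32, 3)

print(flat.shape)
print(np.array_equal(restored, images))
```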

In [1]:
from keras.datasets import cifar10
from keras.layers import Dense, Activation
from keras.models import Sequential
from keras.utils import to_categorical

# Load and pre-process: flatten each 32x32x3 image into a 3072-element vector
(X_train, Y_train), (X_test, Y_test) = cifar10.load_data()
X_train = X_train.reshape(50000, 3072)
X_test = X_test.reshape(10000, 3072)

# Convert the class labels to dummy (one-hot) vectors
classes = 10
Y_train = to_categorical(Y_train, classes)
Y_test = to_categorical(Y_test, classes)

input_size = 3072
batch_size = 100
epochs = 100

mdl_cifar10 = Sequential([
    Dense(1024, input_dim=input_size),
    Activation('relu'),
    Dense(512),
    Activation('relu'),
    Dense(512),
    Activation('sigmoid'),
    Dense(classes),
    Activation('softmax')
])
# This time we'll also track validation metrics, using the test set as validation data
mdl_cifar10.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='sgd')
mdl_cifar10.fit(X_train, Y_train, batch_size=batch_size, epochs=epochs, validation_data=(X_test, Y_test), verbose=1)

Using TensorFlow backend.
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
Train on 50000 samples, validate on 10000 samples

In [2]:
# Save the complete trained model
# mdl_cifar10.save('my_mdl_cifar10_1.h5')

In [1]:
from keras.models import load_model

mdl_cifar10 = load_model('./my_mdl_cifar10_1.h5')

In [4]:
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.gridspec as gridspec
import numpy as np
import random

PLT_TYPE = 'inline'  # restart Jupyter kernel before switching back to 'inline'

if PLT_TYPE.lower() == 'widget':
    %matplotlib widget
elif PLT_TYPE.lower() == 'inline':
    pass
elif PLT_TYPE.lower() == 'window':
    matplotlib.use('Qt5Agg')
    plt.ion()
else:
    raise ValueError("PLT_TYPE must be 'inline', 'widget' or 'window'")

fig = plt.figure()
outer_grid = gridspec.GridSpec(10, 10, wspace=0.0, hspace=0.0)

weights = mdl_cifar10.layers[0].get_weights()

w = weights[0].T

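# Plot the incoming weights of 100 randomly chosen first-layer neurons, averaged over the RGB channels
for i, neuron in enumerate(random.sample(range(w.shape[0]), 100)):
    ax = plt.Subplot(fig, outer_grid[i])
    ax.imshow(np.mean(np.reshape(w[neuron], (32, 32, 3)), axis=2), cmap=cm.Greys_r)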
    ax.set_xticks([])
    ax.set_yticks([])
    fig.add_subplot(ax)
    
plt.show()