<p style="text-align:center;">
     <img src="images/gdd-logo.png" alt="Drawing" style="width: 50%;"/>
</p>

> __Author__: Rodrigo Agundez
>
__Date__: 2017-01-17

# Feed forward deep neural networks

Requirements for this notebook:
 - matplotlib
 - tensorflow (> 1.4) or keras

## 1 Overview

### Goal

- Get familiar with the Keras sequential API
- Build a DNN and classify images with 10 digit classes
- Play around with the API

### Specific tasks

1. The code works for the easy MNIST dataset `datasets.mnist.load_data()`. Run it, make sense of it, and ask questions!
2. Change the code to work with the fairly complicated CIFAR10 dataset `datasets.cifar10.load_data()`
3. Explore with other DNN architectures: number of layers, layer types, activation functions, optimizers, etc.
4. It can be interesting to look at the Keras [functional API](https://keras.io/getting-started/functional-api-guide/) or the TensorFlow [dataset API](https://www.tensorflow.org/programmers_guide/datasets). Which advantages can they bring you?


### Questions

This Notebook contains some questions labeled __Homework__.
Ignore them for now, but do try to answer them later if you want a more in-depth understanding.

In [None]:
import pkg_resources
import random

from IPython.display import SVG
from itertools import groupby

import matplotlib.pyplot as plt

%matplotlib inline

In [None]:
plt.rcParams['figure.figsize'] = 15, 6

Don't pay much attention to the cell below. 
It checks the TensorFlow version and decides to load keras from tensorflow or keras.

In [None]:
try:
    pkg_resources.require('tensorflow>1.4.0')
except (pkg_resources.VersionConflict, pkg_resources.DistributionNotFound):  # too old, or not installed at all
    print("Importing keras")
    import keras
else:
    print("Importing keras from tensorflow")
    from tensorflow.python import keras

# some useful definitions
datasets = keras.datasets
Dense = keras.layers.Dense
models = keras.models
kutils = keras.utils

## 2 Data 

Like many other libraries, `keras` includes some standard datasets to play around with.
They follow a specific [API](https://www.tensorflow.org/programmers_guide/datasets) that makes them interact nicely with TensorFlow.

Load the MNIST dataset:

In [None]:
(x_train, y_train), (x_test, y_test) = datasets.mnist.load_data()

Once you've tried this Notebook and made sense of the code, load the CIFAR10 dataset:

In [None]:
# (x_train, y_train), (x_test, y_test) = datasets.cifar10.load_data()

Get a summary:

In [None]:
print(f"Train:\tX shape:{x_train.shape}\tY shape:{y_train.shape}\tType: {type(y_train)}[{y_train.dtype}]")
print(f"Test:\tX shape:{x_test.shape}\tY shape:{y_test.shape}\tType: {type(y_test)}[{y_test.dtype}]")

> #### Homework: Dataset summary
>
- How many training examples do we have?
- How many color channels does each picture have?
- What will the input size to the DNN be?
- What will the output size of the DNN be? 

Show some example images:

In [None]:
[ax.imshow(random.choice(x_train), cmap='gray') for ax in plt.subplots(1, 6)[1]];

### 2.1 Preprocessing

For our model we'll have to do some preprocessing.

We'll process:

- each input image into a 1D-array of pixel values
- each target label into a categorical vector

In [None]:
input_size = 784
output_size = 10


x_train_flat = x_train.reshape(len(x_train), input_size).astype('float') / 255
x_test_flat = x_test.reshape(len(x_test), input_size).astype('float') / 255

y_train_cat = kutils.to_categorical(y_train, output_size)
y_test_cat = kutils.to_categorical(y_test, output_size)
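To make the two preprocessing steps concrete, here is a minimal plain-Python sketch of what flattening and one-hot encoding do. The helper names `flat_size` and `one_hot` are made up for this illustration and are not part of Keras; the shapes are those of `x_train` for MNIST and CIFAR10.

```python
from math import prod


def flat_size(shape):
    """Values per example once every axis after the first (example) axis is flattened."""
    return prod(shape[1:])


def one_hot(label, num_classes):
    """Plain-Python counterpart of a categorical vector for a single label."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


mnist_shape = (60000, 28, 28)        # grayscale images, no channel axis
cifar10_shape = (50000, 32, 32, 3)   # colour images, 3 channels

print(flat_size(mnist_shape))    # 784
print(flat_size(cifar10_shape))  # 3072
print(one_hot(3, 10))            # a 1.0 at index 3, zeros elsewhere
```

Deriving `input_size` from the array shape like this, rather than hard-coding `784`, is one way to make the switch to CIFAR10 less error-prone.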

> #### Homework: Preprocessing
>
- What happens if you don't change the type to float?
- What happens if no normalization is done (divide by 255)?
- Which other types of normalization could you do?
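As a starting point for the last question, the sketch below compares the divide-by-255 scaling used in this notebook with standardization (zero mean, unit variance), written in plain Python on a handful of pixel values. The function names and the toy pixel list are assumptions made for this example only.

```python
from statistics import fmean, pstdev


def min_max_scale(pixels, max_value=255):
    """Scale pixel values into [0, 1], as done in this notebook."""
    return [p / max_value for p in pixels]


def standardize(pixels):
    """Shift to zero mean and scale to unit variance."""
    mean = fmean(pixels)
    std = pstdev(pixels) or 1.0  # avoid dividing by zero for constant input
    return [(p - mean) / std for p in pixels]


pixels = [0, 51, 102, 255]  # toy values, not taken from the dataset

scaled = min_max_scale(pixels)
standardized = standardize(pixels)
print(scaled)
print(standardized)
```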

Check the resulting dimensions and types:

In [None]:
print(f"Train:\tX shape:{x_train_flat.shape}\tY shape:{y_train_cat.shape}\tType: {type(y_train_cat)}[{y_train_cat.dtype}]")
print(f"Test:\tX shape:{x_test_flat.shape}\tY shape:{y_test_cat.shape}\tType: {type(y_test_cat)}[{y_test_cat.dtype}]")

> #### Homework: Model input & output
- Do the input and output sizes match what you expected?

## 3 Model

Our model takes pixel values of images and has to classify which digit is on the image.
We'll build a model consisting of multiple dense layers - this may not be the optimal way to do image classification but it's good enough to demonstrate the API of `keras`.

We will use the Keras [sequential API](https://keras.io/models/sequential/).
If you are up for a bit more challenge write the DNN using the [functional API](https://keras.io/getting-started/functional-api-guide/).

First we define the model type:

In [None]:
model = models.Sequential()

Then we directly add the hidden layers to the model using `model.add()`.
For each `Dense()` layer, you can specify its name, the number of units, its activation function, etc.

But where is the input layer?<br>Keras takes a simple approach and defines it together with the first hidden layer via the parameter `input_dim` or `input_shape`.

In [None]:
model.add(Dense(name='FullyConnected_0', units=260, input_dim=input_size, activation='relu'))
model.add(Dense(name='FullyConnected_1', units=120, activation='relu'))
model.add(Dense(name='FullyConnected_2', units=40, activation='relu'))

> #### Homework: Layers
>
- Which other activations could you use? Check out the [list of activations](https://keras.io/activations/).
- Which other types of layers are possible? Run `dir(keras.layers)` and behold.

And finally we add the output layer.
The `'softmax'` activation function translates the outputs of the previous layer into probabilities for each class.

In [None]:
model.add(Dense(name='FullyConnected_OutputLayer', units=output_size, activation='softmax'))
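To see what the softmax activation does to the raw outputs of the last layer, here is a small plain-Python sketch. It is an illustration of the formula, not the Keras implementation; the input scores are arbitrary.

```python
import math


def softmax(scores):
    """Turn arbitrary real-valued scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(sum(probs))                    # 1.0 up to floating point rounding
```

The largest score gets the largest probability, and all outputs stay strictly between 0 and 1.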

Models have to be compiled before we train our DNN.
We tell `keras` to optimize our neural network using [Adam](http://ruder.io/optimizing-gradient-descent/index.html#adam); the categorical cross-entropy as the loss will make sure that we optimize for good probabilities for all classes; the `metrics` argument tells which metric to report the score on.

The chosen metric(s) are used when evaluating on the test dataset. In addition, if we set `validation_split` at training time, they are also computed on a held-out validation set after each epoch. This is very helpful for assessing the health of our model (spotting overfitting, for example).

In [None]:
model.compile(
    optimizer='Adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
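The categorical cross-entropy for a single example can be sketched in plain Python as below. This is a simplified illustration of the formula; the `eps` clipping value is an assumption chosen for this sketch, and the probability vectors are toy values.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy for one example: -sum(t * log(p)) over all classes."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


y_true = [0.0, 0.0, 1.0]              # one-hot target: class 2
confident_right = [0.05, 0.05, 0.90]  # high probability on the true class
confident_wrong = [0.90, 0.05, 0.05]  # high probability on a wrong class

loss_right = categorical_crossentropy(y_true, confident_right)
loss_wrong = categorical_crossentropy(y_true, confident_wrong)
print(round(loss_right, 3))  # 0.105
print(round(loss_wrong, 3))  # 2.996
```

Only the probability assigned to the true class contributes, so a confident wrong prediction is penalized much more heavily than a confident correct one.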

> #### Homework
>
- Which other optimizers are available? Check the [list of optimizers](https://keras.io/optimizers/)
- How many losses are available? Check the [list of loss functions](https://keras.io/losses/)
- What other metrics? [List of metrics](https://keras.io/metrics/)

### 3.1 Model overview

So we have created our model, nice! Let's now view some key aspects about it.

In [None]:
model.summary() # very helpful overview

In [None]:
print(f"There are a total of {model.count_params()} parameters to calculate. Could you derive the number exactly? What are all of these parameters?")
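If you want to check your derivation, the arithmetic can be sketched as follows. Each dense layer has one weight per input-output pair plus one bias per unit; the helper name `dense_params` is made up for this example, and the layer sizes are the ones used in this notebook for MNIST.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of one fully connected layer."""
    return n_in * n_out + n_out


layer_sizes = [784, 260, 120, 40, 10]  # input, three hidden layers, output

per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)
print(per_layer)  # [204100, 31320, 4840, 410]
print(total)      # 240670
```

The per-layer numbers should match the `Param #` column of `model.summary()`.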

> #### Homework: Models
>
Check some famous models.
- Can you understand and compare the models using the different characteristics?
- How does the number of parameters in your model compare to the ones in the table?
- What impact does the number of parameters have on accuracy, training time, prediction time, etc.?
- What impact does the depth have on accuracy, training time, prediction time, etc.?

![](models_list.png)

If you really want the complete picture, check the configuration. Don't worry if you don't understand what all the parameters mean.

In [None]:
# model.get_config()

### 3.2 Model training

Once the model is compiled we can fit our model using the training data.

In [None]:
history = model.fit(
    x_train_flat, y_train_cat,
    batch_size=500,
    epochs=10,
    validation_split=0.1,
    verbose=1
)

> #### Homework: Training
>
- Why is it only training on `54000` examples?
- What is the difference between `acc` and `val_acc`?
- What are `epochs` and `batch_size`? How do they affect accuracy and training time? Play with the numbers
- Is the model overfitting? How could you discover possible overfitting problems with this output?
- How else can you assess the health of your model?
- Can you tell from the output which hyperparameters to change and how (e.g. learning rate)? 
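A rough way to reason about the numbers in the training log is sketched below, assuming the MNIST training set of 60000 images and the `validation_split=0.1` and `batch_size=500` values used in the `fit` call. The helper `split_sizes` is hypothetical and only approximates how Keras rounds the split.

```python
import math


def split_sizes(n_examples, validation_split):
    """Number of examples used for training and for validation."""
    n_val = round(n_examples * validation_split)
    return n_examples - n_val, n_val


n_train, n_val = split_sizes(60000, 0.1)
batches_per_epoch = math.ceil(n_train / 500)  # weight updates per epoch
total_updates = batches_per_epoch * 10        # over 10 epochs

print(n_train, n_val)     # 54000 6000
print(batches_per_epoch)  # 108
print(total_updates)      # 1080
```

A smaller `batch_size` means more, noisier weight updates per epoch, while more `epochs` means more passes over the same training examples.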

## 4 Evaluation

Let's see how our model does for our completely unseen test set using `model.evaluate()`:

In [None]:
score, acc = model.evaluate(x_test_flat, y_test_cat, verbose=0)

print('Loss on test set:', score)
print('Accuracy on test set:', acc)

> #### Homework: Validation
> 
- Is this a good accuracy? What other factors need to be assessed before answering this question?
- How would you approach hyperparameter optimization from here?
- Which are the hyperparameters?
- If you go for hyperparameter optimization, how would you now split the initial dataset?

### 4.1 Evolution of metrics during training

Let's see how the metrics evolved over time during training.
The returned object from `model.fit` contains the metric history during all epochs (we called it history). 

In [None]:
hist = sorted(history.history.items(), key=lambda x: (x[0].replace('val_', ''), x[0]))
for metric, values in groupby(hist, key=lambda x: x[0].replace('val_', '')):
    val0, val1 = list(values)
    plt.plot(history.epoch, val0[1], history.epoch, val1[1], '--', marker='o')
    plt.xlabel('epoch'), plt.ylabel(val0[0]), plt.legend(('Train set', 'Validation set'))
    plt.show()

Various information and model diagnosis can be derived from the above plots. Can you point out some key properties?

### 4.2 Analysis of misclassifications

- Where do the misclassifications occur?
- Is there a relation between classes and misclassifications?

Note: *You will normally see this type of aggregation and plotting done with pandas, I just wanted to have some fun #nopandas*

First let's collect the predictions on our test set

In [None]:
y_predicted = model.predict(x_test_flat, verbose=0).argmax(axis=-1)  # class with the highest probability; predict_classes() is gone in newer Keras releases
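A class prediction is simply the index of the largest softmax probability for each example, and accuracy is the fraction of examples where that index equals the true label. The plain-Python sketch below illustrates this on toy probabilities; the helper names and the numbers are made up for the example.

```python
def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def accuracy(y_true, probabilities):
    """Fraction of examples whose most probable class equals the true label."""
    predicted = [argmax(row) for row in probabilities]
    hits = sum(p == t for p, t in zip(predicted, y_true))
    return hits / len(y_true)


toy_probs = [
    [0.1, 0.7, 0.2],  # predicted class 1
    [0.8, 0.1, 0.1],  # predicted class 0
    [0.3, 0.3, 0.4],  # predicted class 2
]
toy_labels = [1, 0, 1]

toy_predicted = [argmax(row) for row in toy_probs]
toy_accuracy = accuracy(toy_labels, toy_probs)
print(toy_predicted)  # [1, 0, 2]
print(toy_accuracy)   # 2 of 3 correct
```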

Now let's see the error percentage per class

In [None]:
comparison = sorted(zip(y_test, y_predicted, y_predicted == y_test), key=lambda x: x[0])
for c, v in groupby(comparison, key=lambda x: x[0]):
    v = [x[2] for x in v]
    plt.barh(c,  (1 - sum(v) / len(v)) * 100)
plt.xlabel('Misclassification %'), plt.ylabel('True class')
plt.yticks(list({x[0] for x in comparison}));

Relation between true classes and predicted classes for misclassifications

In [None]:
for label in range(10):
    # misclassified examples whose true class is `label`, sorted by predicted class so groupby works
    miss = sorted((x for x in comparison if not x[2] and x[0] == label), key=lambda x: x[1])
    for pred_class, g_data in groupby(miss, key=lambda x: x[1]):
        plt.barh(pred_class, sum(1 for _ in g_data), color='skyblue')
    plt.title(f'True class: "{label}"')
    plt.xlabel('Number of predictions'), plt.ylabel('Predicted class')
    plt.yticks(list({x[0] for x in comparison})), plt.show()
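The same relation between true and predicted classes can also be tabulated without plotting. The sketch below counts (true, predicted) pairs for misclassified examples with `collections.Counter`, in the same #nopandas spirit; the function name and the toy labels are assumptions made for this example.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (true class, predicted class) mistake occurs."""
    return Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)


# toy labels, not results from the model in this notebook
y_true_toy = [4, 4, 9, 9, 9, 3]
y_pred_toy = [9, 4, 4, 4, 9, 5]

mistakes = confusion_counts(y_true_toy, y_pred_toy)
for (true_class, pred_class), count in mistakes.most_common():
    print(f"true {true_class} predicted as {pred_class}: {count}")
```

With the real data you could call it as `confusion_counts(y_test, y_predicted)` to list the most frequent confusions first.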

This type of information about our deep learning model is very important for business/real-life situations.

- Where is my model making mistakes?
- Is there a classification class I can trust more than others?
- Should I definitely not trust certain class predictions? Does it make sense to keep the class in the model then?
- This type of info can point out certain corner cases or special drawbacks that are not apparent at a first overview.

### 4.3 Note on learning curves

What is missing here are learning curves based on the number of training examples. They give powerful insights for diagnosing your model.

They also help you answer a very important and frequent question in a business scenario:
<br>Should I invest in collecting more data? From a data science perspective the answer is yes, of course, but:

- Very often obtaining new or more data is very expensive (€)
- It involves many parts moving together in an organization
- It is difficult to quantitatively assess the impact the new data will have

*Find a good summary on learning curves [here](http://www.ritchieng.com/machinelearning-learning-curve/).*

## 5 Output

There are several ways we can output a model:

- Graphical representation for a presentation/demo, for archival purposes, to easily explain it to others, etc.
- Save a model and reuse it later in another application. This is especially important when deploying DNN models in production.
- Transfer learning: save a model and use (parts of) it for different problems.
- Several other use cases are applicable where the shipping of the model is essential.

### 5.1 Graphical representation

*This functionality was fixed on Nov 17, 2017, but the fix is not yet included in the pip package. I copied the code into a separate script called utils in this directory. [Here is the related commit](https://github.com/tensorflow/tensorflow/commit/3e53570d3bf518ec2b6cfeed4b5fd57d11370289).*

In [None]:
from utils import model_to_dot
from utils import plot_model

In [None]:
# get graphical representation of the model
SVG(model_to_dot(model, show_shapes=True, rankdir='TB').create(prog='dot', format='svg'))

Save the graphical representation into a file

In [None]:
plot_model(model, show_shapes=True, to_file='code_breaskfast_exercise_1_model.png', rankdir='LR')

### 5.2 Write/read model to/from disk

There are different ways to save your model to disk. You can save the complete object, just the weights, the configuration, etc. Each option of course includes different aspects of the model.

[Take a look at the documented options](https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model)

In [None]:
model.save('code_breaskfast_exercise_1_model.h5')
del model

- What are the differences between each type of saving option?
- Which cases are applicable for each?
- How is the size of the file related to the saving option? Can you explain these differences?

In [None]:
model = keras.models.load_model('code_breaskfast_exercise_1_model.h5')
model.summary()

## 6 Conclusion

We've seen how to load datasets in `keras`, how to define a model consisting of multiple layers, and how to train and validate it.
Check the [examples folder](https://github.com/keras-team/keras/tree/master/examples) in the keras repo if you want to learn more!