In [72]:
%%HTML
<link rel="stylesheet" type="text/css" href="../css/custom.css">

# Keras Advanced

![footer_logo](../images/logo.png)

## Goal

The goal of this notebook is to dive deeper into the Keras API and touch on some of its more advanced topics.

## Program

1. Functional API
2. Large datasets with Keras
3. Callbacks



In [73]:
import os
import time

import shutil
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow

%matplotlib inline

In [74]:
plt.rcParams["figure.figsize"] = 15, 6

---

## 0. Data

We shall train a model using the functional API to classify fashion images.

<img src="../images/fashion-mnist.png" width="400"/>

Source: [Kaggle](https://www.kaggle.com/zalando-research/fashionmnist)

We'll write a subset of the Fashion MNIST images to a temporary folder, split into `train`, `valid` and `test` subfolders, and use it in the exercises later.

To do this we'll use some helper functions in the `load_fashion_mnist.py` file.

In [75]:
from load_fashion_mnist import save_fashion_mnist

In [76]:
temp_dir = save_fashion_mnist(10000, 1000, 1000)
print(f"Dataset is written to {temp_dir}")

Dataset is written to C:\Users\651494\AppData\Local\Temp\tmpzg31ss5a


In [78]:
!dir {temp_dir}

 Volume in drive C is Local Disk
 Volume Serial Number is 0CBC-FF85

 Directory of C:\Users\651494\AppData\Local\Temp\tmpzg31ss5a

16-12-2021  15:57    <DIR>          .
16-12-2021  15:57    <DIR>          ..
16-12-2021  15:57    <DIR>          test
16-12-2021  15:57    <DIR>          train
16-12-2021  15:57    <DIR>          valid
               0 File(s)              0 bytes
               5 Dir(s)  133.726.744.576 bytes free


## 1. Functional API

Building a model only by stacking layers with `.add()` limits how complex your neural networks can be.
Keras has other APIs to solve that; the [functional API](https://keras.io/getting-started/functional-api-guide/) really helps with:

> "defining complex models, such as multi-output models, directed acyclic graphs, or models with shared layers."

Let's check a minimum working example from the documentation.
It defines a network with two hidden layers and an output layer for 10 classes.
With the sequential API it would look something like:

In [79]:
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

model = Sequential()
model.add(Dense(64, activation="relu", input_shape=(784,)))
model.add(Dense(64, activation="relu"))
model.add(Dense(10, activation="softmax"))

model.summary()

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_54 (Dense)            (None, 64)                50240     
                                                                 
 dense_55 (Dense)            (None, 64)                4160      
                                                                 
 dense_56 (Dense)            (None, 10)                650       
                                                                 
Total params: 55,050
Trainable params: 55,050
Non-trainable params: 0
_________________________________________________________________


We start with the `Input` tensor, something we're allowed to ignore at first with the Sequential API, and use it when calling the first hidden layer object.
Layers in Keras are [callable](https://en.wikipedia.org/wiki/Callable_object#In_Python) objects, which means we can call them after instantiation.
When called, a layer returns a tensor that carries all operations (layers and their weights) applied so far.

We put the initial `inputs` and the final result `predictions` in a `Model` object that has similar functionality to a `Sequential` object.
In this case, we don't really care about the intermediate results, so we use the dummy variable name `x`.

In [80]:
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

inputs = Input(shape=(784,))
x = Dense(64, activation="relu")(inputs)
x = Dense(64, activation="relu")(x)
predictions = Dense(10, activation="softmax")(x)
model = Model(inputs=inputs, outputs=predictions)

model.summary()

Model: "model_24"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_25 (InputLayer)       [(None, 784)]             0         
                                                                 
 dense_57 (Dense)            (None, 64)                50240     
                                                                 
 dense_58 (Dense)            (None, 64)                4160      
                                                                 
 dense_59 (Dense)            (None, 10)                650       
                                                                 
Total params: 55,050
Trainable params: 55,050
Non-trainable params: 0
_________________________________________________________________


For this specific example, it's mainly more typing, but the functional API gives you a lot of flexibility for non-sequential models: you can easily define models with multiple inputs or outputs, or reuse the output of an earlier layer further down the network.
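To make the multi-input, multi-output case concrete, here is a minimal sketch. The two inputs (`image` and `meta`) and the two outputs (`label` and `price`) are made-up names for illustration; they are not part of the Fashion MNIST exercises.

```python
from tensorflow.keras.layers import Concatenate, Dense, Input
from tensorflow.keras.models import Model

# Two hypothetical inputs: a flattened image and a small vector of metadata.
image_input = Input(shape=(784,), name="image")
meta_input = Input(shape=(5,), name="meta")

# Each input gets its own branch before the branches are merged.
image_branch = Dense(64, activation="relu")(image_input)
meta_branch = Dense(8, activation="relu")(meta_input)
merged = Concatenate()([image_branch, meta_branch])

# Two outputs that share the merged representation.
class_output = Dense(10, activation="softmax", name="label")(merged)
price_output = Dense(1, name="price")(merged)

multi_model = Model(
    inputs=[image_input, meta_input],
    outputs=[class_output, price_output],
)
multi_model.summary()
```

A model like this cannot be expressed with `Sequential`, because the graph has two entry points and two exit points.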

A good example is a _Residual Network_ (ResNet) block. A residual network is made of blocks where the output is kept aside for a little while, while other operations (e.g. convolutions) are applied to it. Then these two tensors (the original, which has not been passed through more layers, and the version that _has_ been passed through more layers) are combined through a summation. An advantage of the ResNet architecture is that these skip connections help mitigate the vanishing gradient problem.

![](https://developer.ridgerun.com/wiki/images/0/01/Residual_block.png)

This is easily implemented in the functional API: 
```python
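from tensorflow.keras.layers import Add, Dense, Input
from tensorflow.keras.models import Model

inputs = Input(shape=(784,))
x = Dense(64, activation="relu")(inputs)
x_original = x  # keep a reference to the block input; no explicit copy is needed
x = Dense(64, activation="relu")(x)
x = Add()([x, x_original])  # skip connection: sum the transformed and untouched tensors
predictions = Dense(10, activation="softmax")(x)
model = Model(inputs=inputs, outputs=predictions)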
```

### <mark> Exercise: functional API
> 
> Rewrite the model below with the functional API and put it into a function `make_fashion_mnist_model()`.
>


In [81]:
from tensorflow.keras.layers import Conv2D, Dense, Dropout, Flatten, Input, MaxPool2D
from tensorflow.keras.models import Model


def make_fashion_mnist_model():
    model = Sequential()

    model.add(Conv2D(64, kernel_size=2, padding="same", activation="relu", input_shape=(28, 28, 1)))
    model.add(MaxPool2D(pool_size=2))

    model.add(Dropout(0.6))
    model.add(Conv2D(filters=32, kernel_size=2, padding="same", activation="relu"))
    model.add(MaxPool2D(pool_size=2))

    model.add(Dropout(0.6))
    model.add(Flatten())
    model.add(Dense(256, activation="relu"))
    # why no batch normalization at this point?

    model.add(Dropout(0.5))
    model.add(Dense(10, activation="softmax"))

    return model

model = make_fashion_mnist_model()
model.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_48 (Conv2D)          (None, 28, 28, 64)        320       
                                                                 
 max_pooling2d_48 (MaxPoolin  (None, 14, 14, 64)       0         
 g2D)                                                            
                                                                 
 dropout_72 (Dropout)        (None, 14, 14, 64)        0         
                                                                 
 conv2d_49 (Conv2D)          (None, 14, 14, 32)        8224      
                                                                 
 max_pooling2d_49 (MaxPoolin  (None, 7, 7, 32)         0         
 g2D)                                                            
                                                                 
 dropout_73 (Dropout)        (None, 7, 7, 32)         

In [82]:
def make_fashion_mnist_model_functional():
    # TODO: build the model with the functional API and return it.
    ...


model = make_fashion_mnist_model_functional()
model.summary()

In [83]:
# %load ../answers/functional.py
from tensorflow.keras.layers import Conv2D, Dense, Dropout, Flatten, Input, MaxPool2D
from tensorflow.keras.models import Model


def make_fashion_mnist_model():

    inputs = Input(shape=(28, 28, 1))

    x = Conv2D(64, kernel_size=2, padding="same", activation="relu")(inputs)
    x = MaxPool2D(pool_size=2)(x)

    x = Dropout(0.3)(x)
    x = Conv2D(filters=32, kernel_size=2, padding="same", activation="relu")(x)
    x = MaxPool2D(pool_size=2)(x)

    x = Dropout(0.3)(x)
    x = Flatten()(x)
    x = Dense(256, activation="relu")(x)

    x = Dropout(0.5)(x)
    predictions = Dense(10, activation="softmax")(x)

    model = Model(inputs=inputs, outputs=predictions)

    return model


model = make_fashion_mnist_model()
model.summary()


Model: "model_25"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_26 (InputLayer)       [(None, 28, 28, 1)]       0         
                                                                 
 conv2d_50 (Conv2D)          (None, 28, 28, 64)        320       
                                                                 
 max_pooling2d_50 (MaxPoolin  (None, 14, 14, 64)       0         
 g2D)                                                            
                                                                 
 dropout_75 (Dropout)        (None, 14, 14, 64)        0         
                                                                 
 conv2d_51 (Conv2D)          (None, 14, 14, 32)        8224      
                                                                 
 max_pooling2d_51 (MaxPoolin  (None, 7, 7, 32)         0         
 g2D)                                                     

---
## 2. Large datasets with Keras

Using NumPy arrays as input can limit you once your datasets no longer fit in memory or when you're using multiple devices.
Keras has some built-in tools to help you, training also works nicely with Python generators, and there's the integration with `tf.data.Dataset` to leverage TensorFlow's functionality.
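As a rough sketch of the `tf.data.Dataset` route, the snippet below wraps arrays in a dataset that shuffles, batches and prefetches. The random arrays are stand-ins with Fashion-MNIST-like shapes, chosen only for illustration; for data that really doesn't fit in memory you would build the dataset from files instead.

```python
import numpy as np
import tensorflow as tf

# Random stand-in data (an assumption for illustration, not the real dataset).
features = np.random.rand(100, 28, 28, 1).astype("float32")
labels = np.random.randint(0, 10, size=100)

dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .shuffle(buffer_size=100, seed=42)  # shuffle individual samples
    .batch(32)                          # group samples into batches
    .prefetch(tf.data.AUTOTUNE)         # prepare the next batch during training
)

# Peek at the first batch.
for batch_features, batch_labels in dataset.take(1):
    print(batch_features.shape, batch_labels.shape)
```

A dataset like this can be passed directly to `model.fit()` in place of arrays.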

If you're not familiar with Python iterators and generators, make sure to do a bit of [reading](https://wiki.python.org/moin/Generators) before continuing. 

### Keras generators

The idea behind Keras generators is to not load all data in memory at once, but to generate batches of data and feed those to the model.
For instance, instead of loading all samples, we only load 100 training points and feed those to the GPU on the fly.
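The idea can be sketched with a plain Python generator. `batch_generator` is a hypothetical helper written for this illustration; here the arrays still live in memory, but the same pattern works when each batch is read from disk inside the loop.

```python
import numpy as np


def batch_generator(features, labels, batch_size):
    """Yield (features, labels) batches forever, reshuffling on every pass."""
    n_samples = len(features)
    while True:
        order = np.random.permutation(n_samples)
        for start in range(0, n_samples, batch_size):
            batch_idx = order[start : start + batch_size]
            yield features[batch_idx], labels[batch_idx]


# Toy data: 250 samples with 4 features each.
features = np.arange(250 * 4, dtype="float32").reshape(250, 4)
labels = np.arange(250)

gen = batch_generator(features, labels, batch_size=100)
x_batch, y_batch = next(gen)
print(x_batch.shape, y_batch.shape)  # (100, 4) (100,)
```

Because the generator never ends, Keras needs `steps_per_epoch` to know when an epoch is over.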

A good example of a Keras generator is the [`ImageDataGenerator`](https://keras.io/preprocessing/image/#imagedatagenerator-class).
This class takes data and performs various forms of image augmentation on the fly, like whitening, shearing and zooming.
As the name `Generator` implies, it doesn't compute these augmentations all at once, but does this in batches. 

In [84]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# help(ImageDataGenerator)

The class has three methods to generate batches of augmented data:

- `.flow()`: Takes data & label arrays
- `.flow_from_dataframe()`: Takes the DataFrame and the path to a directory with the mapped images in the DataFrame
- `.flow_from_directory()`: Takes the path to a directory

We'll focus on `.flow_from_directory()`:

In [85]:
help(ImageDataGenerator.flow_from_directory)

Help on function flow_from_directory in module keras.preprocessing.image:

flow_from_directory(self, directory, target_size=(256, 256), color_mode='rgb', classes=None, class_mode='categorical', batch_size=32, shuffle=True, seed=None, save_to_dir=None, save_prefix='', save_format='png', follow_links=False, subset=None, interpolation='nearest')
    Takes the path to a directory & generates batches of augmented data.
    
    Args:
        directory: string, path to the target directory. It should contain one
          subdirectory per class. Any PNG, JPG, BMP, PPM or TIF images inside
          each of the subdirectories directory tree will be included in the
          generator. See [this script](
            https://gist.github.com/fchollet/0830affa1f7f19fd47b06d4cf89ed44d)
              for more details.
        target_size: Tuple of integers `(height, width)`, defaults to `(256,
          256)`. The dimensions to which all images found will be resized.
        color_mode: One of "gra

The method expects a certain directory structure to work; read the documentation on `directory` and `classes` in the cell above.

<img src="../images/keras_advanced/keras_flow_from_directory.jpeg" alt="flow_from" style="width: 500px;"/>

[Source](https://medium.com/@vijayabhaskar96/tutorial-image-classification-with-keras-flow-from-directory-and-generators-95f75ebe5720)

We shall create three ImageDataGenerators, one each for the train, valid and test sets.

In [86]:
train_data_generator = ImageDataGenerator(
    rescale=1.0 / 255, shear_range=0.2, zoom_range=0.2, horizontal_flip=True
)
test_data_generator = ImageDataGenerator(rescale=1.0 / 255)
valid_data_generator = ImageDataGenerator(rescale=1.0 / 255)

### <mark> Exercise:
    
Use `.flow_from_directory()` to create three iterators that allow data to flow from the appropriate generator. 
    
When creating the iterators:
> - Build a path for argument `directory` from `temp_dir`.    
> - Infer the `class_mode`, `target_size` and `color_mode` from the model.
> - Set the `batch_size` to 32 and choose values for `shuffle` and `seed`.
> - Using the training code provided below, fit the Fashion MNIST model with these iterators; your loss should drop below 1.2.

In [87]:
train_iterator = train_data_generator.flow_from_directory(
    directory=os.path.join(temp_dir, 'train'),
    target_size=(28, 28), 
    color_mode='grayscale',
    batch_size=8,
    class_mode="categorical",
    shuffle=True,
    seed=42
)

Found 10000 images belonging to 10 classes.


In [88]:
valid_iterator = ...

In [89]:
test_iterator = ...

In [90]:
# %load ../answers/image_data_generator.py
train_iterator = train_data_generator.flow_from_directory(
    directory=os.path.join(temp_dir, "train"),
    target_size=(28, 28),
    color_mode="grayscale",
    batch_size=8,
    class_mode="categorical",
    shuffle=True,
    seed=42,
)
valid_iterator = valid_data_generator.flow_from_directory(
    directory=os.path.join(temp_dir, "valid"),
    target_size=(28, 28),
    color_mode="grayscale",
    batch_size=8,
    class_mode="categorical",
    shuffle=True,
    seed=42,
)
test_iterator = test_data_generator.flow_from_directory(
    directory=os.path.join(temp_dir, "test"),
    target_size=(28, 28),
    color_mode="grayscale",
    batch_size=1,
    class_mode="categorical",
    shuffle=False,
    seed=42,
)


Found 10000 images belonging to 10 classes.
Found 1000 images belonging to 10 classes.
Found 1000 images belonging to 10 classes.


In [91]:
fashion_model = make_fashion_mnist_model()
fashion_model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=['accuracy'])

step_size_train = train_iterator.n // train_iterator.batch_size
step_size_valid = valid_iterator.n // valid_iterator.batch_size

fashion_model.fit(train_iterator,
    steps_per_epoch=step_size_train,
    validation_data=valid_iterator,
    validation_steps=step_size_valid,
    epochs=1
)



<keras.callbacks.History at 0x1278c660d00>

In [92]:
print(fashion_model.evaluate(test_iterator, steps=30))

[0.40847399830818176, 0.8999999761581421]


Another example of a Keras generator is the [TimeseriesGenerator](https://keras.io/preprocessing/sequence/#timeseriesgenerator), which generates batches of temporal data from a sequence of data points.
This can also be seen as generating a dataset that's possibly too big for memory: all possible batches from a sequence can easily be bigger than your RAM.
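To illustrate why generating windows lazily saves memory, here is a small NumPy sketch of the same idea. It is not the Keras `TimeseriesGenerator` itself; `sliding_windows` is a hypothetical helper that yields one batch of windows at a time instead of materialising all of them.

```python
import numpy as np


def sliding_windows(series, window_length, batch_size):
    """Lazily yield (windows, targets); each target is the value right after its window."""
    n_windows = len(series) - window_length
    for start in range(0, n_windows, batch_size):
        stop = min(start + batch_size, n_windows)
        windows = np.stack(
            [series[i : i + window_length] for i in range(start, stop)]
        )
        targets = series[start + window_length : stop + window_length]
        yield windows, targets


series = np.arange(10)
for windows, targets in sliding_windows(series, window_length=3, batch_size=4):
    print(windows.shape, targets)
```

With a series of 10 points and windows of length 3 there are 7 windows, so the loop yields one batch of 4 and one batch of 3.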

### <mark>Question

Why are we setting `shear_range`, `zoom_range` and `horizontal_flip` on the `train_data_generator` and not on the `test_data_generator` and `valid_data_generator`?

### Conclusion

We've seen how we can leverage datasets that are too big to fit in memory.
Keras has its own generators, but it's also pretty easy to build your own.
Many file formats & interfaces also allow you to access files without loading them fully into memory, like the option `mmap_mode` for [`numpy.load`](https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.load.html).
If you would like to stay closer to TensorFlow, check out the guide on [Datasets](https://www.tensorflow.org/guide/datasets).
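As a small sketch of the `mmap_mode` option, the snippet below writes an array to a temporary `.npy` file and then reads a single batch from it without loading the whole file. The file name and array shape are arbitrary choices for this illustration.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "features.npy")
    np.save(path, np.arange(1000 * 8, dtype="float32").reshape(1000, 8))

    # mmap_mode="r" maps the file read-only; data is read from disk on access.
    mapped = np.load(path, mmap_mode="r")
    mapped_type = type(mapped).__name__

    # Slicing touches only the requested rows; np.array copies them into memory.
    batch = np.array(mapped[:32])
    print(mapped_type, batch.shape)

    # Release the mapping before the temporary directory is removed.
    del mapped
```

A generator that slices a memory-mapped array this way only ever holds one batch in RAM.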

---
## 3. Callbacks

Callbacks allow you to perform tasks during certain moments of training.
For instance, you can compute performance measures like training time, or look at the states of the model to detect when it breaks down.

You can pass multiple callbacks in a `list` to the `.fit()` method of your model and they'll be called at the right times during training.
There are six moments when a callback can be executed: at the start and end of training, of each epoch and of each batch.


### Built-in callbacks

Let's look at two commonly used callbacks in `tensorflow.keras.callbacks`: `EarlyStopping` and `ModelCheckpoint`.
`EarlyStopping` stops training when your model's performance stops improving, which can save you a lot of waiting time.
`ModelCheckpoint` saves the model after every epoch so your progress isn't lost if the training process gets killed.

> #### Exercise: Built-in callbacks
>
> - Use the fitting procedure from the previous exercise and add the `EarlyStopping` and `ModelCheckpoint` callbacks.
> - For `ModelCheckpoint`, save only the best model and write it to the path stored in the variable `model_path` given below.

In [93]:
# This cell creates an output folder where your model parameters will be saved. 
import shutil 

model_dir = os.path.join("..", "output", "fashion_mnist")
model_path = os.path.join(model_dir, "model.h5")

if os.path.exists(model_dir):
    shutil.rmtree(model_dir)
os.makedirs(model_dir)

In [94]:
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

# Load & compile your model. 
... 

# Define your callbacks. 
...

# Fit your model, with the callbacks. 
...

# Evaluate your model. 
...

Ellipsis

In [95]:
# %load ../answers/callbacks.py
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [EarlyStopping(), ModelCheckpoint(model_path, save_best_only=True)]


fashion_model = make_fashion_mnist_model()
fashion_model.compile(optimizer="adam", loss="categorical_crossentropy")
fashion_model.fit(
    train_iterator,
    validation_data=valid_iterator,
    steps_per_epoch=10,
    epochs=4,
    callbacks=callbacks,
)


Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<keras.callbacks.History at 0x12791e85b50>
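With its default arguments, `EarlyStopping()` monitors `val_loss` with `patience=0`, so training stops at the first epoch that fails to improve. A sketch with the settings spelled out might look like the following; the patience of 3 and the `"model.h5"` path are illustrative assumptions, not values from the exercise.

```python
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

early_stopping = EarlyStopping(
    monitor="val_loss",         # quantity to watch
    patience=3,                 # tolerate 3 epochs without improvement
    restore_best_weights=True,  # roll back to the best epoch when stopping
)
checkpoint = ModelCheckpoint(
    "model.h5",                 # hypothetical path; the notebook uses `model_path`
    monitor="val_loss",
    save_best_only=True,        # overwrite only when the monitored value improves
)
explicit_callbacks = [early_stopping, checkpoint]
```

Both callbacks monitor a validation metric, so `validation_data` must be passed to `.fit()` for them to have something to watch.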

### TensorBoard

TensorBoard helps you visualize what's happening during training.
For instance, it can visualize losses during training, the weights of your layers, embeddings and the computational graph.
TensorBoard makes it easier to understand, debug, and optimize your model.

<img src="../images/keras_advanced/tensorboard.png" alt="flow_from" style="width: 600px;"/>

For Keras it's just another built-in callback.
Using the callback writes files to a directory that can be visualized by a separate process.

In [96]:
help(tensorflow.keras.callbacks.TensorBoard)

Help on class TensorBoard in module keras.callbacks:

class TensorBoard(Callback, keras.utils.version_utils.TensorBoardVersionSelector)
 |  TensorBoard(*args, **kwargs)
 |  
 |  Enable visualizations for TensorBoard.
 |  
 |  TensorBoard is a visualization tool provided with TensorFlow.
 |  
 |  This callback logs events for TensorBoard, including:
 |  
 |  * Metrics summary plots
 |  * Training graph visualization
 |  * Activation histograms
 |  * Sampled profiling
 |  
 |  When used in `Model.evaluate`, in addition to epoch summaries, there will be
 |  a summary that records evaluation metrics vs `Model.optimizer.iterations`
 |  written. The metric names will be prepended with `evaluation`, with
 |  `Model.optimizer.iterations` being the step in the visualized TensorBoard.
 |  
 |  If you have installed TensorFlow with pip, you should be able
 |  to launch TensorBoard from the command line:
 |  
 |  ```
 |  tensorboard --logdir=path_to_your_logs
 |  ```
 |  
 |  You can find more inf

### <mark> Exercise: TensorBoard

Add the TensorBoard callback to the training of the Fashion MNIST model and set:

> - `log_dir` to the variable `run_dir` defined below
> - Write the graph and gradients.
> - Use the data set `(x_train, y_train), (x_test, y_test)` as defined below and train for 10 runs.

Start training, open a terminal, make sure you're in the root folder of this project and run:
> 
> ```
> $ tensorboard --logdir=output/fashion_mnist
> ```
>
Start multiple runs, but make sure to re-execute the cell that defines `run_dir` before each run so that every run gets its own log folder.
> - What happens with the losses of multiple runs?
> - With the graphs?

In [97]:
(
    (x_train, y_train),
    (x_test, y_test),
) = tensorflow.keras.datasets.fashion_mnist.load_data()

In [98]:
x_train = x_train[:10000, :, :, np.newaxis]
x_test = x_test[:1000, :, :, np.newaxis]
y_train = tensorflow.keras.utils.to_categorical(y_train[:10000])
y_test = tensorflow.keras.utils.to_categorical(y_test[:1000])

In [99]:
run_dir = os.path.join(model_dir, f"run_{time.time()}")
run_dir

'..\\output\\fashion_mnist\\run_1639666766.245051'
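Because `run_dir` embeds a timestamp, each execution of that cell produces a fresh subfolder, and TensorBoard lists every subfolder as a separate run. If you prefer folder names you can read at a glance, a variant could look like this; `make_run_dir` is a hypothetical helper, not part of the notebook or of Keras.

```python
import os
import time


def make_run_dir(base_dir):
    """Return a run folder path with a human-readable timestamp."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    return os.path.join(base_dir, f"run_{stamp}")


example_run_dir = make_run_dir(os.path.join("..", "output", "fashion_mnist"))
print(example_run_dir)
```

Note that two runs started within the same second would share a folder with this naming scheme.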

In [100]:
from tensorflow.keras.callbacks import TensorBoard

In [101]:
# %load ../answers/tensorboard.py
from tensorflow.keras.callbacks import TensorBoard

callbacks = [TensorBoard(run_dir, write_graph=True)]


fashion_model = make_fashion_mnist_model()
fashion_model.compile(optimizer="adam", loss="categorical_crossentropy")
fashion_model.fit(
    x_train, y_train, epochs=2, validation_data=(x_test, y_test), callbacks=callbacks
)


Epoch 1/2
Epoch 2/2


<keras.callbacks.History at 0x127a1c2ba00>

### Custom callbacks


If the available callbacks don't fit your use case, it's easy to define your own.
`LambdaCallback` can be used for simple functionality, but you can also subclass the `Callback` class.

As mentioned earlier, there are six moments when a callback can be executed: at starts and/or stops of training, epochs and/or batches.
These correspond with the arguments or methods:

- `on_epoch_begin`
- `on_epoch_end`
- `on_batch_begin`
- `on_batch_end`
- `on_train_begin`
- `on_train_end`
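To see the order in which these hooks fire, the sketch below records every call during a tiny training run. The model and the random data are throwaway assumptions for the illustration. It overrides `on_train_batch_begin` and `on_train_batch_end`, the training-specific names for which `on_batch_begin` and `on_batch_end` are backwards-compatible aliases.

```python
import numpy as np
from tensorflow.keras.callbacks import Callback
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model


class HookRecorder(Callback):
    """Append the name of each hook to `events` as it fires."""

    def __init__(self):
        super().__init__()
        self.events = []

    def on_train_begin(self, logs=None):
        self.events.append("on_train_begin")

    def on_train_end(self, logs=None):
        self.events.append("on_train_end")

    def on_epoch_begin(self, epoch, logs=None):
        self.events.append("on_epoch_begin")

    def on_epoch_end(self, epoch, logs=None):
        self.events.append("on_epoch_end")

    def on_train_batch_begin(self, batch, logs=None):
        self.events.append("on_train_batch_begin")

    def on_train_batch_end(self, batch, logs=None):
        self.events.append("on_train_batch_end")


# Throwaway model and data: 8 samples in batches of 4 gives 2 batches per epoch.
inputs = Input(shape=(4,))
outputs = Dense(1)(inputs)
tiny_model = Model(inputs=inputs, outputs=outputs)
tiny_model.compile(optimizer="sgd", loss="mse")

recorder = HookRecorder()
tiny_model.fit(
    np.random.rand(8, 4).astype("float32"),
    np.random.rand(8, 1).astype("float32"),
    batch_size=4,
    epochs=1,
    verbose=0,
    callbacks=[recorder],
)
print(recorder.events)
```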


If we'd want to emojify our training logs a bit, we could abuse the `LambdaCallback`:

In [38]:
from tensorflow.keras.callbacks import LambdaCallback


def on_train_begin(_):
    print("🔥" * 30)


def on_train_end(_):
    print("🤖" * 30)


emoji_callback = LambdaCallback(
    on_train_begin=on_train_begin, on_train_end=on_train_end
)

fashion_model = make_fashion_mnist_model()
fashion_model.compile(optimizer="adam", loss="categorical_crossentropy")
fashion_model.fit(x_train, y_train, epochs=2, callbacks=[emoji_callback])

🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥
Epoch 1/2
Epoch 2/2
🤖🤖🤖🤖🤖🤖🤖🤖🤖🤖🤖🤖🤖🤖🤖🤖🤖🤖🤖🤖🤖🤖🤖🤖🤖🤖🤖🤖🤖🤖


<keras.callbacks.History at 0x1278fa2b4c0>

More complex logic should be captured in a subclass of the `Callback` class.
It has six methods to override, and by default we get access to a few attributes: the model, training parameters and validation data.
You can inspect the contents of the `tensorflow.keras.callbacks.Callback` class by running the cell below.

In [None]:
??tensorflow.keras.callbacks.Callback
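As a small example of such a subclass, here is a sketch of a callback that measures how long each epoch takes and reads an attribute of the attached model. `EpochTimer`, the tiny model and the random data are all assumptions made up for this illustration.

```python
import time

import numpy as np
from tensorflow.keras.callbacks import Callback
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model


class EpochTimer(Callback):
    """Store the wall-clock duration of every epoch in `epoch_times`."""

    def __init__(self):
        super().__init__()
        self.epoch_times = []
        self.n_params = None

    def on_train_begin(self, logs=None):
        # `self.model` is set by Keras before training starts.
        self.n_params = self.model.count_params()

    def on_epoch_begin(self, epoch, logs=None):
        self._epoch_start = time.perf_counter()

    def on_epoch_end(self, epoch, logs=None):
        self.epoch_times.append(time.perf_counter() - self._epoch_start)


# Throwaway model and data, just to exercise the callback.
inputs = Input(shape=(4,))
outputs = Dense(1)(inputs)
timed_model = Model(inputs=inputs, outputs=outputs)
timed_model.compile(optimizer="sgd", loss="mse")

timer = EpochTimer()
timed_model.fit(
    np.random.rand(16, 4).astype("float32"),
    np.random.rand(16, 1).astype("float32"),
    epochs=3,
    verbose=0,
    callbacks=[timer],
)
print(timer.n_params, [f"{t:.3f}s" for t in timer.epoch_times])
```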

### <mark>Exercise: Custom callback
> - Write a callback that prints the standard deviation of the weights in the last layer at the end of each epoch.
> - Train a new model again and observe how the loss changes.

In [40]:
# %load ../answers/custom_callback.py
from tensorflow.keras.callbacks import Callback


class WeightStdCallback(Callback):
    def on_epoch_end(self, epoch, logs=None):
        # `model.weights[-1]` is the last variable in the model:
        # the bias vector of the final Dense layer.
        print(
            f"\nWeight std of last layer: {self.model.weights[-1].numpy().std():1.2E}"
        )


fashion_model = make_fashion_mnist_model()
fashion_model.compile(optimizer="adam", loss="categorical_crossentropy")
fashion_model.fit(
    x_train[:1000, :, :, :],
    y_train[:1000],
    epochs=10,
    validation_data=(x_test, y_test),
    callbacks=[WeightStdCallback()],
)


Epoch 1/10
Weight std of last layer: 3.61E-03
Epoch 2/10
Weight std of last layer: 5.62E-03
Epoch 3/10
Weight std of last layer: 7.49E-03
Epoch 4/10
Weight std of last layer: 1.14E-02
Epoch 5/10
Weight std of last layer: 1.53E-02
Epoch 6/10
Weight std of last layer: 1.92E-02
Epoch 7/10
Weight std of last layer: 2.27E-02
Epoch 8/10
Weight std of last layer: 2.58E-02
Epoch 9/10
Weight std of last layer: 2.90E-02
Epoch 10/10
Weight std of last layer: 3.12E-02


<keras.callbacks.History at 0x1279132f160>

---
## Summary

This section showed how callbacks can be used to save & monitor your models.
If you need a visualization tool during training, TensorBoard has easy integration with Keras.
For custom functionality you can write your own callbacks.

In [70]:
shutil.rmtree("../output")