Themes of the chapter:

* The Keras functional API
* Using Keras callbacks
* Working with the TensorBoard visualization tool
* Important best practices for developing state-of-the-art models

## Keras functional API

The functional API supports models with multiple inputs (including multimodal inputs), multiple outputs, or branched internal structures, i.e. directed acyclic graphs of layers.

### Intro to the functional API

In [5]:
''' Intro to the functional API '''

# Main principle - use layers as functions

from keras import Input, layers

input_tensor = Input(shape=(32,))
dense = layers.Dense(32, activation='relu')
output_tensor = dense(input_tensor)

In [1]:
from keras.models import Sequential, Model
from keras import layers
from keras import Input

seq_model = Sequential()
seq_model.add(layers.Dense(32, activation='relu', input_shape=(64,)))
seq_model.add(layers.Dense(32, activation='relu'))
seq_model.add(layers.Dense(10, activation='softmax'))

input_tensor = Input(shape=(64,))
x = layers.Dense(32, activation='relu')(input_tensor)
x = layers.Dense(32, activation='relu')(x)
output_tensor = layers.Dense(10, activation='softmax')(x)  # call the layer on x to get a tensor

Using Theano backend.


In [2]:
''' If the input tensor and output tensor are unrelated, Keras raises an error '''

unrelated_input = Input(shape=(32,1))
bad_model = Model(unrelated_input, output_tensor)

Keras rejects this model with a "Graph disconnected" error, because `output_tensor` cannot be computed from `unrelated_input`.

### Multi-input models

In [4]:
''' Implementation of a two-input question-answer model '''

from keras.models import Model
from keras import Input, layers

text_vocabulary_size = 10000
question_vocabulary_size = 10000
answer_vocabulary_size = 500

# Branch 1
text_input = Input(shape=(None,), dtype='int32', name='text')
embedded_text = layers.Embedding(text_vocabulary_size, 64)(text_input) # Embedding(input_dim, output_dim): vocabulary size comes first (erratum fixed)
encoded_text = layers.LSTM(32)(embedded_text)

# Branch 2
question_input = Input(shape=(None,),
                       dtype='int32',
                       name='question')
embedded_question = layers.Embedding(question_vocabulary_size, 32)(question_input) # same argument order here (erratum fixed)
encoded_question = layers.LSTM(16)(embedded_question)

# Concatenation & following
concatenated = layers.concatenate([encoded_text, encoded_question],
                                  axis=-1)
answer = layers.Dense(answer_vocabulary_size,
                      activation='softmax')(concatenated)

model = Model([text_input, question_input], answer)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['acc'])

In [5]:
''' Feeding data to a multi-input model '''

import numpy as np
from keras.utils import to_categorical

model.summary()

num_samples = 1000
max_length = 100

text = np.random.randint(1, text_vocabulary_size,
                         size=(num_samples, max_length))
question = np.random.randint(1, question_vocabulary_size,
                             size=(num_samples, max_length))
# One-hot encoded answers; randint(0, 1) would produce only zeros
answers = np.random.randint(0, answer_vocabulary_size, size=num_samples)
answers = to_categorical(answers, answer_vocabulary_size)

# Fit using list of inputs
model.fit([text, question], answers, epochs=10, batch_size=128)

# OR

# Fit using a dictionary of inputs
#model.fit({'text': text, 'question': question}, answers,
#          epochs=10, batch_size=128)

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
text (InputLayer)               (None, None)         0                                            
__________________________________________________________________________________________________
question (InputLayer)           (None, None)         0                                            
__________________________________________________________________________________________________
embedding_5 (Embedding)         (None, None, 64)     640000      text[0][0]                       
__________________________________________________________________________________________________
embedding_6 (Embedding)         (None, None, 32)     320000      question[0][0]                   
__________________________________________________________________________________________________
lstm_5 (LS

<keras.callbacks.History at 0x1c21a147f0>

### Multi-output model

In [1]:
''' Implementation of a three-output model '''

from keras import layers
from keras import Input
from keras.models import Model

vocabulary_size = 50000
num_income_groups = 10

posts_input = Input(shape=(None,), dtype='int32', name='posts')
embedded_posts = layers.Embedding(vocabulary_size, 256)(posts_input)
x = layers.Conv1D(128, 5, activation='relu')(embedded_posts)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.GlobalMaxPooling1D()(x)
x = layers.Dense(128, activation='relu')(x)

age_prediction = layers.Dense(1, name='age')(x)
income_prediction = layers.Dense(num_income_groups,
                                 activation='softmax',
                                 name='income')(x)
gender_prediction = layers.Dense(1, activation='sigmoid', name='gender')(x)

model = Model(posts_input, 
              [age_prediction, income_prediction, gender_prediction])

model.summary()

Using Theano backend.


__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
posts (InputLayer)              (None, None)         0                                            
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, None, 256)    12800000    posts[0][0]                      
__________________________________________________________________________________________________
conv1d_1 (Conv1D)               (None, None, 128)    163968      embedding_1[0][0]                
__________________________________________________________________________________________________
max_pooling1d_1 (MaxPooling1D)  (None, None, 128)    0           conv1d_1[0][0]                   
__________________________________________________________________________________________________
conv1d_2 (

In [2]:
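''' Compilation options for a multi-output model: one loss per output '''

# Losses given as a list, in the same order as the model outputs
model.compile(optimizer='rmsprop',
              loss=['mse', 'categorical_crossentropy', 'binary_crossentropy'])

# Equivalent: losses given as a dict keyed by output layer name
model.compile(optimizer='rmsprop',
              loss={'age': 'mse', 
                    'income': 'categorical_crossentropy',
                    'gender': 'binary_crossentropy'})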

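Very imbalanced loss contributions will cause the model representations to be optimized preferentially for the task with the largest individual loss, at the expense of the other tasks. For example, the mean squared error used for age regression can take much larger values than the cross-entropy used for gender classification. To balance the contributions, you can assign each loss a weight in the final loss with `loss_weights`.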

In [3]:
''' Multi-output model: loss weighting '''

# Weights given as a list, in the same order as the model outputs
model.compile(optimizer='rmsprop',
              loss=['mse', 'categorical_crossentropy', 'binary_crossentropy'],
              loss_weights=[0.25, 1., 10.])

# Equivalent: weights given as a dict keyed by output layer name
model.compile(optimizer='rmsprop',
              loss={'age': 'mse',
                    'income': 'categorical_crossentropy',
                    'gender': 'binary_crossentropy'},
              loss_weights={'age': 0.25,
                            'income': 1.,
                            'gender': 10.})
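
To see what `loss_weights` does, the sketch below combines three per-output loss values as a weighted sum, which is the quantity the optimizer minimizes. The loss values are hypothetical and chosen only for illustration; the weights are the ones from the dictionary form above.

```python
# Hypothetical per-output loss values (illustration only, not measured)
losses = {'age': 4.0, 'income': 1.2, 'gender': 0.1}
loss_weights = {'age': 0.25, 'income': 1.0, 'gender': 10.0}


def total_loss(losses, weights):
    """Weighted sum of the per-output losses."""
    return sum(weights[name] * value for name, value in losses.items())


unweighted = total_loss(losses, {name: 1.0 for name in losses})
weighted = total_loss(losses, loss_weights)

# Share of the total loss contributed by each output after weighting
for name in losses:
    share = loss_weights[name] * losses[name] / weighted
    print(f"{name}: {share:.2f} of the weighted total")
```

Without weights, the age loss in this made-up example would account for most of the total; with weights, the three contributions are of comparable size.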

In [5]:
''' Feeding data to the multi-output model '''

import numpy as np
from keras.utils import to_categorical

num_samples = 1000
max_length = 400

posts = np.random.randint(1, vocabulary_size,
                          size=(num_samples, max_length))

age_targets = np.random.randint(10, 110,
                                size=(num_samples,))
# Class indices 0..num_income_groups-1, then one-hot encoded
income_targets = np.random.randint(0, num_income_groups,
                                   size=(num_samples,))
income_targets = to_categorical(income_targets, num_income_groups)
# randint's upper bound is exclusive, so (0, 2) yields both 0 and 1
gender_targets = np.random.randint(0, 2,
                                   size=(num_samples,))

print(posts[0], age_targets[0], income_targets[0], gender_targets[0])

model.fit(posts, [age_targets, income_targets, gender_targets],
          epochs=10, batch_size=64)

#model.fit(posts, {'age': age_targets,
#                  'income': income_targets,
#                  'gender': gender_targets},
#          epochs=10,
#          batch_size=64)

[46379  1471 16278 39685  7927 22344 18419 20028 17623 16181 14711 17321
  2136 31182  7109  3442  2877 42261 21870 23838  6092 44632 14292 48967
 49901 17699 36182 45835 19542 47721 15665 30554 36761 31226 15835 17588
  5989  8599 16744 30861 24026 43107  8837 16317 26366 39801   542  2647
  2253 32684 32215 25042 47415 17775 21028 22715 47055 25890 22629 25135
 26575 45971   256 48074  7911 46732 26745  5018 18651 28165 27068 48507
 14102 38835 24332 23386 10383 48908 26603  3681 46439 35138 26643 21904
 47342 17126 33012 18668 35007 42619  3098 42895 24596 20972 18389 26850
 37025 23240 18434 49985 49939  3631 18199 11381 15486  7778 14937  2139
 40494 14971 48050 28024 36686 26201 27068 17309 44618  8383 29208  5311
 36283 20892 33411 17130 33797 36663 19365 33739 33336  6138 17180 39718
 49582 20604  4388  6709   322 14769 48479 38615 24470 33462  5872 49187
 11515 29624 33073 36089 15120  7680 27418 21819 21513 13147 38369  6632
 28133  8478 38450 35576 26492  1632 34879 32208 18

<keras.callbacks.History at 0x1c1f8c1748>

### Directed acyclic graph of layers



#### *Inception modules*

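Each module consists of three to four parallel branches that typically start with a 1 × 1 convolution, followed on some branches by 3 × 3 and 5 × 5 convolutions. The branch outputs are concatenated to form the module output.

##### The purpose of 1 × 1 convolutions

A 1 × 1 convolution (also called a pointwise convolution) mixes information across the channels of each pixel, but does not mix information across space.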
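
A small NumPy sketch can make this concrete: applying a 1 × 1 kernel at every position is the same as a matrix product over the channel axis, so each output pixel depends only on the input pixel at the same location. The shapes and random values are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
height, width, in_channels, out_channels = 4, 4, 3, 2

feature_map = rng.normal(size=(height, width, in_channels))
# A 1x1 kernel has no spatial extent: it is just an (in_channels, out_channels) matrix
kernel = rng.normal(size=(in_channels, out_channels))

# Applying the kernel at every position is a matrix product over the channel axis
output = feature_map @ kernel

# Each output pixel depends only on the input pixel at the same position
pixel = feature_map[1, 2]
same_pixel_only = np.allclose(output[1, 2], pixel @ kernel)
print(output.shape, same_pixel_only)  # (4, 4, 2) True
```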

In [8]:
''' Implement one block of Inception '''

from keras import layers, Input

x = Input(shape=(512, 512, 3))

# padding='same' makes every branch output 256 x 256,
# so the branches can be concatenated along the channel axis

# Branch a: 1x1 convolution, downsampled by the stride
branch_a = layers.Conv2D(128, 1, activation='relu',
                         strides=2, padding='same')(x)

# Branch b: 1x1 convolution, then a strided 3x3 convolution
branch_b = layers.Conv2D(128, 1, activation='relu', padding='same')(x)
branch_b = layers.Conv2D(128, 3, activation='relu', strides=2, padding='same')(branch_b)

# Branch c: strided average pooling, then a 3x3 convolution
branch_c = layers.AveragePooling2D(3, strides=2, padding='same')(x)
branch_c = layers.Conv2D(128, 3, activation='relu', padding='same')(branch_c)

# Branch d: 1x1 convolution, then two 3x3 convolutions (the last one strided)
branch_d = layers.Conv2D(128, 1, activation='relu', padding='same')(x)
branch_d = layers.Conv2D(128, 3, activation='relu', padding='same')(branch_d)
branch_d = layers.Conv2D(128, 3, activation='relu', strides=2, padding='same')(branch_d)

output = layers.concatenate([branch_a, branch_b, branch_c, branch_d], axis=-1)

#### *Residual connections*

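Residual connections are generally worth adding to any model with more than 10 layers. A residual connection makes the output of an earlier layer available as input to a later layer, creating a shortcut in a sequential network. The earlier output is summed with the later activation, which assumes that both activations have the same shape. If the shapes differ, a linear transformation (for example, a 1 × 1 convolution without activation) can bring the earlier activation to the target shape.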

In [9]:
from keras import layers, Input

x = Input(shape=(256, 256, 3))
y = layers.Conv2D(128, 3, activation='relu', padding='same')(x)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)

# x has 3 channels and y has 128, so project x to 128 channels
# with a linear 1x1 convolution before summing
residual = layers.Conv2D(128, 1, padding='same')(x)

y = layers.add([y, residual])

### Layer weight sharing

For example, consider a model that attempts to assess the semantic similarity between two sentences. The model has two inputs (the two sentences to compare) and outputs a score between 0 and 1, where 0 means unrelated sentences and 1 means sentences that are either identical or reformulations of each other.

The two input sentences are interchangeable, because semantic similarity is a symmetrical relationship: the similarity of A to B is identical to the similarity of B to A. It wouldn't make sense to learn two independent models for processing each input sentence. Instead, a single LSTM layer processes both inputs; this is called a *Siamese LSTM* (or *shared LSTM*) model.

In [10]:
from keras import layers
from keras import Input
from keras.models import Model

lstm = layers.LSTM(32)

left_input = Input(shape=(None, 128))
left_output = lstm(left_input)

right_input = Input(shape=(None, 128))
right_output = lstm(right_input)

merged = layers.concatenate([left_output, right_output], axis=-1)
predictions = layers.Dense(1, activation='sigmoid')(merged)

model = Model([left_input, right_input], predictions)
# The weights of shared layer are updated based on both inputs.
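
One practical consequence of sharing is that the model holds a single set of LSTM weights. The sketch below counts them with the standard LSTM parameter formula for the sizes used above (input dimension 128, 32 units) and compares against two independent layers. The helper name `lstm_param_count` is hypothetical.

```python
def lstm_param_count(input_dim, units):
    """Parameters of one LSTM layer: four weight blocks (three gates plus the
    candidate cell state), each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


shared = lstm_param_count(input_dim=128, units=32)  # one layer reused for both inputs
independent = 2 * shared                            # two separate LSTM layers

print(shared, independent)  # 20608 41216
```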

### Models as layers

A model can be used like a layer: call it on an input tensor to get an output tensor, as was done earlier with the pretrained network. Calling a model instance reuses its weights, just as calling a layer instance does.

Example: a vision model that uses a dual camera as its input, i.e. two parallel cameras a few centimeters apart. Such a model can perceive depth, which can be useful in many applications. You shouldn't need two independent models to extract visual features from the left camera and the right camera before merging the two feeds; the same feature extractor can be shared by both inputs.

In [12]:
from keras import layers
from keras import applications
from keras import Input

xception_base = applications.Xception(weights=None, include_top=False)

left_input = Input(shape=(250, 250, 3))
right_input = Input(shape=(250, 250, 3))

left_features = xception_base(left_input)
right_features = xception_base(right_input)

merged_features = layers.concatenate(
    [left_features, right_features], axis=-1)

### Wrapping up on functional API

* Use the functional API whenever a model is more than a linear stack of layers.
* You can build a model with several inputs, several outputs and complex internal topology.
* You can reuse the weights of a layer or model across different processing branches by calling the same layer or model instance several times.

## Inspecting and monitoring deep-learning models using Keras callbacks and TensorBoard



### Using callbacks to act on a model during training

* *Model checkpointing* - saving the current weights of the model at different points during training.
* *Early stopping* - interrupting training when the validation loss is no longer improving.
* *Dynamically adjusting the value of certain parameters during training* - such as the learning rate of the optimizer.
* *Logging training and validation metrics during training* - the Keras progress bar is itself a callback.

**keras.callbacks**:

- keras.callbacks.ModelCheckpoint
- keras.callbacks.EarlyStopping
- keras.callbacks.LearningRateScheduler
- keras.callbacks.ReduceLROnPlateau
- keras.callbacks.CSVLogger

#### The ModelCheckpoint and EarlyStopping callbacks

In [None]:
import keras

callbacks_list = [
        keras.callbacks.EarlyStopping(monitor='val_acc', # monitors the validation accuracy
                                      patience=1), # stops once it has not improved for 1 epoch
        keras.callbacks.ModelCheckpoint(filepath='my_model.h5', # saves the model to this file
                                        monitor='val_loss', # only when the validation loss
                                        save_best_only=True) # is the best seen so far
]

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['acc'])
model.fit(x, y,
          epochs=10,
          batch_size=32,
          callbacks=callbacks_list,
          validation_data=(x_val, y_val)) # required: the callbacks monitor validation metrics
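
The role of `patience` can be illustrated with a simplified pure-Python sketch. This is not Keras's implementation (it ignores options such as a minimum delta), and the function name and loss values are hypothetical: it stops once the monitored value has failed to improve for `patience` consecutive epochs.

```python
def early_stopping_epoch(monitored_values, patience, mode='min'):
    """Return the index of the epoch at which training would stop,
    or None if the patience is never exhausted."""
    best = None
    wait = 0  # consecutive epochs without improvement
    for epoch, value in enumerate(monitored_values):
        improved = best is None or (value < best if mode == 'min' else value > best)
        if improved:
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


val_losses = [0.70, 0.55, 0.50, 0.52, 0.51, 0.53]  # hypothetical values
print(early_stopping_epoch(val_losses, patience=1))  # 3
print(early_stopping_epoch(val_losses, patience=3))  # 5
```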

#### The ReduceLROnPlateau callback

In [None]:
callbacks_list = [
        keras.callbacks.ReduceLROnPlateau(monitor='val_loss', # watches the validation loss
                                          factor=0.1, # multiplies the LR by 0.1 (divides it by 10) when triggered
                                          patience=10) # triggers after 10 epochs without improvement
]

model.fit(x, y,
          epochs=10,
          batch_size=32,
          callbacks=callbacks_list,
          validation_data=(x_val, y_val))
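
The plateau rule can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the Keras implementation: it ignores the `min_delta`, `cooldown` and `min_lr` options, and the function name and loss values are hypothetical.

```python
def reduce_lr_on_plateau(val_losses, initial_lr, factor=0.1, patience=2):
    """Return the learning rate in effect after each epoch: multiply the rate
    by `factor` once the loss has not improved for `patience` consecutive
    epochs, then start counting again."""
    lr = initial_lr
    best = float('inf')
    wait = 0
    schedule = []
    for loss in val_losses:
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr *= factor
                wait = 0
        schedule.append(lr)
    return schedule


val_losses = [0.9, 0.7, 0.7, 0.71, 0.6, 0.65, 0.66]  # hypothetical values
schedule = reduce_lr_on_plateau(val_losses, initial_lr=0.01)
print(schedule)
```

With these values the rate is cut twice: after the fourth epoch and again after the seventh.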

#### Writing a custom callback

Subclass `keras.callbacks.Callback` and override any of the following methods, which are called at various points during training:

- on_epoch_begin
- on_epoch_end
- on_batch_begin
- on_batch_end
- on_train_begin
- on_train_end

Callbacks have access to:

* logs (training/validation metrics of batch, epoch, run)
* self.model - model instance
* self.validation_data - validation data passed to model.fit
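
The order in which these hooks fire can be sketched with a toy training loop. Both `RecordingCallback` and `toy_fit` are hypothetical stand-ins written for illustration; they mimic the call order only and are not Keras code.

```python
class RecordingCallback:
    """Stand-in for a Keras callback that records every hook call."""

    def __init__(self):
        self.events = []

    def on_train_begin(self, logs=None):
        self.events.append('train_begin')

    def on_epoch_begin(self, epoch, logs=None):
        self.events.append(f'epoch_begin:{epoch}')

    def on_batch_begin(self, batch, logs=None):
        self.events.append(f'batch_begin:{batch}')

    def on_batch_end(self, batch, logs=None):
        self.events.append(f'batch_end:{batch}')

    def on_epoch_end(self, epoch, logs=None):
        self.events.append(f'epoch_end:{epoch}')

    def on_train_end(self, logs=None):
        self.events.append('train_end')


def toy_fit(callback, epochs, batches_per_epoch):
    """Toy loop that fires the hooks in training order (no real training)."""
    callback.on_train_begin()
    for epoch in range(epochs):
        callback.on_epoch_begin(epoch)
        for batch in range(batches_per_epoch):
            callback.on_batch_begin(batch)
            # ... one gradient update would happen here ...
            callback.on_batch_end(batch, logs={'loss': 0.0})  # placeholder metric
        callback.on_epoch_end(epoch, logs={'loss': 0.0})
    callback.on_train_end()


recorder = RecordingCallback()
toy_fit(recorder, epochs=1, batches_per_epoch=2)
print(recorder.events)
```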

In [None]:
''' Writing a custom callback that saves the activations of every layer at the end of each epoch '''

import keras
import numpy as np

class ActivationLogger(keras.callbacks.Callback):
    
    def set_model(self, model):
        # Called by the parent model before training
        self.model = model
        layer_outputs = [layer.output for layer in model.layers]
        # Model instance that returns the activations of every layer
        self.activations_model = keras.models.Model(model.input,
                                                    layer_outputs)
        
    def on_epoch_end(self, epoch, logs=None):
        if self.validation_data is None:
            raise RuntimeError('Requires validation_data.')
        # First input sample of the validation data
        validation_sample = self.validation_data[0][0:1]
        activations = self.activations_model.predict(validation_sample)
        # np.savez writes binary data, so the file is opened in 'wb' mode
        with open('activation_at_epoch' + str(epoch) + '.npz', 'wb') as f:
            np.savez(f, *activations)  # one array per layer

### Intro to TensorBoard

TensorBoard is available only when Keras runs on the TensorFlow backend.

It is a browser-based visualization tool that helps with:

* Visually monitoring metrics during training
* Visualizing model architecture
* Visualizing histograms of activations and gradients
* Exploring embeddings in 3D

In [None]:
''' Text-classification model to use with TensorBoard '''

import keras
from keras import layers
from keras.datasets import imdb
from keras.preprocessing import sequence

max_features = 2000
max_len = 500

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = sequence.pad_sequences(x_train, maxlen=max_len)
x_test = sequence.pad_sequences(x_test, maxlen=max_len)

model = keras.models.Sequential()
model.add(layers.Embedding(max_features, 128,
                           input_length=max_len,
                           name='embed'))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.MaxPooling1D(5))
model.add(layers.Conv1D(32, 7, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(1, activation='sigmoid'))

model.summary()

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['acc'])

# make dir for logs (shell: mkdir my_log_dir)

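callbacks = [
    keras.callbacks.TensorBoard(
        log_dir='my_log_dir',   # log files are written here
        histogram_freq=1,       # record activation histograms every epoch
        embeddings_freq=1       # record embedding data every epoch
    )
]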

history = model.fit(x_train, y_train,
                    epochs=20,
                    batch_size=128,
                    validation_split=0.2,
                    callbacks=callbacks)

# shell: tensorboard --logdir=my_log_dir
# browse http://localhost:6006

In [None]:
''' How to plot model '''

from keras.utils import plot_model

plot_model(model, show_shapes=True, to_file='model.png')

## State-of-the-art tuning

### Advanced architecture patterns

* Residual connections
* Batch normalization
* Depthwise separable convolution

#### *Batch normalization*

The **BatchNormalization** layer can adaptively normalize data even as the mean and variance change over time during training.

It works by internally maintaining an exponential moving average of the batch-wise mean and variance of the data seen during training. Its main benefit is that it helps gradient propagation, which makes very deep models easier to train.

The BatchNormalization layer is typically used after a convolutional or densely connected layer.

In [None]:
conv_model.add(layers.Conv2D(32, 3, activation='relu'))
conv_model.add(layers.BatchNormalization())

dense_model.add(layers.Dense(32, activation='relu'))
dense_model.add(layers.BatchNormalization())
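
The mechanism can be sketched in NumPy. This is a simplified illustration, not the Keras layer: it omits the learned scale and shift parameters, and the class name `SimpleBatchNorm` is hypothetical. The `momentum` and `epsilon` defaults are chosen to mirror the Keras defaults.

```python
import numpy as np


class SimpleBatchNorm:
    """Simplified batch normalization over the feature axis (no learned scale/shift)."""

    def __init__(self, num_features, momentum=0.99, epsilon=1e-3):
        self.momentum = momentum
        self.epsilon = epsilon
        self.moving_mean = np.zeros(num_features)
        self.moving_var = np.ones(num_features)

    def __call__(self, batch, training):
        if training:
            # Normalize with the statistics of the current batch
            mean, var = batch.mean(axis=0), batch.var(axis=0)
            # Update the exponential moving averages used at inference time
            self.moving_mean = self.momentum * self.moving_mean + (1 - self.momentum) * mean
            self.moving_var = self.momentum * self.moving_var + (1 - self.momentum) * var
        else:
            # At inference time, use the moving averages instead
            mean, var = self.moving_mean, self.moving_var
        return (batch - mean) / np.sqrt(var + self.epsilon)


rng = np.random.default_rng(0)
bn = SimpleBatchNorm(num_features=4)
batch = rng.normal(loc=5.0, scale=3.0, size=(256, 4))
normalized = bn(batch, training=True)
print(normalized.mean(axis=0).round(3), normalized.std(axis=0).round(3))
```

In training mode the output has approximately zero mean and unit variance per feature, regardless of the mean and scale of the incoming batch.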

#### *Depthwise separable convolution*

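A depthwise separable convolution (`SeparableConv2D`) performs a spatial convolution on each input channel independently, then mixes the resulting channels with a 1 × 1 (pointwise) convolution. It needs far fewer parameters and computations than a regular convolution.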

In [2]:
from keras.models import Sequential, Model
from keras import layers

height = 64
width = 64
channels = 3
num_classes = 10

model = Sequential()
model.add(layers.SeparableConv2D(32, 3,
                                 activation='relu',
                                 input_shape=(height, width, channels,)))
model.add(layers.MaxPooling2D(2))

model.add(layers.SeparableConv2D(64, 3, activation='relu'))
model.add(layers.SeparableConv2D(128, 3, activation='relu'))
model.add(layers.MaxPooling2D(2))

model.add(layers.SeparableConv2D(64, 3, activation='relu'))
model.add(layers.SeparableConv2D(128, 3, activation='relu'))
model.add(layers.GlobalAveragePooling2D())

model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dense(num_classes, activation='softmax'))

model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
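
To give a sense of the savings, the sketch below counts the parameters of a regular 3 × 3 convolution and of a depthwise separable one for 64 input channels and 128 output channels, the sizes of the second pair of separable layers above. The helper names are hypothetical, and a depth multiplier of 1 is assumed.

```python
def conv2d_params(kernel_size, in_channels, out_channels):
    """Regular convolution: one kernel_size x kernel_size x in_channels
    filter per output channel, plus one bias per output channel."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def separable_conv2d_params(kernel_size, in_channels, out_channels):
    """Depthwise separable convolution with a depth multiplier of 1."""
    depthwise = kernel_size * kernel_size * in_channels  # one spatial filter per input channel
    pointwise = in_channels * out_channels               # 1x1 convolution mixing channels
    return depthwise + pointwise + out_channels          # plus one bias per output channel


regular = conv2d_params(3, 64, 128)
separable = separable_conv2d_params(3, 64, 128)
print(regular, separable)  # 73856 8896
```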

### Hyperparameter optimization

The process typically looks like this:

1) Choose a set of hyperparameters (automatically).

2) Build the corresponding model.

3) Fit it to your training data, and measure the final performance on the validation data.

4) Choose the next set of hyperparameters to try (automatically).

5) Repeat.

6) Eventually, measure performance on test data.

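The two most useful simple techniques are grid search and random search.

Libraries that automate the search include Hyperopt (https://github.com/hyperopt/hyperopt), a Python library for hyperparameter optimization, and Hyperas (https://github.com/maxpumperla/hyperas), which integrates Hyperopt with Keras models.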
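
The search loop described above can be sketched as a minimal random search. The `validation_score` function is a hypothetical stand-in for "build the model, fit it, and measure performance on validation data"; the hyperparameter names and ranges are assumptions chosen for illustration.

```python
import random


def validation_score(learning_rate, num_units):
    """Hypothetical stand-in for training a model and returning a validation
    metric (higher is better). A real version would build and fit a model."""
    return -(abs(learning_rate - 0.01) * 10 + abs(num_units - 64) / 64)


def random_search(num_trials, seed=0):
    """Sample hyperparameters at random and keep the best-scoring set."""
    rng = random.Random(seed)
    best_score, best_params = float('-inf'), None
    for _ in range(num_trials):
        params = {
            'learning_rate': 10 ** rng.uniform(-4, -1),  # log-uniform sampling
            'num_units': rng.choice([16, 32, 64, 128, 256]),
        }
        score = validation_score(**params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score


best_params, best_score = random_search(num_trials=50)
print(best_params, best_score)
```

The final step, measuring performance on test data, is deliberately left out of the loop: the test set should be used only once, after the search is finished.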

### Model ensembling

Different good models trained independently are likely to be good *for different reasons*.

In [None]:
preds_a = model_a.predict(x_val)
preds_b = model_b.predict(x_val)
preds_c = model_c.predict(x_val)
preds_d = model_d.predict(x_val)

final_preds = 0.25 * (preds_a + preds_b + preds_c + preds_d)

# OR

final_preds = 0.5 * preds_a + 0.25 * preds_b + 0.1 * preds_c + 0.15 * preds_d

The models should be as good as possible and as different as possible: for example, tree-based models combined with deep neural networks.
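
A toy example shows why diversity matters. The predictions below are hypothetical numbers constructed so that each model is wrong on a different sample; averaging them corrects all three mistakes.

```python
labels = [1, 0, 1, 0]

# Hypothetical predicted probabilities; each model errs on a different sample
model_preds = {
    'a': [0.9, 0.6, 0.8, 0.2],   # wrong on sample 1
    'b': [0.4, 0.1, 0.7, 0.3],   # wrong on sample 0
    'c': [0.8, 0.3, 0.45, 0.1],  # wrong on sample 2
}


def accuracy(preds, labels, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(preds, labels))
    return hits / len(labels)


# Unweighted average of the three models' predictions, sample by sample
ensemble = [sum(column) / len(column) for column in zip(*model_preds.values())]

for name, preds in model_preds.items():
    print(name, accuracy(preds, labels))
print('ensemble', accuracy(ensemble, labels))
```

Each individual model scores 0.75 here, while the average scores 1.0. If all three models made the same mistake, averaging would not help.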

### Wrapping up

* When building high-performing deep convnets, use residual connections, batch normalization and depthwise separable convolutions.
* Building deep networks requires making many small hyperparameter and architecture choices. Rather than basing these choices on intuition or random chance, it's better to systematically search hyperparameter space to find optimal choices.
* Ensembles of models that are both strong and diverse tend to beat any single model, so use them when maximum performance matters.