# 7. Advanced Deep Learning Best Practices

### The Keras Functional API

So far, the neural networks have been implemented using the `Sequential` model. This model assumes that the network has <u>one and only one input</u> and <u>one and only one output</u>, and that it consists of a linear stack of layers. Think of it as a single path through multiple layers.

This is too restrictive in some cases. Some networks have multiple independent inputs, and some produce multiple outputs. Furthermore, some models have internal branching between layers, which makes them look like graphs rather than linear stacks of layers.

Some tasks require <b>multimodal</b> inputs. These merge data from different input sources and process each type of data with different kinds of neural layers. Predicting jointly from different types of input (e.g. images & text) is often better than training a separate model for each input type. Similarly, some models predict multiple target attributes of the input data, for example jointly predicting the year of release and the genre of a piece of writing.

<img src="img71.png" width="600">
<img src="img72.png" width="600">

The following are three examples of recent architectures that also don't follow the one-input, one-output, single-stack pattern:

- <b>Wide & Deep</b> neural network - This architecture connects all or part of the inputs directly to the output layer. With this architecture, it is possible to learn both deep patterns (using the deep path) and simple rules (using the short path). More at [Wide & Deep Learning: Better Together with TensorFlow](https://ai.googleblog.com/2016/06/wide-deep-learning-better-together-with.html)
<img src="img3.png" width="900"/>

- <b>Inception Family</b> - relies on inception modules, where the input is processed by several parallel convolutional branches whose outputs are merged into a single tensor. More at [Going Deeper with Convolutions](https://arxiv.org/abs/1409.4842)

- <b>Residual Connections</b> - A residual connection reinjects previous representations into the downstream flow of data by adding a past output tensor to a later output tensor. More at [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385)

<img src="img73.png" width="450"/>
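To make these patterns concrete, here is a minimal functional-API sketch. The input shape, layer sizes and number of classes are arbitrary assumptions. It builds a simplified inception-style module with three parallel branches merged by `Concatenate`, followed by a residual connection that adds a block's input back to its output with `Add`. It illustrates the two ideas and is not the exact architecture from either paper.

```python
from tensorflow import keras

inputs = keras.Input(shape=(32, 32, 3))  # assumed 32x32 RGB images

# Simplified inception-style module: parallel branches over the same input
branch_a = keras.layers.Conv2D(16, 1, padding="same", activation="relu")(inputs)
branch_b = keras.layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
branch_c = keras.layers.MaxPooling2D(3, strides=1, padding="same")(inputs)
branch_c = keras.layers.Conv2D(16, 1, padding="same", activation="relu")(branch_c)
# Merge the branches along the channel axis: 16 + 16 + 16 = 48 channels
merged = keras.layers.Concatenate(axis=-1)([branch_a, branch_b, branch_c])

# Residual connection: the block keeps 48 channels so it can be added to its input
block_out = keras.layers.Conv2D(48, 3, padding="same", activation="relu")(merged)
residual_out = keras.layers.Add()([block_out, merged])

# Classification head (10 classes assumed)
pooled = keras.layers.GlobalAveragePooling2D()(residual_out)
outputs = keras.layers.Dense(10, activation="softmax")(pooled)
model = keras.Model(inputs, outputs)
print(model.output_shape)  # (None, 10)
```

The residual `Add` only works because the block preserves the shape of its input. When shapes differ, the shortcut usually needs its own projection (e.g. a 1x1 convolution) first.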

To handle these and other cases, we cannot use the `Sequential` model. Instead, Keras offers a more flexible approach: the <b>functional API</b>.

In [1]:
from tensorflow.keras.datasets import boston_housing

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import KFold, train_test_split

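from tensorflow import keras as tf_keras  # alias used by the later cells (tf_keras.layers, tf_keras.callbacks, ...)
from tensorflow.keras.utils import to_categorical
from tensorflow.keras import Input, layers, models, backend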

In [None]:
# Ingestion
###########
(train_data, y_train), (test_data, y_test) = boston_housing.load_data()

# Preprocessing
###############
sc = StandardScaler()
x_train = sc.fit_transform(train_data)
x_test = sc.transform(test_data)

x_train__train, x_train__val, y_train__train, y_train__val = train_test_split(x_train, y_train, test_size=0.15,
                                                                             random_state=0)
NUM_FEATURES = x_train.shape[1:]

### Introduction to the Functional API

In the functional API, you directly manipulate tensors, and use layers as <u>functions</u> that take tensors and return tensors (hence, functional).
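Here is a small sketch of the "layers as functions" idea (the sizes 13 and 4 are arbitrary). A layer object is created once and then called on a tensor, which returns a new tensor. Calling it on a symbolic `Input` gives a symbolic tensor used to build the graph. Once built, the same layer can also be called on concrete data.

```python
import numpy as np
from tensorflow import keras

dense = keras.layers.Dense(4, activation="relu")  # create the layer once

inputs = keras.Input(shape=(13,))  # symbolic tensor with shape (None, 13)
hidden = dense(inputs)             # calling the layer returns a new symbolic tensor
print(hidden.shape)                # (None, 4)

# The same layer object can also be applied to real data
batch = np.ones((2, 13), dtype="float32")
print(dense(batch).shape)          # (2, 4)
```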

#### Single Input, Single Output, One Linear Stack

Let's compare, side by side, how the same simple model for the **housing prices** regression problem is built with each API.

In [None]:
# Using models.Sequential()
###########################
# Build model
backend.clear_session()
m11 = models.Sequential()
m11.add(layers.Dense(32, activation='relu',
                     input_shape=NUM_FEATURES))
m11.add(layers.Dense(32, activation='relu'))
m11.add(layers.Dense(1))
m11.summary()

m11.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
m11.fit(x_train__train, y_train__train,
        epochs=20, batch_size=4,
        validation_data=(x_train__val, y_train__val),
        verbose=0)

In [None]:
# Using Functional API
######################
backend.clear_session()
m12_input = Input(shape=NUM_FEATURES)
m12_l1 = layers.Dense(32, activation='relu')(m12_input)
m12_l2 = layers.Dense(32, activation='relu')(m12_l1)
m12_output = layers.Dense(1)(m12_l2)
m12 = models.Model(m12_input, m12_output)
m12.summary()
m12.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
# Same training settings as m11, for a fair comparison
m12.fit(x_train__train, y_train__train,
        epochs=20, batch_size=4,
        validation_data=(x_train__val, y_train__val),
        verbose=0)

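Behind the scenes, Keras retrieves every layer involved in going from the inputs to the outputs and assembles them into a graph-like data structure, a `Model`. For this to work, the outputs must be obtained by repeatedly transforming the inputs through layer calls. If the inputs and outputs are not connected in this way, Keras cannot build the model.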
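One way to look at the graph Keras assembled is to inspect the model's layers and parameter count. The sketch below rebuilds the same 32-32-1 architecture as `m12` in a standalone cell, assuming 13 input features (the number of Boston housing features).

```python
from tensorflow import keras

inputs = keras.Input(shape=(13,))
x = keras.layers.Dense(32, activation="relu")(inputs)
x = keras.layers.Dense(32, activation="relu")(x)
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)

# Keras traced every layer between `inputs` and `outputs`
dense_layers = [layer for layer in model.layers if isinstance(layer, keras.layers.Dense)]
print(len(dense_layers))     # 3
# (13*32 + 32) + (32*32 + 32) + (32*1 + 1) = 448 + 1056 + 33
print(model.count_params())  # 1537
```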

<hr>

#### Multiple Inputs, Single Output
Now, let's build a model that has multiple inputs. Such models typically merge their input branches at some point, using a layer that combines several tensors, such as `layers.Concatenate` or `layers.Add`.
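The two most common merge layers behave differently. `Concatenate` joins tensors along the last axis, so the feature sizes add up. `Add` sums them element-wise and therefore needs identical shapes. Here is a quick check on dummy tensors (the shapes are arbitrary):

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

a = tf.constant(np.ones((2, 8), dtype="float32"))       # batch of 2 samples, 8 features
b = tf.constant(np.full((2, 8), 2.0, dtype="float32"))  # same shape as `a`

concatenated = keras.layers.Concatenate()([a, b])  # joins along the last axis
added = keras.layers.Add()([a, b])                 # element-wise sum

print(concatenated.shape)       # (2, 16)
print(added.shape)              # (2, 8)
print(np.asarray(added)[0, 0])  # 3.0
```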

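<b>Example 1</b> - The **housing prices problem** now uses one subset of the features for one input and another subset for a second input. The first 10 features take the wide path, which skips the hidden layers and goes straight into the concatenation before the output layer. Meanwhile, 7 selected features go through the deep path of hidden layers. To do this, we need to change <u>both the architecture</u> and the <u>input data</u>.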
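Besides a positional list, Keras also accepts multi-input data as a dict keyed by the `name` of each `Input`. This avoids mix-ups when the order of the inputs is easy to get wrong. Below is a self-contained sketch on random placeholder data. The names `wide_input` and `deep_input` are hypothetical, and the 10/7 feature split mirrors this example.

```python
import numpy as np
from tensorflow import keras

# Hypothetical input names; the 10/7 split mirrors the wide & deep example
wide_input = keras.Input(shape=(10,), name="wide_input")
deep_input = keras.Input(shape=(7,), name="deep_input")

hidden = keras.layers.Dense(30, activation="relu")(deep_input)
hidden = keras.layers.Dense(30, activation="relu")(hidden)
concat = keras.layers.Concatenate()([wide_input, hidden])
output = keras.layers.Dense(1)(concat)
model = keras.Model(inputs=[wide_input, deep_input], outputs=output)
model.compile(optimizer="sgd", loss="mse")

# Random placeholder data standing in for the two feature subsets
rng = np.random.default_rng(0)
x_wide = rng.normal(size=(32, 10)).astype("float32")
x_deep = rng.normal(size=(32, 7)).astype("float32")
y = rng.normal(size=(32, 1)).astype("float32")

# Inputs are matched by name, so the dict order does not matter
history = model.fit({"deep_input": x_deep, "wide_input": x_wide}, y,
                    epochs=2, verbose=0)
preds = model.predict({"wide_input": x_wide[:3], "deep_input": x_deep[:3]}, verbose=0)
print(preds.shape)  # (3, 1)
```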

In [None]:
# Instantiate Model
###################
# Here, we need to specify the no. of features for each input layer
input_layera = layers.Input(shape=(10,))  # wide path: fed straight into the concatenation
input_layerb = layers.Input(shape=(7,))   # deep path: goes through the hidden layers

# Only input B goes through the two hidden Dense layers; input A skips them
hidden_layer1 = layers.Dense(30, activation='relu')(input_layerb)
hidden_layer2 = layers.Dense(30, activation='relu')(hidden_layer1)
concat_layer = layers.Concatenate()([input_layera, hidden_layer2])
output_layer = layers.Dense(1)(concat_layer)
m21 = models.Model(inputs=[input_layera, input_layerb], outputs=output_layer)
m21.summary()

In [None]:
# Prepare data for training model
#################################
inputa_cols = list(range(0,10))
inputb_cols = [1,5,6,7,8,11,12]
x_train__trainA = x_train__train[:,inputa_cols]
x_train__trainB = x_train__train[:,inputb_cols]
x_train__val_A = x_train__val[:,inputa_cols]
x_train__val_B = x_train__val[:,inputb_cols]

In [None]:
print(x_train__trainA.shape)
print(x_train__trainB.shape)

In [None]:
# Train & Tune Model
####################
m21.compile(optimizer='sgd', loss='mean_squared_error', metrics=['mae'])
# Pass one array per input, in the same order as the model's inputs
m21.fit([x_train__trainA, x_train__trainB], y_train__train, epochs=20,
        validation_data=([x_train__val_A, x_train__val_B], y_train__val), verbose=0)

In [None]:
# Prepare test data
###################
x_testA = x_test[:,inputa_cols]
x_testB = x_test[:,inputb_cols]

# Evaluation
m21.evaluate([x_testA, x_testB], y_test)

# Prediction
m21.predict([x_testA[:2], x_testB[:2]])

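<b>Example 2</b> - Consider a **Q&A problem** where the inputs are a reference text and a question, and the output is a one-word answer. Concretely, the reference text could be a news article, the question could ask about a country, person or incident mentioned in it, and the output is a one-word answer drawn from a fixed answer vocabulary. The cells below use randomly generated token IDs as placeholder data.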
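One detail matters when one-hot encoding the answers. `to_categorical` infers the number of classes from the largest label it sees unless `num_classes` is given. So a random sample that happens to miss the highest answer IDs would produce fewer columns than the softmax layer has units. Here is a tiny illustration with arbitrarily chosen labels:

```python
from tensorflow.keras.utils import to_categorical

labels = [0, 2, 1]

# Width inferred from the largest label: max(labels) + 1 = 3 columns
print(to_categorical(labels).shape)                 # (3, 3)

# Width fixed explicitly, e.g. to match a Dense(5, activation='softmax') output
print(to_categorical(labels, num_classes=5).shape)  # (3, 5)
```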

In [None]:
TEXT_VOCAB_SIZE, QUESTION_VOCAB_SIZE, ANSWER_VOCAB_SIZE = 10000, 25, 500
max_length, max_qn_length, max_ans_length = 100, 25, 5
max_samples = 1000

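text_corpus = np.random.randint(1, TEXT_VOCAB_SIZE,
                                size=(max_samples, max_length))
questions_corpus = np.random.randint(1, QUESTION_VOCAB_SIZE,
                                     size=(max_samples, max_qn_length))
answers_corpus = np.random.randint(0, ANSWER_VOCAB_SIZE,
                                   size=(max_samples,))
# Fix the one-hot width to ANSWER_VOCAB_SIZE (otherwise it is inferred from the
# largest sampled label and could be smaller than the Dense output layer)
answers_corpus = to_categorical(answers_corpus, num_classes=ANSWER_VOCAB_SIZE)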

In [None]:
print(text_corpus.shape)
print(text_corpus[:2])
print()
print(questions_corpus.shape)
print(questions_corpus[:2])
print()
print(answers_corpus.shape)
print(answers_corpus[:2])

In [None]:
backend.clear_session()
m31_corpus_input = Input(shape=(max_length,), dtype='int32')
m31_qn_input = Input(shape=(max_qn_length,), dtype='int32')

m31_corpus_emb = layers.Embedding(TEXT_VOCAB_SIZE, 64)(m31_corpus_input)
m31_qn_emb = layers.Embedding(QUESTION_VOCAB_SIZE, 64)(m31_qn_input)

m31_corpus_lstm = layers.LSTM(32)(m31_corpus_emb)
m31_qn_lstm = layers.LSTM(32)(m31_qn_emb)

m31_concat = layers.Concatenate()([m31_corpus_lstm, m31_qn_lstm])
m31_ans = layers.Dense(ANSWER_VOCAB_SIZE, activation='softmax')(m31_concat)
m31 = models.Model(inputs=[m31_corpus_input, m31_qn_input], outputs=m31_ans)
m31.summary()

In [None]:
m31.compile(optimizer='rmsprop', 
            loss='categorical_crossentropy',
            metrics=['acc'])

In [None]:
m31.fit([text_corpus, questions_corpus], answers_corpus, 
        epochs=10, batch_size=128)

<hr>

Let's build a wide & deep network to tackle the **housing prices** problem. Take note of the comments describing each layer.

In [None]:
# Instantiate Model
###################

# Input object: declares the shape of each input sample. A model needs one per input.
input_layer = tf_keras.layers.Input(shape=NUM_FEATURES)

# Dense layer with 30 neurons & ReLU activation. Notice it is called like a function,
# passing in the input layer.
hidden_layer1 = tf_keras.layers.Dense(30, activation='relu')(input_layer)
# Another Dense layer. Now, the first hidden layer is passed in.
hidden_layer2 = tf_keras.layers.Dense(30, activation='relu')(hidden_layer1)

# Concatenate layer: concatenates the input (wide path) & the output of the 2nd hidden layer (deep path)
concat_layer = tf_keras.layers.Concatenate()([input_layer, hidden_layer2])

# Output layer. Single neuron and no activation function.
output_layer = tf_keras.layers.Dense(1)(concat_layer)

# Finally, create the Keras model with this architecture.
model0 = tf_keras.models.Model(inputs=[input_layer], outputs=output_layer)

The above model is visually represented by the following network diagram:
<img src="img3a.png" width="150"/>
(Ref. 2)

Once you have built the Keras model, the remaining steps follow the usual workflow: compile the model, train & tune it, and finalise the tuned model.

In [None]:
# Train & Tune Model
####################
model0.compile(optimizer='sgd', loss='mean_squared_error', metrics=['mae'])
history0 = model0.fit(x_train, y_train,  epochs = 10, verbose=0)

In [None]:
# Save model
# model0.save('model0.h5')

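<b>Multiple inputs, Single output</b>

The two-input version of this wide & deep network is model `m21`, built in *Multiple Inputs, Single Output* above. There, the first 10 features take the wide path to the output layer, while 7 selected features go through the deep path.
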
<b>Multiple inputs, Multiple outputs</b>

To produce multiple outputs, extend the two-input model with a second output layer and pass both output layers to the `Model`:

```python
input_layera = tf_keras.layers.Input(shape=(10,))
input_layerb = tf_keras.layers.Input(shape=(7,))

hidden_layer1 = tf_keras.layers.Dense(30, activation='relu')(input_layerb)
hidden_layer2 = tf_keras.layers.Dense(30, activation='relu')(hidden_layer1)
concat_layer = tf_keras.layers.Concatenate()([input_layera, hidden_layer2])
output_layer1 = tf_keras.layers.Dense(1)(concat_layer)
output_layer2 = tf_keras.layers.Dense(1)(hidden_layer2)  # new: auxiliary output fed by the deep path
model3 = tf_keras.models.Model(inputs=[input_layera, input_layerb],
                               outputs=[output_layer1, output_layer2])  # now a list of two outputs
```

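When compiling the model, you can track different metrics for different outputs by passing one list of metrics per output, in the same order as `outputs`. When fitting, pass one target array per output as well.

```python
model3.compile(optimizer='sgd', loss='mean_squared_error',
               metrics=[['mae'], ['mae', 'mse']])
```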
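Putting the multi-output pieces together, here is a self-contained sketch on random placeholder data. Losses, loss weights, metrics and targets are all passed as lists in the same order as `outputs`. The 0.9/0.1 loss weights are an arbitrary choice that makes the main output dominate the total loss, and using the same target for both outputs is an assumption for illustration.

```python
import numpy as np
from tensorflow import keras

input_a = keras.Input(shape=(10,))
input_b = keras.Input(shape=(7,))
hidden = keras.layers.Dense(30, activation="relu")(input_b)
hidden = keras.layers.Dense(30, activation="relu")(hidden)
concat = keras.layers.Concatenate()([input_a, hidden])
main_output = keras.layers.Dense(1)(concat)  # wide & deep output
aux_output = keras.layers.Dense(1)(hidden)   # auxiliary output from the deep path
model = keras.Model(inputs=[input_a, input_b], outputs=[main_output, aux_output])

model.compile(
    optimizer="sgd",
    loss=["mse", "mse"],         # one loss per output
    loss_weights=[0.9, 0.1],     # total loss = 0.9 * main loss + 0.1 * aux loss
    metrics=[["mae"], ["mae"]],  # one metrics list per output
)

rng = np.random.default_rng(0)
xa = rng.normal(size=(64, 10)).astype("float32")
xb = rng.normal(size=(64, 7)).astype("float32")
y = rng.normal(size=(64, 1)).astype("float32")

# One target array per output (here both outputs share the same target)
model.fit([xa, xb], [y, y], epochs=2, verbose=0)

results = model.evaluate([xa, xb], [y, y], verbose=0, return_dict=True)
print(results["loss"])  # weighted total loss
main_pred, aux_pred = model.predict([xa[:2], xb[:2]], verbose=0)
print(main_pred.shape, aux_pred.shape)  # (2, 1) (2, 1)
```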

When evaluating the model, pass one target array per output. Keras returns the total loss as well as the individual losses (and the metrics):

```python
model3.evaluate([x_testA, x_testB], [y_test, y_test])
```

### Building Dynamic Models Using the Subclassing API

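To add even more flexibility, we can use the Subclassing API. We subclass `Model`, create the layers needed in the constructor, and define the forward pass in the `call()` method.

Here, the creation of the layers (in `__init__()`) is separated from their usage (in `call()`).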
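Because `call()` is ordinary Python, a subclassed model can use loops and conditionals that would be awkward to express as a fixed graph. The sketch below uses a hypothetical class `DeepStack`. Its depth is set by a hypothetical `n_hidden` constructor argument, and its forward pass loops over the hidden layers.

```python
import numpy as np
from tensorflow import keras

class DeepStack(keras.Model):
    """Hypothetical model: `n_hidden` Dense layers followed by a single-unit output."""

    def __init__(self, n_hidden=3, units=16, **kwargs):
        super().__init__(**kwargs)
        # Layers are created once, in the constructor...
        self.hidden_layers = [keras.layers.Dense(units, activation="relu")
                              for _ in range(n_hidden)]
        self.output_layer = keras.layers.Dense(1)

    def call(self, inputs):
        # ...and used in the forward pass, here with a plain Python loop
        x = inputs
        for layer in self.hidden_layers:
            x = layer(x)
        return self.output_layer(x)

model = DeepStack(n_hidden=4)
out = model(np.ones((5, 13), dtype="float32"))
print(out.shape)                 # (5, 1)
print(len(model.hidden_layers))  # 4
```

The trade-off is that Keras can no longer inspect the architecture before the model is called. For example, `summary()` only works after the model has been built.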

In [None]:
class WideAndDeepModel(tf_keras.models.Model):
    def __init__(self, units=30, activation='relu', **kwargs):
        super().__init__(**kwargs)
        # Create the layers once, in the constructor
        self.hidden_layer1 = tf_keras.layers.Dense(units, activation=activation)
        self.hidden_layer2 = tf_keras.layers.Dense(units, activation=activation)
        self.concat_layer = tf_keras.layers.Concatenate()
        self.output_layer = tf_keras.layers.Dense(1)

    def call(self, inputs):
        # Forward pass: input A is the wide path, input B the deep path
        inputa, inputb = inputs
        hidden1 = self.hidden_layer1(inputb)
        hidden2 = self.hidden_layer2(hidden1)
        concat = self.concat_layer([inputa, hidden2])
        return self.output_layer(concat)
        

In [None]:
# Instantiate & train the model
model3 = WideAndDeepModel(30, 'relu')
model3.compile(optimizer='sgd', loss='mean_squared_error', metrics=['mae'])
model3.fit([x_train__trainA, x_train__trainB], y_train__train, epochs=20,
           validation_data=([x_train__val_A, x_train__val_B], y_train__val), verbose=0)

In [None]:
# Evaluate & Predict
model3.evaluate([x_testA, x_testB], y_test)
model3.predict([x_testA[:2], x_testB[:2]])

### Saving & Restoring a Model

This is useful when models take a long time to train or when you need access to a previously trained model.
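Here is a self-contained round-trip check. It assumes a small functional model and a temporary directory, and the file name `demo_model.h5` is arbitrary. The reloaded model should reproduce the original predictions exactly. HDF5 (`.h5`) is the format used in this notebook; newer Keras versions also offer a native `.keras` format.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras

inputs = keras.Input(shape=(13,))
hidden = keras.layers.Dense(8, activation="relu")(inputs)
outputs = keras.layers.Dense(1)(hidden)
model = keras.Model(inputs, outputs)
model.compile(optimizer="sgd", loss="mse")

x = np.random.default_rng(0).normal(size=(4, 13)).astype("float32")
before = model.predict(x, verbose=0)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.h5")
    model.save(path)  # architecture and weights in a single file
    restored = keras.models.load_model(path)
    after = restored.predict(x, verbose=0)

print(np.allclose(before, after))  # True
```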

In [None]:
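# Saving a model (here, the two-input functional model m21).
# HDF5 (.h5) saving supports Functional & Sequential models, not subclassed ones like WideAndDeepModel
m21.save('m21.h5')

In [None]:
# Load & Predict
m21_loaded = tf_keras.models.load_model('m21.h5')
m21_loaded.predict([x_testA[10:15], x_testB[10:15]])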

### Callbacks

Callbacks let you perform actions at specific points during training, such as at the end of each epoch. For example, say we want to save the best model found during training.
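To see how callbacks hook into training, here is a minimal custom callback sketch. `SimplePatience` is a hypothetical, simplified take on the idea behind `EarlyStopping`, not its actual implementation. Keras calls `on_epoch_end()` after every epoch and passes that epoch's metrics in `logs`. Setting `self.model.stop_training = True` ends training early.

```python
import numpy as np
from tensorflow import keras

class SimplePatience(keras.callbacks.Callback):
    """Hypothetical callback: stop once val_loss hasn't improved for `patience` epochs."""

    def __init__(self, patience=3):
        super().__init__()
        self.patience = patience
        self.best = np.inf
        self.wait = 0
        self.val_losses = []

    def on_epoch_end(self, epoch, logs=None):
        current = logs["val_loss"]  # available because fit() gets validation data
        self.val_losses.append(current)
        if current < self.best:
            self.best, self.wait = current, 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.model.stop_training = True  # fit() stops after this epoch

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 13)).astype("float32")
y = rng.normal(size=(64, 1)).astype("float32")

model = keras.Sequential([keras.Input(shape=(13,)), keras.layers.Dense(1)])
model.compile(optimizer="sgd", loss="mse")
patience_cb = SimplePatience(patience=2)
model.fit(x, y, epochs=50, validation_split=0.25, callbacks=[patience_cb], verbose=0)
print(len(patience_cb.val_losses), "epochs run; best val_loss:", patience_cb.best)
```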

In [None]:
input_layer = tf_keras.layers.Input(shape=NUM_FEATURES)
hidden_layer1 = tf_keras.layers.Dense(30, activation='relu')(input_layer)
hidden_layer2 = tf_keras.layers.Dense(30, activation='relu')(hidden_layer1)
concat_layer = tf_keras.layers.Concatenate()([input_layer, hidden_layer2])
output_layer = tf_keras.layers.Dense(1)(concat_layer)
model0a = tf_keras.models.Model(inputs=[input_layer], outputs=output_layer)
model0a.compile(optimizer='sgd', loss='mean_squared_error', metrics=['mae'])

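# Adding a callback to save only the best model (by default, "best" means lowest val_loss)
save_best_checkpoint = tf_keras.callbacks.ModelCheckpoint('model0a_best.h5', save_best_only=True)
# Train on the training split only, so the validation data stays unseen during training
model0a.fit(x_train__train, y_train__train, epochs=10,
            validation_data=(x_train__val, y_train__val),
            callbacks=[save_best_checkpoint], verbose=0)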

In [None]:
# Adding an EarlyStopping callback to avoid wasting time and resources once the model
# stops improving. It monitors val_loss by default; restore_best_weights=True restores
# the weights from the best epoch.
early_stopping_cb = tf_keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)

# Combine both callbacks. Use a large number of epochs: training stops once val_loss
# has not improved for 5 consecutive epochs
model0a.fit(x_train__train, y_train__train, epochs=100,
            validation_data=(x_train__val, y_train__val),
            callbacks=[save_best_checkpoint, early_stopping_cb],
            verbose=0)

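### Visualisation using TensorBoard

TensorBoard is TensorFlow's visualisation tool. The `TensorBoard` callback writes training logs (e.g. loss & metric curves) to a log directory, which TensorBoard then displays. Using one sub-directory per run makes it easy to compare runs.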

In [None]:
import os

In [None]:
import time

def get_run_logdir(root_logdir):
    # One sub-directory per run, named after the current date & time
    run_id = time.strftime("r_%Y%m%d_%H%M%S")
    return os.path.join(root_logdir, run_id)

root_logdir = os.path.join(os.curdir, "logs")
run_logdir = get_run_logdir(root_logdir)
print(run_logdir)

In [None]:
# Create the Tensorboard callback and use it
tensorboard_cb = tf_keras.callbacks.TensorBoard(run_logdir)
model0a.fit(x_train__train, y_train__train, epochs=100,
            validation_data=(x_train__val, y_train__val),
            callbacks=[save_best_checkpoint, tensorboard_cb],
            verbose=0)

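Finally, launch TensorBoard from a terminal with `tensorboard --logdir=./logs` (or `python -m tensorboard.main --logdir=./logs`) and open http://localhost:6006 in your browser. Pointing `--logdir` at the root log directory, rather than at a single run such as `r_20200601_122625/`, shows every run side by side.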

<img src="img3c.png" width="750"/>

Additional Readings:

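- (1) [Wide & Deep Learning: Better Together with TensorFlow](https://ai.googleblog.com/2016/06/wide-deep-learning-better-together-with.html)
- (2) [Netron](https://github.com/lutzroeder/Netron), a viewer for neural network models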