<a href="https://colab.research.google.com/github/Richish/deep_learning_with_python/blob/master/ch7_functional_api.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# How to use the functional API

In the functional API, you directly manipulate tensors, and you use layers as functions
that take tensors and return tensors (hence, the name functional API):

In [None]:
# what a functional API model looks like
from keras.models import Model
from keras.layers import Input, Dense

input_tensor=Input(shape=(64,))

layer1=Dense(units=32, activation="relu")
layer2=Dense(units=32, activation="relu")
layer3=Dense(units=10, activation="softmax")

x=layer1(input_tensor) # x is the output tensor of this layer
x=layer2(x)
output_tensor=layer3(x)

model=Model(input_tensor, output_tensor)
model.summary()




Using TensorFlow backend.


Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 64)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 32)                2080      
_________________________________________________________________
dense_2 (Dense)              (None, 32)                1056      
_________________________________________________________________
dense_3 (Dense)              (None, 10)                330       
Total params: 3,466
Trainable params: 3,466
Non-trainable params: 0
_________________________________________________________________
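
The parameter counts in this summary can be checked by hand: a Dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. A minimal sketch of that arithmetic (the helper name `dense_params` is made up for illustration and is not part of Keras):

```python
def dense_params(n_in, n_out):
    """Number of weights (n_in * n_out) plus one bias per unit."""
    return n_in * n_out + n_out

# input width followed by the number of units of each Dense layer
layer_sizes = [64, 32, 32, 10]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)
print(counts, total)  # [2080, 1056, 330] 3466
```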


When it comes to compiling, training, or evaluating such an instance of Model, the
API is the same as that of Sequential:

In [None]:
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
import numpy as np
x_train = np.random.random((1000, 64))
y_train = np.random.random((1000, 10))
model.fit(x_train, y_train, epochs=10, batch_size=128)
score = model.evaluate(x_train, y_train)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


# Multi Input Models

The functional API can be used to build models that have multiple inputs. Typically,
such models at some point merge their different input branches using a layer that can
combine several tensors: by adding them, concatenating them, and so on. This is usually
done via a Keras merge operation such as keras.layers.add, keras.layers.concatenate,
and so on. Let’s look at a very simple example of a multi-input model:
a question-answering model.
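
To make the difference between the two merge operations concrete, here is a small NumPy sketch. It is not Keras code: the arrays are placeholders standing in for the outputs of two branches, and the widths 32 and 16 are chosen only for illustration.

```python
import numpy as np

batch = 4
encoded_text = np.ones((batch, 32))      # stand-in for the output of one branch
encoded_question = np.ones((batch, 16))  # stand-in for the output of the other branch

# Concatenation joins the feature axes, so the branch widths may differ.
merged = np.concatenate([encoded_text, encoded_question], axis=-1)
print(merged.shape)  # (4, 48)

# Element-wise addition needs both tensors to have the same shape.
same_width = np.ones((batch, 32))
summed = encoded_text + same_width
print(summed.shape)  # (4, 32)
```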

## Functional API implementation of a two-input question-answering model

In [None]:
from keras.models import Model
from keras.layers import Embedding, concatenate, Dense, LSTM
from keras import Input

text_vocabulary_size=10_000
question_vocabulary_size=10_000
answer_vocabulary_size=500

text_input=Input(shape=(None,), dtype='int32', name='text')
embedded_text=Embedding(text_vocabulary_size, 64)(text_input)  # Embedding(input_dim, output_dim)
encoded_text=LSTM(32)(embedded_text)

question_input=Input(shape=(None,), dtype='int32', name='question')
embedded_question=Embedding(question_vocabulary_size, 32)(question_input)
encoded_question=LSTM(16)(embedded_question)

concatenated=concatenate([encoded_text, encoded_question], axis=-1)
answer=Dense(answer_vocabulary_size, activation='softmax')(concatenated)

model=Model([text_input, question_input], answer)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])
model.summary()



Model: "model_2"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
text (InputLayer)               (None, None)         0                                            
__________________________________________________________________________________________________
question (InputLayer)           (None, None)         0                                            
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, None, 64)     640000      text[0][0]                       
__________________________________________________________________________________________________
embedding_2 (Embedding)         (None, None, 32)     320000      question[0][0]                   
____________________________________________________________________________________________

## Training a multi-input model

Now, how do you train this two-input model? There are two possible APIs: you can feed
the model a list of Numpy arrays as inputs, or you can feed it a dictionary that maps
input names to Numpy arrays. Naturally, the latter option is available only if you give
names to your inputs.
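
The relationship between the two ways of feeding data can be sketched in plain Python: a dictionary keyed by input name carries the same arrays as a list, only without depending on their order. The helper `as_input_list` below is hypothetical and only illustrates the idea; it is not how Keras is implemented.

```python
import numpy as np

# order in which the inputs were passed to Model([...], ...)
input_names = ['text', 'question']

def as_input_list(data, names):
    """Return the arrays in model-input order, from a list or a name -> array dict."""
    if isinstance(data, dict):
        missing = [n for n in names if n not in data]
        if missing:
            raise KeyError(f"no data for inputs: {missing}")
        return [data[n] for n in names]
    return list(data)

text = np.zeros((2, 5), dtype='int32')
question = np.ones((2, 3), dtype='int32')

from_list = as_input_list([text, question], input_names)
# the dictionary can list the inputs in any order
from_dict = as_input_list({'question': question, 'text': text}, input_names)
same = all(a is b for a, b in zip(from_list, from_dict))
print(same)  # True
```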

In [None]:
import numpy as np

num_samples=1000
max_length=100

text=np.random.randint(low=1, high=text_vocabulary_size, size=(num_samples, max_length))
question=np.random.randint(low=1, high=question_vocabulary_size, size=(num_samples, max_length))
# one-hot encoded answers: one random class index per sample
answers=np.eye(answer_vocabulary_size)[np.random.randint(answer_vocabulary_size, size=num_samples)]

model.fit([text,question], answers, epochs=10, batch_size=128)

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.callbacks.History at 0x7f0e760b7b00>

In [None]:
# another way to fit the model, available because the inputs are named
model.fit({'text':text, 'question':question}, answers, epochs=10, batch_size=128)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.callbacks.History at 0x7f0e60f5ccf8>

# Multi output models

In [None]:
# functional API implementation of a multi-output model
# the input is a sequence of social media posts
# the 3 outputs are the age, gender and income group of the person writing the posts

from keras.models import Model
from keras.layers import Conv1D, Embedding, LSTM, MaxPooling1D, GlobalMaxPooling1D, Dense
from keras import Input

vocabulary_size=50_000
num_income_groups=10

posts_input=Input(shape=(None,), dtype='int32', name='posts')
embedded_posts=Embedding(vocabulary_size, 256)(posts_input)  # Embedding(input_dim, output_dim)

x=Conv1D(128, 5, activation='relu')(embedded_posts)
x=MaxPooling1D(5)(x)
x=Conv1D(256, 5, activation='relu')(x)
x=Conv1D(256, 5, activation='relu')(x)
x=MaxPooling1D(5)(x)
x=Conv1D(256, 5, activation='relu')(x)
x=Conv1D(256, 5, activation='relu')(x)
x=GlobalMaxPooling1D()(x)
x=Dense(128, activation="relu")(x)

age_prediction=Dense(1, name='age')(x)
gender_prediction=Dense(1, activation='sigmoid', name='gender')(x)
income_prediction=Dense(num_income_groups, activation='softmax', name='income')(x)

model=Model(posts_input, [age_prediction, income_prediction, gender_prediction])

model.summary()


Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
posts (InputLayer)              (None, None)         0                                            
__________________________________________________________________________________________________
embedding_5 (Embedding)         (None, None, 256)    12800000    posts[0][0]                      
__________________________________________________________________________________________________
conv1d_9 (Conv1D)               (None, None, 128)    163968      embedding_5[0][0]                
__________________________________________________________________________________________________
max_pooling1d_3 (MaxPooling1D)  (None, None, 128)    0           conv1d_9[0][0]                   
____________________________________________________________________________________________

In [None]:
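# The four calls below are alternatives; each compile replaces the previous one.

# Option 1: one loss per output, listed in the same order as the model outputs
model.compile(optimizer='rmsprop',
              loss=['mse', 'categorical_crossentropy', 'binary_crossentropy'])

# Option 2: equivalent to option 1, possible only because the output layers are named
model.compile(optimizer='rmsprop',
              loss={'age': 'mse',
                    'income': 'categorical_crossentropy',
                    'gender': 'binary_crossentropy'})

# Option 3: as option 1, with a weight for each loss's contribution to the total loss
model.compile(optimizer='rmsprop',
              loss=['mse', 'categorical_crossentropy', 'binary_crossentropy'],
              loss_weights=[0.25, 1., 10.])

# Option 4: equivalent to option 3, using output names
model.compile(optimizer='rmsprop',
              loss={'age': 'mse',
                    'income': 'categorical_crossentropy',
                    'gender': 'binary_crossentropy'},
              loss_weights={'age': 0.25,
                            'income': 1.,
                            'gender': 10.})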
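
When loss weights are given, the quantity being minimized is the weighted sum of the individual losses. The NumPy sketch below works through that sum for toy predictions. The three loss functions are simplified re-implementations written for this example (they are not the Keras ones), and the numbers are invented.

```python
import numpy as np

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    p = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    p = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(p), axis=-1)))

def total_loss(losses, weights):
    """Weighted sum of the per-output losses."""
    return sum(weights[name] * losses[name] for name in losses)

# toy targets and predictions for two samples
losses = {
    'age': mse(np.array([[30.], [40.]]), np.array([[32.], [38.]])),
    'income': categorical_crossentropy(np.array([[1., 0.], [0., 1.]]),
                                       np.array([[0.5, 0.5], [0.5, 0.5]])),
    'gender': binary_crossentropy(np.array([[1.], [0.]]), np.array([[0.5], [0.5]])),
}
weights = {'age': 0.25, 'income': 1., 'gender': 10.}
combined = total_loss(losses, weights)
print(losses['age'])       # 4.0
print(round(combined, 4))  # 8.6246
```

Here the unweighted age loss (4.0) is several times larger than either cross-entropy loss (about 0.69 each), which is the kind of imbalance the weights are meant to correct.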

In [None]:
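## training the model for multi-output
# posts and the three target arrays are random placeholders, defined only so that the cell runs
import numpy as np
posts = np.random.randint(1, vocabulary_size, size=(1000, 500))
age_targets = np.random.randint(18, 80, size=(1000, 1)).astype('float32')
income_targets = np.eye(num_income_groups)[np.random.randint(num_income_groups, size=1000)]
gender_targets = np.random.randint(0, 2, size=(1000, 1))
model.fit(posts, [age_targets, income_targets, gender_targets],
          epochs=10, batch_size=64)
# equivalent, possible only because the output layers are named
model.fit(posts, {'age': age_targets,
                  'income': income_targets,
                  'gender': gender_targets},
          epochs=10, batch_size=64)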

# Directed acyclic graph of layers

## Inception
An Inception network consists of a stack of modules
that themselves look like small independent networks, split into several parallel
branches.

The most basic form of an Inception module has three to four branches
starting with a 1 × 1 convolution, followed by a 3 × 3 convolution, and ending with the
concatenation of the resulting features.
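
Concatenating the branches along the channel axis only works if every branch produces the same spatial size. The sketch below uses the usual output-size formulas for 'valid' and 'same' padding to compare two branches; the helper `conv_output_size` and the 28-pixel input are assumptions made for illustration.

```python
import math

def conv_output_size(n, kernel, stride=1, padding='valid'):
    """Output length along one spatial axis of a convolution or pooling window."""
    if padding == 'same':
        return math.ceil(n / stride)
    return (n - kernel) // stride + 1  # 'valid': no padding

n = 28  # hypothetical input height and width

# branch with a single strided 1x1 convolution
a_valid = conv_output_size(n, kernel=1, stride=2)
# branch with a 1x1 convolution followed by a strided 3x3 convolution
b_valid = conv_output_size(conv_output_size(n, kernel=1), kernel=3, stride=2)
print(a_valid, b_valid)  # 14 13 -> sizes differ, the branches cannot be concatenated

# the same two branches with 'same' padding
a_same = conv_output_size(n, 1, 2, 'same')
b_same = conv_output_size(conv_output_size(n, 1, 1, 'same'), 3, 2, 'same')
print(a_same, b_same)    # 14 14 -> sizes match
```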

In [None]:
from keras.layers import Conv2D, concatenate, Input, AvgPool2D
from keras.models import Model

# symbolic 4D input: (batch, height, width, channels)
x=Input(shape=(None,None,256), dtype="float32", name='input1')

# padding='same' gives every branch the same spatial size, so they can be concatenated
branch_a=Conv2D(filters=128, kernel_size=1, strides=2, activation="relu", padding='same')(x)

branch_b=Conv2D(filters=128, kernel_size=1, activation='relu', padding='same')(x)
branch_b=Conv2D(filters=128, kernel_size=3, strides=2, activation='relu', padding='same')(branch_b)

branch_c=AvgPool2D(pool_size=(3,3), strides=2, padding='same')(x)
branch_c=Conv2D(filters=128, kernel_size=3, activation='relu', padding='same')(branch_c)

branch_d=Conv2D(filters=128, kernel_size=1, activation='relu', padding='same')(x)
branch_d=Conv2D(filters=128, kernel_size=3, activation='relu', padding='same')(branch_d)
branch_d=Conv2D(filters=128, kernel_size=3, activation='relu', strides=2, padding='same')(branch_d)

# the four branch outputs are joined along the channel axis
output=concatenate([branch_a, branch_b, branch_c, branch_d], axis=-1)

output

model=Model(x, output)

model.summary()


Model: "model_2"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input1 (InputLayer)             (None, None, None, 2 0                                            
__________________________________________________________________________________________________
conv2d_20 (Conv2D)              (None, None, None, 1 32896       input1[0][0]                     
__________________________________________________________________________________________________
conv2d_17 (Conv2D)              (None, None, None, 1 32896       input1[0][0]                     
__________________________________________________________________________________________________
average_pooling2d_3 (AveragePoo (None, None, None, 2 0           input1[0][0]                     
____________________________________________________________________________________________

In [None]:
from keras.datasets import mnist
from keras.layers import Conv2D, Dense, concatenate, Input, AvgPool2D
from keras.models import Model
from keras.optimizers import RMSprop

# MNIST digits: 28x28 grayscale images with integer labels
(x_train, y_train), (x_val, y_val)=mnist.load_data()
x_train.shape, y_train.shape

((60000, 28, 28), (60000,))

In [None]:
from keras.applications.inception_v3 import InceptionV3
incp=InceptionV3()
incp.summary()

In [None]:
incp.compile(optimizer='rmsprop', loss='mse', metrics=['acc'])

In [None]:
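# This call fails with a ValueError: InceptionV3() with default arguments expects 299x299 RGB
# images and 1000-class targets, while MNIST provides 28x28 grayscale images with integer labels.
incp.fit(x_train, y_train, batch_size=128, epochs=20, validation_data=(x_val, y_val) )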

ValueError: ignored

# Residual connection

A residual connection consists of making the output of an earlier layer available as
input to a later layer, effectively creating a shortcut in a sequential network. Rather
than being concatenated to the later activation, the earlier output is summed with the
later activation, which assumes that both activations are the same size. If they’re different
sizes, you can use a linear transformation to reshape the earlier activation into the
target shape (for example, a Dense layer without an activation or, for convolutional
feature maps, a 1 × 1 convolution without an activation).
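
The two cases can be sketched with NumPy arrays standing in for feature maps. This is an illustration of the arithmetic only, not Keras code: the random arrays are placeholders, and `pointwise_linear` is a made-up helper that applies one linear map to the channel vector at every position, which is what a 1 × 1 convolution without activation or bias computes.

```python
import numpy as np

rng = np.random.default_rng(0)

def pointwise_linear(x, w):
    """Apply the same linear map to the channels at every spatial position."""
    return x @ w

x = rng.normal(size=(1, 8, 8, 256))          # earlier activation: (batch, height, width, channels)
block_out = rng.normal(size=(1, 8, 8, 128))  # stand-in for the later activation

# Channel counts differ (256 vs 128), so x is projected to the target shape before adding.
w_proj = rng.normal(size=(256, 128)) * 0.01
residual = pointwise_linear(x, w_proj)
y = block_out + residual
print(y.shape)  # (1, 8, 8, 128)

# When the shapes already match, the shortcut is simply the identity.
same_shape_out = rng.normal(size=x.shape)
y_identity = same_shape_out + x
print(y_identity.shape)  # (1, 8, 8, 256)
```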


In [None]:
"""
Here’s how to implement a residual connection in Keras when the feature-map
sizes are the same, using identity residual connections. This example assumes the existence
of a 4D input tensor x:
"""
from keras import layers

x=Input(shape=(None,None,128), dtype="float32", name='input1')
y = layers.Conv2D(128, 3, activation='relu', padding='same')(x)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)
y = layers.add([y, x])
model=Model(x,y)
model.summary()

Model: "model_5"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input1 (InputLayer)             (None, None, None, 1 0                                            
__________________________________________________________________________________________________
conv2d_220 (Conv2D)             (None, None, None, 1 147584      input1[0][0]                     
__________________________________________________________________________________________________
conv2d_221 (Conv2D)             (None, None, None, 1 147584      conv2d_220[0][0]                 
__________________________________________________________________________________________________
conv2d_222 (Conv2D)             (None, None, None, 1 147584      conv2d_221[0][0]                 
____________________________________________________________________________________________

In [None]:
"""
And the following implements a residual connection when the feature-map sizes differ,
using a linear residual connection
"""
x=Input(shape=(None,None,256), dtype="float32", name='input1')
y = layers.Conv2D(128, 3, activation='relu', padding='same')(x)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)
y = layers.MaxPooling2D(2, strides=2)(y)
residual = layers.Conv2D(128, 1, strides=2, padding='same')(x)
y = layers.add([y, residual])

model=Model(x,y)
model.summary()

Model: "model_7"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input1 (InputLayer)             (None, None, None, 2 0                                            
__________________________________________________________________________________________________
conv2d_229 (Conv2D)             (None, None, None, 1 295040      input1[0][0]                     
__________________________________________________________________________________________________
conv2d_230 (Conv2D)             (None, None, None, 1 147584      conv2d_229[0][0]                 
__________________________________________________________________________________________________
max_pooling2d_11 (MaxPooling2D) (None, None, None, 1 0           conv2d_230[0][0]                 
____________________________________________________________________________________________

## Vanishing gradients in deep learning
Backpropagation, the master algorithm used to train deep neural networks, works by
propagating a feedback signal from the output loss down to earlier layers. If this feedback
signal has to be propagated through a deep stack of layers, the signal may
become tenuous or even be lost entirely, rendering the network untrainable. This
issue is known as vanishing gradients.
This problem occurs both with deep networks and with recurrent networks over very
long sequences—in both cases, a feedback signal must be propagated through a
long series of operations. You’re already familiar with the solution that the LSTM layer
uses to address this problem in recurrent networks: it introduces a carry track that
propagates information parallel to the main processing track. Residual connections
work in a similar way in feedforward deep networks, but they’re even simpler: they
introduce a purely linear information carry track parallel to the main layer stack, thus
helping to propagate gradients through arbitrarily deep stacks of layers.
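
A scalar toy calculation shows why the linear carry track helps. It assumes, purely for illustration, that every layer's transformation f has the same small derivative: through a plain stack the gradient is a product of those derivatives, while through residual blocks y = x + f(x) each factor becomes 1 + f'(x).

```python
depth = 50
local_grad = 0.01  # assumed derivative f'(x) of every layer's transformation

plain = 1.0     # gradient through a plain stack: product of f'(x)
residual = 1.0  # gradient through residual blocks: product of (1 + f'(x))
for _ in range(depth):
    plain *= local_grad
    residual *= 1.0 + local_grad

print(f"{plain:.1e}")     # 1.0e-100
print(f"{residual:.2f}")  # 1.64
```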

## Layer weight sharing
One more important feature of the functional API is the ability to reuse a layer
instance several times. When you call a layer instance twice, instead of instantiating a
new layer for each call, you reuse the same weights with every call. This allows you to
build models that have shared branches—several branches that all share the same
knowledge and perform the same operations. That is, they share the same representations
and learn these representations simultaneously for different sets of inputs.
For example, consider a model that attempts to assess the semantic similarity
between two sentences. The model has two inputs (the two sentences to compare)
and outputs a score between 0 and 1, where 0 means unrelated sentences and 1 means
sentences that are either identical or reformulations of each other. Such a model
could be useful in many applications, including deduplicating natural-language queries
in a dialog system.
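
The idea of reusing one set of weights for both inputs can be sketched without Keras. `SharedEncoder` below is a hypothetical stand-in for a layer instance: it owns a single weight matrix and applies it on every call.

```python
import numpy as np

class SharedEncoder:
    """A tiny stand-in for a layer: one weight matrix, reused on every call."""
    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=(in_dim, out_dim))
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return np.tanh(x @ self.w)

encoder = SharedEncoder(in_dim=128, out_dim=32)
sentence_a = np.ones((1, 128))  # placeholder sentence representations
sentence_b = np.ones((1, 128))

left = encoder(sentence_a)
right = encoder(sentence_b)

print(encoder.calls)             # 2
print(encoder.w.size)            # 4096 parameters, created once
print(np.allclose(left, right))  # True
```

Because both branches go through the same weights, identical inputs always receive identical encodings, whichever side they arrive on.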

In [None]:
from keras import layers
from keras import Input
from keras.models import Model

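# a single LSTM layer instance, created once
lstm = layers.LSTM(32)
# left branch: variable-length sequences of 128-dimensional vectors
left_input = Input(shape=(None, 128))
left_output = lstm(left_input)
# right branch: calling the same instance again reuses its weights
right_input = Input(shape=(None, 128))
right_output = lstm(right_input)
merged = layers.concatenate([left_output, right_output], axis=-1)
predictions = layers.Dense(1, activation='sigmoid')(merged)
model = Model([left_input, right_input], predictions)
model.compile(optimizer='rmsprop', loss='binary_crossentropy')

# left_data, right_data and targets are random placeholders so that the cell runs
import numpy as np
left_data = np.random.random((100, 20, 128))
right_data = np.random.random((100, 20, 128))
targets = np.random.randint(0, 2, size=(100, 1))
model.fit([left_data, right_data], targets)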

## Models as layers
Importantly, in the functional API, models can be used as you’d use layers: effectively,
you can think of a model as a “bigger layer.” This is true of both the Sequential and
Model classes. This means you can call a model on an input tensor and retrieve an output
tensor:

    y = model(x)

If the model has multiple input tensors and multiple output tensors, it should be
called with a list of tensors:

    y1, y2 = model([x1, x2])

When you call a model instance, you’re reusing the weights of the model, exactly like
what happens when you call a layer instance. Calling an instance, whether it’s a layer
instance or a model instance, will always reuse the existing learned representations of
the instance, which is intuitive.

One simple practical example of what you can build by reusing a model instance is
a vision model that uses a dual camera as its input: two parallel cameras, a few centimeters
(one inch) apart. Such a model can perceive depth, which can be useful in
many applications. You shouldn’t need two independent models to extract visual
features from the left camera and the right camera before merging the two feeds:

In [None]:
from keras import layers
from keras import applications
from keras import Input

# one convolutional base (no pretrained weights, no classifier on top), shared by both inputs
xception_base = applications.Xception(weights=None,
                                      include_top=False)
left_input = Input(shape=(250, 250, 3))
right_input = Input(shape=(250, 250, 3))
# calling the same model instance twice reuses its weights
left_features = xception_base(left_input)
right_features = xception_base(right_input)
merged_features = layers.concatenate([left_features, right_features], axis=-1)
merged_features.shape

TensorShape([None, 8, 8, 4096])

# Keras callbacks and TensorBoard

## Using callbacks to act on a model during training

A callback is an object (a class instance implementing specific methods) that is passed to
the model in the call to fit and that is called by the model at various points during
training. It has access to all the available data about the state of the model and its performance,
and it can take action: interrupt training, save a model, load a different
weight set, or otherwise alter the state of the model.
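
The mechanism can be sketched in plain Python. Everything below is a toy written for illustration (the class names and the `ToyTrainer` loop are made up, and nothing is actually trained): the loop calls each callback's hooks at fixed points and lets a callback ask for training to stop.

```python
class Callback:
    """Base class: every hook is a no-op unless a subclass overrides it."""
    trainer = None

    def on_train_begin(self, logs=None):
        pass

    def on_epoch_end(self, epoch, logs=None):
        pass

    def on_train_end(self, logs=None):
        pass


class Recorder(Callback):
    """Logs which hooks were called, in order."""
    def __init__(self):
        self.events = []

    def on_train_begin(self, logs=None):
        self.events.append('train_begin')

    def on_epoch_end(self, epoch, logs=None):
        self.events.append(('epoch_end', epoch, logs['loss']))

    def on_train_end(self, logs=None):
        self.events.append('train_end')


class StopBelow(Callback):
    """Asks the trainer to stop once the loss drops below a threshold."""
    def __init__(self, threshold):
        self.threshold = threshold

    def on_epoch_end(self, epoch, logs=None):
        if logs['loss'] < self.threshold:
            self.trainer.stop_training = True


class ToyTrainer:
    """Replays precomputed losses instead of training anything."""
    def __init__(self, epoch_losses):
        self.epoch_losses = epoch_losses
        self.stop_training = False

    def fit(self, callbacks):
        for cb in callbacks:
            cb.trainer = self  # give each callback access to the trainer's state
            cb.on_train_begin()
        for epoch, loss in enumerate(self.epoch_losses):
            for cb in callbacks:
                cb.on_epoch_end(epoch, logs={'loss': loss})
            if self.stop_training:
                break
        for cb in callbacks:
            cb.on_train_end()


recorder = Recorder()
ToyTrainer([1.0, 0.6, 0.3, 0.2, 0.1]).fit([recorder, StopBelow(0.5)])
print(recorder.events)
```

With these losses the run stops after the third epoch (index 2), the first one whose loss is below 0.5.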

Here are some examples of ways you can use callbacks:

- Model checkpointing: saving the current weights of the model at different points
  during training.
- Early stopping: interrupting training when the validation loss is no longer
  improving (and of course, saving the best model obtained during training).
- Dynamically adjusting the value of certain parameters during training, such as the
  learning rate of the optimizer.
- Logging training and validation metrics during training, or visualizing the representations
  learned by the model as they’re updated. The Keras progress bar that you’re
  familiar with is a callback!

The keras.callbacks module includes a number of built-in callbacks (this is not an
exhaustive list):

- keras.callbacks.ModelCheckpoint
- keras.callbacks.EarlyStopping
- keras.callbacks.LearningRateScheduler
- keras.callbacks.ReduceLROnPlateau
- keras.callbacks.CSVLogger

### The ModelCheckpoint and EarlyStopping callbacks
You can use the EarlyStopping callback to interrupt training once a target metric
being monitored has stopped improving for a fixed number of epochs. For instance,
this callback allows you to interrupt training as soon as you start overfitting, thus
avoiding having to retrain your model for a smaller number of epochs. This callback is
typically used in combination with ModelCheckpoint, which lets you continually save
the model during training (and, optionally, save only the current best model so far:
the version of the model that achieved the best performance at the end of an epoch).
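
The stopping rule itself is simple enough to sketch in plain Python. The function below is a simplified illustration, not the Keras implementation (it ignores options such as a minimum improvement threshold), and the loss values are invented.

```python
def early_stopping(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses."""
    best = float('inf')
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch  # ran to the end without stopping

val_losses = [0.90, 0.70, 0.65, 0.66, 0.68, 0.64]
print(early_stopping(val_losses, patience=1))  # (3, 2)
print(early_stopping(val_losses, patience=2))  # (4, 2)
print(early_stopping(val_losses, patience=3))  # (5, 5)
```

A small patience stops at the first bump in the curve, while a larger one would have found the better loss at the last epoch. In each case the best epoch is the one a checkpoint that saves only the best model would keep.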

In [None]:
from keras.callbacks import EarlyStopping, ModelCheckpoint

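callbacks_list=[
    # stop when val_loss has not improved for 1 epoch, and restore the best weights seen
    EarlyStopping(monitor='val_loss', patience=1, restore_best_weights=True),
    # overwrite savedModel.h5 only when val_loss improves
    ModelCheckpoint(filepath='savedModel.h5', monitor='val_loss', save_best_only=True)
]

model.compile(optimizer='rmsprop', metrics=['acc'], loss='binary_crossentropy')

# x, y, x_val and y_val stand for your training and validation arrays
model.fit(x, y,
        epochs=10,
        batch_size=32,
        callbacks=callbacks_list,
        validation_data=(x_val, y_val))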

### The ReduceLROnPlateau callback
You can use this callback to reduce the learning rate when the validation loss has
stopped improving. Reducing or increasing the learning rate in case of a loss plateau
is an effective strategy to get out of local minima during training.
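
The rule behind this callback can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the Keras implementation: it has no cooldown or minimum-improvement options, and the loss values are invented.

```python
def reduce_on_plateau(val_losses, lr, factor=0.1, patience=2, min_lr=0.0):
    """Return the learning rate in effect after each epoch."""
    best = float('inf')
    wait = 0  # epochs since the last improvement
    schedule = []
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)  # shrink the learning rate
                wait = 0
        schedule.append(lr)
    return schedule

schedule = reduce_on_plateau([1.0, 0.8, 0.8, 0.8, 0.7, 0.7, 0.7],
                             lr=0.1, factor=0.5, patience=2)
print(schedule)  # [0.1, 0.1, 0.1, 0.05, 0.05, 0.05, 0.025]
```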

In [None]:
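import keras

callbacks_list = [
    keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss',  # watch the validation loss
        factor=0.1,          # multiply the learning rate by 0.1 when triggered
        patience=10,         # trigger after 10 epochs without improvement
    )
]
# x, y, x_val and y_val stand for your training and validation arrays
model.fit(x, y,
        epochs=10,
        batch_size=32,
        callbacks=callbacks_list,
        validation_data=(x_val, y_val))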

### Writing your own callback
If you need to take a specific action during training that isn’t covered by one of the
built-in callbacks, you can write your own callback. Callbacks are implemented by subclassing
the class keras.callbacks.Callback. You can then implement any number
of the following transparently named methods, which are called at various points
during training:

- on_epoch_begin: called at the start of every epoch
- on_epoch_end: called at the end of every epoch
- on_batch_begin: called right before processing each batch
- on_batch_end: called right after processing each batch
- on_train_begin: called at the start of training
- on_train_end: called at the end of training

These methods are all called with a logs argument, which is a dictionary containing
information about the previous batch, epoch, or training run: training and validation
metrics, and so on. Additionally, the callback has access to the following attributes:

- self.model: the model instance from which the callback is being called
- self.validation_data: the value of what was passed to fit as validation data

Here’s a simple example of a custom callback that saves to disk (as Numpy arrays) the
activations of every layer of the model at the end of every epoch, computed on the
first sample of the validation set:

In [None]:
import keras
import numpy as np

class ActivationLogger(keras.callbacks.Callback):
    def set_model(self, model):
        # called by the parent model before training, to tell the callback which model is calling it
        self.model = model
        layer_outputs = [layer.output for layer in model.layers]
        # model instance that returns the activations of every layer
        self.activations_model = keras.models.Model(model.input, layer_outputs)

    def on_epoch_end(self, epoch, logs=None):
        if self.validation_data is None:
            raise RuntimeError('Requires validation_data.')
        # first input sample of the validation data
        validation_sample = self.validation_data[0][0:1]
        activations = self.activations_model.predict(validation_sample)
        # np.savez needs a file opened in binary mode; each layer's activations become one array
        with open('activations_at_epoch_' + str(epoch) + '.npz', 'wb') as f:
            np.savez(f, *activations)
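
Files written by np.savez can be read back with np.load. The sketch below assumes that each layer's activations were passed to np.savez as a separate positional array, in which case NumPy stores them under the keys arr_0, arr_1, and so on. The arrays here are placeholders, and an in-memory buffer stands in for the file on disk.

```python
import io
import numpy as np

# placeholder activations with a different shape for each layer
activations = [np.zeros((1, 10, 32)), np.ones((1, 32)), np.full((1, 1), 0.5)]

buffer = io.BytesIO()           # in-memory stand-in for a file opened in binary mode
np.savez(buffer, *activations)  # positional arrays are stored as arr_0, arr_1, ...
buffer.seek(0)

with np.load(buffer) as archive:
    restored = [archive[f"arr_{i}"] for i in range(len(activations))]

print([a.shape for a in restored])  # [(1, 10, 32), (1, 32), (1, 1)]
```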



# TensorBoard


In [None]:
# text classification on imdb: 2000 most frequent words, reviews padded or truncated to 500 words

from keras.layers import Embedding, Conv1D, Dense, MaxPooling1D, GlobalMaxPooling1D
from keras.preprocessing import sequence
import numpy as np
from keras.models import Sequential
from keras.datasets import imdb

max_features=2000
max_len=500

(x_train, y_train), (x_val, y_val) = imdb.load_data(num_words=max_features)
x_train=sequence.pad_sequences(x_train, maxlen=max_len)
x_val=sequence.pad_sequences(x_val, maxlen=max_len)

model=Sequential()
model.add(Embedding(input_dim=max_features, output_dim=128))
model.add(Conv1D(filters=32, kernel_size=7, activation='relu'))
model.add(MaxPooling1D(pool_size=5))
model.add(Conv1D(filters=32, kernel_size=7, activation='relu'))
model.add(GlobalMaxPooling1D())
model.add(Dense(units=1, activation='sigmoid'))  # sigmoid output to match binary_crossentropy
model.summary()
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['acc'])

Using TensorFlow backend.


Downloading data from https://s3.amazonaws.com/text-datasets/imdb.npz
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, None, 128)         256000    
_________________________________________________________________
conv1d_1 (Conv1D)            (None, None, 32)          28704     
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, None, 32)          0         
_________________________________________________________________
conv1d_2 (Conv1D)            (None, None, 32)          7200      
_________________________________________________________________
global_max_pooling1d_1 (Glob (None, 32)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 33        
Total params: 291,937
Trainable params: 291,937
No

In [None]:
# Creating a directory for TensorBoard log files
! mkdir my_tensorboard_log_dir

In [None]:
# Training the model with a TensorBoard callback
from keras.callbacks import TensorBoard

callbacks=[
           TensorBoard(log_dir="my_tensorboard_log_dir", histogram_freq=1, embeddings_freq=1, update_freq='epoch')
]

history=model.fit(x=x_train, y=y_train, batch_size=128, epochs=10, callbacks=callbacks, validation_split=0.2)

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Train on 20000 samples, validate on 5000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [None]:
# launching tensorboard server
!tensorboard --logdir=my_tensorboard_log_dir --bind_all

2020-05-27 01:19:11.929438: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "/usr/local/bin/tensorboard", line 5, in <module>
    from tensorboard.main import run_main
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/main.py", line 43, in <module>
    from tensorboard import default
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/default.py", line 40, in <module>
    from tensorboard.plugins.beholder import beholder_plugin_loader
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/plugins/beholder/__init__.py", line 18, in <module>
    import tensorflow
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/__init__.py", line 84, in <module>
    from tensorflow.python import keras