The **Sequential model** makes the assumption that the network has exactly one input and exactly one output, and that it consists of a linear stack of layers.  

But this set of assumptions is too inflexible in a number of cases. Some networks require **several independent inputs** (**Multi-input Models**), others require **multiple outputs**, and some networks have internal branching between layers that makes them look like **graphs of layers** rather than linear stacks of layers.

## Introduction to the functional API

In the functional API, you directly manipulate tensors, and you use layers as functions that take tensors and return tensors (hence, the name functional API):

In [1]:
from keras import Input, layers

input_tensor = Input(shape=(32, ))              # a tensor

dense = layers.Dense(32, activation='relu')     # a layer is a function

output_tensor = dense(input_tensor)             # a layer may be called on a tensor, and it returns a tensor

Using TensorFlow backend.


In [2]:
# Sequential vs Functional API
from keras.models import Sequential, Model
from keras import layers
from keras import Input

# Sequential
seq_model = Sequential()
seq_model.add(layers.Dense(32, activation='relu', input_shape=(64, )))
seq_model.add(layers.Dense(32, activation='relu'))
seq_model.add(layers.Dense(10, activation='softmax'))

# Functional equivalent
input_tensor = Input(shape=(64,))
x = layers.Dense(32, activation='relu')(input_tensor)
x = layers.Dense(32, activation='relu')(x)
output_tensor = layers.Dense(10, activation='softmax')(x)

model = Model(input_tensor, output_tensor)          # the Model class turns an input tensor and output tensor into a model

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         (None, 64)                0         
_________________________________________________________________
dense_5 (Dense)              (None, 32)                2080      
_________________________________________________________________
dense_6 (Dense)              (None, 32)                1056      
_________________________________________________________________
dense_7 (Dense)              (None, 10)                330       
Total params: 3,466
Trainable params: 3,466
Non-trainable params: 0
_________________________________________________________________


In [3]:
seq_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_2 (Dense)              (None, 32)                2080      
_________________________________________________________________
dense_3 (Dense)              (None, 32)                1056      
_________________________________________________________________
dense_4 (Dense)              (None, 10)                330       
Total params: 3,466
Trainable params: 3,466
Non-trainable params: 0
_________________________________________________________________
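
The parameter counts in both summaries can be checked by hand. A Dense layer with `n_in` inputs and `n_out` units holds an `n_in × n_out` weight matrix plus `n_out` biases. The helper name `dense_params` below is made up for this illustration and is not part of Keras.

```python
def dense_params(n_in, n_out):
    """Number of trainable parameters in a fully connected layer with bias."""
    return n_in * n_out + n_out

# Layer sizes of the model above: 64 -> 32 -> 32 -> 10
sizes = [64, 32, 32, 10]
per_layer = [dense_params(a, b) for a, b in zip(sizes, sizes[1:])]
total = sum(per_layer)
print(per_layer, total)
```

Both models report the same numbers because they stack the same three layers; only the way they are assembled differs.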


The only part that may seem a bit magical at this point is instantiating a Model object using only an input tensor and an output tensor. Behind the scenes, Keras retrieves every layer involved in going from input_tensor to output_tensor, bringing them together into a graph-like data structure—a Model. Of course, the reason it works is that output_tensor was obtained by repeatedly transforming input_tensor. If you tried to build a model from inputs and outputs that weren’t related, you’d get a RuntimeError:

In [4]:
unrelated_input = Input(shape=(32,))
bad_model = Model(unrelated_input, output_tensor)

RuntimeError: Graph disconnected: cannot obtain value for tensor Tensor("input_2:0", shape=(?, 64), dtype=float32) at layer "input_2". The following previous layers were accessed without issue: []
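
To make the idea of "retrieving every layer between input and output" more concrete, here is a toy sketch in plain Python. It is not how Keras is implemented; the `Node` class and both helper functions are hypothetical names invented for this note. Each node remembers its parents, and the check walks backward from the output to see which source nodes it depends on.

```python
class Node:
    """A toy stand-in for a tensor: it remembers which nodes produced it."""
    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)

def source_nodes(output):
    """Walk backward from `output` and collect the names of nodes without parents."""
    sources, stack, seen = set(), [output], set()
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        if not node.parents:
            sources.add(node.name)
        stack.extend(node.parents)
    return sources

def check_connected(inputs, output):
    """Raise if `output` depends on a source that is not among `inputs`."""
    missing = source_nodes(output) - {n.name for n in inputs}
    if missing:
        raise RuntimeError("Graph disconnected: missing " + ", ".join(sorted(missing)))
    return True

inp = Node("input_2")
hidden = Node("dense_5", [inp])
out = Node("dense_7", [hidden])
unrelated = Node("input_3")

check_connected([inp], out)            # fine: out was derived from inp
try:
    check_connected([unrelated], out)  # out never touched this input
except RuntimeError as err:
    message = str(err)
    print(message)
```

The second call fails for the same reason as the cell above: the output was computed from a different input than the one handed to the model.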

In [5]:
# compiling, training, and evaluating such a Model instance works the same way as with Sequential

# compile
model.compile(optimizer='rmsprop',
             loss='categorical_crossentropy')

In [6]:
import numpy as np
x_train = np.random.random((1000, 64))
y_train = np.random.random((1000, 10))

# train
model.fit(x_train, y_train, epochs=10, batch_size=128)

# evaluate
score = model.evaluate(x_train, y_train)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


## Multi-input Models

A typical example is a question-answering model. Such models usually have two inputs: a natural-language question and a text snippet (such as a news article) providing information to be used for answering the question.  

The model must then produce an answer: in the simplest possible setup, this is a one-word answer obtained via a softmax over some predefined vocabulary.

Following is an example of how you can build such a model with the functional API. You set up two independent branches, encoding the text input and the question input as representation vectors; then, concatenate these vectors; and finally, add a softmax classifier on top of the concatenated representations.
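
Before looking at the Keras version, the "concatenate, then softmax" step can be sketched with NumPy alone. This is only a shape-level illustration under assumed values: the vector sizes 32 and 16 mirror the LSTM sizes used in the listing further down, while the random arrays and the weight matrix `W` are placeholders and not trained values.

```python
import numpy as np

rng = np.random.default_rng(0)
batch = 4
encoded_text = rng.normal(size=(batch, 32))       # stand-in for the text branch output
encoded_question = rng.normal(size=(batch, 16))   # stand-in for the question branch output

# Join the two representations along the last (feature) axis.
concatenated = np.concatenate([encoded_text, encoded_question], axis=-1)

# A softmax classifier: linear map to the vocabulary size, then normalize.
answer_vocabulary_size = 500
W = rng.normal(scale=0.01, size=(concatenated.shape[-1], answer_vocabulary_size))
logits = concatenated @ W
logits -= logits.max(axis=-1, keepdims=True)      # subtract the row maximum for numerical stability
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
predicted_word_ids = probs.argmax(axis=-1)        # one answer word id per sample
```

Each row of `probs` is a probability distribution over the answer vocabulary, and the one-word answer is the entry with the highest probability.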

**Note**: Here is a useful note for anyone reading this book and trying to run the exercises. In the embedding layers below, the book gives the following lines:
                                             
                                              ...
                embedded_text = layers.Embedding(64, text_vocabulary_size)(text_input)
                                              ...
                embedded_question = layers.Embedding(32, question_vocabulary_size)(question_input)
                
The correct version is the following:
                            
                                              ...
                embedded_text = layers.Embedding(text_vocabulary_size, 64)(text_input)
                                              ...
                embedded_question = layers.Embedding(question_vocabulary_size, 32)(question_input)
                
**Where I found the solution**: The following page helped me fix this: https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/ , which says:

The embedding must specify 3 arguments:

* **input_dim**: This is the size of the vocabulary in the text data. For example, if your data is integer encoded to values between 0-10, then the size of the vocabulary would be 11 words.
* **output_dim**: This is the size of the vector space in which words will be embedded. It defines the size of the output vectors from this layer for each word. For example, it could be 32 or 100 or even larger. Test different values for your problem.
* **input_length**: This is the length of input sequences, as you would define for any input layer of a Keras model. For example, if all of your input documents are comprised of 1000 words, this would be 1000.

So in the book the places of **input_dim** and **output_dim** are switched.
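
A small NumPy sketch shows why the order matters. It treats the embedding as a lookup table with one row per token id, which is the usual way to think about it; this is an illustration and not the Keras implementation. The sizes 10 and 4 are arbitrary values chosen for the example.

```python
import numpy as np

vocabulary_size = 10   # input_dim: number of distinct token ids (0..9)
embedding_dim = 4      # output_dim: length of each word vector

rng = np.random.default_rng(0)
table = rng.normal(size=(vocabulary_size, embedding_dim))  # one row per token id

token_ids = np.array([[1, 5, 9],
                      [0, 2, 2]])    # batch of 2 sequences, 3 tokens each
embedded = table[token_ids]          # row lookup per token id

# With the arguments swapped, the table has only `embedding_dim` rows,
# so any token id >= embedding_dim falls outside the table.
swapped_table = rng.normal(size=(embedding_dim, vocabulary_size))
try:
    swapped_table[token_ids]
    lookup_failed = False
except IndexError:
    lookup_failed = True
```

With the correct order, the lookup returns one vector of length `embedding_dim` per token; with the swapped order, token ids 5 and 9 have no row to look up.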

In [31]:
from keras.models import Model
from keras import layers
from keras import Input

text_vocabulary_size = 10000
question_vocabulary_size = 10000
answer_vocabulary_size = 500

text_input = Input(shape=(None,),                                            
                   dtype='int32', 
                   name='text')                                        # the text input is a variable-length sequence of integers. Note that you can optionally name the inputs.
embedded_text = layers.Embedding(text_vocabulary_size, 64)(text_input) # embeds the inputs into a sequence of vectors of size 64
encoded_text = layers.LSTM(32)(embedded_text)                          # encodes the vectors in a single vector via an LSTM

question_input = Input(shape=(None,), 
                       dtype='int32',
                      name='question')                                 # Same process (with different layer instance) for the question
embedded_question = layers.Embedding(question_vocabulary_size, 32)(question_input)
encoded_question = layers.LSTM(16)(embedded_question)

concatenated = layers.concatenate([encoded_text, encoded_question], 
                                 axis=-1)                              # concatenates the encoded question and encoded text
answer = layers.Dense(answer_vocabulary_size,
                     activation='softmax')(concatenated)                # Adds a softmax classifier on top

model = Model([text_input, question_input], answer)       # At model instantiation, you specify the two inputs and the output

# compile the model
model.compile(optimizer='rmsprop',
             loss='categorical_crossentropy',
             metrics=['acc'])

There are two possible APIs: feed the model a list of Numpy arrays as inputs, or feed it a dictionary that maps input names to Numpy arrays (if you give names to your inputs).

In [32]:
# Feeding data to a multi-input model
import numpy as np

num_samples = 1000
max_length = 100

text = np.random.randint(1, text_vocabulary_size,
                        size=(num_samples, max_length))                   # Generates dummy Numpy data
question = np.random.randint(1, question_vocabulary_size,
                            size=(num_samples, max_length))
answer_indices = np.random.randint(0, answer_vocabulary_size,
                                   size=num_samples)                      # one random class index per sample
answers = np.eye(answer_vocabulary_size)[answer_indices]                  # answers are one-hot encoded, not integers

In [34]:
# fitting using a list of inputs
model.fit([text, question], answers, epochs=10, batch_size=128)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x12df1f390>

In [35]:
# fitting using a dictionary of inputs (only if inputs are named)
model.fit({'text': text, 'question': question}, answers,
         epochs=10, batch_size=128)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1335d2da0>

## Multi-output models

In the same way, you can use the functional API to build models with multiple outputs (or multiple **heads**). A simple example is a network that attempts to simultaneously predict different properties of the data, such as a network that takes as input a series of social media posts from a single anonymous person and tries to predict attributes of that person, such as age, gender, and income level.

**Note:** The book has the same argument-order error in the Embedding step here as the one discussed above. Here are the wrong and right ways:

             embedded_posts = layers.Embedding(256, vocabulary_size)(posts_input)   # wrong
             embedded_posts = layers.Embedding(vocabulary_size, 256)(posts_input)   # right



  

**Note**: I commented out the second set of convolutional layers because, with those layers and the short dummy sequences used below, the computed output length would be negative.
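
A rough length calculation shows where the negative size comes from. The sketch assumes sequences of length 100 (the `max_length` of the dummy data generated later) and the Keras defaults of `'valid'` padding, stride 1 for `Conv1D`, and a stride equal to the pool size for `MaxPooling1D`. The two helper functions are hypothetical names used only for this calculation.

```python
def conv1d_len(length, kernel_size):
    """Output length of a 1D convolution with 'valid' padding and stride 1."""
    return length - kernel_size + 1

def maxpool1d_len(length, pool_size):
    """Output length of 1D max pooling with 'valid' padding and stride == pool_size."""
    return length // pool_size

length = 100                          # max_length of the dummy posts
length = conv1d_len(length, 5)        # 96
length = maxpool1d_len(length, 5)     # 19
length = conv1d_len(length, 5)        # 15
length = conv1d_len(length, 5)        # 11
length = maxpool1d_len(length, 5)     # 2
after_second_pool = length
length = conv1d_len(length, 5)        # -2: the sequence is shorter than the kernel
print(after_second_pool, length)
```

After the second pooling step only 2 time steps remain, which is fewer than the kernel size of 5 in the next convolution. Longer input sequences would avoid the problem without removing any layers.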

In [87]:
# functional API implementation of a three-output model
from keras import layers
from keras import Input
from keras.models import Model

vocabulary_size = 50000
num_income_groups = 10

posts_input = Input(shape=(None,), dtype='int32', name='posts')
embedded_posts = layers.Embedding(vocabulary_size, 256)(posts_input)
x = layers.Conv1D(128, 5, activation='relu')(embedded_posts)
x = layers.MaxPooling1D(5)(x)
#x = layers.Conv1D(256, 5, activation='relu')(x)
#x = layers.Conv1D(256, 5, activation='relu')(x)
#x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.GlobalMaxPooling1D()(x)
x = layers.Dense(128, activation='relu')(x)

age_prediction = layers.Dense(1, name='age')(x)                      # Note that output layers are given names
income_prediction = layers.Dense(num_income_groups,
                                activation='softmax',
                                name='income')(x)
gender_prediction = layers.Dense(1, activation='sigmoid', name='gender')(x)

model = Model(posts_input,
             [age_prediction, income_prediction, gender_prediction])

In [88]:
# Compilation options of a multi-output model: multiple losses
## the different loss values are summed into a global loss, which is minimized during training.
model.compile(optimizer='rmsprop',
             loss=['mse', 'categorical_crossentropy', 'binary_crossentropy'])

# equivalent (possible only if you give names to the output layers)
model.compile(optimizer='rmsprop',
             loss={'age': 'mse',
                  'income': 'categorical_crossentropy',
                  'gender': 'binary_crossentropy'})

Note that very **imbalanced loss contributions will cause the model representations to be optimized preferentially for the task with the largest individual loss**, at the expense of the other tasks. To remedy this, you can assign different levels of importance to the loss values in their contribution to the final loss. This is useful in particular if the losses’ values use different scales. For instance, the mean squared error (MSE) loss used for the age-regression task typically takes a value around 3–5, whereas the crossentropy loss used for the gender-classification task can be as low as 0.1. In such a situation, to balance the contribution of the different losses, you can assign a weight of 10 to the crossentropy loss and a weight of 0.25 to the MSE loss.
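
The effect of these weights can be checked with simple arithmetic. The loss magnitudes below are illustrative values picked from the ranges mentioned in the text (MSE around 3–5, crossentropy around 0.1), not measurements from a trained model, and the income loss is left out to keep the example small.

```python
# Illustrative loss magnitudes taken from the ranges quoted in the text.
losses = {'age': 4.0, 'gender': 0.1}
loss_weights = {'age': 0.25, 'gender': 10.0}

unweighted_total = sum(losses.values())
weighted = {name: loss_weights[name] * value for name, value in losses.items()}
weighted_total = sum(weighted.values())

# Fraction of the total loss that comes from the age task, before and after weighting.
age_share_before = losses['age'] / unweighted_total
age_share_after = weighted['age'] / weighted_total
print(round(age_share_before, 3), round(age_share_after, 3))
```

Without weights the age loss makes up almost all of the summed loss; with the suggested weights both tasks contribute about equally.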

In [89]:
# Compiling options of a multi-output model: loss weighting
model.compile(optimizer='rmsprop',
             loss=['mse', 'categorical_crossentropy', 'binary_crossentropy'],
             loss_weights=[0.25, 1., 10.])

# Equivalent (possible only if you give names to the output layers)
model.compile(optimizer='rmsprop',
             loss={'age': 'mse',
                  'income': 'categorical_crossentropy',
                  'gender': 'binary_crossentropy'},
             loss_weights={'age': 0.25,
                          'income': 1.,
                          'gender': 10.})

**Note:** The book does not provide example data, so I generated some dummy data to run the model.

In [90]:
import numpy as np

num_samples = 1000
max_length = 100

posts = np.random.randint(1, vocabulary_size,
                        size=(num_samples, max_length))                   # Generates dummy Numpy data

age_targets = np.random.randint(1, 100,
                       size=(num_samples, 1))

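income_indices = np.random.randint(0, num_income_groups,
                          size=num_samples)                               # one random income group per sample
income_targets = np.eye(num_income_groups)[income_indices]                # one-hot encoded income groups

gender_targets = np.random.randint(0, 2,
                          size=(num_samples, 1))                          # binary labels: 0 or 1 (upper bound is exclusive)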

In [92]:
# Feeding data to a multi-output model (Numpy data or a dictionary of arrays)
model.fit(posts, [age_targets, income_targets, gender_targets],
         epochs=10, batch_size=64)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x13cc9b908>