## Introduction to the functional API

Let's start with a minimal example that shows, side by side, a simple Sequential model and its
equivalent in the functional API.

In [1]:
from keras.models import Sequential, Model
from keras import layers
from keras import Input
import numpy as np

Using TensorFlow backend.


In [2]:
seq_model = Sequential()
seq_model.add(layers.Dense(32, activation = 'relu', input_shape = (64, )))
seq_model.add(layers.Dense(32, activation = 'relu'))
seq_model.add(layers.Dense(10, activation = 'softmax'))

In [3]:
input_tensor = Input(shape = (64, ))
x = layers.Dense(32, activation = 'relu')(input_tensor)
x = layers.Dense(32, activation = 'relu')(x)
output_tensor = layers.Dense(10, activation = 'softmax')(x)
model = Model(input_tensor, output_tensor)
model.summary()

Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 64)                0
_________________________________________________________________
dense_4 (Dense)              (None, 32)                2080
_________________________________________________________________
dense_5 (Dense)              (None, 32)                1056
_________________________________________________________________
dense_6 (Dense)              (None, 10)                330
=================================================================
Total params: 3,466
Trainable params: 3,466
Non-trainable params: 0
_________________________________________________________________
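The parameter counts in the summary can be checked by hand: a Dense layer has one weight per input-output pair plus one bias per output unit. The sketch below reproduces the numbers; the helper name `dense_params` is made up for this illustration.

```python
# Layer widths taken from the model above: 64 inputs -> 32 -> 32 -> 10.
layer_sizes = [64, 32, 32, 10]

def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [2080, 1056, 330]
print(total)      # 3466
```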


In [4]:
model.compile(optimizer = 'rmsprop', loss = 'categorical_crossentropy')

In [5]:
x_train = np.random.random((1000,64))
y_train = np.random.random((1000,10))
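The random targets above only exercise the API: their rows do not sum to 1, so they are not valid class distributions for a softmax output. If you want placeholder targets that are, one option (an assumption, not part of the original notebook) is to draw random class indices and one-hot encode them with NumPy. The names `class_ids` and `y_onehot` are illustrative.

```python
import numpy as np

num_samples, num_classes = 1000, 10
rng = np.random.default_rng(0)

# One random class index per sample, then one-hot encode via an identity matrix.
class_ids = rng.integers(0, num_classes, size=num_samples)
y_onehot = np.eye(num_classes, dtype='float32')[class_ids]

print(y_onehot.shape)  # (1000, 10)
```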

In [6]:
model.fit(x_train, y_train)

Epoch 1/1


<keras.callbacks.callbacks.History at 0x7f85be6904d0>

In [7]:
score = model.evaluate(x_train, y_train)
score



17.862804656982423

In [8]:
predictions = model.predict(x_train)
predictions[:2, ]

array([[0.03192253, 0.03139446, 0.53079146, 0.00996405, 0.00749125,
        0.27539852, 0.00253185, 0.00187797, 0.100104  , 0.00852388],
       [0.02828155, 0.04114874, 0.5928531 , 0.01042755, 0.00764882,
        0.22347376, 0.00261179, 0.00236382, 0.07790326, 0.01328758]],
      dtype=float32)

## Multi-input models
Following is an example of how you can build a two-input question-answering model with the functional API. You set up two independent branches, encoding the text input and the question input as representation vectors; then, concatenate these vectors; and finally, add a softmax classifier on top of the concatenated representations.

In [9]:
# Functional API implementation of a two-input question-answering model
from keras.models import Model
from keras import layers
from keras import Input
import numpy as np

In [10]:
text_vocabulary_size = 10000
question_vocabulary_size = 10000
answer_vocabulary_size = 500

In [11]:
text_input = Input(shape = (None, ), dtype = 'int32', name = 'text')
# Embedding takes (input_dim, output_dim): vocabulary size first, then embedding size
embedded_text = layers.Embedding(text_vocabulary_size, 64)(text_input)
encoded_text = layers.LSTM(32)(embedded_text)

In [12]:
question_input = Input(shape = (None, ), dtype = 'int32', name = 'question')
embedded_question = layers.Embedding(question_vocabulary_size, 64)(question_input)
encoded_question = layers.LSTM(16)(embedded_question)

In [13]:
concatenated = layers.concatenate([encoded_text, encoded_question], axis = -1)
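To see what the concatenation does to shapes, here is a NumPy stand-in for the two encoded branches. The arrays and the batch size of 4 are placeholders; only the widths 32 and 16 come from the LSTM layers above.

```python
import numpy as np

batch_size = 4  # hypothetical batch size for illustration

text_vec = np.zeros((batch_size, 32))      # stands in for the LSTM(32) output
question_vec = np.ones((batch_size, 16))   # stands in for the LSTM(16) output

# Joining along the last axis keeps the batch dimension and adds the widths.
merged = np.concatenate([text_vec, question_vec], axis=-1)

print(merged.shape)  # (4, 48)
```

The Dense classifier that follows therefore sees 48 features per sample.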

In [14]:
answer = layers.Dense(answer_vocabulary_size, activation = 'softmax')(concatenated)

In [15]:
model = Model([text_input, question_input], answer)
model.compile(optimizer = 'rmsprop',
              loss = 'categorical_crossentropy',
              metrics = ['acc'])

How do you train this two-input model? There are two possible APIs: you can feed the model a list of NumPy arrays as inputs, or you can feed it a dictionary that maps input names to NumPy arrays. Naturally, the latter option is available only if you give names to your inputs.
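The two forms carry the same information: a list relies on position, a dictionary relies on names. The sketch below is a conceptual illustration in plain NumPy, not Keras's actual implementation; `input_names` is a hypothetical stand-in for the order in which the inputs were passed to `Model`.

```python
import numpy as np

# Hypothetical: the order in which inputs were given to Model([...], ...).
input_names = ['text', 'question']

text_arr = np.zeros((4, 100), dtype='int32')
question_arr = np.ones((4, 100), dtype='int32')

as_list = [text_arr, question_arr]                     # order matters
as_dict = {'question': question_arr, 'text': text_arr} # key order does not

# Resolving the dictionary by name recovers the positional list.
resolved = [as_dict[name] for name in input_names]
same = all(a is b for a, b in zip(as_list, resolved))

print(same)  # True
```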

In [18]:
num_samples = 1000
max_length = 100

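# Dummy token sequences; each input draws indices from its own vocabulary
text = np.random.randint(1, text_vocabulary_size, size = (num_samples, max_length))
question = np.random.randint(1, question_vocabulary_size, size = (num_samples, max_length))
# Random 0/1 placeholder targets; real answers would be one-hot encoded
answers = np.random.randint(0, 2, size = (num_samples, answer_vocabulary_size))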

In [19]:
model.fit([text, question], answers, epochs = 10, batch_size = 128)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.callbacks.History at 0x7f85bc0bec10>

In [20]:
model.fit({'text': text, 'question' : question}, answers, epochs = 10, batch_size = 128)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.callbacks.History at 0x7f85bc13a0d0>

## Multi-output models

In [21]:
from keras import layers
from keras import Input
from keras.models import Model

In [22]:
vocabulary_size = 50000
num_income_groups = 10

In [23]:
posts_input = Input(shape = (None, ), dtype = 'int32', name = 'posts')
embedded_posts = layers.Embedding(vocabulary_size, 256)(posts_input)
x = layers.Conv1D(128, 5, activation = 'relu')(embedded_posts)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.GlobalMaxPooling1D()(x)
x = layers.Dense(128, activation='relu')(x)

# Outputs. Each output layer is given a name so it can be referenced later.
age_prediction = layers.Dense(1, name='age')(x)
income_prediction = layers.Dense(num_income_groups, activation = 'softmax', name = 'income')(x)
gender_prediction = layers.Dense(1, activation = 'sigmoid', name = 'gender')(x)

model = Model(posts_input, [age_prediction, income_prediction, gender_prediction])
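Because the convolution and pooling layers each shorten the sequence, very short posts leave nothing for the later layers to process. The sketch below traces the sequence length through the stack, assuming the Keras defaults (Conv1D with `padding='valid'` and stride 1, MaxPooling1D with stride equal to the pool size). The helper names are made up for this illustration.

```python
def conv1d_len(length, kernel_size):
    # 'valid' padding, stride 1: the window must fit entirely inside the sequence.
    return length - kernel_size + 1

def maxpool1d_len(length, pool_size):
    # Non-overlapping windows; leftover steps at the end are dropped.
    return length // pool_size

# Same order as the layers above: conv, pool, conv, conv, pool, conv, conv.
STACK = [('conv', 5), ('pool', 5), ('conv', 5), ('conv', 5),
         ('pool', 5), ('conv', 5), ('conv', 5)]

def trace_lengths(length):
    lengths = [length]
    for kind, size in STACK:
        if kind == 'conv':
            length = conv1d_len(length, size)
        else:
            length = maxpool1d_len(length, size)
        lengths.append(length)
    return lengths

print(trace_lengths(500))  # [500, 496, 99, 95, 91, 18, 14, 10]
print(trace_lengths(269))  # [269, 265, 53, 49, 45, 9, 5, 1]
```

Under these assumptions, 269 time steps is the shortest input that still leaves one step for the global max pooling layer.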

Importantly, training such a model requires the ability to specify different loss functions for different heads of the network: for instance, age prediction is a scalar regression task, but gender prediction is a binary classification task, requiring a different training procedure.
But because gradient descent requires you to minimize a scalar, you must combine these losses into a single value in order to train the model. The simplest way to combine different losses is to sum them all.
In Keras, you can use either a list or a dictionary of losses in compile to specify different objectives for different outputs; the resulting loss values are summed into a global loss, which is minimized during training.
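As a rough illustration of what "summed into a global loss" means, the sketch below computes the three losses on toy data with NumPy and adds them up. It is a simplification, not Keras's implementation: it skips the numerical clipping Keras applies, uses 3 income groups instead of 10 for brevity, and all values are invented.

```python
import numpy as np

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def categorical_crossentropy(y_true, y_pred):
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

def binary_crossentropy(y_true, y_pred):
    per_sample = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return float(np.mean(per_sample))

# Toy targets and predictions for two samples (made up for this example).
age_true = np.array([[30.0], [40.0]])
age_pred = np.array([[32.0], [37.0]])
income_true = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
income_pred = np.array([[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]])
gender_true = np.array([[1.0], [0.0]])
gender_pred = np.array([[0.5], [0.5]])

losses = {
    'age': mse(age_true, age_pred),
    'income': categorical_crossentropy(income_true, income_pred),
    'gender': binary_crossentropy(gender_true, gender_pred),
}
total_loss = sum(losses.values())

print(losses)
print(total_loss)
```

Here the age term (6.5) is roughly ten times larger than each crossentropy term (about 0.69), so it dominates the sum.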

In [26]:
# Compilation options of a multi-output model: one loss per output, in output order
model.compile(optimizer = 'rmsprop', loss = ['mse', 'categorical_crossentropy', 'binary_crossentropy'])

#model.compile(optimizer = 'rmsprop', 
#              loss = {
#                  'age':'mse', 
#                  'income':'categorical_crossentropy', 
#                  'gender':'binary_crossentropy'})

Note that very imbalanced loss contributions will cause the model representation to be optimized preferentially for the task with the largest individual loss, at the expense of the other tasks.
To remedy this, you can assign different levels of importance to the loss values in their contribution to the final loss. This is useful in particular if the losses' values use different scales.
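The effect of weighting can be seen with simple arithmetic. The per-head loss values below are hypothetical, chosen only to show losses on different scales; the weights are the 0.25, 1 and 10 used in this notebook's compile call.

```python
# Hypothetical per-head loss values, for illustration only.
losses = {'age': 4.0, 'income': 1.5, 'gender': 0.1}
loss_weights = {'age': 0.25, 'income': 1.0, 'gender': 10.0}

unweighted_total = sum(losses.values())
weighted = {name: loss_weights[name] * value for name, value in losses.items()}
weighted_total = sum(weighted.values())

# Share of the total contributed by the age head, before and after weighting.
age_share_before = losses['age'] / unweighted_total
age_share_after = weighted['age'] / weighted_total

print(round(unweighted_total, 3), round(weighted_total, 3))
print(round(age_share_before, 3), round(age_share_after, 3))
```

With these made-up numbers, the age head drops from about 71% of the total loss to about 29%, and the three heads contribute on a comparable scale.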

In [29]:
model.compile(optimizer = 'rmsprop',
              loss = ['mse', 'categorical_crossentropy', 'binary_crossentropy'],
              loss_weights = [0.25,1.,10.])

#model.compile(optimizer = 'rmsprop', 
#              loss = {
#                  'age':'mse', 
#                  'income':'categorical_crossentropy', 
#                  'gender':'binary_crossentropy'},
#             loss_weights = {'age':0.25,
#                             'income':1.,
#                             'gender':10.})

In [None]:
# posts and the three target arrays are assumed to be NumPy arrays
#model.fit(posts, [age_targets, income_targets, gender_targets], epochs = 10, batch_size = 64)
#model.fit(posts, {'age':age_targets, 'income':income_targets, 'gender':gender_targets}, epochs = 10, batch_size = 64)
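The fit calls above are commented out because the notebook defines no data for them. A minimal sketch of placeholder arrays with matching shapes is shown below; `num_samples`, `max_length` and the age range are assumptions made for this example, and the values are random, so they are only useful for checking that shapes line up.

```python
import numpy as np

num_samples = 8          # hypothetical
max_length = 500         # hypothetical; long enough to survive the conv/pool stack
vocabulary_size = 50000
num_income_groups = 10
rng = np.random.default_rng(0)

# Integer token ids, one row per post.
posts = rng.integers(1, vocabulary_size, size=(num_samples, max_length))

# One regression target, one one-hot target and one binary target per sample.
age_targets = rng.uniform(18, 80, size=(num_samples, 1))
income_ids = rng.integers(0, num_income_groups, size=num_samples)
income_targets = np.eye(num_income_groups)[income_ids]
gender_targets = rng.integers(0, 2, size=(num_samples, 1)).astype('float32')

print(posts.shape, age_targets.shape, income_targets.shape, gender_targets.shape)
```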