## Getting started with the Keras functional API

The Keras functional API is the way to go for defining complex models, such as multi-output models, directed acyclic graphs, or models with shared layers.

This guide assumes that you are already familiar with the Sequential model.

Let's start with something simple.


### First example: a densely-connected network

1. A layer instance is callable (on a tensor), and it returns a tensor.
2. Input tensor(s) and output tensor(s) can then be used to define a Model.
3. Such a model can be trained just like Keras Sequential models.


In [2]:
import keras
import numpy as np

In [3]:
from keras.layers import Input, Dense
from keras.models import Model

Let's first create an input tensor:

In [4]:
inputs = Input((784,))

Now we create layer instances. Each one is callable on a tensor and returns a tensor:

In [5]:
x = Dense(64, activation = 'relu')(inputs)

In [6]:
x = Dense(64, activation = 'relu')(x)

In [7]:
output_layer = Dense(10, activation = 'softmax')(x)

This creates a model that includes the input layer and three dense layers (including the output layer).

In [8]:
model = Model(inputs = inputs, outputs = output_layer)

In [9]:
model.compile(loss = 'categorical_crossentropy', metrics= ['accuracy'], optimizer = 'rmsprop')

Once you have defined the training and test datasets, you can fit the model on them:

In [10]:
#model.fit(features, labels)
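The features and labels in the commented-out call above are placeholders. As a minimal runnable sketch, the same network can be fitted on randomly generated stand-in data; the random arrays, sample count, and epoch count are assumptions for illustration, not a real dataset.

```python
import numpy as np
import keras
from keras.layers import Input, Dense
from keras.models import Model

# Rebuild the densely-connected network from above.
inputs = Input(shape=(784,))
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
outputs = Dense(10, activation='softmax')(x)
model = Model(inputs=inputs, outputs=outputs)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

# Hypothetical stand-in data: 256 random feature vectors with random one-hot labels.
rng = np.random.default_rng(0)
features = rng.random((256, 784)).astype('float32')
labels = keras.utils.to_categorical(rng.integers(0, 10, size=256), num_classes=10)

history = model.fit(features, labels, epochs=2, batch_size=32, verbose=0)
probs = model.predict(features[:5], verbose=0)  # one 10-way softmax row per sample
```

Because the labels are random, the accuracy is meaningless here; the point is only the shape of the data that fit expects (one row per sample, one-hot targets for categorical_crossentropy).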

### All models are callable, just like the layers 

With the functional API, it is easy to reuse trained models: you can treat any model as if it were a layer, by calling it on a tensor. Note that by calling a model you aren't just reusing the architecture of the model, you are also reusing its weights.
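A small sketch of what "reusing its weights" means in practice: wrapping a called model in a new Model adds no new parameters, and both models give identical predictions. The small classifier and the random batch are stand-ins chosen for illustration.

```python
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

# A small classifier standing in for the "trained" model above.
inputs = Input(shape=(784,))
hidden = Dense(64, activation='relu')(inputs)
outputs = Dense(10, activation='softmax')(hidden)
model = Model(inputs=inputs, outputs=outputs)

# Calling the model on a fresh input tensor reuses its layers and weights.
x = Input(shape=(784,))
y = model(x)
wrapper = Model(inputs=x, outputs=y)

batch = np.random.default_rng(1).random((3, 784)).astype('float32')
same = np.allclose(model.predict(batch, verbose=0), wrapper.predict(batch, verbose=0))
```

Training either model would update the same underlying variables, which is why the two stay in sync.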

In [11]:
x = Input(shape = (784,))

In [12]:
x

<tf.Tensor 'input_2:0' shape=(?, 784) dtype=float32>

In [13]:
y = model(x)

This returns the 10-way softmax we defined above.

In [14]:
y

<tf.Tensor 'model_1/dense_3/Softmax:0' shape=(?, 10) dtype=float32>

This makes it possible, for instance, to quickly create models that process sequences of inputs. You could turn an image classification model into a video classification model in just one line.

In [15]:
from keras.layers import TimeDistributed

In [16]:
input_sequences = Input((20, 784))

Input tensor for a sequence of 20 timesteps, each containing a 784-dimensional vector.

In [17]:
processed_sequences = TimeDistributed(model)(input_sequences)

This applies our previous model to every timestep in the input sequences.

The output of the previous model was a 10-way softmax, so the new output will be a sequence of 20 vectors of size 10.
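To check the shape claim, here is a minimal sketch with a small per-frame classifier (named frame_model here, a stand-in for the model above) wrapped in TimeDistributed. It also confirms that the output at each timestep equals the per-frame model applied to that frame alone; the random "clips" are hypothetical data.

```python
import numpy as np
from keras.layers import Input, Dense, TimeDistributed
from keras.models import Model

# Per-frame classifier: 784 features in, 10-way softmax out.
frame_in = Input(shape=(784,))
h = Dense(64, activation='relu')(frame_in)
frame_out = Dense(10, activation='softmax')(h)
frame_model = Model(inputs=frame_in, outputs=frame_out)

# Apply the same model (and weights) independently at each of 20 timesteps.
input_sequences = Input(shape=(20, 784))
processed_sequences = TimeDistributed(frame_model)(input_sequences)
video_model = Model(inputs=input_sequences, outputs=processed_sequences)

clips = np.random.default_rng(2).random((2, 20, 784)).astype('float32')
seq_probs = video_model.predict(clips, verbose=0)        # shape (2, 20, 10)
frame0 = frame_model.predict(clips[:, 0, :], verbose=0)  # first timestep only
```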

### Multi-input and multi-output models
 

Here's a good use case for the functional API: models with multiple inputs and outputs. The functional API makes it easy to manipulate a large number of intertwined datastreams.

Let's consider the following model. We seek to predict how many retweets and likes a news headline will receive on Twitter. The main input to the model will be the headline itself, as a sequence of words, but to spice things up, our model will also have an auxiliary input, receiving extra data such as the time of day when the headline was posted. The model will also be supervised via two loss functions: applying the main objective at an earlier point in the model as well (here, directly on the LSTM output) is a good regularization mechanism for deep models.



Let's implement it with the functional API.



The main input will receive the headline as a sequence of integers (each integer encodes a word). The integers index a vocabulary of 10,000 words, so they must lie between 0 and 9,999 to fit the Embedding layer's input_dim of 10,000, and the sequences will be 100 words long.

In [2]:
from keras.layers import Dense, Input



In [3]:
from keras.models import Model

In [4]:
from keras.layers import Embedding, LSTM

In [5]:
main_input = Input(shape=(100,), name = 'main_input', dtype = 'int32')

In [8]:
x = Embedding(input_dim = 10000, input_length = 100, output_dim = 512)(main_input)

The Embedding layer encodes the input sequence into a sequence of dense 512-dimensional vectors.

In [9]:
lstm_out = LSTM(32)(x)

The LSTM layer transforms the vector sequence into a single vector containing information about the entire sequence.
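A quick way to see these two transformations is to expose both intermediate tensors as outputs of a small "probe" model and run a batch through it. This sketch omits input_length, which is optional, and the batch of random word indices is hypothetical.

```python
import numpy as np
from keras.layers import Input, Embedding, LSTM
from keras.models import Model

main_input = Input(shape=(100,), dtype='int32', name='main_input')
embedded = Embedding(input_dim=10000, output_dim=512)(main_input)
lstm_out = LSTM(32)(embedded)

# Expose both intermediate tensors so we can inspect their shapes.
probe = Model(inputs=main_input, outputs=[embedded, lstm_out])

# Hypothetical batch of 4 headlines, each made of 100 word indices in [1, 10000).
headlines = np.random.default_rng(3).integers(1, 10000, size=(4, 100)).astype('int32')
emb_values, lstm_values = probe.predict(headlines, verbose=0)
```

The embedding output keeps one 512-dimensional vector per word, shape (4, 100, 512), while the LSTM collapses the sequence into a single 32-dimensional vector per headline, shape (4, 32).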

In [10]:
auxilliary_output = Dense(1, activation = 'sigmoid', name = 'aux_output')(lstm_out)

Here we insert the auxiliary output and its loss, which lets the LSTM and Embedding layers be trained smoothly even though the main loss will be much higher in the model.

Next, we define the auxiliary input and concatenate it with the LSTM output.

In [12]:
auxilliary_input = Input(shape = (5,), name = 'aux_input')

In [15]:
import keras
x = keras.layers.concatenate([lstm_out, auxilliary_input])

Now we stack a deep densely-connected network on top:

In [17]:
x = Dense(64, activation = 'relu')(x)
x = Dense(64, activation = 'relu')(x)
x = Dense(64, activation = 'relu')(x)

Finally, we add the main logistic regression layer.

In [19]:
main_output = Dense(1, activation = 'sigmoid', name = 'main_output')(x)

This defines a model with two outputs and two inputs.

In [20]:
model = Model(inputs = [main_input,auxilliary_input], outputs = [main_output,auxilliary_output])

We compile the model and assign a weight of 0.2 to the auxiliary loss. To specify a different loss or loss weight for each output, you can pass a list or a dictionary as loss and loss_weights. Here we pass a single loss as the loss argument, so the same loss will be used on all outputs.
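The loss_weights argument combines the per-output losses into the single value that training minimizes: a weighted sum (Keras also adds any regularization losses). The NumPy sketch below works through that arithmetic for a hypothetical batch of two headlines, assuming for illustration that both outputs are scored against the same targets; unlike Keras, it does not clip predictions by a small epsilon.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred):
    """Mean binary cross-entropy over a batch (no epsilon clipping)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(-np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)))

# Hypothetical targets and predictions for a batch of two headlines.
y_true = [1.0, 0.0]
main_pred = [0.9, 0.2]
aux_pred = [0.6, 0.4]

main_loss = binary_crossentropy(y_true, main_pred)  # ≈ 0.1643
aux_loss = binary_crossentropy(y_true, aux_pred)    # ≈ 0.5108
# loss_weights=[1., 0.2] combines the per-output losses as a weighted sum.
total_loss = 1.0 * main_loss + 0.2 * aux_loss       # ≈ 0.2664
```

Even though the auxiliary loss is larger here, its weight of 0.2 keeps it from dominating the gradient signal of the main objective.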


In [21]:
model.compile(loss = 'binary_crossentropy', optimizer = 'rmsprop', loss_weights=[1.,0.2])

Once we have the input data, we can train the model by passing the inputs and the targets as lists:

In [22]:
#model.fit([headline_data, aux_data], [labels1, labels2], epochs = 50, batch_size = 32)
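A minimal end-to-end sketch of the two-input, two-output model trained on random stand-in data. The sample count, the five random auxiliary features, and the reuse of one label array for both outputs are assumptions for illustration; losses are given as a list (one per output, in the order of outputs) rather than a single string.

```python
import numpy as np
import keras
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model

# Rebuild the two-input, two-output model from above.
main_input = Input(shape=(100,), dtype='int32', name='main_input')
x = Embedding(input_dim=10000, output_dim=512)(main_input)
lstm_out = LSTM(32)(x)
aux_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)

aux_input = Input(shape=(5,), name='aux_input')
x = keras.layers.concatenate([lstm_out, aux_input])
for _ in range(3):
    x = Dense(64, activation='relu')(x)
main_output = Dense(1, activation='sigmoid', name='main_output')(x)

model = Model(inputs=[main_input, aux_input], outputs=[main_output, aux_output])
# One loss per output (same order as `outputs`), weighted 1.0 and 0.2.
model.compile(optimizer='rmsprop',
              loss=['binary_crossentropy', 'binary_crossentropy'],
              loss_weights=[1.0, 0.2])

# Hypothetical stand-in data: 64 headlines, 5 auxiliary features each, binary targets.
rng = np.random.default_rng(4)
headline_data = rng.integers(1, 10000, size=(64, 100)).astype('int32')
aux_data = rng.random((64, 5)).astype('float32')
labels = rng.integers(0, 2, size=(64, 1)).astype('float32')

# Inputs and targets are lists in the same order as the model's inputs and outputs.
history = model.fit([headline_data, aux_data], [labels, labels],
                    epochs=1, batch_size=32, verbose=0)
main_pred, aux_pred = model.predict([headline_data[:3], aux_data[:3]], verbose=0)
```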

### Shared Layers 

Another good use for the functional API is models that use shared layers. Let's take a look at them.



Let's consider a dataset of tweets. We want to build a model that can tell whether two tweets are from the same person or not (this can allow us to compare users by the similarity of their tweets, for instance).



One way to achieve this is to build a model that encodes two tweets into two vectors, concatenates the vectors and then adds a logistic regression; this outputs a probability that the two tweets share the same author. The model would then be trained on positive tweet pairs and negative tweet pairs.



Because the problem is symmetric, the mechanism that encodes the first tweet should be reused (weights and all) to encode the second tweet. Here we use a shared LSTM layer to encode the tweets.



Let's build this with the functional API. We will take as input for a tweet a binary matrix of shape (280, 256), i.e. a sequence of 280 vectors of size 256, where each dimension in the 256-dimensional vector encodes the presence/absence of a character (out of an alphabet of 256 frequent characters).
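The text does not say how characters map to positions in the 256-dimensional vector. The sketch below is one hypothetical encoding: it treats Unicode code points below 256 as the alphabet, leaves other characters as all-zero rows, and truncates or zero-pads each tweet to 280 timesteps. A real pipeline would likely use a vocabulary of the 256 most frequent characters instead.

```python
import numpy as np

MAX_LEN = 280        # characters per tweet (timesteps)
ALPHABET_SIZE = 256  # size of the character alphabet

def encode_tweet(text, max_len=MAX_LEN, alphabet_size=ALPHABET_SIZE):
    """One-hot encode a tweet as a (max_len, alphabet_size) binary matrix.

    Assumption for illustration: a character's index is its Unicode code
    point when that is below alphabet_size; other characters are dropped
    (their row stays all zeros). Longer texts are truncated, shorter ones
    are zero-padded at the end.
    """
    matrix = np.zeros((max_len, alphabet_size), dtype='float32')
    for t, ch in enumerate(text[:max_len]):
        index = ord(ch)
        if index < alphabet_size:
            matrix[t, index] = 1.0
    return matrix

tweet = encode_tweet("Trying out the Keras functional API!")
```

Each row marks the character at that position, so a batch of such matrices has exactly the (280, 256) per-sample shape the Input layers below expect.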



In [8]:
from keras.layers import Dense, Input
from keras.layers import LSTM
from keras.models import Model
import keras

In [6]:
tweet_a = Input(shape=(280,256))
tweet_b = Input(shape=(280,256))

To share the same layer across different inputs, simply instantiate the layer once, then call it on as many inputs as you want.

In [9]:
shared_lstm = LSTM(64)

When we call the same layer instance multiple times, its weights are reused as well:

In [10]:
encoded_a = shared_lstm(tweet_a)

In [11]:
encoded_b = shared_lstm(tweet_b)

We can then concatenate the two vectors:

In [14]:
merged_vector = keras.layers.concatenate([encoded_a, encoded_b], axis = -1)

In [15]:
predictions = Dense(1, activation = 'sigmoid')(merged_vector)

In [18]:
model = Model(inputs = [tweet_a, tweet_b], outputs = predictions)

In [19]:
model.compile(loss = 'binary_crossentropy', optimizer = 'rmsprop', metrics = ['accuracy'])

Once you have data for both inputs and the corresponding labels, you can fit the model:

In [20]:
#model.fit([data_a, data_b], labels , epochs = 10)
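One way to confirm that the LSTM is really shared is to count parameters: a shared layer is counted once, no matter how many times it is called. The sketch below rebuilds the model, checks the count against a hand calculation, and runs one epoch on random binary stand-in matrices (hypothetical data, not real tweets).

```python
import numpy as np
import keras
from keras.layers import Input, LSTM, Dense
from keras.models import Model

tweet_a = Input(shape=(280, 256))
tweet_b = Input(shape=(280, 256))

# One LSTM instance, called twice: both branches use the same weights.
shared_lstm = LSTM(64)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)

merged_vector = keras.layers.concatenate([encoded_a, encoded_b], axis=-1)
predictions = Dense(1, activation='sigmoid')(merged_vector)
model = Model(inputs=[tweet_a, tweet_b], outputs=predictions)
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

# LSTM(64) on 256 features: 4 gates * (64 * (256 + 64) + 64) = 82,176 weights.
# Dense(1) on the 128-dimensional concatenation: 128 + 1 = 129 weights.
# Because the LSTM is shared, it is counted once: 82,176 + 129 = 82,305.
total_params = model.count_params()

# Hypothetical stand-in data: 16 sparse random binary matrices per input.
rng = np.random.default_rng(5)
data_a = (rng.random((16, 280, 256)) < 0.01).astype('float32')
data_b = (rng.random((16, 280, 256)) < 0.01).astype('float32')
labels = rng.integers(0, 2, size=(16, 1)).astype('float32')
history = model.fit([data_a, data_b], labels, epochs=1, verbose=0)
```

With two separate LSTM(64) instances the total would instead be 2 * 82,176 + 129 = 164,481.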

### The concept of layer "node"


Whenever you are calling a layer on some input, you are creating a new tensor (the output of the layer), and you are adding a "node" to the layer, linking the input tensor to the output tensor. When you are calling the same layer multiple times, that layer owns multiple nodes indexed as 0, 1, 2...

In [21]:
a = Input(shape=(280, 256))

lstm = LSTM(32)
encoded_a = lstm(a)

assert lstm.output is encoded_a

Not so if the layer has multiple inbound nodes, i.e. if it has been called on more than one input:



In [None]:
a = Input(shape=(280, 256))
b = Input(shape=(280, 256))

lstm = LSTM(32)
encoded_a = lstm(a)
encoded_b = lstm(b)

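lstm.output  # multi-backend Keras 2.x raises AttributeError: with multiple inbound nodes, "layer output" is ill-defined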

Instead, fetch each output by the index of its node:



In [24]:
assert lstm.get_output_at(0) is encoded_a
assert lstm.get_output_at(1) is encoded_b


The same is true for the properties input_shape and output_shape: as long as the layer has only one node, or as long as all nodes have the same input/output shape, then the notion of "layer output/input shape" is well defined, and that one shape will be returned by layer.output_shape/layer.input_shape. But if, for instance, you apply the same Conv2D layer to an input of shape (32, 32, 3), and then to an input of shape (64, 64, 3), the layer will have multiple input/output shapes, and you will have to fetch them by specifying the index of the node they belong to:

In [38]:
a = Input(shape = (32,32,3))
b = Input(shape = (64,64,3))

In [39]:
from keras.layers import Conv2D
conv = Conv2D(16,(3,3), padding = 'same')
conved_a = conv(a)

Now the following assertion holds:

In [40]:
assert conv.input_shape==(None, 32,32,3)

Let's add another input node:

In [41]:
conved_b = conv(b)

In [43]:
#assert conv.input_shape ==(None, 32,32,3)

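The assertion above would fail: the layer now has two inbound nodes with different input shapes, so "input shape" is ill-defined for it (multi-backend Keras 2.x raises an AttributeError when you access conv.input_shape). Instead, fetch each shape by node index: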

In [45]:
assert conv.get_input_shape_at(0)==(None, 32,32,3)
assert conv.get_input_shape_at(1)==(None, 64,64,3)
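The same situation can be inspected through the output tensors themselves: each node has its own output shape, while the layer keeps a single kernel. This sketch reads shapes from the tensors returned by each call rather than from node-index accessors.

```python
from keras.layers import Input, Conv2D

a = Input(shape=(32, 32, 3))
b = Input(shape=(64, 64, 3))

# One Conv2D instance applied to two inputs of different spatial sizes.
conv = Conv2D(16, (3, 3), padding='same')
conved_a = conv(a)
conved_b = conv(b)

# Output shapes differ per node ('same' padding keeps height and width)...
shape_a = tuple(conved_a.shape)  # (None, 32, 32, 16)
shape_b = tuple(conved_b.shape)  # (None, 64, 64, 16)
# ...but the weights do not: 3 * 3 * 3 * 16 kernel weights + 16 biases = 448.
n_params = conv.count_params()
```

This works because a convolution kernel depends only on the number of input channels, not on the spatial size, so one set of weights can serve both inputs.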

### Save a Keras model

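In [1]:
from keras.models import load_model

In [12]:
model.save('first_ex.h5')  # stores the architecture, weights, and training configuration
restored_model = load_model('first_ex.h5')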
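A minimal save-and-reload round trip that checks the restored model makes the same predictions. It assumes a recent Keras version that supports the native .keras format; on older releases, the .h5 filename used above is the option. The temporary directory and random batch are illustrative choices.

```python
import os
import tempfile

import numpy as np
from keras.layers import Input, Dense
from keras.models import Model, load_model

inputs = Input(shape=(784,))
x = Dense(64, activation='relu')(inputs)
outputs = Dense(10, activation='softmax')(x)
model = Model(inputs=inputs, outputs=outputs)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

# Save to a temporary directory, then load it back.
path = os.path.join(tempfile.mkdtemp(), 'first_ex.keras')
model.save(path)
restored = load_model(path)

batch = np.random.default_rng(6).random((4, 784)).astype('float32')
same_predictions = np.allclose(model.predict(batch, verbose=0),
                               restored.predict(batch, verbose=0))
```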