# A simple Recurrent Neural Network implementation in TensorFlow

In this notebook, we implement simple Recurrent Neural Networks (RNNs) using TensorFlow. 
We follow the RNN tutorial on [tensorflow.org](https://www.tensorflow.org/guide/keras/rnn).

RNNs are networks that maintain an internal state, which is updated with each new input. 
They are thus well suited to learning from and analyzing time series.
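
As a rough illustration of what "maintaining an internal state" means, here is a minimal NumPy sketch of a plain tanh recurrent update, `h = tanh(x_t @ W + h @ U + b)`. The function name `simple_rnn_forward`, the weight names and all sizes are assumptions chosen for illustration; this is not how the Keras layers used below are implemented internally.

```python
import numpy as np

def simple_rnn_forward(inputs, W, U, b):
    """Run a plain tanh RNN over a batch of sequences and return the final state.

    inputs: (batch, timesteps, input_dim)
    W: (input_dim, units), U: (units, units), b: (units,)
    """
    batch = inputs.shape[0]
    units = U.shape[0]
    h = np.zeros((batch, units))          # the initial state is all zeros
    for t in range(inputs.shape[1]):
        x_t = inputs[:, t, :]             # data seen at timestep t
        h = np.tanh(x_t @ W + h @ U + b)  # the state is updated with each new input
    return h

# hypothetical sizes: 4 sequences, 6 timesteps, 3 features, 5 state units
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6, 3))
W = rng.normal(size=(3, 5))
U = rng.normal(size=(5, 5))
b = np.zeros(5)

final_state = simple_rnn_forward(x, W, U, b)
print(final_state.shape)  # (4, 5)
```

The final state summarizes the whole sequence in a fixed-size vector, whatever the number of timesteps.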

## Setup

Import the packages we will need:

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

## A simple model

Let us build a first very simple RNN:

In [2]:
# define a sequential model
model = keras.Sequential()

# embedding layer with input size 1000 and output size 64
model.add(layers.Embedding(input_dim=1000, output_dim=64))

# Long Short-Term Memory (LSTM) layer with 128 units
model.add(layers.LSTM(128))

# dense layer with 10 units
model.add(layers.Dense(10))

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (None, None, 64)          64000     
_________________________________________________________________
lstm (LSTM)                  (None, 128)               98816     
_________________________________________________________________
dense (Dense)                (None, 10)                1290      
=================================================================
Total params: 164,106
Trainable params: 164,106
Non-trainable params: 0
_________________________________________________________________
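
The parameter counts in this summary can be checked by hand. The sketch below assumes the standard layouts: an embedding table of `input_dim * output_dim` entries, an LSTM with four gates that each have an input kernel, a recurrent kernel and a bias, and a dense layer with a kernel and a bias. The helper names are hypothetical.

```python
def embedding_params(input_dim, output_dim):
    # one output_dim-sized vector per vocabulary entry
    return input_dim * output_dim

def lstm_params(input_dim, units):
    # 4 gates, each with an input kernel, a recurrent kernel and a bias
    return 4 * (input_dim * units + units * units + units)

def dense_params(input_dim, units):
    # kernel plus bias
    return input_dim * units + units

counts = {
    "embedding": embedding_params(1000, 64),
    "lstm": lstm_params(64, 128),
    "dense": dense_params(128, 10),
}
total = sum(counts.values())
print(counts)  # {'embedding': 64000, 'lstm': 98816, 'dense': 1290}
print(total)   # 164106
```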


## A simple encoder-decoder model

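When working with an encoder-decoder structure, one generally wants to retrieve the final internal state of the encoder, in addition to its output, and use that state to initialise the decoder. 
This is done by passing `return_state=True` to the encoder layer and `initial_state=...` when calling the decoder layer.

Note that an LSTM layer has two internal state vectors (the hidden state and the cell state), while a GRU layer has only one.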
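
To make the difference in state count concrete, here is a NumPy sketch of one timestep of each cell in a common textbook formulation. The function names, the gate ordering and the weight layout are assumptions for illustration and may differ from what Keras does internally (Keras's GRU, for instance, has several variants).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,)."""
    z = x @ W + h @ U + b
    i, f, g, o = np.split(z, 4, axis=-1)  # input, forget, candidate, output
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # cell state
    h_new = sigmoid(o) * np.tanh(c_new)               # hidden state
    return h_new, c_new                   # two state vectors

def gru_step(x, h, W, U, b):
    """One GRU step. W: (input_dim, 3*units), U: (units, 3*units), b: (3*units,)."""
    Wz, Wr, Wh = np.split(W, 3, axis=-1)
    Uz, Ur, Uh = np.split(U, 3, axis=-1)
    bz, br, bh = np.split(b, 3, axis=-1)
    z = sigmoid(x @ Wz + h @ Uz + bz)                 # update gate
    r = sigmoid(x @ Wr + h @ Ur + br)                 # reset gate
    h_candidate = np.tanh(x @ Wh + (r * h) @ Uh + bh)
    return z * h + (1.0 - z) * h_candidate            # a single state vector

# hypothetical sizes
rng = np.random.default_rng(0)
batch, input_dim, units = 2, 3, 4
x = rng.normal(size=(batch, input_dim))
h0 = np.zeros((batch, units))
c0 = np.zeros((batch, units))

lstm_state = lstm_step(
    x, h0, c0,
    rng.normal(size=(input_dim, 4 * units)),
    rng.normal(size=(units, 4 * units)),
    np.zeros(4 * units),
)
gru_state = gru_step(
    x, h0,
    rng.normal(size=(input_dim, 3 * units)),
    rng.normal(size=(units, 3 * units)),
    np.zeros(3 * units),
)
print(len(lstm_state), gru_state.shape)  # 2 (2, 4)
```

This is why the LSTM encoder below returns both `state_h` and `state_c`, whereas a GRU encoder would return a single state.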

In [3]:
encoder_vocab = 1000
decoder_vocab = 2000
internal_dimension = 64

# a simple encoder
encoder_input = layers.Input(shape=(None,))
encoder_embedded = layers.Embedding(input_dim=encoder_vocab, output_dim=internal_dimension)(encoder_input)
encoder_output, state_h, state_c = layers.LSTM(internal_dimension, return_state=True, name="encoder")(encoder_embedded)
encoder_state = [state_h, state_c]

# a simple decoder
decoder_input = layers.Input(shape=(None,))
decoder_embedded = layers.Embedding(input_dim=decoder_vocab, output_dim=internal_dimension)(decoder_input)
decoder_output = layers.LSTM(internal_dimension, name="decoder")(decoder_embedded, initial_state=encoder_state)
output = layers.Dense(10)(decoder_output)

# full model
model = keras.Model([encoder_input, decoder_input], output)
model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, None)]       0                                            
__________________________________________________________________________________________________
input_2 (InputLayer)            [(None, None)]       0                                            
__________________________________________________________________________________________________
embedding_1 (Embedding)         (None, None, 64)     64000       input_1[0][0]                    
__________________________________________________________________________________________________
embedding_2 (Embedding)         (None, None, 64)     128000      input_2[0][0]                    
______________________________________________________________________________________________

## Cross-batch statefulness

When dealing with very long sequences that must be split into several batches, it is useful to maintain the internal state of the network across batches. 
This can be achieved by passing the `stateful=True` argument to RNN layers.

Here is a simple example:

In [4]:
# one long sequence per sample, split along the time axis into three consecutive
# chunks, each of shape (batch, timesteps, features) = (20, 10, 50)
paragraph1 = np.random.random((20, 10, 50)).astype(np.float32)
paragraph2 = np.random.random((20, 10, 50)).astype(np.float32)
paragraph3 = np.random.random((20, 10, 50)).astype(np.float32)

# an LSTM layer keeping its internal state across batches
lstm_layer = layers.LSTM(64, stateful=True)

# inference
output = lstm_layer(paragraph1)
output = lstm_layer(paragraph2)
output = lstm_layer(paragraph3)

# save the internal state for possible future re-use
current_state = lstm_layer.states

# reset the internal state
lstm_layer.reset_states()
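
The idea behind cross-batch statefulness can be checked with a small NumPy sketch: carrying the state from one chunk to the next gives the same result as processing the concatenated sequence in one go. It uses a plain tanh RNN instead of an LSTM to stay short; the helper `rnn_chunk`, the weights and the state size of 8 are assumptions for illustration.

```python
import numpy as np

def rnn_chunk(inputs, h, W, U):
    """Process one chunk of timesteps starting from state h; return the new state."""
    for t in range(inputs.shape[1]):
        h = np.tanh(inputs[:, t, :] @ W + h @ U)
    return h

rng = np.random.default_rng(0)
W = rng.normal(size=(50, 8)) * 0.1
U = rng.normal(size=(8, 8)) * 0.1

# same data layout as the cell above: 20 samples, 10 timesteps per chunk, 50 features
chunks = [rng.random((20, 10, 50)) for _ in range(3)]

# "stateful" processing: the state left by one chunk is the initial state of the next
state = np.zeros((20, 8))
for chunk in chunks:
    state = rnn_chunk(chunk, state, W, U)

# reference: process the full 30-timestep sequences in one go
full = np.concatenate(chunks, axis=1)  # shape (20, 30, 50)
state_full = rnn_chunk(full, np.zeros((20, 8)), W, U)

print(np.allclose(state, state_full))  # True
```

Note that sample `i` of each chunk must be the continuation of sample `i` of the previous chunk for this to be meaningful.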

## An RNN for MNIST

Let us train and evaluate an RNN on the MNIST dataset. 
We first define a function to build the model:

In [5]:
batch_size = 64
units = 64
input_dim = 28
output_size = 10

# build the RNN model
def build_model():
    lstm_layer = keras.layers.LSTM(units, input_shape=(None, input_dim))
    model = keras.models.Sequential([
        lstm_layer, 
        keras.layers.BatchNormalization(),
        keras.layers.Dense(output_size),
    ])
    return model


Let us load the MNIST dataset and normalize it:

In [6]:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255., x_test / 255.
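
Each MNIST image is a 28×28 array of pixel intensities between 0 and 255. The model defined above reads it as a sequence of 28 timesteps (the rows) with `input_dim = 28` features each (the pixels of one row). The sketch below shows this on a synthetic array, `fake_images`, which is a hypothetical stand-in for `x_train` so that nothing needs to be downloaded.

```python
import numpy as np

# hypothetical stand-in for x_train: 5 random "images" with the MNIST shape and dtype
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# same normalization as above: values now lie in [0, 1]
scaled = fake_images / 255.0

n_samples, timesteps, input_dim = scaled.shape
print(timesteps, input_dim)  # 28 28

# what the RNN sees at timestep 3 of the first image: one row of 28 pixels
row_at_step_3 = scaled[0, 3]
print(row_at_step_3.shape)   # (28,)
```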

Build and train the model, using the sparse categorical crossentropy loss function:

In [7]:
model = build_model()
model.compile(
    loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer = "sgd",
    metrics = ["accuracy"],
)
model.fit(x_train, y_train, validation_data=(x_test, y_test), batch_size=batch_size, epochs=1)



<tensorflow.python.keras.callbacks.History at 0x7fb0d47efd30>
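
For reference, here is a NumPy sketch of what the sparse categorical crossentropy computes when `from_logits=True`: the labels are integer class indices (not one-hot vectors), the model outputs are raw scores, and the loss is the mean of `-log softmax(logits)[label]` over the batch. The function name is hypothetical, and the sketch only mirrors the default behaviour as I understand it, without the extra options of the Keras class.

```python
import numpy as np

def sparse_categorical_crossentropy_from_logits(labels, logits):
    """Mean of -log softmax(logits)[label] over the batch."""
    # subtract the row-wise maximum for numerical stability
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # pick the log-probability of the true class of each sample
    picked = log_probs[np.arange(len(labels)), labels]
    return -picked.mean()

labels = np.array([3, 7])          # integer class indices
uniform_logits = np.zeros((2, 10)) # equal scores for all 10 classes
loss = sparse_categorical_crossentropy_from_logits(labels, uniform_logits)
print(np.isclose(loss, np.log(10)))  # True
```

With equal scores for all 10 classes the loss is `log(10) ≈ 2.30`, which is roughly the value to expect from an untrained classifier on MNIST.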

## RNNs with lists or dictionaries of inputs

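### Define a custom cell

A custom cell can take several inputs of different shapes at each timestep. The cell below accepts two inputs, of shapes `(batch, i1)` and `(batch, i2, i3)`, and keeps one state tensor for each of its two outputs.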

In [8]:
class NestedCell(keras.layers.Layer):
    
    def __init__(self, unit_1, unit_2, unit_3, **kwargs):
        self.unit_1 = unit_1
        self.unit_2 = unit_2
        self.unit_3 = unit_3
        self.state_size = [tf.TensorShape([unit_1]), tf.TensorShape([unit_2, unit_3])]
        self.output_size = [tf.TensorShape([unit_1]), tf.TensorShape([unit_2, unit_3])]
        super(NestedCell, self).__init__(**kwargs)
        
    def build(self, input_shapes):
        
        # input_shapes should contain two items: [(batch, i1), (batch, i2, i3)]
        i1 = input_shapes[0][1]
        i2 = input_shapes[1][1]
        i3 = input_shapes[1][2]
        
        self.kernel_1 = self.add_weight(
            shape=(i1, self.unit_1), 
            initializer="uniform",
            name="kernel_1"
        )

        self.kernel_2_3 = self.add_weight(
            shape=(i2, i3, self.unit_2, self.unit_3), 
            initializer="uniform",
            name="kernel_2_3"
        )
        
    def call(self, inputs, states):
        input_1, input_2 = tf.nest.flatten(inputs)
        s1, s2 = states
        
        output_1 = tf.matmul(input_1, self.kernel_1)
        output_2_3 = tf.einsum("bij,ijkl->bkl", input_2, self.kernel_2_3)
        state_1 = s1 + output_1
        state_2_3 = s2 + output_2_3
        
        output = (output_1, output_2_3)
        new_states = (state_1, state_2_3)
        
        return output, new_states
    
    def get_config(self):
        return {
            "unit_1": self.unit_1,
            "unit_2": self.unit_2,
            "unit_3": self.unit_3,
        }
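
The contraction `"bij,ijkl->bkl"` used in `call` sums over the two input axes `i` and `j` and leaves the batch axis `b` untouched. The NumPy sketch below, with small sizes chosen for illustration only, checks the output shape and shows that the contraction is equivalent to flattening `(i, j)` and `(k, l)` and doing an ordinary matrix product.

```python
import numpy as np

# small illustrative sizes, not the ones used in the notebook
rng = np.random.default_rng(0)
batch, i2, i3, unit_2, unit_3 = 4, 6, 5, 3, 2

input_2 = rng.normal(size=(batch, i2, i3))
kernel_2_3 = rng.normal(size=(i2, i3, unit_2, unit_3))

# same contraction as in NestedCell.call
out_einsum = np.einsum("bij,ijkl->bkl", input_2, kernel_2_3)

# equivalent formulation: flatten (i, j) and (k, l), multiply, reshape back
flat_input = input_2.reshape(batch, i2 * i3)
flat_kernel = kernel_2_3.reshape(i2 * i3, unit_2 * unit_3)
out_matmul = (flat_input @ flat_kernel).reshape(batch, unit_2, unit_3)

print(out_einsum.shape)                     # (4, 3, 2)
print(np.allclose(out_einsum, out_matmul))  # True
```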

### Build and train a model

Build a model:

In [9]:
unit_1 = 10
unit_2 = 20
unit_3 = 30

i1 = 32
i2 = 64
i3 = 32
batch_size = 64
num_batches = 10
timestep = 50

cell = NestedCell(unit_1, unit_2, unit_3)
rnn = keras.layers.RNN(cell)

input_1 = keras.Input((None, i1))
input_2 = keras.Input((None, i2, i3))

outputs = rnn((input_1, input_2))

model = keras.models.Model([input_1, input_2], outputs)

model.compile(optimizer="adam", loss="mse", metrics=["accuracy"])

Train it on randomly-generated data:

In [10]:
input_1_data = np.random.random((batch_size * num_batches, timestep, i1))
input_2_data = np.random.random((batch_size * num_batches, timestep, i2, i3))
target_1_data = np.random.random((batch_size * num_batches, unit_1))
target_2_data = np.random.random((batch_size * num_batches, unit_2, unit_3))
input_data = [input_1_data, input_2_data]
target_data = [target_1_data, target_2_data]

model.fit(input_data, target_data, batch_size=batch_size)



<tensorflow.python.keras.callbacks.History at 0x7fb0d42465e0>