# Long short-term memory (LSTM)

In [None]:
import numpy as np
import tensorflow as tf

We want to create a network that has only one LSTM cell. At every time step the LSTM needs 2 elements from the previous step: <b>prv_output</b> and <b>prv_state</b>, usually called <b>h</b> and <b>c</b>. Therefore, we initialize a state, <b>state</b>. Here, <b>state</b> is a tuple with 2 elements, each of size \[1 x 4]: one for passing prv_output to the next time step, and another for passing prv_state to the next time step.



In [2]:
LSTM_CELL_SIZE = 4  # output size (dimension), which is the same as the hidden size of the cell

state = (tf.zeros([1,LSTM_CELL_SIZE]),)*2
state

(<tf.Tensor: shape=(1, 4), dtype=float32, numpy=array([[0., 0., 0., 0.]], dtype=float32)>,
 <tf.Tensor: shape=(1, 4), dtype=float32, numpy=array([[0., 0., 0., 0.]], dtype=float32)>)
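
To make the roles of h and c concrete, here is a minimal NumPy sketch of a single LSTM time step using the standard gate equations. The weight names (`W`, `U`, `b`), the random initialization, the gate ordering, and the helper `lstm_step` are illustrative assumptions rather than Keras internals, and the input is an arbitrary vector with 6 features.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step (hypothetical helper, standard gate equations)."""
    z = x @ W + h_prev @ U + b            # shape: (batch, 4 * hidden)
    i, f, g, o = np.split(z, 4, axis=1)   # input, forget, candidate, output
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)  # new cell state
    h = sigmoid(o) * np.tanh(c)                        # new output
    return h, c

hidden, n_features = 4, 6
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(n_features, 4 * hidden))  # input weights
U = rng.normal(scale=0.1, size=(hidden, 4 * hidden))      # recurrent weights
b = np.zeros(4 * hidden)

# Zero initial state, mirroring the `state` tuple above.
h0 = np.zeros((1, hidden))
c0 = np.zeros((1, hidden))
x = np.array([[3., 2., 2., 2., 2., 2.]])

h1, c1 = lstm_step(x, h0, c0, W, U, b)
print(h1.shape, c1.shape)  # (1, 4) (1, 4)
```

Both returned arrays keep the \[1 x 4] shape of the initial state, which is why the same tuple layout can be passed from one time step to the next.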

In [4]:
lstm = tf.keras.layers.LSTM(LSTM_CELL_SIZE, return_sequences=True, return_state=True)

lstm.states=state

# As we can see, the state has 2 parts: the output h and the cell state c.
print(lstm.states)

(<tf.Tensor: shape=(1, 4), dtype=float32, numpy=array([[0., 0., 0., 0.]], dtype=float32)>, <tf.Tensor: shape=(1, 4), dtype=float32, numpy=array([[0., 0., 0., 0.]], dtype=float32)>)


Let's define a sample input.

In [13]:
#Batch size x time steps x features.
sample_input = tf.constant([[3,2,2,2,2,2]],dtype=tf.float32)

batch_size = 1
sentence_max_length = 1
n_features = 6

new_shape = (batch_size, sentence_max_length, n_features)

inputs = tf.constant(np.reshape(sample_input, new_shape), dtype = tf.float32)
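
The layout expected here is batch size x time steps x features. As a rough NumPy-only illustration, the same 6 values can be viewed as one time step with 6 features (as in this cell) or, hypothetically, as 6 time steps with 1 feature each. The alternative shape is an assumption added for contrast and is not used in the notebook.

```python
import numpy as np

values = np.array([[3, 2, 2, 2, 2, 2]], dtype=np.float32)

# Layout used in this notebook: 1 sequence, 1 time step, 6 features.
one_step = values.reshape(1, 1, 6)

# Hypothetical alternative: 1 sequence, 6 time steps, 1 feature per step.
six_steps = values.reshape(1, 6, 1)

print(one_step.shape)   # (1, 1, 6)
print(six_steps.shape)  # (1, 6, 1)
```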

Now, we can pass the input to the LSTM layer and check the new state:

In [14]:
output, final_memory_state, final_carry_state = lstm(inputs)

print('Output shape: ', tf.shape(output))
print('Output :', output)

print('Memory shape: ', tf.shape(final_memory_state))
print('Memory : ', final_memory_state)

print('Carry state shape: ', tf.shape(final_carry_state))
print('Carry state: ', final_carry_state)

Output shape:  tf.Tensor([1 1 4], shape=(3,), dtype=int32)
Output : tf.Tensor([[[-0.15524554  0.0623856  -0.02357852  0.31590742]]], shape=(1, 1, 4), dtype=float32)
Memory shape:  tf.Tensor([1 4], shape=(2,), dtype=int32)
Memory :  tf.Tensor([[-0.15524554  0.0623856  -0.02357852  0.31590742]], shape=(1, 4), dtype=float32)
Carry state shape:  tf.Tensor([1 4], shape=(2,), dtype=int32)
Carry state:  tf.Tensor([[-0.25178537  0.07693953 -0.09550406  0.6387666 ]], shape=(1, 4), dtype=float32)
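
Two things in this output can be checked by hand. With a single time step, the sequence output and the final memory state h are the same vector. In addition, in the standard LSTM equations h = o * tanh(c), where o is the output gate, so h / tanh(c) should lie strictly between 0 and 1. The sketch below applies that check to the values printed above; it assumes the layer follows the standard formulation with its default tanh activation.

```python
import numpy as np

# Values copied from the printed output above.
h = np.array([-0.15524554, 0.0623856, -0.02357852, 0.31590742])
c = np.array([-0.25178537, 0.07693953, -0.09550406, 0.6387666])

# Standard LSTM relation: h = o * tanh(c), so the output gate is o = h / tanh(c).
o = h / np.tanh(c)
print(o)

# A sigmoid gate must lie strictly between 0 and 1.
print(np.all((o > 0) & (o < 1)))  # True
```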


# Stacked LSTM

What if we want an RNN with stacked LSTMs, for example a 2-layer LSTM? In this case, the output of the first layer becomes the input of the second.

Let's create the stacked LSTM cell:


In [16]:
cells = []

Creating the first-layer LSTM cell.

In [17]:
LSTM_CELL_SIZE_1 = 4 #4 hidden nodes
cell1 = tf.keras.layers.LSTMCell(LSTM_CELL_SIZE_1)
cells.append(cell1)

Creating the second-layer LSTM cell.

In [18]:
LSTM_CELL_SIZE_2 = 5 #5 hidden nodes
cell2 = tf.keras.layers.LSTMCell(LSTM_CELL_SIZE_2)
cells.append(cell2)
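
Because the first cell's output feeds the second cell, the second cell's input dimension is 4 rather than 6. One way to see the effect is to count trainable weights. The sketch below assumes the standard LSTM parameterization (four gates, each with an input kernel, a recurrent kernel, and a bias) and the 6-feature input used in this notebook; `lstm_param_count` is a hypothetical helper.

```python
def lstm_param_count(input_dim, units):
    """Weights in one LSTM cell: 4 gates x (input kernel + recurrent kernel + bias)."""
    return 4 * (input_dim * units + units * units + units)

n_features = 6
layer1 = lstm_param_count(n_features, 4)  # input: raw features
layer2 = lstm_param_count(4, 5)           # input: layer 1's 4-dim output

print(layer1, layer2)  # 176 200
```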

To create a multi-layer LSTM we use the <b>tf.keras.layers.StackedRNNCells</b> class. It takes a list of single-layer LSTM cells and wraps them so that they behave like one multi-layer stacked LSTM cell.

In [19]:
stacked_lstm =  tf.keras.layers.StackedRNNCells(cells)

In [20]:
#Now we can create the RNN from stacked_lstm:
lstm_layer = tf.keras.layers.RNN(stacked_lstm, return_sequences=True, return_state=True)

In [21]:
#Batch size x time steps x features.
sample_input = [[[1,2,3,4,3,2], [1,2,1,1,1,2],[1,2,2,2,2,2]],[[1,2,3,4,3,2],[3,2,2,1,1,2],[0,0,0,0,3,2]]]
sample_input

batch_size = 2
time_steps = 3
features = 6
new_shape = (batch_size, time_steps, features)

x = tf.constant(np.reshape(sample_input, new_shape), dtype = tf.float32)

In [22]:
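# With StackedRNNCells, the RNN layer returns one [h, c] pair per stacked cell.
# Despite their names, final_memory_state holds layer 1's [h, c] (size 4)
# and final_carry_state holds layer 2's [h, c] (size 5).
output, final_memory_state, final_carry_state = lstm_layer(x)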

In [23]:
print('Output shape: ', tf.shape(output))
print('Output : ', output)

print('Memory shape: ', tf.shape(final_memory_state))
print('Memory : ', final_memory_state)

print('Carry state shape: ', tf.shape(final_carry_state))
print('Carry state : ', final_carry_state)

Output shape:  tf.Tensor([2 3 5], shape=(3,), dtype=int32)
Output :  tf.Tensor(
[[[-0.02477117 -0.00665748 -0.01562491 -0.02800175  0.05869427]
  [-0.06074494  0.00439783 -0.02671911 -0.08622565  0.13191596]
  [-0.09918674  0.01481524 -0.04344293 -0.13013206  0.18972598]]

 [[-0.02477117 -0.00665748 -0.01562491 -0.02800175  0.05869428]
  [-0.05941437  0.01052649 -0.0226018  -0.09334762  0.13623793]
  [-0.06573982  0.00519138 -0.03357539 -0.08612452  0.15062016]]], shape=(2, 3, 5), dtype=float32)
Memory shape:  tf.Tensor([2 2 4], shape=(3,), dtype=int32)
Memory :  [<tf.Tensor: shape=(2, 4), dtype=float32, numpy=
array([[ 0.78775716, -0.37040275,  0.30532894,  0.03228419],
       [ 0.38800165, -0.12782474,  0.14623934,  0.13303787]],
      dtype=float32)>, <tf.Tensor: shape=(2, 4), dtype=float32, numpy=
array([[ 1.4996562 , -1.1974225 ,  0.78079283,  0.24307354],
       [ 1.5502831 , -0.6031351 ,  1.0484042 ,  0.28928846]],
      dtype=float32)>]
Carry state shape:  tf.Tensor([2 2 5], sh

As you can see, the output is of shape (2, 3, 5), which corresponds to the 2 sequences in our batch, the 3 time steps in each sequence, and the dimensionality of the output, which is 5 (the size of the second, topmost LSTM cell).
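
The shapes above can be reproduced with a small NumPy sketch of a 2-layer stack: at each time step, layer 1's output h becomes layer 2's input, and only layer 2's outputs are collected. The weights are random and the helpers `init_cell` and `lstm_step` are hypothetical, so the numbers will not match the TensorFlow output; only the shapes are meant to correspond.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_cell(input_dim, units, rng):
    """Random weights for one LSTM cell (illustrative initialization)."""
    W = rng.normal(scale=0.1, size=(input_dim, 4 * units))
    U = rng.normal(scale=0.1, size=(units, 4 * units))
    b = np.zeros(4 * units)
    return W, U, b

def lstm_step(x, h, c, params):
    """One LSTM time step with the standard gate equations."""
    W, U, b = params
    i, f, g, o = np.split(x @ W + h @ U + b, 4, axis=1)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

# Same sample input as in the notebook: (batch, time steps, features).
x = np.array([[[1, 2, 3, 4, 3, 2], [1, 2, 1, 1, 1, 2], [1, 2, 2, 2, 2, 2]],
              [[1, 2, 3, 4, 3, 2], [3, 2, 2, 1, 1, 2], [0, 0, 0, 0, 3, 2]]],
             dtype=np.float32)
batch_size, time_steps, features = x.shape

rng = np.random.default_rng(0)
sizes = [4, 5]  # units per layer
params = [init_cell(features, 4, rng), init_cell(4, 5, rng)]
states = [(np.zeros((batch_size, n)), np.zeros((batch_size, n))) for n in sizes]

outputs = []
for t in range(time_steps):
    layer_input = x[:, t, :]
    for k in range(len(sizes)):
        h, c = lstm_step(layer_input, *states[k], params[k])
        states[k] = (h, c)
        layer_input = h          # feed this layer's output upward
    outputs.append(layer_input)  # top layer's output at step t

output = np.stack(outputs, axis=1)
print(output.shape)                  # (2, 3, 5)
print([s[0].shape for s in states])  # [(2, 4), (2, 5)]
```

Both sequences start with the same first time step, so their first output rows coincide here, just as the first rows of the two sequences agree (up to rounding) in the TensorFlow output.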