## Bi-Directional LSTM exercise

Bi-Directional LSTMs aim to get the most out of the input sequence by stepping through the input time steps in both the forward and backward directions. In practice, this architecture involves duplicating the first recurrent layer in the network so that there are two layers side by side, then providing the input sequence as-is to the first layer and a reversed copy of the input sequence to the second.

This idea was developed some time ago as a general technique for improving the performance of Recurrent Neural Networks (RNNs).

It has been used to great effect with LSTM Recurrent Neural Networks. Providing the entire sequence both forwards and backwards rests on the assumption that the entire sequence is available. Feeding the input sequence bi-directionally is justified in the domain of speech recognition, because there is evidence that humans use the context of the whole utterance to interpret what is being said rather than a linear interpretation.
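
The side-by-side arrangement described above can be sketched with a toy recurrent layer in NumPy. This is only an illustration under simplifying assumptions: `simple_rnn` is a hypothetical plain tanh RNN (not an LSTM), the weights are random, and the sizes are chosen arbitrarily. It shows the data flow only: one copy of the layer reads the sequence as given, the other reads a reversed copy, and the two outputs are aligned per time step and joined.

```python
import numpy as np

def simple_rnn(x, w_in, w_rec):
    """Run a minimal tanh RNN over x with shape (timesteps, features)
    and return the hidden state at every time step."""
    h = np.zeros(w_rec.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(x_t @ w_in + h @ w_rec)
        states.append(h)
    return np.array(states)

rng = np.random.default_rng(0)
timesteps, features, units = 10, 1, 4  # arbitrary toy sizes
x = rng.random((timesteps, features))

# Each direction has its own, independent weights (random toy values).
w_in_f, w_rec_f = rng.normal(size=(features, units)), rng.normal(size=(units, units))
w_in_b, w_rec_b = rng.normal(size=(features, units)), rng.normal(size=(units, units))

# Forward layer sees the sequence as-is.
forward = simple_rnn(x, w_in_f, w_rec_f)
# Backward layer sees a reversed copy; flip its outputs back so that
# row t of both arrays refers to the same input time step.
backward = simple_rnn(x[::-1], w_in_b, w_rec_b)[::-1]

# Join the two views of each time step.
combined = np.concatenate([forward, backward], axis=-1)
print(combined.shape)  # (10, 8)
```

Note that the backward output at the last time step has only seen the last input, while the forward output at that step has seen the whole sequence; combining them gives every step context from both sides.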

## Keras Implementation

#model = Sequential()
#model.add(Bidirectional(LSTM(...), input_shape=(timesteps, features)))

Bidirectional LSTMs are supported in Keras via the **Bidirectional** layer wrapper, which essentially merges the outputs of two parallel LSTMs, one processing the input forwards and one processing it backwards. This wrapper takes a recurrent layer **(the first LSTM layer)** as an argument.

The **Bidirectional** wrapper layer also lets you specify the **merge_mode**, that is, how the forward and backward outputs should be combined before being passed on to the next layer.

* *'sum'*: The outputs are added together.
* *'mul'*: The outputs are multiplied together.
* *'concat'*: The outputs are concatenated together (default behavior), hence providing double the number of outputs to the next layer.
* *'ave'*: The average of the outputs is taken.

The default mode is to concatenate, and this is the method most often used in studies of Bi-Directional LSTMs. In general, though, it is worth testing each of the merge modes on your problem to see whether any of them improves on the concatenate option.
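
To make the four merge modes concrete, here is a small NumPy sketch of what each one does to a pair of forward and backward outputs. The `merge` helper and the toy arrays are hypothetical and exist only for illustration; this is not the Keras implementation, just the arithmetic the list above describes.

```python
import numpy as np

def merge(forward, backward, mode="concat"):
    """Combine forward and backward outputs of shape (timesteps, units).
    Hypothetical helper that mirrors the four merge modes listed above."""
    if mode == "sum":
        return forward + backward
    if mode == "mul":
        return forward * backward
    if mode == "ave":
        return (forward + backward) / 2.0
    if mode == "concat":
        return np.concatenate([forward, backward], axis=-1)
    raise ValueError("unknown merge mode: %s" % mode)

# Toy outputs for 2 time steps and 2 units per direction.
forward = np.array([[1.0, 2.0], [3.0, 4.0]])
backward = np.array([[0.5, 0.5], [2.0, 2.0]])

for mode in ("sum", "mul", "concat", "ave"):
    merged = merge(forward, backward, mode)
    print(mode, merged.shape)
```

Only `'concat'` changes the width of the output (here from 2 to 4 units); the other three modes keep the same number of units as a single direction, which also changes how many weights the next layer needs.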

In [3]:
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
from keras.layers import TimeDistributed
from keras.layers import Bidirectional

In [4]:
# create a cumulative sum sequence
def get_sequence(n_timesteps):
    X = np.array([np.random.random_sample() for _ in range(n_timesteps)])
    # calculate cut-off value to change class values
    limit = n_timesteps/4.0
    # determine the class outcome for each item in cumulative sequence
    y = np.array([0 if x < limit else 1 for x in np.cumsum(X)])
    return X, y

In [5]:
# create multiple samples of cumulative sum sequences
def get_sequences(n_sequences, n_timesteps):
    seqX, seqY = list(), list()
    # create and store sequences
    for _ in range(n_sequences):
        X, y = get_sequence(n_timesteps)
        seqX.append(X)
        seqY.append(y)
    # reshape input and output for lstm
    seqX = np.array(seqX).reshape(n_sequences, n_timesteps, 1)
    seqY = np.array(seqY).reshape(n_sequences, n_timesteps, 1)
    return seqX, seqY

Now let's check that these helper functions generate the sequences as expected.

In [6]:
X,y = get_sequences(2,10)
print(X)
print(y)

[[[0.33992229]
  [0.49760616]
  [0.32990696]
  [0.94534795]
  [0.92051324]
  [0.88647614]
  [0.30372564]
  [0.16002192]
  [0.17838608]
  [0.04900133]]

 [[0.02202858]
  [0.72145108]
  [0.49023022]
  [0.21037025]
  [0.83273424]
  [0.71582997]
  [0.77922793]
  [0.86375578]
  [0.02266436]
  [0.07758068]]]
[[[0]
  [0]
  [0]
  [0]
  [1]
  [1]
  [1]
  [1]
  [1]
  [1]]

 [[0]
  [0]
  [0]
  [0]
  [0]
  [1]
  [1]
  [1]
  [1]
  [1]]]
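
As a sanity check, the labels of the first printed sequence can be recomputed by hand. The sketch below copies the ten input values from the output above, applies the same rule as `get_sequence()` (class 1 once the running total reaches `n_timesteps/4.0`), and prints the resulting labels. The variable names are illustrative.

```python
import numpy as np

# Input values copied from the first sequence printed above.
X = np.array([0.33992229, 0.49760616, 0.32990696, 0.94534795, 0.92051324,
              0.88647614, 0.30372564, 0.16002192, 0.17838608, 0.04900133])

limit = len(X) / 4.0              # 2.5 for 10 time steps
running_total = np.cumsum(X)      # cumulative sum at each time step

# Same rule as get_sequence(): 0 while below the limit, 1 afterwards.
labels = (running_total >= limit).astype(int)
print(labels)  # [0 0 0 0 1 1 1 1 1 1]
```

The running total first exceeds 2.5 at the fifth value (about 3.03), which is where the printed labels switch from 0 to 1.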


In [7]:
# define problem
n_timesteps = 10
# define LSTM
model = Sequential()
model.add(Bidirectional(LSTM(50, return_sequences=True), input_shape=(n_timesteps, 1))) 
model.add(TimeDistributed(Dense(1, activation='sigmoid'))) 
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['acc']) 
model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
bidirectional_1 (Bidirection (None, 10, 100)           20800     
_________________________________________________________________
time_distributed_1 (TimeDist (None, 10, 1)             101       
=================================================================
Total params: 20,901
Trainable params: 20,901
Non-trainable params: 0
_________________________________________________________________
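
The parameter counts in the summary can be reproduced with a little arithmetic. The sketch below assumes the standard LSTM layout of four gate blocks, each with input weights, recurrent weights, and a bias; `lstm_params` and `dense_params` are hypothetical helper names, not Keras functions.

```python
def lstm_params(units, features):
    """Weights in one LSTM layer: 4 gate blocks, each with
    input weights, recurrent weights, and a bias vector."""
    return 4 * (units * features + units * units + units)

def dense_params(inputs, outputs):
    """Weights in a fully connected layer: weight matrix plus biases."""
    return inputs * outputs + outputs

units, features = 50, 1

# The Bidirectional wrapper holds two independent LSTMs.
bidirectional = 2 * lstm_params(units, features)
# With concatenation, the Dense layer sees 2 * units inputs per time step.
dense = dense_params(2 * units, 1)

print(bidirectional, dense, bidirectional + dense)  # 20800 101 20901
```

The doubled output width of 100 in the summary comes from the default concatenation of the two 50-unit LSTMs, and the `TimeDistributed` wrapper reuses the same 101 Dense weights at every time step.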


## Fitting the model

We can use the **get_sequences()** function to generate a large number of random examples on which to fit the model. Here we generate 40,000 examples, hold them in memory, and fit the model on them in a single Keras epoch.

In [9]:
# train LSTM
X, y = get_sequences(40000, n_timesteps)
model.fit(X, y, epochs=1, batch_size=10)

Epoch 1/1


<keras.callbacks.callbacks.History at 0x1369c5080>

## Evaluate the model

We can evaluate the model by generating 100 new random sequences and calculating the accuracy of the predictions made by the fitted model.

In [11]:
# evaluate LSTM
X, y = get_sequences(100, n_timesteps)
loss, acc = model.evaluate(X, y, verbose=0)
print('Loss: %f, Accuracy: %f' % (loss, acc*100))

Loss: 0.017961, Accuracy: 99.400002


Running the above code prints both the log loss and the accuracy. In this run the model reaches 99.4% accuracy on the 100 new sequences, which is close to 100% :)
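
It helps to be clear about what this accuracy counts. Assuming the metric is averaged over every individual time step (which is how binary accuracy behaves for an output of this shape, as far as I understand it), 99.4% on 100 sequences of 10 steps corresponds to roughly 994 of 1,000 steps being classified correctly, not 99.4% of whole sequences. The NumPy sketch below contrasts the two views on made-up data; `step_accuracy` and `sequence_accuracy` are hypothetical helpers.

```python
import numpy as np

def step_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of individual time steps classified correctly."""
    y_pred = (y_prob > threshold).astype(int)
    return float(np.mean(y_pred == y_true))

def sequence_accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of sequences in which every time step is correct."""
    y_pred = (y_prob > threshold).astype(int)
    return float(np.mean(np.all(y_pred == y_true, axis=(1, 2))))

# Two made-up sequences of 4 steps, shaped (samples, timesteps, 1).
y_true = np.array([[[0], [0], [1], [1]],
                   [[0], [1], [1], [1]]])
y_prob = np.array([[[0.1], [0.2], [0.9], [0.8]],
                   [[0.3], [0.4], [0.7], [0.9]]])

print(step_accuracy(y_true, y_prob))      # 0.875
print(sequence_accuracy(y_true, y_prob))  # 0.5
```

One wrong step out of eight still gives 87.5% per-step accuracy, but only half of the sequences are fully correct, so the per-sequence check is the stricter of the two.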

## Making predictions from the model

In [20]:
for _ in range(20):
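    X, y = get_sequences(1, n_timesteps)
    # threshold the sigmoid outputs at 0.5 (same result as predict_classes(), which newer Keras releases removed)
    yhat = (model.predict(X, verbose=0) > 0.5).astype('int32')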
    exp, pred = y.reshape(n_timesteps), yhat.reshape(n_timesteps)
    print('y=%s, yhat=%s, correct=%s' % (exp, pred, np.array_equal(exp,pred)))

y=[0 0 0 0 0 1 1 1 1 1], yhat=[0 0 0 0 0 1 1 1 1 1], correct=True
y=[0 0 0 0 0 0 1 1 1 1], yhat=[0 0 0 0 0 0 1 1 1 1], correct=True
y=[0 0 0 1 1 1 1 1 1 1], yhat=[0 0 0 1 1 1 1 1 1 1], correct=True
y=[0 0 0 0 0 1 1 1 1 1], yhat=[0 0 0 0 0 1 1 1 1 1], correct=True
y=[0 0 0 0 0 0 0 0 1 1], yhat=[0 0 0 0 0 0 0 0 1 1], correct=True
y=[0 0 0 0 1 1 1 1 1 1], yhat=[0 0 0 0 1 1 1 1 1 1], correct=True
y=[0 0 0 1 1 1 1 1 1 1], yhat=[0 0 0 1 1 1 1 1 1 1], correct=True
y=[0 0 0 0 0 0 1 1 1 1], yhat=[0 0 0 0 0 0 1 1 1 1], correct=True
y=[0 0 0 0 1 1 1 1 1 1], yhat=[0 0 0 0 1 1 1 1 1 1], correct=True
y=[0 0 0 1 1 1 1 1 1 1], yhat=[0 0 0 1 1 1 1 1 1 1], correct=True
y=[0 0 0 0 0 0 1 1 1 1], yhat=[0 0 0 0 0 0 1 1 1 1], correct=True
y=[0 0 0 0 1 1 1 1 1 1], yhat=[0 0 0 0 1 1 1 1 1 1], correct=True
y=[0 0 0 1 1 1 1 1 1 1], yhat=[0 0 0 1 1 1 1 1 1 1], correct=True
y=[0 0 0 0 0 1 1 1 1 1], yhat=[0 0 0 0 0 1 1 1 1 1], correct=True
y=[0 0 0 1 1 1 1 1 1 1], yhat=[0 0 0 1 1 1 1 1 1 1], correct=True
y=[0 0 0 1