# Bidirectional RNN

Why a bidirectional RNN?

Many NLP tasks involve detecting entities: names, dates, places, etc.
To detect these entities it is better to have information from the whole sequence, from beginning to end, and not only up to a particular timestep $t$.

"**General** relativity is an exciting theory about the physics of space and time".

In this sentence, "General" is not an entity.

"**General** Zod is an enemy of Superman"

In this sentence, "General" refers to a person.

This decision cannot be made without looking at the whole sentence.
Bidirectional RNNs are used for this kind of problem:

<img src="bidir-rnn.png">



* Does it still make sense to look only at the last state?

No. At the last timestep $T$ the backward RNN has only seen the final input; it has not yet processed the rest of the sequence. It makes more sense to define:

$$ out = [h^f_T, h^b_1] $$

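This applies when implementing a many-to-one model: $h^f_T$ is the state of the forward RNN after the last timestep, and $h^b_1$ is the state of the backward RNN once it has reached the first timestep.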

This is what the `Bidirectional` wrapper in Keras returns when `return_sequences=False`.

In Keras this is very easy to implement, by wrapping the recurrent layer:

LSTM(M) -> Bidirectional(LSTM(M))
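
To make the many-to-one output $[h^f_T, h^b_1]$ concrete, here is a minimal NumPy sketch. It uses a plain tanh RNN instead of an LSTM, and the helper `rnn_states` and the random weights are assumptions made only for illustration; it is not how Keras implements the layer internally.

```python
import numpy as np

def rnn_states(X, Wx, Wh, b):
    """Run a simple tanh RNN over X (shape T x D); return all hidden states (T x M)."""
    h = np.zeros(Wh.shape[0])
    states = []
    for x_t in X:
        h = np.tanh(x_t @ Wx + h @ Wh + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
T, D, M = 8, 2, 3
X = rng.standard_normal((T, D))

# Independent, randomly initialized weights for each direction (illustrative only)
fwd = (rng.standard_normal((D, M)), rng.standard_normal((M, M)), np.zeros(M))
bwd = (rng.standard_normal((D, M)), rng.standard_normal((M, M)), np.zeros(M))

h_f = rnn_states(X, *fwd)              # h_f[t] summarizes X[0..t]
h_b = rnn_states(X[::-1], *bwd)[::-1]  # re-aligned in time: h_b[t] summarizes X[t..T-1]

# Many-to-one output: last forward state and first (time-aligned) backward state
out = np.concatenate([h_f[-1], h_b[0]])
print(out.shape)  # (6,)
```

Note that `h_b[-1]`, the backward state aligned with the last timestep, depends only on the last input, which is why taking the last position of both directions would waste the backward pass.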

* When should bidirectional RNNs not be used?

When forecasting, since at prediction time $t_0$ there is no data for $t > t_0$ and the backward RNN would have nothing to process.
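
The dependence on future data can be checked with a small NumPy sketch. It assumes a plain tanh RNN with random weights (the helper names are hypothetical) and changes only the inputs after $t_0$: the forward state at $t_0$ should stay the same, while the backward state at $t_0$ should change.

```python
import numpy as np

def rnn_states(X, Wx, Wh):
    """Simple tanh RNN; returns the hidden state at every timestep."""
    h = np.zeros(Wh.shape[0])
    out = []
    for x_t in X:
        h = np.tanh(x_t @ Wx + h @ Wh)
        out.append(h)
    return np.stack(out)

rng = np.random.default_rng(1)
T, D, M, t0 = 8, 2, 3, 4
Wx_f, Wh_f = rng.standard_normal((D, M)), rng.standard_normal((M, M))
Wx_b, Wh_b = rng.standard_normal((D, M)), rng.standard_normal((M, M))

X = rng.standard_normal((T, D))
X_future = X.copy()
X_future[t0 + 1:] = rng.standard_normal((T - t0 - 1, D))  # modify only t > t0

def states_at_t0(seq):
    h_f = rnn_states(seq, Wx_f, Wh_f)[t0]              # depends on seq[0..t0]
    h_b = rnn_states(seq[::-1], Wx_b, Wh_b)[::-1][t0]  # depends on seq[t0..T-1]
    return h_f, h_b

f1, b1 = states_at_t0(X)
f2, b2 = states_at_t0(X_future)
print("forward state unchanged:", np.allclose(f1, f2))
print("backward state unchanged:", np.allclose(b1, b2))
```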

## How do return_state and return_sequences affect a Bidirectional RNN?

Let's write some test code to analyze the behavior:

In [1]:
from __future__ import print_function, division
from builtins import range, input

from keras.models import Model
from keras.layers import Input, LSTM, GRU, Bidirectional
import numpy as np
import matplotlib.pyplot as plt


T = 8  # Number of timesteps
D = 2  # Number of input features per timestep
M = 3  # Number of units in the hidden layer


X = np.random.randn(1, T, D)


input_ = Input(shape=(T, D))
#rnn = Bidirectional(LSTM(M, return_state=True, return_sequences=True),merge_mode="concat")
rnn = Bidirectional(LSTM(M, return_state=True, return_sequences=False),merge_mode="concat") 
# merge_mode defaults to "concat"; other options include 'sum', 'ave' and 'mul'
x = rnn(input_)
print(x)
model = Model(inputs=input_, outputs=x)
# o: merged output; h1, c1: forward LSTM states; h2, c2: backward LSTM states
o, h1, c1, h2, c2 = model.predict(X)
print("o:", o)
print("o.shape:", o.shape)
print("h1:", h1)
print("c1:", c1)
print("h2:", h2)
print("c2:", c2)

Using TensorFlow backend.


[<tf.Tensor 'bidirectional_1/concat:0' shape=(?, 6) dtype=float32>, <tf.Tensor 'bidirectional_1/while/Exit_3:0' shape=(?, 3) dtype=float32>, <tf.Tensor 'bidirectional_1/while/Exit_4:0' shape=(?, 3) dtype=float32>, <tf.Tensor 'bidirectional_1/while_1/Exit_3:0' shape=(?, 3) dtype=float32>, <tf.Tensor 'bidirectional_1/while_1/Exit_4:0' shape=(?, 3) dtype=float32>]
o: [[-0.26224297 -0.03154224  0.18159726 -0.10032201 -0.20889407 -0.00458867]]
o.shape: (1, 6)
h1: [[-0.26224297 -0.03154224  0.18159726]]
c1: [[-0.5859647  -0.07311182  0.41462934]]
h2: [[-0.10032201 -0.20889407 -0.00458867]]
c2: [[-0.15180665 -0.9164332  -0.014694  ]]
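
Reading the printed values: with `return_sequences=False` and `merge_mode="concat"`, `o` appears to be `h1` (forward) followed by `h2` (backward). The short check below copies the numbers from the output above and verifies this; it also sketches what the other merge modes would compute from the same two vectors. This is a sanity check on the printed numbers, not a call into Keras.

```python
import numpy as np

M = 3
# Values copied from the output printed above
o = np.array([[-0.26224297, -0.03154224, 0.18159726,
               -0.10032201, -0.20889407, -0.00458867]])
h1 = np.array([[-0.26224297, -0.03154224, 0.18159726]])   # forward hidden state
h2 = np.array([[-0.10032201, -0.20889407, -0.00458867]])  # backward hidden state

# merge_mode="concat": the first M entries are the forward state, the last M the backward one
assert np.allclose(o[:, :M], h1)
assert np.allclose(o[:, M:], h2)

# What the element-wise merge modes would produce from the same vectors
merged = {"sum": h1 + h2, "ave": (h1 + h2) / 2, "mul": h1 * h2}
for mode, value in merged.items():
    print(mode, value.shape)
```

With the element-wise modes the merged output has `M` entries instead of `2*M`, which matters when sizing any layer that consumes it.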


In [2]:
import numpy as np
# Function that loads the text pairs and returns them as one-hot encoded arrays
def get_data(data_path = 'fra-eng/fra.txt', num_samples = 10000):
    # num_samples: Number of samples to train on.
    # Vectorize the data.
    input_texts = []
    target_texts = []
    input_characters = set()
    target_characters = set()
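    # Read the file with an explicit encoding and close it when done
    with open(data_path, 'r', encoding='utf-8') as f:
        lines = f.read().split('\n')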
    for line in lines[: min(num_samples, len(lines) - 1)]:
        input_text, target_text = line.split('\t')
        # We use "tab" as the "start sequence" character
        # for the targets, and "\n" as "end sequence" character.
        target_text = '\t' + target_text + '\n'
        input_texts.append(input_text)
        target_texts.append(target_text)
        for char in input_text:
            if char not in input_characters:
                input_characters.add(char)
        for char in target_text:
            if char not in target_characters:
                target_characters.add(char)
    input_characters = sorted(list(input_characters))
    target_characters = sorted(list(target_characters))
    num_encoder_tokens = len(input_characters)
    num_decoder_tokens = len(target_characters)
    input_lengths = [len(txt) for txt in input_texts]
    output_lengths = [len(txt) for txt in target_texts]
    max_encoder_seq_length = max(input_lengths)
    max_decoder_seq_length = max(output_lengths)
    print('Traducción con secuencia mas larga (Notar el agregado de tab y enter):')
    print(input_texts[np.argmax(output_lengths)])
    print(target_texts[np.argmax(output_lengths)])

    print('Number of samples:', len(input_texts))
    print('Number of unique input tokens:', num_encoder_tokens)
    print('Number of unique output tokens:', num_decoder_tokens)
    print('Max sequence length for inputs:', max_encoder_seq_length)
    print('Max sequence length for outputs:', max_decoder_seq_length)

    input_token_index = dict(
        [(char, i) for i, char in enumerate(input_characters)])
    target_token_index = dict(
        [(char, i) for i, char in enumerate(target_characters)])
    encoder_input_data = np.zeros(
        (len(input_texts), max_encoder_seq_length, num_encoder_tokens),
        dtype='float32')
    decoder_input_data = np.zeros(
        (len(input_texts), max_decoder_seq_length, num_decoder_tokens),
        dtype='float32')
    decoder_target_data = np.zeros(
        (len(input_texts), max_decoder_seq_length, num_decoder_tokens),
        dtype='float32')

    for i, (input_text, target_text) in enumerate(zip(input_texts, target_texts)):
        for t, char in enumerate(input_text):
            encoder_input_data[i, t, input_token_index[char]] = 1.
        for t, char in enumerate(target_text):
            # decoder_target_data is ahead of decoder_input_data by one timestep
            decoder_input_data[i, t, target_token_index[char]] = 1.
            if t > 0:
                # decoder_target_data will be ahead by one timestep
                # and will not include the start character.
                decoder_target_data[i, t - 1, target_token_index[char]] = 1.
    return encoder_input_data, decoder_input_data, decoder_target_data, \
            input_token_index, target_token_index, \
            num_encoder_tokens, num_decoder_tokens, \
            max_encoder_seq_length, max_decoder_seq_length, \
            input_texts, target_texts

In [3]:
num_samples = 100000
encoder_input_data, decoder_input_data, decoder_target_data, \
input_token_index, target_token_index, \
num_encoder_tokens, num_decoder_tokens,  \
max_encoder_seq_length, \
max_decoder_seq_length, \
input_texts, target_texts = get_data(num_samples = num_samples)

Traducción con secuencia mas larga (Notar el agregado de tab y enter):
I figured I might be able to help.
	Je me suis imaginée que je pourrais être en mesure de donner un coup de main.

Number of samples: 100000
Number of unique input tokens: 80
Number of unique output tokens: 110
Max sequence length for inputs: 34
Max sequence length for outputs: 79


In [4]:
print('Idioma Ingles:')
print('Entrada encoder:', encoder_input_data.shape)
print('Idioma frances:')
print('Entrada decoder:', decoder_input_data.shape)
print('Salida decoder:', decoder_target_data.shape)

Idioma Ingles:
Entrada encoder: (100000, 34, 80)
Idioma frances:
Entrada decoder: (100000, 79, 110)
Salida decoder: (100000, 79, 110)
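
The shapes above are (samples, maximum sequence length, vocabulary size), and the decoder target is the decoder input shifted one step to the left (teacher forcing). The toy example below reproduces that layout for a single made-up target string; the text `'oui'` is a hypothetical example, not a line from the dataset.

```python
import numpy as np

# Toy target with '\t' as start marker and '\n' as end marker
target_text = '\t' + 'oui' + '\n'
chars = sorted(set(target_text))
index = {ch: i for i, ch in enumerate(chars)}

T, V = len(target_text), len(chars)
decoder_input = np.zeros((T, V), dtype='float32')
decoder_target = np.zeros((T, V), dtype='float32')
for t, ch in enumerate(target_text):
    decoder_input[t, index[ch]] = 1.0
    if t > 0:
        # The target at step t-1 is the character fed to the decoder at step t
        decoder_target[t - 1, index[ch]] = 1.0

# The target equals the input shifted left by one; its last row stays all zeros
print(np.array_equal(decoder_target[:-1], decoder_input[1:]))
print(decoder_target[-1].sum())
```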


In [None]:
from keras.models import Model
from keras.layers import Input, LSTM, Dense, TimeDistributed, concatenate, Bidirectional
# We are using the Functional API

# This is where the context will be stored
latent_dim = 128  # Latent dimensionality of the encoding space.

# Define an input sequence and process it.
encoder_inputs = Input(shape=(None, num_encoder_tokens), name="Encoder_Inputs")  # num_encoder_tokens is the number of input features
encoder = Bidirectional(LSTM(latent_dim, return_state=True, name="Encoder_LSTM"))
encoder_outputs = encoder(encoder_inputs)
# We discard the merged output (index 0) and only keep the states:
# indices 1, 2 are the forward h, c and indices 3, 4 the backward h, c.
encoder_states = [concatenate([encoder_outputs[1], encoder_outputs[3]]),  # h
                  concatenate([encoder_outputs[2], encoder_outputs[4]])]  # c
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None, num_decoder_tokens), name="Dencoder_Inputs")  # num_decoder_tokens is the number of input features of the decoder
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the 
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(2*latent_dim, return_sequences=True, return_state=True, name="Decoder_LSTM")
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
                                     initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax', name='Model_Output')
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
Encoder_Inputs (InputLayer)     (None, None, 80)     0                                            
__________________________________________________________________________________________________
bidirectional_2 (Bidirectional) [(None, 256), (None, 214016      Encoder_Inputs[0][0]             
__________________________________________________________________________________________________
Dencoder_Inputs (InputLayer)    (None, None, 110)    0                                            
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 256)          0           bidirectional_2[0][1]            
                                                                 bidirectional_2[0][3]            
__________
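
The parameter count reported for the bidirectional encoder can be reproduced by hand. The sketch below assumes the standard LSTM parameterization (four gates, each with an input kernel, a recurrent kernel and one bias vector); `lstm_params` is a helper written for this note, not a Keras function.

```python
def lstm_params(input_dim, units):
    # 4 gates, each with an input kernel, a recurrent kernel and a bias
    return 4 * (units * (input_dim + units) + units)

num_encoder_tokens = 80  # from the data summary printed earlier
latent_dim = 128

# The Bidirectional wrapper holds two independent LSTMs (forward and backward)
encoder_params = 2 * lstm_params(num_encoder_tokens, latent_dim)
print(encoder_params)  # 214016

# Concatenating forward and backward states doubles the state size,
# which is why the decoder LSTM is built with 2 * latent_dim units
print(2 * latent_dim)  # 256
```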

In [None]:
# Run training
batch_size = 256  # Batch size for training.
epochs = 100  # Number of epochs to train for.

model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=batch_size,
          epochs=epochs,
          validation_split=0.2)

Train on 80000 samples, validate on 20000 samples
Epoch 1/100
...
Epoch 41/100

In [None]:
# Inference encoder: maps an input sequence to the concatenated [h, c] states
encoder_model = Model(encoder_inputs, encoder_states)

# Inference decoder: takes the previous character and the previous states,
# and returns the next-character probabilities plus the updated states
decoder_state_input_h = Input(shape=(2*latent_dim,), name="State_input_h")
decoder_state_input_c = Input(shape=(2*latent_dim,), name="State_input_c")
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)

In [None]:
# Reverse-lookup token index to decode sequences back to
# something readable.
reverse_input_char_index = dict(
    (i, char) for char, i in input_token_index.items())
reverse_target_char_index = dict(
    (i, char) for char, i in target_token_index.items())

In [None]:
def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)

    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, target_token_index['\t']] = 1.

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict(
            [target_seq] + states_value)

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]
        decoded_sentence += sampled_char

        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or
           len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True

        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.

        # Update states
        states_value = [h, c]
    return decoded_sentence

In [None]:
for seq_index in range(100):
    # Take one sequence (part of the training set)
    # for trying out decoding.
    input_seq = encoder_input_data[seq_index: seq_index + 1]
    decoded_sentence = decode_sequence(input_seq)
    print('-')
    print('Input sentence:', input_texts[seq_index])
    print('Decoded sentence:', decoded_sentence)

In [None]:
# get_data returns 11 values; the texts are the last two
input_texts2, target_texts2 = get_data(num_samples = 20000)[-2:]

In [None]:
for seq_index in range(8000,8100):
    # Take one sequence (part of the training set)
    # for trying out decoding.
    input_seq = encoder_input_data[seq_index: seq_index + 1]
    decoded_sentence = decode_sequence(input_seq)
    print('-')
    print('Input sentence:', input_texts[seq_index])
    print('Decoded sentence:', decoded_sentence)