### AUTHOR: Isaac Reyes

# 12 Advanced Recurrent Neural Networks
Recurrent neural network (RNN) architectures are among the most widely used deep learning models for sequence modeling and processing. They extend traditional feedforward neural networks with recurrent connections, which give them temporal dynamics and a form of memory.

The Elman RNN uses a simple recurrent loop in its hidden layer: the previous hidden state is fed back into the hidden layer at each step. This lets it capture short-term temporal dependencies and makes it suitable for applications such as speech recognition and time series analysis.
The Jordan RNN instead feeds the output layer back into the hidden layer, so its context is the previous output rather than the previous hidden state. This feedback is intended to help it model longer-term dependencies, and it has been applied to machine translation and language modeling tasks.
The Bidirectional RNN processes the sequence both forward and backward, so each prediction can draw on past and future context. This makes it effective in natural language processing tasks such as sentiment analysis and named entity recognition.
These architectures expand the modeling capabilities of feedforward networks and are widely used in sequential data processing applications.
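
The three feedback patterns described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the Keras implementation used in the exercise: the function names, the `tanh` hidden activation, the sigmoid output, and the random weights are all assumptions chosen for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elman_forward(x_seq, W_x, W_h, b):
    """Elman RNN: the previous hidden state feeds back into the hidden layer."""
    h = np.zeros(W_h.shape[0])
    for x_t in x_seq:
        h = np.tanh(W_x @ x_t + W_h @ h + b)
    return h  # final hidden state

def jordan_forward(x_seq, W_x, W_y, b, W_out, b_out):
    """Jordan RNN: the previous output feeds back into the hidden layer."""
    y = np.zeros(W_out.shape[0])
    for x_t in x_seq:
        h = np.tanh(W_x @ x_t + W_y @ y + b)
        y = sigmoid(W_out @ h + b_out)
    return y  # final output

def bidirectional_forward(x_seq, fwd_params, bwd_params):
    """Run one Elman pass in each direction and concatenate the final states."""
    h_fwd = elman_forward(x_seq, *fwd_params)
    h_bwd = elman_forward(x_seq[::-1], *bwd_params)
    return np.concatenate([h_fwd, h_bwd])

# Toy dimensions and random weights (illustrative assumptions only).
rng = np.random.default_rng(0)
n_in, n_hidden, n_out, n_steps = 4, 3, 1, 5
x_seq = rng.normal(size=(n_steps, n_in))

def elman_params():
    return (rng.normal(size=(n_hidden, n_in)),
            rng.normal(size=(n_hidden, n_hidden)),
            np.zeros(n_hidden))

h_elman = elman_forward(x_seq, *elman_params())
y_jordan = jordan_forward(
    x_seq,
    rng.normal(size=(n_hidden, n_in)),
    rng.normal(size=(n_hidden, n_out)),
    np.zeros(n_hidden),
    rng.normal(size=(n_out, n_hidden)),
    np.zeros(n_out),
)
h_bi = bidirectional_forward(x_seq, elman_params(), elman_params())

print(h_elman.shape, y_jordan.shape, h_bi.shape)
```

The only structural difference between the first two functions is which vector is carried from one step to the next: the hidden state `h` for Elman, the output `y` for Jordan. The bidirectional wrapper doubles the size of the representation because it concatenates one state per direction.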

### Exercise

**Use the IMDB movie reviews dataset to perform sentiment analysis with an Elman, a Jordan, and a Bidirectional RNN. Highlight the differences in the performance of each architecture.**



In [5]:
# Import the required libraries
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
# Additional imports: the layers and backend used to build the three models
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense, Bidirectional, TimeDistributed, Lambda
from tensorflow.keras import backend as K


1. Load the IMDB movie reviews dataset

In [6]:
max_features = 5000  # Number of words to consider as features
max_len_short = 100  # Maximum sequence length for short sequences
max_len_long = 500   # Maximum sequence length for long sequences
# The values above are constants; load the dataset restricted to the max_features most frequent words
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

2. Pad sequences to a fixed length for RNN input

In [7]:
x_train_short = pad_sequences(x_train, maxlen=max_len_short)
x_test_short = pad_sequences(x_test, maxlen=max_len_short)
# Same data, padded or truncated to the long sequence length
x_train_long = pad_sequences(x_train, maxlen=max_len_long)
x_test_long = pad_sequences(x_test, maxlen=max_len_long)
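
To make the effect of the padding step concrete, the sketch below mimics what `pad_sequences` does with its default settings (`padding='pre'`, `truncating='pre'`, pad value 0): short reviews are left-padded with zeros, and long reviews keep only their last `maxlen` tokens. The helper `pad_pre` is a hypothetical stand-in written for illustration, not part of Keras.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences; keep only the last `maxlen` tokens of long ones."""
    out = np.full((len(sequences), maxlen), value, dtype=np.int64)
    for i, seq in enumerate(sequences):
        trunc = seq[-maxlen:]              # drop tokens from the start if too long
        if trunc:                          # skip empty sequences
            out[i, -len(trunc):] = trunc   # right-align, zeros fill the left side
    return out

# Two toy "reviews" encoded as word indices (made-up values).
toy_reviews = [[1, 14, 22], [1, 5, 6, 7, 8, 9, 10]]
padded = pad_pre(toy_reviews, maxlen=5)
print(padded)
# [[ 0  0  1 14 22]
#  [ 6  7  8  9 10]]
```

With `maxlen=100`, the models therefore see only the end of any review longer than 100 tokens, whereas `maxlen=500` preserves most or all of a typical review.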


3. Build the distinct RNN models

In [8]:
def build_elman_rnn_model():
    model = Sequential()
    model.add(Embedding(max_features, 32))
    model.add(SimpleRNN(32, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    return model

# Changes were needed here because a Jordan network cannot be implemented directly with the layers Keras provides:
# it requires feeding the network's output back in at the next time step, which the standard RNN layers do not allow.
# One way to approximate a Jordan network is to combine a SimpleRNN layer with
# return_sequences=True (to get the full sequence of outputs) with a TimeDistributed(Dense(...)) layer that transforms
# the output at every time step, and then a Lambda layer that sums those outputs over the time axis.
def build_jordan_rnn_model():
    model = Sequential()
    model.add(Embedding(max_features, 32))
    model.add(SimpleRNN(32, activation='relu', return_sequences=True))
    # Added: a per-timestep dense layer, then a sum over the time axis
    model.add(TimeDistributed(Dense(32)))
    model.add(Lambda(lambda x: K.sum(x, axis=1)))
    model.add(Dense(1, activation='sigmoid'))
    return model

def build_bidirectional_rnn_model():
    model = Sequential()
    model.add(Embedding(max_features, 32))
    model.add(Bidirectional(SimpleRNN(32, activation='relu')))
    model.add(Dense(1, activation='sigmoid'))
    return model
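
As a rough check on how the three models differ in capacity, the trainable parameter counts can be worked out by hand from the layer sizes above. The helper functions below are hypothetical and use the standard formulas for embedding, simple recurrent, and dense layers with biases; the resulting totals can be compared against `model.summary()` once each model has been built on real input.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim

def simple_rnn_params(input_dim, units):
    # input kernel + recurrent kernel + bias
    return units * (input_dim + units + 1)

def dense_params(input_dim, units):
    return units * (input_dim + 1)

max_features, emb_dim, units = 5000, 32, 32

elman_total = (embedding_params(max_features, emb_dim)
               + simple_rnn_params(emb_dim, units)
               + dense_params(units, 1))

jordan_total = (embedding_params(max_features, emb_dim)
                + simple_rnn_params(emb_dim, units)
                + dense_params(units, 32)      # TimeDistributed(Dense(32)) shares weights across time
                + dense_params(32, 1))

bidirectional_total = (embedding_params(max_features, emb_dim)
                       + 2 * simple_rnn_params(emb_dim, units)  # one RNN per direction
                       + dense_params(2 * units, 1))            # concatenated states

print(elman_total, jordan_total, bidirectional_total)
# 162113 163169 164225
```

In all three cases the embedding layer accounts for 160,000 parameters, so the architectures differ by only a few thousand weights.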

4. Train and evaluate the RNN model

In [9]:
def train_and_evaluate_model(model, x_train, y_train, x_test, y_test):
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    history = model.fit(x_train, y_train, epochs=5, batch_size=128, validation_split=0.2)
    loss, accuracy = model.evaluate(x_test, y_test)
    return loss, accuracy, history
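
The training settings above imply a fixed amount of work per model. The arithmetic below is a small sketch assuming the standard IMDB split of 25,000 training reviews; Keras holds out the last 20% of the training samples for validation before shuffling.

```python
import math

n_train = 25_000          # size of the IMDB training set (assumed standard split)
validation_split = 0.2
batch_size = 128
epochs = 5

n_val = round(n_train * validation_split)        # 5000 reviews held out for validation
n_fit = n_train - n_val                          # 20000 reviews used for weight updates
steps_per_epoch = math.ceil(n_fit / batch_size)  # 157 batches (the last one is partial)
total_updates = steps_per_epoch * epochs         # 785 gradient steps per model

print(n_val, n_fit, steps_per_epoch, total_updates)
# 5000 20000 157 785
```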

5. Train and evaluate the RNN model on short sequences

##### Elman RNN Model

In [10]:
print("\nTraining RNN model on short sequences:")
rnn_model_short = build_elman_rnn_model()
loss_short, accuracy_short, history_short = train_and_evaluate_model(
    rnn_model_short, x_train_short, y_train, x_test_short, y_test
)


Training RNN model on short sequences:
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


##### Jordan RNN Model

In [11]:
print("\nTraining Jordan RNN model on short sequences:")
jordan_model_short = build_jordan_rnn_model()
loss_short_jordan, accuracy_short_jordan, history_short_jordan = train_and_evaluate_model(
    jordan_model_short, x_train_short, y_train, x_test_short, y_test)


Training Jordan RNN model on short sequences:
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


##### Bidirectional RNN Model

In [12]:
print("\nTraining Bidirectional RNN model on short sequences:")
bidirectional_model_short = build_bidirectional_rnn_model()
loss_short_bidirectional, accuracy_short_bidirectional, history_short_bidirectional = train_and_evaluate_model(
    bidirectional_model_short, x_train_short, y_train, x_test_short, y_test)


Training Bidirectional RNN model on short sequences:
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


6. Train and evaluate the RNN model on long sequences

##### Elman RNN Model

In [13]:
print("\nTraining Elman RNN model on long sequences:")
rnn_model_long = build_elman_rnn_model()
loss_long, accuracy_long, history_long = train_and_evaluate_model(
    rnn_model_long, x_train_long, y_train, x_test_long, y_test
)


Training Elman RNN model on long sequences:
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


##### Jordan RNN Model

In [14]:
print("\nTraining Jordan RNN model on long sequences:")
jordan_model_long = build_jordan_rnn_model()
loss_long_jordan, accuracy_long_jordan, history_long_jordan = train_and_evaluate_model(
    jordan_model_long, x_train_long, y_train, x_test_long, y_test)


Training Jordan RNN model on long sequences:
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


##### Bidirectional RNN Model

In [15]:
print("\nTraining Bidirectional RNN model on long sequences:")
bidirectional_model_long = build_bidirectional_rnn_model()
loss_long_bidirectional, accuracy_long_bidirectional, history_long_bidirectional = train_and_evaluate_model(
    bidirectional_model_long, x_train_long, y_train, x_test_long, y_test)


Training Bidirectional RNN model on long sequences:
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


7. Compare the results

##### Elman RNN Model

In [16]:
print("\nResults on Short Sequences:")
print(f"Loss: {loss_short:.4f}, Accuracy: {accuracy_short:.4f}")

print("\nResults on Long Sequences:")
print(f"Loss: {loss_long:.4f}, Accuracy: {accuracy_long:.4f}")


Results on Short Sequences:
Loss: 0.3716, Accuracy: 0.8381

Results on Long Sequences:
Loss: 0.3818, Accuracy: 0.8470


##### Jordan RNN Model

In [17]:
print("\nResults on Short Sequences with Jordan RNN:")
print(f"Loss: {loss_short_jordan:.4f}, Accuracy: {accuracy_short_jordan:.4f}")

print("\nResults on Long Sequences with Jordan RNN:")
print(f"Loss: {loss_long_jordan:.4f}, Accuracy: {accuracy_long_jordan:.4f}")


Results on Short Sequences with Jordan RNN:
Loss: 0.6082, Accuracy: 0.8200

Results on Long Sequences with Jordan RNN:
Loss: 0.4747, Accuracy: 0.8528


##### Bidirectional RNN Model

In [18]:
print("\nResults on Short Sequences with Bidirectional RNN:")
print(f"Loss: {loss_short_bidirectional:.4f}, Accuracy: {accuracy_short_bidirectional:.4f}")

print("\nResults on Long Sequences with Bidirectional RNN:")
print(f"Loss: {loss_long_bidirectional:.4f}, Accuracy: {accuracy_long_bidirectional:.4f}")


Results on Short Sequences with Bidirectional RNN:
Loss: 0.3698, Accuracy: 0.8378

Results on Long Sequences with Bidirectional RNN:
Loss: 0.3463, Accuracy: 0.8561
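
For a side-by-side view, the test losses and accuracies printed above can be collected into one structure. The snippet below only re-enters the reported numbers and computes the accuracy gain from short to long sequences; the dictionary layout is an illustrative choice.

```python
# (loss, accuracy) on the test set, copied from the outputs above
results = {
    "Elman":         {"short": (0.3716, 0.8381), "long": (0.3818, 0.8470)},
    "Jordan":        {"short": (0.6082, 0.8200), "long": (0.4747, 0.8528)},
    "Bidirectional": {"short": (0.3698, 0.8378), "long": (0.3463, 0.8561)},
}

gains = {}
print(f"{'Model':<15}{'Acc short':>10}{'Acc long':>10}{'Gain':>8}")
for name, r in results.items():
    acc_short, acc_long = r["short"][1], r["long"][1]
    gains[name] = round(acc_long - acc_short, 4)
    print(f"{name:<15}{acc_short:>10.4f}{acc_long:>10.4f}{gains[name]:>8.4f}")

best_short = max(results, key=lambda n: results[n]["short"][1])
best_long = max(results, key=lambda n: results[n]["long"][1])
print("Best on short sequences:", best_short)
print("Best on long sequences:", best_long)
```

On these numbers the Jordan model gains the most from longer inputs (0.0328), the Elman model the least (0.0089), and the top accuracies on short sequences differ by only 0.0003 between the Elman and Bidirectional models.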


In this study, we experimented with three types of recurrent neural network (RNN) architecture: Elman RNN, Jordan RNN, and Bidirectional RNN. Each model was trained and evaluated in two configurations: with short input sequences (100 tokens) and with long input sequences (500 tokens).

Overall, all models proved competent at the classification task, reaching an accuracy of at least 0.82 on the test set. This supports the effectiveness of RNNs on tasks involving sequential data, such as classifying opinions from text reviews.

Comparing the RNN types, the Jordan model outperformed the Elman model on long sequences (accuracy 0.8528 vs. 0.8470) but did slightly worse on short sequences (0.8200 vs. 0.8381). This suggests that the long-term "memory" capabilities of the Jordan model may be especially useful when handling longer sequences, keeping in mind that the Jordan model used here is the Keras approximation described in the code comments rather than a network with true output feedback.

The Bidirectional models, for their part, performed comparably to or better than the Elman and Jordan models on both short and long sequences. This robustness may stem from their ability to process the input sequence in both the forward and the backward direction.

Regarding sequence length, all three models (Elman, Jordan, and Bidirectional) achieved slightly higher accuracy with longer sequences. This may indicate that having more information available, in the form of a longer input sequence, benefits the task: longer sequences likely provide additional context that makes the overall sentiment of a review easier to determine.

In conclusion, although all the models evaluated were effective at sentiment classification, the Bidirectional models performed slightly better overall, and using longer input sequences improved the accuracy of every model. The differences are small, at most a few percentage points, but these findings could be useful for future work on similar natural language processing tasks.