# **Practice 2.2. Recurrent Neural Networks**

- Alejandro Dopico Castro ([alejandro.dopico2@udc.es](mailto:alejandro.dopico2@udc.es)).
- Ana Xiangning Pereira Ezquerro ([ana.ezquerro@udc.es](mailto:ana.ezquerro@udc.es)).

This notebook contains execution examples of the recurrent neural architectures proposed for the [Amazon Reviews dataset](https://www.kaggle.com/datasets/bittlingmayer/amazonreviews). The submitted Python scripts include auxiliary code that keeps the code cells readable:

- [data.py](data.py): Defines the `AmazonDataset` class to load, split, transform and stream the Amazon Reviews dataset. 
- [recurrent_models.py](recurrent_models.py): Defines the `create_recurrent_model` function to instantiate a Keras model varying its architecture. 
- [utils.py](utils.py): Defines auxiliary functions to train a Keras model and plot its performance.

In [1]:
from collections import OrderedDict
from itertools import product
from typing import Dict, Tuple

import pandas as pd
import plotly.io as pio
from keras.layers import GRU, LSTM, SimpleRNN
from keras.optimizers import Adam, RMSprop
from keras.regularizers import Regularizer, L1, L2, L1L2

from data import AmazonDataset
from model import AmazonReviewsModel

pio.renderers.default = "vscode"

# global parameters 
MAX_FEATURES = 1000
MODEL_PATH = 'results/'
model_accuracies: Dict[str, Tuple[float, float]] = dict()

# model default parameters
train_default = dict(epochs=30, batch_size=1000, lr=1e-3, dev_patience=5)

# load data
path_dir = 'AmazonDataset/'
dataset = AmazonDataset.load(train_path=path_dir + "train_small.txt", test_path=path_dir + "test_small.txt", max_features=MAX_FEATURES)
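
The `dev_patience=5` entry in `train_default` is not documented in this notebook. A minimal sketch, assuming it is an early-stopping patience (training stops once the validation loss has failed to improve for that many consecutive epochs). The function name and the loss values below are hypothetical and are not taken from `utils.py` or from any run.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which patience-based early stopping would halt."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # new best validation loss: reset the counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    # patience never exhausted: training runs for all epochs
    return len(val_losses)

# hypothetical validation losses, for illustration only
losses = [0.63, 0.48, 0.39, 0.39, 0.34, 0.37, 0.36, 0.35, 0.36, 0.38, 0.33]
stop = early_stopping_epoch(losses, patience=5)
```

In this toy sequence the best loss occurs at epoch 5, so five epochs without improvement are reached at epoch 10 and the later, better value is never seen.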


## Simple Recurrent Baseline 

We used a simple recurrent architecture to set our baseline performance. The model is composed of two stacked modules: a recurrent encoder of two stacked [RNN cells](https://keras.io/api/layers/recurrent_layers/simple_rnn/) ([Rumelhart et al., 1985](https://stanford.edu/~jlmcc/papers/PDP/Volume%201/Chap8_PDP86.pdf)) and a [feed-forward layer](https://keras.io/api/layers/core_layers/dense/) with a sigmoid activation that returns the probability of a good review. We used an input embedding layer of dimension $d_x=64$ and kept the dimension of the decoder at $d_h=64$.
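
To make the data flow of this baseline concrete, here is a minimal NumPy sketch of the forward pass: embedding lookup, two stacked tanh RNN layers, and a sigmoid over the last hidden state. It is an illustration under assumptions, not the code in `AmazonReviewsModel`: the function names, the random initialization and the toy token ids are all hypothetical.

```python
import numpy as np

def simple_rnn_layer(x, W_x, W_h, b):
    """Run one tanh RNN layer over x of shape (T, d_in); return all states, shape (T, d_h)."""
    h = np.zeros(W_h.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        states.append(h)
    return np.stack(states)

def baseline_forward(token_ids, params):
    """Embedding -> stacked RNN layers -> sigmoid over the last hidden state."""
    x = params["embedding"][token_ids]              # (T, d_x)
    for W_x, W_h, b in params["rnn"]:
        x = simple_rnn_layer(x, W_x, W_h, b)        # (T, d_h)
    logit = x[-1] @ params["w_out"] + params["b_out"]
    return 1.0 / (1.0 + np.exp(-logit))

rng = np.random.default_rng(0)
vocab, d_x, d_h = 1000, 64, 64                      # sizes from the text above
params = {
    "embedding": rng.normal(0, 0.1, (vocab, d_x)),
    "rnn": [
        (rng.normal(0, 0.1, (d_x, d_h)), rng.normal(0, 0.1, (d_h, d_h)), np.zeros(d_h)),
        (rng.normal(0, 0.1, (d_h, d_h)), rng.normal(0, 0.1, (d_h, d_h)), np.zeros(d_h)),
    ],
    "w_out": rng.normal(0, 0.1, d_h),
    "b_out": 0.0,
}
prob = baseline_forward(np.array([5, 42, 7, 999]), params)  # toy review of 4 token ids
```

The second layer consumes the full sequence of states produced by the first, and only its final state reaches the output layer.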

In [2]:
simpleRNN_model = AmazonReviewsModel(MAX_FEATURES, 64, SimpleRNN, name='SimpleRNN')
_, fig = simpleRNN_model.train(dataset, f'{MODEL_PATH}/simpleRNN.weights.h5', **train_default)
print(simpleRNN_model.evaluate(dataset.X_test, dataset.y_test))
fig

Epoch 1/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 7s 249ms/step - accuracy: 0.5377 - loss: 0.6902 - val_accuracy: 0.7042 - val_loss: 0.6339
Epoch 2/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 5s 228ms/step - accuracy: 0.7482 - loss: 0.5542 - val_accuracy: 0.7890 - val_loss: 0.4816
Epoch 3/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 4s 216ms/step - accuracy: 0.8226 - loss: 0.4150 - val_accuracy: 0.8296 - val_loss: 0.3937
Epoch 4/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 4s 218ms/step - accuracy: 0.8627 - loss: 0.3324 - val_accuracy: 0.8270 - val_loss: 0.3909
Epoch 5/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 4s 218ms/step - accuracy: 0.8835 - loss: 0.2920 - val_accuracy: 0.8546 - val_loss: 0.3439
Epoch 6/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 4s 217ms/step - accuracy: 0.9029 - loss: 0.2453 - val_accuracy: 0.8492 - val_loss: 0.3685
Epoch 7/30
20/20 ━━━━━━━━━

The [simple RNN cell](https://keras.io/api/layers/recurrent_layers/simple_rnn/) reaches 85.28% test accuracy and is able to fit up to 95% of the training data. In the next cells we test two alternatives to the simple RNN cell: the [LSTM](https://keras.io/api/layers/recurrent_layers/lstm/) ([Hochreiter et al., 1997](https://www.bioinf.jku.at/publications/older/2604.pdf)) and the [GRU](https://keras.io/api/layers/recurrent_layers/gru/) ([Chung et al., 2014](https://arxiv.org/abs/1412.3555)). Both cells are claimed to improve on the simple RNN through a better internal representation of the temporal data flow, obtained by introducing several gates, each with its own learnable weights.
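
As a rough illustration of what "gates with their own learnable weights" means, the sketch below implements a single GRU update in NumPy. It follows one common formulation, $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$; some references swap the roles of $z_t$ and $1 - z_t$, and library implementations may differ in details such as biases. Biases are omitted and all parameter names are hypothetical.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU update; every gate has its own input (W_*) and recurrent (U_*) matrix."""
    z = sigmoid(x_t @ p["W_z"] + h_prev @ p["U_z"])             # update gate
    r = sigmoid(x_t @ p["W_r"] + h_prev @ p["U_r"])             # reset gate
    h_cand = np.tanh(x_t @ p["W_h"] + (r * h_prev) @ p["U_h"])  # candidate state
    # interpolate between the previous state and the candidate
    return (1.0 - z) * h_prev + z * h_cand

d_x, d_h = 4, 3
rng = np.random.default_rng(0)
p = {name: rng.normal(0, 0.5, (d_x if name[0] == "W" else d_h, d_h))
     for name in ["W_z", "U_z", "W_r", "U_r", "W_h", "U_h"]}
h_next = gru_step(rng.normal(size=d_x), np.zeros(d_h), p)
```

The update gate decides how much of the previous state is overwritten, which is what lets the cell carry information across many time steps. An LSTM follows the same idea with a separate cell state and three gates.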

In [3]:
lstm_model = AmazonReviewsModel(MAX_FEATURES, 64, LSTM, name='LSTM')
_, fig = lstm_model.train(dataset, f'{MODEL_PATH}/LSTM.weights.h5', **train_default)
print(lstm_model.evaluate(dataset.X_test, dataset.y_test))
fig

Epoch 1/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 12s 501ms/step - accuracy: 0.5398 - loss: 0.6804 - val_accuracy: 0.6486 - val_loss: 0.6754
Epoch 2/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 10s 482ms/step - accuracy: 0.7283 - loss: 0.5610 - val_accuracy: 0.8026 - val_loss: 0.4263
Epoch 3/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 10s 482ms/step - accuracy: 0.8264 - loss: 0.4039 - val_accuracy: 0.8388 - val_loss: 0.3723
Epoch 4/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 10s 481ms/step - accuracy: 0.8525 - loss: 0.3483 - val_accuracy: 0.8530 - val_loss: 0.3555
Epoch 5/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 10s 480ms/step - accuracy: 0.8649 - loss: 0.3217 - val_accuracy: 0.8458 - val_loss: 0.3506
Epoch 6/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 10s 480ms/step - accuracy: 0.8681 - loss: 0.3167 - val_accuracy: 0.8552 - val_loss: 0.3365
Epoch 7/30
20/20 ━━━

In [4]:
gru_model = AmazonReviewsModel(MAX_FEATURES, 64, GRU, name='GRU')
_, fig = gru_model.train(dataset, f'{MODEL_PATH}/GRU.weights.h5', **train_default)
print(gru_model.evaluate(dataset.X_test, dataset.y_test))
fig

Epoch 1/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 12s 519ms/step - accuracy: 0.5333 - loss: 0.6863 - val_accuracy: 0.6798 - val_loss: 0.6048
Epoch 2/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 10s 494ms/step - accuracy: 0.7167 - loss: 0.5555 - val_accuracy: 0.7952 - val_loss: 0.4464
Epoch 3/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 10s 492ms/step - accuracy: 0.7903 - loss: 0.4590 - val_accuracy: 0.8150 - val_loss: 0.4084
Epoch 4/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 10s 488ms/step - accuracy: 0.8216 - loss: 0.4247 - val_accuracy: 0.8544 - val_loss: 0.3980
Epoch 5/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 10s 488ms/step - accuracy: 0.8500 - loss: 0.3652 - val_accuracy: 0.8592 - val_loss: 0.3356
Epoch 6/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 10s 485ms/step - accuracy: 0.8724 - loss: 0.3129 - val_accuracy: 0.8576 - val_loss: 0.3455
Epoch 7/30
20/20 ━━━

Keeping the same architecture and only replacing the simple RNN layer with LSTMs or GRUs, the accuracy reaches 85.6% and 85.77%, respectively. The difference between the simple RNN, the LSTM and the GRU is not significant, so, at least with a small architecture, we do not benefit from the additional power of the LSTM and GRU.
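
For a sense of how much extra capacity the gated cells add at this size, the following back-of-the-envelope count may help. It assumes the textbook layout in which each gate owns one input matrix, one recurrent matrix and one bias vector; a framework may report slightly different numbers depending on how it handles biases. The helper name is hypothetical.

```python
def recurrent_params(d_x, d_h, n_gates):
    """Parameters of one recurrent layer: per gate, an input matrix, a recurrent matrix and a bias."""
    return n_gates * (d_x * d_h + d_h * d_h + d_h)

d_x = d_h = 64  # baseline dimensions used above
counts = {
    "SimpleRNN": recurrent_params(d_x, d_h, n_gates=1),
    "GRU": recurrent_params(d_x, d_h, n_gates=3),   # update, reset, candidate
    "LSTM": recurrent_params(d_x, d_h, n_gates=4),  # input, forget, output, candidate
}
print(counts)  # {'SimpleRNN': 8256, 'GRU': 24768, 'LSTM': 33024}
```

Under these assumptions a GRU layer has three times and an LSTM layer four times the weights of a simple RNN layer, which is consistent with the longer step times in the logs above.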

## Enhancing the architecture with regularization techniques

Now that we have a first estimate of the performance with small models, we launch experiments with larger architectures. We increased the model dimension to $d_h=256$ and the vocabulary size to $|\mathcal{V}|=2000$. The encoder is now composed of three stacked recurrent cells, and the decoder adds an extra feed-forward network between the last state of the encoder and the output layer. To balance this increase in capacity and avoid possible overfitting, we included a 10% [dropout](https://keras.io/api/layers/regularization_layers/dropout/) in the latent space of the network (between the encoder and the decoder).
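
The effect of a 10% dropout on the latent vector can be sketched in a few lines of NumPy. This is the usual "inverted dropout" scheme (surviving units are rescaled during training so that inference needs no correction); it is an illustration with hypothetical names, not the Keras layer's source code.

```python
import numpy as np

def dropout(h, rate, training, rng):
    """Inverted dropout: zero a fraction `rate` of units and rescale the survivors."""
    if not training or rate == 0.0:
        return h  # identity at inference time
    keep = rng.random(h.shape) >= rate
    return h * keep / (1.0 - rate)

rng = np.random.default_rng(0)
h = np.ones(10_000)                      # stand-in for the encoder's last state
h_train = dropout(h, rate=0.1, training=True, rng=rng)
h_eval = dropout(h, rate=0.1, training=False, rng=rng)
```

During training roughly one unit in ten is zeroed and the rest are scaled by $1/0.9$, so the expected activation is unchanged; at evaluation time the vector passes through untouched.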

In [2]:
# reload the dataset with the larger vocabulary
dataset = AmazonDataset.load(train_path=path_dir + "train_small.txt", test_path=path_dir + "test_small.txt", max_features=2000)

In [3]:
simpleRNN_enhanced = AmazonReviewsModel(
    2000, 256, SimpleRNN, num_recurrent_layers=3, dropout=0.1, ffn_dims=[64], name='SimpleRNN-enhanced')
_, fig = simpleRNN_enhanced.train(dataset, f'{MODEL_PATH}/simpleRNN-enhanced.weights.h5', **train_default)
print(simpleRNN_enhanced.evaluate(dataset.X_test, dataset.y_test))
fig

Epoch 1/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 153s 7s/step - accuracy: 0.5107 - loss: 0.7301 - val_accuracy: 0.4936 - val_loss: 0.6968
Epoch 2/30
 8/20 ━━━━━━━━━━━━━━━━━━━━ 1:14 6s/step - accuracy: 0.5083 - loss: 0.6979

In [13]:
lstm_enhanced = AmazonReviewsModel(
    2000, 256, LSTM, num_recurrent_layers=3, dropout=0.1, ffn_dims=[64], name='LSTM-enhanced')
_, fig = lstm_enhanced.train(dataset, f'{MODEL_PATH}/LSTM-enhanced.weights.h5', **train_default)
print(lstm_enhanced.evaluate(dataset.X_test, dataset.y_test))
fig

Epoch 1/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 66s 3s/step - accuracy: 0.5337 - loss: 0.6945 - val_accuracy: 0.5970 - val_loss: 0.6842
Epoch 2/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 62s 3s/step - accuracy: 0.6430 - loss: 0.6539 - val_accuracy: 0.5432 - val_loss: 0.6902
Epoch 3/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 61s 3s/step - accuracy: 0.6157 - loss: 0.6541 - val_accuracy: 0.7724 - val_loss: 0.5006
Epoch 4/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 61s 3s/step - accuracy: 0.8158 - loss: 0.4288 - val_accuracy: 0.8348 - val_loss: 0.3854
Epoch 5/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 62s 3s/step - accuracy: 0.8596 - loss: 0.3419 - val_accuracy: 0.8308 - val_loss: 0.3879
Epoch 6/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 63s 3s/step - accuracy: 0.8639 - loss: 0.3322 - val_accuracy: 0.8512 - val_loss: 0.3423
Epoch 7/30
20/20 ━━━━━━━━━━━━━━━━━━━━

In [14]:
gru_enhanced = AmazonReviewsModel(
    2000, 256, GRU, num_recurrent_layers=3, dropout=0.1, ffn_dims=[64], name='GRU-enhanced')
_, fig = gru_enhanced.train(dataset, f'{MODEL_PATH}/GRU-enhanced.weights.h5', **train_default)
print(gru_enhanced.evaluate(dataset.X_test, dataset.y_test))
fig

Epoch 1/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 56s 3s/step - accuracy: 0.5730 - loss: 0.6736 - val_accuracy: 0.6202 - val_loss: 0.6326
Epoch 2/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 54s 3s/step - accuracy: 0.6991 - loss: 0.5810 - val_accuracy: 0.7580 - val_loss: 0.4889
Epoch 3/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 53s 3s/step - accuracy: 0.7613 - loss: 0.5023 - val_accuracy: 0.7434 - val_loss: 0.5520
Epoch 4/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 53s 3s/step - accuracy: 0.7889 - loss: 0.4740 - val_accuracy: 0.8336 - val_loss: 0.3932
Epoch 5/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 55s 3s/step - accuracy: 0.8658 - loss: 0.3268 - val_accuracy: 0.8634 - val_loss: 0.3163
Epoch 6/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 53s 3s/step - accuracy: 0.8818 - loss: 0.2876 - val_accuracy: 0.8680 - val_loss: 0.3154
Epoch 7/30
20/20 ━━━━━━━━━━━━━━━━━━━━

We see a slight improvement with the LSTM- and GRU-based architectures when increasing the number of learnable parameters (both the train and the test set metrics improve). However, the 

## Bidirectional Processing
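
The cells in this section pass `bidirectional=True`. How `AmazonReviewsModel` implements that flag is not shown in this notebook; a common approach, and the assumption behind this sketch, is to run one recurrent layer left-to-right and a second one right-to-left, then concatenate their final states. All names below are hypothetical and biases are omitted.

```python
import numpy as np

def rnn_last_state(x, W_x, W_h):
    """Final hidden state of a tanh RNN run over x of shape (T, d_in)."""
    h = np.zeros(W_h.shape[0])
    for x_t in x:
        h = np.tanh(x_t @ W_x + h @ W_h)
    return h

def bidirectional_encode(x, fwd, bwd):
    """Concatenate the final states of a forward pass and a pass over the reversed sequence."""
    h_fwd = rnn_last_state(x, *fwd)        # reads token 1 .. T
    h_bwd = rnn_last_state(x[::-1], *bwd)  # reads token T .. 1, with its own weights
    return np.concatenate([h_fwd, h_bwd])

rng = np.random.default_rng(0)
T, d_x, d_h = 6, 8, 5
x = rng.normal(size=(T, d_x))
fwd = (rng.normal(0, 0.3, (d_x, d_h)), rng.normal(0, 0.3, (d_h, d_h)))
bwd = (rng.normal(0, 0.3, (d_x, d_h)), rng.normal(0, 0.3, (d_h, d_h)))
h = bidirectional_encode(x, fwd, bwd)
```

With concatenation the encoder output has dimension $2 d_h$, and each direction summarizes the review starting from the opposite end.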

In [3]:
bilstm_model = AmazonReviewsModel(
    2000, 256, LSTM, num_recurrent_layers=4, dropout=0.15, ffn_dims=[128, 64], name='BiLSTM', bidirectional=True)
_, fig = bilstm_model.train(dataset, f'{MODEL_PATH}/BiLSTM.weights.h5', **train_default)
print(bilstm_model.evaluate(dataset.X_test, dataset.y_test))
fig

Epoch 1/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 145s 7s/step - accuracy: 0.5442 - loss: 0.7215 - val_accuracy: 0.7280 - val_loss: 0.6726
Epoch 2/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 144s 7s/step - accuracy: 0.7096 - loss: 0.6583 - val_accuracy: 0.6368 - val_loss: 0.6168
Epoch 3/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 147s 7s/step - accuracy: 0.7304 - loss: 0.5381 - val_accuracy: 0.8328 - val_loss: 0.3871
Epoch 4/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 139s 7s/step - accuracy: 0.8407 - loss: 0.3601 - val_accuracy: 0.8636 - val_loss: 0.3224
Epoch 5/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 136s 7s/step - accuracy: 0.8632 - loss: 0.3134 - val_accuracy: 0.8720 - val_loss: 0.3135
Epoch 6/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 136s 7s/step - accuracy: 0.8798 - loss: 0.2880 - val_accuracy: 0.8726 - val_loss: 0.3101
Epoch 7/30
20/20 ━━━━━━━━━━━━━━━

In [None]:
bigru_model = AmazonReviewsModel(
    2000, 256, GRU, num_recurrent_layers=4, dropout=0.15, ffn_dims=[128, 64], name='BiGRU', bidirectional=True)
_, fig = bigru_model.train(dataset, f'{MODEL_PATH}/BiGRU.weights.h5', **train_default)
print(bigru_model.evaluate(dataset.X_test, dataset.y_test))
fig

Epoch 1/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 124s 6s/step - accuracy: 0.6234 - loss: 0.6343 - val_accuracy: 0.7934 - val_loss: 0.5149
Epoch 2/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 118s 6s/step - accuracy: 0.8085 - loss: 0.4580 - val_accuracy: 0.8332 - val_loss: 0.3783
Epoch 3/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 118s 6s/step - accuracy: 0.8504 - loss: 0.3389 - val_accuracy: 0.7246 - val_loss: 0.5123
Epoch 4/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 117s 6s/step - accuracy: 0.7922 - loss: 0.4249 - val_accuracy: 0.8448 - val_loss: 0.3548
Epoch 5/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 116s 6s/step - accuracy: 0.8677 - loss: 0.3094 - val_accuracy: 0.8552 - val_loss: 0.3484
Epoch 6/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 118s 6s/step - accuracy: 0.8812 - loss: 0.2815 - val_accuracy: 0.8690 - val_loss: 0.3142
Epoch 7/30
20/20 ━━━━━━━━━━━━━━━

## Optimal configuration of the recurrent architecture 

In [27]:
grid = OrderedDict(
    regularizer = [L1(1e-3), L2(1e-3), L1L2(1e-4)],
    initializer=['random_normal', 'glorot_uniform', 'glorot_normal', 'he_normal', 'orthogonal'],
    optimizer=[Adam, RMSprop]
)
Regularizer.__repr__ = lambda x: x.__class__.__name__

def tostring(x):
    if isinstance(x, type):
        return x.__name__
    else:
        return repr(x)

def applydeep(lists, func):
    result = []
    for item in lists:
        result.append(list(map(func, item)))
    return result

df = pd.DataFrame(columns=['train', 'val', 'test'], 
                  index=pd.MultiIndex.from_product(applydeep(grid.values(), tostring)))
for params in product(*grid.values()):
    params = dict(zip(grid.keys(), params))
    # build the row key before popping the optimizer so it covers all three index levels
    key = tuple(map(tostring, params.values()))
    optimizer = params.pop('optimizer')
    model = AmazonReviewsModel(
        2000, 256, GRU, num_recurrent_layers=3, ffn_dims=[64], dropout=0.1, bidirectional=True,
        **params
    )
    model.train(dataset, 'results/amazon.weights.h5', opt=optimizer, **train_default)
    _, train_acc = model.evaluate(dataset.X_train, dataset.y_train)
    _, val_acc = model.evaluate(dataset.X_val, dataset.y_val)
    _, test_acc = model.evaluate(dataset.X_test, dataset.y_test)
    df.loc[key] = [train_acc, val_acc, test_acc]
    # checkpoint the partial results after every configuration
    df.to_csv('grid.csv')
df = pd.read_csv('grid.csv', index_col=[0, 1, 2])
df.index.names = ['regularizer', 'initializer', 'optimizer']
df

Epoch 1/30
20/20 ━━━━━━━━━━━━━━━━━━━━ 125s 6s/step - accuracy: 0.6599 - loss: 0.6469 - val_accuracy: 0.7266 - val_loss: 0.5298
Epoch 2/30
 6/20 ━━━━━━━━━━━━━━━━━━━━ 1:18 6s/step - accuracy: 0.7704 - loss: 0.4893

KeyboardInterrupt: 
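
To see the size of this search before launching it, the grid can be enumerated without training anything. The sketch below uses plain strings in place of the Keras regularizer and optimizer objects, and the time estimate is only an upper bound: it assumes every run uses all 30 epochs at roughly the 125 s per epoch seen in the interrupted log above.

```python
from itertools import product

# string stand-ins for the objects used in the real grid
grid = {
    "regularizer": ["L1", "L2", "L1L2"],
    "initializer": ["random_normal", "glorot_uniform", "glorot_normal", "he_normal", "orthogonal"],
    "optimizer": ["Adam", "RMSprop"],
}

# product() varies the last key fastest, matching the row order of the results table
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configs))  # 30 configurations (3 * 5 * 2)
print(configs[0])    # {'regularizer': 'L1', 'initializer': 'random_normal', 'optimizer': 'Adam'}

epochs, seconds_per_epoch = 30, 125  # assumed worst case, see lead-in
upper_bound_hours = len(configs) * epochs * seconds_per_epoch / 3600
print(upper_bound_hours)  # 31.25
```

Early stopping would normally cut this figure down, but it gives an idea of why checkpointing the results to `grid.csv` after every configuration is worthwhile.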

In [26]:
df

| regularizer | initializer | optimizer | train | val | test |
|---|---|---|---|---|---|
| L1 | 'random_normal' | Adam | | | |
| L1 | 'random_normal' | RMSprop | | | |
| L1 | 'glorot_uniform' | Adam | | | |
| L1 | 'glorot_uniform' | RMSprop | | | |
| L1 | 'glorot_normal' | Adam | | | |
| L1 | 'glorot_normal' | RMSprop | | | |
| L1 | 'he_normal' | Adam | | | |
| L1 | 'he_normal' | RMSprop | | | |
| L1 | 'orthogonal' | Adam | | | |
| L1 | 'orthogonal' | RMSprop | | | |


In [None]:
import plotly.graph_objects as go

# model_accuracies maps each model name to (train accuracy, test accuracy)
model_names = list(model_accuracies.keys())
train_values = [item[0] for item in model_accuracies.values()]
test_values = [item[1] for item in model_accuracies.values()]

# Create bar chart
fig = go.Figure()
fig.add_trace(go.Bar(x=model_names, y=train_values, name='Train Accuracy', marker_color='blue'))
fig.add_trace(go.Bar(x=model_names, y=test_values, name='Test Accuracy', marker_color='orange'))

# Add title and axis labels
fig.update_layout(title='Comparison of Recurrent Models Accuracies on Train and Test Set',
                  xaxis=dict(title='Model'),
                  yaxis=dict(title='Accuracy'))

# Show the plot
fig.show()
