# Code recurrent neural networks

This demo walks you through building recurrent neural networks to solve problems with text data. The same methods also apply to other sequential data such as time series or audio.

## What will you learn in this course? 🧐🧐

This course focuses on the technical side of building recurrent neural networks, and in particular on how to code the three new layers we have studied.
Here is the outline:

* Recurrent layers
  * SimpleRNN
  * GRU
  * LSTM
* Build a recurrent neural network

## Recurrent layers

In this section we focus strictly on the code for the three new layers we just learned about: SimpleRNN, GRU, and LSTM.

### SimpleRNN


The simplest recurrent layer is `tf.keras.layers.SimpleRNN`. You can find its documentation here: [SimpleRNN](https://www.tensorflow.org/api_docs/python/tf/keras/layers/SimpleRNN).

In [None]:
import tensorflow as tf
from tensorflow.keras.layers import SimpleRNN


# units            = number of neurons in the layer
# return_sequences = whether the layer outputs the full sequence of outputs computed while processing the input sequence, or only the last output
# return_state     = whether to also return the final hidden state as a separate object



# Define a SimpleRNN with 16 neurons.
# Each neuron is connected to the 4 columns of the word-embedding matrix fed to the layer.
# At each time step (row) the layer computes phi(x_t . W + h_(t-1) . U + b) from the 4 values of the row and the previous hidden state,
# so each row produces 16 values.
# Over the whole sequence that is 10 rows x 16 columns; with return_sequences=False only the last row is returned.
srnn = SimpleRNN(units=16, return_sequences=False, return_state=False)


In [None]:
# Here we simulate the output of an Embedding layer:
# 10 rows and 4 columns
# rows    = number of words (seq_len)
# columns = embedding dimension (usually a power of 2)

# Let's create an example input for this layer and see how it works
batch_size  = 1
seq_len     = 10
channels    = 4

input = tf.random.normal(shape=(batch_size, seq_len, channels))
input

<tf.Tensor: shape=(1, 10, 4), dtype=float32, numpy=
array([[[ 1.7618275e-03, -1.1001991e+00,  1.7366928e-01, -8.9706898e-01],
        [-1.1664591e+00,  1.3106228e+00, -1.1294969e+00, -4.6602622e-01],
        [-1.8397214e-01, -2.7872634e-01, -1.0606695e+00, -1.8988892e-01],
        [ 4.8748791e-02,  7.6717454e-01,  1.1084483e+00, -1.1009997e+00],
        [ 5.3341228e-01,  7.9885697e-01,  8.0976820e-01,  5.2871376e-02],
        [-1.1784338e+00,  8.5470957e-01,  1.3981770e+00, -2.0381522e+00],
        [-7.7480239e-01, -1.0039926e+00, -2.0655198e+00, -2.2370915e+00],
        [-5.7550144e-01, -1.5513005e+00,  7.7024323e-01, -2.6107851e-01],
        [-5.7658273e-01,  2.1991391e+00,  9.8297253e-02, -1.8843707e+00],
        [ 1.6924981e+00, -2.6202568e-01,  2.2290002e-01, -6.7218447e-01]]],
      dtype=float32)>

In [None]:
# Check that we do get a vector of 16 values as output.
# When creating the layer we set
# return_sequences=False        => only the last row is returned (it takes the 9 previous rows into account)
# return_state=False            => the hidden state is not returned

# Input:  a 1 x 10 x 4 tensor
# Output: a 1 x 16 tensor


# now let's apply the SimpleRNN layer and see what comes out
srnn(input)

# the output is a batch of one observation with 16 representation channels, which
# corresponds to the number of units in the layer

<tf.Tensor: shape=(1, 16), dtype=float32, numpy=
array([[ 0.48304415, -0.94882745, -0.47727525, -0.04252353, -0.60917974,
        -0.84126204,  0.7383893 , -0.45081544, -0.5453766 , -0.5817851 ,
         0.16214931,  0.75372493,  0.42567596,  0.44439647, -0.08026829,
         0.17734928]], dtype=float32)>
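
To make these shapes concrete, here is a minimal NumPy sketch of the recurrence a simple RNN computes. It assumes a tanh activation and a zero initial state; the function name `simple_rnn_forward` and the random weights are illustrative assumptions, not the Keras implementation.

```python
import numpy as np

def simple_rnn_forward(x, W, U, b, return_sequences=False):
    """Hypothetical helper: x is (batch, seq_len, channels),
    W is (channels, units), U is (units, units), b is (units,)."""
    batch_size, seq_len, _ = x.shape
    units = U.shape[0]
    h = np.zeros((batch_size, units))          # assumed zero initial hidden state
    outputs = []
    for t in range(seq_len):
        # h_t = tanh(x_t . W + h_(t-1) . U + b)
        h = np.tanh(x[:, t, :] @ W + h @ U + b)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)       # (batch, seq_len, units)
    return h                                   # (batch, units): last step only

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 10, 4))                # same shape as the demo input
W = rng.normal(size=(4, 16))
U = rng.normal(size=(16, 16))
b = np.zeros(16)

last = simple_rnn_forward(x, W, U, b)
seq = simple_rnn_forward(x, W, U, b, return_sequences=True)
print(last.shape, seq.shape)
```

The last row of the full sequence is exactly the vector returned when `return_sequences=False`, which is why the final hidden state carries information about all previous rows.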

In [None]:
# We ask for
# return_sequences=True       => the whole computed sequence is returned => 10 rows x 16 columns
# return_state=False          => the hidden state is not returned

# let's change things up by returning the whole output sequence
srnn = SimpleRNN(units=16, return_sequences=True, return_state=False)

srnn(input)
# the layer now preserves the sequential structure of the input: instead of
# returning a 2D tensor, it outputs a 3D tensor of shape (batch_size, seq_len, units)

<tf.Tensor: shape=(1, 10, 16), dtype=float32, numpy=
array([[[-0.16674514, -0.5428698 , -0.2414192 ,  0.65495515,
          0.16298962, -0.561073  ,  0.35638264,  0.08658373,
          0.13164768, -0.3832713 ,  0.3170101 , -0.05791348,
          0.01584603, -0.52405834,  0.57561624, -0.22586507],
        [ 0.02148461,  0.8013764 , -0.7834751 , -0.5617418 ,
          0.77003765,  0.9167274 , -0.3608897 ,  0.0302986 ,
         -0.6911191 , -0.74479264, -0.27278763, -0.22496752,
          0.5018158 , -0.210946  , -0.10704214,  0.55836844],
        [ 0.46089667,  0.40629053,  0.2596361 , -0.89198124,
          0.45567492,  0.49589124, -0.89426714, -0.33622608,
          0.44279096,  0.41038287, -0.48679465,  0.10893589,
         -0.06289853, -0.53423417,  0.8834528 ,  0.39269677],
        [-0.6661824 ,  0.06607206,  0.17518276,  0.05739673,
         -0.8824435 ,  0.47650233,  0.48217472,  0.7331726 ,
         -0.09654682,  0.24278483,  0.9289079 , -0.8789382 ,
          0.8871039 , -0.7721

In [None]:
# We ask for
# return_sequences=False            =>  only the last step is returned: 1 row x 16 columns
# return_state=True                 =>  the hidden state is returned too. Not very useful here: with a SimpleRNN the same row is returned twice


srnn = SimpleRNN(units=16, return_sequences=False, return_state=True)

srnn(input)
# the layer now returns two objects, the output and the hidden state; in a
# SimpleRNN they carry the same values, as you can see

[<tf.Tensor: shape=(1, 16), dtype=float32, numpy=
 array([[-0.18483947,  0.9103874 ,  0.75096023, -0.16370803,  0.42190883,
          0.48930535,  0.35711202,  0.45196533,  0.8122    ,  0.48135474,
          0.05689316,  0.35409465, -0.3624563 ,  0.8596767 , -0.5756049 ,
          0.7684309 ]], dtype=float32)>,
 <tf.Tensor: shape=(1, 16), dtype=float32, numpy=
 array([[-0.18483947,  0.9103874 ,  0.75096023, -0.16370803,  0.42190883,
          0.48930535,  0.35711202,  0.45196533,  0.8122    ,  0.48135474,
          0.05689316,  0.35409465, -0.3624563 ,  0.8596767 , -0.5756049 ,
          0.7684309 ]], dtype=float32)>]

### GRU

Now let's see how to code GRU layers. You can read the documentation here: [GRU](https://www.tensorflow.org/api_docs/python/tf/keras/layers/GRU).

In [None]:
# return_sequences = False      => last row only, 1 x 16
# return_state     = False      => no state returned

from tensorflow.keras.layers import GRU

gru = GRU(units=16, return_sequences=False, return_state=False)

gru(input)

# it works mainly in the same way as the SimpleRNN layer

<tf.Tensor: shape=(1, 16), dtype=float32, numpy=
array([[ 0.04443589,  0.04310554, -0.18226852,  0.2776561 ,  0.0530444 ,
         0.07345533, -0.3263973 , -0.02632806, -0.21374685, -0.09900159,
         0.04404004,  0.42702842,  0.08360571, -0.02835929, -0.1385899 ,
         0.22601241]], dtype=float32)>
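
The interface is the same as SimpleRNN, but each step now goes through gates. The following NumPy sketch shows one common textbook formulation of a GRU, assuming a zero initial state. The parameter names (`Wz`, `Uz`, ...) and helper functions are hypothetical, and the Keras layer uses a slightly different bias layout by default, so treat this as an illustration rather than a reimplementation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def random_gru_params(channels, units, rng):
    """Hypothetical helper: one (W, U, b) triple per gate z, r and candidate h."""
    p = {}
    for gate in "zrh":
        p["W" + gate] = rng.normal(scale=0.1, size=(channels, units))
        p["U" + gate] = rng.normal(scale=0.1, size=(units, units))
        p["b" + gate] = np.zeros(units)
    return p

def gru_forward(x, p, return_sequences=False):
    batch_size, seq_len, _ = x.shape
    units = p["Uz"].shape[0]
    h = np.zeros((batch_size, units))           # assumed zero initial state
    outputs = []
    for t in range(seq_len):
        x_t = x[:, t, :]
        z = sigmoid(x_t @ p["Wz"] + h @ p["Uz"] + p["bz"])             # update gate
        r = sigmoid(x_t @ p["Wr"] + h @ p["Ur"] + p["br"])             # reset gate
        h_cand = np.tanh(x_t @ p["Wh"] + (r * h) @ p["Uh"] + p["bh"])  # candidate state
        h = z * h + (1.0 - z) * h_cand          # blend old state and candidate
        outputs.append(h)
    return np.stack(outputs, axis=1) if return_sequences else h

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 10, 4))
params = random_gru_params(channels=4, units=16, rng=rng)
gru_last = gru_forward(x, params)
gru_seq = gru_forward(x, params, return_sequences=True)
print(gru_last.shape, gru_seq.shape)
```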

In [None]:
# return_sequences = True      => full sequence, 10 x 16
# return_state     = False     => no state returned


gru = GRU(units=16, return_sequences=True, return_state=False)

gru(input)

# you can still use return_sequences in order to preserve the sequential
# nature of the data

<tf.Tensor: shape=(1, 10, 16), dtype=float32, numpy=
array([[[-1.33378953e-01, -4.02516685e-02, -1.39565051e-01,
          1.87363252e-01, -7.61715602e-03,  1.97818577e-01,
         -8.18353668e-02, -9.57111456e-03,  5.55594936e-02,
         -6.21235929e-02, -2.00618878e-01,  1.04959853e-01,
          8.56895447e-02,  4.80387546e-03,  7.16334805e-02,
         -7.37120807e-02],
        [-4.47026417e-02,  4.92143035e-02,  1.18637614e-01,
         -1.25896230e-01,  3.55513602e-01,  1.80481613e-01,
          2.62508035e-01, -2.30472192e-01,  1.26995906e-01,
         -2.00958058e-01, -1.37123019e-02, -1.86547294e-01,
         -1.19403601e-02,  5.85986525e-02,  1.20771416e-01,
         -3.67411792e-01],
        [-1.47925122e-02,  3.97363082e-02,  5.97124770e-02,
         -8.86607692e-02,  2.40378648e-01,  3.07722807e-01,
          2.07008332e-01, -2.63900757e-01,  5.23323640e-02,
         -4.59286347e-02,  1.06635876e-01,  2.35873014e-02,
         -7.74969459e-02,  4.22633663e-02,  2.2887757

In [None]:
# return_sequences = True     => full sequence, 10 x 16
# return_state     = True     => hidden state, 1 x 16


# combining return_sequences and return_state
gru = GRU(units=16, return_sequences=True, return_state=True)

gru(input)

# the state is always equal to the values returned after processing the whole
# sequence

[<tf.Tensor: shape=(1, 10, 16), dtype=float32, numpy=
 array([[[-0.00455865, -0.03974918,  0.20154165, -0.09431591,
          -0.00615946, -0.23103425, -0.02805037, -0.04575203,
           0.0145016 , -0.05197488,  0.19050054,  0.22756319,
           0.01435653,  0.25703236, -0.01960542,  0.05950272],
         [ 0.23313503, -0.13895512, -0.05731003,  0.32369846,
           0.09982147, -0.03538036, -0.1609427 ,  0.02978718,
          -0.02560422, -0.04678731,  0.239238  ,  0.01927432,
          -0.09917975,  0.26365498, -0.01385885,  0.04363044],
         [ 0.3226183 , -0.14882174,  0.00483473,  0.29418123,
          -0.02612038,  0.00973784, -0.02595407, -0.05855656,
          -0.01852   , -0.08050199,  0.22272407,  0.19415647,
          -0.16394046,  0.29052866, -0.09591301,  0.0501655 ],
         [ 0.11655328, -0.12382317, -0.0524689 ,  0.1832883 ,
           0.16511336, -0.12490445, -0.2552358 ,  0.20736434,
           0.04958502,  0.05115388,  0.10890801, -0.10011427,
           0.

### LSTM

Last but not least, let's see how to code an LSTM layer. Check the documentation:
[LSTM](https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM).

In [None]:
# return_sequences=False, return_state=False => returns the last "row" => 1 x 16

from tensorflow.keras.layers import LSTM

lstm = LSTM(units=16, return_sequences=False, return_state=False)

lstm(input)

# it works mainly like GRU

<tf.Tensor: shape=(1, 16), dtype=float32, numpy=
array([[ 0.0437494 ,  0.25056836, -0.09685995, -0.00147228,  0.01740794,
        -0.06458052,  0.0883551 ,  0.02214705, -0.15541825,  0.048528  ,
        -0.14544794,  0.02113206, -0.05433153,  0.17134427,  0.36067814,
        -0.07412378]], dtype=float32)>

In [None]:
# the cell state is the one that differs
# the hidden state is identical to the last row of the output


# return_sequences=True       => 10 x 16
# return_state=True           => 1 x 16 (hidden state) + 1 x 16 (cell state)

lstm = LSTM(units=16, return_sequences=True, return_state=True)

lstm(input)

# When using return_state, the layer returns
# 1 the output (sequence or not depending on return_sequences)
# 2 the hidden state which is equal to the final output
# 3 the cell state

[<tf.Tensor: shape=(1, 10, 16), dtype=float32, numpy=
 array([[[-0.05265173, -0.00422145, -0.06425991,  0.01683652,
          -0.03372828, -0.00222418,  0.04200807, -0.01008242,
           0.00220767, -0.01172854, -0.0423681 , -0.06866197,
          -0.00223496, -0.12223417, -0.1182522 ,  0.00558752],
         [ 0.03026772,  0.01093519,  0.07037749, -0.04660639,
           0.08601566,  0.06559689,  0.01094318, -0.12858419,
           0.02175834, -0.00668694,  0.031776  , -0.02510739,
          -0.09749302, -0.02072402,  0.02440318, -0.12334388],
         [-0.04334528,  0.0566337 ,  0.13668896, -0.07513575,
           0.04251193,  0.11802781,  0.07696208, -0.1397194 ,
          -0.01389325,  0.03883779,  0.02852535, -0.01484615,
          -0.11613293, -0.00967928,  0.04315626, -0.07691305],
         [ 0.02412329, -0.04597352,  0.08547155, -0.05710659,
           0.11342768, -0.03328148, -0.06315617, -0.08509945,
           0.05219471, -0.04882702,  0.08838362, -0.05723761,
           0.
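
To see why the hidden state matches the last output row while the cell state does not, here is a NumPy sketch of a standard LSTM formulation, assuming zero initial states. The parameter names and helpers are hypothetical and the weights are random, so only the shapes and the relationship between the three returned objects are meaningful.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def random_lstm_params(channels, units, rng):
    """Hypothetical helper: one (W, U, b) triple per gate i, f, o and candidate g."""
    p = {}
    for gate in "ifog":
        p["W" + gate] = rng.normal(scale=0.1, size=(channels, units))
        p["U" + gate] = rng.normal(scale=0.1, size=(units, units))
        p["b" + gate] = np.zeros(units)
    return p

def lstm_forward(x, p):
    batch_size, seq_len, _ = x.shape
    units = p["Ui"].shape[0]
    h = np.zeros((batch_size, units))   # hidden state
    c = np.zeros((batch_size, units))   # cell state
    outputs = []
    for t in range(seq_len):
        x_t = x[:, t, :]
        i = sigmoid(x_t @ p["Wi"] + h @ p["Ui"] + p["bi"])   # input gate
        f = sigmoid(x_t @ p["Wf"] + h @ p["Uf"] + p["bf"])   # forget gate
        o = sigmoid(x_t @ p["Wo"] + h @ p["Uo"] + p["bo"])   # output gate
        g = np.tanh(x_t @ p["Wg"] + h @ p["Ug"] + p["bg"])   # candidate values
        c = f * c + i * g               # the cell state accumulates memory
        h = o * np.tanh(c)              # the hidden state is a gated view of the cell
        outputs.append(h)
    # mirrors return_sequences=True, return_state=True: (sequence, hidden, cell)
    return np.stack(outputs, axis=1), h, c

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 10, 4))
lstm_seq, lstm_h, lstm_c = lstm_forward(x, random_lstm_params(4, 16, rng))
print(lstm_seq.shape, lstm_h.shape, lstm_c.shape)
```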

Now that you know how to code the three different recurrent layers, let's build a recurrent neural network on text data.

## Build a recurrent network

Let's walk through an example on a toy dataset.

In [None]:
# we now move on to a complete architecture
# the beginning is copied from yesterday's session
# everything is done in TensorFlow

import io
import os
import re
import shutil
import string
import tensorflow as tf

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Embedding, GlobalAveragePooling1D
from tensorflow.keras.layers import TextVectorization  # the experimental.preprocessing path is deprecated in recent TF versions

We'll use the same dataset as in the embedding and word2vec demos: the IMDB movie review dataset.

In [None]:
url = "https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"

dataset = tf.keras.utils.get_file("aclImdb_v1.tar.gz", url,
                                  untar=True, cache_dir='.',
                                  cache_subdir='')

# after downloading the data we remove the unlabeled examples stored in the
# unsup folder (relative path, consistent with cache_dir='.' above)
remove_dir = os.path.join("aclImdb", "train", "unsup")
shutil.rmtree(remove_dir)

Downloading data from https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz


Now let's proceed to load the data into a batch generator

In [None]:
batch_size  = 128
seed        = 123                     # seed is mandatory here to prevent overlap between train and validation

train_ds = tf.keras.preprocessing.text_dataset_from_directory(
    'aclImdb/train',                  # path to the folder containing the text files
    batch_size=batch_size,            # the size of a batch of data
    validation_split=0.2,             # The proportion of data in the validation set
    subset='training',                # Forms the train set
    seed=seed)                        # similar to random_state

val_ds = tf.keras.preprocessing.text_dataset_from_directory(
    'aclImdb/train',
    batch_size=batch_size,
    validation_split=0.2,
    subset='validation',              # forms the validation set
    seed=seed)

Found 25000 files belonging to 2 classes.
Using 20000 files for training.
Found 25000 files belonging to 2 classes.
Using 5000 files for validation.


Let's take a look at a batch of data

In [None]:
for text_batch, label_batch in train_ds.take(1):
  for i in range(5):
    print(label_batch[i].numpy(), text_batch.numpy()[i])

1 b"I have watched this movie well over 100-200 times, and I love it each and every time I watched it. Yes, it can be very corny but it is also very funny and enjoyable. The camp shown in the movie is a real camp that I actually attended for 7 years and is portrayed as camp really is, a great place to spend the summer. Everyone who has ever gone to camp, wanted to go to camp, or has sent a child to camp should see this movie because it'll bring back wonderful memories for you and for your kids."
1 b'This movie is SOOOO funny!!! The acting is WONDERFUL, the Ramones are sexy, the jokes are subtle, and the plot is just what every high schooler dreams of doing to his/her school. I absolutely loved the soundtrack as well as the carefully placed cynicism. If you like monty python, You will love this film. This movie is a tad bit "grease"esk (without all the annoying songs). The songs that are sung are likable; you might even find yourself singing these songs once the movie is through. This m

Some preprocessing is needed. It is possible to do it with spaCy by loading all the texts in memory, removing stop words, and lemmatizing tokens, but the more memory-friendly way is to create a preprocessing layer.

In [None]:
# Create a custom standardization function to strip HTML break tags '<br />'.

def custom_standardization(input_data):
  # transform all characters to lowercase
  lowercase = tf.strings.lower(input_data)

  # replace every '<br />' tag with a space
  stripped_html = tf.strings.regex_replace(lowercase, '<br />', ' ')

  # replace punctuation with an empty string
  # '[%s]' % re.escape(string.punctuation) uses printf-style formatting borrowed from C:
  # [] defines a character class, and %s is replaced by the content of
  # re.escape(string.punctuation) (the escaped punctuation characters)
  return tf.strings.regex_replace(stripped_html, '[%s]' % re.escape(string.punctuation), '')


# Vocabulary size and number of words in a sequence.
vocab_size      = 10_000
sequence_length =    100

# Use the text vectorization layer to normalize, split, and map strings to
# integers. Note that the layer uses the custom standardization defined above.
# Set output_sequence_length because the samples do not all have the same length.
vectorize_layer = TextVectorization(
    standardize             = custom_standardization,   # string tensor input -> string tensor output
    max_tokens              = vocab_size,               # int, keep only the vocab_size most common tokens
    output_mode             = 'int',                    # sets the type of encoding
    output_sequence_length  = sequence_length)          # truncates or pads sequences to a certain length

# Make a text-only dataset (no labels) and call adapt to build the vocabulary.
text_ds = train_ds.map(lambda x, y: x) # keep only the text, drop the labels
vectorize_layer.adapt(text_ds) # builds the vocabulary from the most frequent words
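
To get an intuition for what the standardization and vectorization steps do, here is a simplified pure-Python sketch. It is only an illustration under assumptions: the helper names (`standardize`, `build_vocab`, `vectorize`) and the tiny corpus are made up, and the real `TextVectorization` layer may differ in details such as how ties between equally frequent words are ordered. What it does share with the layer in `int` mode is the convention that index 0 is padding and index 1 is the out-of-vocabulary (OOV) token, both counted within `max_tokens`.

```python
import re
import string
from collections import Counter

def standardize(text):
    """Lowercase, replace '<br />' with a space, strip punctuation."""
    text = text.lower().replace("<br />", " ")
    return re.sub("[%s]" % re.escape(string.punctuation), "", text)

def build_vocab(texts, max_tokens):
    """Map the most frequent words to ids 2..max_tokens-1 (0 = padding, 1 = OOV)."""
    counts = Counter(tok for t in texts for tok in standardize(t).split())
    most_common = [word for word, _ in counts.most_common(max_tokens - 2)]
    return {word: idx + 2 for idx, word in enumerate(most_common)}

def vectorize(text, vocab, sequence_length):
    """Encode a string as a fixed-length list of integer ids."""
    ids = [vocab.get(tok, 1) for tok in standardize(text).split()]  # 1 = OOV
    ids = ids[:sequence_length]                                     # truncate
    return ids + [0] * (sequence_length - len(ids))                 # pad with 0

corpus = ["Great movie!<br />Loved it.", "Terrible movie... hated it."]
vocab = build_vocab(corpus, max_tokens=6)
encoded = vectorize("Great movie, boring plot", vocab, sequence_length=6)
print(vocab)
print(encoded)
```

Every id produced this way is strictly smaller than `max_tokens`, which is why an embedding table with `max_tokens` rows is large enough.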

Now let's define a model that includes some recurrent layers. Note that if you wish to stack recurrent layers, you have to preserve the sequential nature of the data with `return_sequences=True`. The last recurrent layer may use `return_sequences=False`, which drops the time dimension so that you can use dense layers afterwards.
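
The stacking rule can be checked with simple shape bookkeeping. The helper below is a hypothetical sketch that only tracks shapes (it builds no model): a recurrent layer needs a 3D input `(batch, seq_len, features)`, and it only produces a 3D output when `return_sequences=True`.

```python
def recurrent_output_shape(input_shape, units, return_sequences):
    """Hypothetical helper: output shape of a recurrent layer given its input shape."""
    if len(input_shape) != 3:
        raise ValueError(
            "a recurrent layer expects (batch, seq_len, features), got %r" % (input_shape,)
        )
    batch, seq_len, _ = input_shape
    if return_sequences:
        return (batch, seq_len, units)   # one output per time step
    return (batch, units)                # only the last time step

embedded = (None, 100, 32)               # (batch, sequence_length, embedding_dim)
first = recurrent_output_shape(embedded, units=64, return_sequences=True)
second = recurrent_output_shape(first, units=32, return_sequences=False)
print(first)   # (None, 100, 64)
print(second)  # (None, 32)
```

Feeding `second` into another recurrent layer would raise the `ValueError`, which mirrors the reason only the last recurrent layer should use `return_sequences=False`.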

In [None]:
# vocab_size = 10_000 => already accounts for the padding token (index 0) and the OOV token (index 1)

# RNN layers are computationally expensive:
# roughly speaking, it is as if we had 100 dense layers (one per time step)

# return_sequences (default = False): if False, the layer returns only the last element of the sequence;
# if True, it returns the output at every time step
# There is also return_state => returns the hidden state, plus the cell state for LSTM

# return_sequences=True on the first SimpleRNN because a second RNN follows
# The last SimpleRNN layer must use return_sequences=False
# There is no rule for choosing 64 and 32 units:
#     64 for a fine-grained analysis
#     32 for a more condensed summary
embedding_dim=32                                              # the dimensionality of the representation space
model = Sequential([
  vectorize_layer,                                            # This layers encodes the string as sequences of int
  Embedding(vocab_size, embedding_dim, name="embedding"),     # the embedding layer; input_dim equals vocab_size because max_tokens already counts the padding (0) and OOV (1) tokens
  SimpleRNN(units=64, return_sequences=True),                 # maintains the sequential nature
  SimpleRNN(units=32, return_sequences=False),                # returns the last output
  Dense(16, activation='relu'),                               # a dense layer
  Dense(1, activation="sigmoid")                              # the prediction layer
])

We need to compile the model so it can train on the data

In [None]:
model.compile(optimizer = 'adam',
              loss      = tf.keras.losses.BinaryCrossentropy(),
              metrics   = ['accuracy'])
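
As a reminder of what the loss measures, here is a small pure-Python sketch of binary cross-entropy averaged over a batch. The function is a hypothetical helper; the clipping constant `eps` is an assumption used to avoid `log(0)`, and the Keras implementation may differ in its details.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over the batch."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)   # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(round(confident_right, 4), round(confident_wrong, 4))
```

Confident correct predictions give a loss close to 0, while confident wrong predictions are penalized heavily.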

We can now train the model

In [None]:
# This overfits badly
# worse than yesterday's model
# the problem comes from the RNN layers

model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7fadb9c10df0>

In [None]:
# notice how quickly the number of parameters grows

# Recovering the number of parameters
# vocab_size      = 10_000
# sequence_length =    100
# embedding_dim   =     32
# batch           =    128
# n_units = number of neurons = number of columns
# n_rows  = sequence length


# text_vectorization
# No parameters

# embedding
# Input:  a sequence of 100 integer token ids, each in [0, 10_000)
# Output: a 100 x 32 matrix
# Params = 32 * 10_000 = 320_000

# simple_rnn_3 - return_sequences=True
# Input:  a 100 x 32 matrix
# Output: a 100 x 64 matrix
# Input weights: 32 x 64
# Recurrent weights (applied to h_(t-1) to get h_t): 64 x 64
# 64 biases
# => 64 * (32 + 64 + 1)
# Params = 32 x 64 + 64 x 64 + 64 = 6_208

# simple_rnn_4 - return_sequences=False
# Input:  a 100 x 64 matrix
# Output: a vector of 32
# Params = 64 x 32 + 32 x 32 + 32 = 3_104

# dense
# Input:  a vector of 32
# Output: a vector of 16
# Params = 16 * 32 + 16 = 528

# dense_1
# Input:  a vector of 16
# Output: 1 value
# Params = 1 * 16 + 1 = 17

model.summary()


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 text_vectorization (TextVe  (None, 100)               0         
 ctorization)                                                    
                                                                 
 embedding (Embedding)       (None, 100, 32)           320000    
                                                                 
 simple_rnn_3 (SimpleRNN)    (None, 100, 64)           6208      
                                                                 
 simple_rnn_4 (SimpleRNN)    (None, 32)                3104      
                                                                 
 dense (Dense)               (None, 16)                528       
                                                                 
 dense_1 (Dense)             (None, 1)                 17        
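
The parameter counts in this summary can be reproduced with a few lines of plain Python. The helper functions are hypothetical names introduced for this sketch; the formulas are the ones worked out in the comments of the cell above.

```python
def embedding_params(vocab_size, embedding_dim):
    return vocab_size * embedding_dim             # one vector per token id

def simple_rnn_params(input_dim, units):
    return units * (input_dim + units + 1)        # input weights + recurrent weights + bias

def dense_params(input_dim, units):
    return units * (input_dim + 1)                # weights + bias

counts = {
    "embedding": embedding_params(10_000, 32),
    "simple_rnn_3": simple_rnn_params(32, 64),
    "simple_rnn_4": simple_rnn_params(64, 32),
    "dense": dense_params(32, 16),
    "dense_1": dense_params(16, 1),
}
total = sum(counts.values())
for name, n in counts.items():
    print(f"{name:<14}{n:>8}")
print(f"{'total':<14}{total:>8}")
```

Note that the sequence length (100) appears nowhere in these formulas: the recurrent weights are shared across time steps.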
                                                        

There is a lot of overfitting! Let's try with the other two types of layers

In [None]:
# Nothing changes except that we use GRU layers


embedding_dim=32 # the dimensionality of the representation space

model = Sequential([
  vectorize_layer, # This layers encodes the string as sequences of int
  Embedding(vocab_size, embedding_dim, name="embedding"), # the embedding layer
  # input_dim equals vocab_size because max_tokens already counts the
  # padding (0) and OOV (1) tokens
  GRU(units=64, return_sequences=True), # maintains the sequential nature
  GRU(units=32, return_sequences=False), # returns the last output
  Dense(16, activation='relu'), # a dense layer
  Dense(1, activation="sigmoid") # the prediction layer
])

In [None]:
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=['accuracy'])

In [None]:
# Making the model more complex does not improve things

model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7fadb9c10e20>

It seems that using GRU instead of SimpleRNN helps the model a lot with the overfitting problem. Now let's compare this with LSTM.

In [None]:

embedding_dim = 32 # the dimensionality of the representation space

model = Sequential([
  vectorize_layer, # This layers encodes the string as sequences of int
  Embedding(vocab_size, embedding_dim, name="embedding"), # the embedding layer
  # input_dim equals vocab_size because max_tokens already counts the
  # padding (0) and OOV (1) tokens
  LSTM(units=64, return_sequences=True), # maintains the sequential nature
  LSTM(units=32, return_sequences=False), # returns the last output
  Dense(16, activation='relu'), # a dense layer
  Dense(1, activation="sigmoid") # the prediction layer
])
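
Gated layers cost more parameters than a SimpleRNN of the same size. The sketch below estimates the counts for the recurrent layers of the GRU and LSTM models, assuming the Keras defaults (in particular `reset_after=True` for GRU, which adds a second bias vector per gate). The helper names are hypothetical; `model.summary()` remains the reference.

```python
def simple_rnn_params(input_dim, units):
    return units * (input_dim + units + 1)

def gru_params(input_dim, units, reset_after=True):
    # 3 weight sets (update gate, reset gate, candidate);
    # reset_after=True uses separate input and recurrent biases
    n_bias = 2 if reset_after else 1
    return 3 * (units * (input_dim + units) + n_bias * units)

def lstm_params(input_dim, units):
    # 4 weight sets (input, forget, output gates and candidate)
    return 4 * (units * (input_dim + units) + units)

layers = [(32, 64), (64, 32)]   # (input_dim, units) of the two stacked layers
rnn_counts = [simple_rnn_params(i, u) for i, u in layers]
gru_counts = [gru_params(i, u) for i, u in layers]
lstm_counts = [lstm_params(i, u) for i, u in layers]
print("SimpleRNN:", rnn_counts)
print("GRU:      ", gru_counts)
print("LSTM:     ", lstm_counts)
```

An LSTM layer has exactly four times the parameters of a SimpleRNN layer with the same input and unit sizes, and a GRU roughly three times.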

In [None]:
model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=['accuracy'])

In [None]:
# Still no improvement

# Indeed, this is a classification task:
# there is no real need to analyze word order.
# "Simple" sentiment analysis can be done without taking word order into account

model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7fadc8311ea0>

The results we obtain for GRU and LSTM look quite comparable. Both seem able to address the overfitting problem, which is probably due to the fact that the input data consists of long sequences.

## Conclusion
We conclude that GRU and LSTM layers seem considerably better suited to supervised learning tasks such as this one than the simple RNN. If you are looking for other best practices for building recurrent neural networks, this [blog post](https://danijar.com/tips-for-training-recurrent-neural-networks/) contains many ideas for improving your results and for better understanding these types of models.