# 11 Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are a class of artificial neural networks designed to process sequential data, where the order of data points matters. Unlike feedforward neural networks, RNNs have connections that form loops, allowing information to persist and be passed from one step to the next. This capability makes RNNs well-suited for tasks involving time series, natural language processing, speech recognition, video analysis, and more.

By leveraging their recurrent connections and hidden state, RNNs can capture temporal dependencies in sequential data. However, training them effectively remains a challenge, particularly for long sequences, because gradients can vanish or explode. Gated variants such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) were introduced to mitigate these gradient problems, the vanishing gradient in particular, and to improve the modeling capabilities of RNNs for a wider range of applications.

### Key Concepts of RNNs:
1. _Recurrent Connections:_
The defining feature of RNNs is the presence of recurrent connections, which allow information to be retained and propagated through time. In each step of the sequence, an RNN processes the current input along with the hidden state from the previous step, updating the hidden state accordingly. This feedback mechanism enables RNNs to learn dependencies and patterns across different time steps.
2. _Hidden State:_
The hidden state of an RNN acts as its memory and captures relevant information from the past. It is continuously updated at each time step and serves as an internal representation of the input sequence.
3. _Training Challenges:_
Training RNNs can be challenging due to the vanishing and exploding gradient problems. During backpropagation through time (BPTT), the gradient that reaches an early time step is a product of one factor per intervening step, so it can shrink toward zero (vanish) or grow very large (explode), leading to poor convergence or training instability. The effect is strongest on long sequences, where gradients must travel back through many steps to reach the initial ones.
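
The recurrent update and the role of the hidden state can be sketched with a scalar RNN in plain Python. This is a minimal illustration, not the Keras implementation: the function name `rnn_forward` and the weights `w_x`, `w_h`, and `b` are assumptions chosen for readability.

```python
import math

def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0, h0=0.0):
    """Run a scalar RNN over a sequence and return every hidden state.

    Hypothetical helper: the weights are arbitrary illustrative values.
    """
    h = h0
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# Only the first input is non-zero, yet later states stay non-zero:
# the hidden state carries information forward in time.
states = rnn_forward([1.0, 0.0, 0.0])
print(states)
```

Because the state depends on everything seen so far, feeding the same values in a different order (for example `[0.0, 1.0]` instead of `[1.0, 0.0]`) produces a different final state.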

### RNN Architectures:
There are different variations of RNN architectures, including Elman RNN, Jordan RNN, and bidirectional RNNs.
1. _Elman RNN:_ In an Elman RNN, the hidden state from the previous time step is fed back into the hidden layer at the next time step, alongside the new input, creating a simple feedback loop. This architecture is suitable for many sequential tasks but can suffer from vanishing gradients on long sequences.
2. _Jordan RNN:_ In a Jordan RNN, the feedback comes from the network's output rather than its hidden state: the output of the previous time step is fed back into the hidden layer at the current time step. This architecture can be useful for specific problems but is less commonly used than Elman RNNs and more advanced RNN variants.
3. _Bidirectional RNNs:_ Bidirectional RNNs process the input sequence in both forward and backward directions, allowing the model to consider future information as well. This is particularly useful for tasks where context from both past and future elements is essential, such as speech recognition and machine translation.
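
The three variants above differ mainly in what is fed back and in which direction the sequence is read. The sketch below shows this with scalar networks in plain Python. All function names and weight values are illustrative assumptions, and the bidirectional helper reuses the same weights for both directions to stay short, whereas a real bidirectional layer learns separate weights for each direction.

```python
import math

def elman_rnn(inputs, w_x=0.5, w_rec=0.8, w_y=2.0):
    """Elman-style: the previous hidden state feeds the hidden layer."""
    h, outputs = 0.0, []
    for x in inputs:
        h = math.tanh(w_x * x + w_rec * h)
        outputs.append(w_y * h)
    return outputs

def jordan_rnn(inputs, w_x=0.5, w_rec=0.8, w_y=2.0):
    """Jordan-style: the previous output feeds the hidden layer."""
    y, outputs = 0.0, []
    for x in inputs:
        h = math.tanh(w_x * x + w_rec * y)
        y = w_y * h
        outputs.append(y)
    return outputs

def bidirectional(inputs, rnn=elman_rnn):
    """Pair a left-to-right pass with a right-to-left pass at each step."""
    forward = rnn(inputs)
    backward = rnn(inputs[::-1])[::-1]  # re-align to the original time order
    return list(zip(forward, backward))

pairs = bidirectional([1.0, 0.0, 0.0])
print(pairs)
```

At the last position the forward value still reflects the first input, while the backward value has only seen the final zero input; at the first position the backward value summarizes the whole sequence.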

### Exercise
Use the IMDB movie reviews dataset to perform sentiment analysis with a simple RNN.

In [5]:
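# Import everything from tensorflow.keras so the layers, model, and utilities
# all come from the same Keras installation.
from tensorflow.keras.datasets import imdb
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense, Bidirectional  # Bidirectional is not used below
from tensorflow.keras.preprocessing.sequence import pad_sequences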


1. Load the IMDB movie reviews dataset

In [2]:
max_features = 5000  # Number of words to consider as features
max_len_short = 100  # Maximum sequence length for short sequences
max_len_long = 500   # Maximum sequence length for long sequences

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)


2. Pad sequences to a fixed length for RNN input

In [6]:
x_train_short = pad_sequences(x_train, maxlen=max_len_short)
x_test_short = pad_sequences(x_test, maxlen=max_len_short)

x_train_long = pad_sequences(x_train, maxlen=max_len_long)
x_test_long = pad_sequences(x_test, maxlen=max_len_long)
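
With its default arguments, `pad_sequences` pads short sequences with zeros at the front and truncates long sequences from the front, so each review keeps its last `maxlen` tokens. The plain-Python sketch below mimics that default behavior on small lists. `pad_pre` is a hypothetical helper written for illustration, not part of Keras, and it assumes a positive `maxlen`.

```python
def pad_pre(sequences, maxlen, value=0):
    """Mimic the default pad_sequences behavior on plain Python lists."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # keep only the last maxlen tokens
        padded.append([value] * (maxlen - len(seq)) + seq)  # left-pad with value
    return padded

reviews = [[11, 12, 13], [21, 22, 23, 24, 25, 26]]
padded = pad_pre(reviews, maxlen=4)
print(padded)  # [[0, 11, 12, 13], [23, 24, 25, 26]]
```

For the exercise this means the short setting sees only the final 100 tokens of each review, while the long setting sees up to the final 500.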

3. Build the RNN model

In [7]:
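def build_rnn_model():
    """Embedding -> SimpleRNN -> sigmoid classifier for binary sentiment."""
    model = Sequential()
    model.add(Embedding(max_features, 32))       # map word indices to 32-dim vectors
    model.add(SimpleRNN(32, activation='relu'))  # ReLU replaces the default tanh activation
    model.add(Dense(1, activation='sigmoid'))    # probability of a positive review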
    return model
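
As a quick check on the architecture, the number of trainable parameters can be worked out by hand. The helper functions below are hypothetical and assume the Keras defaults for these layers (bias terms enabled); the totals should match what `model.summary()` reports once the model has been built.

```python
def embedding_params(vocab_size, embed_dim):
    # One embed_dim-sized vector per vocabulary entry.
    return vocab_size * embed_dim

def simple_rnn_params(input_dim, units):
    # Input weights + recurrent weights + bias.
    return input_dim * units + units * units + units

def dense_params(input_dim, units):
    # Weights + bias.
    return input_dim * units + units

max_features = 5000  # same vocabulary size as in the exercise
counts = {
    "embedding": embedding_params(max_features, 32),  # 160000
    "simple_rnn": simple_rnn_params(32, 32),          # 2080
    "dense": dense_params(32, 1),                     # 33
}
total = sum(counts.values())
print(counts, total)  # total is 162113
```

Almost all parameters sit in the embedding layer. The recurrent layer itself is small, and its size does not depend on the sequence length because the same weights are reused at every time step.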

4. Train and evaluate the RNN model

In [8]:
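def train_and_evaluate_model(model, x_train, y_train, x_test, y_test):
    """Compile, train for 5 epochs, and return test loss, test accuracy, and history."""
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    # validation_split=0.2 holds out the last 20% of the training data for validation.
    history = model.fit(x_train, y_train, epochs=5, batch_size=128, validation_split=0.2)
    # Final evaluation on the held-out test set.
    loss, accuracy = model.evaluate(x_test, y_test)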
    return loss, accuracy, history

5. Train and evaluate RNN on short and long sequences

In [9]:
print("\nTraining SimpleRNN model on short sequences:")
rnn_model_short = build_rnn_model()
loss_short, accuracy_short, history_short = train_and_evaluate_model(
    rnn_model_short, x_train_short, y_train, x_test_short, y_test
)

print("\nTraining SimpleRNN model on long sequences:")
rnn_model_long = build_rnn_model()
loss_long, accuracy_long, history_long = train_and_evaluate_model(
    rnn_model_long, x_train_long, y_train, x_test_long, y_test
)




6. Compare the results

In [10]:
print("\nResults on Short Sequences:")
print(f"Loss: {loss_short:.4f}, Accuracy: {accuracy_short:.4f}")

print("\nResults on Long Sequences:")
print(f"Loss: {loss_long:.4f}, Accuracy: {accuracy_long:.4f}")


Results on Short Sequences:
Loss: 0.3951, Accuracy: 0.8360

Results on Long Sequences:
Loss: 0.3798, Accuracy: 0.8491


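The comparison between short and long sequences is meant to illustrate the challenges of training RNNs: a simple RNN often performs relatively well on short sequences but can struggle on long ones because of the vanishing/exploding gradient problem. In the run shown above, however, the model trained on 500-token sequences scored slightly better (loss 0.3798, accuracy 0.8491) than the one trained on 100-token sequences (loss 0.3951, accuracy 0.8360), so the degradation is not visible at these settings. One likely reason is that longer inputs keep more of each review, which can outweigh the added difficulty of propagating gradients through 500 steps.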

The vanishing gradient problem occurs when the gradients become very small as they propagate through time steps during training. Consequently, the model cannot learn long-term dependencies effectively, which negatively impacts performance on long sequences. On the other hand, the exploding gradient problem results in extremely large gradients, causing training instability.
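
Both effects can be reproduced with a scalar recurrence, where the gradient of the final state with respect to the initial state is a product of one factor per time step. The sketch below is a simplified illustration under stated assumptions: `gradient_through_time` is a hypothetical helper, the network has a single unit and no inputs, and the identity activation stands in for the linear region of an unbounded activation such as ReLU.

```python
import math

def gradient_through_time(w_h, steps, h0=0.5, use_tanh=True):
    """Return d h_T / d h_0 for h_t = f(w_h * h_{t-1}).

    f is tanh when use_tanh is True, otherwise the identity.
    """
    h, grad = h0, 1.0
    for _ in range(steps):
        a = w_h * h
        if use_tanh:
            h = math.tanh(a)
            grad *= w_h * (1.0 - h * h)  # chain rule: tanh'(a) = 1 - tanh(a)^2
        else:
            h = a
            grad *= w_h                  # identity activation: factor is w_h
    return grad

vanishing = gradient_through_time(w_h=0.5, steps=50)
exploding = gradient_through_time(w_h=1.5, steps=50, use_tanh=False)
print(vanishing, exploding)
```

With a recurrent weight below 1 the product collapses toward zero after 50 steps, and with a weight above 1 and an unbounded activation it grows by many orders of magnitude. Real networks multiply Jacobian matrices rather than scalars, but the same compounding applies.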

You should experiment further by adjusting the sequence lengths, training epochs, and other hyperparameters to observe how the performance changes and to see the impact of the vanishing/exploding gradient issues more clearly.