---
---
# RNN Introduction
---
---

## Importing required libraries





In [None]:
import tensorflow_datasets as tfds
import tensorflow as tf
import keras
from keras import layers
import matplotlib.pyplot as plt
import re
import string
from sklearn.model_selection import train_test_split

## Loading and preprocessing the data

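We will use the IMDB reviews dataset for our experiments. It contains 50,000 labeled reviews, split evenly into a training split and a test split, and the merged set is balanced (25,000 positive and 25,000 negative reviews). We merge the two splits and then draw our own training and test sets. A validation set is held out from the training data during fitting.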

![IMDB Reviews](https://storage.googleapis.com/kaggle-datasets-images/1867959/3050570/26a882c248f15808ce3de926ee0eb1ed/dataset-cover.jpg?t=2022-01-16-17-13-19)

In [None]:
dataset = tfds.load(
    "imdb_reviews",
    split="train + test",
    as_supervised=True,
    batch_size=-1,
    shuffle_files=False,
)
reviews, labels = tfds.as_numpy(dataset)

print("Total examples:", reviews.shape[0])

Downloading and preparing dataset 80.23 MiB (download: 80.23 MiB, generated: Unknown size, total: 80.23 MiB) to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0...


Dataset imdb_reviews downloaded and prepared to /root/tensorflow_datasets/imdb_reviews/plain_text/1.0.0. Subsequent calls will reuse this data.
Total examples: 50000


## Split the data into train and test

In [None]:
# Split the data into training and testing sets (80/20)
X_train, X_test, y_train, y_test = train_test_split(reviews, labels, test_size=0.2, random_state=42)

# Print shapes of the train and test sets
print("Train set shape:", X_train.shape)
print("Test set shape:", X_test.shape)

Train set shape: (40000,)
Test set shape: (10000,)


### Text Handling

To feed text to a Recurrent Neural Network (RNN) in TensorFlow, you first need to convert the raw text into a numerical format. This typically involves tokenizing the text into words or subwords and mapping each token to an integer index. The indices are then either looked up in an embedding layer or fed directly into the RNN. This notebook uses the Keras utilities `tf.keras.preprocessing.text.Tokenizer` for tokenization and `tf.keras.preprocessing.sequence.pad_sequences` for padding sequences to a consistent length. Note that `Tokenizer` is deprecated in recent TensorFlow releases in favor of the `tf.keras.layers.TextVectorization` layer.

First, you'll use the Tokenizer to convert the text reviews into sequences of integers.

**Tokenization** (you will study text processing in more depth in week 7)

TFDS returns each review as a `bytes` object, so decode the reviews into Python strings before fitting the tokenizer.

In [None]:
# Check if the first element is a bytes object
if isinstance(X_train[0], bytes):
    X_train = [x.decode('utf-8') for x in X_train]
    X_test = [x.decode('utf-8') for x in X_test]

In [None]:
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences

# Number of words to consider as features
max_features = 10000  # This is the vocab size

# Create the tokenizer with the top max_features words
tokenizer = Tokenizer(num_words=max_features)

# Fit the tokenizer on the training data
tokenizer.fit_on_texts(X_train)

# Convert text to sequences of integers
X_train_seq = tokenizer.texts_to_sequences(X_train)
X_test_seq = tokenizer.texts_to_sequences(X_test)
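To make the tokenization step concrete, here is a minimal pure-Python sketch that approximates what the Keras `Tokenizer` does with its default settings: lowercase the text, replace punctuation (except the apostrophe) with spaces, split on whitespace, and assign indices by descending word frequency starting at 1 (index 0 stays free for padding). The helper names `simple_tokenize`, `fit_word_index`, and `texts_to_sequences` are hypothetical; this is an illustration, not the library's code. As with `Tokenizer(num_words=N)`, only indices 1 to N - 1 are kept, and because no out-of-vocabulary token is configured, rarer or unseen words are silently dropped.

```python
import string
from collections import Counter

# Characters replaced by spaces before splitting. This mirrors the default
# `filters` of the Keras Tokenizer: all punctuation except the apostrophe,
# plus tab and newline.
FILTERS = string.punctuation.replace("'", "") + "\t\n"
_TABLE = str.maketrans(FILTERS, " " * len(FILTERS))


def simple_tokenize(text):
    """Lowercase, replace filtered characters with spaces, split on whitespace."""
    return text.lower().translate(_TABLE).split()


def fit_word_index(texts):
    """Rank words by frequency: index 1 is the most frequent word."""
    counts = Counter()
    for text in texts:
        counts.update(simple_tokenize(text))
    # most_common() sorts by count; ties keep first-seen order.
    return {word: i + 1 for i, (word, _) in enumerate(counts.most_common())}


def texts_to_sequences(texts, word_index, num_words):
    """Map words to indices, keeping only indices below num_words.

    Unknown words and words ranked num_words or lower are dropped.
    """
    return [
        [word_index[w] for w in simple_tokenize(t) if word_index.get(w, num_words) < num_words]
        for t in texts
    ]


corpus = ["The movie was great!", "The plot was bad, the cast was worse."]
word_index = fit_word_index(corpus)
print(word_index)
print(texts_to_sequences(["the cast was great", "a horrible movie"], word_index, num_words=4))
# [[1, 2], [3]]
```

Here "cast" (index 7) and "great" (index 4) fall outside `num_words=4`, and "a" and "horrible" were never seen, so all four are dropped.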

**Pad the Sequences**

Reviews have different lengths, but a batch fed to the network must be a rectangular tensor, so you need to pad (and, where necessary, truncate) every sequence to the same length.

In [None]:
# Maximum sequence length. Set it to the length of the longest sequence, or shorter to trim long reviews.
max_length = 100

# Pad/truncate every sequence to max_length. By default pad_sequences pads and
# truncates at the start ('pre'), so the last max_length tokens of each review are kept.
X_train_pad = pad_sequences(X_train_seq, maxlen=max_length)
X_test_pad = pad_sequences(X_test_seq, maxlen=max_length)
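The padding behaviour can be sketched in a few lines of plain Python. The function `pad_pre` below is a hypothetical helper that mimics the default `padding='pre'` and `truncating='pre'` behaviour of `pad_sequences`; the real function returns a NumPy array rather than nested lists.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad and left-truncate each sequence to exactly maxlen items.

    Short sequences get `value` in front; long ones lose their earliest
    tokens, so the end of each sequence is kept.
    """
    if maxlen <= 0:
        raise ValueError("maxlen must be positive")
    padded = []
    for seq in sequences:
        kept = list(seq)[-maxlen:]                             # keep the last maxlen tokens
        padded.append([value] * (maxlen - len(kept)) + kept)   # pad at the front
    return padded


print(pad_pre([[1, 2, 3], [4], [5, 6, 7, 8, 9]], maxlen=4))
# [[0, 1, 2, 3], [0, 0, 0, 4], [6, 7, 8, 9]]
```

Padding at the front places the real tokens at the end of the sequence, right before the RNN produces its final hidden state.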

## Modeling

Recurrent Neural Networks (RNNs) are a class of neural networks well suited to modeling sequence data such as time series or natural language. They are called recurrent because they apply the same computation to every element of a sequence, with each output depending on the previous computations. Unlike feedforward networks, RNNs maintain a memory (the hidden state) that summarizes what has been processed so far, which builds a notion of order into the model.

### How RNNs Work

1. **Loop Mechanism**: At each step of a sequence, an RNN takes in an input \( x_t \) and outputs \( y_t \). The output at each step is influenced by a "memory" (hidden state) that contains information from previous inputs. This hidden state \( h_t \) is updated at each step of the sequence.

2. **Hidden State**: The hidden state \( h_t \) is computed from the previous hidden state and the current input: \( h_t = f(W h_{t-1} + U x_t + b) \), where \( f \) is a non-linear activation function (Keras's `SimpleRNN` uses \( \tanh \) by default), \( W \) and \( U \) are weight matrices, and \( b \) is a bias vector. The same \( W \), \( U \), and \( b \) are reused at every timestep.

3. **Output**: In some configurations, an output \( y_t \) is computed from the hidden state at every step. In others, only the output at the end of the sequence is used; Keras's `SimpleRNN` behaves this way by default (`return_sequences=False`), returning only the final hidden state.
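The loop mechanism and hidden-state update above can be written directly in NumPy. This is a minimal sketch, assuming the update \( h_t = \tanh(W h_{t-1} + U x_t + b) \) with a bias \( b \) (as Keras's `SimpleRNN` uses) and a zero initial state. The function name `rnn_forward` and the random weights are illustrative only; Keras stores its input kernel transposed relative to \( U \) here.

```python
import numpy as np


def rnn_forward(xs, W, U, b, h0=None):
    """Run h_t = tanh(W h_{t-1} + U x_t + b) over a sequence.

    xs: (T, input_dim), W: (hidden, hidden), U: (hidden, input_dim), b: (hidden,)
    Returns all hidden states with shape (T, hidden).
    """
    h = np.zeros(W.shape[0]) if h0 is None else h0
    states = []
    for x_t in xs:                         # same W, U, b at every timestep
        h = np.tanh(W @ h + U @ x_t + b)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
T, input_dim, hidden = 5, 3, 4
xs = rng.normal(size=(T, input_dim))
W = rng.normal(scale=0.5, size=(hidden, hidden))
U = rng.normal(scale=0.5, size=(hidden, input_dim))
b = np.zeros(hidden)

states = rnn_forward(xs, W, U, b)
print(states.shape)   # (5, 4): one hidden state per timestep
print(states[-1])     # final state, analogous to what SimpleRNN returns by default
```

Returning every state corresponds to `return_sequences=True`; keeping only `states[-1]` corresponds to the default many-to-one setup used for sentiment classification.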

### Applications of RNNs

RNNs are particularly useful in fields such as:

- **Natural Language Processing (NLP)**: For tasks like sentiment analysis, language translation, and text generation.
- **Speech Recognition**: Translating spoken language into text.
- **Time Series Prediction**: For example, stock market forecasting or predicting disease outbreaks.

### Further Reading

For a deeper understanding of RNNs and their implementation details, see:

- [Recurrent Neural Networks in TensorFlow](https://www.tensorflow.org/guide/keras/rnn): Official TensorFlow documentation on implementing RNNs using Keras.

This guide provides practical guidance on building and using RNNs with Keras.

In [None]:
from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN, Dense

# Define the RNN model
model = Sequential()
model.add(Embedding(max_features, 32, input_length=max_length))
model.add(SimpleRNN(32))  # Simple RNN layer with 32 units
model.add(Dense(1, activation='sigmoid'))  # Output layer for binary classification

# Compile the model
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])

# Print model summary
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 100, 32)           320000    
                                                                 
 simple_rnn (SimpleRNN)      (None, 32)                2080      
                                                                 
 dense (Dense)               (None, 1)                 33        
                                                                 
Total params: 322113 (1.23 MB)
Trainable params: 322113 (1.23 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
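The `Param #` column can be checked by hand. The sketch below recomputes each layer's count from `max_features = 10000`, the 32-dimensional embedding, and the 32-unit `SimpleRNN` used above; the helper function names are illustrative. The memory figure assumes float32 weights (4 bytes each) and that Keras divides by 1024² when reporting MB.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim                          # one dim-sized vector per token id


def simple_rnn_params(input_dim, units):
    # input kernel U (input_dim x units) + recurrent kernel W (units x units) + bias b
    return input_dim * units + units * units + units


def dense_params(input_dim, units):
    return input_dim * units + units                 # weights + bias


emb = embedding_params(10000, 32)
rnn = simple_rnn_params(32, 32)
out = dense_params(32, 1)
total = emb + rnn + out
print(emb, rnn, out, total)              # 320000 2080 33 322113
print(round(total * 4 / 1024 ** 2, 2))   # 1.23
```

Almost all parameters sit in the embedding table; the recurrent layer itself is tiny, because its weights are shared across all 100 timesteps.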


Recurrent Neural Networks (RNNs) are specially designed to work with sequence data. Their training method, **Backpropagation Through Time (BPTT)**, follows directly from this recurrent architecture, and it makes problems such as vanishing gradients especially pronounced in RNNs.

### RNN Training: Backpropagation Through Time (BPTT)

![alt text](https://miro.medium.com/v2/resize:fit:1400/1*jNs4SDkMVQOAF21y2KWK4Q.gif)

1. **Sequential Data Handling**: Unlike feedforward neural networks, RNNs process data sequentially, maintaining a state from one timestep to the next. This state (hidden state) captures information about previous inputs, allowing the network to exhibit dynamic temporal behavior.

2. **BPTT Explained**: To train RNNs, a technique called Backpropagation Through Time is used. BPTT unrolls the RNN for the number of timesteps in the input sequence, creating a feedforward network in which each layer corresponds to one timestep. The unrolled network is then trained with ordinary backpropagation, with one difference: every layer shares the same weights, because each layer is a copy of the same recurrent cell.

3. **Weight Updates**: After the forward pass, the loss is computed and its gradient is propagated backward through the unrolled network, one timestep at a time in reverse order. Because the same weights are used at every timestep, the gradient with respect to each weight is the sum of its contributions from all timesteps where it was involved. The weights are then updated once using this accumulated gradient.
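The three steps above can be sketched in NumPy for a tiny RNN. This is an illustration under stated assumptions, not how Keras works internally (Keras computes these gradients with automatic differentiation). It assumes a toy loss \( L = \tfrac{1}{2}(v \cdot h_T - \text{target})^2 \) on the last hidden state, where the readout vector `v` and the function names are hypothetical. The key line is the `+=`: gradients for the shared weights are accumulated over all timesteps. A finite-difference check confirms the result.

```python
import numpy as np


def forward(xs, W, U, b):
    hs = [np.zeros(W.shape[0])]                        # h_0 = 0
    for x_t in xs:
        hs.append(np.tanh(W @ hs[-1] + U @ x_t + b))
    return hs                                          # hs[t] is h_t for t = 0..T


def loss_fn(xs, target, W, U, b, v):
    y = v @ forward(xs, W, U, b)[-1]
    return 0.5 * (y - target) ** 2


def bptt(xs, target, W, U, b, v):
    hs = forward(xs, W, U, b)
    y = v @ hs[-1]
    dW, dU, db = np.zeros_like(W), np.zeros_like(U), np.zeros_like(b)
    dh = (y - target) * v                              # dL/dh_T
    for t in range(len(xs), 0, -1):                    # walk the unrolled graph backwards
        da = dh * (1.0 - hs[t] ** 2)                   # through tanh: 1 - tanh(a)^2
        dW += np.outer(da, hs[t - 1])                  # shared weights: accumulate
        dU += np.outer(da, xs[t - 1])
        db += da
        dh = W.T @ da                                  # gradient flowing to h_{t-1}
    return dW, dU, db


rng = np.random.default_rng(0)
T, n_in, n_h = 6, 3, 4
xs = rng.normal(size=(T, n_in))
W = rng.normal(scale=0.5, size=(n_h, n_h))
U = rng.normal(scale=0.5, size=(n_h, n_in))
b = rng.normal(scale=0.1, size=n_h)
v = rng.normal(size=n_h)
target = 1.0

dW, dU, db = bptt(xs, target, W, U, b, v)

# Central finite difference on one recurrent weight
eps = 1e-6
W_plus, W_minus = W.copy(), W.copy()
W_plus[1, 2] += eps
W_minus[1, 2] -= eps
numeric = (loss_fn(xs, target, W_plus, U, b, v) - loss_fn(xs, target, W_minus, U, b, v)) / (2 * eps)
print(dW[1, 2], numeric)   # the two values should agree closely

lr = 0.1
W_new = W - lr * dW        # a single update using the gradient summed over all timesteps
```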

In [None]:
# Train the model
history = model.fit(X_train_pad, y_train, epochs=10, validation_split=0.2)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [None]:
review_list = ["amazing", "such a horriblely bad cast"]
print(review_list)
# Use a new name so the `reviews` array loaded earlier is not overwritten
review_seqs = tokenizer.texts_to_sequences(review_list)
review_pad = pad_sequences(review_seqs, maxlen=max_length)

['amazing', 'such a horriblely bad cast']


In [None]:
# Make predictions: one sigmoid probability per review
prediction = model.predict(review_pad)

# Binary classification: 0 = negative, 1 = positive, using 0.5 as the threshold.
# ravel() keeps this a flat list even when only one review is passed.
predicted_class = (prediction > 0.5).astype(int).ravel().tolist()
for review, cls in zip(review_list, predicted_class):
    print(f"Predicted class for '{review}' is {cls}")

Predicted class for 'amazing' is 1
Predicted class for 'such a horriblely bad cast' is 1


### Challenges in RNN Training

1. **Vanishing Gradient Problem**: During BPTT, the gradient of the loss is multiplied at every timestep by the recurrent weight matrix (and by the derivative of the activation function) as it flows backward. If these factors are small (for the weight matrix, roughly when its largest singular value is below 1), the gradient shrinks exponentially with the number of timesteps it travels. Inputs from early in the sequence then contribute almost nothing to the weight updates, which makes it very hard for the network to learn long-range dependencies.

2. **Exploding Gradient Problem**: Conversely, if these factors are large, the gradient can grow exponentially during backpropagation, leading to huge, erratic weight updates and unstable training. In the worst case, the learning process diverges.
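Both problems can be seen with a small NumPy experiment. This sketch keeps only the linear part of the backward step, \( \partial L/\partial h_{t-1} = W^\top(\dots) \), and ignores the tanh derivative (which is at most 1 and could only shrink the gradient further). Choosing \( W = s\,Q \) with \( Q \) orthogonal makes each step scale the gradient norm by exactly \( s \), so after 50 steps the norm is \( s^{50} \). The helper names are hypothetical. `clip_by_norm` shows gradient-norm clipping, a common remedy for exploding gradients; Keras optimizers expose a similar per-variable option through their `clipnorm` argument.

```python
import numpy as np


def backprop_norm(scale, steps=50, n=16, seed=0):
    """Gradient norm after `steps` multiplications by W^T, with W = scale * Q."""
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.normal(size=(n, n)))   # orthogonal: preserves vector norms
    W = scale * Q
    g = rng.normal(size=n)
    g /= np.linalg.norm(g)                          # start from a unit-norm gradient
    for _ in range(steps):
        g = W.T @ g                                 # linear part of one BPTT step
    return np.linalg.norm(g)


def clip_by_norm(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm, keeping its direction."""
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad


print(backprop_norm(0.9))   # about 0.005 (0.9**50): vanishing
print(backprop_norm(1.1))   # about 117 (1.1**50): exploding
print(clip_by_norm(np.array([3.0, 4.0]), max_norm=1.0))   # [0.6 0.8]
```

Clipping limits the size of each update but does nothing for vanishing gradients, which is one motivation for the gated variants below.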

### Variants of RNNs

To overcome some of these challenges, several variants of RNNs have been developed:

- **LSTM (Long Short-Term Memory)**: LSTMs use gates and a separate cell state to mitigate the vanishing gradient problem, allowing them to retain information over many more timesteps.
- **GRU (Gated Recurrent Units)**: GRUs are a simpler gated architecture with fewer gates than LSTMs. They often perform similarly while being computationally cheaper.
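One concrete way to see the cost difference is to count parameters. The sketch below computes what Keras would report if `SimpleRNN(32)` in the model above were swapped for an `LSTM(32)` or `GRU(32)` layer on the same 32-dimensional embeddings, assuming Keras defaults (including `reset_after=True` for GRU, which gives the input and recurrent parts separate biases). The function names are illustrative. Parameter count is only a rough proxy for compute, but it reflects the number of gates each cell evaluates per timestep.

```python
def simple_rnn_params(input_dim, units):
    # one input kernel, one recurrent kernel, one bias
    return input_dim * units + units * units + units


def lstm_params(input_dim, units):
    # four gates (input, forget, cell candidate, output), each with its own
    # input kernel, recurrent kernel and bias
    return 4 * (input_dim * units + units * units + units)


def gru_params(input_dim, units, reset_after=True):
    # three gates (update, reset, candidate); reset_after=True adds a second bias per gate
    biases = 2 * 3 * units if reset_after else 3 * units
    return 3 * (input_dim * units + units * units) + biases


for name, fn in [("SimpleRNN", simple_rnn_params), ("GRU", gru_params), ("LSTM", lstm_params)]:
    print(name, fn(32, 32))
# SimpleRNN 2080
# GRU 6336
# LSTM 8320
```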

**These variants will be explored later in the course in the NLP section**