<div class="alert alert-success"><h1>Building a Recurrent Neural Network in Python using Keras</h1></div>

Recurrent Neural Networks (RNNs) are a class of neural networks designed to work with sequential data by maintaining a memory of previous inputs. They are commonly used in natural language processing (NLP), time series forecasting, and speech recognition. In this tutorial, we will build an RNN using Keras and TensorFlow to classify movie reviews as either positive or negative. This is known as **Sentiment Analysis**. Sentiment analysis is a fundamental NLP task used in various applications, such as customer feedback analysis, social media monitoring, and opinion mining.

## Learning Objectives
By the end of this tutorial, you will:
+ Understand the basics of RNNs and their architecture.
+ Learn how to preprocess text data for deep learning models.
+ Build a simple RNN using Keras and TensorFlow.
+ Train and evaluate the RNN model.

## Prerequisites
Before we begin, ensure you have:

+ Basic knowledge of Python programming (variables, functions, classes).
+ Familiarity with fundamental machine learning concepts (datasets, training/testing, overfitting).
+ A Python (version 3.x) environment with the `tensorflow` and `keras` packages installed.

<div class="alert alert-info"><b>Note:</b> For further insights into deep learning and model building with Keras and TensorFlow, consider exploring the LinkedIn Learning course <b>"Deep Learning with Python: Foundations"</b>.</div>

<div class="alert alert-success"><h2>1. Import the Libraries</h2></div>

To start, we'll import the necessary libraries that we'll use throughout this tutorial.

In [None]:
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

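The libraries we imported are:
+ **numpy:** A library for numerical computing with arrays, used later in this tutorial to select sample reviews.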
+ **tensorflow:** An open-source platform for machine learning developed by Google.
+ **keras:** A high-level API for building and training deep learning models, integrated within TensorFlow.
+ **layers:** A module in Keras containing various types of neural network layers.

To make my results reproducible, so that you obtain the same results as in my example, I will set a random seed using the `keras.utils.set_random_seed()` function. This seeds the random number generators in Python, NumPy, and the backend framework, so the model's weights are initialized to the same values on every run. In practice, setting a seed is optional; it matters mainly when you need to compare or reproduce runs.

In [None]:
keras.utils.set_random_seed(1234)
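
To see why a seed makes results repeatable, here is a minimal NumPy-only sketch. The helper `draw_weights` is hypothetical and is not part of Keras; it only illustrates the general principle that the same seed produces the same pseudo-random "initial weights".

```python
import numpy as np

def draw_weights(seed, shape=(2, 3)):
    """Hypothetical helper: draw stand-in initial weights from a seeded generator."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=shape)

first = draw_weights(1234)
second = draw_weights(1234)   # same seed, so the values repeat
other = draw_weights(4321)    # different seed, so the values differ

print(np.array_equal(first, second))
print(np.array_equal(first, other))
```

The first comparison should report identical arrays and the second should not, which is the behavior we rely on when fixing the seed before building the model.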

<div class="alert alert-success"><h2>2. Load the Data</h2></div>

For this tutorial, we will use the **IMDB movie reviews dataset**, a widely used benchmark dataset for sentiment classification. This dataset consists of 50,000 movie reviews labeled as either positive (1) or negative (0). The dataset is split into:

+ 25,000 training samples used to teach our model.
+ 25,000 test samples used to evaluate how well our model generalizes to unseen data.

Each review is preprocessed and represented as a sequence of integer indices corresponding to words in a predefined vocabulary. Our goal is to train an RNN model that can learn the sentiment patterns in the text, evaluate the model’s performance using accuracy metrics, and make predictions on new, unseen movie reviews.

In [None]:
max_features = 10000
(train_data, train_labels), (test_data, test_labels) = keras.datasets.imdb.load_data(num_words = max_features)

The IMDB dataset contains many unique words, but not all of them are necessary for training a model. Some words occur very infrequently, and including them may introduce noise without adding much predictive power to the model. By setting `max_features = 10000`, we limit our vocabulary to roughly the 10,000 most frequently occurring words. Rarer words are replaced with a single out-of-vocabulary index rather than being given their own entry.
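
The idea of capping a vocabulary can be sketched in plain Python. This is an illustration under simplifying assumptions, not how Keras builds the IMDB index: the toy corpus, the `encode` helper, and the choice of `0` as the out-of-vocabulary index are all made up for this example.

```python
from collections import Counter

# Toy corpus standing in for the reviews (hypothetical data, not from IMDB).
corpus = [
    "the movie was great",
    "the movie was dull",
    "the plot was great fun",
]

OOV_INDEX = 0          # assumed out-of-vocabulary index for this sketch only
toy_max_features = 2   # keep only the 2 most frequent words

# Count every word, then rank words from most to least frequent.
counts = Counter(word for review in corpus for word in review.split())
ranked = [word for word, _ in counts.most_common()]
word_index = {word: rank
              for rank, word in enumerate(ranked[:toy_max_features], start=1)}

def encode(review):
    """Map each word to its frequency rank, or to OOV_INDEX if it is too rare."""
    return [word_index.get(word, OOV_INDEX) for word in review.split()]

encoded = [encode(review) for review in corpus]
print(word_index)
print(encoded)
```

Only "the" and "was" keep their own indices here; every other word collapses to the shared out-of-vocabulary index, which is the same trade-off `num_words` makes at a larger scale.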

<div class="alert alert-success"><h2>3. Preprocess the Data</h2></div>

Reviews in the IMDB dataset vary in length, with some being very short and others very long. Keras processes each batch as a single tensor, so every review in it must have the same length; we therefore standardize each review to 500 tokens. With its default settings, `pad_sequences` truncates from the front: if a review is longer than 500 tokens, only the last 500 are kept. If a review is shorter than 500 tokens, zeros are added at the front so that every review is exactly 500 tokens long.

In [None]:
from tensorflow.keras.preprocessing.sequence import pad_sequences

max_len = 500
train_data = pad_sequences(train_data, maxlen = max_len)
test_data = pad_sequences(test_data, maxlen = max_len)
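
With its default arguments (`padding = 'pre'` and `truncating = 'pre'`), `pad_sequences` adds zeros at the front of short sequences and drops tokens from the front of long ones. The following NumPy sketch mimics that behavior on two tiny sequences. The helper `pad_or_truncate_pre` is a hypothetical stand-in written for this illustration, not the Keras implementation.

```python
import numpy as np

def pad_or_truncate_pre(sequence, maxlen):
    """Hypothetical helper: left-pad with zeros, or keep the last `maxlen` items."""
    trimmed = sequence[-maxlen:]               # drop tokens from the front if too long
    padding = [0] * (maxlen - len(trimmed))    # zeros go in front if too short
    return np.array(padding + trimmed, dtype="int32")

short_review = [11, 12, 13]
long_review = [21, 22, 23, 24, 25, 26, 27]

print(pad_or_truncate_pre(short_review, maxlen=5))  # [ 0  0 11 12 13]
print(pad_or_truncate_pre(long_review, maxlen=5))   # [23 24 25 26 27]
```

To keep the beginning of each review instead, `pad_sequences` accepts `truncating = 'post'`, and `padding = 'post'` moves the zeros to the end.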

<div class="alert alert-success"><h2>4. Define the Model Architecture</h2></div>

Next, we define the structure of the model. The first layer is an embedding layer that learns a meaningful representation for each of the input words (or tokens). 

The next layer accepts the dense numerical representations of the input words (embeddings) and processes them in sequence, one time step at a time. This layer and the one after it each have 32 recurrent units, which means that each layer's hidden state at every time step is a vector of 32 values. Setting `return_sequences = True` in the first RNN layer ensures that the hidden states for all time steps, not just the last one, are passed to the second RNN layer.
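
To make the recurrence and the effect of `return_sequences` concrete, here is a rough NumPy sketch of a tanh RNN layer applied to one sequence. It is a simplified illustration, not the Keras implementation: `simple_rnn_forward` is a hypothetical helper, the weights are random rather than trained, and the toy sizes (6 time steps, 8-dimensional embeddings) are much smaller than the 500 and 128 used in this tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)

def simple_rnn_forward(inputs, units, return_sequences):
    """Sketch of a tanh RNN layer over one sequence of shape (time_steps, features)."""
    time_steps, features = inputs.shape
    # Randomly initialized weights stand in for trained ones.
    w_input = rng.normal(scale=0.1, size=(features, units))
    w_recurrent = rng.normal(scale=0.1, size=(units, units))
    bias = np.zeros(units)

    hidden = np.zeros(units)    # the layer's "memory", reset for each sequence
    all_hidden = []
    for x_t in inputs:          # process one time step at a time
        hidden = np.tanh(x_t @ w_input + hidden @ w_recurrent + bias)
        all_hidden.append(hidden)
    # Either every hidden state or only the final one is returned.
    return np.stack(all_hidden) if return_sequences else hidden

# Toy stand-in for one embedded review.
embedded_review = rng.normal(size=(6, 8))
first_layer_out = simple_rnn_forward(embedded_review, units=32, return_sequences=True)
second_layer_out = simple_rnn_forward(first_layer_out, units=32, return_sequences=False)

print(first_layer_out.shape)   # (6, 32)
print(second_layer_out.shape)  # (32,)
```

The first layer emits one 32-value hidden state per time step, which gives the second layer a full sequence to read. The second layer returns only its last hidden state, a single 32-value summary of the review.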

The final layer (a dense layer) produces the prediction (positive or negative sentiment). Since this is a binary classification problem, a sigmoid activation is a natural choice because it outputs a value between 0 and 1 that can be read as the probability of a positive review. If the probability is $\gt$ 0.5, the review is predicted as positive (1). However, if the probability is $\leq$ 0.5, the review is predicted as negative (0).

In [None]:
model = keras.Sequential([
    layers.Embedding(input_dim = max_features, output_dim = 128),
    layers.SimpleRNN(32, return_sequences = True),
    layers.SimpleRNN(32),
    layers.Dense(1, activation = 'sigmoid')
])
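
As a back-of-the-envelope check on the size of this architecture, the number of trainable parameters can be worked out by hand. The sketch below assumes the standard layouts: one embedding vector per vocabulary entry, input weights plus recurrent weights plus biases for each simple RNN layer, and weights plus one bias for the dense layer. The helper `simple_rnn_params` is hypothetical.

```python
max_features = 10000   # vocabulary size used in this tutorial
embedding_dim = 128    # output_dim of the embedding layer
rnn_units = 32         # units in each SimpleRNN layer

def simple_rnn_params(input_dim, units):
    """Input weights + recurrent weights + biases for a simple RNN layer."""
    return input_dim * units + units * units + units

embedding_params = max_features * embedding_dim
rnn1_params = simple_rnn_params(embedding_dim, rnn_units)   # reads 128-dim embeddings
rnn2_params = simple_rnn_params(rnn_units, rnn_units)       # reads 32-dim hidden states
dense_params = rnn_units * 1 + 1                            # 32 weights + 1 bias

total = embedding_params + rnn1_params + rnn2_params + dense_params
print(embedding_params, rnn1_params, rnn2_params, dense_params, total)
# 1280000 5152 2080 33 1287265
```

Almost all of the parameters sit in the embedding layer. Once the model has been built, calling `model.summary()` should let you compare these hand-computed figures against what Keras reports.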

Note that this model is a basic RNN-based sentiment classifier. In real-world applications, we can improve performance by using a Long Short-Term Memory Network (LSTM) or Gated Recurrent Unit (GRU), both of which handle long-term dependencies more effectively. We will learn more about these architectures later in the course.

<div class="alert alert-success"><h2>5. Compile and Train the Model</h2></div>

In [None]:
model.compile(optimizer = 'adam',
              loss = 'binary_crossentropy',
              metrics = ['accuracy'])
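
The `binary_crossentropy` loss penalizes predictions according to how much probability they assign to the wrong class. As a rough illustration of the formula, here is a NumPy sketch. The function below is a hand-written stand-in, not the Keras implementation, and the labels and probabilities are made up.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy; clipping by eps avoids taking log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    per_sample = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(per_sample))

labels = [1, 0, 1, 0]
confident_and_right = [0.9, 0.1, 0.8, 0.2]   # hypothetical good predictions
confident_and_wrong = [0.1, 0.9, 0.2, 0.8]   # hypothetical bad predictions

loss_right = binary_crossentropy(labels, confident_and_right)
loss_wrong = binary_crossentropy(labels, confident_and_wrong)
print(f"{loss_right:.4f} {loss_wrong:.4f}")
```

Confident and correct predictions give a loss close to zero, while confident and wrong predictions give a much larger one. The optimizer (`adam` here) adjusts the weights to push this value down.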

To train the model, we call the `fit()` method and specify the training data, training labels, number of epochs (the number of times the model will iterate over the entire training dataset), batch size (the number of samples processed before the model is updated), and validation split (the fraction of the training data to use for validation).

In [None]:
history = model.fit(train_data, train_labels,
                    epochs = 10,
                    batch_size = 128,
                    validation_split = 0.2)
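
To get a feel for what these settings imply, the arithmetic below works out how many reviews are used for fitting and validation and how many weight updates take place. It is a plain-Python sketch based on the 25,000 training reviews described earlier; the variable names are chosen for this example.

```python
import math

num_train_reviews = 25000   # size of the IMDB training set
validation_split = 0.2
batch_size = 128
epochs = 10

num_validation = int(num_train_reviews * validation_split)   # held out for validation
num_fit = num_train_reviews - num_validation                  # used to update weights
steps_per_epoch = math.ceil(num_fit / batch_size)             # the last batch may be smaller
total_updates = steps_per_epoch * epochs

print(num_fit, num_validation, steps_per_epoch, total_updates)
# 20000 5000 157 1570
```

Keras takes the validation samples from the end of the training arrays before shuffling, so the model is fitted on the first 20,000 reviews and validated on the last 5,000 after every epoch.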

<div class="alert alert-success"><h2>6. Evaluate the Model</h2></div>

Using the trained model, we can evaluate how well it performs against the test data. This step provides an objective measurement of our model’s generalization ability - that is, how well it classifies movie reviews it has never seen before.

In [None]:
test_loss, test_accuracy = model.evaluate(test_data, test_labels)
print(f"Test Accuracy: {test_accuracy:.4f}")

In this run, the model correctly classified 79.2% of the test movie reviews. Your result may differ slightly depending on your library versions and hardware.

Let's take a look at the model's sentiment prediction for the first 5 movie reviews in the test data.

In [None]:
num_samples = 5
selected_indices = np.arange(num_samples)

for i, index in enumerate(selected_indices):
    # The model expects a batch, so reshape one review to shape (1, max_len).
    sample_review = test_data[index].reshape(1, max_len)
    # predict() returns an array of shape (1, 1); take the single probability.
    predicted_prob = model.predict(sample_review, verbose = 0)[0, 0]
    predicted_label = "Positive" if predicted_prob > 0.5 else "Negative"
    print(f"Movie Review #{i + 1}:")
    print(f"Predicted Sentiment: {predicted_label} (Score: {predicted_prob:.4f})")

The predictions are probability scores that range between 0 and 1. If the score is greater than 0.5, the review is classified as **Positive**. However, if the score is 0.5 or less, the review is classified as **Negative**.
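
The same thresholding rule can be applied to a whole batch of scores at once, and comparing the resulting labels with the true labels gives the accuracy. The sketch below uses made-up probabilities and labels purely for illustration; they are not outputs of the model trained above.

```python
import numpy as np

# Hypothetical sigmoid outputs and true labels, for illustration only.
predicted_probs = np.array([0.91, 0.50, 0.12, 0.73, 0.49])
true_labels = np.array([1, 1, 0, 1, 0])

# Scores strictly above 0.5 become 1 (Positive); 0.5 itself maps to 0 (Negative).
predicted_labels = (predicted_probs > 0.5).astype(int)
accuracy = np.mean(predicted_labels == true_labels)

print(predicted_labels)   # [1 0 0 1 0]
print(accuracy)           # 0.8
```

Here the second review, with a score of exactly 0.5, is labeled Negative even though its true label is positive, so four of the five predictions are correct.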