# Sequential Data Modeling using Recurrent Neural Networks (RNNs)

In many real-world problems, data does not exist in isolation.  
Instead, it arrives **as a sequence**, where the meaning of the current input depends on what came before it.

Examples include:
- Words in a sentence
- Daily temperature readings
- Stock prices over time
- Sensor signals

Traditional feed-forward neural networks process inputs independently and therefore **fail to capture these dependencies**.

To handle such data, we use **Recurrent Neural Networks (RNNs)**, models designed to work with sequential information by maintaining an internal memory of past inputs.

This notebook introduces the core idea of **sequential data modeling** and explains how RNNs learn patterns that unfold over time.


## Sequential Data and the Need for Specialized Models

In many practical machine learning problems, the data is naturally ordered, and this order carries meaning. Such data is known as sequential data. In sequential data, each element is connected to the previous ones, and understanding the current element often requires knowledge of what came before it. This dependency on past information is what distinguishes sequential data from standard fixed-size inputs.

Consider a sentence in natural language. The meaning of a word depends heavily on the words that precede it. Similarly, in time-series data such as temperature readings or stock prices, the current value is influenced by earlier observations. Treating each input independently in these cases leads to a loss of context and results in poor modeling of the underlying pattern.

Traditional feed-forward neural networks are not designed to handle this type of dependency. They map each fixed-size input to an output and retain no information from one input to the next. Unless the entire sequence is packed into a single fixed-size input, they cannot capture temporal relationships or long-term dependencies present in sequential data.

To address this limitation, models are required that can remember past inputs while processing new ones. Recurrent Neural Networks were developed specifically for this purpose. They introduce the concept of a hidden state, which acts as a memory that is updated at each step of the sequence. This allows the network to incorporate information from previous time steps when making predictions, making RNNs suitable for sequential data modeling.


## Recurrent Neural Networks (RNNs)
Recurrent Neural Networks are a class of neural networks specifically designed to model sequential data. Unlike feed-forward neural networks, which assume that all inputs are independent of each other, RNNs are built on the idea that past information can influence the processing of current input. This makes them suitable for problems where the order of data matters.

The defining feature of an RNN is its ability to maintain an internal memory, known as the hidden state. When a sequence is processed, the network takes one element at a time and updates this hidden state at every step. The hidden state acts as a summary of all the information the network has seen so far in the sequence. As new inputs arrive, this memory is continuously updated rather than discarded.


## How Recurrent Neural Networks Work 


**Step 1: Learning from Data Like Traditional Neural Networks**

Recurrent Neural Networks, like feedforward neural networks and convolutional neural networks, learn patterns from training data. They rely on forward propagation to generate outputs and use gradient-based optimization techniques to adjust their weights during training. In this sense, RNNs follow the same learning principles as other neural network architectures.

**Step 2: Introducing Memory into the Network**

The key difference between recurrent neural networks and traditional networks is the presence of memory. Feedforward and convolutional networks treat each input sample independently of the samples that came before it, whereas RNNs are designed for data where previous inputs influence the current output. This memory allows RNNs to capture dependencies that unfold over time.

**Step 3: Processing Data as a Sequence**

RNNs process data one element at a time rather than all at once. Each element in the input sequence is handled in a specific order. The output at any given time step depends not only on the current input but also on information from earlier elements in the sequence. This makes RNNs suitable for tasks involving text, speech, and time-series data.

**Step 4: Understanding the Role of the Hidden State**

At the heart of an RNN is the hidden state. The hidden state acts as a memory that stores information about what the network has seen so far. After processing an input at one time step, the hidden state is passed to the next time step. This creates a feedback loop that allows contextual information to flow through the sequence.

**Step 5: Combining Current Input with Past Information**

At each time step, the RNN takes two inputs: the current data point and the hidden state from the previous time step. These inputs are combined using shared weights and passed through an activation function to produce a new hidden state. This updated hidden state reflects both the current input and the accumulated context from earlier inputs.
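The update described above can be sketched by hand. The snippet below is a minimal illustration, assuming PyTorch's `nn.RNN` with its default tanh activation; the layer sizes and the input values are arbitrary choices made for this example. It recomputes each hidden state from the current input and the previous hidden state, then checks the result against the layer's own output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A small recurrent layer; the sizes are illustrative assumptions
rnn = nn.RNN(input_size=1, hidden_size=4, batch_first=True)
x = torch.tensor([[[1.0], [2.0], [3.0]]])  # (batch=1, seq_len=3, input_size=1)

h = torch.zeros(1, 4)  # initial hidden state (all zeros)
manual_states = []
with torch.no_grad():
    for t in range(x.size(1)):
        x_t = x[:, t, :]  # current input, shape (1, 1)
        # Combine current input and previous hidden state with the shared weights
        h = torch.tanh(
            x_t @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
            + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0
        )
        manual_states.append(h)

    manual_out = torch.stack(manual_states, dim=1)  # (1, 3, 4)
    reference_out, _ = rnn(x)

# True if the hand-written recurrence matches nn.RNN
print(torch.allclose(manual_out, reference_out, atol=1e-5))
```

The same four tensors (`weight_ih_l0`, `weight_hh_l0`, and the two biases) are used at every iteration of the loop, which is why the loop body never changes from step to step.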

**Step 6: Preserving Word Order and Context**

To understand why order matters, consider the phrase “feeling under the weather.” The meaning of this idiom depends entirely on the words appearing in a specific sequence. An RNN processes each word sequentially and uses its hidden state to remember earlier words, allowing it to correctly interpret or predict the next word in the phrase.

**Step 7: Sharing Parameters Across Time Steps**

Another defining feature of RNNs is parameter sharing. The same set of weights is used at every time step of the sequence. Unlike feedforward networks, where each layer has its own weights, RNNs reuse the same weights repeatedly. This lets one network process sequences of different lengths, and the number of parameters does not grow with sequence length.
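As a quick illustration of parameter sharing, the sketch below counts the parameters of a single `nn.RNN` layer and then applies that same layer to two sequences of different lengths. The hidden size and the sequence lengths are arbitrary assumptions for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

rnn = nn.RNN(input_size=1, hidden_size=8, batch_first=True)

# The count depends only on input_size and hidden_size:
# W_ih (8x1) + W_hh (8x8) + b_ih (8) + b_hh (8) = 88
num_params = sum(p.numel() for p in rnn.parameters())
print("Parameters:", num_params)

short_seq = torch.randn(1, 3, 1)   # 3 time steps
long_seq = torch.randn(1, 50, 1)   # 50 time steps

with torch.no_grad():
    short_out, _ = rnn(short_seq)
    long_out, _ = rnn(long_seq)

print(short_out.shape)  # torch.Size([1, 3, 8])
print(long_out.shape)   # torch.Size([1, 50, 8])
```

The layer produces one hidden state per time step in both cases without any change to its weights.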

**Step 8: Training Using Backpropagation Through Time**

Recurrent Neural Networks are trained using a technique called Backpropagation Through Time (BPTT). This method is an extension of traditional backpropagation: the network is unrolled across the time steps of the sequence, and the error is propagated backward through the unrolled steps. Because the same parameters are used at every time step, the gradient contributions from all time steps are summed before the weights are updated. This allows the network to learn how earlier inputs influence later outputs in the sequence.
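The summation over time steps can be checked on a toy scalar recurrence. The sketch below is an assumption-laden illustration, not a full RNN: it uses a single recurrent weight, three made-up inputs, and the final hidden state as the "loss". It compares the gradient of one shared weight with the sum of the gradients obtained when each time step is given its own copy of that weight.

```python
import torch

xs = torch.tensor([0.5, -1.0, 2.0])  # made-up input sequence

# Case 1: one weight shared across all time steps
w_shared = torch.tensor(0.8, requires_grad=True)
h = torch.tensor(0.0)
for x in xs:
    h = torch.tanh(w_shared * h + x)
h.backward()  # treat the final hidden state as the loss
shared_grad = w_shared.grad.item()

# Case 2: a separate copy of the weight for each time step
w_copies = [torch.tensor(0.8, requires_grad=True) for _ in xs]
h = torch.tensor(0.0)
for w_t, x in zip(w_copies, xs):
    h = torch.tanh(w_t * h + x)
h.backward()
per_step_grads = [w_t.grad.item() for w_t in w_copies]

print("Gradient of shared weight:", shared_grad)
print("Sum of per-step gradients:", sum(per_step_grads))
```

The two printed values agree up to floating-point error, which is the sense in which BPTT sums the contributions of every time step into one gradient for the shared weight.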

![](https://assets.ibm.com/is/image/ibm/what-are-recurrent-neural-networks-combined:1x1?dpr=on%2C1.25&wid=512&hei=512)

## Activation Functions in RNNs

Activation functions control how information flows through a recurrent neural network and how the hidden state is updated at each time step. The choice of activation function affects gradient stability and the model’s ability to learn long-term dependencies.

The **Sigmoid function** is commonly used when outputs need to be interpreted as probabilities or when controlling information flow, such as in gating mechanisms. However, sigmoid is prone to the vanishing gradient problem, which limits its effectiveness for long sequences.

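The **Tanh (hyperbolic tangent) function** is often preferred in RNNs because its outputs are centered around zero and its derivative reaches 1 at zero, compared with a maximum of 0.25 for sigmoid. This improves gradient flow and makes learning long-term dependencies easier than with sigmoid, although tanh also saturates for large inputs and does not eliminate vanishing gradients on long sequences.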

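The **ReLU activation function** allows stronger gradient flow for positive inputs, but because it is unbounded, repeated application across time steps can let hidden states and gradients grow without limit. Gradient clipping and careful weight initialization are the usual ways to reduce this risk. Variants such as Leaky ReLU and Parametric ReLU are sometimes used in recurrent models as well, although they mainly address units that stop activating (the "dying ReLU" problem) rather than exploding gradients.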
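To make the gradient argument concrete, the sketch below multiplies activation derivatives over 20 time steps. This is a deliberate simplification: it ignores the recurrent weights, which also scale the gradient in a real RNN, and the step count and input values are arbitrary assumptions.

```python
import math

def sigmoid_grad(z):
    """Derivative of the sigmoid function at z."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def tanh_grad(z):
    """Derivative of tanh at z."""
    return 1.0 - math.tanh(z) ** 2

steps = 20  # assumed sequence length

# Best case for each function: pre-activation of 0 at every step
sigmoid_best = sigmoid_grad(0.0) ** steps  # 0.25 ** 20
tanh_best = tanh_grad(0.0) ** steps        # 1.0 ** 20

# Saturated case for tanh: pre-activation of 2 at every step
tanh_saturated = tanh_grad(2.0) ** steps

print(f"sigmoid, z=0: {sigmoid_best:.3e}")
print(f"tanh,    z=0: {tanh_best:.3e}")
print(f"tanh,    z=2: {tanh_saturated:.3e}")
```

Even in its best case the sigmoid product shrinks by a factor of four per step, while tanh preserves the gradient only as long as its inputs stay near zero.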


## Example: Sequential Data Modeling with a Simple RNN
We will build a simple RNN that takes a sequence of numbers as input and predicts a single output class.
This mirrors real-world tasks such as sentiment analysis or time-series classification, where the final decision depends on the entire sequence.

### Step 1: Import Required Libraries

In [2]:
import torch
import torch.nn as nn
import torch.optim as optim


### Step 2: Create a Toy Sequential Dataset

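Each input is a sequence of length 5. The increasing sequence is labeled 1 and the decreasing sequence is labeled 0, so the label depends on the overall pattern of the sequence rather than on any single value.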

In [3]:
# Input: batch_size x sequence_length x input_size
X = torch.tensor([
    [[1.0], [2.0], [3.0], [4.0], [5.0]],
    [[5.0], [4.0], [3.0], [2.0], [1.0]]
])

# Output labels (many-to-one)
y = torch.tensor([1, 0])


### Step 3: Define the RNN Model


In this step, we define the architecture of our Recurrent Neural Network. The model consists of two main components: a recurrent layer that processes sequential data and a fully connected layer that produces the final output.

We define the model by creating a class that inherits from `nn.Module`. This is a standard practice in PyTorch and allows the model to automatically track parameters and gradients during training.

The constructor of the class initializes the layers used in the network. The RNN layer is responsible for processing the input sequence one time step at a time. It takes three important parameters: the size of each input element, the size of the hidden state, and whether the input data is provided in batch-first format. Setting `batch_first=True` means the input tensor is expected in the shape `(batch_size, sequence_length, input_size)`, which is intuitive and commonly used.

The recurrent layer returns two values. The first is the sequence of hidden states, one for each time step. The second is the final hidden state of the sequence. In this model, we select the last time step from the first return value; for a single-layer, unidirectional RNN this holds the same values as the final hidden state.

After the RNN layer, a fully connected (linear) layer is defined. This layer maps the hidden representation produced by the RNN to the desired output size. In a sequence classification task, this layer converts the learned temporal representation into class scores.

The forward method defines how data flows through the model. When an input sequence is passed to the RNN, it is processed sequentially, and hidden states are generated for each time step. From this sequence of hidden states, only the hidden state corresponding to the final time step is selected. This final hidden state contains information accumulated from the entire sequence and is therefore suitable for making a prediction.

The selected hidden state is then passed through the fully connected layer to produce the final output. This output represents the model’s prediction based on the full sequence rather than any single input element.

Overall, this model follows a many-to-one architecture, where a sequence of inputs is mapped to a single output. The recurrent layer captures temporal dependencies, while the fully connected layer translates those learned dependencies into a prediction.


In [5]:
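class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        # Recurrent layer; expects input of shape (batch_size, sequence_length, input_size)
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        # Linear layer mapping the hidden representation to class scores
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # out: hidden state at every time step, (batch_size, sequence_length, hidden_size)
        # hidden: final hidden state, (num_layers, batch_size, hidden_size)
        out, hidden = self.rnn(x)
        last_hidden = out[:, -1, :]  # keep only the final time step
        output = self.fc(last_hidden)
        return output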
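The shapes of the two values returned by `nn.RNN` are easy to mix up, so the following sketch prints them for a random batch. The batch size and the random input are assumptions for illustration. It also checks that the last time step of the first return value matches the final hidden state for this single-layer, unidirectional layer.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

rnn = nn.RNN(input_size=1, hidden_size=8, batch_first=True)
x = torch.randn(2, 5, 1)  # (batch_size, sequence_length, input_size)

with torch.no_grad():
    out, hidden = rnn(x)

print(out.shape)     # torch.Size([2, 5, 8])  one hidden state per time step
print(hidden.shape)  # torch.Size([1, 2, 8])  final hidden state per layer

# For one unidirectional layer, these two views hold the same values
same = torch.allclose(out[:, -1, :], hidden[-1])
print(same)
```

Note that `hidden` keeps the layer dimension first even when `batch_first=True`, which is why it is indexed with `hidden[-1]` rather than `hidden[:, -1]`.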


### Step 4: Initialize Model, Loss, and Optimizer

In [6]:
model = SimpleRNN(input_size=1, hidden_size=8, output_size=2)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)


### Step 5: Train the RNN
During training, the model repeatedly processes the input sequences, compares its predictions with the true labels, and adjusts its internal parameters to reduce prediction errors.

The training loop runs for a fixed number of epochs. An epoch represents one complete pass through the training dataset. Multiple epochs are required because neural networks do not learn optimal parameters in a single pass; instead, they gradually improve through repeated exposure to the data.

At the beginning of each epoch, the gradients stored from the previous iteration are cleared. This is necessary because PyTorch accumulates gradients by default. If gradients are not reset, updates from earlier epochs would incorrectly influence the current update step.
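The accumulation behavior can be seen in isolation with a single scalar parameter. This is a minimal sketch using a made-up function `3 * w`, whose gradient with respect to `w` is always 3.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# First backward pass: d(3w)/dw = 3
(3 * w).backward()
first = w.grad.item()

# Second backward pass without clearing: the new gradient is added to the old one
(3 * w).backward()
accumulated = w.grad.item()

# Clearing the gradient restores the expected value on the next pass
w.grad.zero_()
(3 * w).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)  # 3.0 6.0 3.0
```

Calling `optimizer.zero_grad()` performs this reset for every parameter the optimizer manages.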

Next, the input sequences are passed through the model using a forward pass. During this step, the RNN processes the sequence one time step at a time, updating its hidden state internally and finally producing an output based on the last hidden state. This output represents the model’s current prediction for the entire sequence.

The predicted output is then compared with the true labels using a loss function. The loss function quantifies how far the model’s predictions are from the correct answers. A lower loss indicates better performance, while a higher loss indicates larger prediction errors.

Once the loss is computed, backpropagation is performed by calling `loss.backward()`. In recurrent neural networks, this step amounts to Backpropagation Through Time, because autograd differentiates through every time step of the forward pass. Errors are propagated backward across all time steps of the sequence, allowing the shared weights of the RNN to be updated based on their contribution to the final error.

After gradients are computed, the optimizer updates the model’s parameters. The optimizer uses the gradients and the learning rate to make small adjustments to the weights, with the goal of reducing the loss in the next iteration.

This process of forward pass, loss computation, backward pass, and parameter update is repeated for every epoch. Over time, the RNN learns to capture sequential patterns and improves its predictions by effectively using information from earlier time steps in the sequence.

In [7]:
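for epoch in range(200):
    optimizer.zero_grad()  # clear gradients left over from the previous iteration

    outputs = model(X)  # forward pass: logits of shape (batch_size, num_classes)
    loss = criterion(outputs, y)

    loss.backward()  # backpropagate through all time steps
    optimizer.step()  # update the shared weights

    if (epoch + 1) % 50 == 0:
        print(f"Epoch [{epoch+1}/200], Loss: {loss.item():.4f}")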


Epoch [50/200], Loss: 0.0127
Epoch [100/200], Loss: 0.0033
Epoch [150/200], Loss: 0.0018
Epoch [200/200], Loss: 0.0011


### Step 6: Test the Model

In [8]:
with torch.no_grad():
    predictions = torch.argmax(model(X), dim=1)
    print("Predictions:", predictions)


Predictions: tensor([1, 0])


When the model is tested, it produces a tensor of values for each input sequence. These values are called logits. Each logit corresponds to a class, and larger values indicate higher confidence. However, logits themselves are not probabilities and should not be interpreted directly as predictions.

To convert these logits into a final prediction, we apply an operation such as `argmax`. This selects the index of the largest value in the output tensor, which corresponds to the class the model believes is most likely. The resulting output is a class label rather than raw scores.
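If probabilities are needed rather than just the winning class, the logits can be passed through a softmax. The sketch below uses hypothetical logit values, not outputs from the trained model, to show that softmax rescales the scores so each row sums to 1 without changing which class is largest.

```python
import torch

# Hypothetical logits for two sequences and two classes
logits = torch.tensor([[-1.2, 2.3],
                       [1.7, -0.4]])

probs = torch.softmax(logits, dim=1)  # each row now sums to 1
pred = torch.argmax(logits, dim=1)    # index of the largest logit per row

print(probs)
print(pred)  # tensor([1, 0])

# argmax over probabilities selects the same classes as argmax over logits
print(torch.equal(torch.argmax(probs, dim=1), pred))
```

This is why the notebook can apply `argmax` directly to the model output: softmax preserves the ordering of the scores.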

Even after converting logits into class labels, the output alone does not explain how the model arrived at its decision. At this stage, the model is effectively acting as a black box. It has learned patterns across the sequence, but those patterns are embedded within its hidden states and weight parameters.

To better understand and validate the model’s behavior, several steps can be taken after testing. One approach is to compare the predicted labels with the true labels to measure accuracy. Another is to test the model on new, unseen sequences to evaluate how well it generalizes. Comparing the training loss with the loss on held-out data can also help determine whether the model has learned meaningful patterns or is underfitting or overfitting.

In [9]:
correct = (predictions == y).sum().item()
accuracy = correct / y.size(0)
print("Accuracy:", accuracy)


Accuracy: 1.0


The printed accuracy value of 1.0 indicates that every predicted class label exactly matches the corresponding true label. Note that these are the same two sequences the model was trained on, so this result shows that the model fits the training data; it does not measure how well the model generalizes to unseen sequences.
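Since `X` and `y` are also the training data, a fairer check uses sequences the model has not seen. The following is a self-contained sketch that retrains the same architecture and evaluates it on two new sequences. The new sequences, their labels (assigned by the same increasing/decreasing rule), and the fixed random seed are assumptions made for this example, and with only two training examples the result should be read as a rough indication, not a reliable measurement.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # assumed seed, only for repeatability

class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.rnn(x)
        return self.fc(out[:, -1, :])

# Training data from the notebook
X = torch.tensor([
    [[1.0], [2.0], [3.0], [4.0], [5.0]],
    [[5.0], [4.0], [3.0], [2.0], [1.0]],
])
y = torch.tensor([1, 0])

model = SimpleRNN(input_size=1, hidden_size=8, output_size=2)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

for _ in range(200):
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()

# Unseen sequences (hypothetical): one increasing, one decreasing
X_new = torch.tensor([
    [[2.0], [3.0], [4.0], [5.0], [6.0]],
    [[6.0], [5.0], [4.0], [3.0], [2.0]],
])
y_new = torch.tensor([1, 0])  # assumed labels under the same rule

with torch.no_grad():
    new_preds = torch.argmax(model(X_new), dim=1)

new_accuracy = (new_preds == y_new).float().mean().item()
print("Predictions on unseen sequences:", new_preds)
print("Accuracy on unseen sequences:", new_accuracy)
```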

## Experiment: Does the RNN Really Learn Sequence Order?

In [10]:
# Original sequence
original_sequence = torch.tensor([[[1.0], [2.0], [3.0], [4.0], [5.0]]])

# Reversed sequence
reversed_sequence = torch.tensor([[[5.0], [4.0], [3.0], [2.0], [1.0]]])

with torch.no_grad():
    original_pred = torch.argmax(model(original_sequence), dim=1)
    reversed_pred = torch.argmax(model(reversed_sequence), dim=1)

print("Prediction for original sequence:", original_pred.item())
print("Prediction for reversed sequence:", reversed_pred.item())


Prediction for original sequence: 1
Prediction for reversed sequence: 0


## Verifying Sequential Learning Through Order Sensitivity

To confirm that the recurrent neural network is truly modeling sequential data, we perform a simple but powerful experiment. Instead of changing the values in the input, we change only the order in which those values appear. This allows us to test whether the model is sensitive to sequence order.

The original and reversed sequences contain the same elements, but arranged differently. If the model treated each element independently and ignored position, changing the order would not affect the prediction. However, in a properly functioning RNN, the order of inputs plays a critical role in shaping the hidden state at each time step.

When the model processes the original sequence, the hidden state evolves in a specific manner as information flows from earlier to later time steps. Reversing the sequence changes this flow entirely, leading to a different final hidden state and, consequently, a different prediction.

If the predictions for the original and reversed sequences are different, it shows that the RNN's output depends on the order of the inputs and not solely on which values are present. This behavior is a defining characteristic of sequential data modeling. Because these two sequences are also the training examples, the result is a sanity check rather than evidence of generalization.

This experiment illustrates that the RNN's prediction reflects how information accumulates over time rather than only the set of input values. Such sensitivity to order is essential for tasks like language modeling, speech recognition, and time-series forecasting.
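For contrast, it can help to see a model that cannot be sensitive to order. The sketch below defines a hypothetical baseline, `SumPoolClassifier`, which is not part of the notebook: it transforms each element independently and then sums over time. Because a sum does not depend on the order of its terms, the untrained baseline already gives the same logits for a sequence and its reverse.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class SumPoolClassifier(nn.Module):
    """Hypothetical order-invariant baseline, used only for comparison."""

    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.embed = nn.Linear(input_size, hidden_size)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        per_step = torch.tanh(self.embed(x))  # each element processed on its own
        pooled = per_step.sum(dim=1)          # sum over time discards the order
        return self.fc(pooled)

baseline = SumPoolClassifier(input_size=1, hidden_size=8, output_size=2)

original = torch.tensor([[[1.0], [2.0], [3.0], [4.0], [5.0]]])
reversed_seq = torch.flip(original, dims=[1])  # reverse along the time axis

with torch.no_grad():
    same_logits = torch.allclose(
        baseline(original), baseline(reversed_seq), atol=1e-5
    )

print(same_logits)  # True: the baseline cannot tell the two sequences apart
```

No amount of training could make this baseline separate the two sequences, whereas the RNN's hidden state evolves differently for each ordering.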


## Task for the Reader

1. Modify the input sequences by increasing their length and observe how the model’s predictions change. Analyze whether the RNN is still able to capture meaningful patterns as the sequence becomes longer.

2. Shuffle the order of elements within a sequence and compare the predictions with those obtained from the original sequence. Explain how and why the change in order affects the output.

3. Print and examine the hidden state at each time step for a given input sequence. Describe how the hidden state evolves as new elements are processed and how it reflects accumulated sequence information.

4. Change the RNN activation function and observe its effect on training stability and performance. `nn.RNN` uses tanh by default; pass `nonlinearity='relu'` to switch to ReLU.