# Bidirectional RNN

### Bidirectional RNN: Introduction and Need

Today, we'll dive into **Bidirectional Recurrent Neural Networks (Bidirectional RNNs)**. The bidirectional idea can be applied to simple RNNs, LSTMs, or GRUs. First, let's understand why we need Bidirectional RNNs, and then move on to their architecture.

#### Simple RNNs: Overview

In a simple RNN or an LSTM:
- **Input**: We pass sequential data into the network.
- **Output**: The RNN generates outputs at each timestep.
- **Hidden States**: These capture the context from previous timesteps.

When unrolled over time, the RNN architecture looks like a series of hidden states connected by weights, with inputs processed sequentially from \( t = 1 \) to \( t = T \).

For example, in a sentiment analysis task, we might want to predict the sentiment of a sentence. The prediction is computed from the hidden state after the last word has been processed, typically by passing it through a sigmoid activation function for binary classification.
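
The many-to-one setup described above can be sketched in plain Python. This is a minimal illustration, not a trained model: the weights are random, the "sentence" is five random 3-dimensional vectors standing in for word embeddings, and the names `rnn_step` and `sentiment_probability` are made up for this example.

```python
import math
import random

def rnn_step(x, h_prev, W_x, W_h, b):
    """One simple-RNN step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W_x[j], x))
            + sum(w * hi for w, hi in zip(W_h[j], h_prev))
            + b[j]
        )
        for j in range(len(b))
    ]

def sentiment_probability(inputs, W_x, W_h, b, w_out, b_out):
    """Many-to-one: read every timestep, classify from the last hidden state."""
    h = [0.0] * len(b)
    for x in inputs:                          # t = 1 ... T
        h = rnn_step(x, h, W_x, W_h, b)
    logit = sum(w * hi for w, hi in zip(w_out, h)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))     # sigmoid for binary classification

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
input_dim, hidden_dim = 3, 4                  # assumed toy sizes
W_x = rand_matrix(hidden_dim, input_dim)
W_h = rand_matrix(hidden_dim, hidden_dim)
b = [0.0] * hidden_dim
w_out = [random.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
b_out = 0.0

# Five random vectors stand in for the embedded words of a sentence.
sentence = [[random.uniform(-1, 1) for _ in range(input_dim)] for _ in range(5)]
prob = sentiment_probability(sentence, W_x, W_h, b, w_out, b_out)
print(f"P(positive) = {prob:.3f}")
```

Only the last hidden state feeds the classifier here, which is what makes this a many-to-one configuration.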

#### Types of RNNs

![b4sus.jpg](attachment:b798170e-0c96-4550-9bf5-9f3fb0b6bace.jpg)

*[Image Source](https://i.sstatic.net/b4sus.jpg)*

There are different types of RNNs based on input-output configurations:
1. **One-to-One RNN**: Single input, single output.
2. **One-to-Many RNN**: Single input, multiple outputs (e.g., Image Captioning).
3. **Many-to-One RNN**: Multiple inputs, single output (e.g., Sentiment Analysis).
4. **Many-to-Many RNN**: Multiple inputs, multiple outputs (e.g., Language Translation).

These configurations help address various problem statements effectively.

#### The Need for Bidirectional RNNs

Let's consider a scenario where we want to predict a missing word in a sentence:

- **Example 1**: "Krish eats __ in Bangalore."
- **Example 2**: "Krish eats __ in Paris."

In both cases, the missing word depends not only on the previous context ("Krish eats") but also on the subsequent context ("in Bangalore" or "in Paris"). A simple RNN processes the sentence from left to right, capturing only the previous context. However, the prediction for the missing word could be significantly improved if we also consider the words following the missing word.

For instance:
- In Bangalore, "dosa" might be the missing word.
- In Paris, "croissant" or "pizza" might be the missing word.

To effectively predict the missing word, we need a model that considers both past and future contexts. This is where **Bidirectional RNNs** come into play.

### Bidirectional RNNs: Architecture

Bidirectional RNNs process the input sequence in both forward and backward directions. This gives the model two hidden states at each timestep: one summarizing the past (forward) and one summarizing the future (backward).

1. **Forward RNN**: Processes the sequence from \( t = 1 \) to \( t = T \).
2. **Backward RNN**: Processes the sequence from \( t = T \) to \( t = 1 \).
3. **Combined Output**: The hidden states from both RNNs are combined at each timestep, usually by concatenating or averaging them, and the prediction is made from the combined result.
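
The three steps above can be sketched as follows. This is a rough illustration under stated assumptions: plain Python lists instead of a deep learning library, random untrained weights, concatenation as the combining rule, and helper names (`run_rnn`, `bidirectional_rnn`, `init_params`) invented for this example.

```python
import math
import random

def rnn_step(x, h_prev, W_x, W_h, b):
    """One simple-RNN step: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    return [
        math.tanh(
            sum(w * xi for w, xi in zip(W_x[j], x))
            + sum(w * hi for w, hi in zip(W_h[j], h_prev))
            + b[j]
        )
        for j in range(len(b))
    ]

def run_rnn(inputs, params):
    """Run one RNN over `inputs` in the given order; return every hidden state."""
    W_x, W_h, b = params
    h = [0.0] * len(b)
    states = []
    for x in inputs:
        h = rnn_step(x, h, W_x, W_h, b)
        states.append(h)
    return states

def bidirectional_rnn(inputs, fwd_params, bwd_params):
    forward = run_rnn(inputs, fwd_params)                # reads t = 1 ... T
    backward = run_rnn(inputs[::-1], bwd_params)[::-1]   # reads t = T ... 1, then re-aligned
    # Concatenate the two hidden states at each timestep.
    return [h_f + h_b for h_f, h_b in zip(forward, backward)]

def init_params(input_dim, hidden_dim, rng):
    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return matrix(hidden_dim, input_dim), matrix(hidden_dim, hidden_dim), [0.0] * hidden_dim

rng = random.Random(0)
T, input_dim, hidden_dim = 4, 3, 2                       # assumed toy sizes
inputs = [[rng.uniform(-1, 1) for _ in range(input_dim)] for _ in range(T)]
fwd_params = init_params(input_dim, hidden_dim, rng)     # each direction has its own weights
bwd_params = init_params(input_dim, hidden_dim, rng)

outputs = bidirectional_rnn(inputs, fwd_params, bwd_params)
print(len(outputs), len(outputs[0]))   # 4 timesteps, each of size 2 * hidden_dim = 4
```

Reversing the backward states before combining matters: without it, the backward state paired with timestep 1 would have seen only the last input rather than the whole sequence. With concatenation the output size doubles; averaging would keep it at `hidden_dim`.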

#### Example with Bidirectional RNN

Let's revisit the previous sentence example:

- Input: "Krish eats __ in Bangalore."
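- Bidirectional RNN processes the sentence:
  - **Forward pass**: Reads left to right, so at the blank its hidden state summarizes "Krish eats".
  - **Backward pass**: Reads right to left, so at the blank its hidden state summarizes "in Bangalore".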

The model can now leverage both the previous context ("Krish eats") and the future context ("in Bangalore") to predict a plausible missing word such as "dosa."
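
To make the split of context concrete, the small sketch below lists which words each direction has read by the time it reaches the blank. It involves no neural network; `contexts_at` is a hypothetical helper, and it assumes the blank itself contributes no information.

```python
def contexts_at(tokens, position):
    """Return the words each direction has read before reaching `position`."""
    past = tokens[:position]                             # read by the forward RNN
    future_read_order = tokens[position + 1:][::-1]      # read by the backward RNN
    return past, future_read_order

tokens = ["Krish", "eats", "__", "in", "Bangalore"]
blank = tokens.index("__")
past, future_read_order = contexts_at(tokens, blank)

print("forward has read: ", past)                # ['Krish', 'eats']
print("backward has read:", future_read_order)   # ['Bangalore', 'in']
```

Swapping "Bangalore" for "Paris" changes only what the backward direction has read, which is exactly the information a left-to-right RNN never sees at the blank.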

### Summary

Bidirectional RNNs are well suited to tasks where context from both past and future words is needed for accurate predictions. They improve the model's understanding of the input sequence by processing it in both directions, which makes them powerful tools for various NLP tasks, including sentiment analysis, language translation, and missing-word prediction.

# Overview

### What is a Bidirectional RNN?

![bidirectional-rnn-2.png](attachment:47da8e4c-0570-49eb-81ba-b4ba0e34f595.png)

*[Image Source](https://www.polarsparc.com/xhtml/DL-Bidirectional-RNN.html)*

A **Bidirectional Recurrent Neural Network (Bidirectional RNN)** is a type of RNN that processes data in both forward and backward directions. Traditional RNNs process input sequences in only one direction (from past to future), whereas Bidirectional RNNs combine information from both past and future states, making them more effective for tasks where context from both directions is important.

### Why Do We Use Bidirectional RNN?

Bidirectional RNNs are used because they capture dependencies in the input data from both directions, which is crucial for understanding context. This makes them particularly useful in tasks like language processing, where the meaning of a word can depend on both the words that come before and after it.

For example, in the sentence "The cat sat on the **mat**," the word "mat" is influenced by both the preceding words ("The cat sat on the") and what could potentially follow (e.g., "and looked at the mouse").

### Example

Let's consider a sentence: **"He opened the door."**

- **Simple RNN**: It processes the sentence from left to right, word by word:
  - **He → opened → the → door**
  - Each word is processed together with a hidden state that summarizes all the words before it.

- **Bidirectional RNN**: It processes the sentence in both directions:
  - **Forward pass**: **He → opened → the → door**
  - **Backward pass**: **door → the → opened → He**

The output at each time step is a combination of the information from both the forward and backward passes, allowing the network to understand the context better. For instance, knowing that "door" follows "the" helps confirm that "opened" likely refers to a physical action, not a metaphorical one.

### Difference between Bidirectional RNN and Simple RNN

| **Feature**                     | **Simple RNN**                                   | **Bidirectional RNN**                           |
|----------------------------------|-------------------------------------------------|-------------------------------------------------|
| **Direction of Processing**      | Processes input in one direction (left to right or right to left) | Processes input in both directions (left to right and right to left) |
| **Context Awareness**            | Limited to past context (previous words/steps)  | Aware of both past and future context           |
| **Information Flow**             | Forward direction only                          | Both forward and backward directions            |
| **Computation Complexity**       | Lower, since it involves only one direction     | Higher, as it involves processing in two directions |
| **Usage**                        | Suitable for tasks where past context is sufficient (e.g., time series prediction, next-word prediction) | Suitable for tasks where the whole sequence is available and context from both directions is important (e.g., sequence tagging, speech recognition) |
| **Accuracy**                     | Lower for tasks requiring full context          | Generally higher for tasks needing context from both directions |
| **Training Time**                | Faster training due to one-direction processing | Slower training due to the dual direction processing |
| **Applications**                 | Simple sequence tasks (e.g., basic text generation) | Complex sequence tasks (e.g., named entity recognition, sentiment analysis) |

[Notes](https://d2l.ai/chapter_recurrent-modern/bi-rnn.html)
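
The "Computation Complexity" and "Training Time" rows in the table above can be made concrete by counting weights. The sketch below assumes a vanilla RNN cell with one input matrix, one recurrent matrix, and one bias vector per direction; the function names and the sizes `d = 100`, `h = 64` are illustrative only.

```python
def simple_rnn_params(input_dim, hidden_dim):
    """Weights of one vanilla RNN layer: W_x (h x d), W_h (h x h), bias (h)."""
    return hidden_dim * input_dim + hidden_dim * hidden_dim + hidden_dim

def bidirectional_rnn_params(input_dim, hidden_dim):
    """Two independent RNNs, one per direction."""
    return 2 * simple_rnn_params(input_dim, hidden_dim)

d, h = 100, 64   # assumed embedding size and hidden size
print(simple_rnn_params(d, h))          # 10560
print(bidirectional_rnn_params(d, h))   # 21120
```

Under these assumptions the bidirectional layer has twice the recurrent parameters and does roughly twice the work per sequence. If the two directions are concatenated, any layer stacked on top also receives `2 * h` features instead of `h`.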