# Bidirectional Recurrent Neural Networks

So far, our working example of a sequence learning task has been language modeling, where we aim to predict the next token given all previous tokens in a sequence. In this scenario, we wish only to condition upon the leftward context, and thus the unidirectional chaining of a standard RNN seems appropriate. __However, there are many other sequence learning tasks where it is perfectly fine to condition the prediction at every time step on both the leftward and the rightward context.__ Consider, for example, part-of-speech detection. Why shouldn’t we take the context in both directions into account when assessing the part of speech associated with a given word?

Fortunately, a simple technique transforms any unidirectional RNN into a bidirectional RNN ([Schuster and Paliwal, 1997](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=650093)). __We simply implement two unidirectional RNN layers chained together in opposite directions and acting on the same input.__


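Mathematical intuition: each time step now has two hidden states, one computed by reading the sequence from left to right and one computed by reading it from right to left. The two are concatenated to form the hidden state that is passed on to the output layer.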
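
As a sketch of the underlying equations, assume a minibatch input $\mathbf{X}_t \in \mathbb{R}^{n \times d}$ at time step $t$, a hidden-layer activation function $\phi$, and $h$ hidden units per direction. The symbols below (the weights $\mathbf{W}$, the biases $\mathbf{b}$, and the superscripts $(f)$ and $(b)$ for the forward and backward layers) are notation assumed here for illustration, following the usual formulation of a bidirectional RNN:

$$
\begin{aligned}
\overrightarrow{\mathbf{H}}_t &= \phi\left(\mathbf{X}_t \mathbf{W}_{xh}^{(f)} + \overrightarrow{\mathbf{H}}_{t-1} \mathbf{W}_{hh}^{(f)} + \mathbf{b}_h^{(f)}\right), \\
\overleftarrow{\mathbf{H}}_t &= \phi\left(\mathbf{X}_t \mathbf{W}_{xh}^{(b)} + \overleftarrow{\mathbf{H}}_{t+1} \mathbf{W}_{hh}^{(b)} + \mathbf{b}_h^{(b)}\right), \\
\mathbf{H}_t &= \left[\overrightarrow{\mathbf{H}}_t, \overleftarrow{\mathbf{H}}_t\right] \in \mathbb{R}^{n \times 2h}.
\end{aligned}
$$

The forward state at step $t$ depends on step $t-1$, while the backward state depends on step $t+1$. The concatenation is why the output dimension doubles.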

In [1]:
import torch
from torch import nn
from d2l import torch as d2l

## Implementation from scratch

To implement a bidirectional RNN from scratch, we can include two unidirectional RNNScratch instances with separate learnable parameters.

In [3]:
class BiRNNScratch(d2l.Module):
    def __init__(self, num_inputs, num_hiddens, sigma=0.01):
        super().__init__()
        self.save_hyperparameters()
        self.f_rnn = d2l.RNNScratch(num_inputs, num_hiddens, sigma)
        self.b_rnn = d2l.RNNScratch(num_inputs, num_hiddens, sigma)
        self.num_hiddens *= 2  # The output dimension will be doubled

__Forward Method__

In [4]:
@d2l.add_to_class(BiRNNScratch)
def forward(self, inputs, Hs=None):
    f_H, b_H = Hs if Hs is not None else (None, None)
    f_outputs, f_H = self.f_rnn(inputs, f_H)
    b_outputs, b_H = self.b_rnn(reversed(inputs), b_H)
    outputs = [torch.cat((f, b), -1) for f, b in zip(
        f_outputs, reversed(b_outputs))]
    return outputs, (f_H, b_H)
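
The reverse, run, reverse back, concatenate pattern used in `forward` can be checked against PyTorch's built-in bidirectional layer. The following is a minimal sketch, not the `d2l` implementation: it uses two plain `nn.RNN` layers in place of `RNNScratch`, and the sizes and variable names are arbitrary choices for illustration. After copying the weights into an `nn.RNN` constructed with `bidirectional=True`, the two computations should agree.

```python
import torch
from torch import nn

torch.manual_seed(0)
# Illustrative sizes (assumptions, not taken from the text above)
num_steps, batch_size, num_inputs, num_hiddens = 5, 2, 3, 4

# Two independent unidirectional layers with separate parameters
f_rnn = nn.RNN(num_inputs, num_hiddens)
b_rnn = nn.RNN(num_inputs, num_hiddens)

# Input layout: (num_steps, batch_size, num_inputs)
X = torch.randn(num_steps, batch_size, num_inputs)

with torch.no_grad():
    # Forward layer reads the sequence left to right
    f_out, f_H = f_rnn(X)
    # Backward layer reads the time-reversed sequence ...
    b_out, b_H = b_rnn(torch.flip(X, dims=[0]))
    # ... so its outputs are flipped back to align with the time steps
    b_out = torch.flip(b_out, dims=[0])
    outputs = torch.cat((f_out, b_out), dim=-1)

    # Reference: built-in bidirectional layer with the same weights
    bi_rnn = nn.RNN(num_inputs, num_hiddens, bidirectional=True)
    for name in ("weight_ih_l0", "weight_hh_l0", "bias_ih_l0", "bias_hh_l0"):
        getattr(bi_rnn, name).copy_(getattr(f_rnn, name))
        getattr(bi_rnn, name + "_reverse").copy_(getattr(b_rnn, name))
    ref_out, ref_H = bi_rnn(X)

print(outputs.shape)
print(torch.allclose(outputs, ref_out, atol=1e-6))
```

Without the second flip, the backward output at position `t` would describe time step `num_steps - 1 - t`, and the concatenation would pair mismatched time steps.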

## Concise Implementation

Using the high-level APIs, we can implement bidirectional RNNs more concisely. Here we take a GRU model as an example.

In [5]:
class BiGRU(d2l.RNN):
    def __init__(self, num_inputs, num_hiddens):
        d2l.Module.__init__(self)
        self.save_hyperparameters()
        self.rnn = nn.GRU(num_inputs, num_hiddens, bidirectional=True)
        self.num_hiddens *= 2
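
To see why `num_hiddens` is doubled, it can help to inspect the shapes that `nn.GRU` returns when `bidirectional=True`. The snippet below is a small standalone sketch that calls `nn.GRU` directly rather than through the `BiGRU` class; the sizes and the names `fwd_last` and `bwd_last` are illustrative assumptions.

```python
import torch
from torch import nn

torch.manual_seed(0)
# Illustrative sizes (assumptions, not taken from the text above)
num_steps, batch_size, num_inputs, num_hiddens = 6, 2, 3, 4

gru = nn.GRU(num_inputs, num_hiddens, bidirectional=True)
X = torch.randn(num_steps, batch_size, num_inputs)

with torch.no_grad():
    outputs, H = gru(X)

# Per-step outputs concatenate both directions along the last axis
print(outputs.shape)  # (num_steps, batch_size, 2 * num_hiddens)
# The final state keeps one slice per direction
print(H.shape)        # (2, batch_size, num_hiddens)

# The forward direction finishes at the last time step,
# the backward direction finishes at the first time step.
fwd_last = outputs[-1, :, :num_hiddens]
bwd_last = outputs[0, :, num_hiddens:]
print(torch.allclose(fwd_last, H[0]), torch.allclose(bwd_last, H[1]))
```

Any layer stacked on top of the per-step outputs therefore needs to accept `2 * num_hiddens` input features, which is what the `self.num_hiddens *= 2` line records.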

__In bidirectional RNNs, the hidden state for each time step is simultaneously determined by the data prior to and after the current time step. Bidirectional RNNs are mostly useful for sequence encoding and the estimation of observations given bidirectional context. Bidirectional RNNs are very costly to train due to long gradient chains.__