# 3. Bidirectional Recurrent Neural Networks (BRNN)
## Limitation of RNN

In the last lecture, we learned about **Recurrent Neural Networks**. What distinguishes them from feedforward layers is that they can process sequential data. Although that sounds super cool, they also have a limitation.

Let's say we want to build a model that predicts which word would fall into the blank, `[ ]`. What would be the answer in the following sentence?

- He said, "Teddy `[ ]`"

Yes, it's pretty hard to predict the next word since we only have the past data, that is, the words before the blank. And how would you know if "Teddy" means a Teddy bear, Teddy Roosevelt, Teddy Sears, or the Indian action thriller [film](https://en.wikipedia.org/wiki/Teddy_(film)) released in 2021? It would be more helpful to also know which words come after the blank. The problem with RNNs is that they miss out on future information, which might hold important context.

## BRNN is here
Here come Bidirectional Recurrent Neural Networks. As the name suggests, a BRNN holds a memory of the inputs not only in one order, as an RNN does, but also in the reverse order. This way, the output layer receives richer information from both directions, which makes BRNNs especially useful when the context around each input matters.

Let's get back to the previous example to understand intuitively why this is awesome. We have the same task of predicting the word in the blank, but this time we are also given the words that follow.
- He said, "Teddy `[ ]` are cuter than you."

Now, it makes sense. The word in the blank should be "Roosevelt."

You surely spotted that it's a joke because you also looked at what comes after the blank. A BRNN does the same.

Be careful about when to use a BRNN, though. As you might have guessed, we are not always guaranteed to be able to look into the future. Since half of the structure depends on future data, a BRNN can perform poorly when it has no access to that data, for example when predictions must be made before the rest of the sequence arrives.

Still, it makes perfect sense to use a bidirectional RNN in the many areas where we are likely to have the complete input, such as:

- Speech Recognition
- Translation
- Handwriting Recognition
- Protein Structure Prediction
- Part-of-speech tagging

## BRNN's Structure



![DIVE INTO DEEP LEARNING](https://d2l.ai/_images/birnn.svg)
<center><i>BRNN Image from DIVE INTO DEEP LEARNING</i></center>

A BRNN is essentially composed of two RNN layers that read the same input in opposite directions, drawn stacked vertically in the figure.

Let $X_t \in \mathbb{R}^{n \times d}$ be the input at time step $t$, where $n$ is the number of sequences in the batch and $d$ is the number of input features, and let $\phi$ be the activation function of the hidden layers.

In the forward layer, the hidden state $\overrightarrow{\text{H}}_t$ is computed from the current input and the previous hidden state, and is passed both to the next time step and to the output layer.

$$\overrightarrow{\text{H}_t} = \phi(X_t\overrightarrow{\text{W}}_{xh} + \overrightarrow{\text{H}}_{t-1}\overrightarrow{\text{W}}_{hh} + \overrightarrow{b_h})$$

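The backward layer, drawn above it, does the same in the reverse direction: its hidden state $\overleftarrow{\text{H}}_t$ depends on the hidden state of the next time step, $\overleftarrow{\text{H}}_{t+1}$.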

$$\overleftarrow{\text{H}_t} = \phi(X_t\overleftarrow{\text{W}}_{xh} + \overleftarrow{\text{H}}_{t+1}\overleftarrow{\text{W}}_{hh} + \overleftarrow{b_h})$$

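We concatenate the forward and backward hidden states, $\overrightarrow{\text{H}}_t$ and $\overleftarrow{\text{H}}_t$, into $H_t \in \mathbb{R}^{n \times 2h}$ (where $h$ is the number of hidden units in each direction) and obtain the output $O_t$ from it.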

$$O_t = H_{t}W_{ho} + b_o$$

where $W_{ho} \in \mathbb{R}^{2h \times o}$ denotes the weights between the hidden layers and the output layer, $b_o \in \mathbb{R}^{1 \times o}$ is the output bias, and $o$ is the number of outputs.
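To make the three equations concrete, here is a minimal NumPy sketch of a BRNN forward pass. It is only an illustration under assumptions: the function name `brnn_forward`, the parameter ordering, the random weights, and the choice of `tanh` for $\phi$ are not from the lecture, and the initial hidden states are assumed to be zero.

```python
import numpy as np

def brnn_forward(X, params, phi=np.tanh):
    """Hypothetical helper. X has shape (T, n, d); returns O with shape (T, n, o)."""
    Wf_xh, Wf_hh, bf, Wb_xh, Wb_hh, bb, W_ho, b_o = params
    T, n, _ = X.shape
    h = Wf_hh.shape[0]

    # Forward layer: H_t depends on X_t and H_{t-1} (zero initial state assumed)
    H_f = np.zeros((T, n, h))
    state = np.zeros((n, h))
    for t in range(T):
        state = phi(X[t] @ Wf_xh + state @ Wf_hh + bf)
        H_f[t] = state

    # Backward layer: H_t depends on X_t and H_{t+1}, so we iterate in reverse
    H_b = np.zeros((T, n, h))
    state = np.zeros((n, h))
    for t in reversed(range(T)):
        state = phi(X[t] @ Wb_xh + state @ Wb_hh + bb)
        H_b[t] = state

    # Concatenate both directions: H_t has shape (n, 2h) at every time step
    H = np.concatenate([H_f, H_b], axis=-1)
    # Output layer: O_t = H_t W_ho + b_o
    return H @ W_ho + b_o

# Illustrative sizes: T time steps, n sequences, d features, h hidden units, o outputs
rng = np.random.default_rng(0)
T, n, d, h, o = 4, 2, 3, 5, 6
params = (
    0.5 * rng.normal(size=(d, h)), 0.5 * rng.normal(size=(h, h)), np.zeros(h),
    0.5 * rng.normal(size=(d, h)), 0.5 * rng.normal(size=(h, h)), np.zeros(h),
    0.5 * rng.normal(size=(2 * h, o)), np.zeros(o),
)
X = rng.normal(size=(T, n, d))
O = brnn_forward(X, params)
print(O.shape)  # (4, 2, 6)

# Changing only the LAST input also changes the output at the FIRST time step,
# because the backward layer carries information from the future.
X_changed = X.copy()
X_changed[-1] += 1.0
O_changed = brnn_forward(X_changed, params)
print(np.array_equal(O[0], O_changed[0]))
```

With a forward-only RNN, the output at the first time step would be unaffected by that change, which is exactly the limitation discussed at the start of this lecture.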

## Keras implementation

After covering the theory, let's now look at how BRNNs can be implemented using Keras.

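```python
import tensorflow as tf

model = tf.keras.models.Sequential([
    # Input: sequences of 5 time steps with 10 features each
    tf.keras.Input(shape=(5, 10)),
    # return_sequences=True keeps the output of every time step for the next recurrent layer
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(10)
])
```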

As you can see, the `Bidirectional()` wrapper handles the structure of the network, and the only thing we need to do is specify the type of recurrent layer it should wrap. In this example we used LSTM, but you could also use a simple RNN or GRU (we will cover it in future tutorials).
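As a small sketch of swapping the recurrent layer, the snippet below wraps a GRU instead of an LSTM and checks the output shapes on a dummy batch. The batch of zeros and the variable names are made up for illustration; `merge_mode` is the wrapper's argument that controls how the two directions are combined, and its default, `"concat"`, matches the concatenation described in the structure section.

```python
import numpy as np
import tensorflow as tf

# Dummy batch (illustrative): 2 sequences, 5 time steps, 10 features
x = np.zeros((2, 5, 10), dtype="float32")

# Same wrapper as before, but around a GRU instead of an LSTM
bi_gru = tf.keras.layers.Bidirectional(
    tf.keras.layers.GRU(64, return_sequences=True)
)
h_concat = bi_gru(x)

# Default merge_mode="concat": forward and backward states are concatenated,
# so the last dimension is 2 * 64 = 128
print(tuple(h_concat.shape))  # (2, 5, 128)

# merge_mode="sum" adds the two directions instead, keeping 64 features
bi_gru_sum = tf.keras.layers.Bidirectional(
    tf.keras.layers.GRU(64, return_sequences=True), merge_mode="sum"
)
h_sum = bi_gru_sum(x)
print(tuple(h_sum.shape))  # (2, 5, 64)
```

Note that with the default concatenation, the layer after a `Bidirectional` wrapper sees twice as many features as the number of units passed to the wrapped layer.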