### Bidirectional RNN  
![](https://cn.bing.com/th?id=OIP.tZtg4-7QAkaUdPyBbNBlXwHaEt&pid=Api&rs=1&p=0)

- If each RNN unit's latent dimension is M, the hidden state $h_t = [\vec{h_t}, \overleftarrow{h_t}]$ (forward and backward states concatenated) has size 2M.
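As a rough illustration of where the 2M comes from, here is a minimal NumPy sketch of a bidirectional pass. The `simple_rnn` helper, the tanh cell, and the random weights are assumptions made for this example only; they stand in for whatever recurrent unit is actually used.

```python
import numpy as np

def simple_rnn(X, Wx, Wh, b):
    """Run a plain tanh RNN over X (T x D) and return every hidden state (T x M)."""
    T = X.shape[0]
    M = Wh.shape[0]
    h = np.zeros(M)
    states = np.zeros((T, M))
    for t in range(T):
        h = np.tanh(X[t] @ Wx + h @ Wh + b)
        states[t] = h
    return states

rng = np.random.default_rng(0)
T, D, M = 8, 2, 3
X = rng.standard_normal((T, D))

# each direction gets its own (here random, hypothetical) weights
fwd_params = (rng.standard_normal((D, M)), rng.standard_normal((M, M)), np.zeros(M))
bwd_params = (rng.standard_normal((D, M)), rng.standard_normal((M, M)), np.zeros(M))

h_fwd = simple_rnn(X, *fwd_params)              # reads x_1 ... x_T
h_bwd = simple_rnn(X[::-1], *bwd_params)[::-1]  # reads x_T ... x_1, then flipped back to time order

# concatenate the two directions at every time step
h = np.concatenate([h_fwd, h_bwd], axis=1)
print(h.shape)  # (8, 6), i.e. T x 2M
```

Flipping the backward states back into time order is what lets row `t` of `h` hold both directions' view of the same time step.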


### Many-to-One Problem
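- If we only read the output at the last time step, the backward RNN has seen just one word (the last one)! Instead, take the final state of each direction: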

\begin{equation*}
out = [\vec{h_T}, \overleftarrow{h_1}] \\
\end{equation*}

- This is the default behavior in Keras when `return_sequences=False`.

- Another solution:
    - Take the max over all hidden states:
\begin{equation*}
out = \max_{t} h_t
\end{equation*}
    - What if we took the softmax instead of the max? Keep this in mind for Attention.
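To see why the forward state is read at t = T and the backward state at t = 1, here is a toy check. The "RNN" is replaced by a running sum, an assumption made purely so that each state shows exactly which inputs it has seen.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])  # a toy sequence, T = 4

# stand-in for an RNN: the "hidden state" is the sum of everything seen so far
h_fwd = np.cumsum(x)              # forward pass: h_fwd[t] has seen x[0..t]
h_bwd = np.cumsum(x[::-1])[::-1]  # backward pass: h_bwd[t] has seen x[t..T-1]

# naive choice: read both directions at the last time step
naive = np.array([h_fwd[-1], h_bwd[-1]])
# choice from the text: forward state at t = T, backward state at t = 1
out = np.array([h_fwd[-1], h_bwd[0]])

print(naive.tolist())  # [10.0, 4.0]  the backward half reflects only the last element
print(out.tolist())    # [10.0, 10.0] both halves reflect the whole sequence
```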
    

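The max-over-time alternative, and one possible reading of the softmax question, can be sketched in NumPy as follows. The random `h` is only a stand-in for real hidden states, and weighting each feature by a softmax over time is an assumption for illustration; it is not the attention mechanism itself.

```python
import numpy as np

rng = np.random.default_rng(0)
T, M = 5, 3
h = rng.standard_normal((T, 2 * M))  # stand-in for bidirectional hidden states, T x 2M

# max over time: keep the largest value of each feature
out_max = h.max(axis=0)  # shape (2M,)

# softmax over time (per feature), used as weights for a weighted sum
w = np.exp(h - h.max(axis=0, keepdims=True))  # subtract the max for numerical stability
w = w / w.sum(axis=0, keepdims=True)          # each column now sums to 1
out_soft = (w * h).sum(axis=0)                # shape (2M,)

print(out_max.shape, out_soft.shape)  # (6,) (6,)
```

Because the softmax weights are non-negative and sum to 1, `out_soft` is a weighted average of the hidden states and never exceeds `out_max`; it is a smooth version of the max that lets every time step contribute.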

### When not to use a Bidirectional RNN
- Do not use a bidirectional RNN when the task is to predict the future.
- The backward RNN would need inputs from even further in the future than the value being predicted, which does not make sense.
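A small check makes the dependence on the future concrete. As a simplifying assumption, the recurrent cell is again replaced by a running sum; `bidirectional_states` is a hypothetical helper written for this example.

```python
import numpy as np

def bidirectional_states(x):
    """Toy bidirectional 'RNN': running sums in each direction, stacked as T x 2."""
    h_fwd = np.cumsum(x)
    h_bwd = np.cumsum(x[::-1])[::-1]
    return np.stack([h_fwd, h_bwd], axis=1)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x_changed = x.copy()
x_changed[3:] = 0.0  # alter only the "future" (time steps 3 and 4)

t = 2
before = bidirectional_states(x)[t]
after = bidirectional_states(x_changed)[t]

print(before.tolist())  # [6.0, 12.0]
print(after.tolist())   # [6.0, 3.0]
```

The forward component at `t = 2` is unchanged, but the backward component moves when only later inputs change, so the state at time `t` cannot be computed before the future inputs exist.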


### Code

In [1]:
from keras.models import Model
from keras.layers import Input, LSTM, GRU, Bidirectional
import numpy as np
import matplotlib.pyplot as plt

import keras.backend as K
if len(K.tensorflow_backend._get_available_gpus()) > 0:
    from keras.layers import CuDNNLSTM as LSTM
    from keras.layers import CuDNNGRU as GRU



Using TensorFlow backend.


In [2]:
T = 8
D = 2
M = 3

X = np.random.randn(1, T, D)
X.shape

(1, 8, 2)

In [3]:
input_ = Input(shape=(T, D))
# rnn = Bidirectional(LSTM(M, return_state=True, return_sequences=True))
rnn = Bidirectional(LSTM(M, return_state=True, return_sequences=False))
x = rnn(input_)


In [4]:
model = Model(inputs=input_, outputs=x)
o, h1, c1, h2, c2 = model.predict(X)
print("o:", o)
print("o.shape:", o.shape)
print("h1:", h1)
print("c1:", c1)
print("h2:", h2)
print("c2:", c2)

o: [[-0.17028646 -0.11698733 -0.26957956 -0.14286296  0.13157322 -0.11919126]]
o.shape: (1, 6)
h1: [[-0.17028646 -0.11698733 -0.26957956]]
c1: [[-0.45001054 -0.26141956 -0.66415256]]
h2: [[-0.14286296  0.13157322 -0.11919126]]
c2: [[-0.30573738  0.30624184 -0.21949151]]
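With `return_sequences=False` and `return_state=True`, Keras returns the output followed by the forward hidden and cell states and then the backward hidden and cell states. The check below copies the printed values and confirms that `o` is `h1` (forward) and `h2` (backward) side by side; `c1` and `c2` are cell states and do not appear in `o`.

```python
import numpy as np

# values copied from the printed output above
o = np.array([[-0.17028646, -0.11698733, -0.26957956,
               -0.14286296, 0.13157322, -0.11919126]])
h1 = np.array([[-0.17028646, -0.11698733, -0.26957956]])
h2 = np.array([[-0.14286296, 0.13157322, -0.11919126]])

M = h1.shape[1]
print(np.array_equal(o[:, :M], h1))                         # True: first M entries are the forward state
print(np.array_equal(o[:, M:], h2))                         # True: last M entries are the backward state
print(np.array_equal(o, np.concatenate([h1, h2], axis=1)))  # True
```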


#### Toxic data

In [5]:
import os
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from keras.models import Model
from keras.layers import Dense, Embedding, Input
from keras.layers import LSTM, Bidirectional, GlobalMaxPool1D, Dropout
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.optimizers import Adam
from sklearn.metrics import roc_auc_score

import keras.backend as K
if len(K.tensorflow_backend._get_available_gpus()) > 0:
    from keras.layers import CuDNNLSTM as LSTM
    from keras.layers import CuDNNGRU as GRU



In [6]:
# some configuration
MAX_SEQUENCE_LENGTH = 100
MAX_VOCAB_SIZE = 20000
EMBEDDING_DIM = 50
VALIDATION_SPLIT = 0.2
BATCH_SIZE = 128
EPOCHS = 5


In [None]:
print('Building model...')

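# create a network with a single bidirectional LSTM followed by global max pooling
# (embedding_layer, possible_labels, data and targets are assumed to be defined
# by earlier data-loading code that is not shown in these notes)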
input_ = Input(shape=(MAX_SEQUENCE_LENGTH,))
x = embedding_layer(input_)
# x = LSTM(15, return_sequences=True)(x)
x = Bidirectional(LSTM(15, return_sequences=True))(x)
x = GlobalMaxPool1D()(x)
output = Dense(len(possible_labels), activation="sigmoid")(x)

model = Model(input_, output)
model.compile(
  loss='binary_crossentropy',
  optimizer=Adam(lr=0.01),
  metrics=['accuracy']
)


print('Training model...')
r = model.fit(
  data,
  targets,
  batch_size=BATCH_SIZE,
  epochs=EPOCHS,
  validation_split=VALIDATION_SPLIT
)
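To follow the tensor shapes through the model above without running Keras, here is a NumPy sketch of the pooling and output steps. The random arrays stand in for the real LSTM output and Dense weights, and `K = 6` is a hypothetical label count used in place of `len(possible_labels)`.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, M = 4, 100, 15  # batch size is arbitrary; T and M match the cell above
K = 6                 # hypothetical number of labels

h = rng.standard_normal((N, T, 2 * M))  # stand-in for the bidirectional LSTM output
pooled = h.max(axis=1)                  # global max pooling over time: N x 2M

W = rng.standard_normal((2 * M, K)) * 0.1        # hypothetical Dense weights
b = np.zeros(K)
probs = 1.0 / (1.0 + np.exp(-(pooled @ W + b)))  # sigmoid: one probability per label

print(pooled.shape)  # (4, 30)
print(probs.shape)   # (4, 6)
```

Each label gets its own sigmoid, so the probabilities in a row do not need to sum to 1. That matches the `binary_crossentropy` loss, which scores every label as a separate yes/no decision.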


### Image Classification with Bidirectional RNNs
- RNNs can be used for images

- Treat an image as a sequence of pixels (here, one row of pixels per time step)

#### Architecture
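- Let's pretend the image is a sequence of word vectors (a T x D matrix)
- Pretend height = T, width = D
- Rotate the image and run a bidirectional RNN on both the original and the rotated copy, so that we scan in all 4 directions (top-to-bottom, bottom-to-top, left-to-right, right-to-left)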

[tensorflow example](http://easy-tensorflow.com/tf-tutorials/recurrent-neural-networks/bidirectional-rnn-for-classification)

- Given LSTM latent dimensionality M, what is the output size?
- input (N x H x W) -> bi-LSTM (N x H x 2M) -> maxpool -> N x 2M
- rotated input (N x W x H) -> bi-LSTM (N x W x 2M) -> maxpool -> N x 2M
- concat output (N x 4M)
- Dense + softmax to get prediction (N x K)
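The shape bookkeeping above can be traced with NumPy alone. In this sketch `fake_bilstm` is a hypothetical stand-in that only reproduces the output shape of a bi-LSTM, and the "rotation" is done as a transpose of the height and width axes, which is enough to make the columns the time steps.

```python
import numpy as np

rng = np.random.default_rng(0)
N, H, W, M = 2, 28, 20, 15  # W != H here only to make the two axes easy to tell apart

images = rng.standard_normal((N, H, W))
rotated = np.transpose(images, (0, 2, 1))  # swap height and width: N x W x H

def fake_bilstm(x, M):
    """Stand-in for a bi-LSTM that returns sequences: N x T x D -> N x T x 2M."""
    n, t, _ = x.shape
    return rng.standard_normal((n, t, 2 * M))

x1 = fake_bilstm(images, M).max(axis=1)      # N x H x 2M -> max over time -> N x 2M
x2 = fake_bilstm(rotated, M).max(axis=1)     # N x W x 2M -> max over time -> N x 2M
features = np.concatenate([x1, x2], axis=1)  # N x 4M

print(rotated.shape)   # (2, 20, 28)
print(features.shape)  # (2, 60)
```

A final Dense layer with softmax would then map the N x 4M features to N x K class probabilities.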


In [None]:

# get data (get_mnist is a helper assumed to be defined elsewhere; it is not shown in these notes)
X, Y = get_mnist()

# config
D = 28
M = 15


In [None]:

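# layers used below that the earlier cells did not import
from keras.layers import Lambda, Concatenate, GlobalMaxPooling1D

# input is an image of size 28x28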
input_ = Input(shape=(D, D))

# up-down
rnn1 = Bidirectional(LSTM(M, return_sequences=True))
x1 = rnn1(input_) # output is N x D x 2M
x1 = GlobalMaxPooling1D()(x1) # output is N x 2M

# left-right
rnn2 = Bidirectional(LSTM(M, return_sequences=True))

# custom layer
permutor = Lambda(lambda t: K.permute_dimensions(t, pattern=(0, 2, 1)))

x2 = permutor(input_)
x2 = rnn2(x2) # output is N x D x 2M
x2 = GlobalMaxPooling1D()(x2) # output is N x 2M

# put them together
concatenator = Concatenate(axis=1)
x = concatenator([x1, x2]) # output is N x 4M

# final dense layer
output = Dense(10, activation='softmax')(x)

model = Model(inputs=input_, outputs=output)

# testing
# o = model.predict(X)
# print("o.shape:", o.shape)

# compile
model.compile(
  loss='sparse_categorical_crossentropy',
  optimizer='adam',
  metrics=['accuracy']
)

# train
print('Training model...')
r = model.fit(X, Y, batch_size=32, epochs=10, validation_split=0.3)

