### Bidirectional RNN  
![](https://cn.bing.com/th?id=OIP.tZtg4-7QAkaUdPyBbNBlXwHaEt&pid=Api&rs=1&p=0)

- If each RNN unit's latent dimension is M, then $h_t$ has size 2M, because the forward and backward hidden states are concatenated at every time step.
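To make the 2M size concrete, here is a minimal NumPy sketch of a bidirectional layer built from two plain tanh RNNs. It is an illustration only, not how Keras implements `Bidirectional`; the helper `simple_rnn` and the random weights are assumptions made up for this example.

```python
import numpy as np

def simple_rnn(X, Wx, Wh, b):
    """Run a tanh RNN over X (T x D) and return every hidden state (T x M)."""
    T = X.shape[0]
    M = Wh.shape[0]
    h = np.zeros(M)
    states = np.zeros((T, M))
    for t in range(T):
        h = np.tanh(X[t] @ Wx + h @ Wh + b)
        states[t] = h
    return states

rng = np.random.default_rng(0)
T, D, M = 8, 20, 3
X = rng.standard_normal((T, D))

# each direction has its own weights (hypothetical random values)
fwd = (rng.standard_normal((D, M)), rng.standard_normal((M, M)), np.zeros(M))
bwd = (rng.standard_normal((D, M)), rng.standard_normal((M, M)), np.zeros(M))

h_fwd = simple_rnn(X, *fwd)
# backward direction: read the reversed sequence, then flip the states back
# so that row t of h_bwd lines up with time step t of the input
h_bwd = simple_rnn(X[::-1], *bwd)[::-1]

# h_t is the concatenation of both directions, so its size is 2M
h = np.concatenate([h_fwd, h_bwd], axis=1)
print(h.shape)  # (8, 6)
```

Row `t` of `h` holds the forward state (which has read inputs 1..t) next to the backward state (which has read inputs T..t).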


### Many-to-One Problem
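- In a many-to-one task we normally keep only the output at the last time step T. At that step the backward RNN has seen only one word (the last one)!
- The fix is to take each direction's state after it has read the whole sequence: the forward state at step T and the backward state at step 1.

\begin{equation*}
out = [\overrightarrow{h_T}, \overleftarrow{h_1}]
\end{equation*}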

- This is the default behavior in Keras when `return_sequences=False`

- Another solution
    - Take the element-wise max over the hidden states of all time steps:
\begin{equation*}
out = \max_{t} h_t
\end{equation*}
    - What if we took the softmax instead of the max? Keep this in mind for Attention.
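The two pooling ideas can be compared on a toy array. This is a rough NumPy sketch: `H` is random data standing in for the bidirectional hidden states, and the per-feature softmax weighting is only one possible reading of the question above, not the attention mechanism itself.

```python
import numpy as np

rng = np.random.default_rng(0)
T, M = 8, 3
H = rng.standard_normal((T, 2 * M))  # stand-in for hidden states h_1..h_T

# hard max over time: one value per feature, shape (2M,)
out_max = H.max(axis=0)

# soft alternative: softmax over the time axis gives weights that sum to 1
# per feature, and the output is the weighted average of the hidden states
w = np.exp(H - H.max(axis=0, keepdims=True))  # subtract max for stability
w /= w.sum(axis=0, keepdims=True)
out_soft = (w * H).sum(axis=0)

print(out_max.shape, out_soft.shape)  # (6,) (6,)
```

The soft version is a smooth average that leans toward the largest states, so it can never exceed the hard max.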
    


### When not to use a Bidirectional RNN
- Do not use one when predicting the future (e.g. forecasting)
- The backward RNN reads the sequence from the end, so it would need inputs from even further in the future, which do not exist at prediction time


### Code

In [1]:
from keras.models import Model
from keras.layers import Input, LSTM, GRU, Bidirectional
import numpy as np
import matplotlib.pyplot as plt

import keras.backend as K
if len(K.tensorflow_backend._get_available_gpus()) > 0:
    from keras.layers import CuDNNLSTM as LSTM
    from keras.layers import CuDNNGRU as GRU



Using TensorFlow backend.


In [28]:
T = 8
D = 20
M = 3

X = np.random.randn(1, T, D)
X.shape

(1, 8, 20)

In [29]:
X

array([[[ 1.59215575, -0.99875615, -0.80206125,  0.50841239,
         -0.1559104 ,  1.02235294,  0.49800112, -0.56908889,
         -0.34025952, -0.53371617,  0.00286114,  1.43003495,
          1.52893131,  0.28340541, -1.34811457, -1.46465738,
         -1.03972654, -1.24180737, -0.37111895,  1.05344886],
        [-0.45782456, -0.74904535,  1.66153361,  0.05113275,
          2.77182987, -1.2462556 ,  1.19900642, -0.15166592,
         -1.14872252, -2.18543846,  0.89098527,  0.08289235,
          0.3848108 , -0.00658806, -2.48954578,  2.61407273,
         -1.1011952 , -0.80532234, -1.0876434 , -0.47016428],
        [-0.88635914,  0.11020603, -0.10451324, -0.61823208,
          0.17984517,  0.1392936 , -0.2425376 , -0.34382657,
          0.31293554, -0.47095947,  0.71068175, -0.45563558,
         -0.8569422 ,  1.36839485, -0.80173691,  2.03356689,
         -2.14952568,  0.31888211,  0.38215729, -0.15716281],
        [-0.32067256, -0.76889461, -0.04986577, -1.60472858,
          0.62292881,

In [30]:
input_ = Input(shape=(T, D))
rnn = Bidirectional(LSTM(M, return_state=True, return_sequences=True))
# rnn = Bidirectional(LSTM(M, return_state=True, return_sequences=False))
x = rnn(input_)


In [31]:
model = Model(inputs=input_, outputs=x)
o, h1, c1, h2, c2 = model.predict(X)
print("o:", o)
print("o.shape:", o.shape)
print("h1:", h1)
print("c1:", c1)
print("h2:", h2)
print("c2:", c2)

o: [[[ 0.18119621  0.01583676  0.20686583  0.21239105 -0.05004329
   -0.05696144]
  [ 0.02418935  0.05985089  0.18329658 -0.0693098  -0.16771546
   -0.2273484 ]
  [ 0.06540236  0.19442108  0.2990017  -0.00376892 -0.2814624
   -0.24852385]
  [-0.06846699  0.14171138  0.32254672  0.09992275 -0.06165175
   -0.13075766]
  [-0.15302575  0.19428375  0.0460598   0.15211043 -0.03509453
    0.15433581]
  [-0.08738933  0.063574    0.2251744   0.16290708 -0.02215824
   -0.08731462]
  [-0.13260986  0.25557363  0.10454654  0.12669256  0.02675493
   -0.05862847]
  [-0.04098389  0.45581076  0.02110858  0.00144373 -0.13307135
   -0.04553765]]]
o.shape: (1, 8, 6)
h1: [[-0.04098389  0.45581076  0.02110858]]
c1: [[-0.456286    0.88852847  0.6460419 ]]
h2: [[ 0.21239105 -0.05004329 -0.05696144]]
c2: [[ 0.41276407 -0.5533985  -0.11868316]]


In [32]:
input_ = Input(shape=(T, D))
# rnn = Bidirectional(LSTM(M, return_state=True, return_sequences=True))
rnn = Bidirectional(LSTM(M, return_state=True, return_sequences=False))
x = rnn(input_)
model = Model(inputs=input_, outputs=x)
o, h1, c1, h2, c2 = model.predict(X)
print("o:", o)
print("o.shape:", o.shape)
print("h1:", h1)
print("c1:", c1)
print("h2:", h2)
print("c2:", c2)

o: [[ 0.24157766  0.07058255 -0.4550641   0.00260599  0.09559534 -0.20733805]]
o.shape: (1, 6)
h1: [[ 0.24157766  0.07058255 -0.4550641 ]]
c1: [[ 0.47180873  0.14923403 -0.9594811 ]]
h2: [[ 0.00260599  0.09559534 -0.20733805]]
c2: [[ 0.66224265  0.2606456  -0.3786462 ]]
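The two printed runs above can be checked against the many-to-one formula. The sketch below only copies numbers from those outputs (the first and last rows of `o` from the `return_sequences=True` run, and the full vectors from the `return_sequences=False` run); the variable names are made up for this check.

```python
import numpy as np

M = 3

# run with return_sequences=True: rows of o at t=1 and t=T, plus h1 and h2
o_first = np.array([0.18119621, 0.01583676, 0.20686583,
                    0.21239105, -0.05004329, -0.05696144])
o_last = np.array([-0.04098389, 0.45581076, 0.02110858,
                   0.00144373, -0.13307135, -0.04553765])
h1_seq = np.array([-0.04098389, 0.45581076, 0.02110858])
h2_seq = np.array([0.21239105, -0.05004329, -0.05696144])

# forward final state sits in the first M columns of the LAST time step
fwd_matches_last = np.allclose(o_last[:M], h1_seq)
# backward final state sits in the last M columns of the FIRST time step
bwd_matches_first = np.allclose(o_first[M:], h2_seq)

# run with return_sequences=False: o should be [h1, h2]
o_vec = np.array([0.24157766, 0.07058255, -0.4550641,
                  0.00260599, 0.09559534, -0.20733805])
h1_vec = np.array([0.24157766, 0.07058255, -0.4550641])
h2_vec = np.array([0.00260599, 0.09559534, -0.20733805])
concat_matches = np.allclose(o_vec, np.concatenate([h1_vec, h2_vec]))

print(fwd_matches_last, bwd_matches_first, concat_matches)  # True True True
```

The numbers differ between the two runs because each run builds a new model with fresh random weights.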


In [33]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_8 (InputLayer)         (None, 8, 20)             0         
_________________________________________________________________
bidirectional_8 (Bidirection [(None, 6), (None, 3), (N 576       
Total params: 576
Trainable params: 576
Non-trainable params: 0
_________________________________________________________________
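The 576 parameters in the summary above can be reproduced by hand. This is a small sketch that assumes the standard (non-CuDNN) LSTM layout of four gate blocks, each with an input kernel, a recurrent kernel and one bias vector; the function names are hypothetical helpers for this note.

```python
def lstm_params(input_dim, units):
    """Parameter count of one LSTM: 4 blocks of (kernel + recurrent kernel + bias)."""
    return 4 * units * (input_dim + units + 1)

def bidirectional_lstm_params(input_dim, units):
    """A bidirectional wrapper holds two independent LSTMs."""
    return 2 * lstm_params(input_dim, units)

# D = 20 input features, M = 3 units, as in the toy example above
print(bidirectional_lstm_params(20, 3))   # 576

# 50-dimensional embeddings and 15 units, as in the toxic-comment model
print(bidirectional_lstm_params(50, 15))  # 7920
```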


In [12]:
rnn.weights

[<tf.Variable 'bidirectional_1/forward_lstm_1/kernel:0' shape=(2, 12) dtype=float32_ref>,
 <tf.Variable 'bidirectional_1/forward_lstm_1/recurrent_kernel:0' shape=(3, 12) dtype=float32_ref>,
 <tf.Variable 'bidirectional_1/forward_lstm_1/bias:0' shape=(12,) dtype=float32_ref>,
 <tf.Variable 'bidirectional_1/backward_lstm_1/kernel:0' shape=(2, 12) dtype=float32_ref>,
 <tf.Variable 'bidirectional_1/backward_lstm_1/recurrent_kernel:0' shape=(3, 12) dtype=float32_ref>,
 <tf.Variable 'bidirectional_1/backward_lstm_1/bias:0' shape=(12,) dtype=float32_ref>]

#### toxic data

In [2]:
import os
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from keras.models import Model
from keras.layers import Dense, Embedding, Input
from keras.layers import LSTM, Bidirectional, GlobalMaxPool1D, Dropout
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.optimizers import Adam
from sklearn.metrics import roc_auc_score

import keras.backend as K
if len(K.tensorflow_backend._get_available_gpus()) > 0:
    from keras.layers import CuDNNLSTM as LSTM
    from keras.layers import CuDNNGRU as GRU



Using TensorFlow backend.


In [4]:
# some configuration
MAX_SEQUENCE_LENGTH = 100
MAX_VOCAB_SIZE = 20000
EMBEDDING_DIM = 50
VALIDATION_SPLIT = 0.2
BATCH_SIZE = 128
EPOCHS = 5


In [None]:
# load in pre-trained word vectors
word2vec = {}
with open(os.path.join('../large_files/glove.6B/glove.6B.%sd.txt' % EMBEDDING_DIM)) as f:
  # is just a space-separated text file in the format:
  # word vec[0] vec[1] vec[2] ...
  for line in f:
    values = line.split()
    word = values[0]
    vec = np.asarray(values[1:], dtype='float32')
    word2vec[word] = vec


In [None]:
train = pd.read_csv("../large_files/toxic-comment/train.csv")
sentences = train["comment_text"].fillna("DUMMY_VALUE").values
possible_labels = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]
targets = train[possible_labels].values


In [None]:
# convert the sentences (strings) into integers
tokenizer = Tokenizer(num_words=MAX_VOCAB_SIZE)
tokenizer.fit_on_texts(sentences)
sequences = tokenizer.texts_to_sequences(sentences)

# get word -> integer mapping
word2idx = tokenizer.word_index

# pad sequences so that we get a N x T matrix
data = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
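With its default arguments, `pad_sequences` pads at the front with zeros and, for sequences longer than `maxlen`, keeps the last `maxlen` tokens. The pure-Python function below is a rough sketch of that default behavior for illustration; `pad_pre` is a made-up name and is not part of Keras.

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad (or left-truncate) each sequence to exactly maxlen items."""
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # keep only the last maxlen tokens
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

print(pad_pre([[5, 3], [1, 2, 3, 4, 5, 6]], maxlen=4))
# [[0, 0, 5, 3], [3, 4, 5, 6]]
```

Every row ends up with the same length, which is what gives the N x T matrix the model expects.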


In [None]:
# prepare embedding matrix
num_words = min(MAX_VOCAB_SIZE, len(word2idx) + 1)
embedding_matrix = np.zeros((num_words, EMBEDDING_DIM))
for word, i in word2idx.items():
  if i < MAX_VOCAB_SIZE:
    embedding_vector = word2vec.get(word)
    if embedding_vector is not None:
      # words not found in embedding index will be all zeros.
      embedding_matrix[i] = embedding_vector

In [11]:
# load pre-trained word embeddings into an Embedding layer
# note that we set trainable = False so as to keep the embeddings fixed

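# temporary placeholders so this cell runs without the data-loading cells above;
# they replace the GloVe matrix with zeros, so remove these three lines
# once the real embedding_matrix has been built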
num_words = MAX_VOCAB_SIZE
embedding_matrix = np.zeros((num_words, EMBEDDING_DIM))
possible_labels = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

embedding_layer = Embedding(
  num_words,
  EMBEDDING_DIM,
  weights=[embedding_matrix],
  input_length=MAX_SEQUENCE_LENGTH,
  trainable=False
)



In [12]:
print('Building model...')

# create a network with a single bidirectional LSTM layer,
# followed by max pooling over time and a sigmoid output per label
input_ = Input(shape=(MAX_SEQUENCE_LENGTH,))
x = embedding_layer(input_)
# x = LSTM(15, return_sequences=True)(x)
x = Bidirectional(LSTM(15, return_sequences=True))(x)
x = GlobalMaxPool1D()(x)
output = Dense(len(possible_labels), activation="sigmoid")(x)

model = Model(input_, output)
model.compile(
  loss='binary_crossentropy',
  optimizer=Adam(lr=0.01),
  metrics=['accuracy']
)




Building model...


In [13]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_3 (InputLayer)         (None, 100)               0         
_________________________________________________________________
embedding_2 (Embedding)      (None, 100, 50)           1000000   
_________________________________________________________________
bidirectional_2 (Bidirection (None, 100, 30)           7920      
_________________________________________________________________
global_max_pooling1d_2 (Glob (None, 30)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 6)                 186       
Total params: 1,008,106
Trainable params: 8,106
Non-trainable params: 1,000,000
_________________________________________________________________


In [None]:
print('Training model...')
r = model.fit(
  data,
  targets,
  batch_size=BATCH_SIZE,
  epochs=EPOCHS,
  validation_split=VALIDATION_SPLIT
)


### Image Classification with Bidirectional RNNs
- RNNs can be used for images

- An image can be treated as a sequence: each row (or column) of pixels is one time step

#### Architecture
- Let's pretend the image is a sequence of word vectors (a T x D matrix)
- Pretend height = T and width = D
- Run one Bidirectional RNN on the image and another on the rotated (transposed) image, so the image is read in all 4 directions

[tensorflow example](http://easy-tensorflow.com/tf-tutorials/recurrent-neural-networks/bidirectional-rnn-for-classification)

- Given LSTM latent dimensionality M, what is the output size?
- input (N x H x W) -> bi-LSTM (N x H x 2M) -> maxpool -> N x 2M
- rotated input (N x W x H) -> bi-LSTM (N x W x 2M) -> maxpool -> N x 2M
- concatenate the two outputs (N x 4M)
- Dense + softmax to get the prediction (N x K)
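The shape bookkeeping above can be walked through in NumPy. In this sketch the bi-LSTM outputs are replaced by random arrays of the right shape (an assumption, since no real LSTM is run here), so it only illustrates the transpose, the max pooling over time, and the concatenation.

```python
import numpy as np

N, H, W, M = 2, 28, 28, 15
rng = np.random.default_rng(0)
images = rng.standard_normal((N, H, W))

# swapping the last two axes makes each image column a time step
rotated = np.transpose(images, (0, 2, 1))  # N x W x H

# random stand-ins for the two bi-LSTM outputs (one vector of size 2M per step)
seq1 = rng.standard_normal((N, H, 2 * M))  # from the original image
seq2 = rng.standard_normal((N, W, 2 * M))  # from the rotated image

# max pooling over the time axis removes the sequence dimension
pooled1 = seq1.max(axis=1)  # N x 2M
pooled2 = seq2.max(axis=1)  # N x 2M

# concatenating both branches gives the N x 4M feature matrix
features = np.concatenate([pooled1, pooled2], axis=1)
print(features.shape)  # (2, 60)
```

Time step 3 of the rotated image, `rotated[0, 3]`, is column 3 of the original, `images[0, :, 3]`, which is what lets the second RNN read left-right instead of up-down.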


In [None]:

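# get data
# get_mnist() is a helper defined outside this section; it is assumed to
# return images X of shape N x 28 x 28 and integer class labels Y
X, Y = get_mnist()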

# config
D = 28
M = 15


In [None]:

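# layers used below that the earlier cells did not import
from keras.layers import Lambda, Concatenate, GlobalMaxPooling1D

# input is an image of size 28x28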
input_ = Input(shape=(D, D))

# up-down
rnn1 = Bidirectional(LSTM(M, return_sequences=True))
x1 = rnn1(input_) # output is N x D x 2M
x1 = GlobalMaxPooling1D()(x1) # output is N x 2M

# left-right
rnn2 = Bidirectional(LSTM(M, return_sequences=True))

# custom layer
permutor = Lambda(lambda t: K.permute_dimensions(t, pattern=(0, 2, 1)))

x2 = permutor(input_)
x2 = rnn2(x2) # output is N x D x 2M
x2 = GlobalMaxPooling1D()(x2) # output is N x 2M

# put them together
concatenator = Concatenate(axis=1)
x = concatenator([x1, x2]) # output is N x 4M

# final dense layer
output = Dense(10, activation='softmax')(x)

model = Model(inputs=input_, outputs=output)

# testing
# o = model.predict(X)
# print("o.shape:", o.shape)

# compile
model.compile(
  loss='sparse_categorical_crossentropy',
  optimizer='adam',
  metrics=['accuracy']
)

# train
print('Training model...')
r = model.fit(X, Y, batch_size=32, epochs=10, validation_split=0.3)

