# Pseudocode for an RNN

Let’s implement the forward pass of a
toy RNN in NumPy. This RNN takes as input a sequence of vectors, which you’ll encode
as a 2D tensor of size (timesteps, input_features). It loops over timesteps, and at
each timestep, it considers its current state at t and the input at t (of shape
(input_features,)), and combines them to obtain the output at t. You’ll then set the state for
the next step to be this previous output. For the first timestep, the previous output
isn’t defined; hence, there is no current state. So, you’ll initialize the state as an all-zero
vector called the initial state of the network.
In pseudocode, this is the RNN.

In [None]:
state_t=0
for input_t in input_sequence:
    output_t=f(input_t, state_t)
    state_t=output_t # output_t becomes the state_t for next iteration

You can even flesh out the function f: the transformation of the input and state into an
output will be parameterized by two matrices, W and U, and a bias vector. It’s similar to
the transformation operated by a densely connected layer in a feedforward network.

In [None]:
state_t=0
for input_t in input_sequence:
    output_t = activation(dot(W, input_t) + dot(U, state_t) + bias)
    state_t=output_t

# Simple RNN implementation using NumPy

In [5]:
import numpy as np
timesteps=100
input_features=32
output_features=64

input_sequence=np.random.random((timesteps, input_features))
# hence each input_t will be a vector of shape (input_features,)

# initializing initial state, W, U and bias
state_t=np.zeros((output_features,))
W=np.random.random((output_features, input_features))
U=np.random.random((output_features, output_features))
bias=np.random.random((output_features,))

# Doing 1 forward pass of whole sequence
successive_outputs=[]
for input_t in input_sequence:
    output_t=np.tanh(np.dot(W, input_t) + np.dot(U, state_t) + bias)
    # storing output_t at each step in a list
    successive_outputs.append(output_t)
    state_t=output_t # using output of current iteration as state of next iteration
# finally stack the per-timestep outputs into a 2D np array
# (np.concatenate would flatten them into a 1D array of shape (6400,))
final_output_sequence=np.stack(successive_outputs, axis=0)

In [7]:
final_output_sequence.shape #2d tensor of shape (timesteps, output_features)

(100, 64)

In summary, an RNN is a for loop that reuses quantities computed
during the previous iteration of the loop, nothing more. Of course, there are many
different RNNs fitting this definition that you could build—this example is one of the
simplest RNN formulations. RNNs are characterized by their step function, such as the
following function in this case:
output_t = np.tanh(np.dot(W, input_t) + np.dot(U, state_t) + bias)
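
Because the step function is what characterizes an RNN, the loop can be written once and the step passed in as an argument. The following is a minimal sketch of that idea, not part of the original notebook: the helper names `simple_rnn_step` and `run_rnn` are hypothetical, and the random weights only serve to check shapes.

```python
import numpy as np

def simple_rnn_step(input_t, state_t, W, U, bias):
    """One step of the toy RNN: combine the current input with the current state."""
    return np.tanh(np.dot(W, input_t) + np.dot(U, state_t) + bias)

def run_rnn(step, input_sequence, initial_state, *params):
    """Generic RNN loop: apply `step` at each timestep and feed the output back as state."""
    state_t = initial_state
    outputs = []
    for input_t in input_sequence:
        state_t = step(input_t, state_t, *params)
        outputs.append(state_t)
    # Stack along a new leading axis so the result is (timesteps, output_features)
    return np.stack(outputs, axis=0)

timesteps, input_features, output_features = 100, 32, 64
rng = np.random.default_rng(0)
input_sequence = rng.random((timesteps, input_features))
W = rng.random((output_features, input_features))
U = rng.random((output_features, output_features))
bias = rng.random((output_features,))

outputs = run_rnn(simple_rnn_step, input_sequence,
                  np.zeros((output_features,)), W, U, bias)
print(outputs.shape)  # (100, 64)
```

Swapping in a different step function (with its own parameters) would give a different kind of RNN while the loop stays the same.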

# Recurrent layers in Keras

The process you just naively implemented in NumPy corresponds to an actual Keras
layer, the SimpleRNN layer:
from keras.layers import SimpleRNN
There is one minor difference: SimpleRNN processes batches of sequences, like all other
Keras layers, not a single sequence as in the NumPy example. This means it takes inputs
of shape (batch_size, timesteps, input_features), rather than (timesteps,
input_features).
Like all recurrent layers in Keras, SimpleRNN can be run in two different modes: it
can return either the full sequences of successive outputs for each timestep (a 3D tensor
of shape (batch_size, timesteps, output_features)) or only the last output for
each input sequence (a 2D tensor of shape (batch_size, output_features)). These
two modes are controlled by the return_sequences constructor argument. Let’s look
at an example that uses SimpleRNN and returns only the output at the last timestep:

In [3]:
from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN
model=Sequential()
model.add(Embedding(10000, 32))
model.add(SimpleRNN(32))
model.summary()
# this returns only the output at the last timestep of each sequence

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, None, 32)          320000    
_________________________________________________________________
simple_rnn_2 (SimpleRNN)     (None, 32)                2080      
Total params: 322,080
Trainable params: 322,080
Non-trainable params: 0
_________________________________________________________________
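
The 2,080 parameters reported for the SimpleRNN layer can be checked by hand. The sketch below assumes the layer holds the same three weights as the NumPy example (an input matrix W, a recurrent matrix U, and a bias vector); `simple_rnn_param_count` is a hypothetical helper, not a Keras function.

```python
def simple_rnn_param_count(input_features, output_features):
    """Count the weights of a simple RNN layer: W, U and bias."""
    w = input_features * output_features    # input-to-output matrix W
    u = output_features * output_features   # state-to-output matrix U
    b = output_features                     # bias vector
    return w + u + b

embedding_params = 10000 * 32                 # one 32-dimensional vector per token
rnn_params = simple_rnn_param_count(32, 32)   # 1024 + 1024 + 32
print(embedding_params, rnn_params, embedding_params + rnn_params)
# 320000 2080 322080
```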


In [4]:
# return the full sequence of outputs (one output per timestep):
model=Sequential()
model.add(Embedding(10000, 32))
model.add(SimpleRNN(32, return_sequences=True))
model.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_3 (Embedding)      (None, None, 32)          320000    
_________________________________________________________________
simple_rnn_3 (SimpleRNN)     (None, None, 32)          2080      
Total params: 322,080
Trainable params: 322,080
Non-trainable params: 0
_________________________________________________________________
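
To make the two modes concrete, here is a NumPy sketch of a batched forward pass with a `return_sequences` switch. It is an illustration under the same assumptions as the toy RNN above, not the Keras implementation; `batched_simple_rnn` is a hypothetical name.

```python
import numpy as np

def batched_simple_rnn(inputs, W, U, bias, return_sequences=False):
    """Toy batched RNN. inputs has shape (batch_size, timesteps, input_features)."""
    batch_size, timesteps, _ = inputs.shape
    output_features = bias.shape[0]
    state = np.zeros((batch_size, output_features))  # all-zero initial state
    outputs = []
    for t in range(timesteps):
        # inputs[:, t] has shape (batch_size, input_features)
        state = np.tanh(inputs[:, t] @ W.T + state @ U.T + bias)
        outputs.append(state)
    if return_sequences:
        # (batch_size, timesteps, output_features)
        return np.stack(outputs, axis=1)
    # (batch_size, output_features): only the last timestep
    return outputs[-1]

rng = np.random.default_rng(0)
inputs = rng.random((4, 10, 32))
W = rng.normal(scale=0.1, size=(64, 32))
U = rng.normal(scale=0.1, size=(64, 64))
bias = np.zeros((64,))

last = batched_simple_rnn(inputs, W, U, bias)
full = batched_simple_rnn(inputs, W, U, bias, return_sequences=True)
print(last.shape, full.shape)  # (4, 64) (4, 10, 64)
```

The last slice of the full sequence, `full[:, -1]`, is the same array that the default mode returns.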


## Stacking RNN layers

It’s sometimes useful to stack several recurrent layers one after the other in order to
increase the representational power of a network. In such a setup, you have to get all
of the intermediate layers to return the full sequence of outputs:

In [6]:
model=Sequential()
model.add(Embedding(10_000, 32))
model.add(SimpleRNN(32, return_sequences=True))
model.add(SimpleRNN(32, return_sequences=True))
model.add(SimpleRNN(32, return_sequences=True))
model.add(SimpleRNN(32, return_sequences=False)) # the last layer may return either the full sequence or only the last output
model.summary()

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_5 (Embedding)      (None, None, 32)          320000    
_________________________________________________________________
simple_rnn_4 (SimpleRNN)     (None, None, 32)          2080      
_________________________________________________________________
simple_rnn_5 (SimpleRNN)     (None, None, 32)          2080      
_________________________________________________________________
simple_rnn_6 (SimpleRNN)     (None, None, 32)          2080      
_________________________________________________________________
simple_rnn_7 (SimpleRNN)     (None, 32)                2080      
Total params: 328,320
Trainable params: 328,320
Non-trainable params: 0
_________________________________________________________________
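
The reason intermediate layers must return full sequences is visible in the shapes: a recurrent layer loops over a timestep axis, and an output with `return_sequences=False` no longer has one. The sketch below traces this with a toy NumPy layer; `rnn_layer` and its random weights are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_layer(inputs, units, return_sequences):
    """Toy recurrent layer with random weights; expects a 3D input."""
    batch_size, timesteps, features = inputs.shape
    W = rng.normal(scale=0.1, size=(features, units))
    U = rng.normal(scale=0.1, size=(units, units))
    bias = np.zeros((units,))
    state = np.zeros((batch_size, units))
    outputs = []
    for t in range(timesteps):
        state = np.tanh(inputs[:, t] @ W + state @ U + bias)
        outputs.append(state)
    return np.stack(outputs, axis=1) if return_sequences else state

x = rng.random((8, 20, 32))                       # (batch_size, timesteps, features)
h1 = rnn_layer(x, 32, return_sequences=True)      # still a sequence
h2 = rnn_layer(h1, 32, return_sequences=True)     # still a sequence
out = rnn_layer(h2, 32, return_sequences=False)   # one vector per sample
print(h1.shape, h2.shape, out.shape)  # (8, 20, 32) (8, 20, 32) (8, 32)

# A 2D output has no timestep axis, so it cannot feed another recurrent layer.
try:
    rnn_layer(out, 32, return_sequences=False)
    stacking_on_2d_failed = False
except ValueError:
    stacking_on_2d_failed = True
print(stacking_on_2d_failed)  # True
```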


## Using SimpleRNN for movie review classification

### Preparing the IMDB dataset

In [3]:
from keras.preprocessing.text import Tokenizer
from keras.preprocessing import sequence
from keras.datasets import imdb

max_features=10_000
maxlen=500
batch_size=32

print('loading data...')
(input_train, y_train), (input_test, y_test) = imdb.load_data(num_words=max_features)
print(input_train.shape, input_test.shape)
print("padding sequences..")
input_train=sequence.pad_sequences(input_train, maxlen=maxlen)
input_test=sequence.pad_sequences(input_test, maxlen=maxlen)
print(input_train.shape, input_test.shape)

loading data...
(25000,) (25000,)
padding sequences..
(25000, 500) (25000, 500)
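
To see what the padding step does to each review, here is a small NumPy sketch of the behaviour of `pad_sequences` with its default settings, where both padding and truncation happen at the start of the sequence. `pad_pre` is a hypothetical helper written for illustration, not part of Keras.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Left-pad short sequences with `value`; keep only the last `maxlen` items of long ones."""
    padded = np.full((len(sequences), maxlen), value, dtype="int32")
    for i, seq in enumerate(sequences):
        if len(seq) == 0:
            continue
        trunc = seq[-maxlen:]             # drop the earliest entries if too long
        padded[i, -len(trunc):] = trunc   # right-align so the padding sits in front
    return padded

reviews = [[1, 2, 3], [4, 5, 6, 7, 8, 9], [10]]
print(pad_pre(reviews, maxlen=4))
# [[ 0  1  2  3]
#  [ 6  7  8  9]
#  [ 0  0  0 10]]
```

This is how a list of 25,000 variable-length reviews becomes a single array of shape (25000, 500).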


### Training with a single SimpleRNN layer

In [None]:
from keras.models import Sequential
from keras.layers import Embedding, SimpleRNN, Dense

embedding_dimensions=32

model=Sequential()
model.add(Embedding(max_features, embedding_dimensions))
model.add(SimpleRNN(embedding_dimensions))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['acc'])
model.fit(input_train, y_train, batch_size=batch_size, epochs=10, validation_split=0.2)

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Train on 20000 samples, validate on 5000 samples
Epoch 1/10


This may give a memory error on a local laptop; training on Colab gives an accuracy of 85%,
which is low compared to simple dense networks. Two reasons:
    1. The model looked at only 500 words of each review, not all of them.
    2. SimpleRNN is not good at processing long sequences. Other recurrent layers, such as LSTM and GRU, are needed.

## LSTM and GRU layers

### Using LSTM in Keras

In [None]:
from keras.models import Sequential
from keras.layers import Embedding, Dense, LSTM

model = Sequential()
model.add(Embedding(max_features, 32))
model.add(LSTM(32))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',loss='binary_crossentropy',metrics=['acc'])
history = model.fit(input_train, y_train,epochs=10,batch_size=128,validation_split=0.2)

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Train on 20000 samples, validate on 5000 samples
Epoch 1/10
