# Understanding RNN structure
- Unlike feedforward nets, RNNs can handle data in "sequential" format because they preserve a "state" from previous time steps
- Thus, grasping the concepts of **"sequences"** and (hidden) **"states"** in RNNs is crucial

<br>
<img src="http://karpathy.github.io/assets/rnn/charseq.jpeg" style="width: 500px"/>

In [1]:
import numpy as np
from keras.models import Model, Sequential
from keras.layers import Input, SimpleRNN, LSTM, GRU

Using TensorFlow backend.


## 1. SimpleRNN 

The input to SimpleRNN should be a 3D tensor => (batch_size, timesteps, input_dim)
- **batch_size**: omitted when creating the RNN instance (== None). Usually determined when fitting the model.
- **timesteps**: number of time steps in each input sequence
- **input_dim**: number of features at each time step

In [4]:
# for instance, consider the array below
x = np.array([
    [[1, 2, 3],                         # => sample 1, timestep 1 (three features: input_dim 1, 2, 3)
     [4, 5, 6]],                        # => sample 1, timestep 2
    [[7, 8, 9], [10, 11, 12]],          # => sample 2
    [[13, 14, 15], [16, 17, 18]]        # => sample 3
])

In [5]:
print('(Batch size, timesteps, input_dim) = ',x.shape)

(Batch size, timesteps, input_dim) =  (3, 2, 3)


In [41]:
x = np.random.normal(0,1,(100,5))
y = 3*x

In [49]:


x = Input(shape = (1, 5))
x1 = SimpleRNN(4, return_sequences = True)(x)
x2 = SimpleRNN(5)(x1)

# rnn = GRU(4)(x)
# rnn = LSTM(4)(x)

model = Model(inputs=x,outputs=x2)
model.summary()
model.compile(optimizer='adam',loss='mse')

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_46 (InputLayer)        (None, 1, 5)              0         
_________________________________________________________________
simple_rnn_48 (SimpleRNN)    (None, 1, 4)              40        
_________________________________________________________________
simple_rnn_49 (SimpleRNN)    (None, 5)                 50        
Total params: 90
Trainable params: 90
Non-trainable params: 0
_________________________________________________________________


In [50]:
model.fit(x,y)    # fails: x now refers to the symbolic Input tensor defined above, not the NumPy array

ValueError: When feeding symbolic tensors to a model, we expect thetensors to have a static batch size. Got tensor with shape: (None, 1, 5)
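
The call above fails because `x` was rebound to the symbolic `Input` tensor, so `fit` no longer receives the NumPy data. A minimal sketch of one way around this, assuming each row is meant to be a sequence of length 1: keep the arrays under separate names (`x_data`, `y_data` and `x_seq` are hypothetical names, not from the notebook) and add a time axis so the input matches `(batch_size, timesteps, input_dim)`. Only NumPy is used here; the `fit` call is left as a comment.

```python
import numpy as np

rng = np.random.default_rng(0)
x_data = rng.normal(0, 1, (100, 5))     # 100 samples, 5 features each
y_data = 3 * x_data                     # regression target, shape (100, 5)

# insert a time axis: (100, 5) -> (100, 1, 5), i.e. one time step per sample
x_seq = x_data.reshape(-1, 1, 5)

print(x_seq.shape, y_data.shape)

# with the model defined above, training would then be called as
# model.fit(x_seq, y_data)
```

The model's `Input(shape = (1, 5))` and its final `SimpleRNN(5)` layer match these shapes, provided the `Input` tensor is also given a name other than `x`.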

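In [None]:
# parameter count of a recurrent layer:
# num_params = g * (h * (h + i) + h)
# g: number of weight sets (1 for SimpleRNN, 4 for LSTM), h: number of units, i: input_dim

In [None]:
# SimpleRNN parameter counts for input_dim = 1 (units -> params)
# 1 -> 3
# 2 -> 8
# 3 -> 15
# 4 -> 24

In [None]:
# for SimpleRNN (g = 1) with x units: x*x + x + x*input_dim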
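
The parameter formula above can be checked against the model summary with a few lines of plain Python. This is a small sketch; `recurrent_params` is a hypothetical helper, not part of Keras.

```python
def recurrent_params(units, input_dim, gates=1):
    """Parameter count: gates * (units * (units + input_dim) + units)."""
    return gates * (units * (units + input_dim) + units)

# the two SimpleRNN layers from the summary above
print(recurrent_params(4, 5))   # first layer: 4 units, input_dim 5   -> 40
print(recurrent_params(5, 4))   # second layer: 5 units, fed 4 units  -> 50

# input_dim = 1, units 1..4
print([recurrent_params(h, 1) for h in range(1, 5)])   # [3, 8, 15, 24]
```

The number of time steps does not appear in the formula, since the same weights are reused at every step.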

In [6]:
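# rnn = SimpleRNN(50)(Input(shape = (10,))) => error: input is 2D once the batch axis is added
# rnn = SimpleRNN(50)(Input(shape = (10, 30, 40))) => error: input is 4D once the batch axis is added
rnn = SimpleRNN(50)(Input(shape = (10, 30)))       # OK: (batch_size, timesteps = 10, input_dim = 30)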

**return_state** = **return_sequences** = **False** ====> output_shape = **(batch_size = None, num_units)**

In [8]:
rnn = SimpleRNN(50)(Input(shape = (10, 30)))
print(rnn.shape)

(?, 50)


**return_sequences = True** ====> output_shape = **(batch_size, timesteps, num_units)**

In [9]:
rnn = SimpleRNN(50, return_sequences = True)(Input(shape = (10, 30)))
print(rnn.shape)

(?, ?, 50)


return_state = True ===> returns a list of tensors: **[output, state]**
- if return_sequences == False     =>>    output_shape = (batch_size, num_units)
- if return_sequences == True      =>>    output_shape = (batch_size, timesteps, num_units)

In [43]:
rnn = SimpleRNN(50, return_sequences = False, return_state = True)(Input(shape = (10, 30)))
print(rnn[0].shape)         # shape of output
print(rnn[1].shape)         # shape of last state

(?, 50)
(?, 50)


In [44]:
rnn = SimpleRNN(50, return_sequences = True, return_state = True)(Input(shape = (10, 30)))
print(rnn[0].shape)         # shape of output
print(rnn[1].shape)         # shape of last state

(?, ?, 50)
(?, 50)


The output and the last state can be unpacked directly, as below

In [None]:
output, state = SimpleRNN(50, return_sequences = True, return_state = True)(Input(shape = (10, 30)))

In [None]:
print(output.shape)
print(state.shape)
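
To make the meaning of `return_sequences` and `return_state` concrete, here is a minimal NumPy sketch of a simple recurrent forward pass. It is an illustration under stated assumptions (tanh activation, zero initial state, randomly drawn weights), not Keras's implementation; `simple_rnn_forward` and the weight names `W`, `U`, `b` are hypothetical.

```python
import numpy as np

def simple_rnn_forward(x, W, U, b, return_sequences=False, return_state=False):
    """x: (batch, timesteps, input_dim); W: (input_dim, units); U: (units, units); b: (units,)."""
    batch, timesteps, _ = x.shape
    h = np.zeros((batch, U.shape[0]))              # initial hidden state
    outputs = []
    for t in range(timesteps):
        h = np.tanh(x[:, t, :] @ W + h @ U + b)    # the same weights are used at every step
        outputs.append(h)
    # full sequence of states, or only the last one
    output = np.stack(outputs, axis=1) if return_sequences else h
    return (output, h) if return_state else output

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 10, 30))                   # batch of 3, 10 time steps, 30 features
W = rng.normal(size=(30, 50)) * 0.1
U = rng.normal(size=(50, 50)) * 0.1
b = np.zeros(50)

print(simple_rnn_forward(x, W, U, b).shape)        # (3, 50)
seq, state = simple_rnn_forward(x, W, U, b, return_sequences=True, return_state=True)
print(seq.shape, state.shape)                      # (3, 10, 50) (3, 50)
```

In this sketch the returned state is simply the last entry of the sequence, which is why the output and the state coincide when `return_sequences` is False.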

## 2. LSTM
- Outputs of LSTM are quite similar to those of SimpleRNN, but there are subtle differences
- Comparing the two diagrams below, LSTM passes one more type of "state" on to the next module

<br>
<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-SimpleRNN.png" style="width: 500px"/>

<center> Standard RNN </center>

<br>
<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-chain.png" style="width: 500px"/>

<center> LSTM </center>

In addition to the "hidden state (ht)" of an RNN, the LSTM structure has a "cell state (Ct)"

<br>
<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-o.png" style="width: 500px"/>

<center> Hidden State </center>

<br>
<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-focus-C.png" style="width: 500px"/>

<center> Cell State </center>

In [None]:
lstm = LSTM(50)(Input(shape = (10, 30)))

In [None]:
print(lstm.shape)

In [None]:
lstm = LSTM(50, return_sequences = False, return_state = True)(Input(shape = (10, 30)))
print(lstm[0].shape)         # shape of output
print(lstm[1].shape)         # shape of hidden state
print(lstm[2].shape)         # shape of cell state

In [None]:
lstm = LSTM(50, return_sequences = True, return_state = True)(Input(shape = (10, 30)))
print(lstm[0].shape)         # shape of output
print(lstm[1].shape)         # shape of hidden state
print(lstm[2].shape)         # shape of cell state

In [None]:
output, hidden_state, cell_state = LSTM(50, return_sequences = True, return_state = True)(Input(shape = (10, 30)))

In [None]:
print(output.shape)
print(hidden_state.shape)
print(cell_state.shape)
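
The three tensors returned above can be illustrated with a NumPy sketch of the standard LSTM recurrence. This is a sketch under assumptions: plain sigmoid gates, zero initial states, random weights, and gate blocks laid out in the order input, forget, candidate, output. It is not claimed to match Keras's internals, and `lstm_forward` is a hypothetical name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x, W, U, b):
    """W: (input_dim, 4*units), U: (units, 4*units), b: (4*units,)."""
    batch, timesteps, _ = x.shape
    units = U.shape[0]
    h = np.zeros((batch, units))                   # hidden state
    c = np.zeros((batch, units))                   # cell state
    outputs = []
    for t in range(timesteps):
        z = x[:, t, :] @ W + h @ U + b
        i, f, g, o = np.split(z, 4, axis=1)        # input, forget, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs, axis=1), h, c

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 10, 30))
W = rng.normal(size=(30, 200)) * 0.1
U = rng.normal(size=(50, 200)) * 0.1
b = np.zeros(200)

seq, h, c = lstm_forward(x, W, U, b)
print(seq.shape, h.shape, c.shape)                 # (3, 10, 50) (3, 50) (3, 50)
```

The per-step output is the hidden state; the cell state is carried alongside it and is only exposed through `return_state`.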

## 3. GRU
- GRU, a popular variant of LSTM, does not have a cell state
- Hence, it has only a hidden state, like SimpleRNN

In [None]:
gru = GRU(50)(Input(shape = (10, 30)))

In [None]:
print(gru.shape)

In [None]:
gru = GRU(50, return_sequences = False, return_state = True)(Input(shape = (10, 30)))
print(gru[0].shape)         # shape of output
print(gru[1].shape)         # shape of hidden state

In [None]:
gru = GRU(50, return_sequences = True, return_state = True)(Input(shape = (10, 30)))
print(gru[0].shape)         # shape of output
print(gru[1].shape)         # shape of hidden state

In [None]:
output, hidden_state = GRU(50, return_sequences = True, return_state = True)(Input(shape = (10, 30)))

In [None]:
print(output.shape)
print(hidden_state.shape)
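
For comparison with the LSTM, a NumPy sketch of a GRU recurrence shows that only one state is carried between steps. The assumptions are: sigmoid gates, zero initial state, random weights, blocks ordered update, reset, candidate, and the convention `h = z * h_prev + (1 - z) * candidate`. `gru_forward` is a hypothetical name and this is not Keras's exact implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_forward(x, W, U, b):
    """W: (input_dim, 3*units), U: (units, 3*units), b: (3*units,)."""
    batch, timesteps, _ = x.shape
    units = U.shape[0]
    U_z, U_r, U_h = np.split(U, 3, axis=1)
    h = np.zeros((batch, units))                   # the only state that is carried
    outputs = []
    for t in range(timesteps):
        x_z, x_r, x_h = np.split(x[:, t, :] @ W + b, 3, axis=1)
        z = sigmoid(x_z + h @ U_z)                 # update gate
        r = sigmoid(x_r + h @ U_r)                 # reset gate
        h_candidate = np.tanh(x_h + (r * h) @ U_h)
        h = z * h + (1.0 - z) * h_candidate
        outputs.append(h)
    return np.stack(outputs, axis=1), h

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 10, 30))
W = rng.normal(size=(30, 150)) * 0.1
U = rng.normal(size=(50, 150)) * 0.1
b = np.zeros(150)

seq, h = gru_forward(x, W, U, b)
print(seq.shape, h.shape)                          # (3, 10, 50) (3, 50)
```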

# Basic RNN
- Objective: to understand basics of RNN & LSTM

## Recurrent Neural Networks
- Feedforward neural networks (e.g. MLPs and CNNs) are powerful, but they are not designed to handle "sequential" data
- In other words, they do not possess a "memory" of previous inputs
- For instance, consider translating a text. You need to take the **"context"** into account to predict the next word

<br>
- RNNs are suitable for sequential data since they have a **"recurrent"** structure
- To put it differently, they keep a **"memory"** of earlier inputs in the sequence
</br>
<img src="http://www.wildml.com/wp-content/uploads/2015/09/rnn.jpg" style="width: 600px"/>

<br>
- To keep the number of parameters small, the same parameters are shared across all time steps
</br>

<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png" style="width: 600px"/>

## Load Dataset

In [None]:
import numpy as np

from sklearn.metrics import accuracy_score
from keras.datasets import reuters
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical

In [None]:
# parameters for data load
num_words = 30000
maxlen = 50
test_split = 0.3

In [None]:
(X_train, y_train), (X_test, y_test) = reuters.load_data(num_words = num_words, maxlen = maxlen, test_split = test_split)

In [None]:
# pad the sequences with zeros so that they all have the same length
# padding = 'post' => 0's are appended to the end of each sequence
X_train = pad_sequences(X_train, padding = 'post')
X_test = pad_sequences(X_test, padding = 'post')
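
What post-padding does can be sketched in a few lines of NumPy. `pad_post` is a hypothetical helper for illustration only and covers just the zero-padding case used above, not the full behavior of `pad_sequences`.

```python
import numpy as np

def pad_post(sequences, value=0):
    """Pad every sequence at its end up to the length of the longest one."""
    longest = max(len(seq) for seq in sequences)
    padded = np.full((len(sequences), longest), value, dtype=np.int32)
    for row, seq in enumerate(sequences):
        padded[row, :len(seq)] = seq               # original tokens first, padding after
    return padded

print(pad_post([[1, 2, 3], [4, 5], [6]]))
# [[1 2 3]
#  [4 5 0]
#  [6 0 0]]
```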

In [None]:
X_train = np.array(X_train).reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = np.array(X_test).reshape((X_test.shape[0], X_test.shape[1], 1))

In [None]:
y_data = np.concatenate((y_train, y_test))
y_data = to_categorical(y_data)

In [None]:
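# split the one-hot labels back at the train/test boundary (1395 training samples in the original run)
y_train = y_data[:X_train.shape[0]]
y_test = y_data[X_train.shape[0]:]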
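
A likely reason for concatenating the labels before one-hot encoding is that the width of the encoding is inferred from the largest label present, so encoding the two sets separately could give different widths. The NumPy sketch below illustrates this with toy labels; `one_hot` is a hypothetical stand-in, not the Keras function.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    labels = np.asarray(labels, dtype=int)
    if num_classes is None:
        num_classes = labels.max() + 1             # inferred from the largest label
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

train_labels = np.array([0, 2, 1])
test_labels = np.array([1, 0])                     # class 2 never appears here

print(one_hot(train_labels).shape)                 # (3, 3)
print(one_hot(test_labels).shape)                  # (2, 2), one column short
print(one_hot(test_labels, num_classes=3).shape)   # (2, 3)
```

Passing the number of classes explicitly is an alternative to concatenating and re-splitting.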

In [None]:
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

## 1. Vanilla RNN
- Vanilla RNNs have a simple structure
- However, they suffer from the problem of "long-term dependencies"
- Hence, they are not able to keep the **"sequential memory"** for long

<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-SimpleRNN.png" style="width: 600px"/>

In [None]:
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN, Activation
from keras import optimizers
from keras.wrappers.scikit_learn import KerasClassifier

In [None]:
def vanilla_rnn():
    model = Sequential()
    model.add(SimpleRNN(50, input_shape = (49,1), return_sequences = False))
    model.add(Dense(46))
    model.add(Activation('softmax'))
    
    adam = optimizers.Adam(lr = 0.001)
    model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['accuracy'])
    
    return model

In [None]:
model = KerasClassifier(build_fn = vanilla_rnn, epochs = 200, batch_size = 50, verbose = 1)

In [None]:
model.fit(X_train, y_train)

In [None]:
y_pred = model.predict(X_test)

In [None]:
y_test_ = np.argmax(y_test, axis = 1)

In [None]:
print(accuracy_score(y_test_, y_pred))    # arguments in (y_true, y_pred) order
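
The `argmax` step is needed because the test labels are one-hot encoded while the classifier wrapper's `predict` returns class indices. A toy NumPy sketch of the same comparison, with made-up labels and predictions:

```python
import numpy as np

one_hot_true = np.array([[0, 1, 0],
                         [1, 0, 0],
                         [0, 0, 1],
                         [0, 1, 0]])
pred_classes = np.array([1, 0, 1, 1])              # hypothetical predicted class indices

true_classes = np.argmax(one_hot_true, axis=1)     # one-hot rows back to class indices
accuracy = np.mean(true_classes == pred_classes)   # fraction of exact matches
print(true_classes, accuracy)                      # [1 0 2 1] 0.75
```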

## 2. Stacked Vanilla RNN
- RNN layers can be stacked to form a deeper network

<img src="https://lh6.googleusercontent.com/rC1DSgjlmobtRxMPFi14hkMdDqSkEkuOX7EW_QrLFSymjasIM95Za2Wf-VwSC1Tq1sjJlOPLJ92q7PTKJh2hjBoXQawM6MQC27east67GFDklTalljlt0cFLZnPMdhp8erzO" style="width: 500px"/>

In [None]:
def stacked_vanilla_rnn():
    model = Sequential()
    model.add(SimpleRNN(50, input_shape = (49,1), return_sequences = True))   # return_sequences must be True so the next RNN layer receives a 3D sequence
    model.add(SimpleRNN(50, return_sequences = False))
    model.add(Dense(46))
    model.add(Activation('softmax'))
    
    adam = optimizers.Adam(lr = 0.001)
    model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['accuracy'])
    
    return model

In [None]:
model = KerasClassifier(build_fn = stacked_vanilla_rnn, epochs = 200, batch_size = 50, verbose = 1)

In [None]:
model.fit(X_train, y_train)

In [None]:
y_pred = model.predict(X_test)

In [None]:
print(accuracy_score(y_test_, y_pred))
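
Why the first layer must return the full sequence when stacking can be seen in a NumPy sketch of two recurrent layers. The assumptions are tanh activation, no bias terms (omitted for brevity), and random weights; `rnn_layer` is a hypothetical helper, not a Keras function.

```python
import numpy as np

def rnn_layer(x, W, U, return_sequences):
    if x.ndim != 3:
        raise ValueError("expected (batch, timesteps, features), got shape %s" % (x.shape,))
    h = np.zeros((x.shape[0], U.shape[0]))
    steps = []
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ W + h @ U)
        steps.append(h)
    return np.stack(steps, axis=1) if return_sequences else h

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 49, 1))                    # same per-sample shape as input_shape = (49, 1)
W1, U1 = rng.normal(size=(1, 50)) * 0.1, rng.normal(size=(50, 50)) * 0.1
W2, U2 = rng.normal(size=(50, 50)) * 0.1, rng.normal(size=(50, 50)) * 0.1

h1 = rnn_layer(x, W1, U1, return_sequences=True)   # (2, 49, 50): one state per time step
h2 = rnn_layer(h1, W2, U2, return_sequences=False) # (2, 50): last state of the second layer
print(h1.shape, h2.shape)

last_only = rnn_layer(x, W1, U1, return_sequences=False)   # (2, 50): the time axis is gone
try:
    rnn_layer(last_only, W2, U2, return_sequences=False)
except ValueError as err:
    print("cannot stack:", err)
```

The second layer iterates over time steps, so it needs an input that still has a time axis.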

## 3. LSTM
- LSTM (long short-term memory) is an improved structure designed to mitigate the problem of long-term dependencies

<img src="http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/LSTM3-chain.png" style="width: 600px"/>

In [None]:
from keras.layers import LSTM

In [None]:
def lstm():
    model = Sequential()
    model.add(LSTM(50, input_shape = (49,1), return_sequences = False))
    model.add(Dense(46))
    model.add(Activation('softmax'))
    
    adam = optimizers.Adam(lr = 0.001)
    model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['accuracy'])
    
    return model

In [None]:
model = KerasClassifier(build_fn = lstm, epochs = 200, batch_size = 50, verbose = 1)

In [None]:
model.fit(X_train, y_train)

In [None]:
y_pred = model.predict(X_test)

In [None]:
# accuracy is expected to improve with the LSTM structure
print(accuracy_score(y_test_, y_pred))

## 4. Stacked LSTM
- LSTM layers can be stacked as well

In [None]:
def stacked_lstm():
    model = Sequential()
    model.add(LSTM(50, input_shape = (49,1), return_sequences = True))
    model.add(LSTM(50, return_sequences = False))
    model.add(Dense(46))
    model.add(Activation('softmax'))
    
    adam = optimizers.Adam(lr = 0.001)
    model.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['accuracy'])
    
    return model

In [None]:
model = KerasClassifier(build_fn = stacked_lstm, epochs = 200, batch_size = 50, verbose = 1)

In [None]:
model.fit(X_train, y_train)

In [None]:
y_pred = model.predict(X_test)

In [None]:
print(accuracy_score(y_test_, y_pred))