### Numerical Calculation (RNN)

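- Suppose
    - T = 20 (sequence length)
    - D = 10 (input dimensionality)
    - M = 15 (hidden layer size)
    - K = 3 (number of output classes)
- Input to hidden = D x M = 10 x 15 = 150
- Hidden to hidden = M x M = 15 x 15 = 225
- Hidden to output = M x K = 15 x 3 = 45
- Total = 420 weights (ignoring biases). T does not appear, because the same weights are reused at every time step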
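The counts above can be sketched as a small helper. The function name `rnn_weight_counts` is hypothetical, and, like the bullets, it ignores bias terms.

```python
def rnn_weight_counts(D, M, K):
    """Weights of a single-layer Elman RNN followed by a dense output layer.

    Biases are ignored. The sequence length T is not an argument: the same
    three matrices are shared across all time steps.
    """
    return {
        "input_to_hidden": D * M,   # W_x has shape D x M
        "hidden_to_hidden": M * M,  # W_h has shape M x M
        "hidden_to_output": M * K,  # W_o has shape M x K
    }

counts = rnn_weight_counts(D=10, M=15, K=3)
print(counts)               # {'input_to_hidden': 150, 'hidden_to_hidden': 225, 'hidden_to_output': 45}
print(sum(counts.values())) # 420
```

With biases included, a Keras `SimpleRNN(15)` on 10 input features adds 15 bias terms (390 parameters) and a `Dense(3)` head adds 3 more (48 parameters).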

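#### Numerical Calculation (Feedforward)

- Suppose a feedforward network sees the whole sequence at once: the input is flattened to T x D values, the hidden layer has T x M units and the output has T x K units, with
    - T = 20 (sequence length)
    - D = 10 (input dimensionality)
    - M = 15 (hidden layer size)
    - K = 3 (number of output classes)
- Input to hidden = (T x D) x (T x M) = 200 x 300 = 60000
- Hidden to output = (T x M) x (T x K) = 300 x 60 = 18000
- Total = 78000 weights, compared with 420 for the RNN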
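A sketch of the same arithmetic, assuming (as the bullets do) that the feedforward network is fully connected between a flattened T·D input, a T·M hidden layer and a T·K output. The helper name `feedforward_weight_counts` is hypothetical.

```python
def feedforward_weight_counts(T, D, M, K):
    """Weights of a fully connected net that sees the whole flattened sequence.

    Biases are ignored. Every layer size grows with T, so the weight matrices
    grow with T squared.
    """
    return {
        "input_to_hidden": (T * D) * (T * M),   # (T*D) x (T*M) matrix
        "hidden_to_output": (T * M) * (T * K),  # (T*M) x (T*K) matrix
    }

ff = feedforward_weight_counts(T=20, D=10, M=15, K=3)
print(ff)                # {'input_to_hidden': 60000, 'hidden_to_output': 18000}
print(sum(ff.values()))  # 78000

rnn_total = 10 * 15 + 15 * 15 + 15 * 3  # the RNN count from the previous section
print(sum(ff.values()) / rnn_total)     # roughly 186 times more weights
```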


### GRU
- Gates let the unit decide which values to remember and which to forget
![](https://camo.githubusercontent.com/95a102b5d99519445225116914b2cb76f4c75df9/68747470733a2f2f696d6167652e736c696465736861726563646e2e636f6d2f6e6c70646c3036666f72736c6964657368617265656e6768656c7665746963612d3136303730363032323732332f39352f726563656e742d70726f67726573732d696e2d726e6e2d616e642d6e6c702d352d3633382e6a70673f63623d31343637383433363034)

#### GRU
\begin{equation*}
\begin{aligned}
z_t &= \text{update gate} \\
r_t &= \text{reset gate} \\
\hat{h}_t &= \text{candidate state} \\
h_t &= \text{next state}
\end{aligned}
\end{equation*}
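A minimal NumPy sketch of how these four quantities combine in one GRU step, assuming the common textbook form. The weight names (`Wxz`, `Whz`, `bz`, ...) are hypothetical, and libraries differ in small details (for example where the reset gate is applied, and whether z_t or 1 - z_t multiplies h_{t-1}).

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, p):
    """One GRU step. x_t has shape (D,), h_prev has shape (M,)."""
    z = sigmoid(x_t @ p["Wxz"] + h_prev @ p["Whz"] + p["bz"])             # update gate z_t
    r = sigmoid(x_t @ p["Wxr"] + h_prev @ p["Whr"] + p["br"])             # reset gate r_t
    h_hat = np.tanh(x_t @ p["Wxh"] + (r * h_prev) @ p["Whh"] + p["bh"])  # candidate state
    return (1.0 - z) * h_prev + z * h_hat                                 # next state h_t

def gru_forward(X, p, M):
    """Run the GRU over a (T, D) sequence, starting from h_0 = 0; return all states (T, M)."""
    h = np.zeros(M)
    states = []
    for x_t in X:
        h = gru_step(x_t, h, p)
        states.append(h)
    return np.stack(states)

# Hypothetical random weights, with the same sizes as the Code section (T=8, D=2, M=3).
rng = np.random.default_rng(0)
T, D, M = 8, 2, 3
params = {}
for g in ("z", "r", "h"):
    params["Wx" + g] = rng.normal(scale=0.5, size=(D, M))
    params["Wh" + g] = rng.normal(scale=0.5, size=(M, M))
    params["b" + g] = np.zeros(M)

H = gru_forward(rng.normal(size=(T, D)), params, M)
print(H.shape)  # (8, 3)
```

Because h_t is a weighted average of h_{t-1} and a tanh output, every hidden value stays strictly between -1 and 1.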


#### LSTM
\begin{equation*}
\begin{aligned}
f_t &= \text{forget gate} \\
i_t &= \text{input gate} \\
o_t &= \text{output gate} \\
\hat{c}_t &= \text{candidate cell} \\
c_t &= \text{cell state} \\
h_t &= \text{hidden state}
\end{aligned}
\end{equation*}
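A minimal NumPy sketch of one LSTM step in the standard formulation, showing how f_t, i_t, o_t and the candidate cell produce c_t and h_t. The weight names are hypothetical. `lstm_forward` returns the full sequence plus the final h and c, which mirrors what a Keras LSTM gives with `return_sequences=True, return_state=True`.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step. x_t: (D,), h_prev and c_prev: (M,)."""
    f = sigmoid(x_t @ p["Wxf"] + h_prev @ p["Whf"] + p["bf"])      # forget gate f_t
    i = sigmoid(x_t @ p["Wxi"] + h_prev @ p["Whi"] + p["bi"])      # input gate i_t
    o = sigmoid(x_t @ p["Wxo"] + h_prev @ p["Who"] + p["bo"])      # output gate o_t
    c_hat = np.tanh(x_t @ p["Wxc"] + h_prev @ p["Whc"] + p["bc"])  # candidate cell
    c = f * c_prev + i * c_hat                                      # cell state c_t
    h = o * np.tanh(c)                                              # hidden state h_t
    return h, c

def lstm_forward(X, p, M):
    """Run over a (T, D) sequence; return (all hidden states, final h, final c)."""
    h, c = np.zeros(M), np.zeros(M)
    hs = []
    for x_t in X:
        h, c = lstm_step(x_t, h, c, p)
        hs.append(h)
    return np.stack(hs), h, c

rng = np.random.default_rng(0)
T, D, M = 8, 2, 3
params = {}
for g in ("f", "i", "o", "c"):
    params["Wx" + g] = rng.normal(scale=0.5, size=(D, M))
    params["Wh" + g] = rng.normal(scale=0.5, size=(M, M))
    params["b" + g] = np.zeros(M)

H, h, c = lstm_forward(rng.normal(size=(T, D)), params, M)
print(H.shape, h.shape, c.shape)  # (8, 3) (3,) (3,)
```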


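### In Keras
- Normally, a layer returns one output
  - `output = Dense(128)(input_)`
- Recurrent layers can also return their final state(s): pass `return_state=True`
  - A GRU has one state, h: `output, h = GRU(128, return_state=True)(input_)`
  - An LSTM has two states, h and c: `output, h, c = LSTM(128, return_state=True)(input_)`
  - Without `return_sequences=True`, `output` is the last hidden state, so it equals `h`
- With `return_sequences=True`, `output` holds the hidden state at every time step
  - `output, h, c = LSTM(128, return_state=True, return_sequences=True)(input_)`

__`output` is a sequence, but `h` and `c` are not: they are only the final states__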
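To make the shapes concrete without Keras, here is a sketch that mimics the `return_sequences` / `return_state` behaviour with a plain NumPy Elman RNN. The function `simple_rnn` and its arguments are hypothetical; only the output shapes and the "last output equals h" relationship are meant to match Keras.

```python
import numpy as np

def simple_rnn(X, Wx, Wh, b, return_sequences=False, return_state=False):
    """Elman RNN over a batch X of shape (N, T, D), i.e. (batch, time steps, features)."""
    N, T, _ = X.shape
    h = np.zeros((N, Wh.shape[0]))
    seq = []
    for t in range(T):
        h = np.tanh(X[:, t, :] @ Wx + h @ Wh + b)
        seq.append(h)
    # (N, T, M) if every step is requested, otherwise just the last state (N, M)
    output = np.stack(seq, axis=1) if return_sequences else h
    return (output, h) if return_state else output

rng = np.random.default_rng(0)
X = rng.normal(size=(1, 8, 2))  # N=1, T=8, D=2, as in the Code section
Wx, Wh, b = rng.normal(size=(2, 3)), rng.normal(size=(3, 3)), np.zeros(3)

seq, h = simple_rnn(X, Wx, Wh, b, return_sequences=True, return_state=True)
last, h_again = simple_rnn(X, Wx, Wh, b, return_state=True)
print(seq.shape, h.shape)                                     # (1, 8, 3) (1, 3)
print(np.allclose(seq[:, -1], h), np.allclose(last, h_again)) # True True
```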

### RNN Input
- Shape T x D
- T = sequence length, D = input dimensionality
- Example: "The quick brown fox jumps"
- This sentence has 5 words; if each word vector has size 50, the input to the LSTM or GRU has size 5 x 50
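A sketch of turning the example sentence into a T x D matrix. The vocabulary and the random embedding table are placeholders for real pretrained word vectors (such as GloVe); in Keras the input also gets a leading batch axis, giving N x T x D.

```python
import numpy as np

sentence = "The quick brown fox jumps"
tokens = sentence.lower().split()  # 5 tokens -> T = 5

# Toy vocabulary and placeholder embeddings (stand-ins for pretrained vectors).
vocab = {word: idx for idx, word in enumerate(sorted(set(tokens)))}
embedding_dim = 50  # D
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(len(vocab), embedding_dim))

X = np.stack([embeddings[vocab[w]] for w in tokens])  # one row per word: (T, D)
X_batch = X[np.newaxis, ...]                          # add batch axis: (1, T, D)
print(X.shape, X_batch.shape)                         # (5, 50) (1, 5, 50)
```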

### RNN Input Example
- For example, suppose we measure the temperature at 10 different weather stations every hour for one day
- Then D = 10 (one feature per station) and T = 24 (one time step per hour)
- RNN input size is 24 x 10
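When the measurements arrive as one long hourly table, a reshape cuts it into one 24 x 10 sequence per day. This sketch uses random placeholder readings and a hypothetical count of 7 days; it assumes the rows are in chronological order, so row-major reshaping keeps each day's hours together.

```python
import numpy as np

n_days, n_hours, n_stations = 7, 24, 10  # N (hypothetical), T, D
rng = np.random.default_rng(0)

# Placeholder data: one row per hour in chronological order, one column per station.
hourly = rng.normal(loc=15.0, scale=5.0, size=(n_days * n_hours, n_stations))

# One sequence per day: shape (N, T, D), the batch layout Keras RNN layers expect.
X = hourly.reshape(n_days, n_hours, n_stations)
print(X.shape)     # (7, 24, 10)
print(X[0].shape)  # (24, 10): a single day is one 24 x 10 RNN input
```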

### Spam Classification
- Given inputs x(1), ..., x(100), the RNN produces outputs y(1), ..., y(100)
- But we only have one question: is this email spam or not?
    - The answer is either "yes" or "no"
- It makes sense to take y(100) as the answer, since only y(100) has seen the entire email
- __In Keras, set `return_sequences=False` (the default) and the layer returns only the last output, y(100)__
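A many-to-one sketch in NumPy: run the recurrence over all T inputs, then apply a logistic output to the last hidden state only. The function `spam_probability`, the sizes and the random weights are hypothetical; a trained model would learn these weights.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def spam_probability(X, Wx, Wh, b, w_out, b_out):
    """Run an Elman RNN over X (T, D) and classify from the final state only."""
    h = np.zeros(Wh.shape[0])
    for x_t in X:  # x(1), ..., x(T)
        h = np.tanh(x_t @ Wx + h @ Wh + b)
    # h is now h(T), the only state that has seen the whole email
    return sigmoid(h @ w_out + b_out)  # a single probability of "spam"

rng = np.random.default_rng(0)
T, D, M = 100, 50, 16            # 100 words, 50-dim word vectors, 16 hidden units
email = rng.normal(size=(T, D))  # placeholder for an embedded email
Wx = rng.normal(scale=0.1, size=(D, M))
Wh = rng.normal(scale=0.1, size=(M, M))
b = np.zeros(M)
w_out = rng.normal(scale=0.1, size=M)

prob = spam_probability(email, Wx, Wh, b, w_out, b_out=0.0)
print(prob)
```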

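### Fancy method

- Instead of keeping only the last output, keep the hidden states from every time step and take the maximum over time, separately for each hidden unit (global max pooling):

\begin{equation*}
h^* = \max_{t} h(t)
\end{equation*}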
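The pooling step itself is a max along the time axis. The hidden-state values below are made up for illustration. In Keras the same effect comes from an RNN layer with `return_sequences=True` followed by a `GlobalMaxPooling1D` layer.

```python
import numpy as np

# Hidden states for T = 4 time steps and M = 3 hidden units (made-up values).
H = np.array([
    [0.1, -0.5, 0.3],
    [0.7, 0.2, -0.1],
    [-0.4, 0.6, 0.0],
    [0.2, 0.1, 0.9],
])

h_star = H.max(axis=0)  # element-wise max over time: one value per hidden unit
print(h_star)           # [0.7 0.6 0.9]
```

Unlike taking h(T), each unit's maximum can come from a different time step, so a strong signal early in the sequence is not lost.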


### Categories of Tasks
![](https://cn.bing.com/th?id=OIP.RsRIEyJyfgvisdW363CbDwHaCU&pid=Api&rs=1&p=0)

- one to one: feedforward network
- one to many: poetry generation
- many to one: spam classification, sentiment analysis
- many to many: machine translation, chatbots, question answering
- many to many (aligned, one output per input): part-of-speech tagging, named entity recognition
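The difference between "many to one" and the aligned "many to many" case can be sketched as applying the same output layer to the last hidden state or to every hidden state. Sizes, weights and the `softmax` helper are hypothetical. Unaligned tasks such as translation usually need an encoder-decoder setup, which this sketch does not cover.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

T, M, K = 6, 4, 5              # hypothetical: 6 tokens, 4 hidden units, 5 classes/tags
rng = np.random.default_rng(1)
H = rng.normal(size=(T, M))    # stand-in for the RNN hidden states h(1), ..., h(T)
W_o = rng.normal(size=(M, K))  # output-layer weights, shared by every time step

# Many to one (e.g. sentiment): one prediction from the last hidden state.
y_one = softmax(H[-1] @ W_o)   # shape (K,)
# Many to many, aligned (e.g. POS tagging): one prediction per time step.
y_many = softmax(H @ W_o)      # shape (T, K)
print(y_one.shape, y_many.shape)  # (5,) (6, 5)
```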

### Code

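```python
# Written for the standalone Keras 2.x package with the TensorFlow backend.
from keras.models import Model
from keras.layers import Input, LSTM, GRU
import numpy as np
import matplotlib.pyplot as plt

import keras.backend as K
# If a GPU is available, swap in the cuDNN-accelerated layers.
# _get_available_gpus() is a private helper of older Keras versions; in tf.keras
# (TensorFlow 2.x), LSTM and GRU use the cuDNN kernel automatically when possible.
if len(K.tensorflow_backend._get_available_gpus()) > 0:
    from keras.layers import CuDNNLSTM as LSTM
    from keras.layers import CuDNNGRU as GRU
```

```
Using TensorFlow backend.
```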


```python
T = 8  # sequence length
D = 2  # input dimensionality
M = 3  # hidden layer size
```


```python
X = np.random.randn(1, T, D)  # one sample, shape N x T x D
print(X.shape)
```

```
(1, 8, 2)
```


```python
def lstm1():
    input_ = Input(shape=(T, D))
    # return_state=True: the layer returns [output, h, c]
    rnn = LSTM(M, return_state=True)
    x = rnn(input_)

    model = Model(inputs=input_, outputs=x)
    o, h, c = model.predict(X)
    print("o:{} {}".format(o, o.shape))
    print("h:", h)
    print("c:", c)

print("lstm1:")
lstm1()
```

```
lstm1:
o:[[ 0.25899306  0.07822097 -0.22339009]] (1, 3)
h: [[ 0.25899306  0.07822097 -0.22339009]]
c: [[ 0.63566405  0.22161505 -0.34867376]]
```

Because `return_sequences` is False, `o` is the final hidden state, so `o` and `h` are identical.


```python
def lstm2():
    input_ = Input(shape=(T, D))
    # return_sequences=True: o has one hidden state per time step
    rnn = LSTM(M, return_state=True, return_sequences=True)
    x = rnn(input_)

    model = Model(inputs=input_, outputs=x)
    o, h, c = model.predict(X)
    print("o:", o)
    print("o.shape:", o.shape)
    print("h:", h)
    print("c:", c)

print("lstm2:")
lstm2()
print("X:{}".format(X))
```

```
lstm2:
o: [[[ 0.01824485  0.04186786  0.18939623]
  [ 0.14611624  0.05766922  0.08710869]
  [ 0.34080002 -0.01078751 -0.05276807]
  [ 0.01505907 -0.00545329 -0.06489272]
  [ 0.26224163  0.04613005 -0.04816915]
  [-0.19226253 -0.11242486 -0.12610993]
  [-0.29883868 -0.26215795 -0.1649565 ]
  [-0.13354035 -0.10853981 -0.1754228 ]]]
o.shape: (1, 8, 3)
h: [[-0.13354035 -0.10853981 -0.1754228 ]]
c: [[-0.35682207 -0.2680019  -0.27919272]]
X:[[[-1.37472998 -0.52175927]
  [-0.00436552  0.68916345]
  [ 1.53497601  1.64563988]
  [ 0.24716274 -0.59502862]
  [-0.06654198  1.38419831]
  [ 1.9396882  -0.42683302]
  [ 1.85916794 -1.05389072]
  [-1.76138438  0.47224838]]]
```

The last row of `o` is the same vector as `h`: the sequence output ends with the final hidden state.


```python
def gru1():
    input_ = Input(shape=(T, D))
    rnn = GRU(M, return_state=True)
    x = rnn(input_)

    model = Model(inputs=input_, outputs=x)
    o, h = model.predict(X)
    print("o:", o)
    print("o.shape:", o.shape)
    print("h:", h)

print("gru1:")
gru1()
```

```
gru1:
o: [[-0.27315468  0.0828017   0.03757757]]
o.shape: (1, 3)
h: [[-0.27315468  0.0828017   0.03757757]]
```


```python
def gru2():
    input_ = Input(shape=(T, D))
    rnn = GRU(M, return_state=True, return_sequences=True)
    x = rnn(input_)

    model = Model(inputs=input_, outputs=x)
    o, h = model.predict(X)
    print("o:{} {}".format(o, o.shape))
    print("h:", h)

print("gru2:")
gru2()
```

```
gru2:
o:[[[ 0.12943846 -0.07717463 -0.02388917]
  [ 0.28738564 -0.04693612 -0.1245046 ]
  [ 0.20431     0.06732592 -0.23698243]
  [-0.15020794  0.05190871 -0.03193881]
  [ 0.33300433  0.06397247 -0.21179497]
  [-0.56458634  0.11294359  0.09440236]
  [-0.85476255  0.17229506  0.37410137]
  [-0.2616035   0.02520172  0.07869992]]] (1, 8, 3)
h: [[-0.2616035   0.02520172  0.07869992]]
```
