### GRU
- Gates let the unit choose which values to remember and which to forget at each time step
![](https://camo.githubusercontent.com/95a102b5d99519445225116914b2cb76f4c75df9/68747470733a2f2f696d6167652e736c696465736861726563646e2e636f6d2f6e6c70646c3036666f72736c6964657368617265656e6768656c7665746963612d3136303730363032323732332f39352f726563656e742d70726f67726573732d696e2d726e6e2d616e642d6e6c702d352d3633382e6a70673f63623d31343637383433363034)

#### GRU
\begin{equation*}
\begin{aligned}
z_t &= \text{update gate} \\
r_t &= \text{reset gate} \\
\hat{h}_t &= \text{candidate state} \\
h_t &= \text{next state}
\end{aligned}
\end{equation*}
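
The four GRU quantities are usually tied together as shown next. This is a sketch of one common formulation, not necessarily the one in the figure. The weight matrices $W$, $U$ and the biases $b$ are names assumed here, $\sigma$ is the logistic sigmoid, and $\odot$ is element-wise multiplication. Some references and implementations swap the roles of $z_t$ and $1 - z_t$ in the last line.

\begin{equation*}
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\hat{h}_t &= \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \hat{h}_t
\end{aligned}
\end{equation*}

With $z_t$ close to 0 the unit copies $h_{t-1}$ forward (remember). With $z_t$ close to 1 it overwrites the state with the candidate (forget).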


#### LSTM
\begin{equation*}
\begin{aligned}
f_t &= \text{forget gate} \\
i_t &= \text{input gate} \\
O_t &= \text{output gate} \\
\hat{c}_t &= \text{candidate cell} \\
c_t &= \text{cell state} \\
h_t &= \text{hidden state}
\end{aligned}
\end{equation*}
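
For reference, a sketch of the standard LSTM update in the same notation. The weight matrices $W$, $U$ and the biases $b$ are assumed names, $\sigma$ is the logistic sigmoid, and $\odot$ is element-wise multiplication.

\begin{equation*}
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
O_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\hat{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \hat{c}_t \\
h_t &= O_t \odot \tanh(c_t)
\end{aligned}
\end{equation*}

Unlike the GRU, the LSTM carries two states from step to step, $h_t$ and $c_t$.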


### In Keras
- Normally, a layer outputs one thing
  - output = Dense(128)(input)
- Recurrent layers can also return their final state (pass in return_state=True)
  - output, h = GRU(128, return_state=True)(input)
  - output, h, c = LSTM(128, return_state=True)(input)
- With return_sequences=True as well, the output covers every time step
  - output, h, c = LSTM(128, return_state=True, return_sequences=True)(input)

__output is then a sequence (one vector per time step), but h and c are not: they are only the final states__
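
To make the effect of the two flags concrete, here is a minimal NumPy sketch of a plain tanh RNN. `toy_rnn` and its weights `Wx`, `Wh`, `b` are hypothetical names invented for this illustration. The function only imitates how `return_sequences` and `return_state` shape the result, and it is not how Keras implements its layers.

```python
import numpy as np

def toy_rnn(X, Wx, Wh, b, return_sequences=False, return_state=False):
    """Run a plain tanh RNN over X of shape (T, D).

    Hypothetical helper: it mimics the meaning of the two Keras flags,
    nothing more.
    """
    T = X.shape[0]
    M = Wh.shape[0]
    h = np.zeros(M)                      # initial hidden state
    outputs = []
    for t in range(T):
        h = np.tanh(X[t] @ Wx + h @ Wh + b)
        outputs.append(h)
    # Either the full sequence (T, M) or only the last step (M,)
    output = np.stack(outputs) if return_sequences else outputs[-1]
    return (output, h) if return_state else output

rng = np.random.default_rng(0)
T, D, M = 8, 2, 3
X = rng.standard_normal((T, D))
Wx = rng.standard_normal((D, M))
Wh = rng.standard_normal((M, M))
b = np.zeros(M)

seq, h = toy_rnn(X, Wx, Wh, b, return_sequences=True, return_state=True)
last, h2 = toy_rnn(X, Wx, Wh, b, return_state=True)
print(seq.shape, h.shape, last.shape)    # (8, 3) (3,) (3,)
```

In both calls the returned state equals the last output step, so `return_state` adds new information only when the layer has a separate cell state, as the LSTM does.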

### RNN Input
- Each input sequence has shape T x D
- T = sequence length, D = input dimensionality
- Example: "The quick brown fox jumps"
- With 5 words and word vectors of size 50, the input to the LSTM or GRU has size 5 x 50

### RNN Input Example
- For example, suppose we measure the temperature at 10 different weather stations every hour for one day
- then D = 10, T = 24
- RNN input size is 24 x 10
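
The two shape examples can be checked with a short NumPy sketch. The data here is random stand-in data, not real measurements or word vectors. The leading batch axis is an assumption about how the arrays would be fed to a Keras recurrent layer, which expects N x T x D.

```python
import numpy as np

# One day of hourly readings from 10 weather stations (random stand-in data)
T, D = 24, 10
temperatures = np.random.randn(T, D)     # one sequence: T x D
print(temperatures.shape)                # (24, 10)

# Add a leading batch axis to get N x T x D with N = 1
batch = temperatures[np.newaxis, ...]
print(batch.shape)                       # (1, 24, 10)

# The 5-word sentence with 50-dimensional word vectors, same layout
sentence = np.random.randn(5, 50)
sentence_batch = sentence[np.newaxis, ...]
print(sentence_batch.shape)              # (1, 5, 50)
```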

### Spam Classification
- Given an input sequence x(1),...,x(100), the RNN produces outputs y(1),...,y(100)
- We only have one question: Is this email spam or not?
    - The answer is either "yes" or "no"
- It makes sense to take y(100) as the answer, since only y(100) has seen the entire email
- __In Keras, leave return_sequences=False (the default) and the layer returns only the last output, y(100)__

### Fancy method

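- Instead of keeping only the last state, take the maximum over all time steps, separately for each hidden unit

\begin{equation*}
h^* = \text{global max pool}\ \{h(1), \dots, h(T)\} = \max_{t} h(t)
\end{equation*}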
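
A minimal NumPy sketch of the two many-to-one reductions, assuming the hidden states are already stacked into a T x M array. The array `h` holds random stand-in values, and the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
T, M = 100, 4                        # 100 time steps, 4 hidden units (arbitrary)
h = rng.standard_normal((T, M))      # stand-in for h(1), ..., h(T)

h_last = h[-1]                       # simple method: keep only the last step
h_star = h.max(axis=0)               # global max pool over the time axis

print(h_last.shape, h_star.shape)    # (4,) (4,)
```

Both reductions turn a T x M sequence into a single vector of size M. In Keras, the same pooling can be obtained by applying a `GlobalMaxPooling1D` layer to the output of a recurrent layer created with `return_sequences=True`.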


### Categories of Tasks
![](https://cn.bing.com/th?id=OIP.RsRIEyJyfgvisdW363CbDwHaCU&pid=Api&rs=1&p=0)

- one to one: Feedforward network
- one to many: Poetry generation
- many to one: Spam classification, Sentiment analysis
- many to many: Machine translation, Chatbots, Question answering
- many to many 2 (one label per input token): Part-of-speech tagging, Named entity recognition

### Code

In [1]:
from keras.models import Model
from keras.layers import Input, LSTM, GRU
import numpy as np
import matplotlib.pyplot as plt

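import keras.backend as K
# Relies on a private helper of the standalone Keras 2.x TensorFlow backend.
# If a GPU is available, swap in the cuDNN-backed layers under the same names.
if len(K.tensorflow_backend._get_available_gpus()) > 0:
    from keras.layers import CuDNNLSTM as LSTM
    from keras.layers import CuDNNGRU as GRU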


Using TensorFlow backend.


In [2]:
T = 8  # sequence length
D = 2  # input dimensionality
M = 3  # number of hidden units


In [4]:
X = np.random.randn(1, T, D)  # a batch of one sequence: N x T x D
X.shape

(1, 8, 2)

In [6]:
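def lstm1():
    # return_state=True: the layer returns [output, h, c]
    input_ = Input(shape=(T, D))
    rnn = LSTM(M, return_state=True)
    x = rnn(input_)

    model = Model(inputs=input_, outputs=x)
    o, h, c = model.predict(X)
    # Without return_sequences, o is only the last output, so it equals h
    print("o:{} {}".format(o,o.shape))
    print("h:", h)
    print("c:", c)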

print("lstm1:")
lstm1()
    


lstm1:
o:[[-0.26979086  0.343693    0.21850814]] (1, 3)
h: [[-0.26979086  0.343693    0.21850814]]
c: [[-0.6403028  0.7597804  0.5214866]]


In [7]:
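def lstm2():
    # return_sequences=True: o holds the output at every time step, shape (1, T, M)
    input_ = Input(shape=(T, D))
    rnn = LSTM(M, return_state=True, return_sequences=True)
    # rnn = GRU(M, return_state=True)  # a GRU returns only (o, h), so drop c below if you use it
    x = rnn(input_)

    model = Model(inputs=input_, outputs=x)
    o, h, c = model.predict(X)
    # h matches the last row of o; c is the final cell state
    print("o:", o)
    print("h:", h)
    print("c:", c)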

print("lstm2:")
lstm2()


lstm2:
o: [[[-0.05568678 -0.09679367 -0.09874348]
  [-0.18587263 -0.04110741 -0.02611397]
  [-0.35044768  0.01810769  0.07335648]
  [-0.36604235  0.07233787  0.10123552]
  [-0.3669878   0.14892781  0.28963414]
  [-0.1463219   0.06596465  0.07457601]
  [-0.15173775 -0.14319791 -0.08914054]
  [-0.2898902  -0.06707459 -0.03496142]]]
h: [[-0.2898902  -0.06707459 -0.03496142]]
c: [[-0.64311373 -0.13506934 -0.05812597]]


In [8]:
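def gru1():
    # A GRU has no cell state, so return_state=True yields only [output, h]
    input_ = Input(shape=(T, D))
    rnn = GRU(M, return_state=True)
    x = rnn(input_)

    model = Model(inputs=input_, outputs=x)
    o, h = model.predict(X)
    # o is the last output, so it equals h
    print("o:", o)
    print("h:", h)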

print("gru1:")
gru1()

gru1:
o: [[-0.20170054 -0.51547587 -0.3959277 ]]
h: [[-0.20170054 -0.51547587 -0.3959277 ]]


In [10]:
def gru2():
    # return_sequences=True: o has shape (1, T, M); h is still only the final state
    input_ = Input(shape=(T, D))
    rnn = GRU(M, return_state=True, return_sequences=True)
    x = rnn(input_)

    model = Model(inputs=input_, outputs=x)
    o, h = model.predict(X)
    # h matches the last row of o
    print("o:{} {}".format(o,o.shape))
    print("h:", h)

print("gru2:")
gru2()    

gru2:
o:[[[ 0.11412916 -0.12037048 -0.08807308]
  [-0.04899482 -0.13237429 -0.37075138]
  [-0.2735762  -0.22598538 -0.97016907]
  [-0.42028093 -0.15863864 -0.68616873]
  [-0.689677    0.0127814  -0.52059066]
  [-0.21001369 -0.05192477 -0.16916846]
  [ 0.03749506 -0.22877488 -0.17279106]
  [-0.06917083 -0.19794294 -0.40561715]]] (1, 8, 3)
h: [[-0.06917083 -0.19794294 -0.40561715]]
