## Introduction

* What did I understand about RNNs?
 

* In the course, we saw how convolutional feed-forward neural networks work on handwritten digit recognition (MNIST dataset) and object classification (Fashion-MNIST dataset). We also saw traditional feed-forward neural networks. Both are really good at classification.

##### So, what's the problem?

* Despite their success, feed-forward neural networks are still very limited in what they can achieve. With feed-forward nets we recognized individual samples, through classification or object detection. However, we can also analyze whole input sequences in the same system. These sequences carry a lot of useful information, and they may be complex and vary in length.
* For example, speech and vision signals change over time and, as a result, produce high-dimensional data. Processing this data with feed-forward nets is either not possible, or the nets are not good at modelling it.


##### What is RNN?

* For the problem mentioned above, how to learn from these sequences of information, the recurrent neural network (RNN) is the solution. In the last lecture of the course we talked about memory-based operations, feedback connections, and how neurons are connected to each other differently from feed-forward nets.

* Feed-forward neural networks are arranged in layers, and information flows in one direction, from input to output. In other words, there are strict rules: all connections are directed, and there are no undirected connections.
* So, to build more powerful systems, RNNs are allowed to break those rules. RNNs do not have to be organized in layers, and neurons are even allowed to connect to themselves.



# Recurrent Neural Networks (RNN) with Keras
   * Reference: https://www.tensorflow.org/guide/keras/rnn



Recurrent neural networks (RNN) are a class of neural networks that is powerful for
modeling sequence data such as time series or natural language.


Schematically, a RNN layer uses a `for` loop to iterate over the timesteps of a
sequence, while maintaining an internal state that encodes information about the
timesteps it has seen so far.
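
To make the `for` loop idea concrete, here is a minimal NumPy sketch of a plain tanh RNN running over one sequence. The weight names (`W_x`, `W_h`, `b`) and the sizes are assumptions chosen for illustration; this is not how Keras implements its layers internally.

```python
import numpy as np

rng = np.random.default_rng(0)
timesteps, features, units = 5, 3, 4

# One input sequence: 5 timesteps, 3 features per timestep.
inputs = rng.normal(size=(timesteps, features))

# Hypothetical weights of a plain tanh RNN (illustrative values only).
W_x = rng.normal(scale=0.1, size=(features, units))  # input -> state
W_h = rng.normal(scale=0.1, size=(units, units))     # state -> state
b = np.zeros(units)

# The internal state starts at zero and is updated once per timestep.
state = np.zeros(units)
for x_t in inputs:
    state = np.tanh(x_t @ W_x + state @ W_h + b)

print(state.shape)  # (4,)
```

After the loop, `state` depends on every timestep of the sequence, which is what "encodes information about the timesteps it has seen so far" means in practice.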

## Setup

In [1]:
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

## Built-in RNN layers: a simple example

There are three built-in RNN layers in Keras:

- `keras.layers.SimpleRNN`, a fully-connected RNN where the output from the previous timestep is fed to the next timestep.

- `keras.layers.GRU`

- `keras.layers.LSTM`

Here is a simple example of a `Sequential` model that processes sequences of integers,
embeds each integer into a 64-dimensional vector, then processes the sequence of
vectors using a `LSTM` layer.
* Ease of use: the built-in `keras.layers.RNN`, `keras.layers.LSTM`, `keras.layers.GRU` layers enable you to quickly build recurrent models without having to make difficult configuration choices.



#### This code is copied from the official TensorFlow website. Reference: https://www.tensorflow.org/guide/keras/rnn

In [3]:
model = keras.Sequential()
# Add an Embedding layer expecting input vocab of size 1000, and
# output embedding dimension of size 64.
model.add(layers.Embedding(input_dim=1000, output_dim=64))

# Add a LSTM layer with 128 internal units.
model.add(layers.LSTM(128))

# Add a Dense layer with 10 units.
model.add(layers.Dense(10))

model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding (Embedding)        (None, None, 64)          64000     
_________________________________________________________________
lstm (LSTM)                  (None, 128)               98816     
_________________________________________________________________
dense (Dense)                (None, 10)                1290      
Total params: 164,106
Trainable params: 164,106
Non-trainable params: 0
_________________________________________________________________


* In the first line, we use the Keras `Sequential` model, as we did before in our CNN example. It means we build the network up one layer at a time.
* Next there is an `Embedding` layer, which maps each input integer to a 64-dimensional vector (`output_dim=64`). For example, if we use this RNN for text generation, this layer maps each input word to a vector of the desired dimension.

* At the center of an RNN there are layers composed of memory cells. An LSTM consists of memory cells that can store information for extended periods of time.

* The LSTM has 3 different gates: a "forget" gate for discarding irrelevant information, an "input" gate for using the current input, and an "output" gate for producing predictions at each time step. The function of each cell element is decided by the parameters (weights), which are learned during training.

* Finally, there is a `Dense` fully-connected output layer with 10 units.
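
The gates described above can be sketched as a single LSTM step in NumPy. This is only an illustration: the function name `lstm_step`, the weight names and the gate ordering are assumptions of this sketch, not the Keras internals. The weight shapes, however, are chosen to match the model above (64 input features, 128 units), so the parameter count can be checked against the 98,816 reported by `model.summary()`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h, c, W, U, b):
    """One LSTM timestep (hypothetical helper, gate order assumed)."""
    z = x_t @ W + h @ U + b
    i, f, g, o = np.split(z, 4)
    i = sigmoid(i)          # input gate: how much of the current input to use
    f = sigmoid(f)          # forget gate: how much old memory to keep
    o = sigmoid(o)          # output gate: how much memory to expose
    g = np.tanh(g)          # candidate values for the memory cell
    c_new = f * c + i * g   # updated memory cell
    h_new = o * np.tanh(c_new)
    return h_new, c_new

input_dim, units = 64, 128
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(input_dim, 4 * units))  # input weights
U = rng.normal(scale=0.1, size=(units, 4 * units))      # recurrent weights
b = np.zeros(4 * units)                                 # biases

n_params = W.size + U.size + b.size
print(n_params)  # 98816

h, c = lstm_step(rng.normal(size=input_dim), np.zeros(units), np.zeros(units), W, U, b)
print(h.shape, c.shape)  # (128,) (128,)
```

The other rows of the summary follow the same kind of arithmetic: the embedding has 1000 × 64 = 64,000 weights and the dense layer has 128 × 10 + 10 = 1,290, which adds up to 164,106.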

## Outputs and states

###### Reference: https://www.tensorflow.org/guide/keras/rnn
By default, the output of a RNN layer contains a single vector per sample. This vector
is the RNN cell output corresponding to the last timestep, containing information
about the entire input sequence. The shape of this output is `(batch_size, units)`
where `units` corresponds to the `units` argument passed to the layer's constructor.

A RNN layer can also return the entire sequence of outputs for each sample (one vector
per timestep per sample), if you set `return_sequences=True`. The shape of this output
is `(batch_size, timesteps, units)`.
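
The two output modes can be sketched with a small batched RNN loop in NumPy. The helper `run_rnn` and its weights are hypothetical and only mimic the behaviour described above; the real Keras layers do more work than this.

```python
import numpy as np

def run_rnn(inputs, W_x, W_h, b, return_sequences=False):
    """Plain tanh RNN over a batch of sequences (illustrative helper)."""
    batch_size, timesteps, _ = inputs.shape
    units = W_h.shape[0]
    state = np.zeros((batch_size, units))
    outputs = []
    for t in range(timesteps):
        state = np.tanh(inputs[:, t, :] @ W_x + state @ W_h + b)
        outputs.append(state)
    if return_sequences:
        # One vector per timestep per sample.
        return np.stack(outputs, axis=1)
    # Only the output of the last timestep.
    return outputs[-1]

rng = np.random.default_rng(0)
batch_size, timesteps, features, units = 2, 7, 3, 4
x = rng.normal(size=(batch_size, timesteps, features))
W_x = rng.normal(scale=0.1, size=(features, units))
W_h = rng.normal(scale=0.1, size=(units, units))
b = np.zeros(units)

last = run_rnn(x, W_x, W_h, b)
full = run_rnn(x, W_x, W_h, b, return_sequences=True)
print(last.shape)  # (2, 4)
print(full.shape)  # (2, 7, 4)
```

Note that the last timestep of the full sequence is exactly the default output, so `return_sequences=True` only exposes more of what the layer already computes.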

* If you want to use an RNN to analyse continuous data (mostly used in), you can fix the input shape with the `batch_input_shape` argument of the first layer (it is a keyword argument, not a function).
* It takes 3 values: batch size, timesteps, and input_dim.
   * batch size: the number of samples processed together in one batch (one weight update), not per epoch
   * timesteps: the number of time steps in each input sequence
   * input_dim: the number of features (columns) fed to the LSTM at each timestep

The heart of the network is a layer of LSTM cells, with dropout to prevent overfitting to the training data.

For example, if the input to an LSTM layer has shape (None, 50, 100), it means that for each batch (the first dimension) each sequence has 50 timesteps (words), each of which has 100 features after embedding. With the 64-dimensional embedding used in this notebook, the last dimension would be 64. Input to an LSTM layer always has the `(batch_size, timesteps, features)` shape.
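
As a rough illustration of the `(batch_size, timesteps, features)` layout, the sketch below cuts a table of continuous data (rows are time steps, columns are features) into overlapping windows with NumPy. The helper name `make_windows` and the toy data are assumptions made for this example.

```python
import numpy as np

def make_windows(series, timesteps):
    """Slice a (n_rows, input_dim) array into overlapping windows.

    Returns an array of shape (n_windows, timesteps, input_dim).
    """
    n_rows = series.shape[0]
    n_windows = n_rows - timesteps + 1
    return np.stack([series[i:i + timesteps] for i in range(n_windows)])

# Toy data: 10 rows (time steps) and 2 columns (features).
series = np.arange(20, dtype="float32").reshape(10, 2)

windows = make_windows(series, timesteps=4)
print(windows.shape)  # (7, 4, 2)
```

Each of the 7 windows is one sample with 4 timesteps and 2 features, so an array like this could be fed to an RNN layer in batches.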

In [23]:
model = keras.Sequential()
model.add(layers.Embedding(input_dim=1000, output_dim=64))

# The output of GRU will be a 3D tensor of shape (batch_size, timesteps, 256)
model.add(layers.GRU(256, return_sequences=True))

# The output of SimpleRNN will be a 2D tensor of shape (batch_size, 128)
model.add(layers.SimpleRNN(128))

model.add(layers.Dense(10))

model.summary()

Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_4 (Embedding)      (None, None, 64)          64000     
_________________________________________________________________
gru_1 (GRU)                  (None, None, 256)         247296    
_________________________________________________________________
simple_rnn_1 (SimpleRNN)     (None, 128)               49280     
_________________________________________________________________
dense_7 (Dense)              (None, 10)                1290      
Total params: 361,866
Trainable params: 361,866
Non-trainable params: 0
_________________________________________________________________
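
The parameter counts in this summary can be reproduced by hand. The sketch below assumes the default Keras settings for these layers, in particular that `GRU` uses `reset_after=True`, which gives each of its three gates two bias vectors; the helper names are made up for this example.

```python
def simple_rnn_params(input_dim, units):
    # input weights + recurrent weights + one bias vector
    return (input_dim + units) * units + units

def gru_params(input_dim, units):
    # 3 gates, each with input weights, recurrent weights and
    # two bias vectors (assumes reset_after=True).
    return 3 * ((input_dim + units) * units + 2 * units)

embedding = 1000 * 64                      # vocabulary size * embedding dim
gru = gru_params(64, 256)                  # input is the 64-dim embedding
simple_rnn = simple_rnn_params(256, 128)   # input is the 256-dim GRU output
dense = 128 * 10 + 10                      # weights + biases

print(gru)         # 247296
print(simple_rnn)  # 49280
print(embedding + gru + simple_rnn + dense)  # 361866
```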


## RNN layers and RNN cells

In addition to the built-in RNN layers, the RNN API also provides cell-level APIs.
Unlike RNN layers, which process whole batches of input sequences, the RNN cell only
processes a single timestep.

The cell is the inside of the `for` loop of a RNN layer. Wrapping a cell inside a
`keras.layers.RNN` layer gives you a layer capable of processing batches of
sequences, e.g. `RNN(LSTMCell(10))`.

Mathematically, `RNN(LSTMCell(10))` produces the same result as `LSTM(10)`. In fact,
the implementation of this layer in TF v1.x was just creating the corresponding RNN
cell and wrapping it in a RNN layer. 
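
A quick way to check this claim is to give both layers the same weights and compare their outputs. This is a sketch, not an official test: it assumes that `get_weights()` returns the weights of both layers in the same order, and the input sizes are arbitrary.

```python
import numpy as np
from tensorflow.keras import layers

# A batch of 4 random sequences: 5 timesteps, 8 features each.
x = np.random.rand(4, 5, 8).astype("float32")

lstm_layer = layers.LSTM(10)
wrapped_cell = layers.RNN(layers.LSTMCell(10))

# Call both layers once so that their weights are created.
y_layer = lstm_layer(x)
wrapped_cell(x)

# Copy the LSTM layer's weights into the wrapped cell
# (assumes both expose their weights in the same order).
wrapped_cell.set_weights(lstm_layer.get_weights())
y_wrapped = wrapped_cell(x)

same = np.allclose(np.asarray(y_layer), np.asarray(y_wrapped), atol=1e-5)
print(same)
```

With identical weights the two outputs should agree up to floating-point error, since the wrapped cell runs the same computation one timestep at a time.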

There are three built-in RNN cells, each of them corresponding to the matching RNN
layer.

- `keras.layers.SimpleRNNCell` corresponds to the `SimpleRNN` layer.

- `keras.layers.GRUCell` corresponds to the `GRU` layer.

- `keras.layers.LSTMCell` corresponds to the `LSTM` layer.
