# Recurrent Neural Networks

In this chapter, we will discuss RNNs, which "can analyze time series data" and, more generally, can "work on sequences of arbitrary lengths, rather than on fixed-sized inputs like all the nets we have discussed so far."

"In this chapter, we will look at the fundamental concepts underlying RNNs, the main problem they face (namely, vanishing/exploding gradients), and the solutions widely used to fight it: LSTM and GRU cells. Along the way, as always, we will show how to implement RNNs using TensorFlow. Finally, we will take a look at the architecture of a machine translation system."

### Recurrent Neurons

"Up to now, we have mostly looked at feedforward neural networks, where the activations flow only in one direction, from the input layer to the output layer. A recurrent neural network looks very much like a feedforward neural network, except it also has connections pointing backward. Let's look at the simplest possible RNN, composed of just one neuron receiving inputs, producing an output, and sending that output back to itself."

"At each time step t, this recurrent neuron receives the inputs **x**(t) as well as its own output from the previous time step y(t-1)."

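Now, instead of applying the activation to the transposed weights times the inputs plus the bias, we have

$$\mathbf{y}_{(t)} = \phi\left(\mathbf{w}_x^T \mathbf{x}_{(t)} + \mathbf{w}_y^T \mathbf{y}_{(t-1)} + b\right)$$

where $\phi$ is the activation function and there are two separate weight vectors: $\mathbf{w}_x$ for the inputs and $\mathbf{w}_y$ for the outputs of the previous time step. For a single neuron, $\mathbf{y}_{(t-1)}$ and $\mathbf{w}_y$ reduce to scalars.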
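
The update above can be unrolled over a whole sequence with a short loop. The following is a minimal NumPy sketch, not a library implementation: the function name `recurrent_forward`, the `tanh` activation, the all-zeros initial output, and the random weights are all assumptions made for illustration. The per-neuron weight vectors are stacked as columns of the matrices `W_x` and `W_y`, so one matrix product handles a whole layer.

```python
import numpy as np

def recurrent_forward(x_seq, W_x, W_y, b, activation=np.tanh):
    """Unroll a layer of recurrent neurons over one input sequence.

    x_seq: (n_steps, n_inputs), W_x: (n_inputs, n_neurons),
    W_y: (n_neurons, n_neurons), b: (n_neurons,).
    Returns the outputs at every time step, shape (n_steps, n_neurons).
    """
    y_prev = np.zeros(W_y.shape[0])  # assumed initial output: all zeros
    outputs = []
    for x_t in x_seq:
        # y(t) = activation(W_x^T x(t) + W_y^T y(t-1) + b)
        y_prev = activation(x_t @ W_x + y_prev @ W_y + b)
        outputs.append(y_prev)
    return np.stack(outputs)

rng = np.random.default_rng(0)
n_steps, n_inputs, n_neurons = 5, 3, 4
x_seq = rng.normal(size=(n_steps, n_inputs))
W_x = rng.normal(size=(n_inputs, n_neurons))
W_y = rng.normal(size=(n_neurons, n_neurons))
b = np.zeros(n_neurons)

outputs = recurrent_forward(x_seq, W_x, W_y, b)
print(outputs.shape)  # (5, 4)
```

The only difference from a feedforward layer is the extra `y_prev @ W_y` term, which feeds the previous output back in.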

### Memory Cells

"Since the output of a recurrent neuron at time step t is a function of all the inputs from previous time steps, you could say that it has a form of *memory*. A part of a neural network that preserves some state across time steps is called a *memory cell* (or simply a *cell*). A single recurrent neuron, or a layer of recurrent neurons, is a very basic cell, but later in this chapter we will look at some more complex and powerful types of cells."
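
A small experiment makes this notion of memory concrete. The sketch below assumes a single neuron with scalar input, a `tanh` activation, and hand-picked weights; `run_neuron` is a hypothetical helper. It changes only the first input of a sequence and checks whether the last output notices.

```python
import numpy as np

def run_neuron(xs, w_x, w_y, b=0.0):
    """Single recurrent neuron with scalar input; returns the output at every step."""
    y = 0.0  # assumed initial output
    ys = []
    for x in xs:
        y = np.tanh(w_x * x + w_y * y + b)
        ys.append(y)
    return np.array(ys)

xs = np.array([1.0, 0.0, 0.0, 0.0])
xs_changed = np.array([-1.0, 0.0, 0.0, 0.0])  # only the first input differs

# With a recurrent weight, the first input still influences the last output.
with_memory = run_neuron(xs, w_x=1.0, w_y=0.9)
with_memory_changed = run_neuron(xs_changed, w_x=1.0, w_y=0.9)
print(with_memory[-1], with_memory_changed[-1])

# With w_y = 0 the neuron is purely feedforward and forgets earlier inputs.
no_memory = run_neuron(xs, w_x=1.0, w_y=0.0)
no_memory_changed = run_neuron(xs_changed, w_x=1.0, w_y=0.0)
print(no_memory[-1], no_memory_changed[-1])
```

The first pair of printed values differs, while the second pair is identical: the state carried through `w_y` is what lets an early input reach a later output.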

### Input and Output Sequences

There are four ways an RNN can be set up in terms of its inputs and outputs.

- **Sequence-Sequence**: you input a sequence of data into the net, and it outputs another sequence. "This is useful for predicting time series such as stock prices: you feed it the prices over the last N days, and it must output the prices shifted by one day into the future (i.e., from N-1 days ago to tomorrow)."

- **Sequence-Vector**: the idea here is that you input a sequence and then simply ignore all the outputs except the last one. This will give you one vector (or scalar if it's only one output neuron). "For example: you could feed the network a sequence of words corresponding to a movie review, and the network would output a sentiment score."

- **Vector-Sequence**: "You could feed the network a single input at the first time step (and zeros for all other time steps), and let it output a sequence... For example, the input could be an image, and the output could be a caption for that image."

- **Encoder-Decoder**: This approach chains a sequence-to-vector network (the encoder) with a vector-to-sequence network (the decoder). "For example, this can be used for translating a sentence from one language to another. You would feed the network a sentence in one language, the encoder would convert this sentence into a single vector representation, and then the decoder would decode this vector into a sentence in another language. This two-step model... works much better than trying to translate on the fly with a single sequence-to-sequence RNN... since the last words of a sentence can affect the first words of the translation, so you need to wait until you have heard the whole sentence before translating it."

![inputoutput](./inputoutput.jpg)
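
The four configurations differ only in which inputs are fed and which outputs are kept. The sketch below illustrates this with a toy NumPy layer; `make_rnn`, the `tanh` activation, the random weights, and all sizes are assumptions chosen for illustration, and nothing is trained.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_rnn(n_inputs, n_neurons):
    """Return a function that runs a randomly initialized tanh RNN layer."""
    W_x = rng.normal(scale=0.5, size=(n_inputs, n_neurons))
    W_y = rng.normal(scale=0.5, size=(n_neurons, n_neurons))

    def run(x_seq):
        y = np.zeros(n_neurons)
        outputs = []
        for x_t in x_seq:
            y = np.tanh(x_t @ W_x + y @ W_y)
            outputs.append(y)
        return np.stack(outputs)  # (n_steps, n_neurons)

    return run

n_steps, n_inputs, n_neurons = 6, 3, 4
encoder = make_rnn(n_inputs, n_neurons)
x_seq = rng.normal(size=(n_steps, n_inputs))

# Sequence-to-sequence: keep the output at every time step.
seq_to_seq = encoder(x_seq)

# Sequence-to-vector: ignore every output except the last one.
seq_to_vec = encoder(x_seq)[-1]

# Vector-to-sequence: feed the vector at the first step, zeros afterwards.
vector = rng.normal(size=n_inputs)
padded = np.zeros((n_steps, n_inputs))
padded[0] = vector
vec_to_seq = encoder(padded)

# Encoder-decoder: the encoder's final vector becomes the decoder's first input.
decoder = make_rnn(n_neurons, n_neurons)
decoder_input = np.zeros((n_steps, n_neurons))
decoder_input[0] = seq_to_vec
decoded = decoder(decoder_input)

print(seq_to_seq.shape, seq_to_vec.shape, vec_to_seq.shape, decoded.shape)
```

In Keras, the choice between the first two cases corresponds to the `return_sequences` argument of the recurrent layer.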

In [27]:
import tensorflow as tf
import math
import numpy as np
from sklearn.model_selection import train_test_split

The model below is a sequence-to-vector RNN: it reads each 28×28 MNIST image as a sequence of 28 rows of 28 pixels and classifies the digit from the output at the last time step.

In [7]:
input_layer = tf.keras.layers.Input((28, 28))
rec = tf.keras.layers.SimpleRNN(150, activation='relu')(input_layer)
dense = tf.keras.layers.Dense(10, activation='softmax')(rec)
rnn = tf.keras.models.Model(inputs=input_layer, outputs=dense)
rnn.summary()

Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_6 (InputLayer)         [(None, 28, 28)]          0         
_________________________________________________________________
simple_rnn_4 (SimpleRNN)     (None, 150)               26850     
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1510      
Total params: 28,360
Trainable params: 28,360
Non-trainable params: 0
_________________________________________________________________
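
The parameter counts in the summary can be checked by hand. Each recurrent neuron has one weight per input feature, one weight per neuron in the layer (for the previous outputs), and one bias. The helper names below are hypothetical and exist only to spell out the arithmetic.

```python
def simple_rnn_params(n_inputs, n_neurons):
    # input weights + recurrent weights + one bias per neuron
    return n_inputs * n_neurons + n_neurons * n_neurons + n_neurons

def dense_params(n_inputs, n_units):
    # one weight per input for each unit, plus one bias per unit
    return n_inputs * n_units + n_units

rnn_count = simple_rnn_params(28, 150)
dense_count = dense_params(150, 10)
print(rnn_count)                # 26850
print(dense_count)              # 1510
print(rnn_count + dense_count)  # 28360
```

The count does not depend on the number of time steps, because the same weights are reused at every step.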


In [8]:
rnn.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

In [9]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
(x_train.shape, x_test.shape)

((60000, 28, 28), (10000, 28, 28))

In [10]:
x_train, x_test = x_train/255., x_test/255.

In [11]:
rnn.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x1d257c10d30>

### Training to Predict Time Series

"Each training instance is a randomly selected sequence of 20 consecutive values from the time series, and the target sequence is the same as the input sequence, except it is shifted by one time step into the future."
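
The sampling scheme in the quote can be sketched directly in NumPy. This is an illustrative alternative to the fixed, non-overlapping windows used in the cells of this section; `sample_batch` and its parameters are hypothetical names, and the series is the same function used in this section.

```python
import numpy as np

def sample_batch(series, batch_size, n_steps, rng):
    """Sample random windows and their one-step-ahead targets."""
    series = np.asarray(series)
    # A window starting at s needs the values s .. s + n_steps to exist.
    starts = rng.integers(0, len(series) - n_steps, size=batch_size)
    idx = starts[:, None] + np.arange(n_steps + 1)  # (batch_size, n_steps + 1)
    windows = series[idx]
    x = windows[:, :-1, None]  # inputs,  shape (batch_size, n_steps, 1)
    y = windows[:, 1:, None]   # targets, shape (batch_size, n_steps, 1)
    return x, y

t = np.linspace(0, 30, num=800)
series = np.sin(t) / 3 + 2 * np.sin(5 * t)

x, y = sample_batch(series, batch_size=50, n_steps=20, rng=np.random.default_rng(0))
print(x.shape, y.shape)  # (50, 20, 1) (50, 20, 1)
```

Each target sequence overlaps its input in all but one value: `y[:, :-1]` equals `x[:, 1:]`, and only the last target is new to the network.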

In [71]:
# We will use the function f(t) = sin(t)/3 + 2*sin(5t) for our time series.
num_values = 800
timeSeries = [math.sin(t)/3 + 2*math.sin(5*t) for t in np.linspace(0,30,num=num_values)]
timeSeries[:5]

[0.0,
 0.3857803980708031,
 0.7584261902904045,
 1.1052637379071897,
 1.4145251384022537]

In [72]:
n_steps = 20
n_inputs = 1
n_neurons = 100
n_outputs = 1

In [93]:
# Split the series into non-overlapping windows of n_steps values.
# Each target window is its input window shifted one step into the future,
# so the last window must end one value before the end of the series.
x, y = [], []
for i in range((num_values - 1) // n_steps):
    x_batch = timeSeries[n_steps*i:n_steps*(i+1)]
    y_batch = timeSeries[n_steps*i+1:n_steps*(i+1)+1]
    x.append(x_batch)
    y.append(y_batch)
np.array(y).shape

(39, 20)

"At each timestep, we now have an output vector of size 100. But what we actually want is a single output value at each time step." We can solve this by setting `return_sequences=True`, so that the recurrent layer emits its output at every time step rather than only the last one, and then adding a Dense layer of 1 neuron with no activation. Keras applies that Dense layer to each time step independently.

In [74]:
rnn = tf.keras.models.Sequential()
# return_sequences=True keeps the output at every time step, not just the last one
rnn.add(tf.keras.layers.SimpleRNN(n_neurons, activation='relu', return_sequences=True, input_shape=(n_steps, n_inputs)))
rnn.add(tf.keras.layers.Dense(n_outputs, activation=None))
rnn.summary()

Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
simple_rnn_12 (SimpleRNN)    (None, 20, 100)           10200     
_________________________________________________________________
dense_7 (Dense)              (None, 20, 1)             101       
Total params: 10,301
Trainable params: 10,301
Non-trainable params: 0
_________________________________________________________________


In [75]:
rnn.compile(optimizer='adam',loss='mse')

In [81]:
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.2)

In [83]:
# Keras expects inputs and targets shaped (batch, time steps, features)
x_train = np.array(x_train).reshape(-1, n_steps, n_inputs)
x_test = np.array(x_test).reshape(-1, n_steps, n_inputs)
y_train = np.array(y_train).reshape(-1, n_steps, n_outputs)
y_test = np.array(y_test).reshape(-1, n_steps, n_outputs)
y_train.shape

(31, 20, 1)

In [70]:
x_train.shape

(31, 20, 1)

In [60]:
rnn.fit(x_train, y_train, epochs=10, validation_data=(x_test,y_test))

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x1d25d0b95b0>

In [61]:
rnn.output.shape

TensorShape([None, 20, 1])
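
Projecting a 100-dimensional output to a single value at each time step amounts to applying the same linear neuron at every step. The NumPy sketch below illustrates this with random stand-in data; the array names and sizes are assumptions, and the random `rnn_outputs` array merely stands in for what a recurrent layer would produce.

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size, n_steps, n_neurons = 4, 20, 100

# Stand-in for the recurrent layer's outputs at every time step.
rnn_outputs = rng.normal(size=(batch_size, n_steps, n_neurons))

# One linear neuron (no activation) shared across all time steps.
w = rng.normal(size=(n_neurons, 1))
b = np.zeros(1)

projected = rnn_outputs @ w + b  # (batch_size, n_steps, 1)
print(projected.shape)  # (4, 20, 1)

# Same result as applying the neuron to each time step separately.
step_by_step = np.stack(
    [rnn_outputs[:, t] @ w + b for t in range(n_steps)], axis=1
)
print(np.allclose(projected, step_by_step))  # True
```

Because the weights are shared across time steps, the projection adds only 101 parameters (100 weights and one bias), whatever the sequence length.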