# Recurrent Neural Networks (RNN)
## - and Long Short Term Memory (LSTM)

In this notebook we'll have a look at time series and recurrent neural networks (RNNs), and in particular at LSTM networks, which are one subcategory of RNNs.

The strength of a recurrent neural network is that it can carry information learned from earlier elements of a sequence forward to inform the element it is currently training on or evaluating. This makes it well suited to problems where you analyze or predict the future values of a list or data sequence. In contrast, an ordinary NN retains what it learned during training, but keeps no memory of the order in which it evaluates inputs.

LSTM networks have a more sophisticated memory mechanism that enables the network to remember information that is further apart in time, or in distance within the sequence. To better illustrate what is new with RNNs and LSTM, consider the figures below.

Figure 1 shows the architecture of normal, or "vanilla", NNs applied to a sequence of inputs, with no interaction between the individual input evaluations. A separate network could be trained for each sequence element, but none of them would be able to remember anything about the previous sequence elements.

<img src="Fig1Vanilla.png" width=700>
<center>Figure 1</center>
    
[Image source](https://towardsdatascience.com/recurrent-neural-networks-d4642c9bc7ce)

Figure 2 introduces the notion of RNNs: the same weights $W_x$ are trained and applied to all inputs of the sequence. The set of weights $W_h$ is trained to carry the relevant information from one input evaluation to the next in the series.

<img src="Fig2RNN.png" width=700>
<center>Figure 2</center>

[Image source](https://towardsdatascience.com/recurrent-neural-networks-d4642c9bc7ce)
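
The recurrence in Figure 2 can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the Keras implementation: the tanh activation, the bias `b`, the zero initial state and the array sizes are assumptions made for the example; only $W_x$ and $W_h$ come from the figure.

```python
import numpy as np

def rnn_forward(x_seq, W_x, W_h, b):
    """Run a vanilla RNN over a sequence and return the hidden state at every step.

    x_seq: (T, n_in), W_x: (n_hidden, n_in), W_h: (n_hidden, n_hidden), b: (n_hidden,)
    """
    h = np.zeros(W_h.shape[0])  # assumed initial state: all zeros
    states = []
    for x_t in x_seq:
        # the same W_x and W_h are reused at every step of the sequence
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.array(states)

rng = np.random.default_rng(0)
T, n_in, n_hidden = 5, 1, 3  # hypothetical sizes
x_seq = rng.normal(size=(T, n_in))
W_x = rng.normal(size=(n_hidden, n_in))
W_h = rng.normal(size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

states = rnn_forward(x_seq, W_x, W_h, b)
print(states.shape)  # (5, 3): one hidden state per sequence element

# with W_h set to zero no information flows between steps
no_memory = rnn_forward(x_seq, W_x, np.zeros_like(W_h), b)
```

Setting $W_h$ to zero removes the link between steps, which recovers the independent evaluations of Figure 1.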

What LSTM adds to the network is a cell state that can carry information from all previous input evaluations, together with a more sophisticated way of setting the weights $W_h$. See the links and sources referenced below for more detail.

[Source](http://colah.github.io/posts/2015-08-Understanding-LSTMs/)
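
A single LSTM step, in the standard gate formulation described in the source above, can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the Keras implementation: the stacked weight matrix `W`, the gate ordering within it, the random weights and the sizes are all choices made for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W: (4 * n_hidden, n_hidden + n_in), b: (4 * n_hidden,)."""
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0:n])          # forget gate: how much of the old cell state to keep
    i = sigmoid(z[n:2 * n])      # input gate: how much of the candidate to add
    g = np.tanh(z[2 * n:3 * n])  # candidate values for the cell state
    o = sigmoid(z[3 * n:4 * n])  # output gate: how much of the cell state to expose
    c = f * c_prev + i * g       # updated cell state, carried along the sequence
    h = o * np.tanh(c)           # updated hidden state, passed to the next step
    return h, c

rng = np.random.default_rng(0)
n_in, n_hidden = 1, 4  # hypothetical sizes
W = rng.normal(scale=0.5, size=(4 * n_hidden, n_hidden + n_in))
b = np.zeros(4 * n_hidden)

h = np.zeros(n_hidden)  # assumed zero initial hidden state
c = np.zeros(n_hidden)  # assumed zero initial cell state
for x_t in rng.normal(size=(6, n_in)):
    h, c = lstm_step(x_t, h, c, W, b)

print(h.shape, c.shape)  # (4,) (4,)
```

The cell state `c` is only ever scaled by the forget gate and added to, which is what lets information persist over many steps.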

Useful links:
- http://colah.github.io/posts/2015-08-Understanding-LSTMs/
- https://towardsdatascience.com/recurrent-neural-networks-d4642c9bc7ce
- https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/

We'll first test how a simple LSTM network can learn to predict y[n + 1] from a synthetic cosine wave with exponentially growing amplitude. We'll then have a look at a time series from invasive pressure measurements.

Instructions:
- The code is configured to run the exponential-amplitude cosine example. Run all cells to observe the performance. The LSTM network trains on the training fraction of the data, which by default is set to half the data, and tests on the remaining fraction.
- Examine the code to understand how the network uses data to train, then tune the training fraction and other parameters to see how this affects the results.

In [None]:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
import math
%matplotlib inline

t_start = 0
t_end = 2*np.pi
N = 1001
t = np.linspace(t_start, t_end, N)
y = np.cos(5*t)*np.exp(2*t/t_end)


plt.figure(1, figsize=(8,4),dpi=300)
plt.plot(t, y)
plt.title('Data series')
plt.xlabel('t')
plt.ylabel('y')
plt.show()

y = y[:, np.newaxis] # turn 1D array into 2D array of shape (N, 1)

In [None]:
# Convert an array of values into input windows and next-step targets
def create_dataset(dataset, look_back=1):
    # dataX[i] holds look_back consecutive values: dataset[i] ... dataset[i + look_back - 1]
    # dataY[i] holds the value at the next time step: dataset[i + look_back]
    dataX, dataY = [], []
    # Note: the extra -1 leaves the last possible window unused;
    # the plotting offsets further down assume this length.
    for i in range(len(dataset)-look_back-1):
        a = dataset[i:(i+look_back), 0]
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])
    return np.array(dataX), np.array(dataY)

# Parameters
look_back = 10
epochs = 10
trainFrac = 0.5
LSTM_blocks = 4

# Scale data to be bounded by 0 and 1
scaler = MinMaxScaler(feature_range=(0, 1))
y = scaler.fit_transform(y)
# Split into train and test sets
train_size = int(len(y) * trainFrac)
test_size = len(y) - train_size
y_train, y_test = y[0:train_size,:], y[train_size:len(y),:]
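
To see what `create_dataset` produces, it helps to run it on a tiny array. In the sketch below the function body is repeated so that the snippet runs on its own; the toy array and `look_back=2` are hypothetical values chosen for illustration.

```python
import numpy as np

def create_dataset(dataset, look_back=1):
    # same logic as the function defined above
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        dataX.append(dataset[i:(i + look_back), 0])
        dataY.append(dataset[i + look_back, 0])
    return np.array(dataX), np.array(dataY)

toy = np.arange(6, dtype=float)[:, np.newaxis]  # values 0..5, shape (6, 1)
X, Y = create_dataset(toy, look_back=2)

print(X)  # rows [0, 1], [1, 2], [2, 3]
print(Y)  # targets 2, 3, 4
```

Each row of `X` is a window of `look_back` consecutive values and the matching entry of `Y` is the value that follows it. The window `[3, 4]` with target `5` is not produced, because of the extra `-1` in the loop range.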


In [None]:
trainX, trainY = create_dataset(y_train, look_back)
testX, testY = create_dataset(y_test, look_back)
# Turn the 2D (N_train, look_back) array into a 3D array of shape (N_train, 1, look_back).
# Keras expects dims [samples, time steps, features]; here each sample is
# 1 time step with look_back features.
trainX = np.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1])) 
testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
trainY = trainY[:, np.newaxis] # turn 1D array into 2D array of shape (N, 1)
testY = testY[:, np.newaxis] # turn 1D array into 2D array of shape (N, 1)
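
The reshape above places the whole look-back window on the feature axis, with a single time step per sample. The NumPy-only sketch below compares this with the alternative layout in which every value in the window is its own time step. The toy sizes are hypothetical, and the alternative layout is shown for comparison only; it is not what this notebook uses.

```python
import numpy as np

look_back = 3  # hypothetical small window
X = np.arange(12, dtype=float).reshape(4, look_back)  # 4 samples, 3 values each

# layout used in this notebook: 1 time step with look_back features
X_window = np.reshape(X, (X.shape[0], 1, X.shape[1]))
# alternative layout: look_back time steps with 1 feature each
X_steps = np.reshape(X, (X.shape[0], X.shape[1], 1))

print(X_window.shape)  # (4, 1, 3)
print(X_steps.shape)   # (4, 3, 1)
# the first sample holds the same three values in both layouts, on different axes
print(X_window[0, 0, :], X_steps[0, :, 0])
```

With the alternative layout the LSTM layer would need `input_shape=(look_back, 1)`, and the recurrence would then run over the values inside each window instead of taking a single step per sample.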

In [None]:
# Set up the neural network architecture 
model = Sequential()
model.add(LSTM(LSTM_blocks, input_shape=(1, look_back)))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(trainX, trainY, epochs=epochs, batch_size=1, verbose=2)
# make predictions
trainPredict = model.predict(trainX)
testPredict = model.predict(testX)
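
As a check on the architecture above, the number of trainable parameters can be worked out by hand. The sketch assumes the standard LSTM layout with biases enabled (four weight blocks, each with input weights, recurrent weights and a bias vector); the helper names are hypothetical. The total should agree with what `model.summary()` reports for the default parameters.

```python
def lstm_param_count(units, input_dim):
    # 4 blocks (input gate, forget gate, candidate, output gate), each with
    # input weights, recurrent weights and a bias vector
    return 4 * (units * input_dim + units * units + units)

def dense_param_count(units_in, units_out):
    # one weight per input-output pair plus one bias per output
    return units_in * units_out + units_out

look_back = 10   # features per sample, as set in the parameters above
LSTM_blocks = 4  # LSTM units, as set in the parameters above

n_lstm = lstm_param_count(LSTM_blocks, look_back)  # 4 * (40 + 16 + 4) = 240
n_dense = dense_param_count(LSTM_blocks, 1)        # 4 + 1 = 5
print(n_lstm, n_dense, n_lstm + n_dense)           # 240 5 245
```

Increasing `LSTM_blocks` grows the recurrent term quadratically, which is worth keeping in mind when tuning the parameters.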


In [None]:
# invert the scaling so that the errors are reported in the original units
trainPredict = scaler.inverse_transform(trainPredict)
trainY = scaler.inverse_transform(trainY)
testPredict = scaler.inverse_transform(testPredict)
testY = scaler.inverse_transform(testY)
# calculate root mean squared error
trainScore = math.sqrt(mean_squared_error(trainY[:,0], trainPredict[:,0]))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = math.sqrt(mean_squared_error(testY[:,0], testPredict[:,0]))
print('Test Score: %.2f RMSE' % (testScore))
# shift train predictions for plotting
trainPredictPlot = np.empty_like(y)
trainPredictPlot[:, :] = np.nan
trainPredictPlot[look_back:len(trainPredict)+look_back, :] = trainPredict
# shift test predictions for plotting
testPredictPlot = np.empty_like(y)
testPredictPlot[:, :] = np.nan
testPredictPlot[len(trainPredict)+(look_back*2)+1:len(y)-1, :] = testPredict
# plot baseline and predictions
plt.figure(1, figsize=(8,4),dpi=300)
plt.plot(scaler.inverse_transform(y), label="Data")
plt.plot(trainPredictPlot, label="Training data prediction")
plt.plot(testPredictPlot, label="Test data prediction")
plt.title('Data series')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()

The code was inspired by this [source](https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/).