# Recurrent Neural Network forecasting

- Reference: `keras.layers.SimpleRNN`

**Basic Architecture diagram**  

<img src="./pic/RNN.png" alt="RNN Architecture" width="500"/>

- $U$, $V$, $W$ are weight matrices.
- Input: the vector $x_t$ is the input to the network at time step $t$.  
- Hidden state: the vector $h_t=\tanh \left(W h_{t-1}+U x_t\right)$
- Output: $y_t$ is the output of the network at time step $t$: $y_t=\operatorname{softmax}(V h_t)$

Each cell in the unrolled diagram corresponds to one fixed time step. The hidden-layer output of one time step is part of the input to the next time step.
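
The recurrence above can be sketched in a few lines of NumPy. This is a minimal illustration, not the Keras implementation: the function names `softmax` and `rnn_forward`, the zero initial hidden state, the layer sizes, and the random weights are all assumptions made for the example.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def rnn_forward(xs, U, W, V, h0=None):
    """Run a vanilla RNN over one sequence (hypothetical helper).

    xs has shape (T, n_in), one input vector per time step.
    Returns (hs, ys) with shapes (T, n_hidden) and (T, n_out).
    """
    n_hidden = W.shape[0]
    h = np.zeros(n_hidden) if h0 is None else h0  # assumed zero initial state
    hs, ys = [], []
    for x_t in xs:
        # The new hidden state mixes the previous hidden state and the current input.
        h = np.tanh(W @ h + U @ x_t)
        hs.append(h)
        ys.append(softmax(V @ h))
    return np.stack(hs), np.stack(ys)

# Toy sizes and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
T, n_in, n_hidden, n_out = 5, 3, 4, 2
U = rng.normal(size=(n_hidden, n_in))
W = rng.normal(size=(n_hidden, n_hidden))
V = rng.normal(size=(n_out, n_hidden))
xs = rng.normal(size=(T, n_in))

hs, ys = rnn_forward(xs, U, W, V)
print(hs.shape, ys.shape)
```

The same three matrices are reused at every step; only the hidden state `h` carries information forward in time.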

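- Training searches for the weight matrices $U$, $V$, $W$ that minimize the loss function $J$, where $D(y_t, Y_t)$ measures the error between the prediction $y_t$ and the target $Y_t$, $m$ is the number of sequences, and $N_i$ is the length of sequence $i$:
$$J(\theta)=\frac{1}{m} \sum_{i=1}^{m} \sum_{t=1}^{N_{i}} D(y_t, Y_t)$$
- The weights are updated by forward propagation followed by backward propagation through time.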
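
As a small worked example of the loss, the sketch below evaluates $J$ for two sequences of different lengths. The choice of squared error for $D$ is an assumption made for illustration, and `sequence_loss` is a hypothetical helper name.

```python
import numpy as np

def sequence_loss(predictions, targets):
    """J = (1/m) * sum over sequences of sum over steps of D(y_t, Y_t).

    D is taken to be the squared error here (an assumption).
    """
    m = len(predictions)
    total = 0.0
    for y, Y in zip(predictions, targets):
        total += np.sum((np.asarray(y) - np.asarray(Y)) ** 2)
    return total / m

# Two sequences with lengths N_1 = 3 and N_2 = 2.
preds = [[1.0, 2.0, 3.0], [0.0, 1.0]]
targs = [[1.0, 2.0, 5.0], [1.0, 1.0]]

J = sequence_loss(preds, targs)
print(J)  # (4 + 1) / 2 = 2.5
```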

## Vanishing gradient problem

- LSTM `keras.layers.LSTM`, first proposed in
[Hochreiter & Schmidhuber, 1997](https://www.bioinf.jku.at/publications/older/2604.pdf).
- GRU `keras.layers.GRU`, first proposed in
[Cho et al., 2014](https://arxiv.org/abs/1406.1078).
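
The problem that these gated layers were designed to mitigate can be illustrated numerically. In a plain tanh RNN, the gradient of a late hidden state with respect to an early one is a product of per-step Jacobians, and that product can shrink geometrically. The sketch below is a toy setup under stated assumptions: the recurrent matrix is an orthogonal matrix scaled by 0.5, the input matrix is taken as the identity, and the inputs are random.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, T = 8, 50

# Orthogonal matrix scaled by 0.5, so every singular value of W equals 0.5.
Q, _ = np.linalg.qr(rng.normal(size=(n_hidden, n_hidden)))
W = 0.5 * Q

h = np.zeros(n_hidden)
grad = np.eye(n_hidden)   # accumulates d h_t / d h_0
norms = []
for _ in range(T):
    x_t = rng.normal(size=n_hidden)      # random input, input matrix assumed to be identity
    h = np.tanh(W @ h + x_t)
    jac = np.diag(1.0 - h ** 2) @ W      # Jacobian d h_t / d h_{t-1}
    grad = jac @ grad
    norms.append(np.linalg.norm(grad, 2))

print(norms[0], norms[9], norms[-1])
```

Because the derivative of tanh is at most 1, each step multiplies the gradient norm by at most 0.5 in this setup, so after 50 steps almost no signal from the first step remains.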

## RNN Crypto-forecasting

In [None]:
import os
import IPython
import IPython.display
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
mpl.rcParams['figure.figsize'] = (8, 6)
mpl.rcParams['axes.grid'] = False

print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))
# tf.debugging.set_log_device_placement(True)

In [None]:
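# Load the data and keep the last 70,000 rows
csv_path = '/Users/dingxian/Documents/GitHub/Crypto_Forecasting_kaggle/codetest/btc.csv'
df = pd.read_csv(csv_path)
df = df[-70000:]
# Convert the Unix timestamps (in seconds) to datetimes and remove the column from df
date_time = pd.to_datetime(df.pop('timestamp'), unit='s')
df.head()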

## Split the data

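Split the data into training (70%), validation (20%), and test (10%) sets. The rows stay in chronological order and are not shuffled before splitting.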

In [None]:
df = df[['Count','Open','High','Low','Close','Volume','VWAP']]
column_indices = {name: i for i, name in enumerate(df.columns)}

n = len(df)  # number of rows
train_df = df[0:int(n*0.7)]
val_df = df[int(n*0.7):int(n*0.9)]
test_df = df[int(n*0.9):]

num_features = df.shape[1]

## Normalize the data
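Scale the features before training the neural network. The mean and standard deviation are computed from the training set only, so no statistics from the validation or test sets reach the model.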

In [None]:
train_mean = train_df.mean()
train_std = train_df.std()

train_df = (train_df - train_mean) / train_std
val_df = (val_df - train_mean) / train_std
test_df = (test_df - train_mean) / train_std

## Data windowing

The models make a set of predictions based on a window of consecutive samples from the data. A window is defined by:
- The `width` (number of time steps) of the `input` and `label` windows.
- The `time offset` between them.
- Which features are used as `inputs`, `labels`, or both.
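
To make the three window parameters concrete, here is a NumPy sketch of how a `(time, features)` array could be cut into input and label windows. `make_windows` is a hypothetical helper written for this illustration and is not the `WindowGenerator` implementation; it assumes the total window length is `input_width + shift` and that the labels are the last `label_width` steps of each window.

```python
import numpy as np

def make_windows(data, input_width, label_width, shift, label_index):
    """Slice a (time, features) array into (inputs, labels) windows.

    Hypothetical helper that mirrors the window parameters listed above.
    """
    total = input_width + shift            # full window length
    label_start = total - label_width      # labels are the last `label_width` steps
    inputs, labels = [], []
    for start in range(len(data) - total + 1):
        window = data[start:start + total]
        inputs.append(window[:input_width])                  # all features
        labels.append(window[label_start:, [label_index]])   # label column only
    return np.stack(inputs), np.stack(labels)

# Toy series: 40 time steps, 7 features; column 4 plays the role of 'Close'.
data = np.arange(40 * 7, dtype=float).reshape(40, 7)
X, y = make_windows(data, input_width=30, label_width=30, shift=1, label_index=4)
print(X.shape, y.shape)  # (10, 30, 7) (10, 30, 1)
```

With `input_width=30`, `label_width=30`, and `shift=1`, each label sequence is the input sequence moved one step into the future, so the model predicts the next value at every time step.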

In [None]:
from script.RNN.window import WindowGenerator
wide_window = WindowGenerator(
    input_width=30, label_width=30, shift=1,
    train_df=train_df, val_df=val_df, test_df=test_df,
    label_columns=['Close'])

wide_window

## Long Short-Term Memory  
<img src="./pic/LSTM-1.png" alt="LSTM Architecture" width="500"/> 

- `tf.keras.layers.LSTM`
- `return_sequences=True`: the layer returns the output at every time step, with shape `(batch_size, timesteps, units)`.
- `return_sequences=False`: the layer returns only the last output, with shape `(batch_size, units)`.


### Model Design
- `LSTM(units = 32)`: LSTM layer with 32 internal units.
- `Dense(units=1)`: Dense layer with 1 unit.
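
The size of this design can be checked by hand before building it. The sketch below counts trainable weights, assuming the default bias terms and the 7 input features selected earlier; the helper names are hypothetical. An LSTM layer has four weight sets (three gates plus the candidate state), each with an input kernel, a recurrent kernel, and a bias vector.

```python
def lstm_param_count(n_features, units):
    """Trainable weights in one LSTM layer with biases.

    Four weight sets, each with an input kernel (units x n_features),
    a recurrent kernel (units x units), and a bias vector (units).
    """
    return 4 * (units * n_features + units * units + units)

def dense_param_count(n_inputs, units):
    """Kernel plus bias of a fully connected layer."""
    return n_inputs * units + units

n_features = 7  # Count, Open, High, Low, Close, Volume, VWAP
lstm_params = lstm_param_count(n_features, units=32)
dense_params = dense_param_count(32, units=1)
print(lstm_params, dense_params, lstm_params + dense_params)  # 5120 33 5153
```

These totals should agree with the parameter counts reported by `lstm_model.summary()` once the model has been built on data with 7 features.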

In [None]:
lstm_model = tf.keras.models.Sequential([
    # Recurrent layer: [batch, time, features] => [batch, time, 32]
    tf.keras.layers.LSTM(units=32, return_sequences=True),
    # Densely connected layer: [batch, time, 32] => [batch, time, 1]
    tf.keras.layers.Dense(units=1)
])
IPython.display.clear_output()

In [None]:
print('Input shape:', wide_window.example[0].shape)  # [batch, timesteps, features]
print('Output shape:', lstm_model(wide_window.example[0]).shape)
print('Label shape:', wide_window.example[1].shape)

In [None]:
val_performance={}
performance={}

### Fitting
- `compile_and_fit()`: wraps `model.compile` and `model.fit`, and sets up logging for TensorBoard.
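
The `patience` argument passed to `compile_and_fit()` suggests early stopping on the validation loss, although the helper's source is not shown here, so that is an assumption. Keras offers this behavior through `tf.keras.callbacks.EarlyStopping`, which also takes a `patience` argument. The plain-Python sketch below illustrates only the stopping rule; `stopping_epoch` is a hypothetical function and the loss values are made up.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0   # improvement: reset the counter
        else:
            wait += 1              # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses, one per epoch.
losses = [0.90, 0.70, 0.65, 0.66, 0.68, 0.60]
print(stopping_epoch(losses, patience=2))  # 5
```

Under this rule, a run capped at 2 epochs with `patience=2` cannot stop early, because at most one epoch can fail to improve on the best loss.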

In [None]:
# Load the TensorBoard notebook extension.
%load_ext tensorboard
import tensorboard
# Clear any logs from previous runs
!rm -rf ./logs/

In [None]:
from script.RNN.compilefit import compile_and_fit
history = compile_and_fit(model=lstm_model, window=wide_window,
                          patience=2, MAX_EPOCHS=2)

val_performance['LSTM'] = lstm_model.evaluate(wide_window.val)
performance['LSTM'] = lstm_model.evaluate(wide_window.test, verbose=0)
#IPython.display.clear_output()

### Visualizing the architecture
- `lstm_model.summary()`
- TensorBoard Graphs dashboard

In [None]:
#Op-level graph
%tensorboard --logdir logs

In [None]:
lstm_model.summary()

In [None]:
wide_window.plot(lstm_model)