# Signal Processing (LSTM Training)

Now that the data has been processed, the challenge is to design, train and test the Long Short-Term Memory (LSTM) network that predicts the classifications we extracted previously.

For reference, here again is the full Deep Deterministic Policy Gradient (DDPG) Reinforcement Learning (RL) architecture we are building. See the full [2nd report](docs/report2.pdf) for a complete description of this network.

![DDPG](docs/ddpg.png "DDPG")

As the diagram shows, both the actor and the critic networks contain an LSTM. Before the full DDPG can be implemented, we must confirm that the LSTM produces useful signals; otherwise the DDPG will simply be learning from noise.

## 0. Load the necessary modules

```python
import pickle

import numpy as np
import pandas as pd
import torch
import yaml
from sklearn.model_selection import train_test_split
```

## 1. Load and encode the time series data
Each data _point_ at time `t` consists of a series of `k` standardized return values from `t-k+1` to `t` (`x`) and an array of classifications, one for each `dt` in `prediction_days` (`y`). Because every data point carries its own history window, the points can be treated as independent events and processed in random order. Our first task is therefore to restructure the time series data into these independent events.

Here is an illustration of this transformation for a single asset and two different numbers of time steps.

![embedding](docs/embedding.png "embedding")

Load the previously saved data and unpack it into NumPy arrays.

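```python
# data['x'] holds the standardized returns (dates x assets);
# data['y'] holds one table of classifications per prediction horizon dt
with open('training-data-raw.pkl', 'rb') as f:
    data = pickle.load(f)
x_raw = data['x'].values
names = data['x'].columns.tolist()
dates = data['x'].index.tolist()
dt_values = list(data['y'].keys())

# Stack the per-horizon classifications into a (dates, assets, horizons) array
y_raw = np.empty((len(dates), len(names), len(dt_values)))
for i, dt in enumerate(dt_values):
    y_raw[:, :, i] = data['y'][dt].values
```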
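As a sketch (not the author's code), the same `(dates, assets, horizons)` array can be built in one call with `np.stack` along a new last axis. The toy `y_tables` dictionary below is hypothetical and only mimics the assumed layout of `data['y']`: one DataFrame of classifications per horizon `dt`, indexed by date with one column per asset. The horizons `1, 5, 20` are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data['y']: one (dates x assets) table per horizon dt
rng = np.random.default_rng(0)
toy_dates = pd.date_range("2020-01-01", periods=6, freq="D")
toy_assets = ["A", "B"]
y_tables = {
    dt: pd.DataFrame(rng.integers(0, 2, (6, 2)), index=toy_dates, columns=toy_assets)
    for dt in [1, 5, 20]
}
toy_dt_values = list(y_tables.keys())

# Loop version, as in the cell above
y_loop = np.empty((len(toy_dates), len(toy_assets), len(toy_dt_values)))
for i, dt in enumerate(toy_dt_values):
    y_loop[:, :, i] = y_tables[dt].values

# Vectorized version: stack the tables along a new last (horizon) axis
y_stacked = np.stack([y_tables[dt].to_numpy() for dt in toy_dt_values], axis=-1)
```

Unlike the pre-allocated float array in the loop, `np.stack` keeps the tables' own dtype (integers here), so an `.astype(float)` may be needed if a float array matters downstream.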

Also load the settings file, `settings.yml`, which holds parameters such as `embedding_days` and `training_end`.

```python
with open('settings.yml') as f:
    settings = yaml.safe_load(f)
```
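For illustration only, a hypothetical `settings.yml` might look like the string below. Only `embedding_days: 63` is implied by the array shapes printed later in this notebook; the `training_end` date is invented. One PyYAML detail worth knowing: `yaml.safe_load` turns an unquoted ISO date into a `datetime.date`, while a quoted date stays a plain string.

```python
import datetime

import yaml

# Hypothetical settings.yml contents; the date is made up for illustration
settings_text = """
embedding_days: 63
training_end: '2018-01-01'
"""
demo_settings = yaml.safe_load(settings_text)

# The same date without quotes is parsed into a datetime.date object
unquoted = yaml.safe_load("training_end: 2018-01-01")
```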

### 1.1 Process the input data `x`
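Both `x` and `y` are transformed into 3D arrays as shown below. Note that the first two dimensions (time and asset) are the same, but the third differs: it holds the `k` embedded lags for `x` and one classification per prediction horizon `dt` for `y`.

![shape](docs/data-shape.png "shape")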

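```python
k = settings['embedding_days']              # number of lagged returns per data point
n = x_raw.shape[0] - 1                      # index of the last raw observation
x = np.empty((n - k + 2, len(names), k))    # (time, asset, lag)
for k_i in range(k):
    # Lag k_i = 0 is the most recent return, lag k - 1 the oldest
    x[:, :, k_i] = x_raw[(k - k_i - 1):(n + 1 - k_i), :]
```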
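The loop above can also be written with NumPy's `sliding_window_view` (available since NumPy 1.20), which returns every length-`k` window along the time axis with the oldest value first; reversing the window axis gives the same latest-first layout. This is a sketch for comparison, checked here on random toy data rather than the real returns.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def embed_loop(x_raw, k):
    """Loop construction from the cell above: lag 0 is the most recent return."""
    n = x_raw.shape[0] - 1
    x = np.empty((n - k + 2, x_raw.shape[1], k))
    for k_i in range(k):
        x[:, :, k_i] = x_raw[(k - k_i - 1):(n + 1 - k_i), :]
    return x


def embed_windows(x_raw, k):
    """Same result via sliding windows over the time axis (axis 0)."""
    # Shape (T - k + 1, n_assets, k); each window is ordered oldest -> newest
    windows = sliding_window_view(x_raw, window_shape=k, axis=0)
    # Reverse the window axis so lag 0 is the most recent value, then copy
    # because sliding_window_view returns a read-only view
    return windows[:, :, ::-1].copy()


rng = np.random.default_rng(0)
x_raw_demo = rng.standard_normal((100, 5))   # hypothetical: 100 days, 5 assets
x_loop = embed_loop(x_raw_demo, 10)
x_fast = embed_windows(x_raw_demo, 10)
```

The windowed version avoids the Python loop over lags; with `k = 63` that loop is short anyway, so the main gain is readability.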

A quick sanity check on the data: every difference should be zero, i.e. `max(abs(delta)) == 0.0`.
Note that the embedded values are stored in reverse order (latest first) relative to the raw data (oldest first).

```python
# Compare each embedded window (flipped back to oldest-first) with the raw data
max_delta = 0.0
for t in range(x.shape[0]):
    max_delta = max(max_delta, np.abs(x_raw[t:(t+k), :] - x[t, :, ::-1].T).max())
max_delta
```

```
0.0
```

### 1.2 Process the target values `y`

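```python
# Drop the first k - 1 days so that y[i] lines up with the most recent day of window x[i]
y = y_raw[(k - 1):, :, :]
print(x.shape)
print(y.shape)
```

```
(2928, 70, 63)
(2928, 70, 3)
```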
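Row `i` of both `x` and `y` refers to raw day `i + k - 1`, so with `k = 63` the 2,928 rows printed above correspond to 2,928 + 63 - 1 = 2,990 days of raw returns. A minimal check of that alignment on hypothetical toy arrays (sizes chosen arbitrarily, names prefixed with `demo_` so they do not overwrite the notebook's variables):

```python
import numpy as np

# Hypothetical toy sizes: 50 days, 4 assets, 3 horizons, window of 7 lags
rng = np.random.default_rng(1)
n_days, n_assets, n_dt, demo_k = 50, 4, 3, 7
demo_x_raw = rng.standard_normal((n_days, n_assets))
demo_y_raw = rng.integers(0, 2, (n_days, n_assets, n_dt))

# Embedding as in section 1.1 (lag 0 = most recent day)
demo_n = n_days - 1
demo_x = np.empty((demo_n - demo_k + 2, n_assets, demo_k))
for k_i in range(demo_k):
    demo_x[:, :, k_i] = demo_x_raw[(demo_k - k_i - 1):(demo_n + 1 - k_i), :]

# Targets as in section 1.2
demo_y = demo_y_raw[(demo_k - 1):, :, :]

# Row i of both arrays should refer to raw day i + k - 1
aligned = all(
    np.array_equal(demo_x[i, :, 0], demo_x_raw[i + demo_k - 1])
    and np.array_equal(demo_y[i], demo_y_raw[i + demo_k - 1])
    for i in range(demo_x.shape[0])
)
```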


## 2. Split data into training, validation and testing sets
The `test` set is not used during training; it is only used at the very end to evaluate how well the LSTM predicts unseen data. For our purposes, the `test` set is all data from the `training_end` date defined in the settings file to the end of the processed data.

The `train` set is the data actually used to train the LSTM, whereas the `val` set is not used to fit the weights but to evaluate the training process. After each epoch we evaluate the model against both the `train` and `val` sets. We continue training as long as both the `train` and `val` errors decrease, but we must stop once the `val` error begins to rise: a falling `train` error combined with a rising `val` error indicates that the model is starting to overfit the training data. This [wiki page](https://en.wikipedia.org/wiki/Overfitting) gives a good description of overfitting.
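The stopping rule described above can be sketched as a small helper that scans the per-epoch validation losses and returns the epoch whose weights should be kept. The `patience` parameter is an assumption added for illustration (a common generalization); `patience=1` reproduces the rule of stopping as soon as the `val` error rises. The loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=1):
    """Return the index of the best epoch under a simple early-stopping rule.

    Training stops once the validation loss has failed to improve on its best
    value for `patience` consecutive epochs.
    """
    best_epoch, best_loss, bad_epochs = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, bad_epochs = epoch, loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break   # stop training here
    return best_epoch


# Hypothetical validation losses, one per epoch
val_losses = [0.70, 0.62, 0.58, 0.59, 0.55]
strict = early_stopping_epoch(val_losses)               # stops at the first rise
lenient = early_stopping_epoch(val_losses, patience=2)  # tolerates one bad epoch
```

With `patience=1` the single uptick at epoch 3 ends training and epoch 2 is kept; with `patience=2` the later improvement at epoch 4 is still reached. In practice this usually goes together with saving the model's weights at the best epoch.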

We follow the common practice of splitting the non-test data 90/10 between the `train` and `val` sets. Note also that the data is shuffled before being split into the `train` and `val` sets, to eliminate any temporal biases.


```python
# Number of rows on or after the 'training_end' date (label-based .loc slicing includes the start date)
n_test_days = data['x'].loc[settings['training_end']:, :].shape[0]

# Test data is simply all the data from the 'training_end' date onward.
x_test = x[-n_test_days:, :, :]
y_test = y[-n_test_days:, :, :]

# Extract the train and val data and then randomly split them 90/10
# (train_test_split shuffles by default)
x_train, x_val, y_train, y_val = train_test_split(x[:-n_test_days, :, :],
                                                  y[:-n_test_days, :, :],
                                                  test_size=0.1)
```
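A toy illustration of how `n_test_days` is computed: label-based `.loc` slicing on a date index includes the start label, so the `training_end` day itself lands in the `test` set. The dates and the `training_end` value below are hypothetical. Because the embedded `x` and `y` end on the same final day as `data['x']`, taking the last `n_test_days` rows of either selects the same calendar days.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data: 20 days (Jan 1 to Jan 20), one asset
toy_dates = pd.date_range("2020-01-01", periods=20, freq="D")
raw = pd.DataFrame(np.arange(20.0), index=toy_dates, columns=["asset"])

toy_training_end = "2020-01-11"                      # stand-in for settings['training_end']
toy_n_test = raw.loc[toy_training_end:, :].shape[0]  # Jan 11 to Jan 20, inclusive

values = raw.to_numpy()
test_part = values[-toy_n_test:]       # the last toy_n_test rows
non_test_part = values[:-toy_n_test]   # everything before training_end
```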

The data is now ready; let the real fun begin...

## 3. Define the LSTM network
Please see the PyTorch LSTM [documentation](https://pytorch.org/docs/stable/nn.html#lstm) for more detail.    

Special thanks for this component go to the Udacity Deep Learning [Nanodegree](https://www.udacity.com/course/deep-learning-nanodegree--nd101); much of this content is derived from the Nanodegree projects.

### 3.1 Check to see if running on a GPU

```python
train_on_gpu = torch.cuda.is_available()
if train_on_gpu:
    print('Training on GPU!')
else:
    print('No GPU available, training on CPU; ouch, consider making n_epochs very small.')
```

```
No GPU available, training on CPU; ouch, consider making n_epochs very small.
```
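A common alternative to the `train_on_gpu` flag (a sketch, not the approach used in this notebook) is a single `torch.device` object: modules and tensors are moved with `.to(device)` or created with `device=device`, so the same code runs on either device without `if`/`else` branches. The `nn.Linear(8, 2)` layer is only a placeholder.

```python
import torch
import torch.nn as nn

# Pick the GPU when available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Modules and tensors live on the chosen device; no branching needed
layer = nn.Linear(8, 2).to(device)
batch = torch.randn(4, 8, device=device)
out = layer(batch)
```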


### 3.2 Define the LSTM network

```python
import torch.nn as nn

class LSTM(nn.Module):

    def __init__(self, n_features, n_output, n_dt, k, n_layers=1, n_hidden=500, dropout=0.2):
        """
        Initialize the PyTorch LSTM module plus a linear layer that maps the LSTM output
        to one classification probability per asset and prediction horizon.

        Args:
            n_features (int): The number of input dimensions (1 if only 1 asset is modeled in isolation)
            n_output (int): The number of assets we are predicting
            n_dt (int): The number of prediction horizons
            k (int): The time embedding (not used by the layers themselves)
            n_layers (int): Number of LSTM layers
            n_hidden (int): Number of hidden nodes in the LSTM layers
            dropout (float): Dropout applied between stacked LSTM layers (only used when n_layers > 1)
        """

        super().__init__()

        # Set class attributes
        self.n_features = n_features
        self.n_dt = n_dt
        self.n_layers = n_layers
        self.n_hidden = n_hidden
        self.output_size = n_dt * n_output

        # Define the LSTM; PyTorch only applies dropout between layers, so disable it
        # for a single layer (nn.LSTM warns about non-zero dropout with num_layers=1)
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=n_hidden, num_layers=n_layers,
                            batch_first=True, dropout=dropout if n_layers > 1 else 0.0)

        # Linear fully-connected feed-forward layer followed by a sigmoid
        self.fc = nn.Linear(n_hidden, self.output_size)
        self.sig = nn.Sigmoid()

    def forward(self, x, hidden):
        """
        Forward propagation of the neural network

        Args:
            x (tensor): [batch_size, k, n_features] The input to the neural network
            hidden (tuple of tensor): The previous hidden state

        Returns:
            tensor: [batch_size, self.output_size] Output of the network
            tuple of tensor: (h_n, c_n) The latest hidden state
        """

        # Get LSTM output and updated hidden state
        out, hidden = self.lstm(x, hidden)

        # Stack up LSTM outputs: [batch_size * k, n_hidden]
        out = out.contiguous().view(-1, self.n_hidden)

        # Feed through the fully-connected layer and apply the sigmoid function to limit to [0, 1]
        out = self.sig(self.fc(out))

        # Reshape to be batch_size first, then keep only the output of the last time step
        out = out.view(x.shape[0], -1, self.output_size)
        out = out[:, -1]

        # Return the last sigmoid output and hidden state
        return out, hidden

    def init_hidden(self, batch_size):
        """
        Initialize the hidden state of an LSTM/GRU

        Args:
            batch_size: The batch_size of the hidden state

        Returns:
            tuple of tensor: (h_0, c_0), each of dims (n_layers, batch_size, n_hidden)
        """
        # Initialize hidden state with zeros (same dtype as the weights), and move to GPU if available
        weight = next(self.parameters()).data

        if train_on_gpu:
            hidden = (weight.new_zeros(self.n_layers, batch_size, self.n_hidden).cuda(),
                      weight.new_zeros(self.n_layers, batch_size, self.n_hidden).cuda())
        else:
            hidden = (weight.new_zeros(self.n_layers, batch_size, self.n_hidden),
                      weight.new_zeros(self.n_layers, batch_size, self.n_hidden))

        return hidden
```
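`forward()` pushes every time step through the linear layer and then discards all but the last. A small sketch with arbitrary toy sizes suggests that applying the layer to the last step only gives the same result while skipping the unnecessary matrix products; it uses `nn.LSTM` and `nn.Linear` directly rather than the class above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical toy sizes
batch_size, seq_len, n_feat, n_hid, n_out = 5, 12, 3, 16, 6

demo_lstm = nn.LSTM(input_size=n_feat, hidden_size=n_hid, batch_first=True)
demo_fc = nn.Linear(n_hid, n_out)
demo_x = torch.randn(batch_size, seq_len, n_feat)

# No hidden state passed, so nn.LSTM starts from zeros; output is (batch, seq_len, n_hid)
lstm_out, _ = demo_lstm(demo_x)

# As in forward(): apply the head to every time step, then keep the last step
all_steps = torch.sigmoid(demo_fc(lstm_out.contiguous().view(-1, n_hid)))
all_steps = all_steps.view(batch_size, -1, n_out)[:, -1]

# Equivalent and cheaper: apply the head to the last time step only
last_only = torch.sigmoid(demo_fc(lstm_out[:, -1, :]))
```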

### 3.3 Define the forward and backpropagation steps

```python
def forward_back_prop(lstm, optimizer, criterion, x, y, hidden, clip=5.0):
    """
    Forward and backward propagation on the neural network

    Args:
        lstm (LSTM): The LSTM object
        optimizer: The PyTorch optimizer for the neural network
        criterion: The PyTorch loss function
        x (tensor): [batch_size, k, n_features] The input to the neural network
        y (tensor): [batch_size, n_dt * n_output] The target classifications
        hidden (tuple of tensor): The previous hidden state
        clip (float): Value to clip gradients to avoid exploding LSTM gradients

    Returns:
        float: The loss for this training batch
        tuple of tensor: (h_n, c_n) The latest hidden state
    """

    if train_on_gpu:
        x, y = x.cuda(), y.cuda()

    # Detach the hidden state from its history, otherwise
    # we'd backprop through the entire training history
    hidden = tuple(each.detach() for each in hidden)

    # Zero accumulated gradients
    lstm.zero_grad()

    # Perform forward propagation, loss calculation and back propagation
    output, hidden = lstm(x, hidden)
    loss = criterion(output, y)
    loss.backward()

    # Clip the gradients and then perform the weight optimization
    nn.utils.clip_grad_norm_(lstm.parameters(), clip)
    optimizer.step()

    return loss.item(), hidden
```
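`forward_back_prop` expects `x` as `[batch_size, k, n_features]` and `y` as `[batch_size, n_dt * n_output]`, whereas the arrays from section 2 are laid out as `(time, asset, lag)` and `(time, asset, horizon)`. The sketch below shows one way to bridge that gap and run a few training steps. Several parts are assumptions rather than the author's method: reversing the lags so the LSTM reads the oldest return first (making the last time step the most recent day), `nn.BCELoss` with 0/1 targets, the Adam optimizer, and the hypothetical `TinyLSTM` stand-in with toy sizes. Because each window is treated as an independent event, it also starts every batch from a zero hidden state (what `nn.LSTM` does when no hidden state is passed) instead of carrying the state across shuffled batches.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
rng = np.random.default_rng(0)

# Hypothetical toy sizes; the real arrays are (N, 70, 63) for x and (N, 70, 3) for y
batch_size, n_assets, seq_len, n_dt, n_hidden = 8, 5, 10, 3, 32
x_batch = rng.standard_normal((batch_size, n_assets, seq_len))   # (batch, asset, lag), latest first
y_batch = rng.integers(0, 2, (batch_size, n_assets, n_dt))       # (batch, asset, horizon), 0/1 labels

# Reverse the lag axis (oldest first) and move time to axis 1 -> (batch, seq_len, asset)
inputs = torch.from_numpy(np.ascontiguousarray(x_batch[:, :, ::-1].transpose(0, 2, 1))).float()
# Flatten the targets to (batch, n_assets * n_dt) to match the width of the output layer
targets = torch.from_numpy(y_batch.reshape(batch_size, -1)).float()


class TinyLSTM(nn.Module):
    """Hypothetical stand-in for the LSTM class above: LSTM -> Linear -> Sigmoid."""

    def __init__(self, n_features, output_size, n_hidden):
        super().__init__()
        self.lstm = nn.LSTM(n_features, n_hidden, batch_first=True)
        self.fc = nn.Linear(n_hidden, output_size)

    def forward(self, x, hidden=None):
        out, hidden = self.lstm(x, hidden)   # hidden=None starts from zeros
        return torch.sigmoid(self.fc(out[:, -1, :])), hidden


model = TinyLSTM(n_assets, n_assets * n_dt, n_hidden)
criterion = nn.BCELoss()                      # assumption: binary 0/1 targets
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

losses = []
for step in range(100):
    optimizer.zero_grad()
    probs, _ = model(inputs)                  # fresh zero hidden state for each batch
    loss = criterion(probs, targets)
    loss.backward()
    nn.utils.clip_grad_norm_(model.parameters(), 5.0)
    optimizer.step()
    losses.append(loss.item())
```

With the real data, `n_features` would be the 70 assets and the flattened targets would have 70 × 3 = 210 columns. Repeatedly fitting one fixed batch, as here, only shows that the wiring works; it says nothing about how well the model generalizes.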