# Signal Processing (LSTM Training)

Now that the data has been processed properly, the challenge is to design, train and test the Long Short-Term Memory (LSTM) network that predicts the classifications we previously extracted.

For reference, here again is the image of the full Deep Deterministic Policy Gradient (DDPG) Reinforcement Learning (RL) architecture we are trying to build.  Please see the full [2nd report](docs/report2.pdf) for a complete description of this network.

![DDPG](docs/ddpg.png "DDPG")

As you can see, there is an LSTM in both the actor and critic networks.  Before the full DDPG can be implemented, we must confirm that the LSTM can provide satisfactory signals; otherwise the DDPG will just be running on noise.

## 0. Load some necessary modules

In [15]:
from copy import deepcopy
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import pickle
import random
import seaborn as sns
import time
import yaml

%matplotlib inline

## 1. Load and encode the timeseries data 
Each data _point_ at time `t` is actually a series of `k` standardized return values from `t-k` to `t` (`x`) and an array of classifications, one for each `dt` in `prediction_days` (`y`).  Each data point is therefore treated as an independent event, and the data points will be processed in a random order.  So our first task is to properly structure the timeseries data into independent events.

Here is an illustration of this transformation for a single asset with two different numbers of time values.

![embedding](docs/embedding.png "embedding")

Load the previously saved data and unpack it.

In [31]:
with open('training-data-raw.pkl', 'rb') as f:
    data = pickle.load(f)
x_raw = data['x']
y_raw = data['y']

Load the settings file as well.

In [32]:
with open('settings.yml') as f:
    settings = yaml.safe_load(f)

### 1.1 Process the input data `x`
For each asset, let's create the return matrix with one row for each `t` and one column for each of the `k` embedded time steps.

In [21]:
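k = settings['embedding_days']  # number of past returns per data point
n = x_raw.shape[0] - 1          # index of the last row in the raw data
x_all = {}
for i, asset in enumerate(x_raw.columns):
    # One row per usable time t, one column per lag (column 0 = latest return).
    x = np.empty((n - k + 2, k))
    for k_i in range(k):
        # Column k_i holds the raw series shifted back by k_i steps.
        x[:, k_i] = x_raw.iloc[(k - k_i - 1):(n + 1 - k_i), i].values
    x_all[asset] = x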
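
To make the embedding concrete, here is a minimal sketch on a toy series. It is not the code used in this notebook: the helper name `embed_series` and the toy values are made up for illustration, and it assumes NumPy 1.20 or later for `sliding_window_view`. It should produce the same latest-first layout as the loop above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def embed_series(values, k):
    """Return a (len(values) - k + 1, k) matrix of k-step windows, latest value first."""
    values = np.asarray(values, dtype=float)
    # Row t of `windows` is values[t:t + k], ordered oldest first.
    windows = sliding_window_view(values, k)
    # Reverse each row so column 0 holds the latest value, then copy
    # because sliding_window_view returns a read-only view.
    return windows[:, ::-1].copy()


# Hypothetical stand-in for one asset's standardized returns.
toy = np.arange(6.0)
embedded = embed_series(toy, k=3)
print(embedded)
```

With 6 raw values and `k = 3` this gives 4 rows, which matches the `n - k + 2` row count used above (`n = 5`).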

Quick sanity check on the data.  The result should be exactly 0.0, i.e. max(abs(delta)) = 0.0.
Note how the embedded values are in reverse order (latest first) relative to the raw data (oldest first).
This is a brute-force check, so you can set `run_test = False` to save time if you have faith.

In [30]:
run_test = True
if run_test:
    max_delta = 0.0
    for i, asset in enumerate(x_raw.columns):
        for t in range(x_all[asset].shape[0]):
            max_delta = max(max_delta, np.abs(x_raw.iloc[t:(t+k), i].values - x_all[asset][t, ::-1]).max())
    print(max_delta)

0.0

### 1.2 Process the target values `y`
For each time in `x` we need the classification for each of the prediction lengths `dt`.  Therefore `y` will be a matrix with `len(prediction_days)` columns and, like `x`, `(n-k+2)` rows.

In [36]:
y_all = {}
for i_asset, asset in enumerate(x_raw.columns):
    y = np.empty((n-k+2, len(settings['prediction_days'])))
    for t in range(x_all[asset].shape[0]):
        for i_dt, dt in enumerate(settings['prediction_days']):
            y[t, i_dt] = y_raw[dt].iloc[t+k-1, i_asset]
    y_all[asset] = y
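
The same alignment can be sketched without the inner loops. This is only an illustration under stated assumptions: the helper `build_targets` and the toy frames are hypothetical, and it assumes that each `y_raw[dt]` is a DataFrame with one column per asset and the same rows as `x_raw`. Dropping the first `k - 1` rows lines each target row up with the last time step of the matching embedded row in `x`.

```python
import numpy as np
import pandas as pd


def build_targets(y_by_horizon, asset, k, horizons):
    """Stack one target column per prediction horizon, skipping the first k - 1 rows."""
    columns = [y_by_horizon[dt][asset].to_numpy()[k - 1:] for dt in horizons]
    return np.column_stack(columns)


# Toy inputs (made up): 6 time steps, one asset "A", two horizons.
n_rows, k_toy = 6, 3
horizons = [1, 5]
y_toy = {dt: pd.DataFrame({"A": np.arange(n_rows) + 10 * dt}) for dt in horizons}

y_example = build_targets(y_toy, "A", k_toy, horizons)
print(y_example.shape)
```

Each row of `y_example` corresponds to `y_raw[dt].iloc[t + k - 1]` for the same `t` as in the loop above.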

## 2. Split data into training, validation and testing sets
The `test` set is not used during training; it is used only at the very end to evaluate how well the LSTM can predict unseen data.  For our purposes, the `test` set covers the period from the `training_end` date defined in the settings file to the end of our processed data.

The `train` set is the data actually used to train the LSTM, whereas the `val` set isn't directly used to train the LSTM but is used to evaluate the training process after each training epoch.  After each epoch we evaluate the model against the `train` and `val` sets.  We will continue to train as long as both the `train` and `val` errors decay, but we must stop if the `val` error begins to rise.  A falling `train` error combined with a rising `val` error indicates the model is starting to overfit the training data.  A good description of overfitting is contained in this [wiki page](https://en.wikipedia.org/wiki/Overfitting).
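
The stopping rule described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the training loop used later: the function name, the `patience` parameter and the loss values are all made up for the example.

```python
def early_stopping_epoch(val_losses, patience=1):
    """Return the index of the best epoch, stopping once the validation
    loss has failed to improve for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = -1
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Validation error is still falling: remember this epoch.
            best_loss, best_epoch, bad_epochs = loss, epoch, 0
        else:
            # Validation error rose or stalled: a possible sign of overfitting.
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_epoch


# Made-up validation losses: falling for three epochs, then rising.
val_history = [0.70, 0.62, 0.58, 0.59, 0.61]
stop_epoch = early_stopping_epoch(val_history)
print(stop_epoch)
```

A larger `patience` tolerates a few noisy epochs before stopping, at the cost of extra training time.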

We will follow the standard practice of splitting our non-test data 9:1 between the `train` and `val` sets.  Note also that we are randomizing the data within the `train` and `val` sets to eliminate any temporal biases.
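
As a rough sketch of this split on toy data (not the notebook's actual implementation): the helper `split_train_val_test`, its `val_fraction` and `seed` parameters, and the toy dates are hypothetical, and treating the `training_end` date itself as part of the `test` set is an assumption.

```python
import numpy as np


def split_train_val_test(x, y, dates, training_end, val_fraction=0.1, seed=0):
    """Hold out everything on or after `training_end` as the test set, then
    shuffle the remaining rows and split them between train and val."""
    dates = np.asarray(dates, dtype="datetime64[D]")
    is_test = dates >= np.datetime64(training_end)

    # Shuffle the non-test row indices to remove any temporal ordering.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(np.flatnonzero(~is_test))

    n_val = int(round(val_fraction * idx.size))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return (
        (x[train_idx], y[train_idx]),
        (x[val_idx], y[val_idx]),
        (x[is_test], y[is_test]),
    )


# Toy data (made up): 30 daily rows with 3 embedded values each.
dates = np.arange("2020-01-01", "2020-01-31", dtype="datetime64[D]")
x_toy = np.arange(30 * 3, dtype=float).reshape(30, 3)
y_toy = np.arange(30)

train, val, test = split_train_val_test(x_toy, y_toy, dates, "2020-01-21")
print(train[0].shape, val[0].shape, test[0].shape)
```

Fixing the seed keeps the shuffle reproducible between runs, and the `test` rows keep their original time order because they are never shuffled.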
