# Stock Trading Strategy Based on Neural Net Time Series Analysis

For background, please see our previous notebook: leading_pattern/notebook.ipynb. The work below is based on the ideas from [this page](https://www.tensorflow.org/tutorials/structured_data/time_series) on tensorflow.org.

The basic idea is to predict the price of a target stock based on historical values (price, volume, etc.) of a set of predictor stocks: an event happens, it ripples through the predictors, and then it reaches the target. With our previous strategy we found this to be difficult, maybe even impossible. Maybe we will have better luck with neural networks (NN).

One possible benefit of NNs is that they make it easy to combine predictors. Previously we looked for individual stocks as predictors, with the thought that we would later find a way to combine them. That method could miss two or more predictors that individually fall short of a threshold but together exceed it. A NN could find such a combination.

In the previous analysis, we looked for spikes in the target that lasted 2 days and exceeded a 3% gain. That approach gave us about 10-20 events/year for an average stock. Using a NN classifier on data like that would fail because the algorithm would be right more than 90% of the time just by predicting "no event" every day. To get around that issue (or kick the can further down the road?), we will try to predict tomorrow's price for every day. If the predicted price is some threshold above today's, we would buy. After that, we would sell when the price is predicted to go down.
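The buy/sell rule described above can be sketched in a few lines. This is only an illustration under our own assumptions: the function name `trade_signals` and the 1% `buy_threshold` are hypothetical and are not part of the `nn` module used later in this notebook.

```python
import numpy as np

def trade_signals(today, predicted_tomorrow, buy_threshold=0.01):
    """Return one of 'buy', 'sell', or 'hold' for each day.

    buy_threshold is a hypothetical fractional gain (0.01 = 1%).
    """
    today = np.asarray(today, dtype=float)
    predicted_tomorrow = np.asarray(predicted_tomorrow, dtype=float)
    # Predicted fractional change from today to tomorrow
    expected_return = (predicted_tomorrow - today) / today

    signals = []
    holding = False
    for r in expected_return:
        if not holding and r > buy_threshold:
            signals.append('buy')    # predicted gain exceeds the threshold
            holding = True
        elif holding and r < 0:
            signals.append('sell')   # price is predicted to go down
            holding = False
        else:
            signals.append('hold')
    return signals

prices = [10.0, 10.0, 10.5, 10.4]
predictions = [10.05, 10.5, 10.6, 10.0]
signals = trade_signals(prices, predictions)
print(signals)
```

In this toy series the rule buys on the second day, when a 5% gain is predicted, and sells on the last day, when a drop is predicted.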

One potential drawback of the approach is that we could tie our money up on small, single-day price swings. However, with our previous strategy we also tied our money up on false positives.

We could design a NN to do multi-day predictions, but at this point it is not clear we will even be able to do single-day predictions. So let's try to walk before we run.

## The Baseline Strategy
A simple and robust strategy is to say the price tomorrow will be the same as today. In fact, in a bull market, that strategy might make a lot of money.

Setting aside the idea of making money, this strategy also provides a baseline for how well we can predict prices. If we can't do significantly better than that, then our predictor is useless.
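As a rough illustration of what the baseline measures, the sketch below computes the mean absolute error of the "tomorrow equals today" forecast on a toy series. The helper name `persistence_mae` is our own and hypothetical. The notebook itself uses `nn.Baseline`, whose internals are not shown here.

```python
import numpy as np

def persistence_mae(series):
    """Mean absolute error of predicting that tomorrow equals today."""
    series = np.asarray(series, dtype=float)
    prediction = series[:-1]   # today's value, used as the forecast for tomorrow
    actual = series[1:]        # what tomorrow actually turned out to be
    return float(np.mean(np.abs(actual - prediction)))

example = [1.0, 1.5, 1.25, 1.75]
baseline_mae = persistence_mae(example)
print(baseline_mae)
```

Any trained model has to beat this number on held-out data before it is worth trading on.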

## Linear Model
With this model, we are saying that the price of the target tomorrow is a weighted sum of the predictors today.

In [1]:
%matplotlib widget

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import tensorflow as tf

from nn import nn
from nn.nn import MAX_EPOCHS


plt.rcParams['figure.figsize'] = [10, 5]

config = {
    'start_date': '20170103',
    'end_date': '20181231',
    'price_field': 'Open',
    'predictor_field': 'Open'
}
splits = [0.5, 0.75]

# Load some stocks
target = 'NPTN'  # this is automatically included in the predictors

# These are the ticker symbols for a random selection of stocks
predictors = ['MTL', 'AGR', 'PHD', 'ESS', 'CBOE', 'AGRO', 'FPF', 'CHMA', 'SEAS']
n_predictors = len(predictors) + 1
df = nn.load_data(target, predictors, config)
norm_and_split = nn.NormalizeAndSplit(df, splits, 'target')
train_df, val_df, test_df = norm_and_split.moving_avg(10)
mean_absolute_error = []
column_indices = {name: i for i, name in enumerate(df.columns)}

# Make a window that has one time as an input, one time as an output with a shift of one between them
input_width = 1
label_width = 1
shift = 1
single_step_window = \
    nn.WindowGenerator(input_width, label_width, shift, train_df, val_df, test_df, label_columns=['target'])

# Run baseline model to get performance -----------------------------------------------------------------------
baseline = nn.Baseline(label_index=column_indices['target'])
baseline.compile(loss=tf.losses.MeanSquaredError(), metrics=[tf.metrics.MeanAbsoluteError()])

# We do not need to "train" this model because it simply returns the current value as the prediction
perf = [
    'Baseline',
    baseline.evaluate(single_step_window.train, verbose=0)[1],
    baseline.evaluate(single_step_window.val, verbose=0)[1],
    baseline.evaluate(single_step_window.test, verbose=0)[1]
]
mean_absolute_error.append(perf)

# This is the linear model ------------------------------------------------------------------------------------
linear = tf.keras.Sequential([tf.keras.layers.Dense(units=1)])
nn.compile_and_fit(linear, single_step_window, max_epochs=300, patience=10)

perf = [
    'Linear',
    linear.evaluate(single_step_window.train, verbose=0)[1],
    linear.evaluate(single_step_window.val, verbose=0)[1],
    linear.evaluate(single_step_window.test, verbose=0)[1]
]
mean_absolute_error.append(perf)

single_step_window.plot_fit(linear, norm_and_split.de_normalize, 'Linear')

Epoch 1/300
Epoch 2/300
Epoch 3/300
Epoch 4/300
Epoch 5/300
Epoch 6/300
Epoch 7/300
Epoch 8/300
Epoch 9/300
Epoch 10/300
Epoch 11/300
Epoch 12/300
Epoch 13/300
Epoch 14/300
Epoch 15/300
Epoch 16/300
Epoch 17/300
Epoch 18/300
Epoch 19/300
Epoch 20/300
Epoch 21/300
Epoch 22/300
Epoch 23/300
Epoch 24/300
Epoch 25/300
Epoch 26/300
Epoch 27/300
Epoch 28/300
Epoch 29/300
Epoch 30/300
Epoch 31/300
Epoch 32/300
Epoch 33/300
Epoch 34/300
Epoch 35/300
Epoch 36/300
Epoch 37/300
Epoch 38/300
Epoch 39/300
Epoch 40/300
Epoch 41/300
Epoch 42/300
Epoch 43/300
Epoch 44/300
Epoch 45/300
Epoch 46/300
Epoch 47/300
Epoch 48/300
Epoch 49/300
Epoch 50/300
Epoch 51/300
Epoch 52/300
Epoch 53/300
Epoch 54/300
Epoch 55/300
Epoch 56/300
Epoch 57/300
Epoch 58/300
Epoch 59/300
Epoch 60/300
Epoch 61/300
Epoch 62/300
Epoch 63/300
Epoch 64/300
Epoch 65/300
Epoch 66/300
Epoch 67/300
Epoch 68/300
Epoch 69/300
Epoch 70/300
Epoch 71/300
Epoch 72/300
Epoch 73/300
Epoch 74/300
Epoch 75/300
Epoch 76/300
Epoch 77/300


[Figure: output of plot_fit for the 'Linear' model]

In [2]:
print(pd.DataFrame(mean_absolute_error, columns=['Model', 'train', 'val', 'test']))

      Model     train       val      test
0  Baseline  0.028716  0.031093  0.028820
1    Linear  0.028520  0.033959  0.029885


Whether or not this is a good fit depends on context. If this were temperature and you were deciding whether to wear a coat, it might be a pretty good fit.

However, if you look closely, you will see that the predictions are always a day late. This is because one of the predictors is yesterday's value of the target. The linear model would give the same results as the baseline model if the optimizer simply set the weight of the target to 1.0 and the rest to 0.0.
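The equivalence between the linear model and the baseline is easy to check numerically. The sketch below uses random stand-in data and assumes, hypothetically, that the target is column 0 of the feature matrix. It is not the notebook's actual data or model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_days, n_features = 50, 10
features = rng.normal(size=(n_days, n_features))  # stand-in for normalized inputs
target_index = 0  # hypothetical position of the target column

# A linear model is a weighted sum of the inputs plus a bias.
weights = np.zeros(n_features)
weights[target_index] = 1.0   # weight of 1.0 on the target, 0.0 on everything else
bias = 0.0

linear_prediction = features @ weights + bias
baseline_prediction = features[:, target_index]   # "tomorrow equals today"

same = np.allclose(linear_prediction, baseline_prediction)
print(same)
```

So the baseline is one point in the linear model's weight space, and a fit that stays close to that point will lag the target by a day in the same way.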

In many training sessions of 100 epochs of this model, we could not get the fit on the training set to be better than the baseline. Because a linear model with a mean-squared-error loss is convex, this more likely reflects slow convergence than a true local minimum. We needed to increase the maximum number of epochs to 300 to get a fit of the training data that is better than baseline.

## Linear Multistep Dense
The models above only used yesterday to predict today. Our hypothesis is that we could get a better prediction by using several days before today. To implement that, we need a new data window:

In [3]:
CONV_WIDTH = 3
input_width = CONV_WIDTH
label_width = 1
shift = 1
conv_window = nn.WindowGenerator(input_width, label_width, shift, train_df, val_df, test_df, label_columns=['target'])
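To make the window shape concrete, here is a minimal NumPy sketch of the slicing we assume such a window performs: 3 consecutive days of all columns as input, and the target column one day later as the label. The function `make_windows` is hypothetical and only mirrors our reading of `input_width`, `label_width` and `shift`. It is not the `nn.WindowGenerator` implementation.

```python
import numpy as np

def make_windows(data, label_column, input_width=3, shift=1):
    """Slice a (days, features) array into (inputs, labels) pairs."""
    data = np.asarray(data, dtype=float)
    n_windows = len(data) - input_width - shift + 1
    # Each input is input_width consecutive days of every feature.
    inputs = np.stack([data[i:i + input_width] for i in range(n_windows)])
    # Each label is the label column, `shift` days after the last input day.
    labels = np.array([data[i + input_width + shift - 1, label_column]
                       for i in range(n_windows)])
    return inputs, labels

data = np.arange(12.0).reshape(6, 2)   # 6 days, 2 features
inputs, labels = make_windows(data, label_column=0)
print(inputs.shape, labels)
```

With 6 days of data and a 3-day input, only 3 complete windows fit, which is why wider windows leave slightly fewer training examples.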

In this case, we are looking 3 days back on all 10 predictors for a total of 30 input values. The first dense layer connects all those inputs to 30 units using 900 weights (plus biases). It feeds that into another layer of 30 units before a summing unit. Here is the model:

In [4]:
n_units = n_predictors * CONV_WIDTH
multi_step_dense = tf.keras.Sequential([
    # Shape: (time, features) => (time*features)
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units=n_units, activation='relu'),
    tf.keras.layers.Dense(units=n_units, activation='relu'),
    tf.keras.layers.Dense(units=1),
    # Add back the time dimension.
    # Shape: (outputs) => (1, outputs)
    tf.keras.layers.Reshape([1, -1]),
])

nn.compile_and_fit(multi_step_dense, conv_window, patience=10, max_epochs=500)

perf = [
    'Multi step dense',
    multi_step_dense.evaluate(conv_window.train, verbose=0)[1],
    multi_step_dense.evaluate(conv_window.val, verbose=0)[1],
    multi_step_dense.evaluate(conv_window.test, verbose=0)[1]
]
mean_absolute_error.append(perf)
conv_window.plot_fit(multi_step_dense, norm_and_split.de_normalize, 'Multistep Dense')

Epoch 1/500
Epoch 2/500
Epoch 3/500
Epoch 4/500
Epoch 5/500
Epoch 6/500
Epoch 7/500
Epoch 8/500
Epoch 9/500
Epoch 10/500
Epoch 11/500
Epoch 12/500
Epoch 13/500
Epoch 14/500
Epoch 15/500
Epoch 16/500
Epoch 17/500
Epoch 18/500
Epoch 19/500
Epoch 20/500
Epoch 21/500


[Figure: output of plot_fit for the 'Multistep Dense' model]

In [5]:
print(pd.DataFrame(mean_absolute_error, columns=['Model', 'train', 'val', 'test']))

              Model     train       val      test
0          Baseline  0.028716  0.031093  0.028820
1            Linear  0.028520  0.033959  0.029885
2  Multi step dense  0.025834  0.033967  0.036070


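This model over-fits the data more than the linear model: its training error drops to 0.0258, but its test error rises to 0.0361, well above the baseline's 0.0288. This is not surprising given the number of parameters.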
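For a sense of scale, the sketch below counts the trainable parameters in the three dense layers, assuming 10 input columns and the 3-day window used above. The helper `dense_params` is our own. It uses the standard count for a fully connected layer (one weight per input per unit, plus one bias per unit).

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

n_predictors = 10                      # target plus 9 predictors (assumed one column each)
conv_width = 3
n_inputs = n_predictors * conv_width   # 30 flattened input values
n_units = n_inputs                     # 30 units per hidden layer

layer_params = [
    dense_params(n_inputs, n_units),   # first hidden layer
    dense_params(n_units, n_units),    # second hidden layer
    dense_params(n_units, 1),          # summing unit
]
total_params = sum(layer_params)
print(layer_params, total_params)
```

The training split covers only about one year of daily data, so a model with this many parameters has plenty of room to memorize noise.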

## 1D Convolution Model
Convolution models reduce the number of parameters by sharing them. In this case, fewer parameters could speed up training. Or it might let us use more predictors. Here is the model:

In [6]:
n_filters = n_predictors * 2
conv_model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(filters=n_filters, kernel_size=(CONV_WIDTH,), activation='relu'),
    tf.keras.layers.Dense(units=n_filters, activation='relu'),
    tf.keras.layers.Dense(units=1),
])
nn.compile_and_fit(conv_model, conv_window, patience=10, max_epochs=300)

perf = [
    'Conv',
    conv_model.evaluate(conv_window.train, verbose=0)[1],
    conv_model.evaluate(conv_window.val, verbose=0)[1],
    conv_model.evaluate(conv_window.test, verbose=0)[1]
]
mean_absolute_error.append(perf)

conv_window.plot_fit(conv_model, norm_and_split.de_normalize, 'Conv')

print(pd.DataFrame(mean_absolute_error, columns=['Model', 'train', 'val', 'test']))

Epoch 1/300
Epoch 2/300
Epoch 3/300
Epoch 4/300
Epoch 5/300
Epoch 6/300
Epoch 7/300
Epoch 8/300
Epoch 9/300
Epoch 10/300
Epoch 11/300
Epoch 12/300
Epoch 13/300
Epoch 14/300
Epoch 15/300
Epoch 16/300
Epoch 17/300
Epoch 18/300
Epoch 19/300
Epoch 20/300
Epoch 21/300
Epoch 22/300
Epoch 23/300
Epoch 24/300
Epoch 25/300
Epoch 26/300


[Figure: output of plot_fit for the 'Conv' model]

              Model     train       val      test
0          Baseline  0.028716  0.031093  0.028820
1            Linear  0.028520  0.033959  0.029885
2  Multi step dense  0.025834  0.033967  0.036070
3              Conv  0.025951  0.032015  0.033144


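The convolution model performs about as well as the multistep dense model: the training error is nearly identical (0.0260 vs. 0.0258), and the validation and test errors are slightly lower (0.0320 vs. 0.0340 and 0.0331 vs. 0.0361).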
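A rough parameter count helps compare the two architectures. The sketch assumes 10 input columns and uses the standard formulas for dense and 1D convolution layers. The helper names are our own.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units

def conv1d_params(kernel_size, n_channels, n_filters):
    """Weights plus biases of a 1D convolution layer."""
    return kernel_size * n_channels * n_filters + n_filters

n_predictors = 10            # assumed number of input columns
conv_width = 3
n_filters = n_predictors * 2

conv_total = (conv1d_params(conv_width, n_predictors, n_filters)
              + dense_params(n_filters, n_filters)
              + dense_params(n_filters, 1))

n_units = n_predictors * conv_width
dense_total = (dense_params(n_units, n_units)
               + dense_params(n_units, n_units)
               + dense_params(n_units, 1))
print(conv_total, dense_total)
```

Note that with a 3-day window and a kernel of width 3, the kernel is applied at only one position, so the smaller count here comes from using 20 filters instead of 30 units. The weight sharing itself would start to matter if the same kernel were slid over a longer input window.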

## Are Any of These Models Any Good?
We did not show it in this notebook, but we generated synthetic datasets that matched the models. For example, for the linear, synthetic data set, the value of the price today really was the weighted values from yesterday. Every model was able to fit the synthetic data to within round-off error. So the models are working as designed.
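The kind of sanity check described above can be sketched as follows for the linear case. This is not the code we ran: the data is random, the weights are made up, and ordinary least squares stands in for gradient-descent training of the Keras model.

```python
import numpy as np

rng = np.random.default_rng(42)
n_days, n_features = 200, 10
true_weights = rng.normal(size=n_features)   # hypothetical "true" weights

# Synthetic data: today's target really is a weighted sum of yesterday's values.
yesterday = rng.normal(size=(n_days, n_features))
today_target = yesterday @ true_weights

# Least squares swapped in for training; it should recover the weights.
fitted_weights, *_ = np.linalg.lstsq(yesterday, today_target, rcond=None)
max_weight_error = float(np.max(np.abs(fitted_weights - true_weights)))
print(max_weight_error)
```

If a model cannot recover a relationship that is known to exist in synthetic data, a poor fit on real data tells us nothing. If it can, a poor fit on real data says something about the data.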

All models worked about as well as baseline. Often they did slightly better on training, with slightly worse performance on validation and test data. This is likely due to over-fitting.

No model did better than baseline on the test data. By far the most likely explanation for this is that this target cannot be predicted based on previous values of the predictors. This is not surprising since the target and predictors were randomly selected.

So perhaps our models are good. They are telling us the target cannot be predicted from the predictors, which is very likely true.

## Next Steps
Of course, none of this is useful if we cannot find stocks with predictors. There are about 5000 stocks in the NASDAQ. There are roughly $2.7 \times 10^{30}$ distinct sets of 10 stocks that can be drawn from 5000. Searching this whole space is not possible. The simplest approach is just to randomly select targets and predictors. We could analyze each in about 10 seconds, or nearly 10k/day. We only need to find a few predictable targets. Maybe this approach is good enough.
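The size of the search space and the daily throughput can be checked with a few lines of standard-library Python. The 5000-stock universe and the 10 seconds per analysis are the rough figures from the text, not measured values.

```python
import math

n_stocks = 5000   # approximate universe size from the text
set_size = 10     # stocks per candidate set

# Number of distinct 10-stock sets that can be drawn from 5000 stocks
n_combinations = math.comb(n_stocks, set_size)

seconds_per_analysis = 10
seconds_per_day = 24 * 60 * 60
analyses_per_day = seconds_per_day // seconds_per_analysis

print(f"{n_combinations:.2e} possible sets")
print(f"{analyses_per_day} analyses per day")
```

Even at thousands of analyses per day, random sampling covers a vanishingly small fraction of the space, so it only works if predictable targets are reasonably common.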

However, there might be simple ways to speed this up. Instead of randomly selecting stocks, we might want to limit the selections to stocks with high volatility. These are the targets that have the most opportunities for gain in a short-term trading strategy. They are also the predictors that have "ripples".
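One simple way to rank candidates by volatility is the standard deviation of daily returns. The sketch below is an assumption about how we might do it. The function `rank_by_volatility`, the ticker names and the prices are all made up for illustration.

```python
import pandas as pd

def rank_by_volatility(prices: pd.DataFrame) -> pd.Series:
    """Rank columns (one per ticker) by the standard deviation of daily returns."""
    daily_returns = (prices / prices.shift(1) - 1).dropna()
    return daily_returns.std().sort_values(ascending=False)

# Toy prices for two hypothetical tickers
prices = pd.DataFrame({
    'CALM': [10.0, 10.1, 10.0, 10.1, 10.0],
    'WILD': [10.0, 12.0, 9.0, 13.0, 8.0],
})
ranking = rank_by_volatility(prices)
print(ranking)
```

We could then draw targets and predictors from the top of this ranking instead of from the whole market.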

Another possibility is to use the convolution model and increase the number of predictors.
