# Stock Trading Strategy Based on Neural Net Time Series Analysis

For background, please see our previous notebooks: lp_notebook.ipynb and nn_regression.ipynb.

The basic idea is to predict the price of a target stock from the historical values (price, volume, etc.) of a set of predictor stocks: an event happens, and it ripples through the predictors and then to the target. With our previous models we found this difficult, maybe even impossible. Maybe we will have better luck with neural networks (NN).

One possible benefit of NNs is that they make it easy to combine predictors. Previously we looked for individual stocks as predictors, expecting to find a way to combine them later. With the previous models it is even possible that two or more predictors could each fall short of a threshold on their own but exceed it together. With an NN we would find such predictors.

## Unbalanced Classes
In the previous analysis, we looked for spikes in the target that lasted 2 days and exceeded a 3% gain. That approach gave us about 10-20 events per year for an average stock. A NN classifier trained on data like that would fail, because it could be right about 90% of the time just by predicting "no price increase" every day. To get around that issue, we tried using a NN regression model to predict the stock price for every day.

### The Regression Model
In a sense the regression model worked really well, which is not all that surprising given that the past prices of the target stock were used for prediction, and the price of a stock does not change very much from day to day. However, as a stock-buying strategy the results were almost useless, because the model was usually a day late: its predicted moves came a day after the actual moves.

### Classification Model
So we return to a classification NN. One way to balance the classes is simply to lower the price threshold for entering the positive class. The problem with that approach is that it is not consistent with our hypothesis: small price increases are likely unrelated to any predictor, so adding them as positive samples just adds noise.

Furthermore, when we were making synthetic data to test the model, we discovered that our model is not designed to handle price increases that happen over multiple days. To accommodate this, we changed how we label the data. A label is positive if:

    1. (price[today] - price[yesterday]) / price[yesterday]  < threshold
    2. (price[tomorrow] - price[today]) / price[today]  >= threshold
    3. today is at least a convolution kernel width after the last positive label
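
The three labeling rules can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the notebook's `nn2.make_labels`: the function name `make_labels_sketch` is hypothetical, the threshold is passed in directly, and rule 3 is read as "at least `kernel_width` days since the last positive label".

```python
import numpy as np


def make_labels_sketch(prices, threshold, kernel_width):
    """Hypothetical re-implementation of the three labeling rules above.

    Day t is positive when today's gain is below the threshold, tomorrow's
    gain meets it, and the previous positive label is at least kernel_width
    days earlier.
    """
    prices = np.asarray(prices, dtype=float)
    labels = np.zeros(len(prices), dtype=int)
    last_pos = None
    # Each day t needs a yesterday (t - 1) and a tomorrow (t + 1)
    for t in range(1, len(prices) - 1):
        gain_today = (prices[t] - prices[t - 1]) / prices[t - 1]
        gain_tomorrow = (prices[t + 1] - prices[t]) / prices[t]
        far_enough = last_pos is None or t - last_pos >= kernel_width
        if gain_today < threshold and gain_tomorrow >= threshold and far_enough:
            labels[t] = 1
            last_pos = t
    return labels


print(make_labels_sketch([10, 10, 11, 11, 11, 12.5, 12.5], 0.05, 3))
```

With `threshold=0.05` and `kernel_width=3`, these prices are labeled positive on days 1 and 4. With `kernel_width=4`, day 4 is suppressed because it is only 3 days after day 1.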

Of course, the new labeling does not fix the class imbalance. There are multiple techniques for dealing with unbalanced classes. Throwing away data seems wrong, so whenever possible we prefer to synthesize data to balance the classes. In this case it is not clear how to do that, so we under-sample the majority class.
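
Random under-sampling of the majority class can be sketched as below. This shows the general technique only; it is an assumption, not the internals of `IntermixedWindowGenerator(..., balanced=True)`, and `undersample_majority` is a hypothetical helper.

```python
import numpy as np


def undersample_majority(labels, seed=0):
    """Return sorted indices of a class-balanced subset of binary labels."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    # Keep every sample of the smaller class and an equal-sized
    # random draw (without replacement) from the larger class
    n = min(len(pos), len(neg))
    keep_pos = rng.choice(pos, size=n, replace=False)
    keep_neg = rng.choice(neg, size=n, replace=False)
    # Sorting restores time order among the surviving samples
    return np.sort(np.concatenate([keep_pos, keep_neg]))


labels = [0] * 90 + [1] * 10
idx = undersample_majority(labels)
print(len(idx))
```

Returning sorted indices rather than a shuffled subset keeps the surviving samples in time order, which matters when they are later split into intermixed training and validation sets.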

## Train, Validate and Test
This data is not stationary, so we intermix the training and validation samples instead of drawing them from separate time periods. This is somewhat tricky because each sample spans a few time points, and we do not want any time point to appear in both a training and a validation sample.
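
One way to intermix without sharing time points is to cut time into blocks, send every few blocks to validation, and drop any window that straddles a block boundary. This is a hypothetical sketch (`intermixed_split`, `block_len` and `val_every` are illustrative names), not how `IntermixedWindowGenerator` necessarily works. Assuming `splits = [0.5, 0.75]` means 50% train, 25% validation and 25% test, `val_every=3` gives roughly the same 2:1 train/validation ratio.

```python
import numpy as np


def intermixed_split(n_points, width, block_len, val_every=3):
    """Assign sliding-window start indices to training or validation by time block.

    Time is cut into consecutive blocks of block_len points, and every
    val_every-th block goes to validation. A window covering
    [start, start + width) is kept only if it lies inside a single block, so
    no time point appears in both a training and a validation window.
    """
    train, val = [], []
    for start in range(n_points - width + 1):
        block = start // block_len
        last = start + width - 1
        if last // block_len != block:
            # The window straddles a block boundary; drop it
            continue
        if block % val_every == val_every - 1:
            val.append(start)
        else:
            train.append(start)
    return np.array(train, dtype=int), np.array(val, dtype=int)


train, val = intermixed_split(100, 3, 10, 3)
print(len(train), len(val))
```

The cost of this design is that `width - 1` windows per block are discarded; longer blocks waste less data but intermix more coarsely.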

In production, we will only be predicting the next day and we will be doing that sequentially. So the test data is not intermixed.

## Needles in a Haystack
There are 5000 stocks in the NASDAQ. The number of possible combinations of predictors for each stock is enormous. The ideal model would allow us to test many predictors at once. This is similar to image classification, where there is an enormous number of pixels. Convolution models work well for this. We will give it a try in our domain.
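
To put "enormous" in numbers: assuming roughly 5000 stocks, so 4999 candidate predictors for any one target, Python's `math.comb` counts the predictor sets of each size.

```python
from math import comb

# Candidate predictors for one target: every other stock in a ~5000-stock universe
n_candidates = 5000 - 1

# Number of distinct predictor sets of size 1, 2 and 3
counts = {size: comb(n_candidates, size) for size in (1, 2, 3)}
for size, count in counts.items():
    print(size, count)
```

Pairs alone number 12,492,501 and triples about 2.1 × 10^10, far too many to test one combination at a time.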

## 1D Convolution Model
Our hypothesis is that in the few days before the target stock price goes up, there will be a signal in the predictors. Thus each sample is a 2D (conv_window x n_predictors) tensor. With these samples, each 1D convolution filter has (conv_window x n_predictors) weights: every predictor has its own kernel (column) in the filter.

After the convolution layer, there needs to be at least one dense layer, with the number of inputs equal to the number of filters. If we let:

p = number of predictors

f = number of 1D convolution filters

k = number of time points per sample (conv_window)

then, ignoring bias terms,

number of weights = $p*f*k + f \approx p * f * k$

This approach seems to give too many degrees of freedom, which will exacerbate over-fitting and increase training time. Our intuition is that we do not need a unique kernel for each predictor in each filter. Rather, a small number of kernels, maybe at most 10, would work for any number of predictors.

To accomplish this, we changed the shape of each sample by stacking the conv_window points of each predictor end to end into a single column. Now each 1D convolution filter has only one kernel. To prevent mixing data from different predictors in this stacked configuration, we set the convolution stride to conv_window. The number of outputs from each filter is now equal to the number of predictors, so the dense layer has f * p inputs. The number of weights is:
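
The stacking and strided convolution can be sketched as a NumPy forward pass for a single sample. This is an illustration of the idea, not the notebook's `nn2.limited_filters_conv_model`; `stacked_conv_forward` is a hypothetical name. Like most deep-learning "convolutions", it is really a cross-correlation (the kernel is not flipped).

```python
import numpy as np


def stacked_conv_forward(sample, kernels, biases):
    """Forward pass of the 'one kernel per filter' convolution described above.

    sample:  (k, p) array, k time points for each of p predictors
    kernels: (f, k) array, one length-k kernel per filter
    biases:  (f,) array, one bias per filter
    Returns a (p, f) array: one output per predictor per filter.
    """
    k, p = sample.shape
    # Stack the predictors end to end: [pred0 t0..t(k-1), pred1 t0..t(k-1), ...]
    stacked = sample.T.reshape(-1)
    # A stride of k makes every window exactly one predictor's k points,
    # so no window mixes data from two predictors
    windows = stacked.reshape(p, k)
    # Every filter applies the same kernel to every predictor's window
    return windows @ kernels.T + biases


rng = np.random.default_rng(0)
sample = rng.normal(size=(3, 10))   # k = 3 time points, p = 10 predictors
kernels = rng.normal(size=(4, 3))   # f = 4 filters, each with one length-3 kernel
biases = rng.normal(size=4)
out = stacked_conv_forward(sample, kernels, biases)
print(out.shape)
```

Flattening the `(p, f)` output gives the `f * p` inputs of the dense layer.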

number of weights = $k*f + f*p \approx p * f$

We are using k = 3, so for a large number of predictors this modified 1D convolution has roughly a factor of 3 fewer weights.
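
The two weight counts can be checked with a little arithmetic. These helper functions are hypothetical; with `bias=True` they also count one bias per filter and one for the single dense output unit.

```python
def standard_conv_params(p, f, k, bias=True):
    """Conv1D with a (k x p) kernel per filter, followed by Dense(1) on f inputs."""
    conv = p * k * f + (f if bias else 0)
    dense = f + (1 if bias else 0)
    return conv + dense


def stacked_conv_params(p, f, k, bias=True):
    """One length-k kernel per filter (stride k), followed by Dense(1) on p * f inputs."""
    conv = k * f + (f if bias else 0)
    dense = f * p + (1 if bias else 0)
    return conv + dense


# This notebook's sizes: 10 predictors, 4 filters, 3 time points
print(standard_conv_params(10, 4, 3), stacked_conv_params(10, 4, 3))
```

With p = 10, f = 4 and k = 3, the stacked layout has 57 parameters, the same as the `Total params: 57` line in the model summary later in this notebook, versus 129 for the standard layout. The ratio only approaches k = 3 as p grows, because the terms that do not scale with p matter less.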

```python
%matplotlib widget

import matplotlib.pyplot as plt

import nn.nn as nn
import nn.nn2 as nn2
from my_utils.volatility import load_volatility
from nn.window_generators import IntermixedWindowGenerator

plt.rcParams['figure.figsize'] = [8, 5]

config = {
    'start_date': '20170103',
    'end_date': '20181231',
    'price_field': 'Open',
    'predictor_field': 'Open'
}
splits = [0.5, 0.75]
CONV_WIDTH = 3

# Load the most volatile stocks as predictors
results_dir = nn2.get_results_dir(config)

volatility = load_volatility(results_dir, config)
target = 'PLAG'
predictors = [x[0] for x in volatility[0:10]]
dataframe = nn.load_data(target, predictors, config)
n_predictors = dataframe.shape[1]
# About one filter per four predictors, clamped to the range [4, 10]
n_filters = max(min(10, int(n_predictors / 4)), 4)
print('n predictors: {}'.format(n_predictors))
print('n convolution filters: {}'.format(n_filters))
print(predictors)
```

```
n predictors: 10
n convolution filters: 4
['CUEN', 'VERB', 'VRME', 'NMTR', 'AEYE', 'PLAG', 'BLNK', 'ANY', 'ELOX', 'CREX']
```


The next step is to label the data using the method described above:

```python
# For this target we could not get more than 9% of the samples to be in the (+) class
labels, threshold, frac_pos = nn2.make_labels(dataframe.target, CONV_WIDTH, frac_positive=0.09)
print('Percent positive: {}'.format(100 * frac_pos))
```

```
Percent positive: 9.362549800796813
```


The predictor data varies wildly from stock to stock, and over time. Our hypothesis is that we are looking for localized ripples in the predictor. This suggests that the data can be normalized by taking the difference between successive time points and dividing by the value at one of those time points.
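
This normalization can be sketched in NumPy, assuming the divisor is the earlier of the two time points. `diff_norm_sketch` is a hypothetical stand-in, not the notebook's `nn2.diff_norm`. For a pandas DataFrame, `DataFrame.pct_change()` computes the same quantity, leaving NaN in the first row instead of dropping it.

```python
import numpy as np


def diff_norm_sketch(values):
    """Relative change between successive time points, column by column.

    values: (n_times, n_series) array. Row t of the result is
    (values[t + 1] - values[t]) / values[t], so the output has one row fewer.
    """
    values = np.asarray(values, dtype=float)
    return (values[1:] - values[:-1]) / values[:-1]


# Two series on very different scales end up on a comparable scale
print(diff_norm_sketch([[10, 100], [11, 90], [11, 99]]))
```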

```python
norm_df = nn2.diff_norm(dataframe)
```

Next we make the data samples. As mentioned above, we intermix the training and validation samples, and to balance the data we under-sample the majority class.

```python
conv_window = IntermixedWindowGenerator(norm_df, labels, splits, CONV_WIDTH, balanced=True)
model = nn2.limited_filters_conv_model(n_filters, CONV_WIDTH, n_predictors)
```

It is not obvious which loss function we should use. Not every price gain in the target is likely to be related to a predictor, so we expect a non-zero rate of false negatives, maybe as high as 50%. At the same time, false positives will either tie up our money in buys with meager gains or, worse, lose money. The way to avoid false positives is to set the classifier's decision threshold high, but this also increases false negatives. What is the right rate of false negatives? And if we set the threshold too high, we will not make any buys; we might as well skip all this and just put our money in a stock index fund.
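
To make the trade-off concrete, here is a toy example with made-up scores (not model output) showing how moving the decision threshold trades false positives for false negatives. `confusion_at_threshold` is a hypothetical helper.

```python
import numpy as np


def confusion_at_threshold(scores, labels, threshold):
    """Count false positives and false negatives for one decision threshold."""
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    predicted = scores >= threshold          # True means "buy"
    false_pos = int(np.sum(predicted & (labels == 0)))
    false_neg = int(np.sum(~predicted & (labels == 1)))
    return false_pos, false_neg


# Toy classifier scores and true labels (illustrative only)
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
labels = [0, 0, 1, 1, 0, 1]
for t in (0.3, 0.75, 0.95):
    print(t, confusion_at_threshold(scores, labels, t))
```

On these toy scores, raising the threshold from 0.3 to 0.75 removes both false positives but adds one false negative, and at 0.95 no buys are made at all.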

For now, we will go with binary cross-entropy as the loss function.
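
For reference, the quantity being minimized can be written in a few lines of NumPy. This is a plain sketch of the standard formula, not Keras's own implementation; the clipping constant `eps` is an arbitrary choice here.

```python
import numpy as np


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy: -mean(y * log(p) + (1 - y) * log(1 - p))."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip so log() never sees exactly 0 or 1
    y_prob = np.clip(np.asarray(y_prob, dtype=float), eps, 1 - eps)
    return float(-np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)))


# A confident false negative and an equally confident false positive cost the same
print(binary_cross_entropy([1], [0.2]), binary_cross_entropy([0], [0.8]))
```

Because it penalizes false positives and false negatives symmetrically, binary cross-entropy is a neutral default; it does not encode the asymmetric costs discussed above.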

```python
history = nn.compile_and_fit_classifier(model, conv_window, patience=2, max_epochs=200, verbose=1)
```

```
Epoch 1/200
Epoch 2/200
...
Epoch 200/200
```


Here is the summary of the model.

```python
model.summary()
```

```
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
permute (Permute)            (None, 10, 3)             0
_________________________________________________________________
reshape (Reshape)            (None, 1, 30, 1)          0
_________________________________________________________________
conv1d (Conv1D)              (None, 1, 10, 4)          16
_________________________________________________________________
flatten (Flatten)            (None, 40)                0
_________________________________________________________________
dense (Dense)                (None, 1)                 41
=================================================================
Total params: 57
Trainable params: 57
Non-trainable params: 0
_________________________________________________________________
```


Here are the results:

```python
# evaluate() returns [loss, metric_1, metric_2, ...]; index 7 is assumed to be
# the AUC metric configured in nn.compile_and_fit_classifier
auc_train = model.evaluate(x=conv_window.train[0], y=conv_window.train[1], verbose=0)[7]
auc_val = model.evaluate(x=conv_window.val[0], y=conv_window.val[1], verbose=0)[7]
print('Train AUC: {}'.format(auc_train))
print('Val   AUC: {}'.format(auc_val))
```

```
Train AUC: 0.8791866302490234
Val   AUC: 0.8603895902633667
```


The ROC AUC for training is somewhat higher than for validation, which suggests over-fitting.

This is confirmed by the plot of the loss functions:

```python
plt.plot(history.history['loss'], label='train')
plt.plot(history.history['val_loss'], label='val')
plt.legend()
plt.title('Loss')
```

[Figure: "Loss", training (train) and validation (val) loss curves over the training epochs]

Even with over-fitting, the validation ROC AUC is near enough to 0.5 to indicate that these stocks are not predictors for this target. This brings us back to the need to search the space of predictors efficiently.

## Efficiently Looking for Predictors
We can search the predictor space faster if we can test lots of predictors at once. How many should that be?

To answer that question, we wrote some code that modifies a predictor by adding a kernel (a fixed pattern) to it just before each positive label. When we ran the modified predictor along with the target through our model, we got a perfect fit of both the training and validation data.
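
The signal-injection test can be sketched as follows. This is a hypothetical reconstruction, not the code we actually used: it assumes the kernel occupies the `len(kernel)` time points ending at each positively labeled day, i.e. the same window the model sees for that sample.

```python
import numpy as np


def inject_signal(predictor, labels, kernel):
    """Return a copy of predictor with kernel added just before each positive label.

    The kernel is added to the len(kernel) points ending at each labeled day;
    labels too close to the start of the series are skipped.
    """
    predictor = np.asarray(predictor, dtype=float).copy()
    kernel = np.asarray(kernel, dtype=float)
    k = len(kernel)
    for t in np.flatnonzero(np.asarray(labels) == 1):
        if t - k + 1 >= 0:
            predictor[t - k + 1 : t + 1] += kernel
    return predictor


print(inject_signal(np.zeros(8), [0, 0, 1, 0, 0, 0, 1, 0], [1, 2, 3]))
```

Because labeling rule 3 keeps positive labels at least a kernel width apart, the injected copies of the kernel never overlap.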

This continued when we added 9 other predictors that did not predict this target. With 99 others, we got an AUC in the high 0.90s. With 499 others, the AUC was around 0.8. So we could probably search in blocks of 500.

## Running on Test Data
At this point there is no need to run the model on the test data, because we have not found a set of predictors that works on the validation data.

When we find a set of predictors that works, we will update the model weights every week as we run the model on the test data.
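
The planned weekly update can be sketched as a walk-forward loop. Everything here is an assumption for illustration: `weekly_schedule` and `walk_forward` are hypothetical, a week is taken as 5 trading days, and `update_weights` / `predict_day` stand in for whatever retraining and prediction code is eventually used.

```python
def weekly_schedule(n_test_days, days_per_week=5):
    """Split the test period into consecutive weeks as (start, end) pairs, end exclusive."""
    return [(s, min(s + days_per_week, n_test_days))
            for s in range(0, n_test_days, days_per_week)]


def walk_forward(n_test_days, update_weights, predict_day, days_per_week=5):
    """Update the weights before each week, then predict that week one day at a time."""
    predictions = []
    for start, end in weekly_schedule(n_test_days, days_per_week):
        update_weights(start)            # refit on all data before `start`
        for day in range(start, end):    # predict the next day, sequentially
            predictions.append(predict_day(day))
    return predictions


updates = []
preds = walk_forward(12, updates.append, lambda day: 2 * day)
print(updates, preds)
```

Passing the retraining and prediction steps in as callables keeps the schedule separate from the model code, so the same loop could drive a Keras model or a simpler baseline.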
