# 4.6 Validate LSTM Model

In this notebook we validate the results of the LSTM model trained in section 4.4. We also investigate how to modify the model's components so that it is viable as a live trading model. 

Currently, the model simply predicts whether the price moves up or down in the next bar. It does so with high accuracy, but two constraints stand in the way of trading it: bars that barely move, and the bid/ask spread. Because of the spread, if a bar moves less than the spread it is possible to be correct about the direction and still lose money on the trade. To ensure that the model generates a return, we need to be wary of trading when the price movement falls below the bid/ask spread. 
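
To make the spread constraint concrete, here is a minimal sketch with made-up quotes. The prices, the 2-cent spread, and the `round_trip_pnl` helper are illustrative assumptions, not part of the model code. The trade enters at one side of the spread and exits one bar later at the other side, so a correct "up" call still loses money whenever the mid-price move is smaller than the spread.

```python
def round_trip_pnl(mid_entry, mid_exit, spread, direction):
    """P&L per share of a one-bar trade that crosses the spread on entry and exit.

    direction: +1 for long, -1 for short. Hypothetical helper for illustration.
    """
    half = spread / 2.0
    if direction > 0:
        entry = mid_entry + half   # buy at the ask
        exit_ = mid_exit - half    # sell at the bid
        return exit_ - entry
    entry = mid_entry - half       # sell short at the bid
    exit_ = mid_exit + half        # buy back at the ask
    return entry - exit_

# Assumed numbers: 2-cent spread, mid moves up 1 cent, model correctly says "up".
pnl_small_move = round_trip_pnl(17.30, 17.31, spread=0.02, direction=+1)
# Same spread, but the mid moves up 5 cents.
pnl_large_move = round_trip_pnl(17.30, 17.35, spread=0.02, direction=+1)

print(f"small move: {pnl_small_move:+.4f}")
print(f"large move: {pnl_large_move:+.4f}")
```

In both cases the direction is right; only the second trade clears the spread.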

#### Steps

I) Confirm the model works in a lossless environment (w/o bid/ask spread)  
II) Modify the model to protect against minimal movement environments (w/ bid/ask spread)  

### I) Confirm the model works in a lossless environment (w/o bid/ask spread)  

In order to confirm the model behaves as expected, we will take the following steps: 

1) Investigate actual predictions from the model against validation data  
2) Using validation data, model returns by backtesting the system  
3) Modify the backtesting system to include some penalty for trading (transaction costs)  

#### 1) Investigate actual predictions from the model against validation data

i) Load a dataset (a single stock)  
ii) Split off the most recent data available as validation data that was not included in the train/test split to begin with  
iii) Predict on the data and look at the accuracy and a heatmap of the predictions  
iv) Plot the data with predictions  
v) Backtest the results of trading   

In [1]:
import pandas as pd
import numpy as np
from extract import load_set
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
from os import listdir
from os.path import isfile, join
from sklearn.preprocessing import StandardScaler, MinMaxScaler

pd.set_option('display.max_columns', 130)
pd.set_option('display.max_rows', 100)

In [2]:
data_dir = './data/prepared/august25screenfixed/'
suffix = ''

stocks = [f.split('.')[0] for f in listdir(data_dir) if isfile(join(data_dir, f))]
stocks[0]

'RRR'

In [200]:
stocks

['RRR',
 'RLGY',
 'HOME',
 'FDX',
 'MIK',
 'WSM',
 'NVDA',
 'DE',
 'TGT',
 'LULU',
 'FBHS',
 'XRX',
 'CX',
 'EAT',
 'DHI',
 'ICE',
 'EXPI',
 'CLNY',
 'IMVT',
 'ELAN',
 'PACB',
 'PENN',
 'REAL',
 'NKE',
 'BBY',
 'HTHT',
 'GRPN',
 'CZR',
 'GDDY',
 'LOW']

In [3]:
df = load_set(stocks[0], data_dir, suffix)
df.head()

Unnamed: 0,open,high,low,close,volume,datetime,date,hour,minute,min_num,SYMBOL,prev_close,diff_1,pct_change,log_return,%open,mesa_open,open_amp,open_omega,open_phase,open_offset,open_freq,open_period,open_maxcov,open_sin,time,open_angle,open_rad,open_rad2,open_sin2,open_cos2,open_cos,open_tan,open_tan2,open_xsinx,open_xcosx,open_sinxcosx,open_xsinxcosx,open_xsinx2,open_xcosx2,open_sinxcosx2,open_xsinxcosx2,open_xtanx,open_xtanx2,%high,mesa_high,high_amp,high_omega,high_phase,high_offset,high_freq,high_period,high_maxcov,high_sin,high_angle,high_rad,high_rad2,high_sin2,high_cos2,high_cos,high_tan,high_tan2,high_xsinx,high_xcosx,high_sinxcosx,high_xsinxcosx,high_xsinx2,high_xcosx2,high_sinxcosx2,high_xsinxcosx2,high_xtanx,high_xtanx2,%low,mesa_low,low_amp,low_omega,low_phase,low_offset,low_freq,low_period,low_maxcov,low_sin,low_angle,low_rad,low_rad2,low_sin2,low_cos2,low_cos,low_tan,low_tan2,low_xsinx,low_xcosx,low_sinxcosx,low_xsinxcosx,low_xsinx2,low_xcosx2,low_sinxcosx2,low_xsinxcosx2,low_xtanx,low_xtanx2,%close,mesa_close,close_amp,close_omega,close_phase,close_offset,close_freq,close_period,close_maxcov,close_sin,close_angle,close_rad,close_rad2,close_sin2,close_cos2,close_cos,close_tan,close_tan2,close_xsinx,close_xcosx,close_sinxcosx,close_xsinxcosx,close_xsinx2,close_xcosx2,close_sinxcosx2,close_xsinxcosx2,close_xtanx,close_xtanx2,decision,D2
8845,17.28,17.28,17.28,17.28,100,2020-08-25 22:48:00,2020-08-25,22,48,1368,RRR,17.3,-0.02,-0.001156,-0.001157,-0.001156,0.490588,0.001894,2.99995,141.151213,-0.0004,0.477457,2.09443,222173.7,-0.001211,8845,26675.70572,3.584091,4.061547,-0.001907,-0.605856,-0.903685,0.473836,1.31314,1e-06,0.001045,0.001095,-1e-06,2.204898e-06,0.0007,0.001156,-1.335851e-06,-0.000548,-0.001518,-0.001156,0.490591,-0.00194,2.802241,-332.993391,-0.000401,0.445991,2.2422,244042.8,0.001482,24452.828207,4.954177,5.400168,0.001098,0.634822,0.239439,-4.054941,-1.217125,-2e-06,-0.000277,0.000355,-4.103498e-07,-1e-06,-0.000734,0.000697,-8.057384e-07,0.004688,0.001407,-0.001156,0.490588,0.002575,2.990564,224.104713,-0.000389,0.475963,2.101003,144034.9,-0.001352,26675.646515,3.524886,4.00085,-0.00234,-0.653,-0.927438,0.403237,1.159812,2e-06,0.001072,0.001254,-1e-06,3e-06,0.000755,0.001528,-2e-06,-0.000466,-0.001341,-0.001156,0.490591,-0.002573,2.833824,-611.936535,-0.00039,0.451017,2.217211,172665.9,0.001653,24453.239631,5.365601,5.816618,0.000767,0.893118,0.60774,-1.306702,-0.503655,-2e-06,-0.000703,0.001004,-1e-06,-8.865898e-07,-0.001033,0.000685,-7.918291e-07,0.001511,0.000582,0.0,0.0
8844,17.3,17.3,17.3,17.3,200,2020-08-25 22:44:00,2020-08-25,22,44,1364,RRR,17.44,-0.14,-0.008028,-0.00806,-0.002882,0.489934,0.001881,1.287313,-270.196033,-0.000346,0.204882,4.880852,232257.6,-0.000633,8844,11114.801657,6.130034,6.334916,-0.000249,0.998662,0.988295,-0.15436,0.051777,2e-06,-0.002848,-0.000626,2e-06,7.176933e-07,-0.002878,-0.000249,7.167332e-07,0.000445,-0.000149,-0.008028,0.488493,-0.002108,2.808354,-387.061715,-0.000335,0.446963,2.23732,181936.0,-0.002103,24450.020219,2.146189,2.593152,-0.001434,-0.853339,-0.544164,-1.541776,-0.610962,1.7e-05,0.004368,0.001145,-9.187537e-06,1.2e-05,0.00685,0.001223,-9.820883e-06,0.012377,0.004905,-0.002882,0.489934,0.002457,3.00182,124.667322,-0.000327,0.477755,2.093125,210788.0,0.001151,26672.767225,0.645596,1.123351,0.001888,0.432664,0.798741,0.753279,2.083731,-3e-06,-0.002302,0.000919,-3e-06,-5e-06,-0.001247,0.000817,-2e-06,-0.002171,-0.006005,-0.008028,0.488493,-0.002719,2.838034,-649.214703,-0.000325,0.451687,2.213922,140893.5,-0.00199,24450.356426,2.482395,2.934083,-0.000885,-0.978547,-0.790484,-0.774819,-0.210541,1.6e-05,0.006346,0.001573,-1.3e-05,7.103888e-06,0.007855,0.000866,-6.951488e-06,0.00622,0.00169,0.0,0.0
8843,17.35,17.44,17.35,17.44,25,2020-08-25 22:36:00,2020-08-25,22,36,1356,RRR,17.35,0.09,0.005187,0.005174,0.0,0.496402,0.004198,3.106833,-802.280898,-0.000187,0.494468,2.022376,6295940.0,-0.002822,8843,26671.442713,5.60427,6.098738,-0.000956,0.983038,0.778254,-0.806869,-0.186568,-0.0,0.0,-0.002197,-0.0,-0.0,0.0,-0.00094,-0.0,-0.0,-0.0,0.005187,0.497884,-0.001594,2.843266,-695.427737,-1.2e-05,0.45252,2.209848,222953.1,0.000461,24447.572453,5.981608,6.434128,-0.000252,0.98863,0.954869,-0.311065,0.1521,2e-06,0.004953,0.00044,2.284356e-06,-1e-06,0.005128,-0.000249,-1.292442e-06,-0.001614,0.000789,0.0,0.497337,0.003319,3.069251,-470.566094,-0.00019,0.488486,2.04714,1029450.0,-0.003389,26670.820335,4.981891,5.470377,-0.0026,0.687462,0.266251,-3.620276,-1.056379,-0.0,0.0,-0.000902,-0.0,-0.0,0.0,-0.001788,-0.0,-0.0,-0.0,0.005187,0.499745,0.002211,2.999563,144.588277,-2.9e-05,0.477395,2.094701,341519.5,-0.001521,26669.720405,3.881962,4.359357,-0.002104,-0.345745,-0.73822,0.913766,2.713937,-8e-06,-0.003829,0.001123,6e-06,-1.091391e-05,-0.001793,0.000727,3.773424e-06,0.00474,0.014078,0.0,0.0
8842,17.35,17.35,17.35,17.35,4,2020-08-25 22:34:00,2020-08-25,22,34,1354,RRR,17.32,0.03,0.001732,0.001731,0.001732,0.491849,0.210609,3.140843,-1102.364681,-0.000269,0.499881,2.000478,2372299000.0,0.002702,8842,26668.965927,3.127483,3.627364,-0.0986,-0.884315,-0.9999,-0.014111,0.527968,5e-06,-0.001732,-0.002702,-5e-06,-0.0001707856,-0.001532,0.087194,0.0001510283,-2.4e-05,0.000914,0.001732,0.49248,0.170091,3.14082,-1102.168494,-0.000261,0.499877,2.000492,706510600.0,0.002159,26668.965812,3.127368,3.627245,-0.079657,-0.88437,-0.999899,-0.014225,0.527817,4e-06,-0.001732,-0.002158,-3.738379e-06,-0.000138,-0.001532,0.070446,0.0001220202,-2.5e-05,0.000914,0.001732,0.491378,0.377616,3.141009,-1103.833467,-0.000275,0.499907,2.000372,172314100.0,0.003632,26668.969689,3.131245,3.631152,-0.177844,-0.88254,-0.999946,-0.010348,0.532822,6e-06,-0.001732,-0.003632,-6e-06,-0.000308,-0.001529,0.156955,0.000272,-1.8e-05,0.000923,0.001732,0.491535,0.271152,3.140784,-1101.846602,-0.000272,0.499871,2.000515,9297269000.0,0.00353,26668.966016,3.127572,3.627443,-0.126889,-0.884278,-0.999902,-0.014022,0.52807,6e-06,-0.001732,-0.003529,-6e-06,-0.0002197851,-0.001532,0.112205,0.0001943511,-2.4e-05,0.000915,0.0,0.0
8841,17.32,17.32,17.32,17.32,499,2020-08-25 22:28:00,2020-08-25,22,28,1348,RRR,17.32,0.0,0.0,0.0,0.0,0.49098,0.244868,3.140783,-1101.835132,-0.000101,0.499871,2.000516,17649810000.0,-0.003312,8841,26665.825332,6.270073,6.769944,0.114439,0.883854,0.999914,-0.013113,0.529232,-0.0,0.0,-0.003311,-0.0,0.0,0.0,0.101148,0.0,-0.0,0.0,0.0,0.49098,0.002768,1.216969,339.262618,-0.001003,0.193687,5.162981,243913.8,0.000914,11098.481518,2.376266,2.569952,0.000494,-0.841015,-0.721156,-0.960642,-0.643285,0.0,-0.0,-0.000659,-0.0,0.0,-0.0,-0.000416,-0.0,-0.0,-0.0,0.0,0.491908,0.488526,3.141052,-1104.208483,-9.2e-05,0.499914,2.000344,21822430000.0,-0.004281,26665.829868,6.27461,6.774524,0.230398,0.881702,0.999963,-0.008576,0.535108,-0.0,0.0,-0.004281,-0.0,0.0,0.0,0.203142,0.0,-0.0,0.0,0.0,0.491907,0.003094,1.213933,372.387665,-0.001011,0.193203,5.175891,247825.3,0.001121,11104.769538,2.3811,2.574303,0.000651,-0.843361,-0.724496,-0.95139,-0.637151,0.0,-0.0,-0.000812,-0.0,0.0,-0.0,-0.000549,-0.0,-0.0,-0.0,0.0,0.0


In [4]:
#df.loc[6064:6060]

In [5]:
#df[df.isna().sum(axis=1) > 0]

In [6]:
df['target'] = df['%close'].shift(1)

In [7]:
df[['close','%close','target']]

Unnamed: 0,close,%close,target
8845,17.28,-0.001156,
8844,17.30,-0.008028,-0.001156
8843,17.44,0.005187,-0.008028
8842,17.35,0.001732,0.005187
8841,17.32,0.000000,0.001732
...,...,...,...
4,10.72,0.006573,-0.008396
3,10.65,0.000000,0.006573
2,10.65,-0.004673,0.000000
1,10.70,-0.027273,-0.004673
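
The rows are ordered newest first (index 8845 is the latest bar), which is why `shift(1)` rather than `shift(-1)` puts the following bar's return into `target`. A small pure-Python sketch of that alignment, using the first five `%close` values shown above; the `shift_down` helper is a hypothetical stand-in for `Series.shift(1)`:

```python
def shift_down(values, periods=1):
    """Mimic pandas Series.shift(periods) on a list: pad the top with None."""
    return [None] * periods + values[:len(values) - periods]

# %close for rows 8845..8841, newest bar first (values from the output above).
pct_close = [-0.001156, -0.008028, 0.005187, 0.001732, 0.000000]
target = shift_down(pct_close, 1)

for idx, ret, tgt in zip(range(8845, 8840, -1), pct_close, target):
    print(idx, ret, tgt)

# Row 8844 receives row 8845's return, i.e. the return of the bar that
# follows it in time. The newest row has no following bar, hence the NaN.
```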


In [8]:
df['open'][::-1].rolling(30).mean()[::-1]

8845    17.359000
8844    17.362333
8843    17.364333
8842    17.362667
8841    17.367000
          ...    
4             NaN
3             NaN
2             NaN
1             NaN
0             NaN
Name: open, Length: 8846, dtype: float64

In [9]:
e = df.shape[0]
l = e - 4000 - 59

e, l

(8846, 4787)

The train/test portion of the data (all rows before position `l`):

In [10]:
df.iloc[:l]['target']

8845         NaN
8844   -0.001156
8843   -0.008028
8842    0.005187
8841    0.001732
          ...   
4063    0.002493
4062   -0.005666
4061    0.002841
4060    0.000711
4059    0.003566
Name: target, Length: 4787, dtype: float64

In [107]:
df = load_set(stocks[0], data_dir, suffix)

for col in ['open','high','low','close']:
        df[col] = df[col] - df[col][::-1].rolling(30).mean()[::-1]

df = df.dropna(axis=0)

e = df.shape[0]
l = e - 4000 - 59

df['target'] = df['%close'].shift(1)

y_ = df.iloc[1:4001]['target']

In [497]:
X = np.zeros([4000,116,60])

df = load_set(stocks[0], data_dir, suffix)


for col in ['open','high','low','close']:
        df[col] = df[col] - df[col][::-1].rolling(30).mean()[::-1]

df = df.dropna(axis=0)
        
e = df.shape[0]
l = e - 4000 - 59


df['target'] = df['%close'].shift(1)

y_ = df.iloc[l:e-59]['target']
#y_ = df.iloc[1:4001]['target']
y = y_.to_numpy() 

drop_cols = ['datetime',
            'date',
            'min_num',
            'SYMBOL',
            'prev_close',
            'diff_1',
            'time',
            'decision',
            'open_maxcov',
            'high_maxcov',
            'low_maxcov',
            'close_maxcov',
            'pct_change',
            'D2',
            'target']

df.drop(drop_cols, axis=1, inplace=True)

# mx = MinMaxScaler()
# mx = mx.fit(df)
# scaled_features = mx.transform(df)
# df = pd.DataFrame(scaled_features, index=df.index, columns=df.columns)

dt = df.transpose()  # rows are features, columns are bars

# Each sample is a 60-bar window across all 116 features
for j, i in enumerate(range(1, 4001)):
    X[j] = dt.iloc[:, i:i+60].to_numpy()
    
y.shape, X.shape
l

4756

In [498]:
y = np.zeros([1,4000])

#for k, stock in enumerate(stocks):
df = load_set(stocks[0], data_dir, suffix)

df = df.dropna(axis=0)

e = df.shape[0]
l = e - 4000 - 59

df['y'] = df['%close'].shift(1)
y_ = df.iloc[1+4:4001+4]['y']
y[0] = y_.to_numpy() 
l

4760

In [513]:
df.iloc[1]

open                  17.3
high                  17.3
low                   17.3
close                 17.3
volume                 200
                   ...    
close_xtanx     0.00621988
close_xtanx2    0.00169012
decision                 0
D2                       0
y              -0.00115607
Name: 8844, Length: 131, dtype: object

In [512]:
df.iloc[5]

open                17.32
high                17.32
low                 17.32
close               17.32
volume                  1
                  ...    
close_xtanx     0.0297185
close_xtanx2    0.0751944
decision                0
D2                      0
y                       0
Name: 8840, Length: 131, dtype: object

In [499]:
#y = y.to_numpy()
y = y[0]
y.shape = (4000,1)
y

array([[ 0.        ],
       [ 0.01050175],
       [-0.00580046],
       ...,
       [ 0.00134409],
       [ 0.        ],
       [ 0.00813008]])

In [500]:
y_d = y.copy()
y_d[np.where(y_d > 0)] = 1
y_d[np.where(y_d < 0)] = 0
np.unique(y_d, return_counts=True)

(array([0., 1.]), array([2270, 1730]))

In [501]:
X.shape

(4000, 116, 60)

In [502]:
X = X[:,:,:10]
X.shape

(4000, 116, 10)

In [503]:
X = np.transpose(X, axes=(0,2,1))
X.shape

(4000, 10, 116)

In [504]:
model = keras.models.load_model('./data/Models/1min95acc_030920')

In [505]:
preds = model.predict(X)

In [506]:
preds

array([[9.9999988e-01, 1.6716103e-07],
       [1.8119322e-07, 9.9999976e-01],
       [9.9998832e-01, 1.1724805e-05],
       ...,
       [6.0427058e-03, 9.9395728e-01],
       [9.9999845e-01, 1.5581271e-06],
       [3.6261679e-07, 9.9999964e-01]], dtype=float32)

In [507]:
np.argmax(preds, axis=1)

array([0, 1, 0, ..., 1, 0, 1])

In [508]:
y_preds = np.argmax(preds, axis=1)

In [509]:
y_true = y_d.reshape(4000)

In [510]:
y_d

array([[0.],
       [1.],
       [0.],
       ...,
       [1.],
       [0.],
       [1.]])

In [511]:
from sklearn.metrics import accuracy_score

accuracy_score(y_true, y_preds)

0.945
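
Accuracy alone hides whether the errors are concentrated in one direction. Below is a minimal sketch of the 2×2 confusion counts that a prediction heatmap would be drawn from. The label lists are toy values, not model output, and `confusion_counts` is a hypothetical helper; `sklearn.metrics.confusion_matrix` uses the same true-by-predicted layout.

```python
def confusion_counts(y_true, y_pred, n_classes=2):
    """Return counts[t][p]: how often true class t was predicted as class p."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        counts[int(t)][int(p)] += 1
    return counts

# Toy labels (0 = down or flat, 1 = up); assumed values for illustration only.
y_true_demo = [0, 1, 0, 0, 1, 1, 0, 1]
y_pred_demo = [0, 1, 0, 1, 1, 0, 0, 1]

counts = confusion_counts(y_true_demo, y_pred_demo)
accuracy = sum(counts[k][k] for k in range(2)) / len(y_true_demo)

print(counts)    # rows: true class, columns: predicted class
print(accuracy)
```

In the notebook, `y_true` and `y_preds` would take the place of the toy lists.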

In [157]:
X_old = np.load('./data/prepared/august25screenfixed/numpy_matrices/X.npy')
X_old.shape

(30, 4000, 116, 60)

In [158]:
X_old[0].shape

(4000, 116, 60)

In [159]:
X_old = X_old[0]

In [160]:
X_old.shape, X.shape

((4000, 116, 60), (4000, 10, 116))

In [162]:
#X_old = X_old[:,:,:10]
X_old = np.transpose(X_old, axes=(0,2,1))
X_old.shape

(4000, 10, 116)

In [354]:
X_old == X

array([[[ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        ...,
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True]],

       [[ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        ...,
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True]],

       [[ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  True,  True],
        ...,
        [ True,  True,  True, ...,  True,  True,  True],
        [ True,  True,  True, ...,  True,  Tr

In [165]:
y_old = np.load('./data/prepared/august25screenfixed/numpy_matrices/y_br.npy')
y_old[0].shape

(4000,)

In [166]:
y_old = y_old[0]

In [187]:
y_oldd = y_old.copy()
y_oldd[np.where(y_oldd > 0)] = 1
y_oldd[np.where(y_oldd < 0)] = 0
np.unique(y_oldd, return_counts=True)

(array([0., 1.]), array([2217, 1783]))

In [188]:
y_oldtrue = y_oldd.reshape(4000)

In [189]:
accuracy_score(y_oldtrue, y_preds)

0.9665

In [190]:
y_oldtrue == y_true

array([False, False,  True, ...,  True, False, False])

In [355]:
y == y_old

array([[False, False,  True, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False]])

In [356]:
y[:10]

array([[-0.00141743],
       [-0.00070822],
       [-0.00035398],
       [-0.00035386],
       [ 0.00070822],
       [ 0.00212917],
       [-0.00141743],
       [-0.00070822],
       [ 0.        ],
       [-0.00423131]])

In [357]:
y_old[:10]

array([ 0.00070822,  0.00212917, -0.00141743, -0.00070822,  0.        ,
       -0.00423131,  0.00070572,  0.        ,  0.        ,  0.00247612])
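
Comparing the two printouts, `y[4:10]` matches `y_old[0:6]`, which suggests that the two label arrays differ by a four-row offset rather than in content. A minimal sketch for detecting such an offset, using only the ten values printed above; `find_offset` is a hypothetical helper, and ten values are weak evidence, so the same check is worth repeating on the full arrays.

```python
import math

def find_offset(a, b, max_shift=8, min_overlap=3):
    """Return the smallest k >= 0 such that a[k:] equals b[:len(a)-k], else None."""
    for k in range(max_shift + 1):
        overlap = len(a) - k
        if overlap < min_overlap:
            break
        if all(math.isclose(x, z, abs_tol=1e-9)
               for x, z in zip(a[k:], b[:overlap])):
            return k
    return None

# First ten values of each array, copied from the outputs above.
y_new_head = [-0.00141743, -0.00070822, -0.00035398, -0.00035386, 0.00070822,
              0.00212917, -0.00141743, -0.00070822, 0.0, -0.00423131]
y_old_head = [0.00070822, 0.00212917, -0.00141743, -0.00070822, 0.0,
              -0.00423131, 0.00070572, 0.0, 0.0, 0.00247612]

offset = find_offset(y_new_head, y_old_head)
print(offset)
```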

In [358]:
y.shape, y_old.shape

((4000, 1), (4000,))

In [198]:
y.shape = (4000)

In [199]:
y == y_old

array([False, False, False, ..., False, False, False])

In [201]:
y_old_and_new = np.load('./data/prepared/august25screenfixed/numpy_matrices/y_br2.npy')

In [204]:
y_old_and_new[0] == y_old

array([ True,  True,  True, ...,  True,  True,  True])

In [224]:
y_old_and_new[0][:10]

array([ 0.00070822,  0.00212917, -0.00141743, -0.00070822,  0.        ,
       -0.00423131,  0.00070572,  0.        ,  0.        ,  0.00247612])

In [222]:
yfl = np.zeros([1,4000])

df = load_set(stocks[0], data_dir, suffix)
    
df = df.dropna(axis=0)

e = df.shape[0]
l = e - 4000 - 59

df['y'] = df['%close'].shift(1)
y_fl = df.iloc[l:e-59]['y']
yfl[0] = y_fl.to_numpy() 

In [223]:
yfl[0]

array([ 0.00070822,  0.00212917, -0.00141743, ..., -0.00142653,
        0.00142857, -0.00190114])

In [245]:
y_fl[::-1]

84     -0.001901
85      0.001429
86     -0.001427
87     -0.000476
88      0.002382
          ...   
4080    0.000000
4081   -0.000708
4082   -0.001417
4083    0.002129
4084    0.000708
Name: y, Length: 4000, dtype: float64

In [359]:
y_

TypeError: cannot concatenate object of type '<class 'numpy.ndarray'>'; only Series and DataFrame objs are valid

#### 2) Using validation data, model returns by backtesting the system
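
A minimal sketch of what a lossless backtest could look like, assuming one trade per bar: go long when the predicted class is 1, go short when it is 0, and earn the predicted bar's fractional return with the matching sign. The function name and the toy inputs are assumptions; in the notebook the inputs would correspond to `y_preds` and the continuous returns in `y`.

```python
def backtest_lossless(pred_classes, next_returns):
    """Compound per-bar returns of a long/short strategy with no trading costs.

    pred_classes: 1 = predicted up (go long), 0 = predicted down (go short).
    next_returns: fractional return of the bar being predicted.
    Returns the equity curve, starting from 1.0.
    """
    equity = [1.0]
    for cls, ret in zip(pred_classes, next_returns):
        position = 1 if cls == 1 else -1
        equity.append(equity[-1] * (1.0 + position * ret))
    return equity

# Toy inputs (assumed): three correct calls followed by one wrong call.
preds_demo = [1, 0, 1, 1]
returns_demo = [0.002, -0.001, 0.003, -0.004]

curve = backtest_lossless(preds_demo, returns_demo)
print(f"final equity: {curve[-1]:.6f}")
```

Because the data here is stored newest first, the inputs would need to be reversed into chronological order before compounding.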

#### 3) Modify the backtesting system to include some penalty for trading (transaction costs)
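
A minimal sketch of a backtest with a trading penalty, assuming a fixed fractional cost is charged on every bar traded. The `cost_per_trade` value is a placeholder standing in for spread plus fees, not a measured figure, and the function name and toy inputs are likewise assumptions.

```python
def backtest_with_costs(pred_classes, next_returns, cost_per_trade=0.0005):
    """Long/short backtest that subtracts a fixed fractional cost on every bar.

    pred_classes: 1 = go long, 0 = go short.
    next_returns: fractional return of the bar being predicted.
    cost_per_trade: assumed round-trip cost as a fraction of position value.
    """
    equity = [1.0]
    for cls, ret in zip(pred_classes, next_returns):
        position = 1 if cls == 1 else -1
        net = position * ret - cost_per_trade
        equity.append(equity[-1] * (1.0 + net))
    return equity

# Toy inputs (assumed): all four directional calls are correct.
preds_demo = [1, 0, 1, 0]
returns_demo = [0.0020, -0.0010, 0.0003, -0.0002]

free = backtest_with_costs(preds_demo, returns_demo, cost_per_trade=0.0)
costly = backtest_with_costs(preds_demo, returns_demo, cost_per_trade=0.0005)

print(f"no costs:   {free[-1]:.6f}")
print(f"with costs: {costly[-1]:.6f}")
```

The last two bars move less than the assumed cost, so they reduce equity even though the direction was called correctly.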

### II) Modify the model to protect against minimal movement environments (w/ bid/ask spread)
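
One possible modification, sketched here under assumptions, is to stop treating every bar as tradable: bars whose absolute return is smaller than the spread (expressed as a fraction of price) are relabeled as a third "no trade" class. The threshold value and the `label_with_deadband` helper are hypothetical, and this notebook does not commit to this approach.

```python
def label_with_deadband(returns, spread_frac):
    """Map returns to 1 (up), 0 (down) or 2 (no trade) using a spread-sized dead band."""
    labels = []
    for r in returns:
        if r > spread_frac:
            labels.append(1)
        elif r < -spread_frac:
            labels.append(0)
        else:
            labels.append(2)   # move too small to cover the spread
    return labels

# First five %close values from the RRR sample; 0.002 is an assumed spread fraction.
sample_returns = [-0.001156, -0.008028, 0.005187, 0.001732, 0.0]
labels = label_with_deadband(sample_returns, spread_frac=0.002)
print(labels)
```

The loaded model outputs two classes, so using a third label would mean retraining; the same threshold could instead be applied as a filter at trade time.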