# DeepLOB: Deep Convolutional Neural Networks for Limit Order Books

### Authors: Zihao Zhang, Stefan Zohren and Stephen Roberts
Oxford-Man Institute of Quantitative Finance, Department of Engineering Science, University of Oxford

This Jupyter notebook demonstrates our recent paper [2], published in IEEE Transactions on Signal Processing. We use the FI-2010 dataset [1] and show how the model architecture is constructed. FI-2010 is publicly available, and interested readers can consult the accompanying paper [1]. The dataset can be downloaded from: https://etsin.fairdata.fi/dataset/73eb48d7-4dbc-4a10-a52a-da745b47a649 

Alternatively, it can be obtained from: https://drive.google.com/drive/folders/1Xen3aRid9ZZhFqJRgEMyETNazk02cNmv?usp=sharing


[1] Ntakaris A, Magris M, Kanniainen J, Gabbouj M, Iosifidis A. Benchmark dataset for mid‐price forecasting of limit order book data with machine learning methods. Journal of Forecasting. 2018 Dec;37(8):852-66. https://arxiv.org/abs/1705.03233

[2] Zhang Z, Zohren S, Roberts S. DeepLOB: Deep convolutional neural networks for limit order books. IEEE Transactions on Signal Processing. 2019 Mar 25;67(11):3001-12. https://arxiv.org/abs/1808.03668

### This notebook runs on TensorFlow 2.

In [1]:
# limit gpu memory

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        # query logical devices only after memory growth is set on every GPU
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

In [2]:
# load packages
import pandas as pd
import pickle
import numpy as np
import keras
from keras import backend as K
from keras.models import load_model, Model
from keras.layers import Flatten, Dense, Dropout, Activation, Input, LSTM, Reshape, Conv2D, MaxPooling2D
from keras.optimizers import Adam
from keras.layers import LeakyReLU

from keras.utils import np_utils
import matplotlib.pyplot as plt

# set random seeds
np.random.seed(1)
tf.random.set_seed(2)


# Data preparation

We use the no-auction version of the dataset, normalised with the decimal-precision approach described in [1]. For illustration purposes we do not hold out a validation set here, but you should include one in your own work. The first seven days are training data and the last three days are testing data.
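One way to add the validation set recommended above is a chronological hold-out: keep the earliest windows for training and the most recent ones for validation, so no future data leaks into training. The sketch below is only an illustration on toy arrays; the helper name `chronological_split` and the 80/20 ratio are assumptions, not part of the paper or this notebook.

```python
import numpy as np

def chronological_split(X, Y, train_fraction=0.8):
    """Split samples in time order: earliest part for training, the rest for validation.

    Hypothetical helper; the 0.8 default is an arbitrary illustrative choice.
    """
    split = int(len(X) * train_fraction)
    return (X[:split], Y[:split]), (X[split:], Y[split:])

# toy stand-ins for the windowed inputs and labels, ordered by time
X = np.arange(10).reshape(10, 1)
Y = np.arange(10)

(train_X, train_Y), (val_X, val_Y) = chronological_split(X, Y)
print(len(train_X), len(val_X))
```

The resulting validation arrays could then be passed to `validation_data` in `fit` in place of the test set.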

The first 40 columns of the FI-2010 dataset hold the ask and bid information for 10 levels of the limit order book, and these 40 features are the only inputs to our network. The last 5 columns are the labels, one per prediction horizon. In the raw text files these quantities are stored as rows, which is why the helper functions below transpose the data.
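The label handling used later in this notebook (pick one horizon column, subtract 1, one-hot encode into 3 classes) can be illustrated with numpy alone. The label values below are made up for the example, and `np.eye(3)[...]` is used as a stand-in for `to_categorical`; this is a sketch, not the notebook's own code path.

```python
import numpy as np

# made-up label matrix: 4 time steps x 5 prediction horizons, values in {1, 2, 3}
labels = np.array([[1, 2, 3, 1, 2],
                   [2, 2, 1, 3, 3],
                   [3, 1, 2, 2, 1],
                   [1, 3, 3, 1, 2]])

horizon_index = 3  # the same column index the notebook selects

# shift {1, 2, 3} to {0, 1, 2}; astype(int) also covers float labels from np.loadtxt
classes = (labels[:, horizon_index] - 1).astype(int)

# numpy equivalent of one-hot encoding with 3 classes
one_hot = np.eye(3)[classes]
print(classes)
print(one_hot.shape)
```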

In [3]:
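def prepare_x(data):
    # rows 0-39 of the raw file hold the LOB features; transpose to (time, features)
    df1 = data[:40, :].T
    return np.array(df1)

def get_label(data):
    # the last 5 rows hold the labels, one row per prediction horizon
    lob = data[-5:, :].T
    return lob

def data_classification(X, Y, T):
    # build overlapping windows of T consecutive observations;
    # each window takes the label of its last observation
    [N, D] = X.shape
    df = np.array(X)

    dY = np.array(Y)

    dataY = dY[T - 1:N]

    dataX = np.zeros((N - T + 1, T, D))
    for i in range(T, N + 1):
        dataX[i - T] = df[i - T:i, :]

    # add a trailing channel axis for Conv2D: (samples, T, D, 1)
    return dataX.reshape(dataX.shape + (1,)), dataY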
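To make the windowing concrete, the sketch below restates the same logic as `data_classification` in a self-contained form and runs it on a tiny made-up array. The name `sliding_windows` and the toy sizes (6 time steps, 2 features, T=3) are assumptions chosen for illustration only.

```python
import numpy as np

def sliding_windows(X, Y, T):
    """Restatement of the windowing in data_classification for a standalone demo."""
    N, D = X.shape
    data_x = np.zeros((N - T + 1, T, D))
    for i in range(T, N + 1):
        # window i - T covers rows i - T .. i - 1
        data_x[i - T] = X[i - T:i, :]
    # labels are aligned with the last row of each window
    return data_x.reshape(data_x.shape + (1,)), Y[T - 1:N]

toy_X = np.arange(12, dtype=float).reshape(6, 2)  # 6 time steps, 2 features
toy_Y = np.arange(6).reshape(6, 1)                # one label per time step

win_X, win_Y = sliding_windows(toy_X, toy_Y, T=3)
print(win_X.shape, win_Y.shape)
```

With N observations and window length T, this yields N - T + 1 samples, each of shape (T, D, 1).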

In [5]:
# please change data_path to your local path
data_path = '/home/amithbn/Desktop/TABL/bench-data/BenchmarkDatasets/NoAuction/1.NoAuction_Zscore'

# The training file holds the first seven days; the three testing files hold the last three days.
# The Training folder and the CF_8 / CF_9 file names are assumed from the FI-2010 naming scheme; adjust them to your local copy.
# This folder holds the Z-score-normalised files; point data_path at the decimal-precision folder to use that normalisation instead.
dec_train = np.loadtxt(data_path + '/NoAuction_Zscore_Training/Train_Dst_NoAuction_ZScore_CF_7.txt')
dec_test1 = np.loadtxt(data_path + '/NoAuction_Zscore_Testing/Test_Dst_NoAuction_ZScore_CF_7.txt')
dec_test2 = np.loadtxt(data_path + '/NoAuction_Zscore_Testing/Test_Dst_NoAuction_ZScore_CF_8.txt')
dec_test3 = np.loadtxt(data_path + '/NoAuction_Zscore_Testing/Test_Dst_NoAuction_ZScore_CF_9.txt')
dec_test = np.hstack((dec_test1, dec_test2, dec_test3))

# extract limit order book data from the FI-2010 dataset
train_lob = prepare_x(dec_train)
test_lob = prepare_x(dec_test)

# extract label from the FI-2010 dataset
train_label = get_label(dec_train)
test_label = get_label(dec_test)

# prepare training data. We feed past 100 observations into our algorithms and choose the prediction horizon. 
trainX_CNN, trainY_CNN = data_classification(train_lob, train_label, T=100)
trainY_CNN = trainY_CNN[:,3] - 1
trainY_CNN = np_utils.to_categorical(trainY_CNN, 3)

# prepare test data.
testX_CNN, testY_CNN = data_classification(test_lob, test_label, T=100)
testY_CNN = testY_CNN[:,3] - 1
testY_CNN = np_utils.to_categorical(testY_CNN, 3)


# Model Architecture

A detailed discussion of the model architecture can be found in our paper [2].

In [8]:
def create_deeplob(T, NF, number_of_lstm):
    input_lmd = Input(shape=(T, NF, 1))
    
    # build the convolutional block
    conv_first1 = Conv2D(32, (1, 2), strides=(1, 2))(input_lmd)
    conv_first1 = keras.layers.LeakyReLU(alpha=0.01)(conv_first1)
    conv_first1 = Conv2D(32, (4, 1), padding='same')(conv_first1)
    conv_first1 = keras.layers.LeakyReLU(alpha=0.01)(conv_first1)
    conv_first1 = Conv2D(32, (4, 1), padding='same')(conv_first1)
    conv_first1 = keras.layers.LeakyReLU(alpha=0.01)(conv_first1)

    conv_first1 = Conv2D(32, (1, 2), strides=(1, 2))(conv_first1)
    conv_first1 = keras.layers.LeakyReLU(alpha=0.01)(conv_first1)
    conv_first1 = Conv2D(32, (4, 1), padding='same')(conv_first1)
    conv_first1 = keras.layers.LeakyReLU(alpha=0.01)(conv_first1)
    conv_first1 = Conv2D(32, (4, 1), padding='same')(conv_first1)
    conv_first1 = keras.layers.LeakyReLU(alpha=0.01)(conv_first1)

    conv_first1 = Conv2D(32, (1, 10))(conv_first1)
    conv_first1 = keras.layers.LeakyReLU(alpha=0.01)(conv_first1)
    conv_first1 = Conv2D(32, (4, 1), padding='same')(conv_first1)
    conv_first1 = keras.layers.LeakyReLU(alpha=0.01)(conv_first1)
    conv_first1 = Conv2D(32, (4, 1), padding='same')(conv_first1)
    conv_first1 = keras.layers.LeakyReLU(alpha=0.01)(conv_first1)
    
    # build the inception module
    convsecond_1 = Conv2D(64, (1, 1), padding='same')(conv_first1)
    convsecond_1 = keras.layers.LeakyReLU(alpha=0.01)(convsecond_1)
    convsecond_1 = Conv2D(64, (3, 1), padding='same')(convsecond_1)
    convsecond_1 = keras.layers.LeakyReLU(alpha=0.01)(convsecond_1)

    convsecond_2 = Conv2D(64, (1, 1), padding='same')(conv_first1)
    convsecond_2 = keras.layers.LeakyReLU(alpha=0.01)(convsecond_2)
    convsecond_2 = Conv2D(64, (5, 1), padding='same')(convsecond_2)
    convsecond_2 = keras.layers.LeakyReLU(alpha=0.01)(convsecond_2)

    convsecond_3 = MaxPooling2D((3, 1), strides=(1, 1), padding='same')(conv_first1)
    convsecond_3 = Conv2D(64, (1, 1), padding='same')(convsecond_3)
    convsecond_3 = keras.layers.LeakyReLU(alpha=0.01)(convsecond_3)
    
    convsecond_output = keras.layers.concatenate([convsecond_1, convsecond_2, convsecond_3], axis=3)
    conv_reshape = Reshape((int(convsecond_output.shape[1]), int(convsecond_output.shape[3])))(convsecond_output)

    # build the last LSTM layer
    conv_lstm = LSTM(number_of_lstm)(conv_reshape)

    # build the output layer
    out = Dense(3, activation='softmax')(conv_lstm)
    model = Model(inputs=input_lmd, outputs=out)
    adam = keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1)
    model.compile(optimizer=adam, loss='categorical_crossentropy', metrics=['accuracy'])

    return model

deeplob = create_deeplob(100, 40, 64)
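The way the feature axis shrinks inside `create_deeplob` can be checked with plain arithmetic. The sketch below applies the standard output-size formula for an unpadded ('valid') convolution to the three layers that act along the feature axis: two `(1, 2)` kernels with stride 2, then one `(1, 10)` kernel. The helper `conv_out` is a hypothetical name introduced for this illustration.

```python
def conv_out(size, kernel, stride=1):
    """Output length of an unpadded ('valid') convolution along one axis."""
    return (size - kernel) // stride + 1

width = 40  # number of LOB features fed to the network
widths = [width]
for kernel, stride in [(2, 2), (2, 2), (10, 1)]:
    width = conv_out(width, kernel, stride)
    widths.append(width)

print(widths)
```

The time axis is untouched by these layers because their kernel height is 1, and the `(4, 1)` convolutions and the inception branches use `padding='same'`. After concatenating three 64-channel branches, the tensor reshaped for the LSTM therefore has shape `(T, 192)`, which is `(100, 192)` for `T=100`.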


# Model Training

In [None]:
deeplob.fit(trainX_CNN, trainY_CNN, epochs=200, batch_size=64, verbose=2, validation_data=(testX_CNN, testY_CNN))

Epoch 1/200
3979/3979 - 141s - loss: 1.0981 - accuracy: 0.3436 - val_loss: 1.1116 - val_accuracy: 0.2753
Epoch 2/200
3979/3979 - 141s - loss: 1.0980 - accuracy: 0.3435 - val_loss: 1.1093 - val_accuracy: 0.2753
Epoch 3/200
3979/3979 - 131s - loss: 1.0980 - accuracy: 0.3438 - val_loss: 1.1101 - val_accuracy: 0.2753
Epoch 4/200
3979/3979 - 129s - loss: 1.0980 - accuracy: 0.3437 - val_loss: 1.1094 - val_accuracy: 0.2753
Epoch 5/200
3979/3979 - 129s - loss: 1.0980 - accuracy: 0.3438 - val_loss: 1.1097 - val_accuracy: 0.2753
Epoch 6/200
3979/3979 - 130s - loss: 1.0980 - accuracy: 0.3438 - val_loss: 1.1107 - val_accuracy: 0.2753
Epoch 7/200
3979/3979 - 130s - loss: 1.0979 - accuracy: 0.3434 - val_loss: 1.1093 - val_accuracy: 0.2753
Epoch 8/200
3979/3979 - 130s - loss: 1.0979 - accuracy: 0.3434 - val_loss: 1.1104 - val_accuracy: 0.2753
Epoch 9/200
3979/3979 - 131s - loss: 1.0979 - accuracy: 0.3435 - val_loss: 1.1081 - val_accuracy: 0.2590
Epoch 10/200
3979/3979 - 138s - loss: 1.0979 - accuracy

Epoch 79/200
3979/3979 - 153s - loss: 1.0840 - accuracy: 0.3753 - val_loss: 1.1094 - val_accuracy: 0.3748
Epoch 80/200
3979/3979 - 149s - loss: 1.0836 - accuracy: 0.3759 - val_loss: 1.0848 - val_accuracy: 0.3917
Epoch 81/200
3979/3979 - 149s - loss: 1.0832 - accuracy: 0.3754 - val_loss: 1.0912 - val_accuracy: 0.3908
Epoch 82/200
3979/3979 - 149s - loss: 1.0831 - accuracy: 0.3755 - val_loss: 1.0946 - val_accuracy: 0.3845
Epoch 83/200
3979/3979 - 146s - loss: 1.0825 - accuracy: 0.3747 - val_loss: 1.0827 - val_accuracy: 0.4109
Epoch 84/200
3979/3979 - 147s - loss: 1.0823 - accuracy: 0.3760 - val_loss: 1.1256 - val_accuracy: 0.3795
Epoch 85/200
3979/3979 - 139s - loss: 1.0819 - accuracy: 0.3755 - val_loss: 1.0975 - val_accuracy: 0.3908
Epoch 86/200
3979/3979 - 141s - loss: 1.0816 - accuracy: 0.3768 - val_loss: 1.0911 - val_accuracy: 0.3870
Epoch 87/200
3979/3979 - 137s - loss: 1.0812 - accuracy: 0.3772 - val_loss: 1.0783 - val_accuracy: 0.4255
Epoch 88/200
3979/3979 - 142s - loss: 1.0810 -