Path: dtanalyse2/paper_std/Statistical properties of stock order books

## Statistical properties of stock order books: empirical results and models

Author: Jean-Philippe Bouchaud, Marc Mézard, Marc Potters

Time: 2002.06

Journal: Quantitative Finance

Citations: 494

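## Summary

This paper analyzes order data for three stocks on the Paris Bourse and reports the following properties:
1. On the bid side, the distance between a new order's price and the current best bid price follows a power-law distribution; the same holds on the ask side.
2. The volume of new orders approximately follows a gamma distribution.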






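## Data analysis

#### Spread

Since the conclusions are the same for buy and sell orders, we analyze only one side. Let $b(t)$ be the best bid at time $t$, and let a new bid be placed at price $b(t)-\Delta$, where $\Delta$ may be positive or negative. Let $P(\Delta)$ denote the number of orders placed at distance $\Delta$. Then $P(\Delta)$ can be described by $$P(\Delta)\sim\frac{\Delta_0^\mu}{(\Delta_1+\Delta)^{1+\mu}},$$ where $\mu,\Delta_0,\Delta_1$ are parameters. Taking logarithms gives $$\ln P(\Delta)\sim \mu \ln\Delta_0-(1+\mu)\ln(\Delta_1+\Delta),$$ so $\ln P(\Delta)$ is linear in $\ln(\Delta_1+\Delta)$.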

![1](img/1.png)

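Note that both axes in the figure are logarithmic. The horizontal axis is the distance between the new order's price and the best bid price (in ticks), and the vertical axis is the number of orders. For all three stocks, the points fall roughly on a straight line.

Main feature: orders are most numerous near the current price, and their number drops quickly as the distance grows.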
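The log-linear relation above suggests a simple way to estimate $\mu$ and $\Delta_0$ from order counts. The following is a minimal sketch, not the paper's procedure and not the `model_delta`/`loss_delta` helpers used later in this notebook (their definitions are not shown here). It assumes $\Delta_1$ is already known and fits a straight line to $\ln P$ against $\ln(\Delta_1+\Delta)$; the function names and the synthetic data are illustrative only.

```python
import numpy as np

def power_law(delta, mu, delta0, delta1):
    """P(delta) ~ delta0**mu / (delta1 + delta)**(1 + mu)."""
    delta = np.asarray(delta, dtype=float)
    return delta0**mu / (delta1 + delta) ** (1.0 + mu)

def fit_power_law_loglog(delta, counts, delta1):
    """Estimate (mu, delta0) by a straight-line fit in log-log space.

    Assumes delta1 is known. The slope of ln P versus ln(delta1 + delta)
    is -(1 + mu), and the intercept is mu * ln(delta0).
    """
    x = np.log(delta1 + np.asarray(delta, dtype=float))
    y = np.log(np.asarray(counts, dtype=float))
    slope, intercept = np.polyfit(x, y, 1)
    mu = -slope - 1.0
    delta0 = np.exp(intercept / mu)
    return mu, delta0

# Synthetic, noise-free counts generated from known parameters.
delta = np.arange(1, 101)
counts = power_law(delta, mu=0.6, delta0=50.0, delta1=1.0)
mu_hat, delta0_hat = fit_power_law_loglog(delta, counts, delta1=1.0)
print(mu_hat, delta0_hat)
```

On real counts the fit would be noisy, and $\Delta_1$ would have to be estimated as well, for instance with a nonlinear optimizer.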

#### Conditional average amount

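The paper defines volume as the number of shares for a given order; these notes call it amount instead. Looking at the average amount for a given $\Delta$, the authors observe that between 1 and 20 ticks the two are essentially uncorrelated, while beyond 20 ticks the average amount roughly follows a power law, $\bar V|_{\Delta}\sim \Delta^{-\nu}$ with $\nu=1.5$. The threshold of 20 ticks may differ from stock to stock.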

![2](img/2.png)

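In the figure above, the main panel uses a logarithmic x-axis and a linear y-axis, and the inset in the upper right uses log-log axes. The horizontal axis is the spread $\Delta$ and the vertical axis is the amount. There is little structure between 1 and 20 ticks, while beyond 20 ticks the data follow a power law (almost a straight line in the inset).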
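To make the two regimes concrete, here is a small sketch that estimates the tail exponent $\nu$ using only points beyond a cutoff. The cutoff of 20 ticks and the flat level of 1000 are illustrative assumptions for the synthetic data, and `tail_exponent` is a hypothetical helper, not something taken from the paper.

```python
import numpy as np

def tail_exponent(delta, avg_amount, cutoff):
    """Estimate nu in avg_amount ~ delta**(-nu), using only delta > cutoff."""
    delta = np.asarray(delta, dtype=float)
    avg_amount = np.asarray(avg_amount, dtype=float)
    mask = delta > cutoff
    slope, _ = np.polyfit(np.log(delta[mask]), np.log(avg_amount[mask]), 1)
    return -slope

# Synthetic profile: flat up to 20 ticks, then decaying as delta**(-1.5).
delta = np.arange(1, 201)
avg = np.where(delta <= 20, 1000.0, 1000.0 * (delta / 20.0) ** -1.5)
nu_hat = tail_exponent(delta, avg, cutoff=20)
print(nu_hat)
```

Including the flat region in the fit would bias the slope toward zero, which is why the mask is applied before the regression.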

#### Amount

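We now look at the distribution of amount directly. Let $R(V)$ denote the number of orders with amount $V$. It can be described by the gamma distribution $$R(V)\sim V^{\gamma -1} \exp\left(-\frac{V}{V_0}\right).$$ For these stocks the paper gives $\gamma=0.7$-$0.8$ (this should be 1.7-1.8; the authors appear to have made a typo) and $V_0=2700$.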

![3](img/3.png)
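The value of $\gamma$ changes the shape of $R(V)$ qualitatively: for $\gamma>1$ the curve has an interior peak at $V=(\gamma-1)V_0$, while for $\gamma<1$ it decreases monotonically from $V=0$. The sketch below checks this numerically on a grid for two illustrative values, $\gamma=1.75$ and $\gamma=0.75$, with $V_0=2700$. It only illustrates the functional form and does not settle which value the paper intends; the helper names are hypothetical.

```python
import numpy as np

def gamma_shape(v, gamma, v0):
    """Unnormalized gamma density R(V) ~ V**(gamma - 1) * exp(-V / V0)."""
    v = np.asarray(v, dtype=float)
    return v ** (gamma - 1.0) * np.exp(-v / v0)

def grid_mode(gamma, v0, v_max=20000):
    """Location of the maximum of R(V) on the integer grid 1..v_max."""
    grid = np.arange(1.0, v_max + 1.0)
    return grid[np.argmax(gamma_shape(grid, gamma, v0))]

mode_peaked = grid_mode(1.75, 2700.0)    # analytic mode: 0.75 * 2700 = 2025
mode_monotone = grid_mode(0.75, 2700.0)  # no interior peak: maximum at the left edge
print(mode_peaked, mode_monotone)
```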

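## Simulation

With these distributions in hand, the authors run a simulation. Each new order is placed at a random price drawn with probability $P(\Delta)$ (the $P$ discussed earlier is a count and has to be normalized into a probability), subject to a maximum $\Delta$. Every order has amount 1 (the authors say this value is not critical; they also tried other choices of amount and obtained similar results). In addition, cancellations and market orders (that is, randomly executed trades) occur with given probabilities.

The simulated result is shown below: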

![4](img/4.png)

The figure shows the conditional average amount, whose trend resembles Figure 2. Because the maximum spread is capped, the axis ranges differ.
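The simulation described above can be sketched as a small toy model. This is not the authors' code, and several details are assumptions made for illustration: prices are integer ticks, each step picks a side at random, $\Delta$ is drawn from a truncated power law in $|\Delta|$ restricted so that orders never cross the opposite best quote, cancellations pick a price level uniformly at random, and a side is never emptied completely. All parameter values (`mu`, `delta1`, `p_limit`, `p_cancel`) are placeholders.

```python
import numpy as np

def remove_one(book, price):
    """Remove one unit order at `price`, dropping the level when it is empty."""
    book[price] -= 1
    if book[price] == 0:
        del book[price]

def simulate_book(n_steps, mu=0.6, delta1=1.0, max_delta=50,
                  p_limit=0.5, p_cancel=0.25, seed=0):
    rng = np.random.default_rng(seed)
    bids, asks = {1000: 1}, {1001: 1}  # price in ticks -> number of unit orders
    for _ in range(n_steps):
        is_bid = rng.random() < 0.5
        book = bids if is_bid else asks
        best_bid, best_ask = max(bids), min(asks)
        u = rng.random()
        if u < p_limit:
            # Limit order: Delta may be negative (inside the spread),
            # but is bounded so the order never crosses the other side.
            spread = best_ask - best_bid
            deltas = np.arange(-(spread - 1), max_delta + 1)
            weights = (delta1 + np.abs(deltas)) ** -(1.0 + mu)
            delta = int(rng.choice(deltas, p=weights / weights.sum()))
            price = best_bid - delta if is_bid else best_ask + delta
            book[price] = book.get(price, 0) + 1
        elif sum(book.values()) > 1:  # keep at least one order per side
            if u < p_limit + p_cancel:
                price = int(rng.choice(list(book)))  # cancel at a random level
            else:
                price = best_bid if is_bid else best_ask  # market order hits the best quote
            remove_one(book, price)
    return bids, asks

bids, asks = simulate_book(2000)
print(max(bids), min(asks), sum(bids.values()), sum(asks.values()))
```

Averaging the resulting book over many steps as a function of distance from the best quote would give a profile that could be compared with the figures above.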

In [3]:
import os, sys, datetime, logging
import matplotlib.pyplot as plt
import matplotlib
from plotly.subplots import make_subplots
import pandas as pd
import numpy as np
import plotly.express as px
import calendar
from datetime import timedelta
import joblib
from tqdm import tqdm
from decimal import Decimal, getcontext
from scipy.optimize import minimize
sys.path.append(os.getcwd().split('paper_std')[0])
if '../' not in sys.path:
    sys.path.append('../')
    
from util.load_s3_data import *
from util.time_method import *
from util.convert_depth_format import convert_depth_format2
from func import *


svmem(total=65877331968, available=7075090432, percent=89.3, used=58118086656, free=6214111232, active=520818688, inactive=58416926720, buffers=92930048, cached=1452204032, shared=851968, slab=338874368)

In [None]:
import psutil
psutil.virtual_memory()

In [None]:
# Load data
begin_time = datetime.datetime(2023, 3, 1, 0,tzinfo=TZ_8)
end_time = datetime.datetime(2023, 3, 1, 1,tzinfo=TZ_8)
exchange = 'okex'#'okex', 'binance','coinbase', 'FTX'
symbol = 'btc_usdt'
#origin_depth_data = get_cex_depth_origin(begin_time, end_time, exchange, symbol)
recover_depth_data = LoadS3Data.get_cex_depth(begin_time, end_time, exchange, symbol, head_num=50)

begin to recover depth at date 2023030100: 2024-03-25 03:40:20.133711
finish depth recovery at 2024-03-25 03:41:53.336109
begin to recover depth at date 2023030101: 2024-03-25 03:41:53.582853


In [None]:
# Reshape each depth snapshot into {price: size} dicts, one per side
Bid = []
Ask = []
for dic in tqdm(recover_depth_data):
    Bid.append({level['p']: level['s'] for level in dic['bids']})
    Ask.append({level['p']: level['s'] for level in dic['asks']})

In [None]:
# Recover order-level data from the depth snapshots, then process it further

bid_changes = []
for i in tqdm(range(len(Bid) - 1)):
    first_price_previous_dict = list(Bid[i].keys())[0]  # first price in the previous snapshot
    bid_changes.extend(compare_dicts(Bid[i], Bid[i+1], first_price_previous_dict))
    
ask_changes = []
for i in tqdm(range(len(Ask) - 1)):
    first_price_previous_dict = list(Ask[i].keys())[0]  # first price in the previous snapshot
    ask_changes.extend(compare_dicts(Ask[i], Ask[i+1], first_price_previous_dict))

bid_change = pd.DataFrame(bid_changes, columns=['price_difference', 'amount_change'])
ask_change = pd.DataFrame(ask_changes, columns=['price_difference', 'amount_change'])

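# Keep only amount_change > 0 (negative changes are cancellations and are ignored).
# Keep bid price differences below 1 and ask price differences above -1,
# then multiply by 10 to express the price difference in ticks.
# Stocks trade in units of 1 share; here the smallest amount unit is taken to be 0.0001.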
bid_change['amount_change'] = round(bid_change['amount_change'],4)*10000
bid_change = bid_change[bid_change['amount_change']>0]
bid_change = bid_change[bid_change['price_difference']<1]
bid_change['price_difference'] = (bid_change['price_difference']*-10).round(0)
ask_change['amount_change'] = round(ask_change['amount_change'],4)*10000
ask_change = ask_change[ask_change['amount_change']>0]
ask_change = ask_change[ask_change['price_difference']>-1]
ask_change['price_difference'] = (ask_change['price_difference']*10).round(0)



In [None]:
# Delta distribution plot
plt.figure(figsize=(15, 10))

#ask 
plt.subplot(2, 2, 1)
frequency = ask_change['price_difference'].value_counts().sort_index()
plt.scatter(frequency.index, frequency.values, s=5)
plt.xlabel('Delta (tick)')
plt.ylabel('number of orders')
plt.title("ask")

#bid
plt.subplot(2, 2, 2)
frequency = bid_change['price_difference'].value_counts().sort_index()
plt.scatter(frequency.index, frequency.values, s=5)
plt.title("bid")
plt.xlabel('Delta (tick)')
plt.ylabel('number of orders')

#ask 
plt.subplot(2, 2, 3)
frequency = ask_change['price_difference'].value_counts().sort_index()
# Fit
initial_guess = [100,0.6,1000]
result = minimize(loss_delta, initial_guess,args=(frequency.index[10:],frequency.values[10:]), method='L-BFGS-B', bounds=[(0,None), (0.1, 1),(0, None)])
a_estimated, b_estimated, c_estimated = result.x
print(f"Estimated parameters for ask: a={result.x[0]}, b={result.x[1]},c={result.x[2]}")
x_dense = np.linspace(min(frequency.index[10:]), max(frequency.index[10:]), 100)
predicted_dense = model_delta(x_dense, a_estimated, b_estimated, c_estimated)
plt.plot(x_dense, predicted_dense, label='Predicted', color='red')
plt.scatter(frequency.index, frequency.values, s=5)
plt.xlabel('Delta (tick)')
plt.ylabel('number of orders')
plt.title("ask")
plt.xscale('log')
plt.yscale('log')
#bid
plt.subplot(2, 2, 4)
frequency = bid_change['price_difference'].value_counts().sort_index()
initial_guess = [100,0.6,1000]
result = minimize(loss_delta, initial_guess,args=(frequency.index[10:],frequency.values[10:]), method='L-BFGS-B', bounds=[(0,None), (0.1, 1),(0, None)])
a_estimated, b_estimated, c_estimated = result.x
print(f"Estimated parameters for bid: a={result.x[0]}, b={result.x[1]},c={result.x[2]}")
x_dense = np.linspace(min(frequency.index[10:]), max(frequency.index[10:]), 100)
predicted_dense = model_delta(x_dense, a_estimated, b_estimated, c_estimated)
plt.plot(x_dense, predicted_dense, label='Predicted', color='red')
plt.scatter(frequency.index, frequency.values, s=5)
plt.title("bid")
plt.xlabel('Delta (tick)')
plt.ylabel('number of orders')
plt.xscale('log')
plt.yscale('log')
plt.show()

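#### Delta distribution plot

No order-level data is available, so orders are recovered from depth data. The recovery may be inaccurate; for example, orders at the same price are merged. The final results differ from those in the paper.

The horizontal axis is the distance in ticks, and the vertical axis is the number of orders at that distance.

There are still some interesting results. Orders are most numerous at $\Delta=0$, that is, at the best bid or best ask price. Beyond that, the plots show two further peaks, at around 40 and 100 ticks (60 and 130 for bids). This holds across different time periods: the peak positions vary, but there are always two peaks.

Comparing time periods shows that the peak positions are related to the bitcoin price; the first peak sits at roughly 0.01% (one ten-thousandth) of the price.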
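The recovery step relies on `compare_dicts` from `func`, whose definition is not shown in this notebook. As a rough illustration of the idea only, the sketch below diffs two consecutive `{price: size}` snapshots and reports each changed level as a pair (price minus a reference price, size change). The name `diff_snapshots` and its exact behavior are assumptions, not the actual helper.

```python
def diff_snapshots(prev, curr, ref_price):
    """Return (price - ref_price, size change) for every level whose size changed.

    Positive changes look like new orders and negative changes like
    cancellations or fills. Several orders at the same price within one
    interval are merged into a single net change.
    """
    changes = []
    for price in set(prev) | set(curr):
        change = curr.get(price, 0.0) - prev.get(price, 0.0)
        if change != 0:
            changes.append((price - ref_price, change))
    return sorted(changes)

prev = {100.0: 1.0, 99.5: 2.0}
curr = {100.0: 1.5, 99.5: 2.0, 99.0: 0.25}
changes = diff_snapshots(prev, curr, ref_price=100.0)
print(changes)  # [(-1.0, 0.25), (0.0, 0.5)]
```

With only the top levels of the book available, a level that drops out of the visible window would also show up as a negative change, which is one more way the recovered flow can differ from true order data.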

In [None]:
# Amount distribution plot
plt.figure(figsize=(15, 10))

#ask 
plt.subplot(2, 2, 1)
frequency = ask_change['amount_change'].value_counts().sort_index()
plt.scatter(frequency.index, frequency.values, s=5)
plt.xlabel('amount')
plt.ylabel('number of orders')
plt.title("ask")

#bid
plt.subplot(2, 2, 2)
frequency = bid_change['amount_change'].value_counts().sort_index()
plt.scatter(frequency.index, frequency.values, s=5)
plt.title("bid")
plt.xlabel('amount')
plt.ylabel('number of orders')


#ask 
plt.subplot(2, 2, 3)
log_amount_ask = np.log10(ask_change['amount_change'])

# Compute the histogram with bins of width 0.1
bins = np.arange(start=log_amount_ask.min(), stop=log_amount_ask.max(), step=0.1)
hist, bin_edges = np.histogram(log_amount_ask, bins=bins)

# Draw the bar chart
plt.bar(bin_edges[:-1], hist, width=0.1, edgecolor="black")
# Fit and draw the gamma-distribution curve
initial_guess = [1.7,3,1000]
result = minimize(loss_amount, initial_guess,args=(bin_edges[1:],hist), method='L-BFGS-B', bounds=[(1.1,4), (0, None),(0, None)])
a_estimated, b_estimated, c_estimated = result.x
print(f"Estimated parameters for ask: a={result.x[0]}, b={result.x[1]},c={result.x[2]}")
x_dense = np.linspace(min(bin_edges[1:]), max(bin_edges[1:]), 100)
predicted_dense = model_amount(x_dense, a_estimated, b_estimated, c_estimated)
plt.plot(x_dense, predicted_dense, label='Predicted', color='red')

plt.xlabel('amount')
plt.ylabel('number of orders')
plt.title("ask")

#bid
plt.subplot(2, 2, 4)
log_amount_bid = np.log10(bid_change['amount_change'])

# Compute the histogram with bins of width 0.1
bins = np.arange(start=log_amount_bid.min(), stop=log_amount_bid.max(), step=0.1)
hist, bin_edges = np.histogram(log_amount_bid, bins=bins)

# Draw the bar chart
plt.bar(bin_edges[:-1], hist, width=0.1, edgecolor="black", label='Observed')
# Fit and draw the gamma-distribution curve

result = minimize(loss_amount, initial_guess,args=(bin_edges[1:],hist), method='L-BFGS-B', bounds=[(1.1,4), (0, None),(0, None)])
a_estimated, b_estimated, c_estimated = result.x
print(f"Estimated parameters for bid: a={result.x[0]}, b={result.x[1]},c={result.x[2]}")
x_dense = np.linspace(min(bin_edges[1:]), max(bin_edges[1:]), 100)
predicted_dense = model_amount(x_dense, a_estimated, b_estimated, c_estimated)
plt.plot(x_dense, predicted_dense, label='Predicted', color='red')

plt.xlabel('amount')
plt.ylabel('number of orders')
plt.title("bid")
plt.legend()

plt.show()

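#### Amount distribution plot

The horizontal axis is the amount, and the vertical axis is the number of orders with that amount.

Stocks trade in units of at least 1 share, whereas crypto trades have no such minimum unit, so 0.0001 is treated as one share here (the raw amount is rounded to 4 decimal places and then multiplied by 10000). The top two panels use the original scale and the bottom two use a logarithmic scale. On the logarithmic scale the amount distribution shows a clear peak and resembles a gamma distribution. The shape varies across time periods: other dates were also tried, and some are consistent with a gamma distribution while others are not.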
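The unit conversion and log-scale binning described above can be sketched as follows. This is an illustrative variant rather than the notebook's exact code: it divides by the unit and rounds (instead of rounding to 4 decimals and multiplying), drops amounts that round to zero units, and builds the bin edges so that the last edge lies above the largest value. `np.histogram` does not count values outside the supplied edges, so edges that stop at or below the maximum can leave the largest amounts out. The helper name and defaults are assumptions.

```python
import numpy as np

def log_amount_histogram(amounts, unit=1e-4, bin_width=0.1):
    """Histogram of log10(amount in units of `unit`) with fixed-width bins."""
    units = np.round(np.asarray(amounts, dtype=float) / unit)
    units = units[units > 0]  # amounts below half a unit round to 0 and are dropped
    log_units = np.log10(units)
    lo, hi = log_units.min(), log_units.max()
    n_bins = int(np.floor((hi - lo) / bin_width)) + 1
    edges = lo + bin_width * np.arange(n_bins + 1)  # last edge lies above hi
    counts, edges = np.histogram(log_units, bins=edges)
    return counts, edges

counts, edges = log_amount_histogram([0.0001, 0.001, 0.01, 0.00004])
print(counts.sum(), len(edges))
```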


In [None]:
# Conditional amount plot
plt.figure(figsize=(15, 10))

#ask 
plt.subplot(2, 2, 1)
mean_amount = ask_change.groupby('price_difference')['amount_change'].mean().reset_index()
plt.scatter(mean_amount['price_difference'], mean_amount['amount_change'], s=5)
plt.xlabel('Delta')
plt.ylabel('average amount')
plt.title("ask")

#bid
plt.subplot(2, 2, 2)
mean_amount = bid_change.groupby('price_difference')['amount_change'].mean().reset_index()
plt.scatter(mean_amount['price_difference'], mean_amount['amount_change'], s=5)
plt.xlabel('Delta')
plt.ylabel('average amount')
plt.title("bid")

#ask 
plt.subplot(2, 2, 3)
mean_amount = ask_change.groupby('price_difference')['amount_change'].mean().reset_index()
plt.scatter(mean_amount['price_difference'], mean_amount['amount_change'], s=5)
plt.xlabel('log(Delta)')
plt.ylabel('average amount')
plt.title("ask")
plt.xscale('log')

#bid
plt.subplot(2, 2, 4)
mean_amount = bid_change.groupby('price_difference')['amount_change'].mean().reset_index()
plt.scatter(mean_amount['price_difference'], mean_amount['amount_change'], s=5)
plt.xlabel('log(Delta)')
plt.ylabel('average amount')
plt.title("bid")
plt.xscale('log')

plt.show()

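#### Conditional average amount plot

Average amount at each Delta. The horizontal axis is Delta and the vertical axis is the average amount.

There are some large orders both close to and far from the best price, while in between the changes are small; the fluctuations are larger far from the best price.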

In [None]:
# Sum of amount plot
plt.figure(figsize=(15, 5))

#ask 
plt.subplot(1, 2, 1)
mean_amount = ask_change.groupby('price_difference')['amount_change'].sum().reset_index()
plt.scatter(mean_amount['price_difference'], mean_amount['amount_change'],s=5)
plt.xlabel('Delta')
plt.ylabel('amount_sum')
plt.title("ask")

#bid
plt.subplot(1, 2, 2)
mean_amount = bid_change.groupby('price_difference')['amount_change'].sum().reset_index()
plt.scatter(mean_amount['price_difference'], mean_amount['amount_change'],s=5)
plt.xlabel('Delta')
plt.ylabel('amount_sum')
plt.title("bid")



plt.show()

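#### Conditional sum amount plot

The previous plot shows the average amount; this one shows the summed amount. The horizontal axis is the distance in ticks, and the vertical axis is the total amount at that distance ($\sum order_i \times amount_i$).

This plot is almost identical to the first one (the Delta distribution plot). The summed amount at 0 is still far ahead, and the total amount is also high at the two peaks in order count.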
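One way to see why the sum plot tracks the Delta distribution plot: at each distance, the summed amount equals the order count times the average amount, so wherever the average is roughly flat the sum inherits the shape of the count. The sketch below computes all three per distance with NumPy, mirroring what `groupby(...).count()`, `.mean()` and `.sum()` would return in pandas. The data are made up for illustration and `conditional_stats` is a hypothetical helper.

```python
import numpy as np

def conditional_stats(price_difference, amount):
    """Per-distance order count, mean amount and summed amount."""
    deltas, inverse = np.unique(np.asarray(price_difference), return_inverse=True)
    counts = np.bincount(inverse)
    sums = np.bincount(inverse, weights=np.asarray(amount, dtype=float))
    return deltas, counts, sums / counts, sums

deltas, counts, means, sums = conditional_stats(
    [0, 0, 0, 5, 5, 40],
    [10.0, 20.0, 30.0, 4.0, 6.0, 100.0],
)
print(deltas, counts, means, sums)
```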