# Analyzing Orderbook data and the impact on Cryptocurrency `Price` and `Volume`

_Author: Chia E Tungom | Email: chemago99@yahoo.com_


The main goal is to analyze cryptocurrency orderbook data and how order volume relates to price, in order to draw insights into market dynamics. We do this in Python, and our analysis process is laid out as follows:

1. __Data Acquisition and Preprocessing:__ We use the Bybit API for this
2. __Visualization:__ We create charts to visualize price and volume changes
3. __EDA:__ We do Exploratory Data Analysis on price and volume dynamics
4. __Market Patterns:__ We do further analysis and feature extraction to find price patterns from volume and price changes
5. __Price Prediction:__ As a bonus, we use XGBoost to train a model that can predict future prices
6. __Conclusion:__ A conclusion based on insights derived from the analysis and its importance to a market making strategy.



# 1. Data Acquisition and Preprocessing
- Our first goal is to get orderbook data for a given asset
- We get our data using the __bybit API__ (doesn't require an API key, and data is only available for top cryptocurrency pairs)
- The acquired dataset consists of the following variables
    - `Asset`, `Time`, `ID`, `Price`, `Size`, `Side`
- We then aggregate the incoming data statistically, computing the features we deem necessary for analyzing price impact. The aggregate feature variables include
    - `MidPrice`, `Spread`, `Total Size`, `Total Bid Size`, `Total Ask Size`, `Total Size Change`, `Bid Size Change`, `Ask Size Change`
- Note that we store the aggregate data in a separate DataFrame to avoid redundancy. The orderbook data and aggregate data can be merged on the ID column (unique in the aggregate DataFrame) if further analysis is needed.
- Note that the data arrives in real time.
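The aggregation step can be sketched roughly as follows. This is a minimal illustration, not the `DataUtils` implementation: the helper name `aggregate_snapshot`, the side labels `"Bid"`/`"Ask"`, and the definition of `Spread` as best ask minus best bid are all assumptions made for the example.

```python
import pandas as pd

def aggregate_snapshot(snapshot: pd.DataFrame) -> dict:
    """Summarise one order book snapshot (columns: Price, Size, Side)."""
    bids = snapshot[snapshot["Side"] == "Bid"]
    asks = snapshot[snapshot["Side"] == "Ask"]
    best_bid = bids["Price"].max()   # highest price a buyer is quoting
    best_ask = asks["Price"].min()   # lowest price a seller is quoting
    return {
        "MidPrice": (best_bid + best_ask) / 2,
        "Spread": best_ask - best_bid,          # assumed definition
        "TotalSize": snapshot["Size"].sum(),
        "TotalBidSize": bids["Size"].sum(),
        "TotalAskSize": asks["Size"].sum(),
    }

# Two toy snapshots (IDs 1 and 2) with two levels per side
book = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 2],
    "Price": [99.0, 100.0, 101.0, 102.0] * 2,
    "Size": [2.0, 1.0, 1.5, 3.0, 2.0, 0.5, 1.5, 3.5],
    "Side": ["Bid", "Bid", "Ask", "Ask"] * 2,
})

agg = pd.DataFrame(
    [{"ID": snap_id, **aggregate_snapshot(snap)} for snap_id, snap in book.groupby("ID")]
)

# "Change" features are first differences between consecutive snapshots
for col in ["TotalSize", "TotalBidSize", "TotalAskSize"]:
    agg[col + "Change"] = agg[col].diff().fillna(0.0)

print(agg)
```

Because `ID` is unique in `agg`, it can be joined back to the raw book with `book.merge(agg, on="ID")`.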

### 1.1 Data Acquisition Parameters

For the orderbook class (a Python class built to crawl orderbook data) we need to define the following parameters based on the `bybit API`

1. `symbol:` The symbol of the asset pair you wish to get data for
2. `category:` The market you want data from, e.g. spot, futures etc.
3. `depth:` How deep we want our order book to be on the bid and ask side
4. `testnet:` Whether we want to use the testnet or not

After initializing the orderbook class, we define the parameters that determine the nature of the data. The following parameters are needed

1. `frames:` The number of snapshots we want to get from the orderbook (we only consider frames for now; a time frame, which could be defined through the `time_delay` parameter, might be a better option)
2. `time_delay:` How long we wait before making another call (typically 1s, but this depends on the asset). If the delay is too long, we can miss some snapshots.

When we run the function in the OrderBook class, the orderbook data and aggregate data are generated
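To make the roles of `frames` and `time_delay` concrete, here is a rough sketch of a polling loop. It is not the `DataUtils.OrderBook` code: the helper names are hypothetical, the loop index is used as the snapshot `ID`, and the response layout (`result` with `s`, `ts`, `b`, `a`) is assumed from the Bybit V5 orderbook endpoint. The `fetch` argument is any callable that returns one response, so the example runs offline with a stub.

```python
import time
import pandas as pd

def snapshot_to_frame(response: dict, snapshot_id: int) -> pd.DataFrame:
    """Flatten one orderbook response into rows of Asset, Time, ID, Price, Size, Side."""
    result = response["result"]
    rows = []
    for side, key in (("Bid", "b"), ("Ask", "a")):   # assumed keys for bid/ask levels
        for price, size in result[key]:
            rows.append({
                "Asset": result["s"],
                "Time": result["ts"],
                "ID": snapshot_id,
                "Price": float(price),
                "Size": float(size),
                "Side": side,
            })
    return pd.DataFrame(rows)

def collect_snapshots(fetch, frames: int, time_delay: float) -> pd.DataFrame:
    """Call `fetch` `frames` times, waiting `time_delay` seconds between calls."""
    parts = []
    for i in range(frames):
        parts.append(snapshot_to_frame(fetch(), snapshot_id=i))
        if i < frames - 1:
            time.sleep(time_delay)
    return pd.concat(parts, ignore_index=True)

def fake_fetch() -> dict:
    """Offline stand-in for a real API call."""
    return {"result": {"s": "BTCUSDT", "ts": 0,
                       "b": [["100.0", "1.5"], ["99.5", "2.0"]],
                       "a": [["100.5", "1.0"], ["101.0", "3.0"]]}}

demo = collect_snapshots(fake_fetch, frames=3, time_delay=0.0)
print(demo.head())
```

With pybit, `fetch` could be something like `lambda: session.get_orderbook(category=category, symbol=symbol, limit=depth)` on an `HTTP(testnet=testnet)` session; the response layout should be checked against the API documentation before relying on it.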

In [107]:
import requests 
import json 
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt
import time
import numpy as np
from pybit.unified_trading import HTTP

#--- import data utility classes and functions-----
import DataUtils

In [109]:
# define parameters for data crawling and preprocessing

symbol = "BTCUSDT"
category = "spot"
depth = 40
testnet = True

frames = 100
time_delay = 0.5

Book = DataUtils.OrderBook(symbol=symbol, depth=depth, category = category)
OrderBookData = Book.getOrderBookData(frames=frames, time_delay=time_delay)
AggData = Book.AnalyticsData

The orderbook has 100 snapshorts


# 2. Visualizing orderbook data (demand and supply)

The orderbook gives us information about the current demand and supply of a given asset. It displays the `LIMIT BID AND SELL ORDERS` (orders that execute only at the specified price or better). To understand market action, we visualize the orderbook to see how demand and supply change over time and how this affects price movement. To do this we use __plotly__, a powerful Python visualization library. Before we visualize, keep in mind that:

1. For the Bid or Buy side, we expect higher demand at lower asset prices (these guys want to buy low)
2. For the Ask or Sell side, we expect higher supply at higher asset prices (these guys want to sell high)
3. `LIMIT BID or SELL ORDERS` can be cancelled, which changes the order size
4. When the Bid guys are desperate, or an aggressive buyer places `MARKET BUY ORDERS`, the price of the asset tends to rise
5. When the Ask guys are desperate, or an aggressive seller places `MARKET SELL ORDERS`, the price of the asset tends to fall

Note: The dataset only contains L1 and L2 data
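Points 4 and 5 can be illustrated with a toy example: a market buy order consumes the cheapest ask levels first, so the best ask left on the book moves up. The function below is a simplified sketch with made-up numbers, not exchange matching logic.

```python
def execute_market_buy(asks, quantity):
    """Fill `quantity` against ask levels sorted by ascending price.

    asks: list of (price, size) tuples.
    Returns (remaining_asks, average_fill_price).
    """
    remaining = []
    filled = 0.0
    cost = 0.0
    for price, size in asks:
        take = min(size, quantity - filled)   # how much of this level we consume
        filled += take
        cost += take * price
        if size - take > 0:
            remaining.append((price, size - take))
    average_price = cost / filled if filled else None
    return remaining, average_price

asks = [(100.5, 1.0), (101.0, 2.0), (101.5, 4.0)]
remaining, avg_price = execute_market_buy(asks, quantity=2.0)

print("best ask before:", asks[0][0])        # 100.5
print("best ask after: ", remaining[0][0])   # 101.0
print("average fill:   ", avg_price)         # 100.75
```

A market sell works the same way against the bid side, walking down from the highest bid.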


In [110]:
import plotly.express as px

OrderBookData= OrderBookData.astype(str)

fig = px.bar(OrderBookData, y='Size', x='Price', text='Size', animation_frame="ID", color='Side')
fig.update_traces(texttemplate='%{text:.2s}', textposition='outside')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide', title='L2 Orderbook Data Simulation of Price Vs Size',
    xaxis=dict(title='Asset Price'),
    yaxis=dict(title='Order Size'))

fig.update_layout(template='plotly_dark')
fig.show()

### 2.1 Binning prices

Since ask and bid prices are always changing dynamically and digital asset price quotes are not discretized to fixed values, we want to make bins so that we can see how the volume changes for a given price range

- This can help us capture price activity in a small price interval
- This can also help us visually see how changes in volume within a price range might move the price
- This keeps our orderbook price axis static, so we can visually see the movement of the price.

We bin the prices and visualize the price-volume movement shown above. To do this:

1. Define the number of bins (this determines the price interval of each bin). A good rule of thumb is to use a number of bins close to the depth of the orderbook. This choice is important: it can affect how we interpret the market, and it depends on the asset under analysis.
2. If you would like the bins to cover a predetermined price range, set the lower and upper bounds and compute the bin edges (we don't cover this here)
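One way such binning could be implemented is with `pandas.cut` over fixed edges. This is a sketch under assumptions, not the `DataUtils.BinPrices` code: the function name `bin_prices` is hypothetical, and the edges are taken from the overall minimum and maximum price so that every snapshot shares the same price axis.

```python
import numpy as np
import pandas as pd

def bin_prices(data: pd.DataFrame, bins: int) -> pd.DataFrame:
    """Sum order sizes per (ID, Side, price bin) using shared, equally spaced bins."""
    edges = np.linspace(data["Price"].min(), data["Price"].max(), bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2
    # Integer bin index for every row; include_lowest keeps the minimum price
    idx = pd.cut(data["Price"], bins=edges, labels=False, include_lowest=True)
    binned = data.assign(bin_price=centers[idx.to_numpy().astype(int)])
    return (
        binned.groupby(["ID", "Side", "bin_price"], as_index=False)["Size"]
        .sum()
        .rename(columns={"Size": "bin_size"})
    )

book = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1],
    "Price": [99.0, 99.4, 100.0, 100.6, 101.0],
    "Size": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Side": ["Bid", "Bid", "Bid", "Ask", "Ask"],
})

binned_book = bin_prices(book, bins=2)
print(binned_book)
```

To bin over a predetermined price range instead, the `edges` array could be built from chosen lower and upper bounds rather than from the data.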


In [98]:

bins = depth*10  # number of price bins; 10x the book depth gives finer bins than the rule of thumb

BinOrderBookData = DataUtils.BinPrices(data = OrderBookData, bins = bins)

In [99]:

fig = px.bar(BinOrderBookData, y='bin_size', x='bin_price', text='bin_size', animation_frame="ID", color='Side')
fig.update_traces(texttemplate='%{text:.2s}', textposition='outside')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide', title='L2 Orderbook Data Simulation of Bin Prices Vs Size',
    xaxis=dict(title='Asset Price'),
    yaxis=dict(title='Order Size'))

fig.update_layout(template='plotly_dark')
fig.show()


### 2.2 Binning prices with a histogram

We now visualize our orderbook using a histogram. The difference from the binning above is that a histogram bins the prices automatically. In the plot, we can define the number of bins
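Outside of a plot, the same idea can be reproduced with `numpy.histogram`, passing the order sizes as weights so that each bin holds the summed size rather than a count of price levels. The numbers below are made up for illustration.

```python
import numpy as np

prices = np.array([99.0, 99.4, 100.0, 100.6, 101.0])
sizes = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# weights=sizes sums the order size falling into each price bin
totals, edges = np.histogram(prices, bins=2, weights=sizes)

print(edges)   # bin edges chosen automatically from the price range
print(totals)  # total size per bin
```

Note that `numpy.histogram` treats bins as closed on the left (except the last one), so a price sitting exactly on an edge is counted in the upper bin.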

In [100]:
bins = depth*10
fig = px.histogram(OrderBookData, x="Price", y="Size", color="Side", marginal="rug", nbins=bins, hover_data=OrderBookData.columns, animation_frame="ID")
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide', title='Orderbook Histogram Data Simulation',
    xaxis=dict(title='Asset Price'),
    yaxis=dict(title='Size'))

fig.update_layout(template='plotly_dark')
fig.show()

# 3. EDA: Analysing Price Volume Dynamics 

In this section we take a deeper look at how volume and price features may affect the price of an asset. The `Aggregate Data` is used for this analysis. We do the following

1. Scatter Matrix: This shows us the relationship between our variables
2. Price Vs Volume: This gives us insight into how volume and price change over time
3. Price Vs Volume Change: This shows us the relationship between volume change and price over time
4. Price Vs Spread: This shows how price and spread might be related. This is because for a given market depth, there is only so much spread.

Note that we don't draw any conclusions here, as these factors depend on the coin and the state of the market.
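Alongside the charts, a numeric summary can help. The sketch below computes correlations between snapshot-to-snapshot changes, plus one lagged correlation (does the previous change in bid size relate to the current price change?). It runs on synthetic random-walk data standing in for `AggData`, so the printed numbers carry no market meaning; the column names follow the aggregate data but the lag of one snapshot is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for AggData (random walks, fixed seed for repeatability)
rng = np.random.default_rng(0)
n = 200
agg = pd.DataFrame({
    "MidPrice": 100 + rng.normal(0, 0.05, n).cumsum(),
    "TotalBidSize": 50 + rng.normal(0, 2, n).cumsum(),
    "TotalAskSize": 50 + rng.normal(0, 2, n).cumsum(),
})

# First differences between consecutive snapshots
changes = agg.diff().dropna().add_suffix("Change")

# Same-snapshot correlation matrix of the changes
corr = changes.corr()

# Correlation between the previous bid-size change and the current price change
lagged_corr = changes["TotalBidSizeChange"].shift(1).corr(changes["MidPriceChange"])

print(corr.round(3))
print("lagged correlation:", round(lagged_corr, 3))
```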

In [101]:

# ['ID','MidPrice',	'Spread','TotalSize','TotalSizeChange',	'TotalBidSize',	'TotalAskSize',	'TotalBidSizeChange', 'TotalAskSizeChange', 'L1BitPrice', 'L1AskPrice',	'L1BitSize', 'L1AskSize', 'L1BitSizeChange', 'L1AskSizeChange']
AggData['ID']= AggData['ID'].astype(str)
fig = px.scatter_matrix(AggData, dimensions=['MidPrice', 'Spread','TotalSize','TotalBidSize','TotalAskSize', 'L1BitPrice', 'L1AskPrice', 'L1BitSize', 'L1AskSize'], )#color="ID")

fig.update_layout(
    width=1800,  # specify the width in pixels
    height=1000,  # specify the height in pixels
    template='plotly_dark')
fig.show()

In [102]:
import plotly.graph_objects as go
fig = go.Figure()

# Add trace for y1
fig.add_trace(go.Scatter(x=AggData['ID'], y=AggData['MidPrice'], name='Mid Price'))

# Combine the y-values from TotalSize, TotalBidSize, and TotalAskSize
y_values = pd.concat([AggData['TotalSize'], AggData['TotalBidSize'], AggData['TotalAskSize']])

fig.add_trace(go.Scatter(x=AggData['ID'], y=AggData['TotalSize'], name='Total Size', yaxis='y2'))
fig.add_trace(go.Scatter(x=AggData['ID'], y=AggData['TotalBidSize'], name='Total Bid Size', yaxis='y3'))
fig.add_trace(go.Scatter(x=AggData['ID'], y=AggData['TotalAskSize'], name='Total Ask Size', yaxis='y4'))
right_y_range = [y_values.min(), y_values.max()]

# Configure the right y-axes
fig.update_layout(
    yaxis2=dict(title=dict(text='Total Size', font=dict(color='red')), side='right', overlaying='y', showgrid=False, range=right_y_range),
    yaxis3=dict(title=dict(text='Total Bid Size', font=dict(color='green')), side='right', overlaying='y', showgrid=False, range=right_y_range),
    yaxis4=dict(title=dict(text='Total Ask Size', font=dict(color='purple')), side='right', overlaying='y', showgrid=False, range=right_y_range),
    showlegend=True
)

fig.update_layout(margin=dict(l=50, r=50, t=50, b=50), 
                    yaxis2=dict(title_standoff=10), yaxis3=dict(title_standoff=25), yaxis4=dict(title_standoff=40), 
                    template='plotly_dark', title='Dynamics of Price Vs Order Size',)

# Show the chart
fig.show()


In [103]:

fig = go.Figure()

# Add trace for y1
fig.add_trace(go.Scatter(x=AggData['ID'], y=AggData['MidPrice'], name='Mid Price'))

# Combine the y-values from TotalSizeChange, TotalBidSizeChange, and TotalAskSizeChange
y_values = pd.concat([AggData['TotalSizeChange'], AggData['TotalBidSizeChange'], AggData['TotalAskSizeChange']])
fig.add_trace(go.Scatter(x=AggData['ID'], y=AggData['TotalSizeChange'], name='Total Size Change', yaxis='y2'))
fig.add_trace(go.Scatter(x=AggData['ID'], y=AggData['TotalBidSizeChange'], name='Total Bid Size Change', yaxis='y3'))
fig.add_trace(go.Scatter(x=AggData['ID'], y=AggData['TotalAskSizeChange'], name='Total Ask Size Change', yaxis='y4'))

right_y_range = [y_values.min(), y_values.max()]

# Configure the right y-axes
fig.update_layout(
    yaxis2=dict(title=dict(text='Total Size Change', font=dict(color='red')), side='right', overlaying='y', showgrid=False, range=right_y_range),
    yaxis3=dict(title=dict(text='Total Bid Size Change', font=dict(color='green')), side='right', overlaying='y', showgrid=False, range=right_y_range),
    yaxis4=dict(title=dict(text='Total Ask Size Change', font=dict(color='purple')), side='right', overlaying='y', showgrid=False, range=right_y_range),
    showlegend=True
)

fig.update_layout(margin=dict(l=50, r=50, t=50, b=50), 
                    yaxis2=dict(title_standoff=10), yaxis3=dict(title_standoff=25), yaxis4=dict(title_standoff=40), 
                    template='plotly_dark', title='Dynamics of Price Vs Order Size Change',)

# Show the chart
fig.show()


In [104]:

fig = go.Figure()

fig.add_trace(go.Scatter(x=AggData['ID'], y=AggData['MidPrice'], name='Mid Price', xaxis='x'))
fig.add_trace(go.Scatter(x=AggData['ID'], y=AggData['Spread'], name='Spread', yaxis='y2'))


# Configure the right y-axes
fig.update_layout(
    yaxis2=dict(title=dict(text='Spread', font=dict(color='red')), side='right', overlaying='y', showgrid=False),
    showlegend=True
)

# Adjust the spacing to avoid overlapping axis values
fig.update_layout(margin=dict(l=50, r=50, t=50, b=50),  
                    template='plotly_dark', title='Dynamics of Price Vs Spread',)

# Show the chart
fig.show()

In [105]:
import plotly.express as px
AggData["Spread"] = AggData["Spread"].astype(float)
# Create the scatter plot using Plotly Express
fig = px.scatter(AggData, y="MidPrice", x="Spread")
fig.update_layout(template='plotly_dark')

# Show the plot
fig.show()

# 4. Market Patterns 

Here we primarily use L1 data and its features, along with data aggregated over a predefined spread. By predefining a spread, we get an L1 orderbook aggregated to our spread size. The idea here is to

1. See how changes in volume and in L1 data affect prices
2. See how the total volume within a defined spread affects price

We define a spread data class to compute the features and filter the data. The class returns two datasets

1. L1 spread-aggregated orderbook data: this can give us insight into which spread size gives the best information on price movement for a particular asset.
2. L2 orderbook data filtered by spread

With these we can analyse patterns based on spread size and on the highest bid and lowest ask price.

Note that the L1 data was computed in the aggregated data in the orderbook class, with columns
- `L1BitPrice, L1AskPrice, L1BitSize, L1AskSize, L1BitSizeChange, L1AskSizeChange`
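For reference, L1 data is just the best quote on each side of every snapshot. A minimal sketch of how it could be derived from L2 rows is shown below; the function name `best_quotes` is hypothetical, the side labels `"Bid"`/`"Ask"` are assumed, and the output columns use the spelling `L1Bid...` rather than the notebook's `L1Bit...`.

```python
import pandas as pd

def best_quotes(book: pd.DataFrame) -> pd.DataFrame:
    """Extract the best bid and best ask (price and size) for each snapshot ID."""
    bids = book[book["Side"] == "Bid"]
    asks = book[book["Side"] == "Ask"]
    # Row with the highest bid / lowest ask within each snapshot
    best_bid = bids.loc[bids.groupby("ID")["Price"].idxmax(), ["ID", "Price", "Size"]]
    best_ask = asks.loc[asks.groupby("ID")["Price"].idxmin(), ["ID", "Price", "Size"]]
    best_bid = best_bid.rename(columns={"Price": "L1BidPrice", "Size": "L1BidSize"})
    best_ask = best_ask.rename(columns={"Price": "L1AskPrice", "Size": "L1AskSize"})
    l1 = best_bid.merge(best_ask, on="ID").sort_values("ID").reset_index(drop=True)
    # Size changes between consecutive snapshots
    l1["L1BidSizeChange"] = l1["L1BidSize"].diff().fillna(0.0)
    l1["L1AskSizeChange"] = l1["L1AskSize"].diff().fillna(0.0)
    return l1

book = pd.DataFrame({
    "ID": [1, 1, 1, 1, 2, 2, 2, 2],
    "Price": [99.0, 100.0, 101.0, 102.0] * 2,
    "Size": [2.0, 1.0, 1.5, 3.0, 2.0, 0.5, 1.5, 3.5],
    "Side": ["Bid", "Bid", "Ask", "Ask"] * 2,
})

l1 = best_quotes(book)
print(l1)
```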
    

In [106]:
# ['ID','MidPrice',	'L1BitPrice', 'L1AskPrice',	'L1BitSize', 'L1AskSize', 'L1BitSizeChange', 'L1AskSizeChange']

import plotly.graph_objects as go

# Create the line chart with left y-axis
fig = go.Figure()

# Add trace for y1
fig.add_trace(go.Scatter(x=AggData['ID'], y=AggData['MidPrice'], name='Mid Price'),)

# Combine the y-values from L1AskPrice and L1BitPrice
y_values = pd.concat([AggData['L1AskPrice'], AggData['L1BitPrice']])

fig.add_trace(go.Scatter(x=AggData['ID'], y=AggData['L1BitPrice'], name='L1 Bid Price', yaxis='y2'))
fig.add_trace(go.Scatter(x=AggData['ID'], y=AggData['L1AskPrice'], name='L1 Ask Price', yaxis='y3'))

right_y_range = [y_values.min(), y_values.max()]

# Configure the right y-axes
fig.update_layout(
    yaxis2=dict(title=dict(text='L1 Bid Price', font=dict(color='red')), side='right', overlaying='y', showgrid=False, range=right_y_range),
    yaxis3=dict(title=dict(text='L1 Ask Price', font=dict(color='green')), side='right', overlaying='y', showgrid=False, range=right_y_range),
    showlegend=True
)


# Adjust the spacing to avoid overlapping axis values
fig.update_layout(margin=dict(l=50, r=50, t=50, b=50),
                    yaxis2=dict(title_standoff=10), yaxis3=dict(title_standoff=25), yaxis4=dict(title_standoff=40),  
                    template='plotly_dark', title='Dynamics of Mid Price Vs L1 Prices',)
fig.update_yaxes(range=right_y_range)

# Show the chart
fig.show()

In [14]:

fig = go.Figure()

# Add trace for y1
fig.add_trace(go.Scatter(x=AggData['ID'], y=AggData['MidPrice'], name='Mid Price'))

# Combine the y-values from L1AskSize and L1BitSize
y_values = pd.concat([AggData['L1AskSize'], AggData['L1BitSize']])

fig.add_trace(go.Scatter(x=AggData['ID'], y=AggData['L1BitSize'], name='L1 Bid Size', yaxis='y2'))
fig.add_trace(go.Scatter(x=AggData['ID'], y=AggData['L1AskSize'], name='L1 Ask Size', yaxis='y3'))

right_y_range = [y_values.min(), y_values.max()]

# Configure the right y-axes
fig.update_layout(
    yaxis2=dict(title=dict(text='L1 Bid Size', font=dict(color='red')), side='right', overlaying='y', showgrid=False, range=right_y_range),
    yaxis3=dict(title=dict(text='L1 Ask Size', font=dict(color='green')), side='right', overlaying='y', showgrid=False, range=right_y_range),
    showlegend=True
)

# Adjust the spacing to avoid overlapping axis values
fig.update_layout(margin=dict(l=50, r=50, t=50, b=50),
                    yaxis2=dict(title_standoff=10), yaxis3=dict(title_standoff=25), yaxis4=dict(title_standoff=40),  
                    template='plotly_dark', title='Dynamics of Price Vs L1 Volume',)

# Show the chart
fig.show()

In [39]:

fig = go.Figure()

# Add trace for y1
fig.add_trace(go.Scatter(x=AggData['ID'], y=AggData['MidPrice'], name='Mid Price'))

# Combine the y-values from L1AskSizeChange and L1BitSizeChange
y_values = pd.concat([AggData['L1AskSizeChange'], AggData['L1BitSizeChange']])

fig.add_trace(go.Scatter(x=AggData['ID'], y=AggData['L1BitSizeChange'], name='L1 Bid Size Change', yaxis='y2'))
fig.add_trace(go.Scatter(x=AggData['ID'], y=AggData['L1AskSizeChange'], name='L1 Ask Size Change', yaxis='y3'))

right_y_range = [y_values.min(), y_values.max()]

# Configure the right y-axes
fig.update_layout(
    yaxis2=dict(title=dict(text='L1 Bid Change', font=dict(color='red')), side='right', overlaying='y', showgrid=False, range=right_y_range),
    yaxis3=dict(title=dict(text='L1 Ask Change', font=dict(color='green')), side='right', overlaying='y', showgrid=False, range=right_y_range),
    showlegend=True
)

# Adjust the spacing to avoid overlapping axis values
fig.update_layout(margin=dict(l=50, r=50, t=50, b=50),
                    yaxis2=dict(title_standoff=10), yaxis3=dict(title_standoff=25), yaxis4=dict(title_standoff=40),  
                    template='plotly_dark', title='Dynamics of Price Vs L1 Volume Change',)

# Show the chart
fig.show()

### 4.2 Computing a Desired Spread data

To compute data at a defined spread level we

1. Define the spread percentage around a known mid price
2. Compute the statistics using the data within that spread
3. Vary the spread size to see which provides the most insight.

The spread price step size is given by $ Step = MidPrice \times (SpreadPercentage/100) $. We use this to compute

1. Minimum Bid Price and Mean: 
    - $ MinBidPrice =  MidPrice - Step $
    - The mean is the average of the L2 Bid within the spread
2. Max Ask Price and Mean: 
    - $ MaxAskPrice =  MidPrice + Step $ 
    - The mean is the average of the L2 Ask within the spread

We obtain two datasets from the newly defined spread

1. Aggregated Data: Contains data for one snapshot with variables aggregated appropriately
2. Filtered Data: Contains the Data within the defined Spread

The aggregate data has the following columns
- `ID, Time, BidPrice, BidPriceChange, MidPrice, Spread, TotalSize, TotalSizeChange, TotalBidSize, TotalBidSizeChange, AskPrice, AskPriceChange, TotalAskSize, TotalAskSizeChange, SpredReduced`
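The step formula and the two price bounds can be sketched as a small filter over one snapshot. This is an illustration, not the `DataUtils.SpreadData` code: `filter_by_spread` is a hypothetical name, the bounds are treated as inclusive, and `BidPrice`/`AskPrice` are taken as simple unweighted means of the prices inside the band, which is one possible reading of "the mean" above.

```python
import pandas as pd

def filter_by_spread(snapshot: pd.DataFrame, mid_price: float, spread_percentage: float):
    """Keep only levels within MidPrice +/- Step and summarise them."""
    step = mid_price * (spread_percentage / 100)
    min_bid_price = mid_price - step
    max_ask_price = mid_price + step
    inside = snapshot[
        (snapshot["Price"] >= min_bid_price) & (snapshot["Price"] <= max_ask_price)
    ]
    bids = inside[inside["Side"] == "Bid"]
    asks = inside[inside["Side"] == "Ask"]
    summary = {
        "MidPrice": mid_price,
        "BidPrice": bids["Price"].mean(),     # assumed: unweighted mean
        "AskPrice": asks["Price"].mean(),     # assumed: unweighted mean
        "TotalBidSize": bids["Size"].sum(),
        "TotalAskSize": asks["Size"].sum(),
        "TotalSize": inside["Size"].sum(),
    }
    return inside, summary

snapshot = pd.DataFrame({
    "Price": [98.5, 99.2, 99.6, 100.4, 100.8, 101.5],
    "Size": [5.0, 2.0, 1.0, 1.5, 3.0, 4.0],
    "Side": ["Bid", "Bid", "Bid", "Ask", "Ask", "Ask"],
})

# A 1% spread around a mid price of 100 keeps prices between 99 and 101
filtered, summary = filter_by_spread(snapshot, mid_price=100.0, spread_percentage=1.0)
print(filtered)
print(summary)
```

Running the filter with several values of `spread_percentage` is one way to carry out step 3 above.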

In [43]:

OrderBookData['MidPrice'] = OrderBookData['MidPrice'].astype(float).round(4)
OrderBookData.head()

| Unnamed: 0 | Asset | Time | ID | Price | Size | Side | MidPrice | Spread |
|---|---|---|---|---|---|---|---|---|
| 39 | XRPUSDT | 1970-01-01 00:28:05.934180451 | 604934 | 0.5201 | 2132.32 | Bid | 0.5286 | 3.5185541242068794 |
| 38 | XRPUSDT | 1970-01-01 00:28:05.934180451 | 604934 | 0.5203 | 2474.18 | Bid | 0.5286 | 3.5185541242068794 |
| 37 | XRPUSDT | 1970-01-01 00:28:05.934180451 | 604934 | 0.5205 | 1447.54 | Bid | 0.5286 | 3.5185541242068794 |
| 36 | XRPUSDT | 1970-01-01 00:28:05.934180451 | 604934 | 0.5207 | 1359.3 | Bid | 0.5286 | 3.5185541242068794 |
| 35 | XRPUSDT | 1970-01-01 00:28:05.934180451 | 604934 | 0.5208 | 59.65 | Bid | 0.5286 | 3.5185541242068794 |


In [73]:
spread = 3.5/4

SpreadModel = DataUtils.SpreadData(OrderBookData, percentage=spread)
SpreadModel.getData()
LAggData = SpreadModel.AggregateData
FilteredSpreadData = SpreadModel.SpreadData

In [54]:

fig = go.Figure()

# Add trace for y1
fig.add_trace(go.Scatter(x=LAggData['ID'], y=LAggData['MidPrice'], name='Mid Price'))

y_values = pd.concat([LAggData['TotalSize'], LAggData['TotalBidSize'], LAggData['TotalAskSize']])

fig.add_trace(go.Scatter(x=LAggData['ID'], y=LAggData['TotalSize'], name='TotalSize', yaxis='y2'))
fig.add_trace(go.Scatter(x=LAggData['ID'], y=LAggData['TotalBidSize'], name='TotalBidSize', yaxis='y3'))
fig.add_trace(go.Scatter(x=LAggData['ID'], y=LAggData['TotalAskSize'], name='TotalAskSize', yaxis='y4'))


right_y_range = [y_values.min(), y_values.max()]

# Configure the right y-axes
fig.update_layout(
    yaxis2=dict(title=dict(text='TotalSize', font=dict(color='red')), side='right', overlaying='y', showgrid=False, range=right_y_range),
    yaxis3=dict(title=dict(text='TotalBidSize', font=dict(color='green')), side='right', overlaying='y', showgrid=False, range=right_y_range),
    yaxis4=dict(title=dict(text='TotalAskSize', font=dict(color='purple')), side='right', overlaying='y', showgrid=False, range=right_y_range),
    showlegend=True
)


# Adjust the position of the right y-axis titles
fig.update_layout(
    yaxis2=dict(title_standoff=10),
    yaxis3=dict(title_standoff=25),
    yaxis4=dict(title_standoff=40)
)
fig.update_layout(template='plotly_dark', title = "Dynamics of Price Vs L1 Aggregated Volume")

# Show the chart
fig.show()


# 5. Price Prediction

Here we want to build a simple regression model to predict the future MidPrice of an asset. We use XGBoost for our model, and you can play with any of our prepared datasets.

We can try the following datasets

1. OrderBookData 
2. AggData 
3. BinOrderBookData
4. LAggData
5. FilteredSpreadData 
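For the model to predict a future price, the label has to come from a later snapshot than the features. The cells in this section use the `MidPrice` of the same snapshot as the label, so one possible adjustment, sketched here under the assumption that snapshots are ordered by `ID`, is to shift the label back by a chosen horizon. The names `add_future_label`, `horizon`, and `FutureMidPrice` are illustrative and not part of `DataUtils`.

```python
import pandas as pd

def add_future_label(df: pd.DataFrame, label: str = "MidPrice", horizon: int = 1) -> pd.DataFrame:
    """Add a column holding the label value `horizon` snapshots ahead."""
    out = df.sort_values("ID").reset_index(drop=True)
    future_col = "Future" + label
    # shift(-horizon) aligns each row with the value `horizon` rows later
    out[future_col] = out[label].shift(-horizon)
    # The last `horizon` rows have no future value and are dropped
    return out.dropna(subset=[future_col])

demo = pd.DataFrame({"ID": [3, 1, 2], "MidPrice": [0.53, 0.51, 0.52]})
labelled = add_future_label(demo, horizon=1)
print(labelled)
```

With a shifted label like this, the current `MidPrice` can stay in the feature list, and `label` would be set to `'FutureMidPrice'`.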

In [74]:

from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

import warnings
# Ignore all warnings
warnings.filterwarnings("ignore")

In [75]:
MLData = LAggData.copy(deep=True)
print(MLData.columns)
print(MLData.dtypes)

Index(['ID', 'Time', 'BidPrice', 'BidPriceChange', 'MidPrice', 'Spread',
       'TotalSize', 'TotalSizeChange', 'TotalBidSize', 'TotalBidSizeChange',
       'AskPrice', 'AskPriceChange', 'TotalAskSize', 'TotalAskSizeChange',
       'SpredReduced'],
      dtype='object')
ID                     object
Time                   object
BidPrice              float64
BidPriceChange        float64
MidPrice              float64
Spread                float64
TotalSize             float64
TotalSizeChange       float64
TotalBidSize          float64
TotalBidSizeChange    float64
AskPrice              float64
AskPriceChange        float64
TotalAskSize          float64
TotalAskSizeChange    float64
SpredReduced            int64
dtype: object


In [76]:
floatCols = ['BidPrice', 'BidPriceChange', 'MidPrice', 'Spread',
       'TotalSize', 'TotalSizeChange', 'TotalBidSize', 'TotalBidSizeChange',
       'AskPrice', 'AskPriceChange', 'TotalAskSize', 'TotalAskSizeChange']
BoolCols = []
IntCols = ['SpredReduced']
Dropcols = []

features = floatCols + IntCols + BoolCols
label = 'MidPrice'

features.remove(label)

# Convert the columns to float
MLData[floatCols] = MLData[floatCols].astype(float).round(5)
MLData[IntCols] = MLData[IntCols].astype(int)
MLData[BoolCols] = MLData[BoolCols].astype(bool)
MLData = MLData.drop(columns=Dropcols)
features


['BidPrice',
 'BidPriceChange',
 'Spread',
 'TotalSize',
 'TotalSizeChange',
 'TotalBidSize',
 'TotalBidSizeChange',
 'AskPrice',
 'AskPriceChange',
 'TotalAskSize',
 'TotalAskSizeChange',
 'SpredReduced']

In [77]:
def getTrainSplitData(df, train_size = 0.7):
    """Split chronologically: the earliest `train_size` share of IDs is train, the rest is test."""
    df = df.copy()  # Avoid modifying the caller's DataFrame
    df['ID'] = df['ID'].astype(int)  # Convert ID column to int
    df = df.sort_values('ID')  # Sort the dataframe by ID
    df = df.reset_index(drop=True)  # Reset the index to match the sorted order
    split_index = int(len(df) * train_size)  # Position of the first test row
    split_id = df.iloc[split_index]['ID']  # ID at which the test set starts

    train = df.loc[df.ID < split_id]
    test = df.loc[df.ID >= split_id]

    return train, test

MLData['MidPrice'] = MLData['MidPrice'].astype(float).round(5)
train, test = getTrainSplitData(MLData, train_size = 0.6)

trace1 = go.Scatter(x=train['Time'], y=train['MidPrice'], name='Train', line=dict(color='blue'))
trace2 = go.Scatter(x=test['Time'], y=test['MidPrice'], name='Test', line=dict(color='green'))

# Combine the traces into a single figure
fig = go.Figure(data=[trace1, trace2])

# Configure the layout
fig.update_layout(
    title='Price Time Series',
    yaxis=dict(title='Mid Price'),
    xaxis=dict(title='Time', type='category', tickmode='linear')
)
fig.update_layout(template='plotly_dark')

fig.show()

In [78]:

X_train, y_train = train[features], train[label]
X_test, y_test = test[features], test[label]

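reg = XGBRegressor(base_score=0.5, booster='dart',   
                           n_estimators=1000,
                           early_stopping_rounds=50,
                           eval_metric='mae',  # set here; recent XGBoost versions no longer accept it in fit()
                           objective='reg:squarederror',
                           max_depth=3,
                           learning_rate=0.01)
# Early stopping monitors the last entry in eval_set (the test split here)
reg.fit(X_train, 
        y_train,
        eval_set=[(X_train, y_train), (X_test, y_test)],
        verbose=20)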

[0]	validation_0-mae:0.03011	validation_1-mae:0.03057
[20]	validation_0-mae:0.02464	validation_1-mae:0.02510
[40]	validation_0-mae:0.02016	validation_1-mae:0.02058
[60]	validation_0-mae:0.01650	validation_1-mae:0.01688
[80]	validation_0-mae:0.01351	validation_1-mae:0.01385
[100]	validation_0-mae:0.01105	validation_1-mae:0.01136
[120]	validation_0-mae:0.00905	validation_1-mae:0.00932
[140]	validation_0-mae:0.00740	validation_1-mae:0.00765
[160]	validation_0-mae:0.00606	validation_1-mae:0.00628
[180]	validation_0-mae:0.00496	validation_1-mae:0.00515
[200]	validation_0-mae:0.00406	validation_1-mae:0.00423
[220]	validation_0-mae:0.00333	validation_1-mae:0.00347
[240]	validation_0-mae:0.00272	validation_1-mae:0.00284
[260]	validation_0-mae:0.00223	validation_1-mae:0.00233
[280]	validation_0-mae:0.00183	validation_1-mae:0.00191
[300]	validation_0-mae:0.00149	validation_1-mae:0.00157
[320]	validation_0-mae:0.00122	validation_1-mae:0.00129
[340]	validation_0-mae:0.00100	validation_1-mae:0.0010

In [79]:
predictions = reg.predict(X_test)
test = test.reset_index().drop('index', axis=1)
test['predictions'] = pd.Series(predictions)
test['predictions'] = test['predictions'].round(5)

In [80]:
y_actual = test['MidPrice']
mse = mean_squared_error(y_actual, predictions)
rmse = mse**0.5
print("Root Mean Squared Error:", round(rmse,4) , "Mean Squared Error is ", round(mse,5))

Root Mean Squared Error: 0.0001 Mean Squared Error is  0.0
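An RMSE is easier to interpret next to a naive reference. One common choice, shown here as a sketch on made-up prices, is a persistence baseline that predicts each mid price as the previous snapshot's mid price; the helper `rmse` and the sample values are illustrative only.

```python
import numpy as np

def rmse(actual, predicted) -> float:
    """Root mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Made-up mid prices for four consecutive snapshots
mid_prices = np.array([0.5286, 0.5287, 0.5285, 0.5288])

# Persistence baseline: the prediction for snapshot t is the price at t-1
baseline_rmse = rmse(mid_prices[1:], mid_prices[:-1])
print("persistence baseline RMSE:", baseline_rmse)
```

A model that does not beat this baseline on the test split is adding little beyond "the price stays where it is".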


In [81]:


# Plotting the actual and predicted values using Plotly
fig = px.line(MLData, x='Time', y='MidPrice', title='Actual vs. Predicted Price')
fig.add_scatter(x=test['Time'], y=test['predictions'], mode='lines', name='Predictions')

# Customize the layout
fig.update_layout(
    title='Price Time Series',template='plotly_dark',
    yaxis=dict(title='Mid Price'),
    xaxis=dict(title='Time', type='category', tickmode='linear')
)

# Show the plot
fig.show()


# 6. Conclusion

We draw our conclusions in this section with regard to a simple market making strategy. Consider a strategy in which limit buy and limit sell orders are placed on the order book at prices relative to the mid-price, with spreads determined by bid_spread and ask_spread. At regular time intervals, the orders are refreshed with new spreads and order amounts. The strategy aims to provide liquidity and profit from the bid-ask spread by continuously adjusting and maintaining orders on the order book. Keep in mind that a market has several market makers, so we are competing to provide liquidity while making a profit. From exploratory analysis to predictive machine learning, we have seen patterns emerge from volume action and spread that provide insight into market movement over time.

- Orderbook analysis: From an EDA standpoint, the L2 analysis in section 2 makes it hard to conclude with certainty in which direction volume drives the market. However, using the binned L2 prices and the price-volume histogram, we can see patterns emerge from the LOB visuals: most of the time, when supply is high at low ask prices, the asset price is likely to fall. On the other hand, when demand is high at high bid prices, the market tends to move upwards. This is ultimately demand and supply at play, and understanding these dynamics allows a market making strategy to fill limit ask and bid orders of given sizes within given price ranges and spreads.

- Price Volume Dynamics: From analysing price volume dynamics in section 3, ask and bid volume appear to influence price movement. The effect is clearer with L1 price changes, although those changes are very dynamic. We can see that a fall in demand at high supply pulls down the price, and vice versa. This gives us room to anticipate market movement: when demand is too high, the market making inventory can be adjusted and a new ask-side order at higher prices can be placed. Also in section 3, the scatter matrix and line chart show that, for the same orderbook depth, an increase in the spread is correlated with the mid-price. Observing changes to the spread will be useful for a market making strategy to determine its ask and bid spread for an asset at any given moment.

- Spread tightening and Orderbook Adjustment: In section 4 we dynamically adjust a spread and compute an L1-style orderbook from the original orderbook using statistical features. From this we can see at which spread patterns emerge most readily, and so determine the spread at which a market maker's bid-side and ask-side orders should be placed. This is an important analysis step for a market making strategy and should be tuned carefully for a given asset, as false signals can easily arise due to outliers.

- Price Prediction for Market Making: In section 5, we perform price prediction using our data. This allows us to anticipate the movement of the mid-price, which is valuable information for a market making strategy. It informs us about which orders to close and which new orders to place at competitive prices before the market moves in that direction.

- Market Making as a Dynamic Decision-Making Process: In conclusion, market making is a decision-making process. By obtaining valuable information about market demand and supply at specific prices, resources can be allocated to meet these demands and supplies, aiming to avoid resource wastage and financial losses.
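The quoting rule described at the start of this section can be sketched in a few lines. This is a toy illustration, not a trading implementation: `make_quotes` is a hypothetical helper, and `bid_spread` and `ask_spread` are assumed to be fractions of the mid-price (0.001 means 0.1%).

```python
def make_quotes(mid_price: float, bid_spread: float, ask_spread: float, order_amount: float) -> dict:
    """Place one limit buy below and one limit sell above the mid-price."""
    return {
        "bid_price": mid_price * (1 - bid_spread),
        "ask_price": mid_price * (1 + ask_spread),
        "amount": order_amount,
    }

# Refresh the quotes each time a new mid-price is observed (made-up values)
mid_prices = [100.0, 100.4, 99.8]
quotes = [
    make_quotes(mid, bid_spread=0.001, ask_spread=0.002, order_amount=0.5)
    for mid in mid_prices
]

for mid, quote in zip(mid_prices, quotes):
    print(mid, quote)
```

In a fuller strategy, the spreads and amounts passed on each refresh would be the place where the volume, spread, and prediction signals from the earlier sections feed in.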

In [123]:
def compound_interest(principal, interest_rate, compounding_periods, time):
    """
    Calculates the compound interest.
    
    Arguments:
    principal -- the principal amount (initial investment or loan amount)
    interest_rate -- the annual interest rate (as a decimal)
    compounding_periods -- the number of times interest is compounded per year
    time -- the number of years
    
    Returns:
    The final amount (including principal and interest)
    """
    amount = principal * (1 + (interest_rate / compounding_periods)) ** (compounding_periods * time)
    return amount


principal_amount = 10000
annual_interest_rate = 0.05
compounding_periods = 2
years = 10

final_amount = compound_interest(principal_amount, annual_interest_rate, compounding_periods, years)
print("The final amount after", years, "years is:", final_amount)


The final amount after 10 years is: 16386.16440290394
