## Features: quantitative indicators

## Features
- autocorrelation
- order book imbalance

__Load data__

__Data Source:__ lob_sample_data.parquet

In [1]:
import pandas as pd

df = pd.read_parquet('lob_sample_data.parquet', engine='pyarrow')

In [2]:
df.head()

| Unnamed: 0 | Timestamp | Exchange | Bid | Ask | Date | Mid_Price |
|---|---|---|---|---|---|---|
| 0 | 0.0 | Exch0 | [] | [] | 2025-01-02 | NaN |
| 1 | 0.279 | Exch0 | [[1, 6]] | [] | 2025-01-02 | NaN |
| 2 | 1.333 | Exch0 | [[1, 6]] | [[800, 1]] | 2025-01-02 | 400.5 |
| 3 | 1.581 | Exch0 | [[1, 6]] | [[799, 1]] | 2025-01-02 | 400.0 |
| 4 | 1.643 | Exch0 | [[1, 6]] | [[798, 1]] | 2025-01-02 | 399.5 |


In [3]:
import ast

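# Bid/Ask hold string-encoded lists of [price, quantity] levels; parse them into lists.
# Values that are already lists are passed through, so re-running the cell is safe.
df['Bid'] = df['Bid'].apply(lambda s: ast.literal_eval(s) if isinstance(s, str) else s)
df['Ask'] = df['Ask'].apply(lambda s: ast.literal_eval(s) if isinstance(s, str) else s)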

In [4]:
# Drop rows with a missing mid price (e.g. when one side of the book is empty)
df = df.dropna(subset=['Mid_Price'])
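Rows 0 and 1 in the `head()` output have an empty ask side and no `Mid_Price`, and the remaining rows match the midpoint of the best bid and best ask (for row 2, (1 + 800) / 2 = 400.5). The sketch below assumes that is how `Mid_Price` was built and recomputes it for the rows shown above; `mid_price` is a hypothetical helper, not part of the dataset.

```python
import math


def mid_price(bid_levels, ask_levels):
    """Midpoint of the best bid and best ask, or NaN if either side is empty.

    Assumes each level is a [price, quantity] pair, as in the Bid/Ask columns.
    """
    if not bid_levels or not ask_levels:
        return math.nan
    best_bid = max(price for price, _ in bid_levels)  # highest buy price
    best_ask = min(price for price, _ in ask_levels)  # lowest sell price
    return (best_bid + best_ask) / 2


# Bid/Ask levels from the five rows of df.head() above
bids = [[], [[1, 6]], [[1, 6]], [[1, 6]], [[1, 6]]]
asks = [[], [], [[800, 1]], [[799, 1]], [[798, 1]]]
mids = [mid_price(b, a) for b, a in zip(bids, asks)]
print(mids)  # [nan, nan, 400.5, 400.0, 399.5]
```

On the full DataFrame this could be applied row-wise before the `dropna` step, e.g. `df.apply(lambda r: mid_price(r['Bid'], r['Ask']), axis=1)`, and compared against `Mid_Price`.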

__Autocorrelation__

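Autocorrelation measures the similarity between observations as a function of the time lag between them. Applied to price changes, it can capture momentum (positive autocorrelation: moves tend to continue) or mean reversion (negative autocorrelation: moves tend to reverse).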

For a lag $k$, the sample autocorrelation is defined as

$$
\rho_k = \frac{\sum_{t=1}^{T-k}(y_t - \bar{y})(y_{t+k} - \bar{y})}{\sum_{t=1}^{T}(y_t - \bar{y})^2}
$$

where:
- $y_t$ is the observation at time $t$
- $\bar{y}$ is the mean of the observations
- $T$ is the total number of observations
- $k$ is the lag
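The definition above translates directly into NumPy. This minimal sketch (`autocorr_rho` is a hypothetical helper name) follows the formula term by term: deviations from the overall mean $\bar{y}$ are multiplied at lag $k$ and normalized by the total sum of squares.

```python
import numpy as np


def autocorr_rho(y, k):
    """Sample autocorrelation rho_k as defined above.

    Numerator: sum over t = 1..T-k of (y_t - ybar) * (y_{t+k} - ybar).
    Denominator: sum over t = 1..T of (y_t - ybar)^2.
    """
    y = np.asarray(y, dtype=float)
    T = len(y)
    if not 0 < k < T:
        raise ValueError("lag k must satisfy 0 < k < T")
    d = y - y.mean()  # deviations from the overall mean
    return np.sum(d[:T - k] * d[k:]) / np.sum(d ** 2)


y = [1.0, 2.0, 3.0, 4.0, 5.0]
print(autocorr_rho(y, 1))  # 0.4
```

For comparison, pandas' `Series.autocorr(lag)` (used in the cell below) computes the Pearson correlation between the series and its lagged copy, centering each segment on its own mean, so `pd.Series(y).autocorr(1)` gives a correlation of 1 for the same input rather than 0.4. The two estimates should be close on long, stationary series but can differ noticeably on short or trending windows.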

As a feature, it lets a model pick up cyclical patterns or the persistence of trends in price movements.
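The sign interpretation can be checked on two small synthetic price paths (made up for illustration, not taken from the dataset). Following the usual convention, the autocorrelation here is taken on price changes (`diff()`) rather than price levels: changes that keep alternating in sign give a negative lag-1 autocorrelation (mean reversion), while changes that come in runs give a positive one (momentum).

```python
import pandas as pd

# Mean-reverting toy path: price changes alternate +1, -1, +1, ...
bounce = pd.Series([100, 101, 100, 101, 100, 101, 100, 101], dtype=float)
# Trending toy path: a run of +1 changes followed by a run of -1 changes
trend = pd.Series([100, 101, 102, 103, 104, 103, 102, 101, 100], dtype=float)

results = {}
for name, prices in [('bounce', bounce), ('trend', trend)]:
    changes = prices.diff().dropna()  # first difference; drop the leading NaN
    results[name] = changes.autocorr(lag=1)

print(results)  # bounce is about -1, trend is about 0.75
```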

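In [None]:
# Rolling autocorrelation of the mid price; lags count order-book updates, not seconds.
# The window must be much longer than the lag: with window=lag+1 each window holds a
# single (y_t, y_{t+lag}) pair, so autocorr() always returned NaN.
WINDOW = 100  # assumed window length; tune as needed
for lag in [10, 20]:  # lags can be adjusted
    # raw=False passes each window as a Series, so Series.autocorr is available
    df[f'Autocorr_Lag_{lag}'] = df['Mid_Price'].rolling(window=WINDOW).apply(
        lambda x: x.autocorr(lag), raw=False)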
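`rolling().apply()` with `raw=False` calls a Python function once per row, which can be slow over a full day of order-book updates. A possible vectorized alternative, sketched below under a few assumptions, is pandas' built-in rolling correlation between a series and its shifted copy. `rolling_autocorr` is a hypothetical helper, and the demo runs on a synthetic random walk rather than the dataset. It is not numerically identical to the cell above: here every window holds `window` pairs $(y_t, y_{t-k})$, whereas `Series.autocorr` inside a window of length `window` uses only `window - k` pairs.

```python
import numpy as np
import pandas as pd


def rolling_autocorr(series: pd.Series, lag: int, window: int) -> pd.Series:
    """Rolling lag-`lag` autocorrelation: correlation of y_t with y_{t-lag}
    over the most recent `window` pairs, computed with Rolling.corr."""
    return series.rolling(window).corr(series.shift(lag))


# Synthetic random-walk prices for illustration (not the LOB sample data)
rng = np.random.default_rng(0)
prices = pd.Series(100 + rng.standard_normal(500).cumsum())
changes = prices.diff()  # NaN in the first position
ac = rolling_autocorr(changes, lag=10, window=100)
print(ac.dropna().describe())
```

On the notebook data this would read `df[f'Autocorr_Lag_{lag}'] = rolling_autocorr(df['Mid_Price'], lag, 100)`. Whether to use price levels, as the cell above does, or price changes, as in this demo, is a modeling choice.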

__Order book imbalance__

Order book imbalance compares the total quantity resting on the buy (bid) side with that on the sell (ask) side, and can indicate potential price movements driven by supply and demand.

It is calculated as:

$$
\text{Imbalance} = \frac{Q_{\text{bid}} - Q_{\text{ask}}}{Q_{\text{bid}} + Q_{\text{ask}}}
$$

where:
- $Q_{\text{bid}}$ is the total quantity of buy orders, summed over all bid levels,
- $Q_{\text{ask}}$ is the total quantity of sell orders, summed over all ask levels.

Positive imbalance suggests a predominance of buy orders, which could indicate upward pressure on prices, while a negative imbalance suggests the opposite.
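Since both quantities are non-negative, the numerator never exceeds the denominator in absolute value, so the imbalance always lies between -1 (only ask quantity) and +1 (only bid quantity). A minimal scalar sketch of the formula, with a hypothetical helper name and made-up quantities:

```python
def order_book_imbalance(q_bid: float, q_ask: float) -> float:
    """(Q_bid - Q_ask) / (Q_bid + Q_ask); NaN when both sides are empty."""
    total = q_bid + q_ask
    if total == 0:
        return float('nan')  # undefined: no quantity on either side
    return (q_bid - q_ask) / total


print(order_book_imbalance(3, 1))  # 0.5: bid-heavy book, positive imbalance
print(order_book_imbalance(2, 6))  # -0.5: ask-heavy book, negative imbalance
print(order_book_imbalance(5, 0))  # 1.0: no ask quantity, the upper bound
print(order_book_imbalance(0, 4))  # -1.0: no bid quantity, the lower bound
```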

In [7]:
import numpy as np

# Sum the quantities over all price levels on one side of the book.
# Each level is a [price, quantity] pair; an empty side sums to 0.
def total_quantity(price_qty_list):
    if not price_qty_list:
        return 0
    return sum(qty for _, qty in price_qty_list)

# Total resting quantity on the bid and ask sides
df['Total_Bid_Qty'] = df['Bid'].apply(total_quantity)
df['Total_Ask_Qty'] = df['Ask'].apply(total_quantity)

# Order book imbalance in [-1, 1]; a zero denominator (both sides empty) gives NaN
total_qty = df['Total_Bid_Qty'] + df['Total_Ask_Qty']
df['Order_Book_Imbalance'] = (df['Total_Bid_Qty'] - df['Total_Ask_Qty']) / total_qty.replace(0, np.nan)
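As a quick sanity check, the same steps can be run on rows 2 to 4 of the `head()` output, which are the displayed rows that survive `dropna`. Each has 6 units on the bid and 1 on the ask, so the expected imbalance is (6 - 1) / (6 + 1) = 5/7, roughly 0.714, i.e. a bid-heavy book. The small DataFrame `sample` (a hypothetical name) is copied from that output, and `total_quantity` mirrors the cell above so the block runs on its own.

```python
import numpy as np
import pandas as pd


def total_quantity(price_qty_list):
    # Sum quantities over [price, quantity] levels; an empty side sums to 0
    return sum(qty for _, qty in price_qty_list) if price_qty_list else 0


# Rows 2-4 of df.head() above, already parsed into lists
sample = pd.DataFrame({
    'Bid': [[[1, 6]], [[1, 6]], [[1, 6]]],
    'Ask': [[[800, 1]], [[799, 1]], [[798, 1]]],
})
sample['Total_Bid_Qty'] = sample['Bid'].apply(total_quantity)
sample['Total_Ask_Qty'] = sample['Ask'].apply(total_quantity)
total = sample['Total_Bid_Qty'] + sample['Total_Ask_Qty']
sample['Order_Book_Imbalance'] = (
    (sample['Total_Bid_Qty'] - sample['Total_Ask_Qty']) / total.replace(0, np.nan)
)
print(sample['Order_Book_Imbalance'].tolist())
```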