# Feature Engineering

### Alihmud Illiquidity
Alihmud Illiquidity is a measure of the cost of trading a security. It is calculated as the absolute value of the return divided by the dollar volume, adjusted by the closing price.

$$
\text{Amihud\_illiquidity} = \frac{|\Delta \text{LR}|}{\text{Volume} \times \text{Close}}
$$

Reference: Amihud, Y. (2002). Illiquidity and stock returns: cross-section and time-series effects. Journal of Financial Markets, 5(1), 31-56.

### TAQ Lambda (Kyle's Lambda)
Kyle's Lambda measures the elasticity of the price to the order flow. It is estimated by regressing the price change on the square root of the volume.

$$
\log(\Delta P) = \lambda \times \text{trade\_sign} \times \sqrt{V} + \epsilon
$$

Reference: Kyle, A. S. (1985). Continuous auctions and insider trading. Econometrica: Journal of the Econometric Society, 1315-1335.

### Proportion of Zero Returns
Proportion of Zero Returns is the number of transactions with 0 price change divided by the total number of transactions in a given window.

$$
\text{prop\_zero} = \frac{\sum_{i=1}^{n} \mathbb{1}(\Delta \text{Close}_i = 0)}{n}
$$

where $n$ is the window size.

### Modified Roll Measure
Modified Roll Measure is the square root of the negative covariance of the log of the price change, setting positive autocovariance to 0.

$$
\text{modified\_roll} = \sqrt{-\text{Cov}(\Delta \log(\text{Price}), \Delta \log(\text{Price}_{t-1}))}
$$

Reference: Roll, R. (1984). A simple implicit measure of the effective bid-ask spread in an efficient market. The Journal of Finance, 39(4), 1127-1139.

### Liquidity Replenishment Rate
Liquidity Replenishment Rate is the rate at which liquidity is being replenished, calculated as the difference in total size divided by the total size, over a specified window.

$$
\text{liquidity\_replenishment\_rate} = \frac{\text{Total\_Size}_{t} - \text{Total\_Size}_{t-1}}{\text{Total\_Size}}
$$

where Total_Size is the sum of bid and ask sizes.

These formulas and concepts are derived from various academic papers and are used to calculate liquidity indicators in financial markets.

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import talib as ta


In [4]:
import pandas as pd
import numpy as np
import talib as ta

class TechnicalIndicators:
    def __init__(self, data):
        self.data = data

    def add_momentum_indicators(self):
        self.data['RSI'] = ta.RSI(self.data['Close'], timeperiod=14)
        self.data['MACD'], self.data['MACD_signal'], self.data['MACD_hist'] = ta.MACD(self.data['Close'], fastperiod=12, slowperiod=26, signalperiod=9)
        self.data['Stoch_k'], self.data['Stoch_d'] = ta.STOCH(self.data['High'], self.data['Low'], self.data['Close'],
                                                              fastk_period=14, slowk_period=3, slowd_period=3)

    def add_volume_indicators(self):
        self.data['OBV'] = ta.OBV(self.data['Close'], self.data['Volume'])

    def add_volatility_indicators(self):
        self.data['Upper_BB'], self.data['Middle_BB'], self.data['Lower_BB'] = ta.BBANDS(self.data['Close'], timeperiod=20)
        self.data['ATR_1'] = ta.ATR(self.data['High'], self.data['Low'], self.data['Close'], timeperiod=1)
        self.data['ATR_2'] = ta.ATR(self.data['High'], self.data['Low'], self.data['Close'], timeperiod=2)
        self.data['ATR_5'] = ta.ATR(self.data['High'], self.data['Low'], self.data['Close'], timeperiod=5)
        self.data['ATR_10'] = ta.ATR(self.data['High'], self.data['Low'], self.data['Close'], timeperiod=10)
        self.data['ATR_20'] = ta.ATR(self.data['High'], self.data['Low'], self.data['Close'], timeperiod=20)

    def add_trend_indicators(self):
        self.data['ADX'] = ta.ADX(self.data['High'], self.data['Low'], self.data['Close'], timeperiod=14)
        self.data['+DI'] = ta.PLUS_DI(self.data['High'], self.data['Low'], self.data['Close'], timeperiod=14)
        self.data['-DI'] = ta.MINUS_DI(self.data['High'], self.data['Low'], self.data['Close'], timeperiod=14)
        self.data['CCI'] = ta.CCI(self.data['High'], self.data['Low'], self.data['Close'], timeperiod=5)

    def add_other_indicators(self):
        self.data['DLR'] = np.log(self.data['Close'] / self.data['Close'].shift(1))
        self.data['TWAP'] = self.data['Close'].expanding().mean()
        self.data['VWAP'] = (self.data['Volume'] * (self.data['High'] + self.data['Low']) / 2).cumsum() / self.data['Volume'].cumsum()

    def add_all_indicators(self):
        self.add_momentum_indicators()
        self.add_volume_indicators()
        self.add_volatility_indicators()
        self.add_trend_indicators()
        self.add_other_indicators()
        return self.data

In [5]:
data = pd.read_csv('xnas-itch-20230703.tbbo.csv')

# Preprocessing to create necessary columns
data['price']=data['price']/1e9
data['bid_px_00']=data['bid_px_00']/1e9
data['ask_px_00']=data['ask_px_00']/1e9

data['Close'] = data['price']
data['Volume'] = data['size']
data['High'] = data[['bid_px_00', 'ask_px_00']].max(axis=1)
data['Low'] = data[['bid_px_00', 'ask_px_00']].min(axis=1)
data['Open'] = data['Close'].shift(1).fillna(data['Close'])


ti = TechnicalIndicators(data)
df_with_indicators = ti.add_all_indicators()
market_features_df = df_with_indicators[35:]

In [6]:
from statsmodels.regression.rolling import RollingOLS

class AdditionalLiquidityIndicators:
    def __init__(self, data):
        self.data = data

    #Alihmud illiquidity is a measure of the cost of trading a security. It is calculated as the absolute value of the return divided by the dollar volume.
    def add_alihumd_illiquidity(self):
        self.data['Amihud_illiquidity'] = abs(self.data['DLR']) / self.data['Volume']*self.data['Close']

    #Kyle 1985 lambda measures the elasticity of the price. it's measured by regrssing the price change on the square root of the volume.
    #log(delta P) = lambda * trade_sign * np.sqrt(V) + epsilon
    #volatility is aggregated over a 5 minute window
    def TAQ_lambda(self):

        self.data['trade_sign'] = [1 if 'B' in x else -1 for x in self.data['side']]
        self.data['TAQ_lambda'] = RollingOLS(self.data['DLR'], self.data['trade_sign']*np.sqrt(self.data['Volume']), window=5).fit().params
    
    #Prop zero is the number of transactions with 0 price change divided by the total number of transactions in a given window.
    def prop_zero(self,window=5):
       self.data['prop_zero'] = self.data['Close'].diff().rolling(window=window).apply(lambda x: len(x[x==0])/len(x))

    #Modified Roll measures the square root of the negative autocovariance of the log of the price change and sets positive autocovariance to 0.
    #roll
    def modified_roll(self):
        self.data['log_price'] = np.log(self.data['Close'])
        self.data['modified_roll'] = np.sqrt(-self.data['log_price'].diff().cov(self.data['log_price'].shift(1).diff()))

    #The liquidity replenishment rate is the rate at which liquidity is being replenished. It is calculated as the difference in total size divided by the total size.
    def add_liquidity_replenishment_indicator(self,window):
        total_size = self.data['bid_sz_00'] + self.data['ask_sz_00']
        replenishment = total_size.diff()
        self.data['liquidity_replenishment_rate']=replenishment.rolling(window=window).mean() / total_size.rolling(window=window).mean()

    
    #Arrival Price is 
    def add_all_indicators(self):
        self.add_liquidity_replenishment_indicator(5)
        self.add_alihumd_illiquidity()
        self.TAQ_lambda()
        self.prop_zero()
        self.modified_roll()
        return self.data
  
    
ai = AdditionalLiquidityIndicators(market_features_df)
df_with_added_indicators = ai.add_all_indicators()
df_with_added_indicators = df_with_added_indicators.dropna()

df_with_added_indicators.head(35)
    

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.data['liquidity_replenishment_rate']=replenishment.rolling(window=window).mean() / total_size.rolling(window=window).mean()
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.data['Amihud_illiquidity'] = abs(self.data['DLR']) / self.data['Volume']*self.data['Close']
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-

Unnamed: 0,ts_recv,ts_event,rtype,publisher_id,instrument_id,action,side,depth,price,size,...,DLR,TWAP,VWAP,liquidity_replenishment_rate,Amihud_illiquidity,trade_sign,TAQ_lambda,prop_zero,log_price,modified_roll
40,1688371229849940201,1688371229849775570,1,2,32,T,B,0,194.13,100,...,0.0,194.033659,194.028188,-0.191297,0.0,1,1.990793e-05,0.4,5.268528,8e-06
41,1688371230451172473,1688371230451005195,1,2,32,T,N,0,194.02,10,...,-0.000567,194.033333,194.02846,-0.235437,0.010997,-1,3.839841e-05,0.2,5.267961,8e-06
42,1688371230451172473,1688371230451005195,1,2,32,T,A,0,194.01,1,...,-5.2e-05,194.032791,194.028488,-0.309476,0.01,-1,2.153437e-05,0.2,5.26791,8e-06
43,1688371230451172473,1688371230451005195,1,2,32,T,A,0,194.01,100,...,0.0,194.032273,194.031073,-0.546232,0.0,-1,7.606348e-06,0.4,5.26791,8e-06
44,1688371230451995982,1688371230451829005,1,2,32,T,A,0,194.0,3075,...,-5.2e-05,194.031556,194.05967,0.003266,3e-06,-1,1.430982e-06,0.4,5.267858,8e-06
45,1688371230451995982,1688371230451829005,1,2,32,T,A,0,194.0,4,...,0.0,194.03087,194.059686,-0.0086,0.0,-1,1.474046e-06,0.4,5.267858,8e-06
46,1688371230451995982,1688371230451829005,1,2,32,T,A,0,194.0,5,...,0.0,194.030213,194.059705,-0.011288,0.0,-1,9.136121e-07,0.6,5.267858,8e-06
47,1688371230451995982,1688371230451829005,1,2,32,T,A,0,194.0,16,...,0.0,194.029583,194.059766,-0.012364,0.0,-1,8.932225e-07,0.8,5.267858,8e-06
48,1688371230566546422,1688371230566381995,1,2,32,T,N,0,194.09,10,...,0.000464,194.030816,194.059805,-0.015408,0.009002,-1,4.47465e-07,0.6,5.268322,8e-06
49,1688371237858109689,1688371237857944791,1,2,32,T,B,0,194.12,10,...,0.000155,194.0326,194.059824,-1.370153,0.003,1,-2.173219e-05,0.6,5.268477,8e-06
