## Feature Engineering_Trend

## Features

- SMA/EMA are not included here, as they will be done as part of noise suppression.
- The aim is a blend of indicators that help us identify new trends, confirm existing trends and maybe suggest trend reversals.
- Moving average convergence/divergence oscillator
- Hull moving average 
- Keltner channels
- Detrended price oscillator

__Load data__

__Data Source:__ lob_sample_data.parquet

In [1]:
import pandas as pd

df = pd.read_parquet('lob_sample_data.parquet', engine='pyarrow')

In [2]:
df.head()

| | Timestamp | Exchange | Bid | Ask | Date | Mid_Price |
|---|---|---|---|---|---|---|
| 0 | 0.0 | Exch0 | [] | [] | 2025-01-02 | |
| 1 | 0.279 | Exch0 | [[1, 6]] | [] | 2025-01-02 | |
| 2 | 1.333 | Exch0 | [[1, 6]] | [[800, 1]] | 2025-01-02 | 400.5 |
| 3 | 1.581 | Exch0 | [[1, 6]] | [[799, 1]] | 2025-01-02 | 400.0 |
| 4 | 1.643 | Exch0 | [[1, 6]] | [[798, 1]] | 2025-01-02 | 399.5 |


In [3]:
import ast

#convert the string representations of the order book sides to lists
df['Bid'] = df['Bid'].apply(ast.literal_eval)
df['Ask'] = df['Ask'].apply(ast.literal_eval)

In [4]:
#drop missing rows in mid price
df = df.dropna(subset=['Mid_Price'])

__Moving average convergence/divergence oscillator__

MACD is a trend-following momentum indicator that shows the relationship between two moving averages of a security’s price.

$$
\text{MACD} = EMA_{12}(\text{price}) - EMA_{26}(\text{price})
$$
$$
\text{Signal Line} = EMA_{9}(\text{MACD})
$$

where
- $EMA_{12}$ and $EMA_{26}$ are the exponential moving averages for 12 and 26 periods, respectively.
- The Signal Line is the exponential moving average of the MACD itself.

In [5]:
#calc macd and signal

#12, 26 and 9 are the conventional default spans
ema_12 = df['Mid_Price'].ewm(span=12, adjust=False).mean()
ema_26 = df['Mid_Price'].ewm(span=26, adjust=False).mean()

df['MACD'] = ema_12 - ema_26
df['MACD_Signal'] = df['MACD'].ewm(span=9, adjust=False).mean()
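
One way to turn the MACD and signal line into trend-change flags is to look at where their difference (often called the MACD histogram) changes sign. The sketch below is illustrative only: `macd_crossovers` is a hypothetical helper, the toy price series is made up, and the crossover rule is a common convention rather than something this notebook prescribes.

```python
import pandas as pd

def macd_crossovers(price, fast=12, slow=26, signal=9):
    """Return MACD, signal line, histogram and crossover flags for a price series."""
    ema_fast = price.ewm(span=fast, adjust=False).mean()
    ema_slow = price.ewm(span=slow, adjust=False).mean()
    macd = ema_fast - ema_slow
    signal_line = macd.ewm(span=signal, adjust=False).mean()
    hist = macd - signal_line

    # A crossover is a change in the sign of the histogram between consecutive rows
    above = hist > 0
    was_above = above.shift(1, fill_value=False)
    return pd.DataFrame({
        "macd": macd,
        "signal": signal_line,
        "hist": hist,
        "cross_up": above & ~was_above,    # MACD moves above the signal line
        "cross_down": ~above & was_above,  # MACD moves below the signal line
    })

# Toy series (an assumption for illustration): 30 falling ticks, then 40 rising ticks
toy = pd.Series(list(range(50, 20, -1)) + list(range(20, 60)), dtype=float)
out = macd_crossovers(toy)
first_cross_up = out.index[out["cross_up"]].min()
```

On this toy series the upward crossover appears only after the price has turned, which is the lag to expect from an indicator built on moving averages.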

__Hull moving average__

The Hull MA reduces the lag of traditional moving averages while keeping the curve smooth.

Calculate a weighted moving average (WMA) with a period of half the Hull MA length and double it, subtract the WMA over the full Hull MA length, and finally take a WMA of the result with a period equal to the square root of the Hull MA length.

$$
\text{Hull MA} = WMA(2 \times WMA(n/2) - WMA(n), \sqrt{n})
$$

where
- $WMA(n)$ is the weighted moving average over $n$ periods.
- $\sqrt{n}$ is the square root of the period $n$.
- $n/2$ and $\sqrt{n}$ are rounded to whole numbers of periods.

In [7]:
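#calc
import numpy as np

def wma(series, window):
    #linearly weighted moving average: the most recent value gets the largest weight
    weights = np.arange(1, window + 1)
    return series.rolling(window=window).apply(lambda x: np.dot(x, weights) / weights.sum(), raw=True)

hull_n = 9 #9 isn't industry standard just suggested starting point- may want to experiment here
hull_ma_period = int(np.sqrt(hull_n))

#2 * WMA(n/2) - WMA(n), then smooth the result with WMA(sqrt(n))
wma_half = wma(df['Mid_Price'], hull_n // 2)
wma_full = wma(df['Mid_Price'], hull_n)

df['Hull_MA'] = wma(2 * wma_half - wma_full, hull_ma_period)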
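
A quick way to sanity-check the lag claim is to run the Hull MA formula on a perfectly linear price ramp, where the lag of any moving average is easy to read off. The sketch below is self-contained and uses hypothetical helper names (`wma`, `hull_ma`) and a synthetic ramp rather than the order book data.

```python
import numpy as np
import pandas as pd

def wma(series, window):
    # Linearly weighted moving average: weights 1..window, newest value weighted most
    weights = np.arange(1, window + 1)
    return series.rolling(window).apply(
        lambda x: np.dot(x, weights) / weights.sum(), raw=True
    )

def hull_ma(series, n):
    # WMA(2 * WMA(n/2) - WMA(n), sqrt(n)), with n/2 and sqrt(n) truncated to integers
    half = wma(series, n // 2)
    full = wma(series, n)
    return wma(2 * half - full, int(np.sqrt(n)))

# Synthetic ramp that rises by exactly 1 per step (illustrative data only)
ramp = pd.Series(np.arange(100, dtype=float))

# Lag = how far each average trails the ramp, in price units (= steps here)
hma_lag = (ramp - hull_ma(ramp, 9)).dropna()
sma_lag = (ramp - ramp.rolling(9).mean()).dropna()
```

With $n = 9$ the Hull MA tracks the ramp with essentially zero lag, while the 9-period SMA trails it by 4 steps. Real mid prices are not linear, so this only illustrates the mechanism and says nothing about how well the period suits this data.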

__Keltner channels__

Keltner channels are volatility-based envelopes set above and below an exponential moving average.

$$
\text{Middle Line} = EMA_{20}(\text{price})
$$
$$
\text{Upper Channel Line} = \text{Middle Line} + 2 \times ATR_{20}
$$
$$
\text{Lower Channel Line} = \text{Middle Line} - 2 \times ATR_{20}
$$

where
- $EMA_{20}$ is the 20-period exponential moving average.
- $ATR_{20}$ is the 20-period average true range.
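
For reference, the true range behind the ATR is usually defined on high/low/close bars. The sketch below shows that definition on three made-up bars. The column names, the values and the simple rolling mean (rather than Wilder's smoothing) are all assumptions for illustration.

```python
import pandas as pd

# Toy OHLC bars; values and column names are illustrative assumptions
bars = pd.DataFrame({
    "high":  [10.0, 12.0, 11.0],
    "low":   [8.0, 9.0, 10.0],
    "close": [9.0, 11.0, 10.5],
})

prev_close = bars["close"].shift(1)

# True range: the largest of the three candidate ranges for each bar.
# The first bar has no previous close, so max() falls back to high - low.
true_range = pd.concat(
    [
        bars["high"] - bars["low"],
        (bars["high"] - prev_close).abs(),
        (bars["low"] - prev_close).abs(),
    ],
    axis=1,
).max(axis=1)

# ATR as a simple rolling mean of the true range (window of 2 for this toy data)
atr = true_range.rolling(window=2).mean()
```

When each observation is a single price, as with a tick-level mid price, high, low and close coincide and the true range collapses to the absolute change from the previous price.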


In [9]:
#calc
ema_20 = df['Mid_Price'].ewm(span=20, adjust=False).mean()

#no high/low/close bars here, so the true range of each row reduces to the absolute change in mid price
true_range = df['Mid_Price'].diff().abs()
atr = true_range.rolling(window=20).mean()

df['Keltner_Channel_Middle'] = ema_20
df['Keltner_Channel_Upper'] = ema_20 + 2 * atr
df['Keltner_Channel_Lower'] = ema_20 - 2 * atr

__Detrended price oscillator__

Detrended price oscillator is an indicator designed to remove the trend from price and allow the measurement of the length and magnitude of price cycles from peak to peak or trough to trough.

In its common formulation, DPO is calculated by subtracting the simple moving average over the lookback period from the price $\frac{\text{lookback period}}{2} + 1$ periods ago. Displacing the price roughly aligns it with the centre of the averaging window, which is what removes the trend.

$$
\text{DPO}_t = P_{t - \left(\frac{\text{lookback period}}{2} + 1\right)} - SMA_t
$$

where
- $P_t$ is the price at time $t$.
- $SMA_t$ is the simple moving average over the lookback period ending at time $t$.
- The lookback period is the number of periods used to calculate the SMA and to displace the price.

By removing trends from the price data, the DPO helps to identify cycles and overbought or oversold conditions.

In [10]:
#define lookback period
lookback_period = 20

#calc sma
df['SMA_20'] = df['Mid_Price'].rolling(window=lookback_period).mean() #may already have this from noise suppression?

#calc detrended
#shift(k) with k > 0 looks k rows back: price (lookback_period / 2 + 1) rows ago minus the current sma
df['DPO'] = df['Mid_Price'].shift(lookback_period // 2 + 1) - df['SMA_20']