# Feature Engineering

This notebook builds technical, options-based, and derived features from the cleaned spot, futures, and options datasets.

The objective is to build a compact and interpretable feature set that can be used for regime detection, trading strategy logic, and machine learning models.


In [1]:
import pandas as pd
import numpy as np


In [2]:
spot = pd.read_csv("../data/nifty_spot_5min.csv")
futures = pd.read_csv("../data/nifty_futures_5min.csv")
options = pd.read_csv("../data/nifty_options_5min.csv")

spot["timestamp"] = pd.to_datetime(spot["timestamp"])
futures["timestamp"] = pd.to_datetime(futures["timestamp"])
options["timestamp"] = pd.to_datetime(options["timestamp"])


## Exponential Moving Averages (EMA)

EMA indicators are used as the primary trading signals in the strategy.
A short-term EMA (span 5) and a medium-term EMA (span 15) are calculated on the 5-minute spot closing prices.


In [3]:
spot["ema_5"] = spot["close"].ewm(span=5, adjust=False).mean()
spot["ema_15"] = spot["close"].ewm(span=15, adjust=False).mean()
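With `adjust=False`, pandas computes the EMA recursively: the first value seeds the series, and each later value is `alpha * price + (1 - alpha) * previous_ema` with `alpha = 2 / (span + 1)`. The sketch below reproduces that recursion by hand and compares it with `ewm`. The helper `manual_ema` and the price values are illustrative assumptions, not part of the notebook.

```python
import numpy as np
import pandas as pd

def manual_ema(values, span):
    """Recursive EMA, intended to match Series.ewm(span=span, adjust=False).mean()."""
    alpha = 2.0 / (span + 1.0)
    out = np.empty(len(values), dtype=float)
    out[0] = values[0]  # the first observation seeds the recursion
    for t in range(1, len(values)):
        out[t] = alpha * values[t] + (1.0 - alpha) * out[t - 1]
    return out

# Illustrative prices only, not taken from the dataset
prices = pd.Series([8300.0, 8301.0, 8294.15, 8288.5, 8283.45])

pandas_ema = prices.ewm(span=5, adjust=False).mean().to_numpy()
ema_check = manual_ema(prices.to_numpy(), span=5)
ema_matches = bool(np.allclose(pandas_ema, ema_check))
```

For span 5 the smoothing factor is 1/3, and for span 15 it is 1/8, so the shorter EMA reacts faster to new prices.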


## Returns

Bar-to-bar log returns, ln(close_t / close_{t-1}), are calculated on spot and futures closes to capture short-term price movements.


In [4]:
spot["spot_return"] = np.log(spot["close"] / spot["close"].shift(1))
futures["futures_return"] = np.log(futures["close"] / futures["close"].shift(1))
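Because `shift(1)` runs over the whole series, the return on the first bar of each trading day is measured against the previous day's last bar and therefore includes the overnight gap. If intraday-only returns are preferred (an assumption, the notebook does not state this), the returns can be computed within each calendar day instead. The helper `session_log_returns` and the toy prices below are hypothetical.

```python
import numpy as np
import pandas as pd

def session_log_returns(df, price_col="close", ts_col="timestamp"):
    """Log returns computed within each calendar day; the first bar of a day is NaN."""
    log_price = np.log(df[price_col])
    day = df[ts_col].dt.normalize()  # midnight of each bar's date, used as the group key
    return log_price.groupby(day).diff()

# Toy data: two bars at the end of one day, two at the start of the next
toy = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2015-01-09 15:20", "2015-01-09 15:25",
        "2015-01-12 09:15", "2015-01-12 09:20",
    ]),
    "close": [8280.0, 8284.0, 8320.0, 8318.0],
})
toy["session_return"] = session_log_returns(toy)
```

In this variant the first bar of each day has no return, so a later `dropna` would remove one row per trading day rather than only the first row of the dataset.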


## Options-Based Features

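Options data is combined at each timestamp to compute the average implied volatility (IV), the call-minus-put IV spread, and put-call ratios (PCR) based on open interest and volume.
The code applies no strike filter, so `options` is expected to already contain only the ATM call (CE) and put (PE) for each timestamp.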


In [5]:
# Separate Call and Put
calls = options[options["option_type"] == "CE"]
puts = options[options["option_type"] == "PE"]

# Merge CE & PE on timestamp (expects one CE row and one PE row per timestamp)
opt_merged = pd.merge(
    calls,
    puts,
    on="timestamp",
    suffixes=("_call", "_put")
)

# Average IV & IV Spread
opt_merged["avg_iv"] = (opt_merged["iv_call"] + opt_merged["iv_put"]) / 2
opt_merged["iv_spread"] = opt_merged["iv_call"] - opt_merged["iv_put"]

# Put-call ratios (PCR): put / call, by open interest and by volume
opt_merged["pcr_oi"] = opt_merged["open_interest_put"] / opt_merged["open_interest_call"]
opt_merged["pcr_volume"] = opt_merged["volume_put"] / opt_merged["volume_call"]
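Two edge cases are worth guarding against here. A merge on `timestamp` alone pairs every call row with every put row at that timestamp, so duplicate rows would silently multiply the output. A zero call volume or open interest yields `inf`, which `dropna` does not remove. The sketch below is one possible defensive variant, not the notebook's method: it uses the `validate` argument of `pd.merge` and converts infinite ratios to `NaN`. The function name and the toy data are assumptions; the column names follow the notebook.

```python
import numpy as np
import pandas as pd

def build_option_features(options):
    """Pair CE/PE rows per timestamp and compute put-call ratios defensively."""
    calls = options[options["option_type"] == "CE"]
    puts = options[options["option_type"] == "PE"]
    # validate="one_to_one" raises pandas.errors.MergeError if a timestamp
    # appears more than once on either side
    merged = pd.merge(
        calls, puts, on="timestamp",
        suffixes=("_call", "_put"), validate="one_to_one",
    )
    merged["pcr_oi"] = merged["open_interest_put"] / merged["open_interest_call"]
    merged["pcr_volume"] = merged["volume_put"] / merged["volume_call"]
    # Division by zero produces +/-inf; turn it into NaN so dropna can handle it
    ratio_cols = ["pcr_oi", "pcr_volume"]
    merged[ratio_cols] = merged[ratio_cols].replace([np.inf, -np.inf], np.nan)
    return merged

# Toy data (made up): the second timestamp has zero call volume
toy_options = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2015-01-09 09:20"] * 2 + ["2015-01-09 09:25"] * 2
    ),
    "option_type": ["CE", "PE", "CE", "PE"],
    "iv": [0.21, 0.20, 0.19, 0.18],
    "open_interest": [1000, 1200, 1000, 900],
    "volume": [50, 40, 0, 30],
})
toy_features = build_option_features(toy_options)
```

Whether a zero-volume bar should be dropped or filled is a modelling choice; the sketch only makes the case visible as a missing value.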


## Futures Basis

Futures basis is the difference between the futures close and the spot close, expressed as a fraction of the spot close.


In [6]:
merged_sf = pd.merge(
    spot[["timestamp", "close"]],
    futures[["timestamp", "close"]],
    on="timestamp",
    suffixes=("_spot", "_futures")
)

merged_sf["futures_basis"] = (
    merged_sf["close_futures"] - merged_sf["close_spot"]
) / merged_sf["close_spot"]
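As a small worked example of the same formula, the sketch below computes the basis for two made-up price pairs. A positive value means futures trade at a premium to spot, a negative value means a discount. The helper `futures_basis` and the numbers are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def futures_basis(spot_close, futures_close):
    """Relative basis: (futures - spot) / spot."""
    return (futures_close - spot_close) / spot_close

# Illustrative prices, not taken from the dataset
toy_spot = pd.Series([8300.0, 8290.0])
toy_futures = pd.Series([8304.15, 8285.855])

# First pair: futures 4.15 points above spot (premium of about 0.05%)
# Second pair: futures 4.145 points below spot (discount of about 0.05%)
toy_basis = futures_basis(toy_spot, toy_futures)
```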


## Final Feature Dataset

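The spot features, the futures basis, and the options features are merged into a single dataset on `timestamp` for further analysis.
Rows with any missing value are then dropped, which removes the first bar (its log return is undefined) and any timestamp without matching futures or options data.
`futures_return` stays in the `futures` frame and is not part of this dataset.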


In [7]:
features = spot.merge(
    merged_sf[["timestamp", "futures_basis"]],
    on="timestamp",
    how="left"
).merge(
    opt_merged[[
        "timestamp",
        "avg_iv",
        "iv_spread",
        "pcr_oi",
        "pcr_volume"
    ]],
    on="timestamp",
    how="left"
)

features.dropna(inplace=True)
features.head()


Unnamed: 0,timestamp,open,high,low,close,volume,ema_5,ema_15,spot_return,futures_basis,avg_iv,iv_spread,pcr_oi,pcr_volume
1,2015-01-09 09:20:00,8300.5,8303.0,8293.25,8301.0,0,8301.133333,8301.175,-2.4e-05,0.0005,0.209528,-0.00166,1.215374,0.94108
2,2015-01-09 09:25:00,8301.65,8302.55,8286.8,8294.15,0,8298.805556,8300.296875,-0.000826,0.0005,0.187823,-0.012707,3.066087,0.621079
3,2015-01-09 09:30:00,8294.1,8295.75,8280.65,8288.5,0,8295.37037,8298.822266,-0.000681,0.0005,0.191463,-0.024234,0.516905,0.395105
4,2015-01-09 09:35:00,8289.1,8290.45,8278.0,8283.45,0,8291.396914,8296.900732,-0.000609,0.0005,0.173081,0.045038,2.092148,4.619883
5,2015-01-09 09:40:00,8283.4,8288.3,8277.4,8285.55,0,8289.447942,8295.481891,0.000253,0.0005,0.225163,-0.020501,2.222767,0.645333


In [8]:
features.to_csv("../data/nifty_features_5min.csv", index=False)
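A few sanity checks can be run on the feature table before it is used downstream. The sketch below is a suggestion rather than part of the original workflow: `check_features` is a hypothetical helper, and the toy frame reuses a few values from the output preview above.

```python
import numpy as np
import pandas as pd

def check_features(df, ts_col="timestamp"):
    """Return a dict of basic integrity checks for a feature table."""
    numeric = df.select_dtypes(include="number")
    return {
        "unique_timestamps": bool(df[ts_col].is_unique),
        "sorted_by_time": bool(df[ts_col].is_monotonic_increasing),
        "no_missing": bool(not df.isna().any().any()),
        "all_finite": bool(np.isfinite(numeric.to_numpy(dtype=float)).all()),
    }

# Toy frame with a subset of columns
toy = pd.DataFrame({
    "timestamp": pd.to_datetime(["2015-01-09 09:20", "2015-01-09 09:25"]),
    "close": [8301.0, 8294.15],
    "pcr_oi": [1.215374, 3.066087],
})
report = check_features(toy)

# The same frame with an infinite ratio fails the finiteness check
bad = toy.copy()
bad["pcr_oi"] = [np.inf, 3.066087]
bad_report = check_features(bad)
```

Duplicate timestamps after the merges would suggest that one of the joined frames had more than one row per timestamp.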


## Feature Engineering Notes

- A compact feature set is intentionally used to reduce noise.
- Options features are aggregated at the timestamp level.
- EMA indicators are reserved exclusively for trading signals.
- Features are designed to be interpretable and interview-friendly.
