# Feature Engineering
## Quantitative Trading System (NIFTY – Daily)

This notebook performs feature engineering on the cleaned and merged
NIFTY datasets.

### Tasks Covered
- Task 2.1: EMA Indicators
- Task 2.2: Options Greeks & Implied Volatility
- Task 2.3: Derived Features
- Task 2.4: Final Feature Set

### Input Dataset
- nifty_merged_daily.csv

### Output
- nifty_features_daily.csv


In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.read_csv("nifty_merged_daily.csv")
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values('date').reset_index(drop=True)

df.head()

Unnamed: 0,date,open_spot,high_spot,low_spot,close_spot,volume_spot,open_fut,high_fut,low_fut,close_fut,volume_fut,open_interest,open_interest_ce,open_interest_pe,volume_ce,volume_pe
0,2025-01-14,23165.9,23264.95,23134.15,23176.05,311235510,23248.0,23339.0,23198.45,23271.75,194699,13753850,19494225,23319300,9423714,9353511
1,2025-01-15,23250.45,23293.65,23146.45,23213.2,228039156,23302.05,23345.0,23201.0,23265.9,143790,13536300,38377050,32751075,20294803,22384264
2,2025-01-16,23377.25,23391.65,23272.05,23311.8,299416081,23406.0,23423.9,23346.15,23377.55,171198,13500850,66049050,55946100,58463061,53871172
3,2025-01-17,23277.1,23292.1,23100.35,23203.2,272945267,23344.9,23353.8,23150.1,23267.2,218388,14032150,10140900,10690275,5982807,6929039
4,2025-01-20,23290.4,23391.1,23170.65,23344.75,301455455,23339.95,23449.0,23220.0,23400.2,163712,14133750,11247750,11048025,6300739,4503616
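
Task 2.1 lists EMA indicators and the final feature list mentions EMA (5, 15), but no EMA cell appears in this notebook. A minimal sketch of how such columns might be computed with pandas `ewm`, assuming span-based smoothing with `adjust=False`; the helper `add_ema` and the column names `ema_5` and `ema_15` are illustrative and not taken from the notebook:

```python
import pandas as pd

def add_ema(frame, column="close_spot", spans=(5, 15)):
    """Return a copy of `frame` with one EMA column per span.

    Column names (ema_5, ema_15) are hypothetical, chosen for this sketch.
    """
    out = frame.copy()
    for span in spans:
        # adjust=False uses the recursive form:
        # ema_t = a * x_t + (1 - a) * ema_{t-1}, with a = 2 / (span + 1)
        out[f"ema_{span}"] = out[column].ewm(span=span, adjust=False).mean()
    return out

# Closing prices from the first five rows shown above.
sample = pd.DataFrame(
    {"close_spot": [23176.05, 23213.20, 23311.80, 23203.20, 23344.75]}
)
sample = add_ema(sample)
print(sample.round(2))
```

With `adjust=False` the first EMA value equals the first close, so the early rows are dominated by that seed until enough history accumulates.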


In [5]:
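from scipy.stats import norm

RISK_FREE_RATE = 0.065  # 6.5% annualised risk-free rate

def bs_d1(S, K, T, r, sigma):
    """Black-Scholes d1 for spot S, strike K, time to expiry T (years), rate r, volatility sigma."""
    return (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))

def bs_d2(d1, sigma, T):
    """Black-Scholes d2, derived from d1."""
    return d1 - sigma * np.sqrt(T)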

In [6]:
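def bs_greeks(S, K, T, r, sigma, option_type='call'):
    """Black-Scholes Greeks for a European option without dividends.

    Theta is per calendar day; vega and rho are per one percentage point.
    """
    d1 = bs_d1(S, K, T, r, sigma)
    d2 = bs_d2(d1, sigma, T)
    sign = 1 if option_type == 'call' else -1  # +1 for calls, -1 for puts

    delta = norm.cdf(d1) if option_type == 'call' else norm.cdf(d1) - 1
    gamma = norm.pdf(d1) / (S * sigma * np.sqrt(T))
    vega  = S * norm.pdf(d1) * np.sqrt(T) / 100
    # Theta = time-decay term, minus (call) or plus (put) the interest term.
    theta = (-S * norm.pdf(d1) * sigma / (2 * np.sqrt(T))
             - sign * r * K * np.exp(-r * T) * norm.cdf(sign * d2)) / 365
    # Rho is positive for calls and negative for puts.
    rho   = sign * K * T * np.exp(-r * T) * norm.cdf(sign * d2) / 100

    return delta, gamma, theta, vega, rho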
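
Greeks like these can be sanity-checked in two ways: the call delta minus the put delta should equal 1, and the analytic delta should match a finite-difference slope of the option price. The sketch below is standalone; `bs_price` and `bs_delta` are helper names introduced only for this check, and the volatility of 0.15 is an illustrative value rather than something from the dataset:

```python
import numpy as np
from scipy.stats import norm

def _d1(S, K, T, r, sigma):
    return (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))

def bs_price(S, K, T, r, sigma, option_type="call"):
    """Black-Scholes price of a European option without dividends."""
    d1 = _d1(S, K, T, r, sigma)
    d2 = d1 - sigma * np.sqrt(T)
    if option_type == "call":
        return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)
    return K * np.exp(-r * T) * norm.cdf(-d2) - S * norm.cdf(-d1)

def bs_delta(S, K, T, r, sigma, option_type="call"):
    """Analytic Black-Scholes delta."""
    d1 = _d1(S, K, T, r, sigma)
    return norm.cdf(d1) if option_type == "call" else norm.cdf(d1) - 1

# Spot from the first data row, nearest 50-point strike, assumed 30-day expiry.
S, K, T, r, sigma = 23176.05, 23200.0, 30 / 365, 0.065, 0.15

# Central finite difference of the call price with respect to spot.
h = 0.01
fd_delta = (bs_price(S + h, K, T, r, sigma) - bs_price(S - h, K, T, r, sigma)) / (2 * h)

delta_gap = abs(fd_delta - bs_delta(S, K, T, r, sigma, "call"))
parity_gap = abs(
    bs_delta(S, K, T, r, sigma, "call") - bs_delta(S, K, T, r, sigma, "put") - 1.0
)
print(f"finite-difference gap: {delta_gap:.2e}, parity gap: {parity_gap:.2e}")
```

Both gaps should be close to zero; a large value would point at a sign or scaling error in the formulas.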

In [8]:
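# NOTE: the columns loaded above include no option prices, so 'iv' here is a
# volatility *proxy*, not a true implied volatility: the 10-day rolling
# standard deviation of daily changes in average CE/PE open interest.
df['iv'] = df[['open_interest_ce', 'open_interest_pe']].mean(axis=1)
df['iv'] = df['iv'].pct_change().rolling(10).std()
df['iv'] = df['iv'].bfill()  # fillna(method='bfill') is deprecated in recent pandas
df['T'] = 30 / 365  # assume ~30 days to expiry
df['atm_strike'] = (df['close_spot'] / 50).round() * 50  # nearest 50-point strike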
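
The `iv` column above is derived from open interest rather than from option prices. If ATM option prices were available, implied volatility could instead be backed out by solving the Black-Scholes price equation for sigma. A hedged sketch using `scipy.optimize.brentq`; the function names, the search bracket, and the round-trip input of 0.18 are all assumptions made for illustration:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def bs_call_price(S, K, T, r, sigma):
    """Black-Scholes price of a European call without dividends."""
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

def implied_vol_call(price, S, K, T, r, lo=1e-4, hi=5.0):
    """Volatility at which the Black-Scholes call price equals `price`.

    Returns NaN when `price` is not reachable for any sigma in [lo, hi].
    """
    def objective(sigma):
        return bs_call_price(S, K, T, r, sigma) - price

    # The call price increases with sigma, so a root exists only if the
    # objective changes sign across the bracket.
    if objective(lo) > 0 or objective(hi) < 0:
        return float("nan")
    return brentq(objective, lo, hi, xtol=1e-10)

# Round trip: price an option at a known volatility, then recover it.
S, K, T, r = 23176.05, 23200.0, 30 / 365, 0.065
true_sigma = 0.18  # illustrative value, not market data
price = bs_call_price(S, K, T, r, true_sigma)
recovered = implied_vol_call(price, S, K, T, r)
print(f"recovered sigma: {recovered:.6f}")
```

The NaN guard matters in practice: quotes below intrinsic value have no Black-Scholes volatility, and a root finder would otherwise raise an error on them.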



In [10]:
greeks_call = df.apply(
    lambda x: bs_greeks(
        x['close_spot'], x['atm_strike'], x['T'],
        RISK_FREE_RATE, x['iv'], 'call'
    ), axis=1
)

greeks_put = df.apply(
    lambda x: bs_greeks(
        x['close_spot'], x['atm_strike'], x['T'],
        RISK_FREE_RATE, x['iv'], 'put'
    ), axis=1
)
df[['call_delta','call_gamma','call_theta','call_vega','call_rho']] = pd.DataFrame(greeks_call.tolist(), index=df.index)
df[['put_delta','put_gamma','put_theta','put_vega','put_rho']] = pd.DataFrame(greeks_put.tolist(), index=df.index)
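
Row-wise `df.apply` calls the Greek function once per row. Because `np.log`, `np.sqrt`, `norm.cdf`, and `norm.pdf` all accept arrays, the same formulas can be evaluated on whole columns at once. A small sketch under that assumption, limited to call delta and gamma; `call_delta_gamma` is a hypothetical helper and the volatility of 0.15 is illustrative:

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

def call_delta_gamma(S, K, T, r, sigma):
    """Vectorised Black-Scholes call delta and gamma; inputs may be scalars or arrays."""
    S, K, T, sigma = (np.asarray(a, dtype=float) for a in (S, K, T, sigma))
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    delta = norm.cdf(d1)
    gamma = norm.pdf(d1) / (S * sigma * np.sqrt(T))
    return delta, gamma

demo = pd.DataFrame({
    "close_spot": [23176.05, 23213.20, 23311.80],
    "atm_strike": [23200.0, 23200.0, 23300.0],
    "T": 30 / 365,
    "iv": [0.15, 0.15, 0.15],  # illustrative volatility, not from the notebook
})

# One call on whole columns instead of one call per row.
delta, gamma = call_delta_gamma(
    demo["close_spot"], demo["atm_strike"], demo["T"], 0.065, demo["iv"]
)
demo["call_delta"], demo["call_gamma"] = delta, gamma

# Cross-check against the row-wise approach.
row_wise = demo.apply(
    lambda x: float(
        call_delta_gamma(x["close_spot"], x["atm_strike"], x["T"], 0.065, x["iv"])[0]
    ),
    axis=1,
)
max_diff = float(np.max(np.abs(row_wise.to_numpy() - delta)))
print(demo.round(6))
print(f"max difference vs row-wise: {max_diff:.2e}")
```

For a daily dataset of this size the speed difference is small; the vectorised form mainly avoids the tuple-unpacking step that `apply` requires.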

In [11]:
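df['avg_iv']   = df['iv']
# Black-Scholes vega is identical for a call and a put with the same inputs,
# so this column is zero on every row; a real IV spread needs separate
# call and put implied volatilities.
df['iv_spread'] = df['call_vega'] - df['put_vega']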

In [12]:
df['pcr_oi'] = df['open_interest_pe'] / df['open_interest_ce']
df['pcr_vol'] = df['volume_pe'] / df['volume_ce']
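
A ratio above 1 means put open interest (or volume) exceeds the call side. The division yields `inf` if the call side is ever zero, so one option is to mask zero denominators first. A short sketch using the open-interest values from the first two rows above, plus a made-up zero-OI row to show the guard:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    "open_interest_ce": [19494225, 38377050, 0],   # last row is a made-up zero-OI day
    "open_interest_pe": [23319300, 32751075, 1000],
})

# Mask zero denominators so the ratio becomes NaN instead of inf.
ce = demo["open_interest_ce"].where(demo["open_interest_ce"] > 0)
demo["pcr_oi"] = demo["open_interest_pe"] / ce
print(demo.round(4))
```

Here the first row gives a ratio of about 1.196 (more put than call open interest) and the second a ratio below 1.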

In [13]:
df['futures_basis'] = (df['close_fut'] - df['close_spot']) / df['close_spot']
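
As a worked example of the basis formula, the first row above has a futures close of 23271.75 against a spot close of 23176.05, a premium of 95.70 points or about 0.41% of spot. The sketch below reproduces that arithmetic for the first three rows; the extra columns `basis_abs` and `basis_pct` are illustrative names:

```python
import pandas as pd

# close_spot and close_fut from the first three rows of the table above.
demo = pd.DataFrame({
    "close_spot": [23176.05, 23213.20, 23311.80],
    "close_fut":  [23271.75, 23265.90, 23377.55],
})

demo["basis_abs"] = demo["close_fut"] - demo["close_spot"]      # index points
demo["futures_basis"] = demo["basis_abs"] / demo["close_spot"]  # fraction of spot
demo["basis_pct"] = 100 * demo["futures_basis"]                 # percent of spot
print(demo.round(4))
```

A positive value means futures trade at a premium to spot; a negative value means a discount.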

In [14]:
df['spot_return'] = df['close_spot'].pct_change()
df['fut_return']  = df['close_fut'].pct_change()

In [15]:
df['delta_neutral_ratio'] = df['call_delta'].abs() / df['put_delta'].abs()

In [16]:
df['gamma_exposure'] = df['close_spot'] * df['call_gamma'] * df['open_interest_ce']

In [17]:
df = df.dropna().reset_index(drop=True)
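
`dropna()` removes every row that contains a missing value, and it does not catch infinities produced by ratio features. It can help to count both before dropping, so the number of lost rows is explained. A minimal sketch on made-up data; the `inf` entry stands in for a hypothetical divide-by-zero day:

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    "close_spot": [23176.05, 23213.20, 23311.80, 23203.20],
    "pcr_vol": [0.99, 1.10, np.inf, 1.16],  # inf is a made-up divide-by-zero day
})
demo["spot_return"] = demo["close_spot"].pct_change()  # first row is NaN by construction

# Audit: which columns are responsible for missing or infinite values?
nan_counts = demo.isna().sum()
inf_counts = np.isinf(demo.select_dtypes("number")).sum()
print("NaN per column:\n", nan_counts)
print("inf per column:\n", inf_counts)

# Treat infinities as missing, then drop incomplete rows.
cleaned = demo.replace([np.inf, -np.inf], np.nan).dropna().reset_index(drop=True)
print(f"rows before: {len(demo)}, rows after: {len(cleaned)}")
```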






In [18]:
df.to_csv("nifty_features_daily.csv", index=False)

## Final Feature Dataset

- File: nifty_features_daily.csv
- Frequency: Daily
- Features Included:
  - EMA (5, 15)
  - Options Greeks (Delta, Gamma, Theta, Vega, Rho)
  - Implied Volatility (open-interest-based proxy) & IV Spread
  - PCR (OI & Volume based)
  - Futures Basis
  - Spot & Futures Returns
  - Delta Neutral Ratio
  - Gamma Exposure

### Purpose
This dataset serves as the core input for:
- Regime detection
- Trading signal generation
- Machine learning models
- Backtesting and performance analysis
