## Data Sources & Assumptions

Because broker APIs give limited access to historical options data and are complex to
integrate, this notebook uses the following approach:

- NIFTY Spot data is sourced from publicly available historical datasets.
- NIFTY Futures data is sourced from historical futures datasets or proxy instruments.
- Options Greeks and derived options-based features will be calculated later using
  theoretical pricing models (Black–Scholes), not raw options tick data.

This approach is suitable for research, backtesting, and academic evaluation purposes.
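
As a rough illustration of the theoretical-pricing step mentioned above, the following sketch computes a European call price and its main Greeks under Black–Scholes. It assumes no dividends and a constant risk-free rate. The function name `bs_call_greeks` and all input values are hypothetical and are not taken from the project code.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x: float) -> float:
    """Standard normal probability density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bs_call_greeks(spot: float, strike: float, t_years: float,
                   rate: float, vol: float) -> dict:
    """Black-Scholes price and Greeks for a European call (no dividends).

    Hypothetical helper: vega is per 1.0 change in volatility and
    theta is per year, so rescale both before using them as features.
    """
    sqrt_t = math.sqrt(t_years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * t_years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    discount = math.exp(-rate * t_years)
    return {
        "price": spot * norm_cdf(d1) - strike * discount * norm_cdf(d2),
        "delta": norm_cdf(d1),
        "gamma": norm_pdf(d1) / (spot * vol * sqrt_t),
        "vega": spot * norm_pdf(d1) * sqrt_t,
        "theta": (-spot * norm_pdf(d1) * vol / (2.0 * sqrt_t)
                  - rate * strike * discount * norm_cdf(d2)),
    }


# Illustrative inputs only: an at-the-money call with 30 days to expiry.
example = bs_call_greeks(spot=8300.0, strike=8300.0, t_years=30 / 365,
                         rate=0.07, vol=0.15)
print(example)
```

The volatility input is the main modelling choice here. Without observed option prices it has to come from an estimate or an assumption, so the resulting Greeks are model values rather than market values.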


## Objective

The objective of this notebook is to:
- Load historical 5-minute data for NIFTY Spot and NIFTY Futures
- Prepare raw datasets for further cleaning and feature engineering
- Store raw data files in the data/raw directory


# Data Acquisition – Quantitative Trading Strategy

This notebook handles the acquisition and initial loading of 5-minute interval data
required for the quantitative trading strategy.



In [2]:
import pandas as pd
import numpy as np
import os

pd.set_option("display.max_columns", None)

# Directory where raw input files are stored
RAW_DATA_PATH = os.path.join("..", "data", "raw")

# Create directory (if it does not exist)
os.makedirs(RAW_DATA_PATH, exist_ok=True)

print("Raw data directory ready at:", RAW_DATA_PATH)


Raw data directory ready at: ..\data\raw


In [3]:
# Expected locations of the raw 5-minute CSV files
spot_file_path = os.path.join(RAW_DATA_PATH, "nifty_spot_5min.csv")
futures_file_path = os.path.join(RAW_DATA_PATH, "nifty_futures_5min.csv")

# Load each CSV if it exists; otherwise report the missing file
if os.path.exists(spot_file_path):
    spot_df = pd.read_csv(spot_file_path)
    print("Spot Data Preview:")
    display(spot_df.head())
else:
    print("NIFTY Spot CSV not found:", spot_file_path)

if os.path.exists(futures_file_path):
    futures_df = pd.read_csv(futures_file_path)
    print("Futures Data Preview:")
    display(futures_df.head())
else:
    print("NIFTY Futures CSV not found:", futures_file_path)


Spot Data Preview:


Unnamed: 0,date,close,high,low,open,volume
0,2015-01-09 09:15:00+05:30,8301.2,8301.3,8285.45,8285.45,0
1,2015-01-09 09:20:00+05:30,8301.0,8303.0,8293.25,8300.5,0
2,2015-01-09 09:25:00+05:30,8294.15,8302.55,8286.8,8301.65,0
3,2015-01-09 09:30:00+05:30,8288.5,8295.75,8280.65,8294.1,0
4,2015-01-09 09:35:00+05:30,8283.45,8290.45,8278.0,8289.1,0


Futures Data Preview:


Unnamed: 0,date,close,high,low,open,volume
0,2015-01-09 09:15:00+05:30,8301.2,8301.3,8285.45,8285.45,0
1,2015-01-09 09:20:00+05:30,8301.0,8303.0,8293.25,8300.5,0
2,2015-01-09 09:25:00+05:30,8294.15,8302.55,8286.8,8301.65,0
3,2015-01-09 09:30:00+05:30,8288.5,8295.75,8280.65,8294.1,0
4,2015-01-09 09:35:00+05:30,8283.45,8290.45,8278.0,8289.1,0
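
The two previews show the same first five rows and zero volume throughout, which may mean that the futures file is a proxy for or a copy of the spot series. A minimal sketch of a sanity check is given below. It assumes the column names shown in the previews. The helper `check_raw_frames` is hypothetical, and the demo frames are rebuilt by hand from the first three preview rows.

```python
import pandas as pd

OHLC_COLUMNS = ["open", "high", "low", "close"]


def check_raw_frames(spot: pd.DataFrame, futures: pd.DataFrame,
                     freq: str = "5min") -> dict:
    """Return simple diagnostics for two raw OHLCV frames (hypothetical helper)."""
    timestamps = pd.to_datetime(spot["date"])
    gaps = timestamps.diff().dropna()
    return {
        # True when both frames carry exactly the same OHLC values
        "identical_ohlc": spot[OHLC_COLUMNS].reset_index(drop=True).equals(
            futures[OHLC_COLUMNS].reset_index(drop=True)),
        # Share of consecutive bars spaced exactly one interval apart
        "share_on_grid": float((gaps == pd.Timedelta(freq)).mean()),
        # An index has no traded volume, but a futures contract should
        "futures_all_zero_volume": bool((futures["volume"] == 0).all()),
    }


# Demo frames copied from the first three rows of the previews
spot_demo = pd.DataFrame({
    "date": ["2015-01-09 09:15:00+05:30",
             "2015-01-09 09:20:00+05:30",
             "2015-01-09 09:25:00+05:30"],
    "close": [8301.2, 8301.0, 8294.15],
    "high": [8301.3, 8303.0, 8302.55],
    "low": [8285.45, 8293.25, 8286.8],
    "open": [8285.45, 8300.5, 8301.65],
    "volume": [0, 0, 0],
})
futures_demo = spot_demo.copy()

report = check_raw_frames(spot_demo, futures_demo)
print(report)
```

On the full files, a `share_on_grid` value below 1.0 is expected because of overnight and weekend gaps. A value of `True` for `identical_ohlc` would suggest that the futures series adds no information beyond spot.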


## Data Scope Clarification

Although the problem statement mentions historical options chain data, this project focuses on
a research-oriented and explainable pipeline.

Because granular historical options data is difficult to obtain without paid APIs,
options-related features (IV, Greeks, PCR) are derived from theoretical pricing models and
spot/futures data rather than from observed option quotes.

This approach supports feature engineering, regime detection, and ML-based filtering
while keeping the pipeline transparent and reproducible.
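
A theoretical pricing model needs a volatility input, and the text does not say how that input is chosen. One possible stand-in, shown here purely as an assumption, is annualized realized volatility computed from 5-minute closes. The function name `realized_vol`, the window length, and the constants are hypothetical. The figure of 75 bars per day follows from the 09:15 to 15:30 NSE session.

```python
import numpy as np
import pandas as pd

BARS_PER_DAY = 75    # 375-minute session divided into 5-minute bars
TRADING_DAYS = 252   # assumed number of trading days per year


def realized_vol(close: pd.Series, window: int = BARS_PER_DAY) -> pd.Series:
    """Rolling annualized volatility from 5-minute log returns.

    Hypothetical helper: overnight gaps are treated like any other bar,
    which overstates intraday volatility around the session open.
    """
    log_returns = np.log(close).diff()
    return log_returns.rolling(window).std() * np.sqrt(BARS_PER_DAY * TRADING_DAYS)


# Synthetic random-walk prices, used only to show the call pattern
rng = np.random.default_rng(0)
synthetic_close = pd.Series(8300.0 * np.exp(np.cumsum(rng.normal(0.0, 0.0005, 300))))
vol_series = realized_vol(synthetic_close)
print(vol_series.dropna().tail())
```

Realized volatility is backward-looking and is not the same quantity as implied volatility, so features built from it should be named accordingly.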
