# Raw Data Fetching and Exploration

This notebook demonstrates how to:
- Fetch minute-level SPY data from the Polygon.io API
- Fetch daily VIX data from Yahoo Finance
- Load and validate CSV data
- Analyze calendar coverage and identify missing trading days

### 1. Setup

Add `src/` to the Python path, import `DataLoader`, and set the date range for the fetches.


In [12]:
import sys
from pathlib import Path
import importlib

# Get the project root directory (parent of notebooks/)
project_root = Path.cwd().parent if Path.cwd().name == 'notebooks' else Path.cwd()

# Add src directory to Python path
sys.path.insert(0, str(project_root / 'src'))

# Import the loader module, then reload it so that edits made since the
# last import are picked up without restarting the kernel
import classes.data.loader as loader_module
importlib.reload(loader_module)

from classes.data.loader import DataLoader
from datetime import datetime, timedelta

# Initialize DataLoader
loader = DataLoader()

# Date range: from 732 days ago (about two years) through yesterday
today = datetime.now()
end_date = (today - timedelta(days=1)).strftime('%Y-%m-%d')
start_date = (today - timedelta(days=732)).strftime('%Y-%m-%d')

print(f"\n{'='*60}")
print(f"Fetching data from {start_date} to {end_date}")
print(f"Project root: {project_root}")
print(f"{'='*60}\n")



Fetching data from 2023-10-27 to 2025-10-27
Project root: c:\Users\simo0\Documents\GitHub\intraday-momentum



### 2. Fetch SPY Minute Data

Fetch the last two years of 1-minute intraday SPY bars from the Polygon.io API.


In [13]:
spy_df = loader.fetch_polygon_data('SPY', start_date, end_date, 'minute')
print(f"\nSPY DataFrame shape: {spy_df.shape}")
print(f"Columns: {spy_df.columns.tolist()}")
spy_df.head()


Fetched 50000 entries
Fetched 50000 entries
Fetched 50000 entries
Fetched 50000 entries
Fetched 50000 entries
Rate limit reached. Waiting 26.07 seconds...
Fetched 50000 entries
Fetched 50000 entries
Fetched 50000 entries
Fetched 1018 entries
Data fetching complete. Total entries: 194105
Data saved to: c:\Users\simo0\Documents\GitHub\intraday-momentum\data\raw\SPY_1min_20231027_20251027.csv

SPY DataFrame shape: (194105, 6)
Columns: ['volume', 'open', 'high', 'low', 'close', 'caldt']


|   | volume | open | high | low | close | caldt |
|---|---|---|---|---|---|---|
| 0 | 630285.0 | 413.56 | 413.94 | 413.53 | 413.77 | 2023-10-30 09:30:00 |
| 1 | 322290.0 | 413.78 | 414.01 | 413.75 | 413.882 | 2023-10-30 09:31:00 |
| 2 | 455364.0 | 413.91 | 414.21 | 413.845 | 414.14 | 2023-10-30 09:32:00 |
| 3 | 269190.0 | 414.13 | 414.24 | 414.05 | 414.205 | 2023-10-30 09:33:00 |
| 4 | 330914.0 | 414.205 | 414.32 | 414.205 | 414.27 | 2023-10-30 09:34:00 |
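
The fetch log above pauses with "Rate limit reached" after five requests. How `DataLoader` throttles its calls is not shown in this notebook, so the following is only a minimal sketch of one common approach, a sliding-window limiter. The class name `SlidingWindowLimiter` and its parameters are hypothetical, and the figure of five calls per 60 seconds is an assumption inferred from the log, not a documented setting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any `period`-second window.

    Hypothetical helper for illustration; not part of DataLoader.
    """

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock      # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.calls = deque()    # timestamps of the calls still inside the window

    def wait(self):
        """Block until another call is allowed, record it, return seconds waited."""
        now = self.clock()
        # Forget calls that have already left the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        waited = 0.0
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest recorded call is exactly `period` seconds old.
            waited = self.period - (now - self.calls[0])
            print(f"Rate limit reached. Waiting {waited:.2f} seconds...")
            self.sleep(waited)
            self.calls.popleft()
            now = self.clock()
        self.calls.append(now)
        return waited
```

A fetch loop would create one limiter, for example `SlidingWindowLimiter(max_calls=5, period=60.0)`, and call `limiter.wait()` before each paginated request.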


### 3. Fetch VIX Daily Data

Fetch the last two years of daily VIX bars from Yahoo Finance.


In [14]:
vix_df = loader.fetch_yahoo_data('^VIX', start_date, end_date, 'day')
print(f"\nVIX DataFrame shape: {vix_df.shape}")
print(f"Columns: {vix_df.columns.tolist()}")
vix_df.head()


Fetched 500 entries from Yahoo Finance
Data saved to: c:\Users\simo0\Documents\GitHub\intraday-momentum\data\raw\^VIX_1day_20231027_20251027.csv

VIX DataFrame shape: (500, 6)
Columns: ['volume', 'open', 'high', 'low', 'close', 'caldt']


|   | volume | open | high | low | close | caldt |
|---|---|---|---|---|---|---|
| 0 | 0 | 20.389999 | 22.07 | 19.719999 | 21.27 | 2023-10-27 00:00:00-05:00 |
| 1 | 0 | 21.129999 | 21.16 | 19.549999 | 19.75 | 2023-10-30 00:00:00-05:00 |
| 2 | 0 | 19.860001 | 19.860001 | 17.969999 | 18.139999 | 2023-10-31 00:00:00-05:00 |
| 3 | 0 | 18.02 | 18.42 | 16.629999 | 16.870001 | 2023-11-01 00:00:00-05:00 |
| 4 | 0 | 16.59 | 16.620001 | 15.58 | 15.66 | 2023-11-02 00:00:00-05:00 |
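
Note that the VIX `caldt` values carry a UTC offset (`-05:00`) while the SPY values shown earlier do not. If the two series are later matched by day, one option is to reduce both to a plain calendar date first. The sketch below assumes the `caldt` strings look exactly like the ones printed in this notebook; `session_date` is a hypothetical helper, not a `DataLoader` method.

```python
from datetime import datetime


def session_date(caldt):
    """Return the calendar date of a caldt string.

    The date is read from the timestamp as written, so a trailing UTC
    offset such as -05:00 is parsed but does not shift the day.
    """
    return datetime.fromisoformat(caldt).date()


spy_stamp = "2023-10-30 09:30:00"          # format of the SPY minute bars
vix_stamp = "2023-10-30 00:00:00-05:00"    # format of the VIX daily bars

# Both timestamps reduce to the same key, so they can be matched by day.
print(session_date(spy_stamp) == session_date(vix_stamp))  # True
```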


### 4. Load CSV Data

Load previously saved CSV files to verify the data was saved correctly.
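
The file names used in the next cell are hard-coded, so they stop matching once the date range changes. The saved paths printed by the fetch cells suggest a `{ticker}_{interval}_{start}_{end}.csv` pattern. The sketch below rebuilds a path from the fetch arguments under that assumption; `raw_csv_path` and the `INTERVAL_LABELS` mapping are hypothetical and inferred only from the two file names seen in this notebook.

```python
from pathlib import Path

# Assumed mapping from the timespan argument to the label seen in the
# saved file names ("SPY_1min_...", "^VIX_1day_...").
INTERVAL_LABELS = {"minute": "1min", "day": "1day"}


def raw_csv_path(project_root, ticker, timespan, start_date, end_date):
    """Build the data/raw path for a fetch, given 'YYYY-MM-DD' date strings."""
    label = INTERVAL_LABELS[timespan]
    start = start_date.replace("-", "")
    end = end_date.replace("-", "")
    return Path(project_root) / "data" / "raw" / f"{ticker}_{label}_{start}_{end}.csv"


print(raw_csv_path(".", "SPY", "minute", "2023-10-27", "2025-10-27").name)
# SPY_1min_20231027_20251027.csv
```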


In [15]:
# Load the SPY and VIX CSV files written by the fetch cells above
spy_file_path = project_root / 'data/raw/SPY_1min_20231027_20251027.csv'
vix_file_path = project_root / 'data/raw/^VIX_1day_20231027_20251027.csv'
spy_df = loader.load_csv(spy_file_path)
vix_df = loader.load_csv(vix_file_path)
print(f"Loaded CSV with shape: {spy_df.shape}")
spy_df.head()


Loaded CSV with shape: (194105, 6)


|   | volume | open | high | low | close | caldt |
|---|---|---|---|---|---|---|
| 0 | 630285.0 | 413.56 | 413.94 | 413.53 | 413.77 | 2023-10-30 09:30:00 |
| 1 | 322290.0 | 413.78 | 414.01 | 413.75 | 413.882 | 2023-10-30 09:31:00 |
| 2 | 455364.0 | 413.91 | 414.21 | 413.845 | 414.14 | 2023-10-30 09:32:00 |
| 3 | 269190.0 | 414.13 | 414.24 | 414.05 | 414.205 | 2023-10-30 09:33:00 |
| 4 | 330914.0 | 414.205 | 414.32 | 414.205 | 414.27 | 2023-10-30 09:34:00 |
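
Matching the shape confirms the row count, but it does not check the values. A few row-level checks could complement it, as in the sketch below, which uses only the standard library. `check_bars` is a hypothetical helper, and the 09:30 to 16:00 session window is an assumption based on the first rows shown above.

```python
import csv
import io
from collections import Counter
from datetime import datetime, time

EXPECTED_COLUMNS = ["volume", "open", "high", "low", "close", "caldt"]


def check_bars(csv_text, session_open=time(9, 30), session_close=time(16, 0)):
    """Run basic sanity checks on OHLC bars given as CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    report = {
        "missing_columns": [c for c in EXPECTED_COLUMNS if c not in header],
        "bad_ohlc": [],          # file line numbers where low/high do not bound open/close
        "outside_session": [],   # file line numbers stamped outside the assumed session
        "bars_per_day": Counter(),
    }
    if report["missing_columns"]:
        return report
    # Line 1 is the header, so the first data row is line 2.
    for line_no, row in enumerate(reader, start=2):
        o, h, l, c = (float(row[k]) for k in ("open", "high", "low", "close"))
        stamp = datetime.fromisoformat(row["caldt"])
        if not (l <= min(o, c) and max(o, c) <= h):
            report["bad_ohlc"].append(line_no)
        if not (session_open <= stamp.time() < session_close):
            report["outside_session"].append(line_no)
        report["bars_per_day"][stamp.date()] += 1
    return report


sample = """volume,open,high,low,close,caldt
630285.0,413.56,413.94,413.53,413.77,2023-10-30 09:30:00
322290.0,413.78,414.01,413.75,413.882,2023-10-30 09:31:00
"""
report = check_bars(sample)
print(report["bad_ohlc"], report["outside_session"])  # [] []
```

The `bars_per_day` counter makes short sessions or partially fetched days easy to spot, since a full regular session under the assumed window would hold 390 one-minute bars.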


### 5. Validate Calendar Coverage

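Check calendar coverage to find missing trading days and compute coverage statistics.
The function counts every weekday (Monday to Friday) in the data's date range as an expected day, since markets do not trade on weekends. Exchange holidays are weekdays too, so they appear as missing dates and coverage below 100% is expected: the first ten missing dates in the output below (Thanksgiving, Christmas, New Year's Day, and so on) are all US market holidays.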


In [16]:
# Validate SPY calendar coverage
spy_calendar = loader.validate_calendar(spy_df)
print("SPY Calendar Coverage:")
print(f"Total days: {spy_calendar['total_days']}")
print(f"Expected days: {spy_calendar['expected_days']}")
print(f"Coverage: {spy_calendar['coverage_percentage']}%")
print(f"Date range: {spy_calendar['date_range']}")
print(f"Missing dates: {spy_calendar['missing_dates_count']}")
print(f"Weekdays: {spy_calendar['weekday_count']}, Weekends: {spy_calendar['weekend_count']}")
if spy_calendar['missing_dates']:
    print(f"First 10 missing: {spy_calendar['missing_dates'][:10]}")


SPY Calendar Coverage:
Total days: 500
Expected days: 521
Coverage: 95.97%
Date range: (datetime.date(2023, 10, 30), datetime.date(2025, 10, 27))
Missing dates: 21
Weekdays: 500, Weekends: 0
First 10 missing: [datetime.date(2023, 11, 23), datetime.date(2023, 12, 25), datetime.date(2024, 1, 1), datetime.date(2024, 1, 15), datetime.date(2024, 2, 19), datetime.date(2024, 3, 29), datetime.date(2024, 5, 27), datetime.date(2024, 6, 19), datetime.date(2024, 7, 4), datetime.date(2024, 9, 2)]
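
The implementation of `validate_calendar` is not shown in this notebook. As a rough sketch of how such numbers can be derived, the code below counts every Monday-to-Friday date between the first and last observed day and reports which of them are absent. The function names are hypothetical; only the dictionary keys mirror the ones printed above.

```python
from datetime import date, timedelta


def business_days(start, end):
    """Return every Monday-to-Friday date from start to end, inclusive."""
    days = []
    current = start
    while current <= end:
        if current.weekday() < 5:   # 0 = Monday ... 4 = Friday
            days.append(current)
        current += timedelta(days=1)
    return days


def calendar_coverage(observed_dates):
    """Compare observed trading dates against all weekdays in their range."""
    observed = set(observed_dates)
    start, end = min(observed), max(observed)
    expected = business_days(start, end)
    missing = [d for d in expected if d not in observed]
    covered = len(expected) - len(missing)
    return {
        "total_days": len(observed),
        "expected_days": len(expected),
        "missing_dates": missing,
        "coverage_percentage": round(100 * covered / len(expected), 2),
    }


# Tuesday, Wednesday and Friday of one week: Thursday 2024-01-04 is missing.
week = [date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 5)]
result = calendar_coverage(week)
print(result["expected_days"], result["missing_dates"], result["coverage_percentage"])
# 4 [datetime.date(2024, 1, 4)] 75.0
```

Applied to the SPY date range above, `business_days(date(2023, 10, 30), date(2025, 10, 27))` yields 521 weekdays, which matches the reported expected-day count.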


In [17]:
# Validate VIX calendar coverage
vix_calendar = loader.validate_calendar(vix_df)
print("VIX Calendar Coverage:")
print(f"Total days: {vix_calendar['total_days']}")
print(f"Expected days: {vix_calendar['expected_days']}")
print(f"Coverage: {vix_calendar['coverage_percentage']}%")
print(f"Date range: {vix_calendar['date_range']}")
print(f"Missing dates: {vix_calendar['missing_dates_count']}")
print(f"Weekdays: {vix_calendar['weekday_count']}, Weekends: {vix_calendar['weekend_count']}")
if vix_calendar['missing_dates']:
    print(f"First 10 missing: {vix_calendar['missing_dates'][:10]}")


VIX Calendar Coverage:
Total days: 500
Expected days: 521
Coverage: 95.97%
Date range: (datetime.date(2023, 10, 27), datetime.date(2025, 10, 24))
Missing dates: 21
Weekdays: 500, Weekends: 0
First 10 missing: [datetime.date(2023, 11, 23), datetime.date(2023, 12, 25), datetime.date(2024, 1, 1), datetime.date(2024, 1, 15), datetime.date(2024, 2, 19), datetime.date(2024, 3, 29), datetime.date(2024, 5, 27), datetime.date(2024, 6, 19), datetime.date(2024, 7, 4), datetime.date(2024, 9, 2)]
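
Because the expected-day count includes exchange holidays, the list of missing dates mixes expected closures with any real gaps. One way to separate the two is to compare the missing dates against a holiday table, as sketched below. The table here contains only the ten dates printed above, with their US holiday names; a full check would need a complete exchange calendar, which this sketch does not provide. `split_missing` is a hypothetical helper.

```python
from datetime import date

# The ten dates from the "First 10 missing" output, with their holiday names.
KNOWN_HOLIDAYS = {
    date(2023, 11, 23): "Thanksgiving Day",
    date(2023, 12, 25): "Christmas Day",
    date(2024, 1, 1): "New Year's Day",
    date(2024, 1, 15): "Martin Luther King Jr. Day",
    date(2024, 2, 19): "Presidents' Day",
    date(2024, 3, 29): "Good Friday",
    date(2024, 5, 27): "Memorial Day",
    date(2024, 6, 19): "Juneteenth",
    date(2024, 7, 4): "Independence Day",
    date(2024, 9, 2): "Labor Day",
}


def split_missing(missing_dates, holidays):
    """Split missing dates into known holidays and unexplained gaps."""
    explained = {d: holidays[d] for d in missing_dates if d in holidays}
    unexplained = [d for d in missing_dates if d not in holidays]
    return explained, unexplained


# 2024-07-05 is a made-up gap, included only to show the unexplained branch.
explained, unexplained = split_missing(
    [date(2024, 7, 4), date(2024, 7, 5)], KNOWN_HOLIDAYS
)
print(explained)    # {datetime.date(2024, 7, 4): 'Independence Day'}
print(unexplained)  # [datetime.date(2024, 7, 5)]
```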
