# 4. Market Features Engineering

Build engineered features from the raw market data stored in Hopsworks and upload them back as new feature groups.

**Pipeline**: Hopsworks FGs (raw) → Feature Engineering → Hopsworks FGs (engineered)

**Features Created**:
- **QQQ Technical**: Returns, volatility, RSI, moving average ratios
- **XLK Sector**: Sector returns, rolling correlation with QQQ
- **VIX Volatility**: VIX levels, changes, rolling statistics

**No Look-Ahead Bias**: Every feature for a given date uses only data up to and including that date (backward-looking rolling windows, lagged returns)
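
One way to sanity-check the claim above is a truncation test: compute a feature on the full series and again on a series cut off at some row, then confirm the overlapping values match. The sketch below is illustrative only. `uses_only_past_data` is a hypothetical helper (it is not part of `utils.feature_functions`), and the price series is synthetic.

```python
import numpy as np
import pandas as pd


def uses_only_past_data(feature_fn, prices: pd.Series, cutoff: int) -> bool:
    """Return True if the first `cutoff` feature values are unchanged
    when every later row is removed from the input."""
    full = feature_fn(prices).iloc[:cutoff].to_numpy()
    truncated = feature_fn(prices.iloc[:cutoff]).to_numpy()
    return bool(np.allclose(full, truncated, equal_nan=True))


# Synthetic price series, used only for illustration
rng = np.random.default_rng(0)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, 200))))


def backward_ma(s: pd.Series) -> pd.Series:
    # Window ends at the current row: past and current data only
    return s.rolling(20).mean()


def centered_ma(s: pd.Series) -> pd.Series:
    # Window is centered on the current row: includes future rows
    return s.rolling(20, center=True).mean()


print(uses_only_past_data(backward_ma, prices, cutoff=150))  # True
print(uses_only_past_data(centered_ma, prices, cutoff=150))  # False
```

The same check could be wrapped around any of the feature helpers used below, provided they accept and return a comparable series or frame.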

In [None]:
import sys
sys.path.append('..')

import pandas as pd
from utils.feature_functions import (
    calculate_returns, calculate_rolling_volatility, calculate_rsi,
    calculate_ma_ratios, calculate_rolling_correlation
)
from utils.hopsworks_helpers import get_feature_store, create_feature_group
from dotenv import load_dotenv
import yaml

load_dotenv()

# Load config
with open('../config/config.yaml', 'r') as f:
    config = yaml.safe_load(f)

## Connect to Hopsworks and Load Raw Data

In [None]:
# Connect to Hopsworks
print("Connecting to Hopsworks...")
fs = get_feature_store()
print(f"✓ Connected to feature store: {fs.name}")

In [None]:
# Load raw market data from Hopsworks
print("\nLoading raw market data from Hopsworks...")
qqq_fg = fs.get_feature_group('qqq_raw', version=1)
xlk_fg = fs.get_feature_group('xlk_raw', version=1)
vix_fg = fs.get_feature_group('vix_raw', version=1)

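# read() may not return rows in date order; sort so that diff() and
# rolling() windows downstream run chronologically
qqq_df = qqq_fg.read().sort_values('date').reset_index(drop=True)
xlk_df = xlk_fg.read().sort_values('date').reset_index(drop=True)
vix_df = vix_fg.read().sort_values('date').reset_index(drop=True)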

print(f"✓ QQQ data: {qqq_df.shape}")
print(f"✓ XLK data: {xlk_df.shape}")
print(f"✓ VIX data: {vix_df.shape}")

In [None]:
# Verify data alignment
print("\nDate ranges:")
print(f"QQQ: {qqq_df['date'].min()} to {qqq_df['date'].max()}")
print(f"XLK: {xlk_df['date'].min()} to {xlk_df['date'].max()}")
print(f"VIX: {vix_df['date'].min()} to {vix_df['date'].max()}")

## QQQ Technical Features

Create technical indicators for QQQ ETF:
- **Returns**: Multi-period returns (1d, 2d, 3d, 5d)
- **Volatility**: Rolling standard deviation of returns (5d, 10d, 20d)
- **RSI**: Relative Strength Index (14-day)
- **MA Ratios**: Price / Moving Average ratios (10d, 20d, 50d)
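
The helpers in `utils.feature_functions` are not shown in this notebook, so the following is only a rough sketch of what the four feature families might look like in plain pandas. The function name `technical_features_sketch`, the hard-coded windows, and the `ma_ratio_{n}d` column naming are assumptions. The RSI here uses simple rolling averages of gains and losses, which may differ from the smoothing used by the real `calculate_rsi`.

```python
import numpy as np
import pandas as pd


def technical_features_sketch(df: pd.DataFrame, price_col: str = "qqq_close") -> pd.DataFrame:
    """Illustrative stand-in for the project's helpers (assumed behavior)."""
    out = df.sort_values("date").reset_index(drop=True)
    price = out[price_col]

    # Multi-period simple returns: P_t / P_{t-n} - 1
    for n in (1, 2, 3, 5):
        out[f"return_{n}d"] = price / price.shift(n) - 1

    # Rolling standard deviation of 1-day returns
    daily_ret = price / price.shift(1) - 1
    for w in (5, 10, 20):
        out[f"volatility_{w}d"] = daily_ret.rolling(w).std()

    # RSI from simple rolling means of gains and losses (not Wilder smoothing)
    delta = price.diff()
    avg_gain = delta.clip(lower=0).rolling(14).mean()
    avg_loss = (-delta.clip(upper=0)).rolling(14).mean()
    out["rsi_14"] = 100 - 100 / (1 + avg_gain / avg_loss)

    # Price relative to its trailing moving average
    for p in (10, 20, 50):
        out[f"ma_ratio_{p}d"] = price / price.rolling(p).mean()

    return out


# Synthetic data, for illustration only
dates = pd.date_range("2024-01-01", periods=80, freq="B")
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "date": dates,
    "qqq_close": 400 * np.exp(np.cumsum(rng.normal(0, 0.01, len(dates)))),
})
features = technical_features_sketch(demo)
print(features[["date", "return_1d", "volatility_20d", "rsi_14", "ma_ratio_50d"]].tail(3))
```

Every window ends at the current row, so the first rows of each column are NaN until enough history has accumulated.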

In [None]:
print("Creating QQQ technical features...")

# Start with base data
qqq_features = qqq_df.copy()

# Calculate returns
qqq_features = calculate_returns(
    qqq_features, 
    price_col='qqq_close',
    periods=config['features']['technical']['return_lags']
)

# Calculate rolling volatility
qqq_features = calculate_rolling_volatility(
    qqq_features,
    price_col='qqq_close',
    windows=config['features']['technical']['volatility_windows']
)

# Calculate RSI
qqq_features = calculate_rsi(
    qqq_features,
    price_col='qqq_close',
    period=config['features']['technical']['rsi_period']
)

# Calculate MA ratios
qqq_features = calculate_ma_ratios(
    qqq_features,
    price_col='qqq_close',
    periods=config['features']['technical']['ma_periods']
)

print(f"\nQQQ features shape: {qqq_features.shape}")
print(f"Feature columns added: {[col for col in qqq_features.columns if col.startswith(('return_', 'volatility_', 'rsi_', 'ma_ratio_'))]}")

In [None]:
# Preview QQQ features
feature_cols = ['date'] + [col for col in qqq_features.columns if col.startswith(('return_', 'volatility_', 'rsi_', 'ma_ratio_'))]
qqq_features[feature_cols].head(20)

In [None]:
# Check for missing values
print("\nMissing values in QQQ features:")
print(qqq_features[feature_cols].isnull().sum())
print(f"\nRows with any NaN: {qqq_features[feature_cols].isnull().any(axis=1).sum()} (expected due to rolling windows)")

## XLK Sector Features

Create sector-specific features:
- **XLK Returns**: Technology sector returns
- **Rolling Correlation**: QQQ-XLK correlation (20d, 60d windows)
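
Whether `calculate_rolling_correlation` correlates price levels or returns is not visible from this notebook. As a hedged illustration, the sketch below aligns the two series on `date` and correlates daily returns, since two trending price series tend to show high correlation regardless of their day-to-day co-movement. The function name `rolling_return_correlation` and the `corr_{w}d` column names are assumptions chosen to match the columns plotted later.

```python
import numpy as np
import pandas as pd


def rolling_return_correlation(df1, df2, col1, col2, windows=(20, 60)):
    """Backward-looking rolling correlation of daily returns (illustrative)."""
    # Align on date so each row compares the same trading day
    merged = (
        df1[["date", col1]]
        .merge(df2[["date", col2]], on="date", how="inner")
        .sort_values("date")
        .reset_index(drop=True)
    )
    r1 = merged[col1] / merged[col1].shift(1) - 1
    r2 = merged[col2] / merged[col2].shift(1) - 1

    out = merged[["date"]].copy()
    for w in windows:
        out[f"corr_{w}d"] = r1.rolling(w).corr(r2)
    return out


# Synthetic, partly correlated series for illustration only
dates = pd.date_range("2024-01-01", periods=100, freq="B")
rng = np.random.default_rng(1)
common = rng.normal(0, 0.01, len(dates))
qqq_demo = pd.DataFrame({
    "date": dates,
    "qqq_close": 400 * np.exp(np.cumsum(common + rng.normal(0, 0.003, len(dates)))),
})
xlk_demo = pd.DataFrame({
    "date": dates,
    "xlk_close": 200 * np.exp(np.cumsum(common + rng.normal(0, 0.003, len(dates)))),
})
corr = rolling_return_correlation(qqq_demo, xlk_demo, "qqq_close", "xlk_close")
print(corr.tail(3))
```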

In [None]:
print("Creating XLK sector features...")

# Calculate XLK returns (work on a copy so the raw frame stays unchanged,
# matching the QQQ cell above)
xlk_features = calculate_returns(
    xlk_df.copy(),
    price_col='xlk_close',
    periods=[1, 5]  # 1-day and 5-day returns
)

# Calculate rolling correlation between QQQ and XLK
correlation_df = calculate_rolling_correlation(
    qqq_df,
    xlk_df,
    col1='qqq_close',
    col2='xlk_close',
    windows=config['features']['sector']['correlation_windows']
)

# Merge XLK features with correlation
xlk_features = xlk_features.merge(correlation_df, on='date', how='left')

print(f"\nXLK features shape: {xlk_features.shape}")
print(f"Feature columns: {[col for col in xlk_features.columns if col.startswith(('return_', 'corr_'))]}")

In [None]:
# Preview XLK features
xlk_cols = ['date'] + [col for col in xlk_features.columns if col.startswith(('return_', 'corr_'))]
xlk_features[xlk_cols].head(20)

In [None]:
# Summary statistics for XLK correlation
print("\nXLK-QQQ Correlation Statistics:")
xlk_features[[col for col in xlk_features.columns if col.startswith('corr_')]].describe()

## VIX Volatility Features

Create volatility regime features:
- **VIX Close**: Current volatility level
- **VIX Change**: Absolute and percentage changes
- **VIX Rolling Stats**: 20-day moving average and standard deviation

In [None]:
print("Creating VIX volatility features...")

# Calculate VIX changes and rolling stats
vix_features = vix_df.copy()
vix_features['vix_change'] = vix_features['vix_close'].diff()
vix_features['vix_pct_change'] = vix_features['vix_close'].pct_change()

# Rolling statistics
vix_features['vix_ma_20'] = vix_features['vix_close'].rolling(20).mean()
vix_features['vix_std_20'] = vix_features['vix_close'].rolling(20).std()

# VIX relative to moving average (regime indicator)
vix_features['vix_ma_ratio'] = vix_features['vix_close'] / vix_features['vix_ma_20']

print(f"\nVIX features shape: {vix_features.shape}")
raw_vix_cols = {'vix_open', 'vix_high', 'vix_low', 'vix_volume'}
print(f"Feature columns: {[col for col in vix_features.columns if col.startswith('vix_') and col not in raw_vix_cols]}")

In [None]:
# Preview VIX features
vix_cols = ['date', 'vix_close', 'vix_change', 'vix_pct_change', 'vix_ma_20', 'vix_std_20', 'vix_ma_ratio']
vix_features[vix_cols].head(20)

In [None]:
# VIX regime analysis
print("\nVIX Regime Analysis:")
print(f"VIX range: {vix_features['vix_close'].min():.2f} to {vix_features['vix_close'].max():.2f}")
print(f"VIX mean: {vix_features['vix_close'].mean():.2f}")
print(f"\nDays with VIX > 30 (high volatility): {(vix_features['vix_close'] > 30).sum()}")
print(f"Days with VIX < 15 (low volatility): {(vix_features['vix_close'] < 15).sum()}")

## Visualize Key Features

In [None]:
import matplotlib.pyplot as plt

fig, axes = plt.subplots(3, 2, figsize=(15, 12))

# QQQ returns
axes[0, 0].plot(qqq_features['date'], qqq_features['return_1d'])
axes[0, 0].set_title('QQQ 1-Day Returns')
axes[0, 0].set_ylabel('Return')
axes[0, 0].grid(True, alpha=0.3)

# QQQ volatility
axes[0, 1].plot(qqq_features['date'], qqq_features['volatility_20d'])
axes[0, 1].set_title('QQQ 20-Day Volatility')
axes[0, 1].set_ylabel('Volatility')
axes[0, 1].grid(True, alpha=0.3)

# RSI
axes[1, 0].plot(qqq_features['date'], qqq_features['rsi_14'])
axes[1, 0].axhline(y=70, color='r', linestyle='--', alpha=0.5, label='Overbought')
axes[1, 0].axhline(y=30, color='g', linestyle='--', alpha=0.5, label='Oversold')
axes[1, 0].set_title('QQQ RSI (14-day)')
axes[1, 0].set_ylabel('RSI')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# QQQ-XLK correlation
axes[1, 1].plot(xlk_features['date'], xlk_features['corr_20d'])
axes[1, 1].set_title('QQQ-XLK Rolling Correlation (20d)')
axes[1, 1].set_ylabel('Correlation')
axes[1, 1].grid(True, alpha=0.3)

# VIX level
axes[2, 0].plot(vix_features['date'], vix_features['vix_close'], label='VIX')
axes[2, 0].plot(vix_features['date'], vix_features['vix_ma_20'], label='20-day MA', alpha=0.7)
axes[2, 0].axhline(y=30, color='r', linestyle='--', alpha=0.5)
axes[2, 0].set_title('VIX Level')
axes[2, 0].set_ylabel('VIX')
axes[2, 0].legend()
axes[2, 0].grid(True, alpha=0.3)

# VIX regime indicator
axes[2, 1].plot(vix_features['date'], vix_features['vix_ma_ratio'])
axes[2, 1].axhline(y=1.0, color='k', linestyle='-', alpha=0.3)
axes[2, 1].set_title('VIX / MA Ratio (Regime Indicator)')
axes[2, 1].set_ylabel('Ratio')
axes[2, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n✓ Plots rendered; inspect them for outliers or gaps before uploading")

## Upload Market Features to Hopsworks
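
Because each feature group below uses `date` as its primary key, it may be worth running a quick check on the frames before uploading. The helper `check_before_upload` is hypothetical (it is not part of `utils.hopsworks_helpers`), and the specific checks are only suggestions.

```python
import numpy as np
import pandas as pd


def check_before_upload(df: pd.DataFrame, key: str = "date") -> list:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    if df[key].isna().any():
        problems.append(f"{key} contains missing values")
    if df[key].duplicated().any():
        problems.append(f"{key} is not unique")

    numeric = df.select_dtypes(include="number")
    # Infinite values can appear when a ratio divides by zero
    if np.isinf(numeric.to_numpy(dtype=float)).any():
        problems.append("infinite values present")

    all_nan = [c for c in numeric.columns if numeric[c].isna().all()]
    if all_nan:
        problems.append(f"columns with no valid values: {all_nan}")
    return problems


# Small hand-made frame with a duplicate date and an infinite value
demo = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-03"]),
    "return_1d": [np.nan, 0.01, np.inf],
})
print(check_before_upload(demo))
# ['date is not unique', 'infinite values present']
```

In this notebook the check could be applied to `qqq_features_final`, `xlk_features_final`, and `vix_features_final` once they are prepared.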

In [None]:
# Select relevant columns for feature groups
print("\nPreparing feature groups for upload...")

# QQQ technical features only (exclude OHLCV)
qqq_cols = ['date'] + [col for col in qqq_features.columns if col.startswith(('return_', 'volatility_', 'rsi_', 'ma_ratio_'))]
qqq_features_final = qqq_features[qqq_cols].copy()

# XLK sector features only
xlk_cols = ['date'] + [col for col in xlk_features.columns if col.startswith(('return_', 'corr_'))]
xlk_features_final = xlk_features[xlk_cols].copy()

# VIX volatility features
vix_cols = ['date', 'vix_close', 'vix_change', 'vix_pct_change', 'vix_ma_20', 'vix_std_20', 'vix_ma_ratio']
vix_features_final = vix_features[vix_cols].copy()

print(f"QQQ features to upload: {len(qqq_cols)-1} columns, {qqq_features_final.shape[0]} rows")
print(f"XLK features to upload: {len(xlk_cols)-1} columns, {xlk_features_final.shape[0]} rows")
print(f"VIX features to upload: {len(vix_cols)-1} columns, {vix_features_final.shape[0]} rows")

In [None]:
# Upload QQQ technical features
print("\nUploading QQQ technical features to Hopsworks...")
qqq_fg = create_feature_group(
    fs,
    name='qqq_technical_features',
    df=qqq_features_final,
    primary_key=['date'],
    description='QQQ technical indicators: returns, volatility, RSI, MA ratios'
)
print(f"✓ Created feature group: qqq_technical_features (version {qqq_fg.version})")

In [None]:
# Upload XLK sector features
print("\nUploading XLK sector features to Hopsworks...")
xlk_fg = create_feature_group(
    fs,
    name='xlk_sector_features',
    df=xlk_features_final,
    primary_key=['date'],
    description='XLK sector features: returns and rolling correlation with QQQ'
)
print(f"✓ Created feature group: xlk_sector_features (version {xlk_fg.version})")

In [None]:
# Upload VIX volatility features
print("\nUploading VIX volatility features to Hopsworks...")
vix_fg = create_feature_group(
    fs,
    name='vix_volatility_features',
    df=vix_features_final,
    primary_key=['date'],
    description='VIX volatility features: close, change, rolling stats, regime indicator'
)
print(f"✓ Created feature group: vix_volatility_features (version {vix_fg.version})")

## Summary

**✅ Market features created and uploaded to Hopsworks**:

**Feature Group 1**: `qqq_technical_features`
- Returns: 1d, 2d, 3d, 5d
- Volatility: 5d, 10d, 20d rolling std
- RSI: 14-day relative strength index
- MA Ratios: 10d, 20d, 50d price/MA ratios

**Feature Group 2**: `xlk_sector_features`
- XLK returns: 1d, 5d
- QQQ-XLK correlation: 20d, 60d rolling windows

**Feature Group 3**: `vix_volatility_features`
- VIX close: Current volatility level
- VIX changes: Absolute and percentage
- VIX MA: 20-day moving average
- VIX std: 20-day rolling standard deviation
- VIX regime: VIX / MA ratio

**No look-ahead bias**:
- Every feature for a given date uses only data up to and including that date (rolling windows, lagged calculations)
- RSI and moving averages calculated with historical data only
- Correlation windows look backward only

**Next steps**:
- Notebook 6: Create Feature View to join all feature groups (QQQ technical, XLK sector, VIX volatility, macro features)
- Notebook 7: Train models with time-series splits