# High-Frequency Market Reactions to FOMC Communications: A Multi-Modal NLP Approach

**Research Notebook - Enhanced Version**

## Abstract
This notebook analyzes Federal Reserve (FOMC) communications using multiple NLP approaches to predict high-frequency market reactions in the rates market. We employ GPT-4, FinBERT, BART, and semantic embeddings to extract hawkishness signals, combined with novel change detection features that capture subtle linguistic shifts between consecutive statements.

## Key Innovations:
1. **Change Detection**: Statement-to-statement diff analysis (markets care about changes!)
2. **Policy Rate Data**: Effective federal funds rate (DFF) from FRED, used as a stand-in for fed funds futures, which this version does not fetch
3. **SHAP Analysis**: Interpretable feature importance
4. **Time-Series CV**: Proper cross-validation for financial data
5. **2024-2025 Holdout**: True out-of-sample testing
6. **Attention Mechanisms**: Sentence-level importance weights

---

## 1. Setup and Imports

In [None]:
# Core libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

# Data fetching
import pandas_datareader.data as web
import yfinance as yf

# NLP libraries
from transformers import (
    AutoTokenizer, AutoModelForSequenceClassification,
    pipeline, BertTokenizer, BertForSequenceClassification
)
from sentence_transformers import SentenceTransformer
import torch

# ML libraries
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier, HistGradientBoostingRegressor, HistGradientBoostingClassifier
from sklearn.linear_model import LogisticRegression, Ridge, Lasso, LassoCV
from sklearn.neural_network import MLPClassifier, MLPRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, accuracy_score, roc_auc_score, roc_curve, classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler
import statsmodels.api as sm

# SHAP for interpretability
import shap

# Text processing
import difflib
from difflib import SequenceMatcher
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# OpenAI for GPT-4 analysis
from openai import OpenAI
import os
import json

# Set style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("✓ All libraries imported successfully")
print(f"PyTorch device: {torch.device('cuda' if torch.cuda.is_available() else 'cpu')}")

## 2. Data Loading

Load the FOMC communications data and prepare for analysis.

In [None]:
# Load FOMC communications
df = pd.read_csv('communications.csv')

# Parse dates
df['Date'] = pd.to_datetime(df['Date'])
df['Release Date'] = pd.to_datetime(df['Release Date'])

# Filter to relevant time period (2000 onwards)
df = df[df['Date'] >= '2000-01-01'].copy()

# Sort by date
df = df.sort_values('Date').reset_index(drop=True)

# Separate statements and minutes
statements = df[df['Type'] == 'Statement'].copy().reset_index(drop=True)
minutes = df[df['Type'] == 'Minute'].copy().reset_index(drop=True)

print(f"Total documents: {len(df)}")
print(f"Statements: {len(statements)}")
print(f"Minutes: {len(minutes)}")
print(f"Date range: {df['Date'].min()} to {df['Date'].max()}")
print(f"\nSample:")
print(df[['Date', 'Type']].head(10))

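## 3. ENHANCEMENT #1: Policy Rate and Treasury Yield Data

**Why this matters**: Fed funds futures are the most direct measure of market expectations for the policy rate, and so of whether an FOMC communication surprised markets. This version does not fetch futures prices. It uses the effective federal funds rate (DFF) from FRED as the realized policy rate, plus Treasury yields, whose changes around each release serve as the reaction measure.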

We'll fetch:
- **DFF**: Effective Federal Funds Rate (actual policy rate)
- **Treasury yields**: 2Y, 5Y, 10Y (existing)
- **Rate changes**: How much markets moved after each FOMC event
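
The "rate changes" item above is an event-window difference: the yield on or after the release minus the last yield before it, scaled to basis points. A minimal sketch with invented yields (the dates and values are illustrative, not FRED data, and `event_change_bp` is a hypothetical helper rather than the notebook's `compute_market_reactions`):

```python
from datetime import date

def event_change_bp(series, release, horizon=1):
    """Change in basis points from the last observation before `release`
    to the `horizon`-th observation on or after it.

    `series` maps date -> yield in percent. Returns None when either side
    of the window is missing.
    """
    days = sorted(series)
    before = [d for d in days if d < release]
    after = [d for d in days if d >= release]
    if not before or len(after) < horizon:
        return None
    pre, post = series[before[-1]], series[after[horizon - 1]]
    return round((post - pre) * 100, 2)  # 1 percentage point = 100 bp

# Invented 2-year yields (percent) around a hypothetical release date
yields_2y = {
    date(2023, 5, 1): 4.00,
    date(2023, 5, 2): 4.10,
    date(2023, 5, 3): 4.02,  # release day
    date(2023, 5, 4): 3.95,
}
release = date(2023, 5, 3)
one_day = event_change_bp(yields_2y, release, horizon=1)
two_day = event_change_bp(yields_2y, release, horizon=2)
print(f"1-day: {one_day} bp, 2-day: {two_day} bp")
```

The sketch counts observations rather than calendar days, so a weekend inside the window does not shorten it. That is a design choice of this example, not necessarily how the notebook's own function behaves.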

In [None]:
def fetch_market_data(start_date='2000-01-01', end_date=None):
    """
    Fetch comprehensive market data from FRED
    
    Returns:
        DataFrame with columns: DFF, DGS2, DGS5, DGS10
    """
    if end_date is None:
        end_date = datetime.now().strftime('%Y-%m-%d')
    
    print(f"Fetching market data from {start_date} to {end_date}...")
    
    # Define data series
    series = {
        'DFF': 'Effective Federal Funds Rate',
        'DGS2': '2-Year Treasury Yield',
        'DGS5': '5-Year Treasury Yield',
        'DGS10': '10-Year Treasury Yield',
    }
    
    # Fetch data
    market_data = {}
    for code, name in series.items():
        try:
            data = web.DataReader(code, 'fred', start_date, end_date)
            market_data[code] = data[code]
            print(f"  ✓ {name} ({code}): {len(data)} observations")
        except Exception as e:
            print(f"  ✗ Error fetching {name}: {e}")
    
    # Combine into single DataFrame
    market_df = pd.DataFrame(market_data)
    
    # Forward fill missing values (weekends, holidays)
    market_df = market_df.ffill()  # fillna(method='ffill') is deprecated in pandas 2.x
    
    print(f"\nMarket data shape: {market_df.shape}")
    print(f"Date range: {market_df.index.min()} to {market_df.index.max()}")
    print(f"Missing values: {market_df.isna().sum().sum()}")
    
    return market_df

# Fetch the data
market_df = fetch_market_data()

# Display summary statistics
print("\nSummary Statistics:")
print(market_df.describe())

In [None]:
def compute_market_reactions(df, market_df, horizons=(1, 2)):  # tuple avoids a mutable default argument
    """
    Compute market reactions around FOMC release dates
    
    Args:
        df: DataFrame with FOMC communications (must have 'Release Date' column)
        market_df: DataFrame with market data (indexed by date)
        horizons: List of days to compute reactions over (e.g., [1, 2] for 1-day and 2-day)
    
    Returns:
        DataFrame with market reaction columns added
    """
    df = df.copy()
    
    # Ensure Release Date is datetime
    df['Release Date'] = pd.to_datetime(df['Release Date'])
    
    # Initialize columns
    for horizon in horizons:
        for col in ['DFF', 'DGS2', 'DGS5', 'DGS10']:
            df[f'{col.lower()}_{horizon}d_chg'] = np.nan
            df[f'{col.lower()}_{horizon}d_bp'] = np.nan
    
    # Compute reactions
    for idx, row in df.iterrows():
        release_date = row['Release Date']
        
        # Get pre-release value (day before or last available)
        pre_dates = market_df.index[market_df.index < release_date]
        if len(pre_dates) == 0:
            continue
        pre_date = pre_dates[-1]
        
        for horizon in horizons:
            # Get post-release value (horizon days after)
            target_date = release_date + timedelta(days=horizon)
            post_dates = market_df.index[
                (market_df.index >= release_date) & 
                (market_df.index <= target_date + timedelta(days=5))  # Allow some slack for weekends
            ]
            
            if len(post_dates) == 0:
                continue
            
            # Take the horizon-th available date on or after the release
            # (horizon=1 is the release date itself when it is in the index),
            # falling back to the last date in the window if fewer exist
            post_date = post_dates[min(horizon - 1, len(post_dates) - 1)]
            
            # Compute changes
            for col in ['DFF', 'DGS2', 'DGS5', 'DGS10']:
                pre_val = market_df.loc[pre_date, col]
                post_val = market_df.loc[post_date, col]
                
                if pd.notna(pre_val) and pd.notna(post_val):
                    change = post_val - pre_val
                    change_bp = change * 100  # Convert to basis points
                    
                    df.loc[idx, f'{col.lower()}_{horizon}d_chg'] = change
                    df.loc[idx, f'{col.lower()}_{horizon}d_bp'] = change_bp
    
    # Also compute yield curve spreads
    for horizon in horizons:
        # 2s10s spread
        df[f'spread_2s10s_{horizon}d_bp'] = (
            df[f'dgs10_{horizon}d_bp'] - df[f'dgs2_{horizon}d_bp']
        )
        
        # 5s30s spread (if we had 30Y data)
        # df[f'spread_5s30s_{horizon}d_bp'] = ...
    
    print(f"Market reactions computed for {len(df)} releases")
    print(f"Horizons: {horizons} days")
    print(f"\nSample of 1-day reactions (basis points):")
    print(df[['Date', 'Type', 'dff_1d_bp', 'dgs2_1d_bp', 'dgs5_1d_bp', 'dgs10_1d_bp']].head(10))
    
    return df

# Compute market reactions for all documents
df = compute_market_reactions(df, market_df, horizons=[1, 2])

# Update statements and minutes separately
statements = df[df['Type'] == 'Statement'].copy().reset_index(drop=True)
minutes = df[df['Type'] == 'Minute'].copy().reset_index(drop=True)

In [None]:
# Visualize Fed Funds Rate over time with FOMC events
fig, ax = plt.subplots(figsize=(15, 6))

# Plot Fed Funds Rate
ax.plot(market_df.index, market_df['DFF'], label='Effective Fed Funds Rate', linewidth=2)

# Mark FOMC statement releases
statement_dates = statements['Release Date'].dropna()
for date in statement_dates:
    if date in market_df.index:
        ax.axvline(date, color='red', alpha=0.2, linewidth=0.5)

ax.set_xlabel('Date', fontsize=12)
ax.set_ylabel('Rate (%)', fontsize=12)
ax.set_title('Effective Federal Funds Rate with FOMC Statement Releases', fontsize=14, fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n📊 Red vertical lines indicate FOMC statement release dates")

## 4. Load Existing NLP Features

Load the features you've already computed (GPT-4 scores, FinBERT, BART, etc.)

In [None]:
# Load your existing feature files
try:
    gpt_scores = pd.read_csv('gpt_hawk_scores.csv')
    print(f"✓ Loaded GPT scores: {gpt_scores.shape}")
except FileNotFoundError:
    print("⚠ gpt_hawk_scores.csv not found - will need to regenerate")
    gpt_scores = None

try:
    full_features = pd.read_csv('data_with_gpt_bart_finbert.csv')
    print(f"✓ Loaded full features: {full_features.shape}")
    print(f"  Columns: {full_features.columns.tolist()}")
except FileNotFoundError:
    print("⚠ data_with_gpt_bart_finbert.csv not found")
    full_features = None

## 5. ENHANCEMENT #2: Change Detection System

**KEY INSIGHT**: Markets react mainly to *changes* from the previous statement, not to the absolute level of hawkishness.

A statement that is hawkish but *less hawkish than before* can push yields lower, because the earlier tone is already priced in.

We'll build features that capture:
1. **Sentence-level changes**: Which sentences were added, removed, or modified?
2. **Key phrase tracking**: Changes in specific policy language
3. **Semantic drift**: How much did the meaning shift?
4. **Section-specific changes**: Separate changes in outlook vs. policy description
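
The sentence-level comparison in the list above can be sketched on two short snippets. This is a minimal illustration under stated assumptions: the snippets are invented (not actual FOMC text), `split_sentences` is a naive regex stand-in for a proper sentence tokenizer (it would break on abbreviations such as "U.S."), and `sentence_diff` is a hypothetical helper.

```python
import re
from difflib import SequenceMatcher

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

def sentence_diff(previous, current):
    """Return sentences added and removed, plus character-level similarity."""
    prev, curr = split_sentences(previous), split_sentences(current)
    added = [s for s in curr if s not in prev]
    removed = [s for s in prev if s not in curr]
    similarity = SequenceMatcher(None, previous, current).ratio()
    return added, removed, similarity

# Two invented statement snippets
prev_text = "Inflation remains elevated. The Committee decided to raise the target range."
curr_text = "Inflation has eased. The Committee decided to raise the target range."

added, removed, similarity = sentence_diff(prev_text, curr_text)
print("added:  ", added)
print("removed:", removed)
print(f"similarity: {similarity:.2f}")
```

Exact sentence matching treats a one-word edit as one removal plus one addition. That is usually what you want for FOMC statements, where most sentences are carried over verbatim and the edits are the signal.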

In [None]:
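import nltk
# Older NLTK releases load the 'punkt' sentence model; recent ones look for 'punkt_tab'
for resource in ('punkt', 'punkt_tab'):
    try:
        nltk.data.find(f'tokenizers/{resource}')
    except LookupError:
        nltk.download(resource)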

from nltk.tokenize import sent_tokenize

def compute_text_similarity(text1, text2):
    """
    Compute similarity between two texts
    """
    if pd.isna(text1) or pd.isna(text2):
        return np.nan
    
    # Use SequenceMatcher for character-level similarity
    return SequenceMatcher(None, text1, text2).ratio()

def extract_key_phrases(text):
    """
    Extract key policy-related phrases
    """
    if pd.isna(text):
        return {}
    
    text_lower = text.lower()
    
    phrases = {
        # Inflation language
        'inflation_elevated': 'inflation remains elevated' in text_lower or 'elevated inflation' in text_lower,
        'inflation_moderating': 'inflation has moderated' in text_lower or 'moderating inflation' in text_lower,
        'inflation_easing': 'inflation easing' in text_lower or 'inflation has eased' in text_lower,
        
        # Rate language
        'rate_increases': 'rate increase' in text_lower or 'raising the target range' in text_lower,
        'rate_cuts': 'rate cut' in text_lower or 'rate reduction' in text_lower or 'lowering the target range' in text_lower,
        'rate_hold': 'maintain the target range' in text_lower or 'leaving the target range' in text_lower,
        
        # Forward guidance
        'data_dependent': 'data dependent' in text_lower or 'incoming data' in text_lower,
        'patient': 'patient' in text_lower and 'policy' in text_lower,
        'gradual': 'gradual' in text_lower,
        
        # Labor market
        'labor_tight': 'tight labor' in text_lower or 'labor market remains tight' in text_lower,
        'labor_softening': 'labor market has softened' in text_lower or 'softening labor' in text_lower,
        
        # Economic outlook
        'growth_solid': 'solid growth' in text_lower or 'economic growth is solid' in text_lower,
        'growth_slowing': 'slowing growth' in text_lower or 'growth has slowed' in text_lower,
    }
    
    return phrases

def detect_statement_changes(current_text, previous_text, current_date=None, prev_date=None):
    """
    Comprehensive change detection between consecutive FOMC statements
    
    Returns:
        Dictionary of change features
    """
    if pd.isna(current_text) or pd.isna(previous_text):
        return {}
    
    # Tokenize into sentences
    curr_sentences = sent_tokenize(current_text)
    prev_sentences = sent_tokenize(previous_text)
    
    # Convert to sets for comparison
    curr_set = set(s.strip() for s in curr_sentences)
    prev_set = set(s.strip() for s in prev_sentences)
    
    # Count changes
    added = curr_set - prev_set
    removed = prev_set - curr_set
    unchanged = curr_set & prev_set
    
    # Overall similarity
    overall_similarity = compute_text_similarity(current_text, previous_text)
    
    # Length changes
    len_change_pct = (len(current_text) - len(previous_text)) / len(previous_text) * 100 if len(previous_text) > 0 else 0
    sentence_count_change = len(curr_sentences) - len(prev_sentences)
    
    # Key phrase analysis
    curr_phrases = extract_key_phrases(current_text)
    prev_phrases = extract_key_phrases(previous_text)
    
    # Track phrase changes. Both flags are always set so every row has the same
    # columns, and the 'change_' prefix lets prepare_feature_matrix select them.
    phrase_changes = {}
    for phrase_name, curr_val in curr_phrases.items():
        prev_val = prev_phrases[phrase_name]
        phrase_changes[f'change_{phrase_name}_added'] = int(curr_val and not prev_val)
        phrase_changes[f'change_{phrase_name}_removed'] = int(prev_val and not curr_val)
    
    # Compile features
    features = {
        'change_sentences_added': len(added),
        'change_sentences_removed': len(removed),
        'change_sentences_unchanged': len(unchanged),
        'change_net_sentences': len(added) - len(removed),
        'change_pct_sentences_modified': (len(added) + len(removed)) / max(len(prev_set), 1) * 100,
        'change_overall_similarity': overall_similarity,
        'change_text_length_pct': len_change_pct,
        'change_sentence_count': sentence_count_change,
    }
    
    # Add phrase changes
    features.update(phrase_changes)
    
    return features

print("✓ Change detection functions defined")

In [None]:
def add_change_features(df):
    """
    Add change detection features to dataframe
    Compares each statement to the previous one
    """
    df = df.copy()
    df = df.sort_values('Date').reset_index(drop=True)
    
    # Initialize all change features to NaN for first row
    all_change_features = []
    
    for idx in range(len(df)):
        if idx == 0:
            # First statement has no previous comparison
            all_change_features.append({})
        else:
            current_text = df.loc[idx, 'Text']
            previous_text = df.loc[idx-1, 'Text']
            current_date = df.loc[idx, 'Date']
            prev_date = df.loc[idx-1, 'Date']
            
            changes = detect_statement_changes(
                current_text, previous_text, 
                current_date, prev_date
            )
            all_change_features.append(changes)
    
    # Convert to DataFrame
    change_df = pd.DataFrame(all_change_features)
    
    # Concatenate with original DataFrame
    df = pd.concat([df, change_df], axis=1)
    
    print(f"✓ Added {len(change_df.columns)} change detection features")
    print(f"  Features: {list(change_df.columns[:10])}...")
    
    return df

# Add change features to statements
print("Computing change features for statements...")
statements = add_change_features(statements)

print("\nExample change features:")
change_cols = [col for col in statements.columns if col.startswith('change_')]
print(statements[['Date'] + change_cols[:5]].head(10))

## 6. ENHANCEMENT #3: Time-Series Cross-Validation

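**Problem with the current approach**: A single train/test split at 2017 gives one performance estimate, which says little about how stable the model is across periods.

**Solution**: Walk-forward (expanding-window) time-series cross-validation, which refits on all data before each test block and never trains on the future.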
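
To make the walk-forward idea concrete, here is a small sketch that generates expanding-window folds by hand. It is intended to mirror the index arithmetic of scikit-learn's `TimeSeriesSplit` with default settings, but `expanding_window_splits` is a hypothetical re-implementation for illustration, not the library's code.

```python
def expanding_window_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) with an expanding training window.

    Test blocks have equal size n_samples // (n_splits + 1) and sit at the
    end of the sample, so every training index precedes every test index.
    """
    test_size = n_samples // (n_splits + 1)
    if test_size == 0:
        raise ValueError("too few samples for the requested number of splits")
    first_test_start = n_samples - n_splits * test_size
    for start in range(first_test_start, n_samples, test_size):
        yield list(range(start)), list(range(start, start + test_size))

# 12 statements in date order, 3 folds
folds = list(expanding_window_splits(n_samples=12, n_splits=3))
for i, (train, test) in enumerate(folds, start=1):
    print(f"Fold {i}: train 0..{train[-1]} ({len(train)} obs) -> test {test}")
```

Each fold trains on every observation before its test block, so later folds see more history. With few FOMC statements per year, early folds train on very little data, which is why a minimum training size is worth enforcing.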

In [None]:
def create_train_test_splits(df, holdout_year=2024, cv_cutoff_year=2017):
    """
    Create proper time-series splits:
    1. Training set: Before cv_cutoff_year (for CV)
    2. Validation set: cv_cutoff_year to holdout_year (for model selection)
    3. Holdout set: holdout_year onwards (true out-of-sample)
    
    Args:
        df: DataFrame with FOMC data
        holdout_year: Year to start holdout set (2024 or 2025)
        cv_cutoff_year: Year to split train/validation
    
    Returns:
        Dictionary with train, validation, holdout splits
    """
    df = df.copy()
    df['year'] = pd.to_datetime(df['Date']).dt.year
    
    # Create splits
    train = df[df['year'] < cv_cutoff_year].copy()
    validation = df[(df['year'] >= cv_cutoff_year) & (df['year'] < holdout_year)].copy()
    holdout = df[df['year'] >= holdout_year].copy()
    
    def _year_span(part):
        # Year range as text, or 'N/A' when the split is empty
        return f"{part['year'].min()}-{part['year'].max()}" if len(part) > 0 else 'N/A'

    print(f"Train set: {len(train)} samples ({_year_span(train)})")
    print(f"Validation set: {len(validation)} samples ({_year_span(validation)})")
    print(f"Holdout set: {len(holdout)} samples ({_year_span(holdout)})")
    
    return {
        'train': train,
        'validation': validation,
        'holdout': holdout,
        'train_val': pd.concat([train, validation])  # Combined for final training
    }

# Create splits
splits = create_train_test_splits(statements, holdout_year=2024, cv_cutoff_year=2017)

print(f"\n✓ Data split created successfully")

In [None]:
def time_series_cv_split(df, n_splits=5, min_train_size=30):
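    """
    Create time-series cross-validation splits
    
    Each fold uses an expanding window. TimeSeriesSplit cuts the sample into
    n_splits + 1 blocks of roughly equal size: fold k trains on everything
    before test block k and tests on that block.
    
    Folds with fewer than min_train_size training rows are left out of the
    printed summary only; the returned splitter still yields all n_splits folds.
    """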
    
    tscv = TimeSeriesSplit(n_splits=n_splits)
    
    splits_info = []
    for fold_idx, (train_idx, test_idx) in enumerate(tscv.split(df)):
        if len(train_idx) < min_train_size:
            continue
            
        train_dates = df.iloc[train_idx]['Date']
        test_dates = df.iloc[test_idx]['Date']
        
        splits_info.append({
            'fold': fold_idx,
            'train_idx': train_idx,
            'test_idx': test_idx,
            'train_size': len(train_idx),
            'test_size': len(test_idx),
            'train_period': f"{train_dates.min():%Y-%m} to {train_dates.max():%Y-%m}",
            'test_period': f"{test_dates.min():%Y-%m} to {test_dates.max():%Y-%m}"
        })
    
    # Print fold information
    print(f"Time-Series Cross-Validation: {len(splits_info)} folds\n")
    for split in splits_info:
        print(f"Fold {split['fold']}: Train {split['train_size']} ({split['train_period']}) | Test {split['test_size']} ({split['test_period']})")
    
    return tscv, splits_info

# Create CV splits for the training+validation set
tscv, cv_splits = time_series_cv_split(splits['train_val'], n_splits=5)

## 7. Build Feature Matrix

Combine all features: existing NLP features + new change features + market data

In [None]:
# For now, let's define a placeholder feature set
# You'll merge this with your existing features from data_with_gpt_bart_finbert.csv

def prepare_feature_matrix(df, target='dgs2_1d_bp'):
    """
    Prepare feature matrix for modeling
    
    Args:
        df: DataFrame with all features
        target: Target variable (yield change)
    
    Returns:
        X, y, feature_names
    """
    # Select feature columns
    # You'll need to customize this based on your actual features
    feature_cols = [col for col in df.columns if (
        col.startswith('change_') or 
        col.startswith('gpt_') or
        col.startswith('bart_') or
        col.startswith('finbert_') or
        col in ['hawk_minus_dove', 'delta_semantic', 'is_minute']
    )]
    
    # Remove target columns from features
    feature_cols = [col for col in feature_cols if not any([
        'dgs' in col.lower(),
        'dff' in col.lower(),
        'dy' in col.lower(),
        'spread' in col.lower()
    ])]
    
    # Filter to available columns
    feature_cols = [col for col in feature_cols if col in df.columns]
    
    print(f"Feature columns: {len(feature_cols)}")
    print(f"Sample features: {feature_cols[:10]}")
    
    # Extract X and y
    X = df[feature_cols].copy()
    y = df[target].copy()
    
    # Handle missing values
    X = X.fillna(0)  # Simple imputation for now
    
    # Filter to valid samples (where target is not null)
    valid_idx = y.notna()
    X = X[valid_idx]
    y = y[valid_idx]
    
    print(f"Final shape: X={X.shape}, y={y.shape}")
    print(f"Target stats: mean={y.mean():.2f}, std={y.std():.2f}, min={y.min():.2f}, max={y.max():.2f}")
    
    return X, y, feature_cols

# Prepare features (this will work with change features only for now)
# Later you'll merge with your existing NLP features
print("Preparing feature matrix...")
print("\nNote: This uses only change features for now.")
print("You'll need to merge with your existing GPT/BART/FinBERT features from 'data_with_gpt_bart_finbert.csv'")

## 8. ENHANCEMENT #4: SHAP Analysis for Interpretability

**Why this matters for the paper**: 
- Academics want to know *which* features matter
- Practitioners want *interpretable* signals
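- SHAP decomposes each prediction into additive per-feature contributions in the target's units, e.g. "of this predicted 5bp yield move, 2bp comes from the change in inflation language and 1.5bp from the GPT hawkishness score"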
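
The additive decomposition is easiest to see for a linear model, where (treating features as independent) each SHAP value is the coefficient times the feature's deviation from its background mean. The sketch below works this out by hand. The model, weights, and feature values are invented, and `linear_shap` is a hypothetical helper, not a call into the `shap` library.

```python
def linear_shap(weights, intercept, background, x):
    """Per-feature attributions for a linear model under feature independence.

    phi_i = w_i * (x_i - mean_i), where mean_i is the feature's average over
    the background rows. The base value is the prediction at those means.
    """
    n = len(background)
    means = [sum(row[i] for row in background) / n for i in range(len(weights))]
    base = intercept + sum(w * m for w, m in zip(weights, means))
    phi = [w * (xi - m) for w, xi, m in zip(weights, x, means)]
    return base, phi

# Hypothetical model: predicted yield change (bp) from two made-up features
names = ["inflation_language_change", "gpt_hawkishness"]
weights, intercept = [4.0, 3.0], 0.5
background = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]  # feature means: 0.5, 0.5
x = [1.0, 1.0]

base, phi = linear_shap(weights, intercept, background, x)
prediction = intercept + sum(w * xi for w, xi in zip(weights, x))
for name, value in zip(names, phi):
    print(f"{name}: {value:+.2f} bp")
print(f"base {base:.2f} + contributions {sum(phi):.2f} = prediction {prediction:.2f}")
```

The contributions sum to the gap between this prediction and the average prediction, not to the prediction itself. Percentage statements about a single move therefore refer to shares of that gap.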

In [None]:
# Placeholder for SHAP analysis
# Will implement after we have full feature set and trained models

def explain_model_with_shap(model, X_train, X_test, feature_names, model_type='tree'):
    """
    Generate SHAP explanations for model predictions
    
    Args:
        model: Trained model
        X_train: Training features
        X_test: Test features
        feature_names: List of feature names
        model_type: 'tree' or 'linear'
    
    Returns:
        SHAP explainer and values
    """
    print(f"Computing SHAP values for {model_type} model...")
    
    if model_type == 'tree':
        explainer = shap.TreeExplainer(model)
        shap_values = explainer.shap_values(X_test)
    else:  # linear
        explainer = shap.LinearExplainer(model, X_train)
        shap_values = explainer.shap_values(X_test)
    
    # Summary plot
    print("\nGenerating SHAP summary plot...")
    shap.summary_plot(shap_values, X_test, feature_names=feature_names, show=False)
    plt.tight_layout()
    plt.show()
    
    # Feature importance
    feature_importance = pd.DataFrame({
        'feature': feature_names,
        'importance': np.abs(shap_values).mean(axis=0)
    }).sort_values('importance', ascending=False)
    
    print("\nTop 10 most important features (by mean |SHAP value|):")
    print(feature_importance.head(10))
    
    return explainer, shap_values, feature_importance

print("✓ SHAP analysis function defined")
print("  Will run after model training")

## 9. Model Training with Cross-Validation

Train models using proper time-series CV
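
When reading per-fold results, it helps to see how the three metrics are computed and why R² can go negative on a small test fold. A worked sketch with invented yield changes (`fold_metrics` is a hypothetical helper; the notebook itself relies on `sklearn.metrics`):

```python
import math

def fold_metrics(y_true, y_pred):
    """RMSE, MAE and R^2 for one fold, computed from first principles."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot  # undefined if y_true is constant (ss_tot == 0)
    return rmse, mae, r2

# Invented 2-year yield changes (bp) for one test fold
y_true = [4.0, -2.0, 0.0, 2.0]
y_pred = [0.0, 0.0, 0.0, 0.0]  # naive "no change" forecast

rmse, mae, r2 = fold_metrics(y_true, y_pred)
print(f"RMSE={rmse:.3f}  MAE={mae:.3f}  R2={r2:.2f}")
```

Here the "no change" forecast scores an R² of about -0.2, because R² is measured against the test fold's own mean (1 bp in this example), which the forecast misses. With folds of only a handful of statements, negative test R² is common and should be read alongside RMSE and MAE.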

In [None]:
def train_and_evaluate_with_cv(X, y, model, cv_splitter, model_name='Model'):
    """
    Train model with time-series cross-validation
    
    Args:
        X: Features
        y: Target
        model: Sklearn-compatible model
        cv_splitter: TimeSeriesSplit object
        model_name: Name for display
    
    Returns:
        Dictionary with results
    """
    from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
    
    cv_results = []
    
    for fold_idx, (train_idx, test_idx) in enumerate(cv_splitter.split(X)):
        # Split data
        X_train_fold, X_test_fold = X.iloc[train_idx], X.iloc[test_idx]
        y_train_fold, y_test_fold = y.iloc[train_idx], y.iloc[test_idx]
        
        # Train model
        model.fit(X_train_fold, y_train_fold)
        
        # Predict
        y_pred_train = model.predict(X_train_fold)
        y_pred_test = model.predict(X_test_fold)
        
        # Compute metrics
        fold_results = {
            'fold': fold_idx,
            'train_rmse': np.sqrt(mean_squared_error(y_train_fold, y_pred_train)),
            'test_rmse': np.sqrt(mean_squared_error(y_test_fold, y_pred_test)),
            'train_mae': mean_absolute_error(y_train_fold, y_pred_train),
            'test_mae': mean_absolute_error(y_test_fold, y_pred_test),
            'train_r2': r2_score(y_train_fold, y_pred_train),
            'test_r2': r2_score(y_test_fold, y_pred_test),
        }
        
        cv_results.append(fold_results)
    
    # Aggregate results
    cv_df = pd.DataFrame(cv_results)
    
    print(f"\n{'='*60}")
    print(f"{model_name} - Cross-Validation Results")
    print(f"{'='*60}")
    print(cv_df)
    print(f"\nMean Test RMSE: {cv_df['test_rmse'].mean():.3f} ± {cv_df['test_rmse'].std():.3f}")
    print(f"Mean Test MAE:  {cv_df['test_mae'].mean():.3f} ± {cv_df['test_mae'].std():.3f}")
    print(f"Mean Test R²:   {cv_df['test_r2'].mean():.3f} ± {cv_df['test_r2'].std():.3f}")
    
    return {
        'cv_results': cv_df,
        'mean_test_rmse': cv_df['test_rmse'].mean(),
        'std_test_rmse': cv_df['test_rmse'].std(),
        'mean_test_mae': cv_df['test_mae'].mean(),
        'mean_test_r2': cv_df['test_r2'].mean(),
    }

print("✓ CV training function defined")

## 10. Integration Instructions

**To integrate with your existing work:**

```python
# 1. Load your existing features
existing_features = pd.read_csv('data_with_gpt_bart_finbert.csv', parse_dates=['Date'])  # datetime key to match statements['Date']

# 2. Merge with new features from this notebook
enhanced_df = statements.merge(
    existing_features[['Date', 'gpt_score', 'bart_score', 'finbert_pos', 'finbert_neg', ...]],
    on='Date',
    how='left'
)

# 3. Now you have:
#    - All your existing NLP features (GPT, BART, FinBERT, etc.)
#    - New change detection features (change_*)
#    - Effective fed funds rate changes (dff_1d_bp, dff_2d_bp)
#    - Proper train/val/holdout splits

# 4. Train and evaluate!
```
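
Before training, it is worth checking that the merge actually matched every statement. The sketch below uses small invented frames in place of `statements` and the saved feature file (the dates are not actual meeting dates, and `gpt_score` is the only feature column shown). It assumes the key column is named `Date` on both sides, as in the snippet above.

```python
import pandas as pd

# Invented rows standing in for `statements` and the saved feature file
statements_demo = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-01", "2023-02-01", "2023-03-01"]),
    "change_sentences_added": [0, 2, 1],
})
features_demo = pd.DataFrame({
    "Date": ["2023-01-01", "2023-03-01"],  # strings, as read from a CSV without date parsing
    "gpt_score": [0.4, -0.1],
})

# Align dtypes first: a datetime key cannot be merged against a string key
features_demo["Date"] = pd.to_datetime(features_demo["Date"])

merged = statements_demo.merge(
    features_demo,
    on="Date",
    how="left",
    validate="one_to_one",  # raise if either side has duplicate dates
    indicator=True,         # adds a `_merge` column: 'both' or 'left_only'
)
unmatched = merged.loc[merged["_merge"] == "left_only", "Date"]
unmatched_dates = unmatched.dt.strftime("%Y-%m-%d").tolist()
print(f"{len(unmatched_dates)} statement(s) without NLP features: {unmatched_dates}")
```

Unmatched rows end up with NaN features, which a later `fillna(0)` would silently turn into "neutral" scores, so it is safer to inspect them here.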