# Macro Sentiment Trading - Complete Lifecycle

This notebook demonstrates the **complete end-to-end pipeline** from data collection to signal generation:

1. **Data Collection** - Collect GDELT news data using BigQuery
2. **Sentiment Processing** - FinBERT analysis with 126 sentiment features
3. **Market Data** - Yahoo Finance with 443 technical features
4. **Feature Alignment** - 569 total features per asset
5. **Model Training** - XGBoost + Logistic Regression
6. **Backtesting** - Expanding window with transaction costs
7. **Signal Generation** - Production signals with confidence scores
8. **Model Management** - Registry, persistence, and selection


**Runtime:** ~10 minutes end-to-end on GPU

## Setup and Configuration

In [1]:
# Core imports
import sys
import os
from pathlib import Path
import pandas as pd
import numpy as np
import logging
from datetime import datetime, date, timedelta
import warnings
warnings.filterwarnings('ignore')

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

# Set up paths
PROJECT_ROOT = Path.cwd().parent
sys.path.insert(0, str(PROJECT_ROOT))

DATA_DIR = PROJECT_ROOT / "data"
RESULTS_DIR = PROJECT_ROOT / "results" / "lifecycle_notebook"
RESULTS_DIR.mkdir(parents=True, exist_ok=True)

print(f"Project root: {PROJECT_ROOT}")
print(f"Results directory: {RESULTS_DIR}")
print(f"Python version: {sys.version}")
print(f"Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

Project root: c:\Users\danie\Coding Projects\Personal\macro_sentiment_trading
Results directory: c:\Users\danie\Coding Projects\Personal\macro_sentiment_trading\results\lifecycle_notebook
Python version: 3.11.6 (tags/v3.11.6:8b6ee5b, Oct  2 2023, 14:57:12) [MSC v.1935 64 bit (AMD64)]
Timestamp: 2025-10-24 18:40:12


## Step 1: Data Collection (GDELT BigQuery)

Collect about seven months of global news data using the `UnifiedGDELTCollector`.

**Configuration:**
- Date range: Apr 1 - Oct 23, 2025 (just under 7 months)
- Top 100 events per day by article coverage
- EventCode 100-199 (macro-economic events)
- Automatic method selection (BigQuery if available, else free API)
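
The "top 100 events per day by article coverage" filter can be sketched with pandas. This is a minimal illustration on synthetic rows, not the collector's actual query: the `top_n_per_day` helper is hypothetical, and ranking by `num_articles` is an assumption based on the column names in the collected data.

```python
import pandas as pd

# Synthetic stand-in for raw events; the real frame comes from the collector.
events = pd.DataFrame({
    "date": pd.to_datetime(["2025-04-01"] * 4 + ["2025-04-02"] * 3),
    "headline": ["a", "b", "c", "d", "e", "f", "g"],
    "num_articles": [5, 40, 12, 7, 3, 30, 9],
})

def top_n_per_day(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Keep the n events with the highest article count on each calendar day."""
    # Sort so that, within each day, the most-covered events come first.
    ranked = df.sort_values(["date", "num_articles"], ascending=[True, False])
    # groupby(...).head(n) keeps the first n rows of every daily group.
    day = ranked["date"].dt.normalize()
    return ranked.groupby(day).head(n).reset_index(drop=True)

top2 = top_n_per_day(events, n=2)
print(top2)
```

With `n=100`, a filter of this kind yields at most 100 rows per date, which is consistent with the 18,900 events over 189 unique dates reported in the run below.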

In [2]:
from src.data_collector import collect_and_process_news
import os

# Define date range (Apr 1 to Oct 23, just under 7 months)
start_date = "2025-04-01"
end_date = "2025-10-23"

print("=" * 80)
print("STEP 1: DATA COLLECTION")
print("=" * 80)
print(f"Date range: {start_date} to {end_date}")
print(f"Method: Auto-detect (BigQuery preferred)")
print()

# Check for cached data
cache_file = f"data/news/gdelt_bigquery_{start_date}_{end_date}.parquet"
if os.path.exists(cache_file):
    print(f"[CACHE] Found cached data: {cache_file}")
    print(f"  File size: {os.path.getsize(cache_file) / 1024 / 1024:.1f} MB")
    print("  Loading from cache...")
    events_df = pd.read_parquet(cache_file)  # pandas is already imported in the setup cell
    print("  Loaded from cache!")
else:
    print("No cache found. Collecting fresh data...")
    # Use the caching-enabled helper function
    events_df = collect_and_process_news(
        start_date=start_date,
        end_date=end_date,
        force_refresh=False,  # Use cache if available
        use_method=None,      # Auto-detect
        top_n_per_day=100
    )

print(f"\nData collection completed!")
print(f"  Total events: {len(events_df):,}")
print(f"  Date range: {events_df['date'].min()} to {events_df['date'].max()}")
print(f"  Columns: {list(events_df.columns)}")
print(f"  Headlines with content: {events_df['headline'].notna().sum():,}")

# Show sample headlines
print("\nSample headlines:")
for i, headline in enumerate(events_df['headline'].dropna().head(3), 1):
    print(f"{i}. {headline[:100]}...")

# Save raw events to results directory
events_path = RESULTS_DIR / f"events_data_{start_date.replace('-', '')}_{end_date.replace('-', '')}.parquet"
events_df.to_parquet(events_path)
print(f"\nSaved to: {events_path}")

STEP 1: DATA COLLECTION
Date range: 2025-04-01 to 2025-10-23
Method: Auto-detect (BigQuery preferred)

[CACHE] Found cached data: data/news/gdelt_bigquery_2025-04-01_2025-10-23.parquet
  File size: 3.3 MB
  Loading from cache...
  Loaded from cache!

Data collection completed!
  Total events: 18,900
  Date range: 2025-04-01 23:45:00 to 2025-10-23 23:45:00
  Columns: ['date', 'full_date', 'headline', 'url', 'tone', 'doc_id', 'goldstein_mean', 'goldstein_std', 'num_articles', 'num_mentions', 'num_sources', 'actor1_count', 'actor2_count']
  Headlines with content: 18,900

Sample headlines:
1. New Castle County parks can now be put in a new zoning category...
2. Bangladesh Chief Advisors China Tour Cements Dhaka-Beijing Relations...
3. Longview's Izzi Breaux sets Ouachita records at Texas Relays...

Saved to: c:\Users\danie\Coding Projects\Personal\macro_sentiment_trading\results\lifecycle_notebook\events_data_20250401_20251023.parquet


## Step 2: Sentiment Analysis (FinBERT)

Process headlines with FinBERT transformer model to extract sentiment features.

**Model:** ProsusAI/finbert (97% accuracy on Financial PhraseBank)
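
The cell below stores three class probabilities and a `polarity` score per headline. One common way to derive such values from classifier logits is sketched here; it is an assumption, not the confirmed behavior of `SentimentAnalyzer`. The logits are made up, the column order (negative, neutral, positive) is chosen for this sketch only, and defining polarity as `p_positive - p_negative` is an assumed convention.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical logits for three headlines; columns are (negative, neutral, positive)
# in this sketch only. Check the model's own label mapping before reusing this order.
logits = np.array([
    [2.0, 0.5, -1.0],   # leans negative
    [-1.0, 0.0, 3.0],   # leans positive
    [0.0, 0.0, 0.0],    # no preference
])

probs = softmax(logits)
p_negative, p_neutral, p_positive = probs.T

# Assumed convention: polarity in [-1, 1], positive minus negative probability.
polarity = p_positive - p_negative
print(np.round(polarity, 3))
```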

**Output:** 126 sentiment features (the run recorded below produces 36 columns, including `date`)
- 8 base features (mean sentiment, volatility, volume, article impact, Goldstein scale)
- 42 lag features (1, 2, 3-day lags)
- 48 moving average/rolling statistics
- 28 interaction terms
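
The feature families listed above (daily aggregates, lags, rolling statistics) can be illustrated on synthetic headline scores. This is a rough sketch, not the code inside `compute_daily_features`: the column names mirror those printed in the run below, but the exact definitions (for example `log1p` for `log_volume`, or the rolling window settings) are assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic headline-level polarity scores.
scored = pd.DataFrame({
    "date": pd.to_datetime(
        ["2025-04-01", "2025-04-01", "2025-04-02", "2025-04-02", "2025-04-03"]
    ),
    "polarity": [0.2, 0.6, -0.4, 0.0, 0.5],
})

# Base features: one row per day.
daily = (
    scored.groupby("date")["polarity"]
    .agg(mean_sentiment="mean", sentiment_std="std", news_volume="count")
    .reset_index()
)
daily["log_volume"] = np.log1p(daily["news_volume"])  # assumed definition

# Lag features: shift() only looks backwards, so day t sees values from t-1, t-2, t-3.
for lag in (1, 2, 3):
    daily[f"mean_sentiment_lag_{lag}"] = daily["mean_sentiment"].shift(lag)

# Rolling statistic: trailing 5-row mean (min_periods=1 keeps early rows non-null).
daily["mean_sentiment_ma_5d"] = daily["mean_sentiment"].rolling(5, min_periods=1).mean()

print(daily)
```

Lags and trailing windows use only past rows, which matters later when these columns are joined to next-day return targets.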

In [3]:
from src.sentiment_analyzer import SentimentAnalyzer
import torch

print("=" * 80)
print("STEP 2: SENTIMENT ANALYSIS")
print("=" * 80)

# Check GPU availability
if torch.cuda.is_available():
    device = "cuda"
    gpu_name = torch.cuda.get_device_name(0)
    gpu_memory = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU: {gpu_name} ({gpu_memory:.1f} GB)")
    batch_size = 128
else:
    device = "cpu"
    print("GPU: Not available (using CPU - will be slower)")
    batch_size = 32

print(f"Batch size: {batch_size}")
print()

# Initialize sentiment analyzer
sentiment_analyzer = SentimentAnalyzer(device=device)

# Process headlines with FinBERT (rows with a null headline are skipped)
headlines = events_df['headline'].dropna().tolist()
print(f"Processing {len(headlines)} headlines with FinBERT...")

sentiment_scores = sentiment_analyzer.compute_sentiment(
    headlines=headlines,
    batch_size=batch_size
)

# Merge sentiment scores back to events - PRESERVE ALL COLUMNS INCLUDING DATE
events_with_sentiment = events_df[events_df['headline'].notna()].copy()
events_with_sentiment = events_with_sentiment.reset_index(drop=True)
events_with_sentiment['p_negative'] = sentiment_scores['p_negative'].values
events_with_sentiment['p_neutral'] = sentiment_scores['p_neutral'].values
events_with_sentiment['p_positive'] = sentiment_scores['p_positive'].values
events_with_sentiment['polarity'] = sentiment_scores['polarity'].values

print(f"\nSentiment analysis completed!")
print(f"  Headlines processed: {len(sentiment_scores):,}")
print(f"  Mean polarity: {sentiment_scores['polarity'].mean():.3f}")
print(f"  Polarity std: {sentiment_scores['polarity'].std():.3f}")
print(f"  Date column present: {'date' in events_with_sentiment.columns}")
print(f"  Unique dates: {events_with_sentiment['date'].nunique() if 'date' in events_with_sentiment.columns else 'N/A'}")

# Compute daily sentiment features (126 features)
print("\nComputing daily sentiment features...")
daily_features = sentiment_analyzer.compute_daily_features(events_with_sentiment)

print(f"\nDaily features computed!")
print(f"  Days with data: {len(daily_features)}")
print(f"  Features: {len(daily_features.columns)}")
print(f"  Feature names: {list(daily_features.columns[:10])}...")

# Save daily features
daily_features_path = RESULTS_DIR / f"daily_features_{start_date.replace('-', '')}_{end_date.replace('-', '')}.parquet"
daily_features.to_parquet(daily_features_path)
print(f"\nSaved to: {daily_features_path}")

2025-10-24 18:40:27,271 - src.sentiment_analyzer - INFO - Using device: cuda


STEP 2: SENTIMENT ANALYSIS
GPU: NVIDIA GeForce RTX 4060 Laptop GPU (8.0 GB)
Batch size: 128

Processing 18900 headlines with FinBERT...

Sentiment analysis completed!
  Headlines processed: 18,900
  Mean polarity: 0.458
  Polarity std: 0.431
  Date column present: True
  Unique dates: 189

Computing daily sentiment features...

Daily features computed!
  Days with data: 189
  Features: 36
  Feature names: ['date', 'mean_sentiment', 'sentiment_std', 'news_volume', 'log_volume', 'article_impact', 'goldstein_mean', 'goldstein_std', 'mean_sentiment_lag_1', 'mean_sentiment_lag_2']...

Saved to: c:\Users\danie\Coding Projects\Personal\macro_sentiment_trading\results\lifecycle_notebook\daily_features_20250401_20251023.parquet


## Step 3: Market Data Processing

Collect market data from Yahoo Finance and compute technical indicators.

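**Assets:** EURUSD, USDJPY, TNOTE by default (expandable to 35+ assets). The run recorded below downloads 12 assets.

**Features:** 443 market/technical features per asset (the run recorded below reports 388 columns per asset)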
- 5 lagged returns (1,2,3,5,10 days)
- 158 TA-Lib indicators (RSI, SMA, Bollinger Bands, MACD, ATR, etc.)
- 280+ derivative features (lags, MAs, rolling stds, interactions)
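
To make the feature families above concrete, here is a small pandas-only sketch on synthetic prices. It does not call TA-Lib and is not the `MarketProcessor` implementation: `sma_10` and `rsi_14` are illustrative names, `vol20` is assumed to be a 20-day rolling standard deviation of returns, and the RSI here uses simple moving averages (TA-Lib's RSI uses Wilder-style smoothing, so its values will differ).

```python
import numpy as np
import pandas as pd

# Synthetic close prices on business days (geometric random walk).
rng = np.random.default_rng(0)
close = pd.Series(
    100 * np.exp(np.cumsum(rng.normal(0, 0.01, 60))),
    index=pd.bdate_range("2025-04-01", periods=60),
    name="close",
)

feats = pd.DataFrame({"returns": close.pct_change()})

# Lagged returns at the horizons listed above.
for lag in (1, 2, 3, 5, 10):
    feats[f"return_lag_{lag}"] = feats["returns"].shift(lag)

# Rolling volatility and a simple moving average of price.
feats["vol20"] = feats["returns"].rolling(20).std()
feats["sma_10"] = close.rolling(10).mean()

# RSI variant based on simple averages of gains and losses over 14 days.
delta = close.diff()
gain = delta.clip(lower=0).rolling(14).mean()
loss = (-delta.clip(upper=0)).rolling(14).mean()
feats["rsi_14"] = 100 - 100 / (1 + gain / loss)

print(feats.tail(3))
```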

In [4]:
from src.market_processor import MarketProcessor

print("=" * 80)
print("STEP 3: MARKET DATA PROCESSING")
print("=" * 80)

# Initialize market processor
market_processor = MarketProcessor()

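# Assets of interest. NOTE: this list is only printed; it is not passed to
# fetch_market_data() below, which returned 12 assets in the recorded run.
assets = ["EURUSD", "USDJPY", "TNOTE"]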
print(f"Assets: {', '.join(assets)}")
print(f"Date range: {start_date} to {end_date}")
print()

# Fetch market data for all assets
print("Fetching market data from Yahoo Finance...")
market_data = market_processor.fetch_market_data(
    start_date=start_date,
    end_date=end_date
)

print(f"\n✓ Market data downloaded!")
for asset_name, asset_data in market_data.items():
    print(f"  {asset_name}: {len(asset_data)} days")

# Compute technical features for each asset
print("\nComputing technical indicators (TA-Lib)...")
for asset_name in list(market_data.keys()):
    print(f"  Processing {asset_name}...")
    market_data[asset_name] = market_processor.compute_market_features(
        market_data[asset_name]
    )
    print(f"    Features: {len(market_data[asset_name].columns)}")

print(f"\n✓ Technical features computed!")

STEP 3: MARKET DATA PROCESSING
Assets: EURUSD, USDJPY, TNOTE
Date range: 2025-04-01 to 2025-10-23

Fetching market data from Yahoo Finance...


[*********************100%***********************]  1 of 1 completed
2025-10-24 18:43:21,299 - src.market_processor - INFO - Successfully downloaded and processed data for EURUSD
[*********************100%***********************]  1 of 1 completed
2025-10-24 18:43:21,463 - src.market_processor - INFO - Successfully downloaded and processed data for USDJPY
[*********************100%***********************]  1 of 1 completed
2025-10-24 18:43:21,662 - src.market_processor - INFO - Successfully downloaded and processed data for GBPUSD
[*********************100%***********************]  1 of 1 completed
2025-10-24 18:43:21,818 - src.market_processor - INFO - Successfully downloaded and processed data for AUDUSD
[*********************100%***********************]  1 of 1 completed
2025-10-24 18:43:21,978 - src.market_processor - INFO - Successfully downloaded and processed data for USDCHF
[*********************100%***********************]  1 of 1 completed
2025-10-24 18:43:22,155 - src.market


✓ Market data downloaded!
  EURUSD: 145 days
  USDJPY: 145 days
  GBPUSD: 145 days
  AUDUSD: 145 days
  USDCHF: 145 days
  USDCAD: 145 days
  NZDUSD: 145 days
  BTCUSD: 205 days
  ETHUSD: 205 days
  GOLD: 143 days
  SPY: 142 days
  TNOTE: 143 days

Computing technical indicators (TA-Lib)...
  Processing EURUSD...


2025-10-24 18:43:24,155 - src.market_processor - INFO - Total technical features created: 380
2025-10-24 18:43:24,167 - src.market_processor - INFO - Computing 158 TA-Lib technical indicators...
2025-10-24 18:43:24,214 - src.market_processor - INFO - Added 51 base technical indicators
2025-10-24 18:43:24,216 - src.market_processor - INFO - Creating derivative features for 51 indicators...


    Features: 388
  Processing USDJPY...


2025-10-24 18:43:24,690 - src.market_processor - INFO - Total technical features created: 380
2025-10-24 18:43:24,704 - src.market_processor - INFO - Computing 158 TA-Lib technical indicators...
2025-10-24 18:43:24,757 - src.market_processor - INFO - Added 51 base technical indicators
2025-10-24 18:43:24,759 - src.market_processor - INFO - Creating derivative features for 51 indicators...


    Features: 388
  Processing GBPUSD...


2025-10-24 18:43:25,319 - src.market_processor - INFO - Total technical features created: 380
2025-10-24 18:43:25,331 - src.market_processor - INFO - Computing 158 TA-Lib technical indicators...
2025-10-24 18:43:25,383 - src.market_processor - INFO - Added 51 base technical indicators
2025-10-24 18:43:25,386 - src.market_processor - INFO - Creating derivative features for 51 indicators...


    Features: 388
  Processing AUDUSD...


2025-10-24 18:43:25,891 - src.market_processor - INFO - Total technical features created: 380
2025-10-24 18:43:25,907 - src.market_processor - INFO - Computing 158 TA-Lib technical indicators...
2025-10-24 18:43:25,966 - src.market_processor - INFO - Added 51 base technical indicators
2025-10-24 18:43:25,970 - src.market_processor - INFO - Creating derivative features for 51 indicators...


    Features: 388
  Processing USDCHF...


2025-10-24 18:43:26,517 - src.market_processor - INFO - Total technical features created: 380
2025-10-24 18:43:26,531 - src.market_processor - INFO - Computing 158 TA-Lib technical indicators...
2025-10-24 18:43:26,601 - src.market_processor - INFO - Added 51 base technical indicators
2025-10-24 18:43:26,603 - src.market_processor - INFO - Creating derivative features for 51 indicators...


    Features: 388
  Processing USDCAD...


2025-10-24 18:43:27,187 - src.market_processor - INFO - Total technical features created: 380
2025-10-24 18:43:27,202 - src.market_processor - INFO - Computing 158 TA-Lib technical indicators...
2025-10-24 18:43:27,256 - src.market_processor - INFO - Added 51 base technical indicators
2025-10-24 18:43:27,260 - src.market_processor - INFO - Creating derivative features for 51 indicators...


    Features: 388
  Processing NZDUSD...


2025-10-24 18:43:27,806 - src.market_processor - INFO - Total technical features created: 380
2025-10-24 18:43:27,822 - src.market_processor - INFO - Computing 158 TA-Lib technical indicators...
2025-10-24 18:43:27,876 - src.market_processor - INFO - Added 51 base technical indicators
2025-10-24 18:43:27,878 - src.market_processor - INFO - Creating derivative features for 51 indicators...


    Features: 388
  Processing BTCUSD...


2025-10-24 18:43:28,390 - src.market_processor - INFO - Total technical features created: 380
2025-10-24 18:43:28,404 - src.market_processor - INFO - Computing 158 TA-Lib technical indicators...
2025-10-24 18:43:28,456 - src.market_processor - INFO - Added 51 base technical indicators
2025-10-24 18:43:28,460 - src.market_processor - INFO - Creating derivative features for 51 indicators...


    Features: 388
  Processing ETHUSD...


2025-10-24 18:43:28,994 - src.market_processor - INFO - Total technical features created: 380
2025-10-24 18:43:29,009 - src.market_processor - INFO - Computing 158 TA-Lib technical indicators...
2025-10-24 18:43:29,059 - src.market_processor - INFO - Added 51 base technical indicators
2025-10-24 18:43:29,062 - src.market_processor - INFO - Creating derivative features for 51 indicators...


    Features: 388
  Processing GOLD...


2025-10-24 18:43:29,613 - src.market_processor - INFO - Total technical features created: 380
2025-10-24 18:43:29,626 - src.market_processor - INFO - Computing 158 TA-Lib technical indicators...
2025-10-24 18:43:29,680 - src.market_processor - INFO - Added 51 base technical indicators
2025-10-24 18:43:29,682 - src.market_processor - INFO - Creating derivative features for 51 indicators...


    Features: 388
  Processing SPY...


2025-10-24 18:43:30,274 - src.market_processor - INFO - Total technical features created: 380
2025-10-24 18:43:30,288 - src.market_processor - INFO - Computing 158 TA-Lib technical indicators...
2025-10-24 18:43:30,342 - src.market_processor - INFO - Added 51 base technical indicators
2025-10-24 18:43:30,344 - src.market_processor - INFO - Creating derivative features for 51 indicators...


    Features: 388
  Processing TNOTE...


2025-10-24 18:43:31,136 - src.market_processor - INFO - Total technical features created: 380


    Features: 388

✓ Technical features computed!


## Step 4: Feature Alignment

Align sentiment features (daily) with market data (daily OHLCV) for each asset.

**Result:** 569 total features per asset (the run recorded below reports 682 columns and 679 feature columns per asset)
- 126 sentiment features
- 443 market/technical features
- Forward-fill sentiment on market days
- Drop rows with NaN targets
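
The alignment rules above (keep market days, carry the latest sentiment forward, drop rows without a target) can be sketched with `pandas.merge_asof`. This is one possible approach on synthetic data and is not necessarily what `align_features` does internally; all values below are made up.

```python
import numpy as np
import pandas as pd

# Sentiment is available on most calendar days, including the weekend.
sentiment = pd.DataFrame({
    "date": pd.to_datetime(["2025-04-04", "2025-04-05", "2025-04-06", "2025-04-08"]),
    "mean_sentiment": [0.1, 0.2, 0.3, 0.5],
})

# Market data exists only on trading days (Fri, Mon, Tue).
market = pd.DataFrame({
    "date": pd.to_datetime(["2025-04-04", "2025-04-07", "2025-04-08"]),
    "returns": [0.01, -0.02, 0.005],
    "target": [0.0, 1.0, np.nan],  # the last row has no label yet
})

# For each market day, take the most recent sentiment row on or before that day.
aligned = pd.merge_asof(
    market.sort_values("date"),
    sentiment.sort_values("date"),
    on="date",
    direction="backward",
)

# Rows without a target cannot be used for training.
aligned = aligned.dropna(subset=["target"]).reset_index(drop=True)
print(aligned)
```

Here Monday 2025-04-07 has no sentiment row of its own, so it inherits Sunday's value. The row count follows the market frame, which matches the merge log below (145 market rows in, 145 rows out).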

In [5]:
print("=" * 80)
print("STEP 4: FEATURE ALIGNMENT")
print("=" * 80)

# Align sentiment and market data
print("Aligning sentiment and market features...")
aligned_data = market_processor.align_features(
    market_data=market_data,
    sentiment_features=daily_features  # keyword is sentiment_features, not sentiment_data
)

print("✓ Feature alignment completed!")
print(f"  Assets aligned: {len(aligned_data)}")
print()

for asset_name, asset_data in aligned_data.items():
    print(f"{asset_name}:")
    print(f"  Shape: {asset_data.shape} (rows × columns)")
    print(f"  Date range: {asset_data.index.min()} to {asset_data.index.max()}")
    print(f"  Features: {len([c for c in asset_data.columns if c not in ['target', 'returns', 'date']])}")
    print(f"  Target distribution:")
    if 'target' in asset_data.columns:
        target_counts = asset_data['target'].value_counts()
        for val, count in target_counts.items():
            print(f"    {val}: {count} ({count/len(asset_data)*100:.1f}%)")
    print()

# Save aligned data
aligned_dir = RESULTS_DIR / f"{start_date.replace('-', '')}_{end_date.replace('-', '')}"  
aligned_dir.mkdir(exist_ok=True)

for asset_name, asset_data in aligned_data.items():
    aligned_path = aligned_dir / f"aligned_data_{asset_name}.parquet"
    asset_data.to_parquet(aligned_path)
    print(f"✓ Saved {asset_name} to: {aligned_path}")

2025-10-24 18:43:31,176 - src.market_processor - INFO - Merging EURUSD: asset_data has 145 rows, sentiment has 189 rows
2025-10-24 18:43:31,182 - src.market_processor - INFO - After merge EURUSD: 145 rows


STEP 4: FEATURE ALIGNMENT
Aligning sentiment and market features...


2025-10-24 18:43:31,389 - src.market_processor - INFO - DEBUG - EURUSD before filtering:
2025-10-24 18:43:31,392 - src.market_processor - INFO -   Shape: (145, 682)
2025-10-24 18:43:31,394 - src.market_processor - INFO -   Columns: ['date', 'Close_EURUSD=X', 'High_EURUSD=X', 'Low_EURUSD=X', 'Open_EURUSD=X', 'Volume_EURUSD=X', 'returns', 'target', 'vol20', 'return_lag_1']
2025-10-24 18:43:31,395 - src.market_processor - INFO -   Sample data:
2025-10-24 18:43:31,398 - src.market_processor - INFO -     Date range: 2025-04-01 to 2025-10-22
2025-10-24 18:43:31,400 - src.market_processor - INFO -     Price columns: ['Close_EURUSD=X']
2025-10-24 18:43:31,402 - src.market_processor - INFO -     Sentiment columns: ['mean_sentiment', 'sentiment_std', 'mean_sentiment_lag_1', 'mean_sentiment_lag_2', 'mean_sentiment_lag_3', 'sentiment_std_lag_1', 'sentiment_std_lag_2', 'sentiment_std_lag_3', 'mean_sentiment_ma_5d', 'mean_sentiment_ma_20d', 'sentiment_acceleration', 'mean_sentiment_std_5d', 'mean_se

✓ Feature alignment completed!
  Assets aligned: 12

EURUSD:
  Shape: (133, 682) (rows × columns)
  Date range: 0 to 144
  Features: 679
  Target distribution:
    0: 69 (51.9%)
    1: 64 (48.1%)

USDJPY:
  Shape: (133, 682) (rows × columns)
  Date range: 0 to 144
  Features: 679
  Target distribution:
    0: 73 (54.9%)
    1: 60 (45.1%)

GBPUSD:
  Shape: (133, 682) (rows × columns)
  Date range: 0 to 144
  Features: 679
  Target distribution:
    1: 69 (51.9%)
    0: 64 (48.1%)

AUDUSD:
  Shape: (133, 682) (rows × columns)
  Date range: 0 to 144
  Features: 679
  Target distribution:
    1: 68 (51.1%)
    0: 65 (48.9%)

USDCHF:
  Shape: (133, 682) (rows × columns)
  Date range: 0 to 144
  Features: 679
  Target distribution:
    0: 68 (51.1%)
    1: 65 (48.9%)

USDCAD:
  Shape: (133, 682) (rows × columns)
  Date range: 0 to 144
  Features: 679
  Target distribution:
    0: 67 (50.4%)
    1: 66 (49.6%)

NZDUSD:
  Shape: (133, 682) (rows × columns)
  Date range: 0 to 144
  Features: 67

## Step 5: Model Training

Train machine learning models on aligned data.

**Models:**
- XGBoost Classifier (100 estimators, no scaling)
- Logistic Regression (L2 penalty, StandardScaler, balanced weights)

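**Target:** 3-class classification
- SELL (-1): returns < -0.5σ
- HOLD (0): -0.5σ ≤ returns ≤ +0.5σ
- BUY (+1): returns > +0.5σ

Note that the aligned data printed in Step 4 contains only the labels 0 and 1, so the run recorded here trains on a binary target.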
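
The three-class thresholding rule can be written as a short labeling function. This is a minimal sketch, assuming σ is the sample standard deviation of the return series being labeled; `label_returns` and its parameter `k` are hypothetical names and do not come from `ModelTrainer`.

```python
import numpy as np
import pandas as pd

def label_returns(returns: pd.Series, k: float = 0.5) -> pd.Series:
    """Map returns to -1 (SELL), 0 (HOLD), +1 (BUY) using +/- k * sigma bands."""
    sigma = returns.std()  # assumption: full-sample standard deviation
    labels = np.select(
        [returns < -k * sigma, returns > k * sigma],
        [-1, 1],
        default=0,  # everything inside the band is HOLD
    )
    return pd.Series(labels, index=returns.index, name="target")

# Synthetic daily returns.
returns = pd.Series([-0.02, -0.001, 0.0, 0.002, 0.03])
labels = label_returns(returns)
print(labels.tolist())
```

Estimating σ on the full sample is the simplest choice; in a backtest, a trailing estimate would avoid using future information when setting the bands.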

**Training:** All available data for each asset

In [6]:
from src.model_trainer import ModelTrainer
from src.model_persistence import ModelPersistence

print("=" * 80)
print("STEP 5: MODEL TRAINING")
print("=" * 80)

# Initialize trainer and persistence
trainer = ModelTrainer()
persistence = ModelPersistence(models_dir=str(RESULTS_DIR / "models"))

# Train models for each asset
trained_models = {}
model_ids = {}

for asset_name, asset_data in aligned_data.items():
    print(f"\nTraining models for {asset_name}...")
    print(f"  Training samples: {len(asset_data)}")
    
    # Train both models (XGBoost and Logistic Regression)
    models, scalers, feature_cols = trainer.train_models(asset_data)
    
    trained_models[asset_name] = {
        'models': models,
        'scalers': scalers,
        'feature_cols': feature_cols
    }
    
    print(f"  ✓ Trained {len(models)} models")
    print(f"  Features used: {len(feature_cols)}")
    
    # Save models to registry
    model_ids[asset_name] = {}
    for model_name, model in models.items():
        # Get scaler (only for logistic regression)
        scaler = scalers.get(model_name, None)
        
        # Compute in-sample accuracy (same rows the model was fit on; not an out-of-sample estimate)
        X, y, _, _ = trainer.prepare_features(asset_data, fit_scaler=False)
        if scaler is not None:
            X = scaler.transform(X)
        
        predictions = model.predict(X)
        accuracy = (predictions == y).mean()
        
        # Save model with metadata
        model_id = persistence.save_model(
            model=model,
            scaler=scaler,
            asset=asset_name,
            model_type=model_name,
            feature_names=feature_cols,
            performance_metrics={
                'accuracy': float(accuracy),
                'training_samples': int(len(asset_data))
            },
            training_params={
                'start_date': start_date,
                'end_date': end_date,
                'feature_count': len(feature_cols)
            }
        )
        
        model_ids[asset_name][model_name] = model_id
        print(f"    {model_name}: {model_id} (accuracy: {accuracy:.3f})")

print(f"\n✓ Model training completed!")
print(f"  Total models trained: {sum(len(m['models']) for m in trained_models.values())}")
print(f"  Registry: {persistence.registry.registry_file}")

STEP 5: MODEL TRAINING

Training models for EURUSD...
  Training samples: 133


2025-10-24 18:43:47,679 - src.model_persistence - INFO - Successfully saved model registry to results\models\registry.json
2025-10-24 18:43:47,681 - src.model_persistence - INFO - Registered model EURUSD_logistic_20251024_184347_df8733f8 in registry
2025-10-24 18:43:47,683 - src.model_persistence - INFO - Saved model EURUSD_logistic_20251024_184347_df8733f8 for EURUSD (logistic)


  ✓ Trained 2 models
  Features used: 674
    logistic: EURUSD_logistic_20251024_184347_df8733f8 (accuracy: 0.985)


2025-10-24 18:43:47,828 - src.model_persistence - INFO - Successfully saved model registry to results\models\registry.json
2025-10-24 18:43:47,830 - src.model_persistence - INFO - Registered model EURUSD_xgboost_20251024_184347_df8733f8 in registry
2025-10-24 18:43:47,831 - src.model_persistence - INFO - Saved model EURUSD_xgboost_20251024_184347_df8733f8 for EURUSD (xgboost)


    xgboost: EURUSD_xgboost_20251024_184347_df8733f8 (accuracy: 0.519)

Training models for USDJPY...
  Training samples: 133


2025-10-24 18:43:49,324 - src.model_persistence - INFO - Successfully saved model registry to results\models\registry.json
2025-10-24 18:43:49,325 - src.model_persistence - INFO - Registered model USDJPY_logistic_20251024_184349_df8733f8 in registry
2025-10-24 18:43:49,327 - src.model_persistence - INFO - Saved model USDJPY_logistic_20251024_184349_df8733f8 for USDJPY (logistic)


  ✓ Trained 2 models
  Features used: 674
    logistic: USDJPY_logistic_20251024_184349_df8733f8 (accuracy: 1.000)


2025-10-24 18:43:49,498 - src.model_persistence - INFO - Successfully saved model registry to results\models\registry.json
2025-10-24 18:43:49,500 - src.model_persistence - INFO - Registered model USDJPY_xgboost_20251024_184349_df8733f8 in registry
2025-10-24 18:43:49,502 - src.model_persistence - INFO - Saved model USDJPY_xgboost_20251024_184349_df8733f8 for USDJPY (xgboost)


    xgboost: USDJPY_xgboost_20251024_184349_df8733f8 (accuracy: 0.654)

Training models for GBPUSD...
  Training samples: 133


2025-10-24 18:43:50,959 - src.model_persistence - INFO - Successfully saved model registry to results\models\registry.json
2025-10-24 18:43:50,960 - src.model_persistence - INFO - Registered model GBPUSD_logistic_20251024_184350_df8733f8 in registry
2025-10-24 18:43:50,961 - src.model_persistence - INFO - Saved model GBPUSD_logistic_20251024_184350_df8733f8 for GBPUSD (logistic)


  ✓ Trained 2 models
  Features used: 674
    logistic: GBPUSD_logistic_20251024_184350_df8733f8 (accuracy: 1.000)


2025-10-24 18:43:51,081 - src.model_persistence - INFO - Successfully saved model registry to results\models\registry.json
2025-10-24 18:43:51,084 - src.model_persistence - INFO - Registered model GBPUSD_xgboost_20251024_184350_df8733f8 in registry
2025-10-24 18:43:51,087 - src.model_persistence - INFO - Saved model GBPUSD_xgboost_20251024_184350_df8733f8 for GBPUSD (xgboost)


    xgboost: GBPUSD_xgboost_20251024_184350_df8733f8 (accuracy: 0.564)

Training models for AUDUSD...
  Training samples: 133


2025-10-24 18:43:52,533 - src.model_persistence - INFO - Successfully saved model registry to results\models\registry.json
2025-10-24 18:43:52,536 - src.model_persistence - INFO - Registered model AUDUSD_logistic_20251024_184352_df8733f8 in registry
2025-10-24 18:43:52,537 - src.model_persistence - INFO - Saved model AUDUSD_logistic_20251024_184352_df8733f8 for AUDUSD (logistic)


  ✓ Trained 2 models
  Features used: 674
    logistic: AUDUSD_logistic_20251024_184352_df8733f8 (accuracy: 0.992)


2025-10-24 18:43:52,669 - src.model_persistence - INFO - Successfully saved model registry to results\models\registry.json
2025-10-24 18:43:52,672 - src.model_persistence - INFO - Registered model AUDUSD_xgboost_20251024_184352_df8733f8 in registry
2025-10-24 18:43:52,674 - src.model_persistence - INFO - Saved model AUDUSD_xgboost_20251024_184352_df8733f8 for AUDUSD (xgboost)


    xgboost: AUDUSD_xgboost_20251024_184352_df8733f8 (accuracy: 0.489)

Training models for USDCHF...
  Training samples: 133


2025-10-24 18:43:53,795 - src.model_persistence - INFO - Successfully saved model registry to results\models\registry.json
2025-10-24 18:43:53,797 - src.model_persistence - INFO - Registered model USDCHF_logistic_20251024_184353_df8733f8 in registry
2025-10-24 18:43:53,798 - src.model_persistence - INFO - Saved model USDCHF_logistic_20251024_184353_df8733f8 for USDCHF (logistic)


  ✓ Trained 2 models
  Features used: 674
    logistic: USDCHF_logistic_20251024_184353_df8733f8 (accuracy: 1.000)


2025-10-24 18:43:53,919 - src.model_persistence - INFO - Successfully saved model registry to results\models\registry.json
2025-10-24 18:43:53,921 - src.model_persistence - INFO - Registered model USDCHF_xgboost_20251024_184353_df8733f8 in registry
2025-10-24 18:43:53,922 - src.model_persistence - INFO - Saved model USDCHF_xgboost_20251024_184353_df8733f8 for USDCHF (xgboost)


    xgboost: USDCHF_xgboost_20251024_184353_df8733f8 (accuracy: 0.496)

Training models for USDCAD...
  Training samples: 133


2025-10-24 18:43:55,337 - src.model_persistence - INFO - Successfully saved model registry to results\models\registry.json
2025-10-24 18:43:55,339 - src.model_persistence - INFO - Registered model USDCAD_logistic_20251024_184355_df8733f8 in registry
2025-10-24 18:43:55,341 - src.model_persistence - INFO - Saved model USDCAD_logistic_20251024_184355_df8733f8 for USDCAD (logistic)


  ✓ Trained 2 models
  Features used: 674
    logistic: USDCAD_logistic_20251024_184355_df8733f8 (accuracy: 1.000)


2025-10-24 18:43:55,483 - src.model_persistence - INFO - Successfully saved model registry to results\models\registry.json
2025-10-24 18:43:55,486 - src.model_persistence - INFO - Registered model USDCAD_xgboost_20251024_184355_df8733f8 in registry
2025-10-24 18:43:55,488 - src.model_persistence - INFO - Saved model USDCAD_xgboost_20251024_184355_df8733f8 for USDCAD (xgboost)


    xgboost: USDCAD_xgboost_20251024_184355_df8733f8 (accuracy: 0.594)

Training models for NZDUSD...
  Training samples: 133


2025-10-24 18:43:56,618 - src.model_persistence - INFO - Successfully saved model registry to results\models\registry.json
2025-10-24 18:43:56,620 - src.model_persistence - INFO - Registered model NZDUSD_logistic_20251024_184356_df8733f8 in registry
2025-10-24 18:43:56,622 - src.model_persistence - INFO - Saved model NZDUSD_logistic_20251024_184356_df8733f8 for NZDUSD (logistic)


  ✓ Trained 2 models
  Features used: 674
    logistic: NZDUSD_logistic_20251024_184356_df8733f8 (accuracy: 1.000)


2025-10-24 18:43:56,760 - src.model_persistence - INFO - Successfully saved model registry to results\models\registry.json
2025-10-24 18:43:56,763 - src.model_persistence - INFO - Registered model NZDUSD_xgboost_20251024_184356_df8733f8 in registry
2025-10-24 18:43:56,763 - src.model_persistence - INFO - Saved model NZDUSD_xgboost_20251024_184356_df8733f8 for NZDUSD (xgboost)


    xgboost: NZDUSD_xgboost_20251024_184356_df8733f8 (accuracy: 0.496)

Training models for BTCUSD...
  Training samples: 188


2025-10-24 18:43:58,834 - src.model_persistence - INFO - Successfully saved model registry to results\models\registry.json
2025-10-24 18:43:58,836 - src.model_persistence - INFO - Registered model BTCUSD_logistic_20251024_184358_df8733f8 in registry
2025-10-24 18:43:58,837 - src.model_persistence - INFO - Saved model BTCUSD_logistic_20251024_184358_df8733f8 for BTCUSD (logistic)


  ✓ Trained 2 models
  Features used: 674
    logistic: BTCUSD_logistic_20251024_184358_df8733f8 (accuracy: 1.000)


2025-10-24 18:43:58,968 - src.model_persistence - INFO - Successfully saved model registry to results\models\registry.json
2025-10-24 18:43:58,970 - src.model_persistence - INFO - Registered model BTCUSD_xgboost_20251024_184358_df8733f8 in registry
2025-10-24 18:43:58,972 - src.model_persistence - INFO - Saved model BTCUSD_xgboost_20251024_184358_df8733f8 for BTCUSD (xgboost)


    xgboost: BTCUSD_xgboost_20251024_184358_df8733f8 (accuracy: 0.580)

Training models for ETHUSD...
  Training samples: 188


2025-10-24 18:44:01,634 - src.model_persistence - INFO - Successfully saved model registry to results\models\registry.json
2025-10-24 18:44:01,636 - src.model_persistence - INFO - Registered model ETHUSD_logistic_20251024_184401_df8733f8 in registry
2025-10-24 18:44:01,637 - src.model_persistence - INFO - Saved model ETHUSD_logistic_20251024_184401_df8733f8 for ETHUSD (logistic)


  ✓ Trained 2 models
  Features used: 674
    logistic: ETHUSD_logistic_20251024_184401_df8733f8 (accuracy: 0.989)


2025-10-24 18:44:01,764 - src.model_persistence - INFO - Successfully saved model registry to results\models\registry.json
2025-10-24 18:44:01,766 - src.model_persistence - INFO - Registered model ETHUSD_xgboost_20251024_184401_df8733f8 in registry
2025-10-24 18:44:01,768 - src.model_persistence - INFO - Saved model ETHUSD_xgboost_20251024_184401_df8733f8 for ETHUSD (xgboost)


    xgboost: ETHUSD_xgboost_20251024_184401_df8733f8 (accuracy: 0.511)

Training models for GOLD...
  Training samples: 132


2025-10-24 18:44:04,093 - src.model_persistence - INFO - Successfully saved model registry to results\models\registry.json
2025-10-24 18:44:04,095 - src.model_persistence - INFO - Registered model GOLD_logistic_20251024_184404_df8733f8 in registry
2025-10-24 18:44:04,096 - src.model_persistence - INFO - Saved model GOLD_logistic_20251024_184404_df8733f8 for GOLD (logistic)


  ✓ Trained 2 models
  Features used: 674
    logistic: GOLD_logistic_20251024_184404_df8733f8 (accuracy: 0.992)


2025-10-24 18:44:04,242 - src.model_persistence - INFO - Successfully saved model registry to results\models\registry.json
2025-10-24 18:44:04,244 - src.model_persistence - INFO - Registered model GOLD_xgboost_20251024_184404_df8733f8 in registry
2025-10-24 18:44:04,246 - src.model_persistence - INFO - Saved model GOLD_xgboost_20251024_184404_df8733f8 for GOLD (xgboost)


    xgboost: GOLD_xgboost_20251024_184404_df8733f8 (accuracy: 0.530)

Training models for SPY...
  Training samples: 131


2025-10-24 18:44:05,885 - src.model_persistence - INFO - Successfully saved model registry to results\models\registry.json
2025-10-24 18:44:05,886 - src.model_persistence - INFO - Registered model SPY_logistic_20251024_184405_df8733f8 in registry
2025-10-24 18:44:05,887 - src.model_persistence - INFO - Saved model SPY_logistic_20251024_184405_df8733f8 for SPY (logistic)


  ✓ Trained 2 models
  Features used: 674
    logistic: SPY_logistic_20251024_184405_df8733f8 (accuracy: 0.992)


2025-10-24 18:44:06,034 - src.model_persistence - INFO - Successfully saved model registry to results\models\registry.json
2025-10-24 18:44:06,037 - src.model_persistence - INFO - Registered model SPY_xgboost_20251024_184405_df8733f8 in registry
2025-10-24 18:44:06,039 - src.model_persistence - INFO - Saved model SPY_xgboost_20251024_184405_df8733f8 for SPY (xgboost)


    xgboost: SPY_xgboost_20251024_184405_df8733f8 (accuracy: 0.580)

Training models for TNOTE...
  Training samples: 132


2025-10-24 18:44:07,565 - src.model_persistence - INFO - Successfully saved model registry to results\models\registry.json
2025-10-24 18:44:07,567 - src.model_persistence - INFO - Registered model TNOTE_logistic_20251024_184407_df8733f8 in registry
2025-10-24 18:44:07,568 - src.model_persistence - INFO - Saved model TNOTE_logistic_20251024_184407_df8733f8 for TNOTE (logistic)


  ✓ Trained 2 models
  Features used: 674
    logistic: TNOTE_logistic_20251024_184407_df8733f8 (accuracy: 1.000)


2025-10-24 18:44:07,694 - src.model_persistence - INFO - Successfully saved model registry to results\models\registry.json
2025-10-24 18:44:07,697 - src.model_persistence - INFO - Registered model TNOTE_xgboost_20251024_184407_df8733f8 in registry
2025-10-24 18:44:07,700 - src.model_persistence - INFO - Saved model TNOTE_xgboost_20251024_184407_df8733f8 for TNOTE (xgboost)


    xgboost: TNOTE_xgboost_20251024_184407_df8733f8 (accuracy: 0.591)

✓ Model training completed!
  Total models trained: 24
  Registry: results/models/registry.json


## Step 6: Backtesting

Run expanding window backtesting with transaction costs.

**Methodology:**
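- Expanding window: train on all data up to a cutoff, test on the period that follows, then extend the cutoff (e.g., train on years 1-2, test on year 3)
- Transaction costs: 1 bp for EURUSD, USDJPY, and GBPUSD; 2 bp for all other assets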
- No look-ahead bias
- 40+ performance metrics computed
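
The methodology above can be illustrated with a minimal sketch. It is not the implementation behind `trainer` or `PerformanceAnalyzer`; the helper names `expanding_window_splits` and `net_returns` are hypothetical, and the one-bar signal lag is an assumption about how look-ahead bias might be avoided.

```python
import numpy as np
import pandas as pd


def expanding_window_splits(n_samples, initial_train, test_size):
    """Yield (train_idx, test_idx) pairs; the training window grows each fold."""
    train_end = initial_train
    while train_end + test_size <= n_samples:
        yield np.arange(0, train_end), np.arange(train_end, train_end + test_size)
        train_end += test_size


def net_returns(signals, asset_returns, cost):
    """Strategy returns after costs, charging `cost` per unit of position change."""
    # Assumption: lag the signal one bar so each position uses only past information.
    position = signals.shift(1).fillna(0)
    # Turnover is the absolute change in position (0 on the first bar here).
    turnover = position.diff().abs().fillna(position.abs())
    return position * asset_returns - turnover * cost


# Hypothetical toy inputs: 10 samples, 4 in the first training window, 2 per test fold.
folds = list(expanding_window_splits(n_samples=10, initial_train=4, test_size=2))
toy_signals = pd.Series([1, 1, -1, -1])
toy_returns = pd.Series([0.01, 0.02, -0.01, 0.03])
toy_net = net_returns(toy_signals, toy_returns, cost=0.0001)  # 1 bp per unit traded
```

Charging costs on turnover rather than on every bar means a held position costs nothing after entry, while a flip from long to short is charged twice.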

In [7]:
from src.performance_metrics import PerformanceAnalyzer

print("=" * 80)
print("STEP 6: BACKTESTING")
print("=" * 80)

# Initialize performance analyzer
perf_analyzer = PerformanceAnalyzer()

# Run backtests for each asset
backtest_results = {}

for asset_name, asset_data in aligned_data.items():
    print(f"\nBacktesting {asset_name}...")
    
    # Get the models, scalers, and feature columns stored at training time
    models = trained_models[asset_name]['models']
    scalers = trained_models[asset_name]['scalers']
    feature_cols = trained_models[asset_name]['feature_cols']  # same columns the models were fit on
    
    # Set transaction cost by asset group
    if asset_name in ['EURUSD', 'USDJPY', 'GBPUSD']:
        transaction_cost = 0.0001  # 1 basis point for these three FX pairs
    else:
        transaction_cost = 0.0002  # 2 basis points for all other assets
    
    print(f"  Transaction cost: {transaction_cost*10000:.1f} bps")
    
    asset_results = {}
    
    for model_name, model in models.items():
        # Get scaler if available
        scaler = scalers.get(model_name, None)
        
        # Generate signals using the training-time feature columns
        signals = trainer.generate_signals(model, asset_data, scaler=scaler, feature_cols=feature_cols)
        
        # Compute returns
        strategy_returns = trainer.compute_returns(
            signals=signals,
            data=asset_data,
            transaction_cost=transaction_cost
        )
        
        # Compute metrics
        metrics = perf_analyzer.compute_comprehensive_metrics(strategy_returns)
        
        asset_results[model_name] = {
            'signals': signals,
            'returns': strategy_returns,
            'metrics': metrics
        }
        
        print(f"\n  {model_name.upper()}:")
        print(f"    Sharpe Ratio: {metrics.get('sharpe_ratio', 0):.2f}")
        print(f"    Total Return: {metrics.get('total_return', 0)*100:.2f}%")
        print(f"    Max Drawdown: {metrics.get('max_drawdown', 0)*100:.2f}%")
        print(f"    Win Rate: {metrics.get('win_rate', 0)*100:.1f}%")
        print(f"    Trades: {metrics.get('trades', 0)}")
    
    backtest_results[asset_name] = asset_results

print(f"\nBacktesting completed!")
print(f"  Assets backtested: {len(backtest_results)}")

STEP 6: BACKTESTING

Backtesting EURUSD...
  Transaction cost: 1.0 bps

  LOGISTIC:
    Sharpe Ratio: -0.76
    Total Return: -2.69%
    Max Drawdown: -6.18%
    Win Rate: 51.1%
    Trades: 0

  XGBOOST:
    Sharpe Ratio: -0.33
    Total Return: -0.71%
    Max Drawdown: -6.67%
    Win Rate: 48.1%
    Trades: 0

Backtesting USDJPY...
  Transaction cost: 1.0 bps

  LOGISTIC:
    Sharpe Ratio: 0.13
    Total Return: 1.49%
    Max Drawdown: -9.63%
    Win Rate: 45.9%
    Trades: 0

  XGBOOST:
    Sharpe Ratio: 1.11
    Total Return: 7.52%
    Max Drawdown: -5.78%
    Win Rate: 51.9%
    Trades: 0

Backtesting GBPUSD...
  Transaction cost: 1.0 bps

  LOGISTIC:
    Sharpe Ratio: -1.38
    Total Return: -4.49%
    Max Drawdown: -5.57%
    Win Rate: 45.1%
    Trades: 0

  XGBOOST:
    Sharpe Ratio: -2.21
    Total Return: -7.59%
    Max Drawdown: -9.71%
    Win Rate: 42.9%
    Trades: 0

Backtesting AUDUSD...
  Transaction cost: 2.0 bps

  LOGISTIC:
    Sharpe Ratio: -0.09
    Total Return: 0.

## Step 7: Signal Generation

Generate current trading signals using trained models.

**Output:**
- Signal: SELL (-1), HOLD (0), BUY (+1)
- Confidence: Highest class probability from the model's `predict_proba`
- Timestamp: Generation time
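
As a rough illustration of how class probabilities could map onto the three signal values listed above, here is a minimal sketch. The function `to_signal` and its `hold_threshold` parameter are hypothetical; they are not taken from `trainer.generate_signals`, whose HOLD logic is not shown in this notebook.

```python
import numpy as np

SIGNAL_NAMES = {-1: "SELL", 0: "HOLD", 1: "BUY"}


def to_signal(probabilities, hold_threshold=0.55):
    """Map [P(class_0), P(class_1)] to (signal, label, confidence).

    Assumption: class_1 means "up" (BUY) and class_0 means "down" (SELL).
    """
    probabilities = np.asarray(probabilities, dtype=float)
    confidence = float(probabilities.max())
    if confidence < hold_threshold:
        signal = 0  # too close to a coin flip to act on
    else:
        signal = 1 if int(probabilities.argmax()) == 1 else -1
    return signal, SIGNAL_NAMES[signal], confidence


strong_buy = to_signal([0.04, 0.96])
weak_call = to_signal([0.495, 0.505])
strong_sell = to_signal([0.9, 0.1])
```

A threshold like this is one way to keep near-50% predictions from being traded; the right value depends on how well calibrated the model's probabilities are.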

In [8]:
print("=" * 80)
print("STEP 7: SIGNAL GENERATION")
print("=" * 80)

# Generate signals for each asset
current_signals = {}

for asset_name, asset_data in aligned_data.items():
    # Get latest data point
    latest_data = asset_data.tail(1)
    
    # Get the models, scalers, and feature columns stored at training time
    models = trained_models[asset_name]['models']
    scalers = trained_models[asset_name]['scalers']
    feature_cols = trained_models[asset_name]['feature_cols']  # same columns the models were fit on
    
    asset_signals = {}
    
    for model_name, model in models.items():
        scaler = scalers.get(model_name, None)
        
        # Generate the signal using the training-time feature columns
        signal = trainer.generate_signals(model, latest_data, scaler=scaler, feature_cols=feature_cols)
        
        # Get confidence (max class probability) from the same feature columns
        X = latest_data[feature_cols].copy()
        X = X.ffill().bfill().fillna(0)
        X = X.replace([np.inf, -np.inf], [1e10, -1e10])
        X = X.clip(-1e10, 1e10)
        if scaler is not None:
            X = scaler.transform(X)
        
        probabilities = model.predict_proba(X)[0]
        confidence = float(probabilities.max())
        
        signal_value = int(signal.iloc[0])
        signal_meaning = {-1: "SELL", 0: "HOLD", 1: "BUY"}.get(signal_value, "UNKNOWN")
        
        asset_signals[model_name] = {
            'signal': signal_value,
            'meaning': signal_meaning,
            'confidence': confidence,
            'probabilities': {
                'class_0': float(probabilities[0]),
                'class_1': float(probabilities[1])
            },
            'timestamp': datetime.now().isoformat()
        }
    
    current_signals[asset_name] = asset_signals

# Display signals
print("\nCURRENT TRADING SIGNALS")
print("=" * 80)
print(f"Generated at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("=" * 80)

for asset_name, asset_signals in current_signals.items():
    print(f"\n{asset_name}:")
    print("-" * 40)
    for model_name, signal_info in asset_signals.items():
        signal = signal_info['meaning']
        confidence = signal_info['confidence'] * 100
        print(f"  {model_name.capitalize():10s}: {signal:4s} ({signal_info['signal']:+2d}) | Confidence: {confidence:5.1f}%")

# Save signals to JSON
import json
signals_path = RESULTS_DIR / "production_signals.json"
with open(signals_path, 'w') as f:
    json.dump({
        'timestamp': datetime.now().isoformat(),
        'signals': current_signals,
        'metadata': {
            'training_period': f"{start_date} to {end_date}",
            'assets': list(current_signals.keys()),
            'models': list(next(iter(current_signals.values())).keys())
        }
    }, f, indent=2)

print(f"\nSignals saved to: {signals_path}")

STEP 7: SIGNAL GENERATION

CURRENT TRADING SIGNALS
Generated at: 2025-10-24 18:44:11

EURUSD:
----------------------------------------
  Logistic  : BUY  (+1) | Confidence:  96.1%
  Xgboost   : SELL (-1) | Confidence:  50.5%

USDJPY:
----------------------------------------
  Logistic  : SELL (-1) | Confidence:  99.1%
  Xgboost   : BUY  (+1) | Confidence:  53.1%

GBPUSD:
----------------------------------------
  Logistic  : BUY  (+1) | Confidence:  98.7%
  Xgboost   : SELL (-1) | Confidence:  51.8%

AUDUSD:
----------------------------------------
  Logistic  : BUY  (+1) | Confidence:  59.2%
  Xgboost   : BUY  (+1) | Confidence:  59.7%

USDCHF:
----------------------------------------
  Logistic  : SELL (-1) | Confidence:  76.0%
  Xgboost   : SELL (-1) | Confidence:  51.3%

USDCAD:
----------------------------------------
  Logistic  : BUY  (+1) | Confidence:  94.4%
  Xgboost   : BUY  (+1) | Confidence:  50.8%

NZDUSD:
----------------------------------------
  Logistic  : SELL (-1) |

## Step 8: Model Management

Demonstrate model registry and management capabilities.

**Features:**
- List all trained models
- Filter by asset, model type, performance
- Load specific models by ID
- Compare model performance
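
The filtering described above can be sketched over plain registry entries. This is an assumption-laden illustration, not the `persistence.list_models` implementation: `filter_models` is a hypothetical helper, and it relies only on the metadata keys (`asset`, `model_type`, `metrics`, `training_date`) that this notebook prints.

```python
def filter_models(entries, asset=None, model_type=None, min_accuracy=None):
    """Return registry entries matching every filter that is set, newest first."""
    selected = []
    for entry in entries:
        if asset is not None and entry["asset"] != asset:
            continue
        if model_type is not None and entry["model_type"] != model_type:
            continue
        accuracy = entry.get("metrics", {}).get("accuracy", 0.0)
        if min_accuracy is not None and accuracy < min_accuracy:
            continue
        selected.append(entry)
    # ISO-8601 timestamps sort chronologically as plain strings.
    return sorted(selected, key=lambda e: e["training_date"], reverse=True)


# A subset of fields from three models trained in this notebook.
registry_entries = [
    {"model_id": "SPY_xgboost_20251024_184405_df8733f8", "asset": "SPY",
     "model_type": "xgboost", "metrics": {"accuracy": 0.580},
     "training_date": "2025-10-24T18:44:05.958756"},
    {"model_id": "TNOTE_logistic_20251024_184407_df8733f8", "asset": "TNOTE",
     "model_type": "logistic", "metrics": {"accuracy": 1.000},
     "training_date": "2025-10-24T18:44:07.501410"},
    {"model_id": "TNOTE_xgboost_20251024_184407_df8733f8", "asset": "TNOTE",
     "model_type": "xgboost", "metrics": {"accuracy": 0.591},
     "training_date": "2025-10-24T18:44:07.620388"},
]

tnote_models = filter_models(registry_entries, asset="TNOTE")
xgb_models = filter_models(registry_entries, model_type="xgboost")
high_accuracy = filter_models(registry_entries, min_accuracy=0.9)
```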

In [9]:
print("=" * 80)
print("STEP 8: MODEL MANAGEMENT")
print("=" * 80)

# List all models in registry
print("\nALL TRAINED MODELS:")
print("-" * 80)

all_models = persistence.list_models()
print(f"Total models: {len(all_models)}\n")

for i, model_info in enumerate(all_models, 1):
    print(f"{i}. {model_info['model_id']}")
    print(f"   Asset: {model_info['asset']}")
    print(f"   Type: {model_info['model_type']}")
    print(f"   Features: {model_info['feature_count']}")
    print(f"   Accuracy: {model_info['metrics'].get('accuracy', 0):.3f}")
    print(f"   Training date: {model_info['training_date']}")
    print()

# Filter models by asset
print("\nMODELS BY ASSET:")
print("-" * 80)

for asset_name in aligned_data.keys():
    asset_models = persistence.list_models(asset=asset_name)
    print(f"{asset_name}: {len(asset_models)} models")
    for model_info in asset_models:
        print(f"  - {model_info['model_type']}: {model_info['model_id'][:30]}...")

# Demonstrate loading a specific model
print("\n\nLOADING SPECIFIC MODEL:")
print("-" * 80)

if all_models:
    example_model_id = all_models[0]['model_id']
    print(f"Loading model: {example_model_id}")
    
    loaded_model, loaded_scaler, loaded_features, loaded_metadata = persistence.load_model(example_model_id)
    
    print("✓ Model loaded successfully!")
    print(f"  Model type: {type(loaded_model).__name__}")
    print(f"  Features: {len(loaded_features)}")
    print(f"  Has scaler: {loaded_scaler is not None}")
    print(f"  Metadata keys: {list(loaded_metadata.keys())}")

print("\n✓ Model management demonstration completed!")

STEP 8: MODEL MANAGEMENT

ALL TRAINED MODELS:
--------------------------------------------------------------------------------
Total models: 240

1. TNOTE_xgboost_20251024_184407_df8733f8
   Asset: TNOTE
   Type: xgboost
   Features: 674
   Accuracy: 0.591
   Training date: 2025-10-24T18:44:07.620388

2. TNOTE_logistic_20251024_184407_df8733f8
   Asset: TNOTE
   Type: logistic
   Features: 674
   Accuracy: 1.000
   Training date: 2025-10-24T18:44:07.501410

3. SPY_xgboost_20251024_184405_df8733f8
   Asset: SPY
   Type: xgboost
   Features: 674
   Accuracy: 0.580
   Training date: 2025-10-24T18:44:05.958756

4. SPY_logistic_20251024_184405_df8733f8
   Asset: SPY
   Type: logistic
   Features: 674
   Accuracy: 0.992
   Training date: 2025-10-24T18:44:05.822629

5. GOLD_xgboost_20251024_184404_df8733f8
   Asset: GOLD
   Type: xgboost
   Features: 674
   Accuracy: 0.530
   Training date: 2025-10-24T18:44:04.154434

6. GOLD_logistic_20251024_184404_df8733f8
   Asset: GOLD
   Type: logistic


2025-10-24 18:44:12,096 - src.model_persistence - INFO - Loaded model TNOTE_xgboost_20251024_184407_df8733f8 (xgboost)


✓ Model loaded successfully!
  Model type: XGBClassifier
  Features: 674
  Has scaler: False
  Metadata keys: ['asset', 'model_type', 'training_date', 'feature_count', 'metrics', 'version', 'training_params', 'model_path', 'scaler_path', 'feature_path', 'registered_at']

✓ Model management demonstration completed!


## Results Summary

Comprehensive summary of the entire pipeline execution.

In [10]:
print("\n" + "=" * 80)
print(" " * 20 + "PIPELINE RESULTS SUMMARY")
print("=" * 80)
print(f"Generated at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("=" * 80)

# Data Collection
print("\n1. DATA COLLECTION:")
print(f"   Date Range: {start_date} to {end_date}")
print(f"   Total Events: {len(events_df):,}")
print(f"   Headlines Processed: {events_df['headline'].notna().sum():,}")

# Sentiment Analysis
print("\n2. SENTIMENT ANALYSIS:")
print(f"   Daily Features: {len(daily_features)} days × {len(daily_features.columns)} features")
print(f"   Mean Polarity: {events_with_sentiment['polarity'].mean():.3f}")

# Market Data
print("\n3. MARKET DATA:")
print(f"   Assets: {', '.join(market_data.keys())}")
for asset_name, asset_data in market_data.items():
    print(f"   {asset_name}: {len(asset_data)} days, {len(asset_data.columns)} columns")

# Feature Alignment
print("\n4. FEATURE ALIGNMENT:")
print(f"   Assets Aligned: {len(aligned_data)}")
for asset_name, asset_data in aligned_data.items():
    features_count = len([c for c in asset_data.columns if c not in ['target', 'returns', 'date']])
    print(f"   {asset_name}: {asset_data.shape[0]} rows × {features_count} features")

# Model Training
print("\n5. MODEL TRAINING:")
total_models = sum(len(m['models']) for m in trained_models.values())
print(f"   Total Models: {total_models}")
for asset_name, models_info in trained_models.items():
    print(f"   {asset_name}: {len(models_info['models'])} models, {len(models_info['feature_cols'])} features")

# Backtesting
print("\n6. BACKTESTING RESULTS:")
for asset_name, asset_results in backtest_results.items():
    print(f"\n   {asset_name}:")
    for model_name, results in asset_results.items():
        metrics = results['metrics']
        print(f"     {model_name.upper()}:")
        print(f"       Sharpe: {metrics.get('sharpe_ratio', 0):.2f}")
        print(f"       Return: {metrics.get('total_return', 0)*100:+.2f}%")
        print(f"       Max DD: {metrics.get('max_drawdown', 0)*100:.2f}%")
        print(f"       Win Rate: {metrics.get('win_rate', 0)*100:.1f}%")
        print(f"       Trades: {metrics.get('trades', 0)}")

# Signal Generation
print("\n7. CURRENT SIGNALS:")
for asset_name, asset_signals in current_signals.items():
    print(f"   {asset_name}:")
    for model_name, signal_info in asset_signals.items():
        print(f"     {model_name}: {signal_info['meaning']} (confidence: {signal_info['confidence']*100:.1f}%)")

# Files Generated
print("\n8. OUTPUT FILES:")
print(f"   Results directory: {RESULTS_DIR}")
print(f"   - events_data.parquet")
print(f"   - daily_features.parquet")
print(f"   - aligned_data_*.parquet ({len(aligned_data)} files)")
print(f"   - models/ ({len(all_models)} models)")
print(f"   - production_signals.json")

print("\n" + "=" * 80)
print(" " * 25 + "PIPELINE COMPLETED!")
print("=" * 80)


                    PIPELINE RESULTS SUMMARY
Generated at: 2025-10-24 18:44:12

1. DATA COLLECTION:
   Date Range: 2025-04-01 to 2025-10-23
   Total Events: 18,900
   Headlines Processed: 18,900

2. SENTIMENT ANALYSIS:
   Daily Features: 189 days × 36 features
   Mean Polarity: 0.458

3. MARKET DATA:
   Assets: EURUSD, USDJPY, GBPUSD, AUDUSD, USDCHF, USDCAD, NZDUSD, BTCUSD, ETHUSD, GOLD, SPY, TNOTE
   EURUSD: 145 days, 388 columns
   USDJPY: 145 days, 388 columns
   GBPUSD: 145 days, 388 columns
   AUDUSD: 145 days, 388 columns
   USDCHF: 145 days, 388 columns
   USDCAD: 145 days, 388 columns
   NZDUSD: 145 days, 388 columns
   BTCUSD: 205 days, 388 columns
   ETHUSD: 205 days, 388 columns
   GOLD: 143 days, 388 columns
   SPY: 142 days, 388 columns
   TNOTE: 143 days, 388 columns

4. FEATURE ALIGNMENT:
   Assets Aligned: 12
   EURUSD: 133 rows × 679 features
   USDJPY: 133 rows × 679 features
   GBPUSD: 133 rows × 679 features
   AUDUSD: 133 rows × 679 features
   USDCHF: 133 ro