# Master Pipeline Runner for Colab

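This notebook runs the entire ML research pipeline on Google Colab: data fetching, feature engineering, specialist and meta-ensemble training, and a walk-forward backtest.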

## 🚀 Quick Start (Running via Cursor → Colab Connection)

### Step 1: Connect Cursor to Colab
1. Press `Ctrl+Shift+P` (or `Cmd+Shift+P` on Mac)
2. Type: `Colab: Connect to Colab`
3. Authenticate with your Google account
4. Select a Colab runtime

### Step 2: Select Colab Kernel

**If the kernel selector doesn't work, try one of these methods:**

**Method A: Command Palette**
1. Press `Ctrl+Shift+P`
2. Type: `Notebook: Select Notebook Kernel`
3. Select "Colab" or "Google Colab" from list

**Method B: Status Bar**
1. Click the kernel indicator (bottom-right status bar)
2. Select "Select Another Kernel"
3. Choose Colab runtime

**Method C: Verify Connection First**
1. `Ctrl+Shift+P` → `Colab: List Runtimes` (to see if connected)
2. If not connected: `Colab: Connect to Colab`
3. Then try selecting kernel again

**Alternative:** If the Cursor connection doesn't work, run this notebook directly in the Colab web interface (see troubleshooting guide).

### Step 3: Add API Keys to Colab Secrets
**Must be done via Colab web UI:**
1. Open https://colab.research.google.com in browser
2. Click üîë icon ‚Üí Secrets tab
3. Add keys (see cell below for list)

### Step 4: Upload Project
Upload `ml_research_pipeline.zip` (the zipped project folder) to Colab. The `[STEP 3] UPLOAD PROJECT FILES` cell lists three ways to do this.

### Step 5: Run All Cells
- Press `Shift+Enter` to run each cell in turn
- Or right-click → "Run All"

**Key Names (matching keys.env):**
- FINNHUB_API_KEY, NEWS_API_KEY, TIINGO_API_KEY (required)
- See cell below for complete list
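Step 3 depends on the environment setup cell further down. That cell reads each key from Colab secrets and falls back to `keys.env`. The precedence rule can be expressed as a small, testable function.

This is a minimal sketch. `parse_env_file`, `resolve_keys`, and `colab_secret_or_none` are hypothetical helpers, not part of the project. The parser only assumes plain `KEY=VALUE` lines; python-dotenv accepts more syntax.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, Optional


def parse_env_file(path: Path) -> Dict[str, str]:
    """Parse plain KEY=VALUE lines (hypothetical helper; skips blanks and # comments)."""
    values: Dict[str, str] = {}
    if not path.exists():
        return values
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, e.g. KEY="abc"
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def colab_secret_or_none(name: str) -> Optional[str]:
    """Return a Colab secret, or None when off Colab or the secret is unavailable."""
    try:
        from google.colab import userdata
        return userdata.get(name)
    except Exception:  # ImportError off Colab; missing secret or no access on Colab
        return None


def resolve_keys(
    key_names: Iterable[str],
    get_secret: Callable[[str], Optional[str]],
    env_file: Path,
) -> Dict[str, Optional[str]]:
    """Resolve each key: secret first, then keys.env, else None (no hardcoded defaults)."""
    env_values = parse_env_file(env_file)
    return {name: get_secret(name) or env_values.get(name) for name in key_names}
```

On Colab you would pass `colab_secret_or_none` as `get_secret`. In a test you can pass the `.get` method of an ordinary dict instead, so the precedence logic can be checked without a Colab runtime.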


## 🔍 Quick Diagnostic: Check Colab Connection

Run this cell first to verify Colab is accessible.
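The diagnostic cell detects Colab by attempting `import google.colab`. An equivalent check uses `importlib.util.find_spec`, which locates the module without executing it.

This is a minimal sketch, and `is_running_on_colab` is a hypothetical helper name. Like the original check, it only reports whether the `google.colab` package is importable. It says nothing about which editor is attached to the runtime.

```python
import importlib.util


def is_running_on_colab() -> bool:
    """True if the google.colab package can be found (module is not executed)."""
    try:
        return importlib.util.find_spec("google.colab") is not None
    except (ModuleNotFoundError, ValueError):
        # ModuleNotFoundError: the parent 'google' package is not installed at all
        # ValueError: 'google.colab' is in sys.modules but has no __spec__
        return False


print("On Colab:", is_running_on_colab())
```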


In [None]:
# [STEP 1] Diagnostic: Check if running on Colab
import sys

print("="*60)
print("COLAB CONNECTION DIAGNOSTIC")
print("="*60)

try:
    import google.colab
    print("✓ SUCCESS: Running on Google Colab!")
    print(f"✓ Python executable: {sys.executable}")
    print(f"✓ Colab module location: {google.colab.__file__}")
    ON_COLAB = True
except ImportError:
    print("✗ NOT running on Colab")
    print(f"  Python executable: {sys.executable}")
    print("\nTo connect:")
    print("1. Press Ctrl+Shift+P")
    print("2. Type: 'Colab: Connect to Colab'")
    print("3. Authenticate and select runtime")
    print("4. Then select the Colab kernel (kernel picker or Command Palette)")
    ON_COLAB = False

print("\n" + "="*60)
if ON_COLAB:
    print("✓ Ready to proceed - Colab is connected!")
else:
    print("⚠ Need to connect to Colab first")
print("="*60)


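## 📤 [STEP 2] Upload Project Files

The auto-setup cell below only creates an empty project skeleton: the directories and placeholder `__init__.py` files. It does not prompt for an upload.

The source code itself must come from `ml_research_pipeline.zip`:
- The `[STEP 3] UPLOAD PROJECT FILES` cell lists three ways to upload the zip.
- The `[STEP 3 ALTERNATIVE]` cell extracts a zip that is already in `/content/`.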
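The auto-setup cell creates an empty `src/` tree, so a check like `(project_dir / 'src').exists()` passes before any code has been uploaded. A stricter check looks for real source files instead.

This is a minimal sketch. The helper names are hypothetical. The sentinel files are taken from the required-files list in the `__init__.py`-fixing cell.

```python
from pathlib import Path
from typing import Iterable, List

# Real source files (from the required-files list); an empty skeleton has none of them
SENTINEL_FILES = (
    "src/data/price_fetcher.py",
    "src/features/feature_builder.py",
    "src/ensemble/meta_ensemble.py",
)


def missing_sentinels(project_dir: Path, sentinels: Iterable[str] = SENTINEL_FILES) -> List[str]:
    """Return the sentinel files that are not present under project_dir."""
    return [rel for rel in sentinels if not (project_dir / rel).is_file()]


def project_is_populated(project_dir: Path) -> bool:
    """True only when every sentinel source file exists."""
    return not missing_sentinels(project_dir)
```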


In [None]:
# [STEP 2] AUTO-SETUP: Create the project skeleton in Colab
# This creates the directory layout and placeholder __init__.py files ONLY;
# the source code itself still has to be uploaded and extracted in [STEP 3].

import os
from pathlib import Path

print("="*70)
print("AUTO-SETUP: Creating Project Structure in Colab")
print("="*70)

# Create the project directory and make it the working directory
project_dir = Path('/content/ml_research_pipeline')
project_dir.mkdir(parents=True, exist_ok=True)
os.chdir(project_dir)

# Create directory structure
dirs = [
    'src/data', 'src/features', 'src/models',
    'src/ensemble', 'src/backtest', 'src/utils',
    'data/raw', 'data/processed',
    'models/specialists', 'models/meta',
    'results', 'artifacts', 'tests', 'notebooks'
]

for dir_path in dirs:
    (project_dir / dir_path).mkdir(parents=True, exist_ok=True)

print("✓ Created directory structure")

# Create placeholder __init__.py files without overwriting real ones
# that may already have been extracted from the project zip
init_files = [
    'src/__init__.py',
    'src/data/__init__.py',
    'src/features/__init__.py',
    'src/models/__init__.py',
    'src/ensemble/__init__.py',
    'src/backtest/__init__.py',
    'src/utils/__init__.py',
    'tests/__init__.py'
]

for init_file in init_files:
    init_path = project_dir / init_file
    if not init_path.exists():
        init_path.write_text('"""Module initialization."""\n')

print("✓ Created __init__.py files")

# The source files are NOT created here: get them from Git or from the project zip
print("\n" + "="*70)
print("NEXT: ADD THE SOURCE CODE")
print("="*70)
print("Option 1: If the project is in a Git repository, clone it BEFORE running this cell")
print("  (or delete /content/ml_research_pipeline first; git clone refuses a non-empty target):")
print("  !git clone <your-repo-url> /content/ml_research_pipeline")
print("\nOption 2: Upload ml_research_pipeline.zip and extract it (see the [STEP 3] cells)")
print("="*70)


## ⚠️ IMPORTANT: The Upload Widget Only Works in the Colab Web UI

**If you get an "Upload widget is only available" error:**

This happens because the upload widget (`files.upload()`) only works in the Colab web UI, not when the notebook runs from Cursor.

**Solution:** Do the upload in the Colab web browser:
1. Open https://colab.research.google.com
2. Upload this notebook there
3. Run Method 1 from the `[STEP 3] UPLOAD PROJECT FILES` cell in the browser (it shows a file picker)
4. Then continue in Cursor or Colab

**Or use Method 3 (manual upload via the Colab file browser), then run the `[STEP 3 ALTERNATIVE]` cell below to extract the zip.**
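Both extraction cells call `extractall('/content/')`. That only produces `/content/ml_research_pipeline/src/...` if the zip contains a top-level `ml_research_pipeline/` folder. A zip created from *inside* the folder puts `src/` directly under `/content/` instead.

This is a minimal sketch that handles both layouts. `extract_project` is a hypothetical helper. Ignoring `__MACOSX/` entries assumes the zip may have been created with macOS Finder.

```python
import zipfile
from pathlib import Path


def extract_project(zip_path: Path, project_dir: Path) -> Path:
    """Extract so the project files end up under project_dir for either zip layout."""
    with zipfile.ZipFile(zip_path) as zf:
        # Top-level entry names, ignoring macOS metadata folders
        tops = {
            name.split("/", 1)[0]
            for name in zf.namelist()
            if name and not name.startswith("__MACOSX")
        }
        if tops == {project_dir.name}:
            # Zip already wraps everything in ml_research_pipeline/
            zf.extractall(project_dir.parent)
        else:
            # Zip was created from inside the folder (src/ at top level)
            project_dir.mkdir(parents=True, exist_ok=True)
            zf.extractall(project_dir)
    return project_dir
```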


In [None]:
# [STEP 3 ALTERNATIVE] Manual Upload - Extract if zip already uploaded
# Use this if you manually uploaded ml_research_pipeline.zip via the Colab file browser.
# Assumes the zip contains a top-level ml_research_pipeline/ folder.

import zipfile
from pathlib import Path

print("="*70)
print("EXTRACTING UPLOADED FILES")
print("="*70)

zip_path = Path('/content/ml_research_pipeline.zip')
project_dir = Path('/content/ml_research_pipeline')
# Check a real source file: an empty src/ already exists after the [STEP 2] skeleton
sentinel = project_dir / 'src' / 'data' / 'price_fetcher.py'

if zip_path.exists():
    print(f"✓ Found zip file: {zip_path}")
    print("Extracting...")
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall('/content/')
    zip_path.unlink()  # Remove zip after extraction
    print("✓ Extracted successfully!")
elif sentinel.exists():
    print("✓ Project files already exist - no extraction needed")
else:
    print("⚠ Zip file not found at /content/ml_research_pipeline.zip")
    print("\nTo upload manually:")
    print("1. In Colab web UI, click the folder icon (📁) in the left sidebar")
    print("2. Click the 'Upload' button")
    print("3. Upload ml_research_pipeline.zip")
    print("4. Re-run this cell")

# Verify
if sentinel.exists():
    print(f"\n✓ Project files ready at: {project_dir}")
    print(f"  src contents: {sorted(p.name for p in (project_dir / 'src').iterdir())}")
else:
    print("\n⚠ Project source files not found")
    print(f"  Current contents of /content: {sorted(p.name for p in Path('/content').iterdir())}")


In [None]:
# [STEP 3] UPLOAD PROJECT FILES
# ⚠️ IMPORTANT: This cell lists 3 options - choose ONE that works for you

import zipfile
from pathlib import Path
import os

print("="*70)
print("UPLOAD PROJECT FILES - CHOOSE ONE METHOD")
print("="*70)

project_dir = Path('/content/ml_research_pipeline')
# Check a real source file: an empty src/ already exists after the [STEP 2] skeleton
sentinel = project_dir / 'src' / 'data' / 'price_fetcher.py'

if sentinel.exists():
    print("✓ Project files already exist!")
    print(f"  Location: {project_dir}")
    print(f"  src contents: {sorted(p.name for p in (project_dir / 'src').iterdir())}")
    print("\n✓ Skipping upload - files already present")
else:
    print("\n⚠ Project files not found. Choose one method below:\n")
    
    # METHOD 1: Use Colab's file upload (must run in Colab web UI)
    print("="*70)
    print("METHOD 1: Upload via Colab Web UI (Recommended)")
    print("="*70)
    print("1. Open this notebook in Colab web UI: https://colab.research.google.com")
    print("2. Copy the code printed below into a new cell and run it in the Colab web browser")
    print("3. Click 'Choose Files' when prompted")
    print("4. Select ml_research_pipeline.zip")
    print("5. Wait for upload and extraction")
    print("\nCode to paste (uncomment it first):")
    print("""
    # Uncomment these lines and run in Colab web UI:
    # from google.colab import files
    # uploaded = files.upload()
    # for filename in uploaded.keys():
    #     if filename.endswith('.zip'):
    #         with zipfile.ZipFile(filename, 'r') as zip_ref:
    #             zip_ref.extractall('/content/')
    #         os.remove(filename)
    """)
    
    # METHOD 2: Use Google Drive
    print("\n" + "="*70)
    print("METHOD 2: Upload via Google Drive")
    print("="*70)
    print("1. Upload ml_research_pipeline.zip to your Google Drive")
    print("2. Paste the code below into a new cell, uncomment it, and run it:")
    print("""
    # Uncomment to use Google Drive:
    # from google.colab import drive
    # drive.mount('/content/drive')
    # # Copy from drive (adjust path to where you uploaded the zip)
    # !cp /content/drive/MyDrive/ml_research_pipeline.zip /content/
    # !unzip /content/ml_research_pipeline.zip -d /content/
    # !rm /content/ml_research_pipeline.zip
    """)
    
    # METHOD 3: Manual upload via Colab file browser
    print("\n" + "="*70)
    print("METHOD 3: Manual Upload via Colab File Browser")
    print("="*70)
    print("1. In Colab web UI, click folder icon (üìÅ) in left sidebar")
    print("2. Click 'Upload' button")
    print("3. Upload ml_research_pipeline.zip")
    print("4. Then run this code to extract:")
    print("""
    # Run this after manually uploading zip to /content/:
    zip_path = Path('/content/ml_research_pipeline.zip')
    if zip_path.exists():
        print("Found zip file, extracting...")
        with zipfile.ZipFile(zip_path, 'r') as zip_ref:
            zip_ref.extractall('/content/')
        zip_path.unlink()  # Remove zip
        print("✓ Extracted successfully!")
    else:
        print("⚠ Zip file not found at /content/ml_research_pipeline.zip")
        print("  Make sure you uploaded it via the file browser")
    """)
    
    print("\n" + "="*70)
    print("QUICK FIX: Do the upload in the Colab web UI, not from Cursor!")
    print("="*70)
    print("\nAfter uploading files, re-run this cell to verify.")


In [None]:
# [STEP 4] Install Dependencies
%pip install -q pandas numpy scikit-learn xgboost lightgbm yfinance requests python-dotenv tqdm joblib transformers torch sentencepiece
print("✓ Dependencies installed")
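`%pip install -q` suppresses most output, so a failed or partial install is easy to miss. The standard-library `importlib.metadata` can report what actually got installed.

This is a minimal sketch, and `installed_versions` is a hypothetical helper. It takes distribution names (`scikit-learn`, `python-dotenv`), which differ from import names (`sklearn`, `dotenv`).

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(dists: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in dists:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for dist, version in installed_versions(["pandas", "xgboost", "lightgbm", "python-dotenv"]).items():
    print(f"{dist}: {version or 'NOT INSTALLED'}")
```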


In [None]:
# [STEP 5] Set Up Environment and API Keys
import os
import sys
from pathlib import Path

from dotenv import load_dotenv  # installed with python-dotenv in the dependency cell

# Detect Colab
try:
    import google.colab  # noqa: F401
    ON_COLAB = True
    print("✓ Running on Google Colab")
except ImportError:
    ON_COLAB = False
    print("⚠ Not on Colab")

# Set project directory
if ON_COLAB:
    project_dir = Path('/content/ml_research_pipeline')
    if not project_dir.exists():
        project_dir = Path('/content')
    os.chdir(project_dir)
else:
    project_dir = Path().absolute()

# Load keys.env if present (local runs, or uploaded with the project).
# Key VALUES must never be hardcoded in this notebook.
env_file = project_dir / 'keys.env'
if env_file.exists():
    load_dotenv(env_file)
    print("✓ Loaded keys from keys.env")
else:
    print("⚠ keys.env not found")

# On Colab, Colab secrets override keys.env values
if ON_COLAB:
    from google.colab import userdata

    # Key names from keys.env
    key_names = [
        'TIINGO_API_KEY', 'FINAGE_API_KEY', 'NEWS_API_KEY', 'FINNHUB_API_KEY',
        'FINNHUB2_API_KEY', 'POLYGON_API_KEY', 'TWELVEDATA_API_KEY', 'QUANDL_API_KEY',
        'GROQ_API_KEY', 'FRED_API_KEY', 'SEC_API_KEY', 'OPENROUTER_KEY',
    ]

    print("Loading keys from Colab secrets (matching keys.env names)...")
    keys_loaded = 0
    for key in key_names:
        try:
            os.environ[key] = userdata.get(key)
            print(f"  ✓ {key}")
            keys_loaded += 1
        except Exception:  # secret missing or notebook access not granted
            fallback = "using keys.env value" if os.getenv(key) else "NOT SET"
            print(f"  ⚠ {key} not in secrets, {fallback}")

    print(f"\n✓ Loaded {keys_loaded}/{len(key_names)} keys from Colab secrets")
    print("  → Add missing keys via Colab UI: 🔑 → Secrets → Add new secret")

# Add src to path
sys.path.insert(0, str(project_dir / 'src'))

print(f"\nProject directory: {project_dir}")
print(f"Working directory: {os.getcwd()}")
print("\nKey status:")
print(f"  FINNHUB_API_KEY: {'✓' if os.getenv('FINNHUB_API_KEY') else '✗'}")
print(f"  NEWS_API_KEY: {'✓' if os.getenv('NEWS_API_KEY') else '✗'}")
print(f"  TIINGO_API_KEY: {'✓' if os.getenv('TIINGO_API_KEY') else '✗'}")


In [None]:
# [STEP 6] Fix __init__.py files - Add proper imports
# This ensures all modules can be imported correctly

from pathlib import Path
import os

project_dir = Path('/content/ml_research_pipeline')
src_dir = project_dir / 'src'

print("="*70)
print("FIXING __init__.py FILES")
print("="*70)

# Fix data/__init__.py
data_init = src_dir / 'data' / '__init__.py'
if data_init.exists():
    data_init.write_text('''"""Data fetching modules."""
from .price_fetcher import PriceFetcher
from .news_fetcher import NewsFetcher
__all__ = ["PriceFetcher", "NewsFetcher"]
''')
    print("✓ Fixed data/__init__.py")

# Fix features/__init__.py
features_init = src_dir / 'features' / '__init__.py'
if features_init.exists():
    features_init.write_text('''"""Feature engineering modules."""
from .feature_builder import FeatureBuilder
__all__ = ["FeatureBuilder"]
''')
    print("✓ Fixed features/__init__.py")

# Fix models/__init__.py
models_init = src_dir / 'models' / '__init__.py'
if models_init.exists():
    models_init.write_text('''"""Model modules."""
from .base_model import BaseModel, ModelSignal
from .xgboost_model import XGBoostModel
from .lightgbm_model import LightGBMModel
from .sentiment_model import SentimentModel
from .rule_based_model import RuleBasedModel
__all__ = ["BaseModel", "ModelSignal", "XGBoostModel", "LightGBMModel", "SentimentModel", "RuleBasedModel"]
''')
    print("✓ Fixed models/__init__.py")

# Fix ensemble/__init__.py
ensemble_init = src_dir / 'ensemble' / '__init__.py'
if ensemble_init.exists():
    ensemble_init.write_text('''"""Ensemble modules."""
from .meta_ensemble import MetaEnsemble
__all__ = ["MetaEnsemble"]
''')
    print("✓ Fixed ensemble/__init__.py")

# Fix backtest/__init__.py
backtest_init = src_dir / 'backtest' / '__init__.py'
if backtest_init.exists():
    backtest_init.write_text('''"""Backtesting modules."""
from .walkforward_backtest import WalkForwardBacktest
__all__ = ["WalkForwardBacktest"]
''')
    print("✓ Fixed backtest/__init__.py")

# Verify source files exist
print("\n" + "="*70)
print("VERIFYING SOURCE FILES")
print("="*70)

required_files = [
    'src/data/price_fetcher.py',
    'src/data/news_fetcher.py',
    'src/features/feature_builder.py',
    'src/models/base_model.py',
    'src/models/xgboost_model.py',
    'src/models/lightgbm_model.py',
    'src/models/sentiment_model.py',
    'src/models/rule_based_model.py',
    'src/ensemble/meta_ensemble.py',
    'src/backtest/walkforward_backtest.py',
    'src/utils/config.py',
    'src/utils/helpers.py',
]

missing = []
for file_path in required_files:
    full_path = project_dir / file_path
    if full_path.exists():
        print(f"  ✓ {file_path}")
    else:
        print(f"  ✗ {file_path} - MISSING!")
        missing.append(file_path)

if missing:
    print(f"\n⚠ {len(missing)} files are missing!")
    print("  → Make sure you uploaded the complete project folder")
    print("  → Check that ml_research_pipeline.zip was fully extracted")
else:
    print(f"\n✓ All {len(required_files)} required files found!")

print("\n" + "="*70)


## Optional: Add All Keys to Colab Secrets

Run this cell to see which keys need to be added to Colab secrets.
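Printing a key's full value in a cell output leaks it whenever the notebook is shared or saved with outputs. If a status report needs to show *which* value is loaded, a masked form is enough.

This is a minimal sketch. `mask_secret` is a hypothetical helper, and showing the last four characters is an arbitrary choice.

```python
from typing import Optional


def mask_secret(value: Optional[str], visible: int = 4) -> str:
    """Mask all but the last `visible` characters; short values are fully masked."""
    if not value:
        return "<not set>"
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]
```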


In [None]:
# Check which keys are already in Colab secrets
# Lists missing key NAMES only; copy each value from your local keys.env.
# Never paste key values into notebook cells or outputs.

try:
    from google.colab import userdata

    # Key names from keys.env
    key_names = [
        "TIINGO_API_KEY", "FINAGE_API_KEY", "NEWS_API_KEY", "FINNHUB_API_KEY",
        "FINNHUB2_API_KEY", "POLYGON_API_KEY", "TWELVEDATA_API_KEY", "QUANDL_API_KEY",
        "GROQ_API_KEY", "FRED_API_KEY", "SEC_API_KEY", "OPENROUTER_KEY",
    ]

    print("="*60)
    print("COLAB SECRETS STATUS")
    print("="*60)

    existing = []
    missing = []

    for key in key_names:
        try:
            userdata.get(key)
            existing.append(key)
            print(f"✓ {key} - Already in secrets")
        except Exception:  # secret missing or notebook access not granted
            missing.append(key)
            print(f"✗ {key} - Missing from secrets")

    if missing:
        print("\n" + "="*60)
        print("ADD THESE KEYS TO COLAB SECRETS")
        print("="*60)
        print("\n1. Click the 🔑 icon (left sidebar)")
        print("2. Go to the 'Secrets' tab")
        print("3. Click 'Add new secret' for each key below, using the value from your keys.env:\n")

        for key in missing:
            print(f"  {key}")

        print("\nAfter adding, re-run the environment and API key setup cell to load them.")
    else:
        print("\n✓ All keys are in Colab secrets!")

except ImportError:
    print("Not running on Colab - keys will be loaded from keys.env file")


In [None]:
# [STEP 7] Create data, model, and results directories
# (uses project_dir from the environment setup cell)
dirs = ['data/raw', 'data/processed', 'models/specialists', 'models/meta', 'results', 'artifacts']
for dir_path in dirs:
    (project_dir / dir_path).mkdir(parents=True, exist_ok=True)

print("✓ Project structure ready")


## Step 8: Run Pipeline - Data Fetching
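Each run of the data-fetching cell calls the price and news APIs again, and free API tiers are commonly rate-limited. Caching avoids repeated calls when a previous result for the same ticker and date range can safely be reused.

This is a minimal caching sketch. `cached` is a hypothetical helper. Because it uses `pickle`, only load cache files you created yourself.

```python
import pickle
from pathlib import Path
from typing import Any, Callable


def cached(path: Path, compute: Callable[[], Any], refresh: bool = False) -> Any:
    """Return the pickled result at `path`; on a miss (or refresh) compute and save it."""
    if path.exists() and not refresh:
        with path.open("rb") as fh:
            return pickle.load(fh)
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(result, fh)
    return result
```

For example, you could wrap the price fetch as `cached(PROCESSED_DATA_DIR / f"{TICKER}_prices.pkl", lambda: price_fetcher.fetch(TICKER, START_DATE, END_DATE))`. If you change `START_DATE` or `END_DATE` between runs, include the date range in the file name, or pass `refresh=True`.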


In [None]:
# Import and run data fetching
from data import PriceFetcher, NewsFetcher
import pandas as pd
from utils.config import PROCESSED_DATA_DIR

# Configuration
TICKER = "AAPL"
START_DATE = "2020-01-01"
END_DATE = "2023-12-31"
INDEX_SYMBOL = "^GSPC"

print(f"Fetching data for {TICKER}...")

# Fetch prices
price_fetcher = PriceFetcher()
stock_prices = price_fetcher.fetch(TICKER, START_DATE, END_DATE)
print(f"✓ Fetched {len(stock_prices)} days of price data")

# Fetch index
index_prices = price_fetcher.fetch_index(INDEX_SYMBOL, START_DATE, END_DATE)
print(f"✓ Fetched {len(index_prices)} days of index data")

# Fetch news
news_fetcher = NewsFetcher()
news_data = news_fetcher.fetch_all(TICKER, START_DATE, END_DATE)
print(f"✓ Fetched {len(news_data)} news articles")

# Save
stock_prices.to_csv(PROCESSED_DATA_DIR / f"{TICKER}_prices.csv")
index_prices.to_csv(PROCESSED_DATA_DIR / f"{INDEX_SYMBOL.replace('^', '')}_prices.csv")
if not news_data.empty:
    news_data.to_csv(PROCESSED_DATA_DIR / f"{TICKER}_news.csv", index=False)

print("\n✓ Data fetching complete")


## Step 9: Feature Engineering
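The training and backtest cells both drop the same columns before building `X`. These are the two targets, which judging by their names describe the next day's outcome and would leak it into the inputs, and the raw OHLCV prices. Keeping that rule in one place stops the two copies of the list from drifting apart.

This is a minimal sketch. The helper and constant names are hypothetical; the column names come from the training cells.

```python
from typing import Iterable, List

# Targets (future information) and raw OHLCV columns, as excluded in the training cells
TARGET_COLUMNS = ("target_return_1d", "target_direction")
RAW_PRICE_COLUMNS = ("open", "high", "low", "close", "volume")


def select_feature_columns(columns: Iterable[str]) -> List[str]:
    """Keep the original column order, dropping targets and raw OHLCV columns."""
    excluded = set(TARGET_COLUMNS) | set(RAW_PRICE_COLUMNS)
    return [col for col in columns if col not in excluded]
```

In the cells below, `feature_cols = select_feature_columns(features.columns)` would replace the inline list comprehension.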


In [None]:
from features import FeatureBuilder
from utils.helpers import save_artifact

# Load data
stock_prices = pd.read_csv(PROCESSED_DATA_DIR / f"{TICKER}_prices.csv", index_col=0, parse_dates=True)
index_prices = pd.read_csv(PROCESSED_DATA_DIR / f"{INDEX_SYMBOL.replace('^', '')}_prices.csv", index_col=0, parse_dates=True)

news_file = PROCESSED_DATA_DIR / f"{TICKER}_news.csv"
if news_file.exists():
    news_data = pd.read_csv(news_file, parse_dates=['date'])
else:
    news_data = pd.DataFrame()

# Build features
builder = FeatureBuilder()
features = builder.build_all_features(stock_prices, index_prices, news_data)

print(f"✓ Built {len(features.columns)} features, {len(features)} samples")

# Save
features.to_csv(PROCESSED_DATA_DIR / f"{TICKER}_features.csv")
save_artifact(builder.feature_metadata, PROCESSED_DATA_DIR / f"{TICKER}_feature_metadata.pkl")

print("✓ Feature engineering complete")


## Step 10: Train Price Models (XGBoost + LightGBM)
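The training cell drops rows where `target_direction == 0` and builds each model with `min_confidence=0.6`. How the models use that threshold isn't shown here. One common pattern, assumed for illustration, is to abstain (signal 0) unless the winning class's predicted probability reaches the threshold.

This is a minimal sketch, and `signal_from_probability` is a hypothetical helper, not the project's implementation.

```python
def signal_from_probability(p_up: float, min_confidence: float = 0.6) -> int:
    """Map P(up) to +1 (long), -1 (short) or 0 (abstain) using a confidence threshold."""
    if not 0.0 <= p_up <= 1.0:
        raise ValueError(f"p_up must be a probability, got {p_up}")
    if not 0.5 <= min_confidence <= 1.0:
        # Below 0.5 both directions could qualify at once
        raise ValueError(f"min_confidence must be in [0.5, 1], got {min_confidence}")
    if p_up >= min_confidence:
        return 1
    if 1.0 - p_up >= min_confidence:
        return -1
    return 0
```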


In [None]:
# [STEP 10] Train Price Models
from models import XGBoostModel, LightGBMModel
from utils.config import SPECIALIST_MODELS_DIR

# Load features
features = pd.read_csv(PROCESSED_DATA_DIR / f"{TICKER}_features.csv", index_col=0, parse_dates=True)

feature_cols = [col for col in features.columns 
                if col not in ['target_return_1d', 'target_direction', 'open', 'high', 'low', 'close', 'volume']]
X = features[feature_cols].fillna(0)
y = features['target_direction']
mask = y != 0
X = X[mask]
y = y[mask]

print(f"Training on {len(X)} samples")

# Train XGBoost
xgb = XGBoostModel(min_confidence=0.6)
xgb_metrics = xgb.train(X, y)
print(f"✓ XGBoost: {xgb_metrics}")
xgb.save(SPECIALIST_MODELS_DIR / f"{TICKER}_xgb.model")

# Train LightGBM
lgb = LightGBMModel(min_confidence=0.6)
lgb_metrics = lgb.train(X, y)
print(f"✓ LightGBM: {lgb_metrics}")
lgb.save(SPECIALIST_MODELS_DIR / f"{TICKER}_lgb.txt")

print("✓ Price models trained")


## Step 11: Train Sentiment and Rule-Based Models


In [None]:
# [STEP 11] Train Sentiment and Rule-Based Models
from models import SentimentModel, RuleBasedModel

# Train Sentiment
sentiment = SentimentModel(min_confidence=0.6, use_pretrained=False)
sentiment_metrics = sentiment.train(X, y)
print(f"✓ Sentiment: {sentiment_metrics}")
sentiment.save(SPECIALIST_MODELS_DIR / f"{TICKER}_sentiment.pkl")

# Train Rule-based
rule = RuleBasedModel(min_confidence=0.6)
rule_metrics = rule.train(X, y)
print(f"✓ Rule-based: {rule_metrics}")
rule.save(SPECIALIST_MODELS_DIR / f"{TICKER}_rule.pkl")

print("✓ Sentiment and rule-based models trained")


## Step 12: Train Meta-Ensemble
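The meta-ensemble cell passes `volatility` and `news_intensity` columns as `market_features`. That suggests the ensemble conditions how it combines the specialists on the market regime. `MetaEnsemble`'s internals aren't shown, and a learned meta-model would fit these weights from data.

Purely to illustrate the shape of the idea, here is a hypothetical hand-set version: a weighted vote of specialist signals whose weights switch between a calm and a volatile regime. `combine_signals` and all weights are invented for this sketch.

```python
from typing import Dict


def combine_signals(
    signals: Dict[str, int],
    weights_calm: Dict[str, float],
    weights_volatile: Dict[str, float],
    volatility: float,
    vol_threshold: float,
) -> int:
    """Weighted vote of specialist signals (+1/-1/0), weights picked by volatility regime."""
    weights = weights_volatile if volatility > vol_threshold else weights_calm
    score = sum(weights.get(name, 0.0) * sig for name, sig in signals.items())
    if score > 0:
        return 1
    if score < 0:
        return -1
    return 0
```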


In [None]:
# [STEP 12] Train Meta-Ensemble
from ensemble import MetaEnsemble
from utils.config import META_MODEL_DIR

# Load all specialists
xgb = XGBoostModel()
xgb.load(SPECIALIST_MODELS_DIR / f"{TICKER}_xgb.model")

lgb = LightGBMModel()
lgb.load(SPECIALIST_MODELS_DIR / f"{TICKER}_lgb.txt")

sentiment = SentimentModel()
sentiment.load(SPECIALIST_MODELS_DIR / f"{TICKER}_sentiment.pkl")

rule = RuleBasedModel()
rule.load(SPECIALIST_MODELS_DIR / f"{TICKER}_rule.pkl")

specialists = [xgb, lgb, sentiment, rule]

# Extract market features
vol_cols = [col for col in X.columns if 'volatility' in col.lower()]
news_cols = [col for col in X.columns if 'news' in col.lower()]

market_features = pd.DataFrame(index=X.index)
if vol_cols:
    market_features['volatility'] = X[vol_cols[0]]
if news_cols:
    market_features['news_intensity'] = X[news_cols[0]]
market_features = market_features.fillna(0)

# Train ensemble
ensemble = MetaEnsemble(specialists, min_confidence=0.6)
meta_metrics = ensemble.train(X, y, market_features)
print(f"✓ Meta-ensemble: {meta_metrics}")
ensemble.save(META_MODEL_DIR / f"{TICKER}_meta_ensemble.pkl")

print("✓ Meta-ensemble trained")


## Step 13: Walk-Forward Backtest
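The backtest uses `train_window_days=252` (about one trading year) and `test_window_days=21` (about one trading month). The window scheme inside `WalkForwardBacktest` isn't shown. This sketch assumes a rolling training window that advances by one test window at a time, so every test block lies strictly after its training block. It also allows the final test block to be shorter than the others.

If the loaded ensemble is not retrained inside each window, its test periods overlap the data it was fitted on in the meta-ensemble cell.

`walk_forward_windows` is a hypothetical helper.

```python
from typing import Iterator, Tuple


def walk_forward_windows(
    n_samples: int, train_window: int = 252, test_window: int = 21
) -> Iterator[Tuple[range, range]]:
    """Yield (train_idx, test_idx) ranges; the window rolls forward by test_window."""
    if train_window <= 0 or test_window <= 0:
        raise ValueError("window sizes must be positive")
    start = 0
    while start + train_window < n_samples:
        train_end = start + train_window
        test_end = min(train_end + test_window, n_samples)  # last block may be shorter
        yield range(start, train_end), range(train_end, test_end)
        start += test_window
```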


In [None]:
# [STEP 13] Walk-Forward Backtest
from backtest import WalkForwardBacktest
from utils.config import RESULTS_DIR

# Load data
features = pd.read_csv(PROCESSED_DATA_DIR / f"{TICKER}_features.csv", index_col=0, parse_dates=True)
prices = pd.read_csv(PROCESSED_DATA_DIR / f"{TICKER}_prices.csv", index_col=0, parse_dates=True)

feature_cols = [col for col in features.columns 
                if col not in ['target_return_1d', 'target_direction', 'open', 'high', 'low', 'close', 'volume']]
X = features[feature_cols].fillna(0)
y = features['target_return_1d']

# Load ensemble
ensemble = MetaEnsemble(specialists)
ensemble.load(META_MODEL_DIR / f"{TICKER}_meta_ensemble.pkl")

# Run backtest
backtest = WalkForwardBacktest(ensemble, train_window_days=252, test_window_days=21)
results = backtest.run(X, prices, y)

print("\n" + "="*60)
print("BACKTEST RESULTS")
print("="*60)
for key, value in results['metrics'].items():
    print(f"{key}: {value:.4f}")

# Save
backtest.save_results(RESULTS_DIR / f"{TICKER}_backtest_results.pkl")
results['predictions'].to_csv(RESULTS_DIR / f"{TICKER}_predictions.csv")
results['actuals'].to_csv(RESULTS_DIR / f"{TICKER}_actuals.csv")

print("\n✓ Backtest complete - results saved to results/")
