# QuantConnect Native Selection Playbook

This notebook catalogs how we can leverage QuantConnect's first-party datasets to curate a shortlist of crypto assets. The focus is on identifying the raw data feeds that provide the highest signal-to-noise ratio, then outlining the indicator stacks and composite measures we will prototype in the primary `selection_playbook.ipynb`.

> Scope: native QC crypto + cross-asset datasets only; external/on-chain integrations remain staged in the existing selection notebook.


## QuantConnect Native Data Inventory

- **Crypto Spot History (`AddCrypto`)**  
  Tick, second, minute, hour, and daily resolution across major venues (Binance/US, Coinbase, Kraken, Bitfinex, Bybit): OHLCV trade bars plus bid/ask quote data. Used for price/volume analytics, momentum, volatility, and liquidity screens.
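- **Crypto Universe Data (`CryptoUniverse`)**  
  Daily per-venue price and volume snapshots that support liquidity-driven universe gating. `CoarseFundamental` is the US-equity coarse universe and does not cover crypto; crypto market cap comes from the CoinGecko dataset below.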
- **Equity/Index Futures & Options**  
  CME BTC/ETH futures, SPX/Nasdaq proxies, and option chains with greeks. Enables cross-asset lead/lag and macro-regime conditioning.
- **Alternative News & Sentiment Feeds**  
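  Benzinga, Tiingo, Brain Language Metrics, Quiver Social. Provide point-in-time headlines plus, where the vendor supplies them, sentiment and novelty scores. Coverage is largely equity-focused, so confirm per-dataset coverage for BTC/ETH before relying on it; crypto-linked equity proxies are the fallback.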
- **Macro & Econ Datasets (FRED/EODHD)**  
  Interest rates, inflation prints, DXY, liquidity gauges used for regime and correlation studies.
- **CoinGecko Market Cap Dataset**  
  Daily market cap, trading volume, and price per coin. Ranks, dominance, and turnover ratios derived from these are key for tiering and survivorship-aware benchmarking.
- **Lean Execution Telemetry (paper/live logs)**  
  Native fill/slippage logs for evaluating tradability and venue risk when turning selection outputs into trades.
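Ranks, dominance, and turnover can be derived from daily market-cap and volume panels. The sketch below assumes a hypothetical wide layout (one row per date, one column per coin) rather than the CoinGecko dataset's native schema. Dominance is computed only over the coins in the panel, not the whole market, and the inputs are toy numbers.

```python
import pandas as pd


def market_cap_structure(market_cap: pd.DataFrame, volume: pd.DataFrame) -> dict[str, pd.DataFrame]:
    """Derive rank, dominance, and turnover from daily panels.

    Assumes both frames share a date index and one column per coin
    (a hypothetical layout, not the dataset's native schema).
    """
    # Rank 1 = largest market cap on each date; ties share the best rank.
    rank = market_cap.rank(axis=1, ascending=False, method="min")
    # Dominance: each coin's share of the total cap across the tracked coins only.
    dominance = market_cap.div(market_cap.sum(axis=1), axis=0)
    # Turnover: traded value relative to market cap (higher = more actively traded).
    turnover = volume / market_cap
    return {"rank": rank, "dominance": dominance, "turnover": turnover}


# Toy values (arbitrary units), not real market data.
caps = pd.DataFrame(
    {"BTC": [1_200.0, 1_250.0], "ETH": [400.0, 380.0], "SOL": [80.0, 95.0]},
    index=pd.to_datetime(["2025-11-15", "2025-11-16"]),
)
vols = pd.DataFrame(
    {"BTC": [36.0, 50.0], "ETH": [20.0, 19.0], "SOL": [8.0, 9.5]},
    index=caps.index,
)
out = market_cap_structure(caps, vols)
print(out["dominance"].round(3))
```

A rank-delta filter can then difference `out["rank"]` over time, and turnover can feed the liquidity tiering.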


## Indicator & Composite Measure Blueprint

### Liquidity & Tradability
- **10d Median Dollar Volume + Venue Coverage Score**  
  Ensures we only trade assets with sufficient depth across multiple exchanges, using the median of daily dollar volume over the last 10 days. Pulls from QC history plus the crypto universe data; once live, the composite score also weights multi-venue fill logs.
- **Relative Volume Acceleration / Percentile**  
  Detects early liquidity surges by comparing short and long rolling average volume and tracking rolling percentile ranks. Strong readings may precede momentum breakouts; treat this as a hypothesis to validate.
- **Spread & Slippage Proxy (tick-based)**  
  Uses high-frequency bid/ask data (where available) to estimate expected impact. Combined with Lean execution telemetry to flag assets requiring throttled sizing.
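The liquidity screens above can be prototyped with plain pandas. This is a minimal sketch, assuming hourly bars with a `DatetimeIndex`. Dollar volume is summed per calendar day before the 10-day median is taken, so the window means days rather than bars. The 24/168-bar windows are illustrative defaults. For QC crypto history, the bid/ask inputs could come from the `bidclose`/`askclose` columns. The function names are hypothetical standalone sketches and may differ from the helpers in `qc_native_features`.

```python
import numpy as np
import pandas as pd


def median_daily_dollar_volume(close: pd.Series, volume: pd.Series, days: int = 10) -> pd.Series:
    """Rolling median of daily dollar volume, aggregated from intraday bars."""
    daily_dollar = (close * volume).resample("1D").sum()
    return daily_dollar.rolling(days).median()


def rel_volume_accel(volume: pd.Series, short: int = 24, long: int = 168) -> pd.DataFrame:
    """Short/long average-volume ratio and its change over `short` bars."""
    ratio = volume.rolling(short).mean() / volume.rolling(long).mean()
    return pd.DataFrame({"rel_vol": ratio, "rel_vol_accel": ratio.diff(short)})


def rolling_volume_pct(volume: pd.Series, window: int = 168) -> pd.Series:
    """Fraction of the trailing window at or below the current bar's volume."""
    return volume.rolling(window).apply(lambda w: np.mean(w <= w[-1]), raw=True)


def relative_spread(bid: pd.Series, ask: pd.Series) -> pd.Series:
    """Quoted bid/ask spread as a fraction of the mid price."""
    mid = (bid + ask) / 2.0
    return (ask - bid) / mid


# Toy hourly data: 12 days of constant price and volume.
idx = pd.date_range("2025-11-01", periods=24 * 12, freq=pd.Timedelta(hours=1))
close = pd.Series(100.0, index=idx)
volume = pd.Series(10.0, index=idx)

mdv = median_daily_dollar_volume(close, volume)
rv = rel_volume_accel(volume)
pct = rolling_volume_pct(volume)
spread = relative_spread(close - 0.1, close + 0.1)
print(mdv.dropna())
```

With 12 days of data and a 10-day window, only the last three daily values are defined. A relative spread of 0.002 (20 bps) on these toy quotes is the kind of reading the sizing throttle would key on.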

### Momentum & Regime Filters
- **Multi-horizon ROC Stack (24h / 72h / 168h)**  
  Captures breakout persistence. An asset is promoted when all three horizons agree in direction and its volatility-adjusted return (a Sharpe-like ratio) exceeds a threshold.
- **ATR / Realized Vol De-trended Momentum**  
  Divides momentum by current volatility (ATR or realized vol) to avoid chasing assets in blow-off tops, and penalizes selections where the volatility surge outpaces returns.
- **Price/Volume Ratio Drift**  
  Flags assets where price momentum outpaces volume momentum (or vice versa) to differentiate accumulation from exhaustion.
- **Cross-Asset Beta Residual**  
  Regress coin returns vs. CME futures + macro factors (DXY, SPX). Prioritize assets with positive residual momentum (idiosyncratic strength).
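A sketch of the ROC-stack promotion rule and the beta-residual score follows. It makes several assumptions. Inputs are hourly closes. Promotion requires every horizon to be positive (long-only view) and a volatility-scaled 168-bar return above a hypothetical threshold of 1.0. The residual score is a plain OLS on trailing returns with an intercept; only the factor-driven part is removed, so the residual sum equals the window length times alpha (idiosyncratic drift). The factor column `cme_btc` is a placeholder for CME futures, DXY, or SPX return series.

```python
import numpy as np
import pandas as pd


def roc_stack_signal(close: pd.Series, horizons=(24, 72, 168), vol_window: int = 168,
                     min_score: float = 1.0) -> pd.DataFrame:
    """Multi-horizon ROC with a volatility-scaled promotion rule."""
    rocs = pd.concat({f"roc_{h}h": close.pct_change(h) for h in horizons}, axis=1)
    aligned_up = (rocs > 0).all(axis=1)  # every horizon positive (NaN counts as False)
    # Volatility-normalized momentum: longest-horizon return divided by the
    # realized volatility expected over that horizon (square-root-of-time scaling).
    longest = max(horizons)
    hourly_vol = close.pct_change().rolling(vol_window).std()
    vol_scaled = rocs[f"roc_{longest}h"] / (hourly_vol * np.sqrt(longest))
    return rocs.assign(vol_scaled_mom=vol_scaled, promote=aligned_up & (vol_scaled > min_score))


def residual_momentum(asset_ret: pd.Series, factor_ret: pd.DataFrame, window: int = 168) -> float:
    """Summed return not explained by factor exposure (alpha plus noise) over the window."""
    data = pd.concat([asset_ret.rename("asset"), factor_ret], axis=1).dropna().tail(window)
    factors = data[factor_ret.columns].to_numpy()
    y = data["asset"].to_numpy()
    # OLS with an intercept: y = alpha + factors @ beta + noise.
    X = np.column_stack([np.ones(len(y)), factors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    beta = coef[1:]
    # Remove only the factor-driven part so alpha counts as idiosyncratic strength.
    residual = y - factors @ beta
    return float(residual.sum())


# Toy hourly paths: steady drift plus a deterministic wiggle, and a steady decline.
steps = np.arange(400)
rising = pd.Series(100 * np.exp(np.cumsum(0.001 + 0.002 * np.sin(steps))))
falling = pd.Series(100 * 0.999 ** steps)
signal_up = roc_stack_signal(rising)
signal_down = roc_stack_signal(falling)

# Toy factor returns; the asset has beta 0.5 and alpha 0.001 per bar by construction.
rng = np.random.default_rng(0)
factor = pd.DataFrame({"cme_btc": rng.normal(0, 0.01, 200)})
asset = 0.001 + 0.5 * factor["cme_btc"]
print(residual_momentum(asset, factor))  # about 168 * 0.001
```

Dropping the intercept from the regression would instead push alpha into the betas, which is why it is kept and then excluded from the subtraction.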

### Sentiment & News Flow
- **Headline Polarity Velocity (Benzinga/Tiingo)**  
  Track the rolling z-score of sentiment changes; an asset qualifies when news momentum confirms the price move (same direction).
- **Brain Language Novelty × Price Reaction**  
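  Flag assets whose high-novelty headlines coincide with a strong price reaction, and down-weight stale, repeated headlines. Whether high novelty historically precedes sustained moves is a hypothesis to backtest.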
- **Quiver Social Dominance Divergence**  
  Detect crowd attention surges relative to price—useful for early entries or fade setups depending on momentum alignment.
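The sentiment triggers reduce to z-scores on aligned hourly series. This sketch assumes a pre-aggregated hourly polarity series. If a feed supplies only headline text, a polarity score would have to be computed first. It also assumes an attention series from a social dataset. The windows and the 1.5 z threshold are hypothetical.

```python
import numpy as np
import pandas as pd


def rolling_zscore(series: pd.Series, window: int) -> pd.Series:
    """(value - rolling mean) / rolling std over the trailing window."""
    return (series - series.rolling(window).mean()) / series.rolling(window).std()


def polarity_confirms_price(polarity: pd.Series, price: pd.Series, horizon: int = 24,
                            window: int = 72, z_min: float = 1.5) -> pd.Series:
    """True where a sentiment-velocity surge agrees in sign with the price move."""
    velocity = polarity.diff(horizon)  # change in average headline polarity
    z = rolling_zscore(velocity, window)
    price_move = price.pct_change(horizon)
    return (z.abs() > z_min) & (np.sign(z) == np.sign(price_move))


def attention_divergence(attention: pd.Series, price: pd.Series, window: int = 72) -> pd.Series:
    """Attention z-score minus return z-score; large positive = crowd ahead of price."""
    return rolling_zscore(attention, window) - rolling_zscore(price.pct_change(), window)


# Toy hourly polarity: a quiet wiggle, then a sharp positive jump on the last bar.
n = 200
idx = pd.date_range("2025-11-01", periods=n, freq=pd.Timedelta(hours=1))
polarity = pd.Series(0.01 * np.sin(np.arange(n)), index=idx)
polarity.iloc[-1] = 0.8
price = pd.Series(100 * 1.001 ** np.arange(n), index=idx)
confirm = polarity_confirms_price(polarity, price)
print(confirm.iloc[-1])
```

Whether a large positive `attention_divergence` signals an early entry or a fade depends on the momentum alignment described above.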

### Structural & Fundamental Filters
- **CoinGecko Market Cap Rank Delta**  
  Measure how quickly an asset climbs tiers; cross-reference with liquidity to ensure promote/demote decisions keep capacity in check.
- **Volume-Weighted Venue Reliability Score**  
  Rate exchanges by uptime/slippage using Lean logs, then discount assets heavily concentrated on unreliable venues.
- **Regime Overlay (Macro Trigger Grid)**  
  Combine rate trend, DXY, and BTC dominance to toggle between risk-on and defensive selection templates.
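The structural filters can be sketched as small pure functions: a rank delta on a daily rank series, a volume-weighted venue score, and a two-of-three vote grid for the regime overlay. The vote directions are assumptions, as are the lookbacks: falling rates, a weakening dollar, and falling BTC dominance are read as risk-on. Venue scores would come from the Lean fill/slippage logs. All function names are hypothetical.

```python
import pandas as pd


def rank_delta(rank: pd.Series, periods: int = 30) -> pd.Series:
    """Positive when an asset climbs tiers (its rank number falls) over `periods` rows."""
    return rank.shift(periods) - rank


def venue_reliability(volume_by_venue: pd.Series, venue_score: pd.Series) -> float:
    """Volume-share-weighted venue score; venues without a score count as 0."""
    share = volume_by_venue / volume_by_venue.sum()
    return float((share * venue_score.reindex(share.index).fillna(0.0)).sum())


def regime(rate: pd.Series, dxy: pd.Series, btc_dominance: pd.Series, lookback: int = 20) -> pd.Series:
    """'risk_on' when at least two of three trend votes are supportive, else 'defensive'."""
    votes = (
        (rate.diff(lookback) < 0).astype(int)             # falling rates (assumed risk-on)
        + (dxy.diff(lookback) < 0).astype(int)            # weakening dollar (assumed risk-on)
        + (btc_dominance.diff(lookback) < 0).astype(int)  # rotation into alts (assumed risk-on)
    )
    # During warm-up the diffs are NaN, so no votes are cast and the label is defensive.
    return votes.map(lambda v: "risk_on" if v >= 2 else "defensive")


ranks = pd.Series([12, 10, 7])
print(rank_delta(ranks, periods=2).iloc[-1])  # 5.0: climbed from #12 to #7

# Toy venue volume shares and reliability scores.
venue_share = pd.Series({"coinbase": 60.0, "kraken": 40.0})
scores = pd.Series({"coinbase": 0.9, "kraken": 0.5})
print(venue_reliability(venue_share, scores))

# Toy macro paths: rates and DXY falling, BTC dominance rising.
days = pd.date_range("2025-01-01", periods=30, freq="D")
rates = pd.Series([5.0 - 0.01 * i for i in range(30)], index=days)
dxy = pd.Series([105.0 - 0.1 * i for i in range(30)], index=days)
dominance = pd.Series([50.0 + 0.1 * i for i in range(30)], index=days)
print(regime(rates, dxy, dominance).iloc[-1])  # risk_on (2 of 3 votes)
```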

### Composite Selector
- **Selection Score = (Momentum × Confirmation) + (Liquidity × Reliability) + Sentiment Modifier**  
  The base stack multiplies normalized momentum by a confirmation factor (news + social). In the liquidity term, reliability acts as a gating multiplier on liquidity, so heavy volume on unreliable venues earns little credit. The sentiment modifier adds a bounded positive or negative nudge so it cannot overpower the momentum and liquidity terms. The final score determines the shortlist tier (Core, Explore, Watchlist).
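One way to compute the composite for a single rebalance date is to percentile-rank momentum and liquidity across assets so they share a 0 to 1 scale. Confirmation and reliability are treated as factors already in [0, 1], and sentiment is clipped to a fixed band. The column names, the 0.25 clip, and the tier cutoffs (0.6 and 1.2) are hypothetical and would need calibrating in the backtest. The factor values below are toy inputs.

```python
import pandas as pd


def selection_scores(factors: pd.DataFrame, sentiment_cap: float = 0.25) -> pd.DataFrame:
    """Cross-sectional composite score and tier for one rebalance date.

    Expected (hypothetical) columns, one row per asset: momentum, confirmation,
    liquidity, reliability, sentiment.
    """
    # Percentile-rank momentum and liquidity so both lie in (0, 1].
    momentum = factors["momentum"].rank(pct=True)
    liquidity = factors["liquidity"].rank(pct=True)
    score = (
        momentum * factors["confirmation"]           # momentum x confirmation
        + liquidity * factors["reliability"]         # reliability gates liquidity credit
        + factors["sentiment"].clip(-sentiment_cap, sentiment_cap)  # bounded nudge
    )
    # Bins are right-inclusive: (-inf, 0.6], (0.6, 1.2], (1.2, inf).
    tiers = pd.cut(score, bins=[-float("inf"), 0.6, 1.2, float("inf")],
                   labels=["Watchlist", "Explore", "Core"])
    return pd.DataFrame({"score": score, "tier": tiers}).sort_values("score", ascending=False)


factors = pd.DataFrame(
    {
        "momentum": [0.05, 0.02, -0.01],
        "confirmation": [0.8, 0.5, 0.3],
        "liquidity": [1e9, 5e8, 2e8],
        "reliability": [0.95, 0.9, 0.7],
        "sentiment": [0.1, -0.4, 0.05],
    },
    index=["BTC", "ETH", "SOL"],
)
out = selection_scores(factors)
print(out)
```

Here ETH's -0.4 sentiment is clipped to -0.25, which lowers its score without pushing it out of the Explore tier.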


```python
from datetime import timedelta
from pathlib import Path

import pandas as pd

try:
    from QuantConnect import Resolution  # type: ignore[import]
    from QuantConnect.Research import QuantBook  # type: ignore[import]
except ImportError as exc:
    raise RuntimeError("Run inside a QuantConnect Research environment.") from exc

# Project-local feature helpers (research/scripts/qc_native_features.py).
from research.scripts import qc_native_features as qc_feat

qb = QuantBook()
# Subscribe to hourly bars; keep the Symbol objects for the History request.
SYMBOLS = [qb.AddCrypto(ticker, Resolution.Hour).Symbol for ticker in ["BTCUSD", "ETHUSD", "SOLUSD"]]

LOOKBACK = timedelta(days=60)
FEATURE_CACHE = Path("data/native_features")
FEATURE_CACHE.mkdir(parents=True, exist_ok=True)

history = qb.History(SYMBOLS, LOOKBACK, Resolution.Hour)
if history.empty:
    raise ValueError("History request returned an empty frame. Check data permissions and the lookback window.")

# History is indexed by (symbol, time); move symbols into columns and drop any tz info.
closes = history.close.unstack(level=0).tz_localize(None)
highs = history.high.unstack(level=0).tz_localize(None)
lows = history.low.unstack(level=0).tz_localize(None)
volumes = history.volume.unstack(level=0).tz_localize(None)

feature_snapshots = {}
for symbol in SYMBOLS:
    key = symbol.Value
    price = closes[key].dropna()
    # Align high/low/volume to the bars that have a close.
    high = highs[key].reindex(price.index)
    low = lows[key].reindex(price.index)
    volume = volumes[key].reindex(price.index)

    # Windows are counted in bars: at hourly resolution, 24/72/168 bars = 1/3/7 days.
    # A 10-day liquidity lookback would be 240 hourly bars; check how
    # qc_feat.liquidity_metrics interprets `lookback` before relying on it.
    features = pd.concat(
        [
            qc_feat.multi_horizon_roc(price, windows=(24, 72, 168)),
            qc_feat.normalized_momentum(price, window=24, vol_window=72),
            qc_feat.atr_percent(high, low, price),
            qc_feat.liquidity_metrics(price, volume, lookback=10),
            qc_feat.relative_volume(volume),
            qc_feat.volume_percentile(volume),
            qc_feat.price_volume_ratio(price, volume),
        ],
        axis=1,
    ).dropna()
    feature_snapshots[key] = features.tail()  # last five rows for inspection
    # Writing Parquet requires pyarrow or fastparquet.
    features.to_parquet(FEATURE_CACHE / f"{key.lower()}_native_features.parquet")

feature_snapshots
```


```
{'BTCUSD':                       roc_24h   roc_72h  roc_168h  normalized_mom_24_72  \
 time                                                                      
 2025-11-16 01:00:00  0.004996 -0.061972 -0.061082              0.225482   
 2025-11-16 02:00:00 -0.000576 -0.063175 -0.063401             -0.025854   
 2025-11-16 03:00:00 -0.008726 -0.063733 -0.059713             -0.391476   
 2025-11-16 04:00:00 -0.008314 -0.064259 -0.059986             -0.372566   
 2025-11-16 05:00:00 -0.005238 -0.064104 -0.060503             -0.234143   
 
                      atr_percent  liquidity_score_10  
 time                                                  
 2025-11-16 01:00:00     0.005615        1.473324e+09  
 2025-11-16 02:00:00     0.005250        1.505353e+09  
 2025-11-16 03:00:00     0.004852        1.285757e+09  
 2025-11-16 04:00:00     0.004643        1.273466e+09  
 2025-11-16 05:00:00     0.004475        6.232825e+08  ,
 'ETHUSD':                       roc_24h   roc_72h  roc_168h  n
```

## Next Steps

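1. Extend `research/scripts/qc_native_features.py` (already imported by the prototype cell) with reusable loaders/functions for the remaining indicators: spread/slippage, cross-asset beta residual, sentiment, market-cap rank delta, venue reliability, and the regime overlay.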
2. Mirror the selection score template inside `selection_playbook.ipynb` and connect it to the new data pipelines.
3. Backtest candidate portfolios (Core / Explore / Watchlist) against QC historical data to validate hit rates and drawdown profiles.
4. Promote successful composites into Lean universes and signal-generation notebooks.
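For step 3, hit rate and maximum drawdown per tier can be computed from periodic portfolio returns. This sketch uses toy returns, not backtest results. Drawdown is measured against a running peak floored at the starting capital, so an early loss still counts. The tier return layout (one column per tier) is an assumption.

```python
import pandas as pd


def hit_rate(returns: pd.Series) -> float:
    """Fraction of periods with a positive return (zero-return periods count as misses)."""
    returns = returns.dropna()
    return float((returns > 0).mean())


def max_drawdown(returns: pd.Series) -> float:
    """Largest peak-to-trough decline of the compounded equity curve (a negative number)."""
    equity = (1.0 + returns.fillna(0.0)).cumprod()
    # Floor the running peak at 1.0 (starting capital) so a first-period loss is captured.
    peak = equity.cummax().clip(lower=1.0)
    return float((equity / peak - 1.0).min())


# Toy per-period returns for two tiers.
tier_returns = pd.DataFrame(
    {
        "Core": [0.02, -0.01, 0.03, -0.04, 0.01],
        "Explore": [-0.05, 0.02, 0.01, 0.0, 0.03],
    }
)
summary = pd.DataFrame(
    {
        "hit_rate": tier_returns.apply(hit_rate),
        "max_drawdown": tier_returns.apply(max_drawdown),
    }
)
print(summary)
```

Without the floor on the running peak, the Explore tier's opening 5% loss would not register as a drawdown.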
