Context
Phase 4j scout (PR #119, merged 2026-05-19 at SHA `f0ade65b`) shipped the `pyqlib` install + Alpha158 158-feature manifest + 6 offline tests + access-path investigation. This issue tracks the full integration PR that consumes the scout's foundation and wires Qlib's Alpha158 per-stock per-date features into the QuantRank composite layer.

The scout intentionally deferred all production wiring per PR #119 §"Out of scope". This issue is that deferred work.
Access-path foundation (locked by scout)
The integration PR builds on the scout's `compute/ingest/qlib_features.py` module docstring, which records the 2026-05-19 verification:

- ✅ `pyqlib` 0.9.7 installs cleanly via the `[factors]` extra
- ✅ MIT license (no commercial complication, unlike JKP's CC BY-NC 4.0)
- `qlib.contrib.data.handler.Alpha158` exposes the 158-feature surface
- The `ALPHA158_FEATURE_NAMES` manifest is hardcoded; a drift-detector test catches upstream changes
- Naming the module `qlib_features.py` (not `qlib.py`) avoids a Python namespace collision with the installed `qlib` package
- The public `provider_uri` data covers CN A-shares only; the US universe is BYO via local `.bin` files
- No `[minimal]` extra exists: `pip install pyqlib` pulls ~150-180 MB of heavy transitives (mlflow / lightgbm / cvxpy / pymongo / redis / gym / jupyter)

Qlib's distinct shape vs OSAP/JKP
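Critical reminder from the scout's PR body: Qlib is structurally different from prior factor libraries.

- Per-stock features are native: Alpha158 emits 158 features per (stock, date).
- A BYO data adapter is required: yfinance OHLCV → Qlib `.bin`.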
This means the OSAP/JKP integration pattern does NOT apply here. There's no factor-returns CSV to proxy/regress against; Alpha158 produces per-stock features directly. The integration PR's shape is therefore substantially different from PR #112 (Phase 4h full integration) and the eventual Phase 4i full integration (#115).
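The drift-detector test recorded in the scout's docstring can be pictured as a set-and-order comparison between the pinned `ALPHA158_FEATURE_NAMES` manifest and whatever name list the installed `pyqlib` reports. This is a minimal sketch: `manifest_drift` and `ManifestDrift` are hypothetical names, and how the live upstream list is obtained is left as an assumption rather than tied to a specific Qlib call.

```python
from typing import Iterable, NamedTuple, Tuple


class ManifestDrift(NamedTuple):
    added: Tuple[str, ...]    # names upstream has but the pinned manifest lacks
    removed: Tuple[str, ...]  # names pinned in the manifest but gone upstream
    reordered: bool           # same names, different column order


def manifest_drift(pinned: Iterable[str], upstream: Iterable[str]) -> ManifestDrift:
    """Compare a hardcoded feature-name manifest against a live upstream list."""
    pinned_list, upstream_list = list(pinned), list(upstream)
    pinned_set, upstream_set = set(pinned_list), set(upstream_list)
    added = tuple(n for n in upstream_list if n not in pinned_set)
    removed = tuple(n for n in pinned_list if n not in upstream_set)
    # Order only matters once the name sets agree; it changes column positions.
    reordered = not added and not removed and pinned_list != upstream_list
    return ManifestDrift(added, removed, reordered)
```

A test would then assert `manifest_drift(ALPHA158_FEATURE_NAMES, upstream_names) == ManifestDrift((), (), False)`, so any upstream rename fails loudly with the exact names involved.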
Scope IN (5 deferred work items from PR #119 §"Out of scope")
(a) yfinance-to-Qlib BYO adapter — THE precondition
This is the largest single piece of work and the precondition for everything else. Qlib publishes no public US data bundle, so the integration PR must convert QuantRank's existing `compute/cache/prices/*.parquet` files (yfinance OHLCV) into Qlib's `.bin` format, which `qlib.init(provider_uri=...)` can read.

Approximate steps:
- Custom universe registration: Qlib expects an "instruments" file listing stock symbols + active date ranges. Build `sp500.txt` from the universe-resolution step that already runs in `compute/main.py`.
- Per-ticker `.bin` conversion: for each of the 502 tickers, walk the cached yfinance parquet and write OPEN / HIGH / LOW / CLOSE / VWAP / VOLUME columns to `.bin` files at `QLIB_DATA_CACHE/features/<ticker>/`.
- Calendar file: Qlib expects a `calendars/day.txt` listing all trading dates in the universe.
- `dump_bin` scaffolding: `pyqlib`'s PyPI wheel does NOT bundle `scripts/dump_bin.py` (verified during scout). Either vendor a minimal port (~50 LOC) into `compute/ingest/qlib_features.py`, pin `pyqlib` to a release that includes it, or write our own `.bin` writer from Qlib's low-level binary format spec.
- Refresh cadence: the Qlib bundle stays valid for 31 days per `config.QLIB_DATA_MAX_AGE_DAYS`; the weekly cron re-converts only when stale.

Effort: ~150 LOC + ~80 LOC tests. ~2 days.
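The "write our own `.bin` writer" option could look roughly like the stdlib-only sketch below. It assumes the day-frequency layout we understand `dump_bin.py` to produce: `calendars/day.txt` with one ISO date per line, `instruments/<market>.txt` with tab-separated symbol / start / end, and `features/<ticker>/<field>.day.bin` holding little-endian float32 values whose first element is the calendar index of the ticker's first date. All function names are hypothetical, and the layout should be checked against Qlib's data-preparation docs before relying on it.

```python
import struct
import time
from datetime import date
from pathlib import Path
from typing import Dict, List, Optional, Tuple

# The six price/volume fields the adapter writes per ticker.
FIELDS = ("open", "high", "low", "close", "vwap", "volume")


def write_calendar(root: Path, dates: List[date]) -> List[date]:
    """Write calendars/day.txt: one sorted, de-duplicated ISO date per line."""
    calendar = sorted(set(dates))
    path = root / "calendars" / "day.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("".join(f"{d.isoformat()}\n" for d in calendar))
    return calendar


def write_instruments(root: Path, market: str, spans: Dict[str, Tuple[date, date]]) -> None:
    """Write instruments/<market>.txt as tab-separated SYMBOL, first date, last date."""
    path = root / "instruments" / f"{market}.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    rows = [f"{sym.upper()}\t{s.isoformat()}\t{e.isoformat()}\n" for sym, (s, e) in sorted(spans.items())]
    path.write_text("".join(rows))


def write_feature_bin(root: Path, ticker: str, field: str,
                      calendar: List[date], series: Dict[date, float]) -> Path:
    """Write features/<ticker>/<field>.day.bin as little-endian float32.

    Element 0 is the calendar index of the ticker's first date; the remaining
    elements cover every calendar day from first to last date, NaN where the
    ticker has no bar.
    """
    start = calendar.index(min(series))
    end = calendar.index(max(series))
    values = [series.get(d, float("nan")) for d in calendar[start : end + 1]]
    path = root / "features" / ticker.lower() / f"{field}.day.bin"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(struct.pack(f"<{len(values) + 1}f", float(start), *values))
    return path


def bundle_is_stale(root: Path, max_age_days: int, now: Optional[float] = None) -> bool:
    """True if calendars/day.txt is missing or older than max_age_days."""
    path = root / "calendars" / "day.txt"
    if not path.exists():
        return True
    now = time.time() if now is None else now
    return now - path.stat().st_mtime > max_age_days * 86_400
```

In the weekly cron, `bundle_is_stale(QLIB_DATA_CACHE, config.QLIB_DATA_MAX_AGE_DAYS)` would gate the rebuild, which then writes the calendar once, the `sp500` instruments file once, and one `write_feature_bin` call per ticker per entry in `FIELDS`.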
(b) Full Alpha158 feature compute on 502-ticker universe
Once the BYO adapter is live:
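- `init_qlib(QLIB_DATA_CACHE)`: single init per weekly compute run (`qlib.init` is global state)
- `fetch_alpha158_features(instruments="sp500", start_time=..., end_time=...)`: returns the Alpha158 panel as a DataFrame with one row per (date, ticker) and 158 feature columns (up to 502 × N_dates rows)
- Decide whether to materialize for the full S&P 500 universe in production OR sample (e.g., latest single date only: ~502 × 158 = 79,316 floats, lightweight)

Effort: ~80 LOC + tests. ~1 day.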
(c) Per-feature cross-validation framework
PBO/DSR (used by Phase 4h OSAP for long-short factor returns) does NOT directly apply to Alpha158's per-stock per-date features. Bailey 2014 PBO is rank-based across strategy time-series; Alpha158 emits cross-sectional features per date, not strategy returns. The integration PR needs a different validation surface.
Likely replacement: walk-forward IC scoring per feature. For each of the 158 features, compute the Spearman rank correlation between the feature value at t and the forward 1-month or 3-month return, over a rolling window. Surface accepted features (absolute mean IC > 0.02 over the rolling window, per `defense-infrastructure/PLAN.md:121`) into composite blending; reject the rest.

Walk-forward CV is the canonical version, but per `defense-infrastructure/PLAN.md:270` the full walk-forward + purged + embargoed CV is the stronger Phase 5 backtest-infra version. Phase 4j integration ships with simpler rolling-12m IC validation as a stopgap.

Effort: ~120 LOC + tests. ~2 days.
(d) Schema additions
Schema triple lockstep edit (per `AGENTS.md:229-231`):

- `compute/output/schemas.py`: add to `StockDetail`:
  - `qlib_features: dict[str, float] | None = None`: per-stock subset of accepted Alpha158 features (curated cross-section at the `as_of` date)
  - `qlib_blended_score: float | None = None`: optional blended score IF feature blending into the composite is wired (see §(e) below)
- `compute/output/schemas.py`: add to `Metadata`:
  - `qlib_features_used: list[str] | None = None`: Alpha158 features that passed the IC gate
  - `qlib_features_excluded: list[str] | None = None`: features rejected by the IC gate
  - `qlib_features_ic_12m: dict[str, float] | None = None`: per-feature rolling-12m IC (observability)
  - `qlib_features_coverage_pct: dict[str, float] | None = None`: per-feature S&P 500 coverage %
- `compute/config.py:30`: `SCHEMA_VERSION` `0.9.1-phase4h.2` → `0.10.0-phase4j` (MINOR bump; new phase boundary)
- `frontend/lib/types.ts`: mirror the Pydantic additions
- `frontend/lib/schema-snapshot.json`: regenerate via `python -m compute.output.schema_check --update-snapshot`

Effort: ~50 LOC + ~30 LOC tests. ~0.5 day.
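The `SCHEMA_VERSION` change is described as a MINOR bump. A small guard test could encode that rule so a later edit cannot silently skip or misclassify a bump. This sketch assumes versions follow `MAJOR.MINOR.PATCH` with an optional free-form `-phase…` tag; `parse_schema_version` and `is_minor_bump` are hypothetical helpers, not existing QuantRank code.

```python
import re
from typing import Tuple

# Assumed format: MAJOR.MINOR.PATCH with an optional "-phase..." tag.
_SCHEMA_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.]+))?$")


def parse_schema_version(version: str) -> Tuple[int, int, int, str]:
    """Split a schema version into (major, minor, patch, tag)."""
    match = _SCHEMA_VERSION.match(version)
    if match is None:
        raise ValueError(f"unrecognised SCHEMA_VERSION: {version!r}")
    major, minor, patch, tag = match.groups()
    return int(major), int(minor), int(patch), tag or ""


def is_minor_bump(old: str, new: str) -> bool:
    """Same MAJOR, MINOR + 1, PATCH reset to 0; the phase tag is not compared."""
    o_major, o_minor, _, _ = parse_schema_version(old)
    n_major, n_minor, n_patch, _ = parse_schema_version(new)
    return n_major == o_major and n_minor == o_minor + 1 and n_patch == 0
```

For this issue the check would be `is_minor_bump("0.9.1-phase4h.2", "0.10.0-phase4j")`, which holds.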
(e) `compute/main.py` wiring decision

Open question, deferred to integration-PR planning time: how does Alpha158 feed the composite?
Three plausible patterns:
1. Observability-only: write per-feature IC + per-stock features into metadata + `StockDetail` but do NOT blend into the composite. Rule 16 trivially holds. Lowest risk.
2. … `compute_composite()` (preserves the `PHASE3_WEIGHTS` sum-to-1.0 invariant at `compute/scoring/composite.py:43-45`). Mirror the PR #112 pattern (Phase 4h OSAP integration).
3. … `phase-5/meta-label/PLAN.md`) but NOT the Phase 4 composite. Waits for Phase 5 backtest infra (PR #75, "4b: defense-infrastructure (cross-source validator + PBO/DSR gate + IC-decay monitor)").

Recommended default: (1) observability-only for the integration PR. Defer (2)/(3) to Phase 5+ once IC evidence accumulates from production diagnostics.
Effort: ~50 LOC + tests. ~0.5 day.
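If pattern (2) is ever chosen, the new slot has to coexist with the `PHASE3_WEIGHTS` sum-to-1.0 invariant. One simple way to do that is proportional rescaling: give the new component its share and scale every existing weight by `1 - share`. This is an illustrative assumption, not the PR #112 method; `add_weight_slot` and the weight keys in the usage line are hypothetical.

```python
import math
from typing import Dict


def add_weight_slot(base: Dict[str, float], name: str, weight: float) -> Dict[str, float]:
    """Add `name` with share `weight`, scaling existing weights by (1 - weight).

    Existing weights keep their relative proportions and the total stays 1.0.
    """
    if not 0.0 <= weight < 1.0:
        raise ValueError("weight must be in [0, 1)")
    if name in base:
        raise ValueError(f"{name!r} already has a weight")
    if not math.isclose(sum(base.values()), 1.0):
        raise ValueError("base weights must sum to 1.0")
    blended = {key: w * (1.0 - weight) for key, w in base.items()}
    blended[name] = weight
    return blended
```

For example, `add_weight_slot({"momentum": 0.5, "value": 0.3, "quality": 0.2}, "qlib_alpha158", 0.1)` shrinks the three existing weights to 0.45 / 0.27 / 0.18 and the total remains 1.0.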
Top-5 rotation impact analysis
Rule 16 lock applies as for all prior factor libraries: Top-5 ranking stays on raw `composite_score`. `qlib_blended_score` (if wired in §(e)) is informational only. Confirm that the `entered_top5` / `exited_top5` distributions are unchanged when `qlib_blended_score` is excluded from ranking.

Effort: ~30 LOC test + spot-check. ~0.5 day.
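The spot-check can be phrased as a property: computing the week-over-week rotation from raw `composite_score` gives the same `entered_top5` / `exited_top5` sets whether or not records carry a `qlib_blended_score`. A minimal sketch, assuming records are plain dicts keyed by `ticker`; the helper names are hypothetical and ties are broken by ticker for determinism.

```python
from typing import Dict, Iterable, List, Set, Tuple


def composite_scores(records: Iterable[dict]) -> Dict[str, float]:
    """Project StockDetail-like records onto raw composite_score (Rule 16).

    Any qlib_blended_score field is deliberately ignored.
    """
    return {r["ticker"]: r["composite_score"] for r in records}


def top_k(scores: Dict[str, float], k: int = 5) -> List[str]:
    """Highest scores first; ticker name breaks ties deterministically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def rotation(prev: Dict[str, float], curr: Dict[str, float]) -> Tuple[Set[str], Set[str]]:
    """Return (entered_top5, exited_top5) between two weekly runs."""
    before, after = set(top_k(prev)), set(top_k(curr))
    return after - before, before - after
```

Running `rotation` on historical weekly snapshots with and without the Qlib field, and asserting equality, turns the Rule 16 lock into a regression test.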
Triggers (open implementation PR when EITHER fires)
- Phase 5 backtest infra lands (`.claude/skills/phase-5/backtest-infrastructure/PLAN.md`): provides the canonical walk-forward + purged + embargoed CV that replaces this issue's rolling-12m IC stopgap. Recommended trigger.
- Analyst / user feedback indicates Alpha158 features are needed for an in-flight analysis use case (forces the BYO-adapter + integration timeline ahead of Phase 5).
Effort estimate
… `compute/main.py` wiring (default = observability-only)

Slightly larger than Phase 4h's ~1160 LOC because:
Sequencing relative to other Phase 4+ tracks
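- Phase 4k scout (IPCA): the final factor scout, ships separately
- Phase 5 backtest infra: strongly preferred trigger for this 4j.1 (walk-forward CV supersedes the rolling-12m IC stopgap)

Tag `v1.1.0-phase4` is gated on all 4 factor library scouts (4h ✅ + 4i ✅ + 4j ✅ + 4k pending) and their respective integration PRs (4h ✅ + 4h.2 ✅ + 4i.1 + 4j.1 + 4k.1) all merging. This issue is the gating item for 4j specifically.

Out of scope for this issue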
- Qlib's built-in model training (LightGBM, MLP, etc.): Alpha158 is feature engineering only; ML model training is Phase 5 ML meta-learner work
- Qlib's portfolio optimization (cvxpy-based): out of scope; QuantRank's Top-5 selection is Rule 16 composite-based
- `scripts/dump_bin.py` upstream contribution: if we end up vendoring a port, consider upstreaming it to `microsoft/qlib` as a follow-up community contribution (out of scope for QuantRank itself)
- … `pip install pyqlib` but unused by QuantRank. An upstream `[minimal]` extra would help; for now we accept the ~150-180 MB install footprint per the PR #119 disclosure

Related
- … (`fbd1acf4`)
- … (`2125aea8`): observability follow-up
- `SCHEMA_VERSION`: `0.9.1-phase4h.2` (current) → `0.10.0-phase4j` (this issue's target)
- `.claude/skills/phase-4/qlib-alpha158-fit/PLAN.md` (if it exists; else create a stub during integration-PR planning)
- `.claude/skills/phase-5/backtest-infrastructure/PLAN.md`

🤖 Filed by Claude Code via the Anthropic SDK after PR #119 (Phase 4j Qlib scout) shipped at `f0ade65b`.

Generated by Claude Code