feat(ingest): Qlib scout — pyqlib MIT install + Alpha158 handler smoke + 158-feature manifest #119
Merged
…e + 158-feature manifest

Phase 4j scout PR. Mirrors the proven Phase 4i scout pattern (PR #114) for Microsoft Qlib's Alpha158 feature library. Scope is install + API surface + manifest verification ONLY. The yfinance-to-Qlib BYO adapter and the full Alpha158 feature compute on the 502-ticker universe ship in a follow-on integration PR.

**Pre-plan access-path discovery** (verified 2026-05-19; full record in the ``compute/ingest/qlib_features.py`` module docstring):

1. **PyPI package**: ``pyqlib`` 0.9.7 (0.9.6 also available). Other candidate names (``qlib``, ``microsoft-qlib``) return 404.
2. **License**: MIT (verified by inspecting the wheel METADATA: ``Classifier: License :: OSI Approved :: MIT License``). **No CC BY-NC complication** like JKP. Safe for the Phase 6+ commercial roadmap.
3. **Data init**: ``qlib.init(provider_uri=..., region=REG_US)`` where ``REG_US = "us"``. **Qlib publishes NO public US data bundle.** ``provider_uri`` defaults to ``~/.qlib/qlib_data/cn_data`` (Chinese A-share, irrelevant for QuantRank), so the US universe is BYO via local ``.bin`` files.
4. **Alpha158 surface**: ``qlib.contrib.data.handler.Alpha158`` → ``handler.fetch(col_set="feature")`` returns a DataFrame with a ``(datetime, instrument)`` MultiIndex × 158 feature columns. The 158-name manifest comes from ``Alpha158DL.get_feature_config()[1]``. It was captured at scout time and hardcoded for stability, and offline test 3 below locks it against upstream drift.

**Module** (``compute/ingest/qlib_features.py``, 186 LOC including docstring):

- Module name locked per architectural review: NOT ``compute/ingest/qlib.py``. Python's import resolution would treat the latter as the ``qlib`` package and shadow the actual installed PyPI package, breaking the entire integration. A distinct module name avoids the namespace collision.
- ``QLIB_INSTRUMENTS_UNIVERSE = "sp500"``: custom universe ID. The integration PR registers it against Qlib's instruments API.
- ``ALPHA158_FEATURE_NAMES: tuple[str, ...]``: 158-name manifest, hardcoded from ``Alpha158DL.get_feature_config()[1]`` at scout implementation time against pyqlib 0.9.7. Its cardinality is asserted at module load against ``config.ALPHA158_FEATURE_COUNT``.
- ``init_qlib(provider_uri=None)``: idempotent thin wrapper around ``qlib.init(provider_uri=..., region="us")``. It uses a local import so the scout module loads even when the ``[factors]`` extra isn't installed.
- ``fetch_alpha158_features(*, instruments, start_time, end_time)``: forward-compat wrapper around ``Alpha158(...).fetch(col_set="feature")``. The scout does NOT exercise it end-to-end (see §"No ``@network`` test" below).

**Config** (``compute/config.py``, +23 LOC): a new ``# --- Phase 4j scout: Microsoft Qlib (Alpha158) integration ---`` block adds:

- ``QLIB_DATA_CACHE: Path = CACHE_DIR / "qlib" / "us_data"``. It is gitignored because the ``compute/cache/`` parent glob at .gitignore:221 covers it.
- ``QLIB_DATA_MAX_AGE_DAYS: int = 31`` (BYO bundle, monthly refresh).
- ``ALPHA158_FEATURE_COUNT: int = 158``.

**pyproject.toml**: the ``[factors]`` extra is extended with ``pyqlib>=0.9.7,<0.10``. The ``<0.10`` cap guards against Qlib 0.10+, which may change the feature set. Offline test 3 will catch any drift on a deliberate version bump.

**Tests** (``tests/test_ingest/test_qlib_features.py``, 113 LOC, 6 offline, NO ``@network``):

1. ``test_alpha158_feature_manifest_has_158_entries``: primary CI signal. A pure cardinality + uniqueness check that passes even when the ``[factors]`` extra isn't installed.
2. ``test_alpha158_feature_manifest_first_5_anchor``: anchors the leading K-bar features (``KMID, KLEN, KMID2, KUP, KUP2``) against the canonical Qlib v0.9.7 surface.
3. ``test_alpha158_feature_manifest_matches_runtime_introspection``: the hardcoded tuple must equal ``Alpha158DL.get_feature_config()[1]``. Wrapped in ``pytest.importorskip("qlib")``. This is the drift detector.
4. ``test_qlib_data_cache_constant_under_repo_cache_dir``: config sanity check. It also locks gitignore coverage via the ``compute/cache/`` parent glob.
5. ``test_init_qlib_passes_us_region_and_provider_uri``: monkeypatch capture. Asserts that ``region="us"`` and the provided ``provider_uri`` are passed through.
6. ``test_init_qlib_defaults_to_config_cache_when_no_uri``: the default ``provider_uri`` resolves to ``config.QLIB_DATA_CACHE``.

**Critical scope decision: NO ``@network`` test for this scout.** The Phase 4h scout (PR #110) and the Phase 4i scout (PR #114) each had a ``@pytest.mark.network`` test that hit a remote CDN. **Qlib has no remote CDN.** Its data flow is local ``.bin`` filesystem I/O, not download-from-network.

The originally planned smoke test (synthetic OHLCV → ``.bin`` conversion → ``init_qlib`` → ``Alpha158.fetch``) was DROPPED after investigation. pyqlib's PyPI wheel does NOT bundle the ``scripts/dump_bin.py`` utility needed for OHLCV → ``.bin`` conversion, and that scaffolding is integration-PR scope.

Test #3 (runtime introspection match) is the **replacement verification surface**. It is actually a stronger drift detector than the dropped end-to-end test would have been. It compares the hardcoded manifest with upstream on every run where pyqlib is installed, so a feature-set change in a newly installed version fails CI immediately.

**CI install footprint impact**: ~150-180 MB net-new. ``pyqlib`` pulls ~22 transitive deps, including ``mlflow`` (~20 MB), ``lightgbm`` (~15 MB), ``cvxpy`` (~30 MB), ``pymongo``, the ``redis`` client, ``gym``, ``jupyter``, and ``nbconvert``. The scout consumes none of these heavy deps. They come along for the ride because pyqlib doesn't expose a ``[minimal]`` extra. The CI cold-start latency bump is one-time per workflow, and pip wheel caching mitigates subsequent runs.

**Tenacity policy NOT applied**: Qlib's data flow is local filesystem I/O, so no network retry semantics are needed.
This is the first ingest module in QuantRank that diverges from the canonical ``compute/ingest/osap.py:52-56`` retry decorator. The divergence is documented explicitly in the module docstring.

**Verification ladder** (steps 1-5 complete):

- ``ruff check .`` → clean ✅
- ``pytest tests/ -m "not network"`` → **930 passed** (924 baseline + 6 new offline) ✅
- ``pytest -m network --run-network`` → 20 (unchanged; NO new ``@network``) ✅
- ``python -m compute.output.schema_check`` → in sync (NO schema delta this scout) ✅
- ``python -c "from compute.ingest.qlib_features import init_qlib, fetch_alpha158_features, ALPHA158_FEATURE_NAMES; print('OK', len(ALPHA158_FEATURE_NAMES))"`` → ``OK 158`` ✅

Steps 6-8: ``git push`` → open Draft PR → ``subscribe_pr_activity`` + STOP for user audit + Mark-Ready authorization.

**Ask-first surfaces touched**: NONE for the workflow / schema triple.

- The ``pyproject.toml [factors]`` extra is extended in this commit (authorized in advance via the plan-mode approval).
- ``.github/workflows/ci.yml`` is unchanged (the ``[dev,factors]`` install already covers the new pyqlib dep).
- ``.github/workflows/compute-rankings.yml`` is UNTOUCHED per user hard constraint.

**Defense layer**: unchanged at 17. **Top-5 rotation**: unchanged. **Schema version**: unchanged at ``0.9.1-phase4h.2`` (no schema delta this scout).

**Out of scope** (deferred to the follow-on full Phase 4j integration PR, a ~5-commit cluster like Phase 4h):

- yfinance-to-Qlib BYO adapter (~150 LOC; ``compute/cache/prices/*.parquet`` → Qlib ``.bin`` format conversion)
- Full Alpha158 feature compute on the 502-ticker universe (502 × N_dates × 158 DataFrame)
- Per-feature cross-validation framework. PBO/DSR doesn't directly apply to per-stock-per-date features; walk-forward IC scoring per feature is the likely replacement.
- Schema additions (``StockDetail.qlib_features`` + ``Metadata.qlib_features_used`` + IC observability) → bump ``0.9.1-phase4h.2 → 0.10.0-phase4j``
- ``compute/main.py`` wiring decision (observability-only? blended into composite? Phase-5 ML-meta-learner-only consumer?)
- Top-5 rotation impact analysis (Rule 16 lock applies)

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU
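The ``init_qlib`` contract described above can be sketched as follows. It is idempotent, imports qlib locally so the module loads without the ``[factors]`` extra, passes ``region="us"``, and defaults ``provider_uri`` to the config cache. The sketch also covers the monkeypatch-capture style of tests 5 and 6.

This is a minimal sketch under stated assumptions, not the shipped module. The following names are hypothetical:

- ``_init_fn``, an injection hook used here instead of monkeypatching.
- ``_INITIALIZED_URI``, a module-level cache.
- The literal stand-in for ``config.QLIB_DATA_CACHE``.

Only the ``qlib.init(provider_uri=..., region=...)`` call shape comes from the description.

```python
from pathlib import Path
from typing import Callable, Optional, Union

# Hypothetical stand-in for config.QLIB_DATA_CACHE (CACHE_DIR / "qlib" / "us_data").
QLIB_DATA_CACHE = Path("compute") / "cache" / "qlib" / "us_data"
QLIB_REGION_US = "us"  # the value of Qlib's REG_US, per the description

# Hypothetical idempotency cache: the provider_uri of the last successful init.
_INITIALIZED_URI: Optional[str] = None


def init_qlib(
    provider_uri: Optional[Union[str, Path]] = None,
    *,
    _init_fn: Optional[Callable[..., object]] = None,  # hypothetical test hook
) -> str:
    """Initialise Qlib once per provider_uri; repeat calls are no-ops."""
    global _INITIALIZED_URI
    uri = str(provider_uri if provider_uri is not None else QLIB_DATA_CACHE)
    if _INITIALIZED_URI == uri:
        return uri  # already initialised against this bundle
    if _init_fn is None:
        # Local import: this module still imports without the [factors] extra.
        import qlib

        _init_fn = qlib.init
    _init_fn(provider_uri=uri, region=QLIB_REGION_US)
    _INITIALIZED_URI = uri
    return uri
```

Injecting the init callable keeps the capture in tests 5 and 6 free of pyqlib. With the real module, monkeypatching ``qlib.init`` achieves the same effect. Re-initialising when a different ``provider_uri`` is passed is a choice made by this sketch, not something the description specifies.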
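Why the module name matters: ``import qlib`` is resolved by searching ``sys.path`` in order. The first directory containing a ``qlib.py`` (or a ``qlib/`` package) wins over the copy installed in site-packages.

The sketch below reproduces that lookup with a throwaway directory. ``demonstrate_shadowing`` and the ``SHADOWED`` sentinel are hypothetical names for illustration only. The sketch assumes no custom meta-path finder (such as an editable-install hook) claims the name first.

In the sketch the shadowing directory is put on ``sys.path`` explicitly. In a real repo the same thing happens whenever ``compute/ingest/`` itself ends up on ``sys.path``, for example when a script in that directory is run directly.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demonstrate_shadowing(module_name: str = "qlib") -> bool:
    """Hypothetical helper: True if a local <module_name>.py wins the import.

    Writes a throwaway module into a temporary directory, puts that directory
    at the FRONT of sys.path, imports the name, then undoes every side effect.
    """
    saved_path = list(sys.path)
    saved_module = sys.modules.pop(module_name, None)
    try:
        with tempfile.TemporaryDirectory() as tmp:
            # A stray file named like the real package (cf. compute/ingest/qlib.py).
            (Path(tmp) / f"{module_name}.py").write_text("SHADOWED = True\n")
            sys.path.insert(0, tmp)  # same effect as running a script from tmp
            importlib.invalidate_caches()
            module = importlib.import_module(module_name)
            shadowed = bool(getattr(module, "SHADOWED", False))
            sys.path_importer_cache.pop(tmp, None)
        return shadowed
    finally:
        # Restore sys.path and sys.modules so the real package stays importable.
        sys.path[:] = saved_path
        sys.modules.pop(module_name, None)
        if saved_module is not None:
            sys.modules[module_name] = saved_module
```

Under the stated assumption, ``demonstrate_shadowing("qlib")`` returns ``True`` whether or not pyqlib is installed, because ``sys.path`` order decides the winner.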
dackclup added a commit that referenced this pull request May 19, 2026
…all + InstrumentedPCA 8-method API surface lock + 6 offline tests (#121)

Phase 4k scout: **FINAL of 4 factor-library scouts** (OSAP ✅ #110, JKP ✅ #114, Qlib ✅ #119, IPCA THIS). Ships the `ipca` install + 8-method public-API surface lock + 6 offline tests + an inline synthetic fixture. NO production wiring. Characteristics-matrix construction, the universe-wide IPCA fit, and the composite blend decision are integration-PR scope (Phase 4k.1, tracked as follow-up).

**After this merges → all 4 factor scouts done → eligible for the v1.1.0-phase4 tag readiness audit.** The audit is gated on the 4h.2 Part 2 + 4i.1 + 4j.1 + 4k.1 integration PRs landing (~6-8w combined effort).

5 pre-plan investigations (verified 2026-05-19, carried verbatim into the module docstring):

1. PyPI package: `ipca` 0.6.7 (29 historical versions; last release 2021-04-22, ~5 years stale). Pin tight: `>=0.6.7,<0.7`.
2. License: MIT verbatim from LICENSE.md (Buechner / Bybee 2019). No CC BY-NC complication, unlike JKP. Safe for the Phase 6+ commercial roadmap.
3. sklearn-compatible API surface with 8 public methods: fit / get_factors / fit_path / predict / predict_panel / predict_portfolio / score / predictOOS.
   - Post-fit attributes: Gamma (L×K) + Factors (K×T) + metad dict + n_factors_eff + has_PSF + PSFcase.
   - NO transform/fit_transform. The user brief assumed they exist, but they don't in 0.6.7.
4. Data requirements: a MultiIndex (entity, date) DataFrame OR an explicit indices array.
   - Minimum stable shape: 10 firms × 20 years × 2 characteristics (per the maintainer's test_ipca.py).
   - NaN handling is internal, and unbalanced panels are supported.
   - The 502-ticker scale uses the data_type="portfolio" ALS path (integration-PR scope).
5. CI install footprint: ~50-80 MB net-new (numba ~50 MB + llvmlite ~30 MB + a small progressbar). Substantially lighter than Qlib's 150-180 MB.
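Item 4's data contract can be sketched without the `ipca` package installed. This is a minimal sketch assuming numpy and pandas. `make_synthetic_panel` and `expected_ipca_shapes` are hypothetical helpers, not the inline `@pytest.fixture` this commit ships.

The default dimensions mirror `test_fit_ipca_panel_on_synthetic_5x30x10_fixture`, which asserts Gamma (10, 2) and Factors (2, 30):

- 5 firms × 30 dates × 10 characteristics
- K = 2 factors

The 5-firm fixture sits below the 10-firm minimum quoted in item 4, so it suits a shape smoke test rather than a stability check.

```python
from typing import Tuple

import numpy as np
import pandas as pd


def make_synthetic_panel(
    n_entities: int = 5, n_dates: int = 30, n_chars: int = 10, seed: int = 0
) -> Tuple[pd.DataFrame, pd.Series]:
    """Hypothetical fixture: characteristics X and returns y on (entity, date)."""
    rng = np.random.default_rng(seed)
    index = pd.MultiIndex.from_product(
        [range(n_entities), range(n_dates)], names=["entity", "date"]
    )
    X = pd.DataFrame(
        rng.standard_normal((len(index), n_chars)),
        index=index,
        columns=[f"char_{j}" for j in range(n_chars)],
    )
    y = pd.Series(rng.standard_normal(len(index)), index=index, name="ret")
    return X, y


def expected_ipca_shapes(
    n_chars: int, n_factors: int, n_dates: int
) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Shapes test 6 asserts: Gamma is L x K, Factors is K x T.

    Assumes no extra intercept column in Gamma.
    """
    return (n_chars, n_factors), (n_factors, n_dates)
```

With `ipca` installed, a smoke fit would look something like `InstrumentedPCA(n_factors=2, intercept=False).fit(X=X, y=y)`. Its `Gamma.shape` and `Factors.shape` can then be compared with `expected_ipca_shapes(10, 2, 30)`. Whether `intercept=True` changes Gamma's width is not checked here.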
IPCA structural shape (the 4th distinct shape vs prior scouts):
- OSAP (4h): factor returns CSV → proxy/36m regression
- JKP (4i): factor returns CSV → 36m regression
- Qlib (4j): per-stock per-date features → native Alpha158
- IPCA (4k): panel decomposition → Gamma (L×K loadings) + Factors (K×T) latent returns

Critical scope decision: NO @network test (mirrors the Phase 4j Qlib rationale). IPCA is pure local sklearn-style computation, so there is no remote endpoint to network-test. The scout ships 6 offline tests / 0 @network. The synthetic-fixture smoke test exercises the full fit → Gamma/Factors path locally.

Architectural locks:
- Module placement: `compute/features/ipca_factors.py` (NOT compute/ingest/).
  - Follows the pre-existing `.claude/skills/phase-4/ipca-factor-fit/PLAN.md:24` + `compute/features/osap_replicate.py` precedent.
  - NO namespace collision (module=`ipca_factors`, package=`ipca`), so Phase 4j's `qlib_features.py` workaround doesn't apply here.
- INSTRUMENTED_PCA_PUBLIC_API 8-method tuple: drift detector, with a module-load assertion against config.IPCA_PUBLIC_API_METHOD_COUNT.
- IPCA_DEFAULT_N_FACTORS=5, IPCA_DEFAULT_INTERCEPT=True (KPS 2019 baseline): validated by a smoke test, NOT a module-load assert.
- Tenacity NOT applied: the computation is pure local sklearn-style, so there is no network retry. This is the second module after Phase 4j that diverges from the osap.py:52-56 pattern; documented in the module docstring.
- Synthetic fixture inline as @pytest.fixture (NOT a committed CSV/parquet): IPCA inputs are numpy arrays, so no roundtrip is needed.
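The INSTRUMENTED_PCA_PUBLIC_API lock combines a load-time invariant with a runtime introspection check. This is a minimal sketch, assuming a hypothetical `missing_api_methods` helper and a local stand-in for `config.IPCA_PUBLIC_API_METHOD_COUNT`. The eight method names are the ones listed under pre-plan investigation 3. The check works against any class, so it is demonstrated here without `ipca` installed.

```python
from typing import List, Sequence

# The 8 public methods listed in pre-plan investigation 3.
INSTRUMENTED_PCA_PUBLIC_API = (
    "fit",
    "get_factors",
    "fit_path",
    "predict",
    "predict_panel",
    "predict_portfolio",
    "score",
    "predictOOS",
)
IPCA_PUBLIC_API_METHOD_COUNT = 8  # hypothetical stand-in for the config constant

# Module-load invariants (cardinality + uniqueness).
if len(INSTRUMENTED_PCA_PUBLIC_API) != IPCA_PUBLIC_API_METHOD_COUNT:
    raise RuntimeError("INSTRUMENTED_PCA_PUBLIC_API cardinality drifted")
if len(set(INSTRUMENTED_PCA_PUBLIC_API)) != len(INSTRUMENTED_PCA_PUBLIC_API):
    raise RuntimeError("INSTRUMENTED_PCA_PUBLIC_API contains duplicate names")


def missing_api_methods(
    cls: type, api: Sequence[str] = INSTRUMENTED_PCA_PUBLIC_API
) -> List[str]:
    """Hypothetical drift check: API names `cls` no longer exposes as callables."""
    return [name for name in api if not callable(getattr(cls, name, None))]
```

Against the real package the call would be `missing_api_methods(ipca.InstrumentedPCA)`. Per the pre-plan investigation, it is expected to return `[]` on 0.6.7. The load-time invariants use explicit `raise` statements rather than `assert`, because `python -O` strips asserts.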
Module layer (compute/features/ipca_factors.py, ~190 LOC):
- IPCA_FITTED_ARTIFACTS_CACHE re-export from config
- INSTRUMENTED_PCA_PUBLIC_API 8-tuple + module-load invariants (cardinality + uniqueness)
- IPCA_DEFAULT_N_FACTORS / IPCA_DEFAULT_INTERCEPT constants
- init_ipca(n_factors, intercept, **kwargs) → unfitted InstrumentedPCA
- fit_ipca_panel(estimator, *, X, y, indices, **fit_kwargs) → fitted estimator (returns self)

Config layer (compute/config.py, +28 LOC):
- IPCA_FITTED_ARTIFACTS_CACHE: Path = CACHE_DIR / "ipca"
- IPCA_FITTED_ARTIFACTS_MAX_AGE_DAYS: int = 31
- IPCA_PUBLIC_API_METHOD_COUNT: int = 8

Tests (6 offline; ~228 LOC):
1. test_ipca_imports_and_exposes_instrumented_pca: primary CI signal (importorskip)
2. test_instrumented_pca_public_api_manifest_locks_8_methods: pure assertion (no ipca runtime)
3. test_instrumented_pca_public_api_matches_runtime_introspection: drift detector
4. test_ipca_fitted_artifacts_cache_under_repo_cache_dir: config sanity
5. test_init_ipca_returns_unfitted_estimator_with_kps_defaults: KPS defaults validation
6. test_fit_ipca_panel_on_synthetic_5x30x10_fixture: smoke fit; asserts Gamma (10,2) + Factors (2,30) + metad N/T/L

pyproject.toml: append `ipca>=0.6.7,<0.7` to [factors]. Authorized in advance via plan-mode approval. The range is pinned because the last release dates from 2021-04-22.

Ask-first surfaces touched:
- pyproject.toml [factors]: extended (authorized via plan-mode)
- ci.yml UNCHANGED ([dev,factors] install already covers the new dep)
- compute-rankings.yml UNTOUCHED per user hard constraint
- Schema triple UNTOUCHED (no schema delta this scout)

Verification (local sandbox without [factors] + CI with [dev,factors]):
- ruff check . → clean
- python -m compute.output.schema_check → in sync
- Import smoke: from compute.features.ipca_factors import init_ipca, fit_ipca_panel, INSTRUMENTED_PCA_PUBLIC_API → OK 8
- pytest tests/ -m "not network" excluding factor-extra files → 864 passed locally
- 2/6 IPCA tests PASS locally; 4/6 SKIP via pytest.importorskip (expected, since the local sandbox lacks [factors])
- CI on 82ade3a with [dev,factors] → both Python+Frontend GREEN; 936 offline expected

Defense layer unchanged at 17. Top-5 rotation unchanged. Schema unchanged at 0.9.1-phase4h.2.

Out of scope (deferred to the follow-on Phase 4k.1 integration PR, a ~5-commit cluster):
- Characteristics-matrix construction (which features feed X?)
- Full IPCA fit on the 502-ticker universe (data_type="portfolio" canonical scaling)
- Walk-forward / rolling-window fit cadence
- Latent-factor composite integration decision
- Schema additions (StockDetail.ipca_loadings + Metadata.ipca_n_factors_eff + ipca_in_sample_r2) → bump 0.9.1-phase4h.2 → 0.10.0-phase4k
- PBO/DSR doesn't apply (loadings ≠ portfolio returns); IC walk-forward observability instead, per PLAN.md:36
- Top-5 rotation impact analysis (Rule 16 lock)
- WRDS data backfill consideration

Audit history:
- Plan-audit round 1: 5 investigations verified · MIT lock · heavy-deps disclosure
- Plan-audit round 2: Q1 (public-API surface lock) + Q2 (inline pytest.fixture) design choices applied
- Plan-audit round 3: line citations verified
- Implementation: main session direct (Phase 4j paste-loop precedent; the worker session was stuck re-presenting the plan)
- CI green on 82ade3a: Python+Frontend both passing · Vercel ✅ READY
- Conditional Mark-Ready authorization given · user confirmed CI green · squash merged

Closes the factor-library scout cluster. Next: the v1.1.0-phase4 tag readiness audit, gated on 4 integration PRs.

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2
This was referenced May 19, 2026
dackclup added a commit that referenced this pull request May 20, 2026
Part of epic #125 (Item #6 of 6). Pure tooling addition with no runtime / scoring / schema impact.

Motivation
----------
PR #123 (2026-05-19, closed without merging): a worker session opened a duplicate Phase 4j + 4k scout on branch `claude/resume-quantrank-phase-4.5-Zh0pO`. Meanwhile, the main session shipped the same work directly via PRs #119 (Qlib) + #121 (IPCA). Root cause: the worker session never inspected the `claude/*` branch list or recent PRs before writing code, so 100% of its effort was wasted.

This change ships a preflight check that surfaces in-flight scope BEFORE any code is written. The duplicate-PR failure mode is then caught at the handoff-prompt entry rather than at PR review.

Files (2 new, +271 LOC)
------------------------
- tools/check_branch_collisions.py (+149 LOC): git-only preflight script.
  - Lists active `claude/*` branches via `git ls-remote origin "refs/heads/claude/*"`.
  - Lists recent main-branch commits via `git log --since="48 hours ago" --oneline --no-merges origin/main`.
  - Optional keyword args flag case-insensitive substring matches.
  - Always exits 0 (informational only).
- .claude/skills/branch-collision-check/SKILL.md (+122 LOC): skill description with YAML frontmatter. It covers:
  - Trigger conditions: handoff prompts, Phase / issue / Item #N mentions, fresh worker sessions.
  - Skip conditions: doc-only chores, iteration #2+, user-authorized parallel work.
  - Sample output (clean + warning).
  - Output-interpretation guidance that tells the caller to STOP and ask the user whenever a ⚠️ line surfaces.

Design notes
------------
- Git-only data sources: no `gh` CLI / GitHub API auth required. Works in the QuantRank Claude Code Web sandbox, where `gh` is unavailable, and on any contributor machine with bare git.
- 48-hour window: matches the typical worker ↔ main session handoff cadence. Long enough to catch duplicate work, short enough to keep the output scannable.
- Pure read-only: no destructive git ops, no branch creation, no push, no GitHub API mutation. Always returns exit 0; the caller decides whether to proceed.

Verification ladder all green
------------------------------
- ruff check . → All checks passed
- python tools/check_branch_collisions.py → lists 1 active claude/* branch + 16 recent commits (last 48h), exit 0
- python tools/check_branch_collisions.py "Alpha158" → fires ⚠️ on the PR #119 commit "Alpha158 158-feature manifest"; summary reports "1 potential scope collision(s) found", exit 0
- python tools/check_branch_collisions.py "Phase 99 nonsense" → no match; summary reports "No scope collisions detected", exit 0
- python tools/check_doc_test_counts.py → exit 0 (the Item #2 guard still passes; the new files don't introduce hardcoded counts)
- python -m compute.output.schema_check → in sync (no schema touch)
- python -m pytest tests/ -m "not network" → 959 passed (unchanged; tests don't import tools/ or .claude/skills/)
- SKILL.md YAML frontmatter parses, confirmed by Claude Code's skill registry picking it up at module load

Constraints honored
-------------------
- No touch to compute/ / frontend/ / tests/; changes are confined to tools/ + .claude/skills/
- No GitHub API calls / no `gh` auth. Data comes only from `git ls-remote` (git's own remote transport) + `git log`.
- No destructive actions: read-only preflight check
- No push to main; no force-push; no --no-verify
- No workflow_dispatch trigger (compute-rankings.yml untouched)

Epic #125 status after this PR
-------------------------------
- Item #1 ✅ Hypothesis property tests (PR #127)
- Item #2 ✅ Strip hardcoded test counts + CI guard (PR #128)
- Item #4 ✅ Observability-before-wiring pattern (PR #129)
- Item #6 ✅ Branch-collision preflight (this PR)
- Items #3 and #5 remain, as separate PRs per the epic decomposition.

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU
Co-authored-by: Claude <noreply@anthropic.com>
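The keyword-matching half of the preflight can be sketched as a pure function over the output of the two git commands quoted above. This is a minimal sketch, not the shipped `tools/check_branch_collisions.py`. `git_lines`, `flag_collisions`, and `main` are hypothetical names. Unlike the real script, the sketch prints only the flagged lines plus the summary line, not the full branch and commit listing.

```python
import subprocess
import sys
from typing import Iterable, List


def git_lines(*args: str) -> List[str]:
    """Hypothetical helper: run a read-only git command, return non-empty lines."""
    try:
        proc = subprocess.run(["git", *args], capture_output=True, text=True)
    except FileNotFoundError:  # git not installed
        return []
    if proc.returncode != 0:
        return []  # informational tool: degrade to "nothing found"
    return [line for line in proc.stdout.splitlines() if line.strip()]


def flag_collisions(lines: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Return lines containing any keyword (case-insensitive substring match)."""
    needles = [k.lower() for k in keywords if k]
    return [line for line in lines if any(n in line.lower() for n in needles)]


def main(keywords: List[str]) -> int:
    branches = git_lines("ls-remote", "origin", "refs/heads/claude/*")
    commits = git_lines(
        "log", "--since=48 hours ago", "--oneline", "--no-merges", "origin/main"
    )
    hits = flag_collisions(branches + commits, keywords)
    for line in hits:
        print(f"⚠️ {line}")
    if hits:
        print(f"{len(hits)} potential scope collision(s) found")
    else:
        print("No scope collisions detected")
    return 0  # always 0: the caller decides whether to proceed


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Keeping the matcher free of I/O means it can be unit-tested without a repository or any access to `origin`. Having `main` always return 0 preserves the informational-only contract.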
dackclup added a commit that referenced this pull request May 20, 2026
…imization PR F) (#146)

Sixth PR in the .md optimization sequence (Option D). An audit of the 18 QR-origin skill descriptions found all of them well-formed: parseable YAML, TRIGGER + SKIP clauses present, average 888 chars. The critical YAML bug (#119+#121 plain-scalar bug in branch-collision-check and pr-quality-gate) was already fixed in PR A. PR F's remaining work is therefore light polish, not structural change.

The 20 vendored skills stay FROZEN per the boundary convention; all of them accept upstream-only edits:
- Anthropic skills
- mattpocock-* (8)
- karpathy-guidelines
- thananon/9arm-skills (4)
- karpathy-llm-wiki

Trim targets (cut redundancy, fix drift, add Thai triggers):

1. pr-quality-gate (1207 → ~1015):
   - Cut the redundant "ALSO use right before flipping Draft→Ready" clause, which duplicated the first TRIGGER ("before authorizing the Draft→Ready flip").
   - Tightened wrapping.
2. pr-iteration-flow (990 → ~890):
   - Cut the redundant "ALSO use this skill as the default workflow harness any time a PR is open", which duplicated the TRIGGER list.
   - Dropped the stale "PR-3c → PR-3d → PR-20" historical reference.
   - Added the Thai trigger phrases "เช็ค CI" ("check CI") / "ดู PR" ("look at the PR"), since the user invokes this skill in Thai.
3. phase-status-bump (918 → ~840):
   - Dropped two historical examples ("PR 3d → tag v0.6.0-phase3d" and "3a→3b, 3c→3d") that anchored the description to one shipped phase.
   - The wording is now phase-agnostic.
4. verify-production-output (1086 → ~870):
   - Compressed the "Surfaces..." enumeration of Section A-H content from 8 detailed items to 8 short items, without losing dispatch specificity.
   - Added the Thai trigger phrases "ตรวจ output" ("check output") / "เช็ค production" ("check production").
   - Folded "ALSO use" into the first TRIGGER as one phrase.

YAML moved from a plain scalar to `description: >` (folded block) on the 3 plain-scalar descriptions edited: pr-iteration-flow, phase-status-bump, and verify-production-output. This is the same safety pattern PR A applied. It prevents the ' #' comment-eating bug from re-emerging if anyone later adds a `#issue` reference.
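The ' #' bug comes from YAML's comment rule. Inside a plain (unquoted) scalar, a `#` preceded by whitespace starts a comment, so `description: Use before merge #119` loads as just `Use before merge`. Inside a folded block scalar (`description: >`), the same characters are ordinary content.

This is a minimal, line-based sketch of a lint check for the risky pattern. `plain_scalar_comment_risks` is a hypothetical helper. It is a heuristic over top-level `key: value` lines, not a YAML parser.

```python
import re
from typing import List

# Heuristic: a top-level `key: value` line whose value does not open a quoted,
# block (| or >), flow ([ or {), anchor, alias or tag form, i.e. a plain scalar.
_PLAIN_ENTRY = re.compile(r"""^(?P<key>[A-Za-z_][\w-]*):[ \t]+(?P<value>[^'"|>\[{&*!].*)$""")


def plain_scalar_comment_risks(frontmatter: str) -> List[str]:
    """Hypothetical lint: keys whose plain-scalar value contains ' #'.

    YAML treats whitespace + '#' in a plain scalar as the start of a comment,
    so everything from there to end of line would be silently dropped.
    """
    risky = []
    for line in frontmatter.splitlines():
        match = _PLAIN_ENTRY.match(line)
        if match and re.search(r"[ \t]#", match.group("value")):
            risky.append(match.group("key"))
    return risky
```

Quoted values, block scalars, and `#` characters not preceded by whitespace (as in `C#`) are not flagged, because YAML keeps all three as content.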
Net token impact: ~-650 chars × ~0.25 tokens/char ≈ -162 tokens per session start. Modest, but it compounds.

Why not an aggressive trim:
- Each TRIGGER phrase + SKIP clause IS dispatch-useful, as verified by sampling. Aggressive 50% cuts would risk dispatch quality.
- The remaining 14 QR-origin skills are already at 700-900 chars with no redundancy to remove.

CLAUDE.md (181 → 181, lockstep), §Phase status: added PR #145 (E) to "Recently merged". Replaced "PR E in flight" with a "PR F in flight" note explaining that the audit found the skills healthy.

AGENTS.md (343 → 343, lockstep), §Phase + version state: optimization sequence tracker updated (PR E ✅, PR F in flight, PR G remaining).

Next: PR G (PHASE_STATUS.md "Current State" summary at top + chronological table below).

Co-authored-by: Claude <noreply@anthropic.com>
Summary
Phase 4j scout PR: 3rd of 4 factor-library scouts (OSAP ✅ #110, JKP ✅ #114, Qlib next, IPCA later). Ships the `pyqlib` install + 158-feature manifest + 6 offline tests. NO production wiring in this PR. The yfinance-to-Qlib BYO adapter and the full Alpha158 feature compute on the 502-ticker universe ship in a follow-on integration PR.

No new veto. Defense layer unchanged at 17. Top-5 rotation unchanged. Schema unchanged at `0.9.1-phase4h.2`.

Pre-plan investigation results (verified 2026-05-19)
1. PyPI package: `pyqlib` 0.9.7 (also 0.9.6). Other candidate names (`qlib`, `microsoft-qlib`) return 404.
2. License: MIT (wheel METADATA: `Classifier: License :: OSI Approved :: MIT License`). No CC BY-NC complication like JKP. Safe for the Phase 6+ commercial roadmap.
3. Data init: `qlib.init(provider_uri=..., region="us")`. NO public US data bundle: Qlib's default `provider_uri` covers CN A-share only, so the US universe is BYO via local `.bin` files.
4. Alpha158 surface: `qlib.contrib.data.handler.Alpha158` → 158 columns. The manifest was captured at scout time via `Alpha158DL.get_feature_config()[1]` and hardcoded; offline test 3 locks it against upstream drift.

🚨 Critical scope decision: NO `@network` test for this scout

The Phase 4h scout (PR #110) and the Phase 4i scout (PR #114) each had a `@pytest.mark.network` test that hit a remote CDN. Qlib has no remote CDN: its data flow is local `.bin` filesystem I/O.

The originally planned smoke test (synthetic OHLCV → `.bin` → `init_qlib` → `Alpha158.fetch`) was DROPPED after investigation. `pyqlib`'s PyPI wheel does NOT bundle the `scripts/dump_bin.py` utility needed for OHLCV → `.bin` conversion, and that scaffolding is integration-PR scope.

Replacement verification surface (test #3 below): the hardcoded `ALPHA158_FEATURE_NAMES` tuple is asserted against the runtime introspection from `Alpha158DL.get_feature_config()[1]`. This is actually a stronger drift detector than the dropped end-to-end test would have been, because it fires whenever an upgraded `pyqlib` changes the feature set.

CI install footprint impact

`pip install pyqlib` pulls ~22 transitive deps. The heavy ones are NET-NEW to QuantRank's tree:

- `mlflow` (~20 MB)
- `lightgbm` (~15 MB)
- `cvxpy` (~30 MB)
- `pymongo`
- `redis` (Python client)
- `gym`
- `jupyter` + `nbconvert`

Net CI install footprint bump: ~150-180 MB. The scout consumes none of these heavy deps. They come along because `pyqlib` doesn't expose a `[minimal]` extra upstream. The CI cold-start latency bump is one-time per workflow, and pip wheel caching mitigates subsequent runs.

Module-name choice locked
The new module is `compute/ingest/qlib_features.py`, NOT `compute/ingest/qlib.py`. Python's import resolution would treat the latter as the `qlib` package and shadow the actual installed PyPI package, breaking the entire factor-library integration. A distinct module name avoids the namespace collision.

Files
- `compute/ingest/qlib_features.py`: `init_qlib` + `fetch_alpha158_features`
- `compute/config.py`: `# --- Phase 4j scout ---` block
- `tests/test_ingest/test_qlib_features.py`
- `pyproject.toml`: `pyqlib>=0.9.7,<0.10` added to the `[factors]` extra
- `PHASE_STATUS.md`

Within the scout-style budget: the Phase 4i scout was ~360 LOC, and Phase 4j is leaner because it has no `@network` test scaffolding.

Tenacity policy NOT applied
Qlib's data flow is local filesystem I/O, so no network retry semantics are needed. This is the first ingest module in QuantRank that diverges from the canonical `compute/ingest/osap.py:52-56` retry decorator. The divergence is documented explicitly in the module docstring.

Tests (6 offline; NO `@network`)

1. `test_alpha158_feature_manifest_has_158_entries`: primary CI signal. A pure cardinality + uniqueness check that passes even without the `[factors]` extra.
2. `test_alpha158_feature_manifest_first_5_anchor`: the leading K-bar features (`KMID, KLEN, KMID2, KUP, KUP2`) are anchored against Qlib v0.9.7.
3. `test_alpha158_feature_manifest_matches_runtime_introspection` ⭐: the hardcoded tuple must equal `Alpha158DL.get_feature_config()[1]`. Wrapped in `pytest.importorskip("qlib")`.
4. `test_qlib_data_cache_constant_under_repo_cache_dir`: locks gitignore coverage via the `compute/cache/` parent glob (`.gitignore:221`).
5. `test_init_qlib_passes_us_region_and_provider_uri`: monkeypatch capture. Asserts `region="us"` + path passthrough.
6. `test_init_qlib_defaults_to_config_cache_when_no_uri`: default `provider_uri=config.QLIB_DATA_CACHE`.

Verification ladder (8-step; STOP at step 8)
1. `ruff check .` → clean
2. `pytest tests/ -m "not network"` → 930 passed
3. `pytest -m network --run-network` → 20 (unchanged; no new `@network`)
4. `python -m compute.output.schema_check` → in sync
5. `python -c "from compute.ingest.qlib_features import init_qlib, fetch_alpha158_features, ALPHA158_FEATURE_NAMES; print('OK', len(ALPHA158_FEATURE_NAMES))"` → `OK 158`
6. `git push -u origin claude/resume-quantrank-phase-4.5-Zh0pO` → `68ed2386`
7. Open Draft PR
8. `subscribe_pr_activity` + STOP for user audit

Ask-first surfaces touched
- `pyproject.toml [factors]`: extended with `pyqlib>=0.9.7,<0.10` (authorized in advance via plan-mode approval)
- `.github/workflows/ci.yml`: UNCHANGED (the `[dev,factors]` install already covers the new dep)
- `.github/workflows/compute-rankings.yml`: UNTOUCHED per user hard constraint
- Schema triple (`schemas.py` / `types.ts` / `schema-snapshot.json`): UNTOUCHED (no schema delta this scout)

Out of scope (deferred to the follow-on integration PR, a ~5-commit cluster mirroring the Phase 4h shape)
- yfinance-to-Qlib BYO adapter: converts `compute/cache/prices/*.parquet` to Qlib `.bin` format.
- Schema additions (`StockDetail.qlib_features` + `Metadata.qlib_features_used` + IC observability) → schema bump `0.9.1-phase4h.2 → 0.10.0-phase4j`.
- `compute/main.py` wiring decision: observability-only? blended into composite? Phase-5 ML-meta-learner-only consumer?

Risks (from plan, with post-implementation resolution)
- No `@network` test: divergence from the Phase 4h/4i pattern
- `dump_bin` API may have changed across 0.9.6 → 0.9.7
- Capped at `<0.10` so any future drift surfaces deliberately
- `pip install pyqlib` CI cold start: ~150-180 MB
- `pytest.importorskip("qlib")` masks failures when the extra isn't installed. Resolution: same pattern as `tests/test_features/test_osap_e2e_integration.py`, and CI installs `[factors]`.
- `pyqlib` bump
- `qlib.init` global state pollutes other tests

Test plan
- `68ed2386` (Python lint+test + Frontend build + Vercel preview)

🤖 Drafted with Claude Code via the Anthropic SDK.
Generated by Claude Code
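Tests 1 and 3 reduce to two small checks:

- A load-time cardinality and uniqueness invariant on the hardcoded tuple.
- An order-sensitive comparison with the names Qlib reports at runtime.

This is a minimal sketch, assuming the hypothetical helper names `check_manifest` and `manifest_drift`. The example names are the five K-bar anchors from test 2, not the full 158-entry manifest.

```python
from typing import Dict, List, Sequence, Union


def check_manifest(names: Sequence[str], expected_count: int) -> None:
    """Hypothetical load-time invariant: exact cardinality, no duplicates."""
    if len(names) != expected_count:
        raise ValueError(f"expected {expected_count} features, got {len(names)}")
    if len(set(names)) != len(names):
        dupes = sorted({n for n in names if names.count(n) > 1})
        raise ValueError(f"duplicate feature names: {dupes}")


def manifest_drift(
    hardcoded: Sequence[str], runtime: Sequence[str]
) -> Dict[str, Union[bool, List[str]]]:
    """Hypothetical drift report: how the hardcoded list differs from upstream."""
    hard, live = list(hardcoded), list(runtime)  # tuple vs list never compare equal
    return {
        "missing_upstream": [n for n in hard if n not in live],  # dropped upstream
        "new_upstream": [n for n in live if n not in hard],  # added upstream
        "reordered": sorted(hard) == sorted(live) and hard != live,
        "identical": hard == live,
    }
```

In test 3 the runtime side would be `Alpha158DL.get_feature_config()[1]` behind `pytest.importorskip("qlib")`. The assertion would be on `manifest_drift(ALPHA158_FEATURE_NAMES, runtime_names)["identical"]`.

Both sides are converted to lists because a tuple never compares equal to a list in Python. Reporting the missing and new names makes a failing check self-explanatory after a deliberate version bump.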