feat(ingest): Qlib scout — pyqlib MIT install + Alpha158 handler smoke + 158-feature manifest by dackclup · Pull Request #119 · dackclup/quantrank

dackclup · 2026-05-19T09:33:52Z

Summary

Phase 4j scout PR: the 3rd of 4 factor-library scouts (OSAP ✅ #110, JKP ✅ #114, Qlib in this PR, IPCA later). Ships the pyqlib install + 158-feature manifest + 6 offline tests. NO production wiring in this PR; the yfinance-to-Qlib BYO adapter + full Alpha158 feature compute on the 502-ticker universe ships in a follow-on integration PR.

No new veto. Defense layer unchanged at 17. Top-5 rotation unchanged. Schema unchanged at 0.9.1-phase4h.2.

Pre-plan investigation results (verified 2026-05-19)

| # | Finding | Verdict |
|---|---|---|
| 1 | PyPI package | `pyqlib` 0.9.7 (also 0.9.6). Other candidate names (`qlib`, `microsoft-qlib`) return 404. |
| 2 | License | MIT, verified via wheel METADATA inspection (`Classifier: License :: OSI Approved :: MIT License`). No CC BY-NC complication like JKP. Safe for Phase 6+ commercial roadmap. |
| 3 | Data init | `qlib.init(provider_uri=..., region="us")`. NO public US data bundle: Qlib's default `provider_uri` covers CN A-share only; US universe is BYO via local `.bin` files. |
| 4 | Alpha158 surface | `qlib.contrib.data.handler.Alpha158` → 158 columns. Manifest captured at scout time via `Alpha158DL.get_feature_config()[1]` and hardcoded; offline test 3 locks against upstream drift. |

🚨 Critical scope decision — NO `@network` test for this scout

Phase 4h scout (PR #110) and Phase 4i scout (PR #114) each had a `@pytest.mark.network` test that hit a remote CDN. Qlib has no remote CDN; its data flow is local-bin filesystem I/O. The originally planned synthetic-OHLCV → `.bin` → `init_qlib` → `Alpha158.fetch` smoke test was DROPPED post-investigation: pyqlib's PyPI wheel does NOT bundle the `scripts/dump_bin.py` utility needed for OHLCV → `.bin` conversion. That scaffolding is integration-PR scope.

Replacement verification surface (test #3 below): the hardcoded `ALPHA158_FEATURE_NAMES` tuple is asserted against the runtime introspection from `Alpha158DL.get_feature_config()[1]`. This is a stronger drift detector than the dropped end-to-end test would have been: it fails on any test run after a `pyqlib` upgrade that changes the feature set.
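A minimal sketch of how such a drift check might be structured, assuming the pinned tuple and the runtime list are compared as ordered sequences. `manifest_drift` is a hypothetical helper, not the code in `tests/test_ingest/test_qlib_features.py`; in the real test the runtime list would come from `Alpha158DL.get_feature_config()[1]`, which is stubbed with short lists here so the example runs without pyqlib installed.

```python
from typing import Sequence


def manifest_drift(hardcoded: Sequence[str], runtime: Sequence[str]) -> dict:
    """Summarize differences between a pinned manifest and a runtime one."""
    pinned, live = set(hardcoded), set(runtime)
    return {
        # Names we pinned that upstream no longer emits.
        "missing_upstream": sorted(pinned - live),
        # Names upstream emits that we never pinned.
        "new_upstream": sorted(live - pinned),
        # Same names on both sides, but in a different order.
        "order_changed": pinned == live and list(hardcoded) != list(runtime),
    }


# Stub data: the five K-bar anchor names from test 2 stand in for the full manifest.
PINNED = ("KMID", "KLEN", "KMID2", "KUP", "KUP2")
same = manifest_drift(PINNED, ["KMID", "KLEN", "KMID2", "KUP", "KUP2"])
# "KNEW" is a made-up feature name used only to simulate upstream drift.
drifted = manifest_drift(PINNED, ["KMID", "KLEN", "KUP", "KUP2", "KNEW"])

print(same)
print(drifted)
```

Reporting the missing and added names separately, rather than a bare equality failure, would make a deliberate `pyqlib` bump easier to review, since the hand-updated manifest diff can be read straight from the test output.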

⚠️ CI install footprint disclosure

pip install pyqlib pulls ~22 transitive deps. Heavy ones NET-NEW to QuantRank's tree:

| Dep | Approx size | Purpose |
|---|---|---|
| `mlflow` | ~20 MB | ML experiment tracking (not used by scout) |
| `lightgbm` | ~15 MB | Default Qlib ML model (not used by scout) |
| `cvxpy` | ~30 MB | Portfolio optimization (not used by scout) |
| `pymongo` | ~5 MB | Experiment store backend (not used by scout) |
| `redis` (Python client) | ~1 MB | Cache backend (not used by scout) |
| `gym` | ~10 MB | RL gym env (not used by scout) |
| `jupyter` + `nbconvert` | ~100 MB | Notebook tooling (not used by scout) |

Net CI install footprint bump: ~150-180 MB. None of these heavy deps are consumed by the scout — they come along because pyqlib doesn't expose a [minimal] extra upstream. CI cold-start latency bump is one-time per workflow; pip wheel caching mitigates subsequent runs.
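As a quick sanity check on the quoted range, the approximate sizes from the table can be summed. The figures below are copied from the table; they are the PR's own rough estimates, not measured wheel sizes, and the remaining transitive deps are not itemized in the source.

```python
# Approximate sizes in MB, taken from the footprint table above.
APPROX_MB = {
    "mlflow": 20,
    "lightgbm": 15,
    "cvxpy": 30,
    "pymongo": 5,
    "redis": 1,
    "gym": 10,
    "jupyter + nbconvert": 100,
}

total_mb = sum(APPROX_MB.values())
heaviest = max(APPROX_MB, key=APPROX_MB.get)

print(f"listed heavy deps: ~{total_mb} MB, largest: {heaviest}")
```

The listed entries add up to about 181 MB, which sits at the top of the ~150-180 MB estimate, with the notebook tooling accounting for more than half of it.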

Module-name choice locked

The new module is compute/ingest/qlib_features.py, NOT compute/ingest/qlib.py. Python's import resolution would treat the latter as the qlib package and shadow the actual installed PyPI package, breaking the entire factor-library integration. Distinct module name avoids the namespace collision.
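The shadowing risk can be illustrated with a small self-contained sketch. It assumes the failure mode arises when the directory holding the same-named file ends up ahead of site-packages on `sys.path` (for example, when a file in that directory is run directly as a script); under purely absolute package imports the collision might not occur, but a distinct module name removes the risk either way. `demo_factor_lib` is a hypothetical stand-in for an installed package such as `qlib`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE = "demo_factor_lib"  # hypothetical stand-in for an installed package


def import_with_path_order(first: Path, second: Path) -> str:
    """Import MODULE with `first` ahead of `second` on sys.path; return its ORIGIN."""
    saved_path = list(sys.path)
    sys.modules.pop(MODULE, None)
    sys.path[:0] = [str(first), str(second)]
    try:
        importlib.invalidate_caches()
        return importlib.import_module(MODULE).ORIGIN
    finally:
        # Restore interpreter state so the demo leaves no trace.
        sys.path[:] = saved_path
        sys.modules.pop(MODULE, None)


with tempfile.TemporaryDirectory() as tmp:
    site = Path(tmp, "site_packages")  # plays the role of the installed package location
    local = Path(tmp, "ingest")        # plays the role of compute/ingest/
    for directory, origin in ((site, "installed"), (local, "local file")):
        directory.mkdir()
        (directory / f"{MODULE}.py").write_text(f"ORIGIN = {origin!r}\n")

    shadowed = import_with_path_order(local, site)
    normal = import_with_path_order(site, local)

print(shadowed, "|", normal)
```

Whichever directory comes first on `sys.path` wins, so a local file named after the package silently replaces it in that configuration.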

Files

| Path | Action | LOC |
|---|---|---|
| `compute/ingest/qlib_features.py` | NEW: module docstring + `ALPHA158_FEATURE_NAMES` + `init_qlib` + `fetch_alpha158_features` | 186 |
| `compute/config.py` | Edit: 3 new constants in a `# --- Phase 4j scout ---` block | +23 |
| `tests/test_ingest/test_qlib_features.py` | NEW: 6 offline tests | 113 |
| `pyproject.toml` | Edit: add `pyqlib>=0.9.7,<0.10` to `[factors]` extra | +7 |
| `PHASE_STATUS.md` | Edit: row 4 sub-bullet for Phase 4j scout | +1 |
| Total | | ~330 LOC |

Within scout-style budget (Phase 4i scout was ~360 LOC; Phase 4j is leaner because no @network test scaffolding).

Tenacity policy NOT applied

Qlib's data flow is local filesystem I/O. No network retry semantics needed. This is the first ingest module in QuantRank that diverges from the canonical compute/ingest/osap.py:52-56 retry decorator (documented explicitly in the module docstring).

Tests (6 offline; NO `@network`)

| # | Test | Coverage |
|---|---|---|
| 1 | `test_alpha158_feature_manifest_has_158_entries` | Primary CI signal. Pure cardinality + uniqueness; survives even without `[factors]` extra. |
| 2 | `test_alpha158_feature_manifest_first_5_anchor` | K-bar leading features (`KMID, KLEN, KMID2, KUP, KUP2`) anchored against Qlib v0.9.7. |
| 3 | `test_alpha158_feature_manifest_matches_runtime_introspection` ⭐ | The drift detector. Hardcoded tuple must equal `Alpha158DL.get_feature_config()[1]`. Wrapped in `pytest.importorskip("qlib")`. |
| 4 | `test_qlib_data_cache_constant_under_repo_cache_dir` | Config sanity. Locks gitignore coverage via `compute/cache/` parent glob (`.gitignore:221`). |
| 5 | `test_init_qlib_passes_us_region_and_provider_uri` | Monkeypatch capture; asserts `region="us"` + path passthrough. |
| 6 | `test_init_qlib_defaults_to_config_cache_when_no_uri` | Default `provider_uri` = `config.QLIB_DATA_CACHE`. |
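Tests 5 and 6 rely on capturing the arguments passed to `qlib.init` without running a real initialization. The sketch below shows one way that capture could work, under stated assumptions: the `init_qlib` body is a guess at the thin wrapper the PR describes (its idempotency handling is omitted), `CACHE_DIR` is assumed to be `compute/cache`, and `capture_init_call` is a hypothetical helper that swaps a stub module into `sys.modules`, which is roughly what pytest's `monkeypatch` would do in the real tests.

```python
import sys
import types
from pathlib import Path

# Assumption: CACHE_DIR is compute/cache; the PR only states
# QLIB_DATA_CACHE = CACHE_DIR / "qlib" / "us_data".
QLIB_DATA_CACHE = Path("compute/cache") / "qlib" / "us_data"


def init_qlib(provider_uri=None):
    """Thin wrapper: local import, US region, default to the config cache path."""
    import qlib  # local import so the module loads without the [factors] extra

    uri = str(provider_uri) if provider_uri is not None else str(QLIB_DATA_CACHE)
    qlib.init(provider_uri=uri, region="us")


def capture_init_call(provider_uri=None) -> dict:
    """Run init_qlib against a stub `qlib` module and return the captured kwargs."""
    captured = {}
    stub = types.ModuleType("qlib")
    stub.init = lambda **kwargs: captured.update(kwargs)
    original = sys.modules.get("qlib")
    sys.modules["qlib"] = stub
    try:
        init_qlib(provider_uri)
    finally:
        # Put back whatever was there before, so other code sees the real package.
        if original is None:
            sys.modules.pop("qlib", None)
        else:
            sys.modules["qlib"] = original
    return captured


explicit = capture_init_call("/tmp/us_bins")
default = capture_init_call()
print(explicit)
print(default)
```

Because the stub replaces the whole module, no global Qlib state is touched, which matches the PR's note that no actual init is called during the six scout tests.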

Verification ladder (8-step; STOP at step 8)

| Step | Command | Result |
|---|---|---|
| 1 | `ruff check .` | ✅ clean |
| 2 | `pytest tests/ -m "not network"` | ✅ 930 passed (924 prior + 6 new) |
| 3 | `pytest -m network --run-network` | (unchanged at 20; no new `@network`) |
| 4 | `python -m compute.output.schema_check` | ✅ in-sync (no schema delta) |
| 5 | `python -c "from compute.ingest.qlib_features import init_qlib, fetch_alpha158_features, ALPHA158_FEATURE_NAMES; print('OK', len(ALPHA158_FEATURE_NAMES))"` | ✅ `OK 158` |
| 6 | `git push -u origin claude/resume-quantrank-phase-4.5-Zh0pO` | ✅ at `68ed2386` |
| 7 | Open PR as Draft (this PR) | ✅ |
| 8 | `subscribe_pr_activity` + STOP for user audit | ⏳ next |

Ask-first surfaces touched

- `pyproject.toml` `[factors]`: extended with `pyqlib>=0.9.7,<0.10` (authorized in advance via plan-mode approval)
- `.github/workflows/ci.yml`: UNCHANGED (`[dev,factors]` install already covers the new dep)
- `.github/workflows/compute-rankings.yml`: UNTOUCHED per user hard constraint
- Schema triple (`schemas.py` / `types.ts` / `schema-snapshot.json`): UNTOUCHED (no schema delta this scout)

Out of scope (deferred to follow-on integration PR — ~5-commit cluster mirroring Phase 4h shape)

- yfinance-to-Qlib BYO adapter (~150 LOC + custom S&P 500 instruments universe registration). Converts `compute/cache/prices/*.parquet` to Qlib `.bin` format.
- Full Alpha158 feature compute on the 502-ticker universe → 502 × N_dates × 158 DataFrame.
- Per-feature cross-validation framework: PBO/DSR doesn't directly apply to per-stock-per-date features. Walk-forward IC scoring per feature is the likely replacement; Phase 5 backtest infra is the canonical version.
- Schema additions (`StockDetail.qlib_features` + `Metadata.qlib_features_used` + IC observability) → schema bump `0.9.1-phase4h.2` → `0.10.0-phase4j`.
- `compute/main.py` wiring decision: observability-only? blended into composite? Phase-5 ML-meta-learner-only consumer?
- Top-5 rotation impact analysis (Rule 16 lock applies as for prior factor libraries).

Risks (from plan, with post-implementation resolution)

| # | Risk (planned) | Status post-scout |
|---|---|---|
| 1 | NO real `@network` test: divergence from Phase 4h/4i pattern | Documented + replaced with manifest-matches-runtime introspection (stronger drift detector) |
| 2 | Qlib's `dump_bin` API may have changed across 0.9.6 → 0.9.7 | Defer to integration PR; pyqlib pinned to `<0.10` so any future drift surfaces deliberately |
| 3 | `pip install pyqlib` CI cold-start ~150-180 MB | Documented in PR body; pip wheel caching mitigates subsequent runs |
| 4 | `pytest.importorskip("qlib")` masks failures when extra isn't installed | Acceptable: matches `tests/test_features/test_osap_e2e_integration.py` pattern; CI installs `[factors]` |
| 5 | Alpha158 hardcoded manifest drifts vs upstream | Test #3 catches drift; manifest is hand-updated on deliberate `pyqlib` bump |
| 6 | `qlib.init` global state pollutes other tests | Tests use monkeypatch + tmp_path; no actual init called during the 6 scout tests |

Test plan

Commit local — all 6 tests green; 930 total offline; ruff clean; schema in-sync
CI green on 68ed2386 (Python lint+test + Frontend build + Vercel preview)
Audit: install footprint disclosure + no-@network rationale + manifest drift detector design
User authorizes Draft → Ready flip
(post-merge) Section I post-merge no-op (no UI surface change; preview build sanity-check sufficient)
(post-merge) File "Phase 4j.1 — Full integration" follow-up tracking issue with explicit BYO-adapter precondition

🤖 Drafted with Claude Code via the Anthropic SDK.


…e + 158-feature manifest

Phase 4j scout PR. Mirrors the proven Phase 4i scout pattern (PR #114) for Microsoft Qlib's Alpha158 feature library. Scope is install + API surface + manifest verification ONLY; the yfinance-to-Qlib BYO adapter + full Alpha158 feature compute on the 502-ticker universe ships in a follow-on integration PR.

**Pre-plan access-path discovery** (verified 2026-05-19; full record in ``compute/ingest/qlib_features.py`` module docstring):

1. **PyPI package**: ``pyqlib`` 0.9.7 (also 0.9.6 available). Other candidate names (``qlib``, ``microsoft-qlib``) return 404.
2. **License**: MIT (verified via wheel METADATA inspection: ``Classifier: License :: OSI Approved :: MIT License``). **No CC BY-NC complication** like JKP. Safe for Phase 6+ commercial roadmap.
3. **Data init**: ``qlib.init(provider_uri=..., region=REG_US)`` where ``REG_US = "us"``. **NO public US data bundle published by Qlib**: the ``provider_uri`` defaults to ``~/.qlib/qlib_data/cn_data`` (Chinese A-share, irrelevant for QuantRank); the US universe is BYO via local ``.bin`` files.
4. **Alpha158 surface**: ``qlib.contrib.data.handler.Alpha158`` → ``handler.fetch(col_set="feature")`` returns a DataFrame with ``(datetime, instrument)`` MultiIndex × 158 feature columns. The 158-name manifest is fetched via ``Alpha158DL.get_feature_config()[1]``, captured at scout time and hardcoded for stability; offline test 3 below locks it against upstream drift.

**Module** (``compute/ingest/qlib_features.py``, 186 LOC including docstring):

- Module-name choice locked per architectural review: NOT ``compute/ingest/qlib.py``. Python's import resolution would treat the latter as the ``qlib`` package and shadow the actual installed PyPI package, breaking the entire integration. Distinct module name avoids the namespace collision.
- ``QLIB_INSTRUMENTS_UNIVERSE = "sp500"``: custom universe ID; integration PR registers this against Qlib's instruments API.
- ``ALPHA158_FEATURE_NAMES: tuple[str, ...]``: 158-name manifest hardcoded from ``Alpha158DL.get_feature_config()[1]`` at scout implementation time against pyqlib 0.9.7. Cardinality asserted at module load against ``config.ALPHA158_FEATURE_COUNT``.
- ``init_qlib(provider_uri=None)``: idempotent thin wrapper around ``qlib.init(provider_uri=..., region="us")``. Local import so the scout module loads even when ``[factors]`` extra isn't installed.
- ``fetch_alpha158_features(*, instruments, start_time, end_time)``: forward-compat wrapper around ``Alpha158(...).fetch(col_set="feature")``. NOT exercised end-to-end by the scout (see §"No ``@network`` test" below).

**Config** (``compute/config.py``, +23 LOC): new ``# --- Phase 4j scout: Microsoft Qlib (Alpha158) integration ---`` block adds:

- ``QLIB_DATA_CACHE: Path = CACHE_DIR / "qlib" / "us_data"`` (gitignored; the ``compute/cache/`` parent glob at .gitignore:221 covers it).
- ``QLIB_DATA_MAX_AGE_DAYS: int = 31`` (BYO bundle, monthly refresh).
- ``ALPHA158_FEATURE_COUNT: int = 158``.

**pyproject.toml**: ``[factors]`` extra extended with ``pyqlib>=0.9.7,<0.10``. The ``<0.10`` cap pins against Qlib 0.10+ which may drift the feature set; offline test 3 will catch any drift on a deliberate version bump.

**Tests** (``tests/test_ingest/test_qlib_features.py``, 113 LOC, 6 offline, NO ``@network``):

1. ``test_alpha158_feature_manifest_has_158_entries``: primary CI signal. Pure cardinality + uniqueness check; survives even when the ``[factors]`` extra isn't installed.
2. ``test_alpha158_feature_manifest_first_5_anchor``: anchors the K-bar leading features (``KMID, KLEN, KMID2, KUP, KUP2``) against the canonical Qlib v0.9.7 surface.
3. ``test_alpha158_feature_manifest_matches_runtime_introspection``: hardcoded tuple must equal ``Alpha158DL.get_feature_config()[1]``. Wrapped in ``pytest.importorskip("qlib")``. The drift detector.
4. ``test_qlib_data_cache_constant_under_repo_cache_dir``: config sanity + locks gitignore coverage via the ``compute/cache/`` parent glob.
5. ``test_init_qlib_passes_us_region_and_provider_uri``: monkeypatch capture; asserts ``region="us"`` + provided ``provider_uri`` are passed through.
6. ``test_init_qlib_defaults_to_config_cache_when_no_uri``: default ``provider_uri`` resolves to ``config.QLIB_DATA_CACHE``.

**Critical scope decision (NO ``@network`` test for this scout)**: Phase 4h scout (PR #110) and Phase 4i scout (PR #114) each had a ``@pytest.mark.network`` test that hit a remote CDN. **Qlib has no remote CDN**: its data flow is local-bin filesystem I/O, not download-from-network. The originally planned synthetic-OHLCV → ``.bin`` conversion → ``init_qlib`` → ``Alpha158.fetch`` smoke test was DROPPED post-investigation: pyqlib's PyPI wheel does NOT bundle the ``scripts/dump_bin.py`` utility needed for OHLCV → ``.bin`` conversion. That scaffolding is integration-PR scope. Test #3 (runtime introspection match) is the **replacement verification surface**, actually a stronger drift detector than the dropped end-to-end test would have been, because it asserts the hardcoded manifest matches upstream on every ``pip install``.

**CI install footprint impact**: ~150-180 MB net-new. ``pyqlib`` pulls ~22 transitive deps including ``mlflow`` (~20 MB), ``lightgbm`` (~15 MB), ``cvxpy`` (~30 MB), ``pymongo``, ``redis`` client, ``gym``, ``jupyter``, ``nbconvert``. None of these heavy deps are actually consumed by the scout; they come along for the ride because pyqlib doesn't expose a ``[minimal]`` extra. CI cold-start latency bump is one-time per workflow; pip wheel caching mitigates subsequent runs.

**Tenacity policy NOT applied**: Qlib's data flow is local filesystem I/O. No network retry semantics needed. This is the first ingest module in QuantRank that diverges from the canonical ``compute/ingest/osap.py:52-56`` retry decorator (documented explicitly in the module docstring).

**Verification ladder** (steps 1-5 complete):

- ``ruff check .`` → clean ✅
- ``pytest tests/ -m "not network"`` → **930 passed** (924 baseline + 6 new offline) ✅
- ``pytest -m network --run-network`` → 20 (unchanged; NO new ``@network``) ✅
- ``python -m compute.output.schema_check`` → in-sync (NO schema delta this scout) ✅
- ``python -c "from compute.ingest.qlib_features import init_qlib, fetch_alpha158_features, ALPHA158_FEATURE_NAMES; print('OK', len(ALPHA158_FEATURE_NAMES))"`` → ``OK 158`` ✅

Steps 6-8: ``git push`` → open Draft PR → ``subscribe_pr_activity`` + STOP for user audit + Mark-Ready authorization.

**Ask-first surfaces touched**: NONE for the workflow / schema triple. ``pyproject.toml [factors]`` extra extended in this commit (authorized in advance via the plan-mode approval). ``.github/workflows/ci.yml`` unchanged (``[dev,factors]`` install already covers the new pyqlib dep). ``.github/workflows/compute-rankings.yml`` UNTOUCHED per user hard constraint.

**Defense layer**: unchanged at 17. **Top-5 rotation**: unchanged. **Schema version**: unchanged at ``0.9.1-phase4h.2`` (no schema delta this scout).

**Out of scope** (deferred to follow-on full Phase 4j integration PR, ~5-commit cluster like Phase 4h):

- yfinance-to-Qlib BYO adapter (~150 LOC; ``compute/cache/prices/*.parquet`` → Qlib ``.bin`` format conversion)
- Full Alpha158 feature compute on 502-ticker universe (502 × N_dates × 158 DataFrame)
- Per-feature cross-validation framework (PBO/DSR doesn't directly apply to per-stock-per-date features; walk-forward IC scoring per feature is the likely replacement)
- Schema additions (``StockDetail.qlib_features`` + ``Metadata.qlib_features_used`` + IC observability) → bump ``0.9.1-phase4h.2 → 0.10.0-phase4j``
- ``compute/main.py`` wiring decision (observability-only? blended into composite? Phase-5 ML-meta-learner-only consumer?)
- Top-5 rotation impact analysis (Rule 16 lock applies)

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU
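The module-load cardinality assertion and the uniqueness check from test 1, both described in the commit message above, could look roughly like the sketch below. `validate_manifest` is a hypothetical helper name, and the five K-bar anchor names stand in for the full 158-name tuple, which is not reproduced in this PR text.

```python
def validate_manifest(names: tuple, expected_count: int) -> tuple:
    """Return `names` unchanged if it has the expected size and no duplicates."""
    if len(names) != expected_count:
        raise ValueError(f"expected {expected_count} names, got {len(names)}")
    duplicates = sorted({name for name in names if names.count(name) > 1})
    if duplicates:
        raise ValueError(f"duplicate feature names: {duplicates}")
    return names


# Toy stand-in: the real module would pass the 158-name tuple and
# config.ALPHA158_FEATURE_COUNT instead of these five anchors.
FIRST_FIVE = ("KMID", "KLEN", "KMID2", "KUP", "KUP2")
ALPHA_TOY_MANIFEST = validate_manifest(FIRST_FIVE, expected_count=5)
print(len(ALPHA_TOY_MANIFEST))
```

Running the check at import time means a bad hand edit to the manifest fails immediately, even in an environment where the `[factors]` extra is missing and the runtime-introspection test is skipped.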

vercel · 2026-05-19T09:33:57Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
quantrank	Ready	Preview, Comment	May 19, 2026 9:33am


…all + InstrumentedPCA 8-method API surface lock + 6 offline tests (#121)

Phase 4k scout: **FINAL of 4 factor-library scouts** (OSAP ✅ #110, JKP ✅ #114, Qlib ✅ #119, IPCA THIS). Ships `ipca` install + 8-method public-API surface lock + 6 offline tests + inline synthetic fixture. NO production wiring; characteristics-matrix construction + universe-wide IPCA fit + composite blend decision are integration-PR scope (Phase 4k.1, tracked as follow-up).

**After this merges → all 4 factor scouts done → eligible for v1.1.0-phase4 tag readiness audit** (gated on 4h.2 Part 2 + 4i.1 + 4j.1 + 4k.1 integration PRs landing; ~6-8w combined effort).

5 pre-plan investigations (verified 2026-05-19, carried verbatim into module docstring):

1. PyPI package: `ipca` 0.6.7 (29 historical versions; last release 2021-04-22, ~5 years stale). Pin tight `>=0.6.7,<0.7`.
2. License: MIT verbatim from LICENSE.md (Buechner / Bybee 2019). No CC BY-NC complication unlike JKP. Safe for Phase 6+ commercial roadmap.
3. sklearn-compatible API surface, 8 public methods: fit / get_factors / fit_path / predict / predict_panel / predict_portfolio / score / predictOOS. Post-fit attrs: Gamma (L×K) + Factors (K×T) + metad dict + n_factors_eff + has_PSF + PSFcase. NO transform/fit_transform (user brief assumed presence; they don't exist in 0.6.7).
4. Data requirements: MultiIndex (entity, date) DataFrame OR explicit indices array. Min stable shape 10 firms × 20 years × 2 chars (maintainer's test_ipca.py). NaN handling internal. Unbalanced panels supported. 502-ticker scale uses data_type="portfolio" ALS path (integration-PR scope).
5. CI install footprint: ~50-80 MB net-new (numba ~50 MB + llvmlite ~30 MB + small progressbar). Substantially lighter than Qlib's 150-180 MB.

IPCA structural shape (4th distinct vs prior scouts):

- OSAP (4h): factor returns CSV → proxy/36m regression
- JKP (4i): factor returns CSV → 36m regression
- Qlib (4j): per-stock per-date features → native Alpha158
- IPCA (4k): panel decomposition → Gamma (L×K loadings) + Factors (K×T) latent returns

Critical scope decision: NO @network test (mirrors Phase 4j Qlib rationale). IPCA is pure local sklearn-style computation. No remote endpoint to network-test. Scout ships 6 offline tests / 0 @network. The synthetic-fixture smoke test exercises the full fit→Gamma/Factors path locally.

Architectural locks:

- Module placement `compute/features/ipca_factors.py` (NOT compute/ingest/) per pre-existing `.claude/skills/phase-4/ipca-factor-fit/PLAN.md:24` + `compute/features/osap_replicate.py` precedent. NO namespace collision (module=`ipca_factors`, package=`ipca`); Phase 4j's `qlib_features.py` workaround doesn't apply here.
- INSTRUMENTED_PCA_PUBLIC_API 8-method tuple: drift detector; module-load assertion against config.IPCA_PUBLIC_API_METHOD_COUNT.
- IPCA_DEFAULT_N_FACTORS=5, IPCA_DEFAULT_INTERCEPT=True (KPS 2019 baseline): validated by smoke test, NOT module-load assert.
- Tenacity NOT applied: pure local sklearn-style; no network retry. Second module after Phase 4j that diverges from osap.py:52-56 pattern; documented in module docstring.
- Synthetic fixture inline as @pytest.fixture (NOT committed CSV/parquet): IPCA inputs are numpy arrays, no roundtrip needed.

Module layer (compute/features/ipca_factors.py, ~190 LOC):

- IPCA_FITTED_ARTIFACTS_CACHE re-export from config
- INSTRUMENTED_PCA_PUBLIC_API 8-tuple + module-load invariants (cardinality + uniqueness)
- IPCA_DEFAULT_N_FACTORS / IPCA_DEFAULT_INTERCEPT constants
- init_ipca(n_factors, intercept, **kwargs) → unfitted InstrumentedPCA
- fit_ipca_panel(estimator, *, X, y, indices, **fit_kwargs) → fitted estimator (returns-self)

Config layer (compute/config.py, +28 LOC):

- IPCA_FITTED_ARTIFACTS_CACHE: Path = CACHE_DIR / "ipca"
- IPCA_FITTED_ARTIFACTS_MAX_AGE_DAYS: int = 31
- IPCA_PUBLIC_API_METHOD_COUNT: int = 8

Tests (6 offline; ~228 LOC):

1. test_ipca_imports_and_exposes_instrumented_pca: primary CI signal (importorskip)
2. test_instrumented_pca_public_api_manifest_locks_8_methods: pure assertion (no ipca runtime)
3. test_instrumented_pca_public_api_matches_runtime_introspection: drift detector
4. test_ipca_fitted_artifacts_cache_under_repo_cache_dir: config sanity
5. test_init_ipca_returns_unfitted_estimator_with_kps_defaults: KPS defaults validation
6. test_fit_ipca_panel_on_synthetic_5x30x10_fixture: smoke fit; asserts Gamma (10,2) + Factors (2,30) + metad N/T/L

pyproject.toml: append `ipca>=0.6.7,<0.7` to [factors] (authorized in advance via plan-mode approval; pin range because 2021-04-22 staleness).

Ask-first surfaces touched:

- pyproject.toml [factors]: extended (authorized via plan-mode)
- ci.yml UNCHANGED ([dev,factors] install already covers new dep)
- compute-rankings.yml UNTOUCHED per user hard constraint
- Schema triple UNTOUCHED (no schema delta this scout)

Verification (local sandbox without [factors] + CI with [dev,factors]):

- ruff check . → clean
- python -m compute.output.schema_check → in-sync
- Import smoke: from compute.features.ipca_factors import init_ipca, fit_ipca_panel, INSTRUMENTED_PCA_PUBLIC_API → OK 8
- pytest tests/ -m "not network" excluding factor-extra files → 864 passed locally
- 2/6 IPCA tests PASS locally; 4/6 SKIP via pytest.importorskip (expected; local lacks [factors])
- CI on 82ade3a with [dev,factors] → both Python+Frontend GREEN; 936 offline expected

Defense layer unchanged at 17. Top-5 rotation unchanged. Schema unchanged at 0.9.1-phase4h.2.

Out of scope (deferred to follow-on Phase 4k.1 integration PR, ~5-commit cluster):

- Characteristics-matrix construction (which features feed X?)
- Full IPCA fit on 502-ticker universe (data_type="portfolio" canonical scaling)
- Walk-forward / rolling-window fit cadence
- Latent-factor composite integration decision
- Schema additions (StockDetail.ipca_loadings + Metadata.ipca_n_factors_eff + ipca_in_sample_r2) → bump 0.9.1-phase4h.2 → 0.10.0-phase4k
- PBO/DSR doesn't apply (loadings ≠ portfolio returns); IC walk-forward observability instead per PLAN.md:36
- Top-5 rotation impact analysis (Rule 16 lock)
- WRDS data backfill consideration

Audit history:

- Plan-audit round 1: 5 investigations verified · MIT lock · heavy-deps disclosure
- Plan-audit round 2: Q1 (public-API surface lock) + Q2 (inline pytest.fixture) design choices applied
- Plan-audit round 3: line citations verified
- Implementation: main session direct (Phase 4j paste-loop precedent; worker session was stuck re-presenting plan)
- CI green on 82ade3a: Python+Frontend both passing · Vercel ✅ READY
- Conditional Mark-Ready authorization given · user confirmed CI green · squash merged

Closes the factor-library scout cluster. Next: v1.1.0-phase4 tag readiness audit gated on 4 integration PRs.

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2

Part of epic #125 (Item #6 of 6). Pure tooling addition: no runtime / scoring / schema impact.

Motivation
----------
PR #123 (2026-05-19, closed without merging): a worker session opened a Phase 4j + 4k scout duplicate on branch `claude/resume-quantrank-phase-4.5-Zh0pO` while the main session shipped the same work directly via PRs #119 (Qlib) + #121 (IPCA). Root cause: the worker session never inspected the `claude/*` branch list + recent PRs before writing code, producing 100% wasted effort.

This change ships a preflight check that surfaces in-flight scope BEFORE any code is written, so the duplicate-PR failure mode is caught at the handoff-prompt entry rather than at PR review.

Files (2 new, +271 LOC)
------------------------
- tools/check_branch_collisions.py (+149 LOC): git-only preflight script. Lists active `claude/*` branches via `git ls-remote origin "refs/heads/claude/*"` and recent main-branch commits via `git log --since="48 hours ago" --oneline --no-merges origin/main`. Optional keyword args flag case-insensitive substring matches. Always exit 0 (informational only).
- .claude/skills/branch-collision-check/SKILL.md (+122 LOC): skill description with YAML frontmatter, trigger conditions (handoff prompts, Phase / issue / Item #N mentions, fresh worker sessions), skip conditions (doc-only chores, iteration #2+, user-authorized parallel work), sample output (clean + warning), and output-interpretation guidance pointing the caller to STOP + ask the user when any ⚠️ line surfaces.

Design notes
------------
- Git-only data sources: no `gh` CLI / GitHub API auth required. Works in the QuantRank Claude Code Web sandbox where `gh` is unavailable, and on any contributor machine with bare git.
- 48-hour window: matches typical worker ↔ main session handoff cadence; long enough to catch duplicate work, short enough to keep the output scannable.
- Pure read-only: no destructive git ops, no branch creation, no push, no GitHub API mutation. Always returns exit 0; the caller decides whether to proceed.

Verification ladder all green
------------------------------
- ruff check . → All checks passed
- python tools/check_branch_collisions.py → lists 1 active claude/* branch + 16 recent commits (last 48h), exit 0
- python tools/check_branch_collisions.py "Alpha158" → fires ⚠️ on PR #119 commit "Alpha158 158-feature manifest", summary reports "1 potential scope collision(s) found", exit 0
- python tools/check_branch_collisions.py "Phase 99 nonsense" → no match, summary reports "No scope collisions detected", exit 0
- python tools/check_doc_test_counts.py → exit 0 (Item #2 guard still passes; new files don't introduce hardcoded counts)
- python -m compute.output.schema_check → in sync (no schema touch)
- python -m pytest tests/ -m "not network" → 959 passed (unchanged; tools/ + .claude/skills/ aren't imported by tests)
- SKILL.md YAML frontmatter parses: confirmed via Claude Code's skill registry picking it up at module load

Constraints honored
-------------------
- No touch to compute/ / frontend/ / tests/; tools/ + .claude/skills/ only
- No network calls / no GitHub API auth; git remote ls + git log
- No destructive actions; read-only preflight check
- No push to main; no force-push; no --no-verify
- No workflow_dispatch trigger (compute-rankings.yml untouched)

Epic #125 status after this PR
-------------------------------
- Item #1 ✅ Hypothesis property tests (PR #127)
- Item #2 ✅ Strip hardcoded test counts + CI guard (PR #128)
- Item #4 ✅ Observability-before-wiring pattern (PR #129)
- Item #6 ✅ Branch-collision preflight (this PR)
- Items #3, #5 remain; separate PRs per epic decomposition.

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU

Co-authored-by: Claude <noreply@anthropic.com>
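The keyword matching described in the commit message above (case-insensitive substring matches against branch names and recent commit subjects) can be sketched as follows. This is an illustration only: `find_collisions` is a hypothetical name rather than the function in `tools/check_branch_collisions.py`, and the sample lines and their short hashes are made up to resemble `git log --oneline` output.

```python
from typing import Iterable, List, Tuple


def find_collisions(lines: Iterable[str], keywords: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (keyword, line) pairs where the keyword occurs in the line, ignoring case."""
    needles = [kw for kw in keywords if kw.strip()]
    hits = []
    for line in lines:
        lowered = line.lower()
        for kw in needles:
            if kw.lower() in lowered:
                hits.append((kw, line))
    return hits


# Made-up sample input; real input would come from the git commands named above.
recent_commits = [
    "aaaaaaa feat(ingest): Qlib scout, Alpha158 158-feature manifest",
    "bbbbbbb docs: refresh PHASE_STATUS.md",
]

alpha_hits = find_collisions(recent_commits, ["alpha158"])
no_hits = find_collisions(recent_commits, ["Phase 99 nonsense"])

for keyword, line in alpha_hits:
    print(f"⚠️ '{keyword}' matches: {line}")
print(f"{len(alpha_hits)} potential scope collision(s) found")
```

Keeping the matcher a pure function over lines of text would leave the git calls as the only side-effecting part, consistent with the read-only, always-exit-0 design the commit describes.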

…imization PR F) (#146)

Sixth PR in the .md optimization sequence (Option D). Audit of 18 QR-origin skill descriptions found all are well-formed (parseable YAML, TRIGGER + SKIP clauses present, average 888 chars). The critical YAML bug (#119+#121 plain-scalar bug in branch-collision-check and pr-quality-gate) was already fixed in PR A. So PR F's remaining work is light polish, not structural change.

Vendored skills (20) FROZEN per the boundary convention: Anthropic skills, mattpocock-* (8), karpathy-guidelines, thananon/9arm-skills (4), karpathy-llm-wiki are all upstream-only edits.

Trim targets (cut redundancy, fix drift, add Thai triggers):

1. pr-quality-gate (1207 → ~1015): cut redundant "ALSO use right before flipping Draft→Ready" clause that duplicated the first TRIGGER ("before authorizing the Draft→Ready flip"). Tightened wrapping.
2. pr-iteration-flow (990 → ~890): cut redundant "ALSO use this skill as the default workflow harness any time a PR is open" that duplicated the TRIGGER list. Dropped stale "PR-3c → PR-3d → PR-20" historical reference. Added Thai trigger phrases "เช็ค CI" / "ดู PR" since the user invokes this skill in Thai.
3. phase-status-bump (918 → ~840): dropped two historical examples ("PR 3d → tag v0.6.0-phase3d" and "3a→3b, 3c→3d") that anchored the description to one shipped phase. Wording now phase-agnostic.
4. verify-production-output (1086 → ~870): compressed the "Surfaces..." enumeration of Section A-H content (was 8 detailed items; now 8 short items) without losing dispatch specificity. Added Thai trigger phrases "ตรวจ output" / "เช็ค production". Folded "ALSO use" into first TRIGGER as one phrase.

YAML moved from plain scalar to `description: >` (folded block) on the 3 plain-scalar descriptions edited (pr-iteration-flow, phase-status-bump, verify-production-output), the same safety pattern PR A applied. Prevents the ' #' comment-eating bug from re-emerging if anyone adds a `#issue` reference later.

Net token impact: ~-650 chars × ~0.25 tokens/char ≈ -162 tokens per session-start. Modest but compounds.

Why not aggressive trim:

- Each TRIGGER phrase + SKIP clause IS dispatch-useful (verified by sampling). Aggressive 50% cuts would risk dispatch quality.
- Remaining 14 QR-origin skills already at 700-900 chars with no redundancy to remove.

CLAUDE.md (181 → 181, lockstep): §Phase status: added PR #145 (E) to "Recently merged"; replaced "PR E in flight" with "PR F in flight" note explaining the audit found health.

AGENTS.md (343 → 343, lockstep): §Phase + version state: optimization sequence tracker updated: PR E ✅, PR F in flight, PR G remaining.

Next: PR G (PHASE_STATUS.md "Current State" summary at top + chronological table below).

Co-authored-by: Claude <noreply@anthropic.com>

vercel Bot deployed to Preview May 19, 2026 09:33 View deployment

dackclup marked this pull request as ready for review May 19, 2026 10:47

dackclup merged commit f0ade65 into main May 19, 2026
4 checks passed

dackclup deleted the claude/resume-quantrank-phase-4.5-Zh0pO branch May 19, 2026 10:48

This was referenced May 19, 2026

Phase 4j.1 — Full Qlib Alpha158 integration (BYO adapter + 502-ticker feature compute + per-feature IC validation) #120

Open

feat(features): IPCA scout — ipca MIT install + InstrumentedPCA 8-method API surface lock + 6 synthetic-fixture tests #121

Merged

dackclup merged 1 commit into main from claude/resume-quantrank-phase-4.5-Zh0pO
