feat(features): IPCA scout — ipca MIT install + InstrumentedPCA 8-method API surface lock + 6 synthetic-fixture tests#123
Conversation
…e + 158-feature manifest Phase 4j scout PR. Mirrors the proven Phase 4i scout pattern (PR #114) for Microsoft Qlib's Alpha158 feature library. Scope is install + API surface + manifest verification ONLY; the yfinance-to-Qlib BYO adapter + full Alpha158 feature compute on the 502-ticker universe ships in a follow-on integration PR. **Pre-plan access-path discovery** (verified 2026-05-19; full record in ``compute/ingest/qlib_features.py`` module docstring): 1. **PyPI package**: ``pyqlib`` 0.9.7 (also 0.9.6 available). Other candidate names (``qlib``, ``microsoft-qlib``) return 404. 2. **License**: MIT (verified via wheel METADATA inspection — ``Classifier: License :: OSI Approved :: MIT License``). **No CC BY-NC complication** like JKP. Safe for Phase 6+ commercial roadmap. 3. **Data init**: ``qlib.init(provider_uri=..., region=REG_US)`` where ``REG_US = "us"``. **NO public US data bundle published by Qlib** — the ``provider_uri`` defaults to ``~/.qlib/qlib_data/cn_data`` (Chinese A-share, irrelevant for QuantRank); the US universe is BYO via local ``.bin`` files. 4. **Alpha158 surface**: ``qlib.contrib.data.handler.Alpha158`` → ``handler.fetch(col_set="feature")`` returns a DataFrame with ``(datetime, instrument)`` MultiIndex × 158 feature columns. The 158-name manifest is fetched via ``Alpha158DL.get_feature_config()[1]`` — captured at scout time and hardcoded for stability; offline test 3 below locks it against upstream drift. **Module** (``compute/ingest/qlib_features.py``, 186 LOC including docstring): - Module-name choice locked per architectural review: NOT ``compute/ingest/qlib.py``. Python's import resolution would treat the latter as the ``qlib`` package and shadow the actual installed PyPI package, breaking the entire integration. Distinct module name avoids the namespace collision. - ``QLIB_INSTRUMENTS_UNIVERSE = "sp500"`` — custom universe ID; integration PR registers this against Qlib's instruments API. - ``ALPHA158_FEATURE_NAMES: tuple[str, ...]`` — 158-name manifest hardcoded from ``Alpha158DL.get_feature_config()[1]`` at scout implementation time against pyqlib 0.9.7. Cardinality asserted at module load against ``config.ALPHA158_FEATURE_COUNT``. - ``init_qlib(provider_uri=None)`` — idempotent thin wrapper around ``qlib.init(provider_uri=..., region="us")``. Local import so the scout module loads even when ``[factors]`` extra isn't installed. - ``fetch_alpha158_features(*, instruments, start_time, end_time)`` — forward-compat wrapper around ``Alpha158(...).fetch(col_set= "feature")``. NOT exercised end-to-end by the scout (see §"No ``@network`` test" below). **Config** (``compute/config.py``, +23 LOC): new ``# --- Phase 4j scout: Microsoft Qlib (Alpha158) integration ---`` block adds: - ``QLIB_DATA_CACHE: Path = CACHE_DIR / "qlib" / "us_data"`` (gitignored — ``compute/cache/`` parent glob at .gitignore:221 covers it). - ``QLIB_DATA_MAX_AGE_DAYS: int = 31`` (BYO bundle, monthly refresh). - ``ALPHA158_FEATURE_COUNT: int = 158``. **pyproject.toml**: ``[factors]`` extra extended with ``pyqlib>=0.9.7,<0.10``. The ``<0.10`` cap pins against Qlib 0.10+ which may drift the feature set; offline test 3 will catch any drift on a deliberate version bump. **Tests** (``tests/test_ingest/test_qlib_features.py``, 113 LOC, 6 offline — NO ``@network``): 1. ``test_alpha158_feature_manifest_has_158_entries`` — primary CI signal. Pure cardinality + uniqueness check; survives even when the ``[factors]`` extra isn't installed. 2. ``test_alpha158_feature_manifest_first_5_anchor`` — anchors the K-bar leading features (``KMID, KLEN, KMID2, KUP, KUP2``) against the canonical Qlib v0.9.7 surface. 3. ``test_alpha158_feature_manifest_matches_runtime_introspection`` — hardcoded tuple must equal ``Alpha158DL.get_feature_config() [1]``. Wrapped in ``pytest.importorskip("qlib")``. The drift detector. 4. ``test_qlib_data_cache_constant_under_repo_cache_dir`` — config sanity + locks gitignore coverage via the ``compute/cache/`` parent glob. 5. ``test_init_qlib_passes_us_region_and_provider_uri`` — monkeypatch capture; asserts ``region="us"`` + provided ``provider_uri`` are passed through. 6. ``test_init_qlib_defaults_to_config_cache_when_no_uri`` — default ``provider_uri`` resolves to ``config.QLIB_DATA_CACHE``. **Critical scope decision — NO ``@network`` test for this scout**: Phase 4h scout (PR #110) and Phase 4i scout (PR #114) each had a ``@pytest.mark.network`` test that hit a remote CDN. **Qlib has no remote CDN** — its data flow is local-bin filesystem I/O, not download-from-network. The originally planned synthetic-OHLCV → ``.bin`` conversion → ``init_qlib`` → ``Alpha158.fetch`` smoke test was DROPPED post-investigation: pyqlib's PyPI wheel does NOT bundle the ``scripts/dump_bin.py`` utility needed for OHLCV → ``.bin`` conversion. That scaffolding is integration-PR scope. Test #3 (runtime introspection match) is the **replacement verification surface** — actually a stronger drift detector than the dropped end-to-end test would have been, because it asserts the hardcoded manifest matches upstream on every ``pip install``. **CI install footprint impact**: ~150-180 MB net-new. ``pyqlib`` pulls ~22 transitive deps including ``mlflow`` (~20 MB), ``lightgbm`` (~15 MB), ``cvxpy`` (~30 MB), ``pymongo``, ``redis`` client, ``gym``, ``jupyter``, ``nbconvert``. None of these heavy deps are actually consumed by the scout — they come along for the ride because pyqlib doesn't expose a ``[minimal]`` extra. CI cold- start latency bump is one-time per workflow; pip wheel caching mitigates subsequent runs. **Tenacity policy NOT applied**: Qlib's data flow is local filesystem I/O. No network retry semantics needed. This is the first ingest module in QuantRank that diverges from the canonical ``compute/ingest/osap.py:52-56`` retry decorator (documented explicitly in the module docstring). **Verification ladder** (steps 1-5 complete): - ``ruff check .`` → clean ✅ - ``pytest tests/ -m "not network"`` → **930 passed** (924 baseline + 6 new offline) ✅ - ``pytest -m network --run-network`` → 20 (unchanged; NO new ``@network``) ✅ - ``python -m compute.output.schema_check`` → in-sync (NO schema delta this scout) ✅ - ``python -c "from compute.ingest.qlib_features import init_qlib, fetch_alpha158_features, ALPHA158_FEATURE_NAMES; print('OK', len(ALPHA158_FEATURE_NAMES))"`` → ``OK 158`` ✅ Steps 6-8: ``git push`` → open Draft PR → ``subscribe_pr_activity`` + STOP for user audit + Mark-Ready authorization. **Ask-first surfaces touched**: NONE for the workflow / schema triple. ``pyproject.toml [factors]`` extra extended in this commit (authorized in advance via the plan-mode approval). ``.github/workflows/ci.yml`` unchanged (``[dev,factors]`` install already covers the new pyqlib dep). ``.github/workflows/compute-rankings.yml`` UNTOUCHED per user hard constraint. **Defense layer**: unchanged at 17. **Top-5 rotation**: unchanged. **Schema version**: unchanged at ``0.9.1-phase4h.2`` (no schema delta this scout). **Out of scope** (deferred to follow-on full Phase 4j integration PR, ~5-commit cluster like Phase 4h): - yfinance-to-Qlib BYO adapter (~150 LOC; ``compute/cache/prices/ *.parquet`` → Qlib ``.bin`` format conversion) - Full Alpha158 feature compute on 502-ticker universe (502 × N_dates × 158 DataFrame) - Per-feature cross-validation framework (PBO/DSR doesn't directly apply to per-stock-per-date features — walk-forward IC scoring per feature is the likely replacement) - Schema additions (``StockDetail.qlib_features`` + ``Metadata.qlib_features_used`` + IC observability) → bump ``0.9.1-phase4h.2 → 0.10.0-phase4j`` - ``compute/main.py`` wiring decision (observability-only? blended into composite? Phase-5 ML-meta-learner-only consumer?) - Top-5 rotation impact analysis (Rule 16 lock applies) https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU
…hod API surface lock + 6 synthetic-fixture tests Phase 4k scout — 4th and final of 4 factor-library scouts (4h OSAP ✅, 4i JKP ✅, 4j Qlib ✅, 4k IPCA). Installs ipca==0.6.7 (MIT, Buechner+Bybee 2019, github.com/bkelly-lab/ipca) and locks the InstrumentedPCA public API surface at module load with the 8-name tuple INSTRUMENTED_PCA_PUBLIC_API = (fit, get_factors, fit_path, predict, predict_panel, predict_portfolio, score, predictOOS) — drift detector against any future ipca>0.6.7 upgrade silently dropping or renaming a method (upstream last released 2021-04-22, ~5 years stale). Reference: Kelly, Pruitt, Su (2019) "Characteristics are covariances: A unified model of risk and return" JFE 134(3) 501-524. Files - compute/features/ipca_factors.py (NEW, ~140 LOC) — init_ipca() factory + fit_ipca_panel() wrapper + 8-method API manifest with module-load assertion against config.IPCA_PUBLIC_API_METHOD_COUNT - tests/test_features/test_ipca_factors.py (NEW, ~190 LOC) — 6 offline tests with inline @pytest.fixture synthetic 5x30x10 panel (np.random.RandomState(42), pandas MultiIndex shape matches maintainer's canonical ipca/test_ipca.py example). 4/6 use pytest.importorskip("ipca") for graceful skip when [factors] extra absent - compute/config.py — Phase 4k block: IPCA_FITTED_ARTIFACTS_CACHE, IPCA_FITTED_ARTIFACTS_MAX_AGE_DAYS, IPCA_PUBLIC_API_METHOD_COUNT=8 - pyproject.toml — append ipca>=0.6.7,<0.7 to [factors] (after pyqlib); pinned to 0.6.x band due to upstream staleness - PHASE_STATUS.md row 4 — promote 4j scout to ✅ shipped (PR #119) and mark 4k scout in-flight Structural distinctness vs prior 3 scouts: IPCA takes a panel (N entities × T dates × L characteristics) and produces Gamma (L×K loadings) + Factors (K×T latent factor returns) via ALS decomposition. Different shape from 4h/4i (factor returns CSV) and 4j (per-stock OHLCV → features). Characteristics-matrix construction, universe-wide fit on the 502-ticker universe, composite-blend decision, and schema additions are integration-PR scope (~Phase 4k.1). NO @network test — IPCA is a pure local sklearn-style decomposition with no remote endpoint to retry against (mirrors Phase 4j Qlib rationale). Test count: 930 baseline offline + 6 new = 936 offline. @network slot unchanged at 20. Heavy-deps disclosure: net-new transitives are numba (~50 MB w/ llvmlite ~30 MB) + tiny progressbar. CI install footprint bump ~50-80 MB — substantially lighter than Qlib's 150-180 MB. Notable upstream API divergence: InstrumentedPCA lacks transform / fit_transform (no sklearn TransformerMixin). Panel-prediction path uses fit + .Gamma/.Factors attrs + predict_panel(). Verification ladder all green: - ruff check . → All checks passed - pytest tests/ -m "not network" → 936 passed (1m48s) - pytest tests/test_features/test_ipca_factors.py → 6 passed - python -m compute.output.schema_check → in sync (no schema delta) - python -c module load → INSTRUMENTED_PCA_PUBLIC_API len == 8 ✓ - 5x30x10 fit produces Gamma.shape == (10, 2), Factors.shape == (2, 30), metad == {N=5, T=30, L=10} ✓ After this scout merges, all 4 factor-library scouts complete and v1.1.0-phase4 tag becomes eligible (gated on 4h.2 Part 2 / 4i.1 / 4j.1 / 4k.1 integration PRs landing). https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Closing as duplicate — Phase 4j (Qlib) and Phase 4k (IPCA) scouts already shipped via separate PRs while this branch was in flight:
Files in this PR are functionally equivalent to what's already on Production cron has already run successfully on commit Next deliverable per current planning: Phase 4h.2 Part 2 (tracked in updated issue #116) — fixes the 56-signal silent-drop gap discovered in the first 0.9.1 production cron + adds the — closed by Phase 4 auditor session, branch Generated by Claude Code |
Part of epic #125 (Item #6 of 6). Pure tooling addition — no runtime / scoring / schema impact. Motivation ---------- PR #123 (2026-05-19, closed without merging): a worker session opened a Phase 4j + 4k scout duplicate on branch `claude/resume-quantrank-phase-4.5-Zh0pO` while the main session shipped the same work directly via PRs #119 (Qlib) + #121 (IPCA). Root cause: the worker session never inspected the `claude/*` branch list + recent PRs before writing code, producing 100% wasted effort. This change ships a preflight check that surfaces in-flight scope BEFORE any code is written, so the duplicate-PR failure mode is caught at the handoff-prompt entry rather than at PR review. Files (2 new, +271 LOC) ------------------------ - tools/check_branch_collisions.py (+149 LOC) — git-only preflight script. Lists active `claude/*` branches via `git ls-remote origin "refs/heads/claude/*"` and recent main-branch commits via `git log --since="48 hours ago" --oneline --no-merges origin/main`. Optional keyword args flag case-insensitive substring matches. Always exit 0 (informational only). - .claude/skills/branch-collision-check/SKILL.md (+122 LOC) — skill description with YAML frontmatter, trigger conditions (handoff prompts, Phase / issue / Item #N mentions, fresh worker sessions), skip conditions (doc-only chores, iteration #2+, user-authorized parallel work), sample output (clean + warning), and output-interpretation guidance pointing the caller to STOP + ask the user when any⚠️ line surfaces. Design notes ------------ - Git-only data sources — no `gh` CLI / GitHub API auth required. Works in the QuantRank Claude Code Web sandbox where `gh` is unavailable, and on any contributor machine with bare git. - 48-hour window — matches typical worker ↔ main session handoff cadence; long enough to catch duplicate work, short enough to keep the output scannable. - Pure read-only — no destructive git ops, no branch creation, no push, no GitHub API mutation. Always returns exit 0; the caller decides whether to proceed. Verification ladder all green ------------------------------ - ruff check . → All checks passed - python tools/check_branch_collisions.py → lists 1 active claude/* branch + 16 recent commits (last 48h), exit 0 - python tools/check_branch_collisions.py "Alpha158" → fires⚠️ on PR #119 commit "Alpha158 158-feature manifest", summary reports "1 potential scope collision(s) found", exit 0 - python tools/check_branch_collisions.py "Phase 99 nonsense" → no match, summary reports "No scope collisions detected", exit 0 - python tools/check_doc_test_counts.py → exit 0 (Item #2 guard still passes; new files don't introduce hardcoded counts) - python -m compute.output.schema_check → in sync (no schema touch) - python -m pytest tests/ -m "not network" → 959 passed (unchanged; tools/ + .claude/skills/ aren't imported by tests) - SKILL.md YAML frontmatter parses — confirmed via Claude Code's skill registry picking it up at module load Constraints honored ------------------- - No touch to compute/ / frontend/ / tests/ — tools/ + .claude/skills/ only - No network calls / no GitHub API auth — git remote ls + git log - No destructive actions — read-only preflight check - No push to main; no force-push; no --no-verify - No workflow_dispatch trigger (compute-rankings.yml untouched) Epic #125 status after this PR ------------------------------- Item #1 ✅ Hypothesis property tests (PR #127) Item #2 ✅ Strip hardcoded test counts + CI guard (PR #128) Item #4 ✅ Observability-before-wiring pattern (PR #129) Item #6 ✅ Branch-collision preflight (this PR) Items #3, #5 remain — separate PRs per epic decomposition. https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU Co-authored-by: Claude <noreply@anthropic.com>
…ble skills (#132) 3-task housekeeping + tacit knowledge harvest. Docs/skills-only PR — no code, no schema delta, no test additions. Task A — SKILL.md schema-version table fixes --------------------------------------------- Two stale "in flight" entries flipped to merged + 1 new row inserted: - Row 0.9.0-phase4h: "(in flight in PR #112)" → "(PR #112 merged 2026-05-19)" - Row 0.9.1-phase4h.2: "(in flight in PR #<NEXT>)" → "(PR #118 merged 2026-05-19)" - NEW row 0.9.2-phase4h.2 (above 0.9.1) — PR #124 merged, multi-port OSAP adapter + osap_signals_dropped_no_long_short field, closing the 100-signal accounting equation; DSR sign-inversion deferred to Part 3 PHASE_STATUS.md row 4 ALSO has "Phase 4h.2 Part 2 in flight in this PR" staleness — confirmed via grep but DELIBERATELY not updated here per Task A explicit scope (SKILL.md only). Recommend a follow-up phase-status-bump PR after this lands. Task B — New worker-session-handoff skill ------------------------------------------ .claude/skills/worker-session-handoff/SKILL.md (+163 LOC). YAML frontmatter + 5 sections: - When to use vs inline (≤50 LOC single-file → inline; ≥2 files / new dep / code logic → handoff) - Constraint lock library (8 standard locks: composite/PHASE3, Rule 16, Rule 18, no-merge, no force-push, no --no-verify, no workflow_dispatch, schema triple) - Anti-pattern: paste-loop avoidance (single outer code-block fence; reference PR #123 as related-but-distinct paste-loop failure mode) - Template (paste-ready, single ```` outer code block with language tag ` text` so inner triple-backticks pass through) - Reference invocations + QuantRank precedents (PR #124, #127, #131) Codifies the handoff shape that appeared verbatim across PRs #123, #124, #127, #128, #129, #131 — user copies ONE block instead of editing 5 template snippets per handoff. Task C — Portable skills library (4 skills, +417 LOC) ----------------------------------------------------- Audit step (per spec): read CLAUDE.md + AGENTS.md + SKILL.md + WORKFLOW.md + PR descriptions of #112/#118/#124/#127/#128/#129/#131. Identified 7 candidate patterns; classified by portability: - ✅ scout-then-integrate (portable; vendoring pattern, no QR logic) - ✅ observability-before-wiring (portable; gate-diagnostic pattern) - ✅ drift-detector-manifest (portable; API surface lock pattern) - ✅ schema-triple-lockstep (portable; Python/TS JSON contract) - 🟡 annotate-before-veto (portable; progressive rollout — DEFERRED to follow-up issue, lower value vs the 4 shipped) - 🟡 pre-plan-investigations (subsumed by scout-then-integrate's Phase 1 § "Pre-plan investigations" — no separate skill needed) - 🟡 graceful-degradation-try-except (portable; error-handling pattern — DEFERRED to follow-up issue, the wrapper is generally 1-line so doesn't warrant a dedicated skill) 4 shipped (each ≤ 109 LOC): .claude/skills/portable-scout-then-integrate/SKILL.md (99 LOC) .claude/skills/portable-drift-detector-manifest/SKILL.md (109 LOC) .claude/skills/portable-schema-triple-lockstep/SKILL.md (103 LOC) .claude/skills/portable-observability-before-wiring/SKILL.md (106 LOC) Flat naming convention (`portable-<name>/SKILL.md` at depth 1 from `.claude/skills/`) because Claude Code's skill registry doesn't recurse into nested subdirectories per CLAUDE.md ## Conventions. Confirmed via session reload — all 4 portable + worker-session- handoff registered correctly. Each portable skill has: - YAML frontmatter (name + description + TRIGGER + SKIP) - ## Pattern section (generic, no QR business logic) - ## Trigger conditions + ## Skip conditions - ## QuantRank precedent (1 paragraph, clearly labeled as precedent not pattern definition) Task C constraint check: - All portable skills core pattern descriptions are project- agnostic (read `.claude/skills/portable-*/SKILL.md` ## Pattern sections — zero references to OSAP / IPCA / pillar / Top-5 inside the pattern body; only inside the labeled "QuantRank precedent" section at the bottom) - 3 of 4 portable skills are 103-109 LOC (slightly over the 100-LOC target — pattern + trigger + skip + precedent sections require ~25 LOC each, leaving ~25 LOC of unavoidable scaffold). The 99-LOC one (scout-then-integrate) shows the cap is achievable but tight. Files (6 changed, +580 LOC, no deletions) ------------------------------------------ - SKILL.md — schema-version table fixes (Task A) - 5 new SKILL.md files in .claude/skills/ (Tasks B + C) Verification ladder all green ------------------------------ - ruff check . → All checks passed - python tools/check_doc_test_counts.py → exit 0 - python tools/check_branch_collisions.py "skill" "portable" → expected⚠️ on #131 (own adjacent work, not a duplicate) - python -m compute.output.schema_check → in sync (no schema touch) - python -m pytest tests/ -m "not network" → 959 passed (unchanged; tools/ + .claude/skills/ aren't imported by tests) - Claude Code skill registry pick-up verified via session reload — all 5 new skills (worker-session-handoff + 4 portable-*) appear in the available-skills list Constraints honored ------------------- - No touch to compute/ / frontend/ / tests/ - No touch to PHASE_STATUS.md / WORKFLOW.md (Task A scope = SKILL.md only; PHASE_STATUS.md staleness flagged for follow-up) - No push to main; no force-push; no --no-verify - No workflow_dispatch trigger - Task C portable skills are project-agnostic in their pattern description (QR refs confined to labeled "precedent" sections) Follow-up issue (to file post-merge) ------------------------------------ Title: "Portable Skills Library — extract remaining tacit patterns" - annotate-before-veto (progressive rule rollout) - graceful-degradation-try-except (1-line wrapper guidance) - pre-plan-investigations as standalone (currently subsumed) - Anything else surfaced by future PR descriptions https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU Co-authored-by: Claude <noreply@anthropic.com>
…sk C.1 recovery) (#135) * docs(skills): SKILL.md schema bump + worker-session-handoff + 4 portable skills 3-task housekeeping + tacit knowledge harvest. Docs/skills-only PR — no code, no schema delta, no test additions. Task A — SKILL.md schema-version table fixes --------------------------------------------- Two stale "in flight" entries flipped to merged + 1 new row inserted: - Row 0.9.0-phase4h: "(in flight in PR #112)" → "(PR #112 merged 2026-05-19)" - Row 0.9.1-phase4h.2: "(in flight in PR #<NEXT>)" → "(PR #118 merged 2026-05-19)" - NEW row 0.9.2-phase4h.2 (above 0.9.1) — PR #124 merged, multi-port OSAP adapter + osap_signals_dropped_no_long_short field, closing the 100-signal accounting equation; DSR sign-inversion deferred to Part 3 PHASE_STATUS.md row 4 ALSO has "Phase 4h.2 Part 2 in flight in this PR" staleness — confirmed via grep but DELIBERATELY not updated here per Task A explicit scope (SKILL.md only). Recommend a follow-up phase-status-bump PR after this lands. Task B — New worker-session-handoff skill ------------------------------------------ .claude/skills/worker-session-handoff/SKILL.md (+163 LOC). YAML frontmatter + 5 sections: - When to use vs inline (≤50 LOC single-file → inline; ≥2 files / new dep / code logic → handoff) - Constraint lock library (8 standard locks: composite/PHASE3, Rule 16, Rule 18, no-merge, no force-push, no --no-verify, no workflow_dispatch, schema triple) - Anti-pattern: paste-loop avoidance (single outer code-block fence; reference PR #123 as related-but-distinct paste-loop failure mode) - Template (paste-ready, single ```` outer code block with language tag ` text` so inner triple-backticks pass through) - Reference invocations + QuantRank precedents (PR #124, #127, #131) Codifies the handoff shape that appeared verbatim across PRs #123, #124, #127, #128, #129, #131 — user copies ONE block instead of editing 5 template snippets per handoff. Task C — Portable skills library (4 skills, +417 LOC) ----------------------------------------------------- Audit step (per spec): read CLAUDE.md + AGENTS.md + SKILL.md + WORKFLOW.md + PR descriptions of #112/#118/#124/#127/#128/#129/#131. Identified 7 candidate patterns; classified by portability: - ✅ scout-then-integrate (portable; vendoring pattern, no QR logic) - ✅ observability-before-wiring (portable; gate-diagnostic pattern) - ✅ drift-detector-manifest (portable; API surface lock pattern) - ✅ schema-triple-lockstep (portable; Python/TS JSON contract) - 🟡 annotate-before-veto (portable; progressive rollout — DEFERRED to follow-up issue, lower value vs the 4 shipped) - 🟡 pre-plan-investigations (subsumed by scout-then-integrate's Phase 1 § "Pre-plan investigations" — no separate skill needed) - 🟡 graceful-degradation-try-except (portable; error-handling pattern — DEFERRED to follow-up issue, the wrapper is generally 1-line so doesn't warrant a dedicated skill) 4 shipped (each ≤ 109 LOC): .claude/skills/portable-scout-then-integrate/SKILL.md (99 LOC) .claude/skills/portable-drift-detector-manifest/SKILL.md (109 LOC) .claude/skills/portable-schema-triple-lockstep/SKILL.md (103 LOC) .claude/skills/portable-observability-before-wiring/SKILL.md (106 LOC) Flat naming convention (`portable-<name>/SKILL.md` at depth 1 from `.claude/skills/`) because Claude Code's skill registry doesn't recurse into nested subdirectories per CLAUDE.md ## Conventions. Confirmed via session reload — all 4 portable + worker-session- handoff registered correctly. Each portable skill has: - YAML frontmatter (name + description + TRIGGER + SKIP) - ## Pattern section (generic, no QR business logic) - ## Trigger conditions + ## Skip conditions - ## QuantRank precedent (1 paragraph, clearly labeled as precedent not pattern definition) Task C constraint check: - All portable skills core pattern descriptions are project- agnostic (read `.claude/skills/portable-*/SKILL.md` ## Pattern sections — zero references to OSAP / IPCA / pillar / Top-5 inside the pattern body; only inside the labeled "QuantRank precedent" section at the bottom) - 3 of 4 portable skills are 103-109 LOC (slightly over the 100-LOC target — pattern + trigger + skip + precedent sections require ~25 LOC each, leaving ~25 LOC of unavoidable scaffold). The 99-LOC one (scout-then-integrate) shows the cap is achievable but tight. Files (6 changed, +580 LOC, no deletions) ------------------------------------------ - SKILL.md — schema-version table fixes (Task A) - 5 new SKILL.md files in .claude/skills/ (Tasks B + C) Verification ladder all green ------------------------------ - ruff check . → All checks passed - python tools/check_doc_test_counts.py → exit 0 - python tools/check_branch_collisions.py "skill" "portable" → expected⚠️ on #131 (own adjacent work, not a duplicate) - python -m compute.output.schema_check → in sync (no schema touch) - python -m pytest tests/ -m "not network" → 959 passed (unchanged; tools/ + .claude/skills/ aren't imported by tests) - Claude Code skill registry pick-up verified via session reload — all 5 new skills (worker-session-handoff + 4 portable-*) appear in the available-skills list Constraints honored ------------------- - No touch to compute/ / frontend/ / tests/ - No touch to PHASE_STATUS.md / WORKFLOW.md (Task A scope = SKILL.md only; PHASE_STATUS.md staleness flagged for follow-up) - No push to main; no force-push; no --no-verify - No workflow_dispatch trigger - Task C portable skills are project-agnostic in their pattern description (QR refs confined to labeled "precedent" sections) Follow-up issue (to file post-merge) ------------------------------------ Title: "Portable Skills Library — extract remaining tacit patterns" - annotate-before-veto (progressive rule rollout) - graceful-degradation-try-except (1-line wrapper guidance) - pre-plan-investigations as standalone (currently subsumed) - Anything else surfaced by future PR descriptions https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU * docs(skills): Vendor karpathy-guidelines (Task C.1 recovery) + THIRD_PARTY_NOTICES.md Recovers Task C.1 from the original handoff that was silent-dropped in the prior PR #132 commit (50da720). The handoff explicitly named "Vendor karpathy-guidelines (1 skill, ~70 LOC)" as part of the portable skills library; the auditor session caught the omission and authorized this follow-up commit on the existing branch. Files (2 new, +138 LOC) ------------------------ - .claude/skills/portable-karpathy-guidelines/SKILL.md (+82 LOC) — vendored content of upstream skills/karpathy-guidelines/SKILL.md (67 LOC, byte-for-byte preserved) + 15-line appended attribution block referencing the upstream source, commit SHA, and the Karpathy tweet that motivated the guidelines. - THIRD_PARTY_NOTICES.md (+56 LOC, NEW at repo root) — third-party license disclosures. Section "karpathy-guidelines (Claude Code skill)" carries source URL, license declaration, vendored path, vendored date, upstream commit SHA, upstream first-commit date, and the full standard MIT License text with copyright attributed to "multica-ai contributors" (upstream has no individual copyright line and no standalone LICENSE file; the `license: MIT` claim appears in upstream README.md § License and each skill's YAML frontmatter). Upstream provenance ------------------- - Source: https://github.com/multica-ai/andrej-karpathy-skills - Upstream HEAD SHA at vendoring: 2c606141936f1eeef17fa3043a72095b4765b9c2 - Upstream first commit: 2026-01-27 - Vendored date: 2026-05-20 - License: MIT (declared) Verbatim content preserved -------------------------- `diff /tmp/karpathy-src/skills/karpathy-guidelines/SKILL.md .claude/skills/portable-karpathy-guidelines/SKILL.md` shows ONLY the 15-line appended attribution block at lines 68-82. The upstream 67-line content (YAML frontmatter + "Karpathy Guidelines" heading + the 4 principles) is byte-for-byte unchanged. Per the spec constraint: "เก็บ 4 principles verbatim. แก้ได้แค่ 'เพิ่ม' attribution block ท้ายไฟล์". License-disclosure caveat ------------------------- Upstream `multica-ai/andrej-karpathy-skills` declares MIT via README + YAML frontmatter but does NOT ship a standalone LICENSE file. The `THIRD_PARTY_NOTICES.md` entry includes the standard MIT License template with copyright attributed to the GitHub org ("multica-ai contributors"), matching the principle that an MIT declaration without a formal copyright line still licenses to the redistributor; the attribution is conservative. Verification ladder all green ------------------------------ - ruff check . → All checks passed - python tools/check_doc_test_counts.py → exit 0 (no test-count drift introduced by this commit) - python tools/check_branch_collisions.py "karpathy" → no scope collisions detected - python -m compute.output.schema_check → in sync (no schema touch) - python -m pytest tests/ -m "not network" → 959 passed (unchanged; .claude/skills/ + THIRD_PARTY_NOTICES.md aren't imported by tests) - Skill registry pickup verified via session reload — `portable-karpathy-guidelines` appears in the available-skills list with the upstream description verbatim Constraints honored ------------------- - No squash / amend of the prior 50da720 commit — this is a fresh commit pushed on top of the existing branch (per spec "ห้าม squash old commit") - No touch to the 4 already-shipped portable skills in 50da720 - No touch to compute/ / frontend/ / tests/ - No push to main; no force-push; no --no-verify - No workflow_dispatch trigger - Karpathy SKILL.md upstream content preserved verbatim; only the attribution block appended below the original content PR description update will follow as a separate `gh pr edit` / MCP `update_pull_request` call so the new "License Compliance" section + the audit-table row for karpathy-guidelines land in the PR body. https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU --------- Co-authored-by: Claude <noreply@anthropic.com>
…PR A) (#141) First PR in the multi-PR .md optimization sequence (Option D scope — yกเครื่อง). PR A is the low-risk baseline: fixes 2 broken skill frontmatters that prevent dispatch + drift-fixes 4 stale facts in agent docs. Critical YAML fix: - branch-collision-check/SKILL.md and pr-quality-gate/SKILL.md had multi-line `description:` plain-scalar frontmatter that PyYAML (and Claude Code's skill loader) couldn't parse because lines contain `#123` / `#X` issue references after whitespace — YAML treats ` #` as a comment marker, so everything after the first comment-trigger got eaten and the loader fell back to displaying `name: name` in the available-skills list. Both skills were effectively undispatchable from any session. - Fix: change `description:` to `description: >` (folded block scalar) so newlines become spaces and `#` mid-content is treated as literal text. Verified live in this session — system reminder now shows the full TRIGGER/SKIP descriptions for both. Stale-fact pass: - .claude/skills/README.md L14-16: "27 invocation-triggerable skills" → references CLAUDE.md as the canonical count (38) to prevent future drift. Future top-level skill add/remove only needs to bump CLAUDE.md §Layout, not three files. - AGENTS.md L104: ".claude/skills/ # 24 loaded skills" → 38. - AGENTS.md L287: "Schema version: 0.8.0-phase4.5f" → 0.9.2-phase4h.2 (3 versions behind). Now references SKILL.md schema-version table for full history. - CLAUDE.md L181-192 (§Phase status): "Current schema 0.9.1-phase4h.2 ... Phase 4h in flight in PR #112" → 0.9.2-phase4h.2 + Phase 4h shipped (Parts 1+2 done via #112/#118/#124). - CLAUDE.md + AGENTS.md §Phase status: "Epic #125 Item 3 in flight via PR #140" → "PR 1 of 2 shipped" at commit a52aa2d; PR 2 remaining. CLAUDE.md + AGENTS.md edit ships per the lockstep convention. No code touched, no schema touched — pre-merge-prod-sim.yml won't trigger (paths compute/scoring + compute/features unaffected). Next in optimization sequence: PR B (CLAUDE.md token diet) — TBD after user reviews this one. Co-authored-by: Claude <noreply@anthropic.com>
…em 6) (#203) New tools/check_cross_session_collision.py hits the GitHub API for claude/* branches updated in the last 7 days and open PRs matching a scope keyword, exits 1 on collision and 0 when clean. Authenticated via GH_TOKEN / GITHUB_TOKEN env vars or gh CLI; exits 2 with a clear message when no auth is available (no silent failure). New .claude/skills/cross-session-collision-check/SKILL.md wraps the script with trigger/skip conditions, false-positive guard (merged+closed branches excluded by GitHub API design), auth instructions, and a comparison table vs the existing git-only branch-collision-check skill. phase-coordinator Mode A updated to run BOTH skills in sequence: Step 1 git-only (branch-collision-check, no auth, 48h window), Step 2 GitHub API (cross-session-collision-check, GH_TOKEN, 7d window). Together they cover the full failure-mode space that produced PR #123. CLAUDE.md + AGENTS.md updated in lockstep: skill count 42 -> 43, phase status entry added, AGENTS.md phase-coordinator skill list updated. Verification: ruff clean on new script; 1054 offline tests pass (no baseline change — tooling-only PR, no new tests). https://claude.ai/code/session_01D6NTyJZa5LWHWakbF5dT29 Co-authored-by: Claude <noreply@anthropic.com>
Summary
Phase 4k scout — 4th and final of 4 factor-library scouts (4h OSAP ✅, 4i JKP ✅, 4j Qlib ✅, 4k IPCA). Installs
ipca==0.6.7(MIT, Buechner+Bybee 2019, github.com/bkelly-lab/ipca) and locks theInstrumentedPCApublic API surface at module load.Reference: Kelly, Pruitt, Su (2019). "Characteristics are covariances: A unified model of risk and return." JFE 134(3), 501-524.
After this scout merges, all 4 factor-library scouts complete and
v1.1.0-phase4tag becomes eligible (gated on the 4h.2 Part 2 / 4i.1 / 4j.1 / 4k.1 integration PRs landing — ~6-8w combined effort, separate session per CLAUDE.md multi-session audit pattern).Structural distinctness vs prior 3 scouts
Gamma(L×K loadings) +Factors(K×T returns) from ALS decomposition5 pre-plan investigations (verbatim, 2026-05-19, against
ipca==0.6.7)pip index versions ipca→ipca 0.6.7latest.ipca-py/pyipca404. Last upstream release 2021-04-22 — ~5 years stale → pinned to0.6.xband.ipca-0.6.7.dist-info/LICENSE.md):MIT License - Copyright (c) [2019] [Matthias Buechner, Leland Bybee]. Same as Qlib 4j, unlike JKP 4i's CC BY-NC 4.0 → no Phase 6+ commercial complication.ipca/ipca.py) — 8 public methods onInstrumentedPCA(BaseEstimator):transform/fit_transform(sklearn TransformerMixin pattern absent). Usefit+.Gamma/.Factorsattrs +predict_panel()for the panel-prediction path.RegressorMixinimported in source but unused.ipca/test_ipca.py):(entity, time), or numpy ndarray + explicitindicesarg.data_type="portfolio"is the recommended scaling path (ALS on Q matrix, not raw panel).[factors]baseline:numba(~50 MB w/ llvmlite ~30 MB) +progressbar(~50 KB)scipy/joblib/scikit-learnalready in tree via Phase 4h/4i.NO
@networktest (deliberate)IPCA is a pure local sklearn-style decomposition — there is no remote endpoint to retry against (unlike OSAP 4h's Chen-Zimmermann CDN or JKP 4i's S3 bucket). Mirrors Phase 4j Qlib rationale at
compute/ingest/qlib_features.py:23-30. Scout ships 6 offline tests / 0@network.Files (5 changed, +380 / −1)
compute/features/ipca_factors.py(NEW, ~140 LOC) —init_ipca()factory +fit_ipca_panel()wrapper +INSTRUMENTED_PCA_PUBLIC_API8-method tuple with module-loadassertagainstconfig.IPCA_PUBLIC_API_METHOD_COUNT. NOT tenacity-wrapped (no network).tests/test_features/test_ipca_factors.py(NEW, ~190 LOC) — 6 offline tests with inline@pytest.fixturesynthetic 5×30×10 panel (np.random.RandomState(42), pandas MultiIndex shape matches maintainer's canonicalipca/test_ipca.py). 4/6 tests usepytest.importorskip("ipca")for graceful skip when[factors]extra absent.compute/config.py— Phase 4k block:IPCA_FITTED_ARTIFACTS_CACHE,IPCA_FITTED_ARTIFACTS_MAX_AGE_DAYS=31,IPCA_PUBLIC_API_METHOD_COUNT=8.pyproject.toml— appendipca>=0.6.7,<0.7to[factors](afterpyqlib); pinned to0.6.xband due to upstream staleness.PHASE_STATUS.mdrow 4 — promote 4j scout to ✅ shipped (PR feat(ingest): Qlib scout — pyqlib MIT install + Alpha158 handler smoke + 158-feature manifest #119) and mark 4k scout in-flight.Out of scope (deferred to ~Phase 4k.1 integration PR)
data_type="portfolio"recommended)StockDetail.ipca_loadings,Metadata.ipca_in_sample_r2, etc.) — schema bump0.9.1-phase4h.2 → 0.10.0-phase4kdeferred.compute/validation/pbo_dsr.pydoesn't directly apply — integration PR will need OOS R² + IC walk-forward observability instead (per PLAN.md acceptance criteria: ≥30% in-sample R², IC > 0.05 OOS).Test plan
ruff check .→ All checks passedpython -m pytest tests/ -m "not network"→ 936 passed (930 baseline + 6 new, 1m48s)python -m pytest tests/test_features/test_ipca_factors.py -v→ 6 passed (withipcainstalled); 2 passed + 4 skipped (without — importorskip works as expected)python -m compute.output.schema_check→ snapshot in sync (no schema delta)python -c "from compute.features.ipca_factors import INSTRUMENTED_PCA_PUBLIC_API; print(len(INSTRUMENTED_PCA_PUBLIC_API))"→8Gamma.shape == (10, 2)(L × n_factors) ✓Factors.shape == (2, 30)(n_factors × T) ✓metad == {N: 5, T: 30, L: 10}✓claude/resume-quantrank-phase-4.5-Zh0pORisks
ipcalast released 2021-04-22 — 5 years stale. Mitigation:>=0.6.7,<0.7pin + module-load API-surface assertion catches any silent drift on future upgrade.numbais the heavy transitive (~50 MB w/ llvmlite ~30 MB) — can fail on some Python/glibc combos. Mitigated by CI's Ubuntu runner; if cold-start install fails, escalate via fix-amend.InstrumentedPCAlackstransform/fit_transform(sklearn pattern absent) — documented in module docstring; scout usesfit+.Gamma/.Factors+predict_panel()instead.pytest.importorskip("ipca")masks real failures when[factors]absent — acceptable per the established Phase 4h/4i/4j precedent; CI always installs[factors]so real failures still surface.https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU
Generated by Claude Code