feat(features): IPCA scout — ipca MIT install + InstrumentedPCA 8-method API surface lock + 6 synthetic-fixture tests #121
Merged
…hod API surface lock + 6 synthetic-fixture tests

Phase 4k scout PR — final of 4 factor-library scouts (OSAP ✅ #110, JKP ✅ #114, Qlib ✅ #119, IPCA THIS). Ships `ipca` install + 8-method public-API surface lock + 6 offline tests + inline synthetic fixture. NO production wiring; characteristics-matrix construction + universe-wide IPCA fit + composite blend decision are integration-PR scope (Phase 4k.1).

After this merges → all 4 factor scouts done → eligible for v1.1.0-phase4 tag readiness audit (gated on 4h.2 Part 2 + 4i.1 + 4j.1 + 4k.1 integration PRs landing).

5 pre-plan investigations (verified 2026-05-19, carried verbatim into module docstring):

1. PyPI package: `ipca` (0.6.7); 29 historical versions back to 0.1; last release 2021-04-22 (~5 years stale). Pin tightly `>=0.6.7,<0.7`.
2. License: MIT (LICENSE.md verbatim — Buechner / Bybee 2019). No CC BY-NC complication unlike JKP. Safe for Phase 6+ commercial roadmap.
3. sklearn-compatible API surface: 8 public methods — fit / get_factors / fit_path / predict / predict_panel / predict_portfolio / score / predictOOS. NO transform/fit_transform (user brief assumed these; they don't exist in 0.6.7). Post-fit attrs: Gamma (L×K) / Factors (K×T) / metad dict / n_factors_eff / has_PSF / PSFcase.
4. Data requirements: MultiIndex (entity, date) DataFrame OR explicit indices array. Min stable shape 10 firms × 20 years × 2 chars (maintainer's test). NaN handling internal. Unbalanced panels supported.
5. CI install footprint: ~50-80 MB net-new (numba ~50 MB + llvmlite ~30 MB). Substantially lighter than Qlib's 150-180 MB (no mlflow / cvxpy / jupyter).
IPCA structural shape — 4th distinct vs prior scouts:

- OSAP (4h): factor returns CSV → proxy/36m regression
- JKP (4i): factor returns CSV → 36m regression
- Qlib (4j): per-stock per-date features → native Alpha158
- IPCA (4k): panel decomposition → Gamma (L×K loadings) + Factors (K×T) latent returns

Critical scope decision — NO @network test (mirrors Phase 4j Qlib rationale): IPCA is pure local sklearn-style computation. No remote endpoint to network-test (unlike OSAP 4h's CDN or JKP 4i's S3 bucket). Scout ships 6 offline tests / 0 @network. Test count delta: 930 baseline + 6 offline = ~936 in CI.

Architectural locks:

- Module placement `compute/features/ipca_factors.py` (NOT compute/ingest/) per pre-existing `.claude/skills/phase-4/ipca-factor-fit/PLAN.md:24` and `compute/features/osap_replicate.py` precedent. No namespace collision (module is `ipca_factors`, PyPI package is `ipca`) — Phase 4j's `qlib_features.py` workaround doesn't apply.
- INSTRUMENTED_PCA_PUBLIC_API 8-method tuple — drift detector; module-load assertion against config.IPCA_PUBLIC_API_METHOD_COUNT. Catches future `ipca>0.6.7` API renames.
- IPCA_DEFAULT_N_FACTORS=5, IPCA_DEFAULT_INTERCEPT=True (KPS 2019 baseline) — validated by smoke test, NOT module-load assert (defaults are our choice, not external surface).
- Tenacity NOT applied — pure local sklearn-style; no network retry. First-class divergence from osap.py:52-56 pattern; documented in module docstring.
- Synthetic fixture inline as @pytest.fixture (NOT committed CSV/parquet) — IPCA inputs are numpy arrays, no roundtrip needed.
Module layer (compute/features/ipca_factors.py, ~190 LOC including extensive docstring):

- IPCA_FITTED_ARTIFACTS_CACHE re-export from config
- INSTRUMENTED_PCA_PUBLIC_API 8-tuple + module-load invariants (cardinality + uniqueness)
- IPCA_DEFAULT_N_FACTORS / IPCA_DEFAULT_INTERCEPT constants
- init_ipca(n_factors, intercept, **kwargs) → unfitted InstrumentedPCA
- fit_ipca_panel(estimator, *, X, y, indices, **fit_kwargs) → fitted estimator

Config layer (compute/config.py, +28 LOC):

- IPCA_FITTED_ARTIFACTS_CACHE: Path = CACHE_DIR / "ipca"
- IPCA_FITTED_ARTIFACTS_MAX_AGE_DAYS: int = 31
- IPCA_PUBLIC_API_METHOD_COUNT: int = 8

Tests (6 offline; ~190 LOC):

1. test_ipca_imports_and_exposes_instrumented_pca — primary CI signal (importorskip)
2. test_instrumented_pca_public_api_manifest_locks_8_methods — pure assertion, no ipca runtime
3. test_instrumented_pca_public_api_matches_runtime_introspection — drift detector
4. test_ipca_fitted_artifacts_cache_under_repo_cache_dir — config sanity
5. test_init_ipca_returns_unfitted_estimator_with_kps_defaults — defaults validation
6. test_fit_ipca_panel_on_synthetic_5x30x10_fixture — smoke fit; asserts Gamma (10,2) + Factors (2,30) + metad N/T/L

pyproject.toml: append `ipca>=0.6.7,<0.7` to `[factors]` (authorized in advance via plan-mode approval; pin range because 2021-04-22 staleness).

Ask-first surfaces touched:

- pyproject.toml [factors] — extended (authorized via plan-mode)
- ci.yml UNCHANGED ([dev,factors] install already covers new dep)
- compute-rankings.yml UNTOUCHED per user hard constraint
- Schema triple UNTOUCHED (no schema delta this scout)

Verification (local, sandbox without [factors]):

- ruff check . → clean (auto-fix on import-block sort)
- python -m compute.output.schema_check → in-sync
- Import smoke: from compute.features.ipca_factors import init_ipca, fit_ipca_panel, INSTRUMENTED_PCA_PUBLIC_API → OK 8
- pytest tests/ -m "not network" excluding factor-extra files → 864 passed
- 2 of 6 IPCA tests PASS locally (#2 manifest cardinality + #4 cache path); 4 SKIP via pytest.importorskip("ipca") (expected — local lacks [factors] extra)
- CI with [dev,factors] will run all 6 → ~936 offline expected (930 baseline + 6 new)

Defense layer unchanged at 17. Top-5 rotation unchanged. Schema unchanged at 0.9.1-phase4h.2.

Out of scope (deferred to follow-on Phase 4k.1 integration PR, ~5-commit cluster):

- Characteristics-matrix construction (which Phase 3 + OSAP/JKP/Qlib features feed X?)
- Full IPCA fit on 502-ticker universe (data_type="portfolio" canonical scaling)
- Walk-forward / rolling-window fit cadence
- Latent-factor composite integration decision (observability-only? Phase 5 ML-meta-learner consumer?)
- Schema additions (StockDetail.ipca_loadings + Metadata.ipca_n_factors_eff + ipca_in_sample_r2) → bump 0.9.1-phase4h.2 → 0.10.0-phase4k
- PBO/DSR doesn't apply (loadings ≠ portfolio returns); IC walk-forward observability instead per PLAN.md:36
- Top-5 rotation impact analysis (Rule 16 lock)

Audit history:

- Plan-audit round 1: 5 investigations verified · MIT lock · heavy-deps disclosure
- Plan-audit round 2: Q1 (public-API surface lock) + Q2 (inline pytest.fixture) design choices applied
- Plan-audit round 3: line citations verified (ipca PyPI · config.py:200-221 · pyproject.toml:36-45 · PLAN.md:24 + L36)
- Implementation: main session direct (worker session paste-loop bypassed per Phase 4j precedent)
- Local verification: ruff clean · schema_check in-sync · 864 offline passing · 2 IPCA tests pass + 4 graceful skip

Closes the factor-library scout cluster. Next: v1.1.0-phase4 tag readiness audit gated on 4 integration PRs.
https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2
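The API-surface lock described in the commit message (an 8-name manifest, module-load invariants, and a runtime introspection check) could look roughly like the sketch below. The eight method names and the two constant names come from the text above. The helper `missing_public_methods` and the stub classes are hypothetical stand-ins, so the sketch runs without the `ipca` package installed; it is not the contents of `ipca_factors.py`.

```python
# The 8 method names are quoted from the PR text; the rest is illustrative.
INSTRUMENTED_PCA_PUBLIC_API: tuple[str, ...] = (
    "fit", "get_factors", "fit_path", "predict",
    "predict_panel", "predict_portfolio", "score", "predictOOS",
)
IPCA_PUBLIC_API_METHOD_COUNT = 8  # mirrors the config constant named in the PR

# Module-load invariants: cardinality + uniqueness.
assert len(INSTRUMENTED_PCA_PUBLIC_API) == IPCA_PUBLIC_API_METHOD_COUNT
assert len(set(INSTRUMENTED_PCA_PUBLIC_API)) == len(INSTRUMENTED_PCA_PUBLIC_API)


def missing_public_methods(cls: type, manifest: tuple[str, ...]) -> list[str]:
    """Hypothetical helper: manifest names that are absent or not callable on cls."""
    return [name for name in manifest if not callable(getattr(cls, name, None))]


def make_stub(names: tuple[str, ...]) -> type:
    """Build a throwaway class with one no-op method per name (stand-in for InstrumentedPCA)."""
    return type("StubEstimator", (), {name: (lambda self: None) for name in names})


CompleteStub = make_stub(INSTRUMENTED_PCA_PUBLIC_API)
DriftedStub = make_stub(INSTRUMENTED_PCA_PUBLIC_API[:-1])  # pretend predictOOS was renamed upstream

print(missing_public_methods(CompleteStub, INSTRUMENTED_PCA_PUBLIC_API))  # []
print(missing_public_methods(DriftedStub, INSTRUMENTED_PCA_PUBLIC_API))   # ['predictOOS']
```

In a real test, the stub would presumably be replaced by the class imported after `pytest.importorskip("ipca")`, so the check reports drift only where the optional extra is installed.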
This was referenced May 19, 2026
dackclup added a commit that referenced this pull request May 20, 2026
…rop diagnostic + schema 0.9.2 (#124)

Closes #116 (Part 2 scope).

Phase 4h.2 Part 2 closes the OSAP 100-signal accounting gap that Part 1 made visible. Production cron at commit 182c02d (version 0.9.1-phase4h.2) exposed the imbalance: 22 missing_from_dataset + 22 gate_diagnostics + 0 signals_used = 44 — leaving 56 signals UNACCOUNTED for between the dataset rows and the gate.

Root cause was the hardcoded port=01 / port=10 filter in `compute/features/osap_replicate.py::compute_long_short_returns` at L60,65,120,135-136: OSAP delivers some signals as quintile (ports 01..05) or tercile (01..03), and the global pre-filter dropped every row that didn't match port=10 — those signals silently disappeared before reaching the PBO/DSR gate.

Sub-task 1 — Multi-port adapter (compute/features/osap_replicate.py)
---------------------------------------------------------------------
Replaced the hardcoded constants `LONG_PORT_LABEL` / `SHORT_PORT_LABEL` with per-signal `min(port)` / `max(port)` inference. Algorithm:

1. groupby("signalname") to derive each signal's port extents
2. long_port = min(unique ports), short_port = max(unique ports)
3. signals with fewer than 2 distinct ports are dropped (no LS pair)
4. pivot per-signal with "long" / "short" role columns so the LS axis is stable across heterogeneous port cardinalities

Decile signals (01..10) degenerate to the same ("01", "10") corners under min/max — backward-compatible. Quintile signals → ("01", "05"). Tercile signals → ("01", "03").

Sub-task 2 — Accounting-balance diagnostic
-------------------------------------------
New helper `signals_dropped_no_long_short(returns) -> list[str]` returns signals present in the dataset but with <2 distinct port buckets (the non-recoverable subset). Wired through `compute/main.py` into the new Metadata field `osap_signals_dropped_no_long_short: list[str] | None`.

Schema triple moved together: Pydantic (`compute/output/schemas.py`) + TypeScript (`frontend/lib/types.ts`) + snapshot (auto-regenerated via `python -m compute.output.schema_check --update-snapshot`).

Phase 4h.2 Part 2 accounting invariant (asserted by the new test `test_part2_accounting_invariant_against_synthetic_manifest`):

    len(OSAP_SIGNALS_100) == (
        len(osap_signals_missing_from_dataset)     # 0 rows in dataset
        + len(osap_signals_dropped_no_long_short)  # <2 distinct ports
        + len(osap_signals_used)                   # passed gate
        + len(osap_excluded_signals)               # reached gate, failed
    )

Sub-task 3 — DSR investigation (DEFERRED to Phase 4h.2 Part 3)
---------------------------------------------------------------
Both hypotheses investigated:

(a) Signal sign inversion — CONFIRMED via production metadata.json inspection. Every gated signal at 0.9.1 shows rejection_reason "low_dsr" with negative Sharpe (e.g., AbnormalAccruals sharpe=-0.23, AssetGrowth sharpe negative, dVolCall sharpe=-0.66). This is the classic OSAP "anomaly" pattern: many signals predict that the SHORT portfolio outperforms LONG, so the naive `LONG - SHORT` LS is correctly capturing that as a negative excess return — but the gate rejects it. The proper fix requires fetching OSAP's `SignalDoc.csv` for per-signal sign metadata (`Cat.SignalSign`) and flipping the LS for anomaly signals. Scope explicitly deferred to Part 3 (cleaner separation: Part 2 fixes the dropped-signal accounting first, Part 3 fixes the gate-rejection sign inversion).

(b) DSR threshold too tight for monthly returns — RULED OUT by code citation. `compute/validation/pbo_dsr.py:62` sets `DSR_VETO_THRESHOLD: float = 0.0` — already maximally permissive (the canonical Bailey-Lopez de Prado 2014 threshold is DSR > 0.95). The 100% low_dsr rejection rate is genuine, not a threshold artifact.

Decision: ship Part 2 with hypothesis (a) annotated for Part 3 follow-up. Expected post-Part-2 acceptance count (with sign uncorrected) remains ≈ 0; the headline win is the dropped-no-long-short diagnostic surface, not acceptance recovery. Production diagnosis from the next cron will confirm the exact pre/post accounting numbers.

Sub-task 4 — Schema PATCH bump
-------------------------------
`compute/config.py::SCHEMA_VERSION` "0.9.1-phase4h.2" → "0.9.2-phase4h.2" (MINOR.PATCH bump per the additive-only Metadata change). Snapshot regenerated via `python -m compute.output.schema_check --update-snapshot`. Existing `test_config.py::test_schema_version_is_phase4h_2` updated to match. `tests/test_config.py` is the single source of the schema-version lock — the test name keeps the "phase4h_2" anchor.

Files (10 changed, +353 / −26)
-------------------------------
- compute/features/osap_replicate.py — multi-port adapter + `signals_dropped_no_long_short` helper (+132 / −24)
- compute/main.py — wire new diagnostic into Metadata; restrict the dropped-list to the OSAP_SIGNALS_100 manifest so the accounting equation closes against the manifest size (+28)
- compute/output/schemas.py — `osap_signals_dropped_no_long_short` field (+9)
- compute/config.py — SCHEMA_VERSION bump (+1 / −1)
- frontend/lib/types.ts — TypeScript mirror (+8)
- frontend/lib/schema-snapshot.json — auto-regenerated (+5)
- frontend/public/data/metadata.json — null sentinel for the new field so the static-export tsc cast passes; next cron overwrites with the real list (+2 / −1)
- tests/test_features/test_osap_replicate.py — 9 new tests covering quintile / tercile / mixed-port universes + accounting invariant + defensive edge cases for the new helper (+188)
- tests/test_config.py — schema-version lock follow-up (+1 / −1)
- PHASE_STATUS.md — Part 2 in-flight + 4k scout shipped via PR #121 (+1 / −1)

Constraints honored
-------------------
- NO modification to `compute_composite` / `PHASE3_WEIGHTS` (sum=1.0 lock at composite.py:43-45 — Path-b blend stays OUTSIDE in `compute/scoring/osap_blend.py`)
- Rule 16: Top-5 still ranks raw composite_score; no scoring touched
- No push to main; no force-push; no `--no-verify`
- No workflow_dispatch trigger (compute-rankings.yml untouched)
- Schema triple moved together (Pydantic + types.ts + snapshot.json)

Verification ladder all green
------------------------------
- ruff check . → All checks passed
- python -m pytest tests/ -m "not network" → 945 passed (77s) (936 baseline + 9 new osap tests = 945)
- python -m compute.output.schema_check → in sync
- cd frontend && npx --no -- tsc --noEmit → clean
- Section A-H verifier: 2 pre-existing failures on `main` unrelated to Part 2 (`non_reliance_filing` / `auditor_change` Tier-2 baseline drift)

Expected post-merge cron diagnostic
------------------------------------
Pre-Part-2 (0.9.1-phase4h.2): 22 missing + 22 gated + 0 used = 44 → gap = 56 invisible
Post-Part-2 (0.9.2-phase4h.2): 22 missing + X dropped + Y gated + Z used → 100 (balanced); X + Y == 78, Z ≈ 0 until Part 3 sign inversion fix

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU

Co-authored-by: Claude <noreply@anthropic.com>
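The multi-port inference and the accounting invariant from this commit message can be illustrated with a small standard-library sketch. It assumes port labels are zero-padded strings ("01" to "10"), so that plain `min`/`max` orders them correctly. The function names `infer_long_short_ports` and `accounting_balances` and the sample signal names are hypothetical, and the real adapter works on a DataFrame via `groupby("signalname")` rather than on tuples.

```python
from collections import defaultdict


def infer_long_short_ports(rows):
    """rows: iterable of (signalname, port) pairs with zero-padded port labels.

    Returns (pairs, dropped): pairs maps signal -> (long_port, short_port),
    dropped lists signals with fewer than 2 distinct ports (no LS pair).
    """
    ports_by_signal = defaultdict(set)
    for signal, port in rows:
        ports_by_signal[signal].add(port)

    pairs, dropped = {}, []
    for signal, ports in sorted(ports_by_signal.items()):
        if len(ports) < 2:
            dropped.append(signal)
            continue
        # Zero padding makes lexicographic min/max agree with numeric order.
        pairs[signal] = (min(ports), max(ports))
    return pairs, dropped


def accounting_balances(manifest, missing, dropped, used, excluded):
    """True when every manifest signal lands in exactly one of the four buckets' counts."""
    return len(manifest) == len(missing) + len(dropped) + len(used) + len(excluded)


# Made-up signals: one decile, one quintile, one tercile, one single-port.
rows = (
    [("DecileSig", f"{p:02d}") for p in range(1, 11)]
    + [("QuintSig", f"{p:02d}") for p in range(1, 6)]
    + [("TercSig", f"{p:02d}") for p in range(1, 4)]
    + [("LoneSig", "01")]
)
pairs, dropped = infer_long_short_ports(rows)
print(pairs)    # {'DecileSig': ('01', '10'), 'QuintSig': ('01', '05'), 'TercSig': ('01', '03')}
print(dropped)  # ['LoneSig']
```

With the pre-Part-2 numbers quoted above, `accounting_balances(range(100), range(22), [], [], range(22))` is `False`, which is the 56-signal gap the diagnostic is meant to surface.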
This was referenced May 20, 2026
Closed
dackclup added a commit that referenced this pull request May 20, 2026
Part of epic #125 (Item #6 of 6). Pure tooling addition — no runtime / scoring / schema impact.

Motivation
----------
PR #123 (2026-05-19, closed without merging): a worker session opened a Phase 4j + 4k scout duplicate on branch `claude/resume-quantrank-phase-4.5-Zh0pO` while the main session shipped the same work directly via PRs #119 (Qlib) + #121 (IPCA). Root cause: the worker session never inspected the `claude/*` branch list + recent PRs before writing code, producing 100% wasted effort.

This change ships a preflight check that surfaces in-flight scope BEFORE any code is written, so the duplicate-PR failure mode is caught at the handoff-prompt entry rather than at PR review.

Files (2 new, +271 LOC)
------------------------
- tools/check_branch_collisions.py (+149 LOC) — git-only preflight script. Lists active `claude/*` branches via `git ls-remote origin "refs/heads/claude/*"` and recent main-branch commits via `git log --since="48 hours ago" --oneline --no-merges origin/main`. Optional keyword args flag case-insensitive substring matches. Always exit 0 (informational only).
- .claude/skills/branch-collision-check/SKILL.md (+122 LOC) — skill description with YAML frontmatter, trigger conditions (handoff prompts, Phase / issue / Item #N mentions, fresh worker sessions), skip conditions (doc-only chores, iteration #2+, user-authorized parallel work), sample output (clean + warning), and output-interpretation guidance pointing the caller to STOP + ask the user when any ⚠️ line surfaces.

Design notes
------------
- Git-only data sources — no `gh` CLI / GitHub API auth required. Works in the QuantRank Claude Code Web sandbox where `gh` is unavailable, and on any contributor machine with bare git.
- 48-hour window — matches typical worker ↔ main session handoff cadence; long enough to catch duplicate work, short enough to keep the output scannable.
- Pure read-only — no destructive git ops, no branch creation, no push, no GitHub API mutation. Always returns exit 0; the caller decides whether to proceed.

Verification ladder all green
------------------------------
- ruff check . → All checks passed
- python tools/check_branch_collisions.py → lists 1 active claude/* branch + 16 recent commits (last 48h), exit 0
- python tools/check_branch_collisions.py "Alpha158" → fires ⚠️ on PR #119 commit "Alpha158 158-feature manifest", summary reports "1 potential scope collision(s) found", exit 0
- python tools/check_branch_collisions.py "Phase 99 nonsense" → no match, summary reports "No scope collisions detected", exit 0
- python tools/check_doc_test_counts.py → exit 0 (Item #2 guard still passes; new files don't introduce hardcoded counts)
- python -m compute.output.schema_check → in sync (no schema touch)
- python -m pytest tests/ -m "not network" → 959 passed (unchanged; tools/ + .claude/skills/ aren't imported by tests)
- SKILL.md YAML frontmatter parses — confirmed via Claude Code's skill registry picking it up at module load

Constraints honored
-------------------
- No touch to compute/ / frontend/ / tests/ — tools/ + .claude/skills/ only
- No network calls / no GitHub API auth — git remote ls + git log
- No destructive actions — read-only preflight check
- No push to main; no force-push; no --no-verify
- No workflow_dispatch trigger (compute-rankings.yml untouched)

Epic #125 status after this PR
-------------------------------
Item #1 ✅ Hypothesis property tests (PR #127)
Item #2 ✅ Strip hardcoded test counts + CI guard (PR #128)
Item #4 ✅ Observability-before-wiring pattern (PR #129)
Item #6 ✅ Branch-collision preflight (this PR)
Items #3, #5 remain — separate PRs per epic decomposition.

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU

Co-authored-by: Claude <noreply@anthropic.com>
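The keyword-flagging step of the preflight script (case-insensitive substring matching over branch names and recent commit subjects) might be sketched as follows. This is a guess at the matching logic only: `find_collisions`, `summarize`, and the sample lines are hypothetical, and the git calls are left out so the example runs anywhere. The two summary strings are the ones quoted in the verification ladder above.

```python
def find_collisions(lines, keywords):
    """Return (keyword, line) pairs where keyword is a case-insensitive substring of line."""
    hits = []
    for line in lines:
        lowered = line.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                hits.append((keyword, line))
    return hits


def summarize(hits):
    """Build the one-line summary; the caller always exits 0 regardless of hits."""
    if hits:
        return f"{len(hits)} potential scope collision(s) found"
    return "No scope collisions detected"


# Made-up stand-ins for `git ls-remote` / `git log --oneline` output.
recent_lines = [
    "refs/heads/claude/example-branch",
    "abc1234 feat(features): Qlib scout, Alpha158 158-feature manifest",
    "def5678 docs: refresh PHASE_STATUS.md",
]

print(summarize(find_collisions(recent_lines, ["alpha158"])))           # 1 potential scope collision(s) found
print(summarize(find_collisions(recent_lines, ["Phase 99 nonsense"])))  # No scope collisions detected
```

Keeping the matcher a pure function over lines of text means it can be tested without a git checkout, which fits the read-only, informational design described in the commit.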
dackclup added a commit that referenced this pull request May 20, 2026
…imization PR F) (#146)

Sixth PR in the .md optimization sequence (Option D). Audit of 18 QR-origin skill descriptions found all are well-formed (parseable YAML, TRIGGER + SKIP clauses present, average 888 chars). The critical YAML bug (#119+#121 plain-scalar bug in branch-collision-check and pr-quality-gate) was already fixed in PR A. So PR F's remaining work is light polish, not structural change.

Vendored skills (20) FROZEN per the boundary convention — Anthropic skills, mattpocock-* (8), karpathy-guidelines, thananon/9arm-skills (4), karpathy-llm-wiki are all upstream-only edits.

Trim targets (cut redundancy, fix drift, add Thai triggers):

1. pr-quality-gate (1207 → ~1015): cut redundant "ALSO use right before flipping Draft→Ready" clause that duplicated the first TRIGGER ("before authorizing the Draft→Ready flip"). Tightened wrapping.
2. pr-iteration-flow (990 → ~890): cut redundant "ALSO use this skill as the default workflow harness any time a PR is open" that duplicated the TRIGGER list. Dropped stale "PR-3c → PR-3d → PR-20" historical reference. Added Thai trigger phrases "เช็ค CI" / "ดู PR" since the user invokes this skill in Thai.
3. phase-status-bump (918 → ~840): dropped two historical examples ("PR 3d → tag v0.6.0-phase3d" and "3a→3b, 3c→3d") that anchored the description to one shipped phase. Wording now phase-agnostic.
4. verify-production-output (1086 → ~870): compressed the "Surfaces..." enumeration of Section A-H content (was 8 detailed items; now 8 short items) without losing dispatch specificity. Added Thai trigger phrases "ตรวจ output" / "เช็ค production". Folded "ALSO use" into first TRIGGER as one phrase.

YAML moved from plain scalar to `description: >` (folded block) on the 3 plain-scalar descriptions edited (pr-iteration-flow, phase-status-bump, verify-production-output) — same safety pattern PR A applied. Prevents the ' #' comment-eating bug from re-emerging if anyone adds a `#issue` reference later.

Net token impact: ~-650 chars × ~0.25 tokens/char ≈ -162 tokens per session-start. Modest but compounds.

Why not aggressive trim:

- Each TRIGGER phrase + SKIP clause IS dispatch-useful — verified by sampling. Aggressive 50% cuts would risk dispatch quality.
- Remaining 14 QR-origin skills already at 700-900 chars with no redundancy to remove.

CLAUDE.md (181 → 181, lockstep): §Phase status — added PR #145 (E) to "Recently merged"; replaced "PR E in flight" with "PR F in flight" note explaining the audit found health.

AGENTS.md (343 → 343, lockstep): §Phase + version state — optimization sequence tracker updated: PR E ✅, PR F in flight, PR G remaining.

Next: PR G (PHASE_STATUS.md "Current State" summary at top + chronological table below).

Co-authored-by: Claude <noreply@anthropic.com>
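The ' #' comment-eating behaviour mentioned in this commit comes from YAML's rule that, in a plain (unquoted) scalar, a `#` preceded by whitespace starts a comment. The sketch below is a deliberately simplified model of that rule for a single-line value, not a YAML parser. The function name and the sample description are made up for illustration.

```python
def plain_scalar_value(raw: str) -> str:
    """Simplified model of a single-line YAML plain scalar.

    Everything from the first ' #' onward is treated as a comment and dropped.
    Quoted scalars and block scalars (such as `description: >`) are not modelled;
    in those forms the '#' is ordinary content.
    """
    head, _sep, _comment = raw.partition(" #")
    return head.rstrip()


description = "Use before merge; see #119 and #121 for context"
print(plain_scalar_value(description))  # Use before merge; see
```

Under this model a description that mentions an issue number silently loses everything after the reference, which is why moving the edited descriptions to a folded block scalar avoids the problem.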
Summary
Phase 4k scout PR — FINAL of 4 factor-library scouts (OSAP ✅ #110, JKP ✅ #114, Qlib ✅ #119, IPCA THIS). Ships `ipca` install + 8-method public-API surface lock + 6 offline tests + inline synthetic fixture. NO production wiring; characteristics-matrix construction + universe-wide IPCA fit + composite blend decision are integration-PR scope (Phase 4k.1, tracked separately).

After this merges → all 4 factor scouts done → eligible for `v1.1.0-phase4` tag readiness audit (gated on 4h.2 Part 2 + 4i.1 + 4j.1 + 4k.1 integration PRs landing).

No new veto. Defense layer unchanged at 17. Top-5 rotation unchanged. Schema unchanged at `0.9.1-phase4h.2`.

5 pre-plan investigation results (verified 2026-05-19)

1. PyPI package: `ipca` 0.6.7 (29 historical versions; last release 2021-04-22, ~5 years stale — Risk #1). Canonical name matches `SKILL.md:155`.
2. License: MIT. `LICENSE.md` verbatim: "Copyright (c) [2019] [Matthias Buechner, Leland Bybee]". No CC BY-NC complication unlike JKP (Phase 4i). Safe for Phase 6+ commercial roadmap.
3. API surface: 8 public methods: `fit / get_factors / fit_path / predict / predict_panel / predict_portfolio / score / predictOOS`. Post-fit attrs: `Gamma (L×K) / Factors (K×T) / metad / n_factors_eff / has_PSF / PSFcase`. NO `transform` / `fit_transform` — user brief assumed these; they don't exist in 0.6.7. Risk #3.
4. Data requirements: `MultiIndex (entity, date)` DataFrame OR explicit `indices` array. Min stable size 10 firms × 20 years × 2 chars (maintainer's `test_ipca.py`). NaN handling internal. Unbalanced panels supported.
5. CI install footprint: ~50-80 MB net-new (numba ~50 MB + llvmlite ~30 MB). Substantially lighter than Qlib's 150-180 MB (no mlflow / cvxpy / jupyter).

🚨 Critical scope decision — NO `@network` test (mirrors Phase 4j Qlib rationale)

IPCA is pure local sklearn-style computation. No remote endpoint to network-test (unlike OSAP 4h's CDN or JKP 4i's S3 bucket). Scout ships 6 offline tests / 0 @network. The synthetic-fixture smoke test exercises the full `fit → Gamma/Factors` path locally without network.

IPCA structural shape — 4th distinct vs prior scouts

- OSAP (4h): factor returns CSV → proxy/36m regression
- JKP (4i): factor returns CSV → 36m regression
- Qlib (4j): per-stock per-date features → native Alpha158
- IPCA (4k): panel decomposition → Gamma (L×K loadings) + Factors (K×T) latent returns
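To make the `Gamma (L×K)` / `Factors (K×T)` shapes concrete, here is a dimension-bookkeeping sketch in plain Python using the smoke-test sizes (5 entities, 30 dates, 10 characteristics, 2 factors). It follows the standard IPCA idea that loadings are characteristics times `Gamma` and fitted returns are loadings times factors. All numeric values are made up, the characteristics are held fixed across dates for brevity, and nothing here calls the `ipca` package.

```python
def matmul(a, b):
    """Plain-Python matrix product for rectangular lists of lists."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must agree"
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)] for row in a]


def shape(m):
    """(rows, columns) of a rectangular list of lists."""
    return (len(m), len(m[0]))


# N entities, T dates, L characteristics, K latent factors (fixture sizes from the test name).
N, T, L, K = 5, 30, 10, 2

# Made-up values; only the shapes matter in this sketch.
Z = [[0.01 * (i + 1) * (c + 1) for c in range(L)] for i in range(N)]        # N×L characteristics at one date
Gamma = [[1.0 if c % K == k else 0.0 for k in range(K)] for c in range(L)]  # L×K loading map
Factors = [[0.1 * (k + 1) for _ in range(T)] for k in range(K)]             # K×T latent factor returns

betas = matmul(Z, Gamma)         # N×K: characteristic-instrumented loadings
fitted = matmul(betas, Factors)  # N×T: fitted returns for every entity and date

print(shape(Gamma), shape(Factors))  # (10, 2) (2, 30)
print(shape(betas), shape(fitted))   # (5, 2) (5, 30)
```

The first printed pair matches the `Gamma.shape == (10, 2)` and `Factors.shape == (2, 30)` assertions named for the smoke test.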
Module-name choice locked (NO namespace collision risk)

The new module is `compute/features/ipca_factors.py`, mirroring `compute/features/osap_replicate.py` precedent for library-action modules and the pre-existing `.claude/skills/phase-4/ipca-factor-fit/PLAN.md:24` lock. Unlike Phase 4j (where `qlib.py` would shadow the `qlib` PyPI package), IPCA has no collision risk — module name `ipca_factors` is distinct from PyPI package `ipca`.

Files

- `compute/features/ipca_factors.py`: `init_ipca` + `fit_ipca_panel`
- `compute/config.py`: `# --- Phase 4k scout ---` block
- `tests/test_features/test_ipca_factors.py`
- `pyproject.toml`: `ipca>=0.6.7,<0.7` to `[factors]`
- `PHASE_STATUS.md`

Tenacity policy NOT applied

IPCA's data flow is local sklearn computation. No network retry semantics. This is the second ingest-adjacent module after Phase 4j Qlib (`qlib_features.py`) that diverges from the canonical `compute/ingest/osap.py:52-56` retry decorator pattern. Documented explicitly in module docstring.

Tests (6 offline; NO `@network`)

1. `test_ipca_imports_and_exposes_instrumented_pca`
2. `test_instrumented_pca_public_api_manifest_locks_8_methods`: runs without the `[factors]` extra
3. `test_instrumented_pca_public_api_matches_runtime_introspection`: `hasattr(InstrumentedPCA, name) and callable(...)` for each manifest entry
4. `test_ipca_fitted_artifacts_cache_under_repo_cache_dir`
5. `test_init_ipca_returns_unfitted_estimator_with_kps_defaults`: `n_factors=5, intercept=True` (KPS 2019 baseline)
6. `test_fit_ipca_panel_on_synthetic_5x30x10_fixture`: `Gamma.shape == (10, 2)` + `Factors.shape == (2, 30)` + `metad` N/T/L

Verification ladder (8-step; STOP at step 8)

- `ruff check .`
- `pytest tests/test_features/test_ipca_factors.py -v` (local, no `[factors]`); skips go through `pytest.importorskip("ipca")`
- `pytest tests/ -m "not network"` (excluding factor-extra files)
- `python -m compute.output.schema_check`
- `from compute.features.ipca_factors import init_ipca, fit_ipca_panel, INSTRUMENTED_PCA_PUBLIC_API`
- `git push -u origin claude/phase-0-scaffolding-Yx96M`
- `subscribe_pr_activity` + STOP for user audit

CI will validate all 6 IPCA tests (with `[dev,factors]` extra installed) — expected ~936 total offline (930 baseline + 6 new).

Ask-first surfaces touched

- `pyproject.toml [factors]` — extended with `ipca>=0.6.7,<0.7` (authorized in advance via plan-mode approval)
- `.github/workflows/ci.yml` — UNCHANGED (`[dev,factors]` install already covers the new dep)
- `.github/workflows/compute-rankings.yml` — UNTOUCHED per user hard constraint (scout doesn't wire into weekly compute; characteristics-matrix construction is integration-PR scope)
- Schema triple (`schemas.py` / `types.ts` / `schema-snapshot.json`) — UNTOUCHED (no schema delta this scout)

Out of scope (deferred to follow-on Phase 4k.1 integration PR, ~5-commit cluster)

- Characteristics-matrix construction: which Phase 3 + OSAP/JKP/Qlib features feed the `X` matrix? Design decision deferred.
- Full IPCA fit on the 502-ticker universe: `N=502 × T=N_dates × L=~30` panel; `data_type="portfolio"` scaling path per maintainer.
- Walk-forward / rolling-window fit cadence.
- Latent-factor composite integration decision (observability-only? Phase 5 ML-meta-learner consumer?).
- Schema additions (`StockDetail.ipca_loadings` + `Metadata.ipca_n_factors_eff` + `Metadata.ipca_in_sample_r2`) → schema bump `0.9.1-phase4h.2 → 0.10.0-phase4k` is integration-PR scope.
- PBO/DSR doesn't apply (loadings ≠ portfolio returns); IC walk-forward observability instead, per the `PLAN.md:36` "IC > 0.05 OOS" acceptance criterion.
- Top-5 rotation impact analysis (Rule 16 lock).

Risks

- `ipca` last released 2021-04-22 — 5 years stale. Mitigation: pin `>=0.6.7,<0.7`; API-surface assertion at module load catches drift; documented in PR body.
- `InstrumentedPCA` lacks `transform` / `fit_transform` (user brief assumed presence). Mitigation: `fit` + `Gamma` / `Factors` attrs + `predict_panel` instead; PR body documents divergence.
- No `@network` test — divergence from Phase 4h/4i, matches Phase 4j.
- `pytest.importorskip("ipca")` masks failures when `[factors]` not installed. Mitigation: `test_osap_e2e_integration.py` + Phase 4j precedent; CI installs `[factors]`.
- `InstrumentedPCA` from `BaseEstimator` only (NOT `RegressorMixin`) — minor sklearn divergence. Mitigation: … `score()` into RegressorMixin).
- `IPCA_DEFAULT_N_FACTORS=5` may not be optimal.

After 4k scout merges — `v1.1.0-phase4` tag readiness

✅ All 4 factor scouts complete: 4h OSAP · 4i JKP · 4j Qlib · 4k IPCA
⏳ Gated on follow-on integration PRs: 4h.2 Part 2 · 4i.1 · 4j.1 · 4k.1

Tag-cut decision: separate audit session post-4k-scout-merge, gated on the 4 integration PRs landing (~6-8w combined effort).
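The tight pin `ipca>=0.6.7,<0.7` used as the staleness mitigation can be read as a half-open version interval. The sketch below checks that interval with tuple comparison. It handles plain dotted numeric versions only and is not a PEP 440 implementation (pip's resolver does the real work); `parse_version` and `satisfies_pin` are hypothetical names.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.6.7' into (0, 6, 7); numeric dotted versions only."""
    return tuple(int(part) for part in text.split("."))


def satisfies_pin(version: str, lower: str = "0.6.7", upper: str = "0.7") -> bool:
    """True when lower <= version < upper under tuple ordering."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


print(satisfies_pin("0.6.7"))  # True
print(satisfies_pin("0.6.9"))  # True
print(satisfies_pin("0.7.0"))  # False
print(satisfies_pin("0.6.6"))  # False
```

A pin like this blocks a hypothetical 0.7 release from installing unnoticed; the module-load API assertion then covers any drift inside the allowed 0.6.x range.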
🤖 Implemented by main session direct (Phase 4j paste-loop precedent: worker session was stuck re-presenting plan; main session consolidated roles per user authorization).
Generated by Claude Code