
feat(features): IPCA scout — ipca MIT install + InstrumentedPCA 8-method API surface lock + 6 synthetic-fixture tests#121

Merged
dackclup merged 1 commit into main from claude/phase-0-scaffolding-Yx96M
May 19, 2026

Conversation

@dackclup
Owner

Summary

Phase 4k scout PR — FINAL of 4 factor-library scouts (OSAP ✅ #110, JKP ✅ #114, Qlib ✅ #119, IPCA THIS). Ships ipca install + 8-method public-API surface lock + 6 offline tests + inline synthetic fixture. NO production wiring; characteristics-matrix construction + universe-wide IPCA fit + composite blend decision are integration-PR scope (Phase 4k.1, tracked separately).

After this merges → all 4 factor scouts done → eligible for v1.1.0-phase4 tag readiness audit (gated on 4h.2 Part 2 + 4i.1 + 4j.1 + 4k.1 integration PRs landing).

No new veto. Defense layer unchanged at 17. Top-5 rotation unchanged. Schema unchanged at 0.9.1-phase4h.2.

5 pre-plan investigation results (verified 2026-05-19)

| # | Finding | Verdict |
|---|---|---|
| 1 | PyPI package | ipca 0.6.7 (29 historical versions; last release 2021-04-22, ~5 years stale; Risk #1). Canonical name matches SKILL.md:155. |
| 2 | License | MIT. LICENSE.md verbatim: "Copyright (c) [2019] [Matthias Buechner, Leland Bybee]". No CC BY-NC complication unlike JKP (Phase 4i). Safe for Phase 6+ commercial roadmap. |
| 3 | sklearn API surface | 8 public methods: fit / get_factors / fit_path / predict / predict_panel / predict_portfolio / score / predictOOS. Post-fit attrs: Gamma (L×K) / Factors (K×T) / metad / n_factors_eff / has_PSF / PSFcase. NO transform / fit_transform: the user brief assumed these, but they don't exist in 0.6.7 (Risk #3). |
| 4 | Data requirements | MultiIndex (entity, date) DataFrame OR explicit indices array. Min stable size 10 firms × 20 years × 2 chars (maintainer's test_ipca.py). NaN handling internal. Unbalanced panels supported. |
| 5 | CI install footprint | ~50-80 MB net-new (numba ~50 MB + llvmlite ~30 MB). Substantially lighter than Qlib's 150-180 MB (no mlflow / cvxpy / jupyter). |

🚨 Critical scope decision — NO @network test (mirrors Phase 4j Qlib rationale)

IPCA is pure local sklearn-style computation. No remote endpoint to network-test (unlike OSAP 4h's CDN or JKP 4i's S3 bucket). Scout ships 6 offline tests / 0 @network. The synthetic-fixture smoke test exercises the full fit → Gamma/Factors path locally without network.

IPCA structural shape — 4th distinct vs prior scouts

| Scout | Data shape | Computation | Per-stock surface |
|---|---|---|---|
| OSAP (4h) | factor returns CSV | downloaded | proxy / 36m regression |
| JKP (4i) | factor returns CSV | downloaded | 36m regression |
| Qlib (4j) | per-stock per-date features | computed locally from OHLCV | native Alpha158 |
| IPCA (4k) | panel (N × T × L characteristics) | sklearn-style estimator | Gamma (L × K) loadings + Factors (K × T) latent returns |

Module-name choice locked (NO namespace collision risk)

The new module is compute/features/ipca_factors.py, mirroring compute/features/osap_replicate.py precedent for library-action modules and the pre-existing .claude/skills/phase-4/ipca-factor-fit/PLAN.md:24 lock. Unlike Phase 4j (where qlib.py would shadow the qlib PyPI package), IPCA has no collision risk — module name ipca_factors is distinct from PyPI package ipca.

Files

| Path | Action | LOC |
|---|---|---|
| compute/features/ipca_factors.py | NEW: module docstring + 8-method manifest + module-load invariants + init_ipca + fit_ipca_panel | ~190 |
| compute/config.py | Edit: 3 constants in new `# --- Phase 4k scout ---` block | +28 |
| tests/test_features/test_ipca_factors.py | NEW: 6 offline tests + inline synthetic fixture | ~228 |
| pyproject.toml | Edit: append `ipca>=0.6.7,<0.7` to [factors] | +9 |
| PHASE_STATUS.md | Edit: row 4 IPCA pending → in-flight + 4j shipped via #119 | ±1 |
| Total | | ~448 LOC |

Tenacity policy NOT applied

IPCA's data flow is local sklearn computation. No network retry semantics. This is the second ingest-adjacent module after Phase 4j Qlib (qlib_features.py) that diverges from the canonical compute/ingest/osap.py:52-56 retry decorator pattern. Documented explicitly in module docstring.

Tests (6 offline; NO @network)

| # | Test | Coverage |
|---|---|---|
| 1 | test_ipca_imports_and_exposes_instrumented_pca | Primary CI signal (importorskip) |
| 2 | test_instrumented_pca_public_api_manifest_locks_8_methods | Pure assertion · no ipca runtime · runs without [factors] extra |
| 3 | test_instrumented_pca_public_api_matches_runtime_introspection | ⭐ Drift detector: `hasattr(InstrumentedPCA, name) and callable(...)` for each manifest entry |
| 4 | test_ipca_fitted_artifacts_cache_under_repo_cache_dir | Config sanity · no ipca runtime |
| 5 | test_init_ipca_returns_unfitted_estimator_with_kps_defaults | n_factors=5, intercept=True (KPS 2019 baseline) |
| 6 | test_fit_ipca_panel_on_synthetic_5x30x10_fixture | Smoke fit: asserts Gamma.shape == (10, 2) + Factors.shape == (2, 30) + metad N/T/L |

Verification ladder (8-step; STOP at step 8)

| Step | Command | Result |
|---|---|---|
| 1 | `ruff check .` | ✅ clean (auto-fix on import-block sort) |
| 2 | `pytest tests/test_features/test_ipca_factors.py -v` (local, no [factors]) | ✅ 2 PASS (#2, #4) · 4 SKIP via `pytest.importorskip("ipca")` |
| 3 | `pytest tests/ -m "not network"` (excluding factor-extra files) | ✅ 864 passed |
| 4 | `python -m compute.output.schema_check` | ✅ in-sync (no schema delta) |
| 5 | Import smoke: `from compute.features.ipca_factors import init_ipca, fit_ipca_panel, INSTRUMENTED_PCA_PUBLIC_API` | ✅ OK 8 |
| 6 | `git push -u origin claude/phase-0-scaffolding-Yx96M` | |
| 7 | Open PR as Draft (this PR) | |
| 8 | subscribe_pr_activity + STOP for user audit | ⏳ next |

CI will validate all 6 IPCA tests (with [dev,factors] extra installed) — expected ~936 total offline (930 baseline + 6 new).

Ask-first surfaces touched

  • pyproject.toml [factors] — extended with ipca>=0.6.7,<0.7 (authorized in advance via plan-mode approval)
  • .github/workflows/ci.yml — UNCHANGED ([dev,factors] install already covers the new dep)
  • .github/workflows/compute-rankings.yml UNTOUCHED per user hard constraint (scout doesn't wire into weekly compute; characteristics-matrix construction is integration-PR scope)
  • Schema triple (schemas.py / types.ts / schema-snapshot.json) — UNTOUCHED (no schema delta this scout)

Out of scope (deferred to follow-on Phase 4k.1 integration PR, ~5-commit cluster)

  • Characteristics-matrix construction — which Phase 3 + OSAP/JKP/Qlib features feed IPCA's X matrix? Design decision deferred.
  • Full IPCA fit on 502-ticker universe: N=502 × T=N_dates × L=~30 panel; data_type="portfolio" scaling path per maintainer.
  • Walk-forward fit cadence — monthly? quarterly?
  • Latent-factor composite integration — observability-only? blend into composite? Phase 5 ML-meta-learner-only consumer?
  • Schema additions (StockDetail.ipca_loadings + Metadata.ipca_n_factors_eff + Metadata.ipca_in_sample_r2) → schema bump 0.9.1-phase4h.2 → 0.10.0-phase4k is integration-PR scope.
  • PBO/DSR doesn't apply — IPCA outputs are factor loadings, not long-short portfolio returns. Integration PR uses IC walk-forward observability per PLAN.md:36 "IC > 0.05 OOS" acceptance criterion.
  • Top-5 rotation impact analysis (Rule 16 lock applies).
  • WRDS data backfill — KPS 2019 used CRSP/Compustat; integration PR may opt for WRDS quality data over yfinance.

Risks

| # | Risk | Mitigation |
|---|---|---|
| 1 | Upstream ipca last released 2021-04-22 (5 years stale) | Pin `>=0.6.7,<0.7`; API-surface assertion at module load catches drift; documented in PR body |
| 2 | numba (~50 MB w/ llvmlite ~30 MB) can fail on certain Python/glibc combos | Mitigated by CI's stable Ubuntu runner; fix-amend recipe if cold-start fails (Phase 4j RED-path precedent) |
| 3 | InstrumentedPCA lacks transform/fit_transform (user brief assumed presence) | Plan uses fit + Gamma/Factors attrs + predict_panel instead; PR body documents divergence |
| 4 | NO @network test: diverges from Phase 4h/4i, matches Phase 4j | Documented deliberately; IPCA has no remote endpoint by design |
| 5 | `pytest.importorskip("ipca")` masks failures when [factors] not installed | Acceptable: matches test_osap_e2e_integration.py + Phase 4j precedent; CI installs [factors] |
| 6 | InstrumentedPCA derives from BaseEstimator only (NOT RegressorMixin); minor sklearn divergence | Documented in module docstring; doesn't affect scout (no score() into RegressorMixin) |
| 7 | numba JIT cold-start adds ~3-5s test latency on first IPCA fit | Acceptable for the single smoke test; cached after warmup |
| 8 | Default IPCA_DEFAULT_N_FACTORS=5 may not be optimal | Scout default per KPS 2019; integration PR will tune via walk-forward IC |

After 4k scout merges — v1.1.0-phase4 tag readiness

✅ All 4 factor scouts complete: 4h OSAP · 4i JKP · 4j Qlib · 4k IPCA

⏳ Gated on follow-on integration PRs: 4h.2 Part 2 · 4i.1 · 4j.1 · 4k.1

Tag-cut decision: separate audit session post-4k-scout-merge, gated on the 4 integration PRs landing (~6-8w combined effort).

🤖 Implemented directly by the main session (Phase 4j paste-loop precedent: the worker session was stuck re-presenting the plan, so the main session consolidated roles per user authorization).


Generated by Claude Code

…hod API surface lock + 6 synthetic-fixture tests

Phase 4k scout PR — final of 4 factor-library scouts (OSAP ✅ #110, JKP ✅ #114, Qlib ✅ #119, IPCA THIS). Ships `ipca` install + 8-method public-API surface lock + 6 offline tests + inline synthetic fixture. NO production wiring; characteristics-matrix construction + universe-wide IPCA fit + composite blend decision are integration-PR scope (Phase 4k.1).

After this merges → all 4 factor scouts done → eligible for v1.1.0-phase4 tag readiness audit (gated on 4h.2 Part 2 + 4i.1 + 4j.1 + 4k.1 integration PRs landing).

5 pre-plan investigations (verified 2026-05-19, carried verbatim into module docstring):

1. PyPI package: `ipca` (0.6.7); 29 historical versions back to 0.1; last release 2021-04-22 (~5 years stale). Pin tightly `>=0.6.7,<0.7`.
2. License: MIT (LICENSE.md verbatim — Buechner / Bybee 2019). No CC BY-NC complication unlike JKP. Safe for Phase 6+ commercial roadmap.
3. sklearn-compatible API surface: 8 public methods — fit / get_factors / fit_path / predict / predict_panel / predict_portfolio / score / predictOOS. NO transform/fit_transform (user brief assumed these; they don't exist in 0.6.7). Post-fit attrs: Gamma (L×K) / Factors (K×T) / metad dict / n_factors_eff / has_PSF / PSFcase.
4. Data requirements: MultiIndex (entity, date) DataFrame OR explicit indices array. Min stable shape 10 firms × 20 years × 2 chars (maintainer's test). NaN handling internal. Unbalanced panels supported.
5. CI install footprint: ~50-80 MB net-new (numba ~50 MB + llvmlite ~30 MB). Substantially lighter than Qlib's 150-180 MB (no mlflow / cvxpy / jupyter).

IPCA structural shape — 4th distinct vs prior scouts:
- OSAP (4h): factor returns CSV → proxy/36m regression
- JKP (4i): factor returns CSV → 36m regression
- Qlib (4j): per-stock per-date features → native Alpha158
- IPCA (4k): panel decomposition → Gamma (L×K loadings) + Factors (K×T) latent returns

Critical scope decision — NO @network test (mirrors Phase 4j Qlib rationale):

IPCA is pure local sklearn-style computation. No remote endpoint to network-test (unlike OSAP 4h's CDN or JKP 4i's S3 bucket). Scout ships 6 offline tests / 0 @network. Test count delta: 930 baseline + 6 offline = ~936 in CI.

Architectural locks:

- Module placement `compute/features/ipca_factors.py` (NOT compute/ingest/) per pre-existing `.claude/skills/phase-4/ipca-factor-fit/PLAN.md:24` and `compute/features/osap_replicate.py` precedent. No namespace collision (module is `ipca_factors`, PyPI package is `ipca`) — Phase 4j's `qlib_features.py` workaround doesn't apply.
- INSTRUMENTED_PCA_PUBLIC_API 8-method tuple — drift detector; module-load assertion against config.IPCA_PUBLIC_API_METHOD_COUNT. Catches future `ipca>0.6.7` API renames.
- IPCA_DEFAULT_N_FACTORS=5, IPCA_DEFAULT_INTERCEPT=True (KPS 2019 baseline) — validated by smoke test, NOT module-load assert (defaults are our choice, not external surface).
- Tenacity NOT applied — pure local sklearn-style; no network retry. First-class divergence from osap.py:52-56 pattern; documented in module docstring.
- Synthetic fixture inline as @pytest.fixture (NOT committed CSV/parquet) — IPCA inputs are numpy arrays, no roundtrip needed.

Module layer (compute/features/ipca_factors.py, ~190 LOC including extensive docstring):
- IPCA_FITTED_ARTIFACTS_CACHE re-export from config
- INSTRUMENTED_PCA_PUBLIC_API 8-tuple + module-load invariants (cardinality + uniqueness)
- IPCA_DEFAULT_N_FACTORS / IPCA_DEFAULT_INTERCEPT constants
- init_ipca(n_factors, intercept, **kwargs) → unfitted InstrumentedPCA
- fit_ipca_panel(estimator, *, X, y, indices, **fit_kwargs) → fitted estimator
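
A thin wrapper layer with the two signatures listed above might look like the sketch below. It assumes that `InstrumentedPCA` accepts `n_factors` and `intercept` as constructor keywords and that `fit` accepts `X`, `y`, and `indices`; the `estimator_cls` parameter, the optional default for `indices`, and `_RecordingEstimator` are hypothetical additions that let the example run without `ipca` installed.

```python
from typing import Any

# Defaults named in this commit message (KPS 2019 baseline).
IPCA_DEFAULT_N_FACTORS = 5
IPCA_DEFAULT_INTERCEPT = True


def init_ipca(
    n_factors: int = IPCA_DEFAULT_N_FACTORS,
    intercept: bool = IPCA_DEFAULT_INTERCEPT,
    *,
    estimator_cls: Any = None,  # hypothetical injection point for tests
    **kwargs: Any,
) -> Any:
    """Return an unfitted estimator; imports ipca lazily when no class is given."""
    if estimator_cls is None:
        from ipca import InstrumentedPCA  # requires the [factors] extra

        estimator_cls = InstrumentedPCA
    return estimator_cls(n_factors=n_factors, intercept=intercept, **kwargs)


def fit_ipca_panel(
    estimator: Any, *, X: Any, y: Any, indices: Any = None, **fit_kwargs: Any
) -> Any:
    """Fit the estimator on a panel and return the same (now fitted) object."""
    estimator.fit(X=X, y=y, indices=indices, **fit_kwargs)
    return estimator


class _RecordingEstimator:
    """Hypothetical stub that records calls instead of running IPCA."""

    def __init__(self, n_factors: int, intercept: bool, **kwargs: Any) -> None:
        self.n_factors = n_factors
        self.intercept = intercept
        self.fit_calls: list[dict[str, Any]] = []

    def fit(self, X: Any, y: Any, indices: Any = None, **fit_kwargs: Any) -> "_RecordingEstimator":
        self.fit_calls.append({"X": X, "y": y, "indices": indices, **fit_kwargs})
        return self


est = init_ipca(estimator_cls=_RecordingEstimator)
fitted = fit_ipca_panel(
    est, X=[[0.1, 0.2]], y=[0.01], indices=[(0, 0)], data_type="panel"
)
```

Keeping the `ipca` import inside `init_ipca` is one way to let the manifest and config tests run in environments that lack the `[factors]` extra.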

Config layer (compute/config.py, +28 LOC):
- IPCA_FITTED_ARTIFACTS_CACHE: Path = CACHE_DIR / "ipca"
- IPCA_FITTED_ARTIFACTS_MAX_AGE_DAYS: int = 31
- IPCA_PUBLIC_API_METHOD_COUNT: int = 8

Tests (6 offline; ~190 LOC):
1. test_ipca_imports_and_exposes_instrumented_pca — primary CI signal (importorskip)
2. test_instrumented_pca_public_api_manifest_locks_8_methods — pure assertion, no ipca runtime
3. test_instrumented_pca_public_api_matches_runtime_introspection — drift detector
4. test_ipca_fitted_artifacts_cache_under_repo_cache_dir — config sanity
5. test_init_ipca_returns_unfitted_estimator_with_kps_defaults — defaults validation
6. test_fit_ipca_panel_on_synthetic_5x30x10_fixture — smoke fit; asserts Gamma (10,2) + Factors (2,30) + metad N/T/L

pyproject.toml: append `ipca>=0.6.7,<0.7` to `[factors]` (authorized in advance via plan-mode approval; pin range because 2021-04-22 staleness).

Ask-first surfaces touched:
- pyproject.toml [factors] — extended (authorized via plan-mode)
- ci.yml UNCHANGED ([dev,factors] install already covers new dep)
- compute-rankings.yml UNTOUCHED per user hard constraint
- Schema triple UNTOUCHED (no schema delta this scout)

Verification (local, sandbox without [factors]):
- ruff check . → clean (auto-fix on import-block sort)
- python -m compute.output.schema_check → in-sync
- Import smoke: from compute.features.ipca_factors import init_ipca, fit_ipca_panel, INSTRUMENTED_PCA_PUBLIC_API → OK 8
- pytest tests/ -m "not network" excluding factor-extra files → 864 passed
- 2 of 6 IPCA tests PASS locally (#2 manifest cardinality + #4 cache path); 4 SKIP via pytest.importorskip("ipca") (expected — local lacks [factors] extra)
- CI with [dev,factors] will run all 6 → ~936 offline expected (930 baseline + 6 new)

Defense layer unchanged at 17. Top-5 rotation unchanged. Schema unchanged at 0.9.1-phase4h.2.

Out of scope (deferred to follow-on Phase 4k.1 integration PR, ~5-commit cluster):
- Characteristics-matrix construction (which Phase 3 + OSAP/JKP/Qlib features feed X?)
- Full IPCA fit on 502-ticker universe (data_type="portfolio" canonical scaling)
- Walk-forward / rolling-window fit cadence
- Latent-factor composite integration decision (observability-only? Phase 5 ML-meta-learner consumer?)
- Schema additions (StockDetail.ipca_loadings + Metadata.ipca_n_factors_eff + ipca_in_sample_r2) → bump 0.9.1-phase4h.2 → 0.10.0-phase4k
- PBO/DSR doesn't apply (loadings ≠ portfolio returns); IC walk-forward observability instead per PLAN.md:36
- Top-5 rotation impact analysis (Rule 16 lock)

Audit history:
- Plan-audit round 1: 5 investigations verified · MIT lock · heavy-deps disclosure
- Plan-audit round 2: Q1 (public-API surface lock) + Q2 (inline pytest.fixture) design choices applied
- Plan-audit round 3: line citations verified (ipca PyPI · config.py:200-221 · pyproject.toml:36-45 · PLAN.md:24 + L36)
- Implementation: main session direct (worker session paste-loop bypassed per Phase 4j precedent)
- Local verification: ruff clean · schema_check in-sync · 864 offline passing · 2 IPCA tests pass + 4 graceful skip

Closes the factor-library scout cluster. Next: v1.1.0-phase4 tag readiness audit gated on 4 integration PRs.

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2
@vercel

vercel Bot commented May 19, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| quantrank | Ready | Preview, Comment | May 19, 2026 11:54am |

@dackclup dackclup marked this pull request as ready for review May 19, 2026 11:58
@dackclup dackclup merged commit 182c02d into main May 19, 2026
4 checks passed
@dackclup dackclup deleted the claude/phase-0-scaffolding-Yx96M branch May 19, 2026 11:59
dackclup added a commit that referenced this pull request May 20, 2026
…rop diagnostic + schema 0.9.2 (#124)

Closes #116 (Part 2 scope).

Phase 4h.2 Part 2 closes the OSAP 100-signal accounting gap that Part 1
made visible. Production cron at commit 182c02d (version 0.9.1-phase4h.2)
exposed the imbalance: 22 missing_from_dataset + 22 gate_diagnostics +
0 signals_used = 44 — leaving 56 signals UNACCOUNTED for between the
dataset rows and the gate. Root cause was the hardcoded port=01 / port=10
filter in `compute/features/osap_replicate.py::compute_long_short_returns`
at L60,65,120,135-136: OSAP delivers some signals as quintile (ports
01..05) or tercile (01..03), and the global pre-filter dropped every row
that didn't match port=10 — those signals silently disappeared before
reaching the PBO/DSR gate.

Sub-task 1 — Multi-port adapter (compute/features/osap_replicate.py)
---------------------------------------------------------------------
Replaced the hardcoded constants `LONG_PORT_LABEL` / `SHORT_PORT_LABEL`
with per-signal `min(port)` / `max(port)` inference. Algorithm:

1. groupby("signalname") to derive each signal's port extents
2. long_port = min(unique ports), short_port = max(unique ports)
3. signals with fewer than 2 distinct ports are dropped (no LS pair)
4. pivot per-signal with "long" / "short" role columns so the LS axis
   is stable across heterogeneous port cardinalities

Decile signals (01..10) degenerate to the same ("01", "10") corners
under min/max — backward-compatible. Quintile signals → ("01", "05").
Tercile signals → ("01", "03").
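
The min/max inference in steps 1 to 3 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code in `osap_replicate.py`: it takes `(signalname, port)` pairs rather than a DataFrame, assumes port labels are numeric strings such as "01" or "10", and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable


def _ports_by_signal(rows: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Group the distinct port labels seen for each signal."""
    grouped: dict[str, set[str]] = defaultdict(set)
    for signalname, port in rows:
        grouped[signalname].add(port)
    return grouped


def infer_long_short_ports(rows: Iterable[tuple[str, str]]) -> dict[str, tuple[str, str]]:
    """Map each signal to (long_port, short_port) = (min port, max port).

    Signals with fewer than 2 distinct ports are left out (no LS pair).
    """
    corners: dict[str, tuple[str, str]] = {}
    for signalname, ports in _ports_by_signal(rows).items():
        if len(ports) < 2:
            continue
        # Compare numerically so the result does not depend on zero padding.
        corners[signalname] = (min(ports, key=int), max(ports, key=int))
    return corners


def signals_without_long_short(rows: Iterable[tuple[str, str]]) -> list[str]:
    """Signals present in the data but with fewer than 2 distinct ports."""
    return sorted(
        name for name, ports in _ports_by_signal(rows).items() if len(ports) < 2
    )


# Synthetic rows: one decile, one quintile, one tercile, one single-port signal.
rows = (
    [("DecileSig", f"{p:02d}") for p in range(1, 11)]
    + [("QuintileSig", f"{p:02d}") for p in range(1, 6)]
    + [("TercileSig", f"{p:02d}") for p in range(1, 4)]
    + [("LonelySig", "01")]
)
corners = infer_long_short_ports(rows)
dropped = signals_without_long_short(rows)
```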

Sub-task 2 — Accounting-balance diagnostic
-------------------------------------------
New helper `signals_dropped_no_long_short(returns) -> list[str]` returns
signals present in the dataset but with <2 distinct port buckets (the
non-recoverable subset). Wired through `compute/main.py` into the new
Metadata field `osap_signals_dropped_no_long_short: list[str] | None`.
Schema triple moved together: Pydantic (`compute/output/schemas.py`) +
TypeScript (`frontend/lib/types.ts`) + snapshot (auto-regenerated via
`python -m compute.output.schema_check --update-snapshot`).

Phase 4h.2 Part 2 accounting invariant (asserted by the new test
`test_part2_accounting_invariant_against_synthetic_manifest`):

    len(OSAP_SIGNALS_100) == (
        len(osap_signals_missing_from_dataset)    # 0 rows in dataset
        + len(osap_signals_dropped_no_long_short) # <2 distinct ports
        + len(osap_signals_used)                  # passed gate
        + len(osap_excluded_signals)              # reached gate, failed
    )
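
As a worked check of the invariant, the sketch below computes the accounting gap for the pre-Part-2 production numbers quoted in this message (22 missing, 22 gated, 0 used). The function name and the synthetic signal names are hypothetical, and the balanced split at the end is purely illustrative, since the real post-Part-2 counts are not yet known.

```python
from typing import Sequence


def accounting_gap(
    manifest: Sequence[str],
    *,
    missing: Sequence[str],
    dropped: Sequence[str],
    used: Sequence[str],
    excluded: Sequence[str],
) -> int:
    """Number of manifest signals not covered by any of the four buckets."""
    accounted = len(missing) + len(dropped) + len(used) + len(excluded)
    return len(manifest) - accounted


# Synthetic 100-signal manifest (placeholder names, not real OSAP signals).
manifest = [f"signal_{i:03d}" for i in range(100)]

# Pre-Part-2: no "dropped" bucket existed yet, so 56 signals were invisible.
pre_gap = accounting_gap(
    manifest,
    missing=manifest[:22],
    dropped=[],
    used=[],
    excluded=manifest[22:44],
)

# Illustrative balanced case: every remaining signal lands in some bucket.
post_gap = accounting_gap(
    manifest,
    missing=manifest[:22],
    dropped=manifest[44:60],
    used=[],
    excluded=manifest[22:44] + manifest[60:],
)
```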

Sub-task 3 — DSR investigation (DEFERRED to Phase 4h.2 Part 3)
---------------------------------------------------------------
Both hypotheses investigated:

(a) Signal sign inversion — CONFIRMED via production metadata.json
    inspection. Every gated signal at 0.9.1 shows rejection_reason
    "low_dsr" with negative Sharpe (e.g., AbnormalAccruals sharpe=-0.23,
    AssetGrowth sharpe negative, dVolCall sharpe=-0.66). This is the
    classic OSAP "anomaly" pattern: many signals predict that the SHORT
    portfolio outperforms LONG, so the naive `LONG - SHORT` LS is
    correctly capturing that as a negative excess return — but the
    gate rejects it. The proper fix requires fetching OSAP's
    `SignalDoc.csv` for per-signal sign metadata (`Cat.SignalSign`) and
    flipping the LS for anomaly signals. Scope explicitly deferred to
    Part 3 (cleaner separation: Part 2 fixes the dropped-signal
    accounting first, Part 3 fixes the gate-rejection sign inversion).

(b) DSR threshold too tight for monthly returns — RULED OUT by code
    citation. `compute/validation/pbo_dsr.py:62` sets
    `DSR_VETO_THRESHOLD: float = 0.0` — already maximally permissive
    (the canonical Bailey-Lopez de Prado 2014 threshold is DSR > 0.95).
    The 100% low_dsr rejection rate is genuine, not a threshold artifact.
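
The sign correction proposed under hypothesis (a) could be sketched as below. This is a hypothetical illustration of the idea deferred to Part 3, not an implementation: it assumes each signal carries a sign of +1 or -1, and the function names and return series are invented for the example.

```python
import statistics
from typing import Sequence


def oriented_long_short(
    long_returns: Sequence[float],
    short_returns: Sequence[float],
    signal_sign: int,
) -> list[float]:
    """LONG - SHORT per period, flipped when the signal's sign is -1."""
    if signal_sign not in (1, -1):
        raise ValueError("signal_sign must be +1 or -1")
    return [signal_sign * (lo - sh) for lo, sh in zip(long_returns, short_returns)]


def sharpe(returns: Sequence[float]) -> float:
    """Per-period Sharpe ratio (mean over sample standard deviation)."""
    return statistics.mean(returns) / statistics.stdev(returns)


# Invented monthly returns where the SHORT leg outperforms the LONG leg.
long_leg = [0.010, -0.004, 0.006, 0.002]
short_leg = [0.015, 0.001, 0.009, 0.004]

naive = oriented_long_short(long_leg, short_leg, 1)     # negative Sharpe
flipped = oriented_long_short(long_leg, short_leg, -1)  # same magnitude, positive
```

Flipping the sign leaves the magnitude of the Sharpe ratio unchanged and only reverses its direction, which is why a gate keyed on a positive statistic would treat the two series differently.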

Decision: ship Part 2 with hypothesis (a) annotated for Part 3 follow-up.
Expected post-Part-2 acceptance count (with sign uncorrected) remains
≈ 0; the headline win is the dropped-no-long-short diagnostic surface,
not acceptance recovery. Production diagnosis from the next cron will
confirm the exact pre/post accounting numbers.

Sub-task 4 — Schema PATCH bump
-------------------------------
`compute/config.py::SCHEMA_VERSION` "0.9.1-phase4h.2" → "0.9.2-phase4h.2"
(PATCH bump per the additive-only Metadata change). Snapshot
regenerated via `python -m compute.output.schema_check --update-snapshot`.
Existing `test_config.py::test_schema_version_is_phase4h_2` updated to
match. `tests/test_config.py` is the single source of the schema-version
lock — the test name keeps the "phase4h_2" anchor.

Files (10 changed, +353 / −26)
-------------------------------
- compute/features/osap_replicate.py — multi-port adapter +
  `signals_dropped_no_long_short` helper (+132 / −24)
- compute/main.py — wire new diagnostic into Metadata; restrict the
  dropped-list to the OSAP_SIGNALS_100 manifest so the accounting
  equation closes against the manifest size (+28)
- compute/output/schemas.py — `osap_signals_dropped_no_long_short`
  field (+9)
- compute/config.py — SCHEMA_VERSION bump (+1 / −1)
- frontend/lib/types.ts — TypeScript mirror (+8)
- frontend/lib/schema-snapshot.json — auto-regenerated (+5)
- frontend/public/data/metadata.json — null sentinel for the new field
  so the static-export tsc cast passes; next cron overwrites with the
  real list (+2 / −1)
- tests/test_features/test_osap_replicate.py — 9 new tests covering
  quintile / tercile / mixed-port universes + accounting invariant +
  defensive edge cases for the new helper (+188)
- tests/test_config.py — schema-version lock follow-up (+1 / −1)
- PHASE_STATUS.md — Part 2 in-flight + 4k scout shipped via PR #121
  (+1 / −1)

Constraints honored
-------------------
- NO modification to `compute_composite` / `PHASE3_WEIGHTS` (sum=1.0
  lock at composite.py:43-45 — Path-b blend stays OUTSIDE in
  `compute/scoring/osap_blend.py`)
- Rule 16: Top-5 still ranks raw composite_score; no scoring touched
- No push to main; no force-push; no `--no-verify`
- No workflow_dispatch trigger (compute-rankings.yml untouched)
- Schema triple moved together (Pydantic + types.ts + snapshot.json)

Verification ladder all green
------------------------------
- ruff check . → All checks passed
- python -m pytest tests/ -m "not network" → 945 passed (77s)
  (936 baseline + 9 new osap tests = 945)
- python -m compute.output.schema_check → in sync
- cd frontend && npx --no -- tsc --noEmit → clean
- Section A-H verifier: 2 pre-existing failures on `main` unrelated to
  Part 2 (`non_reliance_filing` / `auditor_change` Tier-2 baseline drift)

Expected post-merge cron diagnostic
------------------------------------
Pre-Part-2 (0.9.1-phase4h.2): 22 missing + 22 gated + 0 used = 44
  → gap = 56 invisible
Post-Part-2 (0.9.2-phase4h.2): 22 missing + X dropped + Y gated + Z used
  → 100 (balanced); X + Y + Z == 78, with Z ≈ 0 until the Part 3
  sign-inversion fix

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 20, 2026
Part of epic #125 (Item #6 of 6). Pure tooling addition — no
runtime / scoring / schema impact.

Motivation
----------
PR #123 (2026-05-19, closed without merging): a worker session
opened a Phase 4j + 4k scout duplicate on branch
`claude/resume-quantrank-phase-4.5-Zh0pO` while the main session
shipped the same work directly via PRs #119 (Qlib) + #121 (IPCA).
Root cause: the worker session never inspected the `claude/*`
branch list + recent PRs before writing code, producing 100%
wasted effort.

This change ships a preflight check that surfaces in-flight scope
BEFORE any code is written, so the duplicate-PR failure mode is
caught at the handoff-prompt entry rather than at PR review.

Files (2 new, +271 LOC)
------------------------
- tools/check_branch_collisions.py (+149 LOC) — git-only preflight
  script. Lists active `claude/*` branches via `git ls-remote
  origin "refs/heads/claude/*"` and recent main-branch commits
  via `git log --since="48 hours ago" --oneline --no-merges
  origin/main`. Optional keyword args flag case-insensitive
  substring matches. Always exit 0 (informational only).

- .claude/skills/branch-collision-check/SKILL.md (+122 LOC) —
  skill description with YAML frontmatter, trigger conditions
  (handoff prompts, Phase / issue / Item #N mentions, fresh worker
  sessions), skip conditions (doc-only chores, iteration #2+,
  user-authorized parallel work), sample output (clean + warning),
  and output-interpretation guidance pointing the caller to STOP
  + ask the user when any ⚠️ line surfaces.

Design notes
------------
- Git-only data sources — no `gh` CLI / GitHub API auth required.
  Works in the QuantRank Claude Code Web sandbox where `gh` is
  unavailable, and on any contributor machine with bare git.
- 48-hour window — matches typical worker ↔ main session handoff
  cadence; long enough to catch duplicate work, short enough to
  keep the output scannable.
- Pure read-only — no destructive git ops, no branch creation,
  no push, no GitHub API mutation. Always returns exit 0; the
  caller decides whether to proceed.
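
A git-only preflight along these lines could be sketched as follows. It is not the contents of `tools/check_branch_collisions.py`: the two git invocations are the ones quoted in this message, while the function names and the output wording are assumptions made for the example.

```python
import subprocess
from typing import Iterable, Sequence


def git_lines(*args: str) -> list[str]:
    """Run a read-only git command and return its non-empty stdout lines."""
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=False
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def find_collisions(
    lines: Iterable[str], keywords: Sequence[str]
) -> list[tuple[str, str]]:
    """Case-insensitive substring match of each keyword against each line."""
    hits: list[tuple[str, str]] = []
    for line in lines:
        lowered = line.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                hits.append((keyword, line))
    return hits


def main(keywords: Sequence[str]) -> int:
    """Print active claude/* branches, recent commits, and any keyword hits."""
    branches = git_lines("ls-remote", "origin", "refs/heads/claude/*")
    commits = git_lines(
        "log", "--since=48 hours ago", "--oneline", "--no-merges", "origin/main"
    )
    for line in branches + commits:
        print(line)
    hits = find_collisions(branches + commits, keywords)
    for keyword, line in hits:
        print(f"WARNING: '{keyword}' matches: {line}")
    if hits:
        print(f"{len(hits)} potential scope collision(s) found")
    else:
        print("No scope collisions detected")
    return 0  # informational only; the caller decides whether to proceed
```

Calling `main(sys.argv[1:])` from a script entry point would reproduce the keyword-argument behaviour described above; keeping `find_collisions` free of git calls makes the matching logic testable without a repository.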

Verification ladder all green
------------------------------
- ruff check . → All checks passed
- python tools/check_branch_collisions.py → lists 1 active
  claude/* branch + 16 recent commits (last 48h), exit 0
- python tools/check_branch_collisions.py "Alpha158" → fires
  ⚠️  on PR #119 commit "Alpha158 158-feature manifest", summary
  reports "1 potential scope collision(s) found", exit 0
- python tools/check_branch_collisions.py "Phase 99 nonsense" →
  no match, summary reports "No scope collisions detected",
  exit 0
- python tools/check_doc_test_counts.py → exit 0 (Item #2 guard
  still passes; new files don't introduce hardcoded counts)
- python -m compute.output.schema_check → in sync (no schema touch)
- python -m pytest tests/ -m "not network" → 959 passed
  (unchanged; tools/ + .claude/skills/ aren't imported by tests)
- SKILL.md YAML frontmatter parses — confirmed via Claude Code's
  skill registry picking it up at module load

Constraints honored
-------------------
- No touch to compute/ / frontend/ / tests/ — tools/ +
  .claude/skills/ only
- No network calls / no GitHub API auth — git remote ls + git log
- No destructive actions — read-only preflight check
- No push to main; no force-push; no --no-verify
- No workflow_dispatch trigger (compute-rankings.yml untouched)

Epic #125 status after this PR
-------------------------------
Item #1 ✅ Hypothesis property tests (PR #127)
Item #2 ✅ Strip hardcoded test counts + CI guard (PR #128)
Item #4 ✅ Observability-before-wiring pattern (PR #129)
Item #6 ✅ Branch-collision preflight (this PR)
Items #3, #5 remain — separate PRs per epic decomposition.

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 20, 2026
…imization PR F) (#146)

Sixth PR in the .md optimization sequence (Option D). Audit of 18
QR-origin skill descriptions found all are well-formed (parseable
YAML, TRIGGER + SKIP clauses present, average 888 chars). The
critical YAML bug (#119+#121 plain-scalar bug in branch-collision-
check and pr-quality-gate) was already fixed in PR A. So PR F's
remaining work is light polish, not structural change.

Vendored skills (20) FROZEN per the boundary convention — Anthropic
skills, mattpocock-* (8), karpathy-guidelines, thananon/9arm-skills
(4), karpathy-llm-wiki are all upstream-only edits.

Trim targets (cut redundancy, fix drift, add Thai triggers):

1. pr-quality-gate (1207 → ~1015): cut redundant "ALSO use right
   before flipping Draft→Ready" clause that duplicated the first
   TRIGGER ("before authorizing the Draft→Ready flip"). Tightened
   wrapping.

2. pr-iteration-flow (990 → ~890): cut redundant "ALSO use this
   skill as the default workflow harness any time a PR is open"
   that duplicated the TRIGGER list. Dropped stale "PR-3c → PR-3d
   → PR-20" historical reference. Added Thai trigger phrases
   "เช็ค CI" / "ดู PR" since the user invokes this skill in Thai.

3. phase-status-bump (918 → ~840): dropped two historical examples
   ("PR 3d → tag v0.6.0-phase3d" and "3a→3b, 3c→3d") that anchored
   the description to one shipped phase. Wording now phase-agnostic.

4. verify-production-output (1086 → ~870): compressed the
   "Surfaces..." enumeration of Section A-H content (was 8 detailed
   items; now 8 short items) without losing dispatch specificity.
   Added Thai trigger phrases "ตรวจ output" / "เช็ค production".
   Folded "ALSO use" into first TRIGGER as one phrase.

YAML moved from plain scalar to `description: >` (folded block) on
the 3 plain-scalar descriptions edited (pr-iteration-flow,
phase-status-bump, verify-production-output) — same safety pattern
PR A applied. Prevents the ' #' comment-eating bug from re-emerging
if anyone adds a `#issue` reference later.
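
The comment-eating behaviour can be illustrated with a deliberately simplified model of how a YAML plain scalar is read. This is not a YAML parser, only an approximation of the one rule involved (a `#` preceded by a space starts a comment); the function name and the sample descriptions are hypothetical.

```python
def plain_scalar_value(raw: str) -> str:
    """Approximate the value a YAML reader keeps from a one-line plain scalar.

    Simplification: only the ' #' (space then hash) rule is modelled.
    """
    cut = raw.find(" #")
    kept = raw if cut == -1 else raw[:cut]
    return kept.strip()


# A description that mentions an issue number loses everything after ' #'.
truncated = plain_scalar_value("Run before merging; see #119 for context")

# A hash with no preceding space is part of the scalar and survives.
intact = plain_scalar_value("Run before merging; see PR#119 for context")
```

A folded block (`description: >`) places the text on indented continuation lines, where `#` is treated as literal content, which is why the change above is safe against later `#issue` references.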

Net token impact: ~-650 chars × ~0.25 tokens/char ≈ -162 tokens
per session-start. Modest but compounds.

Why not aggressive trim:
- Each TRIGGER phrase + SKIP clause IS dispatch-useful — verified
  by sampling. Aggressive 50% cuts would risk dispatch quality.
- Remaining 14 QR-origin skills already at 700-900 chars with
  no redundancy to remove.

CLAUDE.md (181 → 181, lockstep): §Phase status — added PR #145 (E)
to "Recently merged"; replaced "PR E in flight" with "PR F in flight"
note explaining the audit found health.

AGENTS.md (343 → 343, lockstep): §Phase + version state —
optimization sequence tracker updated: PR E ✅, PR F in flight, PR G
remaining.

Next: PR G (PHASE_STATUS.md "Current State" summary at top + chronological
table below).

Co-authored-by: Claude <noreply@anthropic.com>
2 participants