fix(features): Phase 4h.2 Part 2 — multi-port OSAP adapter + silent-drop diagnostic + schema 0.9.2#124
Merged
Conversation
…rop diagnostic + schema 0.9.2 Closes #116 (Part 2 scope). Phase 4h.2 Part 2 closes the OSAP 100-signal accounting gap that Part 1 made visible. Production cron at commit 182c02d (version 0.9.1-phase4h.2) exposed the imbalance: 22 missing_from_dataset + 22 gate_diagnostics + 0 signals_used = 44 — leaving 56 signals UNACCOUNTED for between the dataset rows and the gate. Root cause was the hardcoded port=01 / port=10 filter in `compute/features/osap_replicate.py::compute_long_short_returns` at L60,65,120,135-136: OSAP delivers some signals as quintile (ports 01..05) or tercile (01..03), and the global pre-filter dropped every row that didn't match port=10 — those signals silently disappeared before reaching the PBO/DSR gate. Sub-task 1 — Multi-port adapter (compute/features/osap_replicate.py) --------------------------------------------------------------------- Replaced the hardcoded constants `LONG_PORT_LABEL` / `SHORT_PORT_LABEL` with per-signal `min(port)` / `max(port)` inference. Algorithm: 1. groupby("signalname") to derive each signal's port extents 2. long_port = min(unique ports), short_port = max(unique ports) 3. signals with fewer than 2 distinct ports are dropped (no LS pair) 4. pivot per-signal with "long" / "short" role columns so the LS axis is stable across heterogeneous port cardinalities Decile signals (01..10) degenerate to the same ("01", "10") corners under min/max — backward-compatible. Quintile signals → ("01", "05"). Tercile signals → ("01", "03"). Sub-task 2 — Accounting-balance diagnostic ------------------------------------------- New helper `signals_dropped_no_long_short(returns) -> list[str]` returns signals present in the dataset but with <2 distinct port buckets (the non-recoverable subset). Wired through `compute/main.py` into the new Metadata field `osap_signals_dropped_no_long_short: list[str] | None`. Schema triple moved together: Pydantic (`compute/output/schemas.py`) + TypeScript (`frontend/lib/types.ts`) + snapshot (auto-regenerated via `python -m compute.output.schema_check --update-snapshot`). Phase 4h.2 Part 2 accounting invariant (asserted by the new test `test_part2_accounting_invariant_against_synthetic_manifest`): len(OSAP_SIGNALS_100) == ( len(osap_signals_missing_from_dataset) # 0 rows in dataset + len(osap_signals_dropped_no_long_short) # <2 distinct ports + len(osap_signals_used) # passed gate + len(osap_excluded_signals) # reached gate, failed ) Sub-task 3 — DSR investigation (DEFERRED to Phase 4h.2 Part 3) --------------------------------------------------------------- Both hypotheses investigated: (a) Signal sign inversion — CONFIRMED via production metadata.json inspection. Every gated signal at 0.9.1 shows rejection_reason "low_dsr" with negative Sharpe (e.g., AbnormalAccruals sharpe=-0.23, AssetGrowth sharpe negative, dVolCall sharpe=-0.66). This is the classic OSAP "anomaly" pattern: many signals predict that the SHORT portfolio outperforms LONG, so the naive `LONG - SHORT` LS is correctly capturing that as a negative excess return — but the gate rejects it. The proper fix requires fetching OSAP's `SignalDoc.csv` for per-signal sign metadata (`Cat.SignalSign`) and flipping the LS for anomaly signals. Scope explicitly deferred to Part 3 (cleaner separation: Part 2 fixes the dropped-signal accounting first, Part 3 fixes the gate-rejection sign inversion). (b) DSR threshold too tight for monthly returns — RULED OUT by code citation. `compute/validation/pbo_dsr.py:62` sets `DSR_VETO_THRESHOLD: float = 0.0` — already maximally permissive (the canonical Bailey-Lopez de Prado 2014 threshold is DSR > 0.95). The 100% low_dsr rejection rate is genuine, not a threshold artifact. Decision: ship Part 2 with hypothesis (a) annotated for Part 3 follow-up. Expected post-Part-2 acceptance count (with sign uncorrected) remains ≈ 0; the headline win is the dropped-no-long-short diagnostic surface, not acceptance recovery. Production diagnosis from the next cron will confirm the exact pre/post accounting numbers. Sub-task 4 — Schema PATCH bump ------------------------------- `compute/config.py::SCHEMA_VERSION` "0.9.1-phase4h.2" → "0.9.2-phase4h.2" (MINOR.PATCH bump per the additive-only Metadata change). Snapshot regenerated via `python -m compute.output.schema_check --update-snapshot`. Existing `test_config.py::test_schema_version_is_phase4h_2` updated to match. `tests/test_config.py` is the single source of the schema-version lock — the test name keeps the "phase4h_2" anchor. Files (10 changed, +353 / −26) ------------------------------- - compute/features/osap_replicate.py — multi-port adapter + `signals_dropped_no_long_short` helper (+132 / −24) - compute/main.py — wire new diagnostic into Metadata; restrict the dropped-list to the OSAP_SIGNALS_100 manifest so the accounting equation closes against the manifest size (+28) - compute/output/schemas.py — `osap_signals_dropped_no_long_short` field (+9) - compute/config.py — SCHEMA_VERSION bump (+1 / −1) - frontend/lib/types.ts — TypeScript mirror (+8) - frontend/lib/schema-snapshot.json — auto-regenerated (+5) - frontend/public/data/metadata.json — null sentinel for the new field so the static-export tsc cast passes; next cron overwrites with the real list (+2 / −1) - tests/test_features/test_osap_replicate.py — 9 new tests covering quintile / tercile / mixed-port universes + accounting invariant + defensive edge cases for the new helper (+188) - tests/test_config.py — schema-version lock follow-up (+1 / −1) - PHASE_STATUS.md — Part 2 in-flight + 4k scout shipped via PR #121 (+1 / −1) Constraints honored ------------------- - NO modification to `compute_composite` / `PHASE3_WEIGHTS` (sum=1.0 lock at composite.py:43-45 — Path-b blend stays OUTSIDE in `compute/scoring/osap_blend.py`) - Rule 16: Top-5 still ranks raw composite_score; no scoring touched - No push to main; no force-push; no `--no-verify` - No workflow_dispatch trigger (compute-rankings.yml untouched) - Schema triple moved together (Pydantic + types.ts + snapshot.json) Verification ladder all green ------------------------------ - ruff check . → All checks passed - python -m pytest tests/ -m "not network" → 945 passed (77s) (936 baseline + 9 new osap tests = 945) - python -m compute.output.schema_check → in sync - cd frontend && npx --no -- tsc --noEmit → clean - Section A-H verifier: 2 pre-existing failures on `main` unrelated to Part 2 (`non_reliance_filing` / `auditor_change` Tier-2 baseline drift) Expected post-merge cron diagnostic ------------------------------------ Pre-Part-2 (0.9.1-phase4h.2): 22 missing + 22 gated + 0 used = 44 → gap = 56 invisible Post-Part-2 (0.9.2-phase4h.2): 22 missing + X dropped + Y gated + Z used → 100 (balanced); X + Y == 78, Z ≈ 0 until Part 3 sign inversion fix https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
This was referenced May 20, 2026
dackclup
added a commit
that referenced
this pull request
May 20, 2026
…variants (#127) Closes #126. Process Hygiene Item #1 (parent epic #125). Adds Hypothesis property- based tests as the new defense line for "untested data-shape assumption" bugs — the class that hid the OSAP quintile/tercile silent-drop in PR #112's CI until production cron diagnostics caught it (subsequently fixed in PR #124 / Phase 4h.2 Part 2). If a `@given` property over `port_count ∈ {2,3,5,10}` had existed in Phase 4h, the hardcoded `port=10` filter would have been falsified the first time the CI ran. Test-addition only. No scoring / feature behavior touched. No schema delta. No CI workflow changes. Sub-task 1 — Hypothesis added to [dev] extra (pyproject.toml) -------------------------------------------------------------- `hypothesis>=6.92` joins `pytest` + `ruff` in the `[dev]` optional extra. Pure-Python dep (no C extensions); CI footprint negligible. Sub-task 2 — Property tests for osap_replicate.py (7 tests, 394 LOC) --------------------------------------------------------------------- New file: tests/test_features/test_osap_replicate_properties.py 7 property tests covering data-shape invariants the Phase 4h.2 Part 2 multi-port adapter must satisfy: 1. `test_compute_long_short_returns_handles_any_port_cardinality` — for port_count ∈ [2, 10] and n_dates ∈ [1, 12], the adapter produces exactly n_dates LS rows with ls_return == port_count - 1. THE headline property — would have caught the PR #112 bug. 2. `test_signals_dropped_no_long_short_returns_sorted_unique` — contract for the Metadata.osap_signals_dropped_no_long_short field: sorted, no duplicates, single-port signals appear, two-port signals don't. 3. `test_normalize_port_label_int_input_yields_2char_zfill` — port=int(1..10) → '01'..'10' for any input list. Idempotent. 4. `test_normalize_port_label_str_input_yields_2char_zfill` — mixed '1' / '01' / '10' inputs normalize to a uniform 2-char width. 5. `test_part2_accounting_invariant_under_random_partition` — the Phase 4h.2 Part 2 accounting equation (manifest = missing + dropped + gated + used) holds for any 3-way partition of a synthetic manifest into the bucket set. Uses st.composite to draw disjoint partitions. 6. `test_coverage_by_signal_returns_pct_in_0_to_100` — domain contract for the coverage helper (0..100 percent, NOT 0..1 fraction). 7. `test_rank_signals_cross_sectional_returns_unit_interval` — ranks live in (0, 1] for any non-empty cross-section. Sub-task 3 — Property tests for scoring transforms (7 tests, 340 LOC) --------------------------------------------------------------------- New file: tests/test_scoring/test_transforms_properties.py 7 property tests covering composite (compute/scoring/composite.py) and OSAP blend (compute/scoring/osap_blend.py) — pure-numeric transforms whose output domains are contract-locked by the downstream Pydantic + TypeScript schemas. Composite tests (4): A. `test_compute_composite_output_bounded_0_to_100` — for any pillar input in [0, 100], composite ∈ [0, 100] (the writer + Pydantic contract) B. `test_compute_composite_all_50_inputs_yield_composite_50` — neutral-pillar input collapses to composite == 50 (catches accidental weight-vector drift) C. `test_compute_composite_neutralize_missing_imputes_nan_to_50` — NaN pillar inputs are imputed when neutralize_missing=True; all-NaN → composite == 50.0 D. `test_compute_composite_constant_input_equals_input` — constant-pillar input → composite == that constant (PHASE3 weight-sum-equals-1.0 invariant expressed as a property) OSAP blend tests (3): E. `test_apply_osap_blend_output_bounded_and_nan_passthrough` — blend ∈ [0, 100]; NaN OSAP → composite passthrough; finite OSAP → interior point between composite and osap F. `test_aggregate_osap_signals_finite_values_in_0_to_100` — finite aggregate values live in [0, 100]; NaN allowed for universe gaps G. `test_apply_osap_blend_weight_zero_is_identity_on_composite` — weight=0 leaves composite unchanged (locks the Phase 4h observability-only design property + Rule 16: Top-5 still ranks raw composite) Sub-task 4 — CI integration + .gitignore + docs ------------------------------------------------- - `.gitignore` already covers `.hypothesis/` at line 50 (Python's default boilerplate) — no edit needed. - CLAUDE.md ## Gotchas — 1-line note that Hypothesis is the new defense line for data-shape bugs (paired with example tests), with the `@settings(deadline=None)` anti-pattern flagged. - CI hypothesis.errors.Flaky behaviour: default profile makes flaky examples fail-fast (no retry); the `pytest -m "not network"` CI invocation inherits this. NO `@settings(deadline=None)` used in this PR — slow examples surface as honest failures. Sanity verification (NOT committed) ----------------------------------- As part of pre-push verification I temporarily reverted the multi- port adapter at compute/features/osap_replicate.py:143 (`agg(["min", "max"])` → `agg(["min", "min"])`) and confirmed `test_compute_long_short_returns_handles_any_port_cardinality` fails with "Falsifying example: port_count=2, n_dates=1". Reverted the break before commit. Constraints honored ------------------- - NO modification to compute_composite() / PHASE3_WEIGHTS sum=1.0 invariant (composite.py:43-45) — pure test-addition PR - Rule 16: Top-5 still ranks raw composite_score; no scoring touched - No push to main; no force-push; no --no-verify - No workflow_dispatch trigger (compute-rankings.yml untouched) - Schema triple untouched (no schemas.py / types.ts changes) - NO @settings(deadline=None) — default deterministic deadline - NO RuleBasedStateMachine (out of scope per issue #126) Test count delta ---------------- Before: 945 passed (Phase 4h.2 Part 2 baseline) After: 959 passed (+14 property tests across 2 new files) Files (4 changed, +747 / 0) ---------------------------- - pyproject.toml — +6 (hypothesis>=6.92 in [dev]) - CLAUDE.md — +7 (## Gotchas note) - tests/test_features/test_osap_replicate_properties.py — +394 NEW - tests/test_scoring/test_transforms_properties.py — +340 NEW Verification ladder all green ------------------------------ - ruff check . → All checks passed - python -m pytest tests/ -m "not network" → 959 passed (1m46s) - python -m pytest tests/test_features/test_osap_replicate_properties.py tests/test_scoring/test_transforms_properties.py → 14 passed (5s) - python -m compute.output.schema_check → in sync (no schema delta) - Sanity break-revert confirmed property test catches a regression No regression discovered ------------------------ Property tests passed on first execution against current main (commit 80c6641, Phase 4h.2 Part 2 already merged). No hidden bugs surfaced beyond the 56-signal gap that PR #124 already fixed — which itself is a good signal that the multi-port adapter handles the [2, 10] cardinality region cleanly. https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU Co-authored-by: Claude <noreply@anthropic.com>
13 tasks
dackclup
added a commit
that referenced
this pull request
May 20, 2026
Part of epic #125 (Item #4 of 6). Doc-only PR — no code changes, no schema delta, no test additions. Phase 4h timeline (2026-05-18 → 2026-05-19) demonstrated the cost of shipping production wiring + gate logic without a diagnostic surface: - PR #112 (Phase 4h): OSAP signal replication + PBO/DSR gate + Path-b blend, NO observability surface for gate decisions - First production cron: every signal failed gate, no way to know why - PR #118 (Phase 4h.2 Part 1): retrofit diagnostic surface (osap_signals_missing_from_dataset + osap_gate_diagnostics) - Second production cron: 22 missing + 22 fail low_dsr, 56 silently dropped (gap that Part 1 still couldn't fully expose) - PR #124 (Phase 4h.2 Part 2): root-cause fix (multi-port adapter) + osap_signals_dropped_no_long_short closing the accounting gap The combined cost of Phase 4h.2 Parts 1 + 2 (~10 hours across 2 PRs) would have been ~30 minutes of additional Phase 4h scope if the diagnostic surface had shipped alongside the production wiring. Files (3 changed, +83 LOC) --------------------------- - WORKFLOW.md (+63 LOC) — new section "# Observability-Before-Wiring Pattern" inserted between the mobile playbook table and the "Initial Prompts" section. Includes mandatory checklist (6 items) + anti-pattern statement + 3 reference precedents (PR #112 bad, PR #118 good, PR #124 good) - SKILL.md (+14 LOC) — new "Rule 18: Observability-before-wiring" appended to the Core Behavior Rules section (Rule 17 was the prior trailing rule). Links back to WORKFLOW.md for the mandatory checklist detail - CLAUDE.md (+6 LOC) — 1 bullet added to ## Conventions referencing the new Rule 18 + WORKFLOW.md section Files NOT touched (deliberately per scope) ------------------------------------------- - PHASE_STATUS.md — chronological log; pattern guidance belongs in WORKFLOW.md / SKILL.md / CLAUDE.md, not in the historical tracker - AGENTS.md — cross-tool agent doc; lookups defer to WORKFLOW.md by default, so a fresh duplicate would just create drift risk - compute/ / frontend/ / tests/ — doc-only PR, no behavior change Constraints honored ------------------- - No code changes — pure markdown additions - No schema delta — schema_check confirms in-sync - No test additions — pytest count unchanged at 959 - No push to main; no force-push; no --no-verify - No workflow_dispatch trigger (compute-rankings.yml untouched) Verification ladder all green ------------------------------ - ruff check . → All checks passed - python tools/check_doc_test_counts.py → exit 0 (no new hardcoded test-count claims introduced — the precedents reference PRs and hour estimates, not "N offline + M @network" drift patterns) - python -m compute.output.schema_check → in sync (no schema touch) - python -m pytest tests/ -m "not network" → 959 passed (unchanged) https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU Co-authored-by: Claude <noreply@anthropic.com>
10 tasks
dackclup
added a commit
that referenced
this pull request
May 20, 2026
…ble skills (#132) 3-task housekeeping + tacit knowledge harvest. Docs/skills-only PR — no code, no schema delta, no test additions. Task A — SKILL.md schema-version table fixes --------------------------------------------- Two stale "in flight" entries flipped to merged + 1 new row inserted: - Row 0.9.0-phase4h: "(in flight in PR #112)" → "(PR #112 merged 2026-05-19)" - Row 0.9.1-phase4h.2: "(in flight in PR #<NEXT>)" → "(PR #118 merged 2026-05-19)" - NEW row 0.9.2-phase4h.2 (above 0.9.1) — PR #124 merged, multi-port OSAP adapter + osap_signals_dropped_no_long_short field, closing the 100-signal accounting equation; DSR sign-inversion deferred to Part 3 PHASE_STATUS.md row 4 ALSO has "Phase 4h.2 Part 2 in flight in this PR" staleness — confirmed via grep but DELIBERATELY not updated here per Task A explicit scope (SKILL.md only). Recommend a follow-up phase-status-bump PR after this lands. Task B — New worker-session-handoff skill ------------------------------------------ .claude/skills/worker-session-handoff/SKILL.md (+163 LOC). YAML frontmatter + 5 sections: - When to use vs inline (≤50 LOC single-file → inline; ≥2 files / new dep / code logic → handoff) - Constraint lock library (8 standard locks: composite/PHASE3, Rule 16, Rule 18, no-merge, no force-push, no --no-verify, no workflow_dispatch, schema triple) - Anti-pattern: paste-loop avoidance (single outer code-block fence; reference PR #123 as related-but-distinct paste-loop failure mode) - Template (paste-ready, single ```` outer code block with language tag ` text` so inner triple-backticks pass through) - Reference invocations + QuantRank precedents (PR #124, #127, #131) Codifies the handoff shape that appeared verbatim across PRs #123, #124, #127, #128, #129, #131 — user copies ONE block instead of editing 5 template snippets per handoff. Task C — Portable skills library (4 skills, +417 LOC) ----------------------------------------------------- Audit step (per spec): read CLAUDE.md + AGENTS.md + SKILL.md + WORKFLOW.md + PR descriptions of #112/#118/#124/#127/#128/#129/#131. Identified 7 candidate patterns; classified by portability: - ✅ scout-then-integrate (portable; vendoring pattern, no QR logic) - ✅ observability-before-wiring (portable; gate-diagnostic pattern) - ✅ drift-detector-manifest (portable; API surface lock pattern) - ✅ schema-triple-lockstep (portable; Python/TS JSON contract) - 🟡 annotate-before-veto (portable; progressive rollout — DEFERRED to follow-up issue, lower value vs the 4 shipped) - 🟡 pre-plan-investigations (subsumed by scout-then-integrate's Phase 1 § "Pre-plan investigations" — no separate skill needed) - 🟡 graceful-degradation-try-except (portable; error-handling pattern — DEFERRED to follow-up issue, the wrapper is generally 1-line so doesn't warrant a dedicated skill) 4 shipped (each ≤ 109 LOC): .claude/skills/portable-scout-then-integrate/SKILL.md (99 LOC) .claude/skills/portable-drift-detector-manifest/SKILL.md (109 LOC) .claude/skills/portable-schema-triple-lockstep/SKILL.md (103 LOC) .claude/skills/portable-observability-before-wiring/SKILL.md (106 LOC) Flat naming convention (`portable-<name>/SKILL.md` at depth 1 from `.claude/skills/`) because Claude Code's skill registry doesn't recurse into nested subdirectories per CLAUDE.md ## Conventions. Confirmed via session reload — all 4 portable + worker-session- handoff registered correctly. Each portable skill has: - YAML frontmatter (name + description + TRIGGER + SKIP) - ## Pattern section (generic, no QR business logic) - ## Trigger conditions + ## Skip conditions - ## QuantRank precedent (1 paragraph, clearly labeled as precedent not pattern definition) Task C constraint check: - All portable skills core pattern descriptions are project- agnostic (read `.claude/skills/portable-*/SKILL.md` ## Pattern sections — zero references to OSAP / IPCA / pillar / Top-5 inside the pattern body; only inside the labeled "QuantRank precedent" section at the bottom) - 3 of 4 portable skills are 103-109 LOC (slightly over the 100-LOC target — pattern + trigger + skip + precedent sections require ~25 LOC each, leaving ~25 LOC of unavoidable scaffold). The 99-LOC one (scout-then-integrate) shows the cap is achievable but tight. Files (6 changed, +580 LOC, no deletions) ------------------------------------------ - SKILL.md — schema-version table fixes (Task A) - 5 new SKILL.md files in .claude/skills/ (Tasks B + C) Verification ladder all green ------------------------------ - ruff check . → All checks passed - python tools/check_doc_test_counts.py → exit 0 - python tools/check_branch_collisions.py "skill" "portable" → expected⚠️ on #131 (own adjacent work, not a duplicate) - python -m compute.output.schema_check → in sync (no schema touch) - python -m pytest tests/ -m "not network" → 959 passed (unchanged; tools/ + .claude/skills/ aren't imported by tests) - Claude Code skill registry pick-up verified via session reload — all 5 new skills (worker-session-handoff + 4 portable-*) appear in the available-skills list Constraints honored ------------------- - No touch to compute/ / frontend/ / tests/ - No touch to PHASE_STATUS.md / WORKFLOW.md (Task A scope = SKILL.md only; PHASE_STATUS.md staleness flagged for follow-up) - No push to main; no force-push; no --no-verify - No workflow_dispatch trigger - Task C portable skills are project-agnostic in their pattern description (QR refs confined to labeled "precedent" sections) Follow-up issue (to file post-merge) ------------------------------------ Title: "Portable Skills Library — extract remaining tacit patterns" - annotate-before-veto (progressive rule rollout) - graceful-degradation-try-except (1-line wrapper guidance) - pre-plan-investigations as standalone (currently subsumed) - Anything else surfaced by future PR descriptions https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU Co-authored-by: Claude <noreply@anthropic.com>
dackclup
added a commit
that referenced
this pull request
May 20, 2026
…sk C.1 recovery) (#135) * docs(skills): SKILL.md schema bump + worker-session-handoff + 4 portable skills 3-task housekeeping + tacit knowledge harvest. Docs/skills-only PR — no code, no schema delta, no test additions. Task A — SKILL.md schema-version table fixes --------------------------------------------- Two stale "in flight" entries flipped to merged + 1 new row inserted: - Row 0.9.0-phase4h: "(in flight in PR #112)" → "(PR #112 merged 2026-05-19)" - Row 0.9.1-phase4h.2: "(in flight in PR #<NEXT>)" → "(PR #118 merged 2026-05-19)" - NEW row 0.9.2-phase4h.2 (above 0.9.1) — PR #124 merged, multi-port OSAP adapter + osap_signals_dropped_no_long_short field, closing the 100-signal accounting equation; DSR sign-inversion deferred to Part 3 PHASE_STATUS.md row 4 ALSO has "Phase 4h.2 Part 2 in flight in this PR" staleness — confirmed via grep but DELIBERATELY not updated here per Task A explicit scope (SKILL.md only). Recommend a follow-up phase-status-bump PR after this lands. Task B — New worker-session-handoff skill ------------------------------------------ .claude/skills/worker-session-handoff/SKILL.md (+163 LOC). YAML frontmatter + 5 sections: - When to use vs inline (≤50 LOC single-file → inline; ≥2 files / new dep / code logic → handoff) - Constraint lock library (8 standard locks: composite/PHASE3, Rule 16, Rule 18, no-merge, no force-push, no --no-verify, no workflow_dispatch, schema triple) - Anti-pattern: paste-loop avoidance (single outer code-block fence; reference PR #123 as related-but-distinct paste-loop failure mode) - Template (paste-ready, single ```` outer code block with language tag ` text` so inner triple-backticks pass through) - Reference invocations + QuantRank precedents (PR #124, #127, #131) Codifies the handoff shape that appeared verbatim across PRs #123, #124, #127, #128, #129, #131 — user copies ONE block instead of editing 5 template snippets per handoff. Task C — Portable skills library (4 skills, +417 LOC) ----------------------------------------------------- Audit step (per spec): read CLAUDE.md + AGENTS.md + SKILL.md + WORKFLOW.md + PR descriptions of #112/#118/#124/#127/#128/#129/#131. Identified 7 candidate patterns; classified by portability: - ✅ scout-then-integrate (portable; vendoring pattern, no QR logic) - ✅ observability-before-wiring (portable; gate-diagnostic pattern) - ✅ drift-detector-manifest (portable; API surface lock pattern) - ✅ schema-triple-lockstep (portable; Python/TS JSON contract) - 🟡 annotate-before-veto (portable; progressive rollout — DEFERRED to follow-up issue, lower value vs the 4 shipped) - 🟡 pre-plan-investigations (subsumed by scout-then-integrate's Phase 1 § "Pre-plan investigations" — no separate skill needed) - 🟡 graceful-degradation-try-except (portable; error-handling pattern — DEFERRED to follow-up issue, the wrapper is generally 1-line so doesn't warrant a dedicated skill) 4 shipped (each ≤ 109 LOC): .claude/skills/portable-scout-then-integrate/SKILL.md (99 LOC) .claude/skills/portable-drift-detector-manifest/SKILL.md (109 LOC) .claude/skills/portable-schema-triple-lockstep/SKILL.md (103 LOC) .claude/skills/portable-observability-before-wiring/SKILL.md (106 LOC) Flat naming convention (`portable-<name>/SKILL.md` at depth 1 from `.claude/skills/`) because Claude Code's skill registry doesn't recurse into nested subdirectories per CLAUDE.md ## Conventions. Confirmed via session reload — all 4 portable + worker-session- handoff registered correctly. Each portable skill has: - YAML frontmatter (name + description + TRIGGER + SKIP) - ## Pattern section (generic, no QR business logic) - ## Trigger conditions + ## Skip conditions - ## QuantRank precedent (1 paragraph, clearly labeled as precedent not pattern definition) Task C constraint check: - All portable skills core pattern descriptions are project- agnostic (read `.claude/skills/portable-*/SKILL.md` ## Pattern sections — zero references to OSAP / IPCA / pillar / Top-5 inside the pattern body; only inside the labeled "QuantRank precedent" section at the bottom) - 3 of 4 portable skills are 103-109 LOC (slightly over the 100-LOC target — pattern + trigger + skip + precedent sections require ~25 LOC each, leaving ~25 LOC of unavoidable scaffold). The 99-LOC one (scout-then-integrate) shows the cap is achievable but tight. Files (6 changed, +580 LOC, no deletions) ------------------------------------------ - SKILL.md — schema-version table fixes (Task A) - 5 new SKILL.md files in .claude/skills/ (Tasks B + C) Verification ladder all green ------------------------------ - ruff check . → All checks passed - python tools/check_doc_test_counts.py → exit 0 - python tools/check_branch_collisions.py "skill" "portable" → expected⚠️ on #131 (own adjacent work, not a duplicate) - python -m compute.output.schema_check → in sync (no schema touch) - python -m pytest tests/ -m "not network" → 959 passed (unchanged; tools/ + .claude/skills/ aren't imported by tests) - Claude Code skill registry pick-up verified via session reload — all 5 new skills (worker-session-handoff + 4 portable-*) appear in the available-skills list Constraints honored ------------------- - No touch to compute/ / frontend/ / tests/ - No touch to PHASE_STATUS.md / WORKFLOW.md (Task A scope = SKILL.md only; PHASE_STATUS.md staleness flagged for follow-up) - No push to main; no force-push; no --no-verify - No workflow_dispatch trigger - Task C portable skills are project-agnostic in their pattern description (QR refs confined to labeled "precedent" sections) Follow-up issue (to file post-merge) ------------------------------------ Title: "Portable Skills Library — extract remaining tacit patterns" - annotate-before-veto (progressive rule rollout) - graceful-degradation-try-except (1-line wrapper guidance) - pre-plan-investigations as standalone (currently subsumed) - Anything else surfaced by future PR descriptions https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU * docs(skills): Vendor karpathy-guidelines (Task C.1 recovery) + THIRD_PARTY_NOTICES.md Recovers Task C.1 from the original handoff that was silent-dropped in the prior PR #132 commit (50da720). The handoff explicitly named "Vendor karpathy-guidelines (1 skill, ~70 LOC)" as part of the portable skills library; the auditor session caught the omission and authorized this follow-up commit on the existing branch. Files (2 new, +138 LOC) ------------------------ - .claude/skills/portable-karpathy-guidelines/SKILL.md (+82 LOC) — vendored content of upstream skills/karpathy-guidelines/SKILL.md (67 LOC, byte-for-byte preserved) + 15-line appended attribution block referencing the upstream source, commit SHA, and the Karpathy tweet that motivated the guidelines. - THIRD_PARTY_NOTICES.md (+56 LOC, NEW at repo root) — third-party license disclosures. Section "karpathy-guidelines (Claude Code skill)" carries source URL, license declaration, vendored path, vendored date, upstream commit SHA, upstream first-commit date, and the full standard MIT License text with copyright attributed to "multica-ai contributors" (upstream has no individual copyright line and no standalone LICENSE file; the `license: MIT` claim appears in upstream README.md § License and each skill's YAML frontmatter). Upstream provenance ------------------- - Source: https://github.com/multica-ai/andrej-karpathy-skills - Upstream HEAD SHA at vendoring: 2c606141936f1eeef17fa3043a72095b4765b9c2 - Upstream first commit: 2026-01-27 - Vendored date: 2026-05-20 - License: MIT (declared) Verbatim content preserved -------------------------- `diff /tmp/karpathy-src/skills/karpathy-guidelines/SKILL.md .claude/skills/portable-karpathy-guidelines/SKILL.md` shows ONLY the 15-line appended attribution block at lines 68-82. The upstream 67-line content (YAML frontmatter + "Karpathy Guidelines" heading + the 4 principles) is byte-for-byte unchanged. Per the spec constraint: "เก็บ 4 principles verbatim. แก้ได้แค่ 'เพิ่ม' attribution block ท้ายไฟล์". License-disclosure caveat ------------------------- Upstream `multica-ai/andrej-karpathy-skills` declares MIT via README + YAML frontmatter but does NOT ship a standalone LICENSE file. The `THIRD_PARTY_NOTICES.md` entry includes the standard MIT License template with copyright attributed to the GitHub org ("multica-ai contributors"), matching the principle that an MIT declaration without a formal copyright line still licenses to the redistributor; the attribution is conservative. Verification ladder all green ------------------------------ - ruff check . → All checks passed - python tools/check_doc_test_counts.py → exit 0 (no test-count drift introduced by this commit) - python tools/check_branch_collisions.py "karpathy" → no scope collisions detected - python -m compute.output.schema_check → in sync (no schema touch) - python -m pytest tests/ -m "not network" → 959 passed (unchanged; .claude/skills/ + THIRD_PARTY_NOTICES.md aren't imported by tests) - Skill registry pickup verified via session reload — `portable-karpathy-guidelines` appears in the available-skills list with the upstream description verbatim Constraints honored ------------------- - No squash / amend of the prior 50da720 commit — this is a fresh commit pushed on top of the existing branch (per spec "ห้าม squash old commit") - No touch to the 4 already-shipped portable skills in 50da720 - No touch to compute/ / frontend/ / tests/ - No push to main; no force-push; no --no-verify - No workflow_dispatch trigger - Karpathy SKILL.md upstream content preserved verbatim; only the attribution block appended below the original content PR description update will follow as a separate `gh pr edit` / MCP `update_pull_request` call so the new "License Compliance" section + the audit-table row for karpathy-guidelines land in the PR body. https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU --------- Co-authored-by: Claude <noreply@anthropic.com>
dackclup
added a commit
that referenced
this pull request
May 20, 2026
…4 staleness (#139) Closes #133. Docs/skills-only PR. Task A — Portable skills library final 2 (closes #133) ------------------------------------------------------ Extracts the last 2 deferred-but-tracked patterns from epic #125: - .claude/skills/portable-annotate-before-veto/SKILL.md (108 LOC): Progressive-rollout pattern for defense / risk flags. Ship as annotate FIRST, promote to veto only after ≥ 1 production cron of observation + threshold calibration + cohort-acceptance check. Forcing precedent: Phase 4.5 cluster (loss_avoidance_pattern at 0% fire rate would've been a no-op or hotfix candidate as a veto; annotate made it observable). - .claude/skills/portable-graceful-degradation-try-except/SKILL.md (115 LOC): Wrap every external-data integration call site in a try/except that sets ALL related output fields to None on failure + writes a structured log line + sets a per-integration status Metadata field. 3-rule contract: no partial state, no log swallowing, downstream-aware. Forcing precedent: OSAP integration in compute/main.py (PRs #112 → #118 → #124). Both skills follow the established portable-* convention from PR #132 (YAML frontmatter + Pattern + Trigger + Skip + QuantRank precedent section). Each pattern section is project-agnostic; QuantRank refs confined to the labeled "QuantRank precedent" sections at the bottom. Task B — PHASE_STATUS.md row 4 staleness fix --------------------------------------------- PHASE_STATUS.md row 4 said "Phase 4h.2 Part 2 in flight in this PR" since PR #124's prep work. PR #124 merged 2026-05-19 (commit sequence visible in main: ...124...118...112...). Updated to "Phase 4h.2 Part 2 merged via PR #124 (2026-05-19)" — the rest of the row 4 text (multi-port OSAP adapter description, IC-decay deferral note) stays unchanged. This was flagged in PR #132 body and tracked as a small follow-up. No other PHASE_STATUS.md edits — row 4 is the only stale entry. Task C — Docs lockstep ----------------------- CLAUDE.md row 33 skill count: 35 → 37 (QR-origin portable category 4 → 6, total reflects the 2 new skills landed here). Categorisation unchanged otherwise; 9arm license-pending caveat still flagged with cross-reference to issue #137. Skill inventory after this PR (37 total) ----------------------------------------- - QuantRank operational: 12 - QR-origin portable extract: 6 (was 4; +annotate-before-veto + graceful-degradation-try-except) - Anthropic vendored: 6 - External MIT vendored: 9 (Karpathy + 8 mattpocock, unchanged) - External license-pending vendored: 4 (9arm, unchanged) Verification ladder ------------------- - ruff check . → All checks passed - python -m compute.output.schema_check → Schema snapshot in sync - python tools/check_doc_test_counts.py → exit 0 - pytest tests/ -m "not network" → not run locally (sandbox missing pandas); CI will verify. Changes are docs/skills-only. - Skill registry pickup verified via session reload — both portable-annotate-before-veto and portable-graceful-degradation-try-except register with full YAML-frontmatter descriptions. Constraints honored ------------------- - No touch to compute/ / frontend/ / tests/ - No touch to WORKFLOW.md (out of scope; could file a future follow-up if WORKFLOW.md needs to cross-reference the two new portable skills) - No squash / amend of prior commits - No push to main; no force-push; no --no-verify - No workflow_dispatch trigger - 2 new portable skills pattern descriptions are project-agnostic; QR refs only in labeled "precedent" sections Epic #125 status after this PR ------------------------------- - #130 (quarterly cohort-threshold review tracker) — recurring, unchanged - #133 (portable skills library remaining) — CLOSED by this PR - #137 (9arm-skills license clarification) — external action, waiting on user to file upstream issue at thananon/9arm-skills Epic #125 Item 3 (Pre-merge production simulation) remains the only substantive open scope. PHASE_STATUS.md row 4 staleness was the last housekeeping task. https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2 Co-authored-by: Claude <noreply@anthropic.com>
4 tasks
dackclup
added a commit
that referenced
this pull request
May 20, 2026
…PR A) (#141) First PR in the multi-PR .md optimization sequence (Option D scope — yกเครื่อง). PR A is the low-risk baseline: fixes 2 broken skill frontmatters that prevent dispatch + drift-fixes 4 stale facts in agent docs. Critical YAML fix: - branch-collision-check/SKILL.md and pr-quality-gate/SKILL.md had multi-line `description:` plain-scalar frontmatter that PyYAML (and Claude Code's skill loader) couldn't parse because lines contain `#123` / `#X` issue references after whitespace — YAML treats ` #` as a comment marker, so everything after the first comment-trigger got eaten and the loader fell back to displaying `name: name` in the available-skills list. Both skills were effectively undispatchable from any session. - Fix: change `description:` to `description: >` (folded block scalar) so newlines become spaces and `#` mid-content is treated as literal text. Verified live in this session — system reminder now shows the full TRIGGER/SKIP descriptions for both. Stale-fact pass: - .claude/skills/README.md L14-16: "27 invocation-triggerable skills" → references CLAUDE.md as the canonical count (38) to prevent future drift. Future top-level skill add/remove only needs to bump CLAUDE.md §Layout, not three files. - AGENTS.md L104: ".claude/skills/ # 24 loaded skills" → 38. - AGENTS.md L287: "Schema version: 0.8.0-phase4.5f" → 0.9.2-phase4h.2 (3 versions behind). Now references SKILL.md schema-version table for full history. - CLAUDE.md L181-192 (§Phase status): "Current schema 0.9.1-phase4h.2 ... Phase 4h in flight in PR #112" → 0.9.2-phase4h.2 + Phase 4h shipped (Parts 1+2 done via #112/#118/#124). - CLAUDE.md + AGENTS.md §Phase status: "Epic #125 Item 3 in flight via PR #140" → "PR 1 of 2 shipped" at commit a52aa2d; PR 2 remaining. CLAUDE.md + AGENTS.md edit ships per the lockstep convention. No code touched, no schema touched — pre-merge-prod-sim.yml won't trigger (paths compute/scoring + compute/features unaffected). Next in optimization sequence: PR B (CLAUDE.md token diet) — TBD after user reviews this one. Co-authored-by: Claude <noreply@anthropic.com>
dackclup
added a commit
that referenced
this pull request
May 20, 2026
…D) (#144) Fourth PR in the .md optimization sequence (Option D). WORKFLOW.md was 1732 lines because it accumulated complete task lists + acceptance criteria for every shipped phase. Phase 0-3 (v1.0) shipped 2026-05-14 and is historical now — its content is post-mortem documentation, not forward-looking guidance. WORKFLOW.md (1732 → 1460 lines, -16%): - L196-L473 (Phase 0 / 1 / 2 / 3 + PR 3c / 3d / 3e detail + Phase 3 acceptance) extracted verbatim into a new archive file - Replaced with a 7-line pointer: archive location + v1.0 ship date + "current work starts at Phase 4 below" New file docs/archived/PHASE_0_3_WORKFLOW.md (290 lines): - Header notes archive date + v1.0 ship date + back-link to WORKFLOW.md as the live forward-looking source - Body preserves the original Phase 0-3 content verbatim — no edits to historical record CLAUDE.md (180 → 181 lines, per-PR lockstep): - §Phase status "Recently merged" — added PR #143 (B sync + dedup); dropped #124 (now 2 entries past the bar) - §Phase status — replaced "PR C in flight" tracker with "PR D in flight" note covering archive scope AGENTS.md (343 → 343 lines, per-PR lockstep): - §Phase + version state — optimization sequence tracker updated: PR C ✅ → PR D in flight, PR E-G remaining Why this matters for Claude effectiveness: - WORKFLOW.md is referenced on-demand (not auto-loaded), but when Claude reads it for "what's the current Phase X plan?", it previously waded through 273 lines of completed work first. Now Phase 4 starts at L205 instead of L478 → faster orientation. - Archive preserves the v1.0 build history for anyone who wants to understand how the project bootstrapped — Phase 8 (S&P 1500 universe expansion) PRs will likely reference Phase 0-3 patterns. What this PR does NOT touch: - Code · schemas · CI workflows · pre-merge-prod-sim.yml won't trigger Next in sequence: PR E (SKILL.md restructure + TOC) · PR F (skill description audit ×38) · PR G (PHASE_STATUS.md "Current State" summary at top). Co-authored-by: Claude <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #116 (Part 2 scope).
Summary
Phase 4h.2 Part 2 closes the OSAP 100-signal accounting gap that Part 1 made visible. Production cron at commit
182c02de(version0.9.1-phase4h.2) exposed the imbalance:Root cause (
compute/features/osap_replicate.py:60,65,120,135-136): hardcodedLONG_PORT_LABEL = "01"/SHORT_PORT_LABEL = "10". OSAP delivers quintile signals as ports01..05and tercile signals as01..03— every row that didn't matchport=10was silently dropped before reaching the PBO/DSR gate.Accounting equation pre / post
Where
X + Y ≈ 78(the 56 newly recovered + the 22 already visible) andZ ≈ 0until Part 3 ships the sign-inversion fix.Sub-task breakdown
1. Multi-port adapter (
compute/features/osap_replicate.py)Replaced hardcoded port constants with per-signal
min(port)/max(port)inference:groupby("signalname")to derive each signal's port extentslong_port = min(unique ports),short_port = max(unique ports)< 2distinct ports are dropped (no LS pair)"long"/"short"role columns so the LS axis stays stable across heterogeneous port cardinalitiesDecile signals (
01..10) degenerate to the same("01", "10")corners under min/max — backward-compatible. Quintile signals →("01", "05"). Tercile signals →("01", "03").2. Accounting-balance diagnostic (schema triple)
New helper
signals_dropped_no_long_short(returns) -> list[str]surfaces signals present in the dataset but with<2distinct port buckets. Wired throughcompute/main.pyinto newMetadatafield:Schema triple moved together: Pydantic (
compute/output/schemas.py) + TypeScript (frontend/lib/types.ts) + snapshot (auto-regenerated viapython -m compute.output.schema_check --update-snapshot).Phase 4h.2 Part 2 accounting invariant (asserted by
test_part2_accounting_invariant_against_synthetic_manifest):3. DSR investigation → DEFER to Part 3
Both hypotheses investigated against the production diagnostic at
frontend/public/data/metadata.json(cron run26095888044on commit182c02de):(a) Signal sign inversion — CONFIRMED. Every gated signal at 0.9.1 shows
rejection_reason: "low_dsr"with negative Sharpe:This is the classic OSAP "anomaly" pattern: many signals predict that the SHORT portfolio outperforms LONG, so the naive
LONG − SHORTis correctly capturing that as negative excess return — but the gate (thresholdDSR > 0) rejects it. Proper fix requires fetching OSAP'sSignalDoc.csvfor per-signal sign metadata (Cat.SignalSign) and flipping the LS for "anomaly" signals.(b) DSR threshold too tight for monthly returns — RULED OUT.
compute/validation/pbo_dsr.py:62setsDSR_VETO_THRESHOLD: float = 0.0— already maximally permissive (the canonical Bailey-Lopez de Prado 2014 threshold isDSR > 0.95). The 100%low_dsrrejection rate is genuine, not a threshold artifact.Decision: ship Part 2 with hypothesis (a) annotated for Phase 4h.2 Part 3 follow-up. Expected post-Part-2 acceptance count (with sign uncorrected) remains
≈ 0; the headline Part 2 win is the dropped-no-long-short diagnostic surface, not acceptance recovery.4. Schema PATCH bump
compute/config.py::SCHEMA_VERSION"0.9.1-phase4h.2"→"0.9.2-phase4h.2"(PATCH bump per the additive-onlyMetadatachange). Snapshot regenerated. Existingtests/test_config.py::test_schema_version_is_phase4h_2updated to match.Constraints honored
compute_composite/PHASE3_WEIGHTS(sum=1.0 lock atcomposite.py:43-45)compute/scoring/osap_blend.pycomposite_score; no scoring touched--no-verifyworkflow_dispatchtrigger (compute-rankings.ymluntouched)Test plan
ruff check .→ All checks passedpython -m pytest tests/ -m "not network"→ 945 passed (936 baseline + 9 new osap tests, 77s)python -m compute.output.schema_check→ in synccd frontend && npx --no -- tsc --noEmit→ cleanmain(non_reliance_filing/auditor_changeTier-2 baseline drift) — unrelated to Part 2phase-4h.2-part2-multi-port-adapterFiles (10 changed, +353 / −26)
compute/features/osap_replicate.pycompute/main.pycompute/output/schemas.pycompute/config.pyfrontend/lib/types.tsfrontend/lib/schema-snapshot.jsonfrontend/public/data/metadata.jsontests/test_features/test_osap_replicate.pytests/test_config.pyPHASE_STATUS.mdNew test cases
test_compute_long_short_returns_handles_quintile_signaltest_compute_long_short_returns_handles_tercile_signaltest_compute_long_short_returns_handles_mixed_port_universetest_compute_long_short_returns_drops_single_port_signaltest_signals_dropped_no_long_short_empty_inputtest_signals_dropped_no_long_short_missing_port_columntest_signals_dropped_no_long_short_identifies_single_port_signalstest_signals_dropped_no_long_short_normalizes_int_ports_normalize_port_labeltest_part2_accounting_invariant_against_synthetic_manifesthttps://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU
Generated by Claude Code