Skip to content

feat(phase-4h): OSAP integration — foundation + replicate + blend + PBO/DSR gate#112

Merged
dackclup merged 5 commits into
mainfrom
claude/resume-quantrank-phase-4.5-Zh0pO
May 18, 2026
Merged

feat(phase-4h): OSAP integration — foundation + replicate + blend + PBO/DSR gate#112
dackclup merged 5 commits into
mainfrom
claude/resume-quantrank-phase-4.5-Zh0pO

Conversation

@dackclup
Copy link
Copy Markdown
Owner

Summary

Phase 4h commit 1 of N — foundation only. Lays the schema-triple groundwork + the cron-workflow extras bump so subsequent commits (replicate / blend / validation / main.py wiring) can extend without each touching ask-first surfaces. No new compute paths are wired into main.py yet — the schema additions are all Optional with None defaults; the next weekly cron run will write version: "0.9.0-phase4h" JSONs with the OSAP fields absent (i.e., null), which is forward-compatible.

This PR will accumulate ~4 more commits before flipping Draft → Ready:

# Commit (planned) Files
1 (this one) foundation — scout kwargs + schema triple + cron extras 8 files, +85/-8
2 OSAP_SIGNALS_100 manifest + compute/features/osap_replicate.py ~600 LOC + ~80 test LOC
3 compute/scoring/osap_blend.py (Path b — outside compute_composite) ~80 LOC + ~50 test LOC
4 compute/validation/osap_validation.py (PBO/DSR gate wrapper) ~120 LOC + ~40 test LOC
5 compute/main.py wiring + @network integration test ~30 LOC + ~30 test LOC
(then docs row in commit 5 or a 6) PHASE_STATUS / WORKFLOW / SKILL / CLAUDE Phase 4h done markers docs

Plan ref: /root/.claude/plans/resume-quantrank-swift-barto.md (audited 2026-05-18; all 5 citation/blend issues fixed; line numbers re-verified with grep).

Commit 1 changes

  • compute/ingest/osap.py: extend fetch_osap_returns() with keyword-only signals: list[str] | None + as_of: date | None filters. Non-breaking (defaults None); cache always stores the full bulk parquet — filter happens post-load so multiple callsites with different subsets don't fight over the cache.
  • compute/output/schemas.py (Pydantic):
    • StockDetail: +osap_signals: dict[str, float] | None, osap_blended_score: float | None.
    • Metadata: +osap_signals_used: list[str] | None, osap_excluded_signals: list[str] | None, osap_signals_ic_12m: dict[str, float] | None, osap_signals_coverage_pct: dict[str, float] | None.
  • compute/config.py: SCHEMA_VERSION "0.8.0-phase4.5f""0.9.0-phase4h" (MINOR bump = new phase per SKILL.md convention).
  • frontend/lib/types.ts + frontend/lib/schema-snapshot.json: mirror Pydantic; snapshot auto-regenerated.
  • .github/workflows/compute-rankings.yml: install line pip install -e .pip install -e ".[factors]" so weekly cron has openassetpricing in its env once commit 5 wires it into main.py.
  • tests/test_config.py + tests/test_smoke.py: SCHEMA_VERSION pin assertions → 0.9.0-phase4h.

⚠️ Sensitive-surface touches (per AGENTS.md)

All three flagged in the plan audit before this commit; user authorized 2026-05-18:

  • .github/workflows/compute-rankings.yml (AGENTS.md:105 + line 233 wildcard). Single-line install bump; no timeout / ordering / cache changes.
  • Schema triple (AGENTS.md:229-231): schemas.py + types.ts + schema-snapshot.json moved in lockstep in this one commit. schema_check (no --update-snapshot) confirms in-sync post-edit.
  • SCHEMA_VERSION (compute/config.py:30): bumped MINOR. Legacy 0.8.x JSONs deserialize cleanly because the new fields are all | None = None.

Blend approach lock (audit feedback)

apply_osap_blend() (commit 3) will compute outside compute_composite():

composite_score_osap_adjusted = (1 - weight) × composite_score + weight × osap_signal_aggregate

The PHASE3_WEIGHTS sum-to-1.0 invariant at compute/scoring/composite.py:43-45 stays intact — no 9th slot added. 50/50 default locked per osap-integration/PLAN.md:168-170.

Verification (this commit only — local)

  • ruff check . → clean
  • python -m pytest tests/ -m "not network"861 passed in 25s (no regressions)
  • python -m compute.output.schema_check → ✓ in sync
  • cd frontend && npx tsc --noEmit → clean
  • cd frontend && npx next build → 506 static pages generated, identical shape to post-PR-110 production build

Test plan (PR-level — accumulates across commits)

  • Commit 1 CI green on 06bdac76 (ci.yml + Frontend (build))
  • Commit 2: OSAP_SIGNALS_100 manifest + replicate module + 12 offline tests
  • Commit 3: blend module + 8 offline tests
  • Commit 4: validation module + 10 offline tests
  • Commit 5: main.py wiring + 1 @network integration test + per-signal PBO/DSR scorecard posted as PR comment
  • Docs row (PHASE_STATUS / WORKFLOW / SKILL / CLAUDE) before Mark-Ready
  • User authorizes Draft → Ready flip after spot-check + Section I (Vercel + Playwright) on Vercel preview

Branch note

SDK preamble locks branch to claude/resume-quantrank-phase-4.5-Zh0pO (same harness allocation that hosted PR #110). The branch was deleted on remote post-PR-110-merge; this push creates it fresh on top of current main. Title/scope reflect Phase 4h, not Phase 4.5.

🤖 Drafted with Claude Code via the Anthropic SDK.


Generated by Claude Code

Phase 4h commit 1 of N. Lays the groundwork for the OSAP composite-blend
integration without yet wiring any new compute paths into main.py — the
schema bump + workflow install bump land first so subsequent commits
(replicate.py, blend.py, validation.py, main.py wiring) can extend the
foundation without each touching ask-first surfaces.

Changes:

- `compute/ingest/osap.py`: extend `fetch_osap_returns()` with keyword-only
  `signals: list[str] | None` + `as_of: date | None` filters (non-breaking
  — both default None). Filter happens post-cache-load so a callsite asking
  for 20 signals doesn't invalidate a callsite asking for all 1,188. The
  cache always stores the full bulk parquet.
- `compute/output/schemas.py`:
  - `StockDetail`: +`osap_signals: dict[str, float] | None` (per-stock
    signalname → cross-sectional rank for the PBO/DSR-accepted subset) +
    `osap_blended_score: float | None` (the 50/50 blend output).
  - `Metadata`: +`osap_signals_used: list[str] | None` + `osap_excluded_
    signals: list[str] | None` + `osap_signals_ic_12m: dict[str, float] |
    None` + `osap_signals_coverage_pct: dict[str, float] | None`.
  - All fields Optional with None default → forward-compatible with
    legacy 0.8.x JSONs.
- `compute/config.py`: `SCHEMA_VERSION` 0.8.0-phase4.5f → 0.9.0-phase4h
  (MINOR bump = new phase per SKILL.md schema-versions convention).
- `frontend/lib/types.ts` + `frontend/lib/schema-snapshot.json`: mirror
  the schema additions; snapshot regenerated via `schema_check
  --update-snapshot`.
- `.github/workflows/compute-rankings.yml`: install line
  `pip install -e .` → `pip install -e ".[factors]"` so the next cron
  run is ready when commit 5 (main.py wiring) lands — pinned
  `openassetpricing==0.0.2` already in pyproject `factors` extra from
  PR #110.
- `tests/test_config.py` + `tests/test_smoke.py`: SCHEMA_VERSION pin
  assertions updated to phase4h.

Blend approach (locked in plan audit 2026-05-18, Path b):
`apply_osap_blend()` will compute outside `compute_composite()` — formula
`composite_score_osap_adjusted = (1 - weight) × composite_score +
weight × osap_signal_aggregate`. This commit does NOT yet touch the
composite path; that lands in commit 3 (osap_blend.py). The
PHASE3_WEIGHTS sum-to-1.0 invariant at composite.py:43-45 stays intact.

Verification (this commit only):
- `ruff check .` → clean
- `python -m pytest tests/ -m "not network"` → 861 passed
- `python -m compute.output.schema_check` → in sync (no
  `--update-snapshot` needed; regen committed)
- `cd frontend && npx tsc --noEmit` → clean
- `cd frontend && npm run build` → 506 static pages, identical
  shape to post-PR-110 production build

Ask-first surfaces touched (per AGENTS.md):
- `.github/workflows/compute-rankings.yml` (AGENTS.md:105)
- Schema triple (AGENTS.md:229-231)
- `SCHEMA_VERSION` (compute/config.py:30)

All flagged in advance in the plan audit; user authorized 2026-05-18.

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU
@vercel
Copy link
Copy Markdown

vercel Bot commented May 18, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
quantrank Ready Ready Preview, Comment May 18, 2026 11:46am

…or-exposure proxy)

Phase 4h commit 2 of N. Lays the per-stock signal mapping layer that
commit 3's blend module will consume. The manifest is committed to
config (so the validation harness in commit 4 can iterate over it
without importing from the replicate module), and the replicate
module ships the *factor-exposure proxy* version of the algorithm —
honest, well-documented, and complete-enough for the Phase 4h blend
target.

Module layer:

- `compute/config.py`: OSAP_SIGNALS_100 manifest (exactly 100 entries
  across 8 theme buckets matching osap-integration/PLAN.md L60-73).
  Two `assert` statements at module-load time pin the cardinality
  and uniqueness invariants — any future edit that strays from 100
  unique signals fails at import (caught by ruff / pytest collect,
  surfaces during local dev). Manifest is aspirational: commit 4's
  PBO/DSR gate filters out any signal that doesn't resolve in the
  fetched OSAP DataFrame and logs the rejection under
  `metadata.json::osap_excluded_signals`.

- `compute/features/osap_replicate.py` NEW: four public functions
  + one orchestrator + one coverage helper.
  - `compute_long_short_returns(returns)` — pivots port to columns,
    derives ls_return = port=01 − port=10, drops decile buckets and
    incomplete pairs.
  - `select_as_of_cross_section(ls_returns, as_of)` — picks the
    most-recent observation per signal at or before as_of.
  - `rank_signals_cross_sectional(cross_section)` — `pandas.rank(
    method='average', pct=True)`; no scipy dependency.
  - `compute_osap_signals(returns, tickers, as_of, requested_signals)`
    — orchestrator returning `{ticker: {signalname: rank} | None}`.
  - `coverage_by_signal(signal_map)` — per-signal coverage %
    helper for commit 5's `metadata.json` write.

  *Factor-exposure proxy mode* (locked 2026-05-18 plan audit, Path
  consistent with §Scope IN #2): every ticker receives the same
  signal map, derived from the market-wide OSAP long-short return
  cross-section at as_of. True per-stock signal replication (porting
  100 OSAP SAS/Stata formulas into pandas) is the deferred heavy
  lift. Module docstring documents why this is sufficient for the
  blend target:
    1. osap_blended_score is observability-only this phase (SKILL.md
       Rule 16 — Top-5 ranking still uses raw composite_score).
    2. PBO/DSR gate (commit 4) runs on the long-short returns
       themselves, not the per-stock projection, so signal acceptance
       is identical to the full version.
    3. Per-stock replication of all 100 signals slips Phase 4h by
       weeks without unblocking 4i/4j/4k.

Tests (`tests/test_features/test_osap_replicate.py`, 14 offline):

- 4 covering `compute_long_short_returns`: basic 2-signal happy
  path, missing-short-port drop, decile-bucket exclusion, integer
  port column coercion.
- 3 covering `select_as_of_cross_section`: most-recent-per-signal
  pick, future-date filter, empty-window None handling.
- 2 covering `rank_signals_cross_sectional`: unit-interval
  normalisation, ties get average rank.
- 5 covering `compute_osap_signals` end-to-end: full happy path
  with proxy-mode invariant assertion, empty-returns None policy,
  universe-gap None policy, manifest cardinality sanity, and a
  smoke test against the shipped scout fixture
  (`tests/fixtures/osap_returns_sample.csv` from PR #110).

Verification:

- `ruff check .` → clean
- `python -m pytest tests/ -m "not network"` → 875 passed (861
  prior + 14 new osap_replicate)
- No `compute.main` import yet — wiring lands in commit 5
- No schema bump (already at 0.9.0-phase4h from commit 1)

Universe-gap policy: tickers receive None (not zero, not an imputed
neutral) when the as-of cross-section is empty. Pillar
`compute_composite(neutralize_missing=True)` imputes 50.0 for
missing pillars; OSAP intentionally does not. Commit 3's blend
layer treats None as "no OSAP adjustment" and passes
composite_score through unchanged.

Next: commit 3 — `compute/scoring/osap_blend.py` (~80 LOC + 8
tests). The Path-b formula
`(1 - weight) × composite_score + weight × osap_signal_aggregate`
stays OUTSIDE `compute_composite`; PHASE3_WEIGHTS sum-to-1.0
invariant at composite.py:43-45 stays intact.

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU
…ult)

Phase 4h commit 3 of N. Ships the blend layer that consumes commit 2's
per-ticker signal map and produces ``composite_score_osap_adjusted``
without touching the PHASE3_WEIGHTS sum-to-1.0 invariant at
``compute/scoring/composite.py:43-45``.

**Path-b architecture decision** — the OSAP correction is applied
*outside* ``compute_composite()``, NOT as a 9th slot in
PHASE3_WEIGHTS. Two reasons documented in module docstring:

1. Adding a 9th slot would either fail the existing
   ``abs(_W_SUM - 1.0) > 1e-9 → ValueError`` invariant or force a
   pro-rata redistribution of the 8 active Phase-3 pillars — both
   alter the established composite math retroactively.
2. Phase 4h's blend is observability-only (Top-5 still ranked by
   raw composite_score per SKILL.md Rule 16); a layered
   ``composite_score → composite_score_osap_adjusted`` keeps the
   pre-blend score on every StockDetail for direct
   delta-attribution.

Module layer (`compute/scoring/osap_blend.py`, 96 LOC):

- ``OSAP_BLEND_WEIGHT_DEFAULT = 0.5`` — locked at
  osap-integration/PLAN.md L168-170 (50/50 default). Phase 5 ML
  meta-learner is where this can move.

- ``aggregate_osap_signals(signal_map) -> pd.Series`` — pools the
  per-ticker ``{signalname: rank}`` map into a single 0-100 score
  via arithmetic mean of ranks × 100. NaN for tickers with ``None``
  inner map (universe gap) or empty ``{}``. Empty input → empty
  Series. Matches the shape returned by commit 2's
  ``compute_osap_signals``.

- ``apply_osap_blend(composite_scores, osap_signal_aggregate,
  weight=OSAP_BLEND_WEIGHT_DEFAULT) -> pd.Series`` — Path-b formula
  ``(1 - weight) × composite_score + weight × osap_signal_aggregate``.
  Output indexed by ``composite_scores.index``, dtype float, name
  ``composite_score_osap_adjusted``, clipped to [0, 100] to match
  composite-score domain.

**Universe-gap policy** — tickers whose OSAP aggregate is NaN
(after reindex) pass their raw composite_score through unchanged.
NO impute. This is intentionally distinct from pillar
``compute_composite(neutralize_missing=True)`` which imputes 50.0
for missing pillar values: an OSAP-blank ticker is "no information
added", not "no information available", so imputing 50.0 would
silently shrink the composite toward neutral and bias Top-5
against OSAP-covered names.

Tests (`tests/test_scoring/test_osap_blend.py`, 17 offline,
exceeded plan's 8-test floor):

- 4 covering aggregate: mean-of-ranks × 100 math, None → NaN,
  empty inner dict → NaN, empty input → empty Series.
- 12 covering apply_osap_blend: 50/50 basic, weight=0 pass-through,
  weight=1 OSAP-only-where-covered, NaN OSAP fallback, empty
  composite, output clipping to [0, 100], invalid weight raises
  (< 0 and > 1), extra OSAP tickers dropped via reindex, missing
  OSAP ticker pass-through, default-weight matches constant,
  end-to-end shape with commit 2's signal-map format, dtype
  preservation (int composite → float blended).
- 1 cross-module sanity: round-trip with the exact signal_map shape
  produced by ``compute_osap_signals`` (commit 2) → aggregate →
  blend, asserting universe-gap and math.

Verification:

- ``ruff check .`` → clean (ruff auto-reorganized one import block;
  no logic changes)
- ``python -m pytest tests/ -m "not network"`` → 892 passed
  (875 prior + 17 new osap_blend)
- No ``compute.main`` import yet — wiring lands in commit 5
- No schema bump (already at 0.9.0-phase4h from commit 1)
- ``PHASE3_WEIGHTS`` untouched; composite.py L43-45 invariant
  intact (manually verified: ``grep -n PHASE3_WEIGHTS
  compute/scoring/composite.py`` returns the same 4 hits as
  pre-commit)

Next: commit 4 — ``compute/validation/osap_validation.py`` (PBO/DSR
hard gate per-signal + rolling-12m-Spearman-IC observability,
~120 LOC + 10 tests). Wraps PR #60's ``factor_passes_gates``
(``compute/validation/pbo_dsr.py:388``); accepted-signal subset
feeds commit 5's ``compute/main.py`` wiring.

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU
Phase 4h commit 4 of N. Ships the cohort-aware wrapper that decides
which of the 100 candidate OSAP signals are accepted into the
``composite_score_osap_adjusted`` blend (commit 3). Wraps PR #60's
``compute/validation/pbo_dsr.py::factor_passes_gates`` — does NOT
reimplement PBO or DSR math.

Module layer (``compute/validation/osap_validation.py``, 220 LOC):

- ``GateResult(frozen dataclass)`` — per-signal verdict with
  ``accepted: bool``, ``pbo / dsr / sharpe: float | None``,
  ``n_observations: int``, ``rejection_reason: str | None`` in
  ``{None, 'high_pbo', 'low_dsr', 'gate_failed', 'insufficient_data'}``.
  Distinct ``'gate_failed'`` category for diagnostic clarity when both
  PBO AND DSR fail simultaneously (Bailey 2014 pure-noise cohorts fail
  this way — verified by tests #1, #4, #8).

- ``gate_osap_signals(long_short_returns, requested_signals=None,
  pbo_threshold=PBO_VETO_THRESHOLD, dsr_threshold=DSR_VETO_THRESHOLD,
  n_partitions=DEFAULT_N_PARTITIONS) -> dict[str, GateResult]`` —
  pivots commit-2's long-format DF to wide (date × signal), runs
  Bailey 2014 cohort framing (``n_trials = wide.shape[1]``), per-signal
  loop calling ``factor_passes_gates`` with the established defaults.
  Module constants imported from ``pbo_dsr`` — NOT redefined.

- ``compute_rolling_ic_12m(long_short_returns, signalname) -> float
  | None`` — observability-only Spearman lag-1 IC over the most
  recent 12 monthly observations. Pure pandas (no scipy — matches
  ``pbo_dsr.py``'s hand-rolled Beasley-Springer-Moro precedent for
  the inverse normal CDF). Never gates acceptance.

- ``filter_accepted_signals(gate_results) -> (accepted, excluded)`` —
  sorted-alphabetical split, feeds commit-5's metadata writer.

**NaN policy — LOCKED, documented in module docstring**:

Source-verified asymmetry in ``compute/validation/pbo_dsr.py``:

- ``compute_pbo`` (L187-284) is **NaN-UNSAFE** — L234 ``to_numpy
  (dtype=float)`` then L256-257 ``.mean(axis=0)`` / ``.std(axis=0)``
  then L261 ``np.argmax`` silently corrupts on any NaN cell.
- ``compute_deflated_sharpe`` (L287-385) is **NaN-SAFE** — L323 strips
  internally via ``arr = arr[~np.isnan(arr)]``.

Because ``factor_passes_gates`` accepts ``factor_returns`` and
``returns_matrix`` independently, this wrapper feeds different NaN
treatments to each side:

1. ``factor_returns = wide[sig].dropna()`` — DSR's internal strip
   handles it. No information lost.
2. ``returns_matrix = wide.fillna(0.0)`` — zero-fill, NOT mean-fill,
   NOT ``dropna(how='any')``.

Zero-fill chosen over the two alternatives:

- ``dropna(how='any')`` would decimate the 100-signal × monthly
  matrix below ``n_partitions=16`` rows once any earnings-event-only
  signal is included, collapsing the Bailey 2014 multiple-testing
  ``n_trials = cohort_size`` correction. Test #13 ``test_gate_osap_
  signals_sparse_cohort_zero_filled_not_decimated`` is the
  regression guard against accidental revert.
- ``fillna(column_mean)`` would deflate per-signal variance, inflate
  Sharpe, bias PBO toward false acceptance — silently rewarding
  sparse signals for low coverage.
- ``fillna(0.0)`` is the honest OSAP-semantic: absence-of-coverage
  for ``(signal, month)`` means "no portfolio formed / no
  information generated" → zero return is the right proxy. Bailey
  2014 PBO is rank-based within each period; zero-imputation
  symmetrically pushes coverage-gap rows toward indeterminate rank.

Acknowledged trade-off: sparse-coverage signals see their Sharpe
shrunk toward zero by the zero-fill, raising DSR rejection
probability. Cohort-fair but penalizes legitimate event-only
signals. Phase 4h scope accepts this — the Phase 5 backtest harness
(``defense-infrastructure/PLAN.md:270``) runs full walk-forward CV
per signal and supersedes this gate when it ships.

Standalone module discipline: zero imports from
``compute.features.osap_replicate`` (commit 2),
``compute.scoring.osap_blend`` (commit 3), or ``compute.main``.
Only ``compute.validation.pbo_dsr`` for primitives + constants.
Validation runs on the long-short returns DataFrame contract only.

Tests (``tests/test_validation/test_osap_validation.py``, 14
offline, exceeded plan's 13-test target):

1. ``random_noise_yields_high_pbo`` — Bailey 2014 invariant: pure-
   noise cohort → zero acceptances, all reasons in
   {'high_pbo', 'low_dsr', 'gate_failed'}, no 'insufficient_data'
2. ``low_sharpe_signal_rejected_for_dsr`` — near-zero σ signal →
   'low_dsr' or 'gate_failed' with DSR ≤ 0
3. ``strong_signal_accepted`` — monotone-drift signal beats noisy
   cohort → accepted=True, populated pbo/dsr/sharpe floats
4. ``insufficient_data`` — < ``MIN_OBS_PER_SIGNAL`` rows in cohort
   → all signals rejected with 'insufficient_data'
5. ``requested_signals_filter`` — subset filter applied pre-pivot
6. ``requested_none_uses_all_signals_in_df`` — default covers all
7. ``empty_input_returns_empty_dict`` — empty DF → {}, no crash
8. ``single_signal_cohort_rejects_with_insufficient_data`` — cohort
   size < 2 short-circuit
9. ``compute_rolling_ic_12m_known_signal`` — monotone series →
   Spearman = 1.0 ± 1e-9
10. ``compute_rolling_ic_12m_insufficient_history`` — < 13 obs →
    None
11. ``compute_rolling_ic_12m_nan_safe_with_gaps`` — NaN outside
    tail(13) window pruned cleanly; tail-13 strictly monotone
    Spearman = 1.0
12. ``filter_accepted_signals_splits_into_sorted_lists`` — sorted
    union round-trip
13. ``sparse_cohort_zero_filled_not_decimated`` — REGRESSION GUARD:
    3 of 10 signals with 25% NaN coverage at staggered offsets; all
    10 still get real PBO/DSR runs (none short-circuit) — fails
    immediately if cohort policy reverts to dropna(how='any')
14. ``module_load_constants_sourced_from_pbo_dsr`` — MIN_OBS_PER_
    SIGNAL == DEFAULT_N_PARTITIONS == 16; canonical Phase 4 gate

Verification:

- ``ruff check compute/validation/osap_validation.py tests/test_
  validation/test_osap_validation.py`` → clean
- ``pytest tests/ -m "not network"`` → 906 passed (892 prior + 14
  new)
- Import sanity: ``from compute.validation.osap_validation import
  gate_osap_signals, compute_rolling_ic_12m, filter_accepted_
  signals, GateResult, MIN_OBS_PER_SIGNAL, ROLLING_IC_WINDOW_MONTHS``
  → OK
- No schema change (still 0.9.0-phase4h from commit 1)
- No ``PHASE3_WEIGHTS`` touched
- No imports from osap_replicate / osap_blend / main — standalone
  verified

Next: commit 5 — ``compute/main.py`` wiring + ``compute/ingest/
osap.py`` kwargs (~70 LOC). Wires fetch → replicate → gate → blend
end-to-end; integration ``@network`` test against real OSAP fetch
with 20-ticker compute slice + sanity-IC on Mom1m signal.

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU
**FINAL** commit of the Phase 4h 5-commit cluster. Wires commits 1-4
into the weekly compute orchestrator and ships the end-to-end
``@pytest.mark.network`` integration test.

Wiring (`compute/main.py`):

- New imports for the 4 OSAP layers (ingest / features / scoring /
  validation). Ruff auto-reorganized them; no logic delta from the
  reorganization.

- New OSAP pipeline block inserted after ``asof_date = now.date()``
  and before the Step-8 per-ticker loop (~100 LOC). Steps:

  1. ``fetch_osap_returns(signals=OSAP_SIGNALS_100, as_of=asof_date)``
     — single bulk fetch (cached parquet per PR #110 + commit 1
     ``signals``/``as_of`` kwargs).
  2. ``compute_long_short_returns`` — commit 2 helper.
  3. ``gate_osap_signals`` + ``filter_accepted_signals`` — commit 4
     PBO/DSR hard gate (PBO ≤ 0.5 AND DSR > 0 inherited from PR
     #60's ``factor_passes_gates`` defaults).
  4. ``compute_rolling_ic_12m`` per accepted signal — observability
     only, populates ``metadata.osap_signals_ic_12m``.
  5. ``compute_osap_signals`` over the *accepted* subset only —
     produces per-ticker proxy signal map.
  6. ``coverage_by_signal`` — populates
     ``metadata.osap_signals_coverage_pct``.
  7. ``aggregate_osap_signals`` → ``apply_osap_blend(composite,
     aggregate, weight=0.5)`` — commit 3 Path-b. STAYS OUTSIDE
     ``compute_composite()``: ``PHASE3_WEIGHTS`` sum-to-1.0
     invariant (``compute/scoring/composite.py:43-45``) is intact.

- **Top-5 ranking still uses raw ``composite_score`` per SKILL.md
  Rule 16.** ``composite_score_osap_adjusted`` is written into
  ``StockDetail.osap_blended_score`` as an observability column —
  Phase 5 ML meta-learner is where 50/50 may be retuned and a
  ranking cutover authorized.

- **Graceful degradation** — entire block wrapped in try/except:
  on any failure (network outage, ``openassetpricing`` import
  error, OSAP release shift, gate exception) all six OSAP-bearing
  fields degrade to ``None`` and weekly production continues. OSAP
  is observability-only this phase — non-essential to the static
  ranking output.

- ``StockDetail`` per-ticker writer (existing loop): two new
  fields wired — ``osap_signals=osap_signal_map.get(ticker)``
  (dict or None per universe-gap policy) and ``osap_blended_score``
  (rounded float or None when reindex misses or value is NaN).

- ``Metadata`` writer: four new fields wired —
  ``osap_signals_used`` (sorted list of accepted signals),
  ``osap_excluded_signals`` (sorted list of PBO/DSR rejects),
  ``osap_signals_ic_12m`` (per-signal rolling-12m Spearman IC,
  observability), ``osap_signals_coverage_pct`` (per-signal % of
  tickers populated). Each defaults to ``None`` (not empty list/
  dict) when the OSAP pipeline degrades, matching the
  ``| None = None`` schema contract from commit 1.

Test (`tests/test_features/test_osap_e2e_integration.py`, NEW, ~150
LOC, 1 ``@pytest.mark.network @pytest.mark.timeout(600)``):

Full ingest → replicate → gate → IC → blend chain against the real
OSAP package release, 4-signal × 20-ticker slice (kept cheap so the
e2e test stays under the 300s ceiling on shared runners). Asserts:

- Live fetch returns non-empty filtered DataFrame
- Long-short derivation produces the expected
  ``{signalname, date, ls_return}`` schema
- Every ``GateResult`` has a sensible structure (accepted ⇒
  ``rejection_reason=None`` + populated PBO/DSR floats; rejected ⇒
  one of 4 enumerated reasons)
- Mom1m rolling-12m IC is finite within [-1, 1] (not asserting > 0
  — single-window IC is noisy)
- 20-ticker proxy signal map populates the universe
- ``aggregate_osap_signals`` → ``apply_osap_blend`` round-trips and
  the output is clipped to [0, 100] over the full 20-ticker index

Test does NOT run full ``compute/main.py`` (502-ticker EDGAR fetch
exceeds CI budget); compute/main.py wiring correctness is verified
by the 906 offline unit tests across the 4 OSAP layers plus this
e2e chain that confirms data shapes match end-to-end.

Docs (atomic with the wiring per user direction):

- ``CLAUDE.md`` ``## Phase status`` — schema bump line updated to
  ``0.9.0-phase4h``, "Phase 4h in flight in PR #112" stanza added,
  test counts bumped to 906 offline + 19 ``@network``. Defense
  layer count UNCHANGED at 17 (annotate-only blend, no new veto).

- ``PHASE_STATUS.md`` Phase 4 row — Phase 4h sub-status added,
  schema bump cited, "no new veto" lock cited.

- ``SKILL.md`` schema-versions table — new row for
  ``0.9.0-phase4h`` marked "in flight in PR #112" so the table
  doesn't lie pre-merge; documents the 6 new optional fields,
  Path-b architecture lock, hard-gate criteria, NaN-policy lock,
  observability-only framing, and graceful-degradation contract.

- ``WORKFLOW.md`` — unchanged this commit; Phase 4h plan reference
  already present, post-merge tick belongs to the merge PR.

Verification:

- ``ruff check .`` → clean (auto-reorganized import block in
  compute/main.py + e2e test; no logic delta)
- ``pytest tests/ -m "not network"`` → **906 passed** (892 prior +
  14 new from commit 4) — confirms zero regression from the
  ``compute/main.py`` wiring
- ``python -m compute.output.schema_check`` → in-sync (no schema
  delta this commit — schema bump landed in commit 1)
- ``python -c "from compute import main"`` → OK; SCHEMA_VERSION
  resolves to ``0.9.0-phase4h``

**STOP** here per user instruction — awaiting audit before Mark-
Ready flip on PR #112. After Ready + merge, Section I post-merge:
Vercel MCP 4-call (deploy health) + Playwright 4-ticker matrix
including one zero-OSAP-coverage ticker as the new failure mode.
Held issue (Phase 4h.1 full per-stock signal replication) auto-
files on the merge webhook event.

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU
@dackclup dackclup marked this pull request as ready for review May 18, 2026 11:58
@dackclup dackclup merged commit fbd1acf into main May 18, 2026
4 checks passed
@dackclup dackclup deleted the claude/resume-quantrank-phase-4.5-Zh0pO branch May 18, 2026 12:00
dackclup pushed a commit that referenced this pull request May 19, 2026
**FINAL** commit of the Phase 4h.2 Part 1 3-commit cluster (issue
#116). Populates the ``osap_gate_diagnostics`` field landed in
commit 1's schema delta + docs the full Part-1 surface so reviewers
+ future maintainers see the schema and observability contract in
one place.

**`compute/main.py` wiring** (+23 LOC):

1. Import added to ``from compute.output.schemas import (...)``:
   ``OsapGateDiagnostic`` inserted alphabetically between ``Metadata``
   and ``PillarScores`` (schemas import already used at this site,
   no new module touched).
2. Variable initialized BEFORE the OSAP try block:
   ``osap_gate_diagnostics: dict[str, OsapGateDiagnostic] = {}``.
3. Populated inside the try after
   ``gate_results = gate_osap_signals(osap_ls, requested_signals=
   config.OSAP_SIGNALS_100)`` and BEFORE
   ``filter_accepted_signals`` — captures EVERY signal that reached
   the gate (both accepted and rejected). Accepted carry
   ``rejection_reason=None``; rejected carry one of the canonical
   taxonomy values (``high_pbo`` / ``low_dsr`` / ``insufficient_data``
   / ``gate_failed``) per
   ``compute/validation/osap_validation.py::GateResult``.
4. Reset to ``{}`` in the OSAP-pipeline-failed ``except`` branch so
   graceful degradation continues to leave every osap_* field at
   ``None``.
5. Wired into the ``Metadata(...)`` constructor with the established
   ``or None`` idiom:
   ```python
   osap_gate_diagnostics=osap_gate_diagnostics or None,
   ```

**Tests** (``tests/test_output/test_schema_phase4h2.py``, +55 LOC, 2
new offline appended to commit 1's suite):

1. ``test_metadata_gate_diagnostics_round_trip_with_production_cohort_shape``
   — simulates the production observation from #116 (22 signals
   reach the gate, all rejected with a mix of rejection_reason
   values across the canonical 4-value taxonomy); asserts the
   dict-of-OsapGateDiagnostic structure survives ``model_validate``
   → ``model_dump`` → ``model_validate`` round-trip.
2. ``test_metadata_gate_diagnostics_accepted_signal_has_null_rejection_reason``
   — locks the ``rejection_reason=None`` semantics for accepted
   signals (Pydantic preserves None rather than coercing to a
   sentinel string).

**Docs** (atomic with the wiring):

- ``CLAUDE.md`` ``## Phase status`` — schema line updated to
  ``0.9.1-phase4h.2`` with the PATCH-bump framing; preserved the
  prior MINOR-bump history (`0.8.0-phase4.5f` → `0.9.0-phase4h` via
  PR #112).
- ``PHASE_STATUS.md`` row 4 — Phase 4h.2 Part 1 sub-status added;
  describes both new fields, the Part-1 / Part-2 split rationale
  ("Part 2 opens after ≥1 week of production diagnostic data
  accumulates"), and the "no new veto / no rank change" invariant.
- ``SKILL.md`` schema-versions table — new row for
  ``0.9.1-phase4h.2`` inserted above the ``0.9.0-phase4h`` row;
  cites the SKILL.md L305 PATCH-bump quote verbatim, locks the
  ``OsapGateDiagnostic`` "all 4 fields explicit = None" refinement
  in writing, and documents the set-diff helper placement decision
  (``compute/features/osap_replicate.py::signals_in_dataframe``
  per refinement #4).
- ``WORKFLOW.md`` — unchanged; no "Open items" checkbox list for
  Phase 4h.2 yet (would be created when Part 2 is scoped).

**Verification ladder** (steps 1-5 complete):

- ``ruff check .`` → clean ✅
- ``pytest tests/ -m "not network"`` → **924 passed** (911 baseline
  + 13 new across the 3-commit cluster: 7 schema + 4 helper + 2
  gate-diagnostic) ✅
- ``python -m compute.output.schema_check`` → in-sync (no new
  schema delta this commit; the snapshot already captured both
  fields + ``OsapGateDiagnostic`` from commit 1's regen) ✅
- ``python -c "from compute.main import run_weekly_compute;
  from compute.output.schemas import OsapGateDiagnostic; ..."``
  → OK ✅

Steps 6-8 next: ``git push`` → open Draft PR → ``subscribe_pr_activity``
+ STOP for user audit + Mark-Ready authorization.

**Defense layer**: unchanged at 17. **Top-5 rotation**: unchanged.
**Schema version**: ``0.9.1-phase4h.2`` (locked from commit 1).

**Cluster summary**:

| # | SHA | LOC | Tests added |
|---|---|---|---|
| 1 — schema delta | ``428729ad`` | 231 | +7 (round-trip + backward-compat) |
| 2 — silent-drop wiring | ``c7949403`` | 116 | +4 (helper unit tests) |
| 3 — gate diagnostics + docs (this) | TBD | ~86 | +2 (gate-diag round-trip) |
| **Total** | — | ~433 | **+13** |

Within the Option-β diagnostic-first scope (~250-350 LOC budget; +
docs); under the original plan's ~300 LOC estimate.

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU
dackclup added a commit that referenced this pull request May 19, 2026
…manifest + 6 offline tests (#119)

Phase 4j scout PR — 3rd of 4 factor-library scouts (OSAP ✅ #110, JKP ✅ #114, Qlib THIS, IPCA next as 4k). Ships `pyqlib` install + Alpha158 158-feature manifest + 6 offline tests. NO production wiring; yfinance-to-Qlib BYO adapter + full Alpha158 compute on 502-ticker universe deferred to follow-on integration PR.

5 pre-plan investigations (all verified 2026-05-19):

1. PyPI package: `pyqlib` 0.9.7 (canonical). Alternative names (`qlib`, `microsoft-qlib`) return 404.
2. License: MIT via wheel METADATA classifier. No CC BY-NC complication like JKP — safe for Phase 6+ commercial roadmap.
3. Data init: `qlib.init(provider_uri=..., region="us")`. NO public US data bundle — Qlib's default covers CN A-share only; US universe is BYO via local .bin files.
4. Alpha158 surface: `qlib.contrib.data.handler.Alpha158` → 158 columns; manifest captured via `Alpha158DL.get_feature_config()[1]` and hardcoded; offline test 3 locks against upstream drift.
5. CI install footprint: ~150-180 MB net-new (mlflow / lightgbm / cvxpy / pymongo / redis / gym / jupyter + nbconvert transitives). One-time cold-start; pip wheel caching mitigates subsequent runs.

Critical scope decisions:

- NO @network test for this scout — Qlib has no remote CDN; data flow is local-bin filesystem I/O. Originally planned synthetic-OHLCV→bin→init→Alpha158 smoke test was dropped because pyqlib's PyPI wheel doesn't bundle `scripts/dump_bin.py`. Replacement: manifest-vs-runtime-introspection drift detector (stronger than the dropped test — fires on every pip install upgrade if Qlib changes the feature set).
- Module name `compute/ingest/qlib_features.py` (NOT `qlib.py`) — Python import resolution would shadow the installed `qlib` package, breaking the entire factor-library integration. Distinct module name avoids namespace collision.
- Tenacity NOT applied — Qlib's data flow is local filesystem I/O, no network retry semantics needed. First ingest module in QuantRank that diverges from the canonical `compute/ingest/osap.py:52-56` retry decorator (documented in module docstring).

Module layer (compute/ingest/qlib_features.py, ~186 LOC):
- `QLIB_DATA_CACHE: Path` constant (gitignored via parent `compute/cache/`)
- `QLIB_INSTRUMENTS_UNIVERSE = "sp500"` (custom universe for future BYO bundle)
- `ALPHA158_FEATURE_NAMES: tuple[str, ...]` — 158 hardcoded entries, asserted at module load
- `init_qlib(provider_uri=None)` — thin wrapper around `qlib.init(region="us")`; idempotent
- `fetch_alpha158_features(*, instruments, start_time, end_time)` — Alpha158 handler wrapper

Config layer (compute/config.py, +23 LOC):
- `QLIB_DATA_CACHE: Path = CACHE_DIR / "qlib" / "us_data"`
- `QLIB_DATA_MAX_AGE_DAYS: int = 31`
- `ALPHA158_FEATURE_COUNT: int = 158` (asserted against module manifest length)

Tests (6 offline; ~113 LOC):
1. `test_alpha158_feature_manifest_has_158_entries` — primary CI signal (pure cardinality + uniqueness, no Qlib runtime)
2. `test_alpha158_feature_manifest_first_5_anchor` — K-bar leading features anchor (KMID, KLEN, KMID2, KUP, KUP2)
3. `test_alpha158_feature_manifest_matches_runtime_introspection` — drift detector (manifest == `Alpha158DL.get_feature_config()[1]`)
4. `test_qlib_data_cache_constant_under_repo_cache_dir` — config sanity
5. `test_init_qlib_passes_us_region_and_provider_uri` — monkeypatch capture
6. `test_init_qlib_defaults_to_config_cache_when_no_uri` — default path verified

pyproject.toml: `pyqlib>=0.9.7,<0.10` added to `[factors]` extra (authorized in advance via plan-mode approval; pin range because Qlib's API drifts across minor versions).

Ask-first surfaces touched:
- `pyproject.toml [factors]` — extended (authorized via plan-mode)
- `ci.yml` UNCHANGED (`[dev,factors]` install already covers new dep)
- `compute-rankings.yml` UNTOUCHED per user hard constraint
- Schema triple UNTOUCHED (no schema delta this scout)

Verification (local):
- ruff check . → clean
- pytest tests/ -m "not network" → 930 passed (924 prior + 6 new)
- python -m compute.output.schema_check → in-sync
- python -c "from compute.ingest.qlib_features import ..." → OK 158
- Vercel preview ✅ READY

Defense layer unchanged at 17. Top-5 rotation unchanged (no scoring touched). Schema unchanged at 0.9.1-phase4h.2.

After this merges → 3 of 4 factor-library scouts done. Phase 4k (IPCA) is the final scout; once 4k merges → eligible for `v1.1.0-phase4` tag.

Out of scope (deferred to follow-on full Phase 4j integration PR, ~5-commit cluster):
- yfinance-to-Qlib BYO adapter (~150 LOC + custom S&P 500 instruments universe registration)
- Full Alpha158 feature compute on 502-ticker universe → 502 × N_dates × 158 DataFrame
- Per-feature cross-validation framework (PBO/DSR doesn't apply to per-stock-per-date features; walk-forward IC scoring per feature is the likely replacement)
- Schema additions (StockDetail.qlib_features + Metadata.qlib_features_used + IC observability) → schema bump 0.9.1-phase4h.2 → 0.10.0-phase4j
- compute/main.py wiring decision (observability-only? blended into composite? Phase-5 ML-meta-learner-only consumer?)

Audit history:
- Plan-audit round 1: 5 pre-plan investigations verified · MIT lock · heavy-deps disclosure approved
- Plan-audit rounds 2-5: same plan re-paste loop (session-side stuck); main session verified PR #119 unchanged at each check
- Pre-CI audit: clean (1 legitimate pivot — test #6 swapped from end-to-end smoke to manifest drift detector because pyqlib wheel lacks scripts/dump_bin.py)
- Conditional Mark-Ready authorization given on Vercel ✅ + mergeable_state clean
- Squash merged per "merge call is yours" delegation pattern (PR #112 / #114 / #118 precedent)

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2
dackclup added a commit that referenced this pull request May 20, 2026
…variants (#127)

Closes #126.

Process Hygiene Item #1 (parent epic #125). Adds Hypothesis property-
based tests as the new defense line for "untested data-shape
assumption" bugs — the class that hid the OSAP quintile/tercile
silent-drop in PR #112's CI until production cron diagnostics caught
it (subsequently fixed in PR #124 / Phase 4h.2 Part 2). If a `@given`
property over `port_count ∈ {2,3,5,10}` had existed in Phase 4h, the
hardcoded `port=10` filter would have been falsified the first time
the CI ran.

Test-addition only. No scoring / feature behavior touched. No schema
delta. No CI workflow changes.

Sub-task 1 — Hypothesis added to [dev] extra (pyproject.toml)
--------------------------------------------------------------
`hypothesis>=6.92` joins `pytest` + `ruff` in the `[dev]` optional
extra. Pure-Python dep (no C extensions); CI footprint negligible.

Sub-task 2 — Property tests for osap_replicate.py (7 tests, 394 LOC)
---------------------------------------------------------------------
New file: tests/test_features/test_osap_replicate_properties.py

7 property tests covering data-shape invariants the Phase 4h.2 Part 2
multi-port adapter must satisfy:

1. `test_compute_long_short_returns_handles_any_port_cardinality` —
   for port_count ∈ [2, 10] and n_dates ∈ [1, 12], the adapter
   produces exactly n_dates LS rows with ls_return == port_count - 1.
   THE headline property — would have caught the PR #112 bug.

2. `test_signals_dropped_no_long_short_returns_sorted_unique` —
   contract for the Metadata.osap_signals_dropped_no_long_short
   field: sorted, no duplicates, single-port signals appear,
   two-port signals don't.

3. `test_normalize_port_label_int_input_yields_2char_zfill` —
   port=int(1..10) → '01'..'10' for any input list. Idempotent.

4. `test_normalize_port_label_str_input_yields_2char_zfill` —
   mixed '1' / '01' / '10' inputs normalize to a uniform 2-char width.

5. `test_part2_accounting_invariant_under_random_partition` —
   the Phase 4h.2 Part 2 accounting equation
   (manifest = missing + dropped + gated + used) holds for any
   3-way partition of a synthetic manifest into the bucket set.
   Uses st.composite to draw disjoint partitions.

6. `test_coverage_by_signal_returns_pct_in_0_to_100` — domain
   contract for the coverage helper (0..100 percent, NOT 0..1 fraction).

7. `test_rank_signals_cross_sectional_returns_unit_interval` —
   ranks live in (0, 1] for any non-empty cross-section.

Sub-task 3 — Property tests for scoring transforms (7 tests, 340 LOC)
---------------------------------------------------------------------
New file: tests/test_scoring/test_transforms_properties.py

7 property tests covering composite (compute/scoring/composite.py)
and OSAP blend (compute/scoring/osap_blend.py) — pure-numeric
transforms whose output domains are contract-locked by the
downstream Pydantic + TypeScript schemas.

Composite tests (4):
  A. `test_compute_composite_output_bounded_0_to_100` — for any
     pillar input in [0, 100], composite ∈ [0, 100] (the writer +
     Pydantic contract)
  B. `test_compute_composite_all_50_inputs_yield_composite_50` —
     neutral-pillar input collapses to composite == 50 (catches
     accidental weight-vector drift)
  C. `test_compute_composite_neutralize_missing_imputes_nan_to_50` —
     NaN pillar inputs are imputed when neutralize_missing=True;
     all-NaN → composite == 50.0
  D. `test_compute_composite_constant_input_equals_input` —
     constant-pillar input → composite == that constant (PHASE3
     weight-sum-equals-1.0 invariant expressed as a property)

OSAP blend tests (3):
  E. `test_apply_osap_blend_output_bounded_and_nan_passthrough` —
     blend ∈ [0, 100]; NaN OSAP → composite passthrough; finite OSAP
     → interior point between composite and osap
  F. `test_aggregate_osap_signals_finite_values_in_0_to_100` —
     finite aggregate values live in [0, 100]; NaN allowed for
     universe gaps
  G. `test_apply_osap_blend_weight_zero_is_identity_on_composite` —
     weight=0 leaves composite unchanged (locks the Phase 4h
     observability-only design property + Rule 16: Top-5 still
     ranks raw composite)

Sub-task 4 — CI integration + .gitignore + docs
-------------------------------------------------
- `.gitignore` already covers `.hypothesis/` at line 50 (Python's
  default boilerplate) — no edit needed.
- CLAUDE.md ## Gotchas — 1-line note that Hypothesis is the new
  defense line for data-shape bugs (paired with example tests), with
  the `@settings(deadline=None)` anti-pattern flagged.
- CI hypothesis.errors.Flaky behaviour: default profile makes flaky
  examples fail-fast (no retry); the `pytest -m "not network"` CI
  invocation inherits this. NO `@settings(deadline=None)` used in
  this PR — slow examples surface as honest failures.

Sanity verification (NOT committed)
-----------------------------------
As part of pre-push verification I temporarily reverted the multi-
port adapter at compute/features/osap_replicate.py:143
(`agg(["min", "max"])` → `agg(["min", "min"])`) and confirmed
`test_compute_long_short_returns_handles_any_port_cardinality`
fails with "Falsifying example: port_count=2, n_dates=1". Reverted
the break before commit.

Constraints honored
-------------------
- NO modification to compute_composite() / PHASE3_WEIGHTS sum=1.0
  invariant (composite.py:43-45) — pure test-addition PR
- Rule 16: Top-5 still ranks raw composite_score; no scoring touched
- No push to main; no force-push; no --no-verify
- No workflow_dispatch trigger (compute-rankings.yml untouched)
- Schema triple untouched (no schemas.py / types.ts changes)
- NO @settings(deadline=None) — default deterministic deadline
- NO RuleBasedStateMachine (out of scope per issue #126)

Test count delta
----------------
Before: 945 passed (Phase 4h.2 Part 2 baseline)
After:  959 passed (+14 property tests across 2 new files)

Files (4 changed, +747 / 0)
----------------------------
- pyproject.toml — +6 (hypothesis>=6.92 in [dev])
- CLAUDE.md — +7 (## Gotchas note)
- tests/test_features/test_osap_replicate_properties.py — +394 NEW
- tests/test_scoring/test_transforms_properties.py — +340 NEW

Verification ladder all green
------------------------------
- ruff check . → All checks passed
- python -m pytest tests/ -m "not network" → 959 passed (1m46s)
- python -m pytest tests/test_features/test_osap_replicate_properties.py
  tests/test_scoring/test_transforms_properties.py → 14 passed (5s)
- python -m compute.output.schema_check → in sync (no schema delta)
- Sanity break-revert confirmed property test catches a regression

No regression discovered
------------------------
Property tests passed on first execution against current main
(commit 80c6641, Phase 4h.2 Part 2 already merged). No hidden bugs
surfaced beyond the 56-signal gap that PR #124 already fixed —
which itself is a good signal that the multi-port adapter handles
the [2, 10] cardinality region cleanly.

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 20, 2026
Part of epic #125 (Item #4 of 6). Doc-only PR — no code changes,
no schema delta, no test additions.

Phase 4h timeline (2026-05-18 → 2026-05-19) demonstrated the cost of
shipping production wiring + gate logic without a diagnostic surface:

- PR #112 (Phase 4h): OSAP signal replication + PBO/DSR gate + Path-b
  blend, NO observability surface for gate decisions
- First production cron: every signal failed gate, no way to know why
- PR #118 (Phase 4h.2 Part 1): retrofit diagnostic surface
  (osap_signals_missing_from_dataset + osap_gate_diagnostics)
- Second production cron: 22 missing + 22 fail low_dsr, 56 silently
  dropped (gap that Part 1 still couldn't fully expose)
- PR #124 (Phase 4h.2 Part 2): root-cause fix (multi-port adapter)
  + osap_signals_dropped_no_long_short closing the accounting gap

The combined cost of Phase 4h.2 Parts 1 + 2 (~10 hours across 2 PRs)
would have been ~30 minutes of additional Phase 4h scope if the
diagnostic surface had shipped alongside the production wiring.

Files (3 changed, +83 LOC)
---------------------------
- WORKFLOW.md (+63 LOC) — new section "# Observability-Before-Wiring
  Pattern" inserted between the mobile playbook table and the
  "Initial Prompts" section. Includes mandatory checklist (6 items)
  + anti-pattern statement + 3 reference precedents (PR #112 bad,
  PR #118 good, PR #124 good)
- SKILL.md (+14 LOC) — new "Rule 18: Observability-before-wiring"
  appended to the Core Behavior Rules section (Rule 17 was the prior
  trailing rule). Links back to WORKFLOW.md for the mandatory
  checklist detail
- CLAUDE.md (+6 LOC) — 1 bullet added to ## Conventions referencing
  the new Rule 18 + WORKFLOW.md section

Files NOT touched (deliberately per scope)
-------------------------------------------
- PHASE_STATUS.md — chronological log; pattern guidance belongs in
  WORKFLOW.md / SKILL.md / CLAUDE.md, not in the historical tracker
- AGENTS.md — cross-tool agent doc; lookups defer to WORKFLOW.md
  by default, so a fresh duplicate would just create drift risk
- compute/ / frontend/ / tests/ — doc-only PR, no behavior change

Constraints honored
-------------------
- No code changes — pure markdown additions
- No schema delta — schema_check confirms in-sync
- No test additions — pytest count unchanged at 959
- No push to main; no force-push; no --no-verify
- No workflow_dispatch trigger (compute-rankings.yml untouched)

Verification ladder all green
------------------------------
- ruff check . → All checks passed
- python tools/check_doc_test_counts.py → exit 0 (no new hardcoded
  test-count claims introduced — the precedents reference PRs and
  hour estimates, not "N offline + M @network" drift patterns)
- python -m compute.output.schema_check → in sync (no schema touch)
- python -m pytest tests/ -m "not network" → 959 passed (unchanged)

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 20, 2026
…ble skills (#132)

3-task housekeeping + tacit knowledge harvest. Docs/skills-only PR —
no code, no schema delta, no test additions.

Task A — SKILL.md schema-version table fixes
---------------------------------------------
Two stale "in flight" entries flipped to merged + 1 new row inserted:

- Row 0.9.0-phase4h: "(in flight in PR #112)" → "(PR #112 merged
  2026-05-19)"
- Row 0.9.1-phase4h.2: "(in flight in PR #<NEXT>)" → "(PR #118 merged
  2026-05-19)"
- NEW row 0.9.2-phase4h.2 (above 0.9.1) — PR #124 merged, multi-port
  OSAP adapter + osap_signals_dropped_no_long_short field, closing
  the 100-signal accounting equation; DSR sign-inversion deferred to
  Part 3

PHASE_STATUS.md row 4 ALSO has "Phase 4h.2 Part 2 in flight in this
PR" staleness — confirmed via grep but DELIBERATELY not updated here
per Task A explicit scope (SKILL.md only). Recommend a follow-up
phase-status-bump PR after this lands.

Task B — New worker-session-handoff skill
------------------------------------------
.claude/skills/worker-session-handoff/SKILL.md (+163 LOC). YAML
frontmatter + 5 sections:

- When to use vs inline (≤50 LOC single-file → inline; ≥2 files /
  new dep / code logic → handoff)
- Constraint lock library (8 standard locks: composite/PHASE3,
  Rule 16, Rule 18, no-merge, no force-push, no --no-verify,
  no workflow_dispatch, schema triple)
- Anti-pattern: paste-loop avoidance (single outer code-block
  fence; reference PR #123 as related-but-distinct paste-loop
  failure mode)
- Template (paste-ready, single ```` outer code block with
  language tag ` text` so inner triple-backticks pass through)
- Reference invocations + QuantRank precedents (PR #124, #127, #131)

Codifies the handoff shape that appeared verbatim across PRs #123,
#124, #127, #128, #129, #131 — user copies ONE block instead of
editing 5 template snippets per handoff.

Task C — Portable skills library (4 skills, +417 LOC)
-----------------------------------------------------
Audit step (per spec): read CLAUDE.md + AGENTS.md + SKILL.md +
WORKFLOW.md + PR descriptions of #112/#118/#124/#127/#128/#129/#131.
Identified 7 candidate patterns; classified by portability:

- ✅ scout-then-integrate (portable; vendoring pattern, no QR logic)
- ✅ observability-before-wiring (portable; gate-diagnostic pattern)
- ✅ drift-detector-manifest (portable; API surface lock pattern)
- ✅ schema-triple-lockstep (portable; Python/TS JSON contract)
- 🟡 annotate-before-veto (portable; progressive rollout — DEFERRED
   to follow-up issue, lower value vs the 4 shipped)
- 🟡 pre-plan-investigations (subsumed by scout-then-integrate's
   Phase 1 § "Pre-plan investigations" — no separate skill needed)
- 🟡 graceful-degradation-try-except (portable; error-handling
   pattern — DEFERRED to follow-up issue, the wrapper is generally
   1-line so doesn't warrant a dedicated skill)

4 shipped (each ≤ 109 LOC):
  .claude/skills/portable-scout-then-integrate/SKILL.md (99 LOC)
  .claude/skills/portable-drift-detector-manifest/SKILL.md (109 LOC)
  .claude/skills/portable-schema-triple-lockstep/SKILL.md (103 LOC)
  .claude/skills/portable-observability-before-wiring/SKILL.md (106 LOC)

Flat naming convention (`portable-<name>/SKILL.md` at depth 1 from
`.claude/skills/`) because Claude Code's skill registry doesn't
recurse into nested subdirectories per CLAUDE.md ## Conventions.
Confirmed via session reload — all 4 portable + worker-session-
handoff registered correctly.

Each portable skill has:
- YAML frontmatter (name + description + TRIGGER + SKIP)
- ## Pattern section (generic, no QR business logic)
- ## Trigger conditions + ## Skip conditions
- ## QuantRank precedent (1 paragraph, clearly labeled as precedent
  not pattern definition)

Task C constraint check:
- All portable skills core pattern descriptions are project-
  agnostic (read `.claude/skills/portable-*/SKILL.md` ## Pattern
  sections — zero references to OSAP / IPCA / pillar / Top-5
  inside the pattern body; only inside the labeled "QuantRank
  precedent" section at the bottom)
- 3 of 4 portable skills are 103-109 LOC (slightly over the
  100-LOC target — pattern + trigger + skip + precedent sections
  require ~25 LOC each, leaving ~25 LOC of unavoidable scaffold).
  The 99-LOC one (scout-then-integrate) shows the cap is achievable
  but tight.

Files (6 changed, +580 LOC, no deletions)
------------------------------------------
- SKILL.md — schema-version table fixes (Task A)
- 5 new SKILL.md files in .claude/skills/ (Tasks B + C)

Verification ladder all green
------------------------------
- ruff check . → All checks passed
- python tools/check_doc_test_counts.py → exit 0
- python tools/check_branch_collisions.py "skill" "portable" →
  expected ⚠️ on #131 (own adjacent work, not a duplicate)
- python -m compute.output.schema_check → in sync (no schema touch)
- python -m pytest tests/ -m "not network" → 959 passed
  (unchanged; tools/ + .claude/skills/ aren't imported by tests)
- Claude Code skill registry pick-up verified via session reload —
  all 5 new skills (worker-session-handoff + 4 portable-*) appear
  in the available-skills list

Constraints honored
-------------------
- No touch to compute/ / frontend/ / tests/
- No touch to PHASE_STATUS.md / WORKFLOW.md (Task A scope =
  SKILL.md only; PHASE_STATUS.md staleness flagged for follow-up)
- No push to main; no force-push; no --no-verify
- No workflow_dispatch trigger
- Task C portable skills are project-agnostic in their pattern
  description (QR refs confined to labeled "precedent" sections)

Follow-up issue (to file post-merge)
------------------------------------
Title: "Portable Skills Library — extract remaining tacit patterns"
- annotate-before-veto (progressive rule rollout)
- graceful-degradation-try-except (1-line wrapper guidance)
- pre-plan-investigations as standalone (currently subsumed)
- Anything else surfaced by future PR descriptions

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 20, 2026
…sk C.1 recovery) (#135)

* docs(skills): SKILL.md schema bump + worker-session-handoff + 4 portable skills

3-task housekeeping + tacit knowledge harvest. Docs/skills-only PR —
no code, no schema delta, no test additions.

Task A — SKILL.md schema-version table fixes
---------------------------------------------
Two stale "in flight" entries flipped to merged + 1 new row inserted:

- Row 0.9.0-phase4h: "(in flight in PR #112)" → "(PR #112 merged
  2026-05-19)"
- Row 0.9.1-phase4h.2: "(in flight in PR #<NEXT>)" → "(PR #118 merged
  2026-05-19)"
- NEW row 0.9.2-phase4h.2 (above 0.9.1) — PR #124 merged, multi-port
  OSAP adapter + osap_signals_dropped_no_long_short field, closing
  the 100-signal accounting equation; DSR sign-inversion deferred to
  Part 3

PHASE_STATUS.md row 4 ALSO has "Phase 4h.2 Part 2 in flight in this
PR" staleness — confirmed via grep but DELIBERATELY not updated here
per Task A explicit scope (SKILL.md only). Recommend a follow-up
phase-status-bump PR after this lands.

Task B — New worker-session-handoff skill
------------------------------------------
.claude/skills/worker-session-handoff/SKILL.md (+163 LOC). YAML
frontmatter + 5 sections:

- When to use vs inline (≤50 LOC single-file → inline; ≥2 files /
  new dep / code logic → handoff)
- Constraint lock library (8 standard locks: composite/PHASE3,
  Rule 16, Rule 18, no-merge, no force-push, no --no-verify,
  no workflow_dispatch, schema triple)
- Anti-pattern: paste-loop avoidance (single outer code-block
  fence; reference PR #123 as related-but-distinct paste-loop
  failure mode)
- Template (paste-ready, single ```` outer code block with
  language tag ` text` so inner triple-backticks pass through)
- Reference invocations + QuantRank precedents (PR #124, #127, #131)

Codifies the handoff shape that appeared verbatim across PRs #123,
#124, #127, #128, #129, #131 — user copies ONE block instead of
editing 5 template snippets per handoff.

Task C — Portable skills library (4 skills, +417 LOC)
-----------------------------------------------------
Audit step (per spec): read CLAUDE.md + AGENTS.md + SKILL.md +
WORKFLOW.md + PR descriptions of #112/#118/#124/#127/#128/#129/#131.
Identified 7 candidate patterns; classified by portability:

- ✅ scout-then-integrate (portable; vendoring pattern, no QR logic)
- ✅ observability-before-wiring (portable; gate-diagnostic pattern)
- ✅ drift-detector-manifest (portable; API surface lock pattern)
- ✅ schema-triple-lockstep (portable; Python/TS JSON contract)
- 🟡 annotate-before-veto (portable; progressive rollout — DEFERRED
   to follow-up issue, lower value vs the 4 shipped)
- 🟡 pre-plan-investigations (subsumed by scout-then-integrate's
   Phase 1 § "Pre-plan investigations" — no separate skill needed)
- 🟡 graceful-degradation-try-except (portable; error-handling
   pattern — DEFERRED to follow-up issue, the wrapper is generally
   1-line so doesn't warrant a dedicated skill)

4 shipped (each ≤ 109 LOC):
  .claude/skills/portable-scout-then-integrate/SKILL.md (99 LOC)
  .claude/skills/portable-drift-detector-manifest/SKILL.md (109 LOC)
  .claude/skills/portable-schema-triple-lockstep/SKILL.md (103 LOC)
  .claude/skills/portable-observability-before-wiring/SKILL.md (106 LOC)

Flat naming convention (`portable-<name>/SKILL.md` at depth 1 from
`.claude/skills/`) because Claude Code's skill registry doesn't
recurse into nested subdirectories per CLAUDE.md ## Conventions.
Confirmed via session reload — all 4 portable + worker-session-
handoff registered correctly.

Each portable skill has:
- YAML frontmatter (name + description + TRIGGER + SKIP)
- ## Pattern section (generic, no QR business logic)
- ## Trigger conditions + ## Skip conditions
- ## QuantRank precedent (1 paragraph, clearly labeled as precedent
  not pattern definition)

Task C constraint check:
- All portable skills core pattern descriptions are project-
  agnostic (read `.claude/skills/portable-*/SKILL.md` ## Pattern
  sections — zero references to OSAP / IPCA / pillar / Top-5
  inside the pattern body; only inside the labeled "QuantRank
  precedent" section at the bottom)
- 3 of 4 portable skills are 103-109 LOC (slightly over the
  100-LOC target — pattern + trigger + skip + precedent sections
  require ~25 LOC each, leaving ~25 LOC of unavoidable scaffold).
  The 99-LOC one (scout-then-integrate) shows the cap is achievable
  but tight.

Files (6 changed, +580 LOC, no deletions)
------------------------------------------
- SKILL.md — schema-version table fixes (Task A)
- 5 new SKILL.md files in .claude/skills/ (Tasks B + C)

Verification ladder all green
------------------------------
- ruff check . → All checks passed
- python tools/check_doc_test_counts.py → exit 0
- python tools/check_branch_collisions.py "skill" "portable" →
  expected ⚠️ on #131 (own adjacent work, not a duplicate)
- python -m compute.output.schema_check → in sync (no schema touch)
- python -m pytest tests/ -m "not network" → 959 passed
  (unchanged; tools/ + .claude/skills/ aren't imported by tests)
- Claude Code skill registry pick-up verified via session reload —
  all 5 new skills (worker-session-handoff + 4 portable-*) appear
  in the available-skills list

Constraints honored
-------------------
- No touch to compute/ / frontend/ / tests/
- No touch to PHASE_STATUS.md / WORKFLOW.md (Task A scope =
  SKILL.md only; PHASE_STATUS.md staleness flagged for follow-up)
- No push to main; no force-push; no --no-verify
- No workflow_dispatch trigger
- Task C portable skills are project-agnostic in their pattern
  description (QR refs confined to labeled "precedent" sections)

Follow-up issue (to file post-merge)
------------------------------------
Title: "Portable Skills Library — extract remaining tacit patterns"
- annotate-before-veto (progressive rule rollout)
- graceful-degradation-try-except (1-line wrapper guidance)
- pre-plan-investigations as standalone (currently subsumed)
- Anything else surfaced by future PR descriptions

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU

* docs(skills): Vendor karpathy-guidelines (Task C.1 recovery) + THIRD_PARTY_NOTICES.md

Recovers Task C.1 from the original handoff that was silent-dropped in
the prior PR #132 commit (50da720). The handoff explicitly named
"Vendor karpathy-guidelines (1 skill, ~70 LOC)" as part of the portable
skills library; the auditor session caught the omission and authorized
this follow-up commit on the existing branch.

Files (2 new, +138 LOC)
------------------------
- .claude/skills/portable-karpathy-guidelines/SKILL.md (+82 LOC) —
  vendored content of upstream skills/karpathy-guidelines/SKILL.md
  (67 LOC, byte-for-byte preserved) + 15-line appended attribution
  block referencing the upstream source, commit SHA, and the
  Karpathy tweet that motivated the guidelines.

- THIRD_PARTY_NOTICES.md (+56 LOC, NEW at repo root) — third-party
  license disclosures. Section "karpathy-guidelines (Claude Code
  skill)" carries source URL, license declaration, vendored path,
  vendored date, upstream commit SHA, upstream first-commit date,
  and the full standard MIT License text with copyright attributed
  to "multica-ai contributors" (upstream has no individual copyright
  line and no standalone LICENSE file; the `license: MIT` claim
  appears in upstream README.md § License and each skill's YAML
  frontmatter).

Upstream provenance
-------------------
- Source: https://github.com/multica-ai/andrej-karpathy-skills
- Upstream HEAD SHA at vendoring: 2c606141936f1eeef17fa3043a72095b4765b9c2
- Upstream first commit: 2026-01-27
- Vendored date: 2026-05-20
- License: MIT (declared)

Verbatim content preserved
--------------------------
`diff /tmp/karpathy-src/skills/karpathy-guidelines/SKILL.md
.claude/skills/portable-karpathy-guidelines/SKILL.md` shows ONLY
the 15-line appended attribution block at lines 68-82. The upstream
67-line content (YAML frontmatter + "Karpathy Guidelines" heading +
the 4 principles) is byte-for-byte unchanged. Per the spec
constraint: "เก็บ 4 principles verbatim. แก้ได้แค่ 'เพิ่ม' attribution
block ท้ายไฟล์".

License-disclosure caveat
-------------------------
Upstream `multica-ai/andrej-karpathy-skills` declares MIT via README
+ YAML frontmatter but does NOT ship a standalone LICENSE file. The
`THIRD_PARTY_NOTICES.md` entry includes the standard MIT License
template with copyright attributed to the GitHub org ("multica-ai
contributors"), matching the principle that an MIT declaration
without a formal copyright line still licenses to the redistributor;
the attribution is conservative.

Verification ladder all green
------------------------------
- ruff check . → All checks passed
- python tools/check_doc_test_counts.py → exit 0 (no test-count
  drift introduced by this commit)
- python tools/check_branch_collisions.py "karpathy" → no scope
  collisions detected
- python -m compute.output.schema_check → in sync (no schema touch)
- python -m pytest tests/ -m "not network" → 959 passed (unchanged;
  .claude/skills/ + THIRD_PARTY_NOTICES.md aren't imported by tests)
- Skill registry pickup verified via session reload —
  `portable-karpathy-guidelines` appears in the available-skills list
  with the upstream description verbatim

Constraints honored
-------------------
- No squash / amend of the prior 50da720 commit — this is a fresh
  commit pushed on top of the existing branch (per spec
  "ห้าม squash old commit")
- No touch to the 4 already-shipped portable skills in 50da720
- No touch to compute/ / frontend/ / tests/
- No push to main; no force-push; no --no-verify
- No workflow_dispatch trigger
- Karpathy SKILL.md upstream content preserved verbatim; only the
  attribution block appended below the original content

PR description update will follow as a separate `gh pr edit` /
MCP `update_pull_request` call so the new "License Compliance"
section + the audit-table row for karpathy-guidelines land in the
PR body.

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU

---------

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 20, 2026
…136)

Vendoring + cleanup PR. Docs/skills-only — no code, no schema delta,
no test additions.

Task A — Vendor 8 mattpocock/skills selections
----------------------------------------------
Upstream: https://github.com/mattpocock/skills (MIT, Copyright (c)
2026 Matt Pocock). Vendored at upstream HEAD
d54c497aa94400a496d3f2c38be10fa5f284c5a9 (2026-05-20). Selection
criterion: engineering-core skills applicable to QuantRank's Python +
TypeScript stack and PR-iteration workflow.

Vendored 8 of upstream's 18 skills (flat naming under .claude/skills/
matches the portable-* convention from PR #132):

  .claude/skills/mattpocock-diagnose/
      SKILL.md (128 LOC = 117 upstream + 11 attribution)
      scripts/hitl-loop.template.sh (verbatim)
  .claude/skills/mattpocock-tdd/
      SKILL.md (120 LOC = 109 + 11)
      + 5 sidecars: deep-modules / interface-design / mocking /
        refactoring / tests (.md, verbatim)
  .claude/skills/mattpocock-to-issues/
      SKILL.md (94 LOC = 83 + 11)
  .claude/skills/mattpocock-to-prd/
      SKILL.md (87 LOC = 76 + 11)
  .claude/skills/mattpocock-setup-harness/
      SKILL.md (132 LOC = 121 + 11; disable-model-invocation: true)
      + 5 sidecars: domain / issue-tracker-github / issue-tracker-
        gitlab / issue-tracker-local / triage-labels (.md, verbatim)
  .claude/skills/mattpocock-handoff/
      SKILL.md (26 LOC = 15 + 11)
  .claude/skills/mattpocock-write-a-skill/
      SKILL.md (128 LOC = 117 + 11)
  .claude/skills/mattpocock-grill-me/
      SKILL.md (21 LOC = 10 + 11)

Total: 19 new files, ~860 LOC of upstream content + 88 LOC
attribution blocks. Each vendored SKILL.md carries upstream content
byte-for-byte plus an 11-line appended "## License + Attribution"
block referencing the upstream SHA + repo's THIRD_PARTY_NOTICES.md.
Sidecars (referenced via ./domain.md style links) vendored verbatim.

Skipped 10 upstream skills:
- caveman / scaffold-exercises / setup-pre-commit / migrate-to-
  shoehorn / git-guardrails-claude-code (TypeScript-specific or
  redundant with QuantRank's existing CI guardrails)
- grill-with-docs / improve-codebase-architecture / triage /
  prototype / zoom-out (lower-priority for current QR workflow)
- all in-progress/ deprecated/ personal/ entries

Registry pickup verified — 7 of 8 mattpocock skills appear in the
available-skills list (mattpocock-diagnose / -tdd / -to-issues /
-to-prd / -handoff / -write-a-skill / -grill-me); mattpocock-setup-
harness has upstream `disable-model-invocation: true` (user-invoked
only, not model-invoked).

Task B — Remove 11 unused skills
---------------------------------
QuantRank is a static-site finance dashboard — Office docs / Slack
GIFs / art-generation / branded-design tooling don't apply. Deleted:

  algorithmic-art          (p5.js generative art)
  brand-guidelines         (Anthropic brand colors)
  canvas-design            (poster / PDF visual art, 5.6 MB of fonts)
  docx                     (Word document tooling)
  internal-comms           (corporate status reports)
  pdf                      (PDF form filling / OCR)
  pptx                     (PowerPoint deck generation)
  slack-gif-creator        (Slack-optimized animated GIFs)
  theme-factory            (artifact theme presets)
  web-artifacts-builder    (claude.ai shadcn artifact builder)
  xlsx                     (Excel spreadsheet tooling)

Total: 306 files deleted (~80,000 LOC dropped, dominated by
embedded Office XSD schemas, fonts, and validators). Reduces
clone size by ~10 MB.

Kept (still relevant for QuantRank work):
- mcp-builder (Phase 5 ML may surface an MCP server)
- claude-api (Phase 5 ML SDK work)
- skill-creator (maintainer-only)
- webapp-testing (Playwright Section I verification)
- frontend-design + frontend-design-system (UI work)
- doc-coauthoring (PR descriptions, plans)

Task C — Docs lockstep
-----------------------
- CLAUDE.md row 33: skill count "24 invocation-triggerable
  skills (7 QuantRank + 17 Anthropic vendored)" → "31
  invocation-triggerable skills (12 QuantRank operational + 4
  QR-origin portable + 6 Anthropic vendored + 9 external MIT
  vendored — Karpathy + 8 mattpocock)"
- THIRD_PARTY_NOTICES.md: new "mattpocock-skills" section appended
  after the existing karpathy-guidelines section. Carries source
  URL, license, upstream SHA, vendored-skill list, full MIT
  License text verbatim (Copyright (c) 2026 Matt Pocock per
  upstream LICENSE).

Verification ladder
-------------------
- ruff check . → All checks passed
- python -m compute.output.schema_check → Schema snapshot in sync
- python tools/check_doc_test_counts.py → exit 0
- python tools/check_branch_collisions.py "skill" "mattpocock" →
  3 historical false positives (PRs #110/#112/#114 — JKP/OSAP
  scouts whose commit messages contained "skill"; unrelated)
- pytest tests/ -m "not network" → not run locally (sandbox missing
  pandas); CI will verify. Changes are docs/skills-only — zero
  Python source touched.
- Skill registry pickup verified via session reload — 7 of 8
  mattpocock-* + all 11 removed skills no longer appear; the
  remaining mattpocock-setup-harness is correctly hidden by
  its upstream `disable-model-invocation: true` frontmatter.

Constraints honored
-------------------
- No touch to compute/ / frontend/ / tests/
- No touch to PHASE_STATUS.md / WORKFLOW.md (out of scope)
- mattpocock SKILL.md content preserved byte-for-byte; only
  the 11-line attribution block appended below upstream content
- Sidecars vendored verbatim (referenced by SKILL.md via
  ./<sidecar>.md links — links continue to resolve in the
  vendored layout)
- No push to main; no force-push; no --no-verify
- No workflow_dispatch trigger

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 20, 2026
…4 staleness (#139)

Closes #133. Docs/skills-only PR.

Task A — Portable skills library final 2 (closes #133)
------------------------------------------------------
Extracts the last 2 deferred-but-tracked patterns from epic #125:

- .claude/skills/portable-annotate-before-veto/SKILL.md (108 LOC):
  Progressive-rollout pattern for defense / risk flags. Ship as
  annotate FIRST, promote to veto only after ≥ 1 production cron of
  observation + threshold calibration + cohort-acceptance check.
  Forcing precedent: Phase 4.5 cluster (loss_avoidance_pattern at 0%
  fire rate would've been a no-op or hotfix candidate as a veto;
  annotate made it observable).

- .claude/skills/portable-graceful-degradation-try-except/SKILL.md
  (115 LOC): Wrap every external-data integration call site in a
  try/except that sets ALL related output fields to None on failure
  + writes a structured log line + sets a per-integration status
  Metadata field. 3-rule contract: no partial state, no log
  swallowing, downstream-aware. Forcing precedent: OSAP integration
  in compute/main.py (PRs #112#118#124).

Both skills follow the established portable-* convention from PR
#132 (YAML frontmatter + Pattern + Trigger + Skip + QuantRank
precedent section). Each pattern section is project-agnostic;
QuantRank refs confined to the labeled "QuantRank precedent"
sections at the bottom.

Task B — PHASE_STATUS.md row 4 staleness fix
---------------------------------------------
PHASE_STATUS.md row 4 said "Phase 4h.2 Part 2 in flight in this PR"
since PR #124's prep work. PR #124 merged 2026-05-19 (commit
sequence visible in main: ...124...118...112...). Updated to
"Phase 4h.2 Part 2 merged via PR #124 (2026-05-19)" — the rest of
the row 4 text (multi-port OSAP adapter description, IC-decay
deferral note) stays unchanged.

This was flagged in PR #132 body and tracked as a small follow-up.
No other PHASE_STATUS.md edits — row 4 is the only stale entry.

Task C — Docs lockstep
-----------------------
CLAUDE.md row 33 skill count: 35 → 37 (QR-origin portable category
4 → 6, total reflects the 2 new skills landed here). Categorisation
unchanged otherwise; 9arm license-pending caveat still flagged with
cross-reference to issue #137.

Skill inventory after this PR (37 total)
-----------------------------------------
- QuantRank operational: 12
- QR-origin portable extract: 6 (was 4; +annotate-before-veto +
  graceful-degradation-try-except)
- Anthropic vendored: 6
- External MIT vendored: 9 (Karpathy + 8 mattpocock, unchanged)
- External license-pending vendored: 4 (9arm, unchanged)

Verification ladder
-------------------
- ruff check . → All checks passed
- python -m compute.output.schema_check → Schema snapshot in sync
- python tools/check_doc_test_counts.py → exit 0
- pytest tests/ -m "not network" → not run locally (sandbox missing
  pandas); CI will verify. Changes are docs/skills-only.
- Skill registry pickup verified via session reload — both
  portable-annotate-before-veto and
  portable-graceful-degradation-try-except register with full
  YAML-frontmatter descriptions.

Constraints honored
-------------------
- No touch to compute/ / frontend/ / tests/
- No touch to WORKFLOW.md (out of scope; could file a future
  follow-up if WORKFLOW.md needs to cross-reference the two new
  portable skills)
- No squash / amend of prior commits
- No push to main; no force-push; no --no-verify
- No workflow_dispatch trigger
- 2 new portable skills pattern descriptions are project-agnostic;
  QR refs only in labeled "precedent" sections

Epic #125 status after this PR
-------------------------------
- #130 (quarterly cohort-threshold review tracker) — recurring,
  unchanged
- #133 (portable skills library remaining) — CLOSED by this PR
- #137 (9arm-skills license clarification) — external action,
  waiting on user to file upstream issue at thananon/9arm-skills

Epic #125 Item 3 (Pre-merge production simulation) remains the
only substantive open scope. PHASE_STATUS.md row 4 staleness was
the last housekeeping task.

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 20, 2026
…PR A) (#141)

First PR in the multi-PR .md optimization sequence (Option D scope —
yกเครื่อง). PR A is the low-risk baseline: fixes 2 broken skill
frontmatters that prevent dispatch + drift-fixes 4 stale facts in
agent docs.

Critical YAML fix:
- branch-collision-check/SKILL.md and pr-quality-gate/SKILL.md had
  multi-line `description:` plain-scalar frontmatter that PyYAML
  (and Claude Code's skill loader) couldn't parse because lines
  contain `#123` / `#X` issue references after whitespace — YAML
  treats ` #` as a comment marker, so everything after the first
  comment-trigger got eaten and the loader fell back to displaying
  `name: name` in the available-skills list. Both skills were
  effectively undispatchable from any session.
- Fix: change `description:` to `description: >` (folded block
  scalar) so newlines become spaces and `#` mid-content is treated
  as literal text. Verified live in this session — system reminder
  now shows the full TRIGGER/SKIP descriptions for both.

Stale-fact pass:
- .claude/skills/README.md L14-16: "27 invocation-triggerable
  skills" → references CLAUDE.md as the canonical count (38) to
  prevent future drift. Future top-level skill add/remove only
  needs to bump CLAUDE.md §Layout, not three files.
- AGENTS.md L104: ".claude/skills/ # 24 loaded skills" → 38.
- AGENTS.md L287: "Schema version: 0.8.0-phase4.5f" → 0.9.2-phase4h.2
  (3 versions behind). Now references SKILL.md schema-version
  table for full history.
- CLAUDE.md L181-192 (§Phase status): "Current schema 0.9.1-phase4h.2
  ... Phase 4h in flight in PR #112" → 0.9.2-phase4h.2 + Phase 4h
  shipped (Parts 1+2 done via #112/#118/#124).
- CLAUDE.md + AGENTS.md §Phase status: "Epic #125 Item 3 in flight
  via PR #140" → "PR 1 of 2 shipped" at commit a52aa2d; PR 2
  remaining.

CLAUDE.md + AGENTS.md edit ships per the lockstep convention. No
code touched, no schema touched — pre-merge-prod-sim.yml won't
trigger (paths compute/scoring + compute/features unaffected).

Next in optimization sequence: PR B (CLAUDE.md token diet) — TBD
after user reviews this one.

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants