Skip to content

test(features): Add Hypothesis property-based tests for data-shape invariants (#126)#127

Merged
dackclup merged 1 commit into
mainfrom
process-hygiene-1-hypothesis-property-tests
May 20, 2026
Merged

test(features): Add Hypothesis property-based tests for data-shape invariants (#126)#127
dackclup merged 1 commit into
mainfrom
process-hygiene-1-hypothesis-property-tests

Conversation

@dackclup
Copy link
Copy Markdown
Owner

Closes #126.

Summary

Process Hygiene Item #1 (parent epic #125). Adds Hypothesis property-based tests as the new defense line for "untested data-shape assumption" bugs — the class that hid the OSAP quintile/tercile silent-drop in PR #112's CI until production cron diagnostics caught it (subsequently fixed in PR #124 / Phase 4h.2 Part 2).

If a @given property over port_count ∈ {2, 3, 5, 10} had existed in Phase 4h, the hardcoded port=10 filter would have been falsified the first time CI ran.

Test-addition only. No scoring / feature behavior touched. No schema delta. No CI workflow changes.

Test count before / after

Offline tests
Before (main @ 80c6641e, post Phase 4h.2 Part 2) 945
After this PR 959 (+14 property tests)

Property tests landed (14)

Sub-task 2 — osap_replicate.py (7 tests, 394 LOC)

# Property test Catches
1 test_compute_long_short_returns_handles_any_port_cardinality The headline. For port_count ∈ [2, 10], adapter produces LS rows. Would have caught PR #112's bug.
2 test_signals_dropped_no_long_short_returns_sorted_unique Metadata field contract drift
3 test_normalize_port_label_int_input_yields_2char_zfill int port idempotence + zfill
4 test_normalize_port_label_str_input_yields_2char_zfill mixed '1' / '01' / '10' → uniform width
5 test_part2_accounting_invariant_under_random_partition accounting equation manifest = missing + dropped + gated + used holds for any partition
6 test_coverage_by_signal_returns_pct_in_0_to_100 0..100 percent (NOT 0..1 fraction) confusion
7 test_rank_signals_cross_sectional_returns_unit_interval ranks ∈ (0, 1]

Sub-task 3 — scoring transforms (7 tests, 340 LOC)

# Property test Module Catches
A test_compute_composite_output_bounded_0_to_100 composite writer + Pydantic contract
B test_compute_composite_all_50_inputs_yield_composite_50 composite accidental weight-vector drift
C test_compute_composite_neutralize_missing_imputes_nan_to_50 composite NaN imputation regression
D test_compute_composite_constant_input_equals_input composite PHASE3_WEIGHTS sum-to-1 invariant
E test_apply_osap_blend_output_bounded_and_nan_passthrough osap_blend bound + NaN passthrough + interior-point property
F test_aggregate_osap_signals_finite_values_in_0_to_100 osap_blend rank × 100 multiplication
G test_apply_osap_blend_weight_zero_is_identity_on_composite osap_blend Rule 16: weight=0 leaves composite unchanged (Phase 4h observability-only lock)

Sub-task 4 — CI integration + docs

  • .gitignore — already covers .hypothesis/ at L50 (Python default). No edit needed.
  • CLAUDE.md ## Gotchas — 1-line note that Hypothesis is the new defense line, with the @settings(deadline=None) anti-pattern flagged.
  • CI flaky behaviour — default profile makes flaky examples fail-fast (no retry); pytest -m "not network" inherits this. No @settings(deadline=None) used in any of the 14 properties.

Sanity verification (NOT committed)

Temporarily reverted compute/features/osap_replicate.py:143 (agg(["min", "max"])agg(["min", "min"])) and confirmed Property 1 fails with:

Falsifying example: test_compute_long_short_returns_handles_any_port_cardinality(
    port_count=2,
    n_dates=1,
)

Reverted the break before commit. Working tree matches main except for the 4 staged files.

No regression discovered

Property tests passed on first execution against current main (Phase 4h.2 Part 2 already merged). The fact that nothing falsified in 14 properties × ~100 examples is itself a quality signal — the multi-port adapter handles the [2, 10] cardinality region cleanly, the composite weight invariant holds, and the OSAP blend domain contract isn't violated under any (composite, osap, weight) triple in the unit interval.

Constraints honored

  • ✅ NO modification to compute_composite / PHASE3_WEIGHTS sum=1.0 invariant — pure test-addition PR
  • ✅ Rule 16: Top-5 still ranks raw composite_score; no scoring touched
  • ✅ No push to main; no force-push; no --no-verify
  • ✅ No workflow_dispatch trigger (compute-rankings.yml untouched)
  • ✅ Schema triple untouched (no schemas.py / types.ts changes)
  • ✅ NO @settings(deadline=None) — default deterministic deadline
  • ✅ NO RuleBasedStateMachine (out of scope per issue Process hygiene #1 — Add Hypothesis property-based tests for data-shape invariants #126)

Files (4 changed, +747 / 0)

File Change
pyproject.toml hypothesis>=6.92 in [dev] (+6)
CLAUDE.md ## Gotchas note (+7)
tests/test_features/test_osap_replicate_properties.py NEW — 7 property tests (+394)
tests/test_scoring/test_transforms_properties.py NEW — 7 property tests (+340)

Test plan

  • ruff check . → All checks passed
  • python -m pytest tests/ -m "not network" → 959 passed (1m46s)
  • python -m pytest tests/test_features/test_osap_replicate_properties.py tests/test_scoring/test_transforms_properties.py → 14 passed (5s)
  • python -m compute.output.schema_check → in sync (no schema delta)
  • Sanity break-revert confirmed property test catches a regression (Falsifying example: port_count=2, n_dates=1)
  • CI green on process-hygiene-1-hypothesis-property-tests
  • User audit + Mark-Ready authorization

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU


Generated by Claude Code

…variants (#126)

Closes #126.

Process Hygiene Item #1 (parent epic #125). Adds Hypothesis property-
based tests as the new defense line for "untested data-shape
assumption" bugs — the class that hid the OSAP quintile/tercile
silent-drop in PR #112's CI until production cron diagnostics caught
it (subsequently fixed in PR #124 / Phase 4h.2 Part 2). If a `@given`
property over `port_count ∈ {2,3,5,10}` had existed in Phase 4h, the
hardcoded `port=10` filter would have been falsified the first time
the CI ran.

Test-addition only. No scoring / feature behavior touched. No schema
delta. No CI workflow changes.

Sub-task 1 — Hypothesis added to [dev] extra (pyproject.toml)
--------------------------------------------------------------
`hypothesis>=6.92` joins `pytest` + `ruff` in the `[dev]` optional
extra. Pure-Python dep (no C extensions); CI footprint negligible.

Sub-task 2 — Property tests for osap_replicate.py (7 tests, 394 LOC)
---------------------------------------------------------------------
New file: tests/test_features/test_osap_replicate_properties.py

7 property tests covering data-shape invariants the Phase 4h.2 Part 2
multi-port adapter must satisfy:

1. `test_compute_long_short_returns_handles_any_port_cardinality` —
   for port_count ∈ [2, 10] and n_dates ∈ [1, 12], the adapter
   produces exactly n_dates LS rows with ls_return == port_count - 1.
   THE headline property — would have caught the PR #112 bug.

2. `test_signals_dropped_no_long_short_returns_sorted_unique` —
   contract for the Metadata.osap_signals_dropped_no_long_short
   field: sorted, no duplicates, single-port signals appear,
   two-port signals don't.

3. `test_normalize_port_label_int_input_yields_2char_zfill` —
   port=int(1..10) → '01'..'10' for any input list. Idempotent.

4. `test_normalize_port_label_str_input_yields_2char_zfill` —
   mixed '1' / '01' / '10' inputs normalize to a uniform 2-char width.

5. `test_part2_accounting_invariant_under_random_partition` —
   the Phase 4h.2 Part 2 accounting equation
   (manifest = missing + dropped + gated + used) holds for any
   3-way partition of a synthetic manifest into the bucket set.
   Uses st.composite to draw disjoint partitions.

6. `test_coverage_by_signal_returns_pct_in_0_to_100` — domain
   contract for the coverage helper (0..100 percent, NOT 0..1 fraction).

7. `test_rank_signals_cross_sectional_returns_unit_interval` —
   ranks live in (0, 1] for any non-empty cross-section.

Sub-task 3 — Property tests for scoring transforms (7 tests, 340 LOC)
---------------------------------------------------------------------
New file: tests/test_scoring/test_transforms_properties.py

7 property tests covering composite (compute/scoring/composite.py)
and OSAP blend (compute/scoring/osap_blend.py) — pure-numeric
transforms whose output domains are contract-locked by the
downstream Pydantic + TypeScript schemas.

Composite tests (4):
  A. `test_compute_composite_output_bounded_0_to_100` — for any
     pillar input in [0, 100], composite ∈ [0, 100] (the writer +
     Pydantic contract)
  B. `test_compute_composite_all_50_inputs_yield_composite_50` —
     neutral-pillar input collapses to composite == 50 (catches
     accidental weight-vector drift)
  C. `test_compute_composite_neutralize_missing_imputes_nan_to_50` —
     NaN pillar inputs are imputed when neutralize_missing=True;
     all-NaN → composite == 50.0
  D. `test_compute_composite_constant_input_equals_input` —
     constant-pillar input → composite == that constant (PHASE3
     weight-sum-equals-1.0 invariant expressed as a property)

OSAP blend tests (3):
  E. `test_apply_osap_blend_output_bounded_and_nan_passthrough` —
     blend ∈ [0, 100]; NaN OSAP → composite passthrough; finite OSAP
     → interior point between composite and osap
  F. `test_aggregate_osap_signals_finite_values_in_0_to_100` —
     finite aggregate values live in [0, 100]; NaN allowed for
     universe gaps
  G. `test_apply_osap_blend_weight_zero_is_identity_on_composite` —
     weight=0 leaves composite unchanged (locks the Phase 4h
     observability-only design property + Rule 16: Top-5 still
     ranks raw composite)

Sub-task 4 — CI integration + .gitignore + docs
-------------------------------------------------
- `.gitignore` already covers `.hypothesis/` at line 50 (Python's
  default boilerplate) — no edit needed.
- CLAUDE.md ## Gotchas — 1-line note that Hypothesis is the new
  defense line for data-shape bugs (paired with example tests), with
  the `@settings(deadline=None)` anti-pattern flagged.
- CI hypothesis.errors.Flaky behaviour: default profile makes flaky
  examples fail-fast (no retry); the `pytest -m "not network"` CI
  invocation inherits this. NO `@settings(deadline=None)` used in
  this PR — slow examples surface as honest failures.

Sanity verification (NOT committed)
-----------------------------------
As part of pre-push verification I temporarily reverted the multi-
port adapter at compute/features/osap_replicate.py:143
(`agg(["min", "max"])` → `agg(["min", "min"])`) and confirmed
`test_compute_long_short_returns_handles_any_port_cardinality`
fails with "Falsifying example: port_count=2, n_dates=1". Reverted
the break before commit.

Constraints honored
-------------------
- NO modification to compute_composite() / PHASE3_WEIGHTS sum=1.0
  invariant (composite.py:43-45) — pure test-addition PR
- Rule 16: Top-5 still ranks raw composite_score; no scoring touched
- No push to main; no force-push; no --no-verify
- No workflow_dispatch trigger (compute-rankings.yml untouched)
- Schema triple untouched (no schemas.py / types.ts changes)
- NO @settings(deadline=None) — default deterministic deadline
- NO RuleBasedStateMachine (out of scope per issue #126)

Test count delta
----------------
Before: 945 passed (Phase 4h.2 Part 2 baseline)
After:  959 passed (+14 property tests across 2 new files)

Files (4 changed, +747 / 0)
----------------------------
- pyproject.toml — +6 (hypothesis>=6.92 in [dev])
- CLAUDE.md — +7 (## Gotchas note)
- tests/test_features/test_osap_replicate_properties.py — +394 NEW
- tests/test_scoring/test_transforms_properties.py — +340 NEW

Verification ladder all green
------------------------------
- ruff check . → All checks passed
- python -m pytest tests/ -m "not network" → 959 passed (1m46s)
- python -m pytest tests/test_features/test_osap_replicate_properties.py
  tests/test_scoring/test_transforms_properties.py → 14 passed (5s)
- python -m compute.output.schema_check → in sync (no schema delta)
- Sanity break-revert confirmed property test catches a regression

No regression discovered
------------------------
Property tests passed on first execution against current main
(commit 80c6641, Phase 4h.2 Part 2 already merged). No hidden bugs
surfaced beyond the 56-signal gap that PR #124 already fixed —
which itself is a good signal that the multi-port adapter handles
the [2, 10] cardinality region cleanly.

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU
@vercel
Copy link
Copy Markdown

vercel Bot commented May 20, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
quantrank Ready Ready Preview, Comment May 20, 2026 1:25am

@dackclup dackclup marked this pull request as ready for review May 20, 2026 02:17
@dackclup dackclup merged commit 780650f into main May 20, 2026
4 checks passed
@dackclup dackclup deleted the process-hygiene-1-hypothesis-property-tests branch May 20, 2026 02:17
dackclup added a commit that referenced this pull request May 20, 2026
Part of epic #125 (Item #6 of 6). Pure tooling addition — no
runtime / scoring / schema impact.

Motivation
----------
PR #123 (2026-05-19, closed without merging): a worker session
opened a Phase 4j + 4k scout duplicate on branch
`claude/resume-quantrank-phase-4.5-Zh0pO` while the main session
shipped the same work directly via PRs #119 (Qlib) + #121 (IPCA).
Root cause: the worker session never inspected the `claude/*`
branch list + recent PRs before writing code, producing 100%
wasted effort.

This change ships a preflight check that surfaces in-flight scope
BEFORE any code is written, so the duplicate-PR failure mode is
caught at the handoff-prompt entry rather than at PR review.

Files (2 new, +271 LOC)
------------------------
- tools/check_branch_collisions.py (+149 LOC) — git-only preflight
  script. Lists active `claude/*` branches via `git ls-remote
  origin "refs/heads/claude/*"` and recent main-branch commits
  via `git log --since="48 hours ago" --oneline --no-merges
  origin/main`. Optional keyword args flag case-insensitive
  substring matches. Always exit 0 (informational only).

- .claude/skills/branch-collision-check/SKILL.md (+122 LOC) —
  skill description with YAML frontmatter, trigger conditions
  (handoff prompts, Phase / issue / Item #N mentions, fresh worker
  sessions), skip conditions (doc-only chores, iteration #2+,
  user-authorized parallel work), sample output (clean + warning),
  and output-interpretation guidance pointing the caller to STOP
  + ask the user when any ⚠️ line surfaces.

Design notes
------------
- Git-only data sources — no `gh` CLI / GitHub API auth required.
  Works in the QuantRank Claude Code Web sandbox where `gh` is
  unavailable, and on any contributor machine with bare git.
- 48-hour window — matches typical worker ↔ main session handoff
  cadence; long enough to catch duplicate work, short enough to
  keep the output scannable.
- Pure read-only — no destructive git ops, no branch creation,
  no push, no GitHub API mutation. Always returns exit 0; the
  caller decides whether to proceed.

Verification ladder all green
------------------------------
- ruff check . → All checks passed
- python tools/check_branch_collisions.py → lists 1 active
  claude/* branch + 16 recent commits (last 48h), exit 0
- python tools/check_branch_collisions.py "Alpha158" → fires
  ⚠️  on PR #119 commit "Alpha158 158-feature manifest", summary
  reports "1 potential scope collision(s) found", exit 0
- python tools/check_branch_collisions.py "Phase 99 nonsense" →
  no match, summary reports "No scope collisions detected",
  exit 0
- python tools/check_doc_test_counts.py → exit 0 (Item #2 guard
  still passes; new files don't introduce hardcoded counts)
- python -m compute.output.schema_check → in sync (no schema touch)
- python -m pytest tests/ -m "not network" → 959 passed
  (unchanged; tools/ + .claude/skills/ aren't imported by tests)
- SKILL.md YAML frontmatter parses — confirmed via Claude Code's
  skill registry picking it up at module load

Constraints honored
-------------------
- No touch to compute/ / frontend/ / tests/ — tools/ +
  .claude/skills/ only
- No network calls / no GitHub API auth — git remote ls + git log
- No destructive actions — read-only preflight check
- No push to main; no force-push; no --no-verify
- No workflow_dispatch trigger (compute-rankings.yml untouched)

Epic #125 status after this PR
-------------------------------
Item #1 ✅ Hypothesis property tests (PR #127)
Item #2 ✅ Strip hardcoded test counts + CI guard (PR #128)
Item #4 ✅ Observability-before-wiring pattern (PR #129)
Item #6 ✅ Branch-collision preflight (this PR)
Items #3, #5 remain — separate PRs per epic decomposition.

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 20, 2026
…ble skills (#132)

3-task housekeeping + tacit knowledge harvest. Docs/skills-only PR —
no code, no schema delta, no test additions.

Task A — SKILL.md schema-version table fixes
---------------------------------------------
Two stale "in flight" entries flipped to merged + 1 new row inserted:

- Row 0.9.0-phase4h: "(in flight in PR #112)" → "(PR #112 merged
  2026-05-19)"
- Row 0.9.1-phase4h.2: "(in flight in PR #<NEXT>)" → "(PR #118 merged
  2026-05-19)"
- NEW row 0.9.2-phase4h.2 (above 0.9.1) — PR #124 merged, multi-port
  OSAP adapter + osap_signals_dropped_no_long_short field, closing
  the 100-signal accounting equation; DSR sign-inversion deferred to
  Part 3

PHASE_STATUS.md row 4 ALSO has "Phase 4h.2 Part 2 in flight in this
PR" staleness — confirmed via grep but DELIBERATELY not updated here
per Task A explicit scope (SKILL.md only). Recommend a follow-up
phase-status-bump PR after this lands.

Task B — New worker-session-handoff skill
------------------------------------------
.claude/skills/worker-session-handoff/SKILL.md (+163 LOC). YAML
frontmatter + 5 sections:

- When to use vs inline (≤50 LOC single-file → inline; ≥2 files /
  new dep / code logic → handoff)
- Constraint lock library (8 standard locks: composite/PHASE3,
  Rule 16, Rule 18, no-merge, no force-push, no --no-verify,
  no workflow_dispatch, schema triple)
- Anti-pattern: paste-loop avoidance (single outer code-block
  fence; reference PR #123 as related-but-distinct paste-loop
  failure mode)
- Template (paste-ready, single ```` outer code block with
  language tag ` text` so inner triple-backticks pass through)
- Reference invocations + QuantRank precedents (PR #124, #127, #131)

Codifies the handoff shape that appeared verbatim across PRs #123,
#124, #127, #128, #129, #131 — user copies ONE block instead of
editing 5 template snippets per handoff.

Task C — Portable skills library (4 skills, +417 LOC)
-----------------------------------------------------
Audit step (per spec): read CLAUDE.md + AGENTS.md + SKILL.md +
WORKFLOW.md + PR descriptions of #112/#118/#124/#127/#128/#129/#131.
Identified 7 candidate patterns; classified by portability:

- ✅ scout-then-integrate (portable; vendoring pattern, no QR logic)
- ✅ observability-before-wiring (portable; gate-diagnostic pattern)
- ✅ drift-detector-manifest (portable; API surface lock pattern)
- ✅ schema-triple-lockstep (portable; Python/TS JSON contract)
- 🟡 annotate-before-veto (portable; progressive rollout — DEFERRED
   to follow-up issue, lower value vs the 4 shipped)
- 🟡 pre-plan-investigations (subsumed by scout-then-integrate's
   Phase 1 § "Pre-plan investigations" — no separate skill needed)
- 🟡 graceful-degradation-try-except (portable; error-handling
   pattern — DEFERRED to follow-up issue, the wrapper is generally
   1-line so doesn't warrant a dedicated skill)

4 shipped (each ≤ 109 LOC):
  .claude/skills/portable-scout-then-integrate/SKILL.md (99 LOC)
  .claude/skills/portable-drift-detector-manifest/SKILL.md (109 LOC)
  .claude/skills/portable-schema-triple-lockstep/SKILL.md (103 LOC)
  .claude/skills/portable-observability-before-wiring/SKILL.md (106 LOC)

Flat naming convention (`portable-<name>/SKILL.md` at depth 1 from
`.claude/skills/`) because Claude Code's skill registry doesn't
recurse into nested subdirectories per CLAUDE.md ## Conventions.
Confirmed via session reload — all 4 portable + worker-session-
handoff registered correctly.

Each portable skill has:
- YAML frontmatter (name + description + TRIGGER + SKIP)
- ## Pattern section (generic, no QR business logic)
- ## Trigger conditions + ## Skip conditions
- ## QuantRank precedent (1 paragraph, clearly labeled as precedent
  not pattern definition)

Task C constraint check:
- All portable skills core pattern descriptions are project-
  agnostic (read `.claude/skills/portable-*/SKILL.md` ## Pattern
  sections — zero references to OSAP / IPCA / pillar / Top-5
  inside the pattern body; only inside the labeled "QuantRank
  precedent" section at the bottom)
- 3 of 4 portable skills are 103-109 LOC (slightly over the
  100-LOC target — pattern + trigger + skip + precedent sections
  require ~25 LOC each, leaving ~25 LOC of unavoidable scaffold).
  The 99-LOC one (scout-then-integrate) shows the cap is achievable
  but tight.

Files (6 changed, +580 LOC, no deletions)
------------------------------------------
- SKILL.md — schema-version table fixes (Task A)
- 5 new SKILL.md files in .claude/skills/ (Tasks B + C)

Verification ladder all green
------------------------------
- ruff check . → All checks passed
- python tools/check_doc_test_counts.py → exit 0
- python tools/check_branch_collisions.py "skill" "portable" →
  expected ⚠️ on #131 (own adjacent work, not a duplicate)
- python -m compute.output.schema_check → in sync (no schema touch)
- python -m pytest tests/ -m "not network" → 959 passed
  (unchanged; tools/ + .claude/skills/ aren't imported by tests)
- Claude Code skill registry pick-up verified via session reload —
  all 5 new skills (worker-session-handoff + 4 portable-*) appear
  in the available-skills list

Constraints honored
-------------------
- No touch to compute/ / frontend/ / tests/
- No touch to PHASE_STATUS.md / WORKFLOW.md (Task A scope =
  SKILL.md only; PHASE_STATUS.md staleness flagged for follow-up)
- No push to main; no force-push; no --no-verify
- No workflow_dispatch trigger
- Task C portable skills are project-agnostic in their pattern
  description (QR refs confined to labeled "precedent" sections)

Follow-up issue (to file post-merge)
------------------------------------
Title: "Portable Skills Library — extract remaining tacit patterns"
- annotate-before-veto (progressive rule rollout)
- graceful-degradation-try-except (1-line wrapper guidance)
- pre-plan-investigations as standalone (currently subsumed)
- Anything else surfaced by future PR descriptions

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 20, 2026
…sk C.1 recovery) (#135)

* docs(skills): SKILL.md schema bump + worker-session-handoff + 4 portable skills

3-task housekeeping + tacit knowledge harvest. Docs/skills-only PR —
no code, no schema delta, no test additions.

Task A — SKILL.md schema-version table fixes
---------------------------------------------
Two stale "in flight" entries flipped to merged + 1 new row inserted:

- Row 0.9.0-phase4h: "(in flight in PR #112)" → "(PR #112 merged
  2026-05-19)"
- Row 0.9.1-phase4h.2: "(in flight in PR #<NEXT>)" → "(PR #118 merged
  2026-05-19)"
- NEW row 0.9.2-phase4h.2 (above 0.9.1) — PR #124 merged, multi-port
  OSAP adapter + osap_signals_dropped_no_long_short field, closing
  the 100-signal accounting equation; DSR sign-inversion deferred to
  Part 3

PHASE_STATUS.md row 4 ALSO has "Phase 4h.2 Part 2 in flight in this
PR" staleness — confirmed via grep but DELIBERATELY not updated here
per Task A explicit scope (SKILL.md only). Recommend a follow-up
phase-status-bump PR after this lands.

Task B — New worker-session-handoff skill
------------------------------------------
.claude/skills/worker-session-handoff/SKILL.md (+163 LOC). YAML
frontmatter + 5 sections:

- When to use vs inline (≤50 LOC single-file → inline; ≥2 files /
  new dep / code logic → handoff)
- Constraint lock library (8 standard locks: composite/PHASE3,
  Rule 16, Rule 18, no-merge, no force-push, no --no-verify,
  no workflow_dispatch, schema triple)
- Anti-pattern: paste-loop avoidance (single outer code-block
  fence; reference PR #123 as related-but-distinct paste-loop
  failure mode)
- Template (paste-ready, single ```` outer code block with
  language tag ` text` so inner triple-backticks pass through)
- Reference invocations + QuantRank precedents (PR #124, #127, #131)

Codifies the handoff shape that appeared verbatim across PRs #123,
#124, #127, #128, #129, #131 — user copies ONE block instead of
editing 5 template snippets per handoff.

Task C — Portable skills library (4 skills, +417 LOC)
-----------------------------------------------------
Audit step (per spec): read CLAUDE.md + AGENTS.md + SKILL.md +
WORKFLOW.md + PR descriptions of #112/#118/#124/#127/#128/#129/#131.
Identified 7 candidate patterns; classified by portability:

- ✅ scout-then-integrate (portable; vendoring pattern, no QR logic)
- ✅ observability-before-wiring (portable; gate-diagnostic pattern)
- ✅ drift-detector-manifest (portable; API surface lock pattern)
- ✅ schema-triple-lockstep (portable; Python/TS JSON contract)
- 🟡 annotate-before-veto (portable; progressive rollout — DEFERRED
   to follow-up issue, lower value vs the 4 shipped)
- 🟡 pre-plan-investigations (subsumed by scout-then-integrate's
   Phase 1 § "Pre-plan investigations" — no separate skill needed)
- 🟡 graceful-degradation-try-except (portable; error-handling
   pattern — DEFERRED to follow-up issue, the wrapper is generally
   1-line so doesn't warrant a dedicated skill)

4 shipped (each ≤ 109 LOC):
  .claude/skills/portable-scout-then-integrate/SKILL.md (99 LOC)
  .claude/skills/portable-drift-detector-manifest/SKILL.md (109 LOC)
  .claude/skills/portable-schema-triple-lockstep/SKILL.md (103 LOC)
  .claude/skills/portable-observability-before-wiring/SKILL.md (106 LOC)

Flat naming convention (`portable-<name>/SKILL.md` at depth 1 from
`.claude/skills/`) because Claude Code's skill registry doesn't
recurse into nested subdirectories per CLAUDE.md ## Conventions.
Confirmed via session reload — all 4 portable + worker-session-
handoff registered correctly.

Each portable skill has:
- YAML frontmatter (name + description + TRIGGER + SKIP)
- ## Pattern section (generic, no QR business logic)
- ## Trigger conditions + ## Skip conditions
- ## QuantRank precedent (1 paragraph, clearly labeled as precedent
  not pattern definition)

Task C constraint check:
- All portable skills core pattern descriptions are project-
  agnostic (read `.claude/skills/portable-*/SKILL.md` ## Pattern
  sections — zero references to OSAP / IPCA / pillar / Top-5
  inside the pattern body; only inside the labeled "QuantRank
  precedent" section at the bottom)
- 3 of 4 portable skills are 103-109 LOC (slightly over the
  100-LOC target — pattern + trigger + skip + precedent sections
  require ~25 LOC each, leaving ~25 LOC of unavoidable scaffold).
  The 99-LOC one (scout-then-integrate) shows the cap is achievable
  but tight.

Files (6 changed, +580 LOC, no deletions)
------------------------------------------
- SKILL.md — schema-version table fixes (Task A)
- 5 new SKILL.md files in .claude/skills/ (Tasks B + C)

Verification ladder all green
------------------------------
- ruff check . → All checks passed
- python tools/check_doc_test_counts.py → exit 0
- python tools/check_branch_collisions.py "skill" "portable" →
  expected ⚠️ on #131 (own adjacent work, not a duplicate)
- python -m compute.output.schema_check → in sync (no schema touch)
- python -m pytest tests/ -m "not network" → 959 passed
  (unchanged; tools/ + .claude/skills/ aren't imported by tests)
- Claude Code skill registry pick-up verified via session reload —
  all 5 new skills (worker-session-handoff + 4 portable-*) appear
  in the available-skills list

Constraints honored
-------------------
- No touch to compute/ / frontend/ / tests/
- No touch to PHASE_STATUS.md / WORKFLOW.md (Task A scope =
  SKILL.md only; PHASE_STATUS.md staleness flagged for follow-up)
- No push to main; no force-push; no --no-verify
- No workflow_dispatch trigger
- Task C portable skills are project-agnostic in their pattern
  description (QR refs confined to labeled "precedent" sections)

Follow-up issue (to file post-merge)
------------------------------------
Title: "Portable Skills Library — extract remaining tacit patterns"
- annotate-before-veto (progressive rule rollout)
- graceful-degradation-try-except (1-line wrapper guidance)
- pre-plan-investigations as standalone (currently subsumed)
- Anything else surfaced by future PR descriptions

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU

* docs(skills): Vendor karpathy-guidelines (Task C.1 recovery) + THIRD_PARTY_NOTICES.md

Recovers Task C.1 from the original handoff that was silent-dropped in
the prior PR #132 commit (50da720). The handoff explicitly named
"Vendor karpathy-guidelines (1 skill, ~70 LOC)" as part of the portable
skills library; the auditor session caught the omission and authorized
this follow-up commit on the existing branch.

Files (2 new, +138 LOC)
------------------------
- .claude/skills/portable-karpathy-guidelines/SKILL.md (+82 LOC) —
  vendored content of upstream skills/karpathy-guidelines/SKILL.md
  (67 LOC, byte-for-byte preserved) + 15-line appended attribution
  block referencing the upstream source, commit SHA, and the
  Karpathy tweet that motivated the guidelines.

- THIRD_PARTY_NOTICES.md (+56 LOC, NEW at repo root) — third-party
  license disclosures. Section "karpathy-guidelines (Claude Code
  skill)" carries source URL, license declaration, vendored path,
  vendored date, upstream commit SHA, upstream first-commit date,
  and the full standard MIT License text with copyright attributed
  to "multica-ai contributors" (upstream has no individual copyright
  line and no standalone LICENSE file; the `license: MIT` claim
  appears in upstream README.md § License and each skill's YAML
  frontmatter).

Upstream provenance
-------------------
- Source: https://github.com/multica-ai/andrej-karpathy-skills
- Upstream HEAD SHA at vendoring: 2c606141936f1eeef17fa3043a72095b4765b9c2
- Upstream first commit: 2026-01-27
- Vendored date: 2026-05-20
- License: MIT (declared)

Verbatim content preserved
--------------------------
`diff /tmp/karpathy-src/skills/karpathy-guidelines/SKILL.md
.claude/skills/portable-karpathy-guidelines/SKILL.md` shows ONLY
the 15-line appended attribution block at lines 68-82. The upstream
67-line content (YAML frontmatter + "Karpathy Guidelines" heading +
the 4 principles) is byte-for-byte unchanged. Per the spec
constraint: "เก็บ 4 principles verbatim. แก้ได้แค่ 'เพิ่ม' attribution
block ท้ายไฟล์".

License-disclosure caveat
-------------------------
Upstream `multica-ai/andrej-karpathy-skills` declares MIT via README
+ YAML frontmatter but does NOT ship a standalone LICENSE file. The
`THIRD_PARTY_NOTICES.md` entry includes the standard MIT License
template with copyright attributed to the GitHub org ("multica-ai
contributors"), matching the principle that an MIT declaration
without a formal copyright line still licenses to the redistributor;
the attribution is conservative.

Verification ladder all green
------------------------------
- ruff check . → All checks passed
- python tools/check_doc_test_counts.py → exit 0 (no test-count
  drift introduced by this commit)
- python tools/check_branch_collisions.py "karpathy" → no scope
  collisions detected
- python -m compute.output.schema_check → in sync (no schema touch)
- python -m pytest tests/ -m "not network" → 959 passed (unchanged;
  .claude/skills/ + THIRD_PARTY_NOTICES.md aren't imported by tests)
- Skill registry pickup verified via session reload —
  `portable-karpathy-guidelines` appears in the available-skills list
  with the upstream description verbatim

Constraints honored
-------------------
- No squash / amend of the prior 50da720 commit — this is a fresh
  commit pushed on top of the existing branch (per spec
  "ห้าม squash old commit")
- No touch to the 4 already-shipped portable skills in 50da720
- No touch to compute/ / frontend/ / tests/
- No push to main; no force-push; no --no-verify
- No workflow_dispatch trigger
- Karpathy SKILL.md upstream content preserved verbatim; only the
  attribution block appended below the original content

PR description update will follow as a separate `gh pr edit` /
MCP `update_pull_request` call so the new "License Compliance"
section + the audit-table row for karpathy-guidelines land in the
PR body.

https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU

---------

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Process hygiene #1 — Add Hypothesis property-based tests for data-shape invariants

2 participants