test(features): Add Hypothesis property-based tests for data-shape invariants (#126)#127
Merged
Merged
Conversation
…variants (#126) Closes #126. Process Hygiene Item #1 (parent epic #125). Adds Hypothesis property- based tests as the new defense line for "untested data-shape assumption" bugs — the class that hid the OSAP quintile/tercile silent-drop in PR #112's CI until production cron diagnostics caught it (subsequently fixed in PR #124 / Phase 4h.2 Part 2). If a `@given` property over `port_count ∈ {2,3,5,10}` had existed in Phase 4h, the hardcoded `port=10` filter would have been falsified the first time the CI ran. Test-addition only. No scoring / feature behavior touched. No schema delta. No CI workflow changes. Sub-task 1 — Hypothesis added to [dev] extra (pyproject.toml) -------------------------------------------------------------- `hypothesis>=6.92` joins `pytest` + `ruff` in the `[dev]` optional extra. Pure-Python dep (no C extensions); CI footprint negligible. Sub-task 2 — Property tests for osap_replicate.py (7 tests, 394 LOC) --------------------------------------------------------------------- New file: tests/test_features/test_osap_replicate_properties.py 7 property tests covering data-shape invariants the Phase 4h.2 Part 2 multi-port adapter must satisfy: 1. `test_compute_long_short_returns_handles_any_port_cardinality` — for port_count ∈ [2, 10] and n_dates ∈ [1, 12], the adapter produces exactly n_dates LS rows with ls_return == port_count - 1. THE headline property — would have caught the PR #112 bug. 2. `test_signals_dropped_no_long_short_returns_sorted_unique` — contract for the Metadata.osap_signals_dropped_no_long_short field: sorted, no duplicates, single-port signals appear, two-port signals don't. 3. `test_normalize_port_label_int_input_yields_2char_zfill` — port=int(1..10) → '01'..'10' for any input list. Idempotent. 4. `test_normalize_port_label_str_input_yields_2char_zfill` — mixed '1' / '01' / '10' inputs normalize to a uniform 2-char width. 5. `test_part2_accounting_invariant_under_random_partition` — the Phase 4h.2 Part 2 accounting equation (manifest = missing + dropped + gated + used) holds for any 3-way partition of a synthetic manifest into the bucket set. Uses st.composite to draw disjoint partitions. 6. `test_coverage_by_signal_returns_pct_in_0_to_100` — domain contract for the coverage helper (0..100 percent, NOT 0..1 fraction). 7. `test_rank_signals_cross_sectional_returns_unit_interval` — ranks live in (0, 1] for any non-empty cross-section. Sub-task 3 — Property tests for scoring transforms (7 tests, 340 LOC) --------------------------------------------------------------------- New file: tests/test_scoring/test_transforms_properties.py 7 property tests covering composite (compute/scoring/composite.py) and OSAP blend (compute/scoring/osap_blend.py) — pure-numeric transforms whose output domains are contract-locked by the downstream Pydantic + TypeScript schemas. Composite tests (4): A. `test_compute_composite_output_bounded_0_to_100` — for any pillar input in [0, 100], composite ∈ [0, 100] (the writer + Pydantic contract) B. `test_compute_composite_all_50_inputs_yield_composite_50` — neutral-pillar input collapses to composite == 50 (catches accidental weight-vector drift) C. `test_compute_composite_neutralize_missing_imputes_nan_to_50` — NaN pillar inputs are imputed when neutralize_missing=True; all-NaN → composite == 50.0 D. `test_compute_composite_constant_input_equals_input` — constant-pillar input → composite == that constant (PHASE3 weight-sum-equals-1.0 invariant expressed as a property) OSAP blend tests (3): E. `test_apply_osap_blend_output_bounded_and_nan_passthrough` — blend ∈ [0, 100]; NaN OSAP → composite passthrough; finite OSAP → interior point between composite and osap F. `test_aggregate_osap_signals_finite_values_in_0_to_100` — finite aggregate values live in [0, 100]; NaN allowed for universe gaps G. `test_apply_osap_blend_weight_zero_is_identity_on_composite` — weight=0 leaves composite unchanged (locks the Phase 4h observability-only design property + Rule 16: Top-5 still ranks raw composite) Sub-task 4 — CI integration + .gitignore + docs ------------------------------------------------- - `.gitignore` already covers `.hypothesis/` at line 50 (Python's default boilerplate) — no edit needed. - CLAUDE.md ## Gotchas — 1-line note that Hypothesis is the new defense line for data-shape bugs (paired with example tests), with the `@settings(deadline=None)` anti-pattern flagged. - CI hypothesis.errors.Flaky behaviour: default profile makes flaky examples fail-fast (no retry); the `pytest -m "not network"` CI invocation inherits this. NO `@settings(deadline=None)` used in this PR — slow examples surface as honest failures. Sanity verification (NOT committed) ----------------------------------- As part of pre-push verification I temporarily reverted the multi- port adapter at compute/features/osap_replicate.py:143 (`agg(["min", "max"])` → `agg(["min", "min"])`) and confirmed `test_compute_long_short_returns_handles_any_port_cardinality` fails with "Falsifying example: port_count=2, n_dates=1". Reverted the break before commit. Constraints honored ------------------- - NO modification to compute_composite() / PHASE3_WEIGHTS sum=1.0 invariant (composite.py:43-45) — pure test-addition PR - Rule 16: Top-5 still ranks raw composite_score; no scoring touched - No push to main; no force-push; no --no-verify - No workflow_dispatch trigger (compute-rankings.yml untouched) - Schema triple untouched (no schemas.py / types.ts changes) - NO @settings(deadline=None) — default deterministic deadline - NO RuleBasedStateMachine (out of scope per issue #126) Test count delta ---------------- Before: 945 passed (Phase 4h.2 Part 2 baseline) After: 959 passed (+14 property tests across 2 new files) Files (4 changed, +747 / 0) ---------------------------- - pyproject.toml — +6 (hypothesis>=6.92 in [dev]) - CLAUDE.md — +7 (## Gotchas note) - tests/test_features/test_osap_replicate_properties.py — +394 NEW - tests/test_scoring/test_transforms_properties.py — +340 NEW Verification ladder all green ------------------------------ - ruff check . → All checks passed - python -m pytest tests/ -m "not network" → 959 passed (1m46s) - python -m pytest tests/test_features/test_osap_replicate_properties.py tests/test_scoring/test_transforms_properties.py → 14 passed (5s) - python -m compute.output.schema_check → in sync (no schema delta) - Sanity break-revert confirmed property test catches a regression No regression discovered ------------------------ Property tests passed on first execution against current main (commit 80c6641, Phase 4h.2 Part 2 already merged). No hidden bugs surfaced beyond the 56-signal gap that PR #124 already fixed — which itself is a good signal that the multi-port adapter handles the [2, 10] cardinality region cleanly. https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
10 tasks
dackclup
added a commit
that referenced
this pull request
May 20, 2026
Part of epic #125 (Item #6 of 6). Pure tooling addition — no runtime / scoring / schema impact. Motivation ---------- PR #123 (2026-05-19, closed without merging): a worker session opened a Phase 4j + 4k scout duplicate on branch `claude/resume-quantrank-phase-4.5-Zh0pO` while the main session shipped the same work directly via PRs #119 (Qlib) + #121 (IPCA). Root cause: the worker session never inspected the `claude/*` branch list + recent PRs before writing code, producing 100% wasted effort. This change ships a preflight check that surfaces in-flight scope BEFORE any code is written, so the duplicate-PR failure mode is caught at the handoff-prompt entry rather than at PR review. Files (2 new, +271 LOC) ------------------------ - tools/check_branch_collisions.py (+149 LOC) — git-only preflight script. Lists active `claude/*` branches via `git ls-remote origin "refs/heads/claude/*"` and recent main-branch commits via `git log --since="48 hours ago" --oneline --no-merges origin/main`. Optional keyword args flag case-insensitive substring matches. Always exit 0 (informational only). - .claude/skills/branch-collision-check/SKILL.md (+122 LOC) — skill description with YAML frontmatter, trigger conditions (handoff prompts, Phase / issue / Item #N mentions, fresh worker sessions), skip conditions (doc-only chores, iteration #2+, user-authorized parallel work), sample output (clean + warning), and output-interpretation guidance pointing the caller to STOP + ask the user when any⚠️ line surfaces. Design notes ------------ - Git-only data sources — no `gh` CLI / GitHub API auth required. Works in the QuantRank Claude Code Web sandbox where `gh` is unavailable, and on any contributor machine with bare git. - 48-hour window — matches typical worker ↔ main session handoff cadence; long enough to catch duplicate work, short enough to keep the output scannable. - Pure read-only — no destructive git ops, no branch creation, no push, no GitHub API mutation. Always returns exit 0; the caller decides whether to proceed. Verification ladder all green ------------------------------ - ruff check . → All checks passed - python tools/check_branch_collisions.py → lists 1 active claude/* branch + 16 recent commits (last 48h), exit 0 - python tools/check_branch_collisions.py "Alpha158" → fires⚠️ on PR #119 commit "Alpha158 158-feature manifest", summary reports "1 potential scope collision(s) found", exit 0 - python tools/check_branch_collisions.py "Phase 99 nonsense" → no match, summary reports "No scope collisions detected", exit 0 - python tools/check_doc_test_counts.py → exit 0 (Item #2 guard still passes; new files don't introduce hardcoded counts) - python -m compute.output.schema_check → in sync (no schema touch) - python -m pytest tests/ -m "not network" → 959 passed (unchanged; tools/ + .claude/skills/ aren't imported by tests) - SKILL.md YAML frontmatter parses — confirmed via Claude Code's skill registry picking it up at module load Constraints honored ------------------- - No touch to compute/ / frontend/ / tests/ — tools/ + .claude/skills/ only - No network calls / no GitHub API auth — git remote ls + git log - No destructive actions — read-only preflight check - No push to main; no force-push; no --no-verify - No workflow_dispatch trigger (compute-rankings.yml untouched) Epic #125 status after this PR ------------------------------- Item #1 ✅ Hypothesis property tests (PR #127) Item #2 ✅ Strip hardcoded test counts + CI guard (PR #128) Item #4 ✅ Observability-before-wiring pattern (PR #129) Item #6 ✅ Branch-collision preflight (this PR) Items #3, #5 remain — separate PRs per epic decomposition. https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU Co-authored-by: Claude <noreply@anthropic.com>
10 tasks
dackclup
added a commit
that referenced
this pull request
May 20, 2026
…ble skills (#132) 3-task housekeeping + tacit knowledge harvest. Docs/skills-only PR — no code, no schema delta, no test additions. Task A — SKILL.md schema-version table fixes --------------------------------------------- Two stale "in flight" entries flipped to merged + 1 new row inserted: - Row 0.9.0-phase4h: "(in flight in PR #112)" → "(PR #112 merged 2026-05-19)" - Row 0.9.1-phase4h.2: "(in flight in PR #<NEXT>)" → "(PR #118 merged 2026-05-19)" - NEW row 0.9.2-phase4h.2 (above 0.9.1) — PR #124 merged, multi-port OSAP adapter + osap_signals_dropped_no_long_short field, closing the 100-signal accounting equation; DSR sign-inversion deferred to Part 3 PHASE_STATUS.md row 4 ALSO has "Phase 4h.2 Part 2 in flight in this PR" staleness — confirmed via grep but DELIBERATELY not updated here per Task A explicit scope (SKILL.md only). Recommend a follow-up phase-status-bump PR after this lands. Task B — New worker-session-handoff skill ------------------------------------------ .claude/skills/worker-session-handoff/SKILL.md (+163 LOC). YAML frontmatter + 5 sections: - When to use vs inline (≤50 LOC single-file → inline; ≥2 files / new dep / code logic → handoff) - Constraint lock library (8 standard locks: composite/PHASE3, Rule 16, Rule 18, no-merge, no force-push, no --no-verify, no workflow_dispatch, schema triple) - Anti-pattern: paste-loop avoidance (single outer code-block fence; reference PR #123 as related-but-distinct paste-loop failure mode) - Template (paste-ready, single ```` outer code block with language tag ` text` so inner triple-backticks pass through) - Reference invocations + QuantRank precedents (PR #124, #127, #131) Codifies the handoff shape that appeared verbatim across PRs #123, #124, #127, #128, #129, #131 — user copies ONE block instead of editing 5 template snippets per handoff. Task C — Portable skills library (4 skills, +417 LOC) ----------------------------------------------------- Audit step (per spec): read CLAUDE.md + AGENTS.md + SKILL.md + WORKFLOW.md + PR descriptions of #112/#118/#124/#127/#128/#129/#131. Identified 7 candidate patterns; classified by portability: - ✅ scout-then-integrate (portable; vendoring pattern, no QR logic) - ✅ observability-before-wiring (portable; gate-diagnostic pattern) - ✅ drift-detector-manifest (portable; API surface lock pattern) - ✅ schema-triple-lockstep (portable; Python/TS JSON contract) - 🟡 annotate-before-veto (portable; progressive rollout — DEFERRED to follow-up issue, lower value vs the 4 shipped) - 🟡 pre-plan-investigations (subsumed by scout-then-integrate's Phase 1 § "Pre-plan investigations" — no separate skill needed) - 🟡 graceful-degradation-try-except (portable; error-handling pattern — DEFERRED to follow-up issue, the wrapper is generally 1-line so doesn't warrant a dedicated skill) 4 shipped (each ≤ 109 LOC): .claude/skills/portable-scout-then-integrate/SKILL.md (99 LOC) .claude/skills/portable-drift-detector-manifest/SKILL.md (109 LOC) .claude/skills/portable-schema-triple-lockstep/SKILL.md (103 LOC) .claude/skills/portable-observability-before-wiring/SKILL.md (106 LOC) Flat naming convention (`portable-<name>/SKILL.md` at depth 1 from `.claude/skills/`) because Claude Code's skill registry doesn't recurse into nested subdirectories per CLAUDE.md ## Conventions. Confirmed via session reload — all 4 portable + worker-session- handoff registered correctly. Each portable skill has: - YAML frontmatter (name + description + TRIGGER + SKIP) - ## Pattern section (generic, no QR business logic) - ## Trigger conditions + ## Skip conditions - ## QuantRank precedent (1 paragraph, clearly labeled as precedent not pattern definition) Task C constraint check: - All portable skills core pattern descriptions are project- agnostic (read `.claude/skills/portable-*/SKILL.md` ## Pattern sections — zero references to OSAP / IPCA / pillar / Top-5 inside the pattern body; only inside the labeled "QuantRank precedent" section at the bottom) - 3 of 4 portable skills are 103-109 LOC (slightly over the 100-LOC target — pattern + trigger + skip + precedent sections require ~25 LOC each, leaving ~25 LOC of unavoidable scaffold). The 99-LOC one (scout-then-integrate) shows the cap is achievable but tight. Files (6 changed, +580 LOC, no deletions) ------------------------------------------ - SKILL.md — schema-version table fixes (Task A) - 5 new SKILL.md files in .claude/skills/ (Tasks B + C) Verification ladder all green ------------------------------ - ruff check . → All checks passed - python tools/check_doc_test_counts.py → exit 0 - python tools/check_branch_collisions.py "skill" "portable" → expected⚠️ on #131 (own adjacent work, not a duplicate) - python -m compute.output.schema_check → in sync (no schema touch) - python -m pytest tests/ -m "not network" → 959 passed (unchanged; tools/ + .claude/skills/ aren't imported by tests) - Claude Code skill registry pick-up verified via session reload — all 5 new skills (worker-session-handoff + 4 portable-*) appear in the available-skills list Constraints honored ------------------- - No touch to compute/ / frontend/ / tests/ - No touch to PHASE_STATUS.md / WORKFLOW.md (Task A scope = SKILL.md only; PHASE_STATUS.md staleness flagged for follow-up) - No push to main; no force-push; no --no-verify - No workflow_dispatch trigger - Task C portable skills are project-agnostic in their pattern description (QR refs confined to labeled "precedent" sections) Follow-up issue (to file post-merge) ------------------------------------ Title: "Portable Skills Library — extract remaining tacit patterns" - annotate-before-veto (progressive rule rollout) - graceful-degradation-try-except (1-line wrapper guidance) - pre-plan-investigations as standalone (currently subsumed) - Anything else surfaced by future PR descriptions https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU Co-authored-by: Claude <noreply@anthropic.com>
dackclup
added a commit
that referenced
this pull request
May 20, 2026
…sk C.1 recovery) (#135) * docs(skills): SKILL.md schema bump + worker-session-handoff + 4 portable skills 3-task housekeeping + tacit knowledge harvest. Docs/skills-only PR — no code, no schema delta, no test additions. Task A — SKILL.md schema-version table fixes --------------------------------------------- Two stale "in flight" entries flipped to merged + 1 new row inserted: - Row 0.9.0-phase4h: "(in flight in PR #112)" → "(PR #112 merged 2026-05-19)" - Row 0.9.1-phase4h.2: "(in flight in PR #<NEXT>)" → "(PR #118 merged 2026-05-19)" - NEW row 0.9.2-phase4h.2 (above 0.9.1) — PR #124 merged, multi-port OSAP adapter + osap_signals_dropped_no_long_short field, closing the 100-signal accounting equation; DSR sign-inversion deferred to Part 3 PHASE_STATUS.md row 4 ALSO has "Phase 4h.2 Part 2 in flight in this PR" staleness — confirmed via grep but DELIBERATELY not updated here per Task A explicit scope (SKILL.md only). Recommend a follow-up phase-status-bump PR after this lands. Task B — New worker-session-handoff skill ------------------------------------------ .claude/skills/worker-session-handoff/SKILL.md (+163 LOC). YAML frontmatter + 5 sections: - When to use vs inline (≤50 LOC single-file → inline; ≥2 files / new dep / code logic → handoff) - Constraint lock library (8 standard locks: composite/PHASE3, Rule 16, Rule 18, no-merge, no force-push, no --no-verify, no workflow_dispatch, schema triple) - Anti-pattern: paste-loop avoidance (single outer code-block fence; reference PR #123 as related-but-distinct paste-loop failure mode) - Template (paste-ready, single ```` outer code block with language tag ` text` so inner triple-backticks pass through) - Reference invocations + QuantRank precedents (PR #124, #127, #131) Codifies the handoff shape that appeared verbatim across PRs #123, #124, #127, #128, #129, #131 — user copies ONE block instead of editing 5 template snippets per handoff. Task C — Portable skills library (4 skills, +417 LOC) ----------------------------------------------------- Audit step (per spec): read CLAUDE.md + AGENTS.md + SKILL.md + WORKFLOW.md + PR descriptions of #112/#118/#124/#127/#128/#129/#131. Identified 7 candidate patterns; classified by portability: - ✅ scout-then-integrate (portable; vendoring pattern, no QR logic) - ✅ observability-before-wiring (portable; gate-diagnostic pattern) - ✅ drift-detector-manifest (portable; API surface lock pattern) - ✅ schema-triple-lockstep (portable; Python/TS JSON contract) - 🟡 annotate-before-veto (portable; progressive rollout — DEFERRED to follow-up issue, lower value vs the 4 shipped) - 🟡 pre-plan-investigations (subsumed by scout-then-integrate's Phase 1 § "Pre-plan investigations" — no separate skill needed) - 🟡 graceful-degradation-try-except (portable; error-handling pattern — DEFERRED to follow-up issue, the wrapper is generally 1-line so doesn't warrant a dedicated skill) 4 shipped (each ≤ 109 LOC): .claude/skills/portable-scout-then-integrate/SKILL.md (99 LOC) .claude/skills/portable-drift-detector-manifest/SKILL.md (109 LOC) .claude/skills/portable-schema-triple-lockstep/SKILL.md (103 LOC) .claude/skills/portable-observability-before-wiring/SKILL.md (106 LOC) Flat naming convention (`portable-<name>/SKILL.md` at depth 1 from `.claude/skills/`) because Claude Code's skill registry doesn't recurse into nested subdirectories per CLAUDE.md ## Conventions. Confirmed via session reload — all 4 portable + worker-session- handoff registered correctly. Each portable skill has: - YAML frontmatter (name + description + TRIGGER + SKIP) - ## Pattern section (generic, no QR business logic) - ## Trigger conditions + ## Skip conditions - ## QuantRank precedent (1 paragraph, clearly labeled as precedent not pattern definition) Task C constraint check: - All portable skills core pattern descriptions are project- agnostic (read `.claude/skills/portable-*/SKILL.md` ## Pattern sections — zero references to OSAP / IPCA / pillar / Top-5 inside the pattern body; only inside the labeled "QuantRank precedent" section at the bottom) - 3 of 4 portable skills are 103-109 LOC (slightly over the 100-LOC target — pattern + trigger + skip + precedent sections require ~25 LOC each, leaving ~25 LOC of unavoidable scaffold). The 99-LOC one (scout-then-integrate) shows the cap is achievable but tight. Files (6 changed, +580 LOC, no deletions) ------------------------------------------ - SKILL.md — schema-version table fixes (Task A) - 5 new SKILL.md files in .claude/skills/ (Tasks B + C) Verification ladder all green ------------------------------ - ruff check . → All checks passed - python tools/check_doc_test_counts.py → exit 0 - python tools/check_branch_collisions.py "skill" "portable" → expected⚠️ on #131 (own adjacent work, not a duplicate) - python -m compute.output.schema_check → in sync (no schema touch) - python -m pytest tests/ -m "not network" → 959 passed (unchanged; tools/ + .claude/skills/ aren't imported by tests) - Claude Code skill registry pick-up verified via session reload — all 5 new skills (worker-session-handoff + 4 portable-*) appear in the available-skills list Constraints honored ------------------- - No touch to compute/ / frontend/ / tests/ - No touch to PHASE_STATUS.md / WORKFLOW.md (Task A scope = SKILL.md only; PHASE_STATUS.md staleness flagged for follow-up) - No push to main; no force-push; no --no-verify - No workflow_dispatch trigger - Task C portable skills are project-agnostic in their pattern description (QR refs confined to labeled "precedent" sections) Follow-up issue (to file post-merge) ------------------------------------ Title: "Portable Skills Library — extract remaining tacit patterns" - annotate-before-veto (progressive rule rollout) - graceful-degradation-try-except (1-line wrapper guidance) - pre-plan-investigations as standalone (currently subsumed) - Anything else surfaced by future PR descriptions https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU * docs(skills): Vendor karpathy-guidelines (Task C.1 recovery) + THIRD_PARTY_NOTICES.md Recovers Task C.1 from the original handoff that was silent-dropped in the prior PR #132 commit (50da720). The handoff explicitly named "Vendor karpathy-guidelines (1 skill, ~70 LOC)" as part of the portable skills library; the auditor session caught the omission and authorized this follow-up commit on the existing branch. Files (2 new, +138 LOC) ------------------------ - .claude/skills/portable-karpathy-guidelines/SKILL.md (+82 LOC) — vendored content of upstream skills/karpathy-guidelines/SKILL.md (67 LOC, byte-for-byte preserved) + 15-line appended attribution block referencing the upstream source, commit SHA, and the Karpathy tweet that motivated the guidelines. - THIRD_PARTY_NOTICES.md (+56 LOC, NEW at repo root) — third-party license disclosures. Section "karpathy-guidelines (Claude Code skill)" carries source URL, license declaration, vendored path, vendored date, upstream commit SHA, upstream first-commit date, and the full standard MIT License text with copyright attributed to "multica-ai contributors" (upstream has no individual copyright line and no standalone LICENSE file; the `license: MIT` claim appears in upstream README.md § License and each skill's YAML frontmatter). Upstream provenance ------------------- - Source: https://github.com/multica-ai/andrej-karpathy-skills - Upstream HEAD SHA at vendoring: 2c606141936f1eeef17fa3043a72095b4765b9c2 - Upstream first commit: 2026-01-27 - Vendored date: 2026-05-20 - License: MIT (declared) Verbatim content preserved -------------------------- `diff /tmp/karpathy-src/skills/karpathy-guidelines/SKILL.md .claude/skills/portable-karpathy-guidelines/SKILL.md` shows ONLY the 15-line appended attribution block at lines 68-82. The upstream 67-line content (YAML frontmatter + "Karpathy Guidelines" heading + the 4 principles) is byte-for-byte unchanged. Per the spec constraint: "เก็บ 4 principles verbatim. แก้ได้แค่ 'เพิ่ม' attribution block ท้ายไฟล์". License-disclosure caveat ------------------------- Upstream `multica-ai/andrej-karpathy-skills` declares MIT via README + YAML frontmatter but does NOT ship a standalone LICENSE file. The `THIRD_PARTY_NOTICES.md` entry includes the standard MIT License template with copyright attributed to the GitHub org ("multica-ai contributors"), matching the principle that an MIT declaration without a formal copyright line still licenses to the redistributor; the attribution is conservative. Verification ladder all green ------------------------------ - ruff check . → All checks passed - python tools/check_doc_test_counts.py → exit 0 (no test-count drift introduced by this commit) - python tools/check_branch_collisions.py "karpathy" → no scope collisions detected - python -m compute.output.schema_check → in sync (no schema touch) - python -m pytest tests/ -m "not network" → 959 passed (unchanged; .claude/skills/ + THIRD_PARTY_NOTICES.md aren't imported by tests) - Skill registry pickup verified via session reload — `portable-karpathy-guidelines` appears in the available-skills list with the upstream description verbatim Constraints honored ------------------- - No squash / amend of the prior 50da720 commit — this is a fresh commit pushed on top of the existing branch (per spec "ห้าม squash old commit") - No touch to the 4 already-shipped portable skills in 50da720 - No touch to compute/ / frontend/ / tests/ - No push to main; no force-push; no --no-verify - No workflow_dispatch trigger - Karpathy SKILL.md upstream content preserved verbatim; only the attribution block appended below the original content PR description update will follow as a separate `gh pr edit` / MCP `update_pull_request` call so the new "License Compliance" section + the audit-table row for karpathy-guidelines land in the PR body. https://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU --------- Co-authored-by: Claude <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #126.
Summary
Process Hygiene Item #1 (parent epic #125). Adds Hypothesis property-based tests as the new defense line for "untested data-shape assumption" bugs — the class that hid the OSAP quintile/tercile silent-drop in PR #112's CI until production cron diagnostics caught it (subsequently fixed in PR #124 / Phase 4h.2 Part 2).
If a
@givenproperty overport_count ∈ {2, 3, 5, 10}had existed in Phase 4h, the hardcodedport=10filter would have been falsified the first time CI ran.Test-addition only. No scoring / feature behavior touched. No schema delta. No CI workflow changes.
Test count before / after
80c6641e, post Phase 4h.2 Part 2)Property tests landed (14)
Sub-task 2 —
osap_replicate.py(7 tests, 394 LOC)test_compute_long_short_returns_handles_any_port_cardinalitytest_signals_dropped_no_long_short_returns_sorted_uniquetest_normalize_port_label_int_input_yields_2char_zfilltest_normalize_port_label_str_input_yields_2char_zfilltest_part2_accounting_invariant_under_random_partitionmanifest = missing + dropped + gated + usedholds for any partitiontest_coverage_by_signal_returns_pct_in_0_to_100test_rank_signals_cross_sectional_returns_unit_intervalSub-task 3 — scoring transforms (7 tests, 340 LOC)
test_compute_composite_output_bounded_0_to_100test_compute_composite_all_50_inputs_yield_composite_50test_compute_composite_neutralize_missing_imputes_nan_to_50test_compute_composite_constant_input_equals_inputPHASE3_WEIGHTSsum-to-1 invarianttest_apply_osap_blend_output_bounded_and_nan_passthroughtest_aggregate_osap_signals_finite_values_in_0_to_100test_apply_osap_blend_weight_zero_is_identity_on_compositeSub-task 4 — CI integration + docs
.gitignore— already covers.hypothesis/at L50 (Python default). No edit needed.CLAUDE.md ## Gotchas— 1-line note that Hypothesis is the new defense line, with the@settings(deadline=None)anti-pattern flagged.pytest -m "not network"inherits this. No@settings(deadline=None)used in any of the 14 properties.Sanity verification (NOT committed)
Temporarily reverted
compute/features/osap_replicate.py:143(agg(["min", "max"])→agg(["min", "min"])) and confirmed Property 1 fails with:Reverted the break before commit. Working tree matches main except for the 4 staged files.
No regression discovered
Property tests passed on first execution against current
main(Phase 4h.2 Part 2 already merged). The fact that nothing falsified in 14 properties × ~100 examples is itself a quality signal — the multi-port adapter handles the [2, 10] cardinality region cleanly, the composite weight invariant holds, and the OSAP blend domain contract isn't violated under any (composite, osap, weight) triple in the unit interval.Constraints honored
compute_composite/PHASE3_WEIGHTSsum=1.0 invariant — pure test-addition PRcomposite_score; no scoring touched--no-verifyworkflow_dispatchtrigger (compute-rankings.ymluntouched)schemas.py/types.tschanges)@settings(deadline=None)— default deterministic deadlineRuleBasedStateMachine(out of scope per issue Process hygiene #1 — Add Hypothesis property-based tests for data-shape invariants #126)Files (4 changed, +747 / 0)
pyproject.tomlhypothesis>=6.92in[dev](+6)CLAUDE.md## Gotchasnote (+7)tests/test_features/test_osap_replicate_properties.pytests/test_scoring/test_transforms_properties.pyTest plan
ruff check .→ All checks passedpython -m pytest tests/ -m "not network"→ 959 passed (1m46s)python -m pytest tests/test_features/test_osap_replicate_properties.py tests/test_scoring/test_transforms_properties.py→ 14 passed (5s)python -m compute.output.schema_check→ in sync (no schema delta)Falsifying example: port_count=2, n_dates=1)process-hygiene-1-hypothesis-property-testshttps://claude.ai/code/session_01T8FE3MAnmk6hcjvH4SgYNU
Generated by Claude Code