Process hygiene #1 — Add Hypothesis property-based tests for data-shape invariants #126

@dackclup

Context

Part of the process-hygiene epic (parent issue will be linked once filed). This is the entry point because it has the most concrete acceptance criteria and directly guards against the Phase 4h.2 Part 2 root cause.

Motivation

The 56-signal silent-drop in compute/features/osap_replicate.py:91-140 (fixed in PR #124) had this structure:

# Pre-Part-2 code assumed deciles universally
df = df[df["port"].isin([LONG_PORT_LABEL, SHORT_PORT_LABEL])]  # 01/10 only
# Quintile + tercile signals silently dropped

The tests passed because every example test used decile fixtures. No test enumerated port cardinalities. A property test like "for any port-count ∈ {2, 3, 5, 10}, the adapter produces an LS row" would have failed immediately.
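The failure mode can be reproduced without pandas. The sketch below is a stand-in, not the real adapter: the row layout, the zero-padded labels, and the helper names `hardcoded_extremes`, `data_driven_extremes`, and `make_rows` are all assumptions made for illustration.

```python
# Hypothetical stand-in for the pre-fix filter. Labels and row layout are
# assumptions for illustration, not the real osap_replicate.py code.
LONG_PORT_LABEL, SHORT_PORT_LABEL = "01", "10"


def hardcoded_extremes(rows):
    """Keep only rows labelled '01' or '10' (the decile assumption)."""
    return [r for r in rows if r["port"] in (LONG_PORT_LABEL, SHORT_PORT_LABEL)]


def data_driven_extremes(rows):
    """Keep the lowest and highest port label actually present in the data."""
    labels = sorted({r["port"] for r in rows})
    return [r for r in rows if r["port"] in (labels[0], labels[-1])]


def make_rows(port_count):
    """One date, one row per portfolio, zero-padded labels '01'..'NN'."""
    return [{"port": f"{p:02d}", "ret": 0.01 * p} for p in range(1, port_count + 1)]


for port_count in (2, 3, 5, 10):
    rows = make_rows(port_count)
    # Columns: cardinality, legs kept by hardcoded filter, legs kept by data-driven filter
    print(port_count, len(hardcoded_extremes(rows)), len(data_driven_extremes(rows)))
```

Only the decile case keeps both legs under the hardcoded filter. Every other cardinality loses one leg, so no long-short row can be formed, and nothing raises.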

In-scope work

A. Add hypothesis to [dev] extra

pyproject.toml:

dev = [
    "pytest>=8.0",
    "ruff>=0.4",
    "hypothesis>=6.92",  # NEW — property-based test generator
]

B. Property tests for osap_replicate.py

tests/test_features/test_osap_replicate_properties.py (NEW):

from hypothesis import given, strategies as st

@given(
    port_count=st.integers(min_value=2, max_value=10),
    n_dates=st.integers(min_value=1, max_value=12),
)
def test_compute_long_short_returns_handles_any_port_cardinality(port_count, n_dates):
    """For any port cardinality ∈ [2, 10] and any positive date count,
    compute_long_short_returns produces exactly n_dates rows with
    ls_return = ret[port=min] - ret[port=max]."""
    ...
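A filled-in version of the stub might look like the sketch below. The real signature of compute_long_short_returns is not shown in this issue, so the sketch tests a hypothetical pure-Python stand-in, `long_short_returns`, over a list of dicts. It follows the stub's docstring convention (lowest port minus highest port); the row layout and integer port labels are assumptions.

```python
from hypothesis import given, strategies as st


def long_short_returns(rows):
    """Hypothetical stand-in: one LS row per date, lowest port minus highest."""
    by_date = {}
    for r in rows:
        by_date.setdefault(r["date"], {})[r["port"]] = r["ret"]
    out = []
    for date in sorted(by_date):
        ports = by_date[date]
        lo, hi = min(ports), max(ports)  # extremes come from the data, not constants
        out.append({"date": date, "ls_return": ports[lo] - ports[hi]})
    return out


@given(
    port_count=st.integers(min_value=2, max_value=10),
    n_dates=st.integers(min_value=1, max_value=12),
    data=st.data(),
)
def test_long_short_handles_any_port_cardinality(port_count, n_dates, data):
    n = n_dates * port_count
    # Bounded floats exclude NaN and infinity, so equality checks are safe.
    rets = data.draw(
        st.lists(st.floats(min_value=-1.0, max_value=1.0), min_size=n, max_size=n)
    )
    rows = [
        {"date": d, "port": p, "ret": rets[d * port_count + (p - 1)]}
        for d in range(n_dates)
        for p in range(1, port_count + 1)
    ]
    result = long_short_returns(rows)
    # Shape invariant: exactly one LS row per date, whatever the cardinality.
    assert len(result) == n_dates
    for row in result:
        legs = {r["port"]: r["ret"] for r in rows if r["date"] == row["date"]}
        assert row["ls_return"] == legs[1] - legs[port_count]
```

Swapping the stand-in for the real function would mainly mean building a DataFrame fixture instead of a list of dicts; the two assertions carry over unchanged.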

Cover at minimum:

  1. compute_long_short_returns shape invariance across port cardinalities
  2. signals_dropped_no_long_short returns sorted unique list
  3. _normalize_port_label round-trips through int/str/categorical
  4. Part 2 accounting invariant holds for any combination of manifest + dataset
  5. coverage_by_signal returns values in [0, 1]
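As an illustration of item 3, a round-trip property could be sketched as follows. The `normalize_port_label` below is a hypothetical reimplementation (int or numeric string in, zero-padded two-digit string out), since the real _normalize_port_label is not shown here. The pandas categorical case is left out to keep the sketch dependency-free.

```python
from hypothesis import given, strategies as st


def normalize_port_label(label):
    """Hypothetical normalizer: int or numeric string -> zero-padded 'NN'."""
    return f"{int(label):02d}"


@given(port=st.integers(min_value=1, max_value=10))
def test_normalize_port_label_round_trips(port):
    canonical = normalize_port_label(port)
    # int, unpadded str, and padded str inputs all converge on one label
    assert normalize_port_label(str(port)) == canonical
    assert normalize_port_label(f"{port:02d}") == canonical
    # normalizing twice is a no-op, and the label parses back to the int
    assert normalize_port_label(canonical) == canonical
    assert int(canonical) == port
```

The other items follow the same shape: generate an arbitrary valid input, call the function, assert the invariant (sorted and unique, within [0, 1], and so on).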

C. Property tests for other shape-sensitive transforms

Identify candidates by grep -rn "hardcode\|assume\|fixed-shape" compute/:

  • compute/scoring/composite.py::compute_composite — PHASE3_WEIGHTS sum = 1.0 invariant under any pillar-score input
  • compute/valuation/ensemble.py::compute_fair_price — median/min/max relationships
  • compute/scoring/risk_overlay.py::apply_risk_overlay — annotate vs veto disjointness

Pick 2-3 high-value targets, not all of them. The goal is to ship the pattern + Hypothesis dependency, not to retrofit the whole codebase.
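For the ensemble target, the median/min/max relationship could be expressed roughly as below. `fair_price_summary` is a hypothetical stand-in, not compute_fair_price itself; the dictionary keys and the price bounds are assumptions.

```python
from statistics import median

from hypothesis import given, strategies as st


def fair_price_summary(estimates):
    """Hypothetical stand-in: summarize per-model price estimates."""
    return {
        "low": min(estimates),
        "fair": median(estimates),
        "high": max(estimates),
    }


# Positive, finite prices; at least one model must contribute an estimate.
price_lists = st.lists(
    st.floats(min_value=0.01, max_value=1e6), min_size=1, max_size=20
)


@given(estimates=price_lists)
def test_fair_price_is_bracketed_by_extremes(estimates):
    summary = fair_price_summary(estimates)
    # The median can never fall outside the range of its inputs.
    assert summary["low"] <= summary["fair"] <= summary["high"]
```

The same pattern would apply to the other candidates, for example asserting that the annotate and veto sets returned by a risk overlay never intersect.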

D. CI integration

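Hypothesis draws examples from a random seed by default, so deterministic CI runs need an explicit settings profile with derandomize=True (set deadline=None as well if slow runners trip the default 200 ms per-example deadline). Ensure: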

  • Property tests run as part of pytest -m "not network"
  • Hypothesis database (.hypothesis/) added to .gitignore
  • hypothesis.errors.Flaky failures must fail CI, not retry
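One way to make the determinism requirement explicit is a settings profile registered in tests/conftest.py. This is a sketch: the profile names and the HYPOTHESIS_PROFILE environment variable are assumed conventions, not anything the repo defines today.

```python
import os

from hypothesis import settings

# derandomize=True derives the seed from the test itself, so every run
# generates the same examples. deadline=None avoids timing-based failures
# on slow shared runners.
settings.register_profile("deterministic", derandomize=True, deadline=None)

# Local runs stay randomized so new counterexamples can still be found.
settings.register_profile("dev", max_examples=50)

# HYPOTHESIS_PROFILE is an assumed variable name; CI would export
# HYPOTHESIS_PROFILE=deterministic before invoking pytest.
settings.load_profile(os.getenv("HYPOTHESIS_PROFILE", "dev"))
```

Keeping local runs randomized while pinning CI is a trade-off: CI stays reproducible, and developers still get fresh examples on their machines.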

Out-of-scope

  • Mutation testing (mutmut / cosmic-ray) — defer to follow-up if Hypothesis insufficient
  • Property tests for compute/ingest/* — those exercise external APIs; @network marker handles them
  • Stateful Hypothesis testing (RuleBasedStateMachine) — defer to integration-PR scope

Acceptance criteria

  • hypothesis>=6.92 in [dev]
  • ≥ 5 property tests for osap_replicate.py covering port-cardinality invariance
  • ≥ 3 property tests across composite.py / ensemble.py / risk_overlay.py
  • All property tests run in pytest -m "not network" and pass deterministically
  • CI green
  • CLAUDE.md ## Gotchas updated noting Hypothesis as the new line of defense

Effort estimate

~1 PR, ~200-300 LOC (mostly test code), 1-2 days.

Why this is the highest-value first improvement

The Phase 4h.2 Part 2 fix took ~3 hours of audit + implementation + CI cycle. The silent-drop survived in production for ~1 cron cycle (≈1 week) before being detected, so the cost of not having property tests was tangible this round. Once the pattern lands, it guards against the whole class of "untested data-shape assumption" bugs, including instances that have not surfaced yet but are likely present elsewhere in the codebase.

— filed 2026-05-19 by Phase 4h.2 Part 2 auditor session
