Context
Part of the process-hygiene epic (parent issue will be linked once filed). Entry point because it has the most concrete acceptance criteria + directly prevents the Phase 4h.2 Part 2 root cause.
Motivation
The 56-signal silent-drop in compute/features/osap_replicate.py:91-140 (fixed in PR #124) had this structure:
# Pre-Part-2 code assumed deciles universally
df = df[df["port"].isin([LONG_PORT_LABEL, SHORT_PORT_LABEL])] # 01/10 only
# Quintile + tercile signals silently dropped
The tests passed because every example test used decile fixtures. No test enumerated port cardinalities. A property test like "for any port-count ∈ {2, 3, 5, 10}, the adapter produces an LS row" would have failed immediately.
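The failure mode can be reproduced without the real adapter. The sketch below is a dependency-free illustration, not the code from osap_replicate.py: the row layout, the helper names, and the choice of "01" as the long leg and "10" as the short leg are all assumptions made for the example.

```python
# Hypothetical stand-ins: the real labels and column names live in
# compute/features/osap_replicate.py and may differ.
LONG_PORT_LABEL = "01"
SHORT_PORT_LABEL = "10"


def make_rows(signal, port_count):
    """One date of toy portfolio returns for a signal with `port_count` ports."""
    return [
        {"signal": signal, "port": f"{p:02d}", "ret": 0.01 * p}
        for p in range(1, port_count + 1)
    ]


def signals_with_ls_decile_only(rows):
    """Pre-fix behaviour: a signal survives only if it has both the 01 and 10 legs."""
    legs = {}
    for row in rows:
        if row["port"] in (LONG_PORT_LABEL, SHORT_PORT_LABEL):
            legs.setdefault(row["signal"], set()).add(row["port"])
    return {s for s, ports in legs.items() if len(ports) == 2}


def signals_with_ls_any_cardinality(rows):
    """Cardinality-aware variant: any signal with at least two ports has an LS pair."""
    ports = {}
    for row in rows:
        ports.setdefault(row["signal"], set()).add(row["port"])
    return {s for s, p in ports.items() if len(p) >= 2}


# Enumerate the port cardinalities named above: 2, 3, 5 and 10.
rows = []
for count in (2, 3, 5, 10):
    rows += make_rows(f"sig_{count}", count)

print(sorted(signals_with_ls_decile_only(rows)))      # only the decile signal survives
print(sorted(signals_with_ls_any_cardinality(rows)))  # all four signals survive
```

A decile-only fixture exercises just the last of the four cases, which is why the example tests never saw the drop.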
In-scope work
A. Add hypothesis to [dev] extra
pyproject.toml:
dev = [
"pytest>=8.0",
"ruff>=0.4",
"hypothesis>=6.92", # NEW — property-based test generator
]
B. Property tests for osap_replicate.py
tests/test_features/test_osap_replicate_properties.py (NEW):
from hypothesis import given, strategies as st


@given(
    port_count=st.integers(min_value=2, max_value=10),
    n_dates=st.integers(min_value=1, max_value=12),
)
def test_compute_long_short_returns_handles_any_port_cardinality(port_count, n_dates):
    """For any port cardinality ∈ [2, 10] and any positive date count,
    compute_long_short_returns produces exactly n_dates rows with
    ls_return = ret[port=min] - ret[port=max]."""
    ...
Cover at minimum:
- compute_long_short_returns shape invariance across port cardinalities
- signals_dropped_no_long_short returns sorted unique list
- _normalize_port_label round-trips through int/str/categorical
- Part 2 accounting invariant holds for any combination of manifest + dataset
- coverage_by_signal returns values in [0, 1]
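As a sketch of how the shape-invariance property might be written out, the block below tests a toy function rather than the real one. `toy_long_short` and its (date, port, ret) tuple input are hypothetical; the actual signature of compute_long_short_returns is not shown in this issue, so the real test would build whatever input the adapter expects.

```python
from hypothesis import given, strategies as st


def toy_long_short(rows):
    """Hypothetical stand-in for compute_long_short_returns.

    `rows` is a list of (date, port, ret) tuples. For each date it returns
    ret[lowest port] - ret[highest port].
    """
    by_date = {}
    for date, port, ret in rows:
        by_date.setdefault(date, {})[port] = ret
    return {
        date: legs[min(legs)] - legs[max(legs)]
        for date, legs in by_date.items()
        if len(legs) >= 2
    }


@given(
    port_count=st.integers(min_value=2, max_value=10),
    n_dates=st.integers(min_value=1, max_value=12),
    data=st.data(),
)
def test_long_short_handles_any_port_cardinality(port_count, n_dates, data):
    # Draw one finite return per (date, port) cell.
    returns = data.draw(
        st.lists(
            st.floats(min_value=-1.0, max_value=1.0, allow_nan=False),
            min_size=port_count * n_dates,
            max_size=port_count * n_dates,
        )
    )
    rows = [
        (d, p, returns[d * port_count + (p - 1)])
        for d in range(n_dates)
        for p in range(1, port_count + 1)
    ]
    result = toy_long_short(rows)

    # Shape invariant: exactly one long-short row per date.
    assert len(result) == n_dates
    # Value invariant: lowest-port return minus highest-port return.
    for d in range(n_dates):
        low = returns[d * port_count]
        high = returns[d * port_count + port_count - 1]
        assert result[d] == low - high
```

Using `st.data()` lets the size of the returns list depend on the drawn port_count and n_dates, so every generated grid is fully populated.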
C. Property tests for other shape-sensitive transforms
Identify candidates by grep -rn "hardcode\|assume\|fixed-shape" compute/:
- compute/scoring/composite.py::compute_composite (PHASE3_WEIGHTS sum = 1.0 invariant under any pillar-score input)
- compute/valuation/ensemble.py::compute_fair_price (median/min/max relationships)
- compute/scoring/risk_overlay.py::apply_risk_overlay (annotate vs veto disjointness)
Pick 2-3 high-value targets, not all of them. The goal is to ship the pattern + Hypothesis dependency, not to retrofit the whole codebase.
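To make two of these candidates concrete, here is a minimal sketch against toy functions. `TOY_WEIGHTS`, `toy_composite`, and `toy_fair_price` are hypothetical stand-ins; the real weights, pillar names, and return shapes are not shown in this issue. It also takes one possible reading of the weights invariant: if the weights sum to 1.0, the composite is a convex combination and must stay within the range of its inputs.

```python
import statistics

from hypothesis import given, strategies as st

# Hypothetical weights; the real PHASE3_WEIGHTS live in compute/scoring/composite.py.
TOY_WEIGHTS = {"value": 0.4, "quality": 0.35, "momentum": 0.25}


def toy_composite(pillar_scores):
    """Weighted average of pillar scores, a stand-in for compute_composite."""
    return sum(TOY_WEIGHTS[name] * pillar_scores[name] for name in TOY_WEIGHTS)


def toy_fair_price(estimates):
    """Median/min/max summary, a stand-in for compute_fair_price."""
    return {
        "low": min(estimates),
        "fair": statistics.median(estimates),
        "high": max(estimates),
    }


score = st.floats(min_value=0.0, max_value=100.0)


@given(st.fixed_dictionaries({name: score for name in TOY_WEIGHTS}))
def test_composite_stays_within_pillar_range(pillar_scores):
    # Weights summing to 1.0 make the composite a convex combination,
    # so it cannot leave the range spanned by its inputs (up to rounding).
    composite = toy_composite(pillar_scores)
    tolerance = 1e-9
    assert min(pillar_scores.values()) - tolerance <= composite
    assert composite <= max(pillar_scores.values()) + tolerance


@given(st.lists(st.floats(min_value=0.01, max_value=1e6), min_size=1, max_size=8))
def test_fair_price_ordering(estimates):
    # The median can never fall outside the minimum and maximum estimate.
    summary = toy_fair_price(estimates)
    assert summary["low"] <= summary["fair"] <= summary["high"]
```

Both properties hold for any input size, which is the point: neither test hardcodes how many pillars or estimates exist.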
D. CI integration
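Hypothesis draws examples from a random seed by default, so runs are not deterministic unless a settings profile with derandomize=True is loaded in CI. The default 200 ms per-example deadline should be enough for these transforms, so no @settings(deadline=None) workaround is expected. Ensure: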
- Property tests run as part of pytest -m "not network"
- Hypothesis database (.hypothesis/) added to .gitignore
- hypothesis.errors.Flaky failures must fail CI, not retry
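One way to make CI determinism explicit is a settings profile registered in a shared test module such as tests/conftest.py. This is a sketch under assumptions: the profile name "repo_ci" and the HYPOTHESIS_PROFILE environment variable are choices made for the example, not existing project conventions.

```python
import os

from hypothesis import settings

# "repo_ci" is a hypothetical profile name. derandomize=True makes Hypothesis
# derive its examples from the test function itself rather than a random seed,
# so every CI run replays the same cases.
settings.register_profile("repo_ci", derandomize=True)

# HYPOTHESIS_PROFILE is an assumed, project-chosen environment variable;
# local runs without it keep Hypothesis's default randomized profile.
settings.load_profile(os.getenv("HYPOTHESIS_PROFILE", "default"))
```

CI would then export HYPOTHESIS_PROFILE=repo_ci before invoking pytest, while local runs stay randomized and keep exploring new inputs.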
Out-of-scope
- Mutation testing (mutmut / cosmic-ray): defer to follow-up if Hypothesis insufficient
- Property tests for compute/ingest/*: those exercise external APIs; @network marker handles them
- Stateful Hypothesis testing (RuleBasedStateMachine): defer to integration-PR scope
Acceptance criteria
- hypothesis>=6.92 in [dev]
- Property tests for osap_replicate.py covering port-cardinality invariance
- Property tests for 2-3 of composite.py / ensemble.py / risk_overlay.py
- Property tests run under pytest -m "not network" and pass deterministically
- ## Gotchas section updated noting Hypothesis as the new line of defense
Effort estimate
~1 PR, ~200-300 LOC (mostly test code), 1-2 days.
Why this is the highest-value first improvement
The Phase 4h.2 Part 2 fix took ~3 hours of audit + implementation + CI cycle. The silent-drop survived in production for ~1 cron cycle (≈1 week) before being detected. The cost of NOT having property tests this round was tangible. The pattern, once landed, prevents the entire class of "untested data-shape assumption" bugs, including ones we haven't hit yet but that the codebase is full of.
Filed 2026-05-19 by Phase 4h.2 Part 2 auditor session