diff --git a/.github/workflows/compute-rankings.yml b/.github/workflows/compute-rankings.yml index e62e70b06..cde5afa85 100644 --- a/.github/workflows/compute-rankings.yml +++ b/.github/workflows/compute-rankings.yml @@ -46,7 +46,10 @@ jobs: - name: Install run: | python -m pip install --upgrade pip - pip install -e . + # Phase 4h: weekly compute imports compute/ingest/osap.py which + # imports the `openassetpricing` package — installed via the + # `factors` extra (pinned to ==0.0.2 in pyproject.toml). + pip install -e ".[factors]" - name: Compute current quarter id id: quarter diff --git a/CLAUDE.md b/CLAUDE.md index 463205f90..2b8a9c722 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -153,14 +153,16 @@ non-connector-bound work. ## Phase status -Current schema: **`0.8.0-phase4.5f`** · Defense layer: **17** -(7 active vetoes + 10 annotates + 5 numerical guards + -`manipulation_index` rollup). Latest release tag: +Current schema: **`0.9.0-phase4h`** (bumped from `0.8.0-phase4.5f` in +PR #112). Defense layer: **17** (7 active vetoes + 10 annotates + 5 +numerical guards + `manipulation_index` rollup) — Phase 4h adds +observability surface, no new veto. Latest release tag: [**`v1.2.0-phase4.5`**](https://github.com/dackclup/quantrank/releases/tag/v1.2.0-phase4.5) -**SHIPPED 2026-05-17** at commit `6d414a9b` — **Phase 4.5 cluster -✅ complete** (6 sub-PRs). Production verified run #51 -(`b1588b2a`, 5m14s warm-cache). Test suite: 856 offline + 17 -`@network`. +shipped 2026-05-17 at commit `6d414a9b`. **Phase 4h in flight in PR +#112** — OSAP signal replication (factor-exposure proxy) + PBO/DSR +hard gate (PR #60 reuse) + rolling-12m IC observability + Path-b +composite × OSAP blend (50/50 default, Top-5 still ranks raw +composite per Rule 16). Test suite: 906 offline + 19 `@network`. 
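The Path-b blend named in the paragraph above can be sketched roughly as follows. This is a minimal illustration, assuming the formula recorded in this PR's SKILL.md schema row (`blended = (1 - weight) × composite_score + weight × osap_signal_aggregate`, default `weight=0.5`) and the stated universe-gap policy (tickers with no OSAP coverage keep `composite_score` unchanged). The function name `blend_path_b`, the sample tickers, and the assumption that the aggregate is already on the same scale as the composite are hypothetical; this is not the signature or body of `apply_osap_blend` in `compute/scoring/osap_blend.py`.

```python
import pandas as pd


def blend_path_b(
    composite: pd.Series,
    osap_aggregate: pd.Series,
    weight: float = 0.5,
) -> pd.Series:
    """Hypothetical sketch of the Path-b blend (not the PR's implementation)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError(f"weight must be in [0, 1], got {weight}")
    # Align the aggregate to the composite index; tickers without OSAP
    # coverage become NaN after the reindex.
    aligned = osap_aggregate.reindex(composite.index)
    blended = (1.0 - weight) * composite + weight * aligned
    # Universe-gap policy (assumed from the PR text): no imputation,
    # the raw composite passes through where OSAP data is missing.
    return blended.where(aligned.notna(), composite)


# Illustrative inputs only; the scale of the aggregate is an assumption.
composite = pd.Series({"AAA": 80.0, "BBB": 60.0, "CCC": 40.0})
aggregate = pd.Series({"AAA": 50.0, "BBB": 70.0})  # CCC has no OSAP coverage
result = blend_path_b(composite, aggregate)
```

With these inputs `AAA` and `BBB` both blend to 65.0 and `CCC` stays at 40.0. Because the blend is computed outside the composite, the raw `composite_score` that Top-5 ranking uses is left untouched.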
**Next deliverable** (pick by appetite — three tracks parallelize): **4.5e** (Form 4 insider, ~3w → v1.3.0) · **4h/4i/4j/4k** factor diff --git a/PHASE_STATUS.md b/PHASE_STATUS.md index fc8f5a6cc..b507a4ade 100644 --- a/PHASE_STATUS.md +++ b/PHASE_STATUS.md @@ -6,7 +6,7 @@ | 1 | Universe + prices ingestion | ✅ DONE — 2026-05-08 | | 2 | Fundamentals via SEC EDGAR | ✅ DONE — 2026-05-08 | | 3 | Classical features + composite + **defenses** → **v1.0** | ✅ **DONE — 2026-05-14** (v1.0.0 tagged + GitHub release) | -| 4 | Factor consolidation (OSAP + JKP + Qlib + IPCA) → **v1.1** | 🟡 IN PROGRESS — 4a-4g + 4c.1/4c.2/4c.3 + PR 4b §1+§2 all merged; PR 4b §3 IC-decay output deferred to Phase 5; **next: 4h / 4i / 4j / 4k factor integrations** (PBO/DSR gate ready), can run in parallel with Phase 4.5 | +| 4 | Factor consolidation (OSAP + JKP + Qlib + IPCA) → **v1.1** | 🟡 IN PROGRESS — 4a-4g + 4c.1/4c.2/4c.3 + PR 4b §1+§2 all merged; **PR #112 (Phase 4h)** ships OSAP signal replication + PBO/DSR gate + Path-b 50/50 blend (schema bump `0.8.0-phase4.5f` → `0.9.0-phase4h`, no new veto — annotate-only blend, Top-5 still ranks raw composite per Rule 16, 5-commit cluster on `claude/resume-quantrank-phase-4.5-Zh0pO`); 4i/4j/4k pending; PR 4b §3 IC-decay output deferred to Phase 5 | | **4.5** | **Earnings-manipulation defense cluster** → **v1.2** | ✅ **DONE 2026-05-17** — **tag [`v1.2.0-phase4.5`](https://github.com/dackclup/quantrank/releases/tag/v1.2.0-phase4.5) cut** at commit `6d414a9b`. 6 sub-PRs (#89/#90/#91 + #93 + #95 + #97 + #100). Active vetoes **5 → 7**; defense layer **9 → 17** (= 7 vetoes + 10 annotates). 4.5f adds `manipulation_index` (0-100 rollup) + `composite_score_adjusted` (soft penalty, max 10 pts, informational only) + `ManipulationRiskCard` UI + schema bump **`0.7.1-phase4g` → `0.8.0-phase4.5f`**. Production verified run #51 (`b1588b2a`, 5m14s warm-cache): card fires on 158/502 (31.5%); HIGH band 2 (SMCI=84 · WAT=64), MODERATE 60, LOW 96. 
4.5e Form-4 insider clustering **deferred to v1.3.0** — reserved-slot weights already declared in `FLAG_WEIGHTS`. | | 5 | ML meta-learner (Triple-Barrier + Meta-Labeling + Conformal) + SHAP | ⚪ not started | | 6 | Sentiment v2 (FinBERT + Whisper + 8-K Lazy Prices) | ⚪ not started | diff --git a/SKILL.md b/SKILL.md index 5e27404a4..e7b015c77 100644 --- a/SKILL.md +++ b/SKILL.md @@ -304,6 +304,7 @@ Schema versions: | `0.7.0-phase4g` | Phase 4g | **8-K Tier-2 event defenses re-enabled** (PR #79, merged 2026-05-15 on `c35c6d40`, closes [issue #14](https://github.com/dackclup/quantrank/issues/14)). Flipped `compute/scoring/tier2._EIGHT_K_DEFENSES_ENABLED = True` after the PR 3d workflow-timeout deferral (root cause cleared by PR #58 cache layers + PR 3d tenacity tightening). `non_reliance_filing` (Item 4.02 hard veto, 365d lookback, Schroeder 2024 SSRN — ~50% of 4.02 filings precede formal restatement) returns to the active layer as the **5th active veto**. `auditor_change` (Item 4.01 annotate, 730d lookback, Reg S-K Item 304, Cohen-Malloy-Nguyen 2020 type) joins the Tier-2 annotate surface. No data-schema-shape delta — only the feature-flag flip + reason-taxonomy expansion. | | `0.7.1-phase4g` | Phase 4g | **`price_change_1d_pct` additive field** (squash-merged via PR #80, commit `1509f707`). New optional `float \| None` field on `StockSummary` + `StockDetail` — day-over-day percent change from the prior trading-day close. Computed once in `compute/main.py:_fetch_prices_one` from the last two valid yfinance closes; null for newly-IPO'd tickers (only one close available). Lets the ranking-table mobile cards render a change pill without lazy-fetching 502 per-stock history JSONs. Per `phase-4/schema-versioning/PLAN.md`: "Add a new optional field (default = None) → patch". Production metadata.version stays `0.7.0-phase4g` until next weekly compute. 
| | `0.7.1-phase4g` (no schema delta) | Phase 4.5a-4.5d wave | **Earnings-manipulation defense cluster — sub-PRs 4.5a + 4.5b + 4.5c + 4.5d shipped 2026-05-16/17** (PRs #89/#90/#91 + #93 + #95 + #97). **No data-schema-shape delta** — all 9 new flag identifiers are strings appended to existing `risk_flags: list[str]` (active vetoes) + `valuation_warnings: list[str]` (annotates) arrays. Active vetoes **5 → 7**: + `beneish_manipulation_veto` (Beneish 1999, M > −1.78) + `dechow_manipulation_veto` (Dechow 2011, F > 3.0). Annotates added: `manipulation_triple_flag` (4.5a joint gate, 2 fired: SMCI · WAT), `restatement_history` (4.5b, 59 fired / 11.8% — Hennes-Leone-Miller 2008 *TAR*), `late_filing_notification` (4.5b, 2 fired: HAS · Q — Bartov-Lai-Yeung 2002 *JAR*), `rem_suspect` (4.5c, 16 fired / 3.2% — Roychowdhury 2006 *JAE* 3-proxy REM via per-sector OLS), `accruals_momentum_high` (4.5d, 50 fired / 10.0% — Sloan 1996 / Beneish 1999 Δ(TATA) > +0.05 over 3y), `loss_avoidance_pattern` (4.5d, 0 fired — Burgstahler-Dichev 1997 cohort thresholds too tight for S&P 500 large-cap universe, file as follow-up). Also closes [issue #7](https://github.com/dackclup/quantrank/issues/7) (Sloan over-firing on Financials: 21.3% → 11.7%, sector spread 7.7× → 1.4×). 2 new cache dirs (`compute/cache/edgar_amendments/` + `compute/cache/edgar_late_filings/`, 7d TTL each). Test suite **646 → 831 offline**. Reason taxonomy: 24 stable + 2 Tier-3 + 2 new vetoes + 6 new annotates = **34 stable identifiers**. | +| **`0.9.0-phase4h`** (in flight in PR #112) | Phase 4h | **OSAP signal replication + PBO/DSR hard gate + Path-b composite × OSAP blend** (5-commit cluster on branch `claude/resume-quantrank-phase-4.5-Zh0pO`: 06bdac76 schema-foundation, b79983f6 osap_replicate proxy + 100-signal manifest, a6760d91 osap_blend Path-b, df4d9bd2 osap_validation PBO/DSR gate + rolling-12m-IC, [TBD] compute/main.py wiring + @network e2e). 
**Minor bump** — 6 new optional fields land simultaneously: `StockDetail.osap_signals: dict[str, float] \| None` + `StockDetail.osap_blended_score: float \| None`; `Metadata.osap_signals_used: list[str] \| None`, `Metadata.osap_excluded_signals: list[str] \| None`, `Metadata.osap_signals_ic_12m: dict[str, float] \| None`, `Metadata.osap_signals_coverage_pct: dict[str, float] \| None`. **OSAP blend stays OUTSIDE `compute_composite()`** — `PHASE3_WEIGHTS` sum-to-1.0 invariant (`compute/scoring/composite.py:43-45`) intact; Path-b formula `blended = (1 - weight) × composite_score + weight × osap_signal_aggregate`, default `weight=0.5` locked at `osap-integration/PLAN.md:168-170`. **Hard gate** = PBO ≤ 0.5 AND DSR > 0 via PR #60's `factor_passes_gates`; rolling-12m Spearman IC is observability-only (full walk-forward CV deferred to Phase 5 per `defense-infrastructure/PLAN.md:270`). **No new veto** (Top-5 still ranks raw `composite_score` per Rule 16; `osap_blended_score` is informational); defense layer stays at **17**. **Universe-gap policy** — tickers with no OSAP coverage pass `composite_score` through unchanged (no impute, distinct from pillar `neutralize_missing=True`). **NaN policy in PBO cohort** — zero-fill (not mean-fill, not dropna) preserves Bailey 2014 `n_trials = cohort_size` multiple-testing correction; sparse signals naturally lose on DSR (low Sharpe → DSR rejection). **OSAP failure is observability-only** — wrapped in try/except in `compute/main.py` so live-fetch / package failure NEVER blocks weekly production; all 6 new fields degrade to `None`. Test suite **856 → 906 offline + 18 → 19 `@network`** (commits 2-5 added 50 tests; e2e network test added in commit 5). Reason taxonomy unchanged at 34 stable identifiers. Tag `v1.1.0-phase4` (or `v1.3.0` for the 4.5e+4h combined release) deferred until 4i/4j/4k also merge. 
| | **`0.8.0-phase4.5f`** | Phase 4.5f | **Manipulation Composite + soft composite penalty + UI** (PR #100 merged 2026-05-17 on commit `b1588b2a`; production verified on commit `e57f09cb`, run #51, warm-cache 5m14s). **Minor bump** because 5 new optional fields land simultaneously + new UI surface ships + tag `v1.2.0-phase4.5` coordinates with the data-version bump (semver coupling). Additive optional fields: `StockSummary.manipulation_index: float \| None`, `StockSummary.composite_score_adjusted: float \| None`, `StockDetail.manipulation_index`, `StockDetail.composite_score_adjusted`, `StockDetail.manipulation_components: dict[str, bool] \| None`. **`manipulation_index`** is a 0-100 rollup over the 4.5a-d flag set via a per-flag additive weight table in `compute/scoring/manipulation_index.py::FLAG_WEIGHTS` (active vetoes 15-20 pts · joint-gate 10 · annotates 5-8 · Tier-3 soft 3); clipped to `[0, 100]`. **`composite_score_adjusted`** applies the soft penalty `composite − 0.5 × (index / 100) × 20` (max 10-pt deduction at index = 100); the original `composite_score` field is preserved untouched per Rule 9 audit trail. **Rank source stays the raw composite per Rule 16** — the adjusted value is informational only, surfaced on the new detail-page `ManipulationRiskCard` (3-band outlined-light: emerald LOW / amber MODERATE / rose HIGH) with the in-line qualifier "Composite penalty: −X.XX pts (informational; rank uses raw composite)". Production: 158/502 (31.5%) fire the card (HIGH 2: SMCI=84 · WAT=64; MODERATE 60; LOW 96). **Phase 4.5e reserved-slot weights declared** (`INSIDER_SELL_CLUSTER_WEIGHT_RESERVED = 10`, `C_SUITE_UNUSUAL_SELL_WEIGHT_RESERVED = 5`) — the 4.5e PR uncomments 2 entries in `FLAG_WEIGHTS`, no calibration cascade. Test suite **831 → 856 offline**. Reason taxonomy: 34 stable identifiers (unchanged — `manipulation_index` is a derivation, not a new flag). Tag **`v1.2.0-phase4.5`** ready to cut. 
| > Phase 4+ schemas are tracked in [`WORKFLOW.md`](WORKFLOW.md) "Defense diff --git a/compute/config.py b/compute/config.py index 5c0741e71..5bf93f0bb 100644 --- a/compute/config.py +++ b/compute/config.py @@ -27,7 +27,7 @@ MODELS_DIR: Path = PROJECT_ROOT / "models" UNIVERSE: str = "SP500" -SCHEMA_VERSION: str = "0.8.0-phase4.5f" +SCHEMA_VERSION: str = "0.9.0-phase4h" PRICES_PERIOD: str = "5y" MAX_PARALLEL_FETCHES: int = 10 @@ -181,3 +181,71 @@ # more often is wasted bandwidth. OSAP_RETURNS_CACHE: Path = CACHE_DIR / "osap" / "returns.parquet" OSAP_RETURNS_MAX_AGE_DAYS: int = 31 + +# --- Phase 4h: 100-signal manifest --- +# +# Theme buckets mirror the table at +# `.claude/skills/phase-4/osap-integration/PLAN.md` L60-73 +# (Value/Quality/Momentum/Investment/Risk/EarningsNews/Trading + +# Misc). CamelCase names follow the Chen-Zimmermann OSAP convention +# (see github.com/OpenSourceAP/CrossSection signal docs). +# +# Aspirational manifest — commit 4's PBO/DSR gate +# (`compute/validation/osap_validation.py`) will catch any signal that +# does not resolve in the fetched OSAP returns DataFrame and log it +# under `metadata.json::osap_excluded_signals` with reason +# `not_found_in_osap_dataset` so the manifest can be tuned over +# subsequent compute runs without a redeploy. 
+OSAP_SIGNALS_BY_THEME: dict[str, tuple[str, ...]] = { + "Value": ( + "BM", "EP", "SP", "CF", "DivYieldST", "NetEquityFinance", + "NetDebtFinance", "BookLeverage", "IntanBM", "IntanCFP", + "IntanEP", "IntanSP", "DebtIssuance", "OperatingLeverage", + "CompositeDebtIssuance", + ), # 15 + "Quality": ( + "GP", "RoE", "RoA", "AssetTurnover", "AOP", "OperatingProfit", + "RDS", "RD", "ProfitMargin", "CashProf", "GrcapxThreeYears", + "AccrualsBM", "OperatingAccruals", "PctTotAcc", "Cash", + ), # 15 + "Momentum": ( + "Mom12m", "Mom6m", "Mom36m", "Mom1m", "STreversal", "IndMom", + "IntMom", "EarnSupBig", "MomVol", "MomOffSeason", "MomSeason", + "Recomm_ShortInterest", + ), # 12 + "Investment": ( + "AssetGrowth", "ChNNCOA", "ChNWC", "GrLTNOA", "ChInv", + "ShareIss1Y", "ShareIss5Y", "GrSaleToGrInv", + ), # 8 + "Risk": ( + "MaxRet", "IdioVol3F", "IdioVolAHT", "BetaTailRisk", "Beta", + "BetaFP", "ReturnSkew", "ReturnSkew3F", "IndIPO", + "AbnormalAccruals", + ), # 10 + "EarningsNews": ( + "SUE", "EarningsSurprise", "REV6", "RDIPO", "NumEarnIncrease", + "ConsRecomm", "Recomm", "EarningsForecastDisparity", + ), # 8 + "Trading": ( + "Illiquidity", "Turnover", "Bid_Ask", "VolMkt", "VolSD", + "dVolCall", "Coskewness", + ), # 7 + "Misc": ( + "Leverage", "OrgCapital", "Tax", "ChAssetTurnover", "BAR", + "GS", "AnnouncementReturn", "OScore", "ZScore", "CredRatDG", + "FailureProbability", "IRA", "FR", "BPEBM", "Activism1", + "Activism2", "AnalystValue", "ChForecastAccrual", "ChInvIA", + "AnalystRevision", "ForecastDispersion", "GrowthCapEx", + "MeanRankRevGrowth", "AbnormalAccrualsPercent", "ChEQ", + ), # 25 +} + +OSAP_SIGNALS_100: tuple[str, ...] 
= tuple( + sig for theme_signals in OSAP_SIGNALS_BY_THEME.values() for sig in theme_signals +) +assert len(OSAP_SIGNALS_100) == 100, ( + f"OSAP_SIGNALS_100 must have exactly 100 entries, got {len(OSAP_SIGNALS_100)}" +) +assert len(set(OSAP_SIGNALS_100)) == 100, ( + "OSAP_SIGNALS_100 contains duplicate signal names" +) diff --git a/compute/features/osap_replicate.py b/compute/features/osap_replicate.py new file mode 100644 index 000000000..a98b72784 --- /dev/null +++ b/compute/features/osap_replicate.py @@ -0,0 +1,323 @@ +"""OpenAssetPricing (OSAP) per-stock signal replication. + +Phase 4h commit 2. Builds a per-ticker signal map from OSAP's +long-short portfolio returns (Chen-Zimmermann 2022 *Critical Finance +Review*, github.com/OpenSourceAP/CrossSection). The fetcher in +``compute/ingest/osap.py`` returns the bulk parquet of +``signalname × port × date`` rows; this module aligns ``port=01`` +(long bucket) against ``port=10`` (short bucket) per +``(signalname, date)``, picks the most-recent cross-section at or +before ``as_of``, ranks signals cross-sectionally by their long-short +return, and surfaces the per-signal rank as the ticker's OSAP +exposure proxy. + +**Scope note** (locked 2026-05-18 plan audit). This is the +*factor-exposure proxy* version: every ticker receives the same +signal map, derived from the market-wide OSAP long-short return at +``as_of``. True per-stock signal replication — porting the ~100 +signal formulas from OSAP's SAS / Stata source into pandas, fed by +our existing ``compute/features/`` pillar inputs — is the deferred +heavy lift. The proxy version is sufficient for Phase 4h's blend +target because: + +1. ``osap_blended_score`` is *observability-only* in this phase + (Top-5 ranking still uses ``composite_score``; SKILL.md Rule 16). +2. 
PR 4b §2 PBO/DSR gate + (``compute/validation/pbo_dsr.py::factor_passes_gates``) runs on + the long-short returns themselves, not the per-stock projection, + so signal acceptance is identical to the full version. +3. Per-stock replication of all 100 signals would slip Phase 4h by + weeks without unblocking 4i/4j/4k. + +If this module needs to graduate to true per-stock replication +later, the contract (``compute_osap_signals(returns, tickers, as_of) +-> dict[str, dict[str, float] | None]``) stays stable; only the +inner ``signal -> rank`` derivation changes per ticker. + +Universe-gap policy: tickers receive ``None`` (NOT zero, NOT an +imputed neutral) when the as-of cross-section is empty (e.g., +``as_of`` precedes OSAP coverage). Pillar +``compute_composite(neutralize_missing=True)`` imputes 50.0 for +missing pillars; OSAP intentionally does not: the blend layer +(commit 3, ``compute/scoring/osap_blend.py``) treats ``None`` as +"no OSAP adjustment" and passes ``composite_score`` through +unchanged. + +No tenacity / network access in this module; all I/O is delegated +to the ingest layer. +""" + +from __future__ import annotations + +import logging +from datetime import date + +import pandas as pd + +from compute import config + +logger = logging.getLogger(__name__) + +# Canonical port labels in OSAP's PredictorPortsFull.csv ("op" dataset +# from openassetpricing). port=01 is the LONG bucket (highest signal +# rank); port=10 is the SHORT bucket. Decile-bucketed signals also use +# port=02..09 but Phase 4h only consumes the corner buckets. +LONG_PORT_LABEL: str = "01" +SHORT_PORT_LABEL: str = "10" + +# Columns the inbound DataFrame must carry. This is a *load-bearing* +# contract: the ingest layer already enforces +# ``REQUIRED_COLUMNS`` (``signalname, port, date, ret``); this module +# re-checks those same four columns and requires nothing beyond them. 
+_REQUIRED_INPUT_COLUMNS: frozenset[str] = frozenset( + {"signalname", "port", "date", "ret"} +) + + +def _normalize_port_label(port_series: pd.Series) -> pd.Series: + """Coerce ``port`` to the canonical zero-padded string ('01', '02', + ..., '10'). + + OSAP's parquet may store ``port`` as int (1..10), int64, or + zero-padded string depending on the release. The pivot step below + is column-name sensitive, so we normalize once at the entry point + rather than scatter ``astype(str).str.zfill(2)`` across helpers. + """ + # Cast through str first to absorb any int / int64 / numpy.int64 / + # categorical input. zfill ensures '1' → '01' and '10' stays '10'. + return port_series.astype(str).str.zfill(2) + + +def compute_long_short_returns(returns: pd.DataFrame) -> pd.DataFrame: + """Compute long-short return per ``(signalname, date)``. + + Algorithm: + 1. Filter to rows where ``port`` is the LONG or SHORT bucket + (drops decile buckets 02..09). + 2. Pivot ``port`` to columns indexed by ``(signalname, date)``, + with ``ret`` as the value. + 3. Compute ``ls_return = ret[port=01] - ret[port=10]``. + 4. Drop ``(signalname, date)`` rows where either port is + missing (incomplete coverage). + + Returns a DataFrame with columns: ``signalname``, ``date``, + ``ls_return``. Empty DataFrame (same columns) when the input has + no valid long-short pairs. + """ + missing = _REQUIRED_INPUT_COLUMNS - set(returns.columns) + if missing: + raise ValueError( + f"compute_long_short_returns missing columns {sorted(missing)}; " + f"got {sorted(returns.columns)}. Check compute/ingest/osap.py " + f"REQUIRED_COLUMNS contract." 
+ ) + + if returns.empty: + return pd.DataFrame(columns=["signalname", "date", "ls_return"]) + + df = returns.copy() + df["port"] = _normalize_port_label(df["port"]) + df = df[df["port"].isin([LONG_PORT_LABEL, SHORT_PORT_LABEL])] + if df.empty: + return pd.DataFrame(columns=["signalname", "date", "ls_return"]) + + pivot = df.pivot_table( + index=["signalname", "date"], + columns="port", + values="ret", + aggfunc="first", + ) + + # Both corner buckets must be present for a long-short return to be + # meaningful. A signal-date with only port=01 (or only port=10) is + # silently dropped — coverage shortfall surfaces in the + # `osap_signals_coverage_pct` metadata field. + if LONG_PORT_LABEL not in pivot.columns or SHORT_PORT_LABEL not in pivot.columns: + return pd.DataFrame(columns=["signalname", "date", "ls_return"]) + + pivot["ls_return"] = pivot[LONG_PORT_LABEL] - pivot[SHORT_PORT_LABEL] + pivot = pivot.dropna(subset=["ls_return"]) + return pivot[["ls_return"]].reset_index() + + +def select_as_of_cross_section( + ls_returns: pd.DataFrame, as_of: date +) -> pd.DataFrame: + """For each signal, pick the most recent ``ls_return`` at or before + ``as_of``. + + OSAP releases monthly — ``as_of`` is typically the most recent + month-end. Signals whose latest available observation precedes + ``as_of`` by more than one release cycle still surface (the staleness + is intentional — Phase 4h's universe coverage is a separate metric). + Signals with no observation at or before ``as_of`` are dropped. + + Returns a DataFrame with columns: ``signalname``, ``date``, + ``ls_return``. Empty DataFrame when the entire window is empty. 
+ """ + if ls_returns.empty: + return pd.DataFrame(columns=["signalname", "date", "ls_return"]) + + as_of_ts = pd.Timestamp(as_of) + df = ls_returns.copy() + df["_date_ts"] = pd.to_datetime(df["date"]) + df = df[df["_date_ts"] <= as_of_ts] + if df.empty: + return pd.DataFrame(columns=["signalname", "date", "ls_return"]) + + # For each signal, idxmax over the timestamp picks the most recent + # observation at or before as_of. On a tie (same signal observed + # twice on the same date) idxmax keeps the first row, which is + # acceptable because OSAP releases are monthly and per-signal + # uniqueness on (signalname, date) is contractually enforced upstream. + idx = df.groupby("signalname")["_date_ts"].idxmax() + cross_section = ( + df.loc[idx, ["signalname", "date", "ls_return"]] + .sort_values("signalname") + .reset_index(drop=True) + ) + return cross_section + + +def rank_signals_cross_sectional(cross_section: pd.DataFrame) -> pd.Series: + """Rank signals by ``ls_return`` cross-sectionally, normalised to + ``(0, 1]``. + + Uses ``pandas.Series.rank(method='average', pct=True)``: + average rank for ties, percentile-normalised. No scipy dependency. + + Returns a Series indexed by ``signalname`` whose values are in + ``(0, 1]`` (the smallest possible value is ``1/n``, so 0 is never + returned). Empty Series (dtype float, name 'rank') when the input + cross-section is empty. + """ + if cross_section.empty: + return pd.Series(dtype=float, name="rank") + + ranks = cross_section.set_index("signalname")["ls_return"].rank( + method="average", pct=True + ) + ranks.name = "rank" + return ranks + + +def compute_osap_signals( + returns: pd.DataFrame, + tickers: list[str], + as_of: date, + requested_signals: tuple[str, ...] | None = None, +) -> dict[str, dict[str, float] | None]: + """Build the per-ticker OSAP signal map for ``as_of``. + + Args: + returns: DataFrame from + ``compute/ingest/osap.py::fetch_osap_returns``. Must include + columns: ``signalname``, ``port``, ``date``, ``ret``. + tickers: ticker symbols to populate the map for. 
Tickers are + **not** filtered or validated — Phase 4h's universe set is + assumed to be passed in. + as_of: cross-section date. The most recent observation at or + before this date is used per signal. + requested_signals: optional restriction to a subset of signals. + Defaults to ``config.OSAP_SIGNALS_100`` (the 100-signal + manifest). + + Returns: + ``{ticker: {signalname: rank} | None}``. + + - Inner-dict values are floats in ``(0, 1]`` (cross-sectional + rank of the long-short return at ``as_of``). + - Outer values are ``None`` when the as-of cross-section is + empty (e.g., ``as_of`` precedes OSAP coverage, or no requested + signal has any data). Distinct from pillar + ``neutralize_missing`` — the blend layer treats ``None`` as + "no OSAP adjustment" and passes ``composite_score`` through. + + Phase 4h commit 2 implements the *factor-exposure proxy*: every + ticker receives the same signal map, derived from the market-wide + OSAP long-short return cross-section. See the module docstring for + why this is sufficient for the Phase 4h blend target. 
+ """ + if requested_signals is None: + requested_signals = config.OSAP_SIGNALS_100 + + requested_set = set(requested_signals) + none_map: dict[str, dict[str, float] | None] = {t: None for t in tickers} + + if returns.empty: + logger.info("OSAP returns DataFrame empty; returning None for all tickers") + return none_map + + df = returns[returns["signalname"].isin(requested_set)] + if df.empty: + logger.info( + "No requested signals found in OSAP returns (manifest=%d, " + "available=%d); returning None for all tickers", + len(requested_set), + returns["signalname"].nunique(), + ) + return none_map + + ls = compute_long_short_returns(df) + if ls.empty: + logger.info( + "compute_long_short_returns produced empty cross-section for as_of=%s", + as_of, + ) + return none_map + + cs = select_as_of_cross_section(ls, as_of) + if cs.empty: + logger.info( + "Cross-section at as_of=%s is empty (likely as_of precedes coverage)", + as_of, + ) + return none_map + + ranks = rank_signals_cross_sectional(cs) + if ranks.empty: + return none_map + + # Factor-exposure proxy: every ticker gets the same signal map. + # Phase 4h commit 2 design — see module docstring for rationale. + signal_map: dict[str, float] = { + str(sig): float(rank) for sig, rank in ranks.items() + } + logger.info( + "OSAP signals populated: %d signals × %d tickers (proxy mode)", + len(signal_map), + len(tickers), + ) + # Use a fresh dict per ticker so downstream mutation doesn't leak + # across rows (defensive — Pydantic deepcopies on validate, but the + # writer path may not). + return {t: dict(signal_map) for t in tickers} + + +def coverage_by_signal( + signal_map: dict[str, dict[str, float] | None], +) -> dict[str, float]: + """Report per-signal coverage % across the populated ticker set. + + In the Phase 4h commit 2 *proxy* mode every ticker gets the same + signal map, so per-signal coverage is binary: either 100.0 (signal + present in the cross-section) or 0.0 (signal absent or all-None + tickers). 
Surfaced into + ``metadata.json::osap_signals_coverage_pct`` by commit 5's + ``compute/main.py`` wiring. + + Returns ``{signalname: coverage_pct}`` covering only signals that + appeared in at least one ticker's map. Empty dict when all tickers + are ``None``. + """ + total = len(signal_map) + if total == 0: + return {} + + counts: dict[str, int] = {} + for sig_dict in signal_map.values(): + if sig_dict is None: + continue + for sig in sig_dict: + counts[sig] = counts.get(sig, 0) + 1 + + return {sig: 100.0 * count / total for sig, count in counts.items()} diff --git a/compute/ingest/osap.py b/compute/ingest/osap.py index c10d1ccae..15aeee8a6 100644 --- a/compute/ingest/osap.py +++ b/compute/ingest/osap.py @@ -27,6 +27,7 @@ import logging import time +from datetime import date import openassetpricing import pandas as pd @@ -78,12 +79,23 @@ def _is_fresh(cache_path, max_age_days: int) -> bool: return age_days < max_age_days -def fetch_osap_returns(force_refresh: bool = False) -> pd.DataFrame: +def fetch_osap_returns( + force_refresh: bool = False, + *, + signals: list[str] | None = None, + as_of: date | None = None, +) -> pd.DataFrame: """Return OSAP long-short portfolio returns, hitting the cache when fresh. Returns a DataFrame whose columns include at minimum - ``REQUIRED_COLUMNS``. The scout PR enforces that contract; Phase 4h - will add keyword-only ``signals`` / ``as_of`` filters (non-breaking). + ``REQUIRED_COLUMNS``. When ``signals`` is provided, the returned + frame is filtered to rows whose ``signalname`` is in the list (the + cache always stores the full bulk parquet — filtering happens + post-load so a callsite that asks for 20 signals doesn't invalidate + a callsite that asks for all 1,188). When ``as_of`` is provided, + rows whose ``date`` is after ``as_of`` are dropped — Phase 4h's + replication callers use this to keep the cross-section honest + against the as-of point-in-time. 
""" cache = config.OSAP_RETURNS_CACHE if not force_refresh and _is_fresh(cache, config.OSAP_RETURNS_MAX_AGE_DAYS): @@ -103,4 +115,9 @@ def fetch_osap_returns(force_refresh: bool = False) -> pd.DataFrame: f"OSAP returns missing required columns {sorted(missing)}; " f"got {sorted(df.columns)}. Upstream API may have changed." ) + + if signals is not None: + df = df[df["signalname"].isin(signals)] + if as_of is not None: + df = df[pd.to_datetime(df["date"]) <= pd.Timestamp(as_of)] return df diff --git a/compute/main.py b/compute/main.py index 8b5ecb069..a2ed74980 100644 --- a/compute/main.py +++ b/compute/main.py @@ -43,6 +43,11 @@ import pandas as pd from compute import config +from compute.features.osap_replicate import ( + compute_long_short_returns, + compute_osap_signals, + coverage_by_signal, +) from compute.ingest.cross_source import ( validate_market_cap as cross_source_validate_market_cap, ) @@ -52,6 +57,7 @@ fetch_fundamentals, fetch_fundamentals_history, ) +from compute.ingest.osap import fetch_osap_returns from compute.ingest.prices import fetch_prices, fetch_spy_benchmark from compute.ingest.universe import get_sp500_constituents from compute.output.schemas import ( @@ -86,6 +92,7 @@ compute_manipulation_index, manipulation_components, ) +from compute.scoring.osap_blend import aggregate_osap_signals, apply_osap_blend from compute.scoring.pillars import TickerInputs, compute_all_pillars from compute.scoring.recommendation import derive_recommendation from compute.scoring.rem import compute_rem_flags @@ -103,6 +110,11 @@ from compute.scoring.tier2 import ( coverage_pct as tier2_coverage_pct_calc, ) +from compute.validation.osap_validation import ( + compute_rolling_ic_12m, + filter_accepted_signals, + gate_osap_signals, +) from compute.valuation.ensemble import ( EnsembleResult, compute_fair_price_ensemble, @@ -929,6 +941,107 @@ def run_weekly_compute() -> int: now = _now_utc() asof_date = now.date() + # Phase 4h — OSAP signal replication + PBO/DSR gate + 
Path-b blend. + # Observability-only this phase: Top-5 ranking still uses raw + # ``composite_score`` per SKILL.md Rule 16. The blend writes a + # ``composite_score_osap_adjusted`` per ticker into + # ``StockDetail.osap_blended_score`` for delta-attribution. Wrapped + # in try/except so OSAP fetch / library / network failure NEVER + # blocks weekly production — every OSAP-bearing field degrades to + # ``None`` on the schema (already ``| None = None`` in + # ``compute/output/schemas.py``). + osap_signals_used: list[str] = [] + osap_excluded_signals: list[str] = [] + osap_signals_ic_12m: dict[str, float] = {} + osap_signal_map: dict[str, dict[str, float] | None] = {} + osap_signals_coverage_pct: dict[str, float] = {} + composite_osap_adjusted: pd.Series = pd.Series(dtype=float) + try: + logger.info( + "Phase 4h — fetching OSAP returns for %d-signal manifest " + "(as_of=%s)", + len(config.OSAP_SIGNALS_100), + asof_date.isoformat(), + ) + osap_returns_raw = fetch_osap_returns( + signals=list(config.OSAP_SIGNALS_100), + as_of=asof_date, + ) + osap_ls = compute_long_short_returns(osap_returns_raw) + logger.info( + "OSAP long-short rows: %d across %d signals", + len(osap_ls), + osap_ls["signalname"].nunique() if not osap_ls.empty else 0, + ) + + gate_results = gate_osap_signals( + osap_ls, + requested_signals=config.OSAP_SIGNALS_100, + ) + osap_signals_used, osap_excluded_signals = filter_accepted_signals( + gate_results + ) + logger.info( + "OSAP PBO/DSR gate: %d accepted, %d excluded " + "(of %d candidates)", + len(osap_signals_used), + len(osap_excluded_signals), + len(gate_results), + ) + + # Rolling-12m Spearman IC per accepted signal — observability only, + # NOT a gate decision (canonical full walk-forward + purged-embargo + # CV is deferred to Phase 5 per defense-infrastructure/PLAN.md:270). 
+ for sig in osap_signals_used: + ic = compute_rolling_ic_12m(osap_ls, sig) + if ic is not None: + osap_signals_ic_12m[sig] = round(float(ic), 4) + + # Per-ticker signal map (commit 2 proxy mode — every ticker gets + # the market-wide cross-sectional rank). Only the accepted signal + # subset is consumed; excluded signals never blend. + if osap_signals_used: + osap_filtered_returns = osap_returns_raw[ + osap_returns_raw["signalname"].isin(osap_signals_used) + ] + osap_signal_map = compute_osap_signals( + osap_filtered_returns, + tickers=list(pillar_df.index), + as_of=asof_date, + requested_signals=tuple(osap_signals_used), + ) + osap_signals_coverage_pct = { + sig: round(pct, 2) + for sig, pct in coverage_by_signal(osap_signal_map).items() + } + + # Path-b blend (commit 3) — applied OUTSIDE compute_composite() + # so PHASE3_WEIGHTS sum-to-1.0 invariant at composite.py:43-45 + # stays intact. 50/50 default locked in + # osap-integration/PLAN.md:168-170. + osap_aggregate = aggregate_osap_signals(osap_signal_map) + composite_osap_adjusted = apply_osap_blend( + composite, osap_aggregate + ) + else: + logger.warning( + "OSAP gate accepted 0 signals — skipping per-ticker map + " + "blend; osap_blended_score will be None for every ticker" + ) + except Exception as e: # noqa: BLE001 + logger.warning( + "OSAP pipeline failed (observability-only — production " + "continues); StockDetail.osap_* + metadata.osap_* → None. " + "Error: %s", + e, + ) + osap_signals_used = [] + osap_excluded_signals = [] + osap_signals_ic_12m = {} + osap_signal_map = {} + osap_signals_coverage_pct = {} + composite_osap_adjusted = pd.Series(dtype=float) + # Step 8 — combined per-ticker loop: fair-price ensemble + price history # write + StockSummary + StockDetail. 
Single pass so per-ticker outputs # stay synchronized (e.g., has_history reflects the actual write result; @@ -1229,6 +1342,13 @@ def run_weekly_compute() -> int: manipulation_index=m_index, composite_score_adjusted=composite_adj, manipulation_components=m_components, + osap_signals=osap_signal_map.get(ticker), + osap_blended_score=( + round(float(composite_osap_adjusted[ticker]), 2) + if ticker in composite_osap_adjusted.index + and not pd.isna(composite_osap_adjusted[ticker]) + else None + ), entered_top5=ticker in entered, exited_top5=ticker in exited, ) @@ -1268,6 +1388,10 @@ def run_weekly_compute() -> int: fundamentals_latency_p95_seconds=( round(fundamentals_p95, 2) if fundamentals_p95 is not None else None ), + osap_signals_used=osap_signals_used or None, + osap_excluded_signals=osap_excluded_signals or None, + osap_signals_ic_12m=osap_signals_ic_12m or None, + osap_signals_coverage_pct=osap_signals_coverage_pct or None, ) config.DATA_DIR.mkdir(parents=True, exist_ok=True) diff --git a/compute/output/schemas.py b/compute/output/schemas.py index a4d6a4508..f48d3484f 100644 --- a/compute/output/schemas.py +++ b/compute/output/schemas.py @@ -92,6 +92,10 @@ class Metadata(BaseModel): fundamentals_coverage_pct: float | None = None fundamentals_latency_p50_seconds: float | None = None fundamentals_latency_p95_seconds: float | None = None + osap_signals_used: list[str] | None = None + osap_excluded_signals: list[str] | None = None + osap_signals_ic_12m: dict[str, float] | None = None + osap_signals_coverage_pct: dict[str, float] | None = None class RawMetrics(BaseModel): @@ -159,5 +163,7 @@ class StockDetail(BaseModel): manipulation_index: float | None = None composite_score_adjusted: float | None = None manipulation_components: dict[str, bool] | None = None + osap_signals: dict[str, float] | None = None + osap_blended_score: float | None = None entered_top5: bool = False exited_top5: bool = False diff --git a/compute/scoring/osap_blend.py 
b/compute/scoring/osap_blend.py new file mode 100644 index 000000000..d5d198673 --- /dev/null +++ b/compute/scoring/osap_blend.py @@ -0,0 +1,148 @@ +"""Phase 4h commit 3 — composite × OSAP signal-aggregate blend (Path-b). + +Stays OUTSIDE :func:`compute.scoring.composite.compute_composite` so the +``PHASE3_WEIGHTS`` sum-to-1.0 invariant at ``compute/scoring/composite.py: +43-45`` is not extended. Adding a 9th slot to ``PHASE3_WEIGHTS`` for OSAP +would either fail the invariant or force a redistribution of the eight +active pillars — both of which alter Phase 3 composite math +retroactively. Path-b applies the OSAP correction *after* the pillar +composite is computed, leaving the composite layer untouched. + +Formula (osap-integration/PLAN.md:168-170, locked 2026-05-18 plan +audit):: + + blended = (1 - weight) * composite_score + weight * osap_signal_aggregate + +where ``osap_signal_aggregate`` is a 0-100 per-ticker aggregate of the +accepted OSAP signal map produced by +:func:`compute.features.osap_replicate.compute_osap_signals` (cross- +sectional rank of the long-short return at ``as_of``, mean-pooled per +ticker, scaled to ``[0, 100]``). + +**Universe-gap policy** — tickers with ``None`` signal map (or NaN +aggregate) **pass composite_score through unchanged**. No impute, +distinct from pillar ``compute_composite(neutralize_missing=True)``. +Rationale: in Phase 4h, an OSAP-blank ticker is genuinely "no +information added" rather than "no information available" — imputing +50.0 would silently shrink its composite toward neutral and bias +Top-5 against names that lack OSAP coverage. + +**Observability-only this phase** — :func:`apply_osap_blend` writes +``composite_score_osap_adjusted`` into ``StockDetail.osap_blended_score`` +but Top-5 ranking still uses raw ``composite_score`` per SKILL.md +Rule 16. Phase 5 ML meta-learner is where 50/50 may be retuned and the +cutover authorized. + +No I/O, no tenacity, no network access — pure pandas / numpy. 
+""" + +from __future__ import annotations + +import logging + +import numpy as np +import pandas as pd + +logger = logging.getLogger(__name__) + +# Locked 50/50 default per osap-integration/PLAN.md:168-170. Phase 5 +# ML meta-learner is the next layer where this can move. +OSAP_BLEND_WEIGHT_DEFAULT: float = 0.5 + + +def aggregate_osap_signals( + signal_map: dict[str, dict[str, float] | None], +) -> pd.Series: + """Mean-pool a ticker's OSAP signal ranks into a single 0-100 score. + + Each ticker's inner map is ``{signalname: rank}`` where rank is the + cross-sectional ``(0, 1]`` rank from + :func:`compute.features.osap_replicate.rank_signals_cross_sectional`. + Aggregation is the arithmetic mean × 100. + + Args: + signal_map: ``{ticker: {signalname: rank} | None}`` — exactly the + shape returned by ``compute_osap_signals``. + + Returns: + Series indexed by ticker, dtype float, name + ``osap_signal_aggregate``. NaN for tickers whose inner map is + ``None`` or empty (universe gap). Empty Series when the input + dict has no entries. + """ + if not signal_map: + return pd.Series(dtype=float, name="osap_signal_aggregate") + + out: dict[str, float] = {} + for ticker, sigs in signal_map.items(): + if sigs is None or len(sigs) == 0: + out[ticker] = float("nan") + continue + mean_rank = float(np.mean(list(sigs.values()))) + out[ticker] = 100.0 * mean_rank + + return pd.Series(out, dtype=float, name="osap_signal_aggregate") + + +def apply_osap_blend( + composite_scores: pd.Series, + osap_signal_aggregate: pd.Series, + weight: float = OSAP_BLEND_WEIGHT_DEFAULT, +) -> pd.Series: + """Blend pillar composite × OSAP aggregate per ticker (Phase 4h Path-b). + + Formula:: + + blended = (1 - weight) * composite_score + weight * osap_signal_aggregate + + Tickers whose ``osap_signal_aggregate`` is NaN (after reindex to the + composite index) pass their raw ``composite_score`` through + unchanged. 
The result is clipped to ``[0, 100]`` to match the + composite-score domain (the writer / Pydantic schema both expect + ``[0, 100]``). + + Args: + composite_scores: Series indexed by ticker, values nominally in + ``[0, 100]``. The output index mirrors this index. + osap_signal_aggregate: Series indexed by ticker, values in + ``[0, 100]`` or NaN. Tickers present here but not in + ``composite_scores`` are silently dropped during reindex. + weight: blend weight in ``[0, 1]``. Default + :data:`OSAP_BLEND_WEIGHT_DEFAULT` (0.5) per + osap-integration/PLAN.md:168-170. + + Returns: + Series indexed by ``composite_scores.index``, dtype float, name + ``composite_score_osap_adjusted``. Clipped to ``[0, 100]``. + + Raises: + ValueError: if ``weight`` is outside ``[0, 1]``. + """ + if not (0.0 <= weight <= 1.0): + raise ValueError(f"OSAP blend weight must be in [0, 1], got {weight}") + + if composite_scores.empty: + return pd.Series(dtype=float, name="composite_score_osap_adjusted") + + # Align OSAP aggregate to composite index; missing tickers become NaN. + aligned_osap = osap_signal_aggregate.reindex(composite_scores.index) + + # NaN-safe Path-b blend. ``raw_blend`` will be NaN wherever OSAP is + # NaN, but ``.where(cond, other)`` keeps ``composite_scores`` at those + # positions — yielding the documented universe-gap pass-through. 
+ raw_blend = (1.0 - weight) * composite_scores + weight * aligned_osap + blended = raw_blend.where(~aligned_osap.isna(), composite_scores) + + blended = blended.clip(lower=0.0, upper=100.0) + blended.name = "composite_score_osap_adjusted" + + coverage = int((~aligned_osap.isna()).sum()) + logger.info( + "OSAP blend applied: weight=%.2f, %d/%d tickers OSAP-covered, " + "%d passed-through (universe gap)", + weight, + coverage, + len(composite_scores), + len(composite_scores) - coverage, + ) + return blended diff --git a/compute/validation/osap_validation.py b/compute/validation/osap_validation.py new file mode 100644 index 000000000..2655d6f21 --- /dev/null +++ b/compute/validation/osap_validation.py @@ -0,0 +1,380 @@ +"""Phase 4h commit 4 — PBO/DSR hard gate for OSAP signal acceptance. + +Wraps :func:`compute.validation.pbo_dsr.factor_passes_gates` +(shipped in PR #60) per signal. No PBO or DSR math is reimplemented +here — this module only handles the *cohort framing* (wide pivot, +NaN policy, per-signal partitioning) needed to feed the existing +gate primitives correctly. + +The gate is the linchpin of Phase 4h: 100 candidate OSAP signals +enter; only those passing PBO ≤ 0.5 AND DSR > 0 are blended into +``composite_score_osap_adjusted`` by commit 3's +:func:`compute.scoring.osap_blend.apply_osap_blend`. Signals rejected +here are surfaced via :data:`Metadata.osap_excluded_signals` so the +filter is fully auditable. + +NaN policy — LOCKED 2026-05-18 post-source-audit of pbo_dsr.py +============================================================== + +The two underlying primitives have asymmetric NaN tolerance: + +- :func:`compute.validation.pbo_dsr.compute_pbo` (the cohort gate) + is **NaN-UNSAFE**. Internally + (``compute/validation/pbo_dsr.py:234``) it converts the cohort + matrix to ``float`` numpy via ``.to_numpy(dtype=float)``, then + computes ``.mean(axis=0)`` / ``.std(axis=0)`` (L256-257) and + ``np.argmax`` (L261). 
Any NaN cell silently corrupts the argmax + selection. +- :func:`compute.validation.pbo_dsr.compute_deflated_sharpe` (the + per-signal gate) is **NaN-SAFE**. L323 strips NaN before computing + Sharpe / skew / kurtosis: ``arr = arr[~np.isnan(arr)]``. + +Because :func:`factor_passes_gates` takes the per-signal +``factor_returns`` and the cohort ``returns_matrix`` independently, +this wrapper feeds them **different NaN treatments**: + +1. ``factor_returns`` ← ``wide[sig].dropna()`` — DSR's internal strip + handles it. No information lost. +2. ``returns_matrix`` ← ``wide.fillna(0.0)`` — zero-fill, NOT + mean-fill, NOT ``dropna(how='any')``. + +Why **zero-fill** the cohort (rejecting the two competing options): + +- ``dropna(how='any')`` decimates a 100-signal × monthly matrix + below ``n_partitions=16`` rows once any earnings-event-only signal + is included. The Bailey 2014 multiple-testing correction + ``n_trials = cohort_size`` collapses. +- ``fillna(column_mean)`` deflates per-signal variance, inflates + Sharpe, biases PBO toward false acceptance — and silently + rewards sparse signals for low coverage. +- ``fillna(0.0)`` is the honest OSAP-semantic interpretation: + absence-of-coverage for ``(signal, month)`` means "no portfolio + formed / no information generated that month" — zero return is + the right proxy. Bailey 2014 PBO is rank-based across strategies + *within* each period; zero-imputation symmetrically pushes + coverage-gap rows toward indeterminate cross-sectional rank, + which honestly reflects "no information added". + +Trade-off acknowledged: sparse-coverage signals (e.g., earnings-event- +only) see their Sharpe shrunk toward zero by the zero-fill, raising +their DSR rejection probability. This is cohort-fair but penalizes +legitimate event-only signals. Phase 4h scope accepts this — the +Phase 5 backtest harness (``defense-infrastructure/PLAN.md:270``) +runs full walk-forward CV per signal and replaces this gate when it +ships. 
+ +Standalone module +================= + +Does NOT import from :mod:`compute.features.osap_replicate` (commit +2), :mod:`compute.scoring.osap_blend` (commit 3), or +:mod:`compute.main`. Validation runs on the long-short returns +DataFrame contract only (columns: ``signalname``, ``date``, +``ls_return``). This keeps the gate testable without the feature / +blend stack and lets commit 5's ``compute/main.py`` wire the three +layers independently. +""" + +from __future__ import annotations + +import logging +from dataclasses import dataclass +from typing import Final + +import pandas as pd + +from compute.validation.pbo_dsr import ( + ANNUALIZATION_FACTOR_MONTHLY, + DEFAULT_N_PARTITIONS, + DSR_VETO_THRESHOLD, + PBO_VETO_THRESHOLD, + factor_passes_gates, +) + +logger = logging.getLogger(__name__) + +# Rolling-IC observability window (lag-1 Spearman over last 12 months). +# OBSERVABILITY ONLY — never gates acceptance. See module docstring. +ROLLING_IC_WINDOW_MONTHS: Final[int] = 12 + +# Per-signal observation floor before PBO/DSR is even attempted. Mirrors +# DEFAULT_N_PARTITIONS so that any signal with fewer non-NaN obs than +# PBO's partition requirement short-circuits to ``insufficient_data``. +MIN_OBS_PER_SIGNAL: Final[int] = DEFAULT_N_PARTITIONS + + +@dataclass(frozen=True) +class GateResult: + """Per-signal verdict + metrics from the PBO/DSR gate. + + All three of ``pbo`` / ``dsr`` / ``sharpe`` are ``None`` when the + signal short-circuited on ``insufficient_data`` — no PBO/DSR call + was made. ``n_observations`` reports the count of non-NaN inputs + that *would have been* used (informational; matches commit-5's + coverage logging needs). + + ``rejection_reason`` is ``None`` when ``accepted=True``. 
Otherwise + one of: + + - ``'high_pbo'`` — PBO exceeded the threshold (overfit risk) + - ``'low_dsr'`` — Deflated Sharpe failed the threshold + - ``'gate_failed'`` — both PBO and DSR failed (rare; surfaced as + a distinct category for diagnostic clarity) + - ``'insufficient_data'`` — per-signal obs < ``MIN_OBS_PER_SIGNAL`` + or cohort size < 2 + """ + + accepted: bool + pbo: float | None + dsr: float | None + sharpe: float | None + n_observations: int + rejection_reason: str | None + + +def _pivot_to_wide( + long_short: pd.DataFrame, + requested: tuple[str, ...] | None = None, +) -> pd.DataFrame: + """Coerce commit-2's long-format DF to a wide (date × signal) matrix. + + Commit 2's ``compute_long_short_returns`` (see + ``compute/features/osap_replicate.py:140``) emits the ``date`` + column as ``object`` (string from the OSAP parquet pivot). We + explicitly coerce to ``pd.Timestamp`` here so chronological + ordering is reliable for the cohort matrix. + + Args: + long_short: DataFrame with columns + ``{signalname, date, ls_return}``. + requested: optional restriction to a subset of signals. When + ``None`` (default), all signals present in the input DF + are kept. + + Returns: + Wide DataFrame indexed by ``pd.Timestamp`` (sorted), columns = + signalname, values = ls_return. Cells where a signal had no + ``ls_return`` for a given date are NaN — caller decides the + NaN policy (this function never fills). + """ + if long_short.empty: + return pd.DataFrame() + + df = long_short.copy() + df["date"] = pd.to_datetime(df["date"]) + if requested is not None: + df = df[df["signalname"].isin(set(requested))] + if df.empty: + return pd.DataFrame() + + wide = df.pivot_table( + index="date", + columns="signalname", + values="ls_return", + aggfunc="first", + ) + return wide.sort_index() + + +def gate_osap_signals( + long_short_returns: pd.DataFrame, + requested_signals: tuple[str, ...] 
| None = None, + pbo_threshold: float = PBO_VETO_THRESHOLD, + dsr_threshold: float = DSR_VETO_THRESHOLD, + n_partitions: int = DEFAULT_N_PARTITIONS, +) -> dict[str, GateResult]: + """Apply the PBO/DSR hard gate per signal. + + Cohort framing follows Bailey 2014: ``n_trials = wide.shape[1]`` + (the count of candidate signals) so DSR's multiple-testing + correction reflects the full screen. PBO operates on the + zero-filled cohort matrix (see module docstring §NaN policy). + + Args: + long_short_returns: commit-2 output. DataFrame with columns + ``{signalname, date, ls_return}``. + requested_signals: optional subset of signal names to gate. + ``None`` (default) gates every signal present in the input. + pbo_threshold: PBO must satisfy ``≤ pbo_threshold`` to pass. + Default :data:`PBO_VETO_THRESHOLD` (= 0.5). + dsr_threshold: DSR must satisfy ``> dsr_threshold`` to pass. + Default :data:`DSR_VETO_THRESHOLD` (= 0.0). + n_partitions: PBO partition count. Default + :data:`DEFAULT_N_PARTITIONS` (= 16). + + Returns: + ``{signalname: GateResult}``. Keys cover every signal that + appeared in the (possibly ``requested_signals``-filtered) + input. Empty dict when the input is empty or no requested + signal exists in the input. + """ + wide = _pivot_to_wide(long_short_returns, requested_signals) + + if wide.empty: + logger.warning( + "OSAP gate input empty after pivot — no signals to gate" + ) + return {} + + cohort_size = wide.shape[1] + n_dates = len(wide) + + # Cohort-level precondition: PBO needs at least ``n_partitions`` rows + # and 2 strategy columns. If either fails, every signal short- + # circuits to insufficient_data. + cohort_too_small = cohort_size < 2 or n_dates < n_partitions + + # Zero-fill ONCE for the cohort matrix passed to PBO. See module + # docstring §NaN policy for the rationale. The per-signal + # ``factor_returns`` argument still uses dropna() (DSR strips NaN + # internally, no information lost). 
+ cohort_matrix = wide.fillna(0.0) + + results: dict[str, GateResult] = {} + + for sig in wide.columns: + signal_series = wide[sig] + non_nan_obs = int(signal_series.notna().sum()) + + if cohort_too_small or non_nan_obs < MIN_OBS_PER_SIGNAL: + results[str(sig)] = GateResult( + accepted=False, + pbo=None, + dsr=None, + sharpe=None, + n_observations=non_nan_obs, + rejection_reason="insufficient_data", + ) + continue + + passes, metrics = factor_passes_gates( + factor_returns=signal_series.dropna(), + returns_matrix=cohort_matrix, + n_trials=cohort_size, + n_partitions=n_partitions, + pbo_threshold=pbo_threshold, + dsr_threshold=dsr_threshold, + annualization=ANNUALIZATION_FACTOR_MONTHLY, + ) + + if passes: + reason: str | None = None + elif not metrics["pbo_passes"] and not metrics["dsr_passes"]: + reason = "gate_failed" + elif not metrics["pbo_passes"]: + reason = "high_pbo" + elif not metrics["dsr_passes"]: + reason = "low_dsr" + else: + # Defensive — passes=False but both sub-passes=True should be + # impossible per the and-conjunction in factor_passes_gates. + reason = "gate_failed" + + results[str(sig)] = GateResult( + accepted=bool(passes), + pbo=float(metrics["pbo"]), + dsr=float(metrics["dsr"]), + sharpe=float(metrics["sharpe"]), + n_observations=int(metrics["n_observations"]), + rejection_reason=reason, + ) + + n_accepted = sum(1 for r in results.values() if r.accepted) + logger.info( + "OSAP PBO/DSR gate: %d of %d signals accepted (cohort_size=%d, " + "n_dates=%d, pbo_threshold=%.2f, dsr_threshold=%.2f)", + n_accepted, + len(results), + cohort_size, + n_dates, + pbo_threshold, + dsr_threshold, + ) + return results + + +def compute_rolling_ic_12m( + long_short_returns: pd.DataFrame, + signalname: str, +) -> float | None: + """Spearman rank correlation between LS return at ``t`` and ``t+1``. + + Observability metric only — :func:`gate_osap_signals` does not + consult this. 
Surfaced via :data:`Metadata.osap_signals_ic_12m` + for the UI / debug audit; the full walk-forward + purged + + embargoed CV that would replace it is Phase 5 work per + ``defense-infrastructure/PLAN.md:270``. + + Pure pandas — no scipy dependency. Matches the + ``pbo_dsr.py`` precedent (Beasley-Springer-Moro inverse normal CDF + is hand-rolled there to avoid scipy). + + Args: + long_short_returns: commit-2 output. DataFrame with columns + ``{signalname, date, ls_return}``. + signalname: target signal to compute IC for. + + Returns: + Spearman lag-1 IC over the most recent + :data:`ROLLING_IC_WINDOW_MONTHS` observations, as ``float``. + ``None`` when the signal has fewer than 12 valid + ``(t, t+1)`` pairs (insufficient history). + """ + df = long_short_returns[ + long_short_returns["signalname"] == signalname + ].copy() + if df.empty: + return None + + df["date"] = pd.to_datetime(df["date"]) + df = df.sort_values("date").tail(ROLLING_IC_WINDOW_MONTHS + 1) + + if len(df) < ROLLING_IC_WINDOW_MONTHS + 1: + return None + + ret = df["ls_return"].astype(float).reset_index(drop=True) + lead = ret.shift(-1) + valid = ret.notna() & lead.notna() + + if int(valid.sum()) < ROLLING_IC_WINDOW_MONTHS: + return None + + ranks_t = ret[valid].rank(method="average") + ranks_t1 = lead[valid].rank(method="average") + corr = ranks_t.corr(ranks_t1) + if pd.isna(corr): + return None + return float(corr) + + +def filter_accepted_signals( + gate_results: dict[str, GateResult], +) -> tuple[list[str], list[str]]: + """Split gate verdicts into (accepted, excluded) sorted lists. + + Feeds :data:`Metadata.osap_signals_used` / + :data:`Metadata.osap_excluded_signals` in commit 5's + ``compute/main.py`` wiring. Sorting is alphabetical for + deterministic JSON output. + + Args: + gate_results: ``{signalname: GateResult}`` from + :func:`gate_osap_signals`. + + Returns: + ``(accepted_sorted, excluded_sorted)``. Union of the two + lists equals ``sorted(gate_results.keys())``. 
+ """ + accepted = sorted(s for s, r in gate_results.items() if r.accepted) + excluded = sorted(s for s, r in gate_results.items() if not r.accepted) + return accepted, excluded + + +__all__ = [ + "GateResult", + "MIN_OBS_PER_SIGNAL", + "ROLLING_IC_WINDOW_MONTHS", + "compute_rolling_ic_12m", + "filter_accepted_signals", + "gate_osap_signals", +] diff --git a/frontend/lib/schema-snapshot.json b/frontend/lib/schema-snapshot.json index 6680ce8a2..083e09373 100644 --- a/frontend/lib/schema-snapshot.json +++ b/frontend/lib/schema-snapshot.json @@ -67,6 +67,26 @@ "required": true, "default": "" }, + "osap_excluded_signals": { + "type": "list[str] | None", + "required": false, + "default": null + }, + "osap_signals_coverage_pct": { + "type": "dict[str, float] | None", + "required": false, + "default": null + }, + "osap_signals_ic_12m": { + "type": "dict[str, float] | None", + "required": false, + "default": null + }, + "osap_signals_used": { + "type": "list[str] | None", + "required": false, + "default": null + }, "tier2_coverage_pct": { "type": "float | None", "required": false, @@ -310,6 +330,16 @@ "required": true, "default": "" }, + "osap_blended_score": { + "type": "float | None", + "required": false, + "default": null + }, + "osap_signals": { + "type": "dict[str, float] | None", + "required": false, + "default": null + }, "pillar_baseline": { "type": "PillarBaseline | None", "required": false, diff --git a/frontend/lib/types.ts b/frontend/lib/types.ts index 7361f8a17..af1038d2c 100644 --- a/frontend/lib/types.ts +++ b/frontend/lib/types.ts @@ -70,6 +70,18 @@ export type Metadata = { fundamentals_coverage_pct: number | null; fundamentals_latency_p50_seconds: number | null; fundamentals_latency_p95_seconds: number | null; + // Phase 4h — OSAP signal observability. `osap_signals_used` lists + // the 100-signal manifest subset that PASSED the PBO/DSR gate + // (`pbo_dsr.factor_passes_gates`); `osap_excluded_signals` lists + // the rest. 
`osap_signals_ic_12m` is rolling-12m Spearman IC per + // accepted signal (observability only — NOT a hard gate; full + // walk-forward IC-decay is the Phase 5 stronger version). + // `osap_signals_coverage_pct` reports per-signal S&P 500 coverage. + // All null on legacy outputs from before 0.9.0-phase4h. + osap_signals_used: string[] | null; + osap_excluded_signals: string[] | null; + osap_signals_ic_12m: Record<string, number> | null; + osap_signals_coverage_pct: Record<string, number> | null; }; // Phase 3d Tier-2 event defenses. Surfaces in StockDetail.tier2_events. @@ -213,6 +225,15 @@ export type StockDetail = { manipulation_index: number | null; composite_score_adjusted: number | null; manipulation_components: Record<string, boolean> | null; + // Phase 4h — per-stock OSAP signal map (signalname → cross-sectional + // rank in [0, 1]) for the accepted-by-PBO/DSR subset of the + // 100-signal manifest. `osap_blended_score` is the 50/50 blend + // (composite_score × 0.5 + osap_signal_aggregate × 0.5) — informational + // observability only; Top-5 ranking still uses raw composite_score + // per SKILL.md Rule 16. Both null on legacy outputs from before + // 0.9.0-phase4h. + osap_signals: Record<string, number> | null; + osap_blended_score: number | null; entered_top5: boolean; exited_top5: boolean; }; diff --git a/tests/test_config.py b/tests/test_config.py index 7fc599f4b..c07ebf26e 100644 --- a/tests/test_config.py +++ b/tests/test_config.py @@ -10,8 +10,8 @@ from compute import config -def test_schema_version_is_phase4_5f(): - assert config.SCHEMA_VERSION == "0.8.0-phase4.5f" +def test_schema_version_is_phase4h(): + assert config.SCHEMA_VERSION == "0.9.0-phase4h" def test_eight_k_lookback_veto_is_one_year(): diff --git a/tests/test_features/test_osap_e2e_integration.py b/tests/test_features/test_osap_e2e_integration.py new file mode 100644 index 000000000..b8969342f --- /dev/null +++ b/tests/test_features/test_osap_e2e_integration.py @@ -0,0 +1,152 @@ +"""End-to-end @network integration test for Phase 4h OSAP pipeline. 
+ +Exercises the full ingest → replicate → gate → IC → blend chain against +the real OSAP package release. Skips when ``--run-network`` is absent +(conftest.py default), so casual ``pytest tests/`` runs are unaffected. + +Scope: + - Real ``fetch_osap_returns`` call against the live OSAP CDN (cached + in ``tmp_path`` so the host cache stays clean) + - ``compute_long_short_returns`` over the real cross-section + - ``gate_osap_signals`` PBO/DSR gate runs end-to-end on a real + 100-signal candidate cohort + - ``compute_rolling_ic_12m`` returns a finite [-1, 1] value for + ``Mom1m`` (a canonical positive-IC factor — sanity check, NOT a + threshold assertion) + - ``compute_osap_signals`` produces a per-ticker proxy map for a + 20-ticker S&P sample + - ``aggregate_osap_signals`` → ``apply_osap_blend`` round-trips + without crashing and produces a Series clipped to [0, 100] + +The test does NOT run the full ``compute/main.py`` (that would hit +SEC EDGAR for 502 tickers — too expensive for CI). The OSAP pipeline +is exercised in isolation against real data; the ``compute/main.py`` +wiring it together is covered by offline unit tests on each layer + +this @network suite confirming the data shapes match end-to-end. 
+""" + +from __future__ import annotations + +import time +from datetime import date + +import pandas as pd +import pytest + +from compute.features.osap_replicate import ( + compute_long_short_returns, + compute_osap_signals, + coverage_by_signal, +) +from compute.ingest import osap as osap_mod +from compute.ingest.osap import fetch_osap_returns +from compute.scoring.osap_blend import aggregate_osap_signals, apply_osap_blend +from compute.validation.osap_validation import ( + compute_rolling_ic_12m, + filter_accepted_signals, + gate_osap_signals, +) + +SAMPLE_TICKERS_20 = [ + "AAPL", "MSFT", "NVDA", "GOOGL", "AMZN", + "META", "TSLA", "BRK.B", "UNH", "JPM", + "XOM", "V", "JNJ", "WMT", "PG", + "HD", "MA", "AVGO", "CVX", "LLY", +] + +# A small, well-known-name subset of OSAP signals that the live release +# is guaranteed to expose. Keeps the integration test cheap and +# deterministic on release shifts — full 100-signal run is the cron +# job's concern. +SAMPLE_SIGNALS = ("Mom1m", "BM", "GP", "Accruals") + + +@pytest.mark.network +@pytest.mark.timeout(600) +def test_osap_pipeline_end_to_end_real_fetch(monkeypatch, tmp_path) -> None: + """Full Phase-4h chain on real OSAP data — 4-signal × 20-ticker slice.""" + cache = tmp_path / "osap" / "returns.parquet" + monkeypatch.setattr(osap_mod.config, "OSAP_RETURNS_CACHE", cache) + + # 1) Real fetch — filter to SAMPLE_SIGNALS so download stays cheap. + t0 = time.monotonic() + returns = fetch_osap_returns( + force_refresh=True, signals=list(SAMPLE_SIGNALS), as_of=date.today() + ) + elapsed = time.monotonic() - t0 + print( + f"\n[osap-e2e] live fetch elapsed={elapsed:.2f}s " + f"rows={len(returns)} cols={list(returns.columns)}" + ) + assert not returns.empty + assert set(returns["signalname"].unique()).issubset(set(SAMPLE_SIGNALS)) + + # 2) Long-short derivation. 
+ ls = compute_long_short_returns(returns) + assert {"signalname", "date", "ls_return"}.issubset(ls.columns) + assert not ls.empty + print( + f"[osap-e2e] long-short rows={len(ls)} " + f"signals={ls['signalname'].nunique()}" + ) + + # 3) PBO/DSR gate. + gate_results = gate_osap_signals(ls, requested_signals=SAMPLE_SIGNALS) + assert len(gate_results) >= 1 + accepted, excluded = filter_accepted_signals(gate_results) + print( + f"[osap-e2e] gate accepted={accepted} excluded={excluded} " + f"of candidates={list(gate_results.keys())}" + ) + + # Every gate result must have a sensible structure regardless of + # accept/reject outcome. + for sig, res in gate_results.items(): + assert isinstance(res.accepted, bool), sig + if res.accepted: + assert res.rejection_reason is None, sig + assert res.pbo is not None and 0.0 <= res.pbo <= 1.0, sig + assert res.dsr is not None, sig + else: + assert res.rejection_reason in { + "high_pbo", "low_dsr", "insufficient_data", "gate_failed", + }, (sig, res.rejection_reason) + + # 4) Rolling-12m IC sanity on Mom1m (well-known +IC factor). + mom_ic = compute_rolling_ic_12m(ls, "Mom1m") + print(f"[osap-e2e] Mom1m rolling-12m IC={mom_ic}") + if mom_ic is not None: + assert -1.0 <= mom_ic <= 1.0, mom_ic + # NOT asserting > 0 — rolling-12m on a single window is noisy; + # canonical full walk-forward is Phase 5's job. + + # 5) Per-ticker proxy signal map — 20 sample tickers. + signal_map = compute_osap_signals( + returns, + tickers=SAMPLE_TICKERS_20, + as_of=date.today(), + requested_signals=SAMPLE_SIGNALS, + ) + assert len(signal_map) == 20 + populated_tickers = [t for t, m in signal_map.items() if m] + print( + f"[osap-e2e] per-ticker map: {len(populated_tickers)}/20 " + f"have non-empty signal dicts" + ) + coverage = coverage_by_signal(signal_map) + print(f"[osap-e2e] coverage by signal: {coverage}") + + # 6) Aggregate + blend round-trip with a synthetic composite series. 
+ composite = pd.Series( + {t: 60.0 + (i % 5) * 4.0 for i, t in enumerate(SAMPLE_TICKERS_20)} + ) + osap_aggregate = aggregate_osap_signals(signal_map) + blended = apply_osap_blend(composite, osap_aggregate) + assert blended.name == "composite_score_osap_adjusted" + assert (blended >= 0.0).all() and (blended <= 100.0).all() + assert set(blended.index) == set(SAMPLE_TICKERS_20) + print( + f"[osap-e2e] blended range=[{blended.min():.2f}, " + f"{blended.max():.2f}] OSAP-covered tickers=" + f"{int((~osap_aggregate.reindex(composite.index).isna()).sum())}" + ) diff --git a/tests/test_features/test_osap_replicate.py b/tests/test_features/test_osap_replicate.py new file mode 100644 index 000000000..96a847697 --- /dev/null +++ b/tests/test_features/test_osap_replicate.py @@ -0,0 +1,318 @@ +"""Tests for compute.features.osap_replicate. + +Phase 4h commit 2. Twelve offline tests covering long-short +derivation, as-of cross-section selection, cross-sectional ranking, +the universe-gap None policy, and end-to-end ``compute_osap_signals``. +No @network markers — all tests use either a hand-built synthetic +DataFrame or the shipped ``tests/fixtures/osap_returns_sample.csv``. +""" + +from __future__ import annotations + +from datetime import date +from pathlib import Path + +import pandas as pd +import pytest + +from compute import config +from compute.features import osap_replicate + +FIXTURE_CSV = Path(__file__).parent.parent / "fixtures" / "osap_returns_sample.csv" + + +def _make_returns(rows: list[tuple[str, str, str, float]]) -> pd.DataFrame: + """Build a synthetic OSAP returns DataFrame from + ``(signalname, port, date_str, ret)`` tuples. 
Adds the trailing + ``signallag / Nlong / Nshort`` columns with neutral defaults so the + schema matches the ingest layer's REQUIRED_COLUMNS contract.""" + df = pd.DataFrame( + rows, columns=["signalname", "port", "date", "ret"] + ) + df["signallag"] = 0.0 + df["Nlong"] = 50 + df["Nshort"] = 50 + return df + + +def test_compute_long_short_returns_basic(): + """Two signals × one date with both ports → 2 long-short rows.""" + returns = _make_returns( + [ + ("BM", "01", "2024-01-31", 1.50), + ("BM", "10", "2024-01-31", -0.25), + ("Mom12m", "01", "2024-01-31", 2.10), + ("Mom12m", "10", "2024-01-31", 0.30), + ] + ) + ls = osap_replicate.compute_long_short_returns(returns) + + assert set(ls.columns) == {"signalname", "date", "ls_return"} + assert len(ls) == 2 + + bm_row = ls[ls["signalname"] == "BM"].iloc[0] + mom_row = ls[ls["signalname"] == "Mom12m"].iloc[0] + assert bm_row["ls_return"] == pytest.approx(1.75) + assert mom_row["ls_return"] == pytest.approx(1.80) + + +def test_compute_long_short_returns_missing_short_port_drops_signal(): + """A signal with port=01 only (no port=10) yields no long-short row.""" + returns = _make_returns( + [ + ("BM", "01", "2024-01-31", 1.50), # long only — no pair + ("Mom12m", "01", "2024-01-31", 2.00), + ("Mom12m", "10", "2024-01-31", 0.50), + ] + ) + ls = osap_replicate.compute_long_short_returns(returns) + + assert "BM" not in ls["signalname"].values + assert "Mom12m" in ls["signalname"].values + + +def test_compute_long_short_returns_drops_decile_buckets(): + """Inner decile buckets (port=02..09) must NOT contribute to ls_return.""" + returns = _make_returns( + [ + ("BM", "01", "2024-01-31", 5.0), + ("BM", "05", "2024-01-31", 99.0), # noise — should be ignored + ("BM", "10", "2024-01-31", 1.0), + ] + ) + ls = osap_replicate.compute_long_short_returns(returns) + + assert len(ls) == 1 + assert ls.iloc[0]["ls_return"] == pytest.approx(4.0) + + +def test_compute_long_short_returns_handles_integer_port(): + """OSAP parquet may store 
``port`` as int (1..10); normaliser must + coerce both representations to '01'/'10'.""" + df = pd.DataFrame( + [ + ("BM", 1, "2024-01-31", 5.0), + ("BM", 10, "2024-01-31", 1.0), + ], + columns=["signalname", "port", "date", "ret"], + ) + df["signallag"] = 0.0 + df["Nlong"] = 50 + df["Nshort"] = 50 + + ls = osap_replicate.compute_long_short_returns(df) + assert len(ls) == 1 + assert ls.iloc[0]["ls_return"] == pytest.approx(4.0) + + +def test_select_as_of_cross_section_picks_most_recent_per_signal(): + """For each signal, the row with the maximum date <= as_of is kept.""" + ls_returns = pd.DataFrame( + [ + ("BM", "2023-12-31", 1.0), + ("BM", "2024-01-31", 1.5), # most recent for BM + ("Mom12m", "2024-01-31", 2.0), + ("Mom12m", "2023-11-30", 1.9), + ], + columns=["signalname", "date", "ls_return"], + ) + cs = osap_replicate.select_as_of_cross_section( + ls_returns, date(2024, 1, 31) + ) + + assert len(cs) == 2 + bm = cs[cs["signalname"] == "BM"].iloc[0] + mom = cs[cs["signalname"] == "Mom12m"].iloc[0] + assert bm["ls_return"] == pytest.approx(1.5) + assert mom["ls_return"] == pytest.approx(2.0) + + +def test_select_as_of_cross_section_filters_future_dates(): + """Observations after ``as_of`` must be dropped before the + most-recent-per-signal pick.""" + ls_returns = pd.DataFrame( + [ + ("BM", "2024-01-31", 1.0), + ("BM", "2024-06-30", 5.0), # AFTER as_of — must not be picked + ], + columns=["signalname", "date", "ls_return"], + ) + cs = osap_replicate.select_as_of_cross_section( + ls_returns, date(2024, 2, 28) + ) + + assert len(cs) == 1 + assert cs.iloc[0]["ls_return"] == pytest.approx(1.0) + + +def test_select_as_of_cross_section_empty_window(): + """``as_of`` precedes all observations → empty cross-section.""" + ls_returns = pd.DataFrame( + [ + ("BM", "2024-01-31", 1.0), + ("Mom12m", "2024-01-31", 2.0), + ], + columns=["signalname", "date", "ls_return"], + ) + cs = osap_replicate.select_as_of_cross_section( + ls_returns, date(2020, 1, 1) + ) + + assert cs.empty + 
assert list(cs.columns) == ["signalname", "date", "ls_return"] + + +def test_rank_signals_cross_sectional_normalises_to_unit_interval(): + """Three signals with distinct ls_return → ranks ≈ {1/3, 2/3, 1}.""" + cs = pd.DataFrame( + [ + ("Low", "2024-01-31", 0.1), + ("Mid", "2024-01-31", 0.5), + ("High", "2024-01-31", 0.9), + ], + columns=["signalname", "date", "ls_return"], + ) + ranks = osap_replicate.rank_signals_cross_sectional(cs) + + assert ranks["Low"] == pytest.approx(1 / 3) + assert ranks["Mid"] == pytest.approx(2 / 3) + assert ranks["High"] == pytest.approx(1.0) + assert ranks.max() <= 1.0 + assert ranks.min() > 0.0 + + +def test_rank_signals_cross_sectional_ties_get_average_rank(): + """Two signals with identical ls_return share the same average rank.""" + cs = pd.DataFrame( + [ + ("A", "2024-01-31", 0.5), + ("B", "2024-01-31", 0.5), + ("C", "2024-01-31", 0.9), + ], + columns=["signalname", "date", "ls_return"], + ) + ranks = osap_replicate.rank_signals_cross_sectional(cs) + + # method='average', pct=True: A and B tie at ranks 1 and 2 → + # average rank 1.5 → pct 1.5/3 = 0.5 + assert ranks["A"] == pytest.approx(0.5) + assert ranks["B"] == pytest.approx(0.5) + assert ranks["C"] == pytest.approx(1.0) + + +def test_compute_osap_signals_full_path_proxy_mode(): + """End-to-end: synthetic 3-signal fixture × 4 tickers → every ticker + receives the same signal map (factor-exposure proxy, locked + 2026-05-18).""" + returns = _make_returns( + [ + ("BM", "01", "2024-01-31", 1.5), + ("BM", "10", "2024-01-31", -0.5), # ls = 2.0 + ("Mom12m", "01", "2024-01-31", 0.8), + ("Mom12m", "10", "2024-01-31", 0.6), # ls = 0.2 + ("Beta", "01", "2024-01-31", 0.3), + ("Beta", "10", "2024-01-31", 0.4), # ls = -0.1 + ] + ) + tickers = ["NVDA", "AAPL", "CF", "HST"] + signals = ("BM", "Mom12m", "Beta") + + result = osap_replicate.compute_osap_signals( + returns, tickers, date(2024, 2, 28), requested_signals=signals + ) + + assert set(result.keys()) == set(tickers) + # Every ticker gets 
a non-None dict in the proxy version + for ticker in tickers: + assert result[ticker] is not None, f"{ticker} should have a signal map" + assert set(result[ticker].keys()) == set(signals) + + # All tickers MUST share the same map (factor-exposure proxy invariant) + assert result["NVDA"] == result["AAPL"] == result["CF"] == result["HST"] + + # Rank ordering: BM (ls=2.0) > Mom12m (ls=0.2) > Beta (ls=-0.1) + one_map = result["NVDA"] + assert one_map["BM"] == pytest.approx(1.0) + assert one_map["Mom12m"] == pytest.approx(2 / 3) + assert one_map["Beta"] == pytest.approx(1 / 3) + + +def test_compute_osap_signals_empty_returns_yields_none_per_ticker(): + """Empty input DataFrame → every ticker maps to None (universe gap).""" + returns = _make_returns([]) + tickers = ["NVDA", "AAPL"] + + result = osap_replicate.compute_osap_signals( + returns, tickers, date(2024, 1, 31) + ) + + assert result == {"NVDA": None, "AAPL": None} + + +def test_compute_osap_signals_universe_gap_before_coverage(): + """``as_of`` precedes OSAP coverage → every ticker maps to None. + + Distinct from pillar ``neutralize_missing`` — OSAP does NOT impute + a neutral value; the blend layer (commit 3) interprets None as + 'no OSAP adjustment' and passes composite_score through. 
+ """ + returns = _make_returns( + [ + ("BM", "01", "2024-01-31", 1.5), + ("BM", "10", "2024-01-31", -0.5), + ] + ) + tickers = ["NVDA", "AAPL"] + + # as_of well before the only observation + result = osap_replicate.compute_osap_signals( + returns, + tickers, + date(2020, 1, 1), + requested_signals=("BM",), + ) + + assert result == {"NVDA": None, "AAPL": None} + + +def test_compute_osap_signals_default_manifest_is_100_signals(): + """Sanity: the module's default manifest matches config.OSAP_SIGNALS_100 + and the manifest itself has the expected shape.""" + assert len(config.OSAP_SIGNALS_100) == 100 + assert len(set(config.OSAP_SIGNALS_100)) == 100, "no duplicates in manifest" + + # Theme buckets must sum to exactly 100 + theme_sum = sum( + len(sigs) for sigs in config.OSAP_SIGNALS_BY_THEME.values() + ) + assert theme_sum == 100 + + +def test_compute_osap_signals_uses_shipped_fixture(): + """End-to-end with the shipped scout fixture + ``tests/fixtures/osap_returns_sample.csv``. Anchors the test suite + against the same file the @network live test uses, so a hand-edit + of the fixture surfaces here too.""" + fixture = pd.read_csv(FIXTURE_CSV) + assert {"signalname", "port", "date", "ret"}.issubset(fixture.columns) + + # Pick an as_of equal to the latest fixture date so all signals have + # at least one observation visible. + as_of_ts = pd.to_datetime(fixture["date"]).max() + as_of_dt = as_of_ts.date() + + tickers = ["NVDA", "AAPL"] + result = osap_replicate.compute_osap_signals( + fixture, + tickers, + as_of_dt, + requested_signals=tuple(fixture["signalname"].unique()), + ) + + # Every ticker should have a non-None signal map (the fixture + # carries 4 long-short pairs across 2 dates). 
+ non_none_count = sum(1 for v in result.values() if v is not None) + assert non_none_count == len(tickers), ( + "shipped fixture should produce a non-None signal map for every " + f"ticker; got {non_none_count}/{len(tickers)}" + ) diff --git a/tests/test_scoring/test_osap_blend.py b/tests/test_scoring/test_osap_blend.py new file mode 100644 index 000000000..973332f99 --- /dev/null +++ b/tests/test_scoring/test_osap_blend.py @@ -0,0 +1,258 @@ +"""Tests for ``compute/scoring/osap_blend.py`` (Phase 4h commit 3). + +Path-b blend math + universe-gap pass-through invariant. The blend +layer is observability-only this phase (Top-5 still ranked by raw +composite_score per SKILL.md Rule 16), so test focus is on the formula +correctness and the None/NaN universe-gap policy that downstream +``compute/main.py`` (commit 5) relies on. +""" + +from __future__ import annotations + +import math + +import numpy as np +import pandas as pd +import pytest + +from compute.scoring.osap_blend import ( + OSAP_BLEND_WEIGHT_DEFAULT, + aggregate_osap_signals, + apply_osap_blend, +) + +# --------------------------------------------------------------------------- +# aggregate_osap_signals +# --------------------------------------------------------------------------- + + +def test_aggregate_osap_signals_mean_of_ranks_scaled_to_0_100() -> None: + """Mean of rank values × 100 — basic 3-ticker happy path. + + Each ticker has the same proxy-mode signal map per commit 2 design, + but the helper is shape-agnostic so the test uses distinct maps to + isolate the math. 
+ """ + signal_map: dict[str, dict[str, float] | None] = { + "AAPL": {"Mom1m": 0.8, "Accruals": 0.4, "BM": 0.6}, # mean=0.6 → 60.0 + "MSFT": {"Mom1m": 1.0, "Accruals": 0.2, "BM": 0.5}, # mean≈0.567 → 56.67 + "NVDA": {"Mom1m": 0.1, "Accruals": 0.1, "BM": 0.1}, # mean=0.1 → 10.0 + } + + result = aggregate_osap_signals(signal_map) + + assert result.name == "osap_signal_aggregate" + assert set(result.index) == {"AAPL", "MSFT", "NVDA"} + assert result["AAPL"] == pytest.approx(60.0) + assert result["MSFT"] == pytest.approx(100.0 * (1.0 + 0.2 + 0.5) / 3) + assert result["NVDA"] == pytest.approx(10.0) + + +def test_aggregate_osap_signals_none_ticker_yields_nan() -> None: + """Universe-gap ticker (None map) → NaN aggregate, NOT zero.""" + signal_map: dict[str, dict[str, float] | None] = { + "AAPL": {"Mom1m": 0.5}, + "DELISTED": None, + } + + result = aggregate_osap_signals(signal_map) + + assert result["AAPL"] == pytest.approx(50.0) + assert math.isnan(result["DELISTED"]) + + +def test_aggregate_osap_signals_empty_inner_dict_yields_nan() -> None: + """Empty {} signal dict treated identically to None (no signals fired).""" + signal_map: dict[str, dict[str, float] | None] = { + "AAPL": {}, + } + + result = aggregate_osap_signals(signal_map) + + assert math.isnan(result["AAPL"]) + + +def test_aggregate_osap_signals_empty_map_returns_empty_series() -> None: + """Empty input → empty Series, not error.""" + result = aggregate_osap_signals({}) + + assert result.empty + assert result.dtype == float + assert result.name == "osap_signal_aggregate" + + +# --------------------------------------------------------------------------- +# apply_osap_blend +# --------------------------------------------------------------------------- + + +def test_apply_osap_blend_basic_50_50() -> None: + """50/50 default — blended = mean(composite, osap).""" + composite = pd.Series({"AAPL": 80.0, "MSFT": 60.0, "NVDA": 40.0}) + osap = pd.Series({"AAPL": 40.0, "MSFT": 70.0, "NVDA": 100.0}) + + result = 
apply_osap_blend(composite, osap) + + assert result.name == "composite_score_osap_adjusted" + assert result["AAPL"] == pytest.approx(60.0) # (80 + 40) / 2 + assert result["MSFT"] == pytest.approx(65.0) # (60 + 70) / 2 + assert result["NVDA"] == pytest.approx(70.0) # (40 + 100) / 2 + + +def test_apply_osap_blend_weight_zero_returns_composite_unchanged() -> None: + """weight=0 → pure pass-through (no OSAP influence).""" + composite = pd.Series({"AAPL": 80.0, "MSFT": 60.0}) + osap = pd.Series({"AAPL": 10.0, "MSFT": 90.0}) + + result = apply_osap_blend(composite, osap, weight=0.0) + + pd.testing.assert_series_equal( + result.astype(float), + composite.astype(float), + check_names=False, + ) + + +def test_apply_osap_blend_weight_one_returns_osap_where_covered() -> None: + """weight=1 → pure OSAP for covered tickers; pass-through for NaN.""" + composite = pd.Series({"AAPL": 80.0, "GAP": 60.0}) + osap = pd.Series({"AAPL": 25.0, "GAP": float("nan")}) + + result = apply_osap_blend(composite, osap, weight=1.0) + + assert result["AAPL"] == pytest.approx(25.0) + # NaN OSAP → composite passes through even at weight=1 + assert result["GAP"] == pytest.approx(60.0) + + +def test_apply_osap_blend_nan_osap_falls_back_to_composite() -> None: + """Universe-gap policy: NaN OSAP → composite unchanged.""" + composite = pd.Series({"AAPL": 80.0, "GAP": 50.0, "MSFT": 60.0}) + osap = pd.Series({"AAPL": 20.0, "GAP": float("nan"), "MSFT": 100.0}) + + result = apply_osap_blend(composite, osap, weight=0.5) + + assert result["AAPL"] == pytest.approx(50.0) # (80 + 20) / 2 + assert result["GAP"] == pytest.approx(50.0) # NaN → pass-through + assert result["MSFT"] == pytest.approx(80.0) # (60 + 100) / 2 + + +def test_apply_osap_blend_empty_composite_returns_empty_series() -> None: + """Empty composite → empty output, no error.""" + result = apply_osap_blend( + pd.Series(dtype=float), + pd.Series({"AAPL": 50.0}), + ) + + assert result.empty + assert result.name == "composite_score_osap_adjusted" + + 
+def test_apply_osap_blend_clips_to_0_100() -> None: + """Output is clipped to [0, 100] to match composite-score domain.""" + # Edge: composite=100, osap=100 → 100 (no clip needed). + # Edge: composite=0, osap=0 → 0 (no clip needed). + # Construct a degenerate case where naive math could exceed bounds + # if caller passes out-of-domain inputs. + composite = pd.Series({"HIGH": 100.0, "LOW": 0.0, "OOB": 150.0}) + osap = pd.Series({"HIGH": 100.0, "LOW": 0.0, "OOB": 200.0}) + + result = apply_osap_blend(composite, osap, weight=0.5) + + assert result["HIGH"] == pytest.approx(100.0) + assert result["LOW"] == pytest.approx(0.0) + # 0.5 * 150 + 0.5 * 200 = 175 → clipped to 100 + assert result["OOB"] == pytest.approx(100.0) + + +def test_apply_osap_blend_invalid_weight_below_zero_raises() -> None: + composite = pd.Series({"AAPL": 50.0}) + osap = pd.Series({"AAPL": 50.0}) + + with pytest.raises(ValueError, match=r"weight must be in \[0, 1\]"): + apply_osap_blend(composite, osap, weight=-0.1) + + +def test_apply_osap_blend_invalid_weight_above_one_raises() -> None: + composite = pd.Series({"AAPL": 50.0}) + osap = pd.Series({"AAPL": 50.0}) + + with pytest.raises(ValueError, match=r"weight must be in \[0, 1\]"): + apply_osap_blend(composite, osap, weight=1.5) + + +def test_apply_osap_blend_extra_osap_tickers_dropped_via_reindex() -> None: + """Tickers in OSAP but not composite are silently dropped — output + index matches composite_scores.index exactly.""" + composite = pd.Series({"AAPL": 60.0, "MSFT": 70.0}) + osap = pd.Series({"AAPL": 80.0, "MSFT": 50.0, "EXTRA": 99.0}) + + result = apply_osap_blend(composite, osap, weight=0.5) + + assert list(result.index) == ["AAPL", "MSFT"] + assert "EXTRA" not in result.index + + +def test_apply_osap_blend_missing_osap_ticker_becomes_passthrough() -> None: + """Composite ticker absent from OSAP series → reindex to NaN → + pass-through (universe-gap path).""" + composite = pd.Series({"AAPL": 60.0, "ABSENT": 75.0}) + osap = 
pd.Series({"AAPL": 20.0}) # ABSENT not in OSAP + + result = apply_osap_blend(composite, osap, weight=0.5) + + assert result["AAPL"] == pytest.approx(40.0) # (60 + 20) / 2 + assert result["ABSENT"] == pytest.approx(75.0) # pass-through + + +def test_apply_osap_blend_default_weight_matches_constant() -> None: + """No explicit weight → uses OSAP_BLEND_WEIGHT_DEFAULT (0.5 lock).""" + composite = pd.Series({"AAPL": 80.0}) + osap = pd.Series({"AAPL": 40.0}) + + result_default = apply_osap_blend(composite, osap) + result_explicit = apply_osap_blend( + composite, osap, weight=OSAP_BLEND_WEIGHT_DEFAULT + ) + + pd.testing.assert_series_equal(result_default, result_explicit) + # Sanity: the locked default is 0.5 per osap-integration/PLAN.md L168-170 + assert OSAP_BLEND_WEIGHT_DEFAULT == 0.5 + + +# --------------------------------------------------------------------------- +# Cross-module integration sanity (commit 2 → commit 3) +# --------------------------------------------------------------------------- + + +def test_aggregate_then_blend_round_trip_with_compute_osap_signals_shape() -> None: + """End-to-end shape: ``compute_osap_signals`` output → aggregate → + apply_osap_blend works without manual conversion.""" + # Simulate compute_osap_signals output directly (avoid importing the + # full pipeline here — that's commit 5's integration test concern). 
+ osap_signal_map: dict[str, dict[str, float] | None] = { + "AAPL": {"Mom1m": 0.9, "Accruals": 0.3, "BM": 0.6}, + "MSFT": {"Mom1m": 0.5, "Accruals": 0.5, "BM": 0.5}, + "GAP": None, # Universe gap + } + composite = pd.Series({"AAPL": 80.0, "MSFT": 60.0, "GAP": 70.0}) + + aggregate = aggregate_osap_signals(osap_signal_map) + blended = apply_osap_blend(composite, aggregate) + + # AAPL: (0.9 + 0.3 + 0.6) / 3 × 100 = 60 → (80 + 60) / 2 = 70 + assert blended["AAPL"] == pytest.approx(70.0) + # MSFT: mean=0.5 → 50 → (60 + 50) / 2 = 55 + assert blended["MSFT"] == pytest.approx(55.0) + # GAP: None → NaN → pass-through 70 + assert blended["GAP"] == pytest.approx(70.0) + + +def test_apply_osap_blend_preserves_dtype_float() -> None: + """Output dtype is float (matches composite_score in StockSummary).""" + composite = pd.Series({"AAPL": 80, "MSFT": 60}, dtype=int) # int input + osap = pd.Series({"AAPL": 40.0, "MSFT": 70.0}) + + result = apply_osap_blend(composite, osap) + + assert result.dtype == np.float64 diff --git a/tests/test_smoke.py b/tests/test_smoke.py index fb6a8dccf..1043deb8c 100644 --- a/tests/test_smoke.py +++ b/tests/test_smoke.py @@ -5,7 +5,7 @@ def test_phase0_scaffold_imports() -> None: assert config.UNIVERSE == "SP500" - assert config.SCHEMA_VERSION.startswith("0.8.") + assert config.SCHEMA_VERSION.startswith("0.9.") def test_phase0_paths_resolve() -> None: diff --git a/tests/test_validation/test_osap_validation.py b/tests/test_validation/test_osap_validation.py new file mode 100644 index 000000000..1e3019c0b --- /dev/null +++ b/tests/test_validation/test_osap_validation.py @@ -0,0 +1,384 @@ +"""Tests for ``compute/validation/osap_validation.py`` (Phase 4h commit 4). + +Anchors the PBO/DSR hard-gate behavior + the rolling-12m Spearman IC +observability metric. Coverage focus: + +- Bailey 2014 cohort framing (``n_trials = cohort_size``) holds + end-to-end through :func:`factor_passes_gates`. 
+- Asymmetric NaN policy (per-signal ``dropna`` + cohort + ``fillna(0.0)``) is locked in regression — test #13 fails if a + future maintainer accidentally reverts to ``dropna(how='any')``. +- Rejection-reason classification is exact (``insufficient_data`` / + ``high_pbo`` / ``low_dsr`` / ``gate_failed``) so commit 5's + metadata writer can group correctly. +- Rolling-IC pure-pandas Spearman matches a hand-computed reference + and gracefully degrades on insufficient history. +""" + +from __future__ import annotations + +import math + +import numpy as np +import pandas as pd +import pytest + +from compute.validation.osap_validation import ( + MIN_OBS_PER_SIGNAL, + ROLLING_IC_WINDOW_MONTHS, + GateResult, + compute_rolling_ic_12m, + filter_accepted_signals, + gate_osap_signals, +) + + +def _long_format( + wide: pd.DataFrame, +) -> pd.DataFrame: + """Helper: wide (date × signal) → long (signalname, date, ls_return). + + Mirrors commit 2's ``compute_long_short_returns`` output. Drops + NaN cells (commit 2 does the same — long format never carries + explicit NaN rows). + """ + long = ( + wide.rename_axis(index="date", columns="signalname") + .stack() + .rename("ls_return") + .reset_index() + ) + return long.dropna(subset=["ls_return"]).astype({"signalname": str}) + + +def _make_noise_cohort( + n_dates: int = 64, + n_signals: int = 10, + *, + seed: int = 42, + start: str = "2020-01-31", +) -> pd.DataFrame: + """Synthetic monthly long-short returns — independent noise per signal. + + Matches the synthetic-fixture pattern at + ``tests/test_validation/test_pbo_dsr.py:63`` (deterministic seed + + Normal(0, 0.05) monthly returns). 
+ """ + rng = np.random.default_rng(seed=seed) + dates = pd.date_range(start=start, periods=n_dates, freq="ME") + data = rng.normal(0.0, 0.05, size=(n_dates, n_signals)) + cols = [f"NoiseSig{i:02d}" for i in range(n_signals)] + wide = pd.DataFrame(data, index=dates, columns=cols) + return _long_format(wide) + + +# --------------------------------------------------------------------------- +# gate_osap_signals +# --------------------------------------------------------------------------- + + +def test_gate_osap_signals_random_noise_yields_high_pbo() -> None: + """Pure-noise cohort → no signals accepted (Bailey 2014 invariant). + + Independent random strategies have no persistent signal, so PBO + clusters near 0.5+ AND deflated Sharpe (corrected for n_trials + multiple-testing) lands at or below zero for typical samples. The + gate's hard conjunction of PBO ≤ 0.5 and DSR > 0 should reject every + signal — categorized as ``'high_pbo'`` (PBO-only fail), + ``'low_dsr'`` (DSR-only fail), or ``'gate_failed'`` (both fail) + depending on which condition fails for that draw. + """ + df = _make_noise_cohort(n_dates=64, n_signals=10, seed=42) + + results = gate_osap_signals(df) + + assert len(results) == 10 + assert all(isinstance(r, GateResult) for r in results.values()) + + # Bailey 2014 invariant: pure noise → zero acceptances. + accepted = [s for s, r in results.items() if r.accepted] + assert accepted == [] + + # All rejections cite one of the three real-gate reasons (no + # short-circuits — every signal got a real PBO/DSR computation). + reasons = {r.rejection_reason for r in results.values()} + assert reasons.issubset({"high_pbo", "low_dsr", "gate_failed"}) + assert "insufficient_data" not in reasons + + # Sanity: every result has populated pbo + dsr + sharpe floats. 
+ for r in results.values(): + assert r.pbo is not None and 0.0 <= r.pbo <= 1.0 + assert r.dsr is not None + assert r.sharpe is not None + assert r.n_observations == 64 + + +def test_gate_osap_signals_low_sharpe_signal_rejected_for_dsr() -> None: + """Near-zero-mean signal in a strong cohort → fails DSR.""" + rng = np.random.default_rng(seed=7) + dates = pd.date_range(start="2020-01-31", periods=64, freq="ME") + # 9 random signals at modest scale + 1 near-zero signal. + data = rng.normal(0.0, 0.05, size=(64, 9)) + near_zero = rng.normal(0.0, 1e-4, size=(64, 1)) + wide = pd.DataFrame( + np.hstack([data, near_zero]), + index=dates, + columns=[f"Sig{i:02d}" for i in range(9)] + ["DeadSig"], + ) + df = _long_format(wide) + + results = gate_osap_signals(df) + + dead = results["DeadSig"] + # Near-zero σ → DSR ≈ 0 → fails DSR_VETO_THRESHOLD (= 0.0, strict >). + assert dead.accepted is False + assert dead.rejection_reason in ("low_dsr", "gate_failed") + assert dead.dsr is not None and dead.dsr <= 0.0 + + +def test_gate_osap_signals_strong_signal_accepted() -> None: + """Monotone-drift signal beats noisy cohort → accepted with float metrics.""" + rng = np.random.default_rng(seed=11) + dates = pd.date_range(start="2015-01-31", periods=120, freq="ME") + # 9 noise + 1 strong drift signal with very high Sharpe and stable OOS rank. 
+ noise = rng.normal(0.0, 0.05, size=(120, 9)) + strong = np.full((120, 1), 0.03) + rng.normal(0.0, 0.005, size=(120, 1)) + wide = pd.DataFrame( + np.hstack([noise, strong]), + index=dates, + columns=[f"Noise{i:02d}" for i in range(9)] + ["StrongSig"], + ) + df = _long_format(wide) + + results = gate_osap_signals(df) + + strong_result = results["StrongSig"] + assert strong_result.accepted is True + assert strong_result.rejection_reason is None + assert strong_result.pbo is not None and strong_result.pbo <= 0.5 + assert strong_result.dsr is not None and strong_result.dsr > 0.0 + assert strong_result.sharpe is not None and strong_result.sharpe > 0.0 + + +def test_gate_osap_signals_insufficient_data() -> None: + """Cohort with < ``MIN_OBS_PER_SIGNAL`` rows → all signals rejected + with reason ``insufficient_data`` (cohort precondition fails).""" + rng = np.random.default_rng(seed=3) + dates = pd.date_range(start="2024-01-31", periods=10, freq="ME") + wide = pd.DataFrame( + rng.normal(0.0, 0.05, size=(10, 4)), + index=dates, + columns=["A", "B", "C", "D"], + ) + df = _long_format(wide) + + results = gate_osap_signals(df) + + assert len(results) == 4 + for r in results.values(): + assert r.accepted is False + assert r.rejection_reason == "insufficient_data" + assert r.pbo is None + assert r.dsr is None + assert r.sharpe is None + assert r.n_observations == 10 + + +def test_gate_osap_signals_requested_signals_filter() -> None: + """``requested_signals`` subsets the cohort before gating — result + has only those keys.""" + df = _make_noise_cohort(n_dates=48, n_signals=5, seed=42) + # Input contains NoiseSig00..NoiseSig04. Request a 3-signal subset. + requested = ("NoiseSig00", "NoiseSig02", "NoiseSig04") + + results = gate_osap_signals(df, requested_signals=requested) + + assert set(results.keys()) == set(requested) + # All three got a real PBO/DSR run (cohort size 3 ≥ 2, dates ≥ 16). 
+ for r in results.values(): + assert r.rejection_reason != "insufficient_data" + + +def test_gate_osap_signals_requested_none_uses_all_signals_in_df() -> None: + """``requested_signals=None`` (default) → result covers every + unique signalname in the input.""" + df = _make_noise_cohort(n_dates=32, n_signals=6, seed=1) + + results = gate_osap_signals(df, requested_signals=None) + + assert set(results.keys()) == set(df["signalname"].unique()) + + +def test_gate_osap_signals_empty_input_returns_empty_dict() -> None: + """Empty long-format DF → ``{}``, no crash.""" + df = pd.DataFrame(columns=["signalname", "date", "ls_return"]) + results = gate_osap_signals(df) + assert results == {} + + +def test_gate_osap_signals_single_signal_cohort_rejects_with_insufficient_data() -> None: + """Cohort with 1 column → PBO precondition (``cohort_size < 2``) + short-circuits all signals to insufficient_data.""" + rng = np.random.default_rng(seed=5) + dates = pd.date_range(start="2020-01-31", periods=64, freq="ME") + wide = pd.DataFrame( + rng.normal(0.0, 0.05, size=(64, 1)), + index=dates, + columns=["LonelySig"], + ) + df = _long_format(wide) + + results = gate_osap_signals(df) + + assert results["LonelySig"].accepted is False + assert results["LonelySig"].rejection_reason == "insufficient_data" + assert results["LonelySig"].pbo is None + + +def test_compute_rolling_ic_12m_known_signal() -> None: + """Hand-constructed lag-1-correlated series → Spearman ≈ expected.""" + # Strictly monotone increasing series — every (t, t+1) pair has rank + # correlation = 1.0 (perfectly preserved order under shift). 
+ dates = pd.date_range(start="2023-01-31", periods=15, freq="ME") + series = pd.DataFrame( + { + "signalname": ["S"] * 15, + "date": dates, + "ls_return": np.arange(15, dtype=float) * 0.01, + } + ) + + ic = compute_rolling_ic_12m(series, "S") + + assert ic is not None + assert ic == pytest.approx(1.0, abs=1e-9) + + +def test_compute_rolling_ic_12m_insufficient_history() -> None: + """Signal with < 13 monthly observations → ``None``.""" + dates = pd.date_range(start="2024-01-31", periods=8, freq="ME") + df = pd.DataFrame( + { + "signalname": ["S"] * 8, + "date": dates, + "ls_return": np.linspace(0.01, 0.08, 8), + } + ) + + assert compute_rolling_ic_12m(df, "S") is None + + +def test_compute_rolling_ic_12m_nan_safe_with_gaps() -> None: + """NaN ``ls_return`` rows OUTSIDE the rolling-12m tail window are + pruned by ``tail(13)`` and do not poison the IC. + + Production long-format DataFrames never carry explicit NaN rows + (commit 2's ``compute_long_short_returns`` drops them at long + conversion), but pre-window NaN sourced from upstream test fixtures + or future refactors must not propagate into the Spearman result. + """ + # 20 rows total. The last 13 (tail window) are strictly monotone + # with no NaN. NaN are punched into older history. + dates = pd.date_range(start="2022-06-30", periods=20, freq="ME") + rets = np.arange(20, dtype=float) * 0.01 + rets[0] = np.nan + rets[3] = np.nan + rets[5] = np.nan # All three NaN sit in indices 0..6 (pruned by tail(13)). + df = pd.DataFrame( + { + "signalname": ["S"] * 20, + "date": dates, + "ls_return": rets, + } + ) + + ic = compute_rolling_ic_12m(df, "S") + + # Strictly monotone tail-13 → Spearman should be exactly 1.0. 
+ assert ic is not None + assert math.isfinite(ic) + assert ic == pytest.approx(1.0, abs=1e-9) + + +def test_filter_accepted_signals_splits_into_sorted_lists() -> None: + """Mixed gate_results → alphabetically sorted accepted vs excluded; + union equals the full key set.""" + gate_results = { + "C_pass": GateResult(True, 0.3, 1.5, 1.2, 60, None), + "A_fail_pbo": GateResult(False, 0.7, 1.2, 1.0, 60, "high_pbo"), + "B_pass": GateResult(True, 0.4, 2.0, 1.6, 60, None), + "Z_insufficient": GateResult( + False, None, None, None, 10, "insufficient_data" + ), + "Y_fail_dsr": GateResult(False, 0.3, -0.5, 0.2, 60, "low_dsr"), + } + + accepted, excluded = filter_accepted_signals(gate_results) + + assert accepted == ["B_pass", "C_pass"] + assert excluded == ["A_fail_pbo", "Y_fail_dsr", "Z_insufficient"] + # Round-trip: union of the two lists equals all gate-result keys. + assert set(accepted + excluded) == set(gate_results.keys()) + + +def test_gate_osap_signals_sparse_cohort_zero_filled_not_decimated() -> None: + """Lock the §NaN policy: sparse coverage signals stay in the cohort + via ``fillna(0.0)`` rather than being dropped by ``dropna(how='any')``. + + Regression guard — if a future maintainer reverts the cohort + fillna(0.0) to ``dropna(how='any')``, the cohort matrix would + collapse to the intersection (16 rows after sparse signals drop 16 + of 64 months each), tripping the cohort-precondition row count and + cascading every signal to ``insufficient_data``. + """ + rng = np.random.default_rng(seed=23) + dates = pd.date_range(start="2020-01-31", periods=64, freq="ME") + n_signals = 10 + data = rng.normal(0.0, 0.05, size=(64, n_signals)) + cols = [f"Sig{i:02d}" for i in range(n_signals)] + wide = pd.DataFrame(data, index=dates, columns=cols) + + # Punch sparse-coverage NaN holes into 3 signals (~25% of rows + # missing each, but DIFFERENT rows per signal — so a + # dropna(how='any') intersection would decimate hard). 
+ sparse_signals = ["Sig00", "Sig03", "Sig07"] + for offset, sig in enumerate(sparse_signals): + nan_rows = np.arange(offset, 64, 4) # 16 NaN rows, offset per signal + wide.loc[wide.index[nan_rows], sig] = np.nan + + df_long = _long_format(wide) + + results = gate_osap_signals(df_long) + + # All 10 signals received a GateResult (not collapsed by dropna). + assert set(results.keys()) == set(cols) + + # The 3 sparse signals each have ≥ 48 obs (well above MIN_OBS_PER_SIGNAL + # = 16) so none short-circuited on per-signal insufficient_data. + # Critically: NO signal should have rejection_reason='insufficient_data' + # — that would mean the cohort precondition failed, which is exactly + # what dropna(how='any') would trigger. + insufficient = [ + s for s, r in results.items() if r.rejection_reason == "insufficient_data" + ] + assert insufficient == [], ( + f"Cohort decimation detected — signals {insufficient} short-" + f"circuited on insufficient_data despite having enough per-signal " + f"obs. Did the cohort fillna(0.0) get reverted to dropna(how='any')?" + ) + + # Every signal has a populated n_observations from the actual + # factor_passes_gates call (not the precheck-only count). + for sig, r in results.items(): + assert r.pbo is not None, f"{sig} missing pbo — short-circuited?" + assert r.dsr is not None, f"{sig} missing dsr — short-circuited?" + + +def test_module_load_constants_sourced_from_pbo_dsr() -> None: + """Smoke test — module-level constants match pbo_dsr defaults so + a downstream caller using the wrapper's defaults gets the canonical + Phase 4 PBO ≤ 0.5 / DSR > 0 gate.""" + from compute.validation.pbo_dsr import DEFAULT_N_PARTITIONS as PD_NP + + assert MIN_OBS_PER_SIGNAL == PD_NP # 16 + assert ROLLING_IC_WINDOW_MONTHS == 12