dackclup · May 18, 2026
diff --git a/.github/workflows/compute-rankings.yml b/.github/workflows/compute-rankings.yml
@@ -46,7 +46,10 @@ jobs:
       - name: Install
         run: |
           python -m pip install --upgrade pip
-          pip install -e .
+          # Phase 4h: weekly compute imports compute/ingest/osap.py which
+          # imports the `openassetpricing` package — installed via the
+          # `factors` extra (pinned to ==0.0.2 in pyproject.toml).
+          pip install -e ".[factors]"
 
       - name: Compute current quarter id
         id: quarter

diff --git a/CLAUDE.md b/CLAUDE.md
@@ -153,14 +153,16 @@ non-connector-bound work.
 
 ## Phase status
 
-Current schema: **`0.8.0-phase4.5f`** · Defense layer: **17**
-(7 active vetoes + 10 annotates + 5 numerical guards +
-`manipulation_index` rollup). Latest release tag:
+Current schema: **`0.9.0-phase4h`** (bumped from `0.8.0-phase4.5f` in
+PR #112). Defense layer: **17** (7 active vetoes + 10 annotates + 5
+numerical guards + `manipulation_index` rollup) — Phase 4h adds
+observability surface, no new veto. Latest release tag:
 [**`v1.2.0-phase4.5`**](https://github.com/dackclup/quantrank/releases/tag/v1.2.0-phase4.5)
-**SHIPPED 2026-05-17** at commit `6d414a9b` — **Phase 4.5 cluster
-✅ complete** (6 sub-PRs). Production verified run #51
-(`b1588b2a`, 5m14s warm-cache). Test suite: 856 offline + 17
-`@network`.
+shipped 2026-05-17 at commit `6d414a9b`. **Phase 4h in flight in PR
+#112** — OSAP signal replication (factor-exposure proxy) + PBO/DSR
+hard gate (PR #60 reuse) + rolling-12m IC observability + Path-b
+composite × OSAP blend (50/50 default, Top-5 still ranks raw
+composite per Rule 16). Test suite: 906 offline + 19 `@network`.
 
 **Next deliverable** (pick by appetite — three tracks parallelize):
 **4.5e** (Form 4 insider, ~3w → v1.3.0) · **4h/4i/4j/4k** factor
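The status text above mentions rolling-12m IC observability. The diff does not show how the IC is computed, so the following is only a sketch under stated assumptions: one Spearman rank correlation per month between a signal's cross-section and the next-period returns, then a plain mean over the most recent 12 defined months. All function names here are hypothetical.

```python
from __future__ import annotations

from statistics import mean


def _average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks


def spearman_ic(signal: list[float], fwd_return: list[float]) -> float | None:
    """Spearman rank correlation, or None when it is undefined."""
    if len(signal) != len(fwd_return) or len(signal) < 2:
        return None
    rx, ry = _average_ranks(signal), _average_ranks(fwd_return)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return None  # a constant cross-section has no rank correlation
    return cov / (var_x * var_y) ** 0.5


def rolling_ic(monthly_ics: list[float | None], window: int = 12) -> float | None:
    """Mean of the defined ICs among the most recent `window` months."""
    recent = [ic for ic in monthly_ics[-window:] if ic is not None]
    return mean(recent) if recent else None
```

Skipping undefined months rather than zero-filling them is a choice made for this sketch only; the PR's own policy for sparse months is not visible in the diff.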

diff --git a/PHASE_STATUS.md b/PHASE_STATUS.md
@@ -6,7 +6,7 @@
 | 1 | Universe + prices ingestion | ✅ DONE — 2026-05-08 |
 | 2 | Fundamentals via SEC EDGAR | ✅ DONE — 2026-05-08 |
 | 3 | Classical features + composite + **defenses** → **v1.0** | ✅ **DONE — 2026-05-14** (v1.0.0 tagged + GitHub release) |
-| 4 | Factor consolidation (OSAP + JKP + Qlib + IPCA) → **v1.1** | 🟡 IN PROGRESS — 4a-4g + 4c.1/4c.2/4c.3 + PR 4b §1+§2 all merged; PR 4b §3 IC-decay output deferred to Phase 5; **next: 4h / 4i / 4j / 4k factor integrations** (PBO/DSR gate ready), can run in parallel with Phase 4.5 |
+| 4 | Factor consolidation (OSAP + JKP + Qlib + IPCA) → **v1.1** | 🟡 IN PROGRESS — 4a-4g + 4c.1/4c.2/4c.3 + PR 4b §1+§2 all merged; **PR #112 (Phase 4h)** ships OSAP signal replication + PBO/DSR gate + Path-b 50/50 blend (schema bump `0.8.0-phase4.5f` → `0.9.0-phase4h`, no new veto — annotate-only blend, Top-5 still ranks raw composite per Rule 16, 5-commit cluster on `claude/resume-quantrank-phase-4.5-Zh0pO`); 4i/4j/4k pending; PR 4b §3 IC-decay output deferred to Phase 5 |
 | **4.5** | **Earnings-manipulation defense cluster** → **v1.2** | ✅ **DONE 2026-05-17** — **tag [`v1.2.0-phase4.5`](https://github.com/dackclup/quantrank/releases/tag/v1.2.0-phase4.5) cut** at commit `6d414a9b`. 6 sub-PRs (#89/#90/#91 + #93 + #95 + #97 + #100). Active vetoes **5 → 7**; defense layer **9 → 17** (= 7 vetoes + 10 annotates). 4.5f adds `manipulation_index` (0-100 rollup) + `composite_score_adjusted` (soft penalty, max 10 pts, informational only) + `ManipulationRiskCard` UI + schema bump **`0.7.1-phase4g` → `0.8.0-phase4.5f`**. Production verified run #51 (`b1588b2a`, 5m14s warm-cache): card fires on 158/502 (31.5%); HIGH band 2 (SMCI=84 · WAT=64), MODERATE 60, LOW 96. 4.5e Form-4 insider clustering **deferred to v1.3.0** — reserved-slot weights already declared in `FLAG_WEIGHTS`. |
 | 5 | ML meta-learner (Triple-Barrier + Meta-Labeling + Conformal) + SHAP | ⚪ not started |
 | 6 | Sentiment v2 (FinBERT + Whisper + 8-K Lazy Prices) | ⚪ not started |
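The Phase 4 row refers to the PBO/DSR gate; the SKILL.md row in this PR states the condition as PBO ≤ 0.5 AND DSR > 0, applied through PR #60's `factor_passes_gates`. That function's signature is not shown in the diff, so the sketch below uses a hypothetical stand-in, `passes_hard_gate`, to illustrate the two thresholds and how signals might be partitioned into used and excluded lists. Treating a NaN statistic as a failure is an assumption.

```python
from __future__ import annotations

import math

PBO_MAX = 0.5  # threshold quoted in this PR: PBO <= 0.5
DSR_MIN = 0.0  # threshold quoted in this PR: DSR > 0


def passes_hard_gate(pbo: float, dsr: float) -> bool:
    """Hypothetical stand-in for the gate; both conditions must hold."""
    # Assumption: a NaN statistic fails the gate instead of passing silently.
    if math.isnan(pbo) or math.isnan(dsr):
        return False
    return pbo <= PBO_MAX and dsr > DSR_MIN


def split_by_gate(
    stats: dict[str, tuple[float, float]],
) -> tuple[list[str], list[str]]:
    """Partition {signal: (pbo, dsr)} into (used, excluded), keeping input order."""
    used: list[str] = []
    excluded: list[str] = []
    for name, (pbo, dsr) in stats.items():
        (used if passes_hard_gate(pbo, dsr) else excluded).append(name)
    return used, excluded
```

Note the asymmetry in the comparison operators: a PBO of exactly 0.5 passes, while a DSR of exactly 0 does not.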

diff --git a/SKILL.md b/SKILL.md
@@ -304,6 +304,7 @@ Schema versions:
 | `0.7.0-phase4g` | Phase 4g | **8-K Tier-2 event defenses re-enabled** (PR #79, merged 2026-05-15 on `c35c6d40`, closes [issue #14](https://github.com/dackclup/quantrank/issues/14)). Flipped `compute/scoring/tier2._EIGHT_K_DEFENSES_ENABLED = True` after the PR 3d workflow-timeout deferral (root cause cleared by PR #58 cache layers + PR 3d tenacity tightening). `non_reliance_filing` (Item 4.02 hard veto, 365d lookback, Schroeder 2024 SSRN — ~50% of 4.02 filings precede formal restatement) returns to the active layer as the **5th active veto**. `auditor_change` (Item 4.01 annotate, 730d lookback, Reg S-K Item 304, Cohen-Malloy-Nguyen 2020 type) joins the Tier-2 annotate surface. No data-schema-shape delta — only the feature-flag flip + reason-taxonomy expansion. |
 | `0.7.1-phase4g` | Phase 4g | **`price_change_1d_pct` additive field** (squash-merged via PR #80, commit `1509f707`). New optional `float \| None` field on `StockSummary` + `StockDetail` — day-over-day percent change from the prior trading-day close. Computed once in `compute/main.py:_fetch_prices_one` from the last two valid yfinance closes; null for newly-IPO'd tickers (only one close available). Lets the ranking-table mobile cards render a change pill without lazy-fetching 502 per-stock history JSONs. Per `phase-4/schema-versioning/PLAN.md`: "Add a new optional field (default = None) → patch". Production metadata.version stays `0.7.0-phase4g` until next weekly compute. |
 | `0.7.1-phase4g` (no schema delta) | Phase 4.5a-4.5d wave | **Earnings-manipulation defense cluster — sub-PRs 4.5a + 4.5b + 4.5c + 4.5d shipped 2026-05-16/17** (PRs #89/#90/#91 + #93 + #95 + #97). **No data-schema-shape delta** — all 9 new flag identifiers are strings appended to existing `risk_flags: list[str]` (active vetoes) + `valuation_warnings: list[str]` (annotates) arrays. Active vetoes **5 → 7**: + `beneish_manipulation_veto` (Beneish 1999, M > −1.78) + `dechow_manipulation_veto` (Dechow 2011, F > 3.0). Annotates added: `manipulation_triple_flag` (4.5a joint gate, 2 fired: SMCI · WAT), `restatement_history` (4.5b, 59 fired / 11.8% — Hennes-Leone-Miller 2008 *TAR*), `late_filing_notification` (4.5b, 2 fired: HAS · Q — Bartov-Lai-Yeung 2002 *JAR*), `rem_suspect` (4.5c, 16 fired / 3.2% — Roychowdhury 2006 *JAE* 3-proxy REM via per-sector OLS), `accruals_momentum_high` (4.5d, 50 fired / 10.0% — Sloan 1996 / Beneish 1999 Δ(TATA) > +0.05 over 3y), `loss_avoidance_pattern` (4.5d, 0 fired — Burgstahler-Dichev 1997 cohort thresholds too tight for S&P 500 large-cap universe, file as follow-up). Also closes [issue #7](https://github.com/dackclup/quantrank/issues/7) (Sloan over-firing on Financials: 21.3% → 11.7%, sector spread 7.7× → 1.4×). 2 new cache dirs (`compute/cache/edgar_amendments/` + `compute/cache/edgar_late_filings/`, 7d TTL each). Test suite **646 → 831 offline**. Reason taxonomy: 24 stable + 2 Tier-3 + 2 new vetoes + 6 new annotates = **34 stable identifiers**. |
+| **`0.9.0-phase4h`** (in flight in PR #112) | Phase 4h | **OSAP signal replication + PBO/DSR hard gate + Path-b composite × OSAP blend** (5-commit cluster on branch `claude/resume-quantrank-phase-4.5-Zh0pO`: 06bdac76 schema-foundation, b79983f6 osap_replicate proxy + 100-signal manifest, a6760d91 osap_blend Path-b, df4d9bd2 osap_validation PBO/DSR gate + rolling-12m-IC, [TBD] compute/main.py wiring + @network e2e). **Minor bump** — 6 new optional fields land simultaneously: `StockDetail.osap_signals: dict[str, float] \| None` + `StockDetail.osap_blended_score: float \| None`; `Metadata.osap_signals_used: list[str] \| None`, `Metadata.osap_excluded_signals: list[str] \| None`, `Metadata.osap_signals_ic_12m: dict[str, float] \| None`, `Metadata.osap_signals_coverage_pct: dict[str, float] \| None`. **OSAP blend stays OUTSIDE `compute_composite()`** — `PHASE3_WEIGHTS` sum-to-1.0 invariant (`compute/scoring/composite.py:43-45`) intact; Path-b formula `blended = (1 - weight) × composite_score + weight × osap_signal_aggregate`, default `weight=0.5` locked at `osap-integration/PLAN.md:168-170`. **Hard gate** = PBO ≤ 0.5 AND DSR > 0 via PR #60's `factor_passes_gates`; rolling-12m Spearman IC is observability-only (full walk-forward CV deferred to Phase 5 per `defense-infrastructure/PLAN.md:270`). **No new veto** (Top-5 still ranks raw `composite_score` per Rule 16; `osap_blended_score` is informational); defense layer stays at **17**. **Universe-gap policy** — tickers with no OSAP coverage pass `composite_score` through unchanged (no impute, distinct from pillar `neutralize_missing=True`). **NaN policy in PBO cohort** — zero-fill (not mean-fill, not dropna) preserves Bailey 2014 `n_trials = cohort_size` multiple-testing correction; sparse signals naturally lose on DSR (low Sharpe → DSR rejection). **OSAP failure is observability-only** — wrapped in try/except in `compute/main.py` so live-fetch / package failure NEVER blocks weekly production; all 6 new fields degrade to `None`. Test suite **856 → 906 offline + 18 → 19 `@network`** (commits 2-5 added 50 tests; e2e network test added in commit 5). Reason taxonomy unchanged at 34 stable identifiers. Tag `v1.1.0-phase4` (or `v1.3.0` for the 4.5e+4h combined release) deferred until 4i/4j/4k also merge. |
 | **`0.8.0-phase4.5f`** | Phase 4.5f | **Manipulation Composite + soft composite penalty + UI** (PR #100 merged 2026-05-17 on commit `b1588b2a`; production verified on commit `e57f09cb`, run #51, warm-cache 5m14s). **Minor bump** because 5 new optional fields land simultaneously + new UI surface ships + tag `v1.2.0-phase4.5` coordinates with the data-version bump (semver coupling). Additive optional fields: `StockSummary.manipulation_index: float \| None`, `StockSummary.composite_score_adjusted: float \| None`, `StockDetail.manipulation_index`, `StockDetail.composite_score_adjusted`, `StockDetail.manipulation_components: dict[str, bool] \| None`. **`manipulation_index`** is a 0-100 rollup over the 4.5a-d flag set via a per-flag additive weight table in `compute/scoring/manipulation_index.py::FLAG_WEIGHTS` (active vetoes 15-20 pts · joint-gate 10 · annotates 5-8 · Tier-3 soft 3); clipped to `[0, 100]`. **`composite_score_adjusted`** applies the soft penalty `composite − 0.5 × (index / 100) × 20` (max 10-pt deduction at index = 100); the original `composite_score` field is preserved untouched per Rule 9 audit trail. **Rank source stays the raw composite per Rule 16** — the adjusted value is informational only, surfaced on the new detail-page `ManipulationRiskCard` (3-band outlined-light: emerald LOW / amber MODERATE / rose HIGH) with the in-line qualifier "Composite penalty: −X.XX pts (informational; rank uses raw composite)". Production: 158/502 (31.5%) fire the card (HIGH 2: SMCI=84 · WAT=64; MODERATE 60; LOW 96). **Phase 4.5e reserved-slot weights declared** (`INSIDER_SELL_CLUSTER_WEIGHT_RESERVED = 10`, `C_SUITE_UNUSUAL_SELL_WEIGHT_RESERVED = 5`) — the 4.5e PR uncomments 2 entries in `FLAG_WEIGHTS`, no calibration cascade. Test suite **831 → 856 offline**. Reason taxonomy: 34 stable identifiers (unchanged — `manipulation_index` is a derivation, not a new flag). Tag **`v1.2.0-phase4.5`** ready to cut. |
 
 > Phase 4+ schemas are tracked in [`WORKFLOW.md`](WORKFLOW.md) "Defense
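The `0.9.0-phase4h` row gives the Path-b formula, the universe-gap pass-through, and the rule that an OSAP failure degrades the new fields to `None`. A minimal sketch of those three behaviours follows. The function names and the `fetch_aggregate` callable are hypothetical, and it is assumed that `osap_signal_aggregate` is on the same scale as `composite_score`; the actual `osap_blend` module may differ.

```python
from __future__ import annotations

from typing import Callable


def blend_path_b(
    composite_score: float,
    osap_signal_aggregate: float | None,
    weight: float = 0.5,
) -> float:
    """blended = (1 - weight) * composite_score + weight * osap_signal_aggregate."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError(f"weight must be in [0, 1], got {weight}")
    # Universe-gap policy: with no OSAP coverage the composite passes through
    # unchanged (no imputation).
    if osap_signal_aggregate is None:
        return composite_score
    return (1.0 - weight) * composite_score + weight * osap_signal_aggregate


def safe_blend(
    composite_score: float,
    fetch_aggregate: Callable[[], float | None],
) -> float | None:
    """Observability-only wrapper: any OSAP failure yields None, never an error."""
    try:
        return blend_path_b(composite_score, fetch_aggregate())
    except Exception:
        # Assumption: the weekly run continues and the field is emitted as None.
        return None
```

Because the blend lives in its own function and never writes back to `composite_score`, the raw composite stays available as the rank source, which matches the row's statement that the blended value is informational.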

diff --git a/compute/config.py b/compute/config.py
@@ -27,7 +27,7 @@
 MODELS_DIR: Path = PROJECT_ROOT / "models"
 
 UNIVERSE: str = "SP500"
-SCHEMA_VERSION: str = "0.8.0-phase4.5f"
+SCHEMA_VERSION: str = "0.9.0-phase4h"
 
 PRICES_PERIOD: str = "5y"
 MAX_PARALLEL_FETCHES: int = 10
@@ -181,3 +181,71 @@
 # more often is wasted bandwidth.
 OSAP_RETURNS_CACHE: Path = CACHE_DIR / "osap" / "returns.parquet"
 OSAP_RETURNS_MAX_AGE_DAYS: int = 31
+
+# --- Phase 4h: 100-signal manifest ---
+#
+# Theme buckets mirror the table at
+# `.claude/skills/phase-4/osap-integration/PLAN.md` L60-73
+# (Value/Quality/Momentum/Investment/Risk/EarningsNews/Trading +
+# Misc). CamelCase names follow the Chen-Zimmermann OSAP convention
+# (see github.com/OpenSourceAP/CrossSection signal docs).
+#
+# Aspirational manifest — commit 4's PBO/DSR gate
+# (`compute/validation/osap_validation.py`) will catch any signal that
+# does not resolve in the fetched OSAP returns DataFrame and log it
+# under `metadata.json::osap_excluded_signals` with reason
+# `not_found_in_osap_dataset` so the manifest can be tuned over
+# subsequent compute runs without a redeploy.
+OSAP_SIGNALS_BY_THEME: dict[str, tuple[str, ...]] = {
+    "Value": (
+        "BM", "EP", "SP", "CF", "DivYieldST", "NetEquityFinance",
+        "NetDebtFinance", "BookLeverage", "IntanBM", "IntanCFP",
+        "IntanEP", "IntanSP", "DebtIssuance", "OperatingLeverage",
+        "CompositeDebtIssuance",
+    ),  # 15
+    "Quality": (
+        "GP", "RoE", "RoA", "AssetTurnover", "AOP", "OperatingProfit",
+        "RDS", "RD", "ProfitMargin", "CashProf", "GrcapxThreeYears",
+        "AccrualsBM", "OperatingAccruals", "PctTotAcc", "Cash",
+    ),  # 15
+    "Momentum": (
+        "Mom12m", "Mom6m", "Mom36m", "Mom1m", "STreversal", "IndMom",
+        "IntMom", "EarnSupBig", "MomVol", "MomOffSeason", "MomSeason",
+        "Recomm_ShortInterest",
+    ),  # 12
+    "Investment": (
+        "AssetGrowth", "ChNNCOA", "ChNWC", "GrLTNOA", "ChInv",
+        "ShareIss1Y", "ShareIss5Y", "GrSaleToGrInv",
+    ),  # 8
+    "Risk": (
+        "MaxRet", "IdioVol3F", "IdioVolAHT", "BetaTailRisk", "Beta",
+        "BetaFP", "ReturnSkew", "ReturnSkew3F", "IndIPO",
+        "AbnormalAccruals",
+    ),  # 10
+    "EarningsNews": (
+        "SUE", "EarningsSurprise", "REV6", "RDIPO", "NumEarnIncrease",
+        "ConsRecomm", "Recomm", "EarningsForecastDisparity",
+    ),  # 8
+    "Trading": (
+        "Illiquidity", "Turnover", "Bid_Ask", "VolMkt", "VolSD",
+        "dVolCall", "Coskewness",
+    ),  # 7
+    "Misc": (
+        "Leverage", "OrgCapital", "Tax", "ChAssetTurnover", "BAR",
+        "GS", "AnnouncementReturn", "OScore", "ZScore", "CredRatDG",
+        "FailureProbability", "IRA", "FR", "BPEBM", "Activism1",
+        "Activism2", "AnalystValue", "ChForecastAccrual", "ChInvIA",
+        "AnalystRevision", "ForecastDispersion", "GrowthCapEx",
+        "MeanRankRevGrowth", "AbnormalAccrualsPercent", "ChEQ",
+    ),  # 25
+}
+
+OSAP_SIGNALS_100: tuple[str, ...] = tuple(
+    sig for theme_signals in OSAP_SIGNALS_BY_THEME.values() for sig in theme_signals
+)
+assert len(OSAP_SIGNALS_100) == 100, (
+    f"OSAP_SIGNALS_100 must have exactly 100 entries, got {len(OSAP_SIGNALS_100)}"
+)
+assert len(set(OSAP_SIGNALS_100)) == 100, (
+    "OSAP_SIGNALS_100 contains duplicate signal names"
+)
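The manifest comment says that any signal missing from the fetched OSAP returns DataFrame is logged under `osap_excluded_signals` with reason `not_found_in_osap_dataset`. A small sketch of that resolution step follows, assuming a hypothetical helper `resolve_manifest` that receives the manifest and the column names actually present. How the reason is serialized into the `list[str]` metadata field is not shown in the diff, so the mapping returned here is illustrative only.

```python
from __future__ import annotations

from typing import Iterable

NOT_FOUND_REASON = "not_found_in_osap_dataset"  # reason string quoted in the comment


def resolve_manifest(
    manifest: Iterable[str],
    available_columns: Iterable[str],
) -> tuple[list[str], dict[str, str]]:
    """Split manifest names into (resolved, {excluded_name: reason}).

    Hypothetical helper; manifest order is preserved in both outputs.
    """
    available = set(available_columns)
    resolved: list[str] = []
    excluded: dict[str, str] = {}
    for name in manifest:
        if name in available:
            resolved.append(name)
        else:
            excluded[name] = NOT_FOUND_REASON
    return resolved, excluded
```

For example, `resolve_manifest(("BM", "EP", "NotASignal"), ["EP", "BM", "Size"])` keeps `BM` and `EP` and reports `NotASignal` as excluded; columns that exist in the dataset but not in the manifest (`Size` here) are simply ignored.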