diff --git a/.github/workflows/compute-rankings.yml b/.github/workflows/compute-rankings.yml
index e62e70b06..cde5afa85 100644
--- a/.github/workflows/compute-rankings.yml
+++ b/.github/workflows/compute-rankings.yml
@@ -46,7 +46,10 @@ jobs:
       - name: Install
         run: |
           python -m pip install --upgrade pip
-          pip install -e .
+          # Phase 4h: weekly compute imports compute/ingest/osap.py which
+          # imports the `openassetpricing` package — installed via the
+          # `factors` extra (pinned to ==0.0.2 in pyproject.toml).
+          pip install -e ".[factors]"
 
       - name: Compute current quarter id
         id: quarter
diff --git a/CLAUDE.md b/CLAUDE.md
index 463205f90..2b8a9c722 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -153,14 +153,16 @@ non-connector-bound work.
 
 ## Phase status
 
-Current schema: **`0.8.0-phase4.5f`** · Defense layer: **17**
-(7 active vetoes + 10 annotates + 5 numerical guards +
-`manipulation_index` rollup). Latest release tag:
+Current schema: **`0.9.0-phase4h`** (bumped from `0.8.0-phase4.5f` in
+PR #112). Defense layer: **17** (7 active vetoes + 10 annotates + 5
+numerical guards + `manipulation_index` rollup) — Phase 4h adds
+observability surface, no new veto. Latest release tag:
 [**`v1.2.0-phase4.5`**](https://github.com/dackclup/quantrank/releases/tag/v1.2.0-phase4.5)
-**SHIPPED 2026-05-17** at commit `6d414a9b` — **Phase 4.5 cluster
-✅ complete** (6 sub-PRs). Production verified run #51
-(`b1588b2a`, 5m14s warm-cache). Test suite: 856 offline + 17
-`@network`.
+shipped 2026-05-17 at commit `6d414a9b`. **Phase 4h in flight in PR
+#112** — OSAP signal replication (factor-exposure proxy) + PBO/DSR
+hard gate (PR #60 reuse) + rolling-12m IC observability + Path-b
+composite × OSAP blend (50/50 default, Top-5 still ranks raw
+composite per Rule 16). Test suite: 906 offline + 19 `@network`.
 
 **Next deliverable** (pick by appetite — three tracks parallelize):
 **4.5e** (Form 4 insider, ~3w → v1.3.0) · **4h/4i/4j/4k** factor
diff --git a/PHASE_STATUS.md b/PHASE_STATUS.md
index fc8f5a6cc..b507a4ade 100644
--- a/PHASE_STATUS.md
+++ b/PHASE_STATUS.md
@@ -6,7 +6,7 @@
 | 1 | Universe + prices ingestion | ✅ DONE — 2026-05-08 |
 | 2 | Fundamentals via SEC EDGAR | ✅ DONE — 2026-05-08 |
 | 3 | Classical features + composite + **defenses** → **v1.0** | ✅ **DONE — 2026-05-14** (v1.0.0 tagged + GitHub release) |
-| 4 | Factor consolidation (OSAP + JKP + Qlib + IPCA) → **v1.1** | 🟡 IN PROGRESS — 4a-4g + 4c.1/4c.2/4c.3 + PR 4b §1+§2 all merged; PR 4b §3 IC-decay output deferred to Phase 5; **next: 4h / 4i / 4j / 4k factor integrations** (PBO/DSR gate ready), can run in parallel with Phase 4.5 |
+| 4 | Factor consolidation (OSAP + JKP + Qlib + IPCA) → **v1.1** | 🟡 IN PROGRESS — 4a-4g + 4c.1/4c.2/4c.3 + PR 4b §1+§2 all merged; **PR #112 (Phase 4h)** ships OSAP signal replication + PBO/DSR gate + Path-b 50/50 blend (schema bump `0.8.0-phase4.5f` → `0.9.0-phase4h`, no new veto — annotate-only blend, Top-5 still ranks raw composite per Rule 16, 5-commit cluster on `claude/resume-quantrank-phase-4.5-Zh0pO`); 4i/4j/4k pending; PR 4b §3 IC-decay output deferred to Phase 5 |
 | **4.5** | **Earnings-manipulation defense cluster** → **v1.2** | ✅ **DONE 2026-05-17** — **tag [`v1.2.0-phase4.5`](https://github.com/dackclup/quantrank/releases/tag/v1.2.0-phase4.5) cut** at commit `6d414a9b`. 6 sub-PRs (#89/#90/#91 + #93 + #95 + #97 + #100). Active vetoes **5 → 7**; defense layer **9 → 17** (= 7 vetoes + 10 annotates). 4.5f adds `manipulation_index` (0-100 rollup) + `composite_score_adjusted` (soft penalty, max 10 pts, informational only) + `ManipulationRiskCard` UI + schema bump **`0.7.1-phase4g` → `0.8.0-phase4.5f`**. Production verified run #51 (`b1588b2a`, 5m14s warm-cache): card fires on 158/502 (31.5%); HIGH band 2 (SMCI=84 · WAT=64), MODERATE 60, LOW 96. 4.5e Form-4 insider clustering **deferred to v1.3.0** — reserved-slot weights already declared in `FLAG_WEIGHTS`. |
 | 5 | ML meta-learner (Triple-Barrier + Meta-Labeling + Conformal) + SHAP | ⚪ not started |
 | 6 | Sentiment v2 (FinBERT + Whisper + 8-K Lazy Prices) | ⚪ not started |
diff --git a/SKILL.md b/SKILL.md
index 5e27404a4..e7b015c77 100644
--- a/SKILL.md
+++ b/SKILL.md
@@ -304,6 +304,7 @@ Schema versions:
 | `0.7.0-phase4g` | Phase 4g | **8-K Tier-2 event defenses re-enabled** (PR #79, merged 2026-05-15 on `c35c6d40`, closes [issue #14](https://github.com/dackclup/quantrank/issues/14)). Flipped `compute/scoring/tier2._EIGHT_K_DEFENSES_ENABLED = True` after the PR 3d workflow-timeout deferral (root cause cleared by PR #58 cache layers + PR 3d tenacity tightening). `non_reliance_filing` (Item 4.02 hard veto, 365d lookback, Schroeder 2024 SSRN — ~50% of 4.02 filings precede formal restatement) returns to the active layer as the **5th active veto**. `auditor_change` (Item 4.01 annotate, 730d lookback, Reg S-K Item 304, Cohen-Malloy-Nguyen 2020 type) joins the Tier-2 annotate surface. No data-schema-shape delta — only the feature-flag flip + reason-taxonomy expansion. |
 | `0.7.1-phase4g` | Phase 4g | **`price_change_1d_pct` additive field** (squash-merged via PR #80, commit `1509f707`). New optional `float \| None` field on `StockSummary` + `StockDetail` — day-over-day percent change from the prior trading-day close. Computed once in `compute/main.py:_fetch_prices_one` from the last two valid yfinance closes; null for newly-IPO'd tickers (only one close available). Lets the ranking-table mobile cards render a change pill without lazy-fetching 502 per-stock history JSONs. Per `phase-4/schema-versioning/PLAN.md`: "Add a new optional field (default = None) → patch". Production metadata.version stays `0.7.0-phase4g` until next weekly compute. |
 | `0.7.1-phase4g` (no schema delta) | Phase 4.5a-4.5d wave | **Earnings-manipulation defense cluster — sub-PRs 4.5a + 4.5b + 4.5c + 4.5d shipped 2026-05-16/17** (PRs #89/#90/#91 + #93 + #95 + #97). **No data-schema-shape delta** — all 9 new flag identifiers are strings appended to existing `risk_flags: list[str]` (active vetoes) + `valuation_warnings: list[str]` (annotates) arrays. Active vetoes **5 → 7**: + `beneish_manipulation_veto` (Beneish 1999, M > −1.78) + `dechow_manipulation_veto` (Dechow 2011, F > 3.0). Annotates added: `manipulation_triple_flag` (4.5a joint gate, 2 fired: SMCI · WAT), `restatement_history` (4.5b, 59 fired / 11.8% — Hennes-Leone-Miller 2008 *TAR*), `late_filing_notification` (4.5b, 2 fired: HAS · Q — Bartov-Lai-Yeung 2002 *JAR*), `rem_suspect` (4.5c, 16 fired / 3.2% — Roychowdhury 2006 *JAE* 3-proxy REM via per-sector OLS), `accruals_momentum_high` (4.5d, 50 fired / 10.0% — Sloan 1996 / Beneish 1999 Δ(TATA) > +0.05 over 3y), `loss_avoidance_pattern` (4.5d, 0 fired — Burgstahler-Dichev 1997 cohort thresholds too tight for S&P 500 large-cap universe, file as follow-up). Also closes [issue #7](https://github.com/dackclup/quantrank/issues/7) (Sloan over-firing on Financials: 21.3% → 11.7%, sector spread 7.7× → 1.4×). 2 new cache dirs (`compute/cache/edgar_amendments/` + `compute/cache/edgar_late_filings/`, 7d TTL each). Test suite **646 → 831 offline**. Reason taxonomy: 24 stable + 2 Tier-3 + 2 new vetoes + 6 new annotates = **34 stable identifiers**. |
+| **`0.9.0-phase4h`** (in flight in PR #112) | Phase 4h | **OSAP signal replication + PBO/DSR hard gate + Path-b composite × OSAP blend** (5-commit cluster on branch `claude/resume-quantrank-phase-4.5-Zh0pO`: 06bdac76 schema-foundation, b79983f6 osap_replicate proxy + 100-signal manifest, a6760d91 osap_blend Path-b, df4d9bd2 osap_validation PBO/DSR gate + rolling-12m-IC, [TBD] compute/main.py wiring + @network e2e). **Minor bump** — 6 new optional fields land simultaneously: `StockDetail.osap_signals: dict[str, float] \| None` + `StockDetail.osap_blended_score: float \| None`; `Metadata.osap_signals_used: list[str] \| None`, `Metadata.osap_excluded_signals: list[str] \| None`, `Metadata.osap_signals_ic_12m: dict[str, float] \| None`, `Metadata.osap_signals_coverage_pct: dict[str, float] \| None`. **OSAP blend stays OUTSIDE `compute_composite()`** — `PHASE3_WEIGHTS` sum-to-1.0 invariant (`compute/scoring/composite.py:43-45`) intact; Path-b formula `blended = (1 - weight) × composite_score + weight × osap_signal_aggregate`, default `weight=0.5` locked at `osap-integration/PLAN.md:168-170`. **Hard gate** = PBO ≤ 0.5 AND DSR > 0 via PR #60's `factor_passes_gates`; rolling-12m Spearman IC is observability-only (full walk-forward CV deferred to Phase 5 per `defense-infrastructure/PLAN.md:270`). **No new veto** (Top-5 still ranks raw `composite_score` per Rule 16; `osap_blended_score` is informational); defense layer stays at **17**. **Universe-gap policy** — tickers with no OSAP coverage pass `composite_score` through unchanged (no impute, distinct from pillar `neutralize_missing=True`). **NaN policy in PBO cohort** — zero-fill (not mean-fill, not dropna) preserves Bailey 2014 `n_trials = cohort_size` multiple-testing correction; sparse signals naturally lose on DSR (low Sharpe → DSR rejection). **OSAP failure is observability-only** — wrapped in try/except in `compute/main.py` so live-fetch / package failure NEVER blocks weekly production; all 6 new fields degrade to `None`. Test suite **856 → 906 offline + 18 → 19 `@network`** (commits 2-5 added 50 tests; e2e network test added in commit 5). Reason taxonomy unchanged at 34 stable identifiers. Tag `v1.1.0-phase4` (or `v1.3.0` for the 4.5e+4h combined release) deferred until 4i/4j/4k also merge. |
 | **`0.8.0-phase4.5f`** | Phase 4.5f | **Manipulation Composite + soft composite penalty + UI** (PR #100 merged 2026-05-17 on commit `b1588b2a`; production verified on commit `e57f09cb`, run #51, warm-cache 5m14s). **Minor bump** because 5 new optional fields land simultaneously + new UI surface ships + tag `v1.2.0-phase4.5` coordinates with the data-version bump (semver coupling). Additive optional fields: `StockSummary.manipulation_index: float \| None`, `StockSummary.composite_score_adjusted: float \| None`, `StockDetail.manipulation_index`, `StockDetail.composite_score_adjusted`, `StockDetail.manipulation_components: dict[str, bool] \| None`. **`manipulation_index`** is a 0-100 rollup over the 4.5a-d flag set via a per-flag additive weight table in `compute/scoring/manipulation_index.py::FLAG_WEIGHTS` (active vetoes 15-20 pts · joint-gate 10 · annotates 5-8 · Tier-3 soft 3); clipped to `[0, 100]`. **`composite_score_adjusted`** applies the soft penalty `composite − 0.5 × (index / 100) × 20` (max 10-pt deduction at index = 100); the original `composite_score` field is preserved untouched per Rule 9 audit trail. **Rank source stays the raw composite per Rule 16** — the adjusted value is informational only, surfaced on the new detail-page `ManipulationRiskCard` (3-band outlined-light: emerald LOW / amber MODERATE / rose HIGH) with the in-line qualifier "Composite penalty: −X.XX pts (informational; rank uses raw composite)". Production: 158/502 (31.5%) fire the card (HIGH 2: SMCI=84 · WAT=64; MODERATE 60; LOW 96). **Phase 4.5e reserved-slot weights declared** (`INSIDER_SELL_CLUSTER_WEIGHT_RESERVED = 10`, `C_SUITE_UNUSUAL_SELL_WEIGHT_RESERVED = 5`) — the 4.5e PR uncomments 2 entries in `FLAG_WEIGHTS`, no calibration cascade. Test suite **831 → 856 offline**. Reason taxonomy: 34 stable identifiers (unchanged — `manipulation_index` is a derivation, not a new flag). Tag **`v1.2.0-phase4.5`** ready to cut. |
 
 > Phase 4+ schemas are tracked in [`WORKFLOW.md`](WORKFLOW.md) "Defense
diff --git a/compute/config.py b/compute/config.py
index 5c0741e71..5bf93f0bb 100644
--- a/compute/config.py
+++ b/compute/config.py
@@ -27,7 +27,7 @@
 MODELS_DIR: Path = PROJECT_ROOT / "models"
 
 UNIVERSE: str = "SP500"
-SCHEMA_VERSION: str = "0.8.0-phase4.5f"
+SCHEMA_VERSION: str = "0.9.0-phase4h"
 
 PRICES_PERIOD: str = "5y"
 MAX_PARALLEL_FETCHES: int = 10
@@ -181,3 +181,71 @@
 # more often is wasted bandwidth.
 OSAP_RETURNS_CACHE: Path = CACHE_DIR / "osap" / "returns.parquet"
 OSAP_RETURNS_MAX_AGE_DAYS: int = 31
+
+# --- Phase 4h: 100-signal manifest ---
+#
+# Theme buckets mirror the table at
+# `.claude/skills/phase-4/osap-integration/PLAN.md` L60-73
+# (Value/Quality/Momentum/Investment/Risk/EarningsNews/Trading +
+# Misc). CamelCase names follow the Chen-Zimmermann OSAP convention
+# (see github.com/OpenSourceAP/CrossSection signal docs).
+#
+# Aspirational manifest — commit 4's PBO/DSR gate
+# (`compute/validation/osap_validation.py`) will catch any signal that
+# does not resolve in the fetched OSAP returns DataFrame and log it
+# under `metadata.json::osap_excluded_signals` with reason
+# `not_found_in_osap_dataset` so the manifest can be tuned over
+# subsequent compute runs without a redeploy.
+OSAP_SIGNALS_BY_THEME: dict[str, tuple[str, ...]] = {
+    "Value": (
+        "BM", "EP", "SP", "CF", "DivYieldST", "NetEquityFinance",
+        "NetDebtFinance", "BookLeverage", "IntanBM", "IntanCFP",
+        "IntanEP", "IntanSP", "DebtIssuance", "OperatingLeverage",
+        "CompositeDebtIssuance",
+    ),  # 15
+    "Quality": (
+        "GP", "RoE", "RoA", "AssetTurnover", "AOP", "OperatingProfit",
+        "RDS", "RD", "ProfitMargin", "CashProf", "GrcapxThreeYears",
+        "AccrualsBM", "OperatingAccruals", "PctTotAcc", "Cash",
+    ),  # 15
+    "Momentum": (
+        "Mom12m", "Mom6m", "Mom36m", "Mom1m", "STreversal", "IndMom",
+        "IntMom", "EarnSupBig", "MomVol", "MomOffSeason", "MomSeason",
+        "Recomm_ShortInterest",
+    ),  # 12
+    "Investment": (
+        "AssetGrowth", "ChNNCOA", "ChNWC", "GrLTNOA", "ChInv",
+        "ShareIss1Y", "ShareIss5Y", "GrSaleToGrInv",
+    ),  # 8
+    "Risk": (
+        "MaxRet", "IdioVol3F", "IdioVolAHT", "BetaTailRisk", "Beta",
+        "BetaFP", "ReturnSkew", "ReturnSkew3F", "IndIPO",
+        "AbnormalAccruals",
+    ),  # 10
+    "EarningsNews": (
+        "SUE", "EarningsSurprise", "REV6", "RDIPO", "NumEarnIncrease",
+        "ConsRecomm", "Recomm", "EarningsForecastDisparity",
+    ),  # 8
+    "Trading": (
+        "Illiquidity", "Turnover", "Bid_Ask", "VolMkt", "VolSD",
+        "dVolCall", "Coskewness",
+    ),  # 7
+    "Misc": (
+        "Leverage", "OrgCapital", "Tax", "ChAssetTurnover", "BAR",
+        "GS", "AnnouncementReturn", "OScore", "ZScore", "CredRatDG",
+        "FailureProbability", "IRA", "FR", "BPEBM", "Activism1",
+        "Activism2", "AnalystValue", "ChForecastAccrual", "ChInvIA",
+        "AnalystRevision", "ForecastDispersion", "GrowthCapEx",
+        "MeanRankRevGrowth", "AbnormalAccrualsPercent", "ChEQ",
+    ),  # 25
+}
+
+OSAP_SIGNALS_100: tuple[str, ...] = tuple(
+    sig for theme_signals in OSAP_SIGNALS_BY_THEME.values() for sig in theme_signals
+)
+assert len(OSAP_SIGNALS_100) == 100, (
+    f"OSAP_SIGNALS_100 must have exactly 100 entries, got {len(OSAP_SIGNALS_100)}"
+)
+assert len(set(OSAP_SIGNALS_100)) == 100, (
+    "OSAP_SIGNALS_100 contains duplicate signal names"
+)
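Review note on the manifest checks above: the second assertion reports that a duplicate exists but not which name is duplicated. A minimal sketch of a more diagnostic check follows, using a toy two-theme manifest. `TOY_BY_THEME`, `flatten_manifest`, and `duplicated_signals` are hypothetical names for illustration only and are not part of `compute/config.py`.

```python
from collections import Counter

# Toy manifest (assumption): "BM" is deliberately listed under two themes.
TOY_BY_THEME: dict[str, tuple[str, ...]] = {
    "Value": ("BM", "EP"),
    "Quality": ("GP", "BM"),
}


def flatten_manifest(by_theme: dict[str, tuple[str, ...]]) -> tuple[str, ...]:
    """Flatten theme buckets into one tuple, preserving declaration order."""
    return tuple(sig for sigs in by_theme.values() for sig in sigs)


def duplicated_signals(manifest: tuple[str, ...]) -> list[str]:
    """Return the sorted names that occur more than once in the manifest."""
    return sorted(name for name, n in Counter(manifest).items() if n > 1)


flat = flatten_manifest(TOY_BY_THEME)
dupes = duplicated_signals(flat)
print(flat, dupes)
```

Naming the offending entries in the failure message would make a miscounted manifest quicker to repair than a bare length mismatch.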
diff --git a/compute/features/osap_replicate.py b/compute/features/osap_replicate.py
new file mode 100644
index 000000000..a98b72784
--- /dev/null
+++ b/compute/features/osap_replicate.py
@@ -0,0 +1,323 @@
+"""OpenAssetPricing (OSAP) per-stock signal replication.
+
+Phase 4h commit 2. Builds a per-ticker signal map from OSAP's
+long-short portfolio returns (Chen-Zimmermann 2022 *Critical Finance
+Review*, github.com/OpenSourceAP/CrossSection). The fetcher in
+``compute/ingest/osap.py`` returns the bulk parquet of
+``signalname × port × date`` rows; this module aligns ``port=01``
+(long bucket) against ``port=10`` (short bucket) per
+``(signalname, date)``, picks the most-recent cross-section at or
+before ``as_of``, ranks signals cross-sectionally by their long-short
+return, and surfaces the per-signal rank as the ticker's OSAP
+exposure proxy.
+
+**Scope note** (locked 2026-05-18 plan audit). This is the
+*factor-exposure proxy* version: every ticker receives the same
+signal map, derived from the market-wide OSAP long-short return at
+``as_of``. True per-stock signal replication — porting the ~100
+signal formulas from OSAP's SAS / Stata source into pandas, fed by
+our existing ``compute/features/`` pillar inputs — is the deferred
+heavy lift. The proxy version is sufficient for Phase 4h's blend
+target because:
+
+1. ``osap_blended_score`` is *observability-only* in this phase
+   (Top-5 ranking still uses ``composite_score``; SKILL.md Rule 16).
+2. PR 4b §2 PBO/DSR gate
+   (``compute/validation/pbo_dsr.py::factor_passes_gates``) runs on
+   the long-short returns themselves, not the per-stock projection
+   — so signal acceptance is identical to the full version.
+3. Per-stock replication of all 100 signals slips Phase 4h by weeks
+   without unblocking 4i/4j/4k.
+
+If this module needs to graduate to true per-stock replication
+later, the contract (``compute_osap_signals(returns, tickers, as_of)
+-> dict[str, dict[str, float] | None]``) stays stable — only the
+inner ``signal -> rank`` derivation changes per ticker.
+
+Universe-gap policy: tickers receive ``None`` (NOT zero, NOT an
+imputed neutral) when the as-of cross-section is empty (e.g.,
+``as_of`` precedes OSAP coverage). Pillar
+``compute_composite(neutralize_missing=True)`` imputes 50.0 for
+missing pillars; OSAP intentionally does not — the blend layer
+(commit 3, ``compute/scoring/osap_blend.py``) treats ``None`` as
+"no OSAP adjustment" and passes ``composite_score`` through
+unchanged.
+
+No tenacity / network access in this module — all I/O is delegated
+to the ingest layer.
+"""
+
+from __future__ import annotations
+
+import logging
+from datetime import date
+
+import pandas as pd
+
+from compute import config
+
+logger = logging.getLogger(__name__)
+
+# Canonical port labels in OSAP's PredictorPortsFull.csv ("op" dataset
+# from openassetpricing). port=01 is the LONG bucket (highest signal
+# rank); port=10 is the SHORT bucket. Decile-bucketed signals also use
+# port=02..09 but Phase 4h only consumes the corner buckets.
+LONG_PORT_LABEL: str = "01"
+SHORT_PORT_LABEL: str = "10"
+
+# Columns the inbound DataFrame must carry. This is a *load-bearing*
+# contract: the ingest layer already enforces ``REQUIRED_COLUMNS``
+# (``signalname, port, date, ret``), and this module re-validates the
+# same four columns without adding any new requirement.
+_REQUIRED_INPUT_COLUMNS: frozenset[str] = frozenset(
+    {"signalname", "port", "date", "ret"}
+)
+
+
+def _normalize_port_label(port_series: pd.Series) -> pd.Series:
+    """Coerce ``port`` to the canonical zero-padded string ('01', '02',
+    ..., '10').
+
+    OSAP's parquet may store ``port`` as int (1..10), int64, or
+    zero-padded string depending on the release. The pivot step below
+    is column-name sensitive, so we normalize once at the entry point
+    rather than scatter ``astype(str).str.zfill(2)`` across helpers.
+    """
+    # Cast through str first to absorb any int / int64 / numpy.int64 /
+    # categorical input. zfill ensures '1' → '01' and '10' stays '10'.
+    return port_series.astype(str).str.zfill(2)
+
+
+def compute_long_short_returns(returns: pd.DataFrame) -> pd.DataFrame:
+    """Compute long-short return per ``(signalname, date)``.
+
+    Algorithm:
+        1. Filter to rows where ``port`` is the LONG or SHORT bucket
+           (drops decile buckets 02..09).
+        2. Pivot ``port`` to columns indexed by ``(signalname, date)``,
+           with ``ret`` as the value.
+        3. Compute ``ls_return = ret[port=01] - ret[port=10]``.
+        4. Drop ``(signalname, date)`` rows where either port is
+           missing (incomplete coverage).
+
+    Returns a DataFrame with columns: ``signalname``, ``date``,
+    ``ls_return``. Empty DataFrame (same columns) when the input has
+    no valid long-short pairs.
+    """
+    missing = _REQUIRED_INPUT_COLUMNS - set(returns.columns)
+    if missing:
+        raise ValueError(
+            f"compute_long_short_returns missing columns {sorted(missing)}; "
+            f"got {sorted(returns.columns)}. Check compute/ingest/osap.py "
+            f"REQUIRED_COLUMNS contract."
+        )
+
+    if returns.empty:
+        return pd.DataFrame(columns=["signalname", "date", "ls_return"])
+
+    df = returns.copy()
+    df["port"] = _normalize_port_label(df["port"])
+    df = df[df["port"].isin([LONG_PORT_LABEL, SHORT_PORT_LABEL])]
+    if df.empty:
+        return pd.DataFrame(columns=["signalname", "date", "ls_return"])
+
+    pivot = df.pivot_table(
+        index=["signalname", "date"],
+        columns="port",
+        values="ret",
+        aggfunc="first",
+    )
+
+    # Both corner buckets must be present for a long-short return to be
+    # meaningful. A signal-date with only port=01 (or only port=10) is
+    # silently dropped — coverage shortfall surfaces in the
+    # `osap_signals_coverage_pct` metadata field.
+    if LONG_PORT_LABEL not in pivot.columns or SHORT_PORT_LABEL not in pivot.columns:
+        return pd.DataFrame(columns=["signalname", "date", "ls_return"])
+
+    pivot["ls_return"] = pivot[LONG_PORT_LABEL] - pivot[SHORT_PORT_LABEL]
+    pivot = pivot.dropna(subset=["ls_return"])
+    return pivot[["ls_return"]].reset_index()
+
+
+def select_as_of_cross_section(
+    ls_returns: pd.DataFrame, as_of: date
+) -> pd.DataFrame:
+    """For each signal, pick the most recent ``ls_return`` at or before
+    ``as_of``.
+
+    OSAP releases monthly — ``as_of`` is typically the most recent
+    month-end. Signals whose latest available observation precedes
+    ``as_of`` by more than one release cycle still surface (the staleness
+    is intentional — Phase 4h's universe coverage is a separate metric).
+    Signals with no observation at or before ``as_of`` are dropped.
+
+    Returns a DataFrame with columns: ``signalname``, ``date``,
+    ``ls_return``. Empty DataFrame when the entire window is empty.
+    """
+    if ls_returns.empty:
+        return pd.DataFrame(columns=["signalname", "date", "ls_return"])
+
+    as_of_ts = pd.Timestamp(as_of)
+    df = ls_returns.copy()
+    df["_date_ts"] = pd.to_datetime(df["date"])
+    df = df[df["_date_ts"] <= as_of_ts]
+    if df.empty:
+        return pd.DataFrame(columns=["signalname", "date", "ls_return"])
+
+    # For each signal, idxmax over the timestamp picks the most recent
+    # observation at or before as_of. Behaviour on tie (same signal
+    # observed twice on the same date) is to keep the first row —
+    # acceptable because OSAP releases are monthly and per-signal
+    # uniqueness on (signalname, date) is contractually enforced upstream.
+    idx = df.groupby("signalname")["_date_ts"].idxmax()
+    cross_section = (
+        df.loc[idx, ["signalname", "date", "ls_return"]]
+        .sort_values("signalname")
+        .reset_index(drop=True)
+    )
+    return cross_section
+
+
+def rank_signals_cross_sectional(cross_section: pd.DataFrame) -> pd.Series:
+    """Rank signals by ``ls_return`` cross-sectionally, normalised to
+    ``(0, 1]``.
+
+    Uses ``pandas.Series.rank(method='average', pct=True)`` —
+    average-rank for ties, percentile-normalised. No scipy dependency.
+
+    Returns a Series indexed by ``signalname`` whose values are in
+    ``(0, 1]``. Empty Series (dtype float, name 'rank') when the input
+    cross-section is empty.
+    """
+    if cross_section.empty:
+        return pd.Series(dtype=float, name="rank")
+
+    ranks = cross_section.set_index("signalname")["ls_return"].rank(
+        method="average", pct=True
+    )
+    ranks.name = "rank"
+    return ranks
+
+
+def compute_osap_signals(
+    returns: pd.DataFrame,
+    tickers: list[str],
+    as_of: date,
+    requested_signals: tuple[str, ...] | None = None,
+) -> dict[str, dict[str, float] | None]:
+    """Build the per-ticker OSAP signal map for ``as_of``.
+
+    Args:
+        returns: DataFrame from
+            ``compute/ingest/osap.py::fetch_osap_returns``. Must include
+            columns: ``signalname``, ``port``, ``date``, ``ret``.
+        tickers: ticker symbols to populate the map for. Tickers are
+            **not** filtered or validated — Phase 4h's universe set is
+            assumed to be passed in.
+        as_of: cross-section date. The most recent observation at or
+            before this date is used per signal.
+        requested_signals: optional restriction to a subset of signals.
+            Defaults to ``config.OSAP_SIGNALS_100`` (the 100-signal
+            manifest).
+
+    Returns:
+        ``{ticker: {signalname: rank} | None}``.
+
+        - Inner-dict values are floats in ``(0, 1]`` (cross-sectional
+          rank of the long-short return at ``as_of``).
+        - Outer values are ``None`` when the as-of cross-section is
+          empty (e.g., ``as_of`` precedes OSAP coverage, or no requested
+          signal has any data). Distinct from pillar
+          ``neutralize_missing`` — the blend layer treats ``None`` as
+          "no OSAP adjustment" and passes ``composite_score`` through.
+
+    Phase 4h commit 2 implements the *factor-exposure proxy*: every
+    ticker receives the same signal map, derived from the market-wide
+    OSAP long-short return cross-section. See the module docstring for
+    why this is sufficient for the Phase 4h blend target.
+    """
+    if requested_signals is None:
+        requested_signals = config.OSAP_SIGNALS_100
+
+    requested_set = set(requested_signals)
+    none_map: dict[str, dict[str, float] | None] = {t: None for t in tickers}
+
+    if returns.empty:
+        logger.info("OSAP returns DataFrame empty; returning None for all tickers")
+        return none_map
+
+    df = returns[returns["signalname"].isin(requested_set)]
+    if df.empty:
+        logger.info(
+            "No requested signals found in OSAP returns (manifest=%d, "
+            "available=%d); returning None for all tickers",
+            len(requested_set),
+            returns["signalname"].nunique(),
+        )
+        return none_map
+
+    ls = compute_long_short_returns(df)
+    if ls.empty:
+        logger.info(
+            "compute_long_short_returns produced empty cross-section for as_of=%s",
+            as_of,
+        )
+        return none_map
+
+    cs = select_as_of_cross_section(ls, as_of)
+    if cs.empty:
+        logger.info(
+            "Cross-section at as_of=%s is empty (likely as_of precedes coverage)",
+            as_of,
+        )
+        return none_map
+
+    ranks = rank_signals_cross_sectional(cs)
+    if ranks.empty:
+        return none_map
+
+    # Factor-exposure proxy: every ticker gets the same signal map.
+    # Phase 4h commit 2 design — see module docstring for rationale.
+    signal_map: dict[str, float] = {
+        str(sig): float(rank) for sig, rank in ranks.items()
+    }
+    logger.info(
+        "OSAP signals populated: %d signals × %d tickers (proxy mode)",
+        len(signal_map),
+        len(tickers),
+    )
+    # Use a fresh dict per ticker so downstream mutation doesn't leak
+    # across rows (defensive — Pydantic deepcopies on validate, but the
+    # writer path may not).
+    return {t: dict(signal_map) for t in tickers}
+
+
+def coverage_by_signal(
+    signal_map: dict[str, dict[str, float] | None],
+) -> dict[str, float]:
+    """Report per-signal coverage % across the populated ticker set.
+
+    In the Phase 4h commit 2 *proxy* mode every ticker gets the same
+    signal map, so a signal present in the cross-section reports 100.0;
+    absent signals (and the all-``None`` case) are omitted from the
+    result rather than reported as 0.0. Surfaced into
+    ``metadata.json::osap_signals_coverage_pct`` by commit 5's
+    ``compute/main.py`` wiring.
+
+    Returns ``{signalname: coverage_pct}`` covering only signals that
+    appeared in at least one ticker's map. Empty dict when all tickers
+    are ``None``.
+    """
+    total = len(signal_map)
+    if total == 0:
+        return {}
+
+    counts: dict[str, int] = {}
+    for sig_dict in signal_map.values():
+        if sig_dict is None:
+            continue
+        for sig in sig_dict:
+            counts[sig] = counts.get(sig, 0) + 1
+
+    return {sig: 100.0 * count / total for sig, count in counts.items()}
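Review note on `compute/features/osap_replicate.py`: the pipeline described in the module (normalize `port`, pivot the corner buckets, take long minus short, keep the latest observation per signal, percentile-rank) can be sketched on toy data as below. The return values are invented for illustration, `toy_long_short_ranks` is a hypothetical helper rather than a function from the module, and the sketch follows this module's convention that `01` is the long bucket and `10` the short bucket.

```python
import pandas as pd

# Toy stand-in for the OSAP bulk frame; all numbers are made up.
toy = pd.DataFrame(
    {
        "signalname": ["BM", "BM", "Mom12m", "Mom12m", "SUE", "SUE"],
        "port": [1, 10, 1, 10, 1, 10],  # ints on purpose, to exercise zfill
        "date": ["2024-01-31"] * 6,
        "ret": [0.030, 0.010, 0.020, 0.025, 0.040, 0.000],
    }
)


def toy_long_short_ranks(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: long-minus-short return per signal, then pct rank."""
    df = frame.copy()
    df["port"] = df["port"].astype(str).str.zfill(2)  # 1 -> '01', 10 -> '10'
    wide = df.pivot_table(
        index=["signalname", "date"], columns="port", values="ret", aggfunc="first"
    )
    ls = (wide["01"] - wide["10"]).dropna()  # drop pairs missing either leg
    latest = ls.groupby(level="signalname").last()  # index is date-sorted
    return latest.rank(method="average", pct=True)


ranks = toy_long_short_ranks(toy)
print(ranks)
```

With three signals the ranks land on thirds, which illustrates why the documented range is half-open: the lowest-ranked signal gets `1/n`, never `0`.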
diff --git a/compute/ingest/osap.py b/compute/ingest/osap.py
index c10d1ccae..15aeee8a6 100644
--- a/compute/ingest/osap.py
+++ b/compute/ingest/osap.py
@@ -27,6 +27,7 @@
 
 import logging
 import time
+from datetime import date
 
 import openassetpricing
 import pandas as pd
@@ -78,12 +79,23 @@ def _is_fresh(cache_path, max_age_days: int) -> bool:
     return age_days < max_age_days
 
 
-def fetch_osap_returns(force_refresh: bool = False) -> pd.DataFrame:
+def fetch_osap_returns(
+    force_refresh: bool = False,
+    *,
+    signals: list[str] | None = None,
+    as_of: date | None = None,
+) -> pd.DataFrame:
     """Return OSAP long-short portfolio returns, hitting the cache when fresh.
 
     Returns a DataFrame whose columns include at minimum
-    ``REQUIRED_COLUMNS``. The scout PR enforces that contract; Phase 4h
-    will add keyword-only ``signals`` / ``as_of`` filters (non-breaking).
+    ``REQUIRED_COLUMNS``. When ``signals`` is provided, the returned
+    frame is filtered to rows whose ``signalname`` is in the list. The
+    cache always stores the full bulk parquet and filtering happens
+    post-load, so a callsite that asks for 20 signals doesn't evict the
+    data a callsite asking for all 1,188 needs. When ``as_of`` is
+    provided, rows whose ``date`` is after ``as_of`` are dropped; Phase
+    4h's replication callers use this to avoid look-ahead past the
+    as-of date.
     """
     cache = config.OSAP_RETURNS_CACHE
     if not force_refresh and _is_fresh(cache, config.OSAP_RETURNS_MAX_AGE_DAYS):
@@ -103,4 +115,9 @@ def fetch_osap_returns(force_refresh: bool = False) -> pd.DataFrame:
             f"OSAP returns missing required columns {sorted(missing)}; "
             f"got {sorted(df.columns)}. Upstream API may have changed."
         )
+
+    if signals is not None:
+        df = df[df["signalname"].isin(signals)]
+    if as_of is not None:
+        df = df[pd.to_datetime(df["date"]) <= pd.Timestamp(as_of)]
     return df
diff --git a/compute/main.py b/compute/main.py
index 8b5ecb069..a2ed74980 100644
--- a/compute/main.py
+++ b/compute/main.py
@@ -43,6 +43,11 @@
 import pandas as pd
 
 from compute import config
+from compute.features.osap_replicate import (
+    compute_long_short_returns,
+    compute_osap_signals,
+    coverage_by_signal,
+)
 from compute.ingest.cross_source import (
     validate_market_cap as cross_source_validate_market_cap,
 )
@@ -52,6 +57,7 @@
     fetch_fundamentals,
     fetch_fundamentals_history,
 )
+from compute.ingest.osap import fetch_osap_returns
 from compute.ingest.prices import fetch_prices, fetch_spy_benchmark
 from compute.ingest.universe import get_sp500_constituents
 from compute.output.schemas import (
@@ -86,6 +92,7 @@
     compute_manipulation_index,
     manipulation_components,
 )
+from compute.scoring.osap_blend import aggregate_osap_signals, apply_osap_blend
 from compute.scoring.pillars import TickerInputs, compute_all_pillars
 from compute.scoring.recommendation import derive_recommendation
 from compute.scoring.rem import compute_rem_flags
@@ -103,6 +110,11 @@
 from compute.scoring.tier2 import (
     coverage_pct as tier2_coverage_pct_calc,
 )
+from compute.validation.osap_validation import (
+    compute_rolling_ic_12m,
+    filter_accepted_signals,
+    gate_osap_signals,
+)
 from compute.valuation.ensemble import (
     EnsembleResult,
     compute_fair_price_ensemble,
@@ -929,6 +941,107 @@ def run_weekly_compute() -> int:
     now = _now_utc()
     asof_date = now.date()
 
+    # Phase 4h — OSAP signal replication + PBO/DSR gate + Path-b blend.
+    # Observability-only this phase: Top-5 ranking still uses raw
+    # ``composite_score`` per SKILL.md Rule 16. The blend writes a
+    # ``composite_score_osap_adjusted`` per ticker into
+    # ``StockDetail.osap_blended_score`` for delta-attribution. Wrapped
+    # in try/except so OSAP fetch / library / network failure NEVER
+    # blocks weekly production — every OSAP-bearing field degrades to
+    # ``None`` on the schema (already ``| None = None`` in
+    # ``compute/output/schemas.py``).
+    osap_signals_used: list[str] = []
+    osap_excluded_signals: list[str] = []
+    osap_signals_ic_12m: dict[str, float] = {}
+    osap_signal_map: dict[str, dict[str, float] | None] = {}
+    osap_signals_coverage_pct: dict[str, float] = {}
+    composite_osap_adjusted: pd.Series = pd.Series(dtype=float)
+    try:
+        logger.info(
+            "Phase 4h — fetching OSAP returns for %d-signal manifest "
+            "(as_of=%s)",
+            len(config.OSAP_SIGNALS_100),
+            asof_date.isoformat(),
+        )
+        osap_returns_raw = fetch_osap_returns(
+            signals=list(config.OSAP_SIGNALS_100),
+            as_of=asof_date,
+        )
+        osap_ls = compute_long_short_returns(osap_returns_raw)
+        logger.info(
+            "OSAP long-short rows: %d across %d signals",
+            len(osap_ls),
+            osap_ls["signalname"].nunique() if not osap_ls.empty else 0,
+        )
+
+        gate_results = gate_osap_signals(
+            osap_ls,
+            requested_signals=config.OSAP_SIGNALS_100,
+        )
+        osap_signals_used, osap_excluded_signals = filter_accepted_signals(
+            gate_results
+        )
+        logger.info(
+            "OSAP PBO/DSR gate: %d accepted, %d excluded "
+            "(of %d candidates)",
+            len(osap_signals_used),
+            len(osap_excluded_signals),
+            len(gate_results),
+        )
+
+        # Rolling-12m Spearman IC per accepted signal — observability only,
+        # NOT a gate decision (canonical full walk-forward + purged-embargo
+        # CV is deferred to Phase 5 per defense-infrastructure/PLAN.md:270).
+        for sig in osap_signals_used:
+            ic = compute_rolling_ic_12m(osap_ls, sig)
+            if ic is not None:
+                osap_signals_ic_12m[sig] = round(float(ic), 4)
+
+        # Per-ticker signal map (commit 2 proxy mode — every ticker gets
+        # the market-wide cross-sectional rank). Only the accepted signal
+        # subset is consumed; excluded signals never blend.
+        if osap_signals_used:
+            osap_filtered_returns = osap_returns_raw[
+                osap_returns_raw["signalname"].isin(osap_signals_used)
+            ]
+            osap_signal_map = compute_osap_signals(
+                osap_filtered_returns,
+                tickers=list(pillar_df.index),
+                as_of=asof_date,
+                requested_signals=tuple(osap_signals_used),
+            )
+            osap_signals_coverage_pct = {
+                sig: round(pct, 2)
+                for sig, pct in coverage_by_signal(osap_signal_map).items()
+            }
+
+            # Path-b blend (commit 3) — applied OUTSIDE compute_composite()
+            # so PHASE3_WEIGHTS sum-to-1.0 invariant at composite.py:43-45
+            # stays intact. 50/50 default locked in
+            # osap-integration/PLAN.md:168-170.
+            osap_aggregate = aggregate_osap_signals(osap_signal_map)
+            composite_osap_adjusted = apply_osap_blend(
+                composite, osap_aggregate
+            )
+        else:
+            logger.warning(
+                "OSAP gate accepted 0 signals — skipping per-ticker map + "
+                "blend; osap_blended_score will be None for every ticker"
+            )
+    except Exception as e:  # noqa: BLE001
+        logger.warning(
+            "OSAP pipeline failed (observability-only — production "
+            "continues); StockDetail.osap_* + metadata.osap_* → None. "
+            "Error: %s",
+            e,
+        )
+        osap_signals_used = []
+        osap_excluded_signals = []
+        osap_signals_ic_12m = {}
+        osap_signal_map = {}
+        osap_signals_coverage_pct = {}
+        composite_osap_adjusted = pd.Series(dtype=float)
+
     # Step 8 — combined per-ticker loop: fair-price ensemble + price history
     # write + StockSummary + StockDetail. Single pass so per-ticker outputs
     # stay synchronized (e.g., has_history reflects the actual write result;
@@ -1229,6 +1342,13 @@ def run_weekly_compute() -> int:
             manipulation_index=m_index,
             composite_score_adjusted=composite_adj,
             manipulation_components=m_components,
+            osap_signals=osap_signal_map.get(ticker),
+            osap_blended_score=(
+                round(float(composite_osap_adjusted[ticker]), 2)
+                if ticker in composite_osap_adjusted.index
+                and not pd.isna(composite_osap_adjusted[ticker])
+                else None
+            ),
             entered_top5=ticker in entered,
             exited_top5=ticker in exited,
         )
@@ -1268,6 +1388,10 @@ def run_weekly_compute() -> int:
         fundamentals_latency_p95_seconds=(
             round(fundamentals_p95, 2) if fundamentals_p95 is not None else None
         ),
+        osap_signals_used=osap_signals_used or None,
+        osap_excluded_signals=osap_excluded_signals or None,
+        osap_signals_ic_12m=osap_signals_ic_12m or None,
+        osap_signals_coverage_pct=osap_signals_coverage_pct or None,
     )
 
     config.DATA_DIR.mkdir(parents=True, exist_ok=True)
diff --git a/compute/output/schemas.py b/compute/output/schemas.py
index a4d6a4508..f48d3484f 100644
--- a/compute/output/schemas.py
+++ b/compute/output/schemas.py
@@ -92,6 +92,10 @@ class Metadata(BaseModel):
     fundamentals_coverage_pct: float | None = None
     fundamentals_latency_p50_seconds: float | None = None
     fundamentals_latency_p95_seconds: float | None = None
+    osap_signals_used: list[str] | None = None
+    osap_excluded_signals: list[str] | None = None
+    osap_signals_ic_12m: dict[str, float] | None = None
+    osap_signals_coverage_pct: dict[str, float] | None = None
 
 
 class RawMetrics(BaseModel):
@@ -159,5 +163,7 @@ class StockDetail(BaseModel):
     manipulation_index: float | None = None
     composite_score_adjusted: float | None = None
     manipulation_components: dict[str, bool] | None = None
+    osap_signals: dict[str, float] | None = None
+    osap_blended_score: float | None = None
     entered_top5: bool = False
     exited_top5: bool = False
diff --git a/compute/scoring/osap_blend.py b/compute/scoring/osap_blend.py
new file mode 100644
index 000000000..d5d198673
--- /dev/null
+++ b/compute/scoring/osap_blend.py
@@ -0,0 +1,148 @@
+"""Phase 4h commit 3 — composite × OSAP signal-aggregate blend (Path-b).
+
+Stays OUTSIDE :func:`compute.scoring.composite.compute_composite` so the
+``PHASE3_WEIGHTS`` sum-to-1.0 invariant at ``compute/scoring/composite.py:
+43-45`` is not extended. Adding a 9th slot to ``PHASE3_WEIGHTS`` for OSAP
+would either fail the invariant or force a redistribution of the eight
+active pillars — both of which alter Phase 3 composite math
+retroactively. Path-b applies the OSAP correction *after* the pillar
+composite is computed, leaving the composite layer untouched.
+
+Formula (osap-integration/PLAN.md:168-170, locked 2026-05-18 plan
+audit)::
+
+    blended = (1 - weight) * composite_score + weight * osap_signal_aggregate
+
+where ``osap_signal_aggregate`` is a 0-100 per-ticker aggregate of the
+accepted OSAP signal map produced by
+:func:`compute.features.osap_replicate.compute_osap_signals` (cross-
+sectional rank of the long-short return at ``as_of``, mean-pooled per
+ticker, scaled to ``[0, 100]``).
+
+**Universe-gap policy** — tickers with ``None`` signal map (or NaN
+aggregate) **pass composite_score through unchanged**. No impute,
+distinct from pillar ``compute_composite(neutralize_missing=True)``.
+Rationale: in Phase 4h, an OSAP-blank ticker is genuinely "no
+information added" rather than "no information available" — imputing
+50.0 would silently shrink the composite toward neutral and bias
+Top-5 against OSAP-uncovered names.
+
+**Observability-only this phase** — :func:`apply_osap_blend` writes
+``composite_score_osap_adjusted`` into ``StockDetail.osap_blended_score``
+but Top-5 ranking still uses raw ``composite_score`` per SKILL.md
+Rule 16. Phase 5 ML meta-learner is where 50/50 may be retuned and the
+cutover authorized.
+
+No I/O, no tenacity, no network access — pure pandas / numpy.
+"""
+
+from __future__ import annotations
+
+import logging
+
+import numpy as np
+import pandas as pd
+
+logger = logging.getLogger(__name__)
+
+# Locked 50/50 default per osap-integration/PLAN.md:168-170. Phase 5
+# ML meta-learner is the next layer where this can move.
+OSAP_BLEND_WEIGHT_DEFAULT: float = 0.5
+
+
+def aggregate_osap_signals(
+    signal_map: dict[str, dict[str, float] | None],
+) -> pd.Series:
+    """Mean-pool a ticker's OSAP signal ranks into a single 0-100 score.
+
+    Each ticker's inner map is ``{signalname: rank}`` where rank is the
+    cross-sectional ``(0, 1]`` rank from
+    :func:`compute.features.osap_replicate.rank_signals_cross_sectional`.
+    Aggregation is the arithmetic mean × 100.
+
+    Args:
+        signal_map: ``{ticker: {signalname: rank} | None}`` — exactly the
+            shape returned by ``compute_osap_signals``.
+
+    Returns:
+        Series indexed by ticker, dtype float, name
+        ``osap_signal_aggregate``. NaN for tickers whose inner map is
+        ``None`` or empty (universe gap). Empty Series when the input
+        dict has no entries.
+    """
+    if not signal_map:
+        return pd.Series(dtype=float, name="osap_signal_aggregate")
+
+    out: dict[str, float] = {}
+    for ticker, sigs in signal_map.items():
+        if sigs is None or len(sigs) == 0:
+            out[ticker] = float("nan")
+            continue
+        mean_rank = float(np.mean(list(sigs.values())))
+        out[ticker] = 100.0 * mean_rank
+
+    return pd.Series(out, dtype=float, name="osap_signal_aggregate")
+
+
+def apply_osap_blend(
+    composite_scores: pd.Series,
+    osap_signal_aggregate: pd.Series,
+    weight: float = OSAP_BLEND_WEIGHT_DEFAULT,
+) -> pd.Series:
+    """Blend pillar composite × OSAP aggregate per ticker (Phase 4h Path-b).
+
+    Formula::
+
+        blended = (1 - weight) * composite_score + weight * osap_signal_aggregate
+
+    Tickers whose ``osap_signal_aggregate`` is NaN (after reindex to the
+    composite index) pass their raw ``composite_score`` through
+    unchanged. The result is clipped to ``[0, 100]`` to match the
+    composite-score domain (the writer / Pydantic schema both expect
+    ``[0, 100]``).
+
+    Args:
+        composite_scores: Series indexed by ticker, values nominally in
+            ``[0, 100]``. The output index mirrors this index.
+        osap_signal_aggregate: Series indexed by ticker, values in
+            ``[0, 100]`` or NaN. Tickers present here but not in
+            ``composite_scores`` are silently dropped during reindex.
+        weight: blend weight in ``[0, 1]``. Default
+            :data:`OSAP_BLEND_WEIGHT_DEFAULT` (0.5) per
+            osap-integration/PLAN.md:168-170.
+
+    Returns:
+        Series indexed by ``composite_scores.index``, dtype float, name
+        ``composite_score_osap_adjusted``. Clipped to ``[0, 100]``.
+
+    Raises:
+        ValueError: if ``weight`` is outside ``[0, 1]``.
+    """
+    if not (0.0 <= weight <= 1.0):
+        raise ValueError(f"OSAP blend weight must be in [0, 1], got {weight}")
+
+    if composite_scores.empty:
+        return pd.Series(dtype=float, name="composite_score_osap_adjusted")
+
+    # Align OSAP aggregate to composite index; missing tickers become NaN.
+    aligned_osap = osap_signal_aggregate.reindex(composite_scores.index)
+
+    # NaN-safe Path-b blend. ``raw_blend`` will be NaN wherever OSAP is
+    # NaN, but ``.where(cond, other)`` keeps ``composite_scores`` at those
+    # positions — yielding the documented universe-gap pass-through.
+    raw_blend = (1.0 - weight) * composite_scores + weight * aligned_osap
+    blended = raw_blend.where(~aligned_osap.isna(), composite_scores)
+
+    blended = blended.clip(lower=0.0, upper=100.0)
+    blended.name = "composite_score_osap_adjusted"
+
+    coverage = int((~aligned_osap.isna()).sum())
+    logger.info(
+        "OSAP blend applied: weight=%.2f, %d/%d tickers OSAP-covered, "
+        "%d passed-through (universe gap)",
+        weight,
+        coverage,
+        len(composite_scores),
+        len(composite_scores) - coverage,
+    )
+    return blended
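Reviewer note on the universe-gap pass-through in `apply_osap_blend`: the snippet below is a minimal standalone sketch of the same reindex, blend, and `.where` fallback steps, not the module itself. The helper name `blend_sketch` and the tickers `AAA`, `BBB`, `CCC` are hypothetical and exist only for illustration.

```python
import pandas as pd


def blend_sketch(
    composite: pd.Series, osap: pd.Series, weight: float = 0.5
) -> pd.Series:
    """Hypothetical re-statement of the Path-b blend with pass-through."""
    # Align OSAP to the composite index; uncovered tickers become NaN.
    aligned = osap.reindex(composite.index)
    raw = (1.0 - weight) * composite + weight * aligned
    # Where OSAP is NaN, keep the raw composite instead of the NaN blend.
    blended = raw.where(aligned.notna(), composite)
    return blended.clip(lower=0.0, upper=100.0)


composite = pd.Series({"AAA": 80.0, "BBB": 60.0, "CCC": 40.0})
osap = pd.Series({"AAA": 50.0, "CCC": 90.0})  # BBB has no OSAP coverage

print(blend_sketch(composite, osap).to_dict())
# {'AAA': 65.0, 'BBB': 60.0, 'CCC': 65.0}
```

`BBB` keeps its composite of 60.0 because it has no OSAP aggregate, while the two covered tickers move halfway toward their OSAP value.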
diff --git a/compute/validation/osap_validation.py b/compute/validation/osap_validation.py
new file mode 100644
index 000000000..2655d6f21
--- /dev/null
+++ b/compute/validation/osap_validation.py
@@ -0,0 +1,380 @@
+"""Phase 4h commit 4 — PBO/DSR hard gate for OSAP signal acceptance.
+
+Wraps :func:`compute.validation.pbo_dsr.factor_passes_gates` (shipped
+in PR #60) per signal. No PBO or DSR math is reimplemented
+here — this module only handles the *cohort framing* (wide pivot,
+NaN policy, per-signal partitioning) needed to feed the existing
+gate primitives correctly.
+
+The gate is the linchpin of Phase 4h: 100 candidate OSAP signals
+enter, only those passing PBO ≤ 0.5 AND DSR > 0 are blended into
+``composite_score_osap_adjusted`` by commit 3's
+:func:`compute.scoring.osap_blend.apply_osap_blend`. Signals rejected
+here are surfaced via :data:`Metadata.osap_excluded_signals` so the
+filter is fully auditable.
+
+NaN policy — LOCKED 2026-05-18 post-source-audit of pbo_dsr.py
+==============================================================
+
+The two underlying primitives have asymmetric NaN tolerance:
+
+- :func:`compute.validation.pbo_dsr.compute_pbo` (the cohort gate)
+  is **NaN-UNSAFE**. Internally
+  (``compute/validation/pbo_dsr.py:234``) it converts the cohort
+  matrix to ``float`` numpy via ``.to_numpy(dtype=float)``, then
+  computes ``.mean(axis=0)`` / ``.std(axis=0)`` (L256-257) and
+  ``np.argmax`` (L261). Any NaN cell silently corrupts the argmax
+  selection.
+- :func:`compute.validation.pbo_dsr.compute_deflated_sharpe` (the
+  per-signal gate) is **NaN-SAFE**. L323 strips NaN before computing
+  Sharpe / skew / kurtosis: ``arr = arr[~np.isnan(arr)]``.
+
+Because :func:`factor_passes_gates` takes the per-signal
+``factor_returns`` and the cohort ``returns_matrix`` independently,
+this wrapper feeds them **different NaN treatments**:
+
+1. ``factor_returns`` ← ``wide[sig].dropna()`` — DSR's internal strip
+   handles it. No information lost.
+2. ``returns_matrix`` ← ``wide.fillna(0.0)`` — zero-fill, NOT
+   mean-fill, NOT ``dropna(how='any')``.
+
+Why **zero-fill** the cohort (rejecting the two competing options):
+
+- ``dropna(how='any')`` decimates a 100-signal × monthly matrix
+  below ``n_partitions=16`` rows once any earnings-event-only signal
+  is included. The Bailey 2014 multiple-testing correction
+  ``n_trials = cohort_size`` collapses.
+- ``fillna(column_mean)`` deflates per-signal variance, inflates
+  Sharpe, biases PBO toward false acceptance — and silently
+  rewards sparse signals for low coverage.
+- ``fillna(0.0)`` is the honest OSAP-semantic interpretation:
+  absence-of-coverage for ``(signal, month)`` means "no portfolio
+  formed / no information generated that month" — zero return is
+  the right proxy. Bailey 2014 PBO is rank-based across strategies
+  *within* each period; zero-imputation symmetrically pushes
+  coverage-gap rows toward indeterminate cross-sectional rank,
+  which honestly reflects "no information added".
+
+Trade-off acknowledged: sparse-coverage signals (e.g., earnings-event-
+only) see their cohort-matrix Sharpe shrunk toward zero by the
+zero-fill, which affects PBO only; DSR receives the NaN-dropped
+series and is unaffected. This is cohort-fair but can penalize
+legitimate event-only signals. Phase 4h scope accepts this: the Phase 5
+backtest harness (``defense-infrastructure/PLAN.md:270``) runs full
+walk-forward CV per signal and replaces this gate when it ships.
+
+Standalone module
+=================
+
+Does NOT import from :mod:`compute.features.osap_replicate` (commit
+2), :mod:`compute.scoring.osap_blend` (commit 3), or
+:mod:`compute.main`. Validation runs on the long-short returns
+DataFrame contract only (columns: ``signalname``, ``date``,
+``ls_return``). This keeps the gate testable without the feature /
+blend stack and lets commit 5's ``compute/main.py`` wire the three
+layers independently.
+"""
+
+from __future__ import annotations
+
+import logging
+from dataclasses import dataclass
+from typing import Final
+
+import pandas as pd
+
+from compute.validation.pbo_dsr import (
+    ANNUALIZATION_FACTOR_MONTHLY,
+    DEFAULT_N_PARTITIONS,
+    DSR_VETO_THRESHOLD,
+    PBO_VETO_THRESHOLD,
+    factor_passes_gates,
+)
+
+logger = logging.getLogger(__name__)
+
+# Rolling-IC observability window (lag-1 Spearman over last 12 months).
+# OBSERVABILITY ONLY — never gates acceptance. See module docstring.
+ROLLING_IC_WINDOW_MONTHS: Final[int] = 12
+
+# Per-signal observation floor before PBO/DSR is even attempted. Mirrors
+# DEFAULT_N_PARTITIONS so that any signal with fewer non-NaN obs than
+# PBO's partition requirement short-circuits to ``insufficient_data``.
+MIN_OBS_PER_SIGNAL: Final[int] = DEFAULT_N_PARTITIONS
+
+
+@dataclass(frozen=True)
+class GateResult:
+    """Per-signal verdict + metrics from the PBO/DSR gate.
+
+    All three of ``pbo`` / ``dsr`` / ``sharpe`` are ``None`` when the
+    signal short-circuited on ``insufficient_data`` — no PBO/DSR call
+    was made. ``n_observations`` reports the count of non-NaN inputs
+    that *would have been* used (informational; matches commit-5's
+    coverage logging needs).
+
+    ``rejection_reason`` is ``None`` when ``accepted=True``. Otherwise
+    one of:
+
+    - ``'high_pbo'`` — PBO exceeded the threshold (overfit risk)
+    - ``'low_dsr'`` — Deflated Sharpe failed the threshold
+    - ``'gate_failed'`` — both PBO and DSR failed (rare; surfaced as
+      a distinct category for diagnostic clarity)
+    - ``'insufficient_data'`` — per-signal obs < ``MIN_OBS_PER_SIGNAL``
+      or cohort size < 2
+    """
+
+    accepted: bool
+    pbo: float | None
+    dsr: float | None
+    sharpe: float | None
+    n_observations: int
+    rejection_reason: str | None
+
+
+def _pivot_to_wide(
+    long_short: pd.DataFrame,
+    requested: tuple[str, ...] | None = None,
+) -> pd.DataFrame:
+    """Coerce commit-2's long-format DF to a wide (date × signal) matrix.
+
+    Commit 2's ``compute_long_short_returns`` (see
+    ``compute/features/osap_replicate.py:140``) emits the ``date``
+    column as ``object`` (string from the OSAP parquet pivot). We
+    explicitly coerce to ``pd.Timestamp`` here so chronological
+    ordering is reliable for the cohort matrix.
+
+    Args:
+        long_short: DataFrame with columns
+            ``{signalname, date, ls_return}``.
+        requested: optional restriction to a subset of signals. When
+            ``None`` (default), all signals present in the input DF
+            are kept.
+
+    Returns:
+        Wide DataFrame indexed by ``pd.Timestamp`` (sorted), columns =
+        signalname, values = ls_return. Cells where a signal had no
+        ``ls_return`` for a given date are NaN — caller decides the
+        NaN policy (this function never fills).
+    """
+    if long_short.empty:
+        return pd.DataFrame()
+
+    df = long_short.copy()
+    df["date"] = pd.to_datetime(df["date"])
+    if requested is not None:
+        df = df[df["signalname"].isin(set(requested))]
+        if df.empty:
+            return pd.DataFrame()
+
+    wide = df.pivot_table(
+        index="date",
+        columns="signalname",
+        values="ls_return",
+        aggfunc="first",
+    )
+    return wide.sort_index()
+
+
+def gate_osap_signals(
+    long_short_returns: pd.DataFrame,
+    requested_signals: tuple[str, ...] | None = None,
+    pbo_threshold: float = PBO_VETO_THRESHOLD,
+    dsr_threshold: float = DSR_VETO_THRESHOLD,
+    n_partitions: int = DEFAULT_N_PARTITIONS,
+) -> dict[str, GateResult]:
+    """Apply the PBO/DSR hard gate per signal.
+
+    Cohort framing follows Bailey 2014: ``n_trials = wide.shape[1]``
+    (the count of candidate signals) so DSR's multiple-testing
+    correction reflects the full screen. PBO operates on the
+    zero-filled cohort matrix (see module docstring §NaN policy).
+
+    Args:
+        long_short_returns: commit-2 output. DataFrame with columns
+            ``{signalname, date, ls_return}``.
+        requested_signals: optional subset of signal names to gate.
+            ``None`` (default) gates every signal present in the input.
+        pbo_threshold: PBO must satisfy ``≤ pbo_threshold`` to pass.
+            Default :data:`PBO_VETO_THRESHOLD` (= 0.5).
+        dsr_threshold: DSR must satisfy ``> dsr_threshold`` to pass.
+            Default :data:`DSR_VETO_THRESHOLD` (= 0.0).
+        n_partitions: PBO partition count. Default
+            :data:`DEFAULT_N_PARTITIONS` (= 16).
+
+    Returns:
+        ``{signalname: GateResult}``. Keys cover every signal that
+        appeared in the (possibly ``requested_signals``-filtered)
+        input. Empty dict when the input is empty or no requested
+        signal exists in the input.
+    """
+    wide = _pivot_to_wide(long_short_returns, requested_signals)
+
+    if wide.empty:
+        logger.warning(
+            "OSAP gate input empty after pivot — no signals to gate"
+        )
+        return {}
+
+    cohort_size = wide.shape[1]
+    n_dates = len(wide)
+
+    # Cohort-level precondition: PBO needs at least ``n_partitions`` rows
+    # and 2 strategy columns. If either fails, every signal short-
+    # circuits to insufficient_data.
+    cohort_too_small = cohort_size < 2 or n_dates < n_partitions
+
+    # Zero-fill ONCE for the cohort matrix passed to PBO. See module
+    # docstring §NaN policy for the rationale. The per-signal
+    # ``factor_returns`` argument still uses dropna() (DSR strips NaN
+    # internally, no information lost).
+    cohort_matrix = wide.fillna(0.0)
+
+    results: dict[str, GateResult] = {}
+
+    for sig in wide.columns:
+        signal_series = wide[sig]
+        non_nan_obs = int(signal_series.notna().sum())
+
+        if cohort_too_small or non_nan_obs < MIN_OBS_PER_SIGNAL:
+            results[str(sig)] = GateResult(
+                accepted=False,
+                pbo=None,
+                dsr=None,
+                sharpe=None,
+                n_observations=non_nan_obs,
+                rejection_reason="insufficient_data",
+            )
+            continue
+
+        passes, metrics = factor_passes_gates(
+            factor_returns=signal_series.dropna(),
+            returns_matrix=cohort_matrix,
+            n_trials=cohort_size,
+            n_partitions=n_partitions,
+            pbo_threshold=pbo_threshold,
+            dsr_threshold=dsr_threshold,
+            annualization=ANNUALIZATION_FACTOR_MONTHLY,
+        )
+
+        if passes:
+            reason: str | None = None
+        elif not metrics["pbo_passes"] and not metrics["dsr_passes"]:
+            reason = "gate_failed"
+        elif not metrics["pbo_passes"]:
+            reason = "high_pbo"
+        elif not metrics["dsr_passes"]:
+            reason = "low_dsr"
+        else:
+            # Defensive — passes=False but both sub-passes=True should be
+            # impossible per the and-conjunction in factor_passes_gates.
+            reason = "gate_failed"
+
+        results[str(sig)] = GateResult(
+            accepted=bool(passes),
+            pbo=float(metrics["pbo"]),
+            dsr=float(metrics["dsr"]),
+            sharpe=float(metrics["sharpe"]),
+            n_observations=int(metrics["n_observations"]),
+            rejection_reason=reason,
+        )
+
+    n_accepted = sum(1 for r in results.values() if r.accepted)
+    logger.info(
+        "OSAP PBO/DSR gate: %d of %d signals accepted (cohort_size=%d, "
+        "n_dates=%d, pbo_threshold=%.2f, dsr_threshold=%.2f)",
+        n_accepted,
+        len(results),
+        cohort_size,
+        n_dates,
+        pbo_threshold,
+        dsr_threshold,
+    )
+    return results
+
+
+def compute_rolling_ic_12m(
+    long_short_returns: pd.DataFrame,
+    signalname: str,
+) -> float | None:
+    """Spearman rank correlation between LS return at ``t`` and ``t+1``.
+
+    Observability metric only — :func:`gate_osap_signals` does not
+    consult this. Surfaced via :data:`Metadata.osap_signals_ic_12m`
+    for the UI / debug audit; the full walk-forward + purged +
+    embargoed CV that would replace it is Phase 5 work per
+    ``defense-infrastructure/PLAN.md:270``.
+
+    Pure pandas — no scipy dependency. Matches the
+    ``pbo_dsr.py`` precedent (Beasley-Springer-Moro inverse normal CDF
+    is hand-rolled there to avoid scipy).
+
+    Args:
+        long_short_returns: commit-2 output. DataFrame with columns
+            ``{signalname, date, ls_return}``.
+        signalname: target signal to compute IC for.
+
+    Returns:
+        Spearman lag-1 IC over the most recent
+        :data:`ROLLING_IC_WINDOW_MONTHS` observations, as ``float``.
+        ``None`` when the signal has fewer than 12 valid
+        ``(t, t+1)`` pairs (insufficient history).
+    """
+    df = long_short_returns[
+        long_short_returns["signalname"] == signalname
+    ].copy()
+    if df.empty:
+        return None
+
+    df["date"] = pd.to_datetime(df["date"])
+    df = df.sort_values("date").tail(ROLLING_IC_WINDOW_MONTHS + 1)
+
+    if len(df) < ROLLING_IC_WINDOW_MONTHS + 1:
+        return None
+
+    ret = df["ls_return"].astype(float).reset_index(drop=True)
+    lead = ret.shift(-1)
+    valid = ret.notna() & lead.notna()
+
+    if int(valid.sum()) < ROLLING_IC_WINDOW_MONTHS:
+        return None
+
+    ranks_t = ret[valid].rank(method="average")
+    ranks_t1 = lead[valid].rank(method="average")
+    corr = ranks_t.corr(ranks_t1)
+    if pd.isna(corr):
+        return None
+    return float(corr)
+
+
+def filter_accepted_signals(
+    gate_results: dict[str, GateResult],
+) -> tuple[list[str], list[str]]:
+    """Split gate verdicts into (accepted, excluded) sorted lists.
+
+    Feeds :data:`Metadata.osap_signals_used` /
+    :data:`Metadata.osap_excluded_signals` in commit 5's
+    ``compute/main.py`` wiring. Sorting is alphabetical for
+    deterministic JSON output.
+
+    Args:
+        gate_results: ``{signalname: GateResult}`` from
+            :func:`gate_osap_signals`.
+
+    Returns:
+        ``(accepted_sorted, excluded_sorted)``. Union of the two
+        lists equals ``sorted(gate_results.keys())``.
+    """
+    accepted = sorted(s for s, r in gate_results.items() if r.accepted)
+    excluded = sorted(s for s, r in gate_results.items() if not r.accepted)
+    return accepted, excluded
+
+
+__all__ = [
+    "GateResult",
+    "MIN_OBS_PER_SIGNAL",
+    "ROLLING_IC_WINDOW_MONTHS",
+    "compute_rolling_ic_12m",
+    "filter_accepted_signals",
+    "gate_osap_signals",
+]
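Reviewer note on `compute_rolling_ic_12m`: the lag-1 Spearman IC is obtained by ranking the return series and its one-step lead, then taking the Pearson correlation of the ranks, which avoids a scipy dependency. The sketch below shows that technique in isolation; the helper name `lag1_spearman` and the toy series are hypothetical, and the 12-month windowing and `None` handling of the real function are omitted.

```python
import pandas as pd


def lag1_spearman(returns: pd.Series) -> float:
    """Hypothetical helper: Spearman correlation of r(t) with r(t+1)."""
    ret = returns.astype(float).reset_index(drop=True)
    lead = ret.shift(-1)  # r(t+1) aligned to row t; last row becomes NaN
    valid = ret.notna() & lead.notna()
    # Spearman = Pearson correlation computed on average ranks.
    ranks_t = ret[valid].rank(method="average")
    ranks_t1 = lead[valid].rank(method="average")
    return float(ranks_t.corr(ranks_t1))


# 13 observations give 12 (t, t+1) pairs, matching the 12-month window.
trending = pd.Series([0.01 * i for i in range(13)])
flipping = pd.Series([0.02 if i % 2 == 0 else -0.02 for i in range(13)])

print(round(lag1_spearman(trending), 6))  # 1.0
print(round(lag1_spearman(flipping), 6))  # -1.0
```

A steadily rising series is perfectly persistent (IC near 1), while a series that flips sign every month is perfectly anti-persistent (IC near -1).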
diff --git a/frontend/lib/schema-snapshot.json b/frontend/lib/schema-snapshot.json
index 6680ce8a2..083e09373 100644
--- a/frontend/lib/schema-snapshot.json
+++ b/frontend/lib/schema-snapshot.json
@@ -67,6 +67,26 @@
       "required": true,
       "default": "<required>"
     },
+    "osap_excluded_signals": {
+      "type": "list[str] | None",
+      "required": false,
+      "default": null
+    },
+    "osap_signals_coverage_pct": {
+      "type": "dict[str, float] | None",
+      "required": false,
+      "default": null
+    },
+    "osap_signals_ic_12m": {
+      "type": "dict[str, float] | None",
+      "required": false,
+      "default": null
+    },
+    "osap_signals_used": {
+      "type": "list[str] | None",
+      "required": false,
+      "default": null
+    },
     "tier2_coverage_pct": {
       "type": "float | None",
       "required": false,
@@ -310,6 +330,16 @@
       "required": true,
       "default": "<required>"
     },
+    "osap_blended_score": {
+      "type": "float | None",
+      "required": false,
+      "default": null
+    },
+    "osap_signals": {
+      "type": "dict[str, float] | None",
+      "required": false,
+      "default": null
+    },
     "pillar_baseline": {
       "type": "PillarBaseline | None",
       "required": false,
diff --git a/frontend/lib/types.ts b/frontend/lib/types.ts
index 7361f8a17..af1038d2c 100644
--- a/frontend/lib/types.ts
+++ b/frontend/lib/types.ts
@@ -70,6 +70,18 @@ export type Metadata = {
   fundamentals_coverage_pct: number | null;
   fundamentals_latency_p50_seconds: number | null;
   fundamentals_latency_p95_seconds: number | null;
+  // Phase 4h — OSAP signal observability. `osap_signals_used` lists
+  // the 100-signal manifest subset that PASSED the PBO/DSR gate
+  // (`pbo_dsr.factor_passes_gates`); `osap_excluded_signals` lists
+  // the rest. `osap_signals_ic_12m` is rolling-12m Spearman IC per
+  // accepted signal (observability only, NOT a hard gate; the stronger
+  // full walk-forward IC-decay check is deferred to Phase 5).
+  // `osap_signals_coverage_pct` reports per-signal S&P 500 coverage.
+  // All null on legacy outputs from before 0.9.0-phase4h.
+  osap_signals_used: string[] | null;
+  osap_excluded_signals: string[] | null;
+  osap_signals_ic_12m: Record<string, number> | null;
+  osap_signals_coverage_pct: Record<string, number> | null;
 };
 
 // Phase 3d Tier-2 event defenses. Surfaces in StockDetail.tier2_events.
@@ -213,6 +225,15 @@ export type StockDetail = {
   manipulation_index: number | null;
   composite_score_adjusted: number | null;
   manipulation_components: Record<string, boolean> | null;
+  // Phase 4h — per-stock OSAP signal map (signalname → cross-sectional
+  // rank in (0, 1]) for the accepted-by-PBO/DSR subset of the
+  // 100-signal manifest. `osap_blended_score` is the 50/50 blend
+  // (composite_score × 0.5 + osap_signal_aggregate × 0.5) — informational
+  // observability only; Top-5 ranking still uses raw composite_score
+  // per SKILL.md Rule 16. Both null on legacy outputs from before
+  // 0.9.0-phase4h.
+  osap_signals: Record<string, number> | null;
+  osap_blended_score: number | null;
   entered_top5: boolean;
   exited_top5: boolean;
 };
diff --git a/tests/test_config.py b/tests/test_config.py
index 7fc599f4b..c07ebf26e 100644
--- a/tests/test_config.py
+++ b/tests/test_config.py
@@ -10,8 +10,8 @@
 from compute import config
 
 
-def test_schema_version_is_phase4_5f():
-    assert config.SCHEMA_VERSION == "0.8.0-phase4.5f"
+def test_schema_version_is_phase4h():
+    assert config.SCHEMA_VERSION == "0.9.0-phase4h"
 
 
 def test_eight_k_lookback_veto_is_one_year():
diff --git a/tests/test_features/test_osap_e2e_integration.py b/tests/test_features/test_osap_e2e_integration.py
new file mode 100644
index 000000000..b8969342f
--- /dev/null
+++ b/tests/test_features/test_osap_e2e_integration.py
@@ -0,0 +1,152 @@
+"""End-to-end @network integration test for Phase 4h OSAP pipeline.
+
+Exercises the full ingest → replicate → gate → IC → blend chain against
+the real OSAP package release. Skips when ``--run-network`` is absent
+(conftest.py default), so casual ``pytest tests/`` runs are unaffected.
+
+Scope:
+  - Real ``fetch_osap_returns`` call against the live OSAP CDN (cached
+    in ``tmp_path`` so the host cache stays clean)
+  - ``compute_long_short_returns`` over the real cross-section
+  - ``gate_osap_signals`` PBO/DSR gate runs end-to-end on a real
+    100-signal candidate cohort
+  - ``compute_rolling_ic_12m`` returns a finite [-1, 1] value for
+    ``Mom1m`` (a canonical positive-IC factor — sanity check, NOT a
+    threshold assertion)
+  - ``compute_osap_signals`` produces a per-ticker proxy map for a
+    20-ticker S&P sample
+  - ``aggregate_osap_signals`` → ``apply_osap_blend`` round-trips
+    without crashing and produces a Series clipped to [0, 100]
+
+The test does NOT run the full ``compute/main.py`` (that would hit
+SEC EDGAR for 502 tickers — too expensive for CI). The OSAP pipeline
+is exercised in isolation against real data; the ``compute/main.py``
+wiring that ties it together is covered by offline unit tests on each
+layer plus this @network suite confirming data shapes match end-to-end.
+"""
+
+from __future__ import annotations
+
+import time
+from datetime import date
+
+import pandas as pd
+import pytest
+
+from compute.features.osap_replicate import (
+    compute_long_short_returns,
+    compute_osap_signals,
+    coverage_by_signal,
+)
+from compute.ingest import osap as osap_mod
+from compute.ingest.osap import fetch_osap_returns
+from compute.scoring.osap_blend import aggregate_osap_signals, apply_osap_blend
+from compute.validation.osap_validation import (
+    compute_rolling_ic_12m,
+    filter_accepted_signals,
+    gate_osap_signals,
+)
+
+SAMPLE_TICKERS_20 = [
+    "AAPL", "MSFT", "NVDA", "GOOGL", "AMZN",
+    "META", "TSLA", "BRK.B", "UNH", "JPM",
+    "XOM", "V", "JNJ", "WMT", "PG",
+    "HD", "MA", "AVGO", "CVX", "LLY",
+]
+
+# A small subset of well-known OSAP signals that the live release is
+# expected to expose. Keeps the integration test cheap and stable
+# across release shifts; the full 100-signal run is the cron job's
+# concern.
+SAMPLE_SIGNALS = ("Mom1m", "BM", "GP", "Accruals")
+
+
+@pytest.mark.network
+@pytest.mark.timeout(600)
+def test_osap_pipeline_end_to_end_real_fetch(monkeypatch, tmp_path) -> None:
+    """Full Phase-4h chain on real OSAP data — 4-signal × 20-ticker slice."""
+    cache = tmp_path / "osap" / "returns.parquet"
+    monkeypatch.setattr(osap_mod.config, "OSAP_RETURNS_CACHE", cache)
+
+    # 1) Real fetch — filter to SAMPLE_SIGNALS so download stays cheap.
+    t0 = time.monotonic()
+    returns = fetch_osap_returns(
+        force_refresh=True, signals=list(SAMPLE_SIGNALS), as_of=date.today()
+    )
+    elapsed = time.monotonic() - t0
+    print(
+        f"\n[osap-e2e] live fetch elapsed={elapsed:.2f}s "
+        f"rows={len(returns)} cols={list(returns.columns)}"
+    )
+    assert not returns.empty
+    assert set(returns["signalname"].unique()).issubset(set(SAMPLE_SIGNALS))
+
+    # 2) Long-short derivation.
+    ls = compute_long_short_returns(returns)
+    assert {"signalname", "date", "ls_return"}.issubset(ls.columns)
+    assert not ls.empty
+    print(
+        f"[osap-e2e] long-short rows={len(ls)} "
+        f"signals={ls['signalname'].nunique()}"
+    )
+
+    # 3) PBO/DSR gate.
+    gate_results = gate_osap_signals(ls, requested_signals=SAMPLE_SIGNALS)
+    assert len(gate_results) >= 1
+    accepted, excluded = filter_accepted_signals(gate_results)
+    print(
+        f"[osap-e2e] gate accepted={accepted} excluded={excluded} "
+        f"of candidates={list(gate_results.keys())}"
+    )
+
+    # Every gate result must have a sensible structure regardless of
+    # accept/reject outcome.
+    for sig, res in gate_results.items():
+        assert isinstance(res.accepted, bool), sig
+        if res.accepted:
+            assert res.rejection_reason is None, sig
+            assert res.pbo is not None and 0.0 <= res.pbo <= 1.0, sig
+            assert res.dsr is not None, sig
+        else:
+            assert res.rejection_reason in {
+                "high_pbo", "low_dsr", "insufficient_data", "gate_failed",
+            }, (sig, res.rejection_reason)
+
+    # 4) Rolling-12m IC sanity on Mom1m (well-known +IC factor).
+    mom_ic = compute_rolling_ic_12m(ls, "Mom1m")
+    print(f"[osap-e2e] Mom1m rolling-12m IC={mom_ic}")
+    if mom_ic is not None:
+        assert -1.0 <= mom_ic <= 1.0, mom_ic
+        # NOT asserting > 0 — rolling-12m on a single window is noisy;
+        # canonical full walk-forward is Phase 5's job.
+
+    # 5) Per-ticker proxy signal map — 20 sample tickers.
+    signal_map = compute_osap_signals(
+        returns,
+        tickers=SAMPLE_TICKERS_20,
+        as_of=date.today(),
+        requested_signals=SAMPLE_SIGNALS,
+    )
+    assert len(signal_map) == 20
+    populated_tickers = [t for t, m in signal_map.items() if m]
+    print(
+        f"[osap-e2e] per-ticker map: {len(populated_tickers)}/20 "
+        f"have non-empty signal dicts"
+    )
+    coverage = coverage_by_signal(signal_map)
+    print(f"[osap-e2e] coverage by signal: {coverage}")
+
+    # 6) Aggregate + blend round-trip with a synthetic composite series.
+    composite = pd.Series(
+        {t: 60.0 + (i % 5) * 4.0 for i, t in enumerate(SAMPLE_TICKERS_20)}
+    )
+    osap_aggregate = aggregate_osap_signals(signal_map)
+    blended = apply_osap_blend(composite, osap_aggregate)
+    assert blended.name == "composite_score_osap_adjusted"
+    assert (blended >= 0.0).all() and (blended <= 100.0).all()
+    assert set(blended.index) == set(SAMPLE_TICKERS_20)
+    print(
+        f"[osap-e2e] blended range=[{blended.min():.2f}, "
+        f"{blended.max():.2f}] OSAP-covered tickers="
+        f"{int((~osap_aggregate.reindex(composite.index).isna()).sum())}"
+    )
diff --git a/tests/test_features/test_osap_replicate.py b/tests/test_features/test_osap_replicate.py
new file mode 100644
index 000000000..96a847697
--- /dev/null
+++ b/tests/test_features/test_osap_replicate.py
@@ -0,0 +1,318 @@
+"""Tests for compute.features.osap_replicate.
+
+Phase 4h commit 2. Fourteen offline tests covering long-short
+derivation, as-of cross-section selection, cross-sectional ranking,
+the universe-gap None policy, and end-to-end ``compute_osap_signals``.
+No @network markers — all tests use either a hand-built synthetic
+DataFrame or the shipped ``tests/fixtures/osap_returns_sample.csv``.
+"""
+
+from __future__ import annotations
+
+from datetime import date
+from pathlib import Path
+
+import pandas as pd
+import pytest
+
+from compute import config
+from compute.features import osap_replicate
+
+FIXTURE_CSV = Path(__file__).parent.parent / "fixtures" / "osap_returns_sample.csv"
+
+
+def _make_returns(rows: list[tuple[str, str, str, float]]) -> pd.DataFrame:
+    """Build a synthetic OSAP returns DataFrame from
+    ``(signalname, port, date_str, ret)`` tuples. Adds the trailing
+    ``signallag / Nlong / Nshort`` columns with neutral defaults so the
+    schema matches the ingest layer's REQUIRED_COLUMNS contract."""
+    df = pd.DataFrame(
+        rows, columns=["signalname", "port", "date", "ret"]
+    )
+    df["signallag"] = 0.0
+    df["Nlong"] = 50
+    df["Nshort"] = 50
+    return df
+
+
+def test_compute_long_short_returns_basic():
+    """Two signals × one date with both ports → 2 long-short rows."""
+    returns = _make_returns(
+        [
+            ("BM", "01", "2024-01-31", 1.50),
+            ("BM", "10", "2024-01-31", -0.25),
+            ("Mom12m", "01", "2024-01-31", 2.10),
+            ("Mom12m", "10", "2024-01-31", 0.30),
+        ]
+    )
+    ls = osap_replicate.compute_long_short_returns(returns)
+
+    assert set(ls.columns) == {"signalname", "date", "ls_return"}
+    assert len(ls) == 2
+
+    bm_row = ls[ls["signalname"] == "BM"].iloc[0]
+    mom_row = ls[ls["signalname"] == "Mom12m"].iloc[0]
+    assert bm_row["ls_return"] == pytest.approx(1.75)
+    assert mom_row["ls_return"] == pytest.approx(1.80)
+
+
+def test_compute_long_short_returns_missing_short_port_drops_signal():
+    """A signal with port=01 only (no port=10) yields no long-short row."""
+    returns = _make_returns(
+        [
+            ("BM", "01", "2024-01-31", 1.50),  # long only — no pair
+            ("Mom12m", "01", "2024-01-31", 2.00),
+            ("Mom12m", "10", "2024-01-31", 0.50),
+        ]
+    )
+    ls = osap_replicate.compute_long_short_returns(returns)
+
+    assert "BM" not in ls["signalname"].values
+    assert "Mom12m" in ls["signalname"].values
+
+
+def test_compute_long_short_returns_drops_decile_buckets():
+    """Inner decile buckets (port=02..09) must NOT contribute to ls_return."""
+    returns = _make_returns(
+        [
+            ("BM", "01", "2024-01-31", 5.0),
+            ("BM", "05", "2024-01-31", 99.0),  # noise — should be ignored
+            ("BM", "10", "2024-01-31", 1.0),
+        ]
+    )
+    ls = osap_replicate.compute_long_short_returns(returns)
+
+    assert len(ls) == 1
+    assert ls.iloc[0]["ls_return"] == pytest.approx(4.0)
+
+
+def test_compute_long_short_returns_handles_integer_port():
+    """OSAP parquet may store ``port`` as int (1..10); normaliser must
+    coerce both representations to '01'/'10'."""
+    df = pd.DataFrame(
+        [
+            ("BM", 1, "2024-01-31", 5.0),
+            ("BM", 10, "2024-01-31", 1.0),
+        ],
+        columns=["signalname", "port", "date", "ret"],
+    )
+    df["signallag"] = 0.0
+    df["Nlong"] = 50
+    df["Nshort"] = 50
+
+    ls = osap_replicate.compute_long_short_returns(df)
+    assert len(ls) == 1
+    assert ls.iloc[0]["ls_return"] == pytest.approx(4.0)
+
+
+def test_select_as_of_cross_section_picks_most_recent_per_signal():
+    """For each signal, the row with the maximum date <= as_of is kept."""
+    ls_returns = pd.DataFrame(
+        [
+            ("BM", "2023-12-31", 1.0),
+            ("BM", "2024-01-31", 1.5),  # most recent for BM
+            ("Mom12m", "2024-01-31", 2.0),
+            ("Mom12m", "2023-11-30", 1.9),
+        ],
+        columns=["signalname", "date", "ls_return"],
+    )
+    cs = osap_replicate.select_as_of_cross_section(
+        ls_returns, date(2024, 1, 31)
+    )
+
+    assert len(cs) == 2
+    bm = cs[cs["signalname"] == "BM"].iloc[0]
+    mom = cs[cs["signalname"] == "Mom12m"].iloc[0]
+    assert bm["ls_return"] == pytest.approx(1.5)
+    assert mom["ls_return"] == pytest.approx(2.0)
+
+
+def test_select_as_of_cross_section_filters_future_dates():
+    """Observations after ``as_of`` must be dropped before the
+    most-recent-per-signal pick."""
+    ls_returns = pd.DataFrame(
+        [
+            ("BM", "2024-01-31", 1.0),
+            ("BM", "2024-06-30", 5.0),  # AFTER as_of — must not be picked
+        ],
+        columns=["signalname", "date", "ls_return"],
+    )
+    cs = osap_replicate.select_as_of_cross_section(
+        ls_returns, date(2024, 2, 28)
+    )
+
+    assert len(cs) == 1
+    assert cs.iloc[0]["ls_return"] == pytest.approx(1.0)
+
+
+def test_select_as_of_cross_section_empty_window():
+    """``as_of`` precedes all observations → empty cross-section."""
+    ls_returns = pd.DataFrame(
+        [
+            ("BM", "2024-01-31", 1.0),
+            ("Mom12m", "2024-01-31", 2.0),
+        ],
+        columns=["signalname", "date", "ls_return"],
+    )
+    cs = osap_replicate.select_as_of_cross_section(
+        ls_returns, date(2020, 1, 1)
+    )
+
+    assert cs.empty
+    assert list(cs.columns) == ["signalname", "date", "ls_return"]
+
+
+def test_rank_signals_cross_sectional_normalises_to_unit_interval():
+    """Three signals with distinct ls_return → ranks ≈ {1/3, 2/3, 1}."""
+    cs = pd.DataFrame(
+        [
+            ("Low", "2024-01-31", 0.1),
+            ("Mid", "2024-01-31", 0.5),
+            ("High", "2024-01-31", 0.9),
+        ],
+        columns=["signalname", "date", "ls_return"],
+    )
+    ranks = osap_replicate.rank_signals_cross_sectional(cs)
+
+    assert ranks["Low"] == pytest.approx(1 / 3)
+    assert ranks["Mid"] == pytest.approx(2 / 3)
+    assert ranks["High"] == pytest.approx(1.0)
+    assert ranks.max() <= 1.0
+    assert ranks.min() > 0.0
+
+
+def test_rank_signals_cross_sectional_ties_get_average_rank():
+    """Two signals with identical ls_return share the same average rank."""
+    cs = pd.DataFrame(
+        [
+            ("A", "2024-01-31", 0.5),
+            ("B", "2024-01-31", 0.5),
+            ("C", "2024-01-31", 0.9),
+        ],
+        columns=["signalname", "date", "ls_return"],
+    )
+    ranks = osap_replicate.rank_signals_cross_sectional(cs)
+
+    # method='average', pct=True: A and B tie at ranks 1 and 2 →
+    # average rank 1.5 → pct 1.5/3 = 0.5
+    assert ranks["A"] == pytest.approx(0.5)
+    assert ranks["B"] == pytest.approx(0.5)
+    assert ranks["C"] == pytest.approx(1.0)
+
+
+def test_compute_osap_signals_full_path_proxy_mode():
+    """End-to-end: synthetic 3-signal fixture × 4 tickers → every ticker
+    receives the same signal map (factor-exposure proxy, locked
+    2026-05-18)."""
+    returns = _make_returns(
+        [
+            ("BM", "01", "2024-01-31", 1.5),
+            ("BM", "10", "2024-01-31", -0.5),  # ls = 2.0
+            ("Mom12m", "01", "2024-01-31", 0.8),
+            ("Mom12m", "10", "2024-01-31", 0.6),  # ls = 0.2
+            ("Beta", "01", "2024-01-31", 0.3),
+            ("Beta", "10", "2024-01-31", 0.4),  # ls = -0.1
+        ]
+    )
+    tickers = ["NVDA", "AAPL", "CF", "HST"]
+    signals = ("BM", "Mom12m", "Beta")
+
+    result = osap_replicate.compute_osap_signals(
+        returns, tickers, date(2024, 2, 28), requested_signals=signals
+    )
+
+    assert set(result.keys()) == set(tickers)
+    # Every ticker gets a non-None dict in the proxy version
+    for ticker in tickers:
+        assert result[ticker] is not None, f"{ticker} should have a signal map"
+        assert set(result[ticker].keys()) == set(signals)
+
+    # All tickers MUST share the same map (factor-exposure proxy invariant)
+    assert result["NVDA"] == result["AAPL"] == result["CF"] == result["HST"]
+
+    # Rank ordering: BM (ls=2.0) > Mom12m (ls=0.2) > Beta (ls=-0.1)
+    one_map = result["NVDA"]
+    assert one_map["BM"] == pytest.approx(1.0)
+    assert one_map["Mom12m"] == pytest.approx(2 / 3)
+    assert one_map["Beta"] == pytest.approx(1 / 3)
+
+
+def test_compute_osap_signals_empty_returns_yields_none_per_ticker():
+    """Empty input DataFrame → every ticker maps to None (universe gap)."""
+    returns = _make_returns([])
+    tickers = ["NVDA", "AAPL"]
+
+    result = osap_replicate.compute_osap_signals(
+        returns, tickers, date(2024, 1, 31)
+    )
+
+    assert result == {"NVDA": None, "AAPL": None}
+
+
+def test_compute_osap_signals_universe_gap_before_coverage():
+    """``as_of`` precedes OSAP coverage → every ticker maps to None.
+
+    Distinct from pillar ``neutralize_missing`` — OSAP does NOT impute
+    a neutral value; the blend layer (commit 3) interprets None as
+    'no OSAP adjustment' and passes composite_score through.
+    """
+    returns = _make_returns(
+        [
+            ("BM", "01", "2024-01-31", 1.5),
+            ("BM", "10", "2024-01-31", -0.5),
+        ]
+    )
+    tickers = ["NVDA", "AAPL"]
+
+    # as_of well before the only observation
+    result = osap_replicate.compute_osap_signals(
+        returns,
+        tickers,
+        date(2020, 1, 1),
+        requested_signals=("BM",),
+    )
+
+    assert result == {"NVDA": None, "AAPL": None}
+
+
+def test_compute_osap_signals_default_manifest_is_100_signals():
+    """Sanity: the module's default manifest matches config.OSAP_SIGNALS_100
+    and the manifest itself has the expected shape."""
+    assert len(config.OSAP_SIGNALS_100) == 100
+    assert len(set(config.OSAP_SIGNALS_100)) == 100, "no duplicates in manifest"
+
+    # Theme buckets must sum to exactly 100
+    theme_sum = sum(
+        len(sigs) for sigs in config.OSAP_SIGNALS_BY_THEME.values()
+    )
+    assert theme_sum == 100
+
+
+def test_compute_osap_signals_uses_shipped_fixture():
+    """End-to-end with the shipped scout fixture
+    ``tests/fixtures/osap_returns_sample.csv``. Anchors the test suite
+    against the same file the @network live test uses, so a hand-edit
+    of the fixture surfaces here too."""
+    fixture = pd.read_csv(FIXTURE_CSV)
+    assert {"signalname", "port", "date", "ret"}.issubset(fixture.columns)
+
+    # Use the latest fixture date as as_of so every signal has at
+    # least one observation visible (the as-of filter is inclusive).
+    as_of_ts = pd.to_datetime(fixture["date"]).max()
+    as_of_dt = as_of_ts.date()
+
+    tickers = ["NVDA", "AAPL"]
+    result = osap_replicate.compute_osap_signals(
+        fixture,
+        tickers,
+        as_of_dt,
+        requested_signals=tuple(fixture["signalname"].unique()),
+    )
+
+    # Every ticker should have a non-None signal map (the fixture
+    # carries 4 long-short pairs across 2 dates).
+    non_none_count = sum(1 for v in result.values() if v is not None)
+    assert non_none_count == len(tickers), (
+        "shipped fixture should produce a non-None signal map for every "
+        f"ticker; got {non_none_count}/{len(tickers)}"
+    )
diff --git a/tests/test_scoring/test_osap_blend.py b/tests/test_scoring/test_osap_blend.py
new file mode 100644
index 000000000..973332f99
--- /dev/null
+++ b/tests/test_scoring/test_osap_blend.py
@@ -0,0 +1,258 @@
+"""Tests for ``compute/scoring/osap_blend.py`` (Phase 4h commit 3).
+
+Path-b blend math + universe-gap pass-through invariant. The blend
+layer is observability-only this phase (Top-5 still ranked by raw
+composite_score per SKILL.md Rule 16), so test focus is on the formula
+correctness and the None/NaN universe-gap policy that downstream
+``compute/main.py`` (commit 5) relies on.
+"""
+
+from __future__ import annotations
+
+import math
+
+import numpy as np
+import pandas as pd
+import pytest
+
+from compute.scoring.osap_blend import (
+    OSAP_BLEND_WEIGHT_DEFAULT,
+    aggregate_osap_signals,
+    apply_osap_blend,
+)
+
+# ---------------------------------------------------------------------------
+# aggregate_osap_signals
+# ---------------------------------------------------------------------------
+
+
+def test_aggregate_osap_signals_mean_of_ranks_scaled_to_0_100() -> None:
+    """Mean of rank values × 100 — basic 3-ticker happy path.
+
+    Each ticker has the same proxy-mode signal map per commit 2 design,
+    but the helper is shape-agnostic so the test uses distinct maps to
+    isolate the math.
+    """
+    signal_map: dict[str, dict[str, float] | None] = {
+        "AAPL": {"Mom1m": 0.8, "Accruals": 0.4, "BM": 0.6},  # mean=0.6 → 60.0
+        "MSFT": {"Mom1m": 1.0, "Accruals": 0.2, "BM": 0.5},  # mean≈0.567 → 56.67
+        "NVDA": {"Mom1m": 0.1, "Accruals": 0.1, "BM": 0.1},  # mean=0.1 → 10.0
+    }
+
+    result = aggregate_osap_signals(signal_map)
+
+    assert result.name == "osap_signal_aggregate"
+    assert set(result.index) == {"AAPL", "MSFT", "NVDA"}
+    assert result["AAPL"] == pytest.approx(60.0)
+    assert result["MSFT"] == pytest.approx(100.0 * (1.0 + 0.2 + 0.5) / 3)
+    assert result["NVDA"] == pytest.approx(10.0)
+
+
+def test_aggregate_osap_signals_none_ticker_yields_nan() -> None:
+    """Universe-gap ticker (None map) → NaN aggregate, NOT zero."""
+    signal_map: dict[str, dict[str, float] | None] = {
+        "AAPL": {"Mom1m": 0.5},
+        "DELISTED": None,
+    }
+
+    result = aggregate_osap_signals(signal_map)
+
+    assert result["AAPL"] == pytest.approx(50.0)
+    assert math.isnan(result["DELISTED"])
+
+
+def test_aggregate_osap_signals_empty_inner_dict_yields_nan() -> None:
+    """Empty {} signal dict treated identically to None (no signals fired)."""
+    signal_map: dict[str, dict[str, float] | None] = {
+        "AAPL": {},
+    }
+
+    result = aggregate_osap_signals(signal_map)
+
+    assert math.isnan(result["AAPL"])
+
+
+def test_aggregate_osap_signals_empty_map_returns_empty_series() -> None:
+    """Empty input → empty Series, not error."""
+    result = aggregate_osap_signals({})
+
+    assert result.empty
+    assert result.dtype == float
+    assert result.name == "osap_signal_aggregate"
+
+
+# ---------------------------------------------------------------------------
+# apply_osap_blend
+# ---------------------------------------------------------------------------
+
+
+def test_apply_osap_blend_basic_50_50() -> None:
+    """50/50 default — blended = mean(composite, osap)."""
+    composite = pd.Series({"AAPL": 80.0, "MSFT": 60.0, "NVDA": 40.0})
+    osap = pd.Series({"AAPL": 40.0, "MSFT": 70.0, "NVDA": 100.0})
+
+    result = apply_osap_blend(composite, osap)
+
+    assert result.name == "composite_score_osap_adjusted"
+    assert result["AAPL"] == pytest.approx(60.0)  # (80 + 40) / 2
+    assert result["MSFT"] == pytest.approx(65.0)  # (60 + 70) / 2
+    assert result["NVDA"] == pytest.approx(70.0)  # (40 + 100) / 2
+
+
+def test_apply_osap_blend_weight_zero_returns_composite_unchanged() -> None:
+    """weight=0 → pure pass-through (no OSAP influence)."""
+    composite = pd.Series({"AAPL": 80.0, "MSFT": 60.0})
+    osap = pd.Series({"AAPL": 10.0, "MSFT": 90.0})
+
+    result = apply_osap_blend(composite, osap, weight=0.0)
+
+    pd.testing.assert_series_equal(
+        result.astype(float),
+        composite.astype(float),
+        check_names=False,
+    )
+
+
+def test_apply_osap_blend_weight_one_returns_osap_where_covered() -> None:
+    """weight=1 → pure OSAP for covered tickers; pass-through for NaN."""
+    composite = pd.Series({"AAPL": 80.0, "GAP": 60.0})
+    osap = pd.Series({"AAPL": 25.0, "GAP": float("nan")})
+
+    result = apply_osap_blend(composite, osap, weight=1.0)
+
+    assert result["AAPL"] == pytest.approx(25.0)
+    # NaN OSAP → composite passes through even at weight=1
+    assert result["GAP"] == pytest.approx(60.0)
+
+
+def test_apply_osap_blend_nan_osap_falls_back_to_composite() -> None:
+    """Universe-gap policy: NaN OSAP → composite unchanged."""
+    composite = pd.Series({"AAPL": 80.0, "GAP": 50.0, "MSFT": 60.0})
+    osap = pd.Series({"AAPL": 20.0, "GAP": float("nan"), "MSFT": 100.0})
+
+    result = apply_osap_blend(composite, osap, weight=0.5)
+
+    assert result["AAPL"] == pytest.approx(50.0)  # (80 + 20) / 2
+    assert result["GAP"] == pytest.approx(50.0)  # NaN → pass-through
+    assert result["MSFT"] == pytest.approx(80.0)  # (60 + 100) / 2
+
+
+def test_apply_osap_blend_empty_composite_returns_empty_series() -> None:
+    """Empty composite → empty output, no error."""
+    result = apply_osap_blend(
+        pd.Series(dtype=float),
+        pd.Series({"AAPL": 50.0}),
+    )
+
+    assert result.empty
+    assert result.name == "composite_score_osap_adjusted"
+
+
+def test_apply_osap_blend_clips_to_0_100() -> None:
+    """Output is clipped to [0, 100] to match composite-score domain."""
+    # Edge: composite=100, osap=100 → 100 (no clip needed).
+    # Edge: composite=0, osap=0 → 0 (no clip needed).
+    # Construct a degenerate case where naive math could exceed bounds
+    # if caller passes out-of-domain inputs.
+    composite = pd.Series({"HIGH": 100.0, "LOW": 0.0, "OOB": 150.0})
+    osap = pd.Series({"HIGH": 100.0, "LOW": 0.0, "OOB": 200.0})
+
+    result = apply_osap_blend(composite, osap, weight=0.5)
+
+    assert result["HIGH"] == pytest.approx(100.0)
+    assert result["LOW"] == pytest.approx(0.0)
+    # 0.5 * 150 + 0.5 * 200 = 175 → clipped to 100
+    assert result["OOB"] == pytest.approx(100.0)
+
+
+def test_apply_osap_blend_invalid_weight_below_zero_raises() -> None:
+    composite = pd.Series({"AAPL": 50.0})
+    osap = pd.Series({"AAPL": 50.0})
+
+    with pytest.raises(ValueError, match=r"weight must be in \[0, 1\]"):
+        apply_osap_blend(composite, osap, weight=-0.1)
+
+
+def test_apply_osap_blend_invalid_weight_above_one_raises() -> None:
+    composite = pd.Series({"AAPL": 50.0})
+    osap = pd.Series({"AAPL": 50.0})
+
+    with pytest.raises(ValueError, match=r"weight must be in \[0, 1\]"):
+        apply_osap_blend(composite, osap, weight=1.5)
+
+
+def test_apply_osap_blend_extra_osap_tickers_dropped_via_reindex() -> None:
+    """Tickers in OSAP but not composite are silently dropped — output
+    index matches composite_scores.index exactly."""
+    composite = pd.Series({"AAPL": 60.0, "MSFT": 70.0})
+    osap = pd.Series({"AAPL": 80.0, "MSFT": 50.0, "EXTRA": 99.0})
+
+    result = apply_osap_blend(composite, osap, weight=0.5)
+
+    assert list(result.index) == ["AAPL", "MSFT"]
+    assert "EXTRA" not in result.index
+
+
+def test_apply_osap_blend_missing_osap_ticker_becomes_passthrough() -> None:
+    """Composite ticker absent from OSAP series → reindex to NaN →
+    pass-through (universe-gap path)."""
+    composite = pd.Series({"AAPL": 60.0, "ABSENT": 75.0})
+    osap = pd.Series({"AAPL": 20.0})  # ABSENT not in OSAP
+
+    result = apply_osap_blend(composite, osap, weight=0.5)
+
+    assert result["AAPL"] == pytest.approx(40.0)  # (60 + 20) / 2
+    assert result["ABSENT"] == pytest.approx(75.0)  # pass-through
+
+
+def test_apply_osap_blend_default_weight_matches_constant() -> None:
+    """No explicit weight → uses OSAP_BLEND_WEIGHT_DEFAULT (0.5 lock)."""
+    composite = pd.Series({"AAPL": 80.0})
+    osap = pd.Series({"AAPL": 40.0})
+
+    result_default = apply_osap_blend(composite, osap)
+    result_explicit = apply_osap_blend(
+        composite, osap, weight=OSAP_BLEND_WEIGHT_DEFAULT
+    )
+
+    pd.testing.assert_series_equal(result_default, result_explicit)
+    # Sanity: the locked default is 0.5 per osap-integration/PLAN.md L168-170
+    assert OSAP_BLEND_WEIGHT_DEFAULT == 0.5
+
+
+# ---------------------------------------------------------------------------
+# Cross-module integration sanity (commit 2 → commit 3)
+# ---------------------------------------------------------------------------
+
+
+def test_aggregate_then_blend_round_trip_with_compute_osap_signals_shape() -> None:
+    """End-to-end shape: ``compute_osap_signals`` output → aggregate →
+    apply_osap_blend works without manual conversion."""
+    # Simulate compute_osap_signals output directly (avoid importing the
+    # full pipeline here — that's commit 5's integration test concern).
+    osap_signal_map: dict[str, dict[str, float] | None] = {
+        "AAPL": {"Mom1m": 0.9, "Accruals": 0.3, "BM": 0.6},
+        "MSFT": {"Mom1m": 0.5, "Accruals": 0.5, "BM": 0.5},
+        "GAP": None,  # Universe gap
+    }
+    composite = pd.Series({"AAPL": 80.0, "MSFT": 60.0, "GAP": 70.0})
+
+    aggregate = aggregate_osap_signals(osap_signal_map)
+    blended = apply_osap_blend(composite, aggregate)
+
+    # AAPL: (0.9 + 0.3 + 0.6) / 3 × 100 = 60 → (80 + 60) / 2 = 70
+    assert blended["AAPL"] == pytest.approx(70.0)
+    # MSFT: mean=0.5 → 50 → (60 + 50) / 2 = 55
+    assert blended["MSFT"] == pytest.approx(55.0)
+    # GAP: None → NaN → pass-through 70
+    assert blended["GAP"] == pytest.approx(70.0)
+
+
+def test_apply_osap_blend_preserves_dtype_float() -> None:
+    """Output dtype is float (matches composite_score in StockSummary)."""
+    composite = pd.Series({"AAPL": 80, "MSFT": 60}, dtype=int)  # int input
+    osap = pd.Series({"AAPL": 40.0, "MSFT": 70.0})
+
+    result = apply_osap_blend(composite, osap)
+
+    assert result.dtype == np.float64
diff --git a/tests/test_smoke.py b/tests/test_smoke.py
index fb6a8dccf..1043deb8c 100644
--- a/tests/test_smoke.py
+++ b/tests/test_smoke.py
@@ -5,7 +5,7 @@
 
 def test_phase0_scaffold_imports() -> None:
     assert config.UNIVERSE == "SP500"
-    assert config.SCHEMA_VERSION.startswith("0.8.")
+    assert config.SCHEMA_VERSION.startswith("0.9.")
 
 
 def test_phase0_paths_resolve() -> None:
diff --git a/tests/test_validation/test_osap_validation.py b/tests/test_validation/test_osap_validation.py
new file mode 100644
index 000000000..1e3019c0b
--- /dev/null
+++ b/tests/test_validation/test_osap_validation.py
@@ -0,0 +1,384 @@
+"""Tests for ``compute/validation/osap_validation.py`` (Phase 4h commit 4).
+
+Anchors the PBO/DSR hard-gate behavior + the rolling-12m Spearman IC
+observability metric. Coverage focus:
+
+- Bailey 2014 cohort framing (``n_trials = cohort_size``) holds
+  end-to-end through :func:`factor_passes_gates`.
+- Asymmetric NaN policy (per-signal ``dropna`` + cohort
+  ``fillna(0.0)``) is locked in regression — test #13 fails if a
+  future maintainer accidentally reverts to ``dropna(how='any')``.
+- Rejection-reason classification is exact (``insufficient_data`` /
+  ``high_pbo`` / ``low_dsr`` / ``gate_failed``) so commit 5's
+  metadata writer can group correctly.
+- Rolling-IC pure-pandas Spearman matches a hand-computed reference
+  and gracefully degrades on insufficient history.
+"""
+
+from __future__ import annotations
+
+import math
+
+import numpy as np
+import pandas as pd
+import pytest
+
+from compute.validation.osap_validation import (
+    MIN_OBS_PER_SIGNAL,
+    ROLLING_IC_WINDOW_MONTHS,
+    GateResult,
+    compute_rolling_ic_12m,
+    filter_accepted_signals,
+    gate_osap_signals,
+)
+
+
+def _long_format(
+    wide: pd.DataFrame,
+) -> pd.DataFrame:
+    """Helper: wide (date × signal) → long (signalname, date, ls_return).
+
+    Mirrors commit 2's ``compute_long_short_returns`` output. Drops
+    NaN cells (commit 2 does the same — long format never carries
+    explicit NaN rows).
+    """
+    long = (
+        wide.rename_axis(index="date", columns="signalname")
+        .stack()
+        .rename("ls_return")
+        .reset_index()
+    )
+    return long.dropna(subset=["ls_return"]).astype({"signalname": str})
+
+
+def _make_noise_cohort(
+    n_dates: int = 64,
+    n_signals: int = 10,
+    *,
+    seed: int = 42,
+    start: str = "2020-01-31",
+) -> pd.DataFrame:
+    """Synthetic monthly long-short returns — independent noise per signal.
+
+    Matches the synthetic-fixture pattern at
+    ``tests/test_validation/test_pbo_dsr.py:63`` (deterministic seed +
+    Normal(0, 0.05) monthly returns).
+    """
+    rng = np.random.default_rng(seed=seed)
+    dates = pd.date_range(start=start, periods=n_dates, freq="ME")
+    data = rng.normal(0.0, 0.05, size=(n_dates, n_signals))
+    cols = [f"NoiseSig{i:02d}" for i in range(n_signals)]
+    wide = pd.DataFrame(data, index=dates, columns=cols)
+    return _long_format(wide)
+
+
+# ---------------------------------------------------------------------------
+# gate_osap_signals
+# ---------------------------------------------------------------------------
+
+
+def test_gate_osap_signals_random_noise_yields_high_pbo() -> None:
+    """Pure-noise cohort → no signals accepted (Bailey 2014 invariant).
+
+    Independent random strategies have no persistent signal, so PBO
+    clusters at or above 0.5 and deflated Sharpe (corrected for
+    n_trials multiple testing) lands at or below zero for typical
+    samples. The gate requires both PBO ≤ 0.5 and DSR > 0, so it should
+    reject every signal, categorized as ``'high_pbo'`` (PBO-only fail),
+    ``'low_dsr'`` (DSR-only fail), or ``'gate_failed'`` (both fail)
+    depending on which side fails for that draw.
+    """
+    df = _make_noise_cohort(n_dates=64, n_signals=10, seed=42)
+
+    results = gate_osap_signals(df)
+
+    assert len(results) == 10
+    assert all(isinstance(r, GateResult) for r in results.values())
+
+    # Bailey 2014 invariant: pure noise → zero acceptances.
+    accepted = [s for s, r in results.items() if r.accepted]
+    assert accepted == []
+
+    # All rejections cite one of the three real-gate reasons (no
+    # short-circuits — every signal got a real PBO/DSR computation).
+    reasons = {r.rejection_reason for r in results.values()}
+    assert reasons.issubset({"high_pbo", "low_dsr", "gate_failed"})
+    assert "insufficient_data" not in reasons
+
+    # Sanity: every result has populated pbo + dsr + sharpe floats.
+    for r in results.values():
+        assert r.pbo is not None and 0.0 <= r.pbo <= 1.0
+        assert r.dsr is not None
+        assert r.sharpe is not None
+        assert r.n_observations == 64
+
+
+def test_gate_osap_signals_low_sharpe_signal_rejected_for_dsr() -> None:
+    """Near-zero-scale signal in a modest-scale noise cohort → fails DSR."""
+    rng = np.random.default_rng(seed=7)
+    dates = pd.date_range(start="2020-01-31", periods=64, freq="ME")
+    # 9 random signals at modest scale + 1 near-zero signal.
+    data = rng.normal(0.0, 0.05, size=(64, 9))
+    near_zero = rng.normal(0.0, 1e-4, size=(64, 1))
+    wide = pd.DataFrame(
+        np.hstack([data, near_zero]),
+        index=dates,
+        columns=[f"Sig{i:02d}" for i in range(9)] + ["DeadSig"],
+    )
+    df = _long_format(wide)
+
+    results = gate_osap_signals(df)
+
+    dead = results["DeadSig"]
+    # Zero-mean noise has no edge → DSR ≤ 0 → fails DSR_VETO_THRESHOLD (= 0.0, strict >).
+    assert dead.accepted is False
+    assert dead.rejection_reason in ("low_dsr", "gate_failed")
+    assert dead.dsr is not None and dead.dsr <= 0.0
+
+
+def test_gate_osap_signals_strong_signal_accepted() -> None:
+    """Steady positive-drift signal beats noisy cohort → accepted with float metrics."""
+    rng = np.random.default_rng(seed=11)
+    dates = pd.date_range(start="2015-01-31", periods=120, freq="ME")
+    # 9 noise + 1 strong drift signal with very high Sharpe and stable OOS rank.
+    noise = rng.normal(0.0, 0.05, size=(120, 9))
+    strong = np.full((120, 1), 0.03) + rng.normal(0.0, 0.005, size=(120, 1))
+    wide = pd.DataFrame(
+        np.hstack([noise, strong]),
+        index=dates,
+        columns=[f"Noise{i:02d}" for i in range(9)] + ["StrongSig"],
+    )
+    df = _long_format(wide)
+
+    results = gate_osap_signals(df)
+
+    strong_result = results["StrongSig"]
+    assert strong_result.accepted is True
+    assert strong_result.rejection_reason is None
+    assert strong_result.pbo is not None and strong_result.pbo <= 0.5
+    assert strong_result.dsr is not None and strong_result.dsr > 0.0
+    assert strong_result.sharpe is not None and strong_result.sharpe > 0.0
+
+
+def test_gate_osap_signals_insufficient_data() -> None:
+    """Cohort with < ``MIN_OBS_PER_SIGNAL`` rows → all signals rejected
+    with reason ``insufficient_data`` (cohort precondition fails)."""
+    rng = np.random.default_rng(seed=3)
+    dates = pd.date_range(start="2024-01-31", periods=10, freq="ME")
+    wide = pd.DataFrame(
+        rng.normal(0.0, 0.05, size=(10, 4)),
+        index=dates,
+        columns=["A", "B", "C", "D"],
+    )
+    df = _long_format(wide)
+
+    results = gate_osap_signals(df)
+
+    assert len(results) == 4
+    for r in results.values():
+        assert r.accepted is False
+        assert r.rejection_reason == "insufficient_data"
+        assert r.pbo is None
+        assert r.dsr is None
+        assert r.sharpe is None
+        assert r.n_observations == 10
+
+
+def test_gate_osap_signals_requested_signals_filter() -> None:
+    """``requested_signals`` subsets the cohort before gating — result
+    has only those keys."""
+    df = _make_noise_cohort(n_dates=48, n_signals=5, seed=42)
+    # Input contains NoiseSig00..NoiseSig04. Request a 3-signal subset.
+    requested = ("NoiseSig00", "NoiseSig02", "NoiseSig04")
+
+    results = gate_osap_signals(df, requested_signals=requested)
+
+    assert set(results.keys()) == set(requested)
+    # All three got a real PBO/DSR run (cohort size 3 ≥ 2, dates ≥ 16).
+    for r in results.values():
+        assert r.rejection_reason != "insufficient_data"
+
+
+def test_gate_osap_signals_requested_none_uses_all_signals_in_df() -> None:
+    """``requested_signals=None`` (default) → result covers every
+    unique signalname in the input."""
+    df = _make_noise_cohort(n_dates=32, n_signals=6, seed=1)
+
+    results = gate_osap_signals(df, requested_signals=None)
+
+    assert set(results.keys()) == set(df["signalname"].unique())
+
+
+def test_gate_osap_signals_empty_input_returns_empty_dict() -> None:
+    """Empty long-format DF → ``{}``, no crash."""
+    df = pd.DataFrame(columns=["signalname", "date", "ls_return"])
+    results = gate_osap_signals(df)
+    assert results == {}
+
+
+def test_gate_osap_signals_single_signal_cohort_rejects_with_insufficient_data() -> None:
+    """Cohort with 1 column → PBO precondition (``cohort_size < 2``)
+    short-circuits all signals to insufficient_data."""
+    rng = np.random.default_rng(seed=5)
+    dates = pd.date_range(start="2020-01-31", periods=64, freq="ME")
+    wide = pd.DataFrame(
+        rng.normal(0.0, 0.05, size=(64, 1)),
+        index=dates,
+        columns=["LonelySig"],
+    )
+    df = _long_format(wide)
+
+    results = gate_osap_signals(df)
+
+    assert results["LonelySig"].accepted is False
+    assert results["LonelySig"].rejection_reason == "insufficient_data"
+    assert results["LonelySig"].pbo is None
+
+
+def test_compute_rolling_ic_12m_known_signal() -> None:
+    """Hand-constructed strictly increasing series → lag-1 Spearman IC of 1.0."""
+    # Strictly increasing series: the values at t and t+1 are ranked
+    # identically, so the lag-1 rank correlation is exactly 1.0.
+    dates = pd.date_range(start="2023-01-31", periods=15, freq="ME")
+    series = pd.DataFrame(
+        {
+            "signalname": ["S"] * 15,
+            "date": dates,
+            "ls_return": np.arange(15, dtype=float) * 0.01,
+        }
+    )
+
+    ic = compute_rolling_ic_12m(series, "S")
+
+    assert ic is not None
+    assert ic == pytest.approx(1.0, abs=1e-9)
+
+
+def test_compute_rolling_ic_12m_insufficient_history() -> None:
+    """Signal with < 13 monthly observations → ``None``."""
+    dates = pd.date_range(start="2024-01-31", periods=8, freq="ME")
+    df = pd.DataFrame(
+        {
+            "signalname": ["S"] * 8,
+            "date": dates,
+            "ls_return": np.linspace(0.01, 0.08, 8),
+        }
+    )
+
+    assert compute_rolling_ic_12m(df, "S") is None
+
+
+def test_compute_rolling_ic_12m_nan_safe_with_gaps() -> None:
+    """NaN ``ls_return`` rows OUTSIDE the rolling-12m tail window are
+    pruned by ``tail(13)`` and do not poison the IC.
+
+    Production long-format DataFrames never carry explicit NaN rows
+    (commit 2's ``compute_long_short_returns`` drops them at long
+    conversion), but pre-window NaN sourced from upstream test fixtures
+    or future refactors must not propagate into the Spearman result.
+    """
+    # 20 rows total. The last 13 (tail window) are strictly monotone
+    # with no NaN. NaN are punched into older history.
+    dates = pd.date_range(start="2022-06-30", periods=20, freq="ME")
+    rets = np.arange(20, dtype=float) * 0.01
+    rets[0] = np.nan
+    rets[3] = np.nan
+    rets[5] = np.nan  # All three NaN sit in indices 0..6 (pruned by tail(13)).
+    df = pd.DataFrame(
+        {
+            "signalname": ["S"] * 20,
+            "date": dates,
+            "ls_return": rets,
+        }
+    )
+
+    ic = compute_rolling_ic_12m(df, "S")
+
+    # Strictly monotone tail-13 → Spearman should be exactly 1.0.
+    assert ic is not None
+    assert math.isfinite(ic)
+    assert ic == pytest.approx(1.0, abs=1e-9)
+
+
+def test_filter_accepted_signals_splits_into_sorted_lists() -> None:
+    """Mixed gate_results → alphabetically sorted accepted vs excluded;
+    union equals the full key set."""
+    gate_results = {
+        "C_pass": GateResult(True, 0.3, 1.5, 1.2, 60, None),
+        "A_fail_pbo": GateResult(False, 0.7, 1.2, 1.0, 60, "high_pbo"),
+        "B_pass": GateResult(True, 0.4, 2.0, 1.6, 60, None),
+        "Z_insufficient": GateResult(
+            False, None, None, None, 10, "insufficient_data"
+        ),
+        "Y_fail_dsr": GateResult(False, 0.3, -0.5, 0.2, 60, "low_dsr"),
+    }
+
+    accepted, excluded = filter_accepted_signals(gate_results)
+
+    assert accepted == ["B_pass", "C_pass"]
+    assert excluded == ["A_fail_pbo", "Y_fail_dsr", "Z_insufficient"]
+    # Round-trip: union of the two lists equals all gate-result keys.
+    assert set(accepted + excluded) == set(gate_results.keys())
+
+
+def test_gate_osap_signals_sparse_cohort_zero_filled_not_decimated() -> None:
+    """Lock the §NaN policy: sparse coverage signals stay in the cohort
+    via ``fillna(0.0)`` rather than being dropped by ``dropna(how='any')``.
+
+    Regression guard: if a future maintainer reverts the cohort
+    ``fillna(0.0)`` to ``dropna(how='any')``, the cohort matrix would
+    collapse to the row intersection (16 of 64 months, because the
+    three sparse signals each drop a disjoint set of 16), leaving the
+    cohort at the bare ``MIN_OBS_PER_SIGNAL`` row count and at risk of
+    cascading every signal to ``insufficient_data``.
+    """
+    rng = np.random.default_rng(seed=23)
+    dates = pd.date_range(start="2020-01-31", periods=64, freq="ME")
+    n_signals = 10
+    data = rng.normal(0.0, 0.05, size=(64, n_signals))
+    cols = [f"Sig{i:02d}" for i in range(n_signals)]
+    wide = pd.DataFrame(data, index=dates, columns=cols)
+
+    # Punch sparse-coverage NaN holes into 3 signals (~25% of rows
+    # missing each, but DIFFERENT rows per signal — so a
+    # dropna(how='any') intersection would decimate hard).
+    sparse_signals = ["Sig00", "Sig03", "Sig07"]
+    for offset, sig in enumerate(sparse_signals):
+        nan_rows = np.arange(offset, 64, 4)  # 16 NaN rows, offset per signal
+        wide.loc[wide.index[nan_rows], sig] = np.nan
+
+    df_long = _long_format(wide)
+
+    results = gate_osap_signals(df_long)
+
+    # All 10 signals received a GateResult (not collapsed by dropna).
+    assert set(results.keys()) == set(cols)
+
+    # The 3 sparse signals each have ≥ 48 obs (well above MIN_OBS_PER_SIGNAL
+    # = 16) so none short-circuited on per-signal insufficient_data.
+    # Critically: NO signal should have rejection_reason='insufficient_data'
+    # — that would mean the cohort precondition failed, which is exactly
+    # what dropna(how='any') would trigger.
+    insufficient = [
+        s for s, r in results.items() if r.rejection_reason == "insufficient_data"
+    ]
+    assert insufficient == [], (
+        f"Cohort decimation detected — signals {insufficient} short-"
+        f"circuited on insufficient_data despite having enough per-signal "
+        f"obs. Did the cohort fillna(0.0) get reverted to dropna(how='any')?"
+    )
+
+    # Every signal has populated pbo and dsr from the actual
+    # factor_passes_gates call (not a precheck-only short-circuit).
+    for sig, r in results.items():
+        assert r.pbo is not None, f"{sig} missing pbo — short-circuited?"
+        assert r.dsr is not None, f"{sig} missing dsr — short-circuited?"
+
+
+def test_module_load_constants_sourced_from_pbo_dsr() -> None:
+    """Smoke test — module-level constants match pbo_dsr defaults so
+    a downstream caller using the wrapper's defaults gets the canonical
+    Phase 4 PBO ≤ 0.5 / DSR > 0 gate."""
+    from compute.validation.pbo_dsr import DEFAULT_N_PARTITIONS as PD_NP
+
+    assert MIN_OBS_PER_SIGNAL == PD_NP  # 16
+    assert ROLLING_IC_WINDOW_MONTHS == 12