feat(validation): Phase 4.6 follow-on — wire universe_provider into pbo_dsr gates #275

Merged: dackclup merged 1 commit into main from claude/phase-4.6-pbo-dsr-universe-provider on May 27, 2026

@dackclup

Summary

Closes the Phase 4.6 next-task gap surfaced by Research Report v1.0 §7.4 follow-up. PR #274 landed compute/ingest/historical_universe.members_at() as a library but nothing called it. This PR threads an optional universe_provider callable through factor_passes_gates() so the returned metrics dict carries honest universe provenance — closing the loop between the historical-membership module and the validation gates every Phase 4+ candidate factor must pass.

Backward-compatible: every pre-Phase-4.6 caller (the already-merged osap-integration, jkp-integration, qlib scout, and ipca scout PRs) stays byte-identical because all 3 new kwargs default to None.

What changed

| File | Change |
|---|---|
| compute/validation/pbo_dsr.py | factor_passes_gates() gains 3 optional kwargs (universe_provider, as_of_date, current_universe). When all 3 are passed, the metrics dict carries universe_as_of / universe_size / survivorship_bias_corrected. Degrades gracefully when the provider raises. New helper today_utc_date() for forward-cron callers. |
| tests/test_validation/test_pbo_dsr.py | +6 tests: backward-compat, happy path, degraded path, provider-raises graceful, partial-kwargs warning, today_utc_date helper. |

New API surface (sample)

    from compute.ingest.historical_universe import members_at
    from compute.ingest.universe import get_sp500_constituents
    from compute.validation.pbo_dsr import factor_passes_gates, today_utc_date

    current = frozenset(get_sp500_constituents().ticker)
    passes, metrics = factor_passes_gates(
        factor_returns, returns_matrix,
        n_trials=n_trials,
        universe_provider=members_at,
        as_of_date=today_utc_date(),
        current_universe=current,
    )
    # metrics now includes:
    #   universe_as_of: "2026-05-27"
    #   universe_size: 502
    #   survivorship_bias_corrected: True

Hard rules preserved

  • ✅ Rule 9 — N/A; no Pydantic / TS / snapshot touched (Metadata writer wiring deferred to next PR per scope discipline)
  • ✅ Rule 16 — N/A; no scoring change
  • ✅ Rule 18 — diagnostic surface (metrics dict keys) ships in same PR as the integration
  • ✅ License: no new deps (just TYPE_CHECKING forward import for MembershipResult)
  • ✅ Backward compat: pre-Phase-4.6 callers byte-identical (3 new kwargs default to None)

Why this matters (anchor)

Hou-Xue-Zhang (2020) RFS replication-crisis evidence emphasizes survivorship as the primary failure mode in factor-zoo work. Pre-Phase-4.6 every PBO/DSR number was implicitly survivorship-biased without saying so; post-Phase-4.6 the metrics dict can carry honest provenance. This PR is the first place that knowledge surfaces in the validation metric dict.

Verification

| Check | Result |
|---|---|
| ruff check | ✅ clean (linter moved Callable to collections.abc per modern convention; deliberate) |
| pytest tests/test_validation/test_pbo_dsr.py tests/test_ingest/test_historical_universe.py tests/test_config.py | ✅ 59 passed in 9.34s |
| Test count delta | +6 |
| Backward-compat case explicitly tested | test_factor_passes_gates_backward_compat_no_universe_kwargs |

Test plan

  • ruff check . — clean
  • pytest tests/test_validation/ — 29/29 pass (23 legacy + 6 new)
  • Backward-compat preserved (legacy callers identical behavior)
  • Provider-raises path graceful (validation completes from supplied matrix)
  • Partial-kwargs case warns + skips (loud, not silent)
  • CI (Python lint+test, Frontend build, simulate) — expected green
  • Vercel preview — N/A (no frontend touched)

NOT in this PR (next follow-ups per priority order)

  1. compute/main.py writer wiring — populate Metadata.universe_membership_as_of + survivorship_bias_corrected from the metrics dict at JSON write time. Small companion PR; would make the diagnostic visible in production JSON output.
  2. Re-validation of existing pillars + manipulation_index with the historical universe (likely revises some published PBO/DSR baselines DOWNWARD by 5-15%). Explicit honest-correction PR.
  3. Larcker 10b5-1 three-flag (Feature 3 from Research Report v1.0) — blocked on edgartools <aff10b5One> plan_adoption_date extraction; needs minimal lxml parser first.

Subscribe-after-open suggestion: same pattern as #271-#274 — subscribe me to PR activity for CI + review comments.


Generated by Claude Code

…bo_dsr gates

Closes the next-task gap surfaced by Research Report v1.0 §7.4 follow-up:
PR #274 landed `compute/ingest/historical_universe.members_at()` as a
library but nothing called it. This PR threads an optional
`universe_provider` callable through `factor_passes_gates()` so the
returned metrics dict carries honest universe provenance — closing the
loop between the historical-membership module and the validation
gates that every Phase 4+ (OSAP / JKP / Qlib / IPCA) and Phase 5
(ML meta-learner) candidate factor must pass.

## What changed

- `compute/validation/pbo_dsr.py::factor_passes_gates()` — 3 new
  optional kwargs (`universe_provider`, `as_of_date`, `current_universe`).
  When all 3 are passed, the function calls the provider once and
  enriches the metrics dict with `universe_as_of` (ISO date),
  `universe_size` (int), and `survivorship_bias_corrected` (bool).
  When the provider returns `is_complete=False` (e.g., pre-2020
  date), the flag flips False AND a warning is logged. When the
  provider raises, behavior degrades gracefully (3 fields stay None,
  validation still completes from the supplied returns_matrix).

- `compute/validation/pbo_dsr.py::today_utc_date()` — small helper
  for callers wiring `as_of_date=today_utc_date()` in forward-cron
  validation paths.

- `tests/test_validation/test_pbo_dsr.py` — 6 new tests covering:
  * backward-compat (no universe kwargs → 3 new keys = None)
  * happy path (members_at() with date well inside coverage)
  * degraded path (pre-EARLIEST_EVENT_DATE → is_complete=False)
  * provider-raises (graceful degradation, validation still runs)
  * partial-kwargs warning (caller passes only some of the 3)
  * today_utc_date helper smoke
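
The enrichment and degradation rules listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `pbo_dsr.py`: the `MembershipResult` stand-in, its `members` attribute, the `universe_provenance` helper name, and the `provider(as_of_date, current_universe)` call shape are all assumptions made for the example.

```python
import logging
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Callable, Optional

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class MembershipResult:
    """Hypothetical stand-in; only `is_complete` is named in the PR text."""
    members: frozenset
    is_complete: bool


def today_utc_date() -> date:
    """Calendar date in UTC, independent of the machine's local timezone."""
    return datetime.now(timezone.utc).date()


def universe_provenance(
    universe_provider: Optional[Callable] = None,
    as_of_date: Optional[date] = None,
    current_universe: Optional[frozenset] = None,
) -> dict:
    """Return the 3 provenance keys; all stay None unless enrichment succeeds."""
    out = {
        "universe_as_of": None,
        "universe_size": None,
        "survivorship_bias_corrected": None,
    }
    supplied = [a is not None for a in (universe_provider, as_of_date, current_universe)]
    if not any(supplied):
        return out  # legacy call: nothing was requested
    if not all(supplied):
        logger.warning("universe kwargs partially supplied; skipping enrichment")
        return out
    try:
        result = universe_provider(as_of_date, current_universe)  # assumed call shape
    except Exception:
        logger.warning("universe_provider raised; provenance left unset", exc_info=True)
        return out
    if not result.is_complete:
        logger.warning("membership data incomplete for %s", as_of_date)
    out["universe_as_of"] = as_of_date.isoformat()
    out["universe_size"] = len(result.members)
    out["survivorship_bias_corrected"] = bool(result.is_complete)
    return out
```

Keeping the three keys present with None values in every failure path means a consumer can always index the dict and only has to branch on None.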

## Caller migration

Pre-Phase-4.6 callers (osap-integration / jkp-integration / qlib
scout / ipca scout PRs already merged) are byte-identical — all 3
new kwargs default to None and the function returns the legacy
10-key metrics dict augmented with 3 None values. No caller code
needs to change.

NEW callers (Phase 4i.1+ integration PRs + Phase 5 ML meta-learner)
should pass:

    from compute.ingest.historical_universe import members_at
    from compute.ingest.universe import get_sp500_constituents

    current = frozenset(get_sp500_constituents().ticker)
    passes, metrics = factor_passes_gates(
        factor_returns, returns_matrix,
        n_trials=n_trials,
        universe_provider=members_at,
        as_of_date=as_of_date,  # backtest cutoff or today_utc_date()
        current_universe=current,
    )

The metrics dict can then be threaded into
`compute/output/schemas.py::Metadata.universe_membership_as_of` +
`Metadata.survivorship_bias_corrected` at writer time (separate
follow-up PR — `compute/main.py` wiring stays out of this scope).

## Why this matters (anchor)

Hou-Xue-Zhang (2020) RFS replication-crisis evidence emphasizes
survivorship as the PRIMARY failure mode in factor-zoo work. Every
PBO/DSR number that doesn't carry universe provenance is suspect
— pre-Phase-4.6 we couldn't tell, post-Phase-4.6 we can. This PR
is the first place that knowledge surfaces in the validation metric
dict.

## Hard rules preserved

- Rule 9 (schema triple) — N/A; no Pydantic / TS / snapshot touched
  in this PR. Metadata wiring is deferred to a separate writer PR
  per scope discipline.
- Rule 16 (composite formula sacred) — N/A; no scoring change.
- Rule 18 (observability before wiring) — diagnostic surface
  (metrics dict keys) ships in same PR as the integration; consumer
  code (writer) lands next, after this metric surface lives in
  production for ≥ 1 cron cycle.
- License: no new deps. Pure forward TYPE_CHECKING import for type
  hinting `MembershipResult` (zero runtime cost).

## Verification

- `ruff check` — clean (linter moved `Callable` to
  `collections.abc` per modern convention; deliberate, not reverted)
- `pytest tests/test_validation/test_pbo_dsr.py
  tests/test_ingest/test_historical_universe.py
  tests/test_config.py` — 59 passed in 9.34s
- Test count delta: +6 (universe-provider integration cases)
- Backward-compat case explicitly tested (no universe kwargs →
  legacy behavior preserved)

## NOT in this PR (deferred to follow-ups)

- `compute/main.py` writer wiring to populate
  `Metadata.universe_membership_as_of` + `survivorship_bias_corrected`
  in the forward-cron output JSON
- Re-validation of existing pillars + `manipulation_index` with
  the historical universe (likely revises some published baselines
  DOWNWARD — explicit honest-correction PR)
- Verify-helper Section M for universe-provenance accounting
  equation

@vercel

vercel Bot commented May 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| quantrank | Ready | Preview, Comment | May 27, 2026 12:10pm |

@github-actions

Pre-merge production simulation

| Field | Value |
|---|---|
| Duration | 369s |
| Universe size | 502 |
| Schema version | 0.10.7-phase4.6 |
| Compute commit | a593b607397fcf9979f8cbc86b0ac3639c8d2279 |
| PR-branch output | pr-275-compute-output (14-day retention) |

Diff vs main

| Field | Main | PR | Δ |
|---|---|---|---|
| Universe size | 502 | 502 | +0 |
| Schema version | 0.10.6-phase4.5e | 0.10.7-phase4.6 | ⚠️ bumped |

Main baseline: 2026-05-26T23:19:25Z (0.5 days old)

Top-10 movers (sorted by |Δcomposite_score|)

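| Ticker | PR rank | main rank | Δrank | PR score | main score | Δscore |
|---|---|---|---|---|---|---|
| ATO | 374 | 378 | +4 | 44.23 | 43.92 | +0.31 |
| PH | 143 | 147 | +4 | 55.72 | 55.49 | +0.23 |
| ES | 284 | 285 | +1 | 49.43 | 49.25 | +0.18 |
| ED | 215 | 210 | -5 | 52.10 | 52.22 | -0.12 |
| EQIX | 367 | 368 | +1 | 45.48 | 45.38 | +0.10 |
| RJF | 160 | 160 | +0 | 54.68 | 54.59 | +0.09 |
| RMD | 21 | 18 | -3 | 65.76 | 65.85 | -0.09 |
| WEC | 361 | 362 | +1 | 45.81 | 45.73 | +0.08 |
| HD | 255 | 255 | +0 | 50.74 | 50.82 | -0.08 |
| AVB | 72 | 73 | +1 | 59.84 | 59.77 | +0.07 |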

@dackclup dackclup marked this pull request as ready for review May 27, 2026 12:23
@dackclup dackclup merged commit 78ab1d7 into main May 27, 2026
5 checks passed
@dackclup dackclup deleted the claude/phase-4.6-pbo-dsr-universe-provider branch May 27, 2026 12:23
dackclup added a commit that referenced this pull request May 27, 2026
…tadata in forward cron (#276)

Closes the last leg of the Phase 4.6 chain. PR #274 landed the
`historical_universe.members_at()` module + 2 nullable Metadata
fields. PR #275 wired `universe_provider` into `pbo_dsr.factor_passes_gates()`
so validation gates carry honest provenance. This PR makes the
forward-cron `metadata.json` output ACTUALLY populate those fields
instead of leaving them None.

## What changed

- `compute/main.py` — `Metadata(...)` construction now passes:
  - `universe_membership_as_of=now.date().isoformat()` (today's date
    — forward cron scores as-of today)
  - `survivorship_bias_corrected=True` (today's S&P 500 IS the
    honest universe for an as-of-today query, per the PR #274
    schema docstring semantic — True means "this output's universe
    assumption is honest for its as_of_date")

- `tests/test_output/test_writer.py` — 2 new round-trip tests:
  - Phase 4.6 happy path: both fields survive Pydantic → JSON
  - Legacy snapshot back-compat: when neither field is passed
    (pre-0.10.7 caller pattern), Pydantic defaults to None and JSON
    writes nulls
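
The two round-trip cases can be pictured with a stdlib stand-in. The sketch below uses a plain dataclass rather than the real Pydantic `Metadata` model, and the `schema_version` field and `to_json` helper are illustrative assumptions; only the two universe fields are taken from the description above.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class MetadataSketch:
    """Dataclass stand-in for the Pydantic model (not the real schema)."""
    schema_version: str  # assumed field, included only to make the object non-trivial
    universe_membership_as_of: Optional[str] = None
    survivorship_bias_corrected: Optional[bool] = None


def to_json(meta: MetadataSketch) -> str:
    """Serialize every field, so unset optionals are written as JSON null."""
    return json.dumps(asdict(meta), sort_keys=True)


# Phase 4.6 caller: both fields supplied and expected to survive the round trip.
new_style = MetadataSketch(
    "0.10.7-phase4.6",
    universe_membership_as_of="2026-05-28",
    survivorship_bias_corrected=True,
)

# Legacy caller pattern: neither field passed, so both default to None.
legacy = MetadataSketch("0.10.6-phase4.5e")
```

Parsing `to_json(legacy)` back yields None for both universe fields, which is the legacy behavior the second test pins.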

## Hard rules preserved

- ✅ Rule 9 (schema triple) — no schema change in this PR (fields
  already in schemas.py + types.ts + snapshot from PR #274)
- ✅ Rule 16 — N/A (no scoring change)
- ✅ Rule 18 — observability surface from PR #274 is now actually
  populated; consumers can branch on it
- ✅ No new deps
- ✅ No new env-vars

## Verification

- `ruff check compute/main.py tests/test_output/test_writer.py` — clean
- `python -m compute.output.schema_check` — Schema snapshot in sync
- `python -m pytest tests/test_output/test_writer.py -k metadata` —
  4 passed (2 existing + 2 new)

## What goes live on next cron

Next weekday cron (Thu 2026-05-28 22:00 UTC) writes:

    metadata.json:
      ...
      universe_membership_as_of: "2026-05-28"
      survivorship_bias_corrected: true

Backward compat: legacy snapshots (pre-0.10.7) still have these
fields as null per the Pydantic optional default.

## Closes the Phase 4.6 chain

| Layer | PR | Status |
|---|---|---|
| Module | #274 | members_at() + CSV + tests |
| Schema | #274 | Metadata fields + types.ts + snapshot |
| Validation gate | #275 | universe_provider kwarg in pbo_dsr |
| **Writer** | **this PR** | **forward cron populates Metadata** |

## NOT in this PR (next follow-ups)

- Honest re-validation of existing pillars + manipulation_index with
  historical universe (likely shifts PBO/DSR baselines DOWN 5-15%)
- Verify-helper Section M for universe-provenance accounting equation
- Backtest harness that consumes the new universe_provider end-to-end

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 27, 2026
…er (#278)

Closes follow-up #2a from
``docs/research/historical-revalidation-harness.md``. Pure-function +
cached git-archive walker that reconstructs the daily ``composite_score``
time series per ticker from the cron's committed rankings.json
snapshots.

This is the **first data source** for the downstream Phase 4.6
task #2 chain:
- #2c per-pillar IC at historical dates → consumes this DataFrame
- #2e manipulation_index distribution shift report → consumes this
  DataFrame
- #2f honest-baseline report → consumes the chain output

## What ships

- `compute/validation/ranking_history.py` (NEW, ~190 LOC):
  - `list_ranking_commits(start, end, path)` — enumerates (sha, date)
    tuples from `git log` with optional ISO-date filter window
  - `dedupe_by_date(commits, prefer="latest"|"earliest")` — collapses
    multiple commits per day to one (default: latest)
  - `load_snapshot_at(sha, path)` — `git show SHA:path` →
    parsed JSON list. Returns empty list (not raise) when SHA pre-dates
    the file's existence, per the standard cron's chore-commit pattern.
  - `load_ranking_history(start, end, columns, dedupe_dates=True)` —
    high-level orchestrator returning `(date, ticker)` MultiIndex
    DataFrame with the requested columns. Default columns =
    `("composite_score",)`. Caller-extensible to grab
    `composite_score_adjusted` / `current_price` / etc.

- `tests/test_validation/test_ranking_history.py` (NEW, 18 tests):
  - 5 dedupe_by_date helper cases (latest / earliest / invalid /
    empty / sort)
  - 3 list_ranking_commits live-git cases (smoke / date-filter /
    empty-window)
  - 2 load_snapshot_at live-git cases (HEAD parses / unknown-SHA →
    empty list)
  - 5 load_ranking_history live-git cases (smoke / default columns /
    custom columns / empty window / dedupe behavior)
  - 1 canonical-path constant pin
  - 2 monkeypatched edge cases: malformed-JSON snapshot skipped;
    rows missing required column skipped
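
As an illustration of the per-day collapse, here is a minimal sketch of what a `dedupe_by_date`-style helper might look like. The `_sketch` name, the `(sha, ISO date)` tuple layout, and the assumption that commits for the same day arrive oldest-first are all choices made for this example, not details confirmed by the module.

```python
from typing import Dict, Iterable, List, Tuple

Commit = Tuple[str, str]  # (sha, ISO date string such as "2026-05-22")


def dedupe_by_date_sketch(commits: Iterable[Commit], prefer: str = "latest") -> List[Commit]:
    """Keep one commit per date; input is assumed oldest-first within a date."""
    if prefer not in ("latest", "earliest"):
        raise ValueError(f"prefer must be 'latest' or 'earliest', got {prefer!r}")
    chosen: Dict[str, str] = {}
    for sha, day in commits:
        # 'latest' overwrites on every sighting; 'earliest' keeps the first one seen.
        if prefer == "latest" or day not in chosen:
            chosen[day] = sha
    # ISO dates sort chronologically as plain strings.
    return [(sha, day) for day, sha in sorted(chosen.items())]
```

Note that plain `git log` lists newest commits first, so a real caller would need to reverse that output (or flip the overwrite rule) before relying on this ordering assumption.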

## Subprocess safety

Every `git` call goes through `subprocess.run` with list-arg argv (no
`shell=True`). SHA + path inputs are caller-supplied but never
interpolated into a shell string, so the standard shell-injection
vector is closed.
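
A minimal sketch of the list-argv pattern described above, assuming a `load_snapshot_at`-like helper that returns an empty list on any failure. The `_sketch` name and the exact set of failure cases handled here are assumptions for illustration.

```python
import json
import subprocess
from typing import List


def load_snapshot_at_sketch(sha: str, path: str) -> List[dict]:
    """Read `path` as of commit `sha`; return [] when it cannot be read."""
    try:
        proc = subprocess.run(
            ["git", "show", f"{sha}:{path}"],  # argv list: no shell parses this string
            capture_output=True,
            text=True,
            check=False,  # inspect the return code instead of raising
        )
    except OSError:
        return []  # git binary missing or not executable
    if proc.returncode != 0:
        return []  # unknown SHA, or the file did not exist at that commit
    try:
        data = json.loads(proc.stdout)
    except json.JSONDecodeError:
        return []  # malformed snapshot (assumed handling)
    return data if isinstance(data, list) else []
```

The list form keeps shell metacharacters inert, but a value beginning with `-` could still be read by git as an option, so validating the SHA format before the call may be worth considering.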

## Hard rules preserved

- ✅ Rule 9 — no schema change (read-only consumer of existing JSON)
- ✅ Rule 16 — N/A (no scoring change)
- ✅ Rule 18 — diagnostic ships in same PR (the loader IS the diagnostic
  surface for downstream IC / distribution-shift work)
- ✅ License — pure stdlib + pandas + subprocess; no new deps
- ✅ No frontend touched

## Schema-compat

Handles every rankings.json shape from `0.5.x` onward (pre-LedgerCraft
reskin + Phase 4.5e + Phase 4h.x + Phase 3c). Top-level keys `ticker`
+ `composite_score` are present in every historical snapshot per
``compute/output/writer.py``. Rows missing those default-required keys
get a debug log + are silently skipped (rare; would indicate a
pre-rankings.json commit or a schema-break we haven't shipped).

## Verification

- `pytest tests/test_validation/test_ranking_history.py` — 18/18 pass
  in 3.95s (5 use real git; 13 use stdlib mocks)
- Full `pytest tests/test_validation/` — 83/83 pass in 50.24s (no
  regressions on #2 first-unit drift tests / #2-precursor pbo_dsr
  universe-provider tests / pre-existing ic_decay + osap_validation)
- `ruff check` — clean
- Test count delta: +18

## Next in the chain

Per `docs/research/historical-revalidation-harness.md` Future-work
TODO:

- #2b forward-return computation per ticker from `compute/cache/prices/`
  (0.5d) — gitignored cache; needs warm CI run
- #2c per-pillar IC at historical dates (1d, needs 2a + 2b)
- #2d PBO/DSR re-baseline via PR #275's `universe_provider` kwarg
  (1d, needs 2c)
- #2e `manipulation_index` distribution shift report (0.5d, needs 2a)
- #2f `docs/research/honest-baseline-2026-05-27.md` (0.5d, needs 2d)

## NOT in this PR

- The DataFrame is built from git history; no IC math, no PBO/DSR
  call sites, no chart rendering. Downstream PRs land that.
- No CLI wrapper. Direct module import expected; if a CLI feels
  warranted later, it can wrap this in 30 LOC + 5 tests.

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 27, 2026
…n shift (#279)

Closes follow-up #2e from
``docs/research/historical-revalidation-harness.md``. Consumes the
#2a rankings.json time-series loader (PR #278) and reports the
manipulation_index distribution shift across the cron's lifetime:
mean / std / quantiles + fire-rate by band (LOW [0,20), MODERATE
[20,50), HIGH [50,∞)) per date + window-end deltas.
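
The band partition above can be sketched as follows. The constant and function names are hypothetical, and the summary is reduced to a few fields; only the half-open band boundaries come from the description.

```python
import statistics
from typing import Dict, Mapping

LOW_MAX = 20.0       # LOW is [0, 20)
MODERATE_MAX = 50.0  # MODERATE is [20, 50); HIGH is [50, inf)


def band_of(score: float) -> str:
    """Map a manipulation_index value to its band using half-open intervals."""
    if score < LOW_MAX:
        return "LOW"
    if score < MODERATE_MAX:
        return "MODERATE"
    return "HIGH"


def summarize(scores: Mapping[str, float]) -> Dict[str, object]:
    """Per-date summary: count, mean, band counts, and the 3 highest tickers."""
    values = list(scores.values())
    counts = {"LOW": 0, "MODERATE": 0, "HIGH": 0}
    for value in values:
        counts[band_of(value)] += 1
    top3 = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:3]
    return {
        "n": len(values),
        "mean": statistics.fmean(values) if values else None,
        "band_counts": counts,
        "top3": top3,
    }
```

With scores of 84, 64, and 48, the first two land in HIGH and the third in MODERATE, since 48 falls below the 50 boundary.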

This answers the honest question — **has the cohort of flagged stocks
materially changed across the cron's history?** A universe-mean drift
> ~5 pts would signal Phase 4.5e weight recalibration is needed (per
Q3 2026-08-19 cohort-audit gate).

## What ships

- `compute/validation/manipulation_distribution.py` (NEW, ~250 LOC):
  - `DistributionSummary` dataclass — per-date snapshot (n / mean / std
    / median / q25 / q75 / q95 / max + band counts + top-3 tickers)
  - `ShiftReport` dataclass — aggregate across window + first-to-last
    deltas (mean_delta / std_delta / high_count_delta) + note
  - `compute_manipulation_distribution_shift(start_date, end_date,
    repo=None)` — main entry; pure-function wrapping #2a's loader
  - `format_shift_report(report, max_dates=20)` — human-readable text
    rendering; truncates long windows with "..." marker

- `tests/test_validation/test_manipulation_distribution.py` (NEW, 11 tests):
  - Band-boundary constants pin (LOW < 20 ≤ MODERATE < 50 ≤ HIGH)
  - `_summarize_one_date` band partition correctness + empty + top-3
    ordering
  - `compute_shift` empty-window / all-null-window / single-date /
    two-date-with-deltas paths (monkeypatched loader)
  - `format_shift_report` header + delta + cap rendering
  - Live-git smoke against the repo's recent cron commits

## Real-world artifact (live repo, window 2026-05-01 → 2026-05-27)

3 cron dates available on main:

    date          n   mean    std    p75    p95   HIGH   top
    2026-05-22  502   4.38   9.28   5.00  25.00      2   SMCI=84.0, WAT=64.0, NVDA=48.0
    2026-05-23  502   4.38   9.28   5.00  25.00      2   SMCI=84.0, WAT=64.0, NVDA=48.0
    2026-05-26  502   4.38   9.28   5.00  25.00      2   SMCI=84.0, WAT=64.0, NVDA=48.0

    Δmean=+0.00, Δstd=+0.00, ΔHIGH=+0 (2 → 2)

Distribution is **stable** across the window (expected for 5-day
horizon — pillar inputs change slowly). Top-3 invariant: SMCI 84,
WAT 64, NVDA 48 — matches Phase 4.5f production-verified
``ManipulationRiskCard`` fire-rate snapshot. Universe mean 4.38 sits
solidly in LOW band; only 2 tickers in HIGH band (Phase 4.5f spec
target: 1-3 stocks). No recalibration signal.

A longer window (≥ 90 days) would let this report detect drift; the
chain is now ready when cron history accumulates.

## Hard rules preserved

- ✅ Rule 9 — no schema change (read-only consumer of rankings.json)
- ✅ Rule 16 — N/A (no scoring change)
- ✅ Rule 18 — diagnostic surface ships in same PR
- ✅ License — pure stdlib + pandas; no new deps
- ✅ Universe S&P 500 only

## Verification

- `pytest tests/test_validation/test_manipulation_distribution.py` —
  11/11 pass in 0.70s
- `ruff check` — clean (linter trimmed unused imports + sorted)
- Live-git artifact above generated cleanly

## Next in the chain

Per ``docs/research/historical-revalidation-harness.md`` Future-work
TODO:

- #2b forward-return computation from `compute/cache/prices/` (0.5d) —
  gitignored cache; CI-only data
- #2c per-pillar IC at historical dates (1d, needs 2a + 2b)
- #2d PBO/DSR re-baseline via PR #275's `universe_provider` kwarg
  (1d, needs 2c)
- #2f honest-baseline report (0.5d, needs 2d)

#2e (this PR) is independent of 2b/2c — could ship before, in
parallel with, or after them.

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 27, 2026
…tes (#281)

New `compute/validation/historical_ic.py` orchestrator pairs PR #278's
`load_ranking_history` (ranking at T) with PR #280's
`compute_forward_returns_batch` (realized return at T + horizon) and
computes per-pillar Spearman IC across the historical window —
closes the IC re-baseline half of the Phase 4.6 chain.

API:
- `compute_pillar_ic(scores, returns, *, method, min_tickers)`
  pure cross-sectional IC for one (pillar, date) pair
- `compute_historical_ic_report(start, end, *, horizon_months,
  pillars, ...)` walks rankings.json snapshots + forward returns
  cache, aggregates into `HistoricalICReport`
- `format_ic_report(report)` human-readable text rendering
- `PillarICEntry` / `PillarICSummary` / `HistoricalICReport`
  three frozen-dataclass carriers

Spearman computed as Pearson on rank-transformed series (Spearman
1904 + Conover 1999 §5.4) to avoid pulling scipy into the dep set
(QuantRank ships without scipy; pandas' `Series.corr(method=
'spearman')` requires it transitively).

Drops with descriptive notes:
- cross-section < MIN_TICKERS_PER_DATE = 30 (Grinold-Kahn 2000 §4.2)
- None / NaN / inf in either input
- constant inputs (std=0 → correlation undefined)

Aggregates per pillar: mean / std / median / min / max / IC IR /
hit-rate. IC IR = mean/std × sqrt(n_dates) (Grinold-Kahn 2000 §4.4).
Hit-rate = fraction of dates with strictly positive IC.
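
The aggregation can be sketched as below, using the IC IR formula exactly as stated above. The function name and returned keys are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which estimator is pinned.

```python
import math
import statistics
from typing import Dict, Optional, Sequence


def summarize_ic(ics: Sequence[float]) -> Dict[str, Optional[float]]:
    """Aggregate a per-date IC series into mean, std, IC IR, and hit-rate."""
    n = len(ics)
    if n == 0:
        return {"n_dates": 0, "mean": None, "std": None, "ic_ir": None, "hit_rate": None}
    mean = statistics.fmean(ics)
    std = statistics.stdev(ics) if n > 1 else None  # sample std: an assumption
    # IC IR as written in the commit message: mean / std * sqrt(n_dates).
    ic_ir = mean / std * math.sqrt(n) if std else None
    # Strictly positive only: a date with IC exactly 0 is not a hit.
    hit_rate = sum(1 for value in ics if value > 0) / n
    return {"n_dates": n, "mean": mean, "std": std, "ic_ir": ic_ir, "hit_rate": hit_rate}
```

Because of the sqrt(n_dates) factor, this quantity grows with the length of the window for a fixed mean and std, much like a t-statistic of the mean IC, so values from windows of different lengths are not directly comparable.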

Honest-baseline disclaimer per Research Report v1.0:
- IC reported here is NAIVE — no costs / slippage / sector
  neutralization. Real net-of-cost IC typically 30-50% smaller per
  McLean-Pontiff 2016 JF post-publication decay
- The historical universe MUST come from PR #274 members_at() to
  avoid survivorship bias (Hou-Xue-Zhang 2020 RFS); orchestrator
  reads the historical universe FROM rankings.json at as-of T which
  is correct by construction (snapshot itself is historical universe)
- Report is a TIME SERIES + summary, not a single headline number

Tests: 28 new (28 passing). Coverage: module constants, pure IC
computation edge cases (perfect ±1.0, constant inputs, NaN drops,
below-min cross-section, method validation), summary aggregation
math (IC IR formula pinned, hit-rate semantics), orchestrator
full-path (one date / multi-date / missing pillar / malformed JSON),
text rendering, and a live-git smoke that auto-degrades gracefully
when the gitignored price cache is absent.

Schema impact: zero. No new Pydantic / TS / snapshot field.
Production-wiring impact: zero. No compute/main.py import. The
orchestrator is purely a validation / re-baseline tool. Downstream
PRs (#2d PBO/DSR re-baseline + #2f honest-baseline report) consume
the output.

Phase 4.6 chain status: 5 of 6 items now landed (#1/#2 PR #277, #2a
PR #278, #2b PR #280, #2c this PR, #2e PR #279; #2d gate kwarg
shipped PR #275). The #2d PBO/DSR re-baseline needs a warm-CI execution
to publish actual numbers; the #2f honest-baseline doc closes the chain.

PHASE_STATUS_INFLIGHT.md updated per PR #237 side-file convention.
Harness doc TODO list updated: 5 of 6 items now landed.

https://claude.ai/code/session_01AGU8d6pm4u2fQQ5cebg9qa

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 27, 2026
…on + CLI (closing the chain) (#282)

Closes the Phase 4.6 honest re-validation harness structurally — 6 of
6 chain items now landed. The only remaining work is a warm-CI
execution session that fills the TBD numeric cells.

New artifacts:

- docs/research/honest-baseline-2026-05-27.md — 10-section skeleton
  with TBD numeric cells in §2 (per-pillar IC) / §3 (PBO/DSR) / §4
  (manipulation distribution) / §5 (survivorship-bias delta).
  Methodology + framing + honest-α ceiling + disclaimer ladder
  final-form. Citation block: Hou-Xue-Zhang 2020 RFS, McLean-Pontiff
  2016 JF, Bailey-Lopez de Prado 2014 JPM, Bailey-Borwein-Lopez de
  Prado-Zhu 2014 AMS Notices, Grinold-Kahn 2000, Spearman 1904,
  Conover 1999, Kissell-Glantz 2003.
- scripts/generate_honest_baseline.py — argparse CLI that wires
  compute_historical_ic_report (PR #281) + compute_manipulation_
  distribution_shift (PR #279) end-to-end. Text mode emits the
  disclaimer banner to stderr; JSON mode embeds __banner__ in the
  payload. Exit codes: 0 (report produced), 1 (input validation),
  2 (empty report — useful CI signal).
- tests/test_validation/test_generate_honest_baseline_cli.py —
  17 tests: argparse shape, _parse_date, exit codes, banner emission,
  JSON payload shape + α ceiling cells + disclaimer string, banner
  embedding, _report_to_payload with synthetic + populated manip
  reports, and a constant pin on the banner's 5 mandatory phrases
  (NAIVE / McLean-Pontiff / 2-5% / Rule 16 / S&P 500).
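
The exit-code contract can be sketched with a small argparse entry point. Everything named here is illustrative: the `--start`, `--end`, and `--json` flags, the injected `build_report` callable, and the banner text are assumptions, not the script's actual interface. Only the three exit codes, the stderr banner in text mode, and the `__banner__` payload key come from the description above.

```python
import argparse
import json
import sys
from datetime import date
from typing import Callable, Optional, Sequence

EXIT_OK, EXIT_BAD_INPUT, EXIT_EMPTY = 0, 1, 2
BANNER = "NAIVE figures: no costs, slippage, or sector neutralization."  # placeholder text


def _parse_date(text: str) -> date:
    return date.fromisoformat(text)  # raises ValueError on malformed input


def main(
    argv: Optional[Sequence[str]] = None,
    build_report: Callable[[date, date], dict] = lambda s, e: {"n_dates_with_ic": 0},
) -> int:
    parser = argparse.ArgumentParser(description="honest-baseline CLI sketch")
    parser.add_argument("--start", required=True)        # hypothetical flag
    parser.add_argument("--end", required=True)          # hypothetical flag
    parser.add_argument("--json", action="store_true")   # hypothetical flag
    args = parser.parse_args(argv)
    try:
        start, end = _parse_date(args.start), _parse_date(args.end)
    except ValueError as exc:
        print(f"invalid date: {exc}", file=sys.stderr)
        return EXIT_BAD_INPUT
    if start > end:
        print("start must not be after end", file=sys.stderr)
        return EXIT_BAD_INPUT
    report = build_report(start, end)
    if args.json:
        print(json.dumps({"__banner__": BANNER, **report}))  # banner travels in the payload
    else:
        print(BANNER, file=sys.stderr)  # keeps stdout clean for the report itself
        print(report)
    if report.get("n_dates_with_ic", 0) == 0:
        return EXIT_EMPTY  # report produced but empty: a useful CI signal
    return EXIT_OK
```

One caveat of this sketch: argparse itself exits with status 2 on usage errors such as a missing required flag, which would be indistinguishable from the empty-report code unless the parser's error handling is overridden.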

Honest-baseline disclaimer per Research Report v1.0 autonomous
mission:
- IC / PBO / DSR figures NAIVE — no costs / slippage / sector
  neutralization
- Real net-of-cost IC typically 30-50% smaller per McLean-Pontiff
  2016 JF 32% post-publication decay
- Honest net α ceiling: 2-5% per year (hard-coded into doc + JSON)
- Composite formula sacred (Rule 16) — never replayed retroactively
- Universe = S&P 500 (502) only
- No trade recommendation of specific tickers — methodological
  report only

Schema impact: zero. No new Pydantic / TS / snapshot field.
Production-wiring impact: zero. No compute/main.py import.

Smoke run against real repo's recent rankings.json (no live price
cache) — orchestrator walks 3 commits, returns n_dates_with_ic=0,
exit code 2 surfaces missing-cache signal cleanly.

Phase 4.6 chain status — 6 of 6 items structurally landed:
- #1/#2 universe-drift first unit: PR #277
- #2a ranking_history loader: PR #278
- #2b forward_returns loader: PR #280
- #2c per-pillar IC orchestrator: PR #281
- #2d PBO/DSR gate kwarg: PR #275 (warm-CI execution pending)
- #2e manipulation_distribution shift: PR #279
- #2f honest-baseline skeleton + CLI: this PR

Deferred follow-ups (NOT in this PR): warm-CI execution session,
--markdown writer mode, --include-pbo-dsr factor-return wiring.

PHASE_STATUS_INFLIGHT.md updated per PR #237 side-file convention.
Harness doc TODO list: 6 of 6 items now landed.

https://claude.ai/code/session_01AGU8d6pm4u2fQQ5cebg9qa

Co-authored-by: Claude <noreply@anthropic.com>