feat(validation): Phase 4.6 task #2 first unit — universe-drift harness by dackclup · Pull Request #277 · dackclup/quantrank

dackclup · 2026-05-27T12:56:57Z

Summary

Closes the first leg of Phase 4.6 task #2 (honest re-validation). Pure-function module + CLI scaffolding that answers the foundational question: what's the universe drift between today and any historical as-of date? Without this, every re-run of IC (information coefficient), PBO (probability of backtest overfitting), or DSR (deflated Sharpe ratio) is just numbers: you can't tell whether the delta vs published baselines comes from (a) survivorship correction, (b) scoring drift, or (c) real factor decay.

Future PRs in this chain layer per-pillar IC / PBO / DSR re-baselines on top.

What ships

File	Lines	Role
`compute/validation/universe_drift.py`	~150	`compute_universe_drift(as_of_date, current_universe) → UniverseDriftReport`. 3-way partition (`added_since` / `removed_since` / `unchanged`) + size + completeness diagnostic. `format_drift_report()` for human-readable rendering.
`scripts/historical_pillar_revalidate.py`	~110	CLI: `--as-of`, `--json`, `--no-fetch-universe` (CI/offline smoke). Exit codes 0/1/2.
`tests/test_validation/test_universe_drift.py`	~210	11 tests covering added/removed/unchanged correctness, partition invariants, degraded mode, future-date guard, text rendering, frozen-dataclass.
`docs/research/historical-revalidation-harness.md`	~80	Methodology, CLI usage, 6-item TODO list for next PRs.
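
As a rough illustration of the 3-way partition described in the table, here is a minimal sketch. It is not the module's code: `DriftSketch`, `drift_sketch`, the `historical_members` argument, and the `T1`..`T6` placeholder tickers are assumptions made for the example (the real function takes only `as_of_date` and `current_universe` and looks up the historical membership itself).

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Iterable


@dataclass(frozen=True)
class DriftSketch:
    """Illustrative stand-in for UniverseDriftReport; field names are assumed."""

    as_of_date: date
    added_since: frozenset[str]    # in the current universe, not in the historical one
    removed_since: frozenset[str]  # in the historical universe, not in the current one
    unchanged: frozenset[str]      # in both
    is_complete: bool


def drift_sketch(
    as_of_date: date,
    historical_members: Iterable[str],
    current_universe: Iterable[str],
    *,
    anchor_date: date | None = None,
    is_complete: bool = True,
) -> DriftSketch:
    """Partition tickers into added / removed / unchanged relative to as_of_date."""
    anchor = anchor_date or datetime.now(timezone.utc).date()
    if as_of_date > anchor:
        raise ValueError(f"as_of_date {as_of_date} is after anchor date {anchor}")
    hist = frozenset(historical_members)
    cur = frozenset(current_universe)
    return DriftSketch(as_of_date, cur - hist, hist - cur, cur & hist, is_complete)


# The removed cohort is taken from the smoke output; T1..T6 are placeholder tickers.
removed = {"AAP", "ATVI", "BIO", "BLL", "DISH", "ETSY", "LNC", "WHR", "ZION"}
always_in = {"T1", "T2", "T3", "T4", "T5", "T6"}
report = drift_sketch(
    date(2023, 6, 1),
    removed | always_in,
    always_in | {"SMCI"},
    anchor_date=date(2026, 5, 27),
)
```

By construction, `added_since | unchanged` equals the current universe and `removed_since | unchanged` equals the historical one, which is the partition invariant the test suite is described as checking.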

CLI smoke output (real run with `--no-fetch-universe --as-of 2023-06-01`, against the 7-ticker synthetic universe)

ADDED since as_of   : 1 tickers
  SMCI
REMOVED since as_of : 9 tickers
  AAP, ATVI, BIO, BLL, DISH, ETSY, LNC, WHR, ZION
  ↑ this is the SURVIVORSHIP-BIAS-CORRECTED cohort —
    current-universe-only views silently EXCLUDE these
UNCHANGED           : 6 tickers (always-in cohort)

These 9 are the exact cohort an honest backtest at as-of 2023-06-01 must include. Pre-Phase-4.6 work silently dropped them.
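
The description of `format_drift_report()` mentions a `max_listed` cap that yields a "+N more" suffix for long ticker lists. A minimal sketch of that capping behaviour follows; the helper name `render_ticker_line`, its default cap, and the exact line layout are assumptions, not the module's actual rendering.

```python
from typing import Iterable


def render_ticker_line(label: str, tickers: Iterable[str], *, max_listed: int = 10) -> str:
    """Render a count header plus at most max_listed tickers (hypothetical helper)."""
    ordered = sorted(tickers)
    shown = ", ".join(ordered[:max_listed])
    hidden = len(ordered) - max_listed
    # Only append the suffix when some tickers were actually cut off.
    suffix = f", +{hidden} more" if hidden > 0 else ""
    return f"{label}: {len(ordered)} tickers\n  {shown}{suffix}"


removed_cohort = ["AAP", "ATVI", "BIO", "BLL", "DISH", "ETSY", "LNC", "WHR", "ZION"]
capped = render_ticker_line("REMOVED since as_of", removed_cohort, max_listed=3)
full = render_ticker_line("REMOVED since as_of", removed_cohort)
```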

What this PR does NOT do (next PRs in the chain)

Per the docs file's "Future-work TODO list":

#	Item	Effort	Blocker
1	Git-archived `rankings.json` time-series loader	1d	—
2	Forward-return computation from `compute/cache/prices/`	0.5d	gitignored cache
3	Per-pillar IC at historical dates	1d	needs 1+2
4	PBO/DSR re-baseline via `factor_passes_gates(universe_provider=...)`	1d	needs 3
5	`manipulation_index` distribution shift report	0.5d	needs 1
6	`docs/research/honest-baseline-2026-05-27.md` revised numbers	0.5d	needs 4

Total: ~4-5d focused dev to honest-baseline report.

Hard rules preserved

✅ Rule 9 — no schema change (validation-internal module)
✅ Rule 16 — N/A (no scoring change)
✅ Rule 18 — diagnostic surface ships in same PR as the module
✅ License — no new deps; CSV already on disk from PR feat(universe): Phase 4.6 — survivorship-bias fix (historical S&P 500 membership) #274
✅ No frontend touched

Verification

Check	Result
`pytest tests/test_validation/test_universe_drift.py`	✅ 11/11 pass in 0.06s
`ruff check` (new files)	✅ clean
CLI smoke (clean)	exit 0, expected 9-ticker REMOVED cohort
CLI degraded (`--as-of 2010-01-01`)	exit 1, loud warning ✅
Test count delta	+11
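
The exit-code contract above (0 clean, 1 degraded, 2 usage) could be wired roughly as below. This is a sketch under assumptions: `build_parser` and `exit_code_for` are hypothetical names, and it leans on `argparse`'s standard behaviour of exiting with status 2 on usage errors, including an unparseable `--as-of` value.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    """Flag names follow the PR description; everything else is illustrative."""
    parser = argparse.ArgumentParser(prog="historical_pillar_revalidate")
    # date.fromisoformat raises ValueError on bad input, which argparse
    # turns into a usage error (exit status 2).
    parser.add_argument("--as-of", required=True, type=date.fromisoformat,
                        metavar="YYYY-MM-DD")
    parser.add_argument("--json", action="store_true",
                        help="emit machine-readable output")
    parser.add_argument("--no-fetch-universe", action="store_true",
                        help="offline / CI / smoke runs")
    return parser


def exit_code_for(is_complete: bool) -> int:
    """0 for a clean report, 1 when the report is degraded (is_complete=False)."""
    return 0 if is_complete else 1


args = build_parser().parse_args(["--as-of", "2023-06-01", "--no-fetch-universe"])
```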

Methodology

Hou, Xue, Zhang (2020). "Replicating Anomalies." Review of Financial Studies 33(5):2019-2133.
McLean, Pontiff (2016). "Does Academic Research Destroy Stock Return Predictability?" Journal of Finance 71(1):5-32.
License: factual list (uncopyrightable per Feist v. Rural Tel. Service Co., 1991).

Test plan

ruff check . — clean
pytest tests/test_validation/test_universe_drift.py — 11/11 pass
CLI smoke output verified end-to-end
CI (Python lint+test, Frontend build, simulate) — expected green; simulate fires because compute/validation/ touched

Subscribe-after-open suggestion: same pattern as #271-#276.

Generated by Claude Code

Closes the first leg of honest re-validation per Research Report v1.0 §7.4 task #2. Pure-function + CLI scaffolding that answers the foundational question: **what's the universe drift between today and any historical as-of date?** Future PRs in this chain layer per-pillar IC / PBO / DSR re-baselines on top.

## What ships

- `compute/validation/universe_drift.py` (NEW, ~150 LOC): `compute_universe_drift(as_of_date, current_universe) -> UniverseDriftReport`. Wraps `historical_universe.members_at()` and returns the 3-way symmetric-difference partition: `added_since` / `removed_since` / `unchanged` plus size + completeness diagnostic. `format_drift_report()` renders the human-readable text block with `+N more` cap for long ticker lists.
- `scripts/historical_pillar_revalidate.py` (NEW): CLI wrapper:
  - `--as-of YYYY-MM-DD` (required)
  - `--json` for downstream tooling
  - `--no-fetch-universe` for offline / CI / smoke runs
  - Exit codes: 0 (clean) / 1 (degraded `is_complete=False`) / 2 (usage)
- `tests/test_validation/test_universe_drift.py` (NEW, 11 tests):
  - Added-since contains recent additions (SMCI, DASH, FSLR, PANW)
  - Removed-since contains delistings (SVB 2023-03-13)
  - Anchor-date = zero drift
  - Pre-EARLIEST_EVENT_DATE = is_complete=False, degraded
  - Future date raises ValueError
  - Partition + size invariants (added+unchanged = current; removed+unchanged = historical)
  - Text rendering contains 4 required section labels
  - max_listed cap produces "+N more" suffix
  - Dataclass is frozen
  - anchor_date default = today UTC
- `docs/research/historical-revalidation-harness.md` (NEW): methodology, CLI usage with sample output, acceptance criteria for next 6 follow-up PRs, caveats

## CLI smoke output (2023-06-01, 7-ticker synthetic universe)

    ADDED since as_of   : 1 tickers
      SMCI
    REMOVED since as_of : 9 tickers
      AAP, ATVI, BIO, BLL, DISH, ETSY, LNC, WHR, ZION
      ↑ this is the SURVIVORSHIP-BIAS-CORRECTED cohort —
        current-universe-only views silently EXCLUDE these

These 9 are the exact cohort an honest backtest at as-of 2023-06-01 must include. Current-universe-only views (= all pre-Phase-4.6 work) silently dropped them.

## Hard rules preserved

- ✅ Rule 9: no schema change (validation-internal module)
- ✅ Rule 16: N/A (no scoring change)
- ✅ Rule 18: diagnostic surface (UniverseDriftReport dataclass) ships in same PR as the module
- ✅ License: no new deps; CSV already on disk from PR #274
- ✅ No frontend touched

## Verification

- `pytest tests/test_validation/test_universe_drift.py`: 11/11 pass
- `ruff check`: clean
- CLI smoke `--no-fetch-universe --as-of 2023-06-01`: exits 0, produces expected 9-ticker REMOVED cohort
- CLI degraded `--as-of 2010-01-01`: exits 1, loud warning

## What this PR does NOT do (deferred to next PRs in chain)

Per the docs file's "Future-work TODO list":

1. Git-archived `rankings.json` time-series loader (1d)
2. Forward-return computation per ticker from cache (0.5d)
3. Per-pillar IC at historical dates (1d, needs 1+2)
4. PBO/DSR re-baseline via `factor_passes_gates(universe_provider=...)` (1d, needs 3)
5. `manipulation_index` distribution shift report (0.5d, needs 1)
6. `docs/research/honest-baseline-2026-05-27.md` with revised PBO/DSR numbers (0.5d, needs 4)

Total to honest-baseline report: ~4-5 days focused dev across a sequence of PRs.

## Methodology citations

- Hou, Xue, Zhang (2020). "Replicating Anomalies." Review of Financial Studies 33(5):2019-2133.
- McLean, Pontiff (2016). "Does Academic Research Destroy Stock Return Predictability?" Journal of Finance 71(1):5-32.
- License: factual list (uncopyrightable per Feist 1991).

vercel · 2026-05-27T12:57:58Z

Project	Deployment	Actions	Updated (UTC)
quantrank	Ready	Preview, Comment	May 27, 2026 12:58pm

…tes (#281)

New `compute/validation/historical_ic.py` orchestrator pairs PR #278's `load_ranking_history` (ranking at T) with PR #280's `compute_forward_returns_batch` (realized return at T + horizon) and computes per-pillar Spearman IC across the historical window. This closes the IC re-baseline half of the Phase 4.6 chain.

API:
- `compute_pillar_ic(scores, returns, *, method, min_tickers)`: pure cross-sectional IC for one (pillar, date) pair
- `compute_historical_ic_report(start, end, *, horizon_months, pillars, ...)`: walks rankings.json snapshots + forward returns cache, aggregates into `HistoricalICReport`
- `format_ic_report(report)`: human-readable text rendering
- `PillarICEntry` / `PillarICSummary` / `HistoricalICReport`: three frozen-dataclass carriers

Spearman computed as Pearson on rank-transformed series (Spearman 1904 + Conover 1999 §5.4) to avoid pulling scipy into the dep set (QuantRank ships without scipy; pandas' `Series.corr(method='spearman')` requires it transitively).

Drops with descriptive notes:
- cross-section < MIN_TICKERS_PER_DATE = 30 (Grinold-Kahn 2000 §4.2)
- None / NaN / inf in either input
- constant inputs (std=0 → correlation undefined)

Aggregates per pillar: mean / std / median / min / max / IC IR / hit-rate. IC IR = mean/std × sqrt(n_dates) (Grinold-Kahn 2000 §4.4). Hit-rate = fraction of dates with strictly positive IC.

Honest-baseline disclaimer per Research Report v1.0:
- IC reported here is NAIVE: no costs / slippage / sector neutralization. Real net-of-cost IC typically 30-50% smaller per McLean-Pontiff 2016 JF post-publication decay
- The historical universe MUST come from PR #274 members_at() to avoid survivorship bias (Hou-Xue-Zhang 2020 RFS); orchestrator reads the historical universe FROM rankings.json at as-of T which is correct by construction (snapshot itself is historical universe)
- Report is a TIME SERIES + summary, not a single headline number

Tests: 28 new (28 passing). Coverage: module constants, pure IC computation edge cases (perfect ±1.0, constant inputs, NaN drops, below-min cross-section, method validation), summary aggregation math (IC IR formula pinned, hit-rate semantics), orchestrator full-path (one date / multi-date / missing pillar / malformed JSON), text rendering, and a live-git smoke that auto-degrades gracefully when the gitignored price cache is absent.

Schema impact: zero. No new Pydantic / TS / snapshot field.

Production-wiring impact: zero. No compute/main.py import. The orchestrator is purely a validation / re-baseline tool. Downstream PRs (#2d PBO/DSR re-baseline + #2f honest-baseline report) consume the output.

Phase 4.6 chain status: 5 of 6 items now landed (#1/#2 PR #277, #2a PR #278, #2b PR #280, #2c this PR, #2e PR #279; #2d gate kwarg shipped PR #275). #4 PBO/DSR re-baseline needs a warm-CI execution to publish actual numbers; #6 honest-baseline doc closes the chain.

PHASE_STATUS_INFLIGHT.md updated per PR #237 side-file convention. Harness doc TODO list updated: 5 of 6 items now landed.

https://claude.ai/code/session_01AGU8d6pm4u2fQQ5cebg9qa

Co-authored-by: Claude <noreply@anthropic.com>
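
To make the "Pearson on rank-transformed series" idea and the IC IR formula concrete, here is a stdlib-only sketch. It is not the `historical_ic.py` implementation: the function names, the aligned-list inputs, the average-rank tie handling, and the use of the sample standard deviation (n - 1 denominator) in `ic_ir` are all assumptions made for the example.

```python
import math
from typing import Optional, Sequence


def average_ranks(values: Sequence[float]) -> list:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation; None when either input is constant (undefined)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def spearman_ic(scores, returns, *, min_tickers: int = 30) -> Optional[float]:
    """Spearman IC as Pearson on ranks, dropping None / NaN / inf pairs."""
    pairs = [
        (s, r) for s, r in zip(scores, returns)
        if s is not None and r is not None and math.isfinite(s) and math.isfinite(r)
    ]
    if len(pairs) < min_tickers:
        return None  # cross-section too small
    xs, ys = zip(*pairs)
    return pearson(average_ranks(xs), average_ranks(ys))


def ic_ir(ics: Sequence[float]) -> Optional[float]:
    """mean / std * sqrt(n_dates); sample std is an assumption of this sketch."""
    n = len(ics)
    if n < 2:
        return None
    mean = sum(ics) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in ics) / (n - 1))
    return None if std == 0 else mean / std * math.sqrt(n)


def hit_rate(ics: Sequence[float]) -> float:
    """Fraction of dates with strictly positive IC."""
    return sum(1 for v in ics if v > 0) / len(ics)
```

Ranking first makes the statistic depend only on ordering, so any strictly increasing transform of the scores leaves the IC unchanged.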

…on + CLI (closing the chain) (#282)

Closes the Phase 4.6 honest re-validation harness structurally: 6 of 6 chain items now landed. The only remaining work is a warm-CI execution session that fills the TBD numeric cells.

New artifacts:
- docs/research/honest-baseline-2026-05-27.md: 10-section skeleton with TBD numeric cells in §2 (per-pillar IC) / §3 (PBO/DSR) / §4 (manipulation distribution) / §5 (survivorship-bias delta). Methodology + framing + honest-α ceiling + disclaimer ladder final-form. Citation block: Hou-Xue-Zhang 2020 RFS, McLean-Pontiff 2016 JF, Bailey-Lopez de Prado 2014 JPM, Bailey-Borwein-Lopez de Prado-Zhu 2014 AMS Notices, Grinold-Kahn 2000, Spearman 1904, Conover 1999, Kissell-Glantz 2003.
- scripts/generate_honest_baseline.py: argparse CLI that wires compute_historical_ic_report (PR #281) + compute_manipulation_distribution_shift (PR #279) end-to-end. Text mode emits the disclaimer banner to stderr; JSON mode embeds __banner__ in the payload. Exit codes: 0 (report produced), 1 (input validation), 2 (empty report, a useful CI signal).
- tests/test_validation/test_generate_honest_baseline_cli.py: 17 tests covering argparse shape, _parse_date, exit codes, banner emission, JSON payload shape + α ceiling cells + disclaimer string, banner embedding, _report_to_payload with synthetic + populated manip reports, and a constant pin on the banner's 5 mandatory phrases (NAIVE / McLean-Pontiff / 2-5% / Rule 16 / S&P 500).

Honest-baseline disclaimer per Research Report v1.0 autonomous mission:
- IC / PBO / DSR figures NAIVE: no costs / slippage / sector neutralization
- Real net-of-cost IC typically 30-50% smaller per McLean-Pontiff 2016 JF 32% post-publication decay
- Honest net α ceiling: 2-5% per year (hard-coded into doc + JSON)
- Composite formula sacred (Rule 16): never replayed retroactively
- Universe = S&P 500 (502) only
- No trade recommendation of specific tickers: methodological report only

Schema impact: zero. No new Pydantic / TS / snapshot field.

Production-wiring impact: zero. No compute/main.py import.

Smoke run against real repo's recent rankings.json (no live price cache): orchestrator walks 3 commits, returns n_dates_with_ic=0, exit code 2 surfaces missing-cache signal cleanly.

Phase 4.6 chain status, 6 of 6 items structurally landed:
- #1/#2 universe-drift first unit: PR #277
- #2a ranking_history loader: PR #278
- #2b forward_returns loader: PR #280
- #2c per-pillar IC orchestrator: PR #281
- #2d PBO/DSR gate kwarg: PR #275 (warm-CI execution pending)
- #2e manipulation_distribution shift: PR #279
- #2f honest-baseline skeleton + CLI: this PR

Deferred follow-ups (NOT in this PR): warm-CI execution session, --markdown writer mode, --include-pbo-dsr factor-return wiring.

PHASE_STATUS_INFLIGHT.md updated per PR #237 side-file convention. Harness doc TODO list: 6 of 6 items now landed.

https://claude.ai/code/session_01AGU8d6pm4u2fQQ5cebg9qa

Co-authored-by: Claude <noreply@anthropic.com>
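
The banner handling described in this commit (stderr in text mode, a `__banner__` key in JSON mode, a pin on five mandatory phrases, exit code 2 for an empty report) might look roughly like the sketch below. The banner wording is a hypothetical paraphrase of the disclaimer bullets, and `missing_phrases`, `emit`, and the text-mode line format are illustrative names and choices, not the script's actual code.

```python
import io
import json
import sys

MANDATORY_PHRASES = ("NAIVE", "McLean-Pontiff", "2-5%", "Rule 16", "S&P 500")

# Hypothetical banner text; the real wording in the script may differ.
BANNER = (
    "IC / PBO / DSR figures are NAIVE: no costs, slippage, or sector neutralization. "
    "Net-of-cost IC is typically 30-50% smaller (McLean-Pontiff 2016). "
    "Honest net alpha ceiling: 2-5% per year. "
    "Composite formula is fixed under Rule 16. "
    "Universe: S&P 500 only."
)


def missing_phrases(banner: str, required=MANDATORY_PHRASES) -> list:
    """Return the required phrases that the banner fails to contain."""
    return [phrase for phrase in required if phrase not in banner]


def emit(payload: dict, *, as_json: bool, out=sys.stdout, err=sys.stderr) -> int:
    """Write the report; return 0 if it has IC dates, 2 if it is empty."""
    if as_json:
        # JSON mode: embed the banner so downstream tooling cannot drop it.
        json.dump({"__banner__": BANNER, **payload}, out)
    else:
        # Text mode: keep the banner off stdout so piped output stays clean.
        print(BANNER, file=err)
        for key, value in payload.items():
            print(f"{key}: {value}", file=out)
    return 0 if payload.get("n_dates_with_ic", 0) > 0 else 2


# Example: an empty report in JSON mode, captured in memory.
json_out, json_err = io.StringIO(), io.StringIO()
status = emit({"n_dates_with_ic": 0}, as_json=True, out=json_out, err=json_err)
```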

vercel Bot deployed to Preview May 27, 2026 12:57 View deployment

dackclup marked this pull request as ready for review May 27, 2026 13:32

dackclup merged commit b70ea97 into main May 27, 2026
4 checks passed

dackclup deleted the claude/phase-4.6-historical-revalidation-harness branch May 27, 2026 13:32

dackclup mentioned this pull request May 27, 2026

docs+scripts(validation): Phase 4.6 task #2f — honest-baseline skeleton + CLI (closing the chain) #282

Merged

6 tasks
