feat(validation): Phase 4.6 task #2 first unit — universe-drift harness #277

Merged: dackclup merged 1 commit into main from claude/phase-4.6-historical-revalidation-harness on May 27, 2026
Conversation

@dackclup
Owner

Summary

Closes the first leg of Phase 4.6 task #2 (honest re-validation). Pure-function module + CLI scaffolding that answers the foundational question: what's the universe drift between today and any historical as-of date? Without this, every IC / PBO / DSR re-run is just numbers — you can't tell if the delta vs published baselines is from (a) survivorship correction, (b) scoring drift, or (c) real factor decay.

Future PRs in this chain layer per-pillar IC / PBO / DSR re-baselines on top.

What ships

| File | Lines | Role |
| --- | --- | --- |
| `compute/validation/universe_drift.py` | ~150 | `compute_universe_drift(as_of_date, current_universe)` → `UniverseDriftReport`. 3-way partition (`added_since` / `removed_since` / `unchanged`) + size + completeness diagnostic. `format_drift_report()` for human-readable rendering. |
| `scripts/historical_pillar_revalidate.py` | ~110 | CLI: `--as-of`, `--json`, `--no-fetch-universe` (CI/offline smoke). Exit codes 0/1/2. |
| `tests/test_validation/test_universe_drift.py` | ~210 | 11 tests covering added/removed/unchanged correctness, partition invariants, degraded mode, future-date guard, text rendering, frozen-dataclass. |
| `docs/research/historical-revalidation-harness.md` | ~80 | Methodology, CLI usage, 6-item TODO list for next PRs. |

CLI smoke output (real run: `--no-fetch-universe --as-of 2023-06-01`)

    ADDED since as_of   : 1 tickers
      SMCI
    REMOVED since as_of : 9 tickers
      AAP, ATVI, BIO, BLL, DISH, ETSY, LNC, WHR, ZION
      ↑ this is the SURVIVORSHIP-BIAS-CORRECTED cohort —
        current-universe-only views silently EXCLUDE these
    UNCHANGED           : 6 tickers (always-in cohort)

These 9 are the exact cohort an honest backtest at as-of 2023-06-01 must include. Pre-Phase-4.6 work silently dropped them.

What this PR does NOT do (next PRs in the chain)

Per the docs file's "Future-work TODO list":

| # | Item | Effort | Blocker |
| --- | --- | --- | --- |
| 1 | Git-archived `rankings.json` time-series loader | 1d | |
| 2 | Forward-return computation from `compute/cache/prices/` | 0.5d | gitignored cache |
| 3 | Per-pillar IC at historical dates | 1d | needs 1+2 |
| 4 | PBO/DSR re-baseline via `factor_passes_gates(universe_provider=...)` | 1d | needs 3 |
| 5 | `manipulation_index` distribution shift report | 0.5d | needs 1 |
| 6 | `docs/research/honest-baseline-2026-05-27.md` revised numbers | 0.5d | needs 4 |

Total: ~4-5d focused dev to honest-baseline report.

Hard rules preserved

Verification

| Check | Result |
| --- | --- |
| `pytest tests/test_validation/test_universe_drift.py` | ✅ 11/11 pass in 0.06s |
| `ruff check` (new files) | ✅ clean |
| CLI smoke (clean) | exit 0, expected 9-ticker REMOVED cohort |
| CLI degraded (`--as-of 2010-01-01`) | exit 1, loud warning ✅ |
| Test count delta | +11 |

Methodology

  • Hou, Xue, Zhang (2020). "Replicating Anomalies." Review of Financial Studies 33(5):2019-2133.
  • McLean, Pontiff (2016). "Does Academic Research Destroy Stock Return Predictability?" Journal of Finance 71(1):5-32.
  • License: factual list (uncopyrightable per Feist v. Rural Tel. Service Co., 1991).

Test plan

  • ruff check . — clean
  • pytest tests/test_validation/test_universe_drift.py — 11/11 pass
  • CLI smoke output verified end-to-end
  • CI (Python lint+test, Frontend build, simulate) — expected green; simulate fires because compute/validation/ touched

Subscribe-after-open suggestion: same pattern as #271-#276.


Generated by Claude Code

Closes the first leg of honest re-validation per Research Report v1.0 §7.4
task #2. Pure-function + CLI scaffolding that answers the foundational
question: **what's the universe drift between today and any historical
as-of date?** Future PRs in this chain layer per-pillar IC / PBO / DSR
re-baselines on top.

## What ships

- `compute/validation/universe_drift.py` (NEW, ~150 LOC) —
  `compute_universe_drift(as_of_date, current_universe) -> UniverseDriftReport`.
  Wraps `historical_universe.members_at()` and returns the 3-way
  symmetric-difference partition: `added_since` / `removed_since` /
  `unchanged` plus size + completeness diagnostic. `format_drift_report()`
  renders the human-readable text block with `+N more` cap for long
  ticker lists.

- `scripts/historical_pillar_revalidate.py` (NEW) — CLI wrapper:
  - `--as-of YYYY-MM-DD` (required)
  - `--json` for downstream tooling
  - `--no-fetch-universe` for offline / CI / smoke runs
  - Exit codes: 0 (clean) / 1 (degraded `is_complete=False`) / 2 (usage)

- `tests/test_validation/test_universe_drift.py` (NEW, 11 tests):
  - Added-since contains recent additions (SMCI, DASH, FSLR, PANW)
  - Removed-since contains delistings (SVB 2023-03-13)
  - Anchor-date = zero drift
  - Pre-EARLIEST_EVENT_DATE = is_complete=False, degraded
  - Future date raises ValueError
  - Partition + size invariants (added+unchanged = current;
    removed+unchanged = historical)
  - Text rendering contains 4 required section labels
  - max_listed cap produces "+N more" suffix
  - Dataclass is frozen
  - anchor_date default = today UTC

- `docs/research/historical-revalidation-harness.md` (NEW) —
  methodology, CLI usage with sample output, acceptance criteria
  for next 6 follow-up PRs, caveats
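
The 3-way partition and its invariants can be illustrated with plain set arithmetic. This is a minimal sketch under assumptions, not the shipped module: `DriftSketch`, `sketch_universe_drift`, and the explicit `historical_members` argument are hypothetical names standing in for `UniverseDriftReport`, `compute_universe_drift`, and the result of `historical_universe.members_at()`, and the tickers are illustrative.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DriftSketch:
    """Illustrative stand-in for UniverseDriftReport; the field set is assumed."""
    as_of_date: date
    added_since: frozenset
    removed_since: frozenset
    unchanged: frozenset
    is_complete: bool


def sketch_universe_drift(as_of_date, current_universe, historical_members,
                          *, today=None, is_complete=True):
    """Partition tickers into added / removed / unchanged.

    historical_members stands in for whatever members_at(as_of_date)
    returns; passing it explicitly keeps this sketch free of I/O.
    """
    today = today or date.today()
    if as_of_date > today:
        raise ValueError(f"as_of_date {as_of_date} is in the future")
    current = frozenset(current_universe)
    historical = frozenset(historical_members)
    return DriftSketch(
        as_of_date=as_of_date,
        added_since=current - historical,    # in today's universe only
        removed_since=historical - current,  # the survivorship-corrected cohort
        unchanged=current & historical,      # always-in cohort
        is_complete=is_complete,
    )


report = sketch_universe_drift(
    date(2023, 6, 1),
    current_universe={"AAPL", "MSFT", "SMCI"},
    historical_members={"AAPL", "MSFT", "ATVI"},
)
# Partition invariants: added + unchanged = current; removed + unchanged = historical.
assert report.added_since | report.unchanged == {"AAPL", "MSFT", "SMCI"}
assert report.removed_since | report.unchanged == {"AAPL", "MSFT", "ATVI"}
```

Because the three sets are pairwise disjoint by construction, the size invariants follow directly from the set invariants.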

## CLI smoke output (2023-06-01, 7-ticker synthetic universe)

    ADDED since as_of   : 1 tickers
      SMCI
    REMOVED since as_of : 9 tickers
      AAP, ATVI, BIO, BLL, DISH, ETSY, LNC, WHR, ZION
      ↑ this is the SURVIVORSHIP-BIAS-CORRECTED cohort —
        current-universe-only views silently EXCLUDE these

These 9 are the exact cohort an honest backtest at as-of 2023-06-01
must include. Current-universe-only views (= all pre-Phase-4.6 work)
silently dropped them.
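
For long cohorts, the `max_listed` cap mentioned under "What ships" keeps the ticker line readable. The helper below is a hypothetical sketch of that behaviour; the function name, the default cap, and the exact placement of the `+N more` suffix are assumptions, not taken from `format_drift_report()`.

```python
def format_ticker_list(tickers, max_listed=10):
    """Render tickers as a sorted comma-separated line, capped at max_listed."""
    ordered = sorted(tickers)
    shown = ordered[:max_listed]
    hidden = len(ordered) - len(shown)
    line = ", ".join(shown)
    if hidden > 0:
        # Suffix format is assumed; the real renderer may differ.
        line += f", +{hidden} more"
    return line


removed = ["ZION", "AAP", "ATVI", "BIO", "BLL", "DISH", "ETSY", "LNC", "WHR"]
print(format_ticker_list(removed))                # all 9 fit under the default cap
print(format_ticker_list(removed, max_listed=3))  # AAP, ATVI, BIO, +6 more
```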

## Hard rules preserved

- ✅ Rule 9 — no schema change (validation-internal module)
- ✅ Rule 16 — N/A (no scoring change)
- ✅ Rule 18 — diagnostic surface (UniverseDriftReport dataclass)
  ships in same PR as the module
- ✅ License — no new deps; CSV already on disk from PR #274
- ✅ No frontend touched

## Verification

- `pytest tests/test_validation/test_universe_drift.py` — 11/11 pass
- `ruff check` — clean
- CLI smoke `--no-fetch-universe --as-of 2023-06-01` — exits 0,
  produces expected 9-ticker REMOVED cohort
- CLI degraded `--as-of 2010-01-01` — exits 1, loud warning
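
The exit-code contract checked above could be wired roughly as follows. This is a sketch assuming an `argparse`-based CLI; `build_parser` and `exit_code_for` are hypothetical names. Exit code 2 for usage errors is `argparse`'s own default behaviour, so only 0 and 1 need explicit handling.

```python
import argparse
from datetime import date

EXIT_CLEAN = 0     # report produced, is_complete=True
EXIT_DEGRADED = 1  # report produced, is_complete=False


def build_parser():
    parser = argparse.ArgumentParser(prog="historical_pillar_revalidate")
    # date.fromisoformat raises ValueError on bad input, which argparse
    # turns into a usage error (exit code 2).
    parser.add_argument("--as-of", required=True, type=date.fromisoformat,
                        help="historical as-of date, YYYY-MM-DD")
    parser.add_argument("--json", action="store_true",
                        help="emit JSON for downstream tooling")
    parser.add_argument("--no-fetch-universe", action="store_true",
                        help="skip the live universe fetch (offline / CI)")
    return parser


def exit_code_for(is_complete):
    """Map the completeness diagnostic onto the process exit code."""
    return EXIT_CLEAN if is_complete else EXIT_DEGRADED


args = build_parser().parse_args(["--as-of", "2023-06-01", "--no-fetch-universe"])
```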

## What this PR does NOT do (deferred to next PRs in chain)

Per the docs file's "Future-work TODO list":

1. Git-archived `rankings.json` time-series loader (1d)
2. Forward-return computation per ticker from cache (0.5d)
3. Per-pillar IC at historical dates (1d, needs 1+2)
4. PBO/DSR re-baseline via `factor_passes_gates(universe_provider=...)`
   (1d, needs 3)
5. `manipulation_index` distribution shift report (0.5d, needs 1)
6. `docs/research/honest-baseline-2026-05-27.md` with revised PBO/DSR
   numbers (0.5d, needs 4)

Total to honest-baseline report: ~4-5 days focused dev across a
sequence of PRs.

## Methodology citations

- Hou, Xue, Zhang (2020). "Replicating Anomalies." Review of
  Financial Studies 33(5):2019-2133.
- McLean, Pontiff (2016). "Does Academic Research Destroy Stock
  Return Predictability?" Journal of Finance 71(1):5-32.
- License: factual list (uncopyrightable per Feist 1991).
@vercel

vercel Bot commented May 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| quantrank | Ready | Preview, Comment | May 27, 2026 12:58pm |

@dackclup dackclup marked this pull request as ready for review May 27, 2026 13:32
@dackclup dackclup merged commit b70ea97 into main May 27, 2026
4 checks passed
@dackclup dackclup deleted the claude/phase-4.6-historical-revalidation-harness branch May 27, 2026 13:32
dackclup added a commit that referenced this pull request May 27, 2026
…tes (#281)

New `compute/validation/historical_ic.py` orchestrator pairs PR #278's
`load_ranking_history` (ranking at T) with PR #280's
`compute_forward_returns_batch` (realized return at T + horizon) and
computes per-pillar Spearman IC across the historical window —
closes the IC re-baseline half of the Phase 4.6 chain.

API:
- `compute_pillar_ic(scores, returns, *, method, min_tickers)`:
  pure cross-sectional IC for one (pillar, date) pair
- `compute_historical_ic_report(start, end, *, horizon_months,
  pillars, ...)`: walks rankings.json snapshots + forward returns
  cache, aggregates into `HistoricalICReport`
- `format_ic_report(report)`: human-readable text rendering
- `PillarICEntry` / `PillarICSummary` / `HistoricalICReport`:
  three frozen-dataclass carriers

Spearman computed as Pearson on rank-transformed series (Spearman
1904 + Conover 1999 §5.4) to avoid pulling scipy into the dep set
(QuantRank ships without scipy; pandas' `Series.corr(method=
'spearman')` requires it transitively).
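
As an illustration of the rank-then-Pearson approach described above, here is a dependency-free sketch. The function names are hypothetical and average ranks for ties are an assumption; the actual `historical_ic.py` implementation may differ.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; None when either series has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def spearman_ic(scores, returns):
    """Spearman rank correlation as Pearson on rank-transformed inputs."""
    return pearson(average_ranks(scores), average_ranks(returns))
```

Any strictly monotone relationship, linear or not, yields an IC of +1 or -1 under this definition, which is why the rank transform is preferred for cross-sectional scores.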

Drops with descriptive notes:
- cross-section < MIN_TICKERS_PER_DATE = 30 (Grinold-Kahn 2000 §4.2)
- None / NaN / inf in either input
- constant inputs (std=0 → correlation undefined)
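
The drop rules above amount to a pre-filter on each date's cross-section. A hedged sketch follows: `clean_cross_section` and the note strings are hypothetical, and only the `MIN_TICKERS_PER_DATE = 30` constant comes from this commit message.

```python
import math

MIN_TICKERS_PER_DATE = 30


def _finite(value):
    return value is not None and math.isfinite(value)


def clean_cross_section(scores, returns, min_tickers=MIN_TICKERS_PER_DATE):
    """Return (pairs, note); pairs is None when the date must be dropped."""
    # Keep only tickers where both the score and the return are usable.
    pairs = [(s, r) for s, r in zip(scores, returns) if _finite(s) and _finite(r)]
    if len(pairs) < min_tickers:
        return None, f"cross-section {len(pairs)} < {min_tickers}"
    if len({s for s, _ in pairs}) == 1 or len({r for _, r in pairs}) == 1:
        return None, "constant input: correlation undefined"
    return pairs, ""
```

Returning a note alongside the drop keeps the reason visible in the report instead of silently shrinking the sample.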

Aggregates per pillar: mean / std / median / min / max / IC IR /
hit-rate. IC IR = mean/std × sqrt(n_dates) (Grinold-Kahn 2000 §4.4).
Hit-rate = fraction of dates with strictly positive IC.
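
A small sketch of the per-pillar aggregation, using the IC IR and hit-rate definitions stated above. `summarize_ic` is a hypothetical name, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the message does not say which estimator is used.

```python
import math
import statistics


def summarize_ic(ics):
    """Aggregate a per-date IC series into summary statistics."""
    n = len(ics)
    mean = statistics.fmean(ics)
    # Sample std (ddof=1) is assumed; it needs at least two dates.
    std = statistics.stdev(ics) if n > 1 else 0.0
    ic_ir = mean / std * math.sqrt(n) if std > 0 else None
    hit_rate = sum(1 for ic in ics if ic > 0) / n  # strictly positive only
    return {
        "mean": mean,
        "std": std,
        "median": statistics.median(ics),
        "min": min(ics),
        "max": max(ics),
        "ic_ir": ic_ir,
        "hit_rate": hit_rate,
    }


summary = summarize_ic([0.1, 0.2, -0.1, 0.2])
```

With this definition a date whose IC is exactly zero does not count as a hit.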

Honest-baseline disclaimer per Research Report v1.0:
- IC reported here is NAIVE — no costs / slippage / sector
  neutralization. Real net-of-cost IC typically 30-50% smaller per
  McLean-Pontiff 2016 JF post-publication decay
- The historical universe MUST come from PR #274 members_at() to
  avoid survivorship bias (Hou-Xue-Zhang 2020 RFS); orchestrator
  reads the historical universe FROM rankings.json at as-of T which
  is correct by construction (snapshot itself is historical universe)
- Report is a TIME SERIES + summary, not a single headline number

Tests: 28 new (28 passing). Coverage: module constants, pure IC
computation edge cases (perfect ±1.0, constant inputs, NaN drops,
below-min cross-section, method validation), summary aggregation
math (IC IR formula pinned, hit-rate semantics), orchestrator
full-path (one date / multi-date / missing pillar / malformed JSON),
text rendering, and a live-git smoke that auto-degrades gracefully
when the gitignored price cache is absent.

Schema impact: zero. No new Pydantic / TS / snapshot field.
Production-wiring impact: zero. No compute/main.py import. The
orchestrator is purely a validation / re-baseline tool. Downstream
PRs (#2d PBO/DSR re-baseline + #2f honest-baseline report) consume
the output.

Phase 4.6 chain status: 5 of 6 items now landed (#1/#2 PR #277, #2a
PR #278, #2b PR #280, #2c this PR, #2e PR #279; #2d gate kwarg
shipped PR #275). #4 PBO/DSR re-baseline needs a warm-CI execution
to publish actual numbers; #6 honest-baseline doc closes the chain.

PHASE_STATUS_INFLIGHT.md updated per PR #237 side-file convention.
Harness doc TODO list updated: 5 of 6 items now landed.

https://claude.ai/code/session_01AGU8d6pm4u2fQQ5cebg9qa

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 27, 2026
…on + CLI (closing the chain) (#282)

Closes the Phase 4.6 honest re-validation harness structurally — 6 of
6 chain items now landed. The only remaining work is a warm-CI
execution session that fills the TBD numeric cells.

New artifacts:

- docs/research/honest-baseline-2026-05-27.md — 10-section skeleton
  with TBD numeric cells in §2 (per-pillar IC) / §3 (PBO/DSR) / §4
  (manipulation distribution) / §5 (survivorship-bias delta).
  Methodology + framing + honest-α ceiling + disclaimer ladder
  final-form. Citation block: Hou-Xue-Zhang 2020 RFS, McLean-Pontiff
  2016 JF, Bailey-Lopez de Prado 2014 JPM, Bailey-Borwein-Lopez de
  Prado-Zhu 2014 AMS Notices, Grinold-Kahn 2000, Spearman 1904,
  Conover 1999, Kissell-Glantz 2003.
- scripts/generate_honest_baseline.py — argparse CLI that wires
  compute_historical_ic_report (PR #281) + compute_manipulation_
  distribution_shift (PR #279) end-to-end. Text mode emits the
  disclaimer banner to stderr; JSON mode embeds __banner__ in the
  payload. Exit codes: 0 (report produced), 1 (input validation),
  2 (empty report — useful CI signal).
- tests/test_validation/test_generate_honest_baseline_cli.py —
  17 tests: argparse shape, _parse_date, exit codes, banner emission,
  JSON payload shape + α ceiling cells + disclaimer string, banner
  embedding, _report_to_payload with synthetic + populated manip
  reports, and a constant pin on the banner's 5 mandatory phrases
  (NAIVE / McLean-Pontiff / 2-5% / Rule 16 / S&P 500).
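
The banner routing and exit codes described for the CLI can be sketched as below. Everything here is illustrative: `emit_report` is a hypothetical name, and `BANNER` is placeholder wording that merely contains the five pinned phrases, since the real banner text is not quoted in this message.

```python
import io
import json
import sys

# Placeholder text only; the real banner wording is not shown here.
BANNER = (
    "NAIVE figures: no costs, slippage, or sector neutralization. "
    "See McLean-Pontiff 2016. Honest net alpha ceiling 2-5% per year. "
    "Composite formula unchanged (Rule 16). Universe: S&P 500 only."
)
MANDATORY_PHRASES = ("NAIVE", "McLean-Pontiff", "2-5%", "Rule 16", "S&P 500")

EXIT_OK, EXIT_BAD_INPUT, EXIT_EMPTY = 0, 1, 2


def emit_report(payload, *, as_json, out=None, err=None):
    """Write the report; banner goes to stderr (text) or into the JSON payload."""
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    if as_json:
        json.dump({"__banner__": BANNER, **payload}, out, sort_keys=True)
        out.write("\n")
    else:
        err.write(BANNER + "\n")
        for key, value in sorted(payload.items()):
            out.write(f"{key}: {value}\n")
    # An empty report is signalled through the exit code, not an exception.
    return EXIT_OK if payload.get("n_dates_with_ic", 0) > 0 else EXIT_EMPTY


out, err = io.StringIO(), io.StringIO()
code = emit_report({"n_dates_with_ic": 0}, as_json=True, out=out, err=err)
```

Keeping the banner off stdout in text mode means piped output stays clean, while JSON consumers still receive the disclaimer inside the payload.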

Honest-baseline disclaimer per Research Report v1.0 autonomous
mission:
- IC / PBO / DSR figures NAIVE — no costs / slippage / sector
  neutralization
- Real net-of-cost IC typically 30-50% smaller; see McLean-Pontiff
  2016 JF (32% return decay attributed to publication-informed trading)
- Honest net α ceiling: 2-5% per year (hard-coded into doc + JSON)
- Composite formula sacred (Rule 16) — never replayed retroactively
- Universe = S&P 500 (502) only
- No trade recommendation of specific tickers — methodological
  report only

Schema impact: zero. No new Pydantic / TS / snapshot field.
Production-wiring impact: zero. No compute/main.py import.

Smoke run against real repo's recent rankings.json (no live price
cache) — orchestrator walks 3 commits, returns n_dates_with_ic=0,
exit code 2 surfaces missing-cache signal cleanly.

Phase 4.6 chain status — 6 of 6 items structurally landed:
- #1/#2 universe-drift first unit: PR #277
- #2a ranking_history loader: PR #278
- #2b forward_returns loader: PR #280
- #2c per-pillar IC orchestrator: PR #281
- #2d PBO/DSR gate kwarg: PR #275 (warm-CI execution pending)
- #2e manipulation_distribution shift: PR #279
- #2f honest-baseline skeleton + CLI: this PR

Deferred follow-ups (NOT in this PR): warm-CI execution session,
--markdown writer mode, --include-pbo-dsr factor-return wiring.

PHASE_STATUS_INFLIGHT.md updated per PR #237 side-file convention.
Harness doc TODO list: 6 of 6 items now landed.

https://claude.ai/code/session_01AGU8d6pm4u2fQQ5cebg9qa

Co-authored-by: Claude <noreply@anthropic.com>