feat(validation): Phase 4.6 task #2a — rankings.json time-series loader by dackclup · Pull Request #278 · dackclup/quantrank

dackclup · 2026-05-27T13:37:49Z

Summary

Closes Phase 4.6 task #2a (per docs/research/historical-revalidation-harness.md Future-work TODO list shipped in PR #277).

Pure-function + cached git-archive walker that reconstructs the daily composite_score time series per ticker from the cron's committed rankings.json snapshots. This is the first data source for downstream IC re-baselining (#2c), manipulation-index distribution shift (#2e), and the honest-baseline report (#2f).

What ships

| File | Lines | Role |
| --- | --- | --- |
| `compute/validation/ranking_history.py` | ~190 | 4 functions: `list_ranking_commits`, `dedupe_by_date`, `load_snapshot_at`, `load_ranking_history` |
| `tests/test_validation/test_ranking_history.py` | ~210 | 18 tests (5 dedupe helper · 3 live-git list · 2 live-git snapshot · 5 live-git history · 1 const pin · 2 monkeypatched edge cases) |

API sample

from datetime import date
from compute.validation.ranking_history import load_ranking_history

df = load_ranking_history(
    start_date=date(2026, 5, 1),
    end_date=date(2026, 5, 27),
    columns=("composite_score", "composite_score_adjusted"),
)
# df.index = MultiIndex(date, ticker); df.columns = the requested keys

Subprocess safety

Every git call uses subprocess.run with list-arg argv (no shell=True). SHA + path inputs are caller-supplied but never interpolated into a shell string — standard shell-injection vector closed.
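
The list-argv pattern described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the helper names `build_git_show_argv`, `parse_snapshot`, and `show_file_at` are hypothetical, and the error handling (empty list on any failure) is an assumption modelled on the "unknown SHA → empty list" behaviour this PR describes.

```python
import json
import subprocess


def build_git_show_argv(sha: str, path: str) -> list:
    """Build argv for `git show SHA:path` as a list (hypothetical helper)."""
    # Each element reaches git as one argument; no shell ever parses the string.
    return ["git", "show", f"{sha}:{path}"]


def parse_snapshot(raw: str) -> list:
    """Parse snapshot JSON; return [] for malformed input or a non-list payload."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    return data if isinstance(data, list) else []


def show_file_at(sha: str, path: str) -> list:
    """Run git without shell=True; a failed lookup yields an empty list."""
    proc = subprocess.run(
        build_git_show_argv(sha, path),
        capture_output=True,
        text=True,
        check=False,  # inspect returncode ourselves instead of raising
    )
    if proc.returncode != 0:
        return []
    return parse_snapshot(proc.stdout)
```

Because the SHA and path travel as a single argv element, shell metacharacters in either value are inert. List argv does not by itself stop a value that begins with `-` from being read as a git option, so validating the SHA format would be a separate guard.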

Hard rules preserved

✅ Rule 9 — no schema change (read-only consumer of existing JSON)
✅ Rule 16 — N/A (no scoring change)
✅ Rule 18 — diagnostic ships in same PR (the loader IS the surface)
✅ License — pure stdlib + pandas + subprocess; no new deps
✅ No frontend touched
✅ Schema-compat: handles every rankings.json shape from 0.5.x onward; ticker + composite_score keys present in every historical commit

Verification

| Check | Result |
| --- | --- |
| `pytest tests/test_validation/test_ranking_history.py` | ✅ 18/18 pass in 3.95s |
| `pytest tests/test_validation/` (full) | ✅ 83/83 pass in 50.24s (no regressions) |
| `ruff check` | ✅ clean |
| Test count delta | +18 |

Test plan

Pure helpers (dedupe, sort, validation) — 5 unit tests
Real-git smoke tests — list / snapshot / history loaders all hit the repo's actual cron commits
Edge cases — empty window, unknown SHA, malformed JSON, missing-column rows
CI (Python lint+test, Frontend build, simulate) — expected green; simulate fires because compute/validation/ touched

Next in the chain

Per docs/research/historical-revalidation-harness.md Future-work TODO:

2b Forward-return computation per ticker from compute/cache/prices/ (0.5d) — gitignored cache; needs warm CI run
2c Per-pillar IC at historical dates (1d, needs 2a + 2b)
2d PBO/DSR re-baseline via PR #275's universe_provider kwarg (1d, needs 2c)
2e manipulation_index distribution shift report (0.5d, needs 2a)
2f docs/research/honest-baseline-2026-05-27.md revised numbers (0.5d, needs 2d)

Subscribe-after-open suggestion: same pattern as #271-#277.

Generated by Claude Code

Closes follow-up #2a from ``docs/research/historical-revalidation-harness.md``.

Pure-function + cached git-archive walker that reconstructs the daily ``composite_score`` time series per ticker from the cron's committed rankings.json snapshots.

This is the **first data source** for the downstream Phase 4.6 task #2 chain:

- #2c per-pillar IC at historical dates → consumes this DataFrame
- #2e manipulation_index distribution shift report → consumes this DataFrame
- #2f honest-baseline report → consumes the chain output

## What ships

- `compute/validation/ranking_history.py` (NEW, ~190 LOC):
  - `list_ranking_commits(start, end, path)`: enumerates (sha, date) tuples from `git log` with optional ISO-date filter window
  - `dedupe_by_date(commits, prefer="latest"|"earliest")`: collapses multiple commits per day to one (default: latest)
  - `load_snapshot_at(sha, path)`: `git show SHA:path` → parsed JSON list. Returns an empty list (does not raise) when the SHA pre-dates the file's existence, per the standard cron's chore-commit pattern.
  - `load_ranking_history(start, end, columns, dedupe_dates=True)`: high-level orchestrator returning a `(date, ticker)` MultiIndex DataFrame with the requested columns. Default columns = `("composite_score",)`. Caller-extensible to grab `composite_score_adjusted` / `current_price` / etc.
- `tests/test_validation/test_ranking_history.py` (NEW, 18 tests):
  - 5 dedupe_by_date helper cases (latest / earliest / invalid / empty / sort)
  - 3 list_ranking_commits live-git cases (smoke / date-filter / empty-window)
  - 2 load_snapshot_at live-git cases (HEAD parses / unknown-SHA → empty list)
  - 5 load_ranking_history live-git cases (smoke / default columns / custom columns / empty window / dedupe behavior)
  - 1 canonical-path constant pin
  - 2 monkeypatched edge cases: malformed-JSON snapshot skipped; rows missing required column skipped

## Subprocess safety

Every `git` call goes through `subprocess.run` with list-arg argv (no `shell=True`). SHA + path inputs are caller-supplied but never interpolated into a shell string, so the standard shell-injection vector is closed.

## Hard rules preserved

- ✅ Rule 9: no schema change (read-only consumer of existing JSON)
- ✅ Rule 16: N/A (no scoring change)
- ✅ Rule 18: diagnostic ships in same PR (the loader IS the diagnostic surface for downstream IC / distribution-shift work)
- ✅ License: pure stdlib + pandas + subprocess; no new deps
- ✅ No frontend touched

## Schema-compat

Handles every rankings.json shape from `0.5.x` onward (pre-LedgerCraft reskin + Phase 4.5e + Phase 4h.x + Phase 3c). Top-level keys `ticker` + `composite_score` are present in every historical snapshot per ``compute/output/writer.py``. Rows missing those default-required keys get a debug log + are silently skipped (rare; would indicate a pre-rankings.json commit or a schema-break we haven't shipped).

## Verification

- `pytest tests/test_validation/test_ranking_history.py`: 18/18 pass in 3.95s (5 use real git; 13 use stdlib mocks)
- Full `pytest tests/test_validation/`: 83/83 pass in 50.24s (no regressions on #2 first-unit drift tests / #2-precursor pbo_dsr universe-provider tests / pre-existing ic_decay + osap_validation)
- `ruff check`: clean
- Test count delta: +18

## Next in the chain

Per `docs/research/historical-revalidation-harness.md` Future-work TODO:

- #2b forward-return computation per ticker from `compute/cache/prices/` (0.5d); gitignored cache; needs warm CI run
- #2c per-pillar IC at historical dates (1d, needs 2a + 2b)
- #2d PBO/DSR re-baseline via PR #275's `universe_provider` kwarg (1d, needs 2c)
- #2e `manipulation_index` distribution shift report (0.5d, needs 2a)
- #2f `docs/research/honest-baseline-2026-05-27.md` (0.5d, needs 2d)

## NOT in this PR

- The DataFrame is built from git history; no IC math, no PBO/DSR call sites, no chart rendering. Downstream PRs land that.
- No CLI wrapper. Direct module import expected; if a CLI feels warranted later, it can wrap this in 30 LOC + 5 tests.
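
The per-day collapse that `dedupe_by_date` is described as performing could look roughly like the sketch below. It is not the shipped implementation. It assumes each commit is a `(sha, datetime)` pair so that "latest" and "earliest" within a day are well defined, and it assumes an invalid `prefer` raises `ValueError`; neither detail is confirmed by the description above.

```python
from datetime import datetime


def dedupe_by_date(commits, prefer="latest"):
    """Keep one (sha, timestamp) pair per calendar day, sorted by day.

    Sketch only: the (sha, datetime) input shape is an assumption.
    """
    if prefer not in ("latest", "earliest"):
        raise ValueError(f"prefer must be 'latest' or 'earliest', got {prefer!r}")

    chosen = {}  # calendar day -> (sha, timestamp) currently selected
    for sha, ts in commits:
        day = ts.date()
        current = chosen.get(day)
        if current is None:
            chosen[day] = (sha, ts)
        elif prefer == "latest" and ts > current[1]:
            chosen[day] = (sha, ts)
        elif prefer == "earliest" and ts < current[1]:
            chosen[day] = (sha, ts)

    # Return in chronological order regardless of input order.
    return [chosen[day] for day in sorted(chosen)]


if __name__ == "__main__":
    sample = [
        ("c", datetime(2026, 5, 23, 9, 0)),
        ("a", datetime(2026, 5, 22, 8, 0)),
        ("b", datetime(2026, 5, 22, 20, 0)),
    ]
    print([sha for sha, _ in dedupe_by_date(sample)])
```

Keeping the latest commit per day matches the stated default; sorting the output means callers do not depend on the order `git log` returns.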

vercel · 2026-05-27T13:38:05Z

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| quantrank | Ready | Preview, Comment | May 27, 2026 1:38pm |

…n shift (#279)

Closes follow-up #2e from ``docs/research/historical-revalidation-harness.md``.

Consumes the #2a rankings.json time-series loader (PR #278) and reports the manipulation_index distribution shift across the cron's lifetime: mean / std / quantiles + fire-rate by band (LOW [0,20), MODERATE [20,50), HIGH [50,∞)) per date + window-end deltas.

This answers the honest question: **has the cohort of flagged stocks materially changed across the cron's history?** A universe-mean drift > ~5 pts would signal Phase 4.5e weight recalibration is needed (per Q3 2026-08-19 cohort-audit gate).

## What ships

- `compute/validation/manipulation_distribution.py` (NEW, ~250 LOC):
  - `DistributionSummary` dataclass: per-date snapshot (n / mean / std / median / q25 / q75 / q95 / max + band counts + top-3 tickers)
  - `ShiftReport` dataclass: aggregate across window + first-to-last deltas (mean_delta / std_delta / high_count_delta) + note
  - `compute_manipulation_distribution_shift(start_date, end_date, repo=None)`: main entry; pure-function wrapping #2a's loader
  - `format_shift_report(report, max_dates=20)`: human-readable text rendering; truncates long windows with "..." marker
- `tests/test_validation/test_manipulation_distribution.py` (NEW, 11 tests):
  - Band-boundary constants pin (LOW < 20 ≤ MODERATE < 50 ≤ HIGH)
  - `_summarize_one_date` band partition correctness + empty + top-3 ordering
  - `compute_shift` empty-window / all-null-window / single-date / two-date-with-deltas paths (monkeypatched loader)
  - `format_shift_report` header + delta + cap rendering
  - Live-git smoke against the repo's recent cron commits

## Real-world artifact (live repo, window 2026-05-01 → 2026-05-27)

3 cron dates available on main:

| date | n | mean | std | p75 | p95 | HIGH | top |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2026-05-22 | 502 | 4.38 | 9.28 | 5.00 | 25.00 | 2 | SMCI=84.0, WAT=64.0, NVDA=48.0 |
| 2026-05-23 | 502 | 4.38 | 9.28 | 5.00 | 25.00 | 2 | SMCI=84.0, WAT=64.0, NVDA=48.0 |
| 2026-05-26 | 502 | 4.38 | 9.28 | 5.00 | 25.00 | 2 | SMCI=84.0, WAT=64.0, NVDA=48.0 |

Δmean=+0.00, Δstd=+0.00, ΔHIGH=+0 (2 → 2)

Distribution is **stable** across the window (expected for a 5-day horizon; pillar inputs change slowly). Top-3 invariant: SMCI 84, WAT 64, NVDA 48, which matches the Phase 4.5f production-verified ``ManipulationRiskCard`` fire-rate snapshot. Universe mean 4.38 sits solidly in the LOW band; only 2 tickers are in the HIGH band (Phase 4.5f spec target: 1-3 stocks). No recalibration signal. A longer window (≥ 90 days) would let this report detect drift; the chain is now ready when cron history accumulates.

## Hard rules preserved

- ✅ Rule 9: no schema change (read-only consumer of rankings.json)
- ✅ Rule 16: N/A (no scoring change)
- ✅ Rule 18: diagnostic surface ships in same PR
- ✅ License: pure stdlib + pandas; no new deps
- ✅ Universe S&P 500 only

## Verification

- `pytest tests/test_validation/test_manipulation_distribution.py`: 11/11 pass in 0.70s
- `ruff check`: clean (linter trimmed unused imports + sorted)
- Live-git artifact above generated cleanly

## Next in the chain

Per ``docs/research/historical-revalidation-harness.md`` Future-work TODO:

- #2b forward-return computation from `compute/cache/prices/` (0.5d); gitignored cache; CI-only data
- #2c per-pillar IC at historical dates (1d, needs 2a + 2b)
- #2d PBO/DSR re-baseline via PR #275's `universe_provider` kwarg (1d, needs 2c)
- #2f honest-baseline report (0.5d, needs 2d)

#2e (this PR) is independent of 2b/2c; it could ship before, in parallel with, or after them.

Co-authored-by: Claude <noreply@anthropic.com>
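
The band partition and top-3 selection described in the #279 commit message can be sketched in plain Python. This is an illustration under stated assumptions, not the code in `manipulation_distribution.py`: the names `band_of` and `summarize` are hypothetical, and the tickers `AAA` and `BBB` are made-up filler next to the three values quoted above.

```python
from statistics import fmean

LOW_MAX = 20.0       # LOW is [0, 20)
MODERATE_MAX = 50.0  # MODERATE is [20, 50); HIGH is [50, inf)


def band_of(value: float) -> str:
    """Map a manipulation_index value to its band (half-open intervals)."""
    if value < LOW_MAX:
        return "LOW"
    if value < MODERATE_MAX:
        return "MODERATE"
    return "HIGH"


def summarize(scores: dict, top_n: int = 3) -> dict:
    """Per-date summary: count, mean, band counts, and the top-N tickers."""
    counts = {"LOW": 0, "MODERATE": 0, "HIGH": 0}
    for value in scores.values():
        counts[band_of(value)] += 1
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    return {
        "n": len(scores),
        "mean": fmean(scores.values()) if scores else None,
        "bands": counts,
        "top": top,
    }


if __name__ == "__main__":
    # SMCI / WAT / NVDA values are from the artifact above; AAA / BBB are invented.
    day = {"SMCI": 84.0, "WAT": 64.0, "NVDA": 48.0, "AAA": 0.0, "BBB": 20.0}
    print(summarize(day))
```

A first-to-last delta such as `mean_delta` would then be the difference between two such summaries at the window's end dates.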

…ignored price cache (#280)

New `compute/validation/forward_returns.py` reads the gitignored `compute/cache/prices/<TICKER>.parquet` cache (written by `compute.ingest.prices.fetch_prices`) and computes close-to-close N-month forward total return at any as-of date. Pairs with PR #278's `load_ranking_history()` to close the honest IC re-baseline loop (ranking at T from #2a; realized return at T+horizon from this PR).

API:

- `compute_forward_return(ticker, as_of_date, horizon_months, *, cache_dir=None) -> float | None`
- `compute_forward_return_detailed(...) -> ForwardReturnResult` carrying the actual trading dates / closes / note
- `compute_forward_returns_batch(...)` universe batch wrapper
- `coverage_report(...)` failure-mode aggregator for the Hou-Xue-Zhang 2020 RFS-style coverage check

Edge cases handled:

- missing parquet → None
- no close column → None
- as-of doesn't snap within 5d → None
- horizon past last cached row → censored = None
- start_close ≤ 0 or NaN → None
- end_close NaN → None
- non-DatetimeIndex parquet → coerced back via `pd.to_datetime`

Source semantics: prefers `Adj Close` (dividend-adjusted) over `Close`; NAIVE returns (no costs / slippage); survivorship-bias correction is NOT done here; callers pair with PR #274's `members_at()` for honest universe construction (Hou-Xue-Zhang 2020 RFS).

Tests: 19 new + 1 live-cache smoke (auto-skipped without warm cache). Synthetic OHLCV parquets in `tmp_path` via `cache_dir=tmp_path`. Mirrors PR #278's synthetic-fixture + live-smoke pattern.

Schema impact: zero (read-only consumer of existing cache shape).

Production-wiring impact: zero (validation tool; no `compute/main.py` hook). #2c per-pillar IC re-baseline will be the first consumer.

Honest-baseline disclaimer per Research Report v1.0 autonomous mission constraint: outputs feed IC/DSR/PBO re-baselining, NOT a backtest. Downstream α claims must net frictions (≥30bp/leg), cite McLean-Pontiff 2016 32% decay, and cap at 2-5% net per the ceiling.

PHASE_STATUS_INFLIGHT.md updated per PR #237 side-file convention. Harness doc TODO list updated: 4 of 6 items now landed.

https://claude.ai/code/session_01AGU8d6pm4u2fQQ5cebg9qa

Co-authored-by: Claude <noreply@anthropic.com>
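
The close-to-close forward return and its `None` edge cases, as listed in the #280 commit message, can be illustrated without pandas. This is a sketch under assumptions rather than the real `forward_returns.py`: closes are held in a plain `dict` instead of a parquet cache, dates snap forward to the next available close within 5 calendar days, and the horizon end is measured from the requested as-of date. The commit message does not state the snap direction or whether the 5d limit counts calendar or trading days, so both are guesses. All function names here are hypothetical.

```python
import calendar
import math
from datetime import date, timedelta

MAX_SNAP_DAYS = 5  # assumed to mean calendar days


def add_months(d: date, months: int) -> date:
    """Shift by whole months, clamping the day to the target month's length."""
    idx = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(idx, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def snap_forward(closes: dict, target: date):
    """First date with a close on or after target, within MAX_SNAP_DAYS; else None."""
    for offset in range(MAX_SNAP_DAYS + 1):
        candidate = target + timedelta(days=offset)
        if candidate in closes:
            return candidate
    return None


def forward_return(closes: dict, as_of: date, horizon_months: int):
    """Naive close-to-close return, or None for any of the listed failure modes."""
    start = snap_forward(closes, as_of)
    end = snap_forward(closes, add_months(as_of, horizon_months))
    if start is None or end is None:
        return None  # no snap, or horizon runs past the cached data
    start_close, end_close = closes[start], closes[end]
    if math.isnan(start_close) or start_close <= 0 or math.isnan(end_close):
        return None
    return end_close / start_close - 1.0


if __name__ == "__main__":
    prices = {date(2026, 1, 2): 100.0, date(2026, 2, 2): 110.0}
    print(forward_return(prices, date(2026, 1, 1), 1))
```

Returning `None` for a censored horizon, rather than a partial-period return, keeps incomplete observations out of any downstream IC calculation.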

…tes (#281)

New `compute/validation/historical_ic.py` orchestrator pairs PR #278's `load_ranking_history` (ranking at T) with PR #280's `compute_forward_returns_batch` (realized return at T + horizon) and computes per-pillar Spearman IC across the historical window, closing the IC re-baseline half of the Phase 4.6 chain.

API:

- `compute_pillar_ic(scores, returns, *, method, min_tickers)` pure cross-sectional IC for one (pillar, date) pair
- `compute_historical_ic_report(start, end, *, horizon_months, pillars, ...)` walks rankings.json snapshots + forward returns cache, aggregates into `HistoricalICReport`
- `format_ic_report(report)` human-readable text rendering
- `PillarICEntry` / `PillarICSummary` / `HistoricalICReport` three frozen-dataclass carriers

Spearman computed as Pearson on rank-transformed series (Spearman 1904 + Conover 1999 §5.4) to avoid pulling scipy into the dep set (QuantRank ships without scipy; pandas' `Series.corr(method='spearman')` requires it transitively).

Drops with descriptive notes:

- cross-section < MIN_TICKERS_PER_DATE = 30 (Grinold-Kahn 2000 §4.2)
- None / NaN / inf in either input
- constant inputs (std=0 → correlation undefined)

Aggregates per pillar: mean / std / median / min / max / IC IR / hit-rate. IC IR = mean/std × sqrt(n_dates) (Grinold-Kahn 2000 §4.4). Hit-rate = fraction of dates with strictly positive IC.

Honest-baseline disclaimer per Research Report v1.0:

- IC reported here is NAIVE: no costs / slippage / sector neutralization. Real net-of-cost IC typically 30-50% smaller per McLean-Pontiff 2016 JF post-publication decay
- The historical universe MUST come from PR #274 members_at() to avoid survivorship bias (Hou-Xue-Zhang 2020 RFS); orchestrator reads the historical universe FROM rankings.json at as-of T which is correct by construction (snapshot itself is historical universe)
- Report is a TIME SERIES + summary, not a single headline number

Tests: 28 new (28 passing). Coverage: module constants, pure IC computation edge cases (perfect ±1.0, constant inputs, NaN drops, below-min cross-section, method validation), summary aggregation math (IC IR formula pinned, hit-rate semantics), orchestrator full-path (one date / multi-date / missing pillar / malformed JSON), text rendering, and a live-git smoke that auto-degrades gracefully when the gitignored price cache is absent.

Schema impact: zero. No new Pydantic / TS / snapshot field.

Production-wiring impact: zero. No compute/main.py import. The orchestrator is purely a validation / re-baseline tool. Downstream PRs (#2d PBO/DSR re-baseline + #2f honest-baseline report) consume the output.

Phase 4.6 chain status: 5 of 6 items now landed (#1/#2 PR #277, #2a PR #278, #2b PR #280, #2c this PR, #2e PR #279; #2d gate kwarg shipped PR #275). #4 PBO/DSR re-baseline needs a warm-CI execution to publish actual numbers; #6 honest-baseline doc closes the chain.

PHASE_STATUS_INFLIGHT.md updated per PR #237 side-file convention. Harness doc TODO list updated: 5 of 6 items now landed.

https://claude.ai/code/session_01AGU8d6pm4u2fQQ5cebg9qa

Co-authored-by: Claude <noreply@anthropic.com>
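
"Spearman as Pearson on rank-transformed series" and the IC IR formula quoted in the #281 commit message can be written with the standard library alone. The sketch below is a plausible reading, not the contents of `historical_ic.py`: the function names are hypothetical, ties receive average ranks, and `ic_ir` uses the sample standard deviation (`statistics.stdev`), which the commit message does not specify.

```python
import math
from statistics import fmean, stdev


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; None when either input has zero variance."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def spearman_ic(scores, returns, min_tickers=30):
    """Cross-sectional rank IC; drops None / NaN / inf pairs before ranking."""
    pairs = [
        (s, r) for s, r in zip(scores, returns)
        if s is not None and r is not None and math.isfinite(s) and math.isfinite(r)
    ]
    if len(pairs) < max(min_tickers, 2):
        return None
    xs, ys = zip(*pairs)
    return pearson(average_ranks(xs), average_ranks(ys))


def ic_ir(ics):
    """mean / std * sqrt(n_dates), per the formula quoted above."""
    if len(ics) < 2:
        return None
    sd = stdev(ics)  # sample std (n - 1 denominator): an assumption
    if sd == 0:
        return None
    return fmean(ics) / sd * math.sqrt(len(ics))
```

Ranking first makes the statistic invariant to any monotone transform of the scores or returns, which is why a cubic relationship still yields an IC of 1.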

…on + CLI (closing the chain) (#282)

Closes the Phase 4.6 honest re-validation harness structurally: 6 of 6 chain items now landed. The only remaining work is a warm-CI execution session that fills the TBD numeric cells.

New artifacts:

- docs/research/honest-baseline-2026-05-27.md: 10-section skeleton with TBD numeric cells in §2 (per-pillar IC) / §3 (PBO/DSR) / §4 (manipulation distribution) / §5 (survivorship-bias delta). Methodology + framing + honest-α ceiling + disclaimer ladder final-form. Citation block: Hou-Xue-Zhang 2020 RFS, McLean-Pontiff 2016 JF, Bailey-Lopez de Prado 2014 JPM, Bailey-Borwein-Lopez de Prado-Zhu 2014 AMS Notices, Grinold-Kahn 2000, Spearman 1904, Conover 1999, Kissell-Glantz 2003.
- scripts/generate_honest_baseline.py: argparse CLI that wires compute_historical_ic_report (PR #281) + compute_manipulation_distribution_shift (PR #279) end-to-end. Text mode emits the disclaimer banner to stderr; JSON mode embeds __banner__ in the payload. Exit codes: 0 (report produced), 1 (input validation), 2 (empty report, a useful CI signal).
- tests/test_validation/test_generate_honest_baseline_cli.py: 17 tests: argparse shape, _parse_date, exit codes, banner emission, JSON payload shape + α ceiling cells + disclaimer string, banner embedding, _report_to_payload with synthetic + populated manip reports, and a constant pin on the banner's 5 mandatory phrases (NAIVE / McLean-Pontiff / 2-5% / Rule 16 / S&P 500).

Honest-baseline disclaimer per Research Report v1.0 autonomous mission:

- IC / PBO / DSR figures NAIVE: no costs / slippage / sector neutralization
- Real net-of-cost IC typically 30-50% smaller per McLean-Pontiff 2016 JF 32% post-publication decay
- Honest net α ceiling: 2-5% per year (hard-coded into doc + JSON)
- Composite formula sacred (Rule 16): never replayed retroactively
- Universe = S&P 500 (502) only
- No trade recommendation of specific tickers; methodological report only

Schema impact: zero. No new Pydantic / TS / snapshot field.

Production-wiring impact: zero. No compute/main.py import.

Smoke run against real repo's recent rankings.json (no live price cache): the orchestrator walks 3 commits, returns n_dates_with_ic=0, and exit code 2 surfaces the missing-cache signal cleanly.

Phase 4.6 chain status: 6 of 6 items structurally landed:

- #1/#2 universe-drift first unit: PR #277
- #2a ranking_history loader: PR #278
- #2b forward_returns loader: PR #280
- #2c per-pillar IC orchestrator: PR #281
- #2d PBO/DSR gate kwarg: PR #275 (warm-CI execution pending)
- #2e manipulation_distribution shift: PR #279
- #2f honest-baseline skeleton + CLI: this PR

Deferred follow-ups (NOT in this PR): warm-CI execution session, --markdown writer mode, --include-pbo-dsr factor-return wiring.

PHASE_STATUS_INFLIGHT.md updated per PR #237 side-file convention. Harness doc TODO list: 6 of 6 items now landed.

https://claude.ai/code/session_01AGU8d6pm4u2fQQ5cebg9qa

Co-authored-by: Claude <noreply@anthropic.com>
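
The three exit codes and the two banner modes described in the #282 commit message can be sketched as a small argparse entry point. Everything specific here is an assumption made for illustration: the flag names `--start`, `--end`, and `--json`, the injected `build_report` callable, and the placeholder banner text are not taken from `scripts/generate_honest_baseline.py`. Only the exit-code meanings, the stderr banner, the `__banner__` key, and the `n_dates_with_ic` field come from the commit message.

```python
import argparse
import json
import sys
from datetime import date

BANNER = "NAIVE figures: no costs / slippage"  # placeholder, not the real banner

EXIT_OK, EXIT_BAD_INPUT, EXIT_EMPTY = 0, 1, 2


def main(argv, build_report):
    """Return 0 (report produced), 1 (input validation), or 2 (empty report)."""
    parser = argparse.ArgumentParser(description="honest-baseline sketch")
    parser.add_argument("--start", required=True)  # hypothetical flag
    parser.add_argument("--end", required=True)    # hypothetical flag
    parser.add_argument("--json", action="store_true")
    args = parser.parse_args(argv)

    try:
        start = date.fromisoformat(args.start)
        end = date.fromisoformat(args.end)
    except ValueError:
        return EXIT_BAD_INPUT
    if start > end:
        return EXIT_BAD_INPUT

    report = build_report(start, end)  # injected so the sketch stays testable
    if args.json:
        # JSON mode: the banner travels inside the payload.
        print(json.dumps({"__banner__": BANNER, **report}))
    else:
        # Text mode: the banner goes to stderr, the report to stdout.
        print(BANNER, file=sys.stderr)
        print(report)

    return EXIT_EMPTY if report.get("n_dates_with_ic", 0) == 0 else EXIT_OK


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:], lambda s, e: {"n_dates_with_ic": 0}))
```

One thing to watch in a design like this: argparse's own usage errors exit with status 2 by default, so a real script would need to decide how that interacts with the empty-report code.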

vercel Bot deployed to Preview May 27, 2026 13:38 View deployment

dackclup marked this pull request as ready for review May 27, 2026 13:51

dackclup merged commit e169aba into main May 27, 2026
4 checks passed

dackclup deleted the claude/phase-4.6-ranking-history-loader branch May 27, 2026 13:51

dackclup mentioned this pull request May 27, 2026

feat(validation): Phase 4.6 task #2e — manipulation_index distribution shift #279

Merged

dackclup mentioned this pull request May 27, 2026

feat(validation): Phase 4.6 task #2b — forward-return loader from gitignored price cache #280

Merged

5 tasks

dackclup mentioned this pull request May 27, 2026

feat(validation): Phase 4.6 task #2c — per-pillar IC at historical dates #281

Merged

5 tasks

dackclup mentioned this pull request May 27, 2026

docs+scripts(validation): Phase 4.6 task #2f — honest-baseline skeleton + CLI (closing the chain) #282

Merged

6 tasks

dackclup merged 1 commit into main from claude/phase-4.6-ranking-history-loader
