feat(validation): Phase 4.6 task #2e — manipulation_index distribution shift by dackclup · Pull Request #279 · dackclup/quantrank

dackclup · 2026-05-27T13:57:28Z

Summary

Closes Phase 4.6 task #2e — the manipulation_index distribution shift report. Consumes #2a's rankings.json time-series loader (PR #278) and answers the honest question: has the cohort of flagged stocks materially changed across the cron's history?

A universe-mean drift > ~5 pts would signal Phase 4.5e weight recalibration is needed (per Q3 2026-08-19 cohort-audit gate).
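The recalibration gate described above can be sketched as follows. This is a minimal illustration, not the code in `manipulation_distribution.py`: the constant `DRIFT_THRESHOLD_PTS`, the helper names, and the choice to compare the absolute value of the delta are all assumptions made for the example.

```python
from statistics import fmean

# Assumed value: the PR text only says "> ~5 pts"; the real constant is not shown.
DRIFT_THRESHOLD_PTS = 5.0


def mean_drift(first_scores, last_scores):
    """Return (last-date mean) - (first-date mean), skipping None scores.

    Returns None when either date has no usable scores.
    """
    first = [s for s in first_scores if s is not None]
    last = [s for s in last_scores if s is not None]
    if not first or not last:
        return None
    return fmean(last) - fmean(first)


def needs_recalibration(delta, threshold=DRIFT_THRESHOLD_PTS):
    """Flag a window whose universe-mean drift exceeds the threshold.

    Drift is treated as two-sided (assumption), so the absolute value is compared.
    """
    return delta is not None and abs(delta) > threshold


if __name__ == "__main__":
    delta = mean_drift([0.0, 10.0, None], [10.0, 12.0])
    print(delta, needs_recalibration(delta))
```

With the live artifact below (Δmean = +0.00), such a check would not fire.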

What ships

File	Lines	Role
`compute/validation/manipulation_distribution.py`	~250	2 dataclasses (`DistributionSummary`, `ShiftReport`) + `compute_manipulation_distribution_shift()` + `format_shift_report()`
`tests/test_validation/test_manipulation_distribution.py`	~225	11 tests covering band boundaries, partition correctness, empty/null/single/multi-date paths, text rendering, live-git smoke

Real-world artifact (live, window 2026-05-01 → 2026-05-27)

Manipulation-index distribution shift report
  window           : [2026-05-01, 2026-05-27]
  n_dates          : 3
  n_unique_tickers : 503
  note             : first=2026-05-22, last=2026-05-26

  Window-end deltas (last_date − first_date):
    Δmean       : +0.00
    Δstd        : +0.00
    ΔHIGH count : +0 (2 → 2)

  Per-date summary:
    date          n   mean    std    p75    p95   HIGH   top
    2026-05-22  502   4.38   9.28   5.00  25.00      2   SMCI=84.0, WAT=64.0, NVDA=48.0
    2026-05-23  502   4.38   9.28   5.00  25.00      2   SMCI=84.0, WAT=64.0, NVDA=48.0
    2026-05-26  502   4.38   9.28   5.00  25.00      2   SMCI=84.0, WAT=64.0, NVDA=48.0

Findings:

Distribution stable across the 5-day window (expected — pillars change slowly)
Top-3 invariant: SMCI 84, WAT 64, NVDA 48 — matches Phase 4.5f production-verified ManipulationRiskCard fire-rate snapshot
Universe mean 4.38 → LOW band; HIGH count 2 → within Phase 4.5f spec target (1-3 stocks)
No recalibration signal at this window length
A longer window (≥ 90 days of cron history) would let this report detect drift; the chain is ready when cron accumulates

Hard rules preserved

✅ Rule 9 — no schema change (read-only consumer)
✅ Rule 16 — N/A (no scoring change)
✅ Rule 18 — diagnostic surface ships in same PR
✅ License — pure stdlib + pandas; no new deps

Verification

Check	Result
`pytest tests/test_validation/test_manipulation_distribution.py`	✅ 11/11 pass in 0.70s
`ruff check`	✅ clean
Live-git artifact	✅ generated cleanly above
Test count delta	+11

Next in chain (independent of #2b/2c/2d)

#	Item	Effort	Blocker
2b	Forward-return from `compute/cache/prices/`	0.5d	gitignored cache
2c	Per-pillar IC at historical dates	1d	needs 2a + 2b
2d	PBO/DSR re-baseline via PR #275 kwarg	1d	needs 2c
2f	Honest-baseline report	0.5d	needs 2d

Subscribe-after-open suggestion: same pattern as #271-#278.

Generated by Claude Code

…n shift

Closes follow-up #2e from ``docs/research/historical-revalidation-harness.md``. Consumes the #2a rankings.json time-series loader (PR #278) and reports the manipulation_index distribution shift across the cron's lifetime: mean / std / quantiles + fire-rate by band (LOW [0,20), MODERATE [20,50), HIGH [50,∞)) per date + window-end deltas.

This answers the honest question: **has the cohort of flagged stocks materially changed across the cron's history?**

A universe-mean drift > ~5 pts would signal Phase 4.5e weight recalibration is needed (per Q3 2026-08-19 cohort-audit gate).

## What ships

- `compute/validation/manipulation_distribution.py` (NEW, ~250 LOC):
  - `DistributionSummary` dataclass: per-date snapshot (n / mean / std / median / q25 / q75 / q95 / max + band counts + top-3 tickers)
  - `ShiftReport` dataclass: aggregate across window + first-to-last deltas (mean_delta / std_delta / high_count_delta) + note
  - `compute_manipulation_distribution_shift(start_date, end_date, repo=None)`: main entry; pure-function wrapping #2a's loader
  - `format_shift_report(report, max_dates=20)`: human-readable text rendering; truncates long windows with "..." marker
- `tests/test_validation/test_manipulation_distribution.py` (NEW, 11 tests):
  - Band-boundary constants pin (LOW < 20 ≤ MODERATE < 50 ≤ HIGH)
  - `_summarize_one_date` band partition correctness + empty + top-3 ordering
  - `compute_shift` empty-window / all-null-window / single-date / two-date-with-deltas paths (monkeypatched loader)
  - `format_shift_report` header + delta + cap rendering
  - Live-git smoke against the repo's recent cron commits

## Real-world artifact (live repo, window 2026-05-01 → 2026-05-27)

3 cron dates available on main:

    date          n   mean    std    p75    p95   HIGH   top
    2026-05-22  502   4.38   9.28   5.00  25.00      2   SMCI=84.0, WAT=64.0, NVDA=48.0
    2026-05-23  502   4.38   9.28   5.00  25.00      2   SMCI=84.0, WAT=64.0, NVDA=48.0
    2026-05-26  502   4.38   9.28   5.00  25.00      2   SMCI=84.0, WAT=64.0, NVDA=48.0

    Δmean=+0.00, Δstd=+0.00, ΔHIGH=+0 (2 → 2)

Distribution is **stable** across the window (expected for 5-day horizon; pillar inputs change slowly). Top-3 invariant: SMCI 84, WAT 64, NVDA 48, which matches the Phase 4.5f production-verified ``ManipulationRiskCard`` fire-rate snapshot. Universe mean 4.38 sits solidly in LOW band; only 2 tickers in HIGH band (Phase 4.5f spec target: 1-3 stocks). No recalibration signal.

A longer window (≥ 90 days) would let this report detect drift; the chain is now ready when cron history accumulates.

## Hard rules preserved

- ✅ Rule 9: no schema change (read-only consumer of rankings.json)
- ✅ Rule 16: N/A (no scoring change)
- ✅ Rule 18: diagnostic surface ships in same PR
- ✅ License: pure stdlib + pandas; no new deps
- ✅ Universe S&P 500 only

## Verification

- `pytest tests/test_validation/test_manipulation_distribution.py`: 11/11 pass in 0.70s
- `ruff check`: clean (linter trimmed unused imports + sorted)
- Live-git artifact above generated cleanly

## Next in the chain

Per ``docs/research/historical-revalidation-harness.md`` Future-work TODO:

- #2b forward-return computation from `compute/cache/prices/` (0.5d); gitignored cache, CI-only data
- #2c per-pillar IC at historical dates (1d, needs 2a + 2b)
- #2d PBO/DSR re-baseline via PR #275's `universe_provider` kwarg (1d, needs 2c)
- #2f honest-baseline report (0.5d, needs 2d)

#2e (this PR) is independent of 2b/2c and could ship before, in parallel with, or after them.
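The band partition and per-date snapshot described in this commit message can be illustrated with a short sketch. It is not the shipped `DistributionSummary` or `_summarize_one_date`: the names `DateSummary`, `band_of`, and `summarize`, the reduced field set, and the use of population standard deviation are assumptions for the example. Only the band edges (LOW [0,20), MODERATE [20,50), HIGH [50,∞)) come from the text above.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev
from typing import Dict, List, Optional, Tuple

LOW_MAX = 20.0       # LOW is [0, 20)
MODERATE_MAX = 50.0  # MODERATE is [20, 50); HIGH is [50, inf)


def band_of(score: float) -> str:
    """Map a manipulation_index score to its band (half-open intervals)."""
    if score < LOW_MAX:
        return "LOW"
    if score < MODERATE_MAX:
        return "MODERATE"
    return "HIGH"


@dataclass(frozen=True)
class DateSummary:
    """Hypothetical, trimmed-down per-date snapshot."""
    n: int
    mean: float
    std: float
    band_counts: Dict[str, int]
    top3: List[Tuple[str, float]]


def summarize(scores: Dict[str, Optional[float]]) -> Optional[DateSummary]:
    """Summarize one date's {ticker: score} mapping; None scores are dropped."""
    valid = {t: s for t, s in scores.items() if s is not None}
    if not valid:
        return None  # empty or all-null date
    values = list(valid.values())
    counts = {"LOW": 0, "MODERATE": 0, "HIGH": 0}
    for v in values:
        counts[band_of(v)] += 1
    # Highest scores first; keep the three largest.
    top3 = sorted(valid.items(), key=lambda kv: kv[1], reverse=True)[:3]
    # Population std is an assumption; the PR does not say which estimator is used.
    return DateSummary(len(values), fmean(values), pstdev(values), counts, top3)


if __name__ == "__main__":
    print(summarize({"SMCI": 84.0, "WAT": 64.0, "NVDA": 48.0, "XYZ": None}))
```

Keeping the intervals half-open means a score of exactly 20 or 50 lands in the upper band, which is what the "LOW < 20 ≤ MODERATE < 50 ≤ HIGH" boundary test pins.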

vercel · 2026-05-27T13:57:34Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
quantrank	Ready	Preview, Comment	May 27, 2026 1:58pm

…tes (#281)

New `compute/validation/historical_ic.py` orchestrator pairs PR #278's `load_ranking_history` (ranking at T) with PR #280's `compute_forward_returns_batch` (realized return at T + horizon) and computes per-pillar Spearman IC across the historical window. This closes the IC re-baseline half of the Phase 4.6 chain.

API:
- `compute_pillar_ic(scores, returns, *, method, min_tickers)`: pure cross-sectional IC for one (pillar, date) pair
- `compute_historical_ic_report(start, end, *, horizon_months, pillars, ...)`: walks rankings.json snapshots + forward returns cache, aggregates into `HistoricalICReport`
- `format_ic_report(report)`: human-readable text rendering
- `PillarICEntry` / `PillarICSummary` / `HistoricalICReport`: three frozen-dataclass carriers

Spearman computed as Pearson on rank-transformed series (Spearman 1904 + Conover 1999 §5.4) to avoid pulling scipy into the dep set (QuantRank ships without scipy; pandas' `Series.corr(method='spearman')` requires it transitively).

Drops with descriptive notes:
- cross-section < MIN_TICKERS_PER_DATE = 30 (Grinold-Kahn 2000 §4.2)
- None / NaN / inf in either input
- constant inputs (std=0 → correlation undefined)

Aggregates per pillar: mean / std / median / min / max / IC IR / hit-rate. IC IR = mean/std × sqrt(n_dates) (Grinold-Kahn 2000 §4.4). Hit-rate = fraction of dates with strictly positive IC.

Honest-baseline disclaimer per Research Report v1.0:
- IC reported here is NAIVE: no costs / slippage / sector neutralization. Real net-of-cost IC typically 30-50% smaller per McLean-Pontiff 2016 JF post-publication decay
- The historical universe MUST come from PR #274 members_at() to avoid survivorship bias (Hou-Xue-Zhang 2020 RFS); orchestrator reads the historical universe FROM rankings.json at as-of T which is correct by construction (snapshot itself is historical universe)
- Report is a TIME SERIES + summary, not a single headline number

Tests: 28 new (28 passing). Coverage: module constants, pure IC computation edge cases (perfect ±1.0, constant inputs, NaN drops, below-min cross-section, method validation), summary aggregation math (IC IR formula pinned, hit-rate semantics), orchestrator full-path (one date / multi-date / missing pillar / malformed JSON), text rendering, and a live-git smoke that auto-degrades gracefully when the gitignored price cache is absent.

Schema impact: zero. No new Pydantic / TS / snapshot field.

Production-wiring impact: zero. No compute/main.py import. The orchestrator is purely a validation / re-baseline tool. Downstream PRs (#2d PBO/DSR re-baseline + #2f honest-baseline report) consume the output.

Phase 4.6 chain status: 5 of 6 items now landed (#1/#2 PR #277, #2a PR #278, #2b PR #280, #2c this PR, #2e PR #279; #2d gate kwarg shipped PR #275). #4 PBO/DSR re-baseline needs a warm-CI execution to publish actual numbers; #6 honest-baseline doc closes the chain.

PHASE_STATUS_INFLIGHT.md updated per PR #237 side-file convention. Harness doc TODO list updated: 5 of 6 items now landed.

https://claude.ai/code/session_01AGU8d6pm4u2fQQ5cebg9qa

Co-authored-by: Claude <noreply@anthropic.com>
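To make the scipy-free Spearman and the IC IR formula in this commit message concrete, here is a stdlib-only sketch. It is not the code in `historical_ic.py`: the function names (`average_ranks`, `pearson`, `spearman_ic`, `ic_ir`), the average-rank tie handling, and the use of sample standard deviation in IC IR are assumptions. The drop rules (None / NaN / inf, constant inputs, cross-section below 30) and the formula IC IR = mean/std × sqrt(n_dates) follow the text above.

```python
import math
from statistics import fmean, stdev

MIN_TICKERS_PER_DATE = 30  # value quoted in the commit message


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; None when either input is constant."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # std = 0, correlation undefined
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def spearman_ic(scores, returns, min_tickers=MIN_TICKERS_PER_DATE):
    """Spearman IC as Pearson on ranks, after dropping unusable pairs."""
    pairs = [
        (s, r)
        for s, r in zip(scores, returns)
        if s is not None and r is not None and math.isfinite(s) and math.isfinite(r)
    ]
    if len(pairs) < min_tickers:
        return None  # cross-section too small
    xs, ys = zip(*pairs)
    return pearson(average_ranks(xs), average_ranks(ys))


def ic_ir(ics):
    """IC IR = mean / std * sqrt(n_dates); sample std is an assumption."""
    if len(ics) < 2:
        return None
    sd = stdev(ics)
    if sd == 0:
        return None
    return fmean(ics) / sd * math.sqrt(len(ics))
```

Ranking both series first and then applying Pearson gives the same value as the usual Spearman definition, including when ties are present, as long as ties receive average ranks.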

…on + CLI (closing the chain) (#282)

Closes the Phase 4.6 honest re-validation harness structurally: 6 of 6 chain items now landed. The only remaining work is a warm-CI execution session that fills the TBD numeric cells.

New artifacts:
- docs/research/honest-baseline-2026-05-27.md: 10-section skeleton with TBD numeric cells in §2 (per-pillar IC) / §3 (PBO/DSR) / §4 (manipulation distribution) / §5 (survivorship-bias delta). Methodology + framing + honest-α ceiling + disclaimer ladder final-form. Citation block: Hou-Xue-Zhang 2020 RFS, McLean-Pontiff 2016 JF, Bailey-Lopez de Prado 2014 JPM, Bailey-Borwein-Lopez de Prado-Zhu 2014 AMS Notices, Grinold-Kahn 2000, Spearman 1904, Conover 1999, Kissell-Glantz 2003.
- scripts/generate_honest_baseline.py: argparse CLI that wires compute_historical_ic_report (PR #281) + compute_manipulation_distribution_shift (PR #279) end-to-end. Text mode emits the disclaimer banner to stderr; JSON mode embeds __banner__ in the payload. Exit codes: 0 (report produced), 1 (input validation), 2 (empty report, a useful CI signal).
- tests/test_validation/test_generate_honest_baseline_cli.py: 17 tests covering argparse shape, _parse_date, exit codes, banner emission, JSON payload shape + α ceiling cells + disclaimer string, banner embedding, _report_to_payload with synthetic + populated manip reports, and a constant pin on the banner's 5 mandatory phrases (NAIVE / McLean-Pontiff / 2-5% / Rule 16 / S&P 500).

Honest-baseline disclaimer per Research Report v1.0 autonomous mission:
- IC / PBO / DSR figures NAIVE: no costs / slippage / sector neutralization
- Real net-of-cost IC typically 30-50% smaller per McLean-Pontiff 2016 JF 32% post-publication decay
- Honest net α ceiling: 2-5% per year (hard-coded into doc + JSON)
- Composite formula sacred (Rule 16): never replayed retroactively
- Universe = S&P 500 (502) only
- No trade recommendation of specific tickers; methodological report only

Schema impact: zero. No new Pydantic / TS / snapshot field.

Production-wiring impact: zero. No compute/main.py import.

Smoke run against real repo's recent rankings.json (no live price cache): orchestrator walks 3 commits, returns n_dates_with_ic=0, exit code 2 surfaces missing-cache signal cleanly.

Phase 4.6 chain status, 6 of 6 items structurally landed:
- #1/#2 universe-drift first unit: PR #277
- #2a ranking_history loader: PR #278
- #2b forward_returns loader: PR #280
- #2c per-pillar IC orchestrator: PR #281
- #2d PBO/DSR gate kwarg: PR #275 (warm-CI execution pending)
- #2e manipulation_distribution shift: PR #279
- #2f honest-baseline skeleton + CLI: this PR

Deferred follow-ups (NOT in this PR): warm-CI execution session, --markdown writer mode, --include-pbo-dsr factor-return wiring.

PHASE_STATUS_INFLIGHT.md updated per PR #237 side-file convention. Harness doc TODO list: 6 of 6 items now landed.

https://claude.ai/code/session_01AGU8d6pm4u2fQQ5cebg9qa

Co-authored-by: Claude <noreply@anthropic.com>
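The exit-code contract described above (0 report produced, 1 input validation, 2 empty report) could look roughly like the sketch below. This is not `scripts/generate_honest_baseline.py`: the flag names `--start` / `--end`, the `run` and `parse_date` helpers, and the injected `build_report` callable are hypothetical. Only the three exit codes and the `n_dates_with_ic` field name come from the commit message.

```python
import argparse
import sys
from datetime import date

EXIT_OK = 0         # report produced
EXIT_BAD_INPUT = 1  # input validation failed
EXIT_EMPTY = 2      # report is empty (for example, no price cache)


def parse_date(text):
    """Parse an ISO date (YYYY-MM-DD); raises ValueError on bad input."""
    return date.fromisoformat(text)


def run(argv, build_report):
    """Hypothetical CLI body; build_report(start, end) returns a dict."""
    parser = argparse.ArgumentParser(description="honest-baseline sketch")
    parser.add_argument("--start", required=True)  # assumed flag name
    parser.add_argument("--end", required=True)    # assumed flag name
    args = parser.parse_args(argv)

    try:
        start, end = parse_date(args.start), parse_date(args.end)
    except ValueError as exc:
        print(f"invalid date: {exc}", file=sys.stderr)
        return EXIT_BAD_INPUT
    if start > end:
        print("start must not be after end", file=sys.stderr)
        return EXIT_BAD_INPUT

    report = build_report(start, end)
    if report.get("n_dates_with_ic", 0) == 0:
        return EXIT_EMPTY  # lets CI distinguish "ran, but nothing to report"
    print(report)
    return EXIT_OK


if __name__ == "__main__":
    # Stub report builder so the sketch runs without the real orchestrator.
    sys.exit(run(sys.argv[1:], lambda s, e: {"n_dates_with_ic": 0}))
```

One caveat with this layout: argparse itself exits with status 2 when it rejects the command line, so a script that reserves 2 for "empty report" would need to handle parse errors separately if CI must tell the two cases apart.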

…clones (#284)

Root cause: `test_compute_shift_live_repo_recent_window` asserts `report.n_dates >= 1` for window [2026-05-22, 2026-05-27], but CI's `actions/checkout@v6` defaults to fetch-depth=1 (shallow). On shallow clones, `git log -- frontend/public/data/rankings.json` returns only the HEAD commit, which (a) didn't touch rankings.json (release commit) and (b) is dated 2026-05-28 (outside the test window) → empty report → AssertionError → CI Python step exit code 1.

This is the failure mode triggered on PR #283's main-push CI run (26526262716, Python job 78138619376; failing after 2m 43s). Sibling smoke tests are resilient because they use `HEAD` directly (`git show`) or don't constrain by a date window.

Fix: pytest.skip() when report.n_dates == 0 AND note == "empty window" (the signature of a shallow-clone walk). Full clones still exercise the real assertion. Skip message cites the CI checkout convention so future readers understand why the test is intentionally lenient.

Verified:
- Shallow clone (CI): 10 passed + 1 skipped (graceful)
- Full clone (sandbox / dev): 11 passed (assertion exercised)
- ruff: clean

Alternative considered (not adopted): bump `actions/checkout` fetch-depth to 0 (full history). Rejected because (a) the live-smoke tests are designed to skip when their data substrate isn't available (matches the no-price-cache pattern in `test_forward_returns.py` and `test_historical_ic.py`), (b) full git fetch adds 30-60s to CI cold start with diminishing returns, and (c) the universe of tests that need rankings.json history is small and bounded to validation/.

Phase 4.6 task #2e (PR #279) regression; applies same pattern as the other 4 live-smoke tests already use. No schema / compute / output JSON change.

https://claude.ai/code/session_01AGU8d6pm4u2fQQ5cebg9qa

Co-authored-by: Claude <noreply@anthropic.com>
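The skip condition described in this fix can be sketched as below. It is an illustration rather than the actual test: `FakeReport`, `looks_like_shallow_clone`, and `check_live_window` are hypothetical names, and the skip message wording is invented. The condition itself (n_dates == 0 and note == "empty window") and the assertion `n_dates >= 1` are taken from the commit message.

```python
from dataclasses import dataclass


@dataclass
class FakeReport:
    """Stand-in for the real ShiftReport; only the two fields used here."""
    n_dates: int
    note: str


def looks_like_shallow_clone(report):
    """True when the walk found nothing, the signature of a depth-1 checkout."""
    return report.n_dates == 0 and report.note == "empty window"


def check_live_window(report):
    """Body of a hypothetical live-git smoke test."""
    import pytest  # imported lazily so the helper above works without pytest

    if looks_like_shallow_clone(report):
        # Assumed wording; the real message cites the CI checkout convention.
        pytest.skip("shallow clone: no rankings.json history in the test window")
    # Full clones still exercise the real assertion.
    assert report.n_dates >= 1
```

Keying the skip on both fields keeps the test strict elsewhere: a report with zero dates for some other reason (for instance a different note) would still fail the assertion instead of being skipped.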

vercel Bot deployed to Preview May 27, 2026 13:58

dackclup marked this pull request as ready for review May 27, 2026 14:19

dackclup merged commit 6a712e8 into main May 27, 2026
4 checks passed

dackclup deleted the claude/phase-4.6-manipulation-distribution-shift branch May 27, 2026 14:19

This was referenced May 27, 2026

feat(validation): Phase 4.6 task #2b — forward-return loader from gitignored price cache #280

Merged

feat(validation): Phase 4.6 task #2c — per-pillar IC at historical dates #281

Merged

dackclup mentioned this pull request May 27, 2026

docs+scripts(validation): Phase 4.6 task #2f — honest-baseline skeleton + CLI (closing the chain) #282

Merged

6 tasks

dackclup mentioned this pull request May 27, 2026

fix(test): make manipulation_distribution smoke resilient to shallow clones #284

Merged

6 tasks
