feat(validation): Phase 4.6 task #2e — manipulation_index distribution shift #279

Merged
dackclup merged 1 commit into main from claude/phase-4.6-manipulation-distribution-shift on May 27, 2026
Conversation

@dackclup
Owner

Summary

Closes Phase 4.6 task #2e — the manipulation_index distribution shift report. Consumes #2a's rankings.json time-series loader (PR #278) and answers the honest question: has the cohort of flagged stocks materially changed across the cron's history?

A universe-mean drift of more than ~5 points would signal that Phase 4.5e weight recalibration is needed (per the Q3 2026-08-19 cohort-audit gate).

What ships

| File | Lines | Role |
| --- | --- | --- |
| `compute/validation/manipulation_distribution.py` | ~250 | 2 dataclasses (`DistributionSummary`, `ShiftReport`) + `compute_manipulation_distribution_shift()` + `format_shift_report()` |
| `tests/test_validation/test_manipulation_distribution.py` | ~225 | 11 tests covering band boundaries, partition correctness, empty/null/single/multi-date paths, text rendering, live-git smoke |

Real-world artifact (live, window 2026-05-01 → 2026-05-27)

Manipulation-index distribution shift report
  window           : [2026-05-01, 2026-05-27]
  n_dates          : 3
  n_unique_tickers : 503
  note             : first=2026-05-22, last=2026-05-26

  Window-end deltas (last_date − first_date):
    Δmean       : +0.00
    Δstd        : +0.00
    ΔHIGH count : +0 (2 → 2)

  Per-date summary:
    date          n   mean    std    p75    p95   HIGH   top
    2026-05-22  502   4.38   9.28   5.00  25.00      2   SMCI=84.0, WAT=64.0, NVDA=48.0
    2026-05-23  502   4.38   9.28   5.00  25.00      2   SMCI=84.0, WAT=64.0, NVDA=48.0
    2026-05-26  502   4.38   9.28   5.00  25.00      2   SMCI=84.0, WAT=64.0, NVDA=48.0

Findings:

  • Distribution stable across the 5-day window (expected — pillars change slowly)
  • Top-3 invariant: SMCI 84, WAT 64, NVDA 48 — matches Phase 4.5f production-verified ManipulationRiskCard fire-rate snapshot
  • Universe mean 4.38 → LOW band; HIGH count 2 → within Phase 4.5f spec target (1-3 stocks)
  • No recalibration signal at this window length
  • A longer window (≥ 90 days of cron history) would let this report detect drift; the chain is ready when cron accumulates

Hard rules preserved

  • ✅ Rule 9 — no schema change (read-only consumer)
  • ✅ Rule 16 — N/A (no scoring change)
  • ✅ Rule 18 — diagnostic surface ships in same PR
  • ✅ License — pure stdlib + pandas; no new deps

Verification

| Check | Result |
| --- | --- |
| `pytest tests/test_validation/test_manipulation_distribution.py` | ✅ 11/11 pass in 0.70s |
| `ruff check` | ✅ clean |
| Live-git artifact | ✅ generated cleanly above |
| Test count delta | +11 |

Next in chain (independent of #2b/2c/2d)

| # | Item | Effort | Blocker |
| --- | --- | --- | --- |
| 2b | Forward-return from `compute/cache/prices/` | 0.5d | gitignored cache |
| 2c | Per-pillar IC at historical dates | 1d | needs 2a + 2b |
| 2d | PBO/DSR re-baseline via PR #275 kwarg | 1d | needs 2c |
| 2f | Honest-baseline report | 0.5d | needs 2d |

Subscribe-after-open suggestion: same pattern as #271-#278.


Generated by Claude Code

…n shift

Closes follow-up #2e from
``docs/research/historical-revalidation-harness.md``. Consumes the
#2a rankings.json time-series loader (PR #278) and reports the
manipulation_index distribution shift across the cron's lifetime:
mean / std / quantiles + fire-rate by band (LOW [0,20), MODERATE
[20,50), HIGH [50,∞)) per date + window-end deltas.
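
The band partition and per-date snapshot described above can be sketched as follows. This is a minimal illustration, not the module's code: `band_of` and `summarize_one_date` here are hypothetical helpers, the band edges come from the text above, and the tickers `AAA`, `BBB`, `CCC` are toy entries added to exercise the boundaries.

```python
from __future__ import annotations

from statistics import fmean

# Band edges as stated above: LOW [0, 20), MODERATE [20, 50), HIGH [50, inf).
MODERATE_MIN = 20.0
HIGH_MIN = 50.0


def band_of(value: float) -> str:
    """Map a manipulation_index value to its band (lower edge inclusive)."""
    if value >= HIGH_MIN:
        return "HIGH"
    if value >= MODERATE_MIN:
        return "MODERATE"
    return "LOW"


def summarize_one_date(scores: dict[str, float | None]) -> dict:
    """Hypothetical per-date summary: n, mean, band counts, top-3 tickers."""
    # Null scores are excluded from every statistic (an assumption).
    valid = {t: v for t, v in scores.items() if v is not None}
    counts = {"LOW": 0, "MODERATE": 0, "HIGH": 0}
    for v in valid.values():
        counts[band_of(v)] += 1
    # Highest score first; ties broken by ticker for a deterministic order.
    top = sorted(valid.items(), key=lambda kv: (-kv[1], kv[0]))[:3]
    return {
        "n": len(valid),
        "mean": fmean(valid.values()) if valid else None,
        "bands": counts,
        "top": top,
    }


# Top-3 values are taken from the artifact in this PR; the rest are toy data.
snapshot = {"SMCI": 84.0, "WAT": 64.0, "NVDA": 48.0,
            "AAA": 20.0, "BBB": 19.99, "CCC": None}
summary = summarize_one_date(snapshot)
```

Keeping the lower edge inclusive means a score of exactly 20 counts as MODERATE and exactly 50 as HIGH, which is the behaviour a band-boundary test would need to pin.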

This answers the honest question — **has the cohort of flagged stocks
materially changed across the cron's history?** A universe-mean drift
> ~5 pts would signal Phase 4.5e weight recalibration is needed (per
Q3 2026-08-19 cohort-audit gate).
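
A sketch of how the window-end deltas and the ~5-point rule of thumb could be evaluated. The `DateStats` carrier, the `window_end_deltas` helper, and the choice to compare the absolute drift against the threshold are all assumptions for illustration; the real `ShiftReport` fields may differ.

```python
from dataclasses import dataclass

# The "~5 pts" rule of thumb quoted above; treated here as a hard cutoff.
DRIFT_THRESHOLD_PTS = 5.0


@dataclass(frozen=True)
class DateStats:
    """Hypothetical per-date carrier holding only what the check needs."""
    date: str          # ISO date, so lexicographic order is chronological
    mean: float
    high_count: int


def window_end_deltas(series: list[DateStats]) -> dict:
    """Compare the last date against the first date in the window."""
    if len(series) < 2:
        # A single date (or none) has no first-to-last delta.
        return {"mean_delta": None, "high_count_delta": None,
                "recalibration_signal": False}
    ordered = sorted(series, key=lambda s: s.date)
    first, last = ordered[0], ordered[-1]
    mean_delta = last.mean - first.mean
    return {
        "mean_delta": mean_delta,
        "high_count_delta": last.high_count - first.high_count,
        # Assumption: drift in either direction counts as a signal.
        "recalibration_signal": abs(mean_delta) > DRIFT_THRESHOLD_PTS,
    }


# Values from the three cron dates reported in this PR.
observed = [
    DateStats("2026-05-22", 4.38, 2),
    DateStats("2026-05-23", 4.38, 2),
    DateStats("2026-05-26", 4.38, 2),
]
result = window_end_deltas(observed)
```

With the observed data both deltas are zero, so no signal fires. A comparison of only the two endpoints cannot see excursions in the middle of a long window, which is one reason to keep the per-date table alongside the deltas.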

## What ships

- `compute/validation/manipulation_distribution.py` (NEW, ~250 LOC):
  - `DistributionSummary` dataclass — per-date snapshot (n / mean / std
    / median / q25 / q75 / q95 / max + band counts + top-3 tickers)
  - `ShiftReport` dataclass — aggregate across window + first-to-last
    deltas (mean_delta / std_delta / high_count_delta) + note
  - `compute_manipulation_distribution_shift(start_date, end_date,
    repo=None)` — main entry; pure-function wrapping #2a's loader
  - `format_shift_report(report, max_dates=20)` — human-readable text
    rendering; truncates long windows with "..." marker

- `tests/test_validation/test_manipulation_distribution.py` (NEW, 11 tests):
  - Band-boundary constants pin (LOW < 20 ≤ MODERATE < 50 ≤ HIGH)
  - `_summarize_one_date` band partition correctness + empty + top-3
    ordering
  - `compute_shift` empty-window / all-null-window / single-date /
    two-date-with-deltas paths (monkeypatched loader)
  - `format_shift_report` header + delta + cap rendering
  - Live-git smoke against the repo's recent cron commits

## Real-world artifact (live repo, window 2026-05-01 → 2026-05-27)

3 cron dates available on main:

    date          n   mean    std    p75    p95   HIGH   top
    2026-05-22  502   4.38   9.28   5.00  25.00      2   SMCI=84.0, WAT=64.0, NVDA=48.0
    2026-05-23  502   4.38   9.28   5.00  25.00      2   SMCI=84.0, WAT=64.0, NVDA=48.0
    2026-05-26  502   4.38   9.28   5.00  25.00      2   SMCI=84.0, WAT=64.0, NVDA=48.0

    Δmean=+0.00, Δstd=+0.00, ΔHIGH=+0 (2 → 2)

Distribution is **stable** across the window (expected for 5-day
horizon — pillar inputs change slowly). Top-3 invariant: SMCI 84,
WAT 64, NVDA 48 — matches Phase 4.5f production-verified
``ManipulationRiskCard`` fire-rate snapshot. Universe mean 4.38 sits
solidly in LOW band; only 2 tickers in HIGH band (Phase 4.5f spec
target: 1-3 stocks). No recalibration signal.

A longer window (≥ 90 days) would let this report detect drift; the
chain is now ready when cron history accumulates.

## Hard rules preserved

- ✅ Rule 9 — no schema change (read-only consumer of rankings.json)
- ✅ Rule 16 — N/A (no scoring change)
- ✅ Rule 18 — diagnostic surface ships in same PR
- ✅ License — pure stdlib + pandas; no new deps
- ✅ Universe S&P 500 only

## Verification

- `pytest tests/test_validation/test_manipulation_distribution.py` —
  11/11 pass in 0.70s
- `ruff check` — clean (linter trimmed unused imports + sorted)
- Live-git artifact above generated cleanly

## Next in the chain

Per ``docs/research/historical-revalidation-harness.md`` Future-work
TODO:

- #2b forward-return computation from `compute/cache/prices/` (0.5d) —
  gitignored cache; CI-only data
- #2c per-pillar IC at historical dates (1d, needs 2a + 2b)
- #2d PBO/DSR re-baseline via PR #275's `universe_provider` kwarg
  (1d, needs 2c)
- #2f honest-baseline report (0.5d, needs 2d)

#2e (this PR) is independent of 2b/2c — could ship before, in
parallel with, or after them.
@vercel

vercel Bot commented May 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| quantrank | Ready | Preview, Comment | May 27, 2026 1:58pm |

@dackclup dackclup marked this pull request as ready for review May 27, 2026 14:19
@dackclup dackclup merged commit 6a712e8 into main May 27, 2026
4 checks passed
@dackclup dackclup deleted the claude/phase-4.6-manipulation-distribution-shift branch May 27, 2026 14:19
dackclup added a commit that referenced this pull request May 27, 2026
…tes (#281)

New `compute/validation/historical_ic.py` orchestrator pairs PR #278's
`load_ranking_history` (ranking at T) with PR #280's
`compute_forward_returns_batch` (realized return at T + horizon) and
computes per-pillar Spearman IC across the historical window —
closes the IC re-baseline half of the Phase 4.6 chain.

API:
- `compute_pillar_ic(scores, returns, *, method, min_tickers)`
  pure cross-sectional IC for one (pillar, date) pair
- `compute_historical_ic_report(start, end, *, horizon_months,
  pillars, ...)` walks rankings.json snapshots + forward returns
  cache, aggregates into `HistoricalICReport`
- `format_ic_report(report)` human-readable text rendering
- `PillarICEntry` / `PillarICSummary` / `HistoricalICReport`
  three frozen-dataclass carriers

Spearman computed as Pearson on rank-transformed series (Spearman
1904 + Conover 1999 §5.4) to avoid pulling scipy into the dep set
(QuantRank ships without scipy; pandas' `Series.corr(method=
'spearman')` requires it transitively).
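
The rank-then-Pearson approach can be sketched in pure Python as below. This is an illustration of the technique under stated assumptions, not the code in `historical_ic.py`: the helper names are hypothetical, ties receive average ranks, and the real module may rank with pandas instead.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a: list[float], b: list[float]) -> float:
    """Plain Pearson correlation; NaN when either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return math.nan
    return cov / math.sqrt(var_a * var_b)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rho computed as Pearson on the rank-transformed inputs."""
    return pearson(average_ranks(x), average_ranks(y))


rho_up = spearman([1, 2, 3, 4, 5], [1, 8, 27, 64, 125])    # monotone increasing
rho_down = spearman([1, 2, 3, 4, 5], [125, 64, 27, 8, 1])  # monotone decreasing
```

Because only ranks enter the correlation, any strictly monotone relationship yields ±1 regardless of its shape, which is what makes the statistic suitable for cross-sectional score-versus-return comparisons.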

Drops with descriptive notes:
- cross-section < MIN_TICKERS_PER_DATE = 30 (Grinold-Kahn 2000 §4.2)
- None / NaN / inf in either input
- constant inputs (std=0 → correlation undefined)
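
The three drop rules above might be applied before any correlation is computed, roughly as follows. `clean_cross_section` and its note strings are hypothetical; only the `MIN_TICKERS_PER_DATE = 30` constant and the rules themselves come from the commit message. Whether the minimum is checked before or after removing invalid rows is an assumption here (after).

```python
from __future__ import annotations

import math

MIN_TICKERS_PER_DATE = 30  # value stated in the commit message


def clean_cross_section(
    scores: dict[str, float | None],
    returns: dict[str, float | None],
    min_tickers: int = MIN_TICKERS_PER_DATE,
) -> tuple[list[tuple[float, float]], str | None]:
    """Return (pairs, note); note is None when the cross-section is usable."""
    pairs = []
    # Only tickers present on both sides can be paired; sort for determinism.
    for ticker in sorted(scores.keys() & returns.keys()):
        s, r = scores[ticker], returns[ticker]
        if s is None or r is None:
            continue  # drop None
        if not (math.isfinite(s) and math.isfinite(r)):
            continue  # drop NaN / inf
        pairs.append((s, r))
    if len(pairs) < min_tickers:
        return [], f"dropped: cross-section {len(pairs)} < {min_tickers}"
    if len({s for s, _ in pairs}) == 1 or len({r for _, r in pairs}) == 1:
        return [], "dropped: constant input (std=0, correlation undefined)"
    return pairs, None


# Toy data: 40 synthetic tickers, two of them with unusable values.
toy_scores = {f"T{i:02d}": float(i) for i in range(40)}
toy_returns = {t: v / 100 for t, v in toy_scores.items()}
toy_scores["T00"] = math.nan
toy_returns["T01"] = None
pairs, note = clean_cross_section(toy_scores, toy_returns)
```

Returning a note instead of raising lets the orchestrator record why a date contributed no IC while still walking the rest of the window.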

Aggregates per pillar: mean / std / median / min / max / IC IR /
hit-rate. IC IR = mean/std × sqrt(n_dates) (Grinold-Kahn 2000 §4.4).
Hit-rate = fraction of dates with strictly positive IC.
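
A sketch of the per-pillar aggregation using the formulas quoted above. `summarize_ic` is a hypothetical helper; the use of the sample standard deviation (n − 1 denominator) is an assumption, since the commit message does not say which convention the module pins.

```python
import math
from statistics import fmean, median, stdev


def summarize_ic(ics: list[float]) -> dict:
    """Aggregate a pillar's per-date IC values into summary statistics."""
    if not ics:
        raise ValueError("need at least one per-date IC value")
    n = len(ics)
    mean_ic = fmean(ics)
    # Assumption: sample standard deviation; undefined for a single date.
    std_ic = stdev(ics) if n > 1 else math.nan
    usable = n > 1 and std_ic > 0
    return {
        "n_dates": n,
        "mean": mean_ic,
        "std": std_ic,
        "median": median(ics),
        "min": min(ics),
        "max": max(ics),
        # IC IR = mean / std * sqrt(n_dates), as stated in the commit message.
        "ic_ir": mean_ic / std_ic * math.sqrt(n) if usable else math.nan,
        # Strictly positive: a date with IC exactly 0 is not a hit.
        "hit_rate": sum(1 for ic in ics if ic > 0) / n,
    }


# Toy IC series, not real results.
stats = summarize_ic([0.02, 0.04, 0.0, -0.02])
```

In the toy series two of four dates have strictly positive IC, so the hit-rate is 0.5; the zero-IC date is deliberately included to show that it does not count.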

Honest-baseline disclaimer per Research Report v1.0:
- IC reported here is NAIVE — no costs / slippage / sector
  neutralization. Real net-of-cost IC typically 30-50% smaller per
  McLean-Pontiff 2016 JF post-publication decay
- The historical universe MUST come from PR #274 members_at() to
  avoid survivorship bias (Hou-Xue-Zhang 2020 RFS); orchestrator
  reads the historical universe FROM rankings.json at as-of T which
  is correct by construction (snapshot itself is historical universe)
- Report is a TIME SERIES + summary, not a single headline number

Tests: 28 new (28 passing). Coverage: module constants, pure IC
computation edge cases (perfect ±1.0, constant inputs, NaN drops,
below-min cross-section, method validation), summary aggregation
math (IC IR formula pinned, hit-rate semantics), orchestrator
full-path (one date / multi-date / missing pillar / malformed JSON),
text rendering, and a live-git smoke that auto-degrades gracefully
when the gitignored price cache is absent.

Schema impact: zero. No new Pydantic / TS / snapshot field.
Production-wiring impact: zero. No compute/main.py import. The
orchestrator is purely a validation / re-baseline tool. Downstream
PRs (#2d PBO/DSR re-baseline + #2f honest-baseline report) consume
the output.

Phase 4.6 chain status: 5 of 6 items now landed (#1/#2 PR #277, #2a
PR #278, #2b PR #280, #2c this PR, #2e PR #279; #2d gate kwarg
shipped PR #275). #4 PBO/DSR re-baseline needs a warm-CI execution
to publish actual numbers; #6 honest-baseline doc closes the chain.

PHASE_STATUS_INFLIGHT.md updated per PR #237 side-file convention.
Harness doc TODO list updated: 5 of 6 items now landed.

https://claude.ai/code/session_01AGU8d6pm4u2fQQ5cebg9qa

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 27, 2026
…on + CLI (closing the chain) (#282)

Closes the Phase 4.6 honest re-validation harness structurally — 6 of
6 chain items now landed. The only remaining work is a warm-CI
execution session that fills the TBD numeric cells.

New artifacts:

- docs/research/honest-baseline-2026-05-27.md — 10-section skeleton
  with TBD numeric cells in §2 (per-pillar IC) / §3 (PBO/DSR) / §4
  (manipulation distribution) / §5 (survivorship-bias delta).
  Methodology + framing + honest-α ceiling + disclaimer ladder
  final-form. Citation block: Hou-Xue-Zhang 2020 RFS, McLean-Pontiff
  2016 JF, Bailey-Lopez de Prado 2014 JPM, Bailey-Borwein-Lopez de
  Prado-Zhu 2014 AMS Notices, Grinold-Kahn 2000, Spearman 1904,
  Conover 1999, Kissell-Glantz 2003.
- scripts/generate_honest_baseline.py — argparse CLI that wires
  compute_historical_ic_report (PR #281) + compute_manipulation_
  distribution_shift (PR #279) end-to-end. Text mode emits the
  disclaimer banner to stderr; JSON mode embeds __banner__ in the
  payload. Exit codes: 0 (report produced), 1 (input validation),
  2 (empty report — useful CI signal).
- tests/test_validation/test_generate_honest_baseline_cli.py —
  17 tests: argparse shape, _parse_date, exit codes, banner emission,
  JSON payload shape + α ceiling cells + disclaimer string, banner
  embedding, _report_to_payload with synthetic + populated manip
  reports, and a constant pin on the banner's 5 mandatory phrases
  (NAIVE / McLean-Pontiff / 2-5% / Rule 16 / S&P 500).
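
The exit-code contract and banner placement described for the CLI could look roughly like this. Everything here is a sketch under assumptions: the flag names `--start`, `--end`, and `--json`, the `build_report` stand-in, and the banner text are hypothetical, and the real script wires the actual orchestrators instead.

```python
import argparse
import json
import sys
from datetime import date

# Placeholder text; the real banner carries five mandatory phrases.
BANNER = "NAIVE figures: no costs / slippage / sector neutralization."


def build_report(start: date, end: date) -> dict:
    """Stand-in for the real orchestrators; always returns an empty report."""
    return {"start": start.isoformat(), "end": end.isoformat(),
            "n_dates_with_ic": 0}


def main(argv=None, report_fn=build_report) -> int:
    parser = argparse.ArgumentParser(description="honest-baseline sketch")
    parser.add_argument("--start", required=True)
    parser.add_argument("--end", required=True)
    parser.add_argument("--json", action="store_true")
    args = parser.parse_args(argv)
    try:
        start = date.fromisoformat(args.start)
        end = date.fromisoformat(args.end)
    except ValueError as exc:
        print(f"invalid date: {exc}", file=sys.stderr)
        return 1  # input validation failure
    if start > end:
        print("start must not be after end", file=sys.stderr)
        return 1
    report = report_fn(start, end)
    if args.json:
        # JSON mode embeds the banner in the payload itself.
        print(json.dumps({"__banner__": BANNER, **report}))
    else:
        # Text mode keeps stdout clean by sending the banner to stderr.
        print(BANNER, file=sys.stderr)
        print(report)
    # Exit 2 flags an empty report, e.g. when the price cache is absent.
    return 2 if report["n_dates_with_ic"] == 0 else 0


code_empty = main(["--start", "2026-05-01", "--end", "2026-05-27"])
```

One caveat with this layout: argparse itself exits with status 2 on a usage error, so a caller that needs to tell "bad flags" from "empty report" would have to handle `SystemExit` or choose a different code.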

Honest-baseline disclaimer per Research Report v1.0 autonomous
mission:
- IC / PBO / DSR figures NAIVE — no costs / slippage / sector
  neutralization
- Real net-of-cost IC typically 30-50% smaller per McLean-Pontiff
  2016 JF 32% post-publication decay
- Honest net α ceiling: 2-5% per year (hard-coded into doc + JSON)
- Composite formula sacred (Rule 16) — never replayed retroactively
- Universe = S&P 500 (502) only
- No trade recommendation of specific tickers — methodological
  report only

Schema impact: zero. No new Pydantic / TS / snapshot field.
Production-wiring impact: zero. No compute/main.py import.

Smoke run against real repo's recent rankings.json (no live price
cache) — orchestrator walks 3 commits, returns n_dates_with_ic=0,
exit code 2 surfaces missing-cache signal cleanly.

Phase 4.6 chain status — 6 of 6 items structurally landed:
- #1/#2 universe-drift first unit: PR #277
- #2a ranking_history loader: PR #278
- #2b forward_returns loader: PR #280
- #2c per-pillar IC orchestrator: PR #281
- #2d PBO/DSR gate kwarg: PR #275 (warm-CI execution pending)
- #2e manipulation_distribution shift: PR #279
- #2f honest-baseline skeleton + CLI: this PR

Deferred follow-ups (NOT in this PR): warm-CI execution session,
--markdown writer mode, --include-pbo-dsr factor-return wiring.

PHASE_STATUS_INFLIGHT.md updated per PR #237 side-file convention.
Harness doc TODO list: 6 of 6 items now landed.

https://claude.ai/code/session_01AGU8d6pm4u2fQQ5cebg9qa

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 27, 2026
…clones (#284)

Root cause: `test_compute_shift_live_repo_recent_window` asserts
`report.n_dates >= 1` for window [2026-05-22, 2026-05-27], but CI's
`actions/checkout@v6` defaults to fetch-depth=1 (shallow). On shallow
clones, `git log -- frontend/public/data/rankings.json` returns only
the HEAD commit which (a) didn't touch rankings.json (release commit)
and (b) is dated 2026-05-28 (outside the test window) → empty report
→ AssertionError → CI Python step exit code 1.

This is the failure mode triggered on PR #283's main-push CI run
(26526262716, Python job 78138619376 — Failing after 2m 43s). Sibling
smoke tests are resilient because they use `HEAD` directly (`git show`)
or don't constrain by a date window.

Fix: pytest.skip() when report.n_dates == 0 AND note == "empty window"
(the signature of a shallow-clone walk). Full clones still exercise
the real assertion. Skip message cites the CI checkout convention so
future readers understand why the test is intentionally lenient.
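
The skip condition can be sketched as a small predicate plus a test body. `ReportStub`, `looks_like_shallow_clone`, and `check_live_window` are hypothetical names; only the condition (`n_dates == 0` and `note == "empty window"`) and the original assertion come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportStub:
    """Minimal stand-in for ShiftReport with the two fields the check reads."""
    n_dates: int
    note: str


def looks_like_shallow_clone(report) -> bool:
    """True only for the exact signature of a walk that found no history."""
    return report.n_dates == 0 and report.note == "empty window"


def check_live_window(report) -> None:
    """Body of the live smoke test, given an already computed report."""
    if looks_like_shallow_clone(report):
        # Imported lazily so the predicate stays usable without pytest.
        import pytest
        pytest.skip("no rankings.json history in window; CI checkout is "
                    "shallow (fetch-depth=1) by default")
    # Full clones still exercise the real assertion.
    assert report.n_dates >= 1


check_live_window(ReportStub(n_dates=3, note="first=2026-05-22, last=2026-05-26"))
```

Matching on both fields keeps the skip narrow: a report with zero dates for any other reason (for example an all-null window) still reaches the assertion and fails.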

Verified:
- Shallow clone (CI): 10 passed + 1 skipped (graceful)
- Full clone (sandbox / dev): 11 passed (assertion exercised)
- ruff: clean

Alternative considered (not adopted): bump `actions/checkout`
fetch-depth to 0 (full history). Rejected because (a) the live-smoke
tests are designed to skip when their data substrate isn't available
(matches the no-price-cache pattern in `test_forward_returns.py` and
`test_historical_ic.py`), (b) full git fetch adds 30-60s to CI cold
start with diminishing returns, and (c) the universe of tests that
need rankings.json history is small and bounded to validation/.

Phase 4.6 task #2e (PR #279) regression — applies same pattern as
the other 4 live-smoke tests already use. No schema / compute / output
JSON change.

https://claude.ai/code/session_01AGU8d6pm4u2fQQ5cebg9qa

Co-authored-by: Claude <noreply@anthropic.com>
2 participants