docs(phase-3): Epic #150 Phase 3 — flag-correlation baseline analysis by dackclup · Pull Request #164 · dackclup/quantrank

dackclup · 2026-05-21T02:18:08Z

Summary

Epic #150 Phase 3: pairwise φ-coefficient analysis of the 25 active defense-layer flags on production output. Adds a reproducible one-shot script plus baseline outputs under `docs/phase3-correlation/` to diff against in the Q3 audit.

What's in this PR

File	Purpose
`scripts/phase3_flag_correlation.py`	One-shot analysis script (re-run for each quarterly audit)
`docs/phase3-correlation/summary.md`	Auto-generated firing rates + pair tables
`docs/phase3-correlation/findings.md`	Hand-written interpretation + decision matrix
`docs/phase3-correlation/heatmap.png`	φ-matrix heatmap (20 flags, firing-rate ≥ 1%)
`docs/phase3-correlation/*.csv`	Baseline data for Q3 audit diff
`CLAUDE.md` + `AGENTS.md`	Phase 3 status entry + Phase 2.4 → "merged" flip

Methodology

Metric: φ-coefficient (Matthews correlation for the 2×2 case), i.e. Pearson correlation computed on 0/1 indicators; ranges from -1 to +1, with 0 meaning the two flags fire independently
Sources: `risk_flags` ∪ `valuation_warnings` ∪ boolean keys of `tier2_events`, deduped on flag name
Universe: 502 S&P 500 constituents (production output @5dfe6287, pre-Phase-2.4)
Thresholds: redundancy at |φ| ≥ 0.30 (Cohen 1988 medium effect); diversity at |φ| ≤ 0.05 with both base rates ≥ 5%
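A minimal sketch of the metric and the source merge described above, assuming each ticker's output is a dict where `risk_flags` and `valuation_warnings` are lists of flag names and `tier2_events` is a dict whose boolean values mark fired flags. The record shape and the helper names `collect_flags`, `phi`, and `phi_matrix` are hypothetical and are not taken from `scripts/phase3_flag_correlation.py`.

```python
import math
from itertools import combinations


def collect_flags(record: dict) -> set[str]:
    """Union of the three flag sources for one ticker, deduped on flag name."""
    flags = set(record.get("risk_flags", []))
    flags |= set(record.get("valuation_warnings", []))
    # Only boolean tier2_events keys count; a key "fires" when its value is literally True.
    for key, value in record.get("tier2_events", {}).items():
        if value is True:
            flags.add(key)
    return flags


def phi(a: list[bool], b: list[bool]) -> float:
    """φ-coefficient of two aligned boolean vectors (Matthews correlation, 2x2 case)."""
    n11 = sum(x and y for x, y in zip(a, b))
    n10 = sum(x and not y for x, y in zip(a, b))
    n01 = sum((not x) and y for x, y in zip(a, b))
    n00 = sum((not x) and (not y) for x, y in zip(a, b))
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return float("nan")  # a flag that fires never or always has undefined φ
    return (n11 * n00 - n10 * n01) / denom


def phi_matrix(records: list[dict]) -> dict[tuple[str, str], float]:
    """φ for every unordered pair of flags that fire at least once in the universe."""
    per_ticker = [collect_flags(r) for r in records]
    names = sorted(set().union(*per_ticker))
    vectors = {name: [name in flags for flags in per_ticker] for name in names}
    return {(x, y): phi(vectors[x], vectors[y]) for x, y in combinations(names, 2)}
```

In this sketch a flag that never fires anywhere (as `loss_avoidance_pattern` does in this pre-Phase-2.4 baseline) drops out of the pair list entirely, and a flag firing on every ticker gets NaN instead of a misleading 0.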

Headline findings

1. Defense layer is mostly orthogonal ✅

35 diversity-confirmed pairs vs 15 redundancy candidates. `altman_distress` is orthogonal to nearly every other flag (goodwill, restatement, Sloan, NSI, accruals_momentum, beneish_high, extreme_*), so its financial-distress signal is genuinely distinct from the manipulation and valuation flags.
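The two thresholds from the Methodology section amount to a three-way bucketing of each flag pair. A minimal sketch, assuming base rates are expressed as fractions of the 502-firm universe; the function name and bucket labels are hypothetical, not the script's own:

```python
def classify_pair(phi: float, rate_a: float, rate_b: float) -> str:
    """Bucket one flag pair using the redundancy / diversity thresholds."""
    if abs(phi) >= 0.30:
        return "redundancy_candidate"  # Cohen 1988 medium effect or larger
    if abs(phi) <= 0.05 and min(rate_a, rate_b) >= 0.05:
        return "diversity_confirmed"  # near-zero φ on flags that both fire often enough
    return "unclassified"  # in between, too rare to judge, or undefined (NaN)
```

The base-rate floor on the diversity bucket is presumably there because two rarely-firing flags produce too few co-occurrences to estimate φ reliably, so a near-zero value between them says little about whether they are truly independent.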

2. `restatement_history` ⟂ manipulation cluster → Phase 2.2 safe ✅

Pair	φ
`restatement_history` ↔ `sloan_accruals_top_decile`	+0.008
`restatement_history` ↔ `accruals_momentum_high`	+0.003
`restatement_history` ↔ `altman_distress`	-0.009

All three |φ| values are below 0.01, so the Phase 2.2 recalibration (tighten to "amendment + Item 4.02 within 90d") can proceed without double-counting the manipulation cluster.

3. Warning-band ↔ active-veto: correlated by design ✅

dechow_high ↔ dechow_manipulation_veto: φ = +0.706
beneish_high ↔ beneish_manipulation_veto: φ = +0.640

NOT redundancy: these are intentionally nested tiers per the Phase 2.5 provenance in `manipulation_index.py`. This validates the half-PPV calibration of `BENEISH_HIGH_WEIGHT` and `DECHOW_HIGH_WEIGHT` (≈ ½ of the active-veto weight).
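Why nested tiers produce a high φ by construction: suppose, as an assumption about how the nesting is implemented, that the active veto only ever fires on firms where the warning band also fires. Then the "veto without warning" cell of the 2×2 table is empty. Indexing cells as (warning, veto), with warning base rate $p_w$ and veto base rate $p_v \le p_w$, the cell proportions are $p_{11} = p_v$, $p_{10} = p_w - p_v$, $p_{01} = 0$, $p_{00} = 1 - p_w$, and φ depends on the two base rates alone:

```latex
\phi
= \frac{p_{11}\,p_{00} - p_{10}\,p_{01}}{\sqrt{p_w\,(1-p_w)\,p_v\,(1-p_v)}}
= \frac{p_v\,(1-p_w)}{\sqrt{p_w\,(1-p_w)\,p_v\,(1-p_v)}}
= \sqrt{\frac{p_v\,(1-p_w)}{p_w\,(1-p_v)}}
```

Under this assumption φ is strictly positive, reaches +1 only when the two flags fire on exactly the same firms, and for small base rates is approximately $\sqrt{p_v / p_w}$; for example, φ = +0.64 would correspond to the veto firing on roughly 41% of warning-band firms. A high φ between tiers is then expected from the nesting itself rather than from a duplicated signal.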

4. `manipulation_triple_flag` φ-locked to `dechow_high` ⚠️

φ = +1.000 with each flag firing on only n=2 firms, so this is likely a small-sample artifact, but worth watching. If it persists in the Q3 audit, downgrade `TRIPLE_FLAG_WEIGHT = 10.0` (currently labeled a gut-feel calibration in the Phase 2.5 provenance docstring).
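To make the small-sample concern concrete: with both flags firing on two of 502 firms, φ can take only a handful of values, and a single firm moving in or out of either flag changes it drastically. The sketch below (the helper name `phi_from_counts` is hypothetical) computes φ for the observed pattern and for a one-firm disagreement.

```python
import math


def phi_from_counts(n11: int, n10: int, n01: int, n00: int) -> float:
    """φ from the four cells of a 2x2 table (n11 = both flags fire, n00 = neither)."""
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom


N = 502  # S&P 500 universe size used for this baseline

# Observed pattern: both flags fire on the same two firms and nowhere else.
locked = phi_from_counts(n11=2, n10=0, n01=0, n00=N - 2)

# Hypothetical: each flag still fires twice, but they agree on only one firm.
one_divergent = phi_from_counts(n11=1, n10=1, n01=1, n00=N - 3)

print(f"locked={locked:.3f} one_divergent={one_divergent:.3f}")
```

This prints `locked=1.000 one_divergent=0.498`: one disagreeing firm roughly halves φ, so a single quarter at this sample size is weak evidence either way about structural redundancy.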

5. `extreme_*_estimate` family clusters — already handled ✅

Pair	φ
`extreme_graham_estimate` ↔ `extreme_rim_estimate`	+0.515
`extreme_rim_estimate` ↔ `goodwill_heavy`	+0.445
`extreme_multiples_ev_ebitda` ↔ `extreme_multiples_pe`	+0.409

Already surfaced positively as `valuation_methods_applicable` (PR #161); no additional work needed.

Decision matrix for downstream PRs

Action	Decision	Evidence
Phase 2.2 — recalibrate `restatement_history`	✅ Proceed	§2
Phase 2.5 — downgrade `TRIPLE_FLAG_WEIGHT`	⏳ Watch Q3 audit	§4
Phase 2.5 — keep `ACCRUALS_MOMENTUM_WEIGHT = 5.0`	✅ Keep	φ = +0.305 (just above the 0.30 redundancy threshold; moderate, judged not redundant)
Phase 2.5 — drop `extreme_*_estimate` annotates	❌ Don't drop	Per-method debug value (§5)
Phase 4i/4j/4k — JKP/Qlib/IPCA factor adds	🎯 Target orthogonality vs altman/restatement	§1, §2

Post-Phase-2.4 follow-up

The baseline does NOT include `loss_avoidance_pattern` firing data: PR #163 has merged, but the rescale has not yet run in a production cron, so the flag fires on 0% of firms in this baseline. Re-run after the next weekly cron to include it (expected base rate ~5-15% per the Burgstahler-Dichev (BD) 1997 priors).

Risk

LOW: doc + script only, no compute/schema/output change. The new script is a one-shot under `scripts/` and is not in the cron path. matplotlib is lazy-imported with a fallback for the heatmap step, so it is not added to the `pyproject.toml` dependencies.
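The lazy-import-with-fallback pattern described above can look roughly like the following. This is an illustrative sketch: the function name, figure styling, and skip-and-continue behavior are assumptions, not the script's actual code.

```python
from pathlib import Path


def render_heatmap(matrix: list[list[float]], labels: list[str], out_path: Path) -> bool:
    """Write a φ heatmap if matplotlib is installed; otherwise skip and return False."""
    try:
        # Imported lazily so the rest of the analysis runs without matplotlib installed.
        import matplotlib
        matplotlib.use("Agg")  # headless backend: no display needed in CI or on a server
        import matplotlib.pyplot as plt
    except ImportError:
        print("matplotlib not installed; skipping heatmap step")
        return False

    fig, ax = plt.subplots(figsize=(8, 8))
    image = ax.imshow(matrix, cmap="RdBu_r", vmin=-1.0, vmax=1.0)  # φ spans [-1, +1]
    ax.set_xticks(range(len(labels)))
    ax.set_xticklabels(labels, rotation=90)
    ax.set_yticks(range(len(labels)))
    ax.set_yticklabels(labels)
    fig.colorbar(image, ax=ax, label="φ")
    fig.tight_layout()
    fig.savefig(out_path, dpi=150)
    plt.close(fig)
    return True
```

Returning a boolean instead of raising lets the CSV and Markdown outputs still be written on a machine without matplotlib, which is why the dependency can stay out of `pyproject.toml`.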

Test plan

`ruff check .` clean
Script runs end-to-end → produces 7 output files including heatmap
CLAUDE.md + AGENTS.md lockstep
Phase 2.4 "in flight" → "merged via PR #163" flip (#163: fix(loss-avoidance): rescale BD 1997 thresholds 10× for S&P 500)
(Optional) Spot-check summary.md rendering on Vercel preview

https://claude.ai/code/session_01Nj5sMzisnqDmF46g5ckEJn

Generated by Claude Code

Adds the Phase 3 correlation analysis: a reproducible one-shot script (`scripts/phase3_flag_correlation.py`) plus baseline outputs under `docs/phase3-correlation/` containing firing rates, pairwise φ-coefficients, redundancy candidates, diversity-confirmed pairs, a heatmap PNG, and an interpretive findings doc.

Methodology:
- φ-coefficient (Matthews correlation for boolean variables), the proper analog of Pearson for the {0,1} case.
- Sources merged: `risk_flags` + `valuation_warnings` + boolean keys in `tier2_events`.
- Universe: 502 S&P 500 constituents (production output @5dfe6287, pre-Phase-2.4, so `loss_avoidance_pattern` is still 0%; re-run after the next weekly cron to include it).

Headline findings (full version in `findings.md`):
- Defense layer is mostly orthogonal: 35 diversity-confirmed pairs (|φ| ≤ 0.05, both base rates ≥ 5%) vs 15 redundancy candidates (|φ| ≥ 0.30). `altman_distress` is orthogonal to nearly every other flag; its financial-distress signal is genuinely distinct.
- `restatement_history` is independent of the Sloan/Beneish/accruals manipulation cluster (φ ≈ 0). Phase 2.2 recalibration can proceed without redundancy concerns.
- Warning-band ↔ active-veto pairs (Dechow + Beneish) are correlated by design (φ ≈ +0.6-0.7), which confirms the half-PPV Phase 2.5 calibration of `BENEISH_HIGH_WEIGHT` and `DECHOW_HIGH_WEIGHT`.
- `manipulation_triple_flag` is φ-locked to `dechow_high` at the current sample size (n=2 each). Watch in the Q3 cohort audit; if persistent, downgrade `TRIPLE_FLAG_WEIGHT = 10.0` (currently labeled gut-feel in the Phase 2.5 provenance docstring).
- `extreme_*_estimate` family clusters (φ ≈ +0.4-0.5) are already surfaced positively as `valuation_methods_applicable` (PR #161). No additional work needed.

Decision matrix for downstream Phase 2.x PRs in `findings.md`.

Doc + script only: no compute / schema / output change. Re-run after every quarterly cohort audit (next: 2026-08-19) and after each Phase 2.x recalibration PR lands.

Closes the Phase 3 deliverable in epic #150. https://claude.ai/code/session_01Nj5sMzisnqDmF46g5ckEJn

vercel · 2026-05-21T02:18:13Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
quantrank	Ready	Preview, Comment	May 21, 2026 2:18am

…x) (#184)

Doc-only refresh closing a parking-lot drift from the 14-subagent self-audit (2026-05-21). `docs/METHODOLOGY.md` §"Annotate-only flags" expanded from 10 documented bullets to 18; 8 previously-emitted-but-undocumented annotate flags now carry full bullets with literature anchors verified by `methodology-scientist` Mode C.

New bullets (per Phase 2.5 provenance tier cross-checked against `compute/scoring/manipulation_index.py` weight docstrings; zero drift verified):
- `accruals_momentum_high`: Sloan 1996 TAR §IV + Beneish 1999 FAJ ΔM + Xie 2001 TAR §IV persistence. GUT-FEEL on the +0.05 threshold.
- `loss_avoidance_pattern`: Burgstahler-Dichev 1997 JAE §3 Table 2 kink-at-zero. LITERATURE-ANCHORED on signature; GUT-FEEL on the 10× rescale magnitude (PR #163).
- `beneish_high`: Beneish 1999 FAJ + Beneish-Lee-Nichols 2013 FAJ §4 warning-band PPV (~35-40%). LITERATURE-ANCHORED.
- `dechow_high`: Dechow-Ge-Larson-Sloan 2011 CAR Table 9. LITERATURE-ANCHORED.
- `manipulation_triple_flag`: Phase 4.5a.3 joint-gate. GUT-FEEL (no academic source for the joint-gate weight); flagged as a redundancy candidate by PR #164.
- `restatement_history`: Hennes-Leone-Miller 2008 TAR §"Errors vs Irregularities" bare-flag PPV ~30%. LITERATURE-ANCHORED on cite, GUT-FEEL on 5y window.
- `restatement_high_confidence`: HLM 2008 §4 irregularity signature (PPV ~70%) + Schroeder 2024 SSRN §3.2 90d window. LITERATURE-ANCHORED.
- `late_filing_notification`: Bartov-Lai-Yeung 2002 JAR §IV ~2× restatement base rate. GUT-FEEL (no replicated PPV on QR universe).

CORRECTION surfaced during audit: the hand-off attributed `late_filing_notification` to Cohen-Malloy-Pomorski 2012; the actual anchor is Bartov-Lai-Yeung 2002 (CMP 2012 is reserved-not-emitted for Phase 4.5e Form-4 slots, confirmed NOT-NEEDED for this PR). Stale "Phase 3e adds `beneish_high` and `dechow_f_high`" footnote removed (now full bullets; the footnote also misspelled the flag name, and the actual emit is `dechow_high`).

No compute / schema / scoring / valuation / code change. Defense layer emit count UNCHANGED at 30/13 annotates: this PR closes the doc gap, not the emit gap; the 8 flags were already in the headline math. CLAUDE.md + AGENTS.md lockstep entries added. Two prior in-flight entries (Issue #177 PR #183) updated from "in flight" to "merged". Verification: ruff clean, schema_check in sync, 1059/1059 offline tests pass.

Co-authored-by: Claude <noreply@anthropic.com>

vercel Bot deployed to Preview May 21, 2026 02:18

dackclup marked this pull request as ready for review May 21, 2026 02:21

dackclup merged commit 873fcca into main May 21, 2026
4 checks passed

dackclup deleted the claude/phase-3-flag-correlation branch May 21, 2026 02:21

dackclup mentioned this pull request May 21, 2026

feat(restatement): Epic #150 Phase 2.2 — add restatement_high_confidence annotate #165

Merged

5 tasks

dackclup merged 1 commit into main from claude/phase-3-flag-correlation
