docs(phase-3): Epic #150 Phase 3 — flag-correlation baseline analysis #164
Merged
Adds the Phase 3 correlation analysis: a reproducible one-shot script
(`scripts/phase3_flag_correlation.py`) plus baseline outputs under
`docs/phase3-correlation/` containing firing rates, pairwise
φ-coefficients, redundancy candidates, diversity-confirmed pairs, a
heatmap PNG, and an interpretive findings doc.
Methodology:
- φ-coefficient (Matthews correlation for boolean variables), which is
Pearson's r computed on {0,1}-coded variables.
- Sources merged: `risk_flags` + `valuation_warnings` + boolean keys
in `tier2_events`.
- Universe: 502 S&P 500 constituents (production output @5dfe6287,
pre-Phase-2.4 so `loss_avoidance_pattern` is still 0% — re-run
after the next weekly cron to include it).
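
As an illustration of the methodology above, here is a minimal sketch of merging the three sources into one deduplicated flag set per ticker and computing φ for a flag pair from its 2×2 contingency table. The record layout (`risk_flags` and `valuation_warnings` as lists of names, `tier2_events` as a dict), the sample tickers, and the helper names `active_flags` and `phi` are assumptions for illustration, not the actual code in `scripts/phase3_flag_correlation.py`.

```python
import math

def active_flags(record):
    """Union of flag names firing for one ticker (assumed record layout)."""
    flags = set(record.get("risk_flags", []))
    flags |= set(record.get("valuation_warnings", []))
    # Only tier2_events keys whose value is literally True count as firing.
    flags |= {k for k, v in record.get("tier2_events", {}).items() if v is True}
    return flags  # a set, so a name present in two sources is counted once

def phi(x, y):
    """phi-coefficient of two equal-length boolean sequences."""
    n11 = sum(1 for a, b in zip(x, y) if a and b)
    n10 = sum(1 for a, b in zip(x, y) if a and not b)
    n01 = sum(1 for a, b in zip(x, y) if not a and b)
    n00 = sum(1 for a, b in zip(x, y) if not a and not b)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # A flag that never fires (or always fires) has no defined phi.
        return float("nan")
    return (n11 * n00 - n10 * n01) / denom

# Hypothetical four-ticker universe, for illustration only.
records = {
    "AAA": {"risk_flags": ["altman_distress"], "tier2_events": {"restatement_history": True}},
    "BBB": {"risk_flags": [], "valuation_warnings": ["extreme_rim_estimate"]},
    "CCC": {"risk_flags": ["altman_distress"], "tier2_events": {"restatement_history": False}},
    "DDD": {"risk_flags": [], "tier2_events": {"restatement_history": True}},
}
fired = {ticker: active_flags(rec) for ticker, rec in records.items()}
tickers = sorted(fired)
altman = ["altman_distress" in fired[t] for t in tickers]
restate = ["restatement_history" in fired[t] for t in tickers]
print(phi(altman, restate))  # 0.0: each of the four 2x2 cells holds one ticker
```

Returning NaN for a zero denominator keeps never-firing flags (such as `loss_avoidance_pattern` in this baseline) out of the pair statistics rather than reporting a misleading 0.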
Headline findings (full version in `findings.md`):
- Defense layer is mostly orthogonal — 35 diversity-confirmed pairs
(|φ| ≤ 0.05, both base-rates ≥ 5%) vs 15 redundancy candidates
(|φ| ≥ 0.30). `altman_distress` is orthogonal to nearly every
other flag — its financial-distress signal is genuinely distinct.
- `restatement_history` is independent of the Sloan/Beneish/accruals
manipulation cluster (φ ≈ 0). Phase 2.2 recalibration can proceed
without redundancy concerns.
- Warning-band ↔ active-veto pairs (Dechow + Beneish) are correlated
by design (φ ≈ +0.6-0.7) — confirms the half-PPV Phase 2.5
calibration of `BENEISH_HIGH_WEIGHT` and `DECHOW_HIGH_WEIGHT`.
- `manipulation_triple_flag` is φ-locked to `dechow_high` at current
sample size (n=2 each). Watch in Q3 cohort audit — if persistent,
downgrade `TRIPLE_FLAG_WEIGHT = 10.0` (currently labeled gut-feel
in Phase 2.5 provenance docstring).
- `extreme_*_estimate` family clusters (φ ≈ +0.4-0.5) — already
surfaced positively as `valuation_methods_applicable` (PR #161).
No additional work needed.
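
The two pair categories in the findings above follow directly from the stated thresholds. A minimal sketch of that classification, assuming a hypothetical helper `classify_pair` and illustrative input values (the thresholds are from this PR; the function name, labels, and sample base rates are not):

```python
import math

DIVERSITY_MAX_ABS_PHI = 0.05   # |phi| <= 0.05, as stated in the findings
REDUNDANCY_MIN_ABS_PHI = 0.30  # |phi| >= 0.30, as stated in the findings
MIN_BASE_RATE = 0.05           # both flags must fire on >= 5% of the universe

def classify_pair(phi_value, rate_a, rate_b):
    """Label one flag pair from its phi and the two firing rates."""
    if math.isnan(phi_value):
        return "undefined"
    if abs(phi_value) >= REDUNDANCY_MIN_ABS_PHI:
        return "redundancy_candidate"
    if abs(phi_value) <= DIVERSITY_MAX_ABS_PHI and min(rate_a, rate_b) >= MIN_BASE_RATE:
        return "diversity_confirmed"
    return "unclassified"

# Illustrative inputs only; base rates here are made up.
print(classify_pair(0.706, 0.02, 0.01))  # redundancy_candidate
print(classify_pair(0.01, 0.12, 0.08))   # diversity_confirmed
print(classify_pair(0.01, 0.12, 0.01))   # unclassified: one base rate is below 5%
```

A `redundancy_candidate` label is only a prompt for review: the warning-band ↔ active-veto pairs would cross the 0.30 threshold as well, and the findings treat them as correlated by design rather than redundant.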
Decision matrix for downstream Phase 2.x PRs in `findings.md`.
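
One input to that matrix is the φ-lock between `manipulation_triple_flag` and `dechow_high`. The sketch below shows why a lock at n=2 is fragile: on a 502-ticker universe, moving a single firing to a different ticker roughly halves φ. The helper name `phi_from_counts` is an assumption for illustration; the counts are the ones quoted in the findings.

```python
import math

def phi_from_counts(n11, n10, n01, n00):
    """phi from the four cells of a 2x2 contingency table."""
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return float("nan") if denom == 0 else (n11 * n00 - n10 * n01) / denom

N = 502  # universe size from the methodology

# Both flags fire on the same 2 tickers: perfect overlap.
locked = phi_from_counts(n11=2, n10=0, n01=0, n00=N - 2)

# Same firing counts (2 each), but only 1 ticker is shared.
one_apart = phi_from_counts(n11=1, n10=1, n01=1, n00=N - 3)

print(round(locked, 3))     # 1.0
print(round(one_apart, 3))  # 0.498
```

With so few firings, φ moves in large steps, which is consistent with treating the lock as a watch item for the Q3 cohort audit rather than as evidence for an immediate weight change.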
Doc + script only — no compute / schema / output change. Re-run after
every quarterly cohort audit (next: 2026-08-19) + after each Phase 2.x
recalibration PR lands.
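
One way a re-run could be compared against this baseline is to diff the pairwise φ values and surface only the pairs that moved. The sketch below is an assumption about how such a diff might look, not part of the script: the function name `phi_drift`, the 0.10 tolerance, the tuple-keyed dict layout, and every re-run value are hypothetical (only the two baseline φ values are taken from this PR).

```python
def phi_drift(baseline, rerun, tol=0.10):
    """Pairs whose phi moved by more than tol, or that exist in only one run."""
    changed = {}
    for pair in sorted(set(baseline) | set(rerun)):
        old, new = baseline.get(pair), rerun.get(pair)
        if old is None or new is None or abs(new - old) > tol:
            changed[pair] = (old, new)
    return changed

# Baseline phi values quoted in this PR.
baseline = {
    ("dechow_high", "manipulation_triple_flag"): 1.000,
    ("beneish_high", "beneish_manipulation_veto"): 0.640,
}
# Hypothetical re-run values, for illustration only.
rerun = {
    ("dechow_high", "manipulation_triple_flag"): 0.498,
    ("beneish_high", "beneish_manipulation_veto"): 0.655,
    ("altman_distress", "loss_avoidance_pattern"): 0.020,
}

for pair, (old, new) in phi_drift(baseline, rerun).items():
    print(pair, old, new)
```

Pairs that appear in only one run are reported with `None` on the missing side, so a flag that starts firing after a recalibration shows up in the diff instead of being silently skipped.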
Closes the Phase 3 deliverable in epic #150.
https://claude.ai/code/session_01Nj5sMzisnqDmF46g5ckEJn
dackclup added a commit that referenced this pull request May 22, 2026
…x) (#184)
Doc-only refresh closing a parking-lot drift from the 14-subagent self-audit (2026-05-21). `docs/METHODOLOGY.md` §"Annotate-only flags" expanded from 10 documented bullets to 18; 8 previously-emitted-but-undocumented annotate flags now carry full bullets with literature anchors verified by `methodology-scientist` Mode C.
New bullets (per Phase 2.5 provenance tier cross-checked against `compute/scoring/manipulation_index.py` weight docstrings — zero drift verified):
- accruals_momentum_high — Sloan 1996 TAR §IV + Beneish 1999 FAJ ΔM + Xie 2001 TAR §IV persistence. GUT-FEEL on the +0.05 threshold.
- loss_avoidance_pattern — Burgstahler-Dichev 1997 JAE §3 Table 2 kink-at-zero. LITERATURE-ANCHORED on signature; GUT-FEEL on the 10× rescale magnitude (PR #163).
- beneish_high — Beneish 1999 FAJ + Beneish-Lee-Nichols 2013 FAJ §4 warning-band PPV (~35-40%). LITERATURE-ANCHORED.
- dechow_high — Dechow-Ge-Larson-Sloan 2011 CAR Table 9. LITERATURE-ANCHORED.
- manipulation_triple_flag — Phase 4.5a.3 joint-gate. GUT-FEEL (no academic source for joint-gate weight); PR #164 flagged redundancy candidate.
- restatement_history — Hennes-Leone-Miller 2008 TAR §"Errors vs Irregularities" bare-flag PPV ~30%. LITERATURE-ANCHORED on cite, GUT-FEEL on 5y window.
- restatement_high_confidence — HLM 2008 §4 irregularity signature (PPV ~70%) + Schroeder 2024 SSRN §3.2 90d window. LITERATURE-ANCHORED.
- late_filing_notification — Bartov-Lai-Yeung 2002 JAR §IV ~2× restatement base-rate. GUT-FEEL (no replicated PPV on QR universe).
CORRECTION surfaced during audit: hand-off attributed `late_filing_notification` to Cohen-Malloy-Pomorski 2012; the actual anchor is Bartov-Lai-Yeung 2002 (CMP 2012 is reserved-not-emitted for Phase 4.5e Form-4 slots — confirmed NOT-NEEDED for this PR).
Stale "Phase 3e adds `beneish_high` and `dechow_f_high`" footnote removed (now full bullets; footnote also misspelled the flag name — actual emit is `dechow_high`).
No compute / schema / scoring / valuation / code change. Defense layer emit count UNCHANGED at 30/13 annotates — this PR closes the doc gap, not the emit gap; the 8 flags were already in the headline math.
CLAUDE.md + AGENTS.md lockstep entries added. Two prior in-flight entries (Issue #177 PR #183) updated from "in flight" to "merged".
Verification: ruff clean, schema_check in sync, 1059/1059 offline tests pass.
Co-authored-by: Claude <noreply@anthropic.com>
Summary
Epic #150 Phase 3 — pairwise φ-coefficient analysis of the 25 active defense-layer flags on production output. Reproducible one-shot script plus baseline outputs under `docs/phase3-correlation/` for Q3 audit diff.
What's in this PR
- `scripts/phase3_flag_correlation.py`
- `docs/phase3-correlation/summary.md`
- `docs/phase3-correlation/findings.md`
- `docs/phase3-correlation/heatmap.png`
- `docs/phase3-correlation/*.csv`
- `CLAUDE.md` + `AGENTS.md`
Methodology
- Sources: `risk_flags` ∪ `valuation_warnings` ∪ `tier2_events` boolean keys, deduped on flag name
- Universe: 502 S&P 500 constituents (production output @ `5dfe6287`, pre-Phase-2.4)
Headline findings
1. Defense layer is mostly orthogonal ✅
35 diversity-confirmed pairs vs 15 redundancy candidates. `altman_distress` is orthogonal to nearly every other flag (goodwill, restatement, sloan, NSI, accruals_momentum, beneish_high, extreme_*) — its financial-distress signal is genuinely distinct.
2. `restatement_history` ⟂ manipulation cluster → Phase 2.2 safe ✅
- `restatement_history` ↔ `sloan_accruals_top_decile`
- `restatement_history` ↔ `accruals_momentum_high`
- `restatement_history` ↔ `altman_distress`
Phase 2.2 recalibration (tighten to "amendment + Item 4.02 within 90d") can proceed without double-counting.
3. Warning-band ↔ active-veto: correlated by design ✅
- `dechow_high` ↔ `dechow_manipulation_veto`: φ = +0.706
- `beneish_high` ↔ `beneish_manipulation_veto`: φ = +0.640
NOT redundancy — these are intentionally nested tiers per `manipulation_index.py` Phase 2.5 provenance. The half-PPV calibration of `BENEISH_HIGH_WEIGHT` and `DECHOW_HIGH_WEIGHT` (≈ ½ of active-veto weight) is validated.
4. ⚠️ `manipulation_triple_flag` φ-locked to `dechow_high`
φ = +1.000 at n=2 each — likely small-sample artifact, but worth watching. If persistent in Q3 audit, downgrade `TRIPLE_FLAG_WEIGHT = 10.0` (currently labeled gut-feel calibration in Phase 2.5 provenance docstring).
5. `extreme_*_estimate` family clusters — already handled ✅
- `extreme_graham_estimate` ↔ `extreme_rim_estimate`
- `extreme_rim_estimate` ↔ `goodwill_heavy`
- `extreme_multiples_ev_ebitda` ↔ `extreme_multiples_pe`
Already surfaced positively as `valuation_methods_applicable` (PR #161). No additional work.
Decision matrix for downstream PRs
- `restatement_history`
- `TRIPLE_FLAG_WEIGHT`
- `ACCRUALS_MOMENTUM_WEIGHT = 5.0`
- `extreme_*_estimate` annotates
Post-Phase-2.4 follow-up
The baseline does NOT include `loss_avoidance_pattern` firing data — PR #163 merged but the rescale hasn't applied to a production cron yet. Re-run after the next weekly cron to include it (expected ~5-15% base rate per BD 1997 priors).
Risk
LOW — doc + script only, no compute/schema/output change. The new script is one-shot under `scripts/` (not in the cron path). `matplotlib` is lazy-imported with a fallback for the heatmap step — not added to `pyproject.toml` dependencies.
Test plan
- `ruff check .` clean
- `summary.md` rendering on Vercel preview
https://claude.ai/code/session_01Nj5sMzisnqDmF46g5ckEJn
Generated by Claude Code