Skip to content

docs(phase-3): Epic #150 Phase 3 — flag-correlation baseline analysis#164

Merged
dackclup merged 1 commit into
mainfrom
claude/phase-3-flag-correlation
May 21, 2026
Merged

docs(phase-3): Epic #150 Phase 3 — flag-correlation baseline analysis#164
dackclup merged 1 commit into
mainfrom
claude/phase-3-flag-correlation

Conversation

@dackclup
Copy link
Copy Markdown
Owner

Summary

Epic #150 Phase 3 — pairwise φ-coefficient analysis of the 25 active defense-layer flags on production output. Reproducible one-shot script plus baseline outputs under docs/phase3-correlation/ for Q3 audit diff.

What's in this PR

File Purpose
scripts/phase3_flag_correlation.py One-shot analysis script (re-run for each quarterly audit)
docs/phase3-correlation/summary.md Auto-generated firing rates + pair tables
docs/phase3-correlation/findings.md Hand-written interpretation + decision matrix
docs/phase3-correlation/heatmap.png φ-matrix heatmap (20 flags, firing-rate ≥ 1%)
docs/phase3-correlation/*.csv Baseline data for Q3 audit diff
CLAUDE.md + AGENTS.md Phase 3 status entry + Phase 2.4 → "merged" flip

Methodology

  • Metric: φ-coefficient (Matthews correlation for the 2×2 case) — boolean analog of Pearson, reads -1 to +1, 0 = independent
  • Sources: risk_flagsvaluation_warningstier2_events boolean keys, deduped on flag name
  • Universe: 502 S&P 500 constituents (production output @5dfe6287, pre-Phase-2.4)
  • Thresholds: redundancy at |φ| ≥ 0.30 (Cohen 1988 medium effect); diversity at |φ| ≤ 0.05 with both base-rates ≥ 5%

Headline findings

1. Defense layer is mostly orthogonal ✅

35 diversity-confirmed pairs vs 15 redundancy candidates. altman_distress is orthogonal to nearly every other flag (goodwill, restatement, sloan, NSI, accruals_momentum, beneish_high, extreme_*) — its financial-distress signal is genuinely distinct.

2. restatement_history ⟂ manipulation cluster → Phase 2.2 safe ✅

Pair φ
restatement_historysloan_accruals_top_decile +0.008
restatement_historyaccruals_momentum_high +0.003
restatement_historyaltman_distress -0.009

Phase 2.2 recalibration (tighten to "amendment + Item 4.02 within 90d") can proceed without double-counting.

3. Warning-band ↔ active-veto: correlated by design ✅

  • dechow_highdechow_manipulation_veto: φ = +0.706
  • beneish_highbeneish_manipulation_veto: φ = +0.640

NOT redundancy — these are intentionally nested tiers per manipulation_index.py Phase 2.5 provenance. The half-PPV calibration of BENEISH_HIGH_WEIGHT and DECHOW_HIGH_WEIGHT (≈ ½ of active-veto weight) is validated.

4. manipulation_triple_flag φ-locked to dechow_high ⚠️

φ = +1.000 at n=2 each — likely small-sample artifact, but worth watching. If persistent in Q3 audit, downgrade TRIPLE_FLAG_WEIGHT = 10.0 (currently labeled gut-feel calibration in Phase 2.5 provenance docstring).

5. extreme_*_estimate family clusters — already handled ✅

Pair φ
extreme_graham_estimateextreme_rim_estimate +0.515
extreme_rim_estimategoodwill_heavy +0.445
extreme_multiples_ev_ebitdaextreme_multiples_pe +0.409

Already surfaced positively as valuation_methods_applicable (PR #161). No additional work.

Decision matrix for downstream PRs

Action Decision Evidence
Phase 2.2 — recalibrate restatement_history Proceed §2
Phase 2.5 — downgrade TRIPLE_FLAG_WEIGHT Watch Q3 audit §4
Phase 2.5 — keep ACCRUALS_MOMENTUM_WEIGHT = 5.0 Keep φ = +0.305 (moderate, not redundant)
Phase 2.5 — drop extreme_*_estimate annotates Don't drop Per-method debug value (§5)
Phase 4i/4j/4k — JKP/Qlib/IPCA factor adds 🎯 Target orthogonality vs altman/restatement §1, §2

Post-Phase-2.4 follow-up

The baseline does NOT include loss_avoidance_pattern firing data — PR #163 merged but the rescale hasn't applied to a production cron yet. Re-run after the next weekly cron to include it (expected ~5-15% base rate per BD 1997 priors).

Risk

LOW — doc + script only, no compute/schema/output change. The new script is one-shot under scripts/ (not in the cron path). matplotlib is lazy-imported with a fallback for the heatmap step — not added to pyproject.toml dependencies.

Test plan

https://claude.ai/code/session_01Nj5sMzisnqDmF46g5ckEJn


Generated by Claude Code

Adds the Phase 3 correlation analysis: a reproducible one-shot script
(`scripts/phase3_flag_correlation.py`) plus baseline outputs under
`docs/phase3-correlation/` containing firing rates, pairwise
φ-coefficients, redundancy candidates, diversity-confirmed pairs, a
heatmap PNG, and an interpretive findings doc.

Methodology:
- φ-coefficient (Matthews correlation for boolean variables) — the
  proper analog of Pearson for the {0,1} case.
- Sources merged: `risk_flags` + `valuation_warnings` + boolean keys
  in `tier2_events`.
- Universe: 502 S&P 500 constituents (production output @5dfe6287,
  pre-Phase-2.4 so `loss_avoidance_pattern` is still 0% — re-run
  after the next weekly cron to include it).

Headline findings (full version in `findings.md`):
- Defense layer is mostly orthogonal — 35 diversity-confirmed pairs
  (|φ| ≤ 0.05, both base-rates ≥ 5%) vs 15 redundancy candidates
  (|φ| ≥ 0.30). `altman_distress` is orthogonal to nearly every
  other flag — its financial-distress signal is genuinely distinct.
- `restatement_history` is independent of the Sloan/Beneish/accruals
  manipulation cluster (φ ≈ 0). Phase 2.2 recalibration can proceed
  without redundancy concerns.
- Warning-band ↔ active-veto pairs (Dechow + Beneish) are correlated
  by design (φ ≈ +0.6-0.7) — confirms the half-PPV Phase 2.5
  calibration of `BENEISH_HIGH_WEIGHT` and `DECHOW_HIGH_WEIGHT`.
- `manipulation_triple_flag` is φ-locked to `dechow_high` at current
  sample size (n=2 each). Watch in Q3 cohort audit — if persistent,
  downgrade `TRIPLE_FLAG_WEIGHT = 10.0` (currently labeled gut-feel
  in Phase 2.5 provenance docstring).
- `extreme_*_estimate` family clusters (φ ≈ +0.4-0.5) — already
  surfaced positively as `valuation_methods_applicable` (PR #161).
  No additional work needed.

Decision matrix for downstream Phase 2.x PRs in `findings.md`.

Doc + script only — no compute / schema / output change. Re-run after
every quarterly cohort audit (next: 2026-08-19) + after each Phase 2.x
recalibration PR lands.

Closes the Phase 3 deliverable in epic #150.

https://claude.ai/code/session_01Nj5sMzisnqDmF46g5ckEJn
@vercel
Copy link
Copy Markdown

vercel Bot commented May 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
quantrank Ready Ready Preview, Comment May 21, 2026 2:18am

@dackclup dackclup marked this pull request as ready for review May 21, 2026 02:21
@dackclup dackclup merged commit 873fcca into main May 21, 2026
4 checks passed
@dackclup dackclup deleted the claude/phase-3-flag-correlation branch May 21, 2026 02:21
dackclup added a commit that referenced this pull request May 22, 2026
…x) (#184)

Doc-only refresh closing a parking-lot drift from the 14-subagent
self-audit (2026-05-21). `docs/METHODOLOGY.md` §"Annotate-only flags"
expanded from 10 documented bullets to 18; 8 previously-emitted-but-
undocumented annotate flags now carry full bullets with literature
anchors verified by `methodology-scientist` Mode C.

New bullets (per Phase 2.5 provenance tier cross-checked against
`compute/scoring/manipulation_index.py` weight docstrings — zero drift
verified):

- accruals_momentum_high — Sloan 1996 TAR §IV + Beneish 1999 FAJ ΔM
  + Xie 2001 TAR §IV persistence. GUT-FEEL on the +0.05 threshold.
- loss_avoidance_pattern — Burgstahler-Dichev 1997 JAE §3 Table 2
  kink-at-zero. LITERATURE-ANCHORED on signature; GUT-FEEL on the 10×
  rescale magnitude (PR #163).
- beneish_high — Beneish 1999 FAJ + Beneish-Lee-Nichols 2013 FAJ §4
  warning-band PPV (~35-40%). LITERATURE-ANCHORED.
- dechow_high — Dechow-Ge-Larson-Sloan 2011 CAR Table 9.
  LITERATURE-ANCHORED.
- manipulation_triple_flag — Phase 4.5a.3 joint-gate. GUT-FEEL (no
  academic source for joint-gate weight); PR #164 flagged redundancy
  candidate.
- restatement_history — Hennes-Leone-Miller 2008 TAR §"Errors vs
  Irregularities" bare-flag PPV ~30%. LITERATURE-ANCHORED on cite,
  GUT-FEEL on 5y window.
- restatement_high_confidence — HLM 2008 §4 irregularity signature
  (PPV ~70%) + Schroeder 2024 SSRN §3.2 90d window.
  LITERATURE-ANCHORED.
- late_filing_notification — Bartov-Lai-Yeung 2002 JAR §IV ~2×
  restatement base-rate. GUT-FEEL (no replicated PPV on QR universe).

CORRECTION surfaced during audit: hand-off attributed
`late_filing_notification` to Cohen-Malloy-Pomorski 2012; the actual
anchor is Bartov-Lai-Yeung 2002 (CMP 2012 is reserved-not-emitted for
Phase 4.5e Form-4 slots — confirmed NOT-NEEDED for this PR).

Stale "Phase 3e adds `beneish_high` and `dechow_f_high`" footnote
removed (now full bullets; footnote also misspelled the flag name —
actual emit is `dechow_high`).

No compute / schema / scoring / valuation / code change. Defense
layer emit count UNCHANGED at 30/13 annotates — this PR closes the
doc gap, not the emit gap; the 8 flags were already in the headline
math.

CLAUDE.md + AGENTS.md lockstep entries added. Two prior in-flight
entries (Issue #177 PR #183) updated from "in flight" to "merged".

Verification: ruff clean, schema_check in sync, 1059/1059 offline
tests pass.

Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants