
docs(phase4.5): add Earnings-Manipulation Defense Cluster to the v1.x roadmap#86

Merged
dackclup merged 1 commit into main from docs/phase4.5-manipulation-roadmap on May 16, 2026

@dackclup
Owner

Summary

User: "I want the defense system to be as airtight as possible. Please propose a plan." → "Fold the plan into the app development plan."

Folds the earnings-manipulation-defense proposal into the existing phase-tracker triple as a new Phase 4.5 → v1.2.0 cluster, inserted between Phase 4 (factor consolidation → v1.1) and Phase 5 (ML meta-learner).

Phase 4.5 sub-PRs (~10-11 working weeks)

| Sub-PR | New defenses | Effort |
|---|---|---|
| 4.5a (3 sub-PRs in parallel) | Sector-relative Sloan + Beneish soft-veto (M > −1.78) + Dechow soft-veto (F > 3.0) + manipulation_triple_flag joint badge | ~180 LOC, 1-2w |
| 4.5b | restatement_history 10-K/A scan + late_filing_notification Form 12b-25 | ~270 LOC, 1w |
| 4.5c | Roychowdhury REM 3-proxy (abnormal_CFO + abnormal_production + abnormal_discretionary_exp) | ~250 LOC, 2w |
| 4.5d | m_score_deteriorating 3y momentum + Burgstahler-Dichev kink-at-zero loss_avoidance_pattern | ~180 LOC, 2w |
| 4.5e | New SEC Form 4 parser + insider_sell_cluster + c_suite_unusual_sell | ~420 LOC, 3w |
| 4.5f | manipulation_index 0-100 composite + composite-score penalty + UI pillar card + README Honest Limitations + schema bump → 0.8.0-phase4.5f | ~250 LOC, 1w |
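The ~10-11 working-week headline is the serial sum of the Effort column. A quick check, assuming the column values are additive and ignoring the 4.5b/4.5c overlap described under Sequencing:

```python
# Effort per sub-PR, copied from the table above: (LOC, min weeks, max weeks).
# Assumption: the 4.5a row already covers all three of its parallel sub-PRs.
EFFORT = {
    "4.5a": (180, 1, 2),
    "4.5b": (270, 1, 1),
    "4.5c": (250, 2, 2),
    "4.5d": (180, 2, 2),
    "4.5e": (420, 3, 3),
    "4.5f": (250, 1, 1),
}

total_loc = sum(loc for loc, _, _ in EFFORT.values())
min_weeks = sum(low for _, low, _ in EFFORT.values())
max_weeks = sum(high for _, _, high in EFFORT.values())

print(f"~{total_loc} LOC, {min_weeks}-{max_weeks} working weeks if run serially")
# → ~1550 LOC, 10-11 working weeks if run serially
```

Running 4.5b and 4.5c in parallel, as planned, would bring calendar time below this serial figure.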

Defense layer after 4.5: 5 → 7 active vetoes; 4 → 11 annotates; 9 → 18 total layers.

Validation harness (cross-cutting)

  • SEC AAER list 2000-2024 (~600 confirmed manipulators per Dechow et al. 2011 + ongoing). Public.
  • Audit Analytics restatement subset (~1,200 firms 2000-2024).
  • PBO ≤ 0.5 AND DSR > 0 gate per addition (Bailey-López de Prado-Zhu 2014 CSCV — already in PR 4b §2 scope per issue #75).
  • Purged + embargoed walk-forward CV (López de Prado 2018).

Sequencing

  1. PR 4b (defense-infrastructure) — MUST land first; provides the PBO/DSR + AAER cohort fixtures
  2. v1.1.0-phase4: 4h/4i/4j/4k factor integrations (OSAP / JKP / Qlib / IPCA)
  3. v1.2.0-phase4.5: 4.5a (3 parallel sub-PRs) → 4.5b + 4.5c (parallel) → 4.5d → 4.5e → 4.5f
  4. Factor 4h/4i/4j/4k can overlap 4.5 (disjoint files, same harness)
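The ordering above is a small dependency graph. A sketch with the standard-library `graphlib` groups it into waves that can proceed in parallel. The edges are an interpretation of the list, not a project artifact: 4.5d is assumed to wait for both 4.5b and 4.5c, 4h/4i/4j/4k are assumed to need only PR 4b (item 4), and the three parallel 4.5a sub-PRs are collapsed into one node.

```python
from graphlib import TopologicalSorter

# Each item maps to the items that must land before it (interpretation of the list).
DEPENDS_ON = {
    "4h": {"PR 4b"}, "4i": {"PR 4b"}, "4j": {"PR 4b"}, "4k": {"PR 4b"},
    "4.5a": {"PR 4b"},
    "4.5b": {"4.5a"},
    "4.5c": {"4.5a"},
    "4.5d": {"4.5b", "4.5c"},
    "4.5e": {"4.5d"},
    "4.5f": {"4.5e"},
}


def waves(graph):
    """Group items into waves; items in the same wave can run in parallel."""
    ts = TopologicalSorter(graph)
    ts.prepare()  # also raises CycleError if the plan is circular
    result = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        result.append(ready)
        ts.done(*ready)
    return result


for number, wave in enumerate(waves(DEPENDS_ON), 1):
    print(number, wave)
```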

Doc lockstep (per phase-status-bump skill)

| File | Change |
|---|---|
| PHASE_STATUS.md | New row in Phase Overview table + full "Phase 4.5 plan" detail section with all 6 sub-PRs + acceptance criteria + sequencing notes |
| WORKFLOW.md | New row in Phase Overview header table + full "PHASE 4.5" task section (~150 lines) with checkboxes per sub-PR + Defense Roadmap table extended with 4.5a-4.5f rows + updated calendar totals for v1.0 / v1.1 / v1.2 / v2.0 |
| CLAUDE.md | §Phase status: Phase 4.5 named as next-after-PR-4b; defense-layer count delta (9 → 18) called out; issue #7 (Sloan sector-relative) noted as folded into 4.5a.1 |

Not in this PR

  • SKILL.md — no current-state constant change yet (schema / veto count / rule additions land per sub-PR as they ship, not at plan time)
  • Code changes (compute/, frontend/, tests/) — plan only
  • .claude/skills/phase-4.5/ sub-skill PLAN.md stubs — land per sub-PR at 4.5a kickoff

Test plan

  • No code changes; CI runs only doc-level lint
  • Phase 4.5 row appears in PHASE_STATUS.md table
  • WORKFLOW.md "PHASE 4.5" task section parses cleanly (markdown checkboxes render)
  • CLAUDE.md cross-references resolve (PHASE_STATUS.md → §"Phase 4.5 plan")

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2


Generated by Claude Code

… roadmap

User: "I want the defense system to be as airtight as possible. Please
propose a plan." → "Fold the plan into the app development plan."

Folds the manipulation-defense proposal into the existing
phase-tracker triple as a new **Phase 4.5 → v1.2.0** cluster
inserted between Phase 4 (factor consolidation → v1.1.0) and
Phase 5 (ML meta-learner).

## Why this is its own phase, not folded into Phase 4

- **Phase 4 = factor focus** (OSAP / JKP / Qlib / IPCA). Ships v1.1.
- **Phase 4.5 = manipulation focus** (Sloan/Beneish/Dechow/REM/insider).
  Touches disjoint code paths.
- Splitting keeps release themes clean and lets v1.1 ship sooner.
- Phase 4.5 sub-PRs **can run in parallel with Phase 4 factor
  integrations** (4h/4i/4j/4k) since they touch different layers
  and share the PR 4b PBO/DSR validation harness.

## Phase 4.5 sub-PRs (6 total, ~10-11 weeks full-time)

| Sub-PR | New defenses | Effort |
|---|---|---|
| 4.5a (3 sub-PRs in parallel) | Sector-relative Sloan + Beneish/Dechow soft-veto + `manipulation_triple_flag` joint badge | ~180 LOC, 1-2w |
| 4.5b | `restatement_history` (Hennes-Leone-Miller 2008) + `late_filing_notification` Form 12b-25 (Bartov-Lai-Yeung 2002) | ~270 LOC, 1w |
| 4.5c | Roychowdhury REM 3-proxy (`abnormal_CFO` + `abnormal_production` + `abnormal_discretionary_exp`) | ~250 LOC, 2w |
| 4.5d | `m_score_deteriorating` 3y momentum + Burgstahler-Dichev kink at zero | ~180 LOC, 2w |
| 4.5e | New SEC Form 4 parser + `insider_sell_cluster` (Cohen-Malloy-Pomorski 2012) + `c_suite_unusual_sell` | ~420 LOC, 3w |
| 4.5f | `manipulation_index` 0-100 composite + composite-score penalty + UI pillar card + README Honest Limitations + schema bump 0.7.x → 0.8.0-phase4.5f | ~250 LOC, 1w |

**Defense layer after 4.5**: 5 → 7 active vetoes; 4 → 11 annotates;
9 → **18 total layers** (verifiable via defense-scorecard skill).

## Validation cohort (used per sub-PR)

- SEC AAER list 2000-2024 (~600 confirmed manipulators per Dechow
  et al. 2011 dataset + ongoing). Public, free.
- Audit Analytics restatement subset (~1,200 firms 2000-2024) as
  second-source.
- PBO ≤ 0.5 AND DSR > 0 gate per addition (Bailey-López de Prado-Zhu
  2014 CSCV harness, already in PR 4b §2 scope per issue #75).
- Purged + embargoed walk-forward CV (López de Prado 2018).

## Doc lockstep (per phase-status-bump skill)

| File | Change |
|---|---|
| `PHASE_STATUS.md` | New row in Phase Overview table + full "Phase 4.5 plan" section with all 6 sub-PRs + acceptance criteria + sequencing |
| `WORKFLOW.md` | New row in Phase Overview header table + full "PHASE 4.5 — Earnings-Manipulation Defense Cluster" task section with checkboxes per sub-PR + Defense Roadmap table extended with 4.5a-4.5f rows + updated calendar totals for v1.1 / v1.2 / v2.0 |
| `CLAUDE.md` | §Phase status updated — Phase 4.5 named as the next deliverable after PR 4b → v1.1.0; defense-layer count delta (9 → 18) called out; issue #7 noted as folded into 4.5a |

## Not touched

- `SKILL.md` — no current-state constant change yet (schema /
  veto count / rule additions land per sub-PR as they ship,
  not at plan time). Rule 16 (annotate-and-veto-Top-N) covers
  4.5a's defense pattern; new rule if `manipulation_index`
  composite-penalty becomes a Phase 5+ adopted pattern.
- Code (`compute/`, `frontend/`, `tests/`) — plan only.
- `.claude/skills/phase-4.5/` — sub-skill PLAN.md stubs land
  per sub-PR at the start of 4.5a kickoff.

## Sequencing reminder

1. PR 4b (defense-infrastructure) — MUST land first
2. v1.1.0-phase4: 4h/4i/4j/4k factor integrations
3. v1.2.0-phase4.5: 4.5a → 4.5b + 4.5c (parallel) → 4.5d → 4.5e → 4.5f
4. (Factor 4h/4i/4j/4k can also overlap 4.5 — disjoint files)

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2
@vercel

vercel Bot commented May 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| quantrank | Ready | Preview, Comment | May 16, 2026 9:13am |

@dackclup dackclup marked this pull request as ready for review May 16, 2026 12:25
@dackclup dackclup merged commit c59b8f5 into main May 16, 2026
4 checks passed
@dackclup dackclup deleted the docs/phase4.5-manipulation-roadmap branch May 16, 2026 12:25
dackclup added a commit that referenced this pull request May 16, 2026
… only §3 polish remains (#87)

During run #45 verification (defense-scorecard skill) the
`cross_source_disagreement` flag surfaced 23 stocks in production
— evidence that PR 4b §1 had already shipped. A `git log` confirms
**PR #60 ("feat(defense): cross-source validator + PBO/DSR + IC-
decay infra (PR 4b)") merged on 2026-05-14**, before v1.0.0.
Issue #75 was filed on 2026-05-15 (one day after #60 merged) to
track the remaining acceptance criteria — but PR #81 and PR #86
mistakenly treated "PR 4b" as fully not-yet-started.

This commit re-aligns the triple-doc with the actual state.

## What PR #60 actually shipped (already in production)

| Sub-section | Status | Evidence |
|---|---|---|
| **§1 Cross-source validator** | ✅ DONE | `compute/ingest/cross_source.py` exists; wired in `compute/main.py:979-988`; run #45 shows 23/502 stocks (4.6%) flagging `cross_source_disagreement` annotate |
| **§2 PBO/DSR library** | ✅ DONE | `compute/validation/pbo_dsr.py` exists with CSCV + DSR + Beasley-Springer-Moro inverse normal CDF; `factor_passes_gates()` entry point ready |
| **§3 IC-decay monitor** | 🟡 PARTIAL | `compute/validation/ic_decay.py` exists; **`decay_report.json` writer NOT wired** + no UI transparency surface — exactly the 2 unchecked acceptance criteria on issue #75 |

## Real "next deliverable" — PR 4b §3 polish (~2-3 days)

1. **Writer wiring** — call `ic_decay.run()` from `compute/main.py`
   after pillar normalization; write per-pillar decay table to
   `frontend/public/data/decay_report.json` via new writer in
   `compute/output/writer.py` (atomic temp → rename pattern).
2. **UI transparency surface** — new `DecayReportCard.tsx` on
   the stock detail page below `PillarRadarChart`. Reads
   `decay_report.json` client-side (fail-soft if absent), shows
   8-pillar 12m + 36m IC trend + decay-alert badges per pillar.
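The "atomic temp → rename" pattern named in step 1 can be sketched with the standard library alone. This is a generic illustration, not the project's `compute/output/writer.py`; the function name is hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path, payload):
    """Write JSON so readers never observe a half-written file.

    The temp file is created in the destination directory so that
    os.replace() is a same-filesystem rename, which is atomic on POSIX.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, indent=2, sort_keys=True)
            fh.flush()
            os.fsync(fh.fileno())  # make the bytes durable before the rename
        os.replace(tmp_name, path)  # swap the finished file into place
    except BaseException:
        # Remove the temp file on any failure, then re-raise.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

A caller would pass the per-pillar decay table as a plain dict. The frontend then sees either the previous file or the complete new one, never a partial write, which keeps the fail-soft read in `DecayReportCard.tsx` simple.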

Effort: ~150 LOC writer + UI + ~80 LOC tests.

After §3 polish ships: → 4h/4i/4j/4k factor integrations (each
gated by the now-complete PBO/DSR harness) → tag v1.1.0-phase4.

## File-by-file changes

| File | Change |
|---|---|
| `CLAUDE.md` | §Phase status — "Next deliverable" reframed as PR 4b §3 polish only. Production verification numbers cited (23 stocks, 4.6%). Issue #75 description updated to "remaining items: writer + UI surface". |
| `PHASE_STATUS.md` | Phase 4 table row reflects PR 4b §1+§2 merged via PR #60. The long "Next deliverable: PR 4b — defense-infrastructure" block is replaced by a 3-row sub-section status table (§1 ✅ / §2 ✅ / §3 🟡) + a tighter "PR 4b §3 polish" next-deliverable scope description. |
| `WORKFLOW.md` | Phase 4 Acceptance Criteria — 3 cross-source / PBO-DSR / IC-decay checkboxes flipped from `[ ]` to `[x]` (with footnotes on PR #60 + run #45 evidence) for §1+§2; §3 stays `[ ]` with the remaining-items breakdown. |

## Doc audit trail

This is the second triple-doc fix this week:
- PR #81: marked 4g ✅ DONE (correct)
- PR #86: added Phase 4.5 roadmap + named PR 4b as next (WRONG —
  this commit corrects)
- This PR: PR 4b §1+§2 ✅ DONE; §3 polish is the real next

The cause of the PR #81/#86 error: PR #60 merged 2026-05-14 used
the literal title "PR 4b" — but the `PHASE_STATUS.md` PR-label
slot "4b" was already taken by the `_avg_3y_roe` fix (different
content, same letter). Two unrelated PRs both calling themselves
"PR 4b" created the confusion. Going forward we should disambiguate
by always referring to it as "PR 4b defense-infrastructure" or
"issue #75" when meaning the cross-source/PBO/IC-decay work.

## Verification

- No code changes; docs only
- `grep "PR 4b defense-infrastructure (issue #75) next"` — returns 0 hits
- `grep "PR 4b §3 polish"` — appears in CLAUDE.md + PHASE_STATUS.md (the new wording)
- WORKFLOW.md Phase 4 acceptance checklist: 3 items flipped to [x] with evidence

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 16, 2026
…oses issue #7) (#89)

User: "I want the defense system to be as airtight as possible" → "Fold the plan into
the app development plan". Phase 4.5 manipulation-defense cluster sub-PR 1 of 6.

## Problem (issue #7)

`sloan_accruals_top_decile` uses a **cross-sectional** top decile and
ignores sector. Sloan accruals = (NI − CFO) / TotalAssets measure
"non-cash earnings", which differ structurally by sector:

- **Financials** carry large non-cash items (loan-loss provisions,
  fair-value adjustments); this is not manipulation
- **REITs** carry high D&A, a structural feature of the business model
- **Tech** carries SBC plus growth-driven working-capital build; this is not
  manipulation

The cross-sectional top decile therefore over-fires on Financials/REITs.

Production run #45 confirms this: **Financials 16/75 = 21.3% flagged**
vs an expected ~10% (more than a 2× over-fire). IT/Comm/REITs also sit slightly
above baseline; Staples/Utilities sit far below (2.8% / 3.2%).
Sector spread 21.3% / 2.8% = 7.6×, well beyond what the Sloan paper implies.

## Fix

`compute_risk_flags()` computes the Sloan top-decile threshold **per
sector** whenever `sectors` is passed in (production always passes it).
Sectors with fewer than `SLOAN_MIN_POPULATION_SECTOR=15` members (e.g. future
S&P 1500 sub-sectors) fall back to the cross-sectional threshold.

This mirrors the existing NSI per-sector pattern: it adds a
`sloan_thresholds_by_sector` dict alongside `nsi_thresholds_by_sector`
and changes the per-ticker check to prefer the sector threshold, then
fall back to the cross-sectional one.
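A minimal sketch of a per-sector top-decile threshold with a population floor and cross-sectional fallback. It is not the code in `compute/scoring/risk_overlay.py`; the function names, the linear-interpolation 90th percentile, and the strict `>` comparison are assumptions.

```python
import statistics
from collections import defaultdict

SLOAN_MIN_POPULATION_SECTOR = 15  # floor quoted in this PR


def sloan_accruals(net_income, cfo, total_assets):
    """Sloan accruals = (NI - CFO) / TotalAssets, as defined above."""
    return (net_income - cfo) / total_assets


def top_decile_cutoff(values):
    """90th percentile with linear interpolation (needs >= 2 values)."""
    return statistics.quantiles(values, n=10, method="inclusive")[-1]


def sloan_top_decile(accruals, sectors):
    """Return tickers in the top Sloan decile of their own sector.

    accruals: ticker -> Sloan accrual ratio (None if uncomputable)
    sectors:  ticker -> sector name
    Sectors below the population floor use the cross-sectional cutoff.
    """
    valid = {t: a for t, a in accruals.items() if a is not None}
    if len(valid) < 2:
        return set()
    cross_cutoff = top_decile_cutoff(list(valid.values()))

    by_sector = defaultdict(list)
    for ticker, value in valid.items():
        if sectors.get(ticker) is not None:
            by_sector[sectors[ticker]].append(value)
    cutoff_by_sector = {
        sector: top_decile_cutoff(values)
        for sector, values in by_sector.items()
        if len(values) >= SLOAN_MIN_POPULATION_SECTOR
    }

    # Prefer the sector cutoff; fall back to the cross-sectional one.
    return {
        ticker
        for ticker, value in valid.items()
        if value > cutoff_by_sector.get(sectors.get(ticker), cross_cutoff)
    }


acc = {f"A{i}": i / 100 for i in range(1, 21)}         # sector A: +0.01 .. +0.20
acc.update({f"B{i}": -i / 100 for i in range(1, 21)})  # sector B: -0.01 .. -0.20
acc.update({"C1": 0.0, "C2": 0.0, "C3": 0.0, "C4": 0.0, "C5": 0.30})  # under the floor
sectors = {t: t[0] for t in acc}
print(sorted(sloan_top_decile(acc, sectors)))
# → ['A19', 'A20', 'B1', 'B2', 'C5']
```

Sector B has uniformly negative accruals, yet its own top two are still flagged, which a single cross-sectional cutoff would never do; the five-name sector C is judged against the cross-sectional cutoff instead.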

## Expected production impact

| Sector | Current | After 4.5a.1 (~10%) | Δ |
|---|---|---|---|
| Financials (75) | 16 (21.3%) | ~8 (10%) | **−8** |
| IT (73) | 8 (11.0%) | ~7 (10%) | −1 |
| Comm (23) | 3 (13.0%) | ~2 (10%) | −1 |
| Cons.Disc (48) | 5 (10.4%) | ~5 (10%) | 0 |
| Health (59) | 6 (10.2%) | ~6 (10%) | 0 |
| Real Estate (31) | 3 (9.7%) | ~3 (10%) | 0 |
| Industrials (79) | 6 (7.6%) | ~8 (10%) | +2 |
| Energy (21) | 1 (4.8%) | ~2 (10%) | +1 |
| Materials (26) | 1 (3.8%) | ~3 (10%) | +2 |
| Utilities (31) | 1 (3.2%) | ~3 (10%) | +2 |
| Cons.Staples (36) | 1 (2.8%) | ~4 (10%) | +3 |

Net: **51 → ~51** flagged (the Δ column sums to zero) but **per-sector
rate stabilizes at ~10%** instead of the 7.6× spread today. The
flag now means "this stock is in the worst Sloan decile **of its
own sector peer group**", which is the correct earnings-quality
signal per Sloan 1996.

Tier-3 defenses (Beneish + Dechow) catch the residual, sector-agnostic
manipulation cases that sector-relative Sloan misses (4.5a.2 + 4.5a.3
will promote them to soft vetoes).

## Implementation

| File | Change |
|---|---|
| `compute/scoring/risk_overlay.py` | New `SLOAN_MIN_POPULATION_SECTOR = 15` constant. Module docstring updated to flag PR 4.5a.1 + issue #7 close. `compute_risk_flags()` builds `sloan_thresholds_by_sector` dict when `sectors` is supplied (mirrors NSI pattern). Per-ticker check prefers sector threshold → cross-sectional fallback → skip. |
| `tests/test_scoring/test_risk_overlay.py` | 3 new tests: `test_sloan_sector_relative_top_decile_when_sectors_supplied` (top in EACH of 2 sectors flagged; bottom in each NOT flagged), `test_sloan_sector_relative_skips_undersized_sector` (small-sector tickers fall back to cross-sectional), `test_sloan_sector_relative_floor_constant` (sanity: floor ≥ 10 and ≥ SLOAN_MIN_POPULATION). |

## Backward compat

- Existing tests that call `compute_risk_flags(snaps)` without
  `sectors` continue to work — cross-sectional fallback is the
  same code path as v1.0.
- `sectors` arg was already optional + production already passes
  `sectors=sectors_dict` to `compute_risk_flags` (compute/main.py:830).
- No schema change. No new flag identifier — the flag name
  `sloan_accruals_top_decile` is unchanged (the threshold rubric
  is the implementation detail).

## Verification ladder

- ✅ ruff check . — clean
- ✅ pytest tests/ -m "not network" — **775 passed** (was 772)
- ✅ schema_check — not touched (no schema delta)
- ⏳ Production verification deferred to next weekly compute — will
  re-run `defense-scorecard` + `verify-production-output` Section E
  vs current 51-flag baseline after the next workflow_dispatch lands

## Closes / references

- Closes [issue #7](#7)
  (Sloan over-fire on growers + Financials)
- First sub-PR of Phase 4.5 manipulation-defense cluster per PR #86
- Defense layer count unchanged (5 active vetoes), defense **quality**
  improves — same number of flagged stocks but now correctly
  sector-distributed

## Future (not in this PR)

- 4.5a.2: Beneish soft-veto (M > −1.78 threshold)
- 4.5a.3: Dechow soft-veto + `manipulation_triple_flag` joint gate
- AAER cohort backtest via the PR 4b §2 PBO/DSR harness (deferred
  until cohort fixtures land in 4.5c kickoff)

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 16, 2026
…tered_top5 suppression) (#90)

Phase 4.5 manipulation-defense cluster sub-PR 2 of 6 per PR #86.
Promotes `beneish_high` (M > -2.22 annotate, PR 3e.1) to a **second
tier** active veto at the stricter threshold `M > -1.78`.

## Threshold rationale

Beneish 1999 *FAJ* original cutoff is M > -2.22 — Type-I ~17%,
Type-II ~24%. The PR 3e.1 annotate uses this threshold as the
"is this stock worth a closer look" signal (low precision, high
recall, doesn't suppress Top-5).

PR 4.5a.2 adds a STRICTER cutoff M > -1.78 for the **active veto**
path. Beneish 1999 paper Table 4 shows positive-predictive-value
crosses ~60% at M > -1.78 on the original 74-manipulator sample;
below that the precision drops into FP-heavy territory. The
stricter cutoff mirrors PR 3d's `non_reliance_filing` veto trade-
off — high precision, narrower recall, won't dilute Top-5 with
marginal annotators.

Tickers in the -2.22 to -1.78 band keep ONLY the annotate flag
(no veto). Tickers above -1.78 get BOTH the annotate AND the veto.
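The two-band rule above, as a sketch. The veto constant's name comes from the Files-changed table in this PR; the annotate constant's name and the function are hypothetical, and returning both IDs in one list simplifies where the real code stores annotates versus vetoes.

```python
from typing import List, Optional

# Cutoffs quoted in this PR; the annotate constant's name is hypothetical.
BENEISH_ANNOTATE_THRESHOLD = -2.22  # PR 3e.1 `beneish_high` annotate
BENEISH_VETO_THRESHOLD = -1.78      # PR 4.5a.2 active veto


def beneish_flags(m_score: Optional[float]) -> List[str]:
    """Return the Beneish flag IDs a ticker earns (strict '>' at both cutoffs)."""
    if m_score is None:  # no M-score: neither flag fires
        return []
    flags = []
    if m_score > BENEISH_ANNOTATE_THRESHOLD:
        flags.append("beneish_high")
    if m_score > BENEISH_VETO_THRESHOLD:
        flags.append("beneish_manipulation_veto")
    return flags


print(beneish_flags(-2.0))  # soft band: annotate only
print(beneish_flags(-1.5))  # above -1.78: annotate and veto
```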

## Production estimate (run #46, c=737d8efe)

| Threshold | Coverage | Flagged |
|---|---|---|
| M > -2.22 (annotate, existing) | 160/502 (32% covered) | 26 (16.2%) |
| **M > -1.78 (veto, NEW)** | same 160 | **11 (6.9%)** |

New vetoes that will fire (top 11 in M-score order):
SMCI · WAT · PODD · WDC · NVDA · CAT · PLTR · SNDK · BG · STX · LLY

Many are growth tech names (SMCI, NVDA, PLTR, SNDK, STX), a group
Beneish 1999 acknowledged can produce false positives. The Tier-3 forensic
posture documented in PR 3e.1 is the right framing: the veto is
high-precision, narrow-recall, and growers that show up here usually
warrant a closer look even if not all are confirmed manipulators.

## Architecture

| Layer | Before | After |
|---|---|---|
| Active vetoes | 5 (altman / sloan / NSI / non_reliance / data_quality) | **6** (+ `beneish_manipulation_veto`) |
| Annotate flags | `beneish_high` at M > -2.22 (unchanged) | + nothing new |

`compute_risk_flags(beneish_m_scores=...)` is the inject pattern —
mirrors `non_reliance_by_ticker`. `compute/main.py` pre-computes
all 502 Beneish results BEFORE the risk_flag pass so the veto can
suppress entered_top5; the existing per-ticker loop (Step 8) reads
the cached `beneish_results[ticker]` instead of recomputing
(performance neutral: still one Beneish compute per ticker, because
Step 8 reuses the cached result instead of computing a second time).
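The inject pattern, reduced to its essentials: compute each ticker's M-score once, pass the dict into the risk-flag pass through an optional keyword, and let later per-ticker work read the same dict. Every name here is a hypothetical stand-in (`fake_m_score` plays the role of the real Beneish model); the point is the call count and the `None` default that keeps old call sites unchanged.

```python
BENEISH_VETO_THRESHOLD = -1.78


def precompute_m_scores(tickers, compute_m):
    """Call the (expensive) M-score model exactly once per ticker."""
    return {t: compute_m(t) for t in tickers}


def risk_flags_sketch(tickers, beneish_m_scores=None):
    """Hypothetical stand-in for a risk-flag pass with an optional injected dict."""
    flags = {t: [] for t in tickers}
    if beneish_m_scores is None:
        return flags  # old call sites: veto path disabled, behavior unchanged
    for t in tickers:
        m = beneish_m_scores.get(t)
        if m is not None and m > BENEISH_VETO_THRESHOLD:
            flags[t].append("beneish_manipulation_veto")
    return flags


calls = []


def fake_m_score(ticker):
    """Placeholder model that records every call."""
    calls.append(ticker)
    return {"AAA": -1.2, "BBB": -2.6}.get(ticker)  # "CCC" has no score


tickers = ["AAA", "BBB", "CCC"]
m_scores = precompute_m_scores(tickers, fake_m_score)
flags = risk_flags_sketch(tickers, beneish_m_scores=m_scores)
step8_view = {t: m_scores[t] for t in tickers}  # later loop reuses the cache
print(len(calls), flags)
```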

## Files changed

| File | Change |
|---|---|
| `compute/scoring/beneish.py` | + `BENEISH_VETO_THRESHOLD = -1.78` constant + docstring rationale (paper Table 4 PPV crossover, parallel to non_reliance_filing trade-off) |
| `compute/scoring/risk_overlay.py` | + `beneish_m_scores` kwarg on `compute_risk_flags`. New veto check at the end of the per-ticker loop emits `beneish_manipulation_veto` when M > threshold. Imports `BENEISH_VETO_THRESHOLD`. |
| `compute/main.py` | Pre-compute `beneish_results` dict + `beneish_m_scores` dict before `compute_risk_flags` call. Pass to compute_risk_flags. Per-ticker Step-8 loop reads from cached `beneish_results[ticker]` (no double-compute). |
| `tests/test_scoring/test_risk_overlay.py` | 4 new tests: veto fires above strict threshold, skipped on None m_score, strict inequality at exact threshold, disabled when dict not supplied (backward-compat). |

## Backward compat

- `beneish_m_scores` arg is **optional**. Existing callers without it (tests, future external users) see no behavior change.
- `beneish_high` annotate at M > -2.22 **unchanged**: the old flag still fires on every M > -2.22, including the -2.22 to -1.78 band below the veto cutoff.
- `StockDetail.beneish_m_score` numeric field unchanged.

## Verification ladder

- ✅ `ruff check .` — clean
- ✅ `pytest tests/ -m "not network"` — **779 passed** (was 775; 4 new)
- ✅ schema_check — N/A (no schema delta)
- ⏳ Production verification deferred to next workflow_dispatch — expect:
  - 11 new tickers with `beneish_manipulation_veto` in `risk_flags`
  - Top-5 rotation re-shuffles if NVDA/SMCI/WAT/etc. were in the raw top 5 (NVDA is #3 in the run #46 raw top 5 and will be suppressed by both the Sloan flag and the new Beneish veto)
  - Active veto count metadata 5 → 6 (verify via `defense-scorecard`)

## Sibling sub-PRs (Phase 4.5a wave)

- ✅ 4.5a.1 (Sloan sector-relative) — merged PR #89
- **4.5a.2 (this PR)** — Beneish soft-veto
- ⬜ 4.5a.3 (Dechow soft-veto + `manipulation_triple_flag`) — next, branches off this PR or off main after merge

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2

Co-authored-by: Claude <noreply@anthropic.com>
dackclup pushed a commit that referenced this pull request May 16, 2026
Phase 4.5 manipulation-defense cluster sub-PR 3 of 6 per PR #86.
Final 4.5a wave member. Branched off PR #90 (4.5a.2 Beneish veto)
to integrate cleanly. Depends on 4.5a.2 merging first.

## Two additions

### 1. Dechow F-score soft-veto (F > 3.0)

Promotes `dechow_high` (F > 2.45 annotate, PR 3e.2) to a second-tier
active veto at the stricter threshold F > 3.0. Mirrors PR 4.5a.2's
Beneish veto pattern exactly.

Threshold rationale: Dechow 2011 *CAR* Table 7 shows that at F > 3.0
the AAER hit rate exceeds 4× baseline (vs ~2× at F > 2.45). The
stricter cutoff matches the precision/recall trade-off PR 4.5a.2
locked for Beneish and PR 3d locked for non_reliance_filing — high
precision, narrower recall, won't dilute Top-5.

### 2. `manipulation_triple_flag` joint-gate badge

Fires when Sloan + Beneish-high + Dechow-high all flag on the same
ticker. Rare but high-confidence — 2 tickers in run #46:

  - **SMCI**: F=6.65 (Dechow veto fires too) + Sloan + Beneish high
  - **WAT**: Sloan + Beneish high + Dechow high (Dechow at annotate level only; F is below the 3.0 veto cutoff)

UI-only badge in `valuation_warnings`; does NOT stack a third veto
on top of the individual component vetoes. Per PR #86 plan §4.5a.3.
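The joint gate is a set intersection over the three per-signal results. A sketch with hypothetical helper names and placeholder tickers; it writes only to a warnings mapping, mirroring the statement that the badge does not add a veto.

```python
def manipulation_triple_flag(sloan_flagged, beneish_high, dechow_high):
    """Tickers flagged by all three signals (3-of-3 joint gate)."""
    return set(sloan_flagged) & set(beneish_high) & set(dechow_high)


def add_triple_badge(valuation_warnings, sloan_flagged, beneish_high, dechow_high):
    """Append the badge to per-ticker warnings; risk flags are not touched."""
    for ticker in sorted(manipulation_triple_flag(sloan_flagged, beneish_high, dechow_high)):
        valuation_warnings.setdefault(ticker, []).append("manipulation_triple_flag")
    return valuation_warnings


badges = add_triple_badge(
    {},
    sloan_flagged={"AAA", "BBB", "CCC"},
    beneish_high={"AAA", "BBB", "DDD"},
    dechow_high={"AAA", "EEE"},
)
print(badges)  # → {'AAA': ['manipulation_triple_flag']}
```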

## Production estimate (run #46)

| Threshold | Coverage | Flagged |
|---|---|---|
| F > 2.45 (annotate, existing) | 157/502 (31% covered) | 2 (1.3%) |
| **F > 3.0 (veto, NEW)** | same 157 | **1 (0.6%)** |
| **manipulation_triple_flag** | full universe | **2** |

Expected veto-layer state after this PR ships:

- 4.5a.1 (merged): Sloan sector-relative, 51 → ~56
- 4.5a.2 (PR #90): + Beneish veto, 11 new flags
- **4.5a.3 (this PR)**: + Dechow veto, **1** new flag (SMCI). SMCI also
  carries the Beneish veto, so the two Top-5 suppressions stack on the
  same ticker and the Dechow veto adds no new unique suppression:
  SMCI already loses entered_top5 from the Beneish veto
- Active vetoes: **5 → 7** (Beneish + Dechow added)
- Annotates: + manipulation_triple_flag = **+1 reason-taxonomy identifier**

## Architecture

| Layer | Before 4.5a wave | After 4.5a wave |
|---|---|---|
| Active vetoes | 5 | **7** (+ beneish_manipulation_veto, dechow_manipulation_veto) |
| Tier-3 annotates | `beneish_high` + `dechow_high` at looser thresholds | unchanged (kept for the soft band) |
| Joint gates | none | **+ `manipulation_triple_flag`** (3-of-3 joint) |

## Files changed

| File | Change |
|---|---|
| `compute/scoring/dechow_f.py` | + `DECHOW_VETO_THRESHOLD = 3.0` constant + docstring rationale (Dechow 2011 Table 7 4× baseline crossover) |
| `compute/scoring/risk_overlay.py` | + `dechow_f_scores` kwarg on `compute_risk_flags`. New veto check at end of per-ticker loop, immediately after Beneish. Imports `DECHOW_VETO_THRESHOLD`. |
| `compute/main.py` | Pre-compute `dechow_results` dict + `dechow_f_scores` dict alongside Beneish (one combined pass). Pass to compute_risk_flags. Step-8 per-ticker loop reads from cached `dechow_results[ticker]` (no double-compute). + `manipulation_triple_flag` joint-gate logic appended after Dechow annotate emission. |
| `tests/test_scoring/test_risk_overlay.py` | 5 new tests: veto fires above threshold, skipped on None, strict inequality, disabled when dict not supplied, Beneish + Dechow co-firing independence. |

## Backward compat

- `dechow_f_scores` arg optional. Existing callers without it unchanged.
- `dechow_high` annotate at F > 2.45 unchanged.
- `StockDetail.dechow_f_score` numeric field unchanged.
- `manipulation_triple_flag` is in `valuation_warnings` (annotate)
  not `risk_flags` — doesn't change Top-N suppression on top of
  component vetoes. UI must opt in to render it.
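How component vetoes and the annotate-only badge interact with Top-N entry, as a sketch. It assumes vetoed names are skipped and the next unvetoed names backfill the list (consistent with the "Top-5 rotation re-shuffles" expectation in PR #90), which may not match the project's exact rule; the function name and tickers are placeholders.

```python
def entered_top_n(ranked, risk_flags, n=5):
    """Keep the first n tickers in ranked order that carry no active veto.

    ranked:     tickers in descending composite-score order
    risk_flags: ticker -> list of active-veto IDs; annotates such as
                manipulation_triple_flag live elsewhere and are ignored here
    """
    top = []
    for ticker in ranked:
        if risk_flags.get(ticker):
            continue  # one veto or several: the ticker is skipped once either way
        top.append(ticker)
        if len(top) == n:
            break
    return top


ranked = ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF", "GGG"]
risk_flags = {"BBB": ["beneish_manipulation_veto", "dechow_manipulation_veto"]}
valuation_warnings = {"CCC": ["manipulation_triple_flag"]}  # annotate only
print(entered_top_n(ranked, risk_flags))
# → ['AAA', 'CCC', 'DDD', 'EEE', 'FFF']
```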

## Verification ladder

- ✅ `ruff check .` — clean
- ✅ `pytest tests/ -m "not network"` — **784 passed** (was 779 on 4.5a.2 branch; +5 new)
- ✅ schema_check — N/A (no schema delta — `manipulation_triple_flag` is a string in existing `valuation_warnings: list[str]`)
- ⏳ Production verification deferred — expect:
  - 1 new `dechow_manipulation_veto` (SMCI)
  - 2 `manipulation_triple_flag` annotates (SMCI, WAT)
  - 7 active vetoes total

## Sibling sub-PRs (Phase 4.5a wave — COMPLETE after this PR)

- ✅ **4.5a.1** Sloan sector-relative — merged PR #89
- 🟡 **4.5a.2** Beneish soft-veto — open PR #90
- **4.5a.3 (this PR)** — Dechow soft-veto + manipulation_triple_flag

Next: **4.5b** (restatement_history + late_filing_notification),
**4.5c** (Roychowdhury REM), **4.5d** (m-score momentum + Burgstahler
kink), **4.5e** (Form 4 insider clustering), **4.5f** (composite
manipulation_index + UI + schema bump).

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2
dackclup added a commit that referenced this pull request May 16, 2026
#91)


Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 16, 2026
#92)

Phase 4.5a manipulation-defense quick-wins shipped 2026-05-16 across
3 sub-PRs (#89/#90/#91). Production verified on run #47 (commit
`8cdf4886`). This commit bumps the triple-doc lockstep so future
sessions read the actual current state instead of the in-progress
plan.

## What shipped (per-sub-PR)

| Sub-PR | PR | Delivered | Production effect |
|---|---|---|---|
| **4.5a.1** | #89 | Sloan accruals top-decile within sector; `SLOAN_MIN_POPULATION_SECTOR=15` floor; cross-sectional fallback for under-floor sectors. Closes issue #7. | Financials Sloan rate 21.3% → 11.7%. Cross-sector spread 7.7× → 1.4×. Total Sloan flagged 51 → 56. |
| **4.5a.2** | #90 | `beneish_manipulation_veto` active-veto path at M > −1.78 (Beneish 1999 Table 4 PPV crossover). | 11 new vetoed tickers: SMCI · WAT · PODD · WDC · NVDA · CAT · PLTR · SNDK · BG · STX · LLY. |
| **4.5a.3** | #91 | `dechow_manipulation_veto` active-veto path at F > 3.0 (Dechow 2011 Table 7 4× baseline crossover) + `manipulation_triple_flag` joint-gate annotate. | 1 Dechow veto (SMCI F=6.65); 2 triple_flag tickers (SMCI + WAT). |

## End-state defense layer

- **Active vetoes**: 5 → **7** (added `beneish_manipulation_veto`,
  `dechow_manipulation_veto`)
- **Annotate flags**: 4 → **5** (added `manipulation_triple_flag`)
- **Tier-3 forensic**: still 2 (Beneish + Dechow operating at two
  thresholds each — annotate + veto)
- **Reason taxonomy**: 24 → **29 stable identifiers**

No schema delta — new flag IDs are strings in the existing
`risk_flags: list[str]` and `valuation_warnings: list[str]`
arrays. `SCHEMA_VERSION` stays `0.7.1-phase4g`.

## Triple-doc lockstep changes

| File | Change |
|---|---|
| `CLAUDE.md` | §Phase status — "Next deliverable" reframed from "4.5a.1 sector-relative Sloan" to "4.5b disclosure-driven catches". Defense layer count "9 → 18 target" updated to "9 → 11 after 4.5a; target 18 after 4.5f". Issue #7 marked ✅ closed by 4.5a.1. |
| `PHASE_STATUS.md` | Phase Overview table: Phase 4.5 row flipped ⚪ → 🟡 IN PROGRESS with the 4.5a wave landed; duplicate ⚪ row removed. "Phase 4.5 plan" section §4.5a replaced "1-2 weeks, +2 active veto + 1 badge" header with "✅ DONE 2026-05-16" + a results table showing per-sub-PR production effect. Original plan text preserved below the results table for audit. |
| `WORKFLOW.md` | Phase 4.5 §Tasks §4.5a — all 4 checkboxes flipped [ ] → [x] with per-sub-PR PR-number citations (PR #89 / #90 / #91), LOC counts, test deltas, production verification numbers. |

## Audit trail

This is the 4th doc-correction PR in the post-v1.0 cleanup pattern,
and this time it records work that actually shipped. The earlier doc
PRs in this session recorded status or plans, or corrected drift
between intent and state:

- PR #81 — 4g status correct (factual)
- PR #86 — added Phase 4.5 roadmap (planning)
- PR #87 — corrected "PR 4b next" → "§3 polish next" (was wrong)
- PR #88 — corrected "§3 polish next" → "deferred to Phase 5" (was wrong)
- **This PR** — 4.5a wave ✅ DONE (factual, not drift-correction)

## Verification

- No code changes; docs only
- `grep "4.5a.1" CLAUDE.md PHASE_STATUS.md WORKFLOW.md` returns
  only DONE/closed references in active sections
- `grep "Next deliverable.*4.5a" CLAUDE.md PHASE_STATUS.md WORKFLOW.md`
  returns 0 hits (all moved to 4.5b)
- `grep "9 → 11" CLAUDE.md` matches (new defense layer count)

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 16, 2026
…tatement_history + late_filing_notification) (#93)

Phase 4.5 manipulation-defense cluster sub-PR 4 of 8 per PR #86
(4.5a shipped as three sub-PRs).
Adds two ANNOTATE-only flags surfaced from the SEC EDGAR filing list.

## What's new

### `restatement_history` annotate (5y lookback)

- Module: `compute/scoring/restatement_filings.py` (~390 LOC + ~290
  test LOC).
- Counts 10-K/A + 10-Q/A filings per CIK in the trailing 5 years.
- Paper: Hennes-Leone-Miller 2008 *TAR* — restating firms see -9%
  abnormal return on announcement; recurrent restaters compound.
- Lookback: `config.RESTATEMENT_HISTORY_LOOKBACK_DAYS = 1825` (5×365).
- ANNOTATE-only: the base rate is sector-agnostic, and a veto would
  need a sector adjustment that 4.5b doesn't include.

### `late_filing_notification` annotate (1y lookback)

- Same module.
- Detects SEC Form 12b-25 (NT 10-K / NT 10-Q) in the trailing 365d.
- Paper: Bartov-Lai-Yeung 2002 *JAR*: late filers see −5% to −7%
  abnormal returns.
- Lookback: `config.LATE_FILING_LOOKBACK_DAYS = 365`.
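Here is a minimal sketch of the two trailing-window checks. It assumes a filings list whose entries carry `form` and an ISO `filing_date`, matching the cache shape listed under Architecture. It also assumes a flag fires on any single qualifying filing, since the PR does not state a firing count. The helper names are hypothetical and are not the module's actual `check_restatement_history` / `check_late_filing` signatures.

```python
from datetime import date, timedelta
from typing import Dict, List, Set

RESTATEMENT_HISTORY_LOOKBACK_DAYS = 1825  # 5 x 365, per the PR text
LATE_FILING_LOOKBACK_DAYS = 365

AMENDMENT_FORMS = {"10-K/A", "10-Q/A"}
LATE_FILING_FORMS = {"NT 10-K", "NT 10-Q"}  # Form 12b-25 notifications


def within_window(filing_date: str, lookback_days: int, today: date) -> bool:
    """True when an ISO-8601 filing date lies in [today - lookback_days, today]."""
    filed = date.fromisoformat(filing_date)
    return today - timedelta(days=lookback_days) <= filed <= today


def count_recent(filings: List[Dict[str, str]], forms: Set[str], lookback_days: int, today: date) -> int:
    """Count filings of the given form types inside the trailing window."""
    return sum(
        1
        for f in filings
        if f["form"] in forms and within_window(f["filing_date"], lookback_days, today)
    )


def disclosure_warnings(filings: List[Dict[str, str]], today: date) -> List[str]:
    """Annotate-only flags; assumed to fire on any single qualifying filing."""
    warnings = []
    if count_recent(filings, AMENDMENT_FORMS, RESTATEMENT_HISTORY_LOOKBACK_DAYS, today) > 0:
        warnings.append("restatement_history")
    if count_recent(filings, LATE_FILING_FORMS, LATE_FILING_LOOKBACK_DAYS, today) > 0:
        warnings.append("late_filing_notification")
    return warnings
```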

## Architecture (mirrors `compute.scoring.eight_k_events`)

- Per-CIK JSON cache (7d TTL) at `compute/cache/edgar_amendments/`
  and `compute/cache/edgar_late_filings/`.
- Cache shape: `{fetched_at, lookback_days, filings: [{accession,
  form, filing_date, filing_url}]}`.
- Fetch path: `edgar.Company(ticker).get_filings(form=...)` with
  per-form retry; merges results across multiple forms client-side
  + sorts desc by filing_date so `filings[0]` is the latest.
- Public entry points: `check_restatement_history(ticker, ...,
  filings_override=...)` and `check_late_filing(ticker, ...,
  filings_override=...)`. The override path is the test inject —
  bypasses EDGAR, keeps unit tests offline.
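The cache-and-merge behaviour can be sketched as follows. The sketch makes three assumptions: `fetched_at` is stored as epoch seconds (the PR lists the key but not its type), filings are de-duplicated by `accession`, and the helper names are hypothetical. The real module fetches via `edgar.Company(ticker).get_filings(form=...)`. This sketch leaves that call out so it runs offline.

```python
import json
import time
from pathlib import Path
from typing import Dict, List, Optional

CACHE_TTL_SECONDS = 7 * 24 * 3600  # 7-day TTL, per the PR text


def save_cache(path: Path, filings: List[Dict], lookback_days: int, now: Optional[float] = None) -> None:
    """Write {fetched_at, lookback_days, filings} to a per-CIK JSON file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        "fetched_at": time.time() if now is None else now,  # assumed: epoch seconds
        "lookback_days": lookback_days,
        "filings": filings,
    }
    path.write_text(json.dumps(payload))


def load_cache(path: Path, now: Optional[float] = None) -> Optional[Dict]:
    """Return the cached payload if present and younger than the TTL, else None (forces a refetch)."""
    if not path.exists():
        return None
    payload = json.loads(path.read_text())
    age = (time.time() if now is None else now) - payload["fetched_at"]
    return payload if age <= CACHE_TTL_SECONDS else None


def merge_and_sort(per_form_results: List[List[Dict]]) -> List[Dict]:
    """Merge per-form result lists client-side; newest first so filings[0] is the latest."""
    merged = {f["accession"]: f for results in per_form_results for f in results}
    # ISO dates sort correctly as strings.
    return sorted(merged.values(), key=lambda f: f["filing_date"], reverse=True)
```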

## Files touched

| File | Change |
|---|---|
| `compute/scoring/restatement_filings.py` | NEW. Cache + fetch + check_restatement_history + check_late_filing. |
| `compute/config.py` | + `RESTATEMENT_HISTORY_LOOKBACK_DAYS=1825` + `LATE_FILING_LOOKBACK_DAYS=365` + `EDGAR_AMENDMENTS_CACHE_DIR` + `EDGAR_LATE_FILINGS_CACHE_DIR`. |
| `compute/main.py` | + import `check_late_filing` + `check_restatement_history`. Per-ticker Step-8 loop appends `restatement_history` / `late_filing_notification` to `valuation_warnings` when the check fires. Slots immediately after the existing PR 4b §1 `cross_source_disagreement` block. |
| `.github/workflows/compute-rankings.yml` | + 2 new cache paths (`edgar_amendments`, `edgar_late_filings`) so weekly runs preserve the per-ticker JSON files. |
| `tests/test_scoring/test_restatement_filings.py` | NEW. 17 tests covering `_filing_date_within` boundaries, both check_* entry points (no filings, within window, outside window, multiple within window, lookback constants, fetch-failure graceful path). All offline via `filings_override`. |
| `tests/test_workflow_cache_coverage.py` | + 2 new parametrized cache-path assertions for the new directories (catches future workflow YAML drift). |

## Defense layer end-state (after this PR ships)

- Active vetoes: 7 (unchanged — 4.5b is annotate-only)
- Annotate flags: 5 → **7** (added `restatement_history`,
  `late_filing_notification`)
- Reason taxonomy: 29 → **31** stable identifiers

No schema delta — new flag IDs are strings in existing
`valuation_warnings: list[str]`.

## Backward compat

- `filings_override` arg is opt-in (None default → fetches via
  EDGAR). Existing callers without it unchanged.
- `EDGAR_USER_AGENT` env var precondition matches the rest of the
  Tier-2 layer — fetcher returns None when unset (cleanly skipping
  the flag rather than crashing).
- Caches gitignored under `compute/cache/`.

## Verification ladder

- ✅ `ruff check .` — clean
- ✅ `pytest tests/ -m "not network"` — **803 passed** (was 784;
  +17 new restatement tests + 2 new workflow-cache parametrize entries)
- ⏳ Production verification deferred to next workflow_dispatch.
  Expected fire rates on S&P 500 (rough — needs production run to
  confirm):
  - `restatement_history` — 30-80 tickers (~6-16%) based on
    historical 10-K/A base rates 2020-2025
  - `late_filing_notification` — 5-20 tickers (~1-4%) based on
    SEC Form 12b-25 filing data

## Sibling sub-PRs (Phase 4.5 cluster)

- ✅ **4.5a wave** complete (PRs #89/#90/#91 + #92 docs)
- **4.5b (this PR)** — disclosure-driven catches
- ⬜ **4.5c** — Roychowdhury REM 3-proxy
- ⬜ **4.5d** — M-score 3y momentum + Burgstahler-Dichev kink
- ⬜ **4.5e** — Form 4 insider clustering
- ⬜ **4.5f** — `manipulation_index` composite + UI + schema bump → v1.2.0

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 17, 2026
#94)

Phase 4.5b disclosure-driven catches shipped via PR #93. Production
verified on run #48 (commit `849b7ca8`, workflow 2h08m due to
cold-cache populating both new `edgar_amendments` +
`edgar_late_filings` dirs; warm runs return to ~1h30m).

## What shipped

| Flag | Lookback | Production fire | Notes |
|---|---|---|---|
| `restatement_history` | 5y 10-K/A + 10-Q/A | **60 / 502 (12.0%)** | within expected 6-16% — AMD, DIS, CVX, BSX, EBAY etc. (mostly mature firms with periodic amendments) |
| `late_filing_notification` | 365d Form 12b-25 | **2 / 502 (0.4%)** | HAS + Q — slightly under expected 1-4% (S&P 500 firms tend to be more compliant than broader Bartov-Lai-Yeung sample) |

## End-state defense layer

- Active vetoes: **7** (unchanged — 4.5b is annotate-only)
- Annotate flags: 5 → **7** (+ `restatement_history`,
  `late_filing_notification`)
- Reason taxonomy: 29 → **31**
- **Defense layer 9 → 13 layers after 4.5a + 4.5b**

No schema delta — both new flags are strings in existing
`valuation_warnings: list[str]`. `SCHEMA_VERSION` stays
`0.7.1-phase4g`.

## Triple-doc lockstep changes

| File | Change |
|---|---|
| `CLAUDE.md` | "Next deliverable" reframed from 4.5b to 4.5c (Roychowdhury REM). Defense layer count "9 → 11 after 4.5a" → "9 → 13 after 4.5a + 4.5b". 4.5b results summary appended. |
| `PHASE_STATUS.md` | Phase 4.5 table row updated with 4.5b results + tickers + 60/2 counts. §4.5b header flipped to ✅ DONE 2026-05-16 with results table + workflow time note (cold-cache 2h08m). Original plan text preserved below for audit. |
| `WORKFLOW.md` | §Tasks §4.5b — all 4 checkboxes [ ] → [x] with PR-number + LOC + test-count + production-verification citations. SEC Filing Roadmap table: 4 new rows for 10-K/A, 10-Q/A, NT 10-K, NT 10-Q (all ✅ active with PR #93 / 2026-05-16 production-fire-rate footnotes). Form 4 status flipped from "❌ not used" to "⬜ planned (Phase 4.5e)" to reflect the upcoming sub-PR. |

## Next deliverable

**Phase 4.5c — Real Earnings Management (Roychowdhury 2006 REM)**:
- 3 abnormal proxies per ticker:
  - `abnormal_CFO` = actual − model(Sales, ΔSales)
  - `abnormal_production` = actual − model(Sales, ΔSales, ΔSales_t−1)
  - `abnormal_discretionary_expenses` = actual − model(Sales_t−1)
- Flag `rem_suspect` fires when 2 of 3 proxies sit in worst decile
  within sector
- ~250 LOC + golden tests against Roychowdhury 2006 paper Table 6
- Catches REAL manipulation (cutting R&D, channel stuffing,
  deferring maintenance) — invisible to Sloan/Beneish/Dechow which
  target accrual manipulation

## Audit trail (post-v1.0 doc PRs)

| PR | Purpose |
|---|---|
| #81 | 4g ✅ DONE |
| #86 | Phase 4.5 roadmap added |
| #87 | "PR 4b next" → "§3 polish next" (was wrong) |
| #88 | "§3 polish next" → "Phase-5 blocked" (was wrong) |
| #92 | 4.5a wave ✅ DONE |
| **this PR** | 4.5b wave ✅ DONE |

## Verification

- No code changes; docs only
- `grep "Next deliverable.*4.5b" CLAUDE.md PHASE_STATUS.md WORKFLOW.md` returns 0 hits (all moved to 4.5c)
- `grep "9 → 13" CLAUDE.md` matches (new defense layer count)
- `grep "10-K/A.*✅ active" WORKFLOW.md` returns the new filing-roadmap row

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 17, 2026
…em_suspect annotate) (#95)

Phase 4.5 manipulation-defense cluster sub-PR 5 of 8 per PR #86.
Catches REAL manipulation (cutting R&D, channel stuffing, deferring
maintenance, overproduction) — invisible to the existing accrual-
targeting defenses (Sloan / Beneish / Dechow).

## Model — Roychowdhury 2006 *JAE* 3-proxy

Three abnormal residuals from per-sector OLS regressions:

1. **Abnormal CFO** — residual of
   `CFO_t / A_{t-1}` on `[1, 1/A_{t-1}, Sales_t/A_{t-1}, ΔSales_t/A_{t-1}]`.
   **Low (negative)** = suspicious → firm front-loaded sales via
   channel stuffing / loose credit / discounts to inflate CFO.
2. **Abnormal Production** — residual of
   `(COGS_t + ΔInventory_t) / A_{t-1}` on `[1, 1/A_{t-1}, Sales_t,
   ΔSales_t, ΔSales_{t-1}]` (all over A_{t-1}).
   **High** = suspicious → overproduction spreads fixed costs over
   more units, deflating per-unit COGS and inflating gross margin.
3. **Abnormal Discretionary Expenses** — residual of
   `(R&D_t + SGA_t) / A_{t-1}` on `[1, 1/A_{t-1}, Sales_{t-1}/A_{t-1}]`.
   **Low (negative)** = suspicious → firm cut discretionary spending
   to boost current earnings. (Advertising omitted — SEC XBRL rarely
   tags it separately; per Roychowdhury 2006 footnote 7 the SGA-only
   adaptation is acceptable since advertising is usually subsumed
   in SGA.)
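The three regressions above share one shape, so a pure-numpy sketch needs only a design-matrix builder and an OLS residual helper. This is an illustration under assumptions, not the shipped `_fit_sector_models`. Inputs are assumed to be aligned 1-D arrays for one sector with no missing values, and the helper names are hypothetical. Like the PR, it uses `numpy.linalg.lstsq`.

```python
import numpy as np


def design(assets_prev, *numerators):
    """Columns [1, 1/A_{t-1}, x_1/A_{t-1}, ...], the Roychowdhury scaling."""
    a = np.asarray(assets_prev, dtype=float)
    cols = [np.ones_like(a), 1.0 / a] + [np.asarray(x, dtype=float) / a for x in numerators]
    return np.column_stack(cols)


def ols_residuals(y, X):
    """Fit y ~ X by least squares; return (residuals, coefficients)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta, beta


def rem_residuals(a_prev, cfo, sales, d_sales, d_sales_prev, cogs, d_inv, sga, rd, sales_prev):
    """Abnormal CFO, production and discretionary expenses for one sector.

    Pass rd as zeros for the SGA-only fallback when R&D is not reported.
    """
    a = np.asarray(a_prev, dtype=float)
    f = lambda x: np.asarray(x, dtype=float)
    ab_cfo, _ = ols_residuals(f(cfo) / a, design(a, sales, d_sales))
    ab_prod, _ = ols_residuals((f(cogs) + f(d_inv)) / a, design(a, sales, d_sales, d_sales_prev))
    ab_disx, _ = ols_residuals((f(rd) + f(sga)) / a, design(a, sales_prev))
    return ab_cfo, ab_prod, ab_disx
```

Because every regression carries an intercept, the residuals average to zero within the sector. "Abnormal" therefore means relative to sector peers with similar sales dynamics.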

Flag `rem_suspect` fires when **≥ 2 of 3** residuals sit in their
respective worst decile within the ticker's GICS sector. Mirrors the
4.5a.3 `manipulation_triple_flag` pattern but uses *real* (not
accrual) signals.
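The ≥ 2-of-3 rule can be sketched per sector as follows. The sketch assumes the three residual arrays are already computed for one sector with no missing values. The direction of each "worst decile" follows the model text: bottom 10% for abnormal CFO and discretionary expenses, top 10% for abnormal production. The function name and the quantile-cut approach are assumptions, not the shipped decile-rank code.

```python
import numpy as np


def rem_suspect(ab_cfo, ab_prod, ab_disexp, min_hits: int = 2) -> np.ndarray:
    """Boolean mask over one sector's tickers: True where >= min_hits proxies sit in their worst decile."""
    cfo, prod, disx = (np.asarray(x, dtype=float) for x in (ab_cfo, ab_prod, ab_disexp))
    hits = (
        (cfo <= np.quantile(cfo, 0.10)).astype(int)     # low abnormal CFO is suspicious
        + (prod >= np.quantile(prod, 0.90)).astype(int)  # high abnormal production is suspicious
        + (disx <= np.quantile(disx, 0.10)).astype(int)  # low abnormal discretionary spend is suspicious
    )
    return hits >= min_hits
```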

## Architecture

| File | Change |
|---|---|
| `compute/scoring/rem.py` | **NEW** — `REMProxies` + `REMResult` dataclasses; `compute_proxies` (per-ticker input vector from snap + history); `_fit_sector_models` (per-sector OLS via `numpy.linalg.lstsq`); `compute_rem_flags` (two-pass: proxies → sector models → residuals → within-sector decile rank → fire). ~420 LOC. |
| `compute/main.py` | Pre-compute `rem_results` once via `compute_rem_flags(snapshots, histories=histories, sectors=sectors_dict)` right after `compute_risk_flags`. Per-ticker Step-8 loop appends `rem_suspect` to `valuation_warnings` when `rem_result.fired`. |
| `tests/test_scoring/test_rem.py` | **NEW** — 14 tests in three layers: (1) proxy construction (5 tests covering well-formed, missing snap, missing assets denominator, R&D fallback to SGA-only, inventory-missing PROD skip), (2) end-to-end `compute_rem_flags` (8 tests: empty, below floor, at floor, double-outlier fires, single-outlier triggers cfo axis, triple-outlier 3-trigger, normal-ticker H0 FP rate, constants), (3) **golden numerical test** verifying OLS recovers known-DGP coefficients. |

## No new dependencies

- `numpy.linalg.lstsq` for OLS (already in dep tree)
- No `sklearn`, no `statsmodels` — pure-numpy reimplementation keeps
  install surface tight (mirrors PR 4b §2 PBO/DSR decision)

## Defense-layer end-state (after this PR ships)

- Active vetoes: 7 (unchanged — 4.5c is annotate-only)
- Annotate flags: 7 → **8** (+ `rem_suspect`)
- Reason taxonomy: 31 → **32** stable identifiers

No schema delta — `rem_suspect` is a string in existing
`valuation_warnings: list[str]`. `SCHEMA_VERSION` stays `0.7.1-phase4g`.

## Backward compat

- `compute_rem_flags(snapshots, histories=None, sectors=None)` —
  both kwargs optional. When sectors absent, no sector model can
  fit (every ticker's sector lookup returns None); all results have
  `fired=False`.
- Sectors below `REM_MIN_POPULATION_SECTOR = 15` (matches 4.5a.1
  Sloan sector-relative floor) skip REM cleanly — those tickers
  get `REMResult(None, None, None, fired=False)`. No active-veto
  fallback (REM is annotate-only).
- DISEXP falls back to SGA-only when R&D is missing (financials /
  REITs / utilities) per Roychowdhury 2006 footnote 7.

## Verification ladder

- ✅ `ruff check .` — clean
- ✅ `pytest tests/test_scoring/test_rem.py` — **14 passed**
- ✅ `pytest tests/ -m "not network"` — **817 passed** (was 803;
  +14 new REM tests)
- ✅ schema_check — N/A (no schema delta)
- ⏳ Production verification deferred to next workflow_dispatch.
  Expected fire rate: 5-7% (~25-35 of 502 S&P 500 tickers)
  assuming moderate axis correlation. H0 (independent axes) FP
  rate is 2.8% per the 2-of-3 joint-probability calc.
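The 2.8% H0 figure is the binomial tail for at least 2 of 3 independent axes each landing in its worst decile (p = 0.1): 3·p²·(1 − p) + p³ = 0.027 + 0.001 = 0.028. A quick check in Python (the function name is illustrative):

```python
from math import comb


def p_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): at least k of n independent axes fire."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


h0_rate = p_at_least(2, 3, 0.10)
expected_tickers = h0_rate * 502  # S&P 500 universe size used in the PR
print(f"{h0_rate:.3f} -> ~{expected_tickers:.0f} of 502 tickers")  # prints "0.028 -> ~14 of 502 tickers"
```

Positive correlation between the axes raises the joint rate above this floor. That is why the expected band (5-7%) sits above the H0 value.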

## Sibling sub-PRs (Phase 4.5 cluster)

- ✅ **4.5a wave** complete (PRs #89 / #90 / #91 + #92 docs)
- ✅ **4.5b** complete (PR #93 + #94 docs)
- **4.5c (this PR)** — Roychowdhury REM
- ⬜ **4.5d** — M-score 3y momentum + Burgstahler-Dichev kink
- ⬜ **4.5e** — Form 4 insider clustering
- ⬜ **4.5f** — `manipulation_index` composite + UI + schema bump → v1.2.0

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 17, 2026
#96)

Phase 4.5c Roychowdhury REM shipped via PR #95. Production verified
on run #49 (commit `65097703`, warm-cache 6m25s — all 9 cache layers
populated).

## What shipped

`rem_suspect` annotate via per-sector OLS regressions on 3 abnormal
proxies (CFO, Production, Discretionary Expenses). Module
`compute/scoring/rem.py` (~420 LOC, pure-numpy via
`np.linalg.lstsq`, no sklearn/statsmodels dep). 14 offline tests
including golden numerical test recovering known-DGP coefficients.

## Production verification

| Metric | Value |
|---|---|
| Fire rate | **16 / 502 (3.2%)** — within H0-to-correlation expected 2.8-7% |
| Tickers fired | SMCI · WAT · ADM · TSN · HRL · STLD · FSLR · JBL · COHR · LII · LDOS · POOL · OMC · WY · TECH · RVTY |
| Orthogonality check | NVDA / PLTR (Beneish veto fired) are **not** in the REM list, consistent with 4.5c capturing a real-activity signal distinct from the accrual targets; SMCI and WAT fire on both |
| Real-world coverage | ADM (2024 SEC investigation) · SMCI (2024 investigation) · TSN / HRL (periodic scrutiny) · FSLR (solar channel-stuffing history) |

## End-state defense layer

- Active vetoes: **7** (unchanged — 4.5c is annotate-only)
- Annotate flags: 7 → **8** (+ `rem_suspect`)
- Reason taxonomy: 31 → **32**
- **Defense layer 9 → 14 after 4.5a + 4.5b + 4.5c**

No schema delta — `rem_suspect` is a string in existing
`valuation_warnings: list[str]`. `SCHEMA_VERSION` stays
`0.7.1-phase4g`.

## Triple-doc lockstep changes

| File | Change |
|---|---|
| `CLAUDE.md` | "Next deliverable" 4.5c → 4.5d. Defense layer "9 → 13 after 4.5a+4.5b" → "9 → 14 after 4.5a+4.5b+4.5c". 4.5c results + ticker list + orthogonality note inserted between 4.5b and the post-completion roadmap. |
| `PHASE_STATUS.md` | Phase 4.5 row updated with 4.5c production stats. §4.5c header flipped to ✅ DONE 2026-05-17 with results table + orthogonality note. Original plan text preserved below for audit. |
| `WORKFLOW.md` | §4.5c checkboxes [ ] → [x] with PR-number / LOC / test-count / production-verification citations + golden-test reference. |

## Next deliverable

**Phase 4.5d — earnings-quality time-series + Burgstahler-Dichev
kink at zero** (~180 LOC, ~7 days):
- `m_score_deteriorating` annotate — Δ(Beneish M-score) > +0.5
  over trailing 3y (manipulation gathering steam)
- `loss_avoidance_pattern` annotate — NI ∈ [0, $5M] OR EPS ∈
  [0, $0.05] for 3+ consecutive years (Burgstahler-Dichev 1997 kink)

## Audit trail (post-v1.0 doc PRs)

| PR | Purpose |
|---|---|
| #81 | 4g ✅ DONE |
| #86 | Phase 4.5 roadmap added |
| #87 | "PR 4b next" → "§3 polish next" (was wrong) |
| #88 | "§3 polish next" → "Phase-5 blocked" (was wrong) |
| #92 | 4.5a wave ✅ DONE |
| #94 | 4.5b wave ✅ DONE |
| **this PR** | 4.5c wave ✅ DONE |

## Verification

- No code changes; docs only
- `grep "Next deliverable.*4.5c" CLAUDE.md PHASE_STATUS.md WORKFLOW.md` returns 0 hits (all moved to 4.5d)
- `grep "9 → 14" CLAUDE.md` matches (new defense layer count)
- `grep "rem_suspect" PHASE_STATUS.md WORKFLOW.md` matches the
  active-flags references

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 17, 2026
…als_momentum_high + loss_avoidance_pattern) (#97)

Phase 4.5 manipulation-defense cluster sub-PR 6 of 8 per PR #86 (4.5e
Form 4 clustering and the 4.5f composite + UI bundle remain). Two
annotate-only flags derived from the per-ticker fundamentals
history (annual XBRL).

## What's new

### `accruals_momentum_high` — Δ(TATA) over 3y > +0.05

- TATA = (NetIncome − OperatingCashFlow) / TotalAssets, the Sloan
  1996 / Beneish 1999 accruals backbone.
- Threshold +0.05 is anchored to the plan's Beneish 1999 ΔM > +0.5 via
  the β_TATA = 4.679 coefficient: ΔM ≈ 4.679 × ΔTATA, so ΔM > 0.5 ⇔
  ΔTATA > 0.107. We use 0.05 (≈ ΔM > 0.23), deliberately more
  sensitive since TATA alone captures less than the full 8-ratio
  signal; lowering the bar is a practitioner adaptation when
  shortening to one ratio.
- Catches manipulation **gathering steam** — the snapshot-only
  Sloan + Beneish flags miss the trajectory entirely.
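Here is a minimal sketch of the trajectory check. It assumes annual TATA values keyed by fiscal year and reads "over 3y" as the change from year t−3 to year t; the PR does not spell out the window endpoints. The names are hypothetical. The sketch also reproduces the threshold arithmetic: 0.5 / 4.679 ≈ 0.107 and 4.679 × 0.05 ≈ 0.23.

```python
from typing import Dict

BETA_TATA = 4.679                   # Beneish (1999) TATA coefficient, as quoted in the PR
ACCRUALS_MOMENTUM_THRESHOLD = 0.05  # chosen threshold on delta-TATA
LOOKBACK_YEARS = 3                  # assumed: compare year t against year t-3


def tata(net_income: float, operating_cash_flow: float, total_assets: float) -> float:
    """Total accruals to total assets: (NI - CFO) / TA."""
    return (net_income - operating_cash_flow) / total_assets


def accruals_momentum_high(tata_by_year: Dict[int, float], latest_year: int) -> bool:
    """Fire when TATA rose by more than the threshold between t-3 and t; missing years never fire."""
    start = latest_year - LOOKBACK_YEARS
    if latest_year not in tata_by_year or start not in tata_by_year:
        return False
    return tata_by_year[latest_year] - tata_by_year[start] > ACCRUALS_MOMENTUM_THRESHOLD


# Threshold arithmetic from the text, using dM ~= BETA_TATA * dTATA.
implied_tata_for_dm_half = 0.5 / BETA_TATA                       # ~0.107
implied_dm_for_chosen = BETA_TATA * ACCRUALS_MOMENTUM_THRESHOLD  # ~0.234
```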

**Practical note on naming**: PR #86 plan §4.5d called this
`m_score_deteriorating` (full Δ(Beneish M-score) > +0.5). We
chose TATA momentum as a practical equivalent: building 3
historical 8-ratio Beneish snapshots from XBRL history would
require expanding the annual-history coverage of 6+ supplementary
ratios (DSRI / GMI / AQI / etc.) that often have gaps for prior
years. TATA is the single Beneish component that's a level rather
than a ratio-of-ratios, and Sloan 1996 established it as the
standalone accruals signal, so this is a principled shortening; it
does give up the signal carried by the other seven ratios.

### `loss_avoidance_pattern` — Burgstahler-Dichev 1997 kink at zero

- Fires on **3+ consecutive fiscal years** of tiny-positive
  earnings: NI ∈ [$0, $5M] **OR** EPS ∈ [$0.00, $0.05].
- The per-share band catches the high-share-count case where NI is
  above the $5M ceiling but EPS is still tiny.
- This is the empirical kink-at-zero signature of managers shading
  reported earnings just enough to clear zero and avoid reporting a
  loss.
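The streak logic can be sketched as follows. The sketch makes three assumptions: history maps fiscal year to `(net_income, eps)` with `None` for missing values, band edges are inclusive, and a gap year breaks the streak. The names are hypothetical and do not reflect the shipped `check_loss_avoidance`.

```python
from typing import Dict, Optional, Tuple

NI_BAND = (0.0, 5_000_000.0)  # net income, dollars (inclusive)
EPS_BAND = (0.00, 0.05)       # EPS, dollars per share (inclusive)
MIN_STREAK = 3                # consecutive fiscal years required

History = Dict[int, Tuple[Optional[float], Optional[float]]]


def is_tiny_positive(ni: Optional[float], eps: Optional[float]) -> bool:
    """True when either NI or EPS lands in its tiny-positive band."""
    ni_hit = ni is not None and NI_BAND[0] <= ni <= NI_BAND[1]
    eps_hit = eps is not None and EPS_BAND[0] <= eps <= EPS_BAND[1]
    return ni_hit or eps_hit


def loss_avoidance_pattern(history: History) -> bool:
    """Fire on MIN_STREAK+ consecutive fiscal years of tiny-positive earnings."""
    streak, prev_year = 0, None
    for year in sorted(history):
        ni, eps = history[year]
        if is_tiny_positive(ni, eps):
            # Extend the run only if this year directly follows a qualifying year.
            consecutive = prev_year is not None and year == prev_year + 1 and streak > 0
            streak = streak + 1 if consecutive else 1
        else:
            streak = 0
        if streak >= MIN_STREAK:
            return True
        prev_year = year
    return False
```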

## Architecture

| File | Change |
|---|---|
| `compute/scoring/earnings_quality.py` | **NEW** ~250 LOC — `check_accruals_momentum` + `check_loss_avoidance` + history-walk helpers (`_annual_values`, `_value_at_year`). Pure pandas; no new deps. |
| `compute/main.py` | + 2 import lines + 2 per-ticker annotate appends in the Step-8 loop, slotting after `rem_suspect`. |
| `tests/test_scoring/test_earnings_quality.py` | **NEW** ~225 LOC — 14 offline tests covering both flags (fires / doesn't fire / improves / threshold pins / EPS-band fallback / negative-NI rejection / large-NI rejection / multi-year streak / streak break / constants sanity). |

## Defense-layer end-state (after this PR ships)

- Active vetoes: **7** (unchanged — 4.5d is annotate-only)
- Annotate flags: 8 → **10** (+ `accruals_momentum_high`,
  `loss_avoidance_pattern`)
- Reason taxonomy: 32 → **34** stable identifiers
- Total defense layers: **9 → 16** after 4.5a + 4.5b + 4.5c + 4.5d

No schema delta — both flags are strings in existing
`valuation_warnings: list[str]`. `SCHEMA_VERSION` stays
`0.7.1-phase4g`.

## Backward compat

- Both check functions take `(snap, history)` — no caller
  changes elsewhere. Missing inputs (snap=None, no history,
  insufficient years) cleanly return fired=False.
- No new EDGAR fetches — both flags read from existing
  fundamentals + fundamentals_history caches.

## Verification ladder

- ✅ `ruff check .` — clean
- ✅ `pytest tests/test_scoring/test_earnings_quality.py` —
  **14 passed**
- ✅ `pytest tests/ -m "not network"` — **831 passed** (was 817;
  +14 new)
- ✅ schema_check — N/A (no schema delta)
- ⏳ Production verification deferred. Expected fire rates on S&P 500:
  - `accruals_momentum_high` ~3-8% (~15-40 tickers) — H0 from
    Δ(TATA) > 0.05 base rate
  - `loss_avoidance_pattern` ~1-3% (~5-15 tickers) — S&P 500 firms
    rarely report tiny-positive earnings for 3+ years (mega-cap
    distribution); base rate higher on small-caps per Burgstahler-
    Dichev 1997 original sample

## Sibling sub-PRs (Phase 4.5 cluster)

- ✅ **4.5a wave** (PRs #89 / #90 / #91 + #92 docs)
- ✅ **4.5b** (PR #93 + #94 docs)
- ✅ **4.5c** (PR #95 + #96 docs)
- **4.5d (this PR)** — earnings-quality time-series
- ⬜ **4.5e** — Form 4 insider clustering (~420 LOC, ~12 days —
  needs new SEC Form 4 parser)
- ⬜ **4.5f** — `manipulation_index` composite + composite-score
  penalty + UI pillar card + README Honest Limitations + schema
  bump → **v1.2.0-phase4.5**

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2

Co-authored-by: Claude <noreply@anthropic.com>