
feat(scoring): Dechow soft-veto + manipulation_triple_flag (PR 4.5a.3) #91

Merged
dackclup merged 1 commit into main from feat/4.5a.3-dechow-veto-and-triple-flag
May 16, 2026

@dackclup (Owner) commented May 16, 2026

Summary

Phase 4.5 manipulation-defense cluster sub-PR 3 of 6 per PR #86. Branched off PR #90 (4.5a.2) to integrate cleanly — base = feat/4.5a.2-beneish-soft-veto. Will rebase onto main after PR #90 merges.

Two additions:

1. Dechow F-score soft-veto (F > 3.0)

Promotes dechow_high (F > 2.45 annotate, PR 3e.2) to a second-tier active veto at F > 3.0. Mirrors PR 4.5a.2's Beneish pattern.

Threshold rationale: Dechow 2011 CAR Table 7 — AAER hit rate exceeds 4× baseline at F > 3.0 (vs ~2× at F > 2.45). Same precision/recall trade-off as Beneish veto.

2. manipulation_triple_flag joint-gate badge

Fires when Sloan + Beneish-high + Dechow-high all flag on the same ticker. Rare, high-confidence — 2 tickers in run #46:

| Ticker | Sloan | Beneish-high | Dechow-high | Beneish veto | Dechow veto |
|---|---|---|---|---|---|
| SMCI | ✓ | ✓ | ✓ | ✓ (M=-0.83) | ✓ (F=6.65) |
| WAT | ✓ | ✓ | ✓ | ✓ (M=-0.91) | |

UI-only badge in valuation_warnings; doesn't stack a third veto on top of the component vetoes.

Production estimate (run #46)

| Threshold | Coverage | Flagged |
|---|---|---|
| F > 2.45 (annotate, existing) | 157/502 (31% covered) | 2 (1.3%) |
| F > 3.0 (veto, NEW) | same 157 | 1 (0.6%): SMCI |
| manipulation_triple_flag (joint) | full universe | 2: SMCI + WAT |

4.5a wave end-state

After this PR + #89 + #90 merge:

| Layer | Was | Now |
|---|---|---|
| Active vetoes | 5 | 7 (+ beneish_manipulation_veto, dechow_manipulation_veto) |
| Tier-3 annotates | beneish_high + dechow_high (looser thresholds) | unchanged |
| Joint gates | none | + manipulation_triple_flag (3-of-3) |

Files changed (atop 4.5a.2)

| File | Change |
|---|---|
| compute/scoring/dechow_f.py | + DECHOW_VETO_THRESHOLD = 3.0 |
| compute/scoring/risk_overlay.py | + dechow_f_scores kwarg + veto check |
| compute/main.py | Pre-compute Dechow alongside Beneish in one pass + manipulation_triple_flag joint-gate emission after Dechow annotate |
| tests/test_scoring/test_risk_overlay.py | 5 new tests |

Backward compat

  • dechow_f_scores arg optional. Existing callers without it unchanged.
  • dechow_high annotate at F > 2.45 unchanged.
  • StockDetail.dechow_f_score numeric field unchanged.
  • manipulation_triple_flag in valuation_warnings (annotate); UI opts in to render.

Verification

  • ruff check . — clean
  • pytest tests/ -m "not network": 784 passed (was 779 on 4.5a.2; +5 new)
  • schema_check — N/A (no schema delta — new flag is a string in existing valuation_warnings: list[str])
  • Production verification — expect:
    • 1 new dechow_manipulation_veto (SMCI)
    • 2 manipulation_triple_flag annotates (SMCI, WAT)
    • 7 active vetoes total in defense-scorecard output

Merge order

  1. PR #90 (4.5a.2 Beneish soft-veto promotion, M > -1.78 → entered_top5 suppression): must merge first
  2. This PR — rebase onto main, then merge
  3. (Next sub-PR 4.5b — restatement + late filing)

Sibling sub-PRs (Phase 4.5a wave — COMPLETE after this PR)

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2


Generated by Claude Code

vercel Bot commented May 16, 2026

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| quantrank | Ready | Preview, Comment | May 16, 2026 3:55pm |

Base automatically changed from feat/4.5a.2-beneish-soft-veto to main May 16, 2026 15:53
Phase 4.5 manipulation-defense cluster sub-PR 3 of 6 per PR #86.
Final 4.5a wave member. Branched off PR #90 (4.5a.2 Beneish veto)
to integrate cleanly. Depends on 4.5a.2 merging first.

## Two additions

### 1. Dechow F-score soft-veto (F > 3.0)

Promotes `dechow_high` (F > 2.45 annotate, PR 3e.2) to a second-tier
active veto at the stricter threshold F > 3.0. Mirrors PR 4.5a.2's
Beneish veto pattern exactly.

Threshold rationale: Dechow 2011 *CAR* Table 7 shows that at F > 3.0
the AAER hit rate exceeds 4× baseline (vs ~2× at F > 2.45). The
stricter cutoff matches the precision/recall trade-off PR 4.5a.2
locked for Beneish and PR 3d locked for non_reliance_filing — high
precision, narrower recall, won't dilute Top-5.
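
The two-tier behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in `risk_overlay.py`: the function name `classify_dechow` and the constant name `DECHOW_ANNOTATE_THRESHOLD` are hypothetical, and routing the IDs into `risk_flags` versus `valuation_warnings` is left out. Only the two cutoffs, the strict inequality, and the skip-on-None behavior come from this PR.

```python
from typing import Optional

# Cutoffs as stated in this PR. DECHOW_ANNOTATE_THRESHOLD is a hypothetical
# name for the existing 2.45 annotate cutoff.
DECHOW_ANNOTATE_THRESHOLD = 2.45
DECHOW_VETO_THRESHOLD = 3.0


def classify_dechow(f_score: Optional[float]) -> list[str]:
    """Return the Dechow flag IDs a given F-score would trigger (sketch)."""
    if f_score is None:
        return []  # no score available: neither tier fires
    flags = []
    if f_score > DECHOW_ANNOTATE_THRESHOLD:
        flags.append("dechow_high")
    if f_score > DECHOW_VETO_THRESHOLD:  # strict: exactly 3.0 does not veto
        flags.append("dechow_manipulation_veto")
    return flags
```

Under these assumptions SMCI (F=6.65) triggers both IDs, while a score between 2.45 and 3.0 stays in the annotate-only soft band.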

### 2. `manipulation_triple_flag` joint-gate badge

Fires when Sloan + Beneish-high + Dechow-high all flag on the same
ticker. Rare but high-confidence — 2 tickers in run #46:

  - **SMCI**: F=6.65 (Dechow veto fires too) + Sloan + Beneish high
  - **WAT**: Sloan + Beneish high + Dechow high (annotates only)

UI-only badge in `valuation_warnings`; does NOT stack a third veto
on top of the individual component vetoes. Per PR #86 plan §4.5a.3.
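
As a rough sketch of the joint gate, assuming each component has already been reduced to a flag ID on the ticker: the helper name `apply_triple_flag` and the Sloan flag ID `sloan_accruals` are hypothetical placeholders, since the PR does not spell them out.

```python
# Component IDs that must all be present. "sloan_accruals" is a hypothetical
# placeholder for whatever ID the Sloan flag uses in the repo.
TRIPLE_FLAG_COMPONENTS = frozenset({"sloan_accruals", "beneish_high", "dechow_high"})


def apply_triple_flag(component_flags: set[str], valuation_warnings: list[str]) -> list[str]:
    """Append the badge when all three components flag; never touches vetoes."""
    warnings = list(valuation_warnings)  # copy so the caller's list is not mutated
    if TRIPLE_FLAG_COMPONENTS <= component_flags:
        warnings.append("manipulation_triple_flag")
    return warnings
```

Because the badge only lands in the warnings list, it cannot change Top-N suppression by itself, which matches the "UI-only" intent stated above.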

## Production estimate (run #46)

| Threshold | Coverage | Flagged |
|---|---|---|
| F > 2.45 (annotate, existing) | 157/502 (31% covered) | 2 (1.3%) |
| **F > 3.0 (veto, NEW)** | same 157 | **1 (0.6%)** |
| **manipulation_triple_flag** | full universe | **2** |

Expected veto-layer state after this PR ships:

- 4.5a.1 (merged): Sloan sector-relative, Sloan flags 51 → ~56
- 4.5a.2 (PR #90): + Beneish veto, 11 new flags
- **4.5a.3 (this PR)**: + Dechow veto, **1** new flag (SMCI). SMCI
  already loses entered_top5 from the Beneish veto, so the two vetoes
  stack on the same ticker and the Dechow veto suppresses no
  additional unique ticker.
- Active vetoes: **5 → 7** (Beneish + Dechow added)
- Annotates: + manipulation_triple_flag (**+1 reason-taxonomy identifier**)

## Architecture

| Layer | Before 4.5a wave | After 4.5a wave |
|---|---|---|
| Active vetoes | 5 | **7** (+ beneish_manipulation_veto, dechow_manipulation_veto) |
| Tier-3 annotates | `beneish_high` + `dechow_high` at looser thresholds | unchanged (kept for the soft band) |
| Joint gates | none | **+ `manipulation_triple_flag`** (3-of-3 joint) |

## Files changed

| File | Change |
|---|---|
| `compute/scoring/dechow_f.py` | + `DECHOW_VETO_THRESHOLD = 3.0` constant + docstring rationale (Dechow 2011 Table 7 4× baseline crossover) |
| `compute/scoring/risk_overlay.py` | + `dechow_f_scores` kwarg on `compute_risk_flags`. New veto check at end of per-ticker loop, immediately after Beneish. Imports `DECHOW_VETO_THRESHOLD`. |
| `compute/main.py` | Pre-compute `dechow_results` dict + `dechow_f_scores` dict alongside Beneish (one combined pass). Pass to compute_risk_flags. Step-8 per-ticker loop reads from cached `dechow_results[ticker]` (no double-compute). + `manipulation_triple_flag` joint-gate logic appended after Dechow annotate emission. |
| `tests/test_scoring/test_risk_overlay.py` | 5 new tests: veto fires above threshold, skipped on None, strict inequality, disabled when dict not supplied, Beneish + Dechow co-firing independence. |

## Backward compat

- `dechow_f_scores` arg optional. Existing callers without it unchanged.
- `dechow_high` annotate at F > 2.45 unchanged.
- `StockDetail.dechow_f_score` numeric field unchanged.
- `manipulation_triple_flag` is in `valuation_warnings` (annotate)
  not `risk_flags` — doesn't change Top-N suppression on top of
  component vetoes. UI must opt in to render it.

## Verification ladder

- ✅ `ruff check .` — clean
- ✅ `pytest tests/ -m "not network"` — **784 passed** (was 779 on 4.5a.2 branch; +5 new)
- ✅ schema_check — N/A (no schema delta — `manipulation_triple_flag` is a string in existing `valuation_warnings: list[str]`)
- ⏳ Production verification deferred — expect:
  - 1 new `dechow_manipulation_veto` (SMCI)
  - 2 `manipulation_triple_flag` annotates (SMCI, WAT)
  - 7 active vetoes total

## Sibling sub-PRs (Phase 4.5a wave — COMPLETE after this PR)

- ✅ **4.5a.1** Sloan sector-relative — merged PR #89
- 🟡 **4.5a.2** Beneish soft-veto — open PR #90
- **4.5a.3 (this PR)** — Dechow soft-veto + manipulation_triple_flag

Next: **4.5b** (restatement_history + late_filing_notification),
**4.5c** (Roychowdhury REM), **4.5d** (m-score momentum + Burgstahler
kink), **4.5e** (Form 4 insider clustering), **4.5f** (composite
manipulation_index + UI + schema bump).

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2
dackclup force-pushed the feat/4.5a.3-dechow-veto-and-triple-flag branch from 84121cc to c937cdc May 16, 2026 15:53
dackclup marked this pull request as ready for review May 16, 2026 15:55
dackclup merged commit 8cdf488 into main May 16, 2026
4 checks passed
dackclup deleted the feat/4.5a.3-dechow-veto-and-triple-flag branch May 16, 2026 15:55
dackclup added a commit that referenced this pull request May 16, 2026
#92)

Phase 4.5a manipulation-defense quick-wins shipped 2026-05-16 across
3 sub-PRs (#89/#90/#91). Production verified on run #47 (commit
`8cdf4886`). This commit bumps the triple-doc lockstep so future
sessions read the actual current state instead of the in-progress
plan.

## What shipped (per-sub-PR)

| Sub-PR | PR | Delivered | Production effect |
|---|---|---|---|
| **4.5a.1** | #89 | Sloan accruals top-decile within sector; `SLOAN_MIN_POPULATION_SECTOR=15` floor; cross-sectional fallback for under-floor sectors. Closes issue #7. | Financials Sloan rate 21.3% → 11.7%. Cross-sector spread 7.7× → 1.4×. Total Sloan flagged 51 → 56. |
| **4.5a.2** | #90 | `beneish_manipulation_veto` active-veto path at M > −1.78 (Beneish 1999 Table 4 PPV crossover). | 11 new vetoed tickers: SMCI · WAT · PODD · WDC · NVDA · CAT · PLTR · SNDK · BG · STX · LLY. |
| **4.5a.3** | #91 | `dechow_manipulation_veto` active-veto path at F > 3.0 (Dechow 2011 Table 7 4× baseline crossover) + `manipulation_triple_flag` joint-gate annotate. | 1 Dechow veto (SMCI F=6.65); 2 triple_flag tickers (SMCI + WAT). |
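
To make the 4.5a.1 row concrete, here is a minimal sketch of a sector-relative top-decile rule with a population floor and a cross-sectional fallback. Only the floor value of 15 comes from the table above. The function names, the input shapes, and the rank-based decile definition (top `n // 10` names, at least one) are assumptions for illustration.

```python
from collections import defaultdict

SLOAN_MIN_POPULATION_SECTOR = 15  # floor quoted in the table above


def top_decile(values: dict[str, float]) -> set[str]:
    """Tickers in the highest ~10% by accruals (rank-based; an assumed definition)."""
    k = max(1, len(values) // 10)
    ranked = sorted(values, key=values.get, reverse=True)
    return set(ranked[:k])


def sloan_flags(accruals: dict[str, float], sectors: dict[str, str]) -> set[str]:
    """Flag within-sector top decile; under-floor sectors use the universe decile."""
    by_sector: dict[str, dict[str, float]] = defaultdict(dict)
    for ticker, value in accruals.items():
        by_sector[sectors[ticker]][ticker] = value
    universe_top = top_decile(accruals)
    flagged: set[str] = set()
    for members in by_sector.values():
        if len(members) >= SLOAN_MIN_POPULATION_SECTOR:
            flagged |= top_decile(members)
        else:
            # Cross-sectional fallback for sectors below the floor.
            flagged |= {t for t in members if t in universe_top}
    return flagged
```

Ranking within sector is what keeps a structurally high-accrual sector from dominating the flag list, which is the effect the spread numbers in the table describe.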

## End-state defense layer

- **Active vetoes**: 5 → **7** (added `beneish_manipulation_veto`,
  `dechow_manipulation_veto`)
- **Annotate flags**: 4 → **5** (added `manipulation_triple_flag`)
- **Tier-3 forensic**: still 2 (Beneish + Dechow operating at two
  thresholds each — annotate + veto)
- **Reason taxonomy**: 24 → **29 stable identifiers**

No schema delta — new flag IDs are strings in the existing
`risk_flags: list[str]` and `valuation_warnings: list[str]`
arrays. `SCHEMA_VERSION` stays `0.7.1-phase4g`.

## Triple-doc lockstep changes

| File | Change |
|---|---|
| `CLAUDE.md` | §Phase status — "Next deliverable" reframed from "4.5a.1 sector-relative Sloan" to "4.5b disclosure-driven catches". Defense layer count "9 → 18 target" updated to "9 → 11 after 4.5a; target 18 after 4.5f". Issue #7 marked ✅ closed by 4.5a.1. |
| `PHASE_STATUS.md` | Phase Overview table: Phase 4.5 row flipped ⚪ → 🟡 IN PROGRESS with the 4.5a wave landed; duplicate ⚪ row removed. "Phase 4.5 plan" section §4.5a replaced "1-2 weeks, +2 active veto + 1 badge" header with "✅ DONE 2026-05-16" + a results table showing per-sub-PR production effect. Original plan text preserved below the results table for audit. |
| `WORKFLOW.md` | Phase 4.5 §Tasks §4.5a — all 4 checkboxes flipped [ ] → [x] with per-sub-PR PR-number citations (PR #89 / #90 / #91), LOC counts, test deltas, production verification numbers. |

## Audit trail

This is the 4th doc-correction PR in the post-v1.0 cleanup pattern,
this time legitimate (the work actually shipped). Earlier ones in
this session were correcting drift between intent and state:

- PR #81 — 4g status correct (factual)
- PR #86 — added Phase 4.5 roadmap (planning)
- PR #87 — corrected "PR 4b next" → "§3 polish next" (was wrong)
- PR #88 — corrected "§3 polish next" → "deferred to Phase 5" (was wrong)
- **This PR** — 4.5a wave ✅ DONE (factual, not drift-correction)

## Verification

- No code changes; docs only
- `grep "4.5a.1" CLAUDE.md PHASE_STATUS.md WORKFLOW.md` returns
  only DONE/closed references in active sections
- `grep "Next deliverable.*4.5a"` returns 0 hits (all moved to 4.5b)
- `grep "9 → 11"` appears in CLAUDE.md (new defense layer count)

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 16, 2026
…tatement_history + late_filing_notification) (#93)

Phase 4.5 manipulation-defense cluster sub-PR 4 of 6 per PR #86.
Adds two ANNOTATE-only flags surfaced from the SEC EDGAR filing list.

## What's new

### `restatement_history` annotate (5y lookback)

- Module: `compute/scoring/restatement_filings.py` (~390 LOC + ~290
  test LOC).
- Counts 10-K/A + 10-Q/A filings per CIK in the trailing 5 years.
- Paper: Hennes-Leone-Miller 2008 *TAR* — restating firms see -9%
  abnormal return on announcement; recurrent restaters compound.
- Lookback: `config.RESTATEMENT_HISTORY_LOOKBACK_DAYS = 1825` (5×365).
- ANNOTATE-only — base rate sector-agnostic, no veto without sector
  adjustment (which 4.5b doesn't include).

### `late_filing_notification` annotate (1y lookback)

- Same module.
- Detects SEC Form 12b-25 (NT 10-K / NT 10-Q) in the trailing 365d.
- Paper: Bartov-Lai-Yeung 2002 *JAR* — late filers see -5-7%
  abnormal returns.
- Lookback: `config.LATE_FILING_LOOKBACK_DAYS = 365`.

## Architecture (mirrors `compute.scoring.eight_k_events`)

- Per-CIK JSON cache (7d TTL) at `compute/cache/edgar_amendments/`
  and `compute/cache/edgar_late_filings/`.
- Cache shape: `{fetched_at, lookback_days, filings: [{accession,
  form, filing_date, filing_url}]}`.
- Fetch path: `edgar.Company(ticker).get_filings(form=...)` with
  per-form retry; merges results across multiple forms client-side
  + sorts desc by filing_date so `filings[0]` is the latest.
- Public entry points: `check_restatement_history(ticker, ...,
  filings_override=...)` and `check_late_filing(ticker, ...,
  filings_override=...)`. The override path is the test inject —
  bypasses EDGAR, keeps unit tests offline.
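
A minimal offline sketch of how the override path and the lookback window might fit together. The `today` parameter, the helper `filing_date_within`, the inclusive window boundaries, and the early return when no override is given are assumptions for illustration; the real module fetches from EDGAR and caches results instead. The filing dict keys follow the cache shape listed above.

```python
from datetime import date, timedelta
from typing import Optional

RESTATEMENT_HISTORY_LOOKBACK_DAYS = 1825  # 5 x 365, as in config
AMENDMENT_FORMS = {"10-K/A", "10-Q/A"}


def filing_date_within(filing_date: str, lookback_days: int, today: date) -> bool:
    """True when an ISO filing date falls inside the trailing window (inclusive)."""
    filed = date.fromisoformat(filing_date)
    return today - timedelta(days=lookback_days) <= filed <= today


def check_restatement_history(
    ticker: str,
    *,
    today: date,
    filings_override: Optional[list[dict]] = None,
) -> bool:
    """Sketch: fire when any amendment filing sits inside the 5-year window."""
    if filings_override is None:
        return False  # the real module would fetch via EDGAR here; omitted
    return any(
        f["form"] in AMENDMENT_FORMS
        and filing_date_within(f["filing_date"], RESTATEMENT_HISTORY_LOOKBACK_DAYS, today)
        for f in filings_override
    )
```

Passing a hand-built `filings_override` list is what lets a unit test exercise the window logic without any network access.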

## Files touched

| File | Change |
|---|---|
| `compute/scoring/restatement_filings.py` | NEW. Cache + fetch + check_restatement_history + check_late_filing. |
| `compute/config.py` | + `RESTATEMENT_HISTORY_LOOKBACK_DAYS=1825` + `LATE_FILING_LOOKBACK_DAYS=365` + `EDGAR_AMENDMENTS_CACHE_DIR` + `EDGAR_LATE_FILINGS_CACHE_DIR`. |
| `compute/main.py` | + import `check_late_filing` + `check_restatement_history`. Per-ticker Step-8 loop appends `restatement_history` / `late_filing_notification` to `valuation_warnings` when the check fires. Slots immediately after the existing PR 4b §1 `cross_source_disagreement` block. |
| `.github/workflows/compute-rankings.yml` | + 2 new cache paths (`edgar_amendments`, `edgar_late_filings`) so weekly runs preserve the per-ticker JSON files. |
| `tests/test_scoring/test_restatement_filings.py` | NEW. 17 tests covering `_filing_date_within` boundaries, both check_* entry points (no filings, within window, outside window, multiple within window, lookback constants, fetch-failure graceful path). All offline via `filings_override`. |
| `tests/test_workflow_cache_coverage.py` | + 2 new parametrized cache-path assertions for the new directories (catches future workflow YAML drift). |

## Defense layer end-state (after this PR ships)

- Active vetoes: 7 (unchanged — 4.5b is annotate-only)
- Annotate flags: 5 → **7** (added `restatement_history`,
  `late_filing_notification`)
- Reason taxonomy: 29 → **31** stable identifiers

No schema delta — new flag IDs are strings in existing
`valuation_warnings: list[str]`.

## Backward compat

- `filings_override` arg is opt-in (None default → fetches via
  EDGAR). Existing callers without it unchanged.
- `EDGAR_USER_AGENT` env var precondition matches the rest of the
  Tier-2 layer — fetcher returns None when unset (cleanly skipping
  the flag rather than crashing).
- Caches gitignored under `compute/cache/`.

## Verification ladder

- ✅ `ruff check .` — clean
- ✅ `pytest tests/ -m "not network"` — **803 passed** (was 784;
  +17 new restatement tests + 2 new workflow-cache parametrize entries)
- ⏳ Production verification deferred to next workflow_dispatch.
  Expected fire rates on S&P 500 (rough — needs production run to
  confirm):
  - `restatement_history` — 30-80 tickers (~6-16%) based on
    historical 10-K/A base rates 2020-2025
  - `late_filing_notification` — 5-20 tickers (~1-4%) based on
    SEC Form 12b-25 filing data

## Sibling sub-PRs (Phase 4.5 cluster)

- ✅ **4.5a wave** complete (PRs #89/#90/#91 + #92 docs)
- **4.5b (this PR)** — disclosure-driven catches
- ⬜ **4.5c** — Roychowdhury REM 3-proxy
- ⬜ **4.5d** — M-score 3y momentum + Burgstahler-Dichev kink
- ⬜ **4.5e** — Form 4 insider clustering
- ⬜ **4.5f** — `manipulation_index` composite + UI + schema bump → v1.2.0

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 17, 2026
…em_suspect annotate) (#95)

Phase 4.5 manipulation-defense cluster sub-PR 5 of 6 per PR #86.
Catches REAL manipulation (cutting R&D, channel stuffing, deferring
maintenance, overproduction) — invisible to the existing accrual-
targeting defenses (Sloan / Beneish / Dechow).

## Model — Roychowdhury 2006 *JAE* 3-proxy

Three abnormal residuals from per-sector OLS regressions:

1. **Abnormal CFO** — residual of
   `CFO_t / A_{t-1}` on `[1, 1/A_{t-1}, Sales_t/A_{t-1}, ΔSales_t/A_{t-1}]`.
   **Low (negative)** = suspicious → firm front-loaded sales via
   channel stuffing / loose credit / discounts, which inflates reported
   earnings while depressing CFO relative to sales.
2. **Abnormal Production** — residual of
   `(COGS_t + ΔInventory_t) / A_{t-1}` on `[1, 1/A_{t-1}, Sales_t,
   ΔSales_t, ΔSales_{t-1}]` (all over A_{t-1}).
   **High** = suspicious → overproduction spreads fixed costs over
   more units, deflating per-unit COGS and inflating gross margin.
3. **Abnormal Discretionary Expenses** — residual of
   `(R&D_t + SGA_t) / A_{t-1}` on `[1, 1/A_{t-1}, Sales_{t-1}/A_{t-1}]`.
   **Low (negative)** = suspicious → firm cut discretionary spending
   to boost current earnings. (Advertising omitted — SEC XBRL rarely
   tags it separately; per Roychowdhury 2006 footnote 7 the SGA-only
   adaptation is acceptable since advertising is usually subsumed
   in SGA.)

Flag `rem_suspect` fires when **≥ 2 of 3** residuals sit in their
respective worst decile within the ticker's GICS sector. Mirrors the
4.5a.3 `manipulation_triple_flag` pattern but uses *real* (not
accrual) signals.
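
The residual-then-decile logic can be sketched as below for a single sector. This is an illustration under stated assumptions, not the contents of `rem.py`: the function names are hypothetical, only the abnormal-CFO regression is shown (the other two proxies would follow the same pattern with their own regressors), and `np.quantile` cutoffs stand in for whatever decile-rank method the module uses.

```python
import numpy as np


def abnormal_cfo(cfo, assets_lag, sales, d_sales):
    """OLS residuals of CFO_t/A_{t-1} on [1, 1/A_{t-1}, S_t/A_{t-1}, dS_t/A_{t-1}]."""
    y = cfo / assets_lag
    X = np.column_stack(
        [np.ones_like(assets_lag), 1.0 / assets_lag, sales / assets_lag, d_sales / assets_lag]
    )
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def rem_suspect(ab_cfo, ab_prod, ab_disexp):
    """2-of-3 rule within one sector: low CFO, high PROD, low DISEXP are 'worst'."""
    hits = (
        (ab_cfo <= np.quantile(ab_cfo, 0.10)).astype(int)
        + (ab_prod >= np.quantile(ab_prod, 0.90)).astype(int)
        + (ab_disexp <= np.quantile(ab_disexp, 0.10)).astype(int)
    )
    return hits >= 2
```

Note the direction of each tail: the worst decile is the bottom 10% for abnormal CFO and discretionary expenses, but the top 10% for abnormal production.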

## Architecture

| File | Change |
|---|---|
| `compute/scoring/rem.py` | **NEW** — `REMProxies` + `REMResult` dataclasses; `compute_proxies` (per-ticker input vector from snap + history); `_fit_sector_models` (per-sector OLS via `numpy.linalg.lstsq`); `compute_rem_flags` (two-pass: proxies → sector models → residuals → within-sector decile rank → fire). ~420 LOC. |
| `compute/main.py` | Pre-compute `rem_results` once via `compute_rem_flags(snapshots, histories=histories, sectors=sectors_dict)` right after `compute_risk_flags`. Per-ticker Step-8 loop appends `rem_suspect` to `valuation_warnings` when `rem_result.fired`. |
| `tests/test_scoring/test_rem.py` | **NEW** — 14 tests in three layers: (1) proxy construction (5 tests covering well-formed, missing snap, missing assets denominator, R&D fallback to SGA-only, inventory-missing PROD skip), (2) end-to-end `compute_rem_flags` (8 tests: empty, below floor, at floor, double-outlier fires, single-outlier triggers cfo axis, triple-outlier 3-trigger, normal-ticker H0 FP rate, constants), (3) **golden numerical test** verifying OLS recovers known-DGP coefficients. |

## No new dependencies

- `numpy.linalg.lstsq` for OLS (already in dep tree)
- No `sklearn`, no `statsmodels` — pure-numpy reimplementation keeps
  install surface tight (mirrors PR 4b §2 PBO/DSR decision)

## Defense-layer end-state (after this PR ships)

- Active vetoes: 7 (unchanged — 4.5c is annotate-only)
- Annotate flags: 7 → **8** (+ `rem_suspect`)
- Reason taxonomy: 31 → **32** stable identifiers

No schema delta — `rem_suspect` is a string in existing
`valuation_warnings: list[str]`. `SCHEMA_VERSION` stays `0.7.1-phase4g`.

## Backward compat

- `compute_rem_flags(snapshots, histories=None, sectors=None)` —
  both kwargs optional. When sectors absent, no sector model can
  fit (every ticker's sector lookup returns None); all results have
  `fired=False`.
- Sectors below `REM_MIN_POPULATION_SECTOR = 15` (matches 4.5a.1
  Sloan sector-relative floor) skip REM cleanly — those tickers
  get `REMResult(None, None, None, fired=False)`. No active-veto
  fallback (REM is annotate-only).
- DISEXP falls back to SGA-only when R&D is missing (financials /
  REITs / utilities) per Roychowdhury 2006 footnote 7.

## Verification ladder

- ✅ `ruff check .` — clean
- ✅ `pytest tests/test_scoring/test_rem.py` — **14 passed**
- ✅ `pytest tests/ -m "not network"` — **817 passed** (was 803;
  +14 new REM tests)
- ✅ schema_check — N/A (no schema delta)
- ⏳ Production verification deferred to next workflow_dispatch.
  Expected fire rate: 5-7% (~25-35 of 502 S&P 500 tickers)
  assuming moderate axis correlation. H0 (independent axes) FP
  rate is 2.8% per the 2-of-3 joint-probability calc.
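
The 2.8% figure can be reproduced with a short binomial calculation, assuming each of the three axes independently lands in its worst decile with probability 0.10. The helper name `at_least_k_of_n` is illustrative only.

```python
from math import comb


def at_least_k_of_n(p: float, k: int, n: int) -> float:
    """P(at least k of n independent events occur), each with probability p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Exactly two of three: 3 * 0.1^2 * 0.9 = 0.027; all three: 0.1^3 = 0.001.
h0_false_positive_rate = at_least_k_of_n(0.10, 2, 3)  # 0.028, up to float rounding
```

Positive correlation between the axes pushes the joint rate above this independence baseline, which is consistent with the higher 5-7% production estimate.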

## Sibling sub-PRs (Phase 4.5 cluster)

- ✅ **4.5a wave** complete (PRs #89 / #90 / #91 + #92 docs)
- ✅ **4.5b** complete (PR #93 + #94 docs)
- **4.5c (this PR)** — Roychowdhury REM
- ⬜ **4.5d** — M-score 3y momentum + Burgstahler-Dichev kink
- ⬜ **4.5e** — Form 4 insider clustering
- ⬜ **4.5f** — `manipulation_index` composite + UI + schema bump → v1.2.0

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 17, 2026
…als_momentum_high + loss_avoidance_pattern) (#97)

Phase 4.5 manipulation-defense cluster sub-PR 6 of 6 (the last
purely-defense sub before 4.5f composite + UI bundling). Two
annotate-only flags derived from the per-ticker fundamentals
history (annual XBRL).

## What's new

### `accruals_momentum_high` — Δ(TATA) over 3y > +0.05

- TATA = (NetIncome − OperatingCashFlow) / TotalAssets, the Sloan
  1996 / Beneish 1999 accruals backbone.
- Threshold +0.05 is anchored to Beneish 1999 ΔM > +0.5 via the
  β_TATA = 4.679 coefficient (ΔM ≈ 4.679 × ΔTATA → ΔM > 0.5 ⇔
  ΔTATA > 0.107). We use 0.05, about half of that TATA-only
  equivalent (4.679 × 0.05 ≈ 0.23 in ΔM terms). The lower cutoff is
  more sensitive because TATA alone captures less than the full
  8-ratio signal; this is a standard practitioner adaptation when
  shortening to one ratio.
- Catches manipulation **gathering steam** — the snapshot-only
  Sloan + Beneish flags miss the trajectory entirely.
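
The threshold arithmetic and the momentum check can be sketched as follows. The function names and the simple "now minus three years ago" difference are assumptions for illustration; the actual `check_accruals_momentum` takes `(snap, history)` and walks the XBRL history.

```python
BETA_TATA = 4.679                   # Beneish 1999 coefficient on TATA
ACCRUALS_MOMENTUM_THRESHOLD = 0.05  # cutoff used in this PR


def tata(net_income: float, operating_cash_flow: float, total_assets: float) -> float:
    """Total accruals to total assets."""
    return (net_income - operating_cash_flow) / total_assets


def accruals_momentum_high(tata_now: float, tata_3y_ago: float) -> bool:
    """Fire when TATA rose by more than the cutoff over the window (strict)."""
    return (tata_now - tata_3y_ago) > ACCRUALS_MOMENTUM_THRESHOLD


# Arithmetic behind the cutoff:
tata_equivalent_of_half_m = 0.5 / BETA_TATA                    # about 0.107
implied_delta_m = BETA_TATA * ACCRUALS_MOMENTUM_THRESHOLD      # about 0.234
```

For example, a firm whose TATA moves from 0.01 to 0.08 over three years would fire, while a move from 0.01 to 0.03 would not.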

**Practical note on naming**: PR #86 plan §4.5d called this
`m_score_deteriorating` (full Δ(Beneish M-score) > +0.5). We
chose TATA momentum as a practical equivalent: building 3
historical 8-ratio Beneish snapshots from XBRL history would
require expanding the annual-history coverage of 6+ supplementary
ratios (DSRI / GMI / AQI / etc.) that often have gaps for prior
years. TATA is the single Beneish component that's a level rather
than a ratio-of-ratios, and Sloan 1996 established it as the
standalone accruals signal — so this is a clean shortening, not a
weakening.

### `loss_avoidance_pattern` — Burgstahler-Dichev 1997 kink at zero

- Fires when **3+ consecutive fiscal years** of tiny-positive
  earnings: NI ∈ [\$0, \$5M] **OR** EPS ∈ [\$0.00, \$0.05].
- Per-share band catches the high-share-count case where NI alone
  is above the absolute floor but per-share is still tiny.
- Empirical kink-at-zero signature of managers shading reported
  earnings just enough to avoid reporting a loss.
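
A minimal sketch of the streak rule, assuming the history is available as oldest-to-newest `(net_income, eps)` pairs and that any run of three consecutive qualifying years fires. The function and constant names are hypothetical, and whether the real check requires the streak to include the latest year is not stated in this PR.

```python
from typing import Optional, Sequence

NI_BAND_MAX = 5_000_000   # NI in [$0, $5M]
EPS_BAND_MAX = 0.05       # EPS in [$0.00, $0.05]
MIN_STREAK = 3            # consecutive fiscal years


def is_tiny_positive(ni: Optional[float], eps: Optional[float]) -> bool:
    """A year qualifies when either the NI band or the EPS band is hit."""
    ni_hit = ni is not None and 0 <= ni <= NI_BAND_MAX
    eps_hit = eps is not None and 0 <= eps <= EPS_BAND_MAX
    return ni_hit or eps_hit


def loss_avoidance_pattern(years: Sequence[tuple[Optional[float], Optional[float]]]) -> bool:
    """Fire on 3+ consecutive tiny-positive years; years ordered oldest to newest."""
    streak = 0
    for ni, eps in years:
        streak = streak + 1 if is_tiny_positive(ni, eps) else 0
        if streak >= MIN_STREAK:
            return True
    return False
```

The EPS branch is what catches a large-share-count firm whose net income exceeds \$5M but still rounds to a few cents per share.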

## Architecture

| File | Change |
|---|---|
| `compute/scoring/earnings_quality.py` | **NEW** ~250 LOC — `check_accruals_momentum` + `check_loss_avoidance` + history-walk helpers (`_annual_values`, `_value_at_year`). Pure pandas; no new deps. |
| `compute/main.py` | + 2 import lines + 2 per-ticker annotate appends in the Step-8 loop, slotting after `rem_suspect`. |
| `tests/test_scoring/test_earnings_quality.py` | **NEW** ~225 LOC — 14 offline tests covering both flags (fires / doesn't fire / improves / threshold pins / EPS-band fallback / negative-NI rejection / large-NI rejection / multi-year streak / streak break / constants sanity). |

## Defense-layer end-state (after this PR ships)

- Active vetoes: **7** (unchanged — 4.5d is annotate-only)
- Annotate flags: 8 → **10** (+ `accruals_momentum_high`,
  `loss_avoidance_pattern`)
- Reason taxonomy: 32 → **34** stable identifiers
- Total defense layers: **9 → 16** after 4.5a + 4.5b + 4.5c + 4.5d

No schema delta — both flags are strings in existing
`valuation_warnings: list[str]`. `SCHEMA_VERSION` stays
`0.7.1-phase4g`.

## Backward compat

- Both check functions take `(snap, history)` — no caller
  changes elsewhere. Missing inputs (snap=None, no history,
  insufficient years) cleanly return fired=False.
- No new EDGAR fetches — both flags read from existing
  fundamentals + fundamentals_history caches.

## Verification ladder

- ✅ `ruff check .` — clean
- ✅ `pytest tests/test_scoring/test_earnings_quality.py` —
  **14 passed**
- ✅ `pytest tests/ -m "not network"` — **831 passed** (was 817;
  +14 new)
- ✅ schema_check — N/A (no schema delta)
- ⏳ Production verification deferred. Expected fire rates on S&P 500:
  - `accruals_momentum_high` ~3-8% (~15-40 tickers) — H0 from
    Δ(TATA) > 0.05 base rate
  - `loss_avoidance_pattern` ~1-3% (~5-15 tickers) — S&P 500 firms
    rarely report tiny-positive earnings for 3+ years (mega-cap
    distribution); base rate higher on small-caps per Burgstahler-
    Dichev 1997 original sample

## Sibling sub-PRs (Phase 4.5 cluster)

- ✅ **4.5a wave** (PRs #89 / #90 / #91 + #92 docs)
- ✅ **4.5b** (PR #93 + #94 docs)
- ✅ **4.5c** (PR #95 + #96 docs)
- **4.5d (this PR)** — earnings-quality time-series
- ⬜ **4.5e** — Form 4 insider clustering (~420 LOC, ~12 days —
  needs new SEC Form 4 parser)
- ⬜ **4.5f** — `manipulation_index` composite + composite-score
  penalty + UI pillar card + README Honest Limitations + schema
  bump → **v1.2.0-phase4.5**

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2

Co-authored-by: Claude <noreply@anthropic.com>