
fix(ingest): extend shares_outstanding fallback to catch implausibly-low primary extraction (#246 ERIE)#253

Merged: dackclup merged 2 commits into main from claude/eager-bohr-12bQi on May 25, 2026

@dackclup (Owner)

Summary

Closes #246. ERIE's raw_metrics.shares_outstanding was 2,542 (actual ≈ 57M) because the SEC companyfacts aggregate API filters out dimensional facts. ERIE reports Class A (~54.9M) only with a dei: share-class dimension and Class B (~2,541) without one, so the aggregate returns only the Class B count.
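The toy model below illustrates the failure mode under a simplified fact shape. ShareFact, aggregate_view, and the axis/member names are hypothetical illustrations, not the companyfacts schema or the project's parser. The model shows only why a filter that keeps non-dimensional facts surfaces Class B alone. It makes no claim about how the per-filing fallback combines share classes.

```python
from dataclasses import dataclass, field


@dataclass
class ShareFact:
    """A share-count fact; `dims` maps XBRL axis -> member (empty if non-dimensional)."""
    value: int
    dims: dict[str, str] = field(default_factory=dict)


# Hypothetical facts mirroring the ERIE shape described above.
ERIE_FACTS = [
    # Class A: reported only with a share-class dimension (axis/member names illustrative).
    ShareFact(54_900_000, {"StatementClassOfStockAxis": "CommonClassAMember"}),
    # Class B: reported without any dimension.
    ShareFact(2_541),
]


def aggregate_view(facts: list[ShareFact]) -> list[ShareFact]:
    """Model an aggregate endpoint that drops every dimensional fact."""
    return [fact for fact in facts if not fact.dims]


surviving = aggregate_view(ERIE_FACTS)
print([fact.value for fact in surviving])  # [2541]: only Class B survives
```

The Class A fact is still present in the full fact list. It simply never reaches a consumer that reads only the aggregate view, which is why the primary extraction sees a plausible-looking non-None number.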

The per-filing fallback from PR #182 (STZ), _fetch_shares_from_per_filing_xbrl, already exists. However, it only triggered on a strict shares is None check (the Issue #176 STZ signature). ERIE's non-None 2,542 slipped past, so the fallback never fired. ERIE shipped with wrong data and ranked #69 despite the veto.

This PR extends the trigger to also fire when shares < MIN_PLAUSIBLE_SHARE_COUNT (100_000). This sanity gate is anchored on the S&P 500 index floor and sits 300× below the smallest share count that floor implies.
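The threshold arithmetic and the before/after trigger can be sketched as follows. The $15B and $500 inputs come from the config rationale in this PR. old_trigger and new_trigger are hypothetical names. The real check lives in compute/ingest/fundamentals.py and also carries the revenue and total-assets gates, which this sketch omits.

```python
from typing import Optional

# Inputs taken from the config rationale in this PR.
INDEX_FLOOR_MCAP_USD = 15_000_000_000  # assumed S&P 500 index-floor market cap
EXTREME_SHARE_PRICE_USD = 500  # assumed extreme single-share price
MIN_PLAUSIBLE_SHARE_COUNT = 100_000

# $15B / $500 per share = 30M shares; 30M / 100K = a 300x margin.
implied_min_shares = INDEX_FLOOR_MCAP_USD // EXTREME_SHARE_PRICE_USD
margin = implied_min_shares // MIN_PLAUSIBLE_SHARE_COUNT


def old_trigger(shares: Optional[int]) -> bool:
    """Pre-fix: fall back only when primary extraction returned nothing (STZ)."""
    return shares is None


def new_trigger(shares: Optional[int]) -> bool:
    """Post-fix: also fall back on an implausibly low count (ERIE)."""
    return shares is None or shares < MIN_PLAUSIBLE_SHARE_COUNT


print(implied_min_shares, margin)  # 30000000 300
for primary in (None, 2_542, 99_999, 100_000, 1_643_118):
    print(primary, old_trigger(primary), new_trigger(primary))
```

The comparison is a strict <, so 100_000 itself does not fire. This matches the 99_999/100_000/100_001 boundary tests. BRK-B's 1,643,118 stays above the floor under both versions, which is why it is out of scope for this PR.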

Production code changes (~12 net LOC)

  • compute/config.py: new MIN_PLAUSIBLE_SHARE_COUNT: int = 100_000 constant with a rationale docstring. An S&P 500 index-floor market cap of $15B at an extreme $500 share price implies ≥ 30M shares, and 100K sits 300× below that bound.
  • compute/ingest/fundamentals.py:774-803: trigger extended from shares is None to shares is None OR shares < MIN_PLAUSIBLE_SHARE_COUNT. The existing revenue > 0 and total_assets > 0 gates are preserved. The log line now surfaces primary=<None|count>, so operators can distinguish the STZ-signature None case from the ERIE-signature too-low case without re-running a probe.
  • PHASE_STATUS_INFLIGHT.md: in-flight entry per §Conventions (the convention from PR #237, "docs(workflow): adopt PHASE_STATUS_INFLIGHT.md side-file", a structural fix for parallel-PR collisions)
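The sketch below shows the extended trigger together with the preserved PR #182 gates and the primary=<None|count> log field. The function name, signature, logger name, and message wording are assumptions for illustration, not the code in fundamentals.py. The revenue and total-assets values in the demo are placeholder positives, not real financials.

```python
import logging
from typing import Optional

MIN_PLAUSIBLE_SHARE_COUNT = 100_000

logger = logging.getLogger("fundamentals_sketch")  # hypothetical logger name


def should_try_per_filing_fallback(
    ticker: str,
    shares: Optional[int],
    revenue: Optional[float],
    total_assets: Optional[float],
) -> bool:
    """Hypothetical sketch: decide whether to run the per-filing XBRL fallback."""
    # Extended trigger: missing (STZ signature) OR implausibly low (ERIE signature).
    if shares is not None and shares >= MIN_PLAUSIBLE_SHARE_COUNT:
        return False
    # Preserved PR #182 gates: require positive revenue and total assets.
    if revenue is None or revenue <= 0:
        return False
    if total_assets is None or total_assets <= 0:
        return False
    # primary=None vs primary=<count> tells operators which signature fired.
    logger.info("%s: trying per-filing XBRL fallback (primary=%s)", ticker, shares)
    return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    should_try_per_filing_fallback("STZ", None, 1.0, 1.0)    # logs primary=None
    should_try_per_filing_fallback("ERIE", 2_542, 1.0, 1.0)  # logs primary=2542
    should_try_per_filing_fallback("ERIE", 2_542, 0.0, 1.0)  # gated: returns False
```

The gates run only after the share check. A plausible primary count therefore short-circuits before any financials are inspected, and nothing is logged for it.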

Tests

A test-engineer spawn is in flight, and 12 cases will land in a follow-up commit:

  • 6 boundary/branch (ERIE shape · STZ backward-compat · 99_999/100_000/100_001 boundaries)
  • 2 gate preservation (revenue/assets zero invariants)
  • 1 logging discipline (caplog distinguishes None vs too-low)
  • 1 Hypothesis property
  • 1 config constant pin
  • 1 @network ERIE drift-detector (mirrors STZ/AAPL/WMT @network pins)
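The Hypothesis property (the fallback fires iff primary is None or primary < 100_000) can be approximated with the standard library alone, assuming a trigger of the shape described in this PR. fallback_fires and check_property are hypothetical names. Unlike Hypothesis, this seeded random sweep does no shrinking or targeted edge exploration, so the strict-< boundary values are listed explicitly.

```python
import random
from typing import Optional

MIN_PLAUSIBLE_SHARE_COUNT = 100_000


def fallback_fires(primary: Optional[int]) -> bool:
    """Hypothetical trigger under test: None or strictly below the floor."""
    return primary is None or primary < MIN_PLAUSIBLE_SHARE_COUNT


def check_property(samples: int = 10_000, seed: int = 0) -> int:
    """Stdlib stand-in for @given(st.one_of(st.none(), st.integers(0, 10M))).

    Draws None or an int in [0, 10_000_000] and compares the trigger with an
    independently written oracle. Returns the number of cases checked.
    """
    rng = random.Random(seed)
    # Fixed boundary cases first, then seeded random draws.
    cases: list[Optional[int]] = [None, 0, 2_542, 99_999, 100_000, 100_001, 10_000_000]
    for _ in range(samples):
        cases.append(None if rng.random() < 0.1 else rng.randint(0, 10_000_000))
    for primary in cases:
        expected = primary is None or primary <= 99_999  # oracle phrased differently
        assert fallback_fires(primary) == expected, primary
    return len(cases)
```

With the default arguments, check_property() checks 10,007 cases (7 fixed plus 10,000 drawn). The oracle uses <= 99_999 rather than reusing the constant, so an off-by-one edit to the trigger would be caught.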

Side-effect coverage

| Ticker | Extracted shares | Status |
|---|---|---|
| ERIE | 2,542 (< 100K) | Fixed: fallback fires and recovers ~57M |
| BRK-B | 1,643,118 (> 100K) | Not covered: existing DQ veto stays; the Class A → B 1500:1 conversion is a separate methodology call |
| V (Visa) | 469M (> 100K) | Not covered: handled by #248 PR2a/PR2b per the methodology verdict |
| FOXA / NWS / NWSA | all > 100K | Not covered: included in the #248 PR2b regression fixture |

Backward-compat note

ERIE's risk_flags = ['data_quality_input_corruption'] will stop firing after the fix: the share count becomes plausible, so the TBVPS gate no longer triggers. No existing test pins ERIE's veto state (verified via grep -r ERIE tests/, zero hits). ERIE's rank will shift based on its now-valid valuation, which is expected behavior.

Schema / docs

Verification

  • ruff check compute/config.py compute/ingest/fundamentals.py — clean
  • python -c "from compute.config import MIN_PLAUSIBLE_SHARE_COUNT" — imports OK
  • pytest tests/test_ingest/test_fundamentals.py -m "not network" -x -q — pending test-engineer commit
  • Test count delta — pending
  • @network ERIE drift-detector — runs in CI with EDGAR_USER_AGENT

Sibling issues filed in same audit batch (2026-05-25)

Test plan

  • CI green (ruff + offline pytest + frontend tsc + frontend build)
  • Test-engineer follow-up commit adds 12 cases (boundary + property + @network)
  • The Mon 22:00 UTC cron (#4: feat(phase-2): SEC EDGAR fundamentals + per-stock detail pages) produces a fresh ERIE.json with shares_outstanding ≈ 57M, market_cap > $10B, and empty risk_flags
  • A post-cron defense-layer-auditor pass over Sections A-J confirms no regression

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4


Generated by Claude Code

vercel Bot commented May 25, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| quantrank | Ready | Preview, Comment | May 25, 2026 8:23am |

claude added 2 commits May 25, 2026 08:22
…ausibly-low primary extraction (#246)

Closes #246. ERIE was returning shares_outstanding=2542 (real ~57M)
because SEC companyfacts aggregate API filters out dimensional facts;
ERIE files Class A (~54.9M) only with dimension, Class B (~2,541)
without, so aggregate returns Class B only. PR #182 STZ fallback
existed but trigger was strict `shares is None` so ERIE's non-None
2,542 slipped past. This PR extends the trigger to also fire when
shares < MIN_PLAUSIBLE_SHARE_COUNT (100K, 300× below the ~30M
minimum share count implied by the S&P 500 index floor).

Production changes:
- compute/config.py: new MIN_PLAUSIBLE_SHARE_COUNT = 100_000 constant
- compute/ingest/fundamentals.py: trigger extended `is None OR < 100K`;
  logger surfaces primary=<None|count> for operator visibility
- PHASE_STATUS_INFLIGHT.md: in-flight entry per §Conventions

Tests follow in a subsequent commit (test-engineer spawn in flight).

Side-effect coverage:
- BRK-B (1.64M, above threshold): not fixed here (Class A→B 1500:1
  conversion is separate methodology call); existing DQ veto stays
- V/FOXA/NWS/NWSA: above threshold, covered by issue #248 PR2a/PR2b
  per methodology-scientist 2026-05-25 verdict

Backward-compat: ERIE risk_flags[data_quality_input_corruption] will
stop firing post-fix (shares becomes plausible → TBVPS gate doesn't
trigger). No existing test pins ERIE's veto state.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4
12 tests for the trigger-decision branch added by the previous commit
(cd9cbd9). Offline suite 1171 → 1182 (+11), plus 1 @network-gated test.

Coverage:
- Boundary + branch (6): ERIE shape (2542 fires) · plausible 100M (no
  fire) · None backward-compat (PR #182 STZ path) · strict-< floor at
  99_999/100_000/100_001
- Gate preservation (2): too-low + revenue=0 blocks · too-low +
  total_assets=0 blocks (PR #182 invariants preserved)
- Logging (1): caplog distinguishes primary=None vs primary=2542 for
  operator visibility
- Hypothesis property (1): @given(st.one_of(st.none(),
  st.integers(0, 10M))) — fallback fires iff primary is None or
  primary < 100_000; no @settings(deadline=None) per CLAUDE.md
- Config pin (1): MIN_PLAUSIBLE_SHARE_COUNT == 100_000 drift detector
- @network (1): live ERIE probe, recovered shares ∈ (50M, 65M) —
  mirrors STZ/AAPL/WMT @network pins from PR #182

Verified: pytest tests/test_ingest/test_fundamentals.py -m "not network"
→ 11 passed, 1 deselected, 0 failures. ruff clean.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4
dackclup force-pushed the claude/eager-bohr-12bQi branch from 092a2b8 to 51ec5d6 (May 25, 2026 08:22)
dackclup marked this pull request as ready for review May 25, 2026 08:22
dackclup merged commit 4059b38 into main May 25, 2026
3 of 5 checks passed
dackclup deleted the claude/eager-bohr-12bQi branch May 25, 2026 08:23

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ERIE shares_outstanding=2,542 — XBRL dimensional-fact extraction failure (STZ-class pattern, sibling of #176)

2 participants