fix(ingest): extend shares_outstanding fallback to catch implausibly-low primary extraction (#246 ERIE) #253
Merged
…ausibly-low primary extraction (#246)

Closes #246. ERIE was returning shares_outstanding=2542 (real ~57M) because the SEC companyfacts aggregate API filters out dimensional facts. ERIE files Class A (~54.9M) only with a dimension and Class B (~2,541) without one, so the aggregate returns Class B only. The PR #182 STZ fallback already existed, but its trigger was the strict `shares is None`, so ERIE's non-None 2,542 slipped past. This PR extends the trigger to also fire when shares < MIN_PLAUSIBLE_SHARE_COUNT (100K, roughly 300× below the ~30M minimum share count implied for an S&P 500 constituent).

Production changes:
- compute/config.py: new MIN_PLAUSIBLE_SHARE_COUNT = 100_000 constant
- compute/ingest/fundamentals.py: trigger extended to `is None OR < 100K`; logger surfaces primary=<None|count> for operator visibility
- PHASE_STATUS_INFLIGHT.md: in-flight entry per §Conventions

Tests follow in a subsequent commit (test-engineer spawn in flight).

Side-effect coverage:
- BRK-B (1.64M, above threshold): not fixed here (the Class A→B 1500:1 conversion is a separate methodology call); existing DQ veto stays
- V/FOXA/NWS/NWSA: above threshold, covered by issue #248 PR2a/PR2b per the methodology-scientist 2026-05-25 verdict

Backward-compat: ERIE risk_flags[data_quality_input_corruption] will stop firing post-fix (shares becomes plausible → TBVPS gate doesn't trigger). No existing test pins ERIE's veto state.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4
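The mechanism described in this commit can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code in compute/ingest/fundamentals.py: the fact dicts are a toy stand-in for XBRL facts (not the companyfacts JSON schema), and `aggregate_view` and `should_try_per_filing_fallback` are hypothetical names. Treating a missing (`None`) revenue or total-assets value as failing the gate is also an assumption.

```python
import logging
from typing import Optional

logger = logging.getLogger("fundamentals_sketch")  # hypothetical logger name

# Assumed to mirror the constant added to compute/config.py in this PR.
MIN_PLAUSIBLE_SHARE_COUNT: int = 100_000

# Toy model of ERIE's share-count facts. The dict shape is hypothetical and is
# NOT the SEC companyfacts JSON schema; "dimensional" marks a fact reported
# only with a share-class dimension.
ERIE_FACTS = [
    {"share_class": "A", "val": 54_900_000, "dimensional": True},
    {"share_class": "B", "val": 2_541, "dimensional": False},
]


def aggregate_view(facts: list) -> Optional[int]:
    """Mimic the aggregate API: dimensional facts never reach the caller."""
    undimensioned = [f["val"] for f in facts if not f["dimensional"]]
    return undimensioned[0] if undimensioned else None


def should_try_per_filing_fallback(
    primary: Optional[int],
    revenue: Optional[float],
    total_assets: Optional[float],
) -> bool:
    """Extended trigger: primary value missing (STZ signature) OR implausibly
    low (ERIE signature), still gated on positive revenue and total_assets.
    Treating None financials as failing the gate is an assumption."""
    missing_or_too_low = primary is None or primary < MIN_PLAUSIBLE_SHARE_COUNT
    gates_ok = (revenue or 0) > 0 and (total_assets or 0) > 0
    if missing_or_too_low and gates_ok:
        # Logging the primary value tells an operator which signature fired.
        logger.info("shares_outstanding fallback triggered: primary=%s", primary)
        return True
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    primary = aggregate_view(ERIE_FACTS)  # only the Class B fact survives
    old_trigger = primary is None  # pre-PR check: False, so ERIE slipped past
    # Illustrative positive financials, not ERIE's reported values.
    new_trigger = should_try_per_filing_fallback(primary, revenue=1e9, total_assets=2e9)
    print(primary, old_trigger, new_trigger)  # 2541 False True
```

The old `is None` check passes the non-None Class B count straight through; the extended check catches it while leaving the revenue and total-assets gates in place.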
12 tests for the trigger-decision branch added by the previous commit (cd9cbd9). Suite 1171 → 1182 (+11 offline, +1 @network-gated).

Coverage:
- Boundary + branch (6): ERIE shape (2542 fires) · plausible 100M (no fire) · None backward-compat (PR #182 STZ path) · strict-< floor at 99_999/100_000/100_001
- Gate preservation (2): too-low + revenue=0 blocks · too-low + total_assets=0 blocks (PR #182 invariants preserved)
- Logging (1): caplog distinguishes primary=None vs primary=2542 for operator visibility
- Hypothesis property (1): @given(st.one_of(st.none(), st.integers(0, 10M))); fallback fires iff primary is None or primary < 100_000; no @settings(deadline=None) per CLAUDE.md
- Config pin (1): MIN_PLAUSIBLE_SHARE_COUNT == 100_000 drift detector
- @network (1): live ERIE probe, recovered shares ∈ (50M, 65M); mirrors the STZ/AAPL/WMT @network pins from PR #182

Verified: pytest tests/test_ingest/test_fundamentals.py -m "not network" → 11 passed, 1 deselected, 0 failures. ruff clean.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4
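The offline boundary, gate and property cases listed in this commit can be illustrated with a minimal, self-contained sketch. It assumes a hypothetical pure predicate `should_try_per_filing_fallback(primary, revenue, total_assets)`; the real tests exercise the branch inside compute/ingest/fundamentals.py, whose signature isn't shown in this PR. The property case swaps Hypothesis for seeded stdlib `random` sampling so the sketch runs with the standard library alone.

```python
import random
from typing import Optional

# Assumed to mirror compute/config.py.
MIN_PLAUSIBLE_SHARE_COUNT = 100_000


def should_try_per_filing_fallback(
    primary: Optional[int], revenue: float, total_assets: float
) -> bool:
    """Hypothetical stand-in for the trigger branch under test."""
    missing_or_too_low = primary is None or primary < MIN_PLAUSIBLE_SHARE_COUNT
    return missing_or_too_low and revenue > 0 and total_assets > 0


# Illustrative positive financials so the PR #182 gates pass.
GOOD = {"revenue": 1e9, "total_assets": 2e9}


def test_erie_shape_fires():
    assert should_try_per_filing_fallback(2542, **GOOD)


def test_plausible_count_does_not_fire():
    assert not should_try_per_filing_fallback(100_000_000, **GOOD)


def test_none_keeps_stz_path():
    assert should_try_per_filing_fallback(None, **GOOD)


def test_floor_is_strict_less_than():
    assert should_try_per_filing_fallback(99_999, **GOOD)
    assert not should_try_per_filing_fallback(100_000, **GOOD)
    assert not should_try_per_filing_fallback(100_001, **GOOD)


def test_gates_still_block_too_low_values():
    assert not should_try_per_filing_fallback(2542, revenue=0, total_assets=2e9)
    assert not should_try_per_filing_fallback(2542, revenue=1e9, total_assets=0)


def test_fires_iff_missing_or_below_floor():
    # Seeded stdlib sampling stands in for the Hypothesis strategy
    # st.one_of(st.none(), st.integers(0, 10_000_000)) used in the real suite.
    rng = random.Random(0)
    samples = [None] + [rng.randint(0, 10_000_000) for _ in range(1_000)]
    for primary in samples:
        expected = primary is None or primary < MIN_PLAUSIBLE_SHARE_COUNT
        assert should_try_per_filing_fallback(primary, **GOOD) == expected
```

Saved as a test module, pytest would collect the six `test_*` functions. The caplog, config-pin and `@network` cases depend on project fixtures and the live EDGAR endpoint, so they are left out of this sketch.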
Branch updated: 092a2b8 → 51ec5d6
Summary
Closes #246. ERIE's `raw_metrics.shares_outstanding` is 2542 (real ~57M) because the SEC `companyfacts` aggregate API filters out dimensional facts. ERIE files Class A (~54.9M) only with a `dei:share-class` dimension and Class B (~2,541) without one, so the aggregate returns Class B only.

The PR #182 STZ fallback `_fetch_shares_from_per_filing_xbrl` already exists, BUT its trigger was the strict `shares is None` (Issue #176 STZ signature). ERIE's non-None 2,542 slipped past → fallback never fired → ERIE shipped with wrong data and ranks at #69 despite the veto.

This PR extends the trigger to also fire when `shares < MIN_PLAUSIBLE_SHARE_COUNT (100_000)`, the S&P 500 index-floor sanity gate (roughly 300× below the ~30M minimum share count the index floor implies for a legitimate constituent).

Production code changes (~12 net LOC)
- `compute/config.py`: new `MIN_PLAUSIBLE_SHARE_COUNT: int = 100_000` constant with rationale docstring (S&P 500 index-floor mcap $15B ÷ extreme single-share price $500 implies ≥ 30M shares minimum; 100K is 300× below that)
- `compute/ingest/fundamentals.py:774-803`: trigger extended from `shares is None` to `shares is None or shares < MIN_PLAUSIBLE_SHARE_COUNT`. Existing `revenue > 0` + `total_assets > 0` gates preserved. Logger updated to surface `primary=<None|count>` for operator visibility (distinguishes the STZ-signature None case from the ERIE-signature too-low case without re-running a probe)
- `PHASE_STATUS_INFLIGHT.md`: in-flight entry per §Conventions (PR #237 convention: "docs(workflow): adopt PHASE_STATUS_INFLIGHT.md side-file (structural fix for parallel-PR collision)")

Tests
Test-engineer spawn in flight; 12 cases incoming on a follow-up commit:
- `@network` ERIE drift-detector (mirrors STZ/AAPL/WMT `@network` pins)

Side-effect coverage
Backward-compat note
ERIE's `risk_flags = ['data_quality_input_corruption']` veto will stop firing post-fix (shares becomes plausible → TBVPS gate doesn't trigger). No existing test pins ERIE's veto state (verified via `grep -r ERIE tests/`: zero hits). ERIE's rank will shift based on the now-valid valuation; this is expected behavior.

Schema / docs
0.10.2-phase4.5e.

Verification
- `ruff check compute/config.py compute/ingest/fundamentals.py`: clean
- `python -c "from compute.config import MIN_PLAUSIBLE_SHARE_COUNT"`: imports OK
- `pytest tests/test_ingest/test_fundamentals.py -m "not network" -x -q`: pending test-engineer commit
- `@network` ERIE drift-detector: runs in CI with `EDGAR_USER_AGENT`

Sibling issues filed in same audit batch (2026-05-25)
- #247 NVR scoring-layer DQ gap (`data_quality_input_corruption` in `valuation_warnings` but absent from `risk_flags`) → methodology-scientist Mode B owns
- #248 V/Visa cross-source escalation (`shares_outstanding=469M`, real ~1.85B, ~4× off; no veto despite obvious mcap understatement) → PR2a + PR2b split per methodology verdict
- #249 Rebaseline `compute-rankings.yml` timeout (`timeout-minutes` vs 5-loop cold-cache reality, OR adopt off-cycle pre-cache); durable fix for 2026-05-25 P1

Test plan
- `ruff` + offline pytest + frontend tsc + frontend build)
- `ERIE.json` with `shares_outstanding ≈ 57M` + `market_cap > $10B` + empty `risk_flags`
- `defense-layer-auditor` post-cron Section A-J verifies no regression

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4
Generated by Claude Code