fix(ingest): per-filing XBRL fallback recovers STZ-style dimensional shares_outstanding (closes #176) #182
Merged
…shares_outstanding (closes #176)

Issue #176 ships in two PRs:

- PR #181 (visibility): annotate `share_count_extraction_missing` surfaces tickers where shares_outstanding extraction fails.
- This PR (root cause): actually recovers the missing share count.

Live SEC probe (2026-05-21) confirmed STZ files `dei:EntityCommonStockSharesOutstanding` only with Class A / Class B share-class dimensions. The SEC `companyfacts` aggregate API filters out dimensional facts (the `companyconcept` API returns HTTP 404 on STZ for both `dei:EntityCommonStockSharesOutstanding` and `us-gaap:CommonStockSharesOutstanding`), so the primary extraction path via `Company.facts.get_concept` returns None even though revenue + balance sheet extract cleanly.

New `_fetch_shares_from_per_filing_xbrl(company)` pulls the most recent 10-K (falls back to 10-Q if none on file), aggregates `dei:EntityCommonStockSharesOutstanding` across all dimensional contexts at the most-recent `period_instant` (share-count facts are instant-type, not flow-type), and returns the sum. Falls back to `us-gaap:CommonStockSharesIssued` if the dei concept is empty. Wrapped in graceful-degradation try/except: any failure returns None and the upstream PR-#181 annotate keeps firing as the safety net.

Triggered ONLY when the primary extraction returns None AND revenue > 0 AND total_assets > 0 (the PR-#181 signature), bounding universe-wide HTTP cost to ~1-3 tickers per cron (blast radius = 1 on the 2026-05-14 baseline).

Live verification:

- STZ: 172.20M shares (Class A 172.17M + Class B 26K) ✓
- AAPL: 14.78B ✓
- WMT: 7.97B ✓

Tests 1040 → 1049: +9 offline mock tests covering positive STZ-signature aggregation, most-recent period_instant filter, us-gaap fallback chain, six graceful-degradation paths, plus 1 @network STZ live drift-detector (locks the period_instant column + get_facts_by_concept shape against future edgartools API drift).

No schema change: operates at the snapshot-construction layer.
https://claude.ai/code/session_01HHo4UHKc9iKKytkKfxfVnA
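The aggregation described in the commit message can be sketched without any SEC or edgartools dependency. This is a minimal illustration, not the code in `compute/ingest/fundamentals.py`: the fact rows are plain dicts, the `value` key and the function names `sum_at_latest_instant` and `recover_shares` are assumptions, and the dates and share counts are made-up numbers shaped like the STZ case.

```python
from datetime import date
from typing import Optional

# Concepts tried in order, following the fallback chain in the commit message.
CONCEPT_CHAIN = (
    "dei:EntityCommonStockSharesOutstanding",
    "us-gaap:CommonStockSharesIssued",
)


def sum_at_latest_instant(facts: list[dict], concept: str) -> Optional[float]:
    """Sum every row of `concept` that sits at the most recent period_instant."""
    rows = [f for f in facts if f["concept"] == concept and f["value"] is not None]
    if not rows:
        return None
    latest = max(f["period_instant"] for f in rows)
    return sum(f["value"] for f in rows if f["period_instant"] == latest)


def recover_shares(facts: list[dict]) -> Optional[float]:
    """Walk the concept chain; any failure degrades to None."""
    try:
        for concept in CONCEPT_CHAIN:
            total = sum_at_latest_instant(facts, concept)
            if total:
                return total
        return None
    except Exception:
        # Graceful degradation: the caller keeps its "missing" annotation.
        return None


# Hypothetical per-class rows (illustrative dates and values).
DEI = "dei:EntityCommonStockSharesOutstanding"
stz_like = [
    {"concept": DEI, "period_instant": date(2025, 1, 1), "value": 172_170_000},  # Class A
    {"concept": DEI, "period_instant": date(2025, 1, 1), "value": 26_000},       # Class B
    {"concept": DEI, "period_instant": date(2024, 1, 1), "value": 170_000_000},  # older, ignored
]
total_shares = recover_shares(stz_like)
```

The sketch assumes the input holds only per-class rows; if a filing also reported an undimensioned total at the same instant, a plain sum would double count it.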
This was referenced May 23, 2026
dackclup added a commit that referenced this pull request on May 23, 2026

…ing (#220)

Two cron-#3 audit follow-ups (2026-05-23) folded into one PR. Both surfaced by stock-detail-auditor + root-caused by edgar-debugger.

## Bug 1: DD eps_diluted XBRL single-period mis-parse

DD shipped 2026-05-23 with `eps_diluted=0.39` against `NI/shares = $7M/410M = $0.017` (~23× off).

Root cause: `_build_raw_metrics` in `compute/main.py` was passing `snapshot.eps_diluted` raw to RawMetrics. The XBRL `EarningsPerShareDiluted` concept returns the **latest single-period value** per `fundamentals.py:114-117`; for a quarterly filer that's one quarter's EPS, not TTM. `pe_ratio_ttm` was already on the NI/shares path since audit #6 / PR #49, so the valuation chain held internal consistency, but the EPS display field on /stock/DD rendered the wrong number to users.

Fix: compute `ttm_eps = NI / shares` once and use it for both the `eps_basic` and `eps_diluted` display fields. The basic-vs-diluted spread on the S&P 500 is typically < 1-3%, within display precision. Negative net_income preserves sign (loss-year tickers show "−$0.42 EPS" not "null"). `pe_ratio_ttm` formula unchanged.

## Bug 2: STZ shares_outstanding fallback silent failure

STZ on 2026-05-23 ran with `shares_outstanding=null` + `market_cap=null` despite PR #182's per-filing XBRL fallback. The 2026-05-21 live SEC probe confirmed the fallback works (returns 172.20M). Two days later under cron load it returned None silently.

Root cause: PR #182's outer `except: return None` was bare, with no log line on the failure path. The operator couldn't distinguish transient SEC 429 from structural XBRL drift without re-running a live probe. PR-3d amplification pattern parallel: graceful degradation correct, but silence kills observability.

Fix:

- Thread `ticker` arg into `_fetch_shares_from_per_filing_xbrl` (optional kwarg, back-compat preserved: existing 8 offline tests still pass without the kwarg)
- Emit `logger.warning("shares_outstanding fallback FAILED for %s — %s: %s", ticker, type(e).__name__, e)` on the outer except; the message includes a note that the `share_count_extraction_missing` annotate will fire as the safety net
- Inner two `except` blocks (filings.head() + get_facts_by_concept) log at DEBUG so the failure mode is distinguishable in verbose mode without spamming default-level logs

Annotate `share_count_extraction_missing` (PR #181) keeps firing upstream; this PR is observability-only, not a new recovery path.

## Tests (+7)

`tests/test_main.py` (+5):

- `test_build_raw_metrics_eps_diluted_derived_from_ni_not_xbrl_singleperiod`: pins the DD case, snap.eps_diluted=0.39 + NI=7M + shares=410M → RawMetrics.eps_diluted = 7M/410M ≈ $0.017 (NOT 0.39)
- `test_build_raw_metrics_eps_preserves_negative_sign_on_loss_year`: loss-year shows signed EPS, pe_ttm null
- `test_build_raw_metrics_eps_null_when_shares_outstanding_missing`: STZ regression case, eps fields null, no exception
- `test_build_raw_metrics_eps_null_when_zero_shares`: defensive guard
- `test_build_raw_metrics_pe_ttm_unchanged_by_eps_fix`: audit-#6 / PR #49 regression guard, pe_ttm logic preserved

`tests/test_ingest/test_fundamentals_xbrl_fallback.py` (+2):

- `test_per_filing_fallback_emits_warning_on_outer_except`: pins the logger.warning emission with caplog when get_filings raises
- `test_per_filing_fallback_ticker_arg_optional_for_back_compat`: pins the back-compat path, no ticker kwarg → "?" sentinel in the log message, existing call sites unbroken

Tests 1049 → 1056 (+7). Pre-existing optional-deps skips (ipca / qlib / OSAP, the `.[factors]` extra) unaffected.

## Verification

- `ruff check .`: clean
- `python -m pytest tests/ -m "not network" --ignore=tests/test_features/test_osap_e2e_integration.py --ignore=tests/test_ingest/test_osap.py`: **1103 passed, 7 skipped, 20 deselected**
- No schema / Pydantic / TypeScript / snapshot triple touch
- No frontend touch

## Issues filed in parallel (cron-#3 audit follow-ups)

- **#217** stock-detail-auditor factor-exposure proxy heuristic (OSAP false-positive prevention)
- **#218** verify-helper Section L OSAP invariant assertion

## Cross-references

- Issue #176 / PR #181 / PR #182: STZ shares_outstanding recovery ladder this PR completes
- Audit #6 / PR #49: `pe_ratio_ttm` NI/shares pattern this PR extends to `eps_diluted` / `eps_basic` display fields
- 2026-05-23 cron #3 stock-detail-auditor + edgar-debugger reports

Co-authored-by: Claude <noreply@anthropic.com>
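Both fixes in this commit are small enough to sketch. The block below is an illustration under stated assumptions, not the code in `compute/main.py` or `compute/ingest/fundamentals.py`: `ttm_eps` and `fetch_with_warning` are hypothetical names, the logger name is invented, and the log wording is adapted from the commit message. The first function reproduces the DD arithmetic (7M / 410M) and the null guards named in the test list; the second shows a graceful-degradation wrapper that logs before returning None.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("fallback_sketch")  # hypothetical logger name


def ttm_eps(net_income: Optional[float], shares: Optional[float]) -> Optional[float]:
    """EPS derived from trailing net income; None when shares are missing or non-positive."""
    if net_income is None or not shares or shares <= 0:
        return None
    # Sign is preserved, so a loss year yields a negative EPS rather than None.
    return net_income / shares


def fetch_with_warning(fetch: Callable[[], Optional[float]], ticker: str = "?") -> Optional[float]:
    """Run a fallback fetch; on any failure, log the exception type and return None."""
    try:
        return fetch()
    except Exception as e:
        logger.warning(
            "shares_outstanding fallback FAILED for %s: %s: %s",
            ticker, type(e).__name__, e,
        )
        return None


dd_eps = ttm_eps(7_000_000, 410_000_000)   # about 0.017, versus the 0.39 single-period value
ratio = 0.39 / dd_eps                      # roughly the ~23x gap quoted above
failed = fetch_with_warning(lambda: 1 / 0, ticker="STZ")  # logs ZeroDivisionError, returns None
```

Keeping the ticker optional with a `"?"` default mirrors the back-compat choice described in the fix list: existing call sites keep working, and the log line still identifies the failure type.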
This was referenced May 25, 2026
dackclup pushed a commit that referenced this pull request on May 25, 2026

12 tests for the trigger-decision branch added by the previous commit (cd9cbd9). Suite 1171 → 1182 (+11 offline + 1 @network gated).

Coverage:

- Boundary + branch (6): ERIE shape (2542 fires) · plausible 100M (no fire) · None backward-compat (PR #182 STZ path) · strict-< floor at 99_999/100_000/100_001
- Gate preservation (2): too-low + revenue=0 blocks · too-low + total_assets=0 blocks (PR #182 invariants preserved)
- Logging (1): caplog distinguishes primary=None vs primary=2542 for operator visibility
- Hypothesis property (1): @given(st.one_of(st.none(), st.integers(0, 10M))), fallback fires iff primary is None or primary < 100_000; no @settings(deadline=None) per CLAUDE.md
- Config pin (1): MIN_PLAUSIBLE_SHARE_COUNT == 100_000 drift detector
- @network (1): live ERIE probe, recovered shares ∈ (50M, 65M); mirrors STZ/AAPL/WMT @network pins from PR #182

Verified: pytest tests/test_ingest/test_fundamentals.py -m "not network" → 11 passed, 1 deselected, 0 failures. ruff clean.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4
dackclup pushed a commit that referenced this pull request on May 25, 2026

…ausibly-low primary extraction (#246)

Closes #246. ERIE was returning shares_outstanding=2542 (real ~57M) because the SEC companyfacts aggregate API filters out dimensional facts; ERIE files Class A (~54.9M) only with dimension, Class B (~2,541) without, so the aggregate returns Class B only. The PR #182 STZ fallback existed, but its trigger was strict `shares is None`, so ERIE's non-None 2,542 slipped past. This PR extends the trigger to also fire when shares < MIN_PLAUSIBLE_SHARE_COUNT (100K, 30× safer than any plausible S&P 500 minimum share count).

Production changes:

- compute/config.py: new MIN_PLAUSIBLE_SHARE_COUNT = 100_000 constant
- compute/ingest/fundamentals.py: trigger extended to `is None OR < 100K`; logger surfaces primary=<None|count> for operator visibility
- PHASE_STATUS_INFLIGHT.md: in-flight entry per §Conventions

Tests follow in a subsequent commit (test-engineer spawn in flight).

Side-effect coverage:

- BRK-B (1.64M, above threshold): not fixed here (Class A→B 1500:1 conversion is a separate methodology call); existing DQ veto stays
- V/FOXA/NWS/NWSA: above threshold, covered by issue #248 PR2a/PR2b per methodology-scientist 2026-05-25 verdict

Backward-compat: ERIE risk_flags[data_quality_input_corruption] will stop firing post-fix (shares becomes plausible → TBVPS gate doesn't trigger). No existing test pins ERIE's veto state.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4
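The extended trigger reads naturally as a single predicate. The following is a minimal sketch assuming the gate combines the PR #182 conditions (revenue > 0 and total_assets > 0) with the new implausibility check; `should_try_fallback` is a hypothetical name, and only the constant `MIN_PLAUSIBLE_SHARE_COUNT = 100_000` comes from the commit message.

```python
from typing import Optional

MIN_PLAUSIBLE_SHARE_COUNT = 100_000  # value stated in the commit message


def should_try_fallback(
    primary: Optional[float],
    revenue: Optional[float],
    total_assets: Optional[float],
) -> bool:
    """Fire the per-filing fallback only for tickers that look like real filers
    whose primary share count is missing or implausibly low."""
    has_fundamentals = (revenue or 0) > 0 and (total_assets or 0) > 0
    implausible = primary is None or primary < MIN_PLAUSIBLE_SHARE_COUNT  # strict <
    return has_fundamentals and implausible


erie_like = should_try_fallback(2_542, revenue=1.0, total_assets=1.0)        # fires
plausible = should_try_fallback(100_000_000, revenue=1.0, total_assets=1.0)  # does not fire
stz_like = should_try_fallback(None, revenue=1.0, total_assets=1.0)          # PR #182 path
```

Because the comparison is strict, a primary value of exactly 100_000 does not trigger the fallback, which matches the boundary cases listed in the test commit.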
dackclup pushed a commit that referenced this pull request on May 25, 2026

… (V/NWS/NWSA/FOX/FOXA/BRK-B/STZ)

Closes the V/Visa root-cause undercount surfaced by PR2a (PR #256) cross-source observability: the SEC `companyfacts` aggregate API filters dimensional facts, so multi-class issuers' primary `shares_outstanding` value is mechanically incomplete (Class A only).

Locked across 6 grill-me + 3 sub-agent verdicts:

- **methodology-scientist** (Mode B): Damodaran 2019 *Investment Valuation* 3rd ed. Ch. 16: total common shares outstanding = SUM across all classes; voting-premium discount applies to PRICE only. Summed dimensional value is definitionally the truth (0% delta threshold, NOT 10% as originally proposed; a 10% gate would suppress the truth for any filer with minor classes < 10%).
- **performance-engineer** (Mode B): universal peek-XBRL adds ~5 min wall-clock and breaches the <5 min warm-cache budget (current p95 fundamentals latency 16.25s already over the 15s warn threshold). Allowlist limits peek to 7 tickers → ~5s wall-clock total.
- **edgar-debugger** (Mode B): verified allowlist completeness via EPS cross-check on production output: V (4.5x), NWS/NWSA (1.56x), FOX/FOXA (2.2x), BRK-B (1300x improvement; ~14% residual Class A 1500x weighting deferred to Q3 2026-08-19 cohort audit), STZ (existing None-trigger path). GOOG/GOOGL excluded (file non-dimensionally; companyfacts works).

**Code changes**:

- `compute/config.py`: SCHEMA_VERSION 0.10.3 → 0.10.4-phase4.5e; `MULTI_CLASS_SHARE_ALLOWLIST: frozenset[str]` (7 tickers) with Damodaran provenance docstring + BRK-B caveat + expansion-gate procedure.
- `compute/ingest/fundamentals.py`: `_FALLBACK_STATS` gains `dimensional_override` counter; new `elif` branch in `_build_snapshot` fires for allowlist tickers with plausible primary; reuses existing `_fetch_shares_from_per_filing_xbrl` (PR #182 STZ path); overrides when `summed > primary`.
- `compute/main.py`: read + log + wire new counter to Metadata.
- `compute/output/schemas.py` + `frontend/lib/types.ts` + `frontend/lib/schema-snapshot.json`: new `Metadata.shares_fallback_dimensional_override_count: int | None` (additive, nullable; PATCH bump).

**Behavior delta**:

- 7 allowlist tickers see corrected `shares_outstanding` → `market_cap` / `pe_ratio_ttm` / fair-price ensemble update (some Top-N rank moves likely)
- 495 other S&P 500 tickers: ZERO change (composite / risk_flags / fair_price / top5 rotation unchanged)
- STZ unchanged (already on None-trigger path)

**Test pin**: `tests/test_config.py`: schema version pin update + new `test_multi_class_share_allowlist_membership` (catches accidental allowlist mutation). 7 new offline test cases for the dimensional override path land in tests/test_ingest/test_fundamentals.py in a follow-up commit (test-engineer sub-agent writing in parallel).

Expected Metadata fingerprint (post first cron):

- `shares_fallback_dimensional_override_count` ≈ 6 (V + NWS + NWSA + FOX + FOXA + BRK-B; STZ captured by None-trigger path first)

Refs Issue #248, supersedes PR #256's 0.10.3 schema bump.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4
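The override rule in this commit (allowlist ticker, plausible primary, replace only when the summed value is larger) can be sketched as a pure function. This is an illustration, not the `_build_snapshot` branch itself: `apply_dimensional_override` is a hypothetical name, the stats dict stands in for `_FALLBACK_STATS`, and the share counts in the usage lines are made up.

```python
from typing import Optional

# The seven tickers named in the commit title.
MULTI_CLASS_SHARE_ALLOWLIST = frozenset(
    {"V", "NWS", "NWSA", "FOX", "FOXA", "BRK-B", "STZ"}
)


def apply_dimensional_override(
    ticker: str,
    primary: Optional[float],
    summed: Optional[float],
    stats: dict[str, int],
) -> Optional[float]:
    """Return the share count to use, bumping the counter when the summed value wins."""
    if ticker not in MULTI_CLASS_SHARE_ALLOWLIST or primary is None or summed is None:
        return primary  # not on this path; a None primary belongs to the None-trigger path
    if summed > primary:
        # 0% delta threshold: any larger summed value replaces the primary.
        stats["dimensional_override"] = stats.get("dimensional_override", 0) + 1
        return summed
    return primary


stats: dict[str, int] = {}
fixed = apply_dimensional_override("V", 1_000, 2_000, stats)         # overridden
untouched = apply_dimensional_override("AAPL", 1_000, 2_000, stats)  # not on allowlist
```

Using a strict `summed > primary` comparison with no percentage gate follows the methodology note above: a 10% gate would leave filers with small minor classes uncorrected.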
This was referenced May 26, 2026
dackclup added a commit that referenced this pull request on May 26, 2026

… fix for GOOG/GOOGL $4.6T overcount) (#269)

Closes the structural half of issue #261: the OVERCOUNT pattern where SEC companyfacts returns Alphabet's 12.12B total shares to both per-class tickers, producing $4.6T market_cap per ticker vs real ~$1.05T per class. PR-A (PR #264, merged) shipped the multi_class_aggregate_shares_suspected annotate; this PR-B ships the actual fix.

Per methodology-scientist Mode B 2026-05-26 (Path 1 reverse-allowlist) + edgar-debugger live probe (Alphabet 10-K accession 0001652044-26-000018).

compute/config.py:

- New MULTI_CLASS_OVERCOUNT_ALLOWLIST: dict[str, str]:
  - GOOGL → "us-gaap:CommonClassAMember" (standard namespace)
  - GOOG → "goog:CapitalClassCMember" (FILER-SPECIFIC namespace gotcha, caught by edgar-debugger probe)
- SCHEMA_VERSION bump 0.10.5 → 0.10.6-phase4.5e

compute/ingest/fundamentals.py:

- Extended _fetch_shares_from_per_filing_xbrl with target_class_member parameter (None=sum-all PR #182 STZ pattern; set=filter to specific class member via xbrl.contexts[ref].dimensions lookup)
- New elif branch in _build_snapshot fires when ticker in allowlist + primary plausible + QR_SKIP_FUNDAMENTALS not set; overrides primary IFF per_class < primary; defensive mc_reconcile_failure counter for sanity-check failures (per_class >= primary OR per_class fraction outside 5-95% of primary)

compute/output/schemas.py + frontend/lib/types.ts + snapshot:

- Triple lockstep, 2 additive Metadata fields:
  - multi_class_per_class_override_count (expected steady-state ≈ 2)
  - multi_class_mc_reconcile_failure_count (defensive Rule-18 guard)

compute/main.py: wire counters from _FALLBACK_STATS

Tests +9 (1216 → 1225):

- test_config.py: schema pin update, allowlist membership pin, disjoint-allowlist invariant test
- test_ingest/test_fundamentals.py: GOOG override (filer-namespace), GOOGL override (standard namespace), non-allowlist ticker doesn't fire, QR_SKIP_FUNDAMENTALS escape-hatch, per_class >= primary sanity skip, mc_reconcile warning on <5% fraction, None return silently skipped. Plus 3 existing _FALLBACK_STATS tests updated to new 5-key dict shape (was 3 keys).

ZERO behavior change for 500 non-allowlist tickers. 2 allowlist tickers (GOOG/GOOGL) gain corrected shares_outstanding (~5.4B/~5.8B from prior 12.12B overcount); flows through to market_cap (~$4.6T → ~$1.05T per class), pe_ratio_ttm, fair-price ensemble. The multi_class_aggregate_shares_suspected annotate (PR-A) continues to fire correctly (CIK collision invariant holds).

Verification:

- ruff clean
- python -m compute.output.schema_check: triple in sync at 0.10.6
- pytest 1225 passed (offline), 7 skipped (factors extras), 24 deselected (@network; GOOG/GOOGL live drift-detector deferred)

PHASE_STATUS_INFLIGHT.md side-file satisfies §Conventions lockstep per PR #237 convention.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4

Co-authored-by: Claude <noreply@anthropic.com>
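The per-class reconcile logic can be sketched as follows. This is a hedged illustration, not the shipped branch: `reconcile_per_class` is a hypothetical name, the stats keys stand in for the `_FALLBACK_STATS` counters, and one point is an assumption because the commit message does not spell it out: the sketch keeps the primary value whenever a sanity check fails, including the case where the per-class fraction falls outside 5-95%.

```python
from typing import Optional

# Mapping stated in the commit message: ticker -> class member to filter on.
MULTI_CLASS_OVERCOUNT_ALLOWLIST = {
    "GOOGL": "us-gaap:CommonClassAMember",
    "GOOG": "goog:CapitalClassCMember",  # filer-specific namespace
}


def reconcile_per_class(
    primary: float,
    per_class: Optional[float],
    stats: dict[str, int],
) -> float:
    """Replace an aggregate share count with the per-class count when it passes sanity checks."""
    if per_class is None:
        return primary  # nothing recovered: skip silently
    fraction = per_class / primary if primary > 0 else float("inf")
    if per_class >= primary or not (0.05 <= fraction <= 0.95):
        # Assumption: every sanity failure leaves the primary value in place.
        stats["mc_reconcile_failure"] = stats.get("mc_reconcile_failure", 0) + 1
        return primary
    stats["per_class_override"] = stats.get("per_class_override", 0) + 1
    return per_class


stats: dict[str, int] = {}
# Approximate figures from the commit message: 12.12B aggregate, ~5.8B for one class.
googl_like = reconcile_per_class(12_120_000_000, 5_800_000_000, stats)
```

The direction of the comparison is the reverse of the undercount allowlist: here the aggregate is too large, so the override only applies when the per-class value is smaller than the primary.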
## Summary

Root-cause fix for issue #176: actually recovers the missing `shares_outstanding` for STZ-style filers, paired with PR #181 (visibility annotate).

The investigation comment on issue #176 documents the SEC API probing that motivated this fix: the `companyfacts` aggregate filters out dimensional facts, but per-filing XBRL exposes them with `is_dimensioned=True` and a `period_instant` "as-of" column.

## Approach

New private helper `_fetch_shares_from_per_filing_xbrl(company)` in `compute/ingest/fundamentals.py`:

- Pulls the most recent 10-K (falls back to 10-Q if none on file).
- Aggregates `dei:EntityCommonStockSharesOutstanding` across all dimensional contexts at the most-recent `period_instant`.
- Falls back to `us-gaap:CommonStockSharesIssued` if dei is empty.
- Wrapped in try/except: any failure returns `None` and the upstream PR-#181 annotate `share_count_extraction_missing` keeps firing as the safety net.

Trigger gate matches the PR-#181 signature exactly: `shares_outstanding is None AND revenue > 0 AND total_assets > 0`. Universe-wide HTTP cost is bounded to ~1-3 tickers per cron (blast radius = 1 on 2026-05-14).

## Live verification (this session)

- STZ: 172.20M shares (Class A 172.17M + Class B 26K)
- AAPL: 14.78B
- WMT: 7.97B

## Files changed

- `compute/ingest/fundamentals.py`: `_fetch_shares_from_per_filing_xbrl(company)` + the in-`_build_snapshot` fallback trigger after `balance_values` is filled.
- `tests/test_ingest/test_fundamentals_xbrl_fallback.py` (new file): 9 offline mock tests + 1 `@network` STZ live drift-detector.
- `CLAUDE.md` + `AGENTS.md`: lockstep ship-with-every-PR entries (Gotchas + Recently-merged + cross-tool agent note about the new edgartools `Filing.xbrl().facts.get_facts_by_concept` call surface).

## Test plan

- `ruff check compute/ tests/` clean
- `python -m pytest -m "not network" -q`: 1049 passed (1040 → 1049, +9 new tests)
- `EDGAR_USER_AGENT=... python -m pytest --run-network -k live_stz`: passes
- […] `market_cap` + `shares_outstanding` + EPS/P/E; the `share_count_extraction_missing` annotate stops firing for STZ; rank may move from 308 because some valuation methods that needed shares can now compute)

## Constraints upheld

- […] `None`, PR-#181 annotate keeps firing.

## Follow-ups (out of scope)

- […] `compute/cache/per_filing_xbrl/` directory keyed by `(cik, accession_no)`. XBRL doesn't change after filing so TTL is effectively infinite. Not needed today.
- […] `Filing.xbrl().facts.get_facts_by_concept` API surface: defer to the next time edgartools' CHANGELOG shows movement in that area.
- […] `eps_basic` / `eps_diluted` for dimensional filers, but isn't needed today (the affected tickers' EPS extraction goes through a different path that handles dimensions differently).

https://claude.ai/code/session_01HHo4UHKc9iKKytkKfxfVnA
Generated by Claude Code