fix(ingest): per-filing XBRL fallback recovers STZ-style dimensional shares_outstanding (closes #176) by dackclup · Pull Request #182 · dackclup/quantrank

dackclup · 2026-05-21T16:15:01Z

Summary

Root-cause fix for issue #176: actually recovers the missing shares_outstanding for STZ-style filers. Paired with PR #181 (visibility annotate).

The investigation comment on issue #176 documents the SEC API probing that motivated this fix. The companyfacts aggregate API filters out dimensional facts, but per-filing XBRL exposes them, flagged is_dimensioned=True and carrying a period_instant "as-of" column.

Approach

New private helper _fetch_shares_from_per_filing_xbrl(company) in compute/ingest/fundamentals.py:

- Pulls the most recent 10-K (falls back to 10-Q if no 10-K is on file).
- Aggregates dei:EntityCommonStockSharesOutstanding across all dimensional contexts at the most recent period_instant.
- Falls back to us-gaap:CommonStockSharesIssued if the dei concept is empty.
- Wrapped in a graceful-degradation try/except: any failure returns None, and the upstream PR #181 annotate share_count_extraction_missing keeps firing as the safety net.
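A minimal sketch of the aggregation and fallback chain described above. It assumes the per-filing facts have already been flattened into plain dict records with `concept`, `period_instant` (dates or ISO-8601 strings, so `max` picks the latest), and `value` keys. That record shape and the helper names are hypothetical stand-ins, not the edgartools `get_facts_by_concept` return type or the PR's actual code.

```python
from typing import Iterable, Mapping, Optional

DEI_CONCEPT = "dei:EntityCommonStockSharesOutstanding"
GAAP_FALLBACK = "us-gaap:CommonStockSharesIssued"


def _sum_latest_instant(facts: list[Mapping], concept: str) -> Optional[float]:
    """Sum every fact for `concept` reported at the most recent period_instant."""
    rows = [f for f in facts if f.get("concept") == concept and f.get("value") is not None]
    if not rows:
        return None
    latest = max(f["period_instant"] for f in rows)
    # Share counts are instant-type facts: every share-class context
    # reported at the same as-of date is added into one total.
    return sum(float(f["value"]) for f in rows if f["period_instant"] == latest)


def aggregate_shares(facts: Iterable[Mapping]) -> Optional[float]:
    """Hypothetical sketch: dei concept first, us-gaap fallback second, None on any failure."""
    try:
        facts = list(facts)
        shares = _sum_latest_instant(facts, DEI_CONCEPT)
        if shares is None:
            shares = _sum_latest_instant(facts, GAAP_FALLBACK)
        return shares
    except Exception:
        # Graceful degradation: the caller relies on the PR #181 annotate as the safety net.
        return None
```

With STZ-shaped input (a Class A and a Class B context at the same instant), the two contexts are summed and older instants are ignored.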

Trigger gate matches the PR #181 signature exactly: shares_outstanding is None AND revenue > 0 AND total_assets > 0. This bounds the universe-wide extra HTTP cost to ~1-3 tickers per cron (blast radius was 1 ticker on the 2026-05-14 baseline).
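The trigger gate reduces to a single predicate. A sketch, with a hypothetical function name and `None` standing for a field the primary extraction could not fill:

```python
from typing import Optional


def should_try_per_filing_fallback(
    shares_outstanding: Optional[float],
    revenue: Optional[float],
    total_assets: Optional[float],
) -> bool:
    """Fire only on the PR #181 signature: shares missing, filer otherwise healthy."""
    return (
        shares_outstanding is None
        and revenue is not None
        and revenue > 0
        and total_assets is not None
        and total_assets > 0
    )
```

Requiring positive revenue and total assets keeps the extra HTTP call away from tickers whose whole extraction failed, where per-filing XBRL would not help.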

Live verification (this session)

| Ticker | Recovered shares | Ground truth |
| --- | --- | --- |
| STZ | 172.20M (Class A 172.17M + Class B 26K) | ~172M total ✓ |
| AAPL | 14.78B | ~15B ✓ |
| WMT | 7.97B | ~8B (post-2024 split) ✓ |

Files changed

- compute/ingest/fundamentals.py: _fetch_shares_from_per_filing_xbrl(company), plus the fallback trigger inside _build_snapshot, placed after balance_values is filled.
- tests/test_ingest/test_fundamentals_xbrl_fallback.py (new file): 9 offline mock tests + 1 @network STZ live drift-detector.
- CLAUDE.md + AGENTS.md: lockstep ship-with-every-PR entries (Gotchas + Recently-merged + a cross-tool agent note about the new edgartools Filing.xbrl().facts.get_facts_by_concept call surface).

Test plan

- ruff check compute/ tests/: clean
- python -m pytest -m "not network" -q: 1049 passed (1040 → 1049, +9 new tests)
- EDGAR_USER_AGENT=... python -m pytest --run-network -k live_stz: passes
- Live SEC probe: STZ / AAPL / WMT all match ground truth
- CI green
- Pre-merge prod-sim diff (expected: STZ now has populated market_cap, shares_outstanding, and EPS/P/E; the share_count_extraction_missing annotate stops firing for STZ; STZ's rank (currently 308) may move because valuation methods that need a share count can now compute)
- User authorizes merge

Constraints upheld

- Rule 16: operates at the ingest snapshot layer; composite scoring is unchanged. Top-5 ranks on the raw composite per Rule 16.
- Graceful degradation: any per-filing fetch failure returns None, and the PR #181 annotate keeps firing.
- Bounded cost: the trigger gate guarantees the extra HTTP call only fires on the narrow PR #181 signature.
- No schema change: no Pydantic / TS / snapshot edits.
- CLAUDE.md + AGENTS.md both touched per §Conventions.

Follow-ups (out of scope)

- Cache layer for per-filing XBRL: if the affected-ticker count grows past ~5/cron, add a compute/cache/per_filing_xbrl/ directory keyed by (cik, accession_no). XBRL doesn't change after filing, so the TTL is effectively infinite. Not needed today.
- Drift-detector manifest for the edgartools Filing.xbrl().facts.get_facts_by_concept API surface: defer until edgartools' CHANGELOG next shows movement in that area.
- Per-filing-XBRL extraction for other partial-extraction failure modes: the same pattern could rescue eps_basic / eps_diluted for dimensional filers, but it isn't needed today (the affected tickers' EPS extraction goes through a different path that handles dimensions differently).
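For the cache follow-up, a sketch of what a (cik, accession_no)-keyed cache could look like. The directory name comes from the proposal; the per-CIK subdirectory, JSON file format, and `cached_shares` helper are assumptions. It has no expiry because a filed XBRL document is immutable, and it deliberately does not cache `None`, so a transient SEC failure is retried on the next cron.

```python
import json
from pathlib import Path
from typing import Callable, Optional

CACHE_DIR = Path("compute/cache/per_filing_xbrl")


def cached_shares(
    cik: str,
    accession_no: str,
    compute: Callable[[], Optional[float]],
    cache_dir: Path = CACHE_DIR,
) -> Optional[float]:
    """Return cached shares for one filing, computing and storing them on a miss.

    No TTL: a filed XBRL document never changes, so an entry never goes stale.
    """
    path = cache_dir / cik / f"{accession_no}.json"
    if path.exists():
        return json.loads(path.read_text())["shares"]
    shares = compute()
    if shares is not None:  # never pin a transient failure forever
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps({"shares": shares}))
    return shares
```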

https://claude.ai/code/session_01HHo4UHKc9iKKytkKfxfVnA

Generated by Claude Code

…shares_outstanding (closes #176)

Issue #176 ships in two PRs:
- PR #181 (visibility): annotate `share_count_extraction_missing` surfaces tickers where shares_outstanding extraction fails.
- This PR (root cause): actually recovers the missing share count.

A live SEC probe (2026-05-21) confirmed that STZ files `dei:EntityCommonStockSharesOutstanding` only with Class A / Class B share-class dimensions. The SEC `companyfacts` aggregate API filters out dimensional facts (the companyconcept API returns HTTP 404 on STZ for both `dei:EntityCommonStockSharesOutstanding` and `us-gaap:CommonStockSharesOutstanding`), so the primary extraction path via `Company.facts.get_concept` returns None even though revenue and the balance sheet extract cleanly.

New `_fetch_shares_from_per_filing_xbrl(company)` pulls the most recent 10-K (falls back to 10-Q if none is on file), aggregates `dei:EntityCommonStockSharesOutstanding` across all dimensional contexts at the most recent `period_instant` (share-count facts are instant-type, not flow-type), and returns the sum. It falls back to `us-gaap:CommonStockSharesIssued` if the dei concept is empty, and it is wrapped in a graceful-degradation try/except: any failure returns None and the upstream PR #181 annotate keeps firing as the safety net.

Triggered ONLY when the primary extraction returns None AND revenue > 0 AND total_assets > 0 (the PR #181 signature), bounding universe-wide HTTP cost to ~1-3 tickers per cron (blast radius = 1 on the 2026-05-14 baseline).

Live verification:
- STZ: 172.20M shares (Class A 172.17M + Class B 26K) ✓
- AAPL: 14.78B ✓
- WMT: 7.97B ✓

Tests 1040 → 1049: +9 offline mock tests covering positive STZ-signature aggregation, the most-recent period_instant filter, the us-gaap fallback chain, and six graceful-degradation paths, plus 1 @network STZ live drift-detector (locks the period_instant column + get_facts_by_concept shape against future edgartools API drift).

No schema change: operates at the snapshot-construction layer.
https://claude.ai/code/session_01HHo4UHKc9iKKytkKfxfVnA

vercel · 2026-05-21T16:15:08Z

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| quantrank | Ready | Preview, Comment | May 21, 2026 4:15pm |

…ing (#220)

Two cron-#3 audit follow-ups (2026-05-23) folded into one PR. Both were surfaced by stock-detail-auditor and root-caused by edgar-debugger.

## Bug 1: DD eps_diluted XBRL single-period mis-parse

DD shipped 2026-05-23 with `eps_diluted=0.39` against `NI/shares = $7M/410M = $0.017` (~23× off). Root cause: `compute/main.py` `_build_raw_metrics` was passing `snapshot.eps_diluted` raw to RawMetrics. The XBRL `EarningsPerShareDiluted` concept returns the **latest single-period value** per `fundamentals.py:114-117`; for a quarterly filer that is one quarter's EPS, not TTM. `pe_ratio_ttm` was already on the NI/shares path since audit #6 / PR #49, so the valuation chain stayed internally consistent, but the EPS display field on /stock/DD showed users the wrong number.

Fix: compute `ttm_eps = NI / shares` once and use it for both the `eps_basic` and `eps_diluted` display fields. The basic-vs-diluted spread on the S&P 500 is typically < 1-3%, within display precision. Negative net_income preserves its sign (loss-year tickers show "−$0.42 EPS", not "null"). The `pe_ratio_ttm` formula is unchanged.

## Bug 2: STZ shares_outstanding fallback silent failure

STZ on 2026-05-23 ran with `shares_outstanding=null` + `market_cap=null` despite PR #182's per-filing XBRL fallback. The 2026-05-21 live SEC probe confirmed the fallback works (returns 172.20M); two days later, under cron load, it returned None silently. Root cause: PR #182's outer `except: return None` was bare, with no log line on the failure path, so the operator couldn't distinguish a transient SEC 429 from structural XBRL drift without re-running a live probe. This parallels the PR-3d amplification pattern: graceful degradation is correct, but silence kills observability.
Fix:
- Thread a `ticker` arg into `_fetch_shares_from_per_filing_xbrl` (optional kwarg, back-compat preserved: the existing 8 offline tests still pass without the kwarg)
- Emit `logger.warning("shares_outstanding fallback FAILED for %s — %s: %s", ticker, type(e).__name__, e)` on the outer except; the message includes a note that the `share_count_extraction_missing` annotate will fire as the safety net
- The two inner `except` blocks (filings.head() + get_facts_by_concept) log at DEBUG, so the failure mode is distinguishable in verbose mode without spamming default-level logs

Annotate `share_count_extraction_missing` (PR #181) keeps firing upstream; this PR is observability-only, not a new recovery path.

## Tests (+7)

`tests/test_main.py` (+5):
- `test_build_raw_metrics_eps_diluted_derived_from_ni_not_xbrl_singleperiod` pins the DD case: snap.eps_diluted=0.39 + NI=7M + shares=410M → RawMetrics.eps_diluted = 7M/410M ≈ $0.017 (NOT 0.39)
- `test_build_raw_metrics_eps_preserves_negative_sign_on_loss_year`: loss-year shows signed EPS, pe_ttm null
- `test_build_raw_metrics_eps_null_when_shares_outstanding_missing`: STZ regression case; eps fields null, no exception
- `test_build_raw_metrics_eps_null_when_zero_shares`: defensive guard
- `test_build_raw_metrics_pe_ttm_unchanged_by_eps_fix`: audit-#6 / PR #49 regression guard; pe_ttm logic preserved

`tests/test_ingest/test_fundamentals_xbrl_fallback.py` (+2):
- `test_per_filing_fallback_emits_warning_on_outer_except` pins the logger.warning emission with caplog when get_filings raises
- `test_per_filing_fallback_ticker_arg_optional_for_back_compat` pins the back-compat path: no ticker kwarg → "?" sentinel in the log message, existing call sites unbroken

Tests 1049 → 1056 (+7). Pre-existing optional-deps skips (ipca / qlib / OSAP, `.[factors]` extra) are unaffected.
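A sketch of the observable-degradation pattern from the Bug 2 fix: still return `None`, but log the exception type and message with a ticker label, falling back to a "?" sentinel when no ticker is passed. The wrapper name and the `fetch` callable are hypothetical, and the log text approximates rather than reproduces the PR's message.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def fetch_shares_with_logging(
    fetch: Callable[[], Optional[float]],
    ticker: Optional[str] = None,
) -> Optional[float]:
    """Graceful degradation that stays observable: return None, but say why."""
    label = ticker if ticker is not None else "?"  # back-compat sentinel when no ticker is given
    try:
        return fetch()
    except Exception as e:
        # Lazy %-style args: the message is only formatted if a handler emits it.
        logger.warning(
            "shares_outstanding fallback FAILED for %s: %s: %s "
            "(share_count_extraction_missing annotate will fire as the safety net)",
            label,
            type(e).__name__,
            e,
        )
        return None
```

Logging `type(e).__name__` is what lets an operator tell a rate-limit error apart from a structural parsing failure without re-running a live probe.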
## Verification
- `ruff check .`: clean
- `python -m pytest tests/ -m "not network" --ignore=tests/test_features/test_osap_e2e_integration.py --ignore=tests/test_ingest/test_osap.py`: **1103 passed, 7 skipped, 20 deselected**
- No schema / Pydantic / TypeScript / snapshot triple touch
- No frontend touch

## Issues filed in parallel (cron-#3 audit follow-ups)
- **#217** stock-detail-auditor factor-exposure proxy heuristic (OSAP false-positive prevention)
- **#218** verify-helper Section L OSAP invariant assertion

## Cross-references
- Issue #176 / PR #181 / PR #182: STZ shares_outstanding recovery ladder this PR completes
- Audit #6 / PR #49: `pe_ratio_ttm` NI/shares pattern this PR extends to the `eps_diluted` / `eps_basic` display fields
- 2026-05-23 cron #3 stock-detail-auditor + edgar-debugger reports

Co-authored-by: Claude <noreply@anthropic.com>
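The Bug 1 derivation as a sketch (function names hypothetical): one TTM EPS value from net income and share count, reused for both display fields, with the sign preserved and `None` returned when the share count is missing or non-positive.

```python
from typing import Optional


def ttm_eps(net_income: Optional[float], shares_outstanding: Optional[float]) -> Optional[float]:
    """EPS derived from trailing net income and share count; sign is preserved."""
    if net_income is None or shares_outstanding is None or shares_outstanding <= 0:
        return None  # STZ-style missing shares, or the defensive zero-shares guard
    return net_income / shares_outstanding


def display_eps_fields(net_income: Optional[float], shares: Optional[float]) -> dict:
    """Both display fields share one value (basic vs diluted spread is within display precision)."""
    eps = ttm_eps(net_income, shares)
    return {"eps_basic": eps, "eps_diluted": eps}
```

For the DD case, `ttm_eps(7e6, 410e6)` is about 0.0171 rather than the single-period 0.39.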

@given

12 tests for the trigger-decision branch added by the previous commit (cd9cbd9). Suite 1171 → 1182 (+11 offline + 1 @network gated).

Coverage:
- Boundary + branch (6): ERIE shape (2542 fires) · plausible 100M (no fire) · None backward-compat (PR #182 STZ path) · strict-< floor at 99_999/100_000/100_001
- Gate preservation (2): too-low + revenue=0 blocks · too-low + total_assets=0 blocks (PR #182 invariants preserved)
- Logging (1): caplog distinguishes primary=None vs primary=2542 for operator visibility
- Hypothesis property (1): @given(st.one_of(st.none(), st.integers(0, 10M))): fallback fires iff primary is None or primary < 100_000; no @settings(deadline=None) per CLAUDE.md
- Config pin (1): MIN_PLAUSIBLE_SHARE_COUNT == 100_000 drift detector
- @network (1): live ERIE probe, recovered shares ∈ (50M, 65M); mirrors the STZ/AAPL/WMT @network pins from PR #182

Verified: pytest tests/test_ingest/test_fundamentals.py -m "not network" → 11 passed, 1 deselected, 0 failures. ruff clean.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4

…ausibly-low primary extraction (#246)

Closes #246. ERIE was returning shares_outstanding=2542 (real ~57M) because the SEC companyfacts aggregate API filters out dimensional facts: ERIE files Class A (~54.9M) only with a dimension and Class B (~2,541) without one, so the aggregate returns Class B only. The PR #182 STZ fallback existed, but its trigger was strictly `shares is None`, so ERIE's non-None 2,542 slipped past.

This PR extends the trigger to also fire when shares < MIN_PLAUSIBLE_SHARE_COUNT (100K, roughly 30× below the smallest plausible S&P 500 share count).

Production changes:
- compute/config.py: new MIN_PLAUSIBLE_SHARE_COUNT = 100_000 constant
- compute/ingest/fundamentals.py: trigger extended to `is None OR < 100K`; the logger surfaces primary=<None|count> for operator visibility
- PHASE_STATUS_INFLIGHT.md: in-flight entry per §Conventions

Tests follow in a subsequent commit (test-engineer spawn in flight).

Side-effect coverage:
- BRK-B (1.64M, above threshold): not fixed here (the Class A→B 1500:1 conversion is a separate methodology call); the existing DQ veto stays
- V/FOXA/NWS/NWSA: above threshold, covered by issue #248 PR2a/PR2b per the methodology-scientist 2026-05-25 verdict

Backward-compat: ERIE risk_flags[data_quality_input_corruption] will stop firing post-fix (shares becomes plausible, so the TBVPS gate doesn't trigger). No existing test pins ERIE's veto state.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4
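A sketch of the extended trigger, assuming the same revenue/total_assets gate as PR #182. The function name is hypothetical and the constant mirrors the one this commit adds to `compute/config.py`. The comparison is a strict `<`, so exactly 100_000 does not fire.

```python
from typing import Optional

MIN_PLAUSIBLE_SHARE_COUNT = 100_000  # mirrors the compute/config.py constant named above


def should_try_per_filing_fallback(
    shares_outstanding: Optional[int],
    revenue: Optional[float],
    total_assets: Optional[float],
) -> bool:
    """Fire on a missing OR implausibly low primary count, still gated on a healthy filer."""
    missing_or_implausible = (
        shares_outstanding is None or shares_outstanding < MIN_PLAUSIBLE_SHARE_COUNT
    )
    healthy_filer = (revenue or 0) > 0 and (total_assets or 0) > 0
    return missing_or_implausible and healthy_filer
```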

… (V/NWS/NWSA/FOX/FOXA/BRK-B/STZ)

Closes the V/Visa root-cause undercount surfaced by PR2a (PR #256) cross-source observability: the SEC `companyfacts` aggregate API filters dimensional facts, so multi-class issuers' primary `shares_outstanding` value is mechanically incomplete (Class A only).

Locked across 6 grill-me + 3 sub-agent verdicts:
- **methodology-scientist** (Mode B), citing Damodaran 2019 *Investment Valuation* 3rd ed. Ch. 16: total common shares outstanding = SUM across all classes; the voting-premium discount applies to PRICE only. The summed dimensional value is definitionally the truth (0% delta threshold, NOT 10% as originally proposed; a 10% gate would suppress the truth for any filer with minor classes < 10%).
- **performance-engineer** (Mode B): universal peek-XBRL adds ~5 min wall-clock and breaches the <5 min warm-cache budget (current p95 fundamentals latency of 16.25s is already over the 15s warn threshold). The allowlist limits the peek to 7 tickers → ~5s wall-clock total.
- **edgar-debugger** (Mode B): verified allowlist completeness via EPS cross-check on production output: V (4.5x), NWS/NWSA (1.56x), FOX/FOXA (2.2x), BRK-B (1300x improvement; ~14% residual Class A 1500x weighting deferred to the Q3 2026-08-19 cohort audit), STZ (existing None-trigger path). GOOG/GOOGL excluded (they file non-dimensionally, so companyfacts works).

**Code changes**:
- `compute/config.py`: SCHEMA_VERSION 0.10.3 → 0.10.4-phase4.5e; `MULTI_CLASS_SHARE_ALLOWLIST: frozenset[str]` (7 tickers) with Damodaran provenance docstring + BRK-B caveat + expansion-gate procedure.
- `compute/ingest/fundamentals.py`: `_FALLBACK_STATS` gains a `dimensional_override` counter; a new `elif` branch in `_build_snapshot` fires for allowlist tickers with a plausible primary, reuses the existing `_fetch_shares_from_per_filing_xbrl` (PR #182 STZ path), and overrides when `summed > primary`.
- `compute/main.py`: read + log + wire the new counter to Metadata.
- `compute/output/schemas.py` + `frontend/lib/types.ts` + `frontend/lib/schema-snapshot.json`: new `Metadata.shares_fallback_dimensional_override_count: int | None` (additive, nullable; PATCH bump).

**Behavior delta**:
- 7 allowlist tickers see corrected `shares_outstanding` → `market_cap` / `pe_ratio_ttm` / fair-price ensemble update (some Top-N rank moves likely)
- 495 other S&P 500 tickers: ZERO change (composite / risk_flags / fair_price / top5 rotation unchanged)
- STZ unchanged (already on the None-trigger path)

**Test pin**: `tests/test_config.py`: schema version pin update + new `test_multi_class_share_allowlist_membership` (catches accidental allowlist mutation). 7 new offline test cases for the dimensional override path land in tests/test_ingest/test_fundamentals.py in a follow-up commit (test-engineer sub-agent writing in parallel).

Expected Metadata fingerprint (after the first cron):
- `shares_fallback_dimensional_override_count` ≈ 6 (V + NWS + NWSA + FOX + FOXA + BRK-B; STZ is captured by the None-trigger path first)

Refs Issue #248, supersedes PR #256's 0.10.3 schema bump.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4
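A sketch of the allowlist-only override branch. The allowlist contents are the 7 tickers named in this commit; `fetch_summed` is a hypothetical stand-in for the per-filing sum-all fetch (assumed to return `None` on failure), and a plain dict plays the role of `_FALLBACK_STATS`.

```python
from typing import Callable, Optional

MULTI_CLASS_SHARE_ALLOWLIST = frozenset({"V", "NWS", "NWSA", "FOX", "FOXA", "BRK-B", "STZ"})
MIN_PLAUSIBLE_SHARE_COUNT = 100_000


def maybe_override_with_dimensional_sum(
    ticker: str,
    primary: Optional[int],
    fetch_summed: Callable[[], Optional[float]],
    stats: dict[str, int],
) -> Optional[float]:
    """Replace a plausible-but-incomplete primary with the all-class sum (allowlist only)."""
    if ticker not in MULTI_CLASS_SHARE_ALLOWLIST:
        return primary  # non-allowlist tickers never pay for the extra XBRL fetch
    if primary is None or primary < MIN_PLAUSIBLE_SHARE_COUNT:
        return primary  # None / implausible values are handled by the earlier fallback trigger
    summed = fetch_summed()
    if summed is not None and summed > primary:  # 0% delta threshold: any larger sum wins
        stats["dimensional_override"] = stats.get("dimensional_override", 0) + 1
        return summed
    return primary
```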

… fix for GOOG/GOOGL $4.6T overcount) (#269)

Closes the structural half of issue #261: the OVERCOUNT pattern where SEC companyfacts returns Alphabet's 12.12B total shares to both per-class tickers, producing a $4.6T market_cap per ticker vs a real ~$1.05T per class. PR-A (PR #264, merged) shipped the multi_class_aggregate_shares_suspected annotate; this PR-B ships the actual fix. Per methodology-scientist Mode B 2026-05-26 (Path 1 reverse-allowlist) + edgar-debugger live probe (Alphabet 10-K accession 0001652044-26-000018).

compute/config.py:
- New MULTI_CLASS_OVERCOUNT_ALLOWLIST: dict[str, str]:
  - GOOGL → "us-gaap:CommonClassAMember" (standard namespace)
  - GOOG → "goog:CapitalClassCMember" (FILER-SPECIFIC namespace gotcha, caught by the edgar-debugger probe)
- SCHEMA_VERSION bump 0.10.5 → 0.10.6-phase4.5e

compute/ingest/fundamentals.py:
- Extended _fetch_shares_from_per_filing_xbrl with a target_class_member parameter (None = sum-all, the PR #182 STZ pattern; set = filter to a specific class member via xbrl.contexts[ref].dimensions lookup)
- New elif branch in _build_snapshot fires when the ticker is in the allowlist + primary is plausible + QR_SKIP_FUNDAMENTALS is not set; overrides primary IFF per_class < primary; defensive mc_reconcile_failure counter for sanity-check failures (per_class >= primary OR per_class fraction outside 5-95% of primary)

compute/output/schemas.py + frontend/lib/types.ts + snapshot:
- Triple lockstep, 2 additive Metadata fields:
  - multi_class_per_class_override_count (expected steady-state ≈ 2)
  - multi_class_mc_reconcile_failure_count (defensive Rule-18 guard)

compute/main.py: wire counters from _FALLBACK_STATS

Tests +9 (1216 → 1225):
- test_config.py: schema pin update, allowlist membership pin, disjoint-allowlist invariant test
- test_ingest/test_fundamentals.py: GOOG override (filer namespace), GOOGL override (standard namespace), non-allowlist ticker doesn't fire, QR_SKIP_FUNDAMENTALS escape hatch, per_class >= primary sanity skip, mc_reconcile warning on <5% fraction, None return silently skipped
- Plus 3 existing _FALLBACK_STATS tests updated to the new 5-key dict shape (was 3 keys)

ZERO behavior change for the 500 non-allowlist tickers. The 2 allowlist tickers (GOOG/GOOGL) gain corrected shares_outstanding (~5.4B/~5.8B, down from the prior 12.12B overcount), which flows through to market_cap (~$4.6T → ~$1.05T per class), pe_ratio_ttm, and the fair-price ensemble. The multi_class_aggregate_shares_suspected annotate (PR-A) continues to fire correctly (the CIK collision invariant holds).

Verification:
- ruff clean
- python -m compute.output.schema_check: triple in sync at 0.10.6
- pytest 1225 passed (offline), 7 skipped (factors extras), 24 deselected (@network; GOOG/GOOGL live drift-detector deferred)

PHASE_STATUS_INFLIGHT.md side-file satisfies §Conventions lockstep per the PR #237 convention.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4

Co-authored-by: Claude <noreply@anthropic.com>
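A sketch of the per-class filter and the 5-95% reconcile guard. Each fact record carries a hypothetical `members` tuple standing in for the `xbrl.contexts[ref].dimensions` lookup, and a plain dict stands in for `_FALLBACK_STATS`; the counter keys and function names are illustrative.

```python
from typing import Iterable, Mapping, Optional

MULTI_CLASS_OVERCOUNT_ALLOWLIST = {
    "GOOGL": "us-gaap:CommonClassAMember",
    "GOOG": "goog:CapitalClassCMember",  # filer-specific namespace, not us-gaap
}


def per_class_shares(facts: Iterable[Mapping], target_class_member: Optional[str]) -> Optional[float]:
    """Sum latest-instant facts; None means sum-all, otherwise keep one class member."""
    rows = [
        f for f in facts
        if target_class_member is None or target_class_member in f.get("members", ())
    ]
    if not rows:
        return None
    latest = max(f["period_instant"] for f in rows)
    return sum(float(f["value"]) for f in rows if f["period_instant"] == latest)


def reconcile_per_class(primary: float, per_class: Optional[float], stats: dict[str, int]) -> float:
    """Override only when per_class is a sane 5-95% slice of the overcounted primary."""
    if per_class is None:
        return primary  # fetch failed: silently keep primary
    if per_class >= primary or primary <= 0 or not (0.05 <= per_class / primary <= 0.95):
        stats["mc_reconcile_failure"] = stats.get("mc_reconcile_failure", 0) + 1
        return primary
    stats["per_class_override"] = stats.get("per_class_override", 0) + 1
    return per_class
```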

vercel Bot deployed to Preview May 21, 2026 16:15

dackclup marked this pull request as ready for review May 21, 2026 16:20

dackclup merged commit a612901 into main May 21, 2026
4 checks passed

dackclup deleted the claude/per-filing-xbrl-shares-fallback-i176 branch May 21, 2026 16:20

This was referenced May 23, 2026

Process hygiene #5 — Quarterly cohort-threshold fire-rate review (recurring tracker) #130

Open

fix(ingest): DD eps_diluted TTM derivation + STZ fallback logger.warning #220

Merged

This was referenced Jun 2, 2026

bug: BF-B — share_count_extraction_missing fires despite revenue > 0 (PR #182 dimensional fallback returned null) #376

Open

data: GEV (GE Vernova) — spinoff gap in shares_outstanding; add to issue #10 watchlist #379

Open

dackclup merged 1 commit into main from claude/per-filing-xbrl-shares-fallback-i176
