fix(ingest): per-filing XBRL fallback recovers STZ-style dimensional shares_outstanding (closes #176)#182

Merged
dackclup merged 1 commit into main from claude/per-filing-xbrl-shares-fallback-i176
May 21, 2026

@dackclup
Owner

Summary

Root-cause fix for issue #176 — actually recovers the missing shares_outstanding for STZ-style filers, paired with PR #181 (visibility annotate).

The investigation comment on issue #176 documents the SEC API probing that motivated this fix: companyfacts aggregate filters out dimensional facts, but per-filing XBRL exposes them with is_dimensioned=True and a period_instant "as-of" column.

Approach

New private helper _fetch_shares_from_per_filing_xbrl(company) in compute/ingest/fundamentals.py:

  1. Pulls the most recent 10-K (falls back to 10-Q if no 10-K on file).
  2. Aggregates dei:EntityCommonStockSharesOutstanding across all dimensional contexts at the most-recent period_instant.
  3. Falls back to us-gaap:CommonStockSharesIssued if dei is empty.
  4. Wrapped in graceful-degradation try/except: any failure returns None, and the upstream PR #181 annotate share_count_extraction_missing keeps firing as the safety net.
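
The aggregation in steps 2 to 4 can be sketched roughly as below. This is a minimal illustration, not the code in compute/ingest/fundamentals.py: it assumes the per-filing facts have already been flattened into plain dicts with hypothetical keys `concept`, `value` and `period_instant`, and the function names are made up for the example. It makes no edgartools calls.

```python
from typing import Optional

DEI_SHARES = "dei:EntityCommonStockSharesOutstanding"
USGAAP_SHARES = "us-gaap:CommonStockSharesIssued"


def sum_latest_instant(facts: list[dict], concept: str) -> Optional[float]:
    """Sum one concept across every context at its most recent period_instant."""
    rows = [f for f in facts if f.get("concept") == concept and f.get("value") is not None]
    if not rows:
        return None
    latest = max(f["period_instant"] for f in rows)  # ISO date strings sort chronologically
    return sum(f["value"] for f in rows if f["period_instant"] == latest)


def shares_from_filing_facts(facts: list[dict]) -> Optional[float]:
    """Try the dei concept, then the us-gaap fallback; return None on any failure."""
    try:
        for concept in (DEI_SHARES, USGAAP_SHARES):
            total = sum_latest_instant(facts, concept)
            if total is not None and total > 0:
                return total
        return None
    except Exception:
        # Graceful degradation: the caller treats None as "not recovered".
        return None


# Illustrative STZ-shaped rows (values rounded from this PR's live figures, dates made up).
stz_like = [
    {"concept": DEI_SHARES, "value": 172_170_000, "period_instant": "2026-04-01", "is_dimensioned": True},
    {"concept": DEI_SHARES, "value": 26_000, "period_instant": "2026-04-01", "is_dimensioned": True},
    {"concept": DEI_SHARES, "value": 180_000_000, "period_instant": "2025-04-01", "is_dimensioned": True},
]
print(shares_from_filing_facts(stz_like))  # 172196000
```

The older-period row is ignored because only rows at the latest `period_instant` are summed, which is why share-count facts need the instant filter rather than a plain sum.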

Trigger gate matches the PR-#181 signature exactly: shares_outstanding is None AND revenue > 0 AND total_assets > 0. Universe-wide HTTP cost is bounded to ~1-3 tickers per cron (blast radius = 1 on 2026-05-14).
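
As a rough sketch of that gate (the function name is hypothetical; only the three conditions come from this PR):

```python
from typing import Optional


def should_try_per_filing_fallback(
    shares_outstanding: Optional[float],
    revenue: Optional[float],
    total_assets: Optional[float],
) -> bool:
    """Fire only when shares are missing but the rest of the snapshot extracted cleanly."""
    return (
        shares_outstanding is None
        and revenue is not None and revenue > 0
        and total_assets is not None and total_assets > 0
    )
```

Requiring positive revenue and total assets keeps the extra HTTP calls away from tickers whose whole extraction failed, where a missing share count is a symptom rather than the problem.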

Live verification (this session)

| Ticker | Recovered shares | Ground truth |
|---|---|---|
| STZ | 172.20M (Class A 172.17M + Class B 26K) | ~172M total ✓ |
| AAPL | 14.78B | ~15B ✓ |
| WMT | 7.97B | ~8B (post-2024 split) ✓ |

Files changed

  • compute/ingest/fundamentals.py: _fetch_shares_from_per_filing_xbrl(company) + the in-_build_snapshot fallback trigger after balance_values is filled.
  • tests/test_ingest/test_fundamentals_xbrl_fallback.py (new file) — 9 offline mock tests + 1 @network STZ live drift-detector.
  • CLAUDE.md + AGENTS.md — lockstep ship-with-every-PR entries (Gotchas + Recently-merged + cross-tool agent note about the new edgartools Filing.xbrl().facts.get_facts_by_concept call surface).

Test plan

  • ruff check compute/ tests/ clean
  • python -m pytest -m "not network" -q: 1049 passed (1040 → 1049, +9 new tests)
  • EDGAR_USER_AGENT=... python -m pytest --run-network -k live_stz — passes
  • Live SEC probe: STZ / AAPL / WMT all match ground truth
  • CI green
  • Pre-merge prod-sim diff (expect: STZ now has a populated market_cap + shares_outstanding + EPS/P/E; the share_count_extraction_missing annotate stops firing for STZ; rank may move from 308 because some valuation methods that needed shares can now compute)
  • User authorize merge

Constraints upheld

Follow-ups (out of scope)

  • Cache layer for per-filing XBRL — if the affected-ticker count grows past ~5/cron, add a compute/cache/per_filing_xbrl/ directory keyed by (cik, accession_no). XBRL doesn't change after filing so TTL is effectively infinite. Not needed today.
  • Drift-detector manifest for the edgartools Filing.xbrl().facts.get_facts_by_concept API surface — defer to the next time edgartools' CHANGELOG shows movement in that area.
  • Per-filing-XBRL extraction for other partial-extraction failure modes — same pattern could rescue eps_basic / eps_diluted for dimensional filers, but isn't needed today (the affected tickers' EPS extraction goes through a different path that handles dimensions differently).
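
If the cache follow-up is ever picked up, one possible shape is sketched below. Everything here is an assumption for illustration: the function name, the JSON layout and the placeholder CIK and accession number are hypothetical, and only the (cik, accession_no) key and the no-expiry idea come from the bullet above.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Optional


def cached_shares(
    cache_dir: Path,
    cik: str,
    accession_no: str,
    fetch: Callable[[], Optional[float]],
) -> Optional[float]:
    """Return the cached share count for one filing, fetching it at most once."""
    path = cache_dir / cik / f"{accession_no}.json"
    if path.exists():
        # Filed XBRL is immutable, so a hit never needs revalidation.
        return json.loads(path.read_text())["shares"]
    shares = fetch()
    if shares is not None:  # do not cache failures; they may be transient
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps({"shares": shares}))
    return shares


calls = []


def fake_fetch() -> float:
    calls.append(1)
    return 172_196_000


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = cached_shares(root, "0000000000", "0000000000-00-000000", fake_fetch)
    second = cached_shares(root, "0000000000", "0000000000-00-000000", fake_fetch)
```

The second call is served from disk, so `fake_fetch` runs once.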

https://claude.ai/code/session_01HHo4UHKc9iKKytkKfxfVnA


Generated by Claude Code

…shares_outstanding (closes #176)

Issue #176 ships in two PRs:
  - PR #181 (visibility) — annotate `share_count_extraction_missing`
    surfaces tickers where shares_outstanding extraction fails.
  - This PR (root cause) — actually recovers the missing share count.

Live SEC probe (2026-05-21) confirmed STZ files
`dei:EntityCommonStockSharesOutstanding` only with Class A / Class B
share-class dimensions. The SEC `companyfacts` aggregate API filters
out dimensional facts (companyconcept API returns HTTP 404 on STZ for
both `dei:EntityCommonStockSharesOutstanding` and
`us-gaap:CommonStockSharesOutstanding`), so the primary extraction
path via `Company.facts.get_concept` returns None even though revenue
+ balance sheet extract cleanly.

New `_fetch_shares_from_per_filing_xbrl(company)` pulls the most
recent 10-K (falls back to 10-Q if none on file), aggregates
`dei:EntityCommonStockSharesOutstanding` across all dimensional
contexts at the most-recent `period_instant` (share-count facts are
instant-type, not flow-type), and returns the sum. Falls back to
`us-gaap:CommonStockSharesIssued` if the dei concept is empty.

Wrapped in graceful-degradation try/except — any failure returns
None and the upstream PR-#181 annotate keeps firing as the safety
net. Triggered ONLY when the primary extraction returns None AND
revenue > 0 AND total_assets > 0 (the PR-#181 signature), bounding
universe-wide HTTP cost to ~1-3 tickers per cron (blast radius = 1
on the 2026-05-14 baseline).

Live verification:
  STZ:  172.20M shares (Class A 172.17M + Class B 26K) ✓
  AAPL: 14.78B  ✓
  WMT:  7.97B   ✓

Tests 1040 → 1049: +9 offline mock tests covering positive
STZ-signature aggregation, most-recent period_instant filter,
us-gaap fallback chain, six graceful-degradation paths, plus 1
@network STZ live drift-detector (locks the period_instant column
+ get_facts_by_concept shape against future edgartools API drift).

No schema change — operates at the snapshot-construction layer.

https://claude.ai/code/session_01HHo4UHKc9iKKytkKfxfVnA
@vercel

vercel Bot commented May 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| quantrank | Ready | Preview, Comment | May 21, 2026 4:15pm |

@dackclup dackclup marked this pull request as ready for review May 21, 2026 16:20
@dackclup dackclup merged commit a612901 into main May 21, 2026
4 checks passed
@dackclup dackclup deleted the claude/per-filing-xbrl-shares-fallback-i176 branch May 21, 2026 16:20
dackclup added a commit that referenced this pull request May 23, 2026
…ing (#220)

Two cron-#3 audit follow-ups (2026-05-23) folded into one PR. Both
surfaced by stock-detail-auditor + root-caused by edgar-debugger.

## Bug 1 — DD eps_diluted XBRL single-period mis-parse

DD shipped 2026-05-23 with `eps_diluted=0.39` against `NI/shares =
$7M/410M = $0.017` (~23× off). Root cause: `compute/main.py`
`_build_raw_metrics` was passing `snapshot.eps_diluted` raw to
RawMetrics. The XBRL `EarningsPerShareDiluted` concept returns the
**latest single-period value** per `fundamentals.py:114-117` — for a
quarterly filer that's one quarter's EPS, not TTM.

`pe_ratio_ttm` was already on the NI/shares path since audit #6 /
PR #49, so the valuation chain held internal consistency — but the
EPS display field on /stock/DD rendered the wrong number to users.

Fix: compute `ttm_eps = NI / shares` once and use for both
`eps_basic` + `eps_diluted` display fields. The basic-vs-diluted
spread on the S&P 500 is typically < 1-3% — within display
precision. Negative net_income preserves sign (loss-year tickers
show "−$0.42 EPS" not "null"). `pe_ratio_ttm` formula unchanged.
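
A minimal sketch of the derived display EPS described above, using a hypothetical helper name. The guard for missing or zero shares mirrors the test names listed further down rather than the actual `_build_raw_metrics` code.

```python
from typing import Optional


def ttm_eps(net_income: Optional[float], shares: Optional[float]) -> Optional[float]:
    """Derive display EPS as NI / shares; None when shares are missing or non-positive."""
    if net_income is None or shares is None or shares <= 0:
        return None
    return net_income / shares  # a negative net_income keeps its sign


dd_eps = ttm_eps(7_000_000, 410_000_000)  # the DD case: about $0.017, not 0.39
loss_eps = ttm_eps(-42_000_000, 100_000_000)  # loss year: about -0.42
```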

## Bug 2 — STZ shares_outstanding fallback silent failure

STZ on 2026-05-23 ran with `shares_outstanding=null` + `market_cap=
null` despite PR #182's per-filing XBRL fallback. 2026-05-21 live
SEC probe confirmed the fallback works (returns 172.20M). Two days
later under cron load it returned None silently.

Root cause: PR #182's outer `except: return None` was bare — no
log line on the failure path. The operator couldn't distinguish
transient SEC 429 from structural XBRL drift without re-running a
live probe. PR-3d amplification pattern parallel: graceful
degradation correct, but silence kills observability.

Fix:
- Thread `ticker` arg into `_fetch_shares_from_per_filing_xbrl`
  (optional kwarg, back-compat preserved — existing 8 offline tests
  still pass without the kwarg)
- Emit `logger.warning("shares_outstanding fallback FAILED for %s
  — %s: %s", ticker, type(e).__name__, e)` on the outer except;
  the message includes a note that `share_count_extraction_missing`
  annotate will fire as the safety net
- Inner two `except` blocks (filings.head() + get_facts_by_concept)
  log at DEBUG so the failure mode is distinguishable in verbose
  mode without spamming default-level logs

Annotate `share_count_extraction_missing` (PR #181) keeps firing
upstream — this PR is observability-only, not a new recovery path.
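
The observability change can be illustrated with the sketch below. The wrapper name, logger name and exact message wording are assumptions; what it takes from this commit is the optional `ticker` kwarg, the "?" sentinel, and a WARNING that names the exception type.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("fundamentals_fallback_sketch")  # hypothetical logger name


def fetch_shares_logged(
    fetch: Callable[[], Optional[float]],
    ticker: Optional[str] = None,
) -> Optional[float]:
    """Run the fallback fetch; on failure, log why and degrade to None."""
    label = ticker or "?"  # back-compat: callers that pass no ticker still work
    try:
        return fetch()
    except Exception as e:
        logger.warning(
            "shares_outstanding fallback FAILED for %s (%s: %s); "
            "share_count_extraction_missing annotate will fire",
            label, type(e).__name__, e,
        )
        return None


class ListHandler(logging.Handler):
    """Collect formatted messages so the example can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)


def boom() -> float:
    raise TimeoutError("SEC 429")


result = fetch_shares_logged(boom)
```

With the exception type in the message, a transient rate limit and a structural XBRL change produce visibly different log lines without a live re-probe.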

## Tests (+7)

`tests/test_main.py` (+5):
- `test_build_raw_metrics_eps_diluted_derived_from_ni_not_xbrl_singleperiod`
  pins the DD case: snap.eps_diluted=0.39 + NI=7M + shares=410M →
  RawMetrics.eps_diluted = 7M/410M ≈ $0.017 (NOT 0.39)
- `test_build_raw_metrics_eps_preserves_negative_sign_on_loss_year`
  loss-year shows signed EPS, pe_ttm null
- `test_build_raw_metrics_eps_null_when_shares_outstanding_missing`
  STZ regression case — eps fields null, no exception
- `test_build_raw_metrics_eps_null_when_zero_shares` defensive guard
- `test_build_raw_metrics_pe_ttm_unchanged_by_eps_fix` audit-#6 /
  PR #49 regression guard — pe_ttm logic preserved

`tests/test_ingest/test_fundamentals_xbrl_fallback.py` (+2):
- `test_per_filing_fallback_emits_warning_on_outer_except` pins the
  logger.warning emission with caplog when get_filings raises
- `test_per_filing_fallback_ticker_arg_optional_for_back_compat`
  pins the back-compat path — no ticker kwarg → "?" sentinel in the
  log message, existing call sites unbroken

Tests 1049 → 1056 (+7). Pre-existing optional-deps skips
(ipca / qlib / OSAP — `.[factors]` extra) unaffected.

## Verification

- `ruff check .` — clean
- `python -m pytest tests/ -m "not network" --ignore=tests/test_features/test_osap_e2e_integration.py --ignore=tests/test_ingest/test_osap.py` —
  **1103 passed, 7 skipped, 20 deselected**
- No schema / Pydantic / TypeScript / snapshot triple touch
- No frontend touch

## Issues filed in parallel (cron-#3 audit follow-ups)

- **#217** stock-detail-auditor factor-exposure proxy heuristic
  (OSAP false-positive prevention)
- **#218** verify-helper Section L OSAP invariant assertion

## Cross-references

- Issue #176 / PR #181 / PR #182 — STZ shares_outstanding recovery
  ladder this PR completes
- Audit #6 / PR #49 — `pe_ratio_ttm` NI/shares pattern this PR
  extends to `eps_diluted` / `eps_basic` display fields
- 2026-05-23 cron #3 stock-detail-auditor + edgar-debugger reports

Co-authored-by: Claude <noreply@anthropic.com>
dackclup pushed a commit that referenced this pull request May 25, 2026
12 tests for the trigger-decision branch added by the previous commit
(cd9cbd9). Suite 1171 → 1182 (+11 offline + 1 @network gated).

Coverage:
- Boundary + branch (6): ERIE shape (2542 fires) · plausible 100M (no
  fire) · None backward-compat (PR #182 STZ path) · strict-< floor at
  99_999/100_000/100_001
- Gate preservation (2): too-low + revenue=0 blocks · too-low +
  total_assets=0 blocks (PR #182 invariants preserved)
- Logging (1): caplog distinguishes primary=None vs primary=2542 for
  operator visibility
- Hypothesis property (1): @given(st.one_of(st.none(),
  st.integers(0, 10M))) — fallback fires iff primary is None or
  primary < 100_000; no @settings(deadline=None) per CLAUDE.md
- Config pin (1): MIN_PLAUSIBLE_SHARE_COUNT == 100_000 drift detector
- @network (1): live ERIE probe, recovered shares ∈ (50M, 65M) —
  mirrors STZ/AAPL/WMT @network pins from PR #182

Verified: pytest tests/test_ingest/test_fundamentals.py -m "not network"
→ 11 passed, 1 deselected, 0 failures. ruff clean.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4
dackclup pushed a commit that referenced this pull request May 25, 2026
…ausibly-low primary extraction (#246)

Closes #246. ERIE was returning shares_outstanding=2542 (real ~57M)
because SEC companyfacts aggregate API filters out dimensional facts;
ERIE files Class A (~54.9M) only with dimension, Class B (~2,541)
without, so aggregate returns Class B only. PR #182 STZ fallback
existed but trigger was strict `shares is None` so ERIE's non-None
2,542 slipped past. This PR extends the trigger to also fire when
shares < MIN_PLAUSIBLE_SHARE_COUNT (100K — 30× safer than any
plausible S&P 500 minimum share count).
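
The extended trigger can be sketched as follows. The function name is hypothetical; the constant, the strict `<` comparison and the revenue and total_assets gate come from this commit and its tests.

```python
from typing import Optional

MIN_PLAUSIBLE_SHARE_COUNT = 100_000


def should_try_fallback(
    primary: Optional[float],
    revenue: Optional[float],
    total_assets: Optional[float],
) -> bool:
    """Fire when the primary share count is missing or implausibly low."""
    gate = (
        revenue is not None and revenue > 0
        and total_assets is not None and total_assets > 0
    )
    implausible = primary is None or primary < MIN_PLAUSIBLE_SHARE_COUNT
    return gate and implausible
```

ERIE's 2,542 now fires the fallback, while a value exactly at the floor does not.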

Production changes:
- compute/config.py: new MIN_PLAUSIBLE_SHARE_COUNT = 100_000 constant
- compute/ingest/fundamentals.py: trigger extended `is None OR < 100K`;
  logger surfaces primary=<None|count> for operator visibility
- PHASE_STATUS_INFLIGHT.md: in-flight entry per §Conventions

Tests follow in a subsequent commit (test-engineer spawn in flight).

Side-effect coverage:
- BRK-B (1.64M, above threshold): not fixed here (Class A→B 1500:1
  conversion is separate methodology call); existing DQ veto stays
- V/FOXA/NWS/NWSA: above threshold, covered by issue #248 PR2a/PR2b
  per methodology-scientist 2026-05-25 verdict

Backward-compat: ERIE risk_flags[data_quality_input_corruption] will
stop firing post-fix (shares becomes plausible → TBVPS gate doesn't
trigger). No existing test pins ERIE's veto state.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4
dackclup pushed a commit that referenced this pull request May 25, 2026
… (V/NWS/NWSA/FOX/FOXA/BRK-B/STZ)

Closes the V/Visa root-cause undercount surfaced by PR2a (PR #256)
cross-source observability — SEC `companyfacts` aggregate API filters
dimensional facts, so multi-class issuers' primary `shares_outstanding`
value is mechanically incomplete (Class A only).

Locked across 6 grill-me + 3 sub-agent verdicts:

- **methodology-scientist** (Mode B) — Damodaran 2019 *Investment
  Valuation* 3rd ed. Ch. 16: total common shares outstanding = SUM
  across all classes; voting-premium discount applies to PRICE only.
  Summed dimensional value is definitionally the truth (0% delta
  threshold, NOT 10% as originally proposed — a 10% gate would
  suppress the truth for any filer with minor classes < 10%).
- **performance-engineer** (Mode B) — universal peek-XBRL adds ~5 min
  wall-clock and breaches the <5 min warm-cache budget (current p95
  fundamentals latency 16.25s already over the 15s warn threshold).
  Allowlist limits peek to 7 tickers → ~5s wall-clock total.
- **edgar-debugger** (Mode B) — verified allowlist completeness via
  EPS cross-check on production output: V (4.5x), NWS/NWSA (1.56x),
  FOX/FOXA (2.2x), BRK-B (1300x improvement; ~14% residual Class A
  1500x weighting deferred to Q3 2026-08-19 cohort audit), STZ
  (existing None-trigger path). GOOG/GOOGL excluded (file
  non-dimensionally; companyfacts works).

**Code changes**:
- `compute/config.py` — SCHEMA_VERSION 0.10.3 → 0.10.4-phase4.5e;
  `MULTI_CLASS_SHARE_ALLOWLIST: frozenset[str]` (7 tickers) with
  Damodaran provenance docstring + BRK-B caveat + expansion-gate
  procedure.
- `compute/ingest/fundamentals.py` — `_FALLBACK_STATS` gains
  `dimensional_override` counter; new `elif` branch in `_build_snapshot`
  fires for allowlist tickers with plausible primary; reuses
  existing `_fetch_shares_from_per_filing_xbrl` (PR #182 STZ path);
  overrides when `summed > primary`.
- `compute/main.py` — read + log + wire new counter to Metadata.
- `compute/output/schemas.py` + `frontend/lib/types.ts` +
  `frontend/lib/schema-snapshot.json` — new
  `Metadata.shares_fallback_dimensional_override_count: int | None`
  (additive, nullable; PATCH bump).
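
A rough sketch of the override decision, assuming the summed per-filing value has already been fetched. The function name and the share counts in the usage lines are illustrative; the seven tickers are the ones named in this commit's title.

```python
from typing import Optional

MULTI_CLASS_SHARE_ALLOWLIST = frozenset(
    {"V", "NWS", "NWSA", "FOX", "FOXA", "BRK-B", "STZ"}
)


def apply_dimensional_override(
    ticker: str,
    primary: Optional[float],
    summed: Optional[float],
    stats: dict,
) -> Optional[float]:
    """Replace an undercounted primary with the summed per-class value."""
    if ticker not in MULTI_CLASS_SHARE_ALLOWLIST:
        return primary
    if primary is None or summed is None:
        return primary  # the None case belongs to the earlier None-trigger path
    if summed > primary:
        stats["dimensional_override"] += 1
        return summed
    return primary


stats = {"dimensional_override": 0}
overridden = apply_dimensional_override("V", 1_000, 4_500, stats)  # made-up counts
untouched = apply_dimensional_override("AAPL", 1_000, 4_500, stats)
```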

**Behavior delta**:
- 7 allowlist tickers see corrected `shares_outstanding` →
  `market_cap` / `pe_ratio_ttm` / fair-price ensemble update
  (some Top-N rank moves likely)
- 495 other S&P 500 tickers: ZERO change (composite / risk_flags /
  fair_price / top5 rotation unchanged)
- STZ unchanged (already on None-trigger path)

**Test pin**: `tests/test_config.py` — schema version pin update +
new `test_multi_class_share_allowlist_membership` (catches accidental
allowlist mutation). 7 new offline test cases for the dimensional
override path land in tests/test_ingest/test_fundamentals.py in a
follow-up commit (test-engineer sub-agent writing in parallel).

Expected Metadata fingerprint (post first cron):
- `shares_fallback_dimensional_override_count` ≈ 6
  (V + NWS + NWSA + FOX + FOXA + BRK-B; STZ captured by None-trigger
  path first)

Refs Issue #248, supersedes PR #256's 0.10.3 schema bump.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4
dackclup added a commit that referenced this pull request May 26, 2026
… fix for GOOG/GOOGL $4.6T overcount) (#269)

Closes the structural half of issue #261 — the OVERCOUNT pattern
where SEC companyfacts returns Alphabet's 12.12B total shares to
both per-class tickers, producing $4.6T market_cap per ticker vs
real ~$1.05T per class. PR-A (PR #264, merged) shipped the
multi_class_aggregate_shares_suspected annotate; this PR-B ships
the actual fix.

Per methodology-scientist Mode B 2026-05-26 (Path 1 reverse-
allowlist) + edgar-debugger live probe (Alphabet 10-K accession
0001652044-26-000018).

compute/config.py:
- New MULTI_CLASS_OVERCOUNT_ALLOWLIST: dict[str, str]:
  - GOOGL → "us-gaap:CommonClassAMember" (standard namespace)
  - GOOG  → "goog:CapitalClassCMember" (FILER-SPECIFIC namespace
    gotcha — caught by edgar-debugger probe)
- SCHEMA_VERSION bump 0.10.5 → 0.10.6-phase4.5e

compute/ingest/fundamentals.py:
- Extended _fetch_shares_from_per_filing_xbrl with target_class_member
  parameter (None=sum-all PR #182 STZ pattern; set=filter to specific
  class member via xbrl.contexts[ref].dimensions lookup)
- New elif branch in _build_snapshot fires when ticker in allowlist +
  primary plausible + QR_SKIP_FUNDAMENTALS not set; overrides primary
  IFF per_class < primary; defensive mc_reconcile_failure counter
  for sanity-check failures (per_class >= primary OR per_class fraction
  outside 5-95% of primary)
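
One reading of the override and sanity-check rules above is sketched below. It assumes `per_class` has already been filtered to the allowlisted class member, and it follows the wording literally: override whenever `per_class < primary`, and separately count a reconcile failure when `per_class >= primary` or the fraction falls outside 5-95%. The function name and stats keys are hypothetical, and the real branch may handle the out-of-range case differently.

```python
from typing import Optional

MULTI_CLASS_OVERCOUNT_ALLOWLIST = {
    "GOOGL": "us-gaap:CommonClassAMember",
    "GOOG": "goog:CapitalClassCMember",
}


def reconcile_per_class(
    ticker: str,
    primary: Optional[float],
    per_class: Optional[float],
    stats: dict,
) -> Optional[float]:
    """Swap an aggregate share count for the per-class count when it passes sanity checks."""
    if ticker not in MULTI_CLASS_OVERCOUNT_ALLOWLIST:
        return primary
    if primary is None or primary <= 0 or per_class is None:
        return primary  # nothing to reconcile; skipped silently
    fraction = per_class / primary
    if per_class >= primary or not (0.05 <= fraction <= 0.95):
        stats["mc_reconcile_failure"] += 1
    if per_class < primary:
        stats["per_class_override"] += 1
        return per_class
    return primary


stats = {"per_class_override": 0, "mc_reconcile_failure": 0}
# Rounded figures from this commit message: 12.12B aggregate, ~5.4B for GOOG.
goog = reconcile_per_class("GOOG", 12_120_000_000, 5_400_000_000, stats)
```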

compute/output/schemas.py + frontend/lib/types.ts + snapshot:
- Triple lockstep — 2 additive Metadata fields:
  multi_class_per_class_override_count (expected steady-state ≈ 2)
  multi_class_mc_reconcile_failure_count (defensive Rule-18 guard)

compute/main.py — wire counters from _FALLBACK_STATS

Tests +9 (1216 → 1225):
- test_config.py: schema pin update, allowlist membership pin,
  disjoint-allowlist invariant test
- test_ingest/test_fundamentals.py: GOOG override (filer-namespace),
  GOOGL override (standard namespace), non-allowlist ticker doesn't
  fire, QR_SKIP_FUNDAMENTALS escape-hatch, per_class >= primary
  sanity skip, mc_reconcile warning on <5% fraction, None return
  silently skipped. Plus 3 existing _FALLBACK_STATS tests updated
  to new 5-key dict shape (was 3 keys).

ZERO behavior change for 500 non-allowlist tickers. 2 allowlist
tickers (GOOG/GOOGL) gain corrected shares_outstanding (~5.4B/~5.8B
from prior 12.12B overcount); flows through to market_cap (~$4.6T
→ ~$1.05T per class), pe_ratio_ttm, fair-price ensemble. The
multi_class_aggregate_shares_suspected annotate (PR-A) continues
to fire correctly (CIK collision invariant holds).

Verification:
- ruff clean
- python -m compute.output.schema_check — triple in sync at 0.10.6
- pytest 1225 passed (offline), 7 skipped (factors extras), 24
  deselected (@network — GOOG/GOOGL live drift-detector deferred)

PHASE_STATUS_INFLIGHT.md side-file satisfies §Conventions lockstep
per PR #237 convention.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4

Co-authored-by: Claude <noreply@anthropic.com>

2 participants