feat(scoring): Issue #261 PR-A — multi_class_aggregate_shares_suspected annotate (CIK-collision detector) #264

Merged
dackclup merged 2 commits into main from claude/issue-261-multi-class-annotate on May 26, 2026

Conversation

@dackclup
Owner

Summary

Closes the observability half of issue #261 — GOOG/GOOGL multi-class shares overcount where both tickers store Alphabet's 12.12B total shares (companyfacts aggregate-only filer), producing $4.6T market_cap per ticker vs real ~$1.05T per class.

Per methodology-scientist Mode B verdict 2026-05-26 (NEEDS-MORE-CALIBRATION):

  • PR-A (this PR) — annotate-only multi_class_aggregate_shares_suspected (CIK-collision detector)
  • PR-B (next PR) — reverse-allowlist per-class XBRL extraction (structural fix). Gated on the edgar-debugger probe that confirmed per-class dimensional contexts are XBRL-available.

Detector

New compute/scoring/multi_class_shares.py exports a pure universe-level function:

detect_multi_class_aggregate_shares_suspected(
    cik_by_ticker: dict[str, str | None],
    market_cap_by_ticker: dict[str, float | None],
) -> set[str]

Trigger (methodology-scientist Mode B):

  1. Ticker's CIK collides with ≥ 1 other ticker in the universe (CIK-collision signature of a multi-class filer)
  2. Ticker's market_cap > MARKET_CAP_FLOOR_RATIO × universe-median(market_cap) (MARKET_CAP_FLOOR_RATIO = 0.10)

Annotate-only per portable-annotate-before-veto; composite rank unchanged.
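
The two-part trigger above can be sketched as follows. This is a minimal illustration under stated assumptions, not the shipped module: the signature and the 0.10 ratio come from this PR, while the handling of None values in the median (they are ignored) is an assumption.

```python
from __future__ import annotations

from statistics import median

MARKET_CAP_FLOOR_RATIO = 0.10  # value stated in this PR


def detect_multi_class_aggregate_shares_suspected(
    cik_by_ticker: dict[str, str | None],
    market_cap_by_ticker: dict[str, float | None],
) -> set[str]:
    # Condition 1: group tickers by CIK; tickers with no CIK never collide.
    tickers_by_cik: dict[str, list[str]] = {}
    for ticker, cik in cik_by_ticker.items():
        if cik is not None:
            tickers_by_cik.setdefault(cik, []).append(ticker)
    colliding = {
        ticker
        for group in tickers_by_cik.values()
        if len(group) > 1
        for ticker in group
    }

    # Assumption: the universe median skips None market caps.
    known_caps = [mc for mc in market_cap_by_ticker.values() if mc is not None]
    if not known_caps:
        return set()  # nothing to compare against
    floor = MARKET_CAP_FLOOR_RATIO * median(known_caps)

    # Condition 2: strictly above the floor (a ticker exactly at the floor is excluded).
    flagged: set[str] = set()
    for ticker in colliding:
        market_cap = market_cap_by_ticker.get(ticker)
        if market_cap is not None and market_cap > floor:
            flagged.add(ticker)
    return flagged
```

Because the result is built only from the colliding set, the flagged set is always a subset of the collision set, which is the property the Hypothesis test guards.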

Schema bump

0.10.4-phase4.5e → 0.10.5-phase4.5e (PATCH; one additive Metadata field). Triple lockstep:

  • compute/output/schemas.py: Metadata.multi_class_aggregate_shares_suspected_count: int | None
  • frontend/lib/types.ts — TypeScript mirror
  • frontend/lib/schema-snapshot.json — regenerated

Wiring in compute/main.py

  • Pre-loop: build cik_by_ticker + market_cap_by_ticker dicts (universe-level scan needs full data upfront)
  • Inside loop: emit multi_class_aggregate_shares_suspected to valuation_warnings when ticker in flagged set; increment counter
  • Post-loop: wire counter to Metadata(...)
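
The pre-loop / in-loop / post-loop shape can be sketched as below. The record layout, the `run_scoring_loop` name, and the plain-dict stand-in for Metadata are hypothetical; only the flag name, the `valuation_warnings` destination, and the counter field name come from this PR.

```python
from __future__ import annotations

from typing import Callable

FLAG = "multi_class_aggregate_shares_suspected"


def run_scoring_loop(
    records: list[dict],
    detector: Callable[[dict, dict], set[str]],
) -> tuple[dict[str, dict], dict[str, int]]:
    # Pre-loop: the detector is universe-level, so both dicts are built up front.
    cik_by_ticker = {r["ticker"]: r.get("cik") for r in records}
    market_cap_by_ticker = {r["ticker"]: r.get("market_cap") for r in records}
    flagged = detector(cik_by_ticker, market_cap_by_ticker)

    suspected_count = 0
    outputs: dict[str, dict] = {}
    for record in records:
        valuation_warnings: list[str] = []
        # Inside loop: annotate only; nothing here touches the composite rank.
        if record["ticker"] in flagged:
            valuation_warnings.append(FLAG)
            suspected_count += 1
        outputs[record["ticker"]] = {"valuation_warnings": valuation_warnings}

    # Post-loop: the counter lands in the metadata payload (a dict stands in for Metadata).
    metadata = {"multi_class_aggregate_shares_suspected_count": suspected_count}
    return outputs, metadata
```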

Tests (1196 → 1216 passing, +13 new + 1 updated)

tests/test_scoring/test_multi_class_shares.py:

  1. MARKET_CAP_FLOOR_RATIO constant pin
  2. Empty universe → empty set
  3. No-collision universe → empty set
  4. Canonical GOOG/GOOGL case → both fire
  5. Collision below floor (micro-class artifact) → empty set
  6. Partial above-floor → only above-floor sibling fires
  7. None CIK → excluded from collision detection
  8. None market_cap → excluded from firing set
  9. All-None market_caps → empty set (no divide-by-zero)
  10. Three-way collision → all fire
  11. Threshold strict-inequality boundary (at-floor excluded by >)
  12. Hypothesis property: firing set ⊆ collision set (regression guard)

tests/test_config.py::test_schema_version_is_phase4_5e updated 0.10.4 → 0.10.5-phase4.5e.

Edgar-debugger findings (live probe, 2026-05-26)

Filing inspected: Alphabet 10-K accession 0001652044-26-000018, FY2025.

VERDICT: PER-CLASS-AVAILABLE-IN-XBRL

Per-class share counts ARE present as dimensional facts on us-gaap:CommonStockSharesOutstanding at the balance-sheet date:

| StatementClassOfStockAxis member | Shares | Maps to |
| --- | --- | --- |
| us-gaap:CommonClassAMember | 5.822B | GOOGL |
| us-gaap:CommonClassBMember | 0.837B | (founders, not traded) |
| goog:CapitalClassCMember | 5.429B | GOOG |

Per-class sum = 5.822B + 0.837B + 5.429B = 12.088B, within about 0.3% of the 12.12B aggregate (reconciles per the Damodaran 2019 Ch. 16 Σ MC_class = MC_total identity).

CRITICAL gotcha for PR-B: GOOG Class C uses the filer-specific namespace goog:CapitalClassCMember, NOT the standard us-gaap:CommonClassCMember. An allowlist keyed on the standard namespace would silently return zero rows for GOOG. PR-B's allowlist will need:

MULTI_CLASS_SHARE_PER_CLASS_ALLOWLIST = {
    "GOOGL": "us-gaap:CommonClassAMember",   # standard
    "GOOG":  "goog:CapitalClassCMember",      # filer-namespace — gotcha
}
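
To make the namespace gotcha concrete, here is a small sketch of a member lookup over the probed facts. The `PROBED_FACTS` list and `shares_for_member` helper are hypothetical illustrations; the member names and share counts (in billions) are the ones reported by the probe above.

```python
from __future__ import annotations

MULTI_CLASS_SHARE_PER_CLASS_ALLOWLIST = {
    "GOOGL": "us-gaap:CommonClassAMember",  # standard namespace
    "GOOG": "goog:CapitalClassCMember",     # filer-specific namespace
}

# (member, shares in billions) as reported by the edgar-debugger probe.
PROBED_FACTS = [
    ("us-gaap:CommonClassAMember", 5.822),
    ("us-gaap:CommonClassBMember", 0.837),
    ("goog:CapitalClassCMember", 5.429),
]


def shares_for_member(facts: list[tuple[str, float]], member: str) -> float | None:
    # Exact string match on the dimension member; no namespace normalization.
    for fact_member, shares in facts:
        if fact_member == member:
            return shares
    return None


goog_shares = shares_for_member(PROBED_FACTS, MULTI_CLASS_SHARE_PER_CLASS_ALLOWLIST["GOOG"])
# Keying on the standard namespace finds nothing for Class C:
wrong_key = shares_for_member(PROBED_FACTS, "us-gaap:CommonClassCMember")
```

An exact-match lookup fails silently on the wrong namespace, which is why the allowlist stores the full member name per ticker rather than deriving it from the class letter.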

Behavior impact

ZERO behavior change for 496 non-colliding S&P 500 tickers — composite / risk_flags / fair_price / top5 rotation unchanged. The 6 multi-class tickers (GOOG / GOOGL / NWS / NWSA / FOX / FOXA) gain the new annotate in valuation_warnings; composite rank unaffected.

Expected Metadata fingerprint post first cron:

  • multi_class_aggregate_shares_suspected_count ≈ 6

Verification ladder

  • ruff check . — All checks passed
  • python -m compute.output.schema_check — clean (triple in sync)
  • pytest tests/ -m "not network": 1216 passed, 7 skipped (ipca/qlib not installed), 24 deselected (@network)
  • Vercel preview (auto-deploys post-push)
  • CI: full lint+test+schema+frontend gauntlet

Deferred follow-ups (not in this PR)

  • PR-B — reverse-allowlist per-class XBRL extraction (structural fix). Code shape proposed by edgar-debugger; allowlist must use filer-namespace goog: member for GOOG.
  • Combined-MC reconcile invariant as Rule-18 diagnostic (|Σ MC_per_class − MC_aggregate| / MC_aggregate < 0.05) — methodology-scientist Mode B Q3 suggestion.
  • Q3 2026-08-19 quarterly audit: walk the multi_class_aggregate_shares_suspected_count history; decide whether to retire after ≥ 2 crons of clean reconcile + recalibrate the 10% floor.
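
The proposed reconcile invariant is a relative-gap check. A minimal sketch, assuming a hypothetical `mc_reconcile_ok` helper and treating a non-positive aggregate as a failure (that edge-case choice is an assumption):

```python
from __future__ import annotations


def mc_reconcile_ok(
    per_class_values: list[float],
    aggregate: float,
    tolerance: float = 0.05,
) -> bool:
    # |sum(per_class) - aggregate| / aggregate < tolerance
    if aggregate <= 0:
        return False  # assumption: no meaningful ratio without a positive aggregate
    relative_gap = abs(sum(per_class_values) - aggregate) / aggregate
    return relative_gap < tolerance


# Illustration using the probed share counts (billions) in place of market caps.
alphabet_ok = mc_reconcile_ok([5.822, 0.837, 5.429], 12.12)
```

The same function applies unchanged to per-class market caps once PR-B supplies them; share counts are used here only because they are the figures available in this PR.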

Lockstep

Test plan

  • CI: Python lint+test should pass (513+ test files; this PR adds 1 new test file)
  • CI: Frontend build should pass (single additive Metadata field; no consumer migration)
  • Verify on Vercel preview: /stock/GOOG and /stock/GOOGL should now show multi_class_aggregate_shares_suspected in valuation_warnings on the next cron-#4

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4


Generated by Claude Code

…ed annotate (CIK-collision detector)

Closes the observability half of issue #261 (GOOG/GOOGL multi-class
shares overcount; $4.6T market_cap displayed for ~$1T per class).
Methodology-scientist Mode B verdict 2026-05-26: NEEDS-MORE-CALIBRATION
— split into PR-A (annotate-only, this PR) + PR-B (reverse-allowlist
per-class XBRL extraction, next PR).

New module compute/scoring/multi_class_shares.py:
- detect_multi_class_aggregate_shares_suspected(cik_by_ticker,
  market_cap_by_ticker) -> set[str]
- Trigger: ticker's CIK collides with ≥ 1 other ticker AND
  market_cap > 10% × universe-median(market_cap)
- MARKET_CAP_FLOOR_RATIO = 0.10 (Damodaran 2019 Ch. 16 + Mode B
  verdict; recalibration target Q3 2026-08-19)
- Pure function; graceful on None CIK / None market_cap inputs

Wired into compute/main.py:
- Pre-compute cik_by_ticker + market_cap_by_ticker dicts BEFORE
  the per-ticker scoring loop (universe-level scan)
- Per-ticker emit inside loop alongside cross_source_disagreement
- Counter wired to Metadata.multi_class_aggregate_shares_suspected_count

Schema bump 0.10.4 → 0.10.5-phase4.5e (PATCH; additive Metadata
field). Triple lockstep: schemas.py + types.ts + snapshot.json.

ZERO behavior change for 496 non-colliding S&P 500 tickers —
composite / risk_flags / fair_price / top5 unchanged. 6 multi-
class tickers (GOOG / GOOGL / NWS / NWSA / FOX / FOXA) gain the
annotate in valuation_warnings.

Edgar-debugger 2026-05-26 live probe (Alphabet 10-K accession
0001652044-26-000018) confirms per-class dimensional contexts ARE
available in XBRL for PR-B structural fix. Critical gotcha for PR-B:
GOOG Class C uses filer-specific namespace `goog:CapitalClassCMember`
NOT standard `us-gaap:CommonClassCMember` — allowlist must key on
filer-namespace member.

Tests: 1216 passing offline (+13 new in test_multi_class_shares.py:
empty universe / no-collision / canonical GOOG-GOOGL / micro-class
below-floor / partial-above-floor / None-CIK / None-mcap / all-None /
3-way collision / threshold-boundary / Hypothesis subset property +
test_config.py schema version pin updated).

Verification:
- ruff check . — All checks passed
- python -m compute.output.schema_check — clean
- pytest tests/ -m "not network" — 1216 passed, 7 skipped (ipca/qlib
  not installed), 24 deselected

PHASE_STATUS_INFLIGHT.md side-file entry satisfies §Conventions
"ship with every PR" lockstep per PR #237 convention. No CLAUDE.md
/ AGENTS.md substance change — the annotate doesn't introduce a new
invariant; the pattern is already covered in §Gotchas under
shares-extraction.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4
@vercel

vercel Bot commented May 26, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| quantrank | Ready | Preview, Comment | May 26, 2026 1:26am |

CI ci-triage-engineer verdict: ruff I001 fired in CI on
tests/test_scoring/test_multi_class_shares.py:14 because
`from hypothesis import given, strategies as st` was joined as
one statement, which ruff 0.4 splits + reorders to two separate
`from hypothesis import ...` lines per the project's isort config.

Local `ruff check .` passed before push (likely cached/diff scope),
CI runs a clean ruff invocation on the full tree.

Applied via `ruff check --fix tests/test_scoring/test_multi_class_shares.py`
— pure mechanical sort/split, no behavior change. 12 tests still
pass (test_multi_class_shares.py).

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4
@dackclup dackclup marked this pull request as ready for review May 26, 2026 03:07
@dackclup dackclup merged commit d9c6229 into main May 26, 2026
4 of 5 checks passed
@dackclup dackclup deleted the claude/issue-261-multi-class-annotate branch May 26, 2026 03:07
dackclup added a commit that referenced this pull request May 26, 2026
…utput_anomalous + writer-parity for veto cohort UI (#265)

Closes issue #262 (DQIC dual-surface emission inconsistency) per
methodology-scientist Mode B verdict 2026-05-26 (APPROVED-AS-ANNOTATE,
Path 3 = rename).

The bug: data_quality_input_corruption emitted from TWO independent
check sites with different trigger conditions:
- Site 1 (risk_overlay.py:411) — INPUT-level corruption; risk_flags VETO
- Site 2 (ensemble.py:545) — OUTPUT-level anomaly; valuation_warnings ANNOTATE

Site 2's check is strictly broader. Universe scan 2026-05-23 cron #3:
- 2 tickers fire BOTH (ERIE rank 69, BRK-B rank 223)
- 4 tickers fire Site 1 ONLY (MTB/CPT/MRNA/HBAN) — UI explainability gap
- 1 ticker fires Site 2 ONLY (NVR rank 267) — Top-5-safety gap if rose

The bigger smell than NVR Top-5 risk is the UI gap: FairPriceCard.tsx:82
reads only valuation_warnings, so MTB/CPT/MRNA/HBAN render all-null
fair-price with NO explanation chip.

Path 3 fix:
- Rename Site 2 (ensemble.py:533+545) → valuation_output_anomalous
- Writer-parity in compute/main.py: when DQIC veto fires in risk_flags,
  ALSO emit valuation_output_anomalous to valuation_warnings (closes
  UI gap for veto-only cohort)
- applicability.py SKIP_REASONS taxonomy gains valuation_output_anomalous
  (count 25 → 26); legacy data_quality_input_corruption retained for
  backward-compat on pre-rename JSON snapshots
- sanity.py:83 IC-smoke exclusion ORs both identifiers
- FairPriceCard.tsx:82 dataQualityIssue check ORs both identifiers
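
A minimal sketch of the writer-parity rule and the OR check described above. The helper names are hypothetical and the real changes live in compute/main.py, sanity.py and FairPriceCard.tsx; only the two identifiers are taken from this commit.

```python
from __future__ import annotations

DQIC = "data_quality_input_corruption"       # Site 1 veto identifier (unchanged)
OUTPUT_ANOMALOUS = "valuation_output_anomalous"  # Site 2 identifier after the rename


def apply_writer_parity(risk_flags: list[str], valuation_warnings: list[str]) -> list[str]:
    # When the veto fires in risk_flags, mirror an annotate into valuation_warnings
    # so a reader of valuation_warnings alone still sees an explanation.
    result = list(valuation_warnings)
    if DQIC in risk_flags and OUTPUT_ANOMALOUS not in result:
        result.append(OUTPUT_ANOMALOUS)
    return result


def has_data_quality_issue(valuation_warnings: list[str]) -> bool:
    # OR over both identifiers keeps pre-rename JSON snapshots working.
    return OUTPUT_ANOMALOUS in valuation_warnings or DQIC in valuation_warnings
```

Returning a new list instead of mutating the input keeps the helper side-effect free, which is a design choice of this sketch rather than something the commit states.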

Tests:
- test_ensemble.py 4 assertions — assert new identifier on Site 2 emit
- test_tier2_schema.py::test_B4_skip_reasons_count → 25 to 26
- test_sanity_smoke.py / test_recommendation.py unchanged
  (legacy-snapshot path verified via OR check; veto identifier
  unchanged)

No schema bump (string-identifier rename only). SCHEMA_VERSION stays
0.10.5-phase4.5e. Triple lockstep unchanged.

ZERO composite-rank impact — composite scores / risk_flags VETO
identifiers / Top-5 rotation unchanged.

Verification: ruff clean / schema_check clean / pytest 1216 passed
(unchanged from PR #264 baseline).

PHASE_STATUS_INFLIGHT.md side-file entry satisfies §Conventions
"ship with every PR" lockstep per PR #237 convention. No CLAUDE.md
/ AGENTS.md substance change — rename doesn't introduce new
invariant; methodology verdict documented in INFLIGHT entry.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 26, 2026
…Craft frontend (#266)

Cuts the v1.3.0-phase4.5e release tag, closing the Phase 4.5e Form-4
insider-clustering ladder (PRs #167+#205+#222+#224+#238) and shipping
the LedgerCraft frontend reskin (A1-A3+B1-B4+animation PRs 1-3+#244
polish+dark-mode tooltip fixes through PR #263) since v1.2.0-phase4.5
(6d414a9, 2026-05-17).

Scope (3 files):
- pyproject.toml — version 0.3.0 → 1.3.0
- docs/release-notes/v1.3.0-phase4.5e.md (NEW) — release body grouped
  by Form-4 cluster / data-quality / defense layer / frontend /
  methodology + agent infra / CI hygiene; ~800 words
- PHASE_STATUS.md — Current state schema 0.10.4 → 0.10.5-phase4.5e,
  defense layer headline 32 → 33 declared flags, production-run
  pointer refreshed to 26423296287

Pre-flight ladder verified by release-captain (opus):
- ruff clean
- pytest 1216 passed (offline)
- schema_check in sync at 0.10.5-phase4.5e
- verify-production-output Section A-G + I-L PASS; Section H 1
  known FAIL (orphan BK.json legacy snapshot, pre-existing)
- frontend build verified via vercel-preview-auditor (sonnet) on
  main HEAD e6013ba — 506/506 routes compiled, types validated,
  runtime clean, 3-route UA probe PASS

Defense scorecard: 7 active vetoes unchanged
(altman_distress / sloan_accruals_top_decile / net_issuance_top_decile
/ non_reliance_filing / beneish_manipulation_veto /
dechow_manipulation_veto / data_quality_input_corruption).
Headline 32 → 33 declared boolean flags
(adds multi_class_aggregate_shares_suspected per PR #264; PR #265
DQIC rename is identifier-shape, not new flag).

Production output: metadata.json reports 0.10.4-phase4.5e from cron
#4 (2026-05-26T01:12); next weekday cron Wed 2026-05-27 22:00 UTC
re-renders at full 0.10.5-phase4.5e semantics. Tag is anchored to
code, not last committed snapshot per release-tag SKILL.md §Gotchas.

CVE baseline 25 → 15 open (0C/6H/7M/2L); all 15 are next@14.x SSR
advisories with zero exploitability on static-export.

Post-merge: tag command + GitHub Release creation require explicit
user authorization per CLAUDE.md §Executing actions with care.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 26, 2026
… fix for GOOG/GOOGL $4.6T overcount) (#269)

Closes the structural half of issue #261 — the OVERCOUNT pattern
where SEC companyfacts returns Alphabet's 12.12B total shares to
both per-class tickers, producing $4.6T market_cap per ticker vs
real ~$1.05T per class. PR-A (PR #264, merged) shipped the
multi_class_aggregate_shares_suspected annotate; this PR-B ships
the actual fix.

Per methodology-scientist Mode B 2026-05-26 (Path 1 reverse-
allowlist) + edgar-debugger live probe (Alphabet 10-K accession
0001652044-26-000018).

compute/config.py:
- New MULTI_CLASS_OVERCOUNT_ALLOWLIST: dict[str, str]:
  - GOOGL → "us-gaap:CommonClassAMember" (standard namespace)
  - GOOG  → "goog:CapitalClassCMember" (FILER-SPECIFIC namespace
    gotcha — caught by edgar-debugger probe)
- SCHEMA_VERSION bump 0.10.5 → 0.10.6-phase4.5e

compute/ingest/fundamentals.py:
- Extended _fetch_shares_from_per_filing_xbrl with target_class_member
  parameter (None=sum-all PR #182 STZ pattern; set=filter to specific
  class member via xbrl.contexts[ref].dimensions lookup)
- New elif branch in _build_snapshot fires when ticker in allowlist +
  primary plausible + QR_SKIP_FUNDAMENTALS not set; overrides primary
  IFF per_class < primary; defensive mc_reconcile_failure counter
  for sanity-check failures (per_class >= primary OR per_class fraction
  outside 5-95% of primary)
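
The override and sanity-check rules can be sketched as a pure decision function. This is an illustration under assumptions: `resolve_shares` is a hypothetical name, the primary value is assumed positive, and a below-5% or above-95% fraction is assumed to still override while counting as a reconcile failure.

```python
from __future__ import annotations


def resolve_shares(primary: float, per_class: float | None) -> tuple[float, bool, bool]:
    """Return (shares_to_use, overridden, reconcile_failure)."""
    if per_class is None:
        # XBRL lookup found nothing: keep primary, no counters touched.
        return primary, False, False
    if per_class >= primary:
        # A single class cannot exceed the aggregate: keep primary, count a failure.
        return primary, False, True
    fraction = per_class / primary
    # Assumption: fractions outside 5-95% are suspicious but still override.
    reconcile_failure = not (0.05 <= fraction <= 0.95)
    return per_class, True, reconcile_failure
```

For example, `resolve_shares(12.12e9, 5.429e9)` takes the override branch without a reconcile failure, since the per-class count is roughly 45% of the aggregate.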

compute/output/schemas.py + frontend/lib/types.ts + snapshot:
- Triple lockstep — 2 additive Metadata fields:
  multi_class_per_class_override_count (expected steady-state ≈ 2)
  multi_class_mc_reconcile_failure_count (defensive Rule-18 guard)

compute/main.py — wire counters from _FALLBACK_STATS

Tests +9 (1216 → 1225):
- test_config.py: schema pin update, allowlist membership pin,
  disjoint-allowlist invariant test
- test_ingest/test_fundamentals.py: GOOG override (filer-namespace),
  GOOGL override (standard namespace), non-allowlist ticker doesn't
  fire, QR_SKIP_FUNDAMENTALS escape-hatch, per_class >= primary
  sanity skip, mc_reconcile warning on <5% fraction, None return
  silently skipped. Plus 3 existing _FALLBACK_STATS tests updated
  to new 5-key dict shape (was 3 keys).

ZERO behavior change for 500 non-allowlist tickers. 2 allowlist
tickers (GOOG/GOOGL) gain corrected shares_outstanding (~5.4B/~5.8B
from prior 12.12B overcount); flows through to market_cap (~$4.6T
→ ~$1.05T per class), pe_ratio_ttm, fair-price ensemble. The
multi_class_aggregate_shares_suspected annotate (PR-A) continues
to fire correctly (CIK collision invariant holds).

Verification:
- ruff clean
- python -m compute.output.schema_check — triple in sync at 0.10.6
- pytest 1225 passed (offline), 7 skipped (factors extras), 24
  deselected (@network — GOOG/GOOGL live drift-detector deferred)

PHASE_STATUS_INFLIGHT.md side-file satisfies §Conventions lockstep
per PR #237 convention.

https://claude.ai/code/session_01JwntEE4PNAXSMkZxRA9BB4

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 27, 2026
…AUDE.md (#271)

Refactors a user-shared research report (Master Prompt + 6 phase sub-prompts
+ CLAUDE.md template) into the existing doc surface without creating a new
.claude/skills/agentic-6-phase/ skill. The report's underlying logic is
already implemented in the 18 subagents + CLAUDE.md §Auto-routing; what
was genuinely missing was a 6-phase mapping table a new session can scan
in < 30 sec on top of the 9 phases.

Scope (2 substance files + 1 INFLIGHT entry):

- WORKFLOW.md — new section "Agentic 6-Phase Cadence" between §"Tools
  You'll Use Daily" and §"Phase Overview". Mapping table (Step × Fire
  trigger × Subagent(s) × Done when) over Planning → Code Gen →
  Integration → Test → Deploy → Monitor + 5 cadence invariants. Reuses
  the 18 standing subagents — no new agent files. Session-start protocol
  cites schema 0.10.5-phase4.5e (PRs #264 + #265; cron #4 still at
  0.10.4, next cron Wed 2026-05-27 re-renders at 0.10.5), defense
  layer 33 declared = 7 vetoes + 26 annotates, tag v1.3.0-phase4.5e,
  CVE baseline 15 open (0C / 6H / 7M / 2L) after PR #194 patch +
  PR #226 triage.
- CLAUDE.md — new §Conventions bullet "Session-start phase
  identification" (~5 lines) pointing readers at PHASE_STATUS.md
  §"Current state" + WORKFLOW.md §"Agentic 6-Phase Cadence" using the
  standing 18 subagents.
- PHASE_STATUS_INFLIGHT.md — new in-flight entry per PR #237 side-file
  lockstep convention.

Out of scope (deliberately NOT done per user direction 2026-05-27):

- NO .claude/skills/agentic-6-phase/ — overhead exceeds benefit
- NO Master Prompt / phase sub-prompts copied into the repo
- NO edits to any of the 18 subagent files under .claude/agents/
- NO AGENTS.md substance edit — the cadence is Claude-Code-subagent-
  specific; cross-tool agents would route differently. INFLIGHT entry
  satisfies §Conventions "ship with every PR" lockstep.

docs-reviewer verdict (2026-05-27, agent id a2c87ed3679f55fe5):
NEEDS-CROSS-REF-FIX — both items applied in this commit:
1. CVE attribution: "after PR #226 triage" → "after PR #194 patch +
   PR #226 triage" (PR #194 closed the 10 advisories; PR #226
   documented the resulting state)
2. Step 4 fire-trigger col: "Sections A-J" → "Sections A-L"
   (Section L added by PR #221 OSAP proxy invariant; internal match
   with the same row's Done-when col)

All else passes: 4 cited numbers, 18 agent names, 3 cross-refs, token
budget (WORKFLOW ≤ 1 page, CLAUDE ≤ 5 lines), Rule 16 + Rule 18 no
contradiction.

Pre-existing SKILL.md schema-version table gap (rows for 0.10.5-phase4.5e
PR #264 + valuation_output_anomalous rename PR #265 missing) escalated
to schema-sentinel as separate doc-only PR per docs-reviewer recommendation
— not blocking on this scope.

Verification:
- ruff check . — N/A (no Python)
- python -m compute.output.schema_check — N/A (no schemas)
- pytest tests/ -m "not network" — N/A (no test surface)
- docs-reviewer subagent — PASS after the 2 fixes above

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 28, 2026
…v1.4.0 (#286)

* chore(docs): housekeeping PR-B — drain INFLIGHT + bump pointers post-v1.4.0

Phase B post-tag housekeeping. Drains 7 stale `(in flight, ...)` markers
from PHASE_STATUS_INFLIGHT.md (PRs #269, #267, #271, #280, #281, #282, #285)
to `(merged YYYY-MM-DD, <SHA>)`, and bumps stale schema/tag pointers
across CLAUDE.md / PHASE_STATUS.md / SKILL.md / WORKFLOW.md to reflect
the v1.4.0-phase4.6 release at `bbca9cac`.

Changes:
- CLAUDE.md §Phase status — schema `0.10.2-phase4.5e` → `0.10.7-phase4.6`,
  tag `v1.3.0-phase4.5e` → `v1.4.0-phase4.6` (2026-05-27, `bbca9cac`),
  "Recently merged" list refreshed from PRs #147-#154 → PRs #264-#285
- PHASE_STATUS.md §Current state — mirrored pointer bump; production-run
  pointer → `559c5269` (cron-#5 2026-05-27 chore commit); "Recently
  merged" prepended with 22 entries since v1.3.0, legacy list relabeled
  as "Earlier"
- SKILL.md schema-version table — 3 new rows: `0.10.7-phase4.6`
  (PR #283 release), `0.10.6-phase4.5e` (PR #269 GOOG/GOOGL per-class
  XBRL fix), `0.10.5-phase4.5e` (PR #264 multi-class CIK detector)
- WORKFLOW.md §Agentic 6-Phase Cadence Session-start protocol — pointer
  block bumped to current state
- PHASE_STATUS_INFLIGHT.md — 7 stale markers drained + new entry for
  this PR appended at end

AGENTS.md substance untouched per the existing delegation pattern
(line 372-375: "Canonical 'current state' lives in CLAUDE.md §Phase
status. Schema-version history table is in SKILL.md."). Cross-tool
agents reading state pull from CLAUDE.md as the source of truth.

Doc-only PR — `ruff` / `schema_check` trivially pass; no compute /
schema / scoring / valuation / frontend / Python / TS code change.

* docs(review-fix): docs-reviewer NEEDS-CROSS-REF-FIX — 3 items

Addresses 3 findings from docs-reviewer (sonnet) substance review of
the housekeeping PR (commit e060cb9):

1. PHASE_STATUS.md:98 — "v1.3.0 release tag" entry in §Next deliverables
   was stale (v1.3.0 shipped 2026-05-26, v1.4.0 shipped 2026-05-27).
   Replaced with forward-look for v1.5.0 gated on Phase 4.5e PR 5 +
   Issue #67 sector-CoE flip.

2. WORKFLOW.md:730 — Phase 4.5 historical task list trailing clause
   "v1.3.0 target pending release-captain ladder (LedgerCraft A-B
   series + this doc-refresh)" was stale. Replaced with the actual
   landing dates + SHAs for both v1.3.0 (5db3b97) and v1.4.0
   (bbca9ca).

3. docs/METHODOLOGY.md — annotate-only flag count out of sync with
   declared flags:
   - Line 16: "21 annotate-only flags" → "23 annotate-only flags"
   - Line 164: section heading "(21)" → "(23)"
   - Added 2 new bullets at end of §Annotate-only flags section:
     - `multi_class_aggregate_shares_suspected` (Issue #261 PR-A,
       PR #264) — CIK-collision detector; identity-equation check
       per Damodaran 2019 Ch. 16; no academic prior (data-quality)
     - `valuation_output_anomalous` (Issue #262, PR #265) — Site-2
       output-anomaly detector renamed from `data_quality_input_corruption`;
       no academic prior (data-quality); semantic distinction from
       Site-1 input-corruption veto documented inline

Both new METHODOLOGY entries are data-quality detectors with no new
academic claim — `multi_class_aggregate_shares_suspected` cites the
existing Damodaran 2019 anchor (already in the file) for the
per-class market-cap identity equation, and `valuation_output_anomalous`
is the renamed Site-2 emission of the existing `data_quality_input_corruption`
defense. Neither requires methodology-scientist verdict per the
"Internal — data-quality" pattern shared with `goodwill_heavy` /
`data_quality_input_corruption` Site-1 already in the section.

---------

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 28, 2026
…ket_cap` 2.2× inflated) (#292)

* fix(ingest): Issue #288 — GOOG/GOOGL XBRL concept-name omission

Closes #288. `multi_class_per_class_override_count = 0` on every
production cron since PR #269 landed 2026-05-26 — both GOOG and GOOGL
rendered inflated `market_cap` (~$4.66T / $4.71T) instead of correct
per-class values (~$2.09T / $2.59T).

Root cause (edgar-debugger verdict 2026-05-28):
`compute/ingest/fundamentals.py:735` `_fetch_shares_from_per_filing_xbrl`
filter mode queried only 2 XBRL concepts (`dei:EntityCommonStockSharesOutstanding`
+ `us-gaap:CommonStockSharesIssued`). Alphabet's 10-K files per-class
share counts under `us-gaap:CommonStockSharesOutstanding` — the missing
3rd concept. Primary path at lines 115-124 already queries all 3 in
this order; XBRL fallback path drifted out of parity.

Existing tests at `test_fundamentals.py:822-857` mock
`_fetch_shares_from_per_filing_xbrl` entirely (`return_value=per_class`)
— they confirm `_build_snapshot` Branch-3 trigger but never exercise
the actual concept-lookup path. Bug survived the test suite.

Fix (9 files):
- `compute/ingest/fundamentals.py:735-749` — add `us-gaap:CommonStockSharesOutstanding`
  to the concept tuple (between the 2 existing entries, matching primary
  path order); fix misleading docstring at lines 686-687
- `compute/ingest/fundamentals.py:48-71` — `_FALLBACK_STATS` dict gains
  `"per_class_attempt": 0`; reset wired in `reset_fallback_stats()`
- `compute/ingest/fundamentals.py:~1030` — increment `per_class_attempt`
  AT TOP of Branch 3 elif (before XBRL call), so the counter captures
  "branch entered" regardless of XBRL success
- `compute/config.py:30` — schema PATCH bump `0.10.7-phase4.6 → 0.10.8-phase4.6`
- `compute/output/schemas.py:~340` — new `Metadata.multi_class_per_class_attempt_count: int | None = None`
  field (Rule 18 disambiguator)
- `compute/main.py:~2023` — wire `multi_class_per_class_attempt_count`
  to Metadata construction
- `frontend/lib/types.ts:~233` — mirror TS field
- `frontend/lib/schema-snapshot.json` — regenerated via `--update-snapshot`
- `tests/test_config.py` — schema version pin `0.10.7 → 0.10.8`;
  docstring updated to reference Issue #288
- `tests/test_ingest/test_issue288_xbrl_concept_tuple.py` (NEW) — 4
  regression tests: GOOG class-C lookup, GOOGL class-A lookup, concept-
  tuple inclusion pin (`assert "us-gaap:CommonStockSharesOutstanding"
  in concept_list` — explicit guard against re-omission), and Branch-3
  attempt-counter wiring. Would have FAILED on pre-fix code.
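
A minimal sketch of the concept-tuple fix and its regression pin. `PER_CLASS_SHARE_CONCEPTS` and `first_matching_fact` are hypothetical names; the three concept strings and their order are the ones described above.

```python
from __future__ import annotations

# Post-fix order, matching the primary path: the middle entry was the omission.
PER_CLASS_SHARE_CONCEPTS = (
    "dei:EntityCommonStockSharesOutstanding",
    "us-gaap:CommonStockSharesOutstanding",
    "us-gaap:CommonStockSharesIssued",
)

PRE_FIX_CONCEPTS = (
    "dei:EntityCommonStockSharesOutstanding",
    "us-gaap:CommonStockSharesIssued",
)


def first_matching_fact(
    facts_by_concept: dict[str, float],
    concepts: tuple[str, ...] = PER_CLASS_SHARE_CONCEPTS,
) -> float | None:
    # Walk the concepts in priority order and return the first one present.
    for concept in concepts:
        if concept in facts_by_concept:
            return facts_by_concept[concept]
    return None


# Per-class facts filed only under us-gaap:CommonStockSharesOutstanding.
class_c_facts = {"us-gaap:CommonStockSharesOutstanding": 5.429e9}
pre_fix = first_matching_fact(class_c_facts, PRE_FIX_CONCEPTS)
post_fix = first_matching_fact(class_c_facts)
```

With the two-concept tuple the lookup returns None even though the fact exists, which is the silent failure mode that a mocked-out fetch function could not surface.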

Rule 18 disambiguation (the new counter):
- `attempt == override == 0` → Branch 3 never triggered
- `attempt > 0`, `override = 0` → XBRL lookup returned None (regression #288)
- `attempt == override > 0` → normal operation; post-fix steady-state = 2
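
The three-way reading of the counters can be sketched as a small classifier. The function name and the returned labels are hypothetical; the fourth branch (`0 < override < attempt`) is not covered by the list above and is included only as an assumption about a partial failure.

```python
from __future__ import annotations


def classify_per_class_counters(attempt: int, override: int) -> str:
    if attempt == 0 and override == 0:
        return "branch_never_triggered"
    if attempt > 0 and override == 0:
        return "xbrl_lookup_returned_none"  # the Issue #288 signature
    if attempt == override:
        return "normal_operation"  # post-fix steady state is attempt == override == 2
    # Assumption: any other combination means only some allowlist tickers were overridden.
    return "partial_override"
```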

Impact (display-only, NOT a scoring regression):
- Composite scores / rankings / Rule 16 / Top-5 rotation UNAFFECTED
  (`market_cap` not an 8-pillar input)
- `multi_class_aggregate_shares_suspected` annotate safety net continues
  firing (PR #264)
- `/stock/GOOG` + `/stock/GOOGL` UI renders correct per-class market_cap
  on next cron
- `pe_ratio_ttm` re-derives from corrected shares

Verification:
- `ruff check .` — PASS
- `python -m compute.output.schema_check` — PASS (triple in sync at 0.10.8-phase4.6)
- `schema-sentinel` verdict — TRIPLE-IN-SYNC
- `python -m pytest tests/test_ingest/test_issue288_xbrl_concept_tuple.py` — 4 passed
- `python -m pytest tests/test_config.py tests/test_output/ -q -m "not network"` — 70 passed

Deferred (NOT in this PR):
- @network GOOG/GOOGL drift-detector test (live SEC, EDGAR_USER_AGENT required)
- Issue #289 NVR DQIC fix (Option C per methodology-scientist) — separate PR

* fix(test): update 3 _FALLBACK_STATS pins for new per_class_attempt key

CI failure on PR #292 — 3 existing `_FALLBACK_STATS` pin-tests in
`test_fundamentals.py` (lines 419 / 450 / 712) hardcoded the dict to
exactly 5 keys. This PR's Issue #288 fix added a 6th key
(`per_class_attempt`) for the Rule 18 disambiguator. The pin-tests
correctly caught the schema change; just needed updates.

Changes:
- `test_reset_fallback_stats_zeros_counters` — set `per_class_attempt=5`
  in the non-zero preamble; expect `per_class_attempt: 0` post-reset
- `test_get_fallback_stats_returns_copy_not_reference` — expect 6-key
  shape after reset
- `test_get_fallback_stats_returns_five_keys_after_dimensional_path` →
  renamed to `_returns_six_keys_after_dimensional_path` + updated
  docstring to reference Issue #288's `per_class_attempt` addition;
  pin updated to 6-key shape (dimensional path doesn't touch
  per_class_attempt, so stays at 0)

Verification:
- `python -m pytest tests/test_ingest/test_fundamentals.py tests/test_ingest/test_issue288_xbrl_concept_tuple.py` — 32 passed
- All 3 previously-failing tests now pass post-fix

* fix(test): remove unused pytest + SimpleNamespace imports (ruff F401/I001)

PR #292 Python (lint + test) CI failed on commit c428fe6 due to ruff
F401 (unused imports) + I001 (import ordering) in the new regression
test file. Auto-fixable via `ruff check --fix`.

Leftover imports from draft iterations:
- `from types import SimpleNamespace` — never referenced (final version
  uses `unittest.mock.MagicMock` instead)
- `import pytest` — leftover from a draft that used @pytest.mark
  decorators; final version uses plain `assert` statements with no
  markers

Local `python -m pytest tests/test_ingest/` confirms 32 passed
post-fix; ci-triage-engineer 2026-05-28 verdict: ruff-F401-unused-import
+ compounding ruff-I001-import-ordering (HIGH confidence).

---------

Co-authored-by: Claude <noreply@anthropic.com>
dackclup added a commit that referenced this pull request May 28, 2026
…+ bump pointers (#295)

End-of-day Track-A2 housekeeping. After 6 PRs landed on main today
(#286 / #290 / #291 / #292 / #293 / #294), the CLAUDE.md / PHASE_STATUS.md
/ SKILL.md pointers drifted again — schema bumped via PR #292
(0.10.7 → 0.10.8-phase4.6); USE_SECTOR_COE flipped via PR #294. This
PR closes the doc-drift loop so session N+1 reads correct state.

Changes (4 files, doc-only):

- CLAUDE.md §Phase status — schema `0.10.7-phase4.6 → 0.10.8-phase4.6`;
  defense layer narrative notes `USE_SECTOR_COE = True` post-#294;
  new "Post-tag production patches" subsection citing PRs #292 / #293
  / #294. "Recently merged" list prepended with 6 same-day entries;
  legacy "Earlier (PR #264 → PR #285)" subsection relabeled.

- PHASE_STATUS.md §Current state — schema mirror; new "Post-tag
  production patches" row; Production-run pointer `559c5269 →
  0ad1d57` (cron #69 chore-commit). "Recently merged" prepended.

- SKILL.md schema-version table — new top row for `0.10.8-phase4.6`
  (PR #292 GOOG/GOOGL XBRL fix + Rule 18 disambiguator).

- PHASE_STATUS_INFLIGHT.md — 6 stale `(in flight, 2026-05-28)`
  markers drained to `(merged 2026-05-28, <SHA>)` (PRs #286 / #290 /
  #291 / #292 / #293 / #294). Bodies preserved.

Doc-only PR — `ruff` / `schema_check` pass; no compute / schema /
scoring / valuation / frontend / Python / TS change. CLAUDE.md
substance touched (pointer block + Recently merged refresh).
AGENTS.md substance unchanged per the delegation-pattern (PR #291
already bumped this morning).

Co-authored-by: Claude <noreply@anthropic.com>

2 participants