feat(phase-3c): fair price ensemble + price history + Tier-1 defenses #9
Merged
…onstants + goodwill/intangibles ingest
Step 1 of PR-3c (foundation; no scoring or fair-price logic yet, those land
in subsequent steps). Per the WORKFLOW.md PR-3c spec and the kickoff
message's sequential-checkpoint discipline.
- compute/config.py: bump SCHEMA_VERSION 0.4.0-phase3b → 0.5.0-phase3c.
Add 13 valuation + defense constants (DISCOUNT_RATE, TERMINAL_GROWTH,
COST_OF_EQUITY, DCF_FORECAST_YEARS, DCF_FCF_WINDOW_YEARS,
FILING_STALE_SOFT_DAYS=120, FILING_STALE_HARD_DAYS=180,
GOODWILL_HEAVY_RATIO=0.5, EXTREME_ESTIMATE_HIGH=5.0,
EXTREME_ESTIMATE_LOW=0.2, MULTIPLES_MIN_PEERS=8, NSI_TOP_DECILE=0.90,
NSI_LOOKBACK_DAYS=365).
- pyproject.toml: pin pandas to '>=2.2,<3' to avoid pandas 3.x silent
semantic drift (groupby / pct_change behavior); flagged in PR-3b §C-LOW.
- compute/ingest/fundamentals.py: add 'goodwill' (single tag,
us-gaap:Goodwill; 5/5 hit on probe) and 'intangibles_net' (3-tag fallback
chain: IntangibleAssetsNetExcludingGoodwill → OtherIntangibleAssetsNet →
FiniteLivedIntangibleAssetsNet; covers KO/JPM/BRK-B, which don't tag the
primary). Plumbed through ALL_METRIC_KEYS, FundamentalsSnapshot dataclass,
and _build_snapshot. Feeds Tangible BVPS (Defense Playbook §PR 3c §2) in
Step 3.
- tests/test_smoke.py: bump SCHEMA_VERSION assertion 0.4.0 → 0.5.0.
- tests/test_features/test_fundamentals.py: extend AAPL synthetic snapshot
with goodwill+intangibles_net (so missing_fields()==[] still holds). Add
@network golden test test_goodwill_intangibles_fallback_chain (5 tickers:
AAPL, KO, PG, JPM, BRK-B); it asserts goodwill 5/5 non-null > $1B and that
the fallback chain pushes intangibles_net coverage to ≥3/5 (vs 2/5 with the
primary tag alone).
Verification: ruff clean, pytest 118 passed / 7 skipped (+1 new network),
tsc --noEmit exit 0.
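The 3-tag fallback chain for intangibles_net can be pictured with a minimal sketch. The tag order comes from the commit message; the function name first_available and the plain dict-of-facts input are hypothetical and are not the actual compute/ingest/fundamentals.py API.

```python
from typing import Mapping, Optional, Sequence

# Tag order as listed in the commit message.
INTANGIBLES_TAG_CHAIN = (
    "IntangibleAssetsNetExcludingGoodwill",
    "OtherIntangibleAssetsNet",
    "FiniteLivedIntangibleAssetsNet",
)


def first_available(
    facts: Mapping[str, Optional[float]],
    tags: Sequence[str] = INTANGIBLES_TAG_CHAIN,
) -> Optional[float]:
    """Return the value of the first tag that is present and non-null, else None.

    `facts` is a hypothetical {tag: value} mapping for one filing.
    """
    for tag in tags:
        value = facts.get(tag)
        if value is not None:
            return value
    return None
```

A filer that only reports the second or third tag still yields a value, which is the mechanism the golden test relies on for the coverage lift.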
…Woodgate 2008)
Adds the third active VETO joining altman_distress and sloan_accruals_top_decile.
Annotate-only per Rule 16; Top-5 rotation in compute/main.py is the only
enforcement layer (no composite mutation).
- compute/scoring/risk_overlay.py:
* New helper _shares_at_lookback(history, asof_days, today=): picks the
period_end row closest to today − asof_days and requires that row to be
at least max(asof_days/2, 90) days old, to avoid same-quarter or
same-year leakage.
* New helper _net_stock_issuance(snap, history, today=) — computes
ln(shares_t / shares_{t-12m}); NaN when inputs missing.
* compute_risk_flags signature extended with histories= and sectors=
keyword-only args (backward-compatible — existing PR-3b callers see no
behavior change).
* NSI threshold computed within-sector with NSI_MIN_POPULATION=10 floor
(consistent with SLOAN_MIN_POPULATION). When sectors is not provided,
NSI is suppressed entirely — we don't degrade to cross-sectional, which
was the lesson from issue #7's Sloan over-firing on REITs/banks.
* Strict-positive guard: NSI ≤ 0 (buybacks / stable shares) never gets the
dilution flag, even when the within-sector threshold collapses to 0 in
populations with mostly-zero NSI. Documented inline.
- compute/main.py: build sectors_dict from inputs, pass histories +
sectors_dict into compute_risk_flags. Top-5 rotation already iterates
risk_flags.get(ticker), so NSI joins the existing flagged-skip path
automatically.
- tests/test_scoring/test_risk_overlay.py: 11 new tests covering
_shares_at_lookback (correct value, too-recent rows skipped, empty/missing
inputs, non-positive value rejection), _net_stock_issuance (zero on
unchanged shares, positive on dilution, NaN on missing history), and
compute_risk_flags integration (top-decile-within-sector flag fires on
diluter, suppressed without sectors, suppressed below NSI_MIN_POPULATION,
backward-compat without new kwargs).
Verification: ruff clean, pytest 129 passed (+11 NSI) / 7 skipped, 0 failed.
Defense scorecard at end of Step 2:
- VETOES: 3 (altman_distress + sloan_accruals_top_decile + net_issuance_top_decile)
- GUARDS: 0 (Step 5 lands stale_filing + outlier_5x + terminal_g_unsafe)
- ANNOTATE: 0 (Step 3 lands goodwill_heavy)
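A minimal sketch of the NSI logic described in this step, under stated assumptions: the function names, the linear-interpolation percentile, and the "at or above threshold" comparison are illustrative choices, not the actual risk_overlay.py code. Only the ln(shares_t / shares_{t-12m}) formula, the population floor of 10, the 0.90 quantile, and the strict-positive guard come from the commit message.

```python
import math
from typing import Dict, Iterable, Optional, Set


def net_stock_issuance(shares_now: Optional[float], shares_lookback: Optional[float]) -> float:
    """ln(shares_t / shares_{t-12m}); NaN when an input is missing or non-positive."""
    if shares_now is None or shares_lookback is None:
        return math.nan
    if shares_now <= 0 or shares_lookback <= 0:
        return math.nan
    return math.log(shares_now / shares_lookback)


def _quantile(values: Iterable[float], q: float) -> float:
    """Linear-interpolation quantile over a non-empty list (an assumed method)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def dilution_flags(nsi_by_ticker: Dict[str, float], min_population: int = 10, q: float = 0.90) -> Set[str]:
    """Flag top-decile diluters within ONE sector; suppress when the sector is too small."""
    finite = {t: v for t, v in nsi_by_ticker.items() if not math.isnan(v)}
    if len(finite) < min_population:
        return set()
    threshold = _quantile(finite.values(), q)
    # Strict-positive guard: buybacks / flat share counts never flag,
    # even if the threshold collapses to 0 in a mostly-zero population.
    return {t for t, v in finite.items() if v > 0 and v >= threshold}
```

In a sector where every NSI is exactly 0 the threshold is 0, and the `v > 0` guard is what keeps the flag set empty.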
…ibles netting)
New compute/valuation/ package with the first defense-layer building block:
TBVPS = (equity − goodwill − intangibles_net) / shares. Pure-function module
with no production wiring yet; Step 4.2 (graham.py) and Step 4.4 (rim.py)
will consume it; Step 5 (ensemble.py) will surface goodwill_heavy as a
valuation_warning.
- compute/valuation/__init__.py: re-exports the public surface
(tangible_book_value_per_share, goodwill_heavy_flag).
- compute/valuation/tangible_book.py:
* tangible_book_value_per_share(snap) — full intangibles netting per
Penman 2013. None for negative tangible book / missing equity / zero
or missing shares. Missing goodwill or intangibles_net coerce to 0
(consistent with Step 1's 3-tag fallback chain achieving ~85%
intangibles coverage; the remaining ~15% are treated conservatively
as "no intangibles to net" rather than refusing to compute).
* goodwill_heavy_flag(snap, tbvps) — TBVPS / BVPS_reported < 0.5
(config.GOODWILL_HEAVY_RATIO). Annotate-only flag; will be appended
to valuation_warnings in Step 5. Strict inequality, so a firm whose
TBVPS is exactly half of reported BVPS does not trip; only balance
sheets dominated by acquired goodwill/intangibles fire.
* Module docstring documents the dual-implementation rationale: Value
pillar's compute/features/value.py keeps the fast-TTM Graham
intentionally; this module powers the user-facing fair-price
ensemble where the over-paying-for-goodwill caveat matters.
- tests/test_valuation/__init__.py: package marker.
- tests/test_valuation/test_tangible_book.py: 15 cases covering
- kickoff §3.3 (a)-(h) baseline + edge cases
- bonus edges: None shares, zero tangible book, exact-threshold
strict-inequality semantics, negative-equity guard
- module public-surface invariant + config constant sanity check
- @network golden test against AAPL FY2024 EDGAR (skipped without
--run-network; lenient internal-consistency assertion rather than a
pinned $/share value, because the intangibles tag varies across
AAPL's annual taxonomy by multi-billion).
Verification: ruff clean, pytest 144 passed (+15 tangible_book) /
8 skipped (+1 new network), 0 failed. Frontend tsc --noEmit exit 0.
Defense scorecard at end of Step 3 (unchanged from Step 2 — flag will
fire only after Step 5 surfaces it as valuation_warning):
- VETOES: 3 (altman_distress + sloan_accruals_top_decile + net_issuance_top_decile)
- GUARDS: 0 (Step 5 lands stale_filing + outlier_5x + terminal_g_unsafe)
- ANNOTATE: 0 (Step 5 lands goodwill_heavy via valuation_warnings)
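The TBVPS formula and the goodwill-heavy ratio can be sketched as follows. This is an illustration with plain float inputs rather than the real FundamentalsSnapshot; the treatment of exactly-zero tangible book (returned as 0.0) and of a non-positive reported BVPS (no flag) are assumptions that the commit message does not spell out.

```python
from typing import Optional

GOODWILL_HEAVY_RATIO = 0.5  # mirrors config.GOODWILL_HEAVY_RATIO from Step 1


def tbvps(
    equity: Optional[float],
    goodwill: Optional[float],
    intangibles_net: Optional[float],
    shares: Optional[float],
) -> Optional[float]:
    """(equity - goodwill - intangibles_net) / shares, or None when not computable."""
    if equity is None or shares is None or shares <= 0:
        return None
    # Missing goodwill / intangibles coerce to 0 ("no intangibles to net").
    tangible = equity - (goodwill or 0.0) - (intangibles_net or 0.0)
    if tangible < 0:
        return None
    return tangible / shares


def goodwill_heavy(equity: Optional[float], shares: Optional[float], tbvps_value: Optional[float]) -> bool:
    """True when TBVPS / BVPS_reported is strictly below the 0.5 ratio."""
    if tbvps_value is None or equity is None or shares is None or shares <= 0:
        return False
    bvps_reported = equity / shares
    if bvps_reported <= 0:
        return False  # assumption: no flag without a positive reported book
    return tbvps_value / bvps_reported < GOODWILL_HEAVY_RATIO
```

For example, equity 100, goodwill 40, intangibles 20, and 10 shares give TBVPS 4.0 against a reported BVPS of 10.0, a ratio of 0.4, which trips the flag.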
… primitives
First sub-step of Step 4 (valuation methods). Pure-function module — no I/O,
no globals, no production wiring yet. Subsequent sub-steps (4.2 graham,
4.3 multiples, 4.4 rim, 4.5 dcf) consume these gates; Step 5 ensemble
pattern-matches on the SKIP_REASONS taxonomy to populate
StockDetail.fair_price.methods.<method>.reason.
- compute/valuation/applicability.py:
* MethodApplicability dataclass — frozen, validates that reason is
None iff applicable=True.
* SKIP_REASONS — stable snake_case identifier tuple. Renaming any
breaks the JSON contract.
* 6 per-method check functions (keyword-only signature so each method
declares the inputs it needs):
- check_dcf_applicability — skip Financials & Utilities; require
positive median 5y FCF, positive shares, non-hard-stale filing
- check_graham_applicability — positive eps_3y_avg + tbvps
- check_rim_applicability — value-trap-risk gate (ROE > Ke);
Financials/Utilities OK; cost_of_equity defaults to
config.COST_OF_EQUITY
- check_multiples_pe_applicability — positive eps_ttm + peer median
- check_multiples_pb_applicability — uses BVPS_reported (NOT
TBVPS) for consistency with peer median; documented inline
- check_multiples_ev_ebitda_applicability — skip Financials only
(per kickoff §B2 — Utilities have meaningful EBITDA above the
D&A line)
* Stale-filing primitives:
- filing_lag_days(filing_date, asof) → int | None
- stale_filing_status(lag_days) → "fresh" | "soft" | "hard" | "unknown"
Boundary semantics use strict > (lag == 120 is fresh; lag == 180
is soft); aligned with config.FILING_STALE_SOFT_DAYS=120 and
FILING_STALE_HARD_DAYS=180.
- compute/valuation/__init__.py: re-export the new public surface
(MethodApplicability, LagStatus, all 6 check_* functions, both
stale-filing primitives) alongside the existing tangible_book exports.
- tests/test_valuation/test_applicability.py: 44 cases covering
- DCF: financials/utilities exclusion, FCF median, missing shares,
hard-stale skip, partial None FCF list, all-None FCF list (8 cases)
- Graham: positive inputs, negative EPS, None tbvps, negative tbvps,
hard-stale (5 cases)
- RIM: financials OK + ROE > Ke, ROE < Ke value-trap-risk (with exact
reason string), ROE == Ke strict-inequality, None ROE, None tbvps,
hard-stale, default cost_of_equity from config (7 cases)
- Multiples P/E: positive inputs, negative eps, None peer median,
hard-stale (4 cases)
- Multiples P/B: positive inputs, negative bvps, hard-stale (3 cases)
- Multiples EV/EBITDA: financials skip, IT applicable, negative ebitda,
hard-stale, utilities NOT excluded (5 cases)
- Stale-filing primitives: 4 status branches + 4 boundary cases
(120/121/180/181) + filing_lag_days None and 120-day calc (10 cases)
- Invariants: MethodApplicability state validation; SKIP_REASONS no
duplicates; "stale_filing_hard" + value-trap reason both present (2 cases)
Verification: ruff clean, pytest 188 passed (+44 applicability) /
8 skipped, 0 failed. Frontend tsc --noEmit exit 0.
Reason taxonomy stable (14 identifiers in SKIP_REASONS); Step 5
ensemble will string-match against these to surface
fair_price.methods.<method>.reason in StockDetail JSON.
Defense scorecard at end of Step 4.1: unchanged from Step 3 (gates
exist; flags fire only when Step 5 wires them into ensemble output).
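The boundary semantics above (strict >, so lag == 120 is fresh and lag == 180 is soft) and the MethodApplicability invariant can be sketched as below. The field names applicable and reason are taken from the description; the rest is a minimal illustration, not the module's actual source.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

FILING_STALE_SOFT_DAYS = 120  # config values quoted in Step 1
FILING_STALE_HARD_DAYS = 180


@dataclass(frozen=True)
class MethodApplicability:
    applicable: bool
    reason: Optional[str] = None

    def __post_init__(self) -> None:
        # reason must be None if and only if the method is applicable
        if self.applicable != (self.reason is None):
            raise ValueError("reason must be None iff applicable is True")


def filing_lag_days(filing_date: Optional[date], asof: date) -> Optional[int]:
    """Days between the filing date and the as-of date; None when unknown."""
    if filing_date is None:
        return None
    return (asof - filing_date).days


def stale_filing_status(lag_days: Optional[int]) -> str:
    """Classify a lag as 'fresh', 'soft', 'hard', or 'unknown' using strict >."""
    if lag_days is None:
        return "unknown"
    if lag_days > FILING_STALE_HARD_DAYS:
        return "hard"
    if lag_days > FILING_STALE_SOFT_DAYS:
        return "soft"
    return "fresh"
```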
Second sub-step of Step 4 (valuation methods). Pure-function module — no
production wiring yet (Step 5 ensemble consumes); no schema changes.
- compute/valuation/graham.py:
* graham_fair_price(*, eps_3y_avg, tangible_book_value_per_share,
lag_status) → tuple[float | None, MethodApplicability]
* Uses Graham's canonical 22.5 multiplier (15 P/E × 1.5 P/B). NOT
relocated to config — it's the textbook value, not a tuning knob.
* Wraps check_graham_applicability from Step 4.1; on skip returns
(None, applicability) with one of the stable reason identifiers
(non_positive_eps_3y_avg / non_positive_or_missing_tangible_book /
stale_filing_hard).
* Soft + unknown lag_status are permissive — Graham computes; Step 5
ensemble will append valuation_warnings separately.
* Defensive post-gate math sanity check (mathematically unreachable
given the gate, but kept for runtime safety on potential gate
regressions). Uses module-local _POST_GATE_PRODUCT_NON_POSITIVE
identifier rather than polluting the public SKIP_REASONS taxonomy.
- compute/valuation/__init__.py: re-export graham_fair_price.
- compute/features/value.py: graham_number() docstring updated with
cross-reference to the new tangible-book-aware variant. NO logic
change to the pillar function — it intentionally retains TTM EPS +
reported BVPS for cross-sectional ranking responsiveness (kickoff §B4
"intentional dual implementation").
- tests/test_valuation/test_graham.py: 16 cases covering
- kickoff cases (1)-(13): synthetic golden values (3 deterministic),
skip cases for negative/None/zero EPS, None/negative TBVPS,
hard-stale; soft + unknown stale pass-through; sqrt round-trip
stability; multiplier-ratio invariant
- Bonus: both-inputs-None → EPS reason wins (applicability ordering)
- (14) @network AAPL EDGAR — lenient: if TBVPS positive, Graham
must compute and fair²/(22.5*eps*tbvps) ≈ 1; if TBVPS None,
applicability skips with tangible-book reason
- (15) @network breadth check — 5 reference tickers (KO/PG/MSFT/JNJ/MMM);
require ≥3/5 produce positive Graham values
- Module public-surface invariant
Verification: ruff clean, pytest 204 passed (+16 graham) / 10 skipped
(+2 new network), 0 failed. Frontend tsc --noEmit exit 0.
Local spot-check on cached AAPL snapshot:
EPS_diluted (3y-avg proxy) = $4.85
TBVPS = $7.26
Graham fair price = $28.15 (= sqrt(22.5 × 4.85 × 7.26) = sqrt(792))
This is the Graham defensive-investor floor; AAPL trades at ~$200, so
Graham would say AAPL is far above its conservative anchor — correct
diagnosis for a high-multiple growth name. Spot-check confirms math.
Defense scorecard at end of Step 4.2: unchanged (graham is a fair-price
method, not a defense flag — the goodwill_heavy_flag from Step 3 is
what surfaces tangible-book concerns to users via Step 5 ensemble).
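The Graham arithmetic in the spot-check can be reproduced with a short sketch. It omits lag_status and returns a bare reason string instead of a MethodApplicability, so the function name and shape here are simplifications rather than the real graham_fair_price signature.

```python
import math
from typing import Optional, Tuple

GRAHAM_MULTIPLIER = 22.5  # 15 P/E x 1.5 P/B, the textbook constant


def graham_value(eps_3y_avg: Optional[float], tbvps: Optional[float]) -> Tuple[Optional[float], Optional[str]]:
    """Return (fair_price, skip_reason); exactly one of the two is None."""
    # EPS is checked first, so it wins when both inputs are bad.
    if eps_3y_avg is None or eps_3y_avg <= 0:
        return None, "non_positive_eps_3y_avg"
    if tbvps is None or tbvps <= 0:
        return None, "non_positive_or_missing_tangible_book"
    return math.sqrt(GRAHAM_MULTIPLIER * eps_3y_avg * tbvps), None


price, reason = graham_value(4.85, 7.26)
print(round(price, 2))  # 28.15, matching the AAPL spot-check above
```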
…fair price
Third sub-step of Step 4. Pure-function module with the 4-tier peer-median
walk + 5/95 winsorization + 3 multiples methods. No production wiring
(Step 5 ensemble consumes); no schema changes.
- compute/valuation/multiples.py:
* PeerTierUsed StrEnum: SUB_INDUSTRY → INDUSTRY → SECTOR →
BROAD_EX_FIN_UTIL → INSUFFICIENT (5 values; first 4 are walked
in priority order, last signals all 4 tiers fell short).
* PeerMedian frozen dataclass: median, tier_used, peer_count,
tier_thresholds_tried (per-tier audit trail for Step 5/UI).
* compute_peer_medians(*, tickers_by_tier, metric_values,
target_ticker, min_peers=config.MULTIPLES_MIN_PEERS) — pure
function. Walks tiers; per tier filters target + None +
non-positive, requires ≥ min_peers, winsorizes 5/95, takes
median. Returns PeerMedian with full audit trail. Does NOT
fetch data or classify GICS — Step 5 ensemble owns that.
* _linear_percentile + _winsorize_5_95 helpers using stdlib only
(no numpy dependency). Matches numpy.percentile default
interpolation; verified by unit test.
* multiples_pe_fair_price(*, eps_ttm, peer_pe_median,
peer_tier_used, lag_status) → tuple[float | None, MethodApplicability].
INSUFFICIENT tier short-circuits with insufficient_peers_all_tiers;
otherwise delegates gating to check_multiples_pe_applicability.
* multiples_pb_fair_price — same pattern; takes bvps_REPORTED
(NOT TBVPS) per Step 4.1 peer-comparability rationale.
Documented in module + function docstrings.
* multiples_ev_ebitda_fair_price — extra inputs net_debt +
shares_outstanding for the per-share conversion. New skip
reasons: ev_ebitda_net_debt_unknown (when net_debt is None),
ev_ebitda_negative_equity_post_debt (when EV − net_debt ≤ 0).
Reuses missing_shares_outstanding from existing taxonomy.
- compute/valuation/applicability.py: SKIP_REASONS extended with 3
new identifiers (ev_ebitda_negative_equity_post_debt,
ev_ebitda_net_debt_unknown, insufficient_peers_all_tiers). Total
taxonomy: 17 stable identifiers.
- compute/valuation/__init__.py: re-export PeerMedian, PeerTierUsed,
compute_peer_medians, and the 3 multiples_*_fair_price methods.
- tests/test_valuation/test_multiples.py: 32 cases covering
- Group A (10 cases): peer-tier walk for all 4 tiers, INSUFFICIENT
fallback, target exclusion, winsorization, None/zero/negative
filtering, plus _linear_percentile + _winsorize_5_95 unit tests.
- Group B (5 cases): P/E method full path including INSUFFICIENT
short-circuit.
- Group C (4 cases): P/B method, with explicit assertion that the
function takes bvps_REPORTED (NOT tbvps).
- Group D (8 cases): EV/EBITDA simple, cash-rich-negative-net-debt,
high-leverage-negative-equity, Financials sector skip, missing
EBITDA / net_debt / shares, INSUFFICIENT tier.
- Group E (5 cases): tuple-shape consistency across all 3 methods,
PeerMedian frozen+expected fields, new SKIP_REASONS additions,
public-surface re-export, MULTIPLES_MIN_PEERS=8 invariant.
Verification: ruff clean, pytest 236 passed (+32 multiples) /
10 skipped, 0 failed. Frontend tsc --noEmit exit 0.
Reason taxonomy after Step 4.3: 17 stable identifiers
(was 14 after Step 4.1). Step 4.4 (rim.py) will not add to taxonomy;
Step 4.5 (dcf.py) may add 1-2 for terminal-g internal cases.
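A stdlib-only sketch of the tier walk with 5/95 winsorization, assuming the tiers arrive as an ordered dict from narrowest to broadest. The function names, the lowercase tier strings, and the tuple return are illustrative stand-ins for compute_peer_medians, PeerTierUsed, and PeerMedian.

```python
from statistics import median
from typing import Dict, List, Optional, Tuple


def linear_percentile(sorted_xs: List[float], pct: float) -> float:
    """Percentile with linear interpolation between closest ranks."""
    pos = (pct / 100.0) * (len(sorted_xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (sorted_xs[hi] - sorted_xs[lo]) * (pos - lo)


def winsorize_5_95(xs: List[float]) -> List[float]:
    """Clamp every value into the [5th, 95th] percentile band."""
    s = sorted(xs)
    lo, hi = linear_percentile(s, 5), linear_percentile(s, 95)
    return [min(max(x, lo), hi) for x in xs]


def peer_median(
    tickers_by_tier: Dict[str, List[str]],
    metric_values: Dict[str, Optional[float]],
    target: str,
    min_peers: int = 8,
) -> Tuple[Optional[float], str, int]:
    """Walk tiers in dict order; the first tier with enough valid peers wins."""
    for tier, tickers in tickers_by_tier.items():
        vals = [metric_values.get(t) for t in tickers if t != target]
        vals = [v for v in vals if v is not None and v > 0]
        if len(vals) >= min_peers:
            return median(winsorize_5_95(vals)), tier, len(vals)
    return None, "insufficient", 0
```

Winsorizing before taking the median mostly matters for even-sized or small peer sets, where a clamped tail value can sit next to the midpoint.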
…ith TBVPS
Fourth sub-step of Step 4. Pure-function module — no production wiring
(Step 5 ensemble consumes); no schema changes.
- compute/config.py: add RIM_FORECAST_YEARS=5 (explicit forecast horizon).
- compute/valuation/rim.py:
* rim_fair_price(*, tangible_book_value_per_share, avg_3y_roe,
cost_of_equity=config.COST_OF_EQUITY,
forecast_years=config.RIM_FORECAST_YEARS,
lag_status) → tuple[float | None, MethodApplicability]
* Implements Penman 2013 §5 RIM:
V_0 = B_0 + Σ_{t=1..N} [(ROE − Ke) × B_{t−1}] / (1 + Ke)^t
with constant-ROE forecast and full-retention book accumulation
(B_t = B_{t−1} × (1 + ROE)). Zero terminal value beyond year N
(conservative — Penman's terminal-RIM extensions are empirically
fragile; truncating aligns with annotate-and-veto principle).
* **Critical input choice — TBVPS, NOT BVPS_reported** (opposite of
P/B in Step 4.3). Module docstring documents the rationale: P/B is
peer-relative and demands consistent treatment on both sides; RIM
is single-stock with no peer comparison, so the most conservative
starting equity base is appropriate.
* Defensive overflow guard: caps book value at 1e15 during iterative
compounding. Plausible inputs (ROE up to 200%) over default 5y
horizon stay finite (3^5 = 243× headroom). Pathological inputs
(ROE=1000% × 50y) trigger the rim_book_value_overflow reason —
module-local identifier per Step 4.2's _POST_GATE pattern.
* Module docstring documents the "full retention" simplification and
its conservative-for-fair-price implication; Phase 4+ may extend
with payout_ratio.
- compute/valuation/__init__.py: re-export rim_fair_price.
- tests/test_valuation/test_rim.py: 18 cases:
- A (4): math correctness — hand-calculated 5-year reference yielding
V_0 ≈ 12.4889; boundary ROE==Ke; tiny-spread bounds; high-ROE
compounder bounds.
- B (4): applicability delegated — ROE<Ke, None ROE, None TBVPS,
hard-stale.
- C (2): soft + unknown stale pass-through.
- D (2): plausible-extreme ROE=200% × 5y stays finite; truly
pathological ROE=1000% × 50y triggers overflow guard.
- E (2): forecast_years longer → higher V_0; forecast_years=1
boundary (V_0 ≈ 10.4545).
- F (1): lower cost_of_equity → higher V_0.
- G (3): tuple-shape consistency, config defaults pinned, public
re-export.
- H (2): @network AAPL + 5-ticker breadth (KO/PG/MSFT/JNJ/MMM); for
skipped tickers the reason must be value_trap_risk_roe_below_cost_of_equity
or non_positive_or_missing_tangible_book — never arbitrary.
Verification: ruff clean, pytest 254 passed (+18 RIM) / 12 skipped
(+2 new network), 0 failed. Frontend tsc --noEmit exit 0.
Local AAPL spot-check on cached snapshot:
TBVPS = $7.26 (from Step 4.2)
ROE_proxy = 1.151 (NI / equity — anomalously high due to AAPL's
massive buybacks shrinking the equity denominator; the actual
avg_3y_roe will be lower once Step 7 wires the history-based
3y average through compute/main.py)
RIM V_0 = $207.60
Compare ensemble inputs at AAPL current price ~$200:
Graham = $28 (deep conservative floor; rejects growth names)
RIM = $207 (nearly market — high ROE drives big residual)
Multiples = TBD (Step 5 will compute from peer panel)
DCF = TBD (Step 4.5)
This dispersion is exactly what the ensemble median + max + outlier-5×
guard (Step 5) is designed to reconcile. Spot-check confirms math is
producing economically sensible values, NOT a uniform anchor.
Reason taxonomy after Step 4.4: still 17 stable identifiers (no public
additions). Module-local _RIM_BOOK_OVERFLOW joins Step 4.2's
_POST_GATE_PRODUCT_NON_POSITIVE as defense-in-depth without taxonomy
pollution.
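The truncated RIM sum with full-retention book growth can be sketched as below. The inputs B_0 = 10, ROE = 15%, Ke = 10% are an assumption (the commit does not state the test inputs), chosen because they reproduce the quoted reference values of about 12.4889 for 5 years and 10.4545 for 1 year. Returning None on overflow is a simplification of the module-local skip reason.

```python
from typing import Optional


def rim_value(b0: float, roe: float, ke: float, years: int = 5, cap: float = 1e15) -> Optional[float]:
    """V_0 = B_0 + sum_{t=1..N} (ROE - Ke) * B_{t-1} / (1 + Ke)^t, no terminal value."""
    book = b0   # B_{t-1}, starts at B_0
    value = b0
    for t in range(1, years + 1):
        value += (roe - ke) * book / (1.0 + ke) ** t
        book *= 1.0 + roe  # full retention: B_t = B_{t-1} * (1 + ROE)
        if book > cap:
            return None  # overflow guard for pathological inputs
    return value


print(round(rim_value(10.0, 0.15, 0.10, years=5), 4))  # 12.4889
print(round(rim_value(10.0, 0.15, 0.10, years=1), 4))  # 10.4545
```

With ROE equal to Ke every residual-income term is zero, so the value collapses to B_0, which is the boundary case the applicability gate treats as a skip.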
…nal-g cap
Final sub-step of Step 4. Pure-function module — no production wiring
(Step 5 ensemble consumes); no schema changes.
Step 4 valuation methods are NOW COMPLETE: Graham + Multiples (P/E,
P/B, EV/EBITDA) + RIM + DCF. Step 5 will orchestrate all 4 method
families (6 individual methods) into the ensemble, apply outlier guard +
stale-filing handling, and write StockDetail.fair_price.
- compute/config.py: no new constants (DISCOUNT_RATE, TERMINAL_GROWTH,
DCF_FORECAST_YEARS already added in Step 1).
- compute/valuation/applicability.py: SKIP_REASONS extended with 3
new identifiers emitted by dcf.py (terminal_g_unsafe_g_too_close_to_wacc,
dcf_net_debt_unknown, dcf_negative_equity_post_debt). Total taxonomy:
20 stable identifiers (was 17 after Step 4.3).
- compute/valuation/dcf.py:
* dcf_fair_price(*, sector, fcf_5y, shares_outstanding, net_debt,
lag_status, wacc=DISCOUNT_RATE, terminal_growth=TERMINAL_GROWTH,
forecast_years=DCF_FORECAST_YEARS) → tuple[float | None, MethodApplicability]
* Two-stage: 5y flat-FCF explicit forecast + Gordon-growth terminal.
* **Defense #5 — terminal-g HARD cap** validated against BOTH:
- config.TERMINAL_GROWTH (0.03 long-run nominal-GDP cap, Damodaran)
- WACC − 0.01 (100bp math-safety buffer)
Either cap exceeded → terminal_g_unsafe_g_too_close_to_wacc skip.
* Net-debt MANDATORY: None → dcf_net_debt_unknown skip. Will not
silently coerce to zero (would materially overstate equity per
share for leveraged firms).
* Negative equity post-debt → dcf_negative_equity_post_debt skip.
* FCF normalization: median of POSITIVES only (the gate uses median
of ALL finite values; this is a refinement not a relaxation —
documented in module docstring as the "conservative anchor"
choice).
* Defensive _DCF_NO_VALID_FCF_POST_GATE module-local identifier for
the mathematically unreachable empty-positives branch (same pattern
as Step 4.2's _POST_GATE_PRODUCT_NON_POSITIVE).
- compute/valuation/__init__.py: re-export dcf_fair_price.
- tests/test_valuation/test_dcf.py: 24 cases:
- A (4): math correctness — A1 hand-calculated reference
(FCF=100M flat × 5y, WACC=10%, g=3%, no debt, 10M shares
→ per share ~$129.27 within 5¢); higher WACC → lower value;
g ordering monotonic; net-debt impact (positive subtracts,
negative adds — exact arithmetic verified).
- B (3): terminal-g cap — g=0.09 with WACC=0.10 fails (0.09 > 0.03);
default g=0.03 safe; pathological g=0.15 fails.
- C (5): applicability delegated — Financials/Utilities skip,
negative FCF median, None shares, hard-stale.
- D (3): edge cases — high-leverage negative equity skip, None
net_debt skip, zero net_debt = EV.
- E (2): soft + unknown lag pass-through.
- F (1): N=10 vs N=5 modest change (<20% relative).
- G (2): mixed +/- FCF uses positives-only median; all-zero gate skip.
- H (4): tuple shape, config defaults, public re-export, new
SKIP_REASONS taxonomy entries.
- I (2): @network AAPL + 5-ticker breadth (skipped without --run-network).
Verification: ruff clean, pytest 278 passed (+24 DCF) /
14 skipped (+2 new network), 0 failed. Frontend tsc --noEmit exit 0.
Local AAPL 3-method dispersion (cached snapshot):
TBVPS = $7.26
EPS = $4.85
ROE proxy = 1.151 (buyback-shrunken-equity artifact)
FCF TTM = $129.17B
Net debt = -$45.57B (cash-rich)
Shares = 14.67B
Graham: $28.15 (deep conservative floor — rejects growth names)
RIM: $207.60 (high ROE → big residual income contribution)
DCF: $116.95 (fcf×3.79 + terminal/1.61, conservative no-growth)
Multiples: TBD (Step 5 ensemble computes peer panel)
Market: ~$200
The 3-method spread ($28 / $117 / $208) is exactly the "healthy
divergence" pattern that makes ensemble-median + outlier-guard
the right Step 5 architecture. Median of (28, 117, 208) = 117;
once Multiples lands in Step 5 (likely $150-200 range), the median
will land in the $115-150 band — meaningful, not noise.
Reason taxonomy after Step 4.5: 20 stable identifiers
(was 17 after Step 4.3). This is the FINAL state for PR-3c — Step 5
ensemble does NOT add new reasons (it pattern-matches against the
20 existing ones).
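The two-stage DCF and its terminal-g cap can be checked against the A1 reference case with a small sketch. The default wacc=0.10 is taken from that test case, not from config.DISCOUNT_RATE (whose value is not shown here); sector and stale-filing gating and FCF normalization are left out, and the function name is illustrative.

```python
from typing import Optional, Tuple


def dcf_per_share(
    fcf: float,
    shares: float,
    net_debt: Optional[float],
    wacc: float = 0.10,      # assumed default, from test case A1
    g: float = 0.03,
    years: int = 5,
    g_cap: float = 0.03,     # long-run growth cap (config.TERMINAL_GROWTH)
) -> Tuple[Optional[float], Optional[str]]:
    """Flat-FCF explicit forecast plus Gordon-growth terminal, per share."""
    # Terminal growth must respect BOTH caps.
    if g > g_cap or g > wacc - 0.01:
        return None, "terminal_g_unsafe_g_too_close_to_wacc"
    if net_debt is None:
        return None, "dcf_net_debt_unknown"  # never coerced to zero
    explicit = sum(fcf / (1.0 + wacc) ** t for t in range(1, years + 1))
    terminal = fcf * (1.0 + g) / (wacc - g) / (1.0 + wacc) ** years
    equity = explicit + terminal - net_debt
    if equity <= 0:
        return None, "dcf_negative_equity_post_debt"
    return equity / shares, None


value, _ = dcf_per_share(fcf=100e6, shares=10e6, net_debt=0.0)
print(round(value, 2))  # 129.27, the A1 reference
```

The 3.79 and 1.61 factors in the AAPL spot-check are the 5-year annuity factor and (1.10)^5 at a 10% rate.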
Largest single sub-step in PR-3c. Ensemble orchestrates the 6 fair-price
methods (Step 4) into the user-facing fair_price object; applies Defense #3
(stale-filing hard/soft), Defense #4 (multi-method outlier guard), and
Defense #2 annotations (goodwill_heavy + value_trap_risk). Writer adds the
per-stock history JSON output. Schemas extend StockSummary/StockDetail/
RawMetrics/Metadata with all new fields; types.ts mirrors.
Sub-task 5.1: compute/valuation/ensemble.py (NEW):
* EnsembleResult + FairPriceMethodResult frozen dataclasses
* METHOD_NAMES = ('graham', 'multiples_pe', 'multiples_pb',
'multiples_ev_ebitda', 'rim', 'dcf') (6 keys).
* compute_fair_price_ensemble(*, ticker, snap, sector, sub_industry,
industry, current_price, filing_lag_days_value, peer_panels,
universe_metrics, historical_metrics) → (EnsembleResult, risk_flags).
* Defense #3 stale: hard → all methods skip + risk_flag
'stale_filing_hard' returned for caller to merge into the existing
risk_flags from compute_risk_flags. Soft → annotates 'stale_filing_soft'
in valuation_warnings.
* Defense #4 outlier: per-method values > 5×current OR < 0.2×current
excluded from MAX but kept in MEDIAN (robust). Each outlier triggers an
'extreme_<method>_estimate' warning.
* Defense #2 goodwill_heavy: tbvps/bvps_reported < 0.5 →
'goodwill_heavy' warning.
* RIM value_trap_risk → 'value_trap_risk' warning when RIM skips on
ROE<Ke.
* Aggregation: median = ALL applicable values (robust); max =
non-outlier max only; low/high = full extremes; mos_pct =
(median − current) / median × 100, None when current ≤ 0 or median ≤ 0.
* Helpers: _all_methods_skipped, _classify_outliers, _aggregate_methods,
_net_debt, _bvps_reported, _convert_peer_panel (all pure, for direct
testability).
Sub-task 5.2: compute/output/writer.py extension:
* write_stock_history(*, ticker, prices_df, output_dir) → bool. Slices
prices_df.tail(min(252, len)). Outputs column-major JSON with NaN→None
coercion. Writes to output_dir/stocks/history/{TICKER}.json via existing
atomic_write_json. Returns True on success; caller sets
StockDetail.has_history accordingly.
Sub-task 5.3: schemas.py + types.ts:
* StockSummary: + valuation_warnings: list[str]
* StockDetail: + valuation_warnings, has_history, tangible_book_value
* RawMetrics: + goodwill: float | None
* Metadata: + mos_trailing_ic_smoke: float | None
* frontend/lib/types.ts: full mirror; adds FairPriceMethodResult,
FairPriceEnsemble, StockHistory types, and all the field additions above.
StockDetail.fair_price typed as FairPriceEnsemble | null (was
Record<string, unknown>).
- compute/valuation/__init__.py: re-export EnsembleResult,
FairPriceMethodResult, METHOD_NAMES, compute_fair_price_ensemble.
- tests/test_valuation/test_ensemble.py (NEW): 27 cases covering
- A (4): aggregation arithmetic: 4 methods no outliers, single method,
no applicable methods, MoS sign convention
- B (3): outlier guard: 5×/0.2× boundary semantics, multiple outliers,
strict-inequality at boundary
- C (3): stale filing: hard short-circuit + risk_flag, soft annotation,
fresh no-warning
- D (2): goodwill_heavy: flag fires below 0.5 ratio, doesn't fire above
- E (2): value_trap_risk: RIM applicable no warning, RIM skip → warning
- F (4): shape invariants: 6 method keys, FairPriceMethodResult +
EnsembleResult frozen, tier_used None for non-multiples
- G (3): edge cases: zero/negative current_price → null MoS; negative
method value triggers outlier guard
- H (3): full-ensemble integration: IT/Financials/Utilities sector
handling
- Helpers: _net_debt arithmetic, _bvps_reported, EXTREME_* constants
pinned
- tests/test_output/test_writer.py extension: 7 history-writer cases
covering 252-row slice, shorter input, empty df, None input, missing
columns, NaN coercion, payload schema keys.
Verification: ruff clean, pytest 312 passed (+34 Step 5) / 14 skipped,
0 failed. Frontend tsc --noEmit exit 0.
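The column-major payload with NaN→None coercion described for write_stock_history can be illustrated without pandas. This sketch works on a list of row dicts; the column names "date" and "close" and the helper name to_column_major are hypothetical, since the actual payload keys are not listed in this message.

```python
import json
import math
from typing import Any, Dict, List, Sequence


def to_column_major(rows: List[Dict[str, Any]], columns: Sequence[str], max_rows: int = 252) -> Dict[str, List[Any]]:
    """Keep the last `max_rows` rows and pivot them into {column: [values...]}."""
    tail = rows[-max_rows:]  # shorter inputs are returned whole

    def clean(v: Any) -> Any:
        # JSON has no NaN literal, so NaN floats become None (null).
        return None if isinstance(v, float) and math.isnan(v) else v

    return {col: [clean(r.get(col)) for r in tail] for col in columns}


rows = [
    {"date": "2025-01-02", "close": 1.0},
    {"date": "2025-01-03", "close": float("nan")},
]
payload = to_column_major(rows, ["date", "close"])
# allow_nan=False makes json.dumps raise if any NaN slipped through.
print(json.dumps(payload, allow_nan=False))
```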
Local AAPL ensemble spot-check (current=$200, empty peer panels):
graham: $28.15 applicable
multiples_pe: SKIP reason=insufficient_peers_all_tiers
multiples_pb: SKIP reason=insufficient_peers_all_tiers
multiples_ev_ebitda: SKIP reason=insufficient_peers_all_tiers
rim: $207.60 applicable
dcf: $116.95 applicable
median: $116.95
max (excl outlier): $207.60
low / high: $28.15 / $207.60
MoS: -71.01% (overvalued vs median)
warnings: ['extreme_graham_estimate']
risk_flags: []
Outlier guard works end-to-end: Graham's $28.15 (0.14× current=$200,
below 0.2× floor) is excluded from max but kept in median. Warning
'extreme_graham_estimate' surfaces in valuation_warnings.
Step 7 will populate the multiples once peer panels are built
cross-sectionally; expected median lifts into the $150-180 range for AAPL.
Reason taxonomy after Step 5: still 20 stable identifiers. Ensemble
emits 'stale_filing_hard' as a per-method reason (when filing hard-stale)
but that string is already in SKIP_REASONS from Step 4.1.
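The aggregation rules (median over all applicable values, max over non-outliers only, MoS relative to the median) can be replayed on the AAPL spot-check numbers with a minimal sketch. The aggregate function and its dict return are illustrative, not the EnsembleResult dataclass.

```python
from statistics import median
from typing import Dict, List, Optional

EXTREME_ESTIMATE_HIGH = 5.0  # config values quoted in Step 1
EXTREME_ESTIMATE_LOW = 0.2


def aggregate(values: Dict[str, float], current: float) -> Dict[str, object]:
    """Combine applicable per-method values into median / max / low / high / MoS."""
    if not values:
        return {"median": None, "max": None, "low": None, "high": None,
                "mos_pct": None, "warnings": []}
    warnings: List[str] = []
    non_outliers: List[float] = []
    for name, v in values.items():
        # Strict inequalities: exactly 5x or 0.2x current is NOT an outlier.
        if v > EXTREME_ESTIMATE_HIGH * current or v < EXTREME_ESTIMATE_LOW * current:
            warnings.append(f"extreme_{name}_estimate")
        else:
            non_outliers.append(v)
    med = median(values.values())  # outliers stay in the median
    mos: Optional[float] = None
    if current > 0 and med > 0:
        mos = (med - current) / med * 100.0
    return {
        "median": med,
        "max": max(non_outliers) if non_outliers else None,
        "low": min(values.values()),
        "high": max(values.values()),
        "mos_pct": mos,
        "warnings": warnings,
    }


result = aggregate({"graham": 28.15, "rim": 207.60, "dcf": 116.95}, current=200.0)
print(result["median"], round(result["mos_pct"], 2), result["warnings"])
# 116.95 -71.01 ['extreme_graham_estimate']
```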
…extended
Smallest defense step (~70 LOC module + 250 LOC tests). Extends
Greenblatt 2005's Magic Formula sector exclusion (Financials +
Utilities, where EBIT/EV is meaningless because the balance sheet is
reserves+regulated capital) to ALL Quality pillar metrics that depend
on EBIT, gross profit, or invested capital.
Pre-existing state surveyed before implementation:
- compute/scoring/sector_rules.py did NOT exist (created fresh).
- compute/features/quality.py ROIC and gross_profitability had NO
sector gating despite the spec's claim of pre-existing exclusions.
- compute/features/profitability.py asset_turnover docstring CLAIMED
to exclude Financials but the function had no actual gate.
This step fixes the documentation-stub debt by wiring the actual
gate at the pillar-wrapper layer.
Design decision — sector gate lives in pillar wrapper, not feature
function:
- Feature functions (compute/features/*) stay sector-agnostic and
pure. Easier to unit test in isolation.
- Pillar wrappers (compute/scoring/pillars.py) apply the sector
context. is_metric_excluded_for_sector() is consulted post-compute,
with the metric value replaced by NaN when the rule fires.
- Existing pillar aggregation (compute.scoring.normalize.average_
pillar_score with min_coverage=0.5) handles NaN gracefully — drops
the gated metric, averages survivors. Per SKILL.md Rule 7.
- compute/scoring/sector_rules.py (NEW):
* SECTOR_BLACKLIST: dict[str, frozenset[str]] with 5 entries:
magic_formula → {Financials, Utilities, Real Estate}
asset_turnover → {Financials}
ebit_based_roic → {Financials, Utilities} [NEW]
gross_profitability → {Financials} [NEW]
ev_ebitda_multiple → {Financials} [NEW]
* is_metric_excluded_for_sector(*, metric, sector) → bool.
Returns False for None sector, missing metric, or non-listed
sector — never second-guesses callers.
- compute/scoring/pillars.py:
* _quality_metrics: gate roic for ebit_based_roic; gate
gross_profitability for gross_profitability rule.
* _value_metrics: gate ev_ebitda for ev_ebitda_multiple.
* _profitability_metrics: gate asset_turnover (existing rule, now
enforced) and gross_p (Profitability pillar's GP/A — same metric
as Quality.gross_profitability, both must gate).
- tests/test_scoring/test_sector_rules.py (NEW): 11 cases:
- A1-A4: lookup primitives (existing entry → True; sector not
blacklisted, metric not blacklisted, and None sector → False).
- 5 spec-coverage assertions (each canonical key + its blacklist set).
- REIT NOT excluded from ebit_based_roic (Phase 4 will add FFO).
- All blacklist values are frozensets (immutable contract).
- tests/test_scoring/test_pillars.py extension: 14 cases:
- B (3): JPM/NEE Financials/Utilities ROIC → NaN; AAPL IT → finite.
- C (4): JPM gross_profitability NaN in Quality; same metric in
Profitability (gross_p) also gated; AAPL finite for both.
- D (3): JPM ev_ebitda NaN; AAPL finite; Utilities NOT excluded
(mirrors Step 4.1 fair-price applicability semantics).
- E (2): asset_turnover JPM NaN, AAPL finite.
- F (2): pillar score remains finite when some Quality metrics
are NaN; full universe with Financials does not crash.
Verification: ruff clean, pytest 338 passed (+26 Step 6: 11 sector_rules
+ 14 pillars + 1 fixed import) / 14 skipped, 0 failed. Frontend tsc
--noEmit exit 0.
Reason taxonomy: NO additions. Sector exclusion is a feature-pillar
concern, not a fair-price applicability concern. The Step 4.1
SKIP_REASONS taxonomy already has sector_excluded_financials and
sector_excluded_utilities for the FAIR-PRICE side; those operate on
applicability gates. Step 6 mirrors at the pillar layer with NaN
replacement (different mechanism). Two layers, one canonical sector
taxonomy.
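The pillar-layer gate described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `SECTOR_BLACKLIST` name, the `gate` helper, and the sector sets are assumptions inferred from the test cases listed in this commit (for example, Utilities excluded from ROIC but not from EV/EBITDA).

```python
from __future__ import annotations

import math

# Hypothetical blacklist. Keys follow the rule names in the commit message;
# the sector sets are inferred from the listed test cases, not copied from the repo.
SECTOR_BLACKLIST: dict[str, frozenset[str]] = {
    "ebit_based_roic": frozenset({"Financials", "Utilities"}),
    "gross_profitability": frozenset({"Financials"}),
    "ev_ebitda_multiple": frozenset({"Financials"}),
    "asset_turnover": frozenset({"Financials"}),
}


def is_metric_excluded_for_sector(*, metric: str, sector: str | None) -> bool:
    """Return True only when the sector is explicitly blacklisted for the metric."""
    if sector is None:
        return False
    # Unknown metrics fall back to an empty set, so they are never excluded.
    return sector in SECTOR_BLACKLIST.get(metric, frozenset())


def gate(value: float, *, metric: str, sector: str | None) -> float:
    """Pillar-layer gate (hypothetical helper): excluded metrics become NaN."""
    if is_metric_excluded_for_sector(metric=metric, sector=sector):
        return math.nan
    return value
```

Returning NaN rather than dropping the ticker keeps the stock in the pillar ranking on its remaining metrics, which matches the "pillar score remains finite" cases in group F.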
Phase 4 follow-up: add REIT FFO/AFFO substitutes for the Quality
pillar on Real Estate stocks. Currently filed as a post-PR-3c-merge
issue; PR-3c keeps Real Estate fully in the Quality ranking with no
new exclusions.
Defense scorecard at end of Step 6:
- VETOES: 3 (altman_distress + sloan_accruals_top_decile +
net_issuance_top_decile) — UNCHANGED from Step 2.
- GUARDS: 4 — stale_filing (3a from Step 5), outlier_5x (3a
from Step 5), terminal_g (3a from Step 4.5),
sector_exclusion (NEW this step, pillar layer).
- ANNOTATE: 2 active (goodwill_heavy, value_trap_risk; both from
Step 5 ensemble) + 1 hidden (extreme_*_estimate; emitted
by ensemble outlier guard).
…compute/main.py
Step 7 is THE BIG STEP per the PR-3c plan: production rankings.json now
contains fair_price values for the first time. Spot-checks scheduled for
Step 12 will verify behavior is sane.
Changes
-------
compute/main.py
- Add cross-sectional builders (one pass over the whole universe before
the per-ticker loop):
* _build_universe_metrics — per-ticker P/E TTM, P/B reported,
EV/EBITDA TTM (feeds compute_peer_medians).
* _build_peer_groupings — sub_industry / sector / broad-ex-Fin-Util
tier dicts for the 4-tier peer-median walk. The "industry" GICS
level-2 tier receives an empty list since Wikipedia exposes only
level-1 + level-3 — this falls through to sector by design.
* _build_historical_metrics — per-ticker eps_3y_avg, avg_3y_roe,
fcf_5y from the annual fundamentals history. avg_3y_roe uses the
current-period equity denominator (Phase 4 follow-up: backfill
historical equity for true per-year ROE).
* _filing_lag — days between asof and snapshot.latest_filed_date.
- Combine the previously separate StockSummary and StockDetail loops
into a single per-ticker pass that:
* computes the fair-price ensemble (compute_fair_price_ensemble),
* merges its returned risk_flags (e.g. stale_filing_hard) into the
existing risk_flags dict from compute_risk_flags,
* writes the per-stock 1y price-history JSON via write_stock_history,
* populates StockSummary.{fair_price, max_fair_price,
margin_of_safety_pct, valuation_warnings},
* populates StockDetail.{fair_price (full ensemble dict),
valuation_warnings, has_history, tangible_book_value}.
- Update _build_raw_metrics to populate goodwill on RawMetrics.
- Add Metadata.mos_trailing_ic_smoke = None placeholder (Step 8 will
compute the actual sanity-check value).
- Best-effort psutil RSS log at end of run (try/except ImportError so
production keeps working without the optional dep).
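As a rough illustration of the 4-tier peer-median walk that these builders feed, the sketch below falls through tiers until one has enough peers. The tier labels other than `sub_industry`, the function name, and the choice to exclude the ticker itself from its own peer set are assumptions made for this example; only `MULTIPLES_MIN_PEERS = 8` comes from the PR description.

```python
from __future__ import annotations

from statistics import median

MULTIPLES_MIN_PEERS = 8  # constant named in the PR description

# Assumed tier labels, narrowest first.
TIER_ORDER = ("sub_industry", "industry", "sector", "broad")


def peer_median_walk(ticker, metric_by_ticker, groupings, min_peers=MULTIPLES_MIN_PEERS):
    """Return (median, tier_used) from the first tier with enough valid peers.

    groupings maps tier name -> list of peer tickers for this stock.
    Returns (None, None) when no tier reaches min_peers.
    """
    for tier in TIER_ORDER:
        peers = groupings.get(tier, [])
        values = [
            metric_by_ticker[p]
            for p in peers
            if p != ticker and metric_by_ticker.get(p) is not None
        ]
        if len(values) >= min_peers:
            return median(values), tier
    return None, None
```

An empty `industry` list simply fails the `min_peers` check, so the walk continues to `sector`, which is the fall-through behaviour the commit describes.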
compute/valuation/ensemble.py
- New ensemble_result_to_dict(r: EnsembleResult) -> dict serializer
whose shape mirrors the FairPriceEnsemble TypeScript type. The
returned valuation_warnings list is a copy (mutation-safe).
compute/valuation/__init__.py
- Re-export ensemble_result_to_dict.
tests/test_main.py (new, 21 cases)
- Cover _filing_lag, _build_universe_metrics, _build_peer_groupings,
_build_historical_metrics, _eps_3y_avg, _avg_3y_roe, _fcf_5y.
- Lock the contract of the cross-sectional builders so future refactors
surface input/output-shape changes loudly.
tests/test_valuation/test_ensemble.py (+3 cases I1/I2/I3)
- Verify ensemble_result_to_dict shape matches the TS type, handles the
all-null case (every method skipped), and returns a defensive copy of
valuation_warnings.
Verification
------------
- ruff check passes on all changed files.
- pytest tests/ -m "not network": 364 passed (was 343 before — +21 new
test_main cases, +3 ensemble I-series cases; -3 reflects no removals).
- npx tsc --noEmit: clean.
What's NOT in this commit
-------------------------
- Real-data spot-checks (Step 12).
- mos_trailing_ic_smoke computation (Step 8).
- Schema snapshot guard (Step 9).
- Frontend PriceHistoryChart wiring (Step 10).
https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2
…(Defense #7)
Step 7 production verification on commit c13e4f7 (workflow #9) showed
the existing Phase-3 fundamentals ingest layer corrupts
shares_outstanding for ~11 S&P 500 tickers (CVNA, HOOD, DDOG, CRWD,
VTRS, BKR, PSKY, AMCR, SPG, CHTR, RTX). The bug propagates into TBVPS /
market_cap / multiples and surfaced as user-visible nonsense, most
prominently BKR fair_price = $105M inside the production Top-5.
Step 7.5 adds an ensemble-level sanity ceiling so corrupted inputs
produce a clean null + warning instead of garbage. The upstream
ingestion bug is tracked separately (issue draft staged at
/tmp/issue_drafts/) and will be fixed in Phase 4.
Changes
-------
compute/config.py
- Add FAIR_PRICE_DATA_QUALITY_CEILING = 10000.0. Rationale: no S&P 500
stock has a sensible per-share fair price > $10,000 (BRK-A trades
~$700K but is not in the index; BRK-B is). Above this ceiling,
inputs are corrupted by definition.
compute/valuation/applicability.py
- Add data_quality_input_corruption to SKIP_REASONS taxonomy
(20 -> 21 entries). Stable identifier surfaced via
StockDetail.fair_price.methods.<method>.reason and inside
StockDetail.valuation_warnings.
compute/valuation/ensemble.py
- New _has_corrupt_input(methods) helper: returns True iff any
applicable method produced a value > the ceiling (strict >).
- New _data_quality_corrupt_result(methods) helper: builds the
all-null EnsembleResult, preserving tier_used on the multiples
methods for diagnostics, with a single
data_quality_input_corruption warning.
- Wire the sanity sweep into compute_fair_price_ensemble after the 6
methods compute but BEFORE outlier classification + aggregation.
No risk_flags are appended (data quality is an upstream-ingest
concern, not a ranking veto).
frontend/lib/types.ts
- Document the reason taxonomy on FairPriceMethodResult, including the
new data_quality_input_corruption entry. No schema shape change:
it's a string member of the existing union.
tests/test_valuation/test_ensemble.py (+4 cases)
- test_data_quality_sanity_guard_triggers_on_extreme_method_value:
one method at $10,001 nulls all 6 + emits warning + empty
risk_flags + preserves tier_used.
- test_data_quality_guard_boundary_exactly_at_ceiling: $10,000
exactly does NOT trip (strict >).
- test_data_quality_guard_skipped_methods_dont_trigger:
applicable=False methods bypass the check.
- test_data_quality_guard_end_to_end_via_full_ensemble: integration
through compute_fair_price_ensemble with a corrupted snapshot
(shares_outstanding=10) yields the canonical all-null payload.
Verification
------------
- ruff check passes on all changed files.
- pytest tests/ -m "not network": 368 passed (was 364 -> +4 new).
- npx tsc --noEmit: clean.
- All 9 F1 tickers (median > 10x current) from the Step 7 verification
report should now show fair_price=null with the new warning. Will
re-confirm via workflow_dispatch re-trigger.
Issue drafts staged (not yet filed)
-----------------------------------
/tmp/issue_drafts/issue_shares_outstanding_bug.md
/tmp/issue_drafts/issue_avg_3y_roe_denominator.md
/tmp/issue_drafts/issue_mos_display_clamping.md
What's NOT in this commit
-------------------------
- Fix to compute/ingest/fundamentals.py (deferred to Phase 4; issue
draft staged).
- Fix to compute/main.py::_avg_3y_roe denominator (Phase 4 follow-up).
- Frontend mos_pct clamping (Step 10).
https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2
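The strict-ceiling check from the Step 7.5 commit above can be sketched as below. The `MethodResult` dataclass is a hypothetical stand-in for whatever per-method result type the ensemble uses, and the function name drops the leading underscore to signal that this is an illustration rather than the repository's helper.

```python
from __future__ import annotations

from dataclasses import dataclass

FAIR_PRICE_DATA_QUALITY_CEILING = 10000.0  # value stated in the commit message


@dataclass(frozen=True)
class MethodResult:
    """Hypothetical per-method result: only the two fields the check needs."""

    applicable: bool
    value: float | None


def has_corrupt_input(methods: dict[str, MethodResult]) -> bool:
    """True iff any applicable method produced a value strictly above the ceiling."""
    return any(
        m.applicable
        and m.value is not None
        and m.value > FAIR_PRICE_DATA_QUALITY_CEILING
        for m in methods.values()
    )
```

Skipped methods are ignored because their values were never meant to be aggregated, and the strict `>` leaves a value of exactly $10,000 untouched, matching the boundary test listed above.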
Adds a single cross-sectional sanity smoke test that runs once per
weekly compute and surfaces a number on Metadata.mos_trailing_ic_smoke.
This is NOT a backtest — it's a same-day Spearman rank correlation
between StockSummary.margin_of_safety_pct and trailing 1-year return,
intended for operators to spot-check that the fair-price ensemble is
producing values with some signal-to-noise relative to historical
price moves. A non-trivial correlation (positive OR negative) is
informative; near-zero says "the field is essentially uncorrelated
with recent return drift."
Changes
-------
compute/scoring/sanity.py (NEW, ~95 LOC)
- compute_mos_trailing_ic(*, rankings, prices_by_ticker, lookback_days=252)
-> float | None
- Skips per-ticker:
* mos_pct is None (ensemble couldn't aggregate)
* data_quality_input_corruption in valuation_warnings
(Step 7.5 / Defense #7 fired — explicit guard against future
regressions where the warning is surfaced even with a non-null
mos_pct)
* ticker missing from prices_by_ticker
* len(prices) < lookback_days
* lookback or trailing close <= 0 (defensive)
- Returns None when:
* Fewer than MIN_SAMPLE=30 valid pairs after filtering
* All mos_pct values identical (corr undefined)
* All trailing returns identical (corr undefined)
* Computed coefficient is non-finite (defensive)
- Spearman implemented as Pearson on the rank vectors via
pd.Series.rank() — mathematically identical to scipy.stats.spearmanr
but doesn't require adding scipy to project dependencies.
- Heavy-tail rationale documented: 143/502 stocks had mos_pct outside
[-99%, +500%] in Step 7 verification; Pearson would be dominated
by those outliers.
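To make the "Pearson on the rank vectors" idea concrete, here is a stdlib-only sketch. It is deliberately not the pandas-based implementation described above: the helper names are invented for this example, and average ranks for ties are used because that is the default behaviour of `pd.Series.rank()`.

```python
from __future__ import annotations

import math

MIN_SAMPLE = 30  # threshold named in the commit message


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation, or None when either side has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    return r if math.isfinite(r) else None


def spearman_smoke(mos, returns, min_sample=MIN_SAMPLE):
    """Spearman rank correlation of already-filtered pairs, or None."""
    if len(mos) != len(returns) or len(mos) < min_sample:
        return None
    return pearson(average_ranks(mos), average_ranks(returns))
```

Because only ranks enter the correlation, a stock with a +5000% margin of safety contributes no more than the stock ranked just below it, which is the heavy-tail argument made in the bullet above.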
compute/main.py
- Wire compute_mos_trailing_ic into run_weekly_compute, called once
after the per-ticker loop builds the summaries list, before
Metadata construction.
- metadata.mos_trailing_ic_smoke = compute_mos_trailing_ic(...)
- INFO log of the result.
tests/test_scoring/test_sanity_smoke.py (NEW, 16 cases)
- A1/A2/A3: math correctness (perfect rank corr = 1.0; perfect
inverse = -1.0; random pairs in [-1, 1]).
- B1/B2/B3: sample-size threshold — 29 → None, 30 → float, 100 → float.
- C1/C2: identical mos / identical returns → None.
- C3/C4: None mos / data_quality_input_corruption skipped.
- D1/D2: insufficient prices / missing ticker → skipped.
- E1: zero/negative lookback close → skipped.
- F1: integration — 50-stock universe with 5 corruption-warning
tickers + 5 None-mos tickers + 40 valid pairs → finite Spearman.
- 2 module-constant regression tests (DEFAULT_LOOKBACK_DAYS=252,
MIN_SAMPLE=30).
Verification
------------
- ruff check passes on all changed files.
- pytest tests/ -m "not network": 386 passed (was 368 -> +18:
16 new sanity tests + 2 constant regressions).
- npx tsc --noEmit: clean.
What's NOT in this commit
-------------------------
- Production verification of the actual smoke value — deferred to
Step 12 final verification along with the rest of the schema.
- METHODOLOGY.md note about smoke-test-not-backtest distinction —
Step 11 documentation work.
https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2
Catches silent schema drift between the Pydantic output schemas in
compute/output/schemas.py and the TypeScript types in
frontend/lib/types.ts at CI time. PR-3c added 6 new fields + 3 new
types; the guard makes future drift impossible to merge accidentally.
Changes
-------
compute/output/schema_check.py (NEW, ~245 LOC)
- generate_snapshot() emits a deterministic dict
{ModelName: {field_name: {type, required, default}}}, alphabetized
at both levels, with human-readable type strings ("float | None",
"list[str]", "dict | None") and JSON-safe default representations
(None / scalars verbatim; <required> sentinel; <factory:list> /
<factory:PillarScores> for default_factory; <repr:...> for
anything else).
- check_snapshot() compares stored vs fresh, returns
(in_sync: bool, diff: str | None) with a grouped-by-model diff:
added / removed / changed fields per model.
- update_snapshot() writes frontend/lib/schema-snapshot.json with
trailing newline.
- main(argv) — CLI entry point. Exit codes: 0 in-sync, 1 drift, 2
unexpected error. Prints a human-readable diff + resolution
instructions on drift.
- Tracks 6 models: DataQuality, Metadata, PillarScores, RawMetrics,
StockDetail, StockSummary.
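The grouped-by-model comparison that check_snapshot() performs could look something like the sketch below. It operates on plain dicts in the `{ModelName: {field_name: {type, required, default}}}` shape described above; the function name and the `+` / `-` / `~` diff markers are assumptions for illustration, not the module's actual output format.

```python
from __future__ import annotations


def diff_snapshots(stored: dict, fresh: dict):
    """Compare two snapshots; return (in_sync, diff_text or None)."""
    lines: list[str] = []
    for model in sorted(set(stored) | set(fresh)):
        old = stored.get(model, {})
        new = fresh.get(model, {})
        added = sorted(set(new) - set(old))
        removed = sorted(set(old) - set(new))
        changed = sorted(f for f in set(old) & set(new) if old[f] != new[f])
        if not (added or removed or changed):
            continue
        lines.append(f"{model}:")
        lines += [f"  + {f}: {new[f]['type']}" for f in added]
        lines += [f"  - {f}" for f in removed]
        lines += [f"  ~ {f}: {old[f]['type']} -> {new[f]['type']}" for f in changed]
    if not lines:
        return True, None
    return False, "\n".join(lines)
```

Sorting at both levels keeps the diff deterministic, so the same drift always prints the same message in CI.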
frontend/lib/schema-snapshot.json (NEW, 7.6 KB / 374 lines)
- Initial snapshot generated from the schemas as of c346ed5.
- Confirms the 6 new Phase-3c fields are tracked: StockSummary
fair_price / max_fair_price / margin_of_safety_pct /
valuation_warnings; StockDetail fair_price / valuation_warnings /
has_history / tangible_book_value; Metadata mos_trailing_ic_smoke;
RawMetrics goodwill.
.github/workflows/ci.yml
- New "Schema snapshot guard" step in the python job, between Ruff
and Pytest. Runs `python -m compute.output.schema_check` — fails
the job on drift before pytest spends time.
tests/test_output/test_schema_check.py (NEW, 23 cases)
- A1-A5: snapshot generation contract (top-level keys, alphabetical
ordering, required field-info keys, TRACKED_MODELS canary that
catches new BaseModel subclasses missing from the registry).
- B1-B2: round-trip semantics (update then check is in-sync;
mutating the file triggers drift).
- C1-C3: diff message quality (added/removed/changed fields appear
with the model name; type changes show old → new).
- D1-D4: CLI surface (--update-snapshot writes; no-arg in-sync
returns 0; drift returns 1 with resolution text; missing file
surfaces a helpful message).
- E1-E3: critical Phase-3c fields are tracked (regression guard).
- F1: the committed snapshot matches the live schemas (a pytest-side
mirror of the CI check, so test runs catch drift early too).
- 4 misc: factory/required normalization sentinels, invalid-JSON
handling, parametrized smoke for both CLI modes.
Manual break/revert verification (Step 9.4)
-------------------------------------------
1. Injected a dummy field into a tracked Pydantic model.
2. Ran `python -m compute.output.schema_check` → exit 1, diff printed
the model + field name + new type info, plus resolution
instructions ("update types.ts then run --update-snapshot OR
revert").
3. Restored schemas.py → re-ran → exit 0, "in sync" message.
4. `diff` confirms schemas.py is byte-identical to the original.
Verification
------------
- ruff check passes on the entire repo.
- pytest tests/ -m "not network": 409 passed (was 386 -> +23 new
schema_check cases).
- npx tsc --noEmit: clean.
- python -m compute.output.schema_check: exit 0, "in sync".
What's NOT in this commit
-------------------------
- Production rankings.json change (this is a CI-time guard only).
- Frontend updates for Step 10 (PriceHistoryChart, mos clamping
per Issue 3).
https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2
… chart
First user-visible PR-3c step. Surfaces the new ensemble fields in the
rankings table + stock detail page, adds a lazy-loaded 1y price chart
backed by the per-stock history JSONs from Step 5, and clamps the
display of extreme MoS values per the Step-7 verification report
(Issue 3 acceptance criteria).
Changes
-------
frontend/lib/format.ts (NEW, ~55 LOC)
- formatMosPct(mos | null) → { display, tooltip, isClamped }:
* null → "—" (no tooltip)
* mos < -99 → "< −99%" (tooltip shows raw value)
* mos > 500 → "> +500%" (tooltip shows raw value)
* else → "+50.0%" / "-25.0%" with sign + 1 dp
- formatFairPrice(value | null):
* null → "—"
* value >= 1_000_000 → "—" (defensive against future regressions
even after Step 7.5 sanity guard)
* value < 0.01 → "< $0.01"
* else → "$NNN.NN"
- mosColorClass(mos): semantic palette tokens (emerald 700/600 for
undervalued, slate 500 for near-fair, rose 600 for overvalued).
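The clamping rules for formatMosPct are easy to get subtly wrong at the boundaries, so here they are transcribed into Python purely for illustration (the shipped helper is TypeScript). The `MosDisplay` tuple, the snake_case names, and the choice of no tooltip in the unclamped case are assumptions; the thresholds and display strings follow the list above.

```python
from __future__ import annotations

from typing import NamedTuple


class MosDisplay(NamedTuple):
    display: str
    tooltip: str | None
    is_clamped: bool


def format_mos_pct(mos: float | None) -> MosDisplay:
    """Python transcription of the display rules listed for formatMosPct."""
    if mos is None:
        return MosDisplay("\u2014", None, False)  # placeholder dash, no tooltip
    raw = f"{mos:+.1f}%"  # explicit sign, one decimal place
    if mos < -99:
        return MosDisplay("< \u221299%", raw, True)  # tooltip carries the raw value
    if mos > 500:
        return MosDisplay("> +500%", raw, True)
    return MosDisplay(raw, None, False)
```

Both comparisons are strict, so values of exactly -99 and +500 are shown unclamped.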
frontend/components/RankingTable.tsx
- Two new sortable columns: "Fair price" + "MoS".
- Sort comparator now places nulls last, so rows without data no
longer rise to the top of an ascending sort.
- New column-default sort: composite_score / fair_price /
margin_of_safety_pct → desc; everything else → asc.
- Data-quality flag rendering: stocks with valuation_warnings
including "data_quality_input_corruption" show "⚠ —" with a
title tooltip naming the Step 7.5 sanity guard, instead of the
fair-price number.
- Mobile cards: third row added for "Fair $X · MoS ±Y%".
frontend/components/PriceHistoryChart.tsx (NEW, ~125 LOC)
- "use client" component; lazy fetch via useEffect from the static
/data/stocks/history/{TICKER}.json files written by Step 5.
- Honors NEXT_PUBLIC_BASE_PATH for sub-path deploys.
- Loading + error + empty states all render at h-64 to prevent
layout shift on mount.
- Column-major → row-major transformation; null closes are dropped
(preserves the gap rather than drawing through it).
- Recharts LineChart with monotone interpolation, $-axis ticks,
tooltip showing date + close to 2 dp.
frontend/components/FairPriceCard.tsx (NEW, ~150 LOC)
- 4-stat headline grid: median fair / margin of safety / max
(ex-outliers) / tangible BVPS.
- Per-method breakdown table: Graham, P/E, P/B, EV/EBITDA, RIM, DCF.
Each row shows formatted value or italic "skipped" with the
reason in the title attribute (so hovering surfaces the
applicability gate). Multiples methods get a small label
showing the peer tier used ("vs sub_industry peers" etc).
- Warning chips below the table for each entry in
valuation_warnings — amber pills, snake_case → spaces.
- Renders gracefully when ensemble is null (snapshot missing) or
data_quality_input_corruption fires (replaces median/max with
em-dash).
frontend/app/stock/[ticker]/page.tsx
- Hero block adds an MoS chip below the price (small, color-coded
via mosColorClass).
- New "Price (1y)" section above the fundamentals: renders
PriceHistoryChart when detail.has_history is true, otherwise an
h-64 placeholder.
- New FairPriceCard section between Price and the existing Raw
fundamentals table.
- Footer note updated from Phase 3b → Phase 3c with one sentence
describing the 6-method ensemble.
What's NOT in this commit
-------------------------
- Frontend test framework (none configured; CI's `tsc --noEmit`
+ `next build` are the type-correctness guarantees).
- New top-level npm dependencies (recharts already in package.json).
- Any compute/* changes — frontend-only commit.
- Issue 3 polish (sparkbar / shadcn Tooltip primitive) — the
current implementation uses native title attributes per spec
("native title is acceptable for now").
Verification
------------
- `npx tsc --noEmit`: clean.
- `npm run build`: ✓ 506 routes pre-rendered (1 home + 1 not-found
+ 502 stock detail pages + 2 misc), route /stock/[ticker] is
98.9 kB (recharts + ensemble card; same chunked JS shared across
all 502 pages).
- `ruff check .`: clean (entire repo).
- `python -m compute.output.schema_check`: in-sync (no schema
changes, just consumes existing fields).
- `pytest tests/ -m "not network"`: 409 passed (no Python-side
changes; sanity check that nothing regressed).
https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2
Documentation-only commit. No code changes.
PHASE_STATUS.md (+69 lines)
- PR 3c row flipped to ✅ DONE 2026-05-09 with full description:
7 defenses delivered (the original 6 + Defense #7 added mid-PR at
Step 7.5), all schema additions, frontend wire-up scope, and
production verification stats.
- Defense scorecard table updated: now-vs-v1.0 view across all 3
defense layers.
- New "Phase 3c verified production stats" + "Phase 3c acceptance
checklist" subsections (parallel to the existing Phase 1/2 blocks).
- Phase 3 sub-PR plan: 3c moved from 🟡 NEXT → ✅ DONE; 3d/3e ETAs
updated.
- v1.0 ETA: ~2-3 days remaining (3d + 3e).
SKILL.md (net +3 lines, but conceptual change)
- Schema-versions section converted from bullet list to a 3-column
Markdown table (Schema / Phase / What changed). Phase 3c row
documents all 7 defenses, every new schema field, the
schema-snapshot CI guard, and the 21-entry reason taxonomy.
docs/METHODOLOGY.md (+178 lines)
- New "Fair-price ensemble" section: per-method table, aggregation
semantics, why-median-not-mean, why-dispersion-matters, sign
convention citing Damodaran 2012.
- New "Defense layer" section: 3 vetoes / 5 numerical guards / 5+
annotate-only flags as separate tables with sources cited (Altman,
Sloan, Pontiff-Woodgate, Penman, Damodaran, Greenblatt).
Annotate-vs-veto philosophy spelled out with a Q/A grid.
- New "Sanity tests" subsection: NOT-a-backtest disclaimer,
Spearman-not-Pearson rationale, null-return semantics, references
Phase 4+ for real predictive validation.
- Composite-weights table preserved; Realistic-expectations +
Honest-limits sections preserved verbatim from the prior doc.
What's NOT in this commit
-------------------------
- WORKFLOW.md changes (Phase 3c was already documented as the roadmap
target; the commit is the execution flip in PHASE_STATUS).
- README.md disclaimer block (Phase 3e adds the Honest Limitations
section per the kickoff spec).
- stock_ranking_knowledge.md (separate authoritative reference,
unchanged in PR 3c).
Verification
------------
- ruff check . → clean
- python -m compute.output.schema_check → in-sync (no schema changes)
- pytest tests/ -m "not network" → 409 passed
- File sizes: PHASE_STATUS 183→252; SKILL 569→572; METHODOLOGY 58→236
https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2
…hema-table cleanup
Documentation-only follow-up to commit d648b1d. Four fixes:
1. SKILL.md, schema versions table, Phase 3c row. The "(×4)" count for
extreme_<method>_estimate was wrong: the outlier guard applies to all
6 ensemble method slots (graham, multiples_pe, multiples_pb,
multiples_ev_ebitda, rim, dcf), though in practice only 2-3 fire per
stock. Replaced with "(×6 method slots — <method> is one of …; in
practice 2-3 fire per stock)" so the slot enumeration is explicit.
(The literal `<method>` placeholder was already correct in the file;
the earlier review paste lost it to markdown HTML parsing.)
2. METHODOLOGY.md, Annotate-only flags section. Expanded the
extreme_<method>_estimate bullet to enumerate the 6 method names and
spell out the [0.2x, 5x] outlier band + "kept in MEDIAN, excluded
from MAX" semantics, matching Defense #4 in the Numerical guards
table.
3. METHODOLOGY.md, Active vetoes table. Updated Altman citation from
"Altman 1968" to "Altman 1968, Hotchkiss 2003 update for
non-manufacturers". The Z″ < 1.10 threshold comes from the 2003
update in Altman & Hotchkiss (Corporate Financial Distress and
Bankruptcy, 3rd ed., Wiley), not the original 1968 paper. Matches the
citation already in stock_ranking_knowledge.md §1.2.
4. SKILL.md, schema versions table, Phase 4-8 rows removed. Replaced
with a one-line note pointing to WORKFLOW.md "Defense Roadmap" as the
single source of truth for unshipped schemas. Reason: the prior
Phase 4-8 descriptions reflected the pre-Defense-Playbook roadmap and
didn't match the post-2026-05-08 WORKFLOW.md updates (Issue #7 Sloan
fix, shares_outstanding ingestion fix, REIT FFO/AFFO substitutes,
cross-source validator, IC decay monitor, Bao-Ke ML fraud overlay,
MAPIE conformal). The Phase 8 row also conflicted with the current
"production hardening" target (S&P 1500 expansion deferred beyond
v2.0). Schema table now documents shipped schemas only; roadmap doc
owns the rest.
Verification
------------
- ruff check . → clean
- python -m compute.output.schema_check → in-sync
- pytest tests/ -m "not network" → 409 passed
- grep checks confirm both files now match the review spec.
https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2
dackclup pushed a commit that referenced this pull request on May 10, 2026
Step 3 of PR 3d. Adds the 8-K event scoring module that backs:
- Defense #9 (Item 4.02 "Non-Reliance on Previously Issued Financial
Statements"): HARD VETO. Joins altman / sloan / NSI as the 4th
active veto at v1.0.
- Defense #10 (Item 4.01 "Changes in Registrant's Certifying
Accountant"): annotate-only. Reg S-K Item 304 mandates the same
disclosure for benign reasons, so false-positive rate is too high
for veto.
Both defenses surface in StockDetail.tier2_events (Pydantic field lands
in Step 4) and the user-visible flag list. Per SKILL.md Rule 16,
neither modifies the composite score.
Changes
-------
compute/config.py (+11 LOC)
- New constants:
* EDGAR_8K_CACHE_DIR = CACHE_DIR / "edgar_8k"
* EDGAR_8K_CACHE_TTL_SECONDS = 7 * 86400
* EDGAR_8K_ITEM_TEXT_EXCERPT_CHARS = 500
compute/scoring/eight_k_events.py (NEW, ~310 LOC)
- ItemFlag frozen dataclass: return shape for both check_* funcs.
Fields: fired (bool), filing_date (str|None), filing_url (str|None),
raw_item_text (str|None, ≤ EXCERPT_CHARS).
- fetch_recent_8k_filings(ticker, lookback_days) -> list[dict] | None.
Wraps edgartools' Company.get_filings(form="8-K", filing_date=...);
parses each filing via filing.obj() (returns EightK with .items
attribute returning list[str] like ["Item 5.02", "Item 9.01"]);
extracts item-text excerpts from EightK.sections (best-effort: shape
varies across edgartools versions, gracefully degrades to empty
excerpts).
- Returns None on EDGAR rate-limit / network failure / missing
identity / ticker-not-found. Returns [] on successful fetch with
zero 8-Ks in window.
- check_non_reliance(ticker): Item 4.02, 365-day lookback.
- check_auditor_change(ticker): Item 4.01, 730-day lookback.
- Both accept optional `filings=` kwarg for unit-test injection.
- Most-recent match wins when multiple 4.02 / 4.01 fire in window.
- Item-number regex is word-boundary-anchored on both sides
("\bItem\s+4\.\s*02\b") so "Item 4.020" does NOT match "Item 4.02".
- _ensure_edgar_identity is lazy (logged warning, not RuntimeError) on
missing EDGAR_USER_AGENT. Tier-2 features are non-fatal, unlike
fundamentals.
Cache layer (inlined in eight_k_events.py, ~80 LOC)
- JSON-on-disk at compute/cache/edgar_8k/<ticker>.json (gitignored by
existing compute/cache/ rule).
- 7-day TTL, safe because 4.02/4.01 events are sticky once filed
(they don't disappear).
- Cache hit requires cached_lookback >= requested_lookback (so a 365d
entry can't serve a 730d request).
- Atomic write via tmp + os.replace.
- Corrupt JSON / unparseable timestamps treated as miss (logged warn).
- Filename ticker-sanitized via [^A-Za-z0-9_-] regex (BRK-B works,
path-traversal attempts neutralized).
- invalidate_cache(ticker): public helper, idempotent.
tests/test_scoring/test_eight_k_events.py (NEW, 28 cases: 25
unit/cache + 3 @network)
- A1-A14: synthetic Filing fixture tests (item parsing, lookback
windows, multiple matches, case variants, excerpt truncation, frozen
dataclass, item-number boundary precision).
- B1-B6: cache layer (miss → fetch, hit → no fetch, expired → refetch,
invalidate, corrupt JSON, lookback-undersize miss).
- 2 ticker-path safety tests (BRK-B preservation, path traversal).
- C1-C3: @network smoke against real SEC EDGAR (skipped without
EDGAR_USER_AGENT). Asserts 5 known-clean tickers (AAPL/MSFT/GOOGL/
JPM/KO) have ≤1 fired flag, AAPL has neither 4.02 nor 4.01, cache
effectiveness via timing.
Verification
------------
- ruff check . -> clean (1 pytest.raises(Exception) lint fix: switched
to FrozenInstanceError specifically)
- python -m compute.output.schema_check -> in-sync (Step 4 adds the
Pydantic tier2_events field)
- pytest tests/ -m "not network" -> 464 passed (was 439 -> +25 new
unit/cache; +3 @network properly skipped)
- npx tsc --noEmit -> clean
Edgartools API notes (for Step 5 wire-up)
------------------------------------------
- Company.get_filings(form="8-K", filing_date=(start, end)) returns an
EntityFilings iterable.
- Each Filing has .obj() that returns an EightK (for 8-K forms).
- EightK.items returns List[str] like ["Item 5.02", "Item 9.01"] via a
3-tier fallback parser (modern sections → chunked_document →
text-pattern extraction). Handles SGML legacy filings (1999-2001).
- EightK.sections is the source for item-body excerpts but its shape
varies (sometimes dict, sometimes list); the module guards with
`if isinstance(sections, dict)` and degrades to empty excerpts if the
shape doesn't match expectations.
What's NOT in this commit
-------------------------
- Pydantic schema additions (Step 4: StockDetail.tier2_events field +
Metadata.tier2_coverage_pct)
- Risk-overlay integration (Step 4: non_reliance_filing flag joins the
risk_flags list)
- compute/main.py wire-up (Step 5)
- Frontend Tier2EventCard (Step 6)
- New pip dependencies: uses existing edgartools + stdlib
https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2
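Two details from the Step 3 commit above are worth seeing in code: the item-number boundary and the cache-hit rule. The sketch below is a stdlib-only illustration under stated assumptions. The entry keys `fetched_at` and `lookback_days`, the `_` replacement character, the `re.IGNORECASE` flag, and all function names are invented for this example; only the regex pattern, the 7-day TTL, the sanitizing character class, and the tmp + os.replace write come from the commit text.

```python
from __future__ import annotations

import json
import os
import re
import tempfile
from pathlib import Path

# Pattern from the commit message; IGNORECASE is an assumption (case-variant tests).
ITEM_4_02 = re.compile(r"\bItem\s+4\.\s*02\b", re.IGNORECASE)
CACHE_TTL_SECONDS = 7 * 86400


def safe_ticker(ticker: str) -> str:
    """Replace anything outside [A-Za-z0-9_-] so the ticker is a safe filename."""
    return re.sub(r"[^A-Za-z0-9_-]", "_", ticker)


def is_cache_hit(entry: dict, requested_lookback: int, now: float) -> bool:
    """Fresh AND fetched with a lookback at least as wide as the request."""
    try:
        fresh = now - float(entry["fetched_at"]) < CACHE_TTL_SECONDS
        return fresh and int(entry["lookback_days"]) >= requested_lookback
    except (KeyError, TypeError, ValueError):
        return False  # malformed entries count as a miss


def write_cache_atomic(path: Path, payload: dict) -> None:
    """Write to a temp file in the same directory, then swap it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(payload, fh)
    os.replace(tmp, path)
```

The trailing `\b` is what rejects "Item 4.020": the character after "02" is another digit, so there is no word boundary at that position.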
dackclup pushed a commit that referenced this pull request on May 10, 2026
Step 4 of PR 3d. Wires Defenses #8 / #9 / #10 into the JSON contract
and makes #9 (8-K Item 4.02 non-reliance) the **4th active hard veto**
at v1.0.
Changes
-------
compute/output/schemas.py
- StockDetail.tier2_events: dict | None = None
Display payload populated by Step 5; shape (when set):
{"going_concern_disclosure": bool,
"non_reliance_filing": bool,
"auditor_change": bool,
"latest_8k_filing_date": str | None, # ISO YYYY-MM-DD
"latest_8k_filing_url": str | None}
- Metadata.tier2_coverage_pct: float | None = None
Population-level fetch-success rate. None when Tier-2 disabled
(e.g., env var missing).
frontend/lib/types.ts
- New `Tier2Events` type mirroring the Python dict shape
- `StockDetail.tier2_events: Tier2Events | null`
- `Metadata.tier2_coverage_pct: number | null`
- Inline doc on Tier2Events explaining which fields are veto vs
annotate.
compute/valuation/applicability.py
- SKIP_REASONS: 21 → 24 entries. New stable identifiers:
going_concern_disclosure, non_reliance_filing, auditor_change.
These are tracked here so the JSON-contract reason taxonomy is
complete; the same strings also appear in StockDetail.tier2_events
(display) and risk_flags (only non_reliance_filing, the hard veto).
compute/scoring/risk_overlay.py
- Module docstring updated: "three vetoes" → "four vetoes" with
non_reliance_filing entry citing eight_k_events.check_non_reliance.
- compute_risk_flags acquires a new optional kwarg
`non_reliance_by_ticker: dict[str, bool] | None = None`:
* Default (None): per-ticker fallback to check_non_reliance(ticker),
which hits the 7-day on-disk EDGAR cache or returns
ItemFlag(fired=False) when identity is unset (= test environment,
sandbox).
* Explicit dict: tests + Step 5 inject pre-computed results. Step 5
will share fetch work between this veto path and the
StockDetail.tier2_events display path so the EDGAR fetch happens
once per ticker per compute run, not twice.
This is a slight extension of the spec's pure-inline
`check_non_reliance(ticker)` call, but it keeps the function
unit-testable without network mocking and avoids a duplicate fetch in
production. The default behavior matches the spec exactly when the
kwarg is omitted.
frontend/lib/schema-snapshot.json
- Regenerated via
`python -m compute.output.schema_check --update-snapshot`. Diff:
+tier2_coverage_pct under Metadata, +tier2_events under StockDetail.
No collateral drift.
tests/test_output/test_tier2_schema.py (NEW, 13 cases)
- A1-A5: Pydantic field validation (StockDetail.tier2_events accepts
dict / None; Metadata.tier2_coverage_pct accepts float / None; JSON
round-trip preserves the dict shape).
- B1-B5: SKIP_REASONS taxonomy (3 new entries present, count = 24, all
entries unique).
- D1-D3: schema-snapshot file (committed snapshot includes both new
fields with correct type/required/default shape).
tests/test_scoring/test_risk_overlay.py (+6 cases)
- C1-C6: Defense #9 non_reliance integration:
* inject {ticker: True} → flag appears
* inject {ticker: False} → no flag
* empty inject dict → no flag
* default path with no EDGAR_USER_AGENT → no flag (existing PR-3c
tests rely on this contract; tests use monkeypatch to ensure a
clean cache + identity state)
* additive with altman/sloan: all 4 vetoes can fire together
* inject dict for ticker A doesn't pollute ticker B
Verification
------------
- ruff check . -> clean (1 import-sort fix auto-applied)
- python -m compute.output.schema_check -> in-sync after regen
- pytest tests/ -m "not network" -> 483 passed (was 464 -> +19 new)
- npx tsc --noEmit -> clean
What's NOT in this commit
-------------------------
- compute/main.py wire-up (Step 5: pre-fetches Tier-2 data in parallel
with fundamentals, populates tier2_events display dict + injects
non_reliance_by_ticker into compute_risk_flags)
- Frontend Tier2EventCard / PillarRadarChart / FairPriceBarChart
components (Steps 6-8)
https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2
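The injection pattern behind the new `non_reliance_by_ticker` kwarg can be sketched as follows. This is not the real compute_risk_flags signature, which the commit does not show in full: the function name, the `base_flags` parameter, and the `fallback_check` callable (standing in for check_non_reliance) are all hypothetical.

```python
from __future__ import annotations

from typing import Callable


def apply_non_reliance_veto(
    ticker: str,
    base_flags: list[str],
    *,
    non_reliance_by_ticker: dict[str, bool] | None = None,
    fallback_check: Callable[[str], bool] = lambda _ticker: False,
) -> list[str]:
    """Append the veto flag using injected results, else a per-ticker fallback."""
    if non_reliance_by_ticker is not None:
        # Explicit dict: absent tickers default to "not fired", no fetch happens.
        fired = non_reliance_by_ticker.get(ticker, False)
    else:
        # Default path: the caller-supplied check decides (cache or network).
        fired = fallback_check(ticker)
    flags = list(base_flags)  # never mutate the caller's list
    if fired:
        flags.append("non_reliance_filing")
    return flags
```

Distinguishing `None` from an empty dict is the important part: an empty dict means "results were pre-computed and nothing fired", so the fallback must not run.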
dackclup pushed a commit that referenced this pull request on May 10, 2026
Step 5 of PR 3d. Wires Defenses #8 / #9 / #10 into the production weekly-compute pipeline. After this commit, rankings.json carries the 4th active veto (non_reliance_filing) and StockDetail.tier2_events on every stock; metadata.json reports population-level Tier-2 coverage.

Architecture
------------
- New `compute/ingest/filing_text.py` (~210 LOC): 10-K text fetch with 90-day on-disk cache. Mirrors the eight_k_events.py cache pattern (atomic write via tmp + os.replace; fetched_at TTL gate; safe_ticker filename sanitization). Returns None on every failure mode (rate limit, missing identity, no recent 10-K); never raises.
- New `compute/scoring/tier2.py` (~180 LOC): Tier2Result frozen dataclass + fetch_tier2_for_ticker orchestrator + tier2_events_dict + coverage_pct helpers. The orchestrator catches every per-defense exception so one bad ticker can't crash the run.
- Reuses `fetch_recent_8k_filings` ONCE per ticker (with the larger 730d lookback that covers both 4.02 and 4.01 windows): both `check_non_reliance` and `check_auditor_change` operate on the same in-memory filing list. Avoids a duplicate EDGAR call per ticker.

compute/config.py
- New EDGAR_10K_TEXT_CACHE_DIR + EDGAR_10K_TEXT_CACHE_TTL_SECONDS (= 90 days). 10-K filings are annual so an 89-day stale cache hit returns the same filing.

compute/main.py
- New "Step 4b" between fundamentals + risk-flag computation: parallel Tier-2 fetch via ThreadPoolExecutor(max_workers=EDGAR_MAX_WORKERS=5). Same parallelism budget as fundamentals; well under SEC's 10/sec rate limit.
- non_reliance_by_ticker dict built from tier2_results, injected into compute_risk_flags. Avoids the duplicate fetch the inline default path would have triggered. Only fired tickers go in (per Step 4 spec: dict.get(ticker, False) default).
- Per-ticker StockDetail loop populates tier2_events from tier2_events_dict(tier2_results.get(ticker)). Tickers absent from the dict get tier2_events=None: a graceful "no Tier-2 data" surface.
- Metadata.tier2_coverage_pct populated from coverage_pct(tier2_results). None when universe is empty; 0.0 when all fetches failed; rounded to 2 decimal places otherwise.
- Added `Tier2Result` to imports for type clarity (linter wanted it in a separate `from .. import` line because of the `as` alias on coverage_pct; accepted).

Failure isolation
-----------------
Three layers of safety:
1. Each underlying fetcher (fetch_latest_10k_text, fetch_recent_8k_filings) returns None on any failure; never raises.
2. fetch_tier2_for_ticker wraps each per-defense call in try/except; one defense's failure doesn't abort the orchestrator.
3. The compute/main.py executor loop also catches exceptions from fut.result(). This is defensive, since the orchestrator already swallows everything.

A failed-fetch ticker simply won't appear in tier2_results; the per-ticker loop's tier2_results.get(ticker) returns None, which builds a StockDetail with tier2_events=None.

tests/test_scoring/test_tier2.py (NEW, 17 cases)
- A1-A6: orchestration permutations (clean, partial 10-K fail, partial 8-K fail, total fail, exception caught, both 8-K items present).
- B1-B4: tier2_events_dict shape + non_reliance > auditor_change preference for latest_8k_filing date/url + 5-key contract check.
- C1-C5: coverage_pct including 100% / 0% / 49.80% / empty / single.
- D1: end-to-end synthetic 10-ticker pipeline covering all 3 defenses.
- D2: Tier2Result frozen dataclass.

Verification
------------
- ruff check . -> clean
- python -m compute.output.schema_check -> in-sync
- pytest tests/ -m "not network" -> 500 passed (was 483 -> +17 new)
- npx tsc --noEmit -> clean
- main.py wire-up smoke-imports cleanly; sanity grep confirms tier2_results / tier2_coverage_pct / tier2_events / non_reliance inject all wired through.

Performance budget
------------------
Cold cache estimate (first run):
- 502 tickers × 2 EDGAR fetches each (10-K + 8-K) at ~5 parallel workers = ~200s = ~3.5 min. Well under the +10-15 min budget.

Subsequent weekly runs: mostly cache hits → +30-60s.

NO new asyncio / concurrency primitives: ThreadPoolExecutor matches the existing fundamentals-fetch pattern.

What's NOT in this commit
-------------------------
- Frontend Tier2EventCard / PillarRadarChart / FairPriceBarChart (Steps 6-8)
- Documentation updates (Step 9)
- Production verification via workflow_dispatch (Step 10)

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2
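The failure-isolation layers and the coverage rule described in this commit can be sketched roughly as below. This is a minimal illustration, not the code in `compute/scoring/tier2.py`: the `Tier2Result` fields, the `checks` mapping, the `run_tier2` helper, and the two-argument `coverage_pct` signature are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class Tier2Result:
    # Hypothetical field set; the real dataclass may carry more detail.
    going_concern: bool = False
    non_reliance: bool = False
    auditor_change: bool = False


def fetch_tier2_for_ticker(
    ticker: str, checks: Dict[str, Callable[[str], bool]]
) -> Optional[Tier2Result]:
    """Layer 2: run each per-defense check in its own try/except."""
    flags = {}
    for name, check in checks.items():
        try:
            flags[name] = bool(check(ticker))
        except Exception:
            # One defense failing must not abort the others.
            continue
    # If every check failed, report "no Tier-2 data" for this ticker.
    return Tier2Result(**flags) if flags else None


def run_tier2(
    tickers: Iterable[str],
    checks: Dict[str, Callable[[str], bool]],
    max_workers: int = 5,
) -> Dict[str, Tier2Result]:
    """Layer 3: the executor loop also guards fut.result()."""
    results: Dict[str, Tier2Result] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch_tier2_for_ticker, t, checks): t for t in tickers}
        for fut in as_completed(futures):
            try:
                result = fut.result()
            except Exception:
                result = None
            if result is not None:
                # Failed tickers are simply absent from the dict.
                results[futures[fut]] = result
    return results


def coverage_pct(results: Dict[str, Tier2Result], universe_size: int) -> Optional[float]:
    """None for an empty universe, 0.0 when nothing succeeded, else a 2-decimal percentage."""
    if universe_size == 0:
        return None
    return round(100.0 * len(results) / universe_size, 2)
```

With this shape, a caller can use `results.get(ticker)` and treat `None` as "no Tier-2 data", which matches the `tier2_events=None` surface described above.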
dackclup pushed a commit that referenced this pull request on May 10, 2026
…e wire

Step 6 of PR 3d. Adds the user-visible surface for Defenses #8 / #9 / #10: a card that lists fired regulatory events with severity coding (HARD VETO red pill for non-reliance; Annotate amber pill for going-concern + auditor-change). Renders nothing when there are no events or when the StockDetail predates the schema (graceful forward-compat for stocks/*.json files written under PR-3c schema).

Position: between PriceHistoryChart and FairPriceCard. Regulatory events affect investability more than valuation, so they sit higher in the visual hierarchy.

Changes
-------
frontend/components/Tier2EventCard.tsx (NEW, ~165 LOC)
- "use client" component with strict TypeScript types (no `any`).
- Props: tier2_events: Tier2Events | null, ticker: string.
- Renders null when:
  * tier2_events is null OR undefined (loose-equality check; `undefined` is the runtime shape for stock JSONs written under pre-PR-3d schemas, before Step 10's compute regenerates them with the field populated)
  * All 3 flags are false (clean ticker)
- Otherwise renders rows in priority order: non_reliance_filing first (hard veto), then going_concern_disclosure, then auditor_change. Date footer (latest 8-K) shown only when an 8-K flag fired AND a date is present.
- "View filing" link with target="_blank" + rel="noopener noreferrer" for the 8-K rows; going-concern has no link (text scan, not 8-K).
- Inline SVG icons (lucide-react is NOT in package.json; spec's hard constraint says "NO new npm dependencies"). Three 24px stroke icons styled to match lucide's visual language: AlertOctagon (veto), AlertTriangle (going-concern), UserMinus (auditor-change), plus a small ExternalLink for the filing-link affordance.
- Light-theme palette matching existing components (rose/amber/slate ring-1 ring-inset badges). The spec's bg-card/text-foreground tokens reference shadcn dark-theme but the project uses bg-white/text-slate-700.
- Accessibility: aria-label on section, role="status" on severity pills, aria-hidden on decorative icons.
- Mobile-first: stacked rows on <sm, side-by-side on sm+.

frontend/app/stock/[ticker]/page.tsx
- Import Tier2EventCard.
- Wired between the Price (1y) section and the FairPriceCard block, per spec ordering: chart → events → fair price → fundamentals.

Edge case fixed during build verification
-----------------------------------------
Initial implementation guarded with `tier2_events === null`. The production stock JSONs committed under PR 3c lack the `tier2_events` key entirely (the schema is forward-compatible: the field is optional in Pydantic, so existing files just don't have it). In JavaScript, reading an absent key from a JSON.parse result yields `undefined`, not `null`, so `=== null` missed the case and the destructure crashed during `next build` for all 502 stocks. Fixed to `== null` (loose equality catches both null + undefined). Comment in the component explains the forward-compat reasoning.

Tests (frontend)
----------------
The frontend has no test framework configured (no jest / vitest / @testing-library in package.json). Per spec ("If neither has component tests, skip in favor of visual regression"), no component tests added. `tsc --noEmit` + `next build` are the type/build correctness guarantees:
- npx tsc --noEmit -> clean
- npm run build -> 506 / 506 routes pre-rendered cleanly

What's NOT in this commit
-------------------------
- Visual snapshot regression tests (no harness; would require adding playwright or storybook, which is out of scope)
- PillarRadarChart (Step 7)
- FairPriceBarChart (Step 8)

Verification
------------
- npx tsc --noEmit -> clean
- npm run build -> 506 / 506 routes ✓
- ruff check . -> clean (no Python touched)
- pytest tests/ -m "not network" -> 500 passed (no Python touched; sanity-check that nothing regressed)

Visual spot-checks deferred to Vercel preview
---------------------------------------------
I cannot render the component locally; spot-checks happen on the Vercel preview deploy after this commit lands. Spec scenarios:
1. Stock with no Tier-2 events (most production stocks at commit 9cd2c74) → card hidden ✓ (forward-compat null-check)
2. Stock with auditor_change only → amber Annotate row + link
3. Stock with non-reliance fired → red HARD VETO row + link
4. All 3 fired → 3-row card + 8-K date footer

Production stock JSONs at HEAD won't have tier2_events populated (Step 10 workflow_dispatch is what triggers regeneration). So the preview will show "no Tier-2 events" everywhere; full visual verification of fired states happens at Step 10.

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2
dackclup pushed a commit that referenced this pull request on May 10, 2026
…efer

Run #14 timeout root cause: SEC EDGAR API throttling amplified by tenacity retry policy (max=30s × 3 attempts = 60-90s per failed stock). Run #11 (2 days ago) finished in 23m on same code path; Run #14 stuck 1h+ in fundamentals stage = 3-6x SEC API slowdown during incident.

Mitigations:
1. Tighten retry on BOTH _build_snapshot and _build_annual_history: stop=(stop_after_delay(30) | stop_after_attempt(2)), wait=wait_exponential(min=2, max=8). Caps per-stock retry at ~30s.
2. Per-stock fundamentals + history fetch timeout (fut.result(timeout=45)): graceful skip on stuck-task. Defensive backstop; real cap is the inner tenacity stop_after_delay.
3. Suppress noisy edgartools concept-miss UserWarnings via facts._suppress_warnings = True after company.get_facts(). Skips the difflib fuzzy-match suggestion pass and frees stderr for triage.
4. Per-stock latency histogram (<5s / 5-15s / 15-30s / 30s+) with thresholds aligned to retry-policy tiers, plus p50/p95 + top-20 slow tickers logged for Phase 4 throttling-detection visibility.
5. fundamentals_coverage_pct + fundamentals_latency_p50_seconds + fundamentals_latency_p95_seconds in Metadata mirror the existing tier2_coverage_pct.

ALSO: defer 8-K event fetches (Defenses #9 + #10) to Phase 4. Three workflow timeouts (#12, #13, #14) consumed budget; ship PR 3d with going-concern (Defense #8) only. _EIGHT_K_DEFENSES_ENABLED feature flag gates the 8-K branch: a single-line flip in Phase 4 to re-enable once the pre-cache layer lands. Schema unchanged; 8-K event fields in tier2_events emit but always False/None until Phase 4. Active veto count temporarily 3 (was planned 4); restored in Phase 4.

Tests: 511 → 526 (+15: 5 deferred-mode tier2, 5 histogram/percentile/tuple-return main, 1 retry-policy contract, 4 fixture-extended A/D tests for 8-K wiring).

Tracked: /tmp/issue_drafts/issue_8k_events_phase4.md + /tmp/issue_drafts/issue_fundamentals_resilience_phase4.md.

https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2
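The latency observability in mitigation 4 could look roughly like the following. It is a sketch under assumptions: the function names are hypothetical, the bucket edges are taken from the commit message, and the percentile uses the nearest-rank method, which may differ from whatever the pipeline actually computes.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Upper bounds (seconds) for the first three buckets; anything slower lands in "30s+".
BUCKETS: Tuple[Tuple[str, float], ...] = (("<5s", 5.0), ("5-15s", 15.0), ("15-30s", 30.0))


def latency_histogram(latencies: Sequence[float]) -> Dict[str, int]:
    """Count per-stock fetch latencies into the four buckets named in the commit."""
    counts = {label: 0 for label, _ in BUCKETS}
    counts["30s+"] = 0
    for seconds in latencies:
        for label, upper in BUCKETS:
            if seconds < upper:
                counts[label] += 1
                break
        else:
            counts["30s+"] += 1
    return counts


def percentile(latencies: Sequence[float], pct: float) -> Optional[float]:
    """Nearest-rank percentile; returns None when there are no samples."""
    if not latencies:
        return None
    ordered = sorted(latencies)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def slowest(latency_by_ticker: Dict[str, float], n: int = 20) -> List[Tuple[str, float]]:
    """Top-n slowest tickers, slowest first."""
    return sorted(latency_by_ticker.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

A caller would typically log `latency_histogram(...)`, `percentile(..., 50)`, `percentile(..., 95)`, and `slowest(...)` once per run, so that a shift of mass into the `15-30s` and `30s+` buckets makes throttling visible early.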
dackclup added a commit that referenced this pull request on May 10, 2026
…ience (#12)

feat(phase-3d): Tier-2 going-concern + UI polish + fundamentals resilience (#12)

Ships Tier-2 going-concern defense (Defense #8, annotate-only, known 10.8% FP rate to refine in Phase 4) + 3 frontend UI components (Tier2EventCard, PillarRadarChart, FairPriceBarChart) + fundamentals resilience (retry tightening + per-stock timeout + latency observability). 8-K event defenses (#9 + #10) DEFERRED to Phase 4 due to SEC API throttling cost during integration.

Schema additions: tier2_events, tier2_coverage_pct, fundamentals_coverage_pct, latency p50/p95. Schema 0.5.0-phase3c → 0.6.0-phase3d. Tests 409 → 526.

Defense scorecard: 3 vetoes / 5 guards / 6 annotate. v1.0 ETA: PR 3e (Tier-3 + Honest Limitations).

Generated with Claude Code · Tested with Anthropic API
feat(phase-3c): Fair price ensemble + price history + Tier-1 defenses
Closes the v1.0 milestone halfway: ships the fair-price ensemble + 6 numerical defenses + price-history files. After this merges, PR 3d (Tier-2 events) and PR 3e (Beneish/Dechow + v1.0 tag) close out v1.0.
Scope
Fair-price ensemble (6 methods)
Aggregation: median of all applicable + max excluding outliers (values outside [0.2×, 5×] of current price). MoS = (median − current) / median.
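As a rough illustration of the aggregation rule, the sketch below takes the median over every method that produced a value, the max over values inside the [0.2×, 5×] band, and the margin of safety from the median. The function name and return shape are hypothetical, and reading "median of all applicable" as including out-of-band values is an assumption; the band constants mirror EXTREME_ESTIMATE_LOW / EXTREME_ESTIMATE_HIGH from compute/config.py.

```python
from statistics import median
from typing import Dict, Optional

EXTREME_ESTIMATE_LOW = 0.2   # lower band edge, as a multiple of current price
EXTREME_ESTIMATE_HIGH = 5.0  # upper band edge, as a multiple of current price


def aggregate_fair_price(
    estimates: Dict[str, Optional[float]], current_price: float
) -> Optional[Dict[str, Optional[float]]]:
    """Combine per-method fair-price estimates; None means the method was not applicable."""
    applicable = [v for v in estimates.values() if v is not None]
    if not applicable or current_price <= 0:
        return None

    med = median(applicable)

    # Outliers are values outside [0.2x, 5x] of the current price.
    low = EXTREME_ESTIMATE_LOW * current_price
    high = EXTREME_ESTIMATE_HIGH * current_price
    in_band = [v for v in applicable if low <= v <= high]

    return {
        "median": med,
        "max": max(in_band) if in_band else None,
        # MoS = (median - current) / median; undefined for a non-positive median.
        "mos": (med - current_price) / med if med > 0 else None,
    }
```

For example, with estimates of 80, 120, 100, and 900 against a current price of 100, the 900 value falls outside the band, so it is excluded from the max but, under the assumption above, still counts toward the median.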
Price history
Per-stock 1y OHLCV at stocks/history/{TICKER}.json (column-major, ~30 KB each, lazy-loaded by frontend).

Defense backbone
- altman_distress (Z″ < 1.10), sloan_accruals_top_decile, net_issuance_top_decile (NEW; within-sector, Pontiff-Woodgate 2008)
- stale_filing (120 / 180 d), outlier 5×, terminal_g, sector_exclusions, data_quality_$10K_ceiling (Defense #7, added mid-PR after BKR=$105M spot-check)
- goodwill_heavy, value_trap_risk, extreme_{method}_estimate (one per method slot; {method} is one of graham / multiples_pe / multiples_pb / multiples_ev_ebitda / rim / dcf; in practice 2-3 fire per stock), stale_filing_soft, data_quality_input_corruption

Production verification (commit 1a25d84, run #11)
- 0.4.0-phase3b → 0.5.0-phase3c
- mos_trailing_ic_smoke: −0.2216 (consistent with current momentum-up regime; not a predictive claim; see compute/scoring/sanity.py docstring)
- 25618712829, last_update 2026-05-10T03:49:51Z

Tests
- frontend/lib/schema-snapshot.json) prevents Python ↔ TypeScript drift

Reviewer checklist

Post-merge actions
- v0.5.0-phase3c
- claude/phase-3c-fair-price-defenses
- /tmp/issue_drafts/:
  - shares_outstanding ingestion bug (~11 tickers)
  - _avg_3y_roe denominator bug (221 false value_trap_risk flags)
  - mos_pct frontend display clamping (already implemented in Step 10; decide whether to file for trace or skip)

Defense Playbook lineage
Prior research (docs PR #8, commit e7c418c) established the 26-defense roadmap. PR 3c implements Tier-1 (defenses 1-6 + bonus Defense #7). PR 3d implements Tier-2 (going-concern + 8-K events). PR 3e implements Tier-3 (Beneish M-Score + Dechow F-Score + Honest Limitations). v1.0 tags after PR 3e merges.

Generated with Claude Code · Tested with Anthropic API
Session: https://claude.ai/code/session_015649aRyi2bvciQYZVNACd2