Composite scoring should respect data_quality_input_corruption #18

@dackclup

Background

The SPG case from Run #15 (2026-05-10) reveals a gap in the defense chain:

  • The data_quality guard correctly suppresses the fair_price ensemble when fundamentals are corrupted
  • But the 8-pillar composite still uses the corrupted values
  • Result: corrupted-data stocks can rank at the top of the composite list (SPG ranked #1 despite obvious data corruption: market_cap $1.62M vs actual ~$76B; tangible BVPS $648,521/share vs actual ~$20-30)

Current behavior

Stage 1: Fetch fundamentals → may return corrupted values
Stage 2: Data quality guard → fires data_quality_input_corruption
         → fair_price methods all skip
         → fair_price.median = null
Stage 3: Composite scoring → STILL uses corrupted pillar inputs
         → produces inflated/erroneous score
Stage 4: Top-5 enforcement → relies on Sloan/Altman/NSI vetoes
         to filter, but data_quality is annotate-not-veto

In SPG's case, the Sloan veto happened to fire (suppressing it from effective Top-5 correctly), but that's coincidental — there's no guarantee data-corrupted stocks always trip another veto.
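
A minimal sketch of the gap described above. The `Stock` record, the `enforce_top5` helper, the tickers, and the scores are all hypothetical; only the flag and veto names come from this issue. It also assumes a vetoed stock keeps its rank and simply loses the entered_top5 badge.

```python
from dataclasses import dataclass, field

# Flags that block the entered_top5 badge today, per this issue.
CURRENT_VETOES = frozenset({"sloan", "altman", "nsi"})
CORRUPT = "data_quality_input_corruption"  # annotate-only today


@dataclass
class Stock:  # hypothetical record, not the project's real type
    ticker: str
    composite: float
    flags: set = field(default_factory=set)


def enforce_top5(stocks, vetoes=CURRENT_VETOES, top_n=5):
    """Rank by composite; a stock earns entered_top5 only if no veto fired.

    Assumption: a vetoed stock keeps its rank and simply loses the badge.
    """
    ranked = sorted(stocks, key=lambda s: s.composite, reverse=True)
    return {
        s.ticker: {"rank": i, "entered_top5": i <= top_n and not (s.flags & vetoes)}
        for i, s in enumerate(ranked, start=1)
    }


# Illustrative scores only. Both X1 and X2 have corrupted inputs.
universe = [
    Stock("X1", 99.0, {CORRUPT, "sloan"}),  # like SPG: another veto happens to fire
    Stock("X2", 95.0, {CORRUPT}),           # corrupted, but nothing else trips
    Stock("AAA", 80.0),
]

result = enforce_top5(universe)
print(result["X1"])  # rank 1, badge withheld because of the Sloan veto
print(result["X2"])  # rank 2, badge granted despite corrupted inputs
```

X2 is the case this issue is about: same corruption as X1, but no coincidental veto to catch it.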

Proposed fix (Phase 4)

| Option | Approach | Trade-off |
| --- | --- | --- |
| A | Penalize composite when data_quality_input_corruption fires (e.g., −50% downweight) | Soft signal; doesn't guarantee removal from top |
| B | Set composite = null (exclude from rankings entirely) | Cleanest UX but loses transparency about the corrupted entity existing |
| C | Promote data_quality_input_corruption to veto status alongside Sloan/Altman/NSI | Consistent with existing veto framework; corrupted-data stock keeps rank but loses entered_top5 badge |

Recommend Option C: it is the most consistent with the existing veto framework. data_quality is currently annotate-only because false positives exist, so promoting it to a veto requires reviewing the FP rate first. The current rate is 8/502 ≈ 1.6%, which is acceptable for veto promotion.
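
To compare the three options side by side, here is a rough sketch. The function names, tickers, and scores are made up for illustration, and the real composite pipeline may be structured differently.

```python
CORRUPT = "data_quality_input_corruption"
CURRENT_VETOES = frozenset({"sloan", "altman", "nsi"})


def rank(scores):
    """Tickers ordered by score, highest first; None scores are excluded."""
    scored = {t: s for t, s in scores.items() if s is not None}
    return sorted(scored, key=scored.get, reverse=True)


def option_a(scores, flags, penalty=0.5):
    """Option A: downweight the composite of flagged stocks (here by 50%)."""
    return {t: s * penalty if CORRUPT in flags.get(t, set()) else s
            for t, s in scores.items()}


def option_b(scores, flags):
    """Option B: null the composite so flagged stocks drop out of rankings."""
    return {t: None if CORRUPT in flags.get(t, set()) else s
            for t, s in scores.items()}


def option_c(scores, flags, top_n=5):
    """Option C: treat the flag as a veto; rank is kept, badge is withheld."""
    vetoes = CURRENT_VETOES | {CORRUPT}
    return {t: (i, i <= top_n and not (flags.get(t, set()) & vetoes))
            for i, t in enumerate(rank(scores), start=1)}


# Illustrative values: BAD has an inflated composite from corrupted inputs.
scores = {"BAD": 300.0, "AAA": 80.0, "BBB": 75.0, "CCC": 70.0}
flags = {"BAD": {CORRUPT}}

print(rank(option_a(scores, flags)))  # BAD still first: 150.0 beats 80.0
print(rank(option_b(scores, flags)))  # BAD is gone from the list
print(option_c(scores, flags)["BAD"])  # (rank, entered_top5) for BAD
print(f"FP rate: {8 / 502:.1%}")
```

With these numbers, Option A leaves BAD at the top because a sufficiently inflated score survives a fixed penalty, which is the "soft signal" trade-off noted in the table.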

Alternative: Keep annotate, but UI should surface "rank suppressed due to data quality issue" indicator on the corrupted-data stock's rankings card.
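
If the annotate-only route is kept, the indicator could be attached when the card payload is built. This is only a sketch: `rankings_card` and the payload shape are assumptions, not the existing UI contract.

```python
CORRUPT = "data_quality_input_corruption"
NOTICE = "rank suppressed due to data quality issue"


def rankings_card(ticker, rank, flags):
    """Build a hypothetical card payload; add the notice when the flag fired."""
    card = {"ticker": ticker, "rank": rank, "notices": []}
    if CORRUPT in flags:
        card["notices"].append(NOTICE)
    return card


flagged = rankings_card("BAD", 1, {CORRUPT})
clean = rankings_card("AAA", 2, set())
print(flagged["notices"])
print(clean["notices"])
```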

Severity

High. Top-of-rankings UX issue, user trust signal. A user landing on / and seeing SPG #1 with $1.62M market cap will lose confidence in the entire ranking layer.

Phase 4 priority

High.

Labels

phase-4, defense-chain, data-quality, ux
