feat(coverage): surface missing-SKU gaps as automated reports #1

@Seungpyo1007

What

Detect SKUs that exist in upstream catalogs but are missing from the curated TechAPI dataset, and surface them as an actionable list (issue body or PR comment) so curators can fill the gaps.

Why

The curated dataset is a subset by design, but it is hard to know what is missing without manual auditing. As of the initial scaffold, CPU coverage alone is ~925 records: credible, but visibly sparse for older Intel/AMD lineups and mid-tier Xeon/EPYC. We need a continuous signal of "what does upstream have that we don't?" so that curation effort can be targeted.

Sources (per category, kebab-case slugs reconciled)

  • CPU — Intel ARK (vendor product index), AMD product pages, Wikipedia List_of_Intel_*_microprocessors, TechPowerUp CPU DB
  • GPU — NVIDIA / AMD / Intel product pages, TechPowerUp GPU DB, Wikipedia List_of_*_graphics_processing_units
  • SoC / Smartphone / Brand — vendor product pages + Wikipedia infobox tables

Per source: produce a canonical set of slugs, then compute set(upstream) - set(curated).
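The set difference above could look roughly like the sketch below. The `slugify` rules (lowercase, every non-alphanumeric run collapsed into one hyphen) and both function names are assumptions for illustration; the real reconciliation rules for TechAPI slugs may differ.

```python
import re


def slugify(name: str) -> str:
    """Reduce a product name to a kebab-case slug (hypothetical rules)."""
    # Any run of characters outside a-z / 0-9 (spaces, symbols such as the
    # registered-trademark sign, existing hyphens) becomes a single hyphen.
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def missing_slugs(upstream_names, curated_slugs):
    """Return sorted slugs that exist upstream but not in the curated set."""
    upstream = {slugify(name) for name in upstream_names}
    return sorted(upstream - set(curated_slugs))
```

Sorting the result keeps the report stable between runs, so the weekly diff only changes when coverage actually changes.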

Deliverables

  • app/coverage/ package
  • One module per source (intel_ark.py, wikipedia_cpu.py, …) that fetches the source and yields normalized (category, slug, name, url) tuples
  • app/coverage/report.py — aggregates and writes a Markdown report (top-N missing per category, with source link for each)
  • New workflow .github/workflows/coverage-report.yml — weekly cron, runs the aggregator, opens or updates a single sticky issue on TechAPI titled "Coverage gaps (auto-generated)" with the latest report
  • Tests covering at least the normalization layer (input HTML/JSON → slug set), with vendored fixtures
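As a rough sketch of the aggregation step in report.py, the function below groups the per-source tuples by category and renders the top-N missing entries as Markdown. The `Record` type, the `render_report` name, and the default `top_n` are assumptions for illustration, not a settled interface.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Record(NamedTuple):
    """One missing SKU, mirroring the (category, slug, name, url) tuple."""
    category: str
    slug: str
    name: str
    url: str


def render_report(missing: Iterable[Record], top_n: int = 20) -> str:
    """Render a Markdown report with at most top_n entries per category."""
    by_category = defaultdict(list)
    for rec in missing:
        by_category[rec.category].append(rec)

    lines = ["# Coverage gaps (auto-generated)", ""]
    for category in sorted(by_category):
        records = sorted(by_category[category], key=lambda r: r.slug)
        # The heading reports the full count even when the list is truncated.
        lines.append(f"## {category} ({len(records)} missing)")
        lines.append("")
        for rec in records[:top_n]:
            lines.append(f"- [{rec.name}]({rec.url}) (`{rec.slug}`)")
        lines.append("")
    return "\n".join(lines)
```

Because the function is pure (tuples in, string out), it can be tested without network access, alongside the fixture-based normalization tests.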

Out of scope

  • Actually adding the missing records (that's #2)
  • Quality scoring of existing records (deferred)

Acceptance

  • python -m app.coverage exits 0 and writes coverage-report.md
  • Workflow runs weekly and updates the sticky issue on TechAPI
  • At least 3 sources wired up (1 for CPU, 1 for GPU, 1 for smartphone/SoC) end-to-end
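The first acceptance criterion could be met by an entry point along these lines. `collect_report` is a hypothetical placeholder for the real aggregator, and the layout (a `main` function called from app/coverage/__main__.py) is an assumption.

```python
from pathlib import Path


def collect_report() -> str:
    """Hypothetical stand-in for the aggregator; returns the Markdown body."""
    return "# Coverage gaps (auto-generated)\n\nNo gaps found.\n"


def main(output: Path = Path("coverage-report.md")) -> int:
    """Write the report and return the process exit code."""
    output.write_text(collect_report(), encoding="utf-8")
    return 0  # any non-zero value would mark the workflow run as failed


# In app/coverage/__main__.py this would be invoked as:
#     raise SystemExit(main())
```

Returning the exit code from `main` rather than calling `sys.exit` inside it keeps the function easy to call from tests.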
