epic(sbom): SBOM as single source of truth for all version data #35

@castrojo

Goal

Make static/data/sbom-attestations.json — populated nightly by the SBOM pipeline — the single, authoritative source of truth for all package version data shown on the site. All scripts that currently scrape GitHub release body HTML or use stale fallback data must instead read from this cache.

Problem

The documentation site currently has two competing version data sources:

  1. SBOM cache (sbom-attestations.json) — accurate, cryptographically verified, full RPM manifest (2,676 packages per release). But it is always empty in production because the nightly pipeline uses a stale PAT (PROJECT_READ_TOKEN) that returns HTTP 403.

  2. GitHub release body HTML scraping (fetch-github-driver-versions.js) — fragile regex matching against Markdown tables in release bodies. The SBOM overlay in this script is listed as optional and only applies when the cache is populated — which it never is.

The result: the driver versions page, changelogs, and images catalog all show version data scraped from release bodies, not from the verified SBOM.

Root Cause

fetch-github-sbom.js uses the GitHub Packages API (/orgs/ublue-os/packages/container/bluefin/versions) to enumerate image tags. This API requires the packages:read scope on a cross-org PAT. The PAT expired or was rotated, and the pipeline has been failing silently ever since.

The PAT is not needed. Public GHCR images are accessible anonymously. The GitHub Releases API (/repos/ublue-os/bluefin/releases) provides tag names without any elevated scope. The GHCR manifest API provides digests via an anonymous bearer token.
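A minimal sketch of the credential-free path described above, assuming Node 18+ (for global fetch) and the standard registry token flow on ghcr.io. The helper names (ghcrTokenUrl, releasesUrl, manifestUrl, fetchAnonymousToken) are illustrative and are not taken from fetch-github-sbom.js.

```javascript
// Sketch only: helper names are hypothetical, not the script's actual API.
const GHCR = 'https://ghcr.io';

// Token endpoint that issues an anonymous, pull-scoped bearer token
// for a public image. No PAT is sent.
function ghcrTokenUrl(org, image) {
  const scope = `repository:${org}/${image}:pull`;
  return `${GHCR}/token?service=ghcr.io&scope=${encodeURIComponent(scope)}`;
}

// Releases API endpoint; each release object carries a tag_name field.
function releasesUrl(org, repo, perPage = 30) {
  return `https://api.github.com/repos/${org}/${repo}/releases?per_page=${perPage}`;
}

// Registry manifest endpoint for a tag or digest reference.
function manifestUrl(org, image, reference) {
  return `${GHCR}/v2/${org}/${image}/manifests/${reference}`;
}

// Requests the anonymous token. Assumes the response body has a `token` field.
async function fetchAnonymousToken(org, image) {
  const res = await fetch(ghcrTokenUrl(org, image));
  if (!res.ok) {
    throw new Error(`GHCR token request failed: HTTP ${res.status}`);
  }
  const body = await res.json();
  return body.token;
}
```

The returned token would then be sent as an Authorization: Bearer header on the manifestUrl request. Unauthenticated Releases API calls are rate-limited, so passing github.token in CI is still reasonable; that is not an elevated scope.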

Product Catalog and SBOM Coverage

GTS is retired — it is not present in any streamOrder in PRODUCT_SPECS and must not appear in STREAM_SPECS.

The correct stream catalog (derived from live PRODUCT_SPECS in fetch-github-images.js):

SBOM Stream ID          Package       Org              Stream   SBOM Status
bluefin-stable          bluefin       ublue-os         stable   Has SBOMs (keyless)
bluefin-latest          bluefin       ublue-os         latest   Has SBOMs (keyless)
bluefin-lts             bluefin       ublue-os         lts      No SBOMs yet (key-based)
bluefin-dx-stable       bluefin-dx    ublue-os         stable   Has SBOMs (keyless)
bluefin-dx-latest       bluefin-dx    ublue-os         latest   Has SBOMs (keyless)
bluefin-dx-lts          bluefin-dx    ublue-os         lts      No SBOMs yet (key-based)
bluefin-gdx-lts         bluefin-gdx   ublue-os         lts      No SBOMs yet (key-based)
bluefin-gdx-latest      bluefin-gdx   ublue-os         latest   Unknown
projectbluefin-dakota   dakota        projectbluefin   latest   No signing pipeline

Note: stable-daily and beta are build stages, not separate SBOM streams.

Child Issues

  • castrojo/documentation issue 36 — Replace Packages API with Releases API and anonymous GHCR token in fetch-github-sbom.js
  • castrojo/documentation issue 37 — Remove PAT from update-sbom-cache.yml — use github.token only
  • castrojo/documentation issue 38 — Make fetch-github-driver-versions.js use SBOM as primary source, HTML scraping as NVIDIA-only fallback
  • castrojo/documentation issue 39 — Update AGENTS.md to remove stale packages:read PAT documentation

Success Criteria

  • update-sbom-cache.yml runs nightly with zero PAT dependencies and produces a non-empty sbom-attestations.json
  • fetch-github-driver-versions.js reads kernel, gnome, mesa, podman, systemd, bootc from SBOM; only NVIDIA falls back to release body
  • All version data on driver-versions page, changelogs, and images catalog comes from the SBOM cache
  • npm run typecheck and npm run lint pass with zero errors
  • AGENTS.md no longer documents packages:read as a required scope for SBOM
  • STREAM_SPECS in fetch-github-sbom.js matches the product catalog above — no GTS stream

Branch

feature/firehose-changelogs in castrojo/documentation

Assisted-by: Claude Sonnet 4.6 via GitHub Copilot
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Architecture Review

Verdict: Architecturally sound. Four clarifications needed before implementation begins.

Strengths

The epic correctly identifies the root failure mode (silent PAT expiry producing an always-empty cache) and proposes an approach with the right properties: replace an elevated credential with a credential-free path, elevate the verified source (SBOM) to primary, and demote the fragile source (release body scraping) to NVIDIA-only fallback. The data quality improvement is real — 2,676 RPM packages from a cryptographically attested SBOM versus regex over Markdown tables. The atomic-write pattern already in fetch-github-sbom.js (write to .tmp then rename) is correct and must be preserved across the rewrite.

Finding 1: The manifest index / multi-arch digest problem (medium severity)

The proposed anonymous GHCR bearer token path fetches the manifest and uses the Docker-Content-Digest header. This returns the image index digest (multi-arch manifest list), not the amd64 platform digest. SBOM attestations are attached to the platform-specific manifest digest. Passing the index digest to cosign or oras will silently find no attestations. Implementation of issue 36 must resolve the amd64 child manifest digest, not the index digest.
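One way to resolve the platform digest, sketched under the assumption that the fetched manifest body follows the OCI image index layout (a manifests array whose entries carry digest and platform). The function name resolvePlatformDigest is hypothetical.

```javascript
// Sketch only: resolvePlatformDigest is an illustrative name.
// `manifest` is the parsed JSON body of the manifest response;
// `headerDigest` is the value of the Docker-Content-Digest header.
function resolvePlatformDigest(
  manifest,
  headerDigest,
  platform = { os: 'linux', architecture: 'amd64' }
) {
  // A single-platform manifest has no `manifests` array, so the header
  // digest already identifies the platform image.
  if (!Array.isArray(manifest.manifests)) {
    return headerDigest;
  }
  // For an index, pick the child entry matching the wanted platform.
  // Entries with platform unknown/unknown are skipped by this match.
  const child = manifest.manifests.find(
    (m) =>
      m.platform?.os === platform.os &&
      m.platform?.architecture === platform.architecture
  );
  if (!child) {
    throw new Error(
      `no ${platform.os}/${platform.architecture} manifest in index ${headerDigest}`
    );
  }
  return child.digest;
}
```

Throwing when no amd64 entry exists turns the silent "no attestations found" case into a visible per-stream failure.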

Finding 2: Silent success on empty output (medium severity)

If the Releases API returns an empty list due to a temporary outage, the workflow will write a structurally valid but stream-empty JSON and exit zero. The workflow must add a validation step that exits non-zero if all streams produce zero releases.
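The validation step could look roughly like this. It assumes each entry under streams holds a releases array; that shape is a guess, since only the top-level { generatedAt, streams } layout appears in this issue. The names countReleases and validateCache are hypothetical.

```javascript
// Sketch only: the per-stream `releases` array is an assumed shape.
function countReleases(cache) {
  const streams = cache?.streams ?? {};
  return Object.values(streams).reduce(
    (total, stream) =>
      total + (Array.isArray(stream?.releases) ? stream.releases.length : 0),
    0
  );
}

// Returns a result the workflow step can log and act on.
function validateCache(cache) {
  const total = countReleases(cache);
  return total > 0
    ? { ok: true, message: `cache written with ${total} releases` }
    : { ok: false, message: 'cache written but empty' };
}
```

A caller would print result.message to the job summary and set process.exitCode = 1 when result.ok is false.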

Finding 3: Implementation sequencing dependency (low severity)

Issue 36 (rewrite the script) must merge and be verified working before issue 37 (remove PAT from workflow) is implemented. If the workflow PAT is removed while the script still calls the Packages API, the next nightly run fails with HTTP 403.

Finding 4: oras login can be eliminated entirely (low severity)

For public GHCR images, oras falls back to anonymous bearer token negotiation automatically when not logged in. The oras login step should be deleted entirely from the workflow, not just have its credential swapped.

QA Checklist

Partial-Cache Boundary Cases

  • When only bluefin-stable and bluefin-latest are populated in sbom-attestations.json and all lts/gdx streams are absent, the driver-versions page renders correctly for the populated streams and shows graceful fallback (not blank or crashed) for unpopulated streams
  • When the SBOM cache file exists but contains { "generatedAt": null, "streams": {} } (the committed seed), downstream scripts detect the empty state and fall back to release-body scraping, with no "Cannot read properties of null" errors
  • When sbom-attestations.json is missing entirely from the static data directory at build time, npm run build fails with a clear error rather than silently building with no version data
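The three boundary cases above can be told apart at load time. A sketch, assuming the cache is read synchronously from a path the caller supplies; loadSbomCache and its return shape are hypothetical.

```javascript
// Sketch only: loadSbomCache and its return shape are illustrative.
const fs = require('node:fs');

function loadSbomCache(cachePath) {
  // Missing file: fail loudly so the build stops with a clear error.
  if (!fs.existsSync(cachePath)) {
    throw new Error(`SBOM cache not found at ${cachePath}`);
  }
  const parsed = JSON.parse(fs.readFileSync(cachePath, 'utf8'));
  // Guard against null or non-object values before touching properties.
  const streams =
    parsed !== null &&
    typeof parsed === 'object' &&
    parsed.streams !== null &&
    typeof parsed.streams === 'object'
      ? parsed.streams
      : {};
  return {
    generatedAt: parsed?.generatedAt ?? null,
    streams,
    // True for the committed seed; callers fall back to scraping.
    isEmpty: Object.keys(streams).length === 0,
  };
}
```

Because streams is always an object, a lookup such as cache.streams['bluefin-lts'] yields undefined for an absent stream instead of throwing.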

CI Signal for Empty Cache

  • If update-sbom-cache.yml writes a cache entry where every stream has zero releases, the workflow exits non-zero — there is no silent success
  • The workflow job summary or step output distinguishes between "cache written with N releases" and "cache written but empty" — an operator can tell from the GitHub Actions UI whether the run produced usable data
  • A workflow run that exits non-zero due to empty cache does NOT overwrite a previously valid cache entry (cache write must be gated behind the validation step)
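Gating the write behind validation can be combined with the existing write-to-.tmp-then-rename pattern. A sketch, with writeCacheIfValid as a hypothetical name and the validity check passed in by the caller:

```javascript
// Sketch only: writeCacheIfValid is an illustrative name.
const fs = require('node:fs');

// Writes `cache` to `cachePath` only when `isValid(cache)` is true.
// Returns true if the file was replaced, false if it was left untouched.
function writeCacheIfValid(cachePath, cache, isValid) {
  if (!isValid(cache)) {
    return false; // keep the previously valid cache in place
  }
  const tmpPath = `${cachePath}.tmp`;
  fs.writeFileSync(tmpPath, `${JSON.stringify(cache, null, 2)}\n`);
  // rename replaces the target in one step on the same filesystem,
  // so readers never see a half-written file.
  fs.renameSync(tmpPath, cachePath);
  return true;
}
```

The caller would exit non-zero when the function returns false, which satisfies both the "no silent success" and the "do not overwrite" items.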

LTS Streams With No SBOMs

  • bluefin-lts, bluefin-dx-lts, and bluefin-gdx-lts entries in sbom-attestations.json have attestation.present: false — not a missing key, not null, not an error object
  • fetch-github-driver-versions.js does not attempt to read packageVersions for an lts stream when attestation.present is false — it falls back to release-body scraping without logging an error
  • When lts SBOMs are eventually published and attestation.present flips to true, the script picks them up automatically on the next nightly run — no code change is needed
  • The success criterion "all version data comes from SBOM cache" is scoped to streams that have SBOMs — the epic's success criteria must explicitly exclude lts/gdx-lts streams from the SBOM-primary requirement until they publish attestations
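The source selection for a stream can be sketched as a single pure function. It assumes an entry carries attestation.present and a packageVersions map keyed by short package name, and that scraped holds the release-body values; the key names and the function name resolveVersions are assumptions.

```javascript
// Sketch only: entry shape and package keys are assumed.
const SBOM_PACKAGES = ['kernel', 'gnome', 'mesa', 'podman', 'systemd', 'bootc'];

function resolveVersions(entry, scraped = {}) {
  const fromSbom =
    entry?.attestation?.present === true && entry.packageVersions != null;
  const versions = {};
  for (const name of SBOM_PACKAGES) {
    // With an attestation, the SBOM is the only source for these packages.
    versions[name] = fromSbom
      ? (entry.packageVersions[name] ?? null)
      : (scraped[name] ?? null);
  }
  // NVIDIA always comes from the release body.
  versions.nvidia = scraped.nvidia ?? null;
  return { source: fromSbom ? 'sbom' : 'release-body', versions };
}
```

Since the branch depends only on attestation.present, an lts stream switches to SBOM data on the first run after its attestations are published, with no code change.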

GTS Retirement

  • STREAM_SPECS in fetch-github-sbom.js contains exactly 8 stream entries — no bluefin-gts entry
  • sbom-attestations.json output from the rewritten pipeline contains no bluefin-gts key in streams
  • No references to gts appear in any stream mapping, fallback table, or comment in any modified script

Cross-Stream Data Isolation

  • A row belonging to bluefin-dx-stable never reads SBOM data from the bluefin-stable cache entry, even if the cacheKey prefix (stable-YYYYMMDD) matches — the full streamId + cacheKey must be the lookup key
  • projectbluefin/dakota (no signing pipeline) produces no SBOM lookup attempt — no oras discover or cosign calls are made for dakota streams
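A lookup that scopes by stream before matching the cacheKey keeps the streams isolated. This sketch assumes each stream entry holds a releases array whose items carry a cacheKey field; findRelease is a hypothetical name.

```javascript
// Sketch only: the releases array and its cacheKey field are assumed shapes.
function findRelease(cache, streamId, cacheKey) {
  // Resolve the stream first; a matching cacheKey in another stream
  // is never considered.
  const streams = cache?.streams ?? {};
  const stream = Object.hasOwn(streams, streamId) ? streams[streamId] : null;
  if (!stream || !Array.isArray(stream.releases)) {
    return null;
  }
  return stream.releases.find((r) => r.cacheKey === cacheKey) ?? null;
}
```

A stream that is absent from the cache returns null rather than borrowing the entry of a sibling stream with the same cacheKey.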

Error Propagation

  • A transient network error during GHCR token fetch causes the affected stream to be skipped with a logged warning — the script does not crash and still produces output for other streams
  • A non-200 response from the Releases API for one stream does not abort processing of remaining streams
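Per-stream isolation amounts to catching failures inside the loop rather than around it. A sketch, where collectStreams is a hypothetical name and fetchStream stands in for whatever per-stream work the script does (token fetch, Releases API call, attestation lookup):

```javascript
// Sketch only: collectStreams and fetchStream are illustrative names.
async function collectStreams(streamIds, fetchStream, log = console.warn) {
  const streams = {};
  const skipped = [];
  for (const id of streamIds) {
    try {
      streams[id] = await fetchStream(id);
    } catch (err) {
      // One failing stream is logged and skipped; the loop continues.
      skipped.push(id);
      log(`[sbom] skipping ${id}: ${err.message}`);
    }
  }
  return { streams, skipped };
}
```

Returning the skipped list lets the caller report partial results in the job summary while the empty-cache check still decides the exit code.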

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request), task/p1 (High priority task)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests