
Add shape fingerprint + block auto-merge on shape drift in sync_bcfishpass_csvs.R #64

@NewGraphEnvironment

Description

Problem

data-raw/sync_bcfishpass_csvs.R (and the daily sync-bcfishpass-csvs.yml GHA cron) compares each bcfishpass-sourced CSV against a recorded sha256 byte checksum. When upstream changes the file BYTES, an auto-PR opens and auto-merges. This catches value drift (rows added or edited) but is blind to shape drift: the 2026-04-26 long→wide reshape of user_habitat_classification.csv, which also changed a column type, passed straight through the auto-merge guard and broke link's downstream processing code. The same path silently corrupts cached HTML report output and ripples into fresh's API (fresh#176, #177).
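A minimal sketch shows why a byte checksum alone cannot separate the two cases. It is written in Python for illustration only; the sync script itself is R. The CSV contents are hypothetical stand-ins, not the real columns of user_habitat_classification.csv.

```python
import hashlib


def byte_checksum(text: str) -> str:
    """sha256 of the file's bytes: the only signal the current guard records."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical stand-in data, not the real bcfishpass file contents.
long_csv = "segment,species,habitat\n1,CH,spawning\n1,CO,rearing\n"
value_edit = "segment,species,habitat\n1,CH,spawning\n1,CO,spawning\n"  # one value changed
wide_csv = "segment,CH,CO\n1,spawning,rearing\n"  # same facts, long -> wide

recorded = byte_checksum(long_csv)
for label, text in [("value edit", value_edit), ("long->wide reshape", wide_csv)]:
    # Both cases print True: the checksum says "changed", never "changed how".
    print(f"{label}: byte checksum changed = {byte_checksum(text) != recorded}")
```

Both changes flip the checksum, so the guard has nothing to branch on and treats a reshape exactly like a row edit.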

Proposed Solution

Compute a shape fingerprint (column names + column types) alongside the byte checksum, then branch the sync logic on which of the two changed:

| Drift type | What changed | Action |
|---|---|---|
| Byte drift, shape unchanged | Row edits, additions, deletions | Auto-PR + auto-merge as today |
| Shape drift | Columns added / renamed / removed / reshaped, or a column type change | Auto-PR opens, labelled schema-drift, NOT auto-merged; GHA fails loud (red X on Actions tab) |
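The branching in the drift table can be sketched as a small per-file classifier. This is a Python sketch under stated assumptions: classify_drift, the Drift enum and the ACTIONS mapping are hypothetical names, not existing functions in link, and both checksums are assumed to be precomputed hex strings.

```python
from enum import Enum


class Drift(Enum):
    NONE = "none"
    BYTE = "byte"    # values changed, columns and types did not
    SHAPE = "shape"  # columns added/renamed/removed/reshaped, or a type changed


def classify_drift(recorded_bytes, recorded_shape, current_bytes, current_shape):
    """Pick the drift-table row that applies to one CSV (hypothetical helper)."""
    # Shape is checked first: any shape change also changes the bytes, so a
    # byte-only comparison would misroute it into the auto-merge path.
    if current_shape != recorded_shape:
        return Drift.SHAPE
    if current_bytes != recorded_bytes:
        return Drift.BYTE
    return Drift.NONE


# Action per drift type, mirroring the drift table (hypothetical keys).
ACTIONS = {
    Drift.NONE: {"open_pr": False, "auto_merge": False, "label": None},
    Drift.BYTE: {"open_pr": True, "auto_merge": True, "label": None},
    Drift.SHAPE: {"open_pr": True, "auto_merge": False, "label": "schema-drift"},
}

print(classify_drift("aaa", "s1", "bbb", "s1"))  # Drift.BYTE
```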

Concrete changes

  • Extend the provenance: block in inst/extdata/configs/<bundle>/config.yaml with a shape_checksum field (sha256 of column-name + column-type signature) per CSV
  • Update data-raw/sync_bcfishpass_csvs.R to compute and compare shape fingerprints
  • Branch in .github/workflows/sync-bcfishpass-csvs.yml: shape-drift PRs get schema-drift label and skip the auto-merge step; workflow exits non-zero so the Actions tab shows red
  • Extend lnk_config_verify() to surface shape-fingerprint drift alongside byte-checksum drift
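A shape fingerprint could be computed roughly as follows: read the header, guess a type per column, join the name:type pairs in column order into a signature, and hash that signature with sha256. This is a hypothetical Python sketch. The real R script would more likely rely on readr's own column-type guessing, and the rules in infer_type (including treating "" and "NA" as missing) are assumptions.

```python
import csv
import hashlib
import io

MISSING = ("", "NA")  # assumed missing-value markers


def _parses(cast, value):
    """True if cast(value) succeeds, e.g. int("3") but not int("3.5")."""
    try:
        cast(value)
        return True
    except ValueError:
        return False


def infer_type(values):
    """Crude per-column type guess (assumed rules, not readr's guesser)."""
    present = [v for v in values if v not in MISSING]
    if not present:
        return "empty"
    if all(v in ("TRUE", "FALSE") for v in present):
        return "logical"
    if all(_parses(int, v) for v in present):
        return "integer"
    if all(_parses(float, v) for v in present):
        return "double"
    return "character"


def shape_checksum(text):
    """sha256 of the ordered 'name:type' signature of a CSV's columns."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = (rows[0], rows[1:]) if rows else ([], [])
    parts = []
    for i, name in enumerate(header):
        # Short (ragged) rows are padded with "" so every header column is typed.
        column = [row[i] if i < len(row) else "" for row in body]
        parts.append(f"{name}:{infer_type(column)}")
    signature = "\n".join(parts)
    return hashlib.sha256(signature.encode("utf-8")).hexdigest()


# Hypothetical sample files.
long_csv = "segment,species,habitat\n1,CH,spawning\n1,CO,rearing\n"
value_edit = "segment,species,habitat\n1,CH,spawning\n2,CO,spawning\n"
wide_csv = "segment,CH,CO\n1,spawning,rearing\n"
type_change = "segment,species,habitat\nA1,CH,spawning\n"

print(shape_checksum(long_csv) == shape_checksum(value_edit))   # True: byte drift only
print(shape_checksum(long_csv) == shape_checksum(wide_csv))     # False: columns reshaped
print(shape_checksum(long_csv) == shape_checksum(type_change))  # False: segment is now character
```

Because column order is part of the signature, a pure column reorder also counts as shape drift. A value edit that changes the inferred type (say, the first non-numeric value in a numeric column) trips the shape check too, which matches the column-type changes this issue wants to stop.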

Operational model after this lands (with crate's adapter — see crate#TBF)

  • Most days, byte-only drift → auto-PR + auto-merge as today, zero human action
  • Shape-drift days (rare) → auto-PR halts at merge gate, fails loud on Actions tab
  • Triage flow: two coordinated PRs (crate updates its schema YAML and adapter, then releases; link bumps the crate dependency and merges the schema-drift PR). Pipelines keep running on the old shape until both ship: the adapter keeps the output shape stable, hiding upstream variability from downstream consumers (fresh, reporting)
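The per-file drift results then need to collapse into one outcome per workflow run. A Python sketch, assuming a single sync PR carries all changed files (the issue does not say whether shape-drifted files get a PR of their own); sync_outcome and its keys are hypothetical:

```python
def sync_outcome(drift_by_file):
    """Combine per-file drift into one decision for the sync workflow run.

    drift_by_file maps a CSV name to "none", "byte" or "shape".
    """
    changed = sorted(f for f, d in drift_by_file.items() if d != "none")
    shape = [f for f in changed if drift_by_file[f] == "shape"]
    if not changed:
        # Nothing drifted: no PR, green run.
        return {"open_pr": False, "auto_merge": False, "labels": [], "exit_code": 0, "files": []}
    if shape:
        # One shape-drifted file is enough to hold the whole PR at the merge gate.
        return {"open_pr": True, "auto_merge": False, "labels": ["schema-drift"],
                "exit_code": 1, "files": changed}
    # Byte-only drift: today's behaviour.
    return {"open_pr": True, "auto_merge": True, "labels": [], "exit_code": 0, "files": changed}


# "other.csv" is a hypothetical file name.
print(sync_outcome({"user_habitat_classification.csv": "shape", "other.csv": "byte"}))
```

The workflow step would then exit with exit_code, so any shape drift leaves a red X on the Actions tab while the labelled PR stays open for triage.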

Acceptance criteria

  • provenance: block in both inst/extdata/configs/bcfishpass/config.yaml and inst/extdata/configs/default/config.yaml includes shape_checksum for every smnorris-sourced file
  • data-raw/sync_bcfishpass_csvs.R computes shape fingerprint at sync time + compares to recorded
  • Sync workflow opens auto-PR + auto-merges for byte-only drift (existing behaviour preserved)
  • Sync workflow opens auto-PR with schema-drift label + DOES NOT auto-merge for shape drift
  • Workflow exits non-zero on shape drift (Actions tab shows red)
  • lnk_config_verify() reports shape-fingerprint drift alongside byte-checksum drift
  • Test: simulate the 2026-04-26 long→wide reshape and confirm the workflow applies the schema-drift label and halts the merge

Context / related
