Problem
data-raw/sync_bcfishpass_csvs.R (and the daily sync-bcfishpass-csvs.yml GHA cron) compares each bcfishpass-sourced CSV against a recorded sha256 byte checksum. When upstream changes the file bytes, an auto-PR opens and auto-merges. This works for value drift (rows added/edited) but is blind to shape drift: the 2026-04-27 sync of user_habitat_classification.csv, a long→wide reshape with a column type change, passed straight through the auto-merge guard and broke link's processing code downstream. The same path silently corrupts cached HTML report output and ripples into fresh's API (fresh#176, #177).
Proposed Solution
Compute a shape fingerprint (column names + types) alongside the byte checksum. Branch the sync logic:
| Drift type | What changed | Action |
|---|---|---|
| Byte drift, shape unchanged | Row edits, additions, deletions | Auto-PR + auto-merge as today |
| Shape drift | Cols added / renamed / removed / reshaped / type change | Auto-PR opens, labelled schema-drift, NOT auto-merged, GHA fails loud (red X on Actions tab) |
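
The branching in the table can be sketched in R roughly as follows. This is a minimal sketch, not the sync script's actual code: the helper names (shape_fingerprint, byte_checksum, classify_drift) are hypothetical, the digest package is an assumed dependency, and the "name:type" signature format is only one possible choice.

```r
# Hypothetical helpers; assumes the digest package is installed.

# Hash a "name:type" signature built from the parsed CSV header and
# the column types that read.csv infers.
shape_fingerprint <- function(path) {
  df <- utils::read.csv(path, stringsAsFactors = FALSE, check.names = FALSE)
  types <- vapply(df, function(col) class(col)[1], character(1))
  signature <- paste(names(df), types, sep = ":", collapse = "|")
  digest::digest(signature, algo = "sha256", serialize = FALSE)
}

# sha256 of the raw file bytes.
byte_checksum <- function(path) {
  digest::digest(path, algo = "sha256", file = TRUE)
}

# Shape is checked first: any shape change also changes the bytes.
classify_drift <- function(path, recorded_byte, recorded_shape) {
  if (!identical(shape_fingerprint(path), recorded_shape)) return("shape")
  if (!identical(byte_checksum(path), recorded_byte)) return("byte")
  "none"
}
```

One caveat with this approach: read.csv infers types from the values, so an edit that changes the inferred type of a column (for example, an integer column gaining a decimal value) would register as shape drift. That is arguably the desired behaviour, but it is worth deciding deliberately.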
Concrete changes
- Extend the provenance: block in inst/extdata/configs/<bundle>/config.yaml with a shape_checksum field (sha256 of column-name + column-type signature) per CSV
- Update data-raw/sync_bcfishpass_csvs.R to compute and compare shape fingerprints
- Branch in .github/workflows/sync-bcfishpass-csvs.yml: shape-drift PRs get schema-drift label and skip the auto-merge step; workflow exits non-zero so the Actions tab shows red
- Extend lnk_config_verify() to surface shape-fingerprint drift alongside byte-checksum drift
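
To illustrate reporting both kinds of drift side by side, here is a rough sketch of a per-file check. It does not reflect the real lnk_config_verify() signature or the real provenance: layout: verify_entry is a hypothetical helper, and the file and checksum field names are assumptions (only shape_checksum comes from this proposal).

```r
# Hypothetical helper: compare one provenance entry against freshly
# computed checksums and report both drift types in one row.
verify_entry <- function(entry, current_byte, current_shape) {
  recorded_shape <- entry$shape_checksum
  data.frame(
    file = entry$file,
    byte_drift = !identical(entry$checksum, current_byte),
    # NA means no shape_checksum has been recorded yet for this entry.
    shape_drift = if (is.null(recorded_shape)) {
      NA
    } else {
      !identical(recorded_shape, current_shape)
    },
    stringsAsFactors = FALSE
  )
}

# Usage with placeholder values (not real checksums):
entry <- list(file = "example.csv", checksum = "aaa", shape_checksum = "bbb")
verify_entry(entry, current_byte = "ccc", current_shape = "bbb")
```

Returning NA for entries that predate the new field is one way to keep verification usable while configs are being migrated; treating a missing shape_checksum as an error would be the stricter alternative.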
Operational model after this lands (with crate's adapter — see crate#TBF)
- Most days, byte-only drift → auto-PR + auto-merge as today, zero human action
- Shape-drift days (rare) → auto-PR halts at merge gate, fails loud on Actions tab
- Triage flow: 2 coordinated PRs (crate updates schema YAML + adapter + releases; link bumps crate dep + merges schema-drift PR). Pipelines keep running on old shape until both ship: adapter-driven shape stability hides upstream variability from downstream consumers (fresh, reporting)
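
One way the sync script could hand the drift type to the workflow, so that later steps can skip auto-merge and fail the job, is through GitHub Actions step outputs. The sketch below assumes the script runs inside a workflow step (where the GITHUB_OUTPUT environment variable names a file to append to); signal_drift and the drift output name are hypothetical.

```r
# Hypothetical helper: record the drift type as a step output and return
# the exit status that a failing "gate" step would use.
signal_drift <- function(drift, output_file = Sys.getenv("GITHUB_OUTPUT")) {
  drift <- match.arg(drift, c("none", "byte", "shape"))
  # Outside GitHub Actions GITHUB_OUTPUT is unset, so nothing is written.
  if (nzchar(output_file)) {
    cat(sprintf("drift=%s\n", drift), file = output_file, append = TRUE)
  }
  # Non-zero only for shape drift, so byte-only days stay green.
  if (drift == "shape") 1L else 0L
}
```

Because the PR should still open on shape-drift days, the non-zero exit would need to happen after the PR step, for example in a final step that reads the drift output and calls quit(status = 1). Where exactly that gate sits is left open here.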
Acceptance criteria
- provenance: block in both inst/extdata/configs/bcfishpass/config.yaml and inst/extdata/configs/default/config.yaml includes shape_checksum for every smnorris-sourced file
- data-raw/sync_bcfishpass_csvs.R computes shape fingerprint at sync time + compares to recorded
- Workflow applies schema-drift label + DOES NOT auto-merge for shape drift
- lnk_config_verify() reports shape-fingerprint drift alongside byte-checksum drift
Context / related
- inst/extdata/configs/bcfishpass/overrides/user_habitat_classification.csv synced 2026-04-27 from upstream sha 40c4a0a: long→wide reshape with type change broke processing
- comms/crate/20260427_fresh_bcfishpass_csv_consumers.md