Problem
Pipeline outputs drift silently when underlying inputs change: CSV syncs, fwapg refreshes, bcfishobs updates. On 2026-04-22, a 0.4 pp shift in the BT rearing diff vs bcfishpass looked like a refactor regression. It was not: the legacy script run against the same DB produced identical numbers, so the drift came entirely from environment state changes between the earlier comparison run (2026-04-15) and the 2026-04-22 run. Without a stamp of all inputs at run time, "what changed?" is unanswerable.
This issue is about closing that loop end-to-end: every config CSV carries provenance; every pipeline run emits a stamp; drift between any two runs is diffable from their stamps alone.
Proposed Solution
Two layers:
1. Config-bundle provenance (at rest)
Extend inst/extdata/configs/<name>/config.yaml with a provenance section per synced file:
lnk_config() reads this and exposes cfg$provenance. A lnk_config_verify() helper re-computes checksums on load and warns if any file drifted from its stored hash.
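The verification step could look roughly like the following. This is a minimal sketch, not the package's implementation: the shape of the provenance list (file name mapped to a sha256 field) and the helper names file_sha256() and verify_provenance() are assumptions made for illustration. Hashing uses tools::sha256sum() where available (R 4.5.0 and later) and otherwise falls back to the digest package, which is assumed to be installed.

```r
# Hash one file with SHA-256. tools::sha256sum() exists in R >= 4.5.0;
# older R falls back to the digest package (assumed to be installed).
file_sha256 <- function(path) {
  if (getRversion() >= "4.5.0") {
    unname(tools::sha256sum(path))
  } else {
    digest::digest(path, algo = "sha256", file = TRUE)
  }
}

# Hypothetical verifier: `provenance` is a named list keyed by file name,
# each entry holding the stored `sha256`; `dir` is the config bundle directory.
verify_provenance <- function(provenance, dir) {
  files   <- names(provenance)
  stored  <- vapply(provenance, function(p) p$sha256, character(1))
  current <- vapply(files, function(f) file_sha256(file.path(dir, f)), character(1))
  # An NA hash counts as drifted, as does any mismatch with the stored hash.
  drifted <- is.na(current) | unname(stored) != unname(current)
  if (any(drifted)) {
    warning("Provenance drift in: ", paste(files[drifted], collapse = ", "),
            call. = FALSE)
  }
  data.frame(file = files, stored = unname(stored), current = unname(current),
             drifted = unname(drifted), stringsAsFactors = FALSE)
}

# Usage with a throwaway bundle directory and an illustrative file name.
dir <- tempfile("bundle")
dir.create(dir)
csv <- file.path(dir, "example.csv")
writeLines(c("a,b", "1,2"), csv)
prov <- list("example.csv" = list(sha256 = file_sha256(csv)))

res_clean <- verify_provenance(prov, dir)                    # hashes match
writeLines(c("a,b", "1,3"), csv)                             # edit the file on disk
res_drift <- suppressWarnings(verify_provenance(prov, dir))  # now reports drift
```

Returning a data frame rather than only warning keeps the result usable by later steps, for example for inclusion in a run stamp.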
2. Run stamps (at run)
Every pipeline invocation emits a run-stamp object recording:
cfg$provenance (the "at rest" state of every input CSV)
DB snapshot hashes: bcfishobs row count, fwa_stream_networks_sp last-vacuum or relfilenode, bcfishpass reference row counts per species
AOI + schema + break_order + any species = override
Start/end timestamps, elapsed per phase
The resulting comparison tibble (if a reference was provided)
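To illustrate how drift between two runs could be read off their stamps alone, here is a minimal sketch that treats a stamp as a nested list and compares the flattened fields. The function name stamp_diff(), the field names, and every value in the example stamps are hypothetical; the actual stamp layout is not defined in this issue.

```r
# Flatten two nested stamp lists and report every field whose value differs,
# including fields present in only one of the stamps.
stamp_diff <- function(a, b) {
  fa <- unlist(a)
  fb <- unlist(b)
  keys <- union(names(fa), names(fb))
  va <- as.character(fa[keys])
  vb <- as.character(fb[keys])
  changed <- (is.na(va) != is.na(vb)) | (!is.na(va) & !is.na(vb) & va != vb)
  out <- data.frame(field = keys, run_a = va, run_b = vb,
                    stringsAsFactors = FALSE)
  out[changed, , drop = FALSE]
}

# Illustrative stamps; every value below is made up.
stamp_a <- list(
  provenance  = list("example.csv" = list(sha256 = "aaaa")),
  db          = list(bcfishobs_rows = 1000L),
  aoi         = "EXAMPLE",
  break_order = c("first", "second")
)
stamp_b <- stamp_a
stamp_b$db$bcfishobs_rows <- 1010L

drift <- stamp_diff(stamp_a, stamp_b)
print(drift)
```

Timestamps and per-phase elapsed times differ between any two runs, so a real diff would probably exclude them or keep them in a separate section of the stamp. Flattening with unlist() joins nested names with a dot, which is good enough for a sketch but ambiguous for keys that themselves contain dots, such as file names.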
Implementation options:
Expand lnk_stamp() (#24, "lnk_stamp: export model parameters for report appendix") from "report appendix markdown" into "runtime reproducibility record." The report-appendix flavour becomes one rendering of the same underlying stamp object (as.markdown(stamp)).
A run stamp is emitted as the return value of a forthcoming lnk_pipeline_run() wrapper (not built yet — right now pipelines are composed explicitly via lnk_pipeline_* phase calls).
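The idea of the report appendix being one rendering of a single stamp object can be sketched with an S3 method. Base R has no as.markdown() generic, so the generic below, the class name lnk_stamp, and the two-column table layout are all assumptions for illustration; the example values are made up.

```r
# Hypothetical S3 generic; base R does not provide as.markdown().
as.markdown <- function(x, ...) UseMethod("as.markdown")

# Render a stamp (a nested list tagged with the assumed class "lnk_stamp")
# as the lines of a two-column markdown table.
as.markdown.lnk_stamp <- function(x, ...) {
  flat <- unlist(unclass(x))
  c("| field | value |",
    "| --- | --- |",
    sprintf("| %s | %s |", names(flat), as.character(flat)))
}

# Illustrative stamp; the values are made up.
stamp <- structure(
  list(aoi = "EXAMPLE", db = list(bcfishobs_rows = 1000L)),
  class = "lnk_stamp"
)
md <- as.markdown(stamp)
writeLines(md)
```

Keeping the stamp as plain data and rendering it on demand means the same object can feed a report appendix, a verification log, or a diff between runs without being rebuilt.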
Scope for a first PR
Add provenance block to inst/extdata/configs/bcfishpass/config.yaml for every file currently tracked. Backfill with the smnorris SHA we know from the research doc (ea3c5d8, synced 2026-04-13).
Add cfg$provenance to the lnk_config() return.
Add lnk_config_verify(cfg) — recomputes sha256 of every provenanced file, reports drift.
Expand lnk_stamp() (reusing the scope of #24, "lnk_stamp: export model parameters for report appendix") to produce a runtime-stamp list that merges cfg$provenance with runtime software + DB snapshot info.
data-raw/compare_bcfishpass.R output: every verification log starts with a stamp dump.
Non-goals
Cross-refs
_targets.Rtarget function should return(diff_tibble, stamp)sotar_read(rollup)carries full lineage.Versions
38-targets-pipeline)