3 changes: 2 additions & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: link
Title: Crossing Connectivity Interpretation
Version: 0.10.0
Version: 0.11.0
Authors@R:
person("Allan", "Irvine", , "airvine@newgraphenvironment.com",
role = c("aut", "cre"),
@@ -28,6 +28,7 @@ Remotes:
Suggests:
bcdata,
bookdown,
digest,
dplyr,
fresh (>= 0.21.0),
knitr,
5 changes: 5 additions & 0 deletions NAMESPACE
@@ -1,9 +1,12 @@
# Generated by roxygen2: do not edit by hand

S3method(format,lnk_stamp)
S3method(print,lnk_config)
S3method(print,lnk_stamp)
export(lnk_aggregate)
export(lnk_barrier_overrides)
export(lnk_config)
export(lnk_config_verify)
export(lnk_db_conn)
export(lnk_load)
export(lnk_match)
@@ -18,6 +21,8 @@ export(lnk_pipeline_species)
export(lnk_rules_build)
export(lnk_score)
export(lnk_source)
export(lnk_stamp)
export(lnk_stamp_finish)
export(lnk_thresholds)
import(DBI)
importFrom(RPostgres,Postgres)
11 changes: 11 additions & 0 deletions NEWS.md
@@ -1,3 +1,14 @@
# link 0.11.0

Config-bundle provenance + run stamps — closes the drift attribution loop. Pipeline outputs that shift between runs on the same DB state can now be traced back to which input changed. Closes [#40](https://github.com/NewGraphEnvironment/link/issues/40); supersedes the narrower scope of [#24](https://github.com/NewGraphEnvironment/link/issues/24).

- `inst/extdata/configs/{bcfishpass,default}/config.yaml` carry `provenance:` blocks with sha256 checksums for every tracked file. Externally sourced files (bcfishpass overrides) record `source` URL + `upstream_sha` (`ea3c5d8`, synced 2026-04-13) + `path` within source repo. Generated files (`rules.yaml`) record `generated_from` + `generated_by` + `generator_sha`. Hand-authored files record link's git sha at edit time.
- `lnk_config()` exposes parsed provenance as `cfg$provenance` (named list, one entry per tracked file). `print(cfg)` shows the count of tracked files.
- New `lnk_config_verify(cfg, strict)` recomputes sha256 for every provenanced file and returns a tibble `(file, expected, observed, drift, missing)`. Default warns on drift; `strict = TRUE` errors. `digest` added to Suggests.
- New `lnk_stamp(cfg, conn, aoi, db_snapshot)` returns an `lnk_stamp` S3 list capturing the full set of inputs at run time: cfg provenance with current observed checksums, software versions and git SHAs (link, fresh, R), DB snapshot row counts (`bcfishobs.observations`, `whse_basemapping.fwa_stream_networks_sp`) when conn is provided, AOI + start_time. `lnk_stamp_finish(stamp, result, end_time)` finalizes; `format(stamp, "markdown")` renders for report appendix or run-log dump.
- `data-raw/compare_bcfishpass_wsg.R` now emits a stamp markdown at the head of every WSG run, captured into `data-raw/logs/*.txt` via the standard stderr redirect.
- Tests: 93 new — provenance parsing, drift detection (clean / mutated / missing / strict), bundled-config drift = 0 invariants, stamp shape + markdown rendering + finalization + db-snapshot opt-out.

# link 0.10.0

Default config bundle now uses explicit FWA `edge_type` codes for spawn and rear-stream predicates, matching bcfishpass's 20-year-validated convention.
14 changes: 13 additions & 1 deletion R/lnk_config.R
@@ -41,6 +41,14 @@
#' listed in the manifest
#' - `pipeline` — named list of pipeline knobs from the manifest
#' (`break_order`, `cluster`, `spawn_connected`)
#' - `provenance` — named list of per-file provenance metadata parsed
#' from the manifest's `provenance:` block (or `NULL` when the
#' bundle does not declare it). Each entry is keyed by the file's
#' path relative to `dir` and carries metadata fields such as
#' `source`, `upstream_sha`, `synced`, `checksum`, plus
#' generator-specific keys (`generated_from`, `generated_by`,
#' `generator_sha`) for files produced by tooling. Drift detection
#' against the recorded checksums is in [lnk_config_verify()].
#'
#' @export
#'
@@ -142,7 +150,8 @@ lnk_config <- function(name_or_path) {
observation_exclusions = read_csv_optional("observation_exclusions"),
wsg_species = read_csv_optional("wsg_species"),
overrides = overrides,
pipeline = manifest$pipeline %||% list()
pipeline = manifest$pipeline %||% list(),
provenance = manifest$provenance
)
class(out) <- c("lnk_config", "list")
out
@@ -163,6 +172,9 @@ print.lnk_config <- function(x, ...) {
cat(" pipeline: ", paste(names(x$pipeline), collapse = ", "),
"\n", sep = "")
}
if (!is.null(x$provenance)) {
cat(" provenance:", length(x$provenance), "files tracked\n", sep = " ")
}
invisible(x)
}

115 changes: 115 additions & 0 deletions R/lnk_config_verify.R
@@ -0,0 +1,115 @@
#' Verify Config Bundle File Checksums
#'
#' Recomputes sha256 for every file declared in the bundle's
#' `provenance:` block and compares against the recorded checksum.
#' Returns a data frame of expected versus observed checksums and flags drift.
#'
#' Use this at run time to detect silent drift — a file that was edited
#' without re-recording its checksum, or an external CSV that was
#' re-synced under the same path. Drift between two pipeline runs on
#' the same DB state with the same package versions almost always
#' traces back to a config-file edit; `lnk_config_verify()` is the
#' fastest way to localize the change.
#'
#' @param cfg An `lnk_config` object from [lnk_config()].
#' @param strict Logical. When `TRUE`, errors if any file has drifted.
#' Default `FALSE` warns and returns the data frame for inspection.
#'
#' @return A data frame with columns:
#'
#' - `file` — path relative to `cfg$dir`
#' - `expected` — checksum recorded in the manifest (sha256 hex)
#' - `observed` — checksum recomputed from the current file (sha256
#' hex)
#' - `drift` — logical, `TRUE` when expected != observed
#' - `missing` — logical, `TRUE` when the file no longer exists on
#' disk (observed is `NA` in this case)
#'
#' The result carries one row per provenanced file. When the bundle
#' has no `provenance:` block (`cfg$provenance` is `NULL`), the
#' function returns a zero-row result with the same columns.
#'
#' @family config
#'
#' @export
#'
#' @examples
#' cfg <- lnk_config("bcfishpass")
#' verify <- lnk_config_verify(cfg)
#' verify
#'
#' \dontrun{
#' # In a verification log: error if anything drifted
#' lnk_config_verify(cfg, strict = TRUE)
#' }
lnk_config_verify <- function(cfg, strict = FALSE) {
if (!inherits(cfg, "lnk_config")) {
stop("cfg must be an lnk_config object (from lnk_config())",
call. = FALSE)
}
if (!is.logical(strict) || length(strict) != 1L || is.na(strict)) {
stop("strict must be a single TRUE or FALSE", call. = FALSE)
}

prov <- cfg$provenance
if (is.null(prov) || length(prov) == 0L) {
return(.lnk_verify_empty())
}

if (!requireNamespace("digest", quietly = TRUE)) {
stop("Package 'digest' is required for lnk_config_verify(). ",
"Install with: install.packages('digest')",
call. = FALSE)
}

rows <- lapply(names(prov), function(rel) {
expected <- prov[[rel]][["checksum"]] %||% NA_character_
abs_path <- file.path(cfg$dir, rel)
if (!file.exists(abs_path)) {
return(data.frame(
file = rel,
expected = expected,
observed = NA_character_,
drift = TRUE,
missing = TRUE,
stringsAsFactors = FALSE
))
}
observed <- paste0("sha256:",
digest::digest(file = abs_path, algo = "sha256"))
data.frame(
file = rel,
expected = expected,
observed = observed,
drift = !identical(expected, observed),
missing = FALSE,
stringsAsFactors = FALSE
)
})
out <- do.call(rbind, rows)

if (any(out$drift)) {
drifted <- out[out$drift, "file", drop = TRUE]
msg <- paste0(
"Config bundle '", cfg$name, "' has ", length(drifted),
" file(s) drifted from recorded checksum:\n - ",
paste(drifted, collapse = "\n - "))
if (strict) {
stop(msg, call. = FALSE)
}
warning(msg, call. = FALSE)
}

out
}

.lnk_verify_empty <- function() {
data.frame(
file = character(0),
expected = character(0),
observed = character(0),
drift = logical(0),
missing = logical(0),
stringsAsFactors = FALSE
)
}
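
The per-file comparison in `lnk_config_verify()` can be tried outside the package. The following is a minimal sketch, not the package's implementation: `sha256_file()` and `check_file()` are hypothetical helpers, the bundle directory is a temporary folder, and the recorded checksum for `gone.csv` is a made-up placeholder. It assumes the `digest` package is installed and reuses the same `digest::digest(file = ..., algo = "sha256")` call and `"sha256:"` prefix shown in the diff above.

```r
# Standalone sketch of the checksum comparison. The helpers below are
# hypothetical and are not exported by the link package.
stopifnot(requireNamespace("digest", quietly = TRUE))

# Hash a file and prefix the result the way the manifest records it.
sha256_file <- function(path) {
  paste0("sha256:", digest::digest(file = path, algo = "sha256"))
}

# Build one result row for a single tracked file.
check_file <- function(dir, rel, expected) {
  abs_path <- file.path(dir, rel)
  is_missing <- !file.exists(abs_path)
  observed <- if (is_missing) NA_character_ else sha256_file(abs_path)
  data.frame(
    file = rel,
    expected = expected,
    observed = observed,
    drift = !identical(expected, observed),
    missing = is_missing,
    stringsAsFactors = FALSE
  )
}

# A throwaway "bundle" with two files on disk.
dir <- tempfile("bundle_")
dir.create(dir)
writeLines("a,b\n1,2", file.path(dir, "clean.csv"))
writeLines("a,b\n1,2", file.path(dir, "edited.csv"))

# Record checksums, as a provenance block would. "gone.csv" is never
# created, so its recorded value is only a placeholder.
recorded <- c(
  "clean.csv"  = sha256_file(file.path(dir, "clean.csv")),
  "edited.csv" = sha256_file(file.path(dir, "edited.csv")),
  "gone.csv"   = "sha256:placeholder"
)

# Edit one file after its checksum was recorded.
writeLines("a,b\n1,3", file.path(dir, "edited.csv"))

result <- do.call(rbind, lapply(names(recorded), function(rel) {
  check_file(dir, rel, recorded[[rel]])
}))
print(result[, c("file", "drift", "missing")])
```

In this sketch `clean.csv` reports no drift, `edited.csv` reports drift, and `gone.csv` is reported as both drifted and missing, which mirrors how the function above treats a file that no longer exists on disk.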