lnk_config: config bundle loader for pipeline variants#39
Merged
NewGraphEnvironment merged 6 commits into main, Apr 22, 2026
Mermaid renders natively on GitHub and in VS Code markdown preview. Color-coding via classDef distinguishes fresh functions, lnk_ functions, and composite operations. Short glossary below the DAG defines the four composite steps that are not single function calls (non-minimal removal, load base segments, sequential breaking, build breaks table). Drops the redundant numbered Pipeline prose — the DAG carries the flow, glossary covers the non-obvious specifics.
Establishes the config-bundle directory convention for variants:
- rules.yaml, dimensions.csv, parameters_fresh.csv at the bundle root
- overrides/ subdir for all CSVs synced from bcfishpass/data
- config.yaml manifest declaring file paths + pipeline parameters
- README.md documenting what the variant is and how to regenerate rules

All R/ docstrings, data-raw/ scripts, and CLAUDE.md updated to the new paths. `rules.yaml` still points at the same content as before; it just moved. `lnk_config()` loader follows in Phase 2.

Relates to #37
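The directory convention above can be checked mechanically. The following is a minimal sketch in base R, not the package's own validation: `bundle_missing_entries()` is a hypothetical helper name, and the only things taken from the commit message are the entry names it lists.

```r
# Hypothetical helper (not part of the package): report which entries named in
# the bundle convention are absent from a candidate bundle directory.
bundle_missing_entries <- function(dir) {
  expected <- c(
    "config.yaml",           # manifest
    "rules.yaml",            # bundle root
    "dimensions.csv",        # bundle root
    "parameters_fresh.csv",  # bundle root
    "README.md",             # variant documentation
    "overrides"              # subdirectory of synced CSVs
  )
  # file.exists() is TRUE for both files and directories
  expected[!file.exists(file.path(dir, expected))]
}

# Usage: a throwaway bundle that lacks README.md and overrides/
demo_dir <- tempfile("demo_bundle")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c(
  "config.yaml", "rules.yaml", "dimensions.csv", "parameters_fresh.csv"
))))
missing_entries <- bundle_missing_entries(demo_dir)
print(missing_entries)
```

Returning the missing names rather than stopping on the first one lets a caller report every gap in a single error message.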
Pulls the revised README from soul — arrow-subject table, binary open/closed status, two-sided discovery with peer list, topic uniqueness rule, worked example, cross-thread linking. Relates to NewGraphEnvironment/soul#35
- R/lnk_config.R: load a config bundle (rules YAML, dimensions CSV, parameters_fresh, overrides, observation_exclusions, habitat_classification, wsg_species, pipeline knobs) from inst/extdata/configs/<name>/ or any directory containing config.yaml
- Name-vs-path resolution: bare names resolve to bundled configs; inputs containing `/` or `\\` are treated as filesystem paths. Prevents a local `bcfishpass/` dir in the CWD from silently shadowing the bundled variant
- Manifest validation: required top-level keys, required files entries, referenced files exist, overrides exist
- S3 print method for compact inspection
- `%||%` moved into R/utils.R (was previously assumed to be base R >= 4.4; link claims >= 4.1)
- `yaml` added to DESCRIPTION Imports
- 11 unit tests covering: input validation, bundle-vs-path resolution, missing manifest, missing keys, missing files, missing overrides, name-shadowing regression, print method, custom path, full bcfishpass bundle load
- pkgdown reference updated, NEWS entry, version bump to 0.2.0

Relates to #37
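The name-vs-path rule described above can be illustrated with a short base R sketch. This is an assumption-laden illustration, not the body of `.lnk_config_resolve_dir`: the function name `resolve_config_dir()` and the `bundled_root` argument are hypothetical, and in the package the bundled root would presumably come from a `system.file()` lookup.

```r
# Hypothetical resolver: bare names map into the bundled-config root; any input
# containing "/" or "\" is used as a filesystem path as given.
resolve_config_dir <- function(name_or_path, bundled_root) {
  stopifnot(
    is.character(name_or_path), length(name_or_path) == 1L,
    !is.na(name_or_path), nzchar(name_or_path)
  )
  # Regex character class matching a forward slash or a backslash
  looks_like_path <- grepl("[/\\\\]", name_or_path)
  dir <- if (looks_like_path) name_or_path else file.path(bundled_root, name_or_path)
  if (!file.exists(file.path(dir, "config.yaml"))) {
    stop("No config.yaml found in: ", dir, call. = FALSE)
  }
  dir
}

# Usage: a fake bundled root plus a custom bundle elsewhere on disk
bundled_root <- tempfile("configs")
dir.create(file.path(bundled_root, "bcfishpass"), recursive = TRUE)
invisible(file.create(file.path(bundled_root, "bcfishpass", "config.yaml")))

custom <- tempfile("custom_bundle")
dir.create(custom)
invisible(file.create(file.path(custom, "config.yaml")))

by_name <- resolve_config_dir("bcfishpass", bundled_root)  # bare name
by_path <- resolve_config_dir(custom, bundled_root)        # contains a separator
```

Because a bare name is never looked up relative to the working directory, a local `bcfishpass/` folder cannot shadow the bundled variant in this sketch.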
Replaces hardcoded `system.file` lookups with the config loader. One source of truth for what a config bundle contains; callers get validated paths + loaded parameters in one call. Also updates PWF to reflect fix applied after code-check round 1 — the name-shadowing foot-gun in `.lnk_config_resolve_dir` now has a regression test. Relates to #37
NewGraphEnvironment added a commit that referenced this pull request, Apr 23, 2026
* Archive lnk_config PWF, initialize _targets.R PWF

- planning/archive/2026-04-22-lnk-config/: shipped in link 0.2.0 via PR #39
- planning/active/: new PWF for _targets.R pipeline (#38), with targets + crew orchestration, single-host first, distributed swap deferred until rtj Phase 4

Relates to #38

* lnk_habitat_setup_schema: create per-WSG working schema

First of six pipeline phase helpers. Creates the namespaced working schema for a run (e.g. `working_bulk` so parallel WSG runs on the same host do not collide) and ensures `fresh` exists for downstream output tables.

Mocked tests cover identifier validation and SQL shape; CREATE SCHEMA / DROP SCHEMA behavior is Postgres's responsibility, not ours to re-verify.

Relates to #38

* Rename lnk_habitat_* helpers to lnk_pipeline_*, abstract wsg to aoi

Naming rethink before building the remaining five helpers: better to lock the API once than rename six functions later.

- Prefix: `lnk_pipeline_*` replaces `lnk_habitat_*`. Only one of the six phases (classify) is strictly habitat work; the others are setup, loading, network prep, segmenting, connectivity. `pipeline` describes what they all are: building blocks composed by `_targets.R` or manual scripts.
- Phase names read as verbs: setup → load → prepare → break → classify → connect. Reads like a recipe.
- Param: `aoi` replaces `wsg` in the canonical signature `(conn, aoi, cfg, schema)`. `wsg` hardcoded one partition scheme (bcfishpass); `aoi` matches fresh convention (accepts a WSG code today, extends to ltree filters / sf polygons / mapsheets later).
- `setup` stays a signature outlier: `(conn, schema, overwrite)`. No aoi/cfg needed; it just makes schemas.

Renames: `lnk_habitat_setup_schema()` → `lnk_pipeline_setup()`. Updates tests, Rd, NAMESPACE, PWF, and issue #38 body to match.

Relates to #38

* lnk_pipeline_load: crossings and their overrides

Second of six pipeline phase helpers.
Loads the crossings CSV from fresh, filters to AOI, appends misc crossings, applies modelled fixes (NONE/OBS → PASSABLE) and PSCIS barrier status overrides.

Scope is tighter than the original "load_inputs" plan: anything other than anthropogenic crossings (falls, user-identified definite barriers, observation exclusions, habitat classification) moves to `prepare` where it is actually consumed. That keeps each phase's concern clean: load handles crossings; prepare handles network + barriers.

Split into three internal `@noRd` helpers (`.lnk_pipeline_load_crossings`, `.lnk_pipeline_apply_fixes`, `.lnk_pipeline_apply_pscis`) so each unit is short and independently testable via `local_mocked_bindings()`.

12 tests covering input validation, update SQL shape, and no-op branches for empty/missing override entries. Full link suite at 169 passing.

Relates to #38

* lnk_pipeline_prepare: network + barriers prep in one phase

Third of six pipeline phase helpers. Thin orchestrator over six `@noRd` sub-helpers:

- `prep_load_aux`: falls (from fresh), user definite barriers, barriers-definite control, expert habitat confirmations from the config bundle
- `prep_gradient`: detect gradient barriers on raw FWA via `fresh::frs_break_find()`, prune rows where the control table says `barrier_ind = false`, enrich with `wscode_ltree` / `localcode_ltree` for `fwa_upstream()` joins
- `prep_natural`: build natural_barriers = gradient ∪ falls ∪ definite
- `prep_overrides`: compute barrier skip list via `lnk_barrier_overrides()` against observations + habitat confirms
- `prep_minimal`: per-model (bt, salmon, st, wct) barrier tables, each reduced via `fresh::frs_barriers_minimal()` from fresh 0.14.0, unioned into `gradient_barriers_minimal`
- `prep_network`: load fresh.streams from FWA with channel_width + stream_order_parent joins + GENERATED gradient/measures/length + a unique `id_segment`

Adds `.lnk_quote_literal()` to utils.R: doubles single-quotes for safe SQL literal interpolation
(used by the AOI and schema checks).

Code-check surfaced one design constraint worth flagging now: `fresh.streams` is a shared schema, so parallel AOI runs on one host would collide. Documented in findings.md with mitigation options for PR 2 (leaning toward `workers = 1` + revisit fresh patches later).

31 new tests: input validation, SQL shape assertions, quote_literal semantics, per-model minimal reduction structure. Full link suite at 200 passing.

Relates to #38

* lnk_pipeline_break: segment the network at configured break positions

Fourth of six pipeline phase helpers. Splits into four internal `@noRd` sub-helpers:

- `break_obs`: build `observations_breaks` from the bcfishobs observations table, filtered by AOI and by the species set from `cfg$wsg_species` (with CT expanded to CT, CCT, ACT, CT/RB to match bcfishobs coding). Data-error and release-exclusion rows from `cfg$observation_exclusions` are filtered out via a temp `<schema>.obs_exclusions` subselect.
- `break_habitat_endpoints`: union DRM and URM from `user_habitat_classification`. Creates an empty table when the habitat table wasn't loaded (no config confirms) so the break step is a clean no-op.
- `break_crossings`: crossing positions for segmentation.
- `break_reassign_id`: reassigns unique `id_segment` after each round via `row_number()` so downstream rounds see contiguous IDs.

Main function runs `frs_break_apply` sequentially over `cfg$pipeline$break_order` (default: observations → gradient_minimal → barriers_definite → habitat_endpoints → crossings). Unknown source names error clearly.

13 new tests: input validation, obs species expansion, exclusions filter, habitat empty/non-empty paths, break_order honored from config, unknown source errors. Full link suite at 229 passing.

Relates to #38

* lnk_pipeline_classify + lnk_pipeline_connect: final two phases

Fifth and sixth of six pipeline phase helpers. All six are now in place.
`lnk_pipeline_classify`: builds the access-gating `fresh.streams_breaks` table (gradient FULL set + falls + definite barriers + crossings, each WSG-filtered) then calls `fresh::frs_habitat_classify()` with the rules YAML, per-species parameters, and barrier overrides from the config bundle. Species default derives from `cfg$parameters_fresh$species_code` intersected with per-AOI presence in `cfg$wsg_species`.

`lnk_pipeline_connect`: runs fresh's `.frs_run_connectivity` (per-species `frs_cluster` + `.frs_connected_waterbody`) driven by `cfg$parameters_fresh` flags. Accesses fresh's internal orchestrator via `getFromNamespace`; fragility flagged in docs, fresh follow-up will export a stable API.

22 new tests covering input validation, species intersection, access-gating breaks SQL shape, no-species error path. Full link suite at 251 passing.

Relates to #38

* Wire compare_bcfishpass.R to the six lnk_pipeline_* helpers

Rewrites the 635-line comparison script as a thin orchestrator (136 lines) that:

- Loads the bcfishpass config bundle via lnk_config()
- Calls lnk_pipeline_setup → load → prepare → break → classify → connect in order
- Diffs the resulting fresh.streams_habitat against bcfishpass.habitat_linear_* on the read-only tunnel reference

ADMS end-to-end run (~67 s): all species within 5% of bcfishpass. Spawning values identical to the research doc. Rearing values within ~1% of the prior values, which is acceptable ordering variance from id_segment tie-breaking on segments with coincident (blk, drm). Verification log committed under `data-raw/logs/`.

Also along the way:

- **cfg$species** added to `lnk_config()`: parses the rules YAML at load time and exposes the classified-species list. `lnk_pipeline_classify_species()` now intersects against that 8-element list rather than `parameters_fresh$species_code` (11 rows including CT/DV/RB which bcfishpass does not model). Previous derivation tried to query `bcfishpass.habitat_linear_ct` and crashed.
- **barriers_definite** added to `config.yaml` pipeline `break_order`: was missing. No numeric change on ADMS (no definite barriers there) but matches legacy script behavior on other WSGs.

Relates to #38

* NEWS + DESCRIPTION bump to 0.3.0; pkgdown pipeline section

* Stamp compare_bcfishpass.R runs with env + DB state

Adds a header to every compare run capturing:

- link version + git SHA
- fresh version
- wall-clock timestamp
- bcfishobs observation count for the AOI
- bcfishpass reference streams row count for the AOI

Lesson from today's session: 0.4 pp drift in BT rearing looked like a refactor regression. Legacy script on same DB produced identical numbers; the drift was from env state changes (fwapg/bcfishobs/tunnel ref) between the research doc's run (2026-04-15) and today. Without a stamp, "what changed between these two runs" is unanswerable.

This is a minimal runtime stamp. Full lineage tracking (CSV provenance + drift detection) is filed as #40 and will expand `lnk_stamp()` (#24) into the canonical source.

Also commits a stamped verification log under `data-raw/logs/20260422_04_verify_stamped_ADMS.txt`, which becomes the reference baseline for future drift checks.

Relates to #38, #24, #40
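The `lnk_pipeline_break` commit above describes running break sources sequentially in the order given by `cfg$pipeline$break_order` and erroring on unknown source names. A minimal sketch of that dispatch pattern in base R follows. `apply_breaks_in_order()` and the no-op handlers are hypothetical stand-ins; the real function calls `frs_break_apply` against a database, which is not reproduced here.

```r
# Hypothetical dispatcher: validate all source names up front, then run the
# matching handler for each source in the configured order.
apply_breaks_in_order <- function(break_order, handlers) {
  unknown <- setdiff(break_order, names(handlers))
  if (length(unknown) > 0L) {
    stop("Unknown break source(s): ", paste(unknown, collapse = ", "),
         call. = FALSE)
  }
  applied <- character(0)
  for (src in break_order) {
    handlers[[src]]()              # stand-in for one segmentation round
    applied <- c(applied, src)
  }
  applied
}

# Default order as quoted in the commit message
default_order <- c("observations", "gradient_minimal", "barriers_definite",
                   "habitat_endpoints", "crossings")

# No-op handlers, one per known source (placeholders for real break steps)
handlers <- stats::setNames(
  rep(list(function() invisible(NULL)), length(default_order)),
  default_order
)

applied <- apply_breaks_in_order(default_order, handlers)
```

Checking names before the loop means a typo in `break_order` fails before any round has modified the network, rather than partway through.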
Summary
- `lnk_config(name_or_path)` loads a pipeline config bundle (rules YAML, dimensions CSV, parameters_fresh, overrides, observation_exclusions, habitat_classification, wsg_species, pipeline knobs) as a single `lnk_config` S3 list
- `inst/extdata/configs/<name>/` convention with a `config.yaml` manifest; bundles are portable (any directory containing `config.yaml` works)
- `inst/extdata/configs/bcfishpass/` (rules.yaml, dimensions.csv, parameters_fresh.csv, overrides/)
- `data-raw/compare_bcfishpass.R` now sources paths via `lnk_config("bcfishpass")` instead of hardcoding `system.file()` lookups

Why
The habitat classification pipeline has dozens of knobs scattered across `inst/extdata/` and hardcoded in `compare_bcfishpass.R`. Building pipeline variants (newgraph defaults, min-spawn, channel-type-first breaking) required copy-paste forks of the compare script. Now: one variant per config bundle, one call to `lnk_config()`, same pipeline code. Unblocks the `_targets.R` refactor (#38).

Abstraction decisions (after a round of code-check)
- Bare names resolve to bundled configs; inputs containing `/` or `\\` are treated as paths. Without this, a local `bcfishpass/` directory in the CWD silently shadows the bundled config. Regression test covers this.
- `rules_yaml` and `dimensions_csv` stay as paths (rules.yaml is consumed as a path by `frs_habitat_classify()`). Everything else loads eagerly into tibbles.
- `config.yaml` is the single source of truth for what's in a bundle. Required-key + required-file validation gives clear errors.
- `%||%` moved to `R/utils.R`: link claims R >= 4.1; base `%||%` landed in 4.4, so we define our own (with `# nolint: object_name_linter.`).
- `configs/default/` variant: the "newgraph defaults" bundle belongs in its own PR where the real departures from bcfishpass (intermittent streams, saner spawn gradient min, expanded lake rearing) are wired in. Tracked in #19 (Habitat eligibility override CSV with edge_type and feature_code defaults), #20 (Literature and observation evidence for habitat eligibility departures), #21 (Growing season degree days and thermal energy as intrinsic potential variables).

Test plan
- `data-raw/compare_bcfishpass.R` parses
- `/code-check` round 1: one real issue (resolver foot-gun), fixed + regression test added

Fixes #37
Relates to NewGraphEnvironment/sred-2025-2026#24
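For readers unfamiliar with the `%||%` backport mentioned under the abstraction decisions, here is a minimal sketch of a null-coalescing operator that works on R versions before 4.4. It mirrors the well-known one-line definition; whether the package's copy in `R/utils.R` is written exactly this way is an assumption, and the `cfg` lists and fallback order below are purely illustrative.

```r
# Null-coalescing operator: return the left side unless it is NULL.
`%||%` <- function(x, y) if (is.null(x)) y else x

# Illustrative configs: one without a break order, one with.
cfg_unset <- list(pipeline = list(break_order = NULL))
cfg_set   <- list(pipeline = list(break_order = c("crossings")))

# Hypothetical fallback used only for this example
fallback <- c("observations", "crossings")

order_unset <- cfg_unset$pipeline$break_order %||% fallback
order_set   <- cfg_set$pipeline$break_order %||% fallback
```

Defining the operator inside the package keeps the declared minimum R version honest, since code that relies on the base version would fail on R 4.1 through 4.3.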