lnk_config: config bundle loader for pipeline variants#39

Merged
NewGraphEnvironment merged 6 commits into main from 37-lnk-config on Apr 22, 2026

@NewGraphEnvironment
Owner

Summary

  • New lnk_config(name_or_path) loads a pipeline config bundle (rules YAML, dimensions CSV, parameters_fresh, overrides, observation_exclusions, habitat_classification, wsg_species, pipeline knobs) as a single lnk_config S3 list
  • Establishes inst/extdata/configs/<name>/ convention with a config.yaml manifest — bundles are portable (any directory containing config.yaml works)
  • Relocates the bcfishpass variant into inst/extdata/configs/bcfishpass/ (rules.yaml, dimensions.csv, parameters_fresh.csv, overrides/)
  • data-raw/compare_bcfishpass.R now sources paths via lnk_config("bcfishpass") instead of hardcoding system.file() lookups

Why

The habitat classification pipeline has dozens of knobs scattered across inst/extdata/ and hardcoded in compare_bcfishpass.R. Building pipeline variants (newgraph defaults, min-spawn, channel-type-first breaking) required copy-paste forks of the compare script. With this change, each variant is one config bundle loaded by one call to lnk_config(), and the pipeline code stays the same. Unblocks the _targets.R refactor (#38).

Abstraction decisions (after a round of code-check)

Test plan

  • 11 unit tests: input validation, bundle-vs-path resolution, missing manifest, missing required keys, missing required files entries, missing overrides, name-shadowing regression, print method, custom-path load, full bcfishpass bundle load
  • Full link test suite: 149 / 149 passing
  • data-raw/compare_bcfishpass.R parses
  • lintr clean on changed files (pre-existing warnings in untouched files remain)
  • /code-check round 1 — one real issue (resolver foot-gun), fixed + regression test added
  • Full BULK run to confirm byte-identical output — deferred; change is path-source-only, no structural edits

Fixes #37

Relates to NewGraphEnvironment/sred-2025-2026#24

Mermaid renders natively on GitHub and in VS Code markdown preview. Color-coding via classDef distinguishes fresh functions, lnk_ functions, and composite operations. Short glossary below the DAG defines the four composite steps that are not single function calls (non-minimal removal, load base segments, sequential breaking, build breaks table).

Drops the redundant numbered Pipeline prose: the DAG carries the flow, and the glossary covers the non-obvious specifics.

- planning/archive/2026-04-22-bcfishpass-comparison/: prior work closed (all 4 WSGs within 5%, shipped fresh 0.13.5–0.13.8)
- planning/active/: new PWF for lnk_config (#37), a config bundle loader with the inst/extdata/configs/<name>/ layout

Relates to #37

Establishes the config-bundle directory convention for variants:
- rules.yaml, dimensions.csv, parameters_fresh.csv at the bundle root
- overrides/ subdir for all CSVs synced from bcfishpass/data
- config.yaml manifest declaring file paths + pipeline parameters
- README.md documenting what the variant is and how to regenerate rules

All R/ docstrings, data-raw/ scripts, and CLAUDE.md updated to the new paths. `rules.yaml` still points at the same content as before — just moved. `lnk_config()` loader follows in Phase 2.

Relates to #37

Pulls the revised README from soul: arrow-subject table, binary open/closed status, two-sided discovery with peer list, topic uniqueness rule, worked example, and cross-thread linking.

Relates to NewGraphEnvironment/soul#35

- R/lnk_config.R: loads a config bundle (rules YAML, dimensions CSV, parameters_fresh, overrides, observation_exclusions, habitat_classification, wsg_species, pipeline knobs) from inst/extdata/configs/<name>/ or any directory containing config.yaml
- Name-vs-path resolution: bare names resolve to bundled configs; inputs containing `/` or `\\` are treated as filesystem paths. Prevents a local `bcfishpass/` dir in the CWD from silently shadowing the bundled variant
- Manifest validation: required top-level keys, required files entries, referenced files exist, overrides exist
- S3 print method for compact inspection
- `%||%` moved into R/utils.R (it was previously assumed to come from base R, which only provides it from 4.4 onward; link claims R >= 4.1)
- `yaml` added to DESCRIPTION Imports
- 11 unit tests covering: input validation, bundle-vs-path resolution, missing manifest, missing keys, missing files, missing overrides, name-shadowing regression, print method, custom path, full bcfishpass bundle load
- pkgdown reference updated, NEWS entry, version bump to 0.2.0
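
The name-vs-path rule described above can be sketched in a few lines of base R. This is only an illustration of the rule as stated in this commit message, not the code in `.lnk_config_resolve_dir`; the function name `resolve_config_dir()` and the `bundled_root` argument are hypothetical stand-ins.

```r
# Hypothetical sketch of the name-vs-path rule; not link's actual resolver.
resolve_config_dir <- function(name_or_path, bundled_root) {
  stopifnot(
    is.character(name_or_path), length(name_or_path) == 1L,
    !is.na(name_or_path), nzchar(name_or_path)
  )

  # Inputs containing a forward slash or a backslash are filesystem paths.
  is_path <- grepl("/", name_or_path, fixed = TRUE) ||
    grepl("\\", name_or_path, fixed = TRUE)

  # Bare names only ever resolve inside the bundled configs directory,
  # so a same-named directory in the CWD cannot shadow a bundled variant.
  dir <- if (is_path) name_or_path else file.path(bundled_root, name_or_path)

  if (!file.exists(file.path(dir, "config.yaml"))) {
    stop("No config.yaml found in '", dir, "'", call. = FALSE)
  }
  dir
}

# Throwaway directories standing in for the installed package and a local dir.
bundled_root <- file.path(tempfile("bundled_"), "configs")
dir.create(file.path(bundled_root, "bcfishpass"), recursive = TRUE)
file.create(file.path(bundled_root, "bcfishpass", "config.yaml"))

local_dir <- file.path(tempfile("cwd_"), "bcfishpass")
dir.create(local_dir, recursive = TRUE)
file.create(file.path(local_dir, "config.yaml"))

bundled <- resolve_config_dir("bcfishpass", bundled_root) # bare name
custom  <- resolve_config_dir(local_dir, bundled_root)    # explicit path
```

Under this rule, loading a local directory that happens to be called `bcfishpass` requires spelling it as a path, for example `./bcfishpass`.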

Relates to #37

Replaces hardcoded `system.file` lookups with the config loader. One source of truth for what a config bundle contains; callers get validated paths + loaded parameters in one call.
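
As a rough illustration of what "validated paths" could mean, the sketch below checks a parsed manifest against a bundle directory. The manifest is a plain R list of the kind `yaml::read_yaml()` returns; the key names (`files`, `pipeline`) and the required entries are assumptions made for the example, since the PR does not list the actual manifest schema.

```r
# Hypothetical manifest validation; key names are illustrative assumptions.
validate_manifest <- function(manifest, dir,
                              required_keys = c("files", "pipeline"),
                              required_files = c("rules", "dimensions",
                                                 "parameters_fresh")) {
  missing_keys <- setdiff(required_keys, names(manifest))
  if (length(missing_keys) > 0) {
    stop("config.yaml is missing required keys: ",
         paste(missing_keys, collapse = ", "), call. = FALSE)
  }

  missing_entries <- setdiff(required_files, names(manifest$files))
  if (length(missing_entries) > 0) {
    stop("config.yaml is missing required files entries: ",
         paste(missing_entries, collapse = ", "), call. = FALSE)
  }

  # Resolve each entry relative to the bundle directory.
  paths <- vapply(manifest$files[required_files],
                  function(rel) file.path(dir, rel), character(1))

  absent <- paths[!file.exists(paths)]
  if (length(absent) > 0) {
    stop("Referenced files do not exist: ",
         paste(absent, collapse = ", "), call. = FALSE)
  }
  paths
}

# Toy bundle on disk with empty placeholder files.
bundle <- tempfile("bundle_")
dir.create(bundle)
file.create(file.path(bundle, c("rules.yaml", "dimensions.csv",
                                "parameters_fresh.csv")))

manifest <- list(
  files = list(rules = "rules.yaml",
               dimensions = "dimensions.csv",
               parameters_fresh = "parameters_fresh.csv"),
  pipeline = list(break_order = c("observations", "crossings"))
)

paths <- validate_manifest(manifest, bundle)
```

Failing early with the full list of missing keys or files, rather than on the first one, keeps a broken bundle fixable in a single pass.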

Also updates PWF to reflect fix applied after code-check round 1 — the name-shadowing foot-gun in `.lnk_config_resolve_dir` now has a regression test.

Relates to #37

NewGraphEnvironment merged commit 59e363a into main on Apr 22, 2026
1 check passed
NewGraphEnvironment deleted the 37-lnk-config branch on April 22, 2026 16:43
NewGraphEnvironment added a commit that referenced this pull request Apr 23, 2026
* Archive lnk_config PWF, initialize _targets.R PWF

- planning/archive/2026-04-22-lnk-config/ — shipped in link 0.2.0 via PR #39
- planning/active/ — new PWF for _targets.R pipeline (#38): targets + crew orchestration, single-host first, distributed swap deferred until rtj Phase 4

Relates to #38

* lnk_habitat_setup_schema: create per-WSG working schema

First of six pipeline phase helpers. Creates the namespaced working schema for a run (e.g. `working_bulk` so parallel WSG runs on the same host do not collide) and ensures `fresh` exists for downstream output tables.

Mocked tests cover identifier validation and SQL shape — CREATE SCHEMA / DROP SCHEMA behavior is Postgres's responsibility, not ours to re-verify.
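
A minimal sketch of the "identifier validation and SQL shape" idea: build the statements as strings so a test can assert on them without a database. The function name `setup_schema_sql()` and the identifier pattern are assumptions for illustration; the real helper's checks may differ.

```r
# Hypothetical sketch: validate the schema identifier, then build the SQL.
setup_schema_sql <- function(schema, overwrite = FALSE) {
  stopifnot(is.character(schema), length(schema) == 1L, !is.na(schema))

  # Conservative whitelist: lowercase letters, digits, underscores.
  if (!grepl("^[a-z_][a-z0-9_]*$", schema)) {
    stop("Invalid schema name: '", schema, "'", call. = FALSE)
  }

  c(
    if (overwrite) sprintf("DROP SCHEMA IF EXISTS %s CASCADE", schema),
    sprintf("CREATE SCHEMA IF NOT EXISTS %s", schema),
    # The shared output schema must exist for downstream tables.
    "CREATE SCHEMA IF NOT EXISTS fresh"
  )
}

sql_plain     <- setup_schema_sql("working_bulk")
sql_overwrite <- setup_schema_sql("working_bulk", overwrite = TRUE)
```

In a real helper each statement would then be sent to the connection, for example with `DBI::dbExecute(conn, statement)`.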

Relates to #38

* Rename lnk_habitat_* helpers to lnk_pipeline_*, abstract wsg to aoi

Naming rethink before building the remaining five helpers — better to lock the API once than rename six functions later.

- Prefix: `lnk_pipeline_*` replaces `lnk_habitat_*`. Only one of the six phases (classify) is strictly habitat work; the others are setup, loading, network prep, segmenting, connectivity. `pipeline` describes what they all are — building blocks composed by `_targets.R` or manual scripts.
- Phase names read as verbs: setup → load → prepare → break → classify → connect. Reads like a recipe.
- Param: `aoi` replaces `wsg` in the canonical signature `(conn, aoi, cfg, schema)`. `wsg` hardcoded one partition scheme (bcfishpass); `aoi` matches fresh convention (accepts a WSG code today, extends to ltree filters / sf polygons / mapsheets later).
- `setup` stays a signature outlier: `(conn, schema, overwrite)`. No aoi/cfg needed — it just makes schemas.

Renames: `lnk_habitat_setup_schema()` → `lnk_pipeline_setup()`. Updates tests, Rd, NAMESPACE, PWF, and issue #38 body to match.

Relates to #38

* lnk_pipeline_load: crossings and their overrides

Second of six pipeline phase helpers. Loads the crossings CSV from fresh, filters to AOI, appends misc crossings, applies modelled fixes (NONE/OBS → PASSABLE) and PSCIS barrier status overrides.

Scope is tighter than the original "load_inputs" plan: anything other than anthropogenic crossings (falls, user-identified definite barriers, observation exclusions, habitat classification) moves to `prepare` where it is actually consumed. That keeps each phase's concern clean — load handles crossings; prepare handles network + barriers.

Split into three internal `@noRd` helpers (`.lnk_pipeline_load_crossings`, `.lnk_pipeline_apply_fixes`, `.lnk_pipeline_apply_pscis`) so each unit is short and independently testable via `local_mocked_bindings()`.

12 tests covering input validation, update SQL shape, and no-op branches for empty/missing override entries. Full link suite at 169 passing.

Relates to #38

* lnk_pipeline_prepare: network + barriers prep in one phase

Third of six pipeline phase helpers. Thin orchestrator over six `@noRd` sub-helpers:

- `prep_load_aux` — falls (from fresh), user definite barriers, barriers-definite control, expert habitat confirmations from the config bundle
- `prep_gradient` — detect gradient barriers on raw FWA via `fresh::frs_break_find()`, prune rows where the control table says `barrier_ind = false`, enrich with `wscode_ltree` / `localcode_ltree` for `fwa_upstream()` joins
- `prep_natural` — build natural_barriers = gradient ∪ falls ∪ definite
- `prep_overrides` — compute barrier skip list via `lnk_barrier_overrides()` against observations + habitat confirms
- `prep_minimal` — per-model (bt, salmon, st, wct) barrier tables, each reduced via `fresh::frs_barriers_minimal()` from fresh 0.14.0, unioned into `gradient_barriers_minimal`
- `prep_network` — load fresh.streams from FWA with channel_width + stream_order_parent joins + GENERATED gradient/measures/length + a unique `id_segment`

Adds `.lnk_quote_literal()` to utils.R — doubles single-quotes for safe SQL literal interpolation (used by the AOI and schema checks).
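
For readers unfamiliar with the technique, doubling single quotes looks roughly like this in base R. The sketch assumes the helper also wraps the value in quotes, which the commit message does not state; `quote_literal()` is a stand-in name, not the package's internal function.

```r
# Sketch of single-quote doubling for SQL string literals.
# Assumption: the value is also wrapped in single quotes.
quote_literal <- function(x) {
  stopifnot(is.character(x), !anyNA(x))
  paste0("'", gsub("'", "''", x, fixed = TRUE), "'")
}

# Illustrative query; the table and column names are hypothetical.
sql <- sprintf(
  "SELECT * FROM crossings WHERE watershed_group_code = %s",
  quote_literal("ADMS")
)
```

Where a live connection is available, `DBI::dbQuoteLiteral()` or `DBI::dbQuoteString()` are the driver-aware alternatives.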

Code-check surfaced one design constraint worth flagging now: `fresh.streams` is a shared schema, so parallel AOI runs on one host would collide. Documented in findings.md with mitigation options for PR 2 (leaning toward `workers = 1` + revisit fresh patches later).

31 new tests — input validation, SQL shape assertions, quote_literal semantics, per-model minimal reduction structure. Full link suite at 200 passing.

Relates to #38

* lnk_pipeline_break: segment the network at configured break positions

Fourth of six pipeline phase helpers. Splits into four internal `@noRd` sub-helpers:

- `break_obs`: builds `observations_breaks` from the bcfishobs observations table, filtered by AOI and by the species set from `cfg$wsg_species` (with CT expanded to CT, CCT, ACT and CT/RB to match bcfishobs coding). Data-error and release-exclusion rows from `cfg$observation_exclusions` are filtered out via a temp `<schema>.obs_exclusions` subselect.
- `break_habitat_endpoints`: unions DRM and URM from `user_habitat_classification`. Creates an empty table when the habitat table wasn't loaded (no config confirms) so the break step is a clean no-op.
- `break_crossings`: crossing positions for segmentation.
- `break_reassign_id`: reassigns unique `id_segment` after each round via `row_number()` so downstream rounds see contiguous IDs.
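
The species expansion in `break_obs` can be pictured with a small lookup. This sketch assumes the expansion list reads as CT, CCT, ACT and the hybrid code CT/RB; `expand_species()` is a hypothetical name, not the package's function.

```r
# Hypothetical sketch: expand modelled species codes to observation codes.
# Assumption: CT covers CT, CCT, ACT and the hybrid code CT/RB.
expand_species <- function(species,
                           expansions = list(
                             CT = c("CT", "CCT", "ACT", "CT/RB")
                           )) {
  expanded <- lapply(species, function(sp) {
    if (sp %in% names(expansions)) expansions[[sp]] else sp
  })
  unique(as.character(unlist(expanded)))
}

obs_codes <- expand_species(c("BT", "CT"))
```

Codes without an entry in the lookup pass through unchanged, so adding another expansion is a one-line change to the list.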

Main function runs `frs_break_apply` sequentially over `cfg$pipeline$break_order` (default: observations → gradient_minimal → barriers_definite → habitat_endpoints → crossings). Unknown source names error clearly.
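
The sequential, config-driven loop could be sketched as below. The `apply_one` callback stands in for the real `frs_break_apply` call plus ID reassignment, and the source-to-table mapping is partly hypothetical (only `observations_breaks` and `gradient_barriers_minimal` are named in this PR).

```r
# Hypothetical mapping from break source names to break tables.
break_tables <- c(
  observations      = "observations_breaks",
  gradient_minimal  = "gradient_barriers_minimal",
  barriers_definite = "barriers_definite",
  habitat_endpoints = "habitat_endpoints",
  crossings         = "crossings"
)

# Apply break sources strictly in the configured order.
apply_breaks <- function(break_order, apply_one, tables = break_tables) {
  unknown <- setdiff(break_order, names(tables))
  if (length(unknown) > 0) {
    stop("Unknown break source(s): ",
         paste(unknown, collapse = ", "), call. = FALSE)
  }
  for (src in break_order) {
    apply_one(tables[[src]]) # stand-in for one segmentation round
  }
  invisible(unname(tables[break_order]))
}

# Record the order in which tables are visited.
seen <- character(0)
apply_breaks(c("observations", "crossings"),
             function(tbl) seen <<- c(seen, tbl))
```

Checking for unknown names before the loop means a typo in `break_order` fails before any segmentation round has modified the network.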

13 new tests — input validation, obs species expansion, exclusions filter, habitat empty/non-empty paths, break_order honored from config, unknown source errors. Full link suite at 229 passing.

Relates to #38

* lnk_pipeline_classify + lnk_pipeline_connect: final two phases

Fifth and sixth of six pipeline phase helpers. All six are now in place.

`lnk_pipeline_classify` — builds the access-gating `fresh.streams_breaks` table (gradient FULL set + falls + definite barriers + crossings, each WSG-filtered) then calls `fresh::frs_habitat_classify()` with the rules YAML, per-species parameters, and barrier overrides from the config bundle. Species default derives from `cfg$parameters_fresh$species_code` intersected with per-AOI presence in `cfg$wsg_species`.

`lnk_pipeline_connect` — runs fresh's `.frs_run_connectivity` (per-species `frs_cluster` + `.frs_connected_waterbody`) driven by `cfg$parameters_fresh` flags. Accesses fresh's internal orchestrator via `getFromNamespace` — fragility flagged in docs, fresh follow-up will export a stable API.
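
Reaching into another package's namespace is fragile because internals can be renamed without notice. One way to make that failure loud is a guarded lookup such as the sketch below; `get_internal()` is a hypothetical wrapper, and the demo targets `utils` only so the block runs without fresh installed.

```r
# Hypothetical guarded wrapper around utils::getFromNamespace().
get_internal <- function(name, pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is not installed", call. = FALSE)
  }
  if (!exists(name, envir = asNamespace(pkg), inherits = FALSE)) {
    stop("Package '", pkg, "' no longer provides '", name,
         "'; the internal may have been renamed or removed", call. = FALSE)
  }
  utils::getFromNamespace(name, pkg)
}

# Demo against a package that is always available.
fn <- get_internal("head", "utils")
first_two <- fn(letters, 2)
```

In the pipeline the equivalent call would name `.frs_run_connectivity` and `fresh`, so a rename upstream would surface as a clear error message.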

22 new tests covering input validation, species intersection, access-gating breaks SQL shape, no-species error path. Full link suite at 251 passing.

Relates to #38

* Wire compare_bcfishpass.R to the six lnk_pipeline_* helpers

Rewrites the 635-line comparison script as a thin orchestrator (136 lines) that:
- Loads the bcfishpass config bundle via lnk_config()
- Calls lnk_pipeline_setup → load → prepare → break → classify → connect in order
- Diffs the resulting fresh.streams_habitat against bcfishpass.habitat_linear_* on the read-only tunnel reference

ADMS end-to-end run (~67 s): all species within 5% of bcfishpass. Spawning values identical to the research doc. Rearing values within ~1% of the prior values — acceptable ordering variance from id_segment tie-breaking on segments with coincident (blk, drm). Verification log committed under `data-raw/logs/`.
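
The "within 5%" check amounts to a relative difference per species against the reference. The sketch below shows that arithmetic on made-up numbers; the values, the species shown, and the function name `compare_totals()` are purely illustrative and are not results from this run.

```r
# Hypothetical sketch of a per-species relative-difference check.
compare_totals <- function(ours, reference, tolerance_pct = 5) {
  stopifnot(identical(names(ours), names(reference)))
  diff_pct <- (ours - reference) / reference * 100
  data.frame(
    species    = names(ours),
    ours       = unname(ours),
    reference  = unname(reference),
    diff_pct   = round(unname(diff_pct), 2),
    within_tol = abs(unname(diff_pct)) <= tolerance_pct
  )
}

# Toy numbers only; NOT values from the ADMS verification log.
ours      <- c(BT = 102, CO = 49, CH = 120)
reference <- c(BT = 100, CO = 50, CH = 100)

result <- compare_totals(ours, reference)
```

With these toy inputs, BT and CO fall inside the tolerance and CH (20% high) is flagged.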

Also along the way:

- **cfg$species** added to `lnk_config()` — parses the rules YAML at load time and exposes the classified-species list. `lnk_pipeline_classify_species()` now intersects against that 8-element list rather than `parameters_fresh$species_code` (11 rows including CT/DV/RB which bcfishpass does not model). Previous derivation tried to query `bcfishpass.habitat_linear_ct` and crashed.
- **barriers_definite** added to `config.yaml` pipeline `break_order` — was missing. No numeric change on ADMS (no definite barriers there) but matches legacy script behavior on other WSGs.
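
The species derivation described above is essentially a set intersection. This sketch assumes `cfg$wsg_species` is a long table with one row per watershed group and species; the real table shape and column names may differ, and `classify_species()` is a hypothetical name.

```r
# Hypothetical sketch: classified species that are present in the AOI.
# Assumption: wsg_species has columns watershed_group_code and species_code.
classify_species <- function(cfg, aoi) {
  in_aoi  <- cfg$wsg_species$watershed_group_code == aoi
  present <- cfg$wsg_species$species_code[in_aoi]

  # Only species that the rules YAML actually classifies are kept.
  species <- intersect(cfg$species, present)
  if (length(species) == 0) {
    stop("No classified species present in AOI '", aoi, "'", call. = FALSE)
  }
  species
}

# Stand-in for a loaded config bundle.
cfg <- list(
  species = c("BT", "CH", "CO"),
  wsg_species = data.frame(
    watershed_group_code = c("ADMS", "ADMS", "BULK"),
    species_code         = c("BT", "RB", "CO")
  )
)

adms_species <- classify_species(cfg, "ADMS")
```

Here RB is present in the AOI but not in the classified list, so it is dropped rather than triggering a lookup against a reference table that does not exist.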

Relates to #38

* NEWS + DESCRIPTION bump to 0.3.0; pkgdown pipeline section

* Stamp compare_bcfishpass.R runs with env + DB state

Adds a header to every compare run capturing:
- link version + git SHA
- fresh version
- wall-clock timestamp
- bcfishobs observation count for the AOI
- bcfishpass reference streams row count for the AOI
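
A stamp like the one listed above can be as simple as a block of comment lines written at the top of the log. The sketch takes versions and counts as arguments so it stays runnable on its own; the field labels, the placeholder SHA, and the counts are made up for illustration.

```r
# Hypothetical sketch of a run-stamp header for a comparison log.
stamp_header <- function(versions, counts, time = Sys.time()) {
  c(
    sprintf("# run at: %s UTC",
            format(time, "%Y-%m-%d %H:%M:%S", tz = "UTC")),
    sprintf("# %s: %s", names(versions), versions),
    sprintf("# %s: %s", names(counts), as.character(counts))
  )
}

# In a real run these might come from utils::packageVersion("link") and
# system2("git", c("rev-parse", "--short", "HEAD"), stdout = TRUE).
versions <- c(link = "0.3.0", link_sha = "abc1234", fresh = "0.14.0")

# Placeholder counts, not values from any real database.
counts <- c(observations = 1234L, reference_streams = 5678L)

header <- stamp_header(
  versions, counts,
  time = as.POSIXct("2026-04-22 12:00:00", tz = "UTC")
)
```

Writing the header with `writeLines(header, con)` before the comparison output keeps the environment state next to the numbers it explains.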

Lesson from today's session: a 0.4 pp drift in BT rearing looked like a refactor regression. The legacy script on the same DB produced identical numbers, so the drift came from env state changes (fwapg/bcfishobs/tunnel ref) between the research doc's run (2026-04-15) and today. Without a stamp, "what changed between these two runs" is unanswerable.

This is a minimal runtime stamp. Full lineage tracking (CSV provenance + drift detection) is filed as #40 and will expand `lnk_stamp()` (#24) into the canonical source.

Also commits a stamped verification log under `data-raw/logs/20260422_04_verify_stamped_ADMS.txt` — becomes the reference baseline for future drift checks.

Relates to #38, #24, #40