Problem
The habitat classification pipeline has dozens of knobs — rules YAML, parameters_fresh.csv, wsg_species_presence.csv, override CSV filenames, break source order, cluster params, connected-waterbody rules per species. These are currently scattered: some in inst/extdata/, some in data-raw/compare_bcfishpass.R, some hardcoded in script logic.
We're about to build pipeline variants (bcfishpass-matching validation config, newgraph defaults, min-spawn, channel-type-first breaking). Each variant needs its own bundle of these knobs. Without a config abstraction, every variant becomes a new script with copy-paste drift.
This is the foundation for a proper _targets.R pipeline — targets can't parallelize across variants cleanly if the config is scattered across the filesystem.
Proposed function
lnk_config(name_or_path)
name_or_path: either a bundled config name ("bcfishpass", "default") or an absolute path to a custom config directory
Returns: an lnk_config S3 list with named slots:
list(
  name = "bcfishpass",
  dir = "<path to config dir>",
  rules_yaml = "<path to rules.yaml>",         # built artifact, consumed by frs_habitat_classify
  dimensions_csv = "<path to dimensions.csv>", # source of rules.yaml, traceability
  parameters_fresh = tibble(...),              # loaded
  wsg_species = tibble(...),                   # loaded
  observation_excl = tibble(...),              # loaded
  overrides = list(                            # each CSV loaded as a tibble
    modelled_fixes = tibble(...),
    pscis_barrier_status = tibble(...),
    pscis_xref = tibble(...),
    barriers_definite = tibble(...)
  ),
  break_order = c("observations", "gradient_minimal", "habitat_endpoints", "crossings"),
  cluster_params = list(three_phase = TRUE, distance_cap = ...),
  spawn_connected = list(SK = list(gradient_max = 0.05, ...))
)
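One way to enforce the documented shape is a small constructor that checks slot names before tagging the list with the S3 class. The sketch below is illustrative only: `new_lnk_config()` and `lnk_config_slots` are hypothetical names, and empty data frames stand in for the loaded tibbles.

```r
# Top-level slots documented for an lnk_config object.
lnk_config_slots <- c(
  "name", "dir", "rules_yaml", "dimensions_csv",
  "parameters_fresh", "wsg_species", "observation_excl",
  "overrides", "break_order", "cluster_params", "spawn_connected"
)

# Hypothetical constructor: validates slot names, then adds the S3 class.
new_lnk_config <- function(x) {
  stopifnot(is.list(x))
  absent <- setdiff(lnk_config_slots, names(x))
  if (length(absent) > 0) {
    stop("lnk_config is missing slot(s): ", paste(absent, collapse = ", "),
         call. = FALSE)
  }
  structure(x, class = "lnk_config")
}

# Toy object; empty data frames stand in for the loaded tibbles.
toy <- new_lnk_config(list(
  name = "bcfishpass",
  dir = tempdir(),
  rules_yaml = file.path(tempdir(), "rules.yaml"),
  dimensions_csv = file.path(tempdir(), "dimensions.csv"),
  parameters_fresh = data.frame(),
  wsg_species = data.frame(),
  observation_excl = data.frame(),
  overrides = list(modelled_fixes = data.frame()),
  break_order = c("observations", "gradient_minimal",
                  "habitat_endpoints", "crossings"),
  cluster_params = list(three_phase = TRUE),
  spawn_connected = list(SK = list(gradient_max = 0.05))
))

inherits(toy, "lnk_config")
names(toy$overrides)
```

Because the object is still a plain list underneath, `$` access and `str()` keep working, which matches the "simple, inspectable" goal.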
Directory layout (convention)
inst/extdata/configs/<name>/
├── config.yaml # top-level manifest, points to all below
├── rules.yaml # built from dimensions.csv
├── dimensions.csv # source of rules.yaml
├── parameters_fresh.csv
├── wsg_species_presence.csv
├── observation_exclusions.csv
├── overrides/
│ ├── user_modelled_crossing_fixes.csv
│ ├── user_pscis_barrier_status.csv
│ ├── pscis_modelledcrossings_streams_xref.csv
│ └── user_barriers_definite.csv
└── README.md # describes the variant, what it's for
config.yaml is the manifest; every other file path is relative to the config dir. Custom configs are portable: drop a directory anywhere and pass its absolute path.
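The relative-path convention could be implemented roughly as follows. This is a sketch under assumptions: `resolve_manifest_paths()` is a hypothetical helper, the manifest key names are guesses at what config.yaml might contain, and the manifest is written as an R list (the shape a YAML parser would return) so the example needs only base R.

```r
# Hypothetical helper: turn manifest entries (paths relative to the config
# directory) into absolute paths, and fail loudly if any file is absent.
resolve_manifest_paths <- function(config_dir, manifest) {
  rel <- unlist(manifest)  # nested names flatten to e.g. "overrides.pscis_xref"
  full <- stats::setNames(file.path(config_dir, rel), names(rel))
  absent <- names(full)[!file.exists(full)]
  if (length(absent) > 0) {
    stop("config slot(s) point at missing files: ",
         paste(absent, collapse = ", "), call. = FALSE)
  }
  full
}

# Manifest as it might look after parsing config.yaml; key names are assumed.
manifest <- list(
  rules_yaml = "rules.yaml",
  parameters_fresh = "parameters_fresh.csv",
  overrides = list(
    pscis_xref = "overrides/pscis_modelledcrossings_streams_xref.csv"
  )
)

# Throwaway config directory so the sketch runs anywhere.
config_dir <- file.path(tempdir(), "lnk_config_manifest_demo")
dir.create(file.path(config_dir, "overrides"),
           recursive = TRUE, showWarnings = FALSE)
for (rel in unlist(manifest)) file.create(file.path(config_dir, rel))

paths <- resolve_manifest_paths(config_dir, manifest)
names(paths)
```

Reporting the slot name rather than only the file path is what lets the error say which part of the config is broken.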
Abstraction notes
Alternatives considered:
- Hardcode config in data-raw/compare_bcfishpass.R: the current state, which doesn't scale to variants
- Single monolithic YAML (one file, no external CSVs). Rejected: overrides are naturally tabular and large (21k rows of modelled crossing fixes), and YAML is the wrong format for that
- Separate loaders per config piece (lnk_config_rules(), lnk_config_overrides(), ...). Rejected: users shouldn't have to wire five loaders together
- Include schema naming (working_<aoi>) in the config object. Rejected: schema names are pipeline concerns, not config concerns. The pipeline decides naming; the config stays reusable across AOIs
Key design decisions:
- Return is a list, not an environment or R6: simple, inspectable in RStudio
- Bundles live in inst/extdata/configs/, available via system.file() after install
- CSVs are loaded eagerly into tibbles (not paths), so pipeline steps don't need to know the file layout
- Custom configs: pass an absolute path, same return shape
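The name-or-path decision might be resolved along these lines. This is a minimal sketch, not the planned implementation: `resolve_config_dir()` is a hypothetical helper, the absolute-path test is a simple regex, and the default `package = "link"` is an assumption about which package ships the bundles.

```r
# Hypothetical resolver for the name_or_path argument.
# `package` is an assumption; substitute the package that ships the bundles.
resolve_config_dir <- function(name_or_path, package = "link") {
  # Treat POSIX-style, home-relative, and Windows drive-letter paths as absolute.
  is_absolute <- grepl("^(/|~|[A-Za-z]:[/\\\\])", name_or_path)
  if (is_absolute) {
    if (!dir.exists(name_or_path)) {
      stop("config directory not found: ", name_or_path, call. = FALSE)
    }
    return(normalizePath(name_or_path))
  }
  # system.file() returns "" when the package or directory is not installed.
  bundled <- system.file("extdata", "configs", name_or_path, package = package)
  if (!nzchar(bundled)) {
    stop("no bundled config named '", name_or_path,
         "' in package '", package, "'", call. = FALSE)
  }
  bundled
}

# Custom config: any directory, addressed by absolute path.
custom_dir <- file.path(tempdir(), "my_config")
dir.create(custom_dir, showWarnings = FALSE)
resolve_config_dir(custom_dir)
```

Both branches end with a directory path, so everything downstream can treat bundled and custom configs identically.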
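Execution checklist
- /planning-init → PWF in planning/active/lnk_config/ (this spans multiple commits)
- inst/extdata/configs/bcfishpass/: keep exact content, just relocate
- config.yaml manifest schema
- lnk_config(): file loader + schema validation + return shape
- inst/extdata/configs/default/ as the newgraph-defaults bundle (initially a clone of bcfishpass; real departures tracked in #19 "Habitat eligibility override CSV with edge_type and feature_code defaults", #20 "Literature and observation evidence for habitat eligibility departures", #21 "Growing season degree days and thermal energy as intrinsic potential variables")
- data-raw/compare_bcfishpass.R to call lnk_config("bcfishpass"): verify identical results to current script
- /code-check before commit
- Fixes #N
- NEWS.md entry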
Tests required
- lnk_config("bcfishpass") returns expected list shape with all documented slots
- lnk_config("default") returns expected list shape
- Missing manifest → clear error pointing at the missing file
- Missing referenced file → clear error identifying which config slot is broken
- Custom path (absolute) works end-to-end
- All tibbles have expected columns (parameters_fresh, wsg_species, each override CSV)
- Invalid config.yaml (malformed, missing required keys) → fails validation with useful message
Example must show
- Why: one object representing a complete pipeline configuration, swappable for variants
- How: cfg <- lnk_config("bcfishpass"), then inspect cfg$rules_yaml and cfg$overrides$modelled_fixes
- Wires into: show rules_yaml passed to frs_habitat_classify(), and overrides passed to lnk_load()
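Until lnk_config() exists, the "How" step can be rehearsed against a throwaway directory. Everything here is a stand-in: `load_config_sketch()` is hypothetical, read.csv() replaces the tibble loader so only base R is needed, the CSV columns are placeholders rather than the real schema, and the mapping from the modelled_fixes slot to user_modelled_crossing_fixes.csv is assumed from the slot and file listings above.

```r
# Hypothetical stand-in for lnk_config(): loads a config directory eagerly.
load_config_sketch <- function(dir) {
  # Assumed slot-to-file mapping for the overrides directory.
  override_files <- c(modelled_fixes = "user_modelled_crossing_fixes.csv")
  overrides <- lapply(override_files, function(f) {
    read.csv(file.path(dir, "overrides", f), stringsAsFactors = FALSE)
  })
  structure(
    list(
      name = basename(dir),
      dir = dir,
      rules_yaml = file.path(dir, "rules.yaml"),  # stays a path
      overrides = overrides                       # loaded tables
    ),
    class = "lnk_config"
  )
}

# Build a tiny config directory with placeholder content.
demo_dir <- file.path(tempdir(), "demo_variant")
dir.create(file.path(demo_dir, "overrides"),
           recursive = TRUE, showWarnings = FALSE)
writeLines("# placeholder rules", file.path(demo_dir, "rules.yaml"))
write.csv(data.frame(id = 1:2, note = c("a", "b")),  # placeholder columns
          file.path(demo_dir, "overrides", "user_modelled_crossing_fixes.csv"),
          row.names = FALSE)

cfg <- load_config_sketch(demo_dir)
cfg$rules_yaml
cfg$overrides$modelled_fixes

# Wiring is left as a comment because the argument names are not given here:
# cfg$rules_yaml goes to frs_habitat_classify(), cfg$overrides to lnk_load().
```

Swapping variants then means changing only the directory passed in; the code that consumes `cfg` stays the same.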
Not in scope
- _targets.R issue, filed separately
- inst/extdata/configs/default/ with the full set of newgraph departures from bcfishpass (intermittent streams, spawn gradient min, expanded lake rearing; tracked in #19 "Habitat eligibility override CSV with edge_type and feature_code defaults", #20 "Literature and observation evidence for habitat eligibility departures", #21 "Growing season degree days and thermal energy as intrinsic potential variables")
Cross-refs
- _targets.R refactor (filing separately)
Versions
- fresh: 0.13.8
- link: main (0.1.0, target 0.2.0)
- bcfishpass: ea3c5d8
- fwapg: Docker (FWA 20240830)