You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
R/lnk_pipeline_prepare.R carries two hardcoded lists that encode bcfishpass's gradient-classification scheme directly in pipeline code. Both belong at the config/derivation layer.
Hardcode 1 — gradient class set in .lnk_pipeline_prep_gradient:
These four classes are bcfishpass's 0.05-step bins spanning the union of per-species access_gradient_max values in parameters_fresh.csv (0.15, 0.20, 0.25) plus one bcfishpass-convention extra bin at 0.30. Any variant using different per-species access tolerances or a different binning scheme is blocked by this hardcode.
Hardcode 2 — per-species class filters in .lnk_pipeline_prep_minimal:
These encode "for each species, which gradient classes count as access barriers" — which is fully derivable from each species's access_gradient_max: a class value g is a barrier for species s when g >= s$access_gradient_max. The in-code TODO on this one already flagged it for follow-up.
What each encodes, and where it should live
The class set is a modelling choice — what gradient bins does the pipeline detect? bcfishpass picks 0.05-step bins from 0.15 to 0.30. Other variants could choose finer or coarser granularity. Decision belongs at the caller / config level.
The per-species filter is derived from per-species access tolerances — no independent decision.
Design
API
lnk_pipeline_prepare accepts an optional classes override:
lnk_pipeline_prepare<-function(conn, aoi, cfg, schema,
observations="bcfishobs.observations",
classes=NULL) {
if (is.null(classes)) {
classes<- .lnk_derive_gradient_classes(cfg$parameters_fresh)
}
# pass classes through to .lnk_pipeline_prep_gradient and# .lnk_pipeline_prep_minimal
}
For the bundled bcfishpass parameters_fresh.csv, this produces c("1500" = 0.15, "2000" = 0.20, "2500" = 0.25). Missing vs the current hardcode: the "3000" bin — bcfishpass convention, no functional impact on access decisions (every position ≥ 0.30 is already ≥ 0.25 and labels "2500", still blocks every species that considers "2500" a barrier).
Reproduction parity
bit-identical rollup on bcfishpass variant requires keeping the bcfishpass class labels exactly, so data-raw/compare_bcfishpass_wsg.R passes the explicit set:
.lnk_pipeline_prep_minimal derives at runtime, no hardcoded models list:
for (spincfg$species) {
sp_amax<-cfg$parameters_fresh$access_gradient_max[
cfg$parameters_fresh$species_code==sp]
if (length(sp_amax) ==0|| is.na(sp_amax) ||sp_amax<=0) nextclasses_for_sp<- names(classes)[classes>=sp_amax]
# build barriers_<species> table for this class set# call fresh::frs_barriers_minimal
}
Note: this switches from bcfishpass's four per-model groupings (bt / ch_cm_co_pk_sk / st / wct) to one per species. Union into gradient_barriers_minimal must still produce the same set of positions for bit-identical output — species in the same bcfishpass group share access_gradient_max values and therefore the same class filter, so union is equivalent.
Scope
Add internal helper .lnk_derive_gradient_classes() in R/utils.R or R/lnk_pipeline_prepare.R.
Extend lnk_pipeline_prepare() signature with classes = NULL; derive when NULL.
.lnk_pipeline_prep_gradient accepts the resolved classes and uses them.
.lnk_pipeline_prep_minimal derives per-species class filters from access_gradient_max, removes hardcoded models list.
data-raw/compare_bcfishpass_wsg.R passes the explicit bcfishpass class set.
Unit tests:
Derivation produces expected classes + names from the bcfishpass parameters_fresh.csv
Per-species filter logic picks the right classes for each species
Derivation handles NA / zero / missing column gracefully
Rerun tar_make() on all four WSGs. Rollup must be bit-identical to pre-change (passing the explicit class set preserves bcfishpass labels).
Short note in research/bcfishpass_comparison.md documenting that the class set is now config-passed rather than hardcoded.
Out of scope
Moving classes into config.yaml$pipeline$gradient_classes. Deferred until we have a second variant that forces a different set.
Scanning other columns in parameters_fresh (cluster_bridge_gradient, rear_gradient_max, etc.) as additional class sources. cluster_bridge_gradient is a continuous segment-level threshold with no role in class detection. Rule-threshold (rear_gradient_max, spawn_gradient_max) segmentation is a different pipeline concept — break positions at rule thresholds to reduce near-threshold averaging — and should be its own issue.
Risks
Label mismatch on bcfishpass variant if the caller forgets to pass the explicit class set. Mitigation: compare_bcfishpass_wsg passes it mandatorily in this PR. A future config.yaml-driven path removes this footgun by making the list declarative in the bundle.
Per-species union not equivalent to per-model union if access_gradient_max values in parameters_fresh don't actually match bcfishpass's grouping. Verify by asserting the post-change gradient_barriers_minimal row count equals pre-change row count per WSG.
Species with NA or zero access_gradient_max (e.g. lake-only species with no stream access requirement) produce no filter. Derivation skips them — they contribute nothing to barrier detection. Test the skip path explicitly.
Reproducibility must remain bit-identical across runs after the change.
Execution
Branch from main. /planning-init + PWF.
/code-check on staged diff before every commit.
PR with SRED tag (Relates to NewGraphEnvironment/sred-2025-2026#24).
Cross-references
Companion cleanup to Wire barriers_definite_control into lnk_barrier_overrides #44 (wire barriers_definite_control into lnk_barrier_overrides). Issues are parallel-safe — no ordering constraint. Both target R/lnk_pipeline_prepare.R but different sub-helpers.
Follow-up issue (to file later): rule-threshold break source — break segments at rear_gradient_max / spawn_gradient_max crossings so classification near thresholds is deterministic. Different concept than this issue; different verification path.
Problem
R/lnk_pipeline_prepare.Rcarries two hardcoded lists that encode bcfishpass's gradient-classification scheme directly in pipeline code. Both belong at the config/derivation layer.Hardcode 1 — gradient class set in
.lnk_pipeline_prep_gradient:These four classes are bcfishpass's 0.05-step bins spanning the union of per-species
access_gradient_maxvalues inparameters_fresh.csv(0.15, 0.20, 0.25) plus one bcfishpass-convention extra bin at 0.30. Any variant using different per-species access tolerances or a different binning scheme is blocked by this hardcode.Hardcode 2 — per-species class filters in
.lnk_pipeline_prep_minimal:These encode "for each species, which gradient classes count as access barriers" — which is fully derivable from each species's
access_gradient_max: a class valuegis a barrier for speciesswheng >= s$access_gradient_max. The in-code TODO on this one already flagged it for follow-up.What each encodes, and where it should live
Design
API
lnk_pipeline_prepareaccepts an optionalclassesoverride:Default derivation
Internal
@noRdhelper:For the bundled bcfishpass
parameters_fresh.csv, this producesc("1500" = 0.15, "2000" = 0.20, "2500" = 0.25). Missing vs the current hardcode: the"3000"bin — bcfishpass convention, no functional impact on access decisions (every position ≥ 0.30 is already ≥ 0.25 and labels "2500", still blocks every species that considers "2500" a barrier).Reproduction parity
bit-identical rollup on bcfishpass variant requires keeping the bcfishpass class labels exactly, so
data-raw/compare_bcfishpass_wsg.Rpasses the explicit set:Per-species filter derivation
.lnk_pipeline_prep_minimalderives at runtime, no hardcodedmodelslist:Note: this switches from bcfishpass's four per-model groupings (bt / ch_cm_co_pk_sk / st / wct) to one per species. Union into
gradient_barriers_minimalmust still produce the same set of positions for bit-identical output — species in the same bcfishpass group shareaccess_gradient_maxvalues and therefore the same class filter, so union is equivalent.Scope
.lnk_derive_gradient_classes()inR/utils.RorR/lnk_pipeline_prepare.R.lnk_pipeline_prepare()signature withclasses = NULL; derive when NULL..lnk_pipeline_prep_gradientaccepts the resolved classes and uses them..lnk_pipeline_prep_minimalderives per-species class filters fromaccess_gradient_max, removes hardcodedmodelslist.data-raw/compare_bcfishpass_wsg.Rpasses the explicit bcfishpass class set.parameters_fresh.csvtar_make()on all four WSGs. Rollup must be bit-identical to pre-change (passing the explicit class set preserves bcfishpass labels).research/bcfishpass_comparison.mddocumenting that the class set is now config-passed rather than hardcoded.Out of scope
classesintoconfig.yaml$pipeline$gradient_classes. Deferred until we have a second variant that forces a different set.parameters_fresh(cluster_bridge_gradient,rear_gradient_max, etc.) as additional class sources.cluster_bridge_gradientis a continuous segment-level threshold with no role in class detection. Rule-threshold (rear_gradient_max,spawn_gradient_max) segmentation is a different pipeline concept — break positions at rule thresholds to reduce near-threshold averaging — and should be its own issue.Risks
compare_bcfishpass_wsgpasses it mandatorily in this PR. A future config.yaml-driven path removes this footgun by making the list declarative in the bundle.access_gradient_maxvalues inparameters_freshdon't actually match bcfishpass's grouping. Verify by asserting the post-changegradient_barriers_minimalrow count equals pre-change row count per WSG.access_gradient_max(e.g. lake-only species with no stream access requirement) produce no filter. Derivation skips them — they contribute nothing to barrier detection. Test the skip path explicitly.Execution
main./planning-init+ PWF./code-checkon staged diff before every commit.Relates to NewGraphEnvironment/sred-2025-2026#24).Cross-references
barriers_definite_controlintolnk_barrier_overrides). Issues are parallel-safe — no ordering constraint. Both targetR/lnk_pipeline_prepare.Rbut different sub-helpers.rear_gradient_max/spawn_gradient_maxcrossings so classification near thresholds is deterministic. Different concept than this issue; different verification path.Versions at start