Skip to content

Gradient classes: derive from parameters_fresh, optional override arg #45

@NewGraphEnvironment

Description

@NewGraphEnvironment

Problem

R/lnk_pipeline_prepare.R carries two hardcoded lists that encode bcfishpass's gradient-classification scheme directly in pipeline code. Both belong at the config/derivation layer.

Hardcode 1 — gradient class set in .lnk_pipeline_prep_gradient:

fresh::frs_break_find(conn, paste0(schema, ".streams_blk"),
  attribute = "gradient",
  classes = c("1500" = 0.15, "2000" = 0.20,
              "2500" = 0.25, "3000" = 0.30),
  to = paste0(schema, ".gradient_barriers_raw"))

These four classes are bcfishpass's 0.05-step bins spanning the union of per-species access_gradient_max values in parameters_fresh.csv (0.15, 0.20, 0.25) plus one bcfishpass-convention extra bin at 0.30. Any variant using different per-species access tolerances or a different binning scheme is blocked by this hardcode.

Hardcode 2 — per-species class filters in .lnk_pipeline_prep_minimal:

models <- list(
  bt              = c(2500, 3000),
  ch_cm_co_pk_sk  = c(1500, 2000, 2500, 3000),
  st              = c(2000, 2500, 3000),
  wct             = c(2000, 2500, 3000)
)

These encode "for each species, which gradient classes count as access barriers" — which is fully derivable from each species's access_gradient_max: a class value g is a barrier for species s when g >= s$access_gradient_max. The in-code TODO on this one already flagged it for follow-up.

What each encodes, and where it should live

  • The class set is a modelling choice — what gradient bins does the pipeline detect? bcfishpass picks 0.05-step bins from 0.15 to 0.30. Other variants could choose finer or coarser granularity. Decision belongs at the caller / config level.
  • The per-species filter is derived from per-species access tolerances — no independent decision.

Design

API

lnk_pipeline_prepare accepts an optional classes override:

lnk_pipeline_prepare <- function(conn, aoi, cfg, schema,
                                 observations = "bcfishobs.observations",
                                 classes = NULL) {
  if (is.null(classes)) {
    classes <- .lnk_derive_gradient_classes(cfg$parameters_fresh)
  }
  # pass classes through to .lnk_pipeline_prep_gradient and
  # .lnk_pipeline_prep_minimal
}

Default derivation

Internal @noRd helper:

.lnk_derive_gradient_classes <- function(params_fresh) {
  g <- sort(unique(params_fresh$access_gradient_max))
  g <- g[!is.na(g) & g > 0]
  setNames(g, sprintf("%04d", round(g * 10000)))
}

For the bundled bcfishpass parameters_fresh.csv, this produces c("1500" = 0.15, "2000" = 0.20, "2500" = 0.25). Missing vs the current hardcode: the "3000" bin — bcfishpass convention, no functional impact on access decisions (every position ≥ 0.30 is already ≥ 0.25 and labels "2500", still blocks every species that considers "2500" a barrier).

Reproduction parity

bit-identical rollup on bcfishpass variant requires keeping the bcfishpass class labels exactly, so data-raw/compare_bcfishpass_wsg.R passes the explicit set:

link::lnk_pipeline_prepare(conn, aoi, cfg, schema,
  classes = c("1500" = 0.15, "2000" = 0.20,
              "2500" = 0.25, "3000" = 0.30))

Per-species filter derivation

.lnk_pipeline_prep_minimal derives at runtime, no hardcoded models list:

for (sp in cfg$species) {
  sp_amax <- cfg$parameters_fresh$access_gradient_max[
    cfg$parameters_fresh$species_code == sp]
  if (length(sp_amax) == 0 || is.na(sp_amax) || sp_amax <= 0) next
  classes_for_sp <- names(classes)[classes >= sp_amax]
  # build barriers_<species> table for this class set
  # call fresh::frs_barriers_minimal
}

Note: this switches from bcfishpass's four per-model groupings (bt / ch_cm_co_pk_sk / st / wct) to one per species. Union into gradient_barriers_minimal must still produce the same set of positions for bit-identical output — species in the same bcfishpass group share access_gradient_max values and therefore the same class filter, so union is equivalent.

Scope

  1. Add internal helper .lnk_derive_gradient_classes() in R/utils.R or R/lnk_pipeline_prepare.R.
  2. Extend lnk_pipeline_prepare() signature with classes = NULL; derive when NULL.
  3. .lnk_pipeline_prep_gradient accepts the resolved classes and uses them.
  4. .lnk_pipeline_prep_minimal derives per-species class filters from access_gradient_max, removes hardcoded models list.
  5. data-raw/compare_bcfishpass_wsg.R passes the explicit bcfishpass class set.
  6. Unit tests:
    • Derivation produces expected classes + names from the bcfishpass parameters_fresh.csv
    • Per-species filter logic picks the right classes for each species
    • Derivation handles NA / zero / missing column gracefully
  7. Rerun tar_make() on all four WSGs. Rollup must be bit-identical to pre-change (passing the explicit class set preserves bcfishpass labels).
  8. Short note in research/bcfishpass_comparison.md documenting that the class set is now config-passed rather than hardcoded.

Out of scope

  • Moving classes into config.yaml$pipeline$gradient_classes. Deferred until we have a second variant that forces a different set.
  • Scanning other columns in parameters_fresh (cluster_bridge_gradient, rear_gradient_max, etc.) as additional class sources. cluster_bridge_gradient is a continuous segment-level threshold with no role in class detection. Rule-threshold (rear_gradient_max, spawn_gradient_max) segmentation is a different pipeline concept — break positions at rule thresholds to reduce near-threshold averaging — and should be its own issue.

Risks

  • Label mismatch on bcfishpass variant if the caller forgets to pass the explicit class set. Mitigation: compare_bcfishpass_wsg passes it mandatorily in this PR. A future config.yaml-driven path removes this footgun by making the list declarative in the bundle.
  • Per-species union not equivalent to per-model union if access_gradient_max values in parameters_fresh don't actually match bcfishpass's grouping. Verify by asserting the post-change gradient_barriers_minimal row count equals pre-change row count per WSG.
  • Species with NA or zero access_gradient_max (e.g. lake-only species with no stream access requirement) produce no filter. Derivation skips them — they contribute nothing to barrier detection. Test the skip path explicitly.
  • Reproducibility must remain bit-identical across runs after the change.

Execution

  • Branch from main. /planning-init + PWF.
  • /code-check on staged diff before every commit.
  • PR with SRED tag (Relates to NewGraphEnvironment/sred-2025-2026#24).

Cross-references

  • Companion cleanup to Wire barriers_definite_control into lnk_barrier_overrides #44 (wire barriers_definite_control into lnk_barrier_overrides). Issues are parallel-safe — no ordering constraint. Both target R/lnk_pipeline_prepare.R but different sub-helpers.
  • Follow-up issue (to file later): rule-threshold break source — break segments at rear_gradient_max / spawn_gradient_max crossings so classification near thresholds is deterministic. Different concept than this issue; different verification path.

Versions at start

  • fresh: 0.14.0
  • link: main (0.5.0, target 0.6.0)
  • bcfishpass: ea3c5d8
  • fwapg: Docker (FWA 20240830)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions