Problem
frs_habitat_overlay()'s format = c("wide", "long") parameter and the related long_value_col carry support for two source-table shapes that have no current production consumer:
| shape | who uses it |
| --- | --- |
| format = "wide" (per-species suffix: spawning_sk, rearing_sk) | nobody; was scoped for direct reads of bcfishpass.streams_habitat_known; that integration didn't materialize |
| format = "long" (species_code + habitat_type rows + habitat_ind indicator) | was link, against bcfishpass's pre-2026-04-26 CSV. Bcfishpass moved away. |
| (new bcfishpass shape: row-per-(segment × species) with per-habitat indicator columns) | bcfishpass authoritative as of 2026-04-26 |
Three formats, one real consumer using one shape.
PR #176's first attempt added a species_col parameter to bolt the new shape onto the existing format = "wide" dispatch. That papered over the problem and compounded the API's complexity rather than fixing the design.
Decision
Drop format and long_value_col entirely. Hard-code the canonical shape (the new bcfishpass one). Keep column-name params for genuine flexibility.
Canonical-shape contract on the source table:
- species_col: column carrying the species code (one value per row)
- by: join keys (default c("blue_line_key", "downstream_route_measure"))
- Each habitat_types entry: column carrying the indicator for that habitat type
Column names remain caller-customizable via the existing species_col, by, habitat_types params. Indicator coercion is universal: lower(trim(<col>::text)) IN ('true', 't', '1') treats those strings (case-insensitive, whitespace-trimmed), integer 1, and boolean TRUE as truthy; everything else is falsy.
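The coercion rule can be sketched in base R to make the truthy set concrete. This is an illustration only: is_truthy() is a hypothetical helper name, not part of fresh, and the real coercion happens in SQL.

```r
# Hypothetical helper (not part of fresh): mirrors the SQL predicate
# lower(trim(<col>::text)) IN ('true', 't', '1') in base R.
is_truthy <- function(x) {
  tolower(trimws(as.character(x))) %in% c("true", "t", "1")
}

# Text indicators: case and surrounding whitespace are ignored
text_result <- is_truthy(c("TRUE", " t ", "1", "yes", "0", NA))

# Logical and integer indicators go through the same text cast
logical_result <- is_truthy(c(TRUE, FALSE))
integer_result <- is_truthy(c(1L, 0L))
```

Anything outside the three accepted strings, including "yes", "0", and missing values, ends up falsy, which is why one predicate can serve text, integer, and boolean indicator columns.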
Why drop, not decompose
An earlier draft of this issue proposed a 2D species_layout × habitat_layout decomposition. That ships flexibility for shapes nobody uses (3 of 4 cells empty). YAGNI; rebuild when a real second consumer appears.
Flexibility for non-canonical sources (Option B)
Callers with a non-canonical source transform first — a SQL view, a data-raw/ pivot, link's forthcoming lnk_ingest_bcfishpass() — then call overlay. Shape-knowledge lives where it belongs (with the consumer); fresh stays a thin SQL adapter for one shape.
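As a rough sketch of the transform-first pattern, a caller holding the old long shape could pivot it to the canonical shape in a data-raw/ script before calling overlay. The data frame and its values below are made up for illustration; only the column names (species_code, habitat_type, habitat_ind, and the default join keys) come from this issue.

```r
# Hypothetical long-shape source: one row per (segment x species x habitat type)
long <- data.frame(
  blue_line_key            = c(100L, 100L, 100L, 200L),
  downstream_route_measure = c(0, 0, 0, 50),
  species_code             = c("SK", "SK", "CO", "CO"),
  habitat_type             = c("spawning", "rearing", "spawning", "rearing"),
  habitat_ind              = c("t", "f", "1", "true"),
  stringsAsFactors = FALSE
)

# Pivot to one row per (segment x species), one indicator column per habitat type
canonical <- reshape(
  long,
  idvar     = c("blue_line_key", "downstream_route_measure", "species_code"),
  timevar   = "habitat_type",
  v.names   = "habitat_ind",
  direction = "wide"
)

# reshape() names the new columns habitat_ind.<type>; strip the prefix
names(canonical) <- sub("^habitat_ind\\.", "", names(canonical))
rownames(canonical) <- NULL
```

Combinations absent from the long table come out as NA, which the universal coercion rule treats as falsy, so the pivoted table needs no further cleanup before it is written to the source table.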
Breaking changes (pre-1.0, single consumer = link)
- Drop parameters: format, long_value_col
- New required parameter: species_col (default "species_code"; works for bcfishpass, caller can override)
- Existing callers with format = "wide" (per-species suffix) need a transform-first pattern (no current callers)
- Existing callers with format = "long" need a transform-first pattern (one current call site in link, which is already broken against the new bcfishpass CSV; fixing in coordinated link PR)
Tests
- All format = "wide" tests removed (path no longer exists)
- All format = "long" tests removed
- Existing canonical-shape tests (formerly species_col integration) become primary
- New tests for the empty-source, missing-column, malicious-identifier paths
- Bridge mode tests retained — bridge is orthogonal to source shape
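For the malicious-identifier path, one plausible shape of the guard under test is sketched below. It assumes caller-supplied column names are interpolated into SQL and must therefore be plain identifiers; assert_identifier() is a hypothetical name, and fresh's actual validation may differ.

```r
# Hypothetical guard (an assumption, not fresh's actual code): accept only
# plain SQL identifiers before a column name is interpolated into a query.
assert_identifier <- function(x, arg = "identifier") {
  ok <- is.character(x) && length(x) >= 1L && !anyNA(x) &&
    all(grepl("^[A-Za-z_][A-Za-z0-9_]*$", x))
  if (!ok) {
    stop(sprintf("`%s` must contain only plain SQL identifiers", arg), call. = FALSE)
  }
  invisible(x)
}

# A well-formed column name passes through unchanged
good <- assert_identifier("species_code", arg = "species_col")

# An injection attempt is rejected before any SQL is built
bad <- tryCatch(
  assert_identifier("species_code; DROP TABLE streams", arg = "species_col"),
  error = function(e) e
)
```

A test for this path would then assert that the second call raises an error and that no query reaches the database.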
Coordination
- fresh ships 0.22.0 with this change
- link picks up fresh 0.22.0 and updates the lnk_pipeline_classify call site to pass species_col = "species_code" (no format, no long_value_col)
- link bumps to 0.12.0 in the same release window
- crate's lnk_ingest_bcfishpass() registry-driven canonicalization lands separately as a follow-up; it is not a blocker for this change
Related
Relates to NewGraphEnvironment/sred-2025-2026#24
- link/comms/crate/20260427_fresh_bcfishpass_csv_consumers.md
- lnk_pipeline_classify.R: caller to update post-merge