frs_habitat_overlay: drop format param, accept only canonical shape #177

@NewGraphEnvironment

Problem

frs_habitat_overlay()'s format = c("wide", "long") parameter and the related long_value_col carry support for two source-table shapes that have no current production consumer:

| Shape | Who uses it |
| --- | --- |
| format = "wide" (per-species suffix: spawning_sk, rearing_sk) | Nobody. It was scoped for direct reads of bcfishpass.streams_habitat_known; that integration didn't materialize. |
| format = "long" (species_code + habitat_type rows + habitat_ind indicator) | Previously link, against bcfishpass's pre-2026-04-26 CSV. bcfishpass has since moved away from this shape. |
| New bcfishpass shape: row-per-(segment × species) with per-habitat indicator columns | bcfishpass (authoritative as of 2026-04-26) |

Three formats, one real consumer using one shape.

PR #176's first attempt added a species_col parameter to bolt the new shape onto the existing format = "wide" dispatch. That paper-over compounded the API rather than fixing the design.

Decision

Drop format and long_value_col entirely. Hard-code the canonical shape (the new bcfishpass one). Keep column-name params for genuine flexibility.

Canonical-shape contract on the source table:

  • species_col — column carrying the species code (one value per row)
  • by — join keys (default c("blue_line_key", "downstream_route_measure"))
  • Each habitat_types entry — column carrying the indicator for that habitat type

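Column names remain caller-customizable via the existing species_col, by, and habitat_types params. Indicator coercion is universal: lower(trim(<col>::text)) IN ('true', 't', '1') is true for the text values 'true', 't', and '1' (case-insensitive, surrounding spaces ignored), and therefore also for integer 1 and boolean TRUE once cast to text; everything else is treated as false.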
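To make the coercion rule concrete, here is a minimal Python sketch that mirrors the SQL predicate above. The function name is_truthy_indicator is hypothetical and not part of fresh; the sketch assumes PostgreSQL semantics, where boolean::text renders as 'true'/'false' and trim() with no extra arguments strips spaces only.

```python
def is_truthy_indicator(value):
    """Mimic lower(trim(value::text)) IN ('true', 't', '1').

    Hypothetical helper for illustration only; fresh evaluates
    the predicate in SQL, not in Python.
    """
    if value is None:
        # SQL NULL never satisfies the IN predicate, so treat it as false.
        return False
    if isinstance(value, bool):
        # Assumption: PostgreSQL renders boolean::text as 'true' / 'false'.
        text = "true" if value else "false"
    else:
        text = str(value)
    # trim() without arguments removes spaces only, hence strip(" ").
    return text.strip(" ").lower() in {"true", "t", "1"}


if __name__ == "__main__":
    for sample in [" TRUE ", "t", 1, True, "yes", 0, False, None]:
        print(repr(sample), is_truthy_indicator(sample))
```

Values such as "yes", 0, FALSE, and NULL all come out false under this rule, which matches the "everything else falsy" behaviour described above.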

Why drop, not decompose

An earlier draft of this issue proposed a 2D species_layout × habitat_layout decomposition. That would ship flexibility for shapes nobody uses (3 of 4 cells empty). YAGNI; rebuild it when a real second consumer appears.

Flexibility for non-canonical sources (Option B)

Callers with a non-canonical source transform it first (via a SQL view, a data-raw/ pivot, or link's forthcoming lnk_ingest_bcfishpass()) and then call the overlay. Shape knowledge lives where it belongs, with the consumer; fresh stays a thin SQL adapter for one shape.
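As an illustration of the transform-first pattern, the sketch below pivots rows in the old long shape (species_code + habitat_type + habitat_ind) into the canonical row-per-(segment × species) shape. It is written in plain Python for readability; a real consumer would more likely do this in a SQL view or an R pivot. The function name long_to_canonical, the habitat type names, and the sample rows are all made up for the example.

```python
from collections import defaultdict


def long_to_canonical(rows, habitat_types=("spawning", "rearing")):
    """Pivot long-shaped rows into one row per (segment, species).

    Hypothetical helper: column names follow the shapes described in
    this issue, but nothing here is part of fresh or link.
    """
    grouped = defaultdict(dict)
    for r in rows:
        key = (r["blue_line_key"], r["downstream_route_measure"], r["species_code"])
        grouped[key][r["habitat_type"]] = r["habitat_ind"]

    canonical = []
    for (blk, drm, species), indicators in sorted(grouped.items()):
        row = {
            "blue_line_key": blk,
            "downstream_route_measure": drm,
            "species_code": species,
        }
        for habitat in habitat_types:
            # A habitat type with no long row stays None, which the
            # indicator coercion treats as false.
            row[habitat] = indicators.get(habitat)
        canonical.append(row)
    return canonical


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    long_rows = [
        {"blue_line_key": 1, "downstream_route_measure": 0.0,
         "species_code": "SK", "habitat_type": "spawning", "habitat_ind": "t"},
        {"blue_line_key": 1, "downstream_route_measure": 0.0,
         "species_code": "SK", "habitat_type": "rearing", "habitat_ind": "f"},
        {"blue_line_key": 1, "downstream_route_measure": 0.0,
         "species_code": "CO", "habitat_type": "rearing", "habitat_ind": "t"},
    ]
    for row in long_to_canonical(long_rows):
        print(row)
```

The output has one row per segment and species, with one indicator column per habitat type, which is the only shape the overlay would need to understand.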

Breaking changes (pre-1.0, single consumer = link)

  • Drop parameters: format, long_value_col
  • New parameter: species_col (default "species_code", which works for bcfishpass; callers can override)
  • Existing callers with format = "wide" (per-species suffix) need a transform-first pattern (no current callers)
  • Existing callers with format = "long" need a transform-first pattern (one current call site in link, which is already broken against the new bcfishpass CSV — fixing in coordinated link PR)

Tests

  • All format = "wide" tests removed (path no longer exists)
  • All format = "long" tests removed
  • Existing canonical-shape tests (formerly species_col integration) become primary
  • New tests for the empty-source, missing-column, malicious-identifier paths
  • Bridge mode tests retained — bridge is orthogonal to source shape
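The missing-column and malicious-identifier paths listed above could be guarded by checks along the following lines. This is a rough Python sketch of the idea, not the fresh implementation (which is R and SQL); the function name check_source, the identifier pattern, the error messages, and the habitat type names are assumptions made for the example.

```python
import re

# Assumption: a conservative pattern for plain, unquoted SQL identifiers.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_source(columns,
                 species_col="species_code",
                 by=("blue_line_key", "downstream_route_measure"),
                 habitat_types=("spawning", "rearing")):
    """Validate a source table's columns against the canonical contract.

    Hypothetical helper: returns the list of required column names, or
    raises ValueError for an unsafe identifier or a missing column.
    """
    required = [species_col, *by, *habitat_types]
    for name in required:
        # Reject anything that could smuggle SQL into an interpolated query.
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    missing = [name for name in required if name not in columns]
    if missing:
        raise ValueError("source is missing columns: " + ", ".join(missing))
    return required


if __name__ == "__main__":
    cols = ["blue_line_key", "downstream_route_measure",
            "species_code", "spawning", "rearing"]
    print(check_source(cols))
```

Identifiers are checked before column presence so that a hostile name is rejected even when the source table is empty or unreadable.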

Coordination

  • fresh ships 0.22.0 with this change
  • link picks up fresh 0.22.0 and updates lnk_pipeline_classify call site to pass species_col = "species_code" (no format, no long_value_col)
  • link bumps to 0.12.0 in the same release window
  • crate's lnk_ingest_bcfishpass() registry-driven canonicalization lands separately as a follow-up; it does not block this change

Related

Relates to NewGraphEnvironment/sred-2025-2026#24
