Hardcoded spp_cols drops GR/KO from pipeline species derivation #106

@NewGraphEnvironment

Problem

R/lnk_pipeline_species.R:63-64 and R/lnk_pipeline_break.R:167-168 hardcode the species-presence column list:

spp_cols <- c("bt", "ch", "cm", "co", "ct", "dv",
              "pk", "rb", "sk", "st", "wct")

Effect: any species added to dimensions.csv, parameters_fresh.csv, and wsg_species_presence.csv (e.g. GR, KO) is silently dropped from lnk_pipeline_species() output for every WSG, so it never reaches classify / connect. Its presence column is read from the CSV but never checked, because the column is absent from the hardcoded list.

Surfaced by adding GR + KO species rows to the default config bundle. GR has a populated gr column in wsg_species_presence.csv (many WSGs flagged t), but lnk_pipeline_species("PARS", ...) returns the 11-species list without GR.
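
A minimal reproduction of the drop, as a sketch only. The one-row table below is made-up illustrative data (not real wsg_species_presence.csv content), and the variable names checked and dropped are hypothetical. It shows only how a fixed column list excludes columns that exist in the CSV header:

```r
# Illustrative one-row presence table; values are invented for the demo.
row <- data.frame(
  watershed_group_code = "PARS",
  bt = "t", gr = "t", ko = "f",
  notes = "",
  stringsAsFactors = FALSE
)

# The hardcoded list quoted in this issue.
spp_cols <- c("bt", "ch", "cm", "co", "ct", "dv",
              "pk", "rb", "sk", "st", "wct")

# Columns that would actually be inspected for this row.
checked <- intersect(spp_cols, names(row))

# Presence columns in the CSV header that the hardcoded list never looks at.
dropped <- setdiff(setdiff(names(row), c("watershed_group_code", "notes")),
                   spp_cols)

print(checked)
print(dropped)
```

Here gr is flagged t in the row, yet it lands in dropped instead of checked, which matches the behaviour described above.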

Proposed Solution

Derive spp_cols from the wsg_species_presence CSV header: every column except watershed_group_code and notes is a presence flag.

spp_cols <- setdiff(names(row), c("watershed_group_code", "notes"))

DRY both call sites by extracting a shared .lnk_wsg_species_present(row) private helper in R/utils.R.
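
One way the shared helper could look, as a rough sketch and not the final implementation. Assumptions not confirmed by this issue: row is a one-row data.frame, presence is encoded as "t" (or logical TRUE after parsing), the return value is upper-case species codes, and the non_species argument is hypothetical:

```r
# Hypothetical sketch of the shared helper; the real signature may differ.
.lnk_wsg_species_present <- function(row,
                                     non_species = c("watershed_group_code",
                                                     "notes")) {
  # Every column that is not metadata is treated as a species-presence flag.
  spp_cols <- setdiff(names(row), non_species)

  # Assumption: a species is present when its flag is "t" or logical TRUE.
  flags <- vapply(
    spp_cols,
    function(col) {
      val <- row[[col]][1]
      !is.na(val) &&
        (isTRUE(val) || identical(tolower(as.character(val)), "t"))
    },
    logical(1)
  )

  # Assumption: callers expect upper-case species codes.
  toupper(spp_cols[flags])
}

# Illustrative row with a newly added species column (gr) flagged present.
row <- data.frame(
  watershed_group_code = "PARS",
  bt = "t", co = "f", gr = "t", ko = NA,
  notes = "illustrative row, not real data",
  stringsAsFactors = FALSE
)
present <- .lnk_wsg_species_present(row)
print(present)
```

Because the column list comes from names(row), a species added to the CSV is picked up without touching either call site.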

Update tests to assert that newly-added species (GR, KO) propagate through both lnk_pipeline_species and .lnk_pipeline_break_obs_species when the column is populated.

Relates to NewGraphEnvironment/sred-2025-2026#24
