lnk_pipeline_access: compute dam_dnstr_ind / remediated_dnstr_ind from primitives (#124 follow-up) #135

@NewGraphEnvironment

Problem

lnk_pipeline_mapping_code() reproduces bcfp's streams_mapping_code byte-identically for all 8 species on ADMS, but only as long as the caller merges in bcfp's pre-computed dam_dnstr_ind and remediated_dnstr_ind columns from bcfishpass.streams_access.

Without those columns, mapping_code_<bt|wct> (the resident flavor) drifts on rows where multiple barrier types stack (e.g. PSCIS-then-dam downstream). The resident flavor's CASE is sequence-aware: the DAM token fires only when the next downstream anthropogenic barrier is itself a dam, not whenever any dam exists downstream. My presence-only fallback (has_barriers_dams_dnstr) over-emits DAM for ~14% of segments where bcfp emits ASSESSED.

bcfp's SQL (load_streams_access.sql:140-147):

case
  when array[b.barriers_anthropogenic_dnstr[1]] && b.barriers_dams_dnstr then true
  else false
end as dam_dnstr_ind,

That is, take the first element of barriers_anthropogenic_dnstr (the next-downstream anthropogenic barrier) and check whether it is also in barriers_dams_dnstr. If it is, the next barrier downstream is a dam.
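To make the difference between the two checks concrete, here is a minimal R sketch on toy data. The IDs and the three example rows are made up for illustration; the only thing taken from bcfp is the rule itself (first element of the anthropogenic array must appear in the dams array), and the arrays are assumed to be ordered nearest-first.

```r
# Toy per-segment arrays of downstream barrier IDs, nearest-first (assumed ordering).
# Row 1: a PSCIS crossing (101) sits between the segment and the dam (202).
# Row 2: the dam is the next barrier downstream.
# Row 3: no barriers downstream.
anth_dnstr <- list(c(101L, 202L), c(202L), integer(0))
dams_dnstr <- list(c(202L), c(202L), integer(0))

# Presence-only fallback: TRUE whenever any dam exists downstream.
presence_only <- lengths(dams_dnstr) > 0

# Sequence-aware check: TRUE only when the first anthropogenic barrier is a dam.
first_is_dam <- mapply(
  function(a, d) length(a) > 0 && a[1] %in% d,
  anth_dnstr, dams_dnstr
)

print(presence_only)  # row 1 is TRUE here
print(first_is_dam)   # row 1 is FALSE here
```

Row 1 is the stacked case: the presence-only check flags it, while the sequence-aware check does not.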

Why this is non-trivial

bcfp's barriers_anthropogenic and barriers_dams use a shared ID space because both tables are populated from the same upstream aggregated_crossings source: barriers_anthropogenic_id and barriers_dams_id reference the same aggregated_crossings_id. That is why the array overlap (&&) works there.

In our frs_network_features calls, the feature_id_col is the table-specific PK (barriers_anthropogenic_id, barriers_dams_id), so the two returned arrays live in different ID spaces and a direct overlap doesn't work.
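The failure mode can be illustrated with a small R sketch. Everything here is hypothetical: the lookup tables, the ID values, and the to_common() helper exist only to show why overlapping table-specific keys gives the wrong answer while overlapping a common key does not.

```r
# Hypothetical lookups: each table has its own PK plus the common crossing ID.
anth_lookup <- data.frame(
  barriers_anthropogenic_id = c(1L, 2L, 3L),
  aggregated_crossings_id   = c(5001L, 5002L, 5003L)
)
dams_lookup <- data.frame(
  barriers_dams_id        = 1L,
  aggregated_crossings_id = 5002L
)

# One segment: the nearest downstream anthropogenic barrier has PK 2,
# which is the same physical structure as dam PK 1 (crossing 5002).
anth_pk <- c(2L, 3L)
dams_pk <- c(1L)

# Overlap on table-specific PKs misses the match.
pk_overlap <- anth_pk[1] %in% dams_pk

# Illustrative helper: translate a table-specific PK to the common ID.
to_common <- function(pk, lookup, pk_col) {
  lookup$aggregated_crossings_id[match(pk, lookup[[pk_col]])]
}
anth_common <- to_common(anth_pk, anth_lookup, "barriers_anthropogenic_id")
dams_common <- to_common(dams_pk, dams_lookup, "barriers_dams_id")

# Overlap on the common ID finds it.
common_overlap <- anth_common[1] %in% dams_common
```

With unrelated PK sequences the mismatch can also go the other way and produce a spurious TRUE when two different structures happen to share a PK value.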

Proposed Solution

Two steps: (1) pass a common-ID column (e.g. aggregated_crossings_id) as feature_id_col to frs_network_features for both the anthropogenic and dams tables, so the returned arrays share an ID space. (2) In R, compute dam_dnstr_ind = ours_anthropogenic[[i]][1] %in% ours_dams[[i]] per row, treating an empty anthropogenic array as FALSE to match bcfp's else branch.

# Sketch:
anth <- frs_network_features(
  conn, segments = "bcfishpass.streams", features = "bcfishpass.barriers_anthropogenic",
  feature_id_col = "aggregated_crossings_id",  # common with dams
  direction = "downstream", aoi = "ADMS"
)
dams <- frs_network_features(
  conn, segments = "bcfishpass.streams", features = "bcfishpass.barriers_dams",
  feature_id_col = "aggregated_crossings_id",  # common with anthropogenic
  direction = "downstream", aoi = "ADMS"
)
# Join, then per-row: dam_dnstr_ind = anth$feature_ids[[i]][1] %in% dams$feature_ids[[i]]

Same shape for remediated_dnstr_ind against bcfishpass.barriers_remediations.
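The per-row step could look roughly like the following base-R sketch. It assumes each frs_network_features result is a data frame with a list column feature_ids (as in the sketch above) and a segment key, here called segment_id; that key name and the first_dnstr_in() helper are hypothetical, and the toy data frames stand in for real results.

```r
# Hypothetical helper: is the first (nearest) ID in first_of[[i]] present in candidates[[i]]?
# Empty, NULL, or NA-leading arrays yield FALSE, matching the SQL's else branch.
first_dnstr_in <- function(first_of, candidates) {
  stopifnot(length(first_of) == length(candidates))
  vapply(seq_along(first_of), function(i) {
    a <- first_of[[i]]
    length(a) > 0 && !is.na(a[1]) && a[1] %in% candidates[[i]]
  }, logical(1))
}

# Toy stand-ins for the two results, deliberately in different row order.
anth <- data.frame(segment_id = c(10L, 11L, 12L))
anth$feature_ids <- list(c(5001L, 5002L), 5002L, integer(0))
dams <- data.frame(segment_id = c(12L, 10L, 11L))
dams$feature_ids <- list(integer(0), 5002L, 5002L)

# Align dams to anth by segment key, then compute the indicator row by row.
idx <- match(anth$segment_id, dams$segment_id)
anth$dam_dnstr_ind <- first_dnstr_in(anth$feature_ids, dams$feature_ids[idx])
```

Aligning with match() rather than merge() avoids reshaping the list columns. A segment missing from the dams result gets a NULL candidate set and therefore FALSE. The same helper would apply to remediated_dnstr_ind by swapping in the remediations arrays.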

Wire this into lnk_pipeline_access so the output tibble includes dam_dnstr_ind and remediated_dnstr_ind natively, and lnk_pipeline_mapping_code picks them up automatically (already supports the column-prefer fallback).

Acceptance

  • lnk_pipeline_access emits dam_dnstr_ind + remediated_dnstr_ind boolean columns when the user passes the common-ID variants of barrier_sources.
  • ADMS validation: mapping_code_bt + mapping_code_wct byte-identical to bcfp without the pre-computed-indicator merge-in step.
  • Roxygen documents the common-ID requirement (which barriers tables have shared aggregated_crossings_id, when it's safe to skip).

Cross-refs
