frs_point_match: match two point datasets along FWA network within instream distance #206

@NewGraphEnvironment

Problem

fresh has primitives for snapping points to FWA streams (frs_point_snap) and for finding upstream/downstream features per segment (frs_network_features), but no primitive for matching two point datasets along the network within a distance threshold.

The concrete use case driving this: link's bcfp parity layer needs to reproduce bcfp's 02_pscis_streams_150m.sql at smnorris/bcfishpass@v0.7.14-125-g6e9cf1c (current tunnel state, bcfishpass.log.model_run_id=121, rebuilt 2026-05-05). That is: match PSCIS crossings to modelled crossings within 100m instream distance on the same stream, then keep the nearest PSCIS per modelled crossing. link's lnk_pipeline_crossings currently lacks this layer, so modelled crossings duplicate PSCIS positions in the working schema. This cascades into more than 1,000 false-positive anthropogenic barriers in BULK alone (see link's research/bcfp_table_map.md).

The operation generalizes beyond PSCIS:

  • field-assessed crossings <-> user-added crossings deduplication
  • observations <-> habitat confirmation points
  • any "merge two point datasets on the same FWA network" workflow

Proposed primitive

frs_point_match(conn, table_a, table_b, table_to, distance_max, ...):

  • Both input tables must already be snapped to FWA (carry blue_line_key, downstream_route_measure -- typically via frs_point_snap upstream).
  • Output table has columns from table_a plus a <table_b_id_col> linking column populated where a match within distance_max exists.
  • Matching is on same blue_line_key + instream distance <= distance_max (computed from downstream_route_measure deltas).
  • Dedup: each table_b row links to at most one table_a row -- the closest one (DISTINCT ON (table_b_id) ORDER BY distance ASC).
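
The matching and dedup rule above can be sketched in memory with base R. This is only an illustration of the intended semantics, not the proposed implementation (which would run in SQL against the database). The function name point_match_sketch and the example data are hypothetical, and the sketch returns only the matched pairs; joining them back onto table_a is left out.

```r
# Illustrative in-memory version of the proposed matching rule.
# Assumes both inputs carry blue_line_key and downstream_route_measure.
point_match_sketch <- function(a, b, distance_max, a_id = "id", b_id = "id") {
  a_key <- data.frame(
    blue_line_key = a$blue_line_key,
    a_id = a[[a_id]],
    drm_a = a$downstream_route_measure
  )
  b_key <- data.frame(
    blue_line_key = b$blue_line_key,
    b_id = b[[b_id]],
    drm_b = b$downstream_route_measure
  )
  # Pair every table_a point with every table_b point on the same stream.
  pairs <- merge(a_key, b_key, by = "blue_line_key")
  # Instream distance is the absolute difference of route measures.
  pairs$distance <- abs(pairs$drm_a - pairs$drm_b)
  pairs <- pairs[pairs$distance <= distance_max, , drop = FALSE]
  # Equivalent of DISTINCT ON (b_id) ... ORDER BY distance ASC:
  # sort by b_id then distance, keep the first row per b_id.
  pairs <- pairs[order(pairs$b_id, pairs$distance), , drop = FALSE]
  pairs <- pairs[!duplicated(pairs$b_id), , drop = FALSE]
  pairs[, c("a_id", "b_id", "distance")]
}

# Hypothetical data: two table_a points near one table_b point on stream 100,
# plus a pair on stream 200 that is too far apart to match.
a <- data.frame(id = c(1, 2, 3),
                blue_line_key = c(100, 100, 200),
                downstream_route_measure = c(50, 120, 10))
b <- data.frame(id = c(10, 11),
                blue_line_key = c(100, 200),
                downstream_route_measure = c(100, 500))

matched <- point_match_sketch(a, b, distance_max = 100)
print(matched)
```

In this example both table_a points on stream 100 fall within 100 m of table_b point 10, and only the closer one (id 2, 20 m away) survives the dedup.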

Signature

frs_point_match(
  conn,
  table_a,                              # schema-qualified, points to match FROM
  table_b,                              # schema-qualified, points to match TO
  table_to,                             # schema-qualified destination
  distance_max,                         # max instream distance (metres)
  table_a_id_col = "id",
  table_b_id_col = "id"
)

Returns conn invisibly. Side effect: drops + recreates table_to with table_a's columns plus the linking ID column.
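
To make the drop + recreate side effect concrete, here is a rough sketch of the kind of SQL such a function might generate. Everything in it is an assumption rather than the planned implementation: the helper name point_match_sql_sketch, the example table names, the "_b" suffix on the linking column (added here only to avoid a name clash when both ID columns default to "id"), and the use of table_a's ID as a tie-breaker. It covers matched rows only, and identifiers are not quoted, which a real implementation would need to handle.

```r
# Hypothetical helper: builds (but does not run) the SQL for the match.
point_match_sql_sketch <- function(table_a, table_b, table_to, distance_max,
                                   table_a_id_col = "id",
                                   table_b_id_col = "id") {
  stopifnot(is.numeric(distance_max), length(distance_max) == 1,
            distance_max >= 0)
  # Instream distance from downstream_route_measure deltas.
  dist <- "abs(a.downstream_route_measure - b.downstream_route_measure)"
  # Assumption: suffix the linking column so it cannot collide with a.*.
  link_col <- paste0(table_b_id_col, "_b")
  paste(
    paste0("DROP TABLE IF EXISTS ", table_to, ";"),
    paste0("CREATE TABLE ", table_to, " AS"),
    paste0("SELECT DISTINCT ON (b.", table_b_id_col, ")"),
    paste0("  a.*, b.", table_b_id_col, " AS ", link_col),
    paste0("FROM ", table_a, " a"),
    paste0("JOIN ", table_b, " b"),
    "  ON a.blue_line_key = b.blue_line_key",
    paste0(" AND ", dist, " <= ", format(distance_max, scientific = FALSE)),
    # Closest table_a row per table_b row; table_a ID breaks ties (assumption).
    paste0("ORDER BY b.", table_b_id_col, ", ", dist, ", a.",
           table_a_id_col, ";"),
    sep = "\n"
  )
}

sql <- point_match_sql_sketch("working.pscis", "working.modelled",
                              "working.pscis_matched", 100)
cat(sql, "\n")
```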

Network-position columns (blue_line_key, downstream_route_measure) are hard-coded to the FWA convention -- every FWA-snapped point table in the bcfp ecosystem uses these names. Per-side overrides (like frs_network_features got in fresh#204) can be added later if a real divergence appears.

Why fresh, not link

This is a generic FWA-primitive operation. fresh already owns:

  • frs_point_snap -- point <-> stream snap
  • frs_network_features -- segment <-> feature dnstr/upstr

frs_point_match rounds out the point-handling family -- point <-> point along the network. Adding it in fresh makes the primitive available to link, bcfishpass-comparable tooling, and any future packages working with point-on-network data. Name uses singular point to match frs_point_snap (the closest existing analog).

Acceptance

  • frs_point_match(...) produces output byte-identical to bcfp's 02_pscis_streams_150m.sql output (at smnorris/bcfishpass@v0.7.14-125-g6e9cf1c) for a test WSG (after stream-name filtering, which is bcfp-specific and stays in link's caller, not the primitive)
  • Mocked tests covering the dedup logic (multiple table_a rows matching one table_b row)
  • Live test on a small WSG (ADMS or similar) validating the byte-identical claim
  • Roxygen + lintr clean
  • Exported from NAMESPACE
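
One way the parity criterion could be checked once both result sets are pulled into R is a strict, order-insensitive comparison. The helper below is a hypothetical sketch (the name tables_identical_sketch, the key_cols argument, and the column names in the example are assumptions); it uses identical(), so a difference in column type such as integer vs double also counts as a mismatch.

```r
# Hypothetical parity check: TRUE only when both data frames hold the same
# columns, types, and rows, regardless of row or column order.
tables_identical_sketch <- function(x, y, key_cols) {
  if (!identical(sort(names(x)), sort(names(y)))) return(FALSE)
  if (nrow(x) != nrow(y)) return(FALSE)
  # Align column order, then sort both by the key columns.
  y <- y[, names(x), drop = FALSE]
  x <- x[do.call(order, unname(as.list(x[key_cols]))), , drop = FALSE]
  y <- y[do.call(order, unname(as.list(y[key_cols]))), , drop = FALSE]
  rownames(x) <- NULL
  rownames(y) <- NULL
  identical(x, y)
}

# Hypothetical result sets: same rows in a different order, then one edit.
ours   <- data.frame(id = c(1L, 2L), modelled_id = c(10L, NA))
theirs <- data.frame(id = c(2L, 1L), modelled_id = c(NA, 10L))
same <- tables_identical_sketch(ours, theirs, "id")

theirs_changed <- theirs
theirs_changed$modelled_id[2] <- 11L
different <- tables_identical_sketch(ours, theirs_changed, "id")
```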

Out of scope

  • Stream-name matching (bcfp does this for PSCIS-specific reasons via gnis_name <-> stream_name regex matching). Caller can layer that on top -- primitive provides just the network-distance match.
  • Multi-stream matching (matching across wscode_ltree subtrees) -- single blue_line_key is enough for the parity case.
  • Bidirectional dedup variants (matching pairs both ways) -- DISTINCT ON (table_b_id) is enough.
  • Per-side column-name overrides (table_*_blk_col etc.) -- add when a real divergence appears.
