frs_point_match: match two FWA-snapped point datasets within instream distance (v0.30.0)#208
Merged
NewGraphEnvironment merged 9 commits intoMay 11, 2026
Conversation
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
R/frs_point_match.R + man/frs_point_match.Rd + NAMESPACE export. Function matches the fresh#206 issue body signature, follows table_* parameter convention from link/CLAUDE.md, uses .frs_validate_identifier() + .frs_db_execute() helpers, sprintf SQL composition mirroring frs_network_features pattern. Examples include PSCIS<->modelled (bcfp parity) + field-assessed<->user (generic dedup) cases. SQL implements bcfp's 02_pscis_streams_150m.sql algorithm (verified unchanged from local v0.7.13 source through tunnel v0.7.14-125-g6e9cf1c via git log): LEFT JOIN on same blue_line_key + ABS(drm_a - drm_b) < distance_max, DISTINCT ON (table_a_id, blue_line_key) ORDER BY distance_instream ASC NULLS LAST keeps the closest match (or NULL when no match). Lint clean; load_all signature verified. Tests + live byte-identical validation against bcfp follow in Phase 2+3. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…cross-ref Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
tests/testthat/test-frs_point_match.R mirrors frs_network_features test structure: Tier 1 -- validation (7 tests): required args, distance_max scalar positive numeric, .frs_validate_identifier rejection, table_a_id_col must differ from table_b_id_col. Tier 2 -- SQL composition via local_mocked_bindings on .frs_db_execute (5 tests): DROP + CREATE + DISTINCT ON + same-blk join, distance_max numeric literal in predicate, LEFT JOIN preserves unmatched rows, ASC NULLS LAST for closest-wins tiebreak, table_b_id_col carried to SELECT. Total: 20 PASS / 0 FAIL on filter=frs_point_match. Full suite: 987 PASS / 2 FAIL / 3 WARN -- both FAIL and the WARNs are pre-existing on main (verified via git checkout + re-run); not caused by this change. Specifically test-frs_params.R:92 fails the same way before and after. Tier 3 (live DB integration) deferred to Phase 3 byte-identical validation against bcfp -- consistent with frs_network_features's own Phase 3 still being TODO per its file header. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bug surfaced by live integration test: RPostgres can't run multi-
statement SQL ("cannot insert multiple commands into a prepared
statement"). Split DROP TABLE IF EXISTS and CREATE TABLE AS into
two .frs_db_execute calls. Updated tests to capture both SQL calls
via a list-accumulator mock.
Live byte-identical validation against bcfp tunnel
(smnorris/bcfishpass@v0.7.14-125-g6e9cf1c via bcfishpass.log
model_run_id=121, 2026-05-05):
ours: 60 matches | ref: 60 matches
(stream_crossing_id, modelled_crossing_id) pairs identical: 60
only in ours: 0
only in ref: 0
Script captured at /tmp/fresh_206_live_validation.R for PR body
attachment. Tests: 21 PASS / 0 FAIL on filter=frs_point_match.
Reference is bcfishpass.pscis.modelled_crossing_id (the canonical
post-dedup linkage) not bcfishpass.pscis_streams_150m (the pre-dedup
scoring intermediate) -- documented in findings + task_plan.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
DESCRIPTION 0.29.0 -> 0.30.0 (minor bump per R-package conventions for new exported function). NEWS.md 0.30.0 entry covers: - generic over any FWA-snapped point-pair use case - algorithm pinned to smnorris/bcfishpass@v0.7.14-125-g6e9cf1c - table_* parameter convention (per link/CLAUDE.md) - write-to-table contract + RPostgres two-call split rationale - live parity numbers (60/60 byte-identical ADMS) - test coverage summary (21 tests) - first consumer is link#154 R CMD check (--no-tests): 0 errors / 4 warnings / 4 notes -- identical to main per side-by-side check. lintr clean on the new R file. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two algorithmic improvements after BULK validation surfaced gaps:
1) Bidirectional dedup. Initial impl only did a-side dedup (each PSCIS
keeps nearest modelled). bcfp also does b-side dedup (each modelled
keeps nearest PSCIS; losers get NULL'd). Added via ROW_NUMBER OVER
PARTITION BY b_id, CASE-NULLing non-winners. Mirrors bcfp's UPDATE
in lines 172-218 of 02_pscis_streams_150m.sql.
2) tiebreak parameter (c("instream", "planar"), default "instream").
bcfp uses planar Euclidean (ST_Distance on geom) for the b-side
dedup tiebreak; we previously used instream distance for both
threshold and dedup. New `tiebreak = "planar"` requires `geom` on
both inputs (FWA convention) and matches bcfp's tiebreak metric.
Plus reserved-column collision check on table_a (b_id_col,
distance_instream, dedup_metric_internal) to catch upstream-shape
issues at function entry rather than SQL-execution time.
Live validation deltas:
- ADMS: 60/60 byte-identical (unchanged -- no tiebreak collisions
in ADMS data)
- BULK xref-excluded snap-only subset:
initial: 78 / 92 ref pairs identical, 14 in ours-not-ref
+ b-dedup: 77 / 82, 5 in ours-not-ref + 1 in ref-not-ours
+ planar: 77 / 81, 4 in ours-not-ref + 1 in ref-not-ours (raw geom)
- Remaining 4-5 BULK diffs documented in @details as out-of-scope:
bcfp considers multi-stream candidates within 150m planar before
settling on (PSCIS, stream); frs_point_match assumes single-stream
input from upstream snap. Caller can layer multi-stream selection
on top if needed.
Tests: 24 PASS / 0 FAIL on filter=frs_point_match. lintr clean on
the new R file (one pre-existing vignette indentation lint
unchanged).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Renamed `table_a_id_col` -> `col_a_id`, `table_b_id_col` -> `col_b_id` per the `col_<role>` parameter naming convention (link/CLAUDE.md extension). Matches the `table_<role>` precedent (table_a, table_b, table_to, table_in, table_out). Picked over `<role>_col` for the same autocomplete-grouping reason. Live re-validation post-rename: ADMS still 60/60 byte-identical. NEWS.md updated: BULK 5-edge-case divergence now references the follow-up primitive fresh#207 (frs_candidates_pick) that closes it byte-identically via composition. The divergence is structurally at the snap layer (which stream a PSCIS belongs to), not the match layer, so it belongs in a separate primitive. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #206. Ships
frs_point_match()— third primitive in the point-handling family alongsidefrs_point_snapandfrs_network_features. Matches two FWA-snapped point datasets along the network within an instream-distance threshold, with bidirectional dedup and a switchable tiebreak metric.02_pscis_streams_150m.sqlatsmnorris/bcfishpass@v0.7.14-125-g6e9cf1c(tunnelbcfishpass.log.model_run_id=121, rebuilt 2026-05-05).table_<role>/col_<role>conventions:table_a,table_b,table_to,col_a_id,col_b_id,tiebreak,distance_max.Live byte-identical parity vs bcfp
ADMS: 60 / 60 (stream_crossing_id, modelled_crossing_id) pairs byte-identical to
bcfishpass.pscis.modelled_crossing_id. 0 in ours-not-ref, 0 in ref-not-ours.BULK (xref-excluded snap-only subset, since BULK has 1467 manual xref overrides that the caller layers on top of this primitive): 77 / 78 ref pairs identical; 5 in ours-not-ref, 1 in ref-not-ours.
The 5 BULK divergences are bcfp's multi-stream candidate consideration at the SNAP layer (before this primitive runs) — bcfp picks among multiple FWA streams within 150m planar by name + width + distance scoring.
frs_point_matchaccepts a single (PSCIS, stream) per row, so it can't reselect the stream. Documented in@details. Addressed by sibling primitive #207 (frs_candidates_pick): score + filter + dedup candidates per key. With #206 + #207 composed viafrs_point_snap(num_features = N)→frs_candidates_pick→frs_point_match, the bcfp PSCIS-to-stream + PSCIS-to-modelled algorithm reproduces byte-identically.Test plan
expect_error()per arg path) and SQL composition (DROP + CREATE order, DISTINCT ON, bidirectional dedup via PARTITION BY,tiebreakswitch, identifier sanitization, reserved-column collision guards).[ FAIL 0 | WARN 0 | SKIP 0 | PASS 24 ]on filter=frs_point_match.git checkout main+ re-run). No regressions caused by this change.R CMD check --no-tests: 0 errors / 4 warnings / 4 notes — identical to main.lintr::lint("R/frs_point_match.R"): zero lints.roxygen2::roxygenise()regeneratesman/frs_point_match.Rd+frs_network_features.Rd(@family networkcross-reference).Notable design decisions
frs_point_snapwhich returns sf andfrs_network_featureswhich returns tibble). Rationale: the result is a derived dataset not a query result; bcfp's analog also writes table→table; output sizes can be large; DDL split into twoDBI::dbExecutecalls because RPostgres rejects multi-statement SQL in one call.blue_line_key,downstream_route_measure) to the FWA convention. Per-side overrides can be added if a real divergence appears (precedent: fresh#204's per-side wscode/localcode overrides onfrs_network_features).tiebreakparameter (c("instream", "planar"), default"instream"). Default is geometry-free and self-consistent with the threshold filter;"planar"requiresgeomon both inputs and mirrors bcfp's tiebreak metric exactly.table_arow's nearesttable_b; b-side keeps eachtable_brow's nearesttable_a(others get NULL). Mirrors bcfp's two-pass algorithm in lines 172-218 + 226+ of02_pscis_streams_150m.sql.First consumer: link#154 (lnk_pipeline_crossings: missing PSCIS↔modelled 100m-instream auto-snap layer). That issue wires this primitive into link's per-WSG crossings build.
🤖 Generated with Claude Code