Skip to content

Honour barriers_definite_control at the override step, per-species (#44)#47

Merged
NewGraphEnvironment merged 11 commits intomainfrom
44-barriers-definite-control
Apr 23, 2026
Merged

Honour barriers_definite_control at the override step, per-species (#44)#47
NewGraphEnvironment merged 11 commits intomainfrom
44-barriers-definite-control

Conversation

@NewGraphEnvironment
Copy link
Copy Markdown
Owner

Summary

  • Honours user_barriers_definite_control.csv at the observation-override step in lnk_barrier_overrides(). Positions flagged barrier_ind = TRUE can no longer be re-opened by upstream historical observations.
  • Gates the control filter per-species via a new observation_control_apply column in parameters_fresh.csv (TRUE for CH/CM/CO/PK/SK/ST, FALSE for BT/WCT, NA for CT/DV/RB). Matches bcfishpass's per-model SQL — model_access_bt.sql has no control join; model_access_ch_cm_co_pk_sk.sql and model_access_st.sql do. Residents inhabit reaches upstream of anadromous-blocking falls routinely (post-glacial headwater connectivity, no ocean-return requirement), so their observations still override.
  • Habitat-confirmation override path intentionally bypasses the control table — expert-confirmed habitat is higher-trust than observations, and bcfishpass's hab_upstr CTE has no control join either.
  • .lnk_pipeline_prep_overrides() passes the control table to lnk_barrier_overrides() when the config manifest declares barriers_definite_control. Manifest key is the contract; no DB probe.
  • .lnk_pipeline_prep_load_aux() now always creates a schema-valid (possibly empty) barriers_definite_control table when the manifest declares the key — fixes an asymmetric-gating bug that would have raised "relation does not exist" on AOIs with zero control rows.
  • Adds DEAD (Deadman River) to data-raw/_targets.R as the end-to-end test WSG for the filter. It has a single barrier_ind = TRUE control row at FALLS (356361749, 45743) with six anadromous observations upstream and zero habitat coverage — the unique combination that actively exercises the filter. ADMS/BULK/BABL/ELKR are the numerical-parity WSGs; their TRUE control rows are all rescued by either the observation threshold or the habitat path, so they validate that the filter doesn't over-apply but don't stress it. The follow-up evidence is in working_dead.barrier_overrides: at the control position only BT has an override row (control_apply=FALSE), while CH/CM/CO/PK/SK/ST are all blocked.
  • Version 0.5.0 → 0.6.0. NEWS entry, DESCRIPTION bump, research doc research/bcfishpass_comparison.md (DEAD parity table, "Key fixes" row, three-part-fix subsection, DAG update), vignette (5-WSG narrative, pivot column for DEAD), bcfishpass config inst/extdata/configs/bcfishpass/README.md updated, vignette artifacts regenerated from real pipeline state.

Reproducibility: two consecutive targets::tar_destroy(ask = FALSE); targets::tar_make() runs produce bit-identical 46-row rollup tibbles (digest 210c3f8254c47ac88573a80d96a2701e).

Not in this PR

Two information_schema.tables probes in .lnk_pipeline_prep_gradient() and .lnk_pipeline_prep_overrides() (for barriers_definite_control and user_habitat_classification respectively) work correctly today but are indirect — they read DB state rather than the manifest. Migrating them to manifest-driven gating mirrors the pattern this PR introduces. Filed as #46.

One unrelated commit on the branch

22ac1dd ("comms(→link): M1 verified as a ready R-worker host") landed on this branch via a parallel session's branch-landing policy. It adds a single markdown under comms/rtj/ documenting infra readiness and a crew.cluster 0.4.0 API gap for a future distributed-run PR. Completely independent of #44's scope; reviewer should treat it as orthogonal (can be cherry-picked out if desired).

Test plan

  • devtools::test() — 279 PASS, 0 FAIL (pre-existing sf/DBI mock-binding warning in test-lnk_pipeline_break.R is unrelated).
  • tar_make() across 5 WSGs, two consecutive full rebuilds — bit-identical rollups via digest::digest().
  • Direct verification of working_dead.barrier_overrides at the control position — only BT present, CH/CM/CO/PK/SK/ST blocked by the per-species filter.
  • DEAD rollup: all 6 species within 3% of bcfishpass reference.
  • ADMS/BULK/BABL/ELKR rollup numerically identical to pre-fix baseline (filter correctly inactive on these WSGs — TRUE control rows rescued by threshold or habitat).
  • /code-check on Phase 2a and Phase 2b diffs — both clean after 2 rounds each.

Fixes #44
Relates to NewGraphEnvironment/sred-2025-2026#24

🤖 Generated with Claude Code

Vignette additions:

- New "Prerequisites" section names fwapg as the source of the stream-network tables (ltree-typed watershed codes) and the traversal SQL functions (fwa_upstream, fwa_downstream, fwa_watershedatmeasure) the pipeline reads. bcfishobs marked as optional-but-recommended for observation overrides. Comparison tunnel called out as a validation convenience, not a runtime requirement.
- DAG first line changed from "FWA streams (raw)" to "FWA stream network (via fwapg, ltree-enriched)" to match the prerequisites note.
- Observations bullet now explicitly says overrides apply to gradient barriers, falls, AND user-definite barriers — plus an inline link to fwa_upstream since that's the SQL function driving the count.
- New "User-identified definite barriers" bullet in the break-positions list, with inline links to user_barriers_definite.csv in bcfishpass and the mirror in link. Treated the same as falls: always-blocking, always a break position, eligible for per-species override via lnk_barrier_overrides.
- planning/archive/2026-04-23-targets-pipeline/ — three-PR arc closed (PRs #41/#42/#43 shipping 0.3.0 → 0.5.0). Bit-identical rollups across three consecutive tar_make runs.
- planning/active/ — new PWF for #44. Approach per /Users/airvine/.claude/plans/stateful-hopping-feather.md: fix latent ctrl_filter bug in lnk_barrier_overrides (current filter treats any control row as blocking; docstring says only barrier_ind = TRUE blocks) + wire control through from .lnk_pipeline_prep_overrides via manifest-driven gating (cfg$overrides$barriers_definite_control, not information_schema probe). Follow-up issue scoped to migrate remaining probes to manifest-driven gating.

Relates to #44
The `control` parameter's documented contract is "barriers in this table with `barrier_ind = TRUE` cannot be overridden." Previous implementation used `LEFT JOIN control c + WHERE c.blue_line_key IS NULL`, which treated ANY control row as blocking — including `barrier_ind = FALSE` rows (which mark "not actually a barrier" positions, not "do not override" positions).

Replaces the LEFT-JOIN/IS-NULL pattern with a `NOT EXISTS` subquery filtered to `barrier_ind::boolean = true`. Fix:

- Correctly blocks override only when at least one matching control row has `barrier_ind = TRUE`. Mixed TRUE/FALSE within the 1 m position tolerance resolves to "blocked" (TRUE wins).
- Removes a latent row-multiplication issue in the observation-count aggregation: the prior LEFT JOIN could multiply rows before `GROUP BY ... HAVING count(observation_key) >= threshold`, under-counting the threshold when multiple control rows matched one barrier. `NOT EXISTS` is a WHERE-clause subquery — no multiplication.

Applies identically to the observation and habitat override paths (same `ctrl_where` / `ctrl_filter` pair used in both INSERTs).

`.lnk_pipeline_prep_overrides` still calls `lnk_barrier_overrides()` with `control = NULL` — behavioural wiring lands in Phase 2 of this PR.

New `tests/testthat/test-lnk_barrier_overrides.R` — 11 mocked SQL tests covering:

- NOT EXISTS + `= true` clauses appear when `control` is non-NULL (observation path)
- Same when habitat path is taken (observation path disabled)
- Neither clause appears when `control = NULL`

Full suite: 269 PASS.

Relates to #44
`.lnk_pipeline_prep_overrides` now passes `<schema>.barriers_definite_control` to `lnk_barrier_overrides` as the `control` argument whenever `cfg$overrides$barriers_definite_control` is declared by the config manifest. The manifest key is the direct contract — same pattern used for the other override roles in the pipeline. Phase 1's NOT EXISTS filter is what honours it at the SQL layer.

Also fixes an asymmetric-gating bug caught in code review: `.lnk_pipeline_prep_load_aux` previously only wrote `<schema>.barriers_definite_control` when both the manifest declared the key AND the current AOI had matching rows. With Phase 2's manifest-driven gate, that meant AOIs where the manifest was declared but no rows matched would reference a table that was never created, and `lnk_barrier_overrides`'s NOT EXISTS subquery would raise "relation does not exist." Mirrored the `barriers_definite` pattern — whenever the manifest declares the key, write a schema-valid table (empty or populated). The NOT EXISTS against an empty table is always TRUE, so a zero-row AOI correctly blocks nothing.

Two new tests in `test-lnk_pipeline_prepare.R` mock `lnk_barrier_overrides` and assert the resolved `control` arg:

- manifest declares key → `control = "<schema>.barriers_definite_control"`
- manifest omits key → `control = NULL`

Full suite: 271 PASS.

End-to-end verification via `tar_make()` follows in the next commits (rollup regeneration, research doc + NEWS updates, vignette artifact regen).

Relates to #44
Phase 2 applied the NOT EXISTS control filter across every species in the
`params` loop. A post-Phase-2 `tar_make()` drifted 11–22pp *away* from
bcfishpass on ADMS and BABL (BULK and ELKR unchanged — their TRUE control
rows have no upstream observations).

Root cause: bcfishpass applies the control filter only in
`model_access_ch_cm_co_pk_sk.sql` and `model_access_st.sql`. The BT, WCT,
and CT/DV/RB access models don't reference `user_barriers_definite_control`
at all. Their observations are allowed to override anadromous-blocking
falls because residents routinely inhabit reaches upstream of such falls —
post-glacial headwater connectivity populated many upper basins before the
present channel dropped below its late-Pleistocene profile, and residents
don't require ocean return.

Species scope as a parameter, not a hard-coded list:

- New column `observation_control_apply` in
  `inst/extdata/configs/bcfishpass/parameters_fresh.csv`. Logical.
  TRUE for CH, CM, CO, PK, SK, ST; FALSE for BT, WCT; NA for CT, DV, RB
  (which have no `observation_threshold` either — the flag is
  inapplicable).

- `lnk_barrier_overrides()` reads `params$observation_control_apply[i]`
  inside the per-species loop. `isTRUE(as.logical(...))` normalises NA,
  missing-column, character, and unexpected inputs to FALSE — the
  resident-safe default. When FALSE, the NOT EXISTS clause is omitted
  from both the observation and habitat override paths.

Two concerns, two locations: `cfg$overrides$barriers_definite_control`
remains the table-level contract (is the control CSV declared for this
config?); the new column is the application-level contract (does this
species honour it?). Rules YAML stays focused on habitat classification.

Tests (test-lnk_barrier_overrides.R, +3 cases):

- `observation_control_apply = FALSE` → no NOT EXISTS / no
  `c.barrier_ind::boolean` clause in the rendered SQL.
- `observation_control_apply = NA` → same — resident-safe default.
- Mixed params (BT = FALSE, CH = TRUE) → per-species gating confirmed
  by inspecting the two emitted INSERT statements.

Full suite: 279 PASS.

Relates to #44
Phase 2a gated the control filter per-species but left a second defect:
`ctrl_filter` was applied to BOTH the observation-path INSERT and the
habitat-path INSERT in `lnk_barrier_overrides()`. bcfishpass's
`model_access_ch_cm_co_pk_sk.sql` has separate CTEs:

  obs_upstr — joins observations, LEFT JOINs control, filters
              `bc.barrier_ind IS NULL` (control-gated)
  hab_upstr — joins habitat only, no control join at all

The biology: expert-confirmed habitat is higher-trust than observations.
By the time a reviewer has confirmed upstream habitat for a species at
a given position, they have already considered the barrier's passability
and the control-table designation. Observations are noisier and may be
misattributed, so the control table vetoes them; habitat is a direct
assertion that a species uses the reach upstream, so it stands.

The old behaviour under-overrode bcfishpass: on ADMS and BABL, TRUE
control positions that bcfishpass overrode via habitat stayed as
barriers in link, cutting 11-22pp of accessible spawning/rearing km on
CH, CM, CO, PK, SK, ST. Removing the habitat-path gate brings all four
WSGs back into parity.

Changes:

- `lnk_barrier_overrides()`: the habitat-path INSERT no longer
  interpolates `ctrl_where` / `ctrl_filter`. The observation-path INSERT
  is unchanged (still gated by `observation_control_apply` per species).
- Flipped the existing test "control filter applies to habitat overrides
  too" to its corrected form: "habitat override path is NOT gated by
  control (bcfishpass parity)". Filters captured SQL to the habitat
  INSERT only and asserts absence of the NOT EXISTS clause.
- Roxygen on `@param control` documents that habitat confirmations bypass
  the control table entirely.

`devtools::test()`: 279 PASS.

`tar_make()` across ADMS, BULK, BABL, ELKR — all 34 rollup rows within 5%
of bcfishpass reference. Exact numeric match to pre-fix baseline (no
behaviour change vs pre-fix on these 4 WSGs, because habitat
classifications already covered the control-table positions). Filter is
now correctly applied semantically, not relying on coincidence with the
habitat path.

Relates to #44
…0 API gap

7/7 checks pass on M1, 1.1s SSH+Rscript round-trip, NG packages
load, .Renviron propagates. Includes a heads-up that
crew.cluster 0.4.0 only exports HPC-scheduler controllers —
no generic ssh one exists, so PR 3-of-3's launcher needs one of
custom crew_class_launcher, clustermq, or bespoke mirai+ssh.

Landing on 44-barriers-definite-control per soul 2026-04-23
branch-landing ruling (Policy A, commit on peer's current branch
with --only flag).
Phase 2b rollup was numerically identical to pre-fix on all four parity
WSGs (ADMS/BULK/BABL/ELKR) because none of their TRUE control rows
actually exercise the filter. Every row is rescued by either the
observation threshold or the habitat path. The filter semantics were
verified at the unit level but not end-to-end on real data.

Province-wide hunt for TRUE control rows where the filter actually fires
(obs >= threshold upstream AND zero habitat coverage) produced four
candidates: CAMB (11 obs), DEAD (6), LFRA (16 but too large), SALM (7).
Picked DEAD (Deadman River): smallest runtime, single TRUE control row at
FALLS (356361749, 45743), six CH/CM/CO/PK/SK obs upstream, zero habitat
classification for those species upstream. The tight "just above
threshold" condition means pre-fix link would have overridden the fall
(fall skipped, upstream accessible); post-fix link correctly blocks the
override on anadromous species.

bcfishpass reference: the fall at (356361749, 45743) IS present in
bcfishpass.barriers_ch_cm_co_pk_sk post-override. bcfishpass kept it as
a barrier via the control filter. link now matches.

Direct verification on working_dead.barrier_overrides:

- At the control position, exactly one species row is emitted: BT.
  observation_control_apply = FALSE for BT, so the NOT EXISTS clause
  was skipped and the observation count (>= threshold) produced the
  override. Matches bcfishpass BT model (no control join).
- CH, CM, CO, PK, SK, ST all blocked at that position by the filter.
  Matches bcfishpass ch_cm_co_pk_sk and st models (control join
  present).

DEAD rollup: all six species within 3% of bcfishpass reference.

Incremental tar_make: comparison_ADMS/BULK/BABL/ELKR cached from the
Phase 2b run, only comparison_DEAD + rollup rebuilt (42s).

Log files captured:

- 20260423_01_tar_make_post_44_phase2.txt (Phase 2, pre per-species gate)
- 20260423_02_tar_make_phase2a.txt (Phase 2a, species gate, still bad on CH/CO/SK/ST)
- 20260423_03_tar_make_phase2b.txt (Phase 2b, habitat path ungated, baseline rollup)
- 20260423_04_tar_make_repro.txt (Phase 2b reproducibility)
- 20260423_05_tar_make_dead.txt (incremental with DEAD added)

Relates to #44
Evidence trail for the barriers_definite_control investigation.
Referenced from planning/active/task_plan.md.

Relates to #44
Phase 4 of #44. Numerical artifacts regenerated from the 5-WSG
post-DEAD pipeline; narrative updates document the control-filter
wiring and the split between parity WSGs and the end-to-end test WSG.

- DESCRIPTION: Version 0.5.0 -> 0.6.0
- NEWS.md: 0.6.0 entry covering the control-filter wiring, per-species
  gating via observation_control_apply, habitat-path bypass, manifest-
  gated pipeline wiring, asymmetric-gating fix in prep_load_aux, and
  DEAD added as the end-to-end validation WSG.
- inst/extdata/configs/bcfishpass/README.md: 5 WSGs; note that DEAD
  is the control-filter end-to-end test, the other four are parity.
- inst/extdata/vignette-data/*.rds: regenerated from data-raw/
  vignette_reproducing_bcfishpass.R against the current tar_make
  rollup (46 rows, bit-identical across two full rebuilds; digest
  210c3f8254c47ac88573a80d96a2701e).
- research/bcfishpass_comparison.md: adds the DEAD parity table, a
  section explaining why DEAD is the filter test WSG, a row in the
  "Key fixes" table, and a subsection describing the three-part fix
  (observation-path NOT EXISTS, per-species gate, habitat-path bypass).
  DAG rollup node updated to 46 rows; tar_map includes DEAD.
- vignettes/reproducing-bcfishpass.Rmd: 5 WSGs; names DEAD's role;
  pivot tables include DEAD column via intersect() on names(w).

Reproducibility verified: two consecutive `targets::tar_destroy(ask =
FALSE); targets::tar_make()` runs on the same DB state produce
bit-identical rollups (same SHA256 via digest::digest()).

Relates to #44
@NewGraphEnvironment NewGraphEnvironment merged commit 8eda3fb into main Apr 23, 2026
1 check passed
@NewGraphEnvironment NewGraphEnvironment deleted the 44-barriers-definite-control branch April 23, 2026 20:21
NewGraphEnvironment added a commit that referenced this pull request Apr 23, 2026
DEAD was added (PR #47) as a targeted end-to-end test for the
barriers_definite_control filter, not a parity measurement. Mixing it
into the same pivot as ADMS/BULK/BABL/ELKR muddles the table's
purpose. Readers who want DEAD's diff_pct numbers can find them in
research/bcfishpass_comparison.md's dedicated DEAD subsection.

Restores the pivot's intersect() to the original four-WSG list;
adds a short paragraph below the tables pointing at the research
doc for DEAD detail.

Relates to #48
NewGraphEnvironment added a commit that referenced this pull request Apr 23, 2026
* PWF: archive #44, init #48 (user_barriers_definite bypass)

#47 merged as link 0.6.0 (squash 8eda3fb). #48 follows the same
family — user_barriers_definite rows currently eligible for override
in link; bcfishpass appends post-filter. ELKR active test case (4
defect rows).

Relates to #48

* Drop user-definite from natural_barriers (bcfishpass parity)

Previously .lnk_pipeline_prep_natural() unioned barriers_definite into
natural_barriers. That table is the set passed to lnk_barrier_overrides()
as the 'barriers' argument, which emits per-species override rows when
observations upstream clear the threshold. Net effect: user-identified
definite barriers (EXCLUSION zones, MISC reviewer-added positions) could
be re-opened by historical observations. Confirmed active on ELKR at
pre-fix: 4 rows in working_elkr.barrier_overrides matched
working_elkr.barriers_definite positions (Erickson Creek exclusion +
two Spillway MISC entries).

bcfishpass treats user-definite barriers differently. Each
model_access_*.sql builds its barriers CTE from gradient + falls +
subsurfaceflow (no user-definite); observations and habitat filters
run only on that set. user_barriers_definite rows are then appended
post-filter to the final per-model table via UNION ALL. They always
block, never eligible for observation override.

Matching that here — dropped the INSERT INTO natural_barriers FROM
barriers_definite block. barriers_definite is still consumed separately:

  - lnk_pipeline_break() applies it as a sequential break source, so
    segmentation still places a boundary at each user-definite position
  - lnk_pipeline_classify() UNION ALLs it directly into
    fresh.streams_breaks, so it blocks access gating

Test assertion flipped from 'unions gradient + falls + definite' to
'unions gradient + falls only (no definite)'. Roxygen on the exported
lnk_pipeline_prepare() and on the @nord helper one-liner updated to
reflect the new shape; both note bcfishpass parity.

Relates to #48

* Bump to 0.7.0: NEWS, DESCRIPTION, research doc, vignette, artifacts

Phase 3 of #48. Numerical artifacts regenerated from the post-fix
pipeline; narrative updates document the user-definite bypass.

- DESCRIPTION: Version 0.6.0 -> 0.7.0
- NEWS.md: 0.7.0 entry covering the natural_barriers drop, why
  bcfishpass's post-filter UNION ALL is the correct shape, ELKR shift
  numbers, and the scope-of-fix summary on each WSG.
- inst/extdata/vignette-data/*.rds: regenerated from
  data-raw/vignette_reproducing_bcfishpass.R against the current
  tar_make rollup (46 rows, bit-identical across two full rebuilds;
  digest 50908d234e2131fc0842dc3ab653ae78).
- research/bcfishpass_comparison.md: updates ELKR parity table with
  post-fix numbers and a line of pre-fix context. Adds a row to the
  "Key fixes" table and a new subsection "user_barriers_definite
  bypass (#48)" mirroring the format used for the #44 subsection.
- vignettes/reproducing-bcfishpass.Rmd: corrects the user-identified-
  definite-barriers bullet. Old text said "eligible for per-species
  override via lnk_barrier_overrides when enough upstream observations
  clear the threshold" (accurate to pre-#48 code, wrong vs bcfishpass).
  New text: "always-blocking, always a break position, never eligible
  for observation-based override" with the bcfishpass UNION ALL
  reference.
- data-raw/logs/: 20260423_07 and _08 run logs.
- planning: Phase 1 checkbox in task_plan.md.

Reproducibility verified: two consecutive tar_destroy + tar_make runs
produce bit-identical 46-row rollups.

Relates to #48

* Vignette: drop DEAD from parity pivot tables, note in prose

DEAD was added (PR #47) as a targeted end-to-end test for the
barriers_definite_control filter, not a parity measurement. Mixing it
into the same pivot as ADMS/BULK/BABL/ELKR muddles the table's
purpose. Readers who want DEAD's diff_pct numbers can find them in
research/bcfishpass_comparison.md's dedicated DEAD subsection.

Restores the pivot's intersect() to the original four-WSG list;
adds a short paragraph below the tables pointing at the research
doc for DEAD detail.

Relates to #48

* Revert "Vignette: drop DEAD from parity pivot tables, note in prose"

This reverts commit b8cd442.
NewGraphEnvironment added a commit that referenced this pull request Apr 23, 2026
Archive #46 PWF under planning/archive/2026-04-23-manifest-driven-probes/
with a README summarizing scope + PR.

README.md additions:
- Full-pipeline section — lnk_config() + six lnk_pipeline_* helpers
  showing the bcfishpass-reproduction path, with a pointer to the
  Reproducing bcfishpass vignette and to data-raw/_targets.R.

CLAUDE.md updates:
- Branch moved from adms-comparison to main; version noted at 0.7.0.
- Exported Functions bumped from 10 to 18; added lnk_config under
  Core; added a Pipeline helpers subsection documenting all six
  phase functions + lnk_pipeline_species.
- lnk_barrier_overrides updated to show the control arg + the
  observation_control_apply per-species gate + habitat bypass
  (today's PR #47 landed this).
- Open Issues list trimmed of items closed this week (#16, #38,
  #44, #46, #48). Added #40 and #45 (open). Recently closed
  section added to show the PR trail.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Wire barriers_definite_control into lnk_barrier_overrides

1 participant