
lnk_pipeline_* helpers: extract six pipeline phases (PR 1 of 3)#41

Merged
NewGraphEnvironment merged 10 commits into main from
38-targets-pipeline
Apr 23, 2026

@NewGraphEnvironment
Owner

Summary

PR 1 of 3 for issue #38. Extracts the 635-line data-raw/compare_bcfishpass.R orchestration into six composable pipeline phase helpers in R/, then rewrites the compare script as a 136-line thin orchestrator that calls them in order.

This PR is behavior-preserving — no new pipeline semantics. Subsequent PRs layer _targets.R (#38 PR 2) and retire the old script (#38 PR 3) on top of this foundation.

Six helpers

Canonical signature (conn, aoi, cfg, schema); setup is the only outlier (conn, schema, overwrite). Reads like a recipe:

conn |>
  lnk_pipeline_setup(schema, overwrite = TRUE) |>
  lnk_pipeline_load(aoi, cfg, schema) |>
  lnk_pipeline_prepare(aoi, cfg, schema) |>
  lnk_pipeline_break(aoi, cfg, schema) |>
  lnk_pipeline_classify(aoi, cfg, schema) |>
  lnk_pipeline_connect(aoi, cfg, schema)
| Helper | Role |
| --- | --- |
| lnk_pipeline_setup | create <schema>; ensure the fresh schema exists |
| lnk_pipeline_load | crossings + misc + modelled fixes (NONE/OBS → PASSABLE) + PSCIS overrides |
| lnk_pipeline_prepare | falls + definite + control + habitat confirms; gradient barriers + ltree; natural_barriers; barrier skip list; per-model frs_barriers_minimal; base segments into fresh.streams |
| lnk_pipeline_break | observations + habitat + crossings break positions; sequential frs_break_apply honoring cfg$pipeline$break_order |
| lnk_pipeline_classify | access-gating breaks table; fresh::frs_habitat_classify |
| lnk_pipeline_connect | .frs_run_connectivity: per-species cluster + connected_waterbody driven by cfg$parameters_fresh flags |

Abstraction decisions

  • Prefix lnk_pipeline_*, not lnk_habitat_* — only one of the six phases (classify) is strictly habitat work. The others are setup, data loading, network prep, segmenting, connectivity.
  • Phase names are verbs — setup → load → prepare → break → classify → connect. Reads like a recipe top to bottom.
  • aoi not wsg — matches fresh convention. Accepts WSG code today; extends to ltree filters / sf polygons / mapsheets as future work without a signature change.
  • Caller picks the schema name — no hidden working_<aoi> convention baked into the helpers. Pass whatever makes sense for your partitioning scheme.
  • Every helper returns conn invisibly — composes via |> and tar_target() alike.
  • Fresh internal .frs_run_connectivity accessed via getFromNamespace — flagged as fragile in docs. Fresh follow-up will expose a stable API.
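The "returns conn invisibly" convention is what lets the six phases chain. A minimal sketch of that shape, assuming a hypothetical `pipeline_phase_sketch()` and a plain list standing in for a DBI connection (neither is part of link):

```r
# Hypothetical phase helper following the canonical (conn, aoi, cfg, schema)
# signature: validate inputs, do the work, hand `conn` back invisibly.
pipeline_phase_sketch <- function(conn, aoi, cfg, schema) {
  stopifnot(
    is.character(aoi), length(aoi) == 1L,
    is.character(schema), length(schema) == 1L,
    is.list(cfg)
  )
  # ... phase-specific SQL would run against `conn` here ...
  invisible(conn)
}

# A plain list stands in for a database connection so the sketch runs anywhere.
conn <- list(id = "mock-connection")
cfg <- list()

# Because every phase returns `conn`, phases compose with the native pipe.
out <- conn |>
  pipeline_phase_sketch("ADMS", cfg, "working_adms") |>
  pipeline_phase_sketch("ADMS", cfg, "working_adms")
```

The same return value also works as the command of a target, since a downstream target can take the upstream result as its first argument.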

Other changes in this PR

  • lnk_config() gained cfg$species — parses the rules YAML at load and intersects with cfg$wsg_species presence for per-AOI classify targets. Fixes a species-mismatch bug where parameters_fresh (11 species incl. CT/DV/RB) diverged from rules YAML (8 species).
  • inst/extdata/configs/bcfishpass/config.yaml — added barriers_definite to break_order (was missing).
  • .lnk_quote_literal() added to R/utils.R — safe single-quote SQL literals.

Verification

ADMS end-to-end with the new helper composition (data-raw/logs/20260422_03_tar-wireup-verify_ADMS.txt):

species  habitat   ours    ref diff_pct
     BT spawning 368.13 361.71      1.8
     BT  rearing 666.96 674.19     -1.1
     CH spawning 278.92 277.61      0.5
     CH  rearing 315.42 308.23      2.3
     CO spawning 316.08 310.98      1.6
     CO  rearing 351.01 351.19     -0.1
     SK spawning  88.83  85.70      3.7
     SK  rearing 229.85 229.85      0.0
All within 5%: TRUE

Spawning values identical to the research doc. Rearing within ~1% of prior values — ordering variance from id_segment ties on coincident (blk, drm) rows. Accepted and documented in planning/active/findings.md.
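For readers checking the table by hand: the `diff_pct` column is consistent with a relative difference against the reference, rounded to one decimal. The formula below is an assumption (the compare script itself is not shown here), but it reproduces every row above:

```r
# Values copied from the ADMS verification table above.
rollup <- data.frame(
  species = rep(c("BT", "CH", "CO", "SK"), each = 2),
  habitat = rep(c("spawning", "rearing"), times = 4),
  ours = c(368.13, 666.96, 278.92, 315.42, 316.08, 351.01, 88.83, 229.85),
  ref  = c(361.71, 674.19, 277.61, 308.23, 310.98, 351.19, 85.70, 229.85)
)

# Assumed formula: percent difference relative to the bcfishpass reference.
rollup$diff_pct <- round(100 * (rollup$ours - rollup$ref) / rollup$ref, 1)

# The acceptance check: every row within 5% of the reference.
all_within_5 <- all(abs(rollup$diff_pct) <= 5)
```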

Test plan

  • 110+ new tests across the six helpers — input validation, SQL shape, branching, species derivation, break_order honored, unknown-source error
  • Full link suite passing: 251 tests
  • lintr clean on new R files
  • ADMS end-to-end behavioral verification — within 5% of bcfishpass reference
  • /code-check on the full branch diff — clean

Constraints documented for PR 2

fresh.streams is a single shared schema, so parallel AOI runs on one host collide. For PR 2's _targets.R, the initial design will use crew_controller_local(workers = 1) to serialize. Per-AOI fresh output paths would need a fresh upstream change — filed as a follow-up.

Drift lesson

A 0.4 pp shift in BT rearing diff (-0.7 → -1.1) looked like a refactor bug at first. Running the legacy 635-line script on the same DB reproduced my helpers' numbers exactly — drift was entirely from env state changes between the research doc's run (2026-04-15) and today (fwapg / bcfishobs / tunnel ref). Fix: compare script now stamps link/fresh versions + git SHA + bcfishobs + bcfishpass reference row counts at the top of every run. Baseline log committed under data-raw/logs/20260422_04_verify_stamped_ADMS.txt.

Full config-CSV provenance + runtime lineage tracking filed as link#40 — will expand lnk_stamp() (#24) into the canonical source of that lineage.

Fixes part of #38 (PR 1 of 3)

Relates to NewGraphEnvironment/sred-2025-2026#24

- planning/archive/2026-04-22-lnk-config/ — shipped in link 0.2.0 via PR #39
- planning/active/ — new PWF for _targets.R pipeline (#38): targets + crew orchestration, single-host first, distributed swap deferred until rtj Phase 4

Relates to #38
First of six pipeline phase helpers. Creates the namespaced working schema for a run (e.g. `working_bulk` so parallel WSG runs on the same host do not collide) and ensures `fresh` exists for downstream output tables.

Mocked tests cover identifier validation and SQL shape — CREATE SCHEMA / DROP SCHEMA behavior is Postgres's responsibility, not ours to re-verify.

Relates to #38
Naming rethink before building the remaining five helpers — better to lock the API once than rename six functions later.

- Prefix: `lnk_pipeline_*` replaces `lnk_habitat_*`. Only one of the six phases (classify) is strictly habitat work; the others are setup, loading, network prep, segmenting, connectivity. `pipeline` describes what they all are — building blocks composed by `_targets.R` or manual scripts.
- Phase names read as verbs: setup → load → prepare → break → classify → connect. Reads like a recipe.
- Param: `aoi` replaces `wsg` in the canonical signature `(conn, aoi, cfg, schema)`. `wsg` hardcoded one partition scheme (bcfishpass); `aoi` matches fresh convention (accepts a WSG code today, extends to ltree filters / sf polygons / mapsheets later).
- `setup` stays a signature outlier: `(conn, schema, overwrite)`. No aoi/cfg needed — it just makes schemas.

Renames: `lnk_habitat_setup_schema()` → `lnk_pipeline_setup()`. Updates tests, Rd, NAMESPACE, PWF, and issue #38 body to match.

Relates to #38
Second of six pipeline phase helpers. Loads the crossings CSV from fresh, filters to AOI, appends misc crossings, applies modelled fixes (NONE/OBS → PASSABLE) and PSCIS barrier status overrides.

Scope is tighter than the original "load_inputs" plan: anything other than anthropogenic crossings (falls, user-identified definite barriers, observation exclusions, habitat classification) moves to `prepare` where it is actually consumed. That keeps each phase's concern clean — load handles crossings; prepare handles network + barriers.

Split into three internal `@noRd` helpers (`.lnk_pipeline_load_crossings`, `.lnk_pipeline_apply_fixes`, `.lnk_pipeline_apply_pscis`) so each unit is short and independently testable via `local_mocked_bindings()`.

12 tests covering input validation, update SQL shape, and no-op branches for empty/missing override entries. Full link suite at 169 passing.

Relates to #38
Third of six pipeline phase helpers. Thin orchestrator over six `@noRd` sub-helpers:

- `prep_load_aux` — falls (from fresh), user definite barriers, barriers-definite control, expert habitat confirmations from the config bundle
- `prep_gradient` — detect gradient barriers on raw FWA via `fresh::frs_break_find()`, prune rows where the control table says `barrier_ind = false`, enrich with `wscode_ltree` / `localcode_ltree` for `fwa_upstream()` joins
- `prep_natural` — build natural_barriers = gradient ∪ falls ∪ definite
- `prep_overrides` — compute barrier skip list via `lnk_barrier_overrides()` against observations + habitat confirms
- `prep_minimal` — per-model (bt, salmon, st, wct) barrier tables, each reduced via `fresh::frs_barriers_minimal()` from fresh 0.14.0, unioned into `gradient_barriers_minimal`
- `prep_network` — load fresh.streams from FWA with channel_width + stream_order_parent joins + GENERATED gradient/measures/length + a unique `id_segment`

Adds `.lnk_quote_literal()` to utils.R — doubles single-quotes for safe SQL literal interpolation (used by the AOI and schema checks).
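A sketch of the quote-doubling technique described above. `quote_literal_sketch()` is a hypothetical stand-in, not the actual `.lnk_quote_literal()` body, and the table and column in the example query are illustrative only:

```r
# Wrap a string in single quotes and double any embedded single quotes,
# which is the standard SQL escape for string literals.
quote_literal_sketch <- function(x) {
  stopifnot(is.character(x), !anyNA(x))
  paste0("'", gsub("'", "''", x, fixed = TRUE), "'")
}

plain <- quote_literal_sketch("ADMS")
tricky <- quote_literal_sketch("O'Brien Creek")

# Interpolating the quoted value keeps the literal intact inside the statement.
sql <- sprintf(
  "SELECT count(*) FROM streams WHERE watershed_group_code = %s",
  plain
)
```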

Code-check surfaced one design constraint worth flagging now: `fresh.streams` is a shared schema, so parallel AOI runs on one host would collide. Documented in findings.md with mitigation options for PR 2 (leaning toward `workers = 1` + revisit fresh patches later).

31 new tests — input validation, SQL shape assertions, quote_literal semantics, per-model minimal reduction structure. Full link suite at 200 passing.

Relates to #38
Fourth of six pipeline phase helpers. Splits into four internal `@noRd` sub-helpers:

- `break_obs` — build `observations_breaks` from the bcfishobs observations table, filtered by AOI and by the species set from `cfg$wsg_species` (with CT expanded to CT, CCT, ACT, and CT/RB to match bcfishobs coding). Data-error and release-exclusion rows from `cfg$observation_exclusions` are filtered out via a temp `<schema>.obs_exclusions` subselect.
- `break_habitat_endpoints` — union DRM and URM from `user_habitat_classification`. Creates an empty table when the habitat table wasn't loaded (no config confirms) so the break step is a clean no-op.
- `break_crossings` — crossing positions for segmentation.
- `break_reassign_id` — reassigns unique `id_segment` after each round via `row_number()` so downstream rounds see contiguous IDs.

Main function runs `frs_break_apply` sequentially over `cfg$pipeline$break_order` (default: observations → gradient_minimal → barriers_definite → habitat_endpoints → crossings). Unknown source names error clearly.
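The config-driven ordering and the unknown-source error can be sketched as a lookup over named handlers. Everything here is hypothetical scaffolding (`run_breaks_sketch()`, the no-op handlers); the real function calls `frs_break_apply` per source:

```r
# Apply break sources strictly in the order given by the config; fail fast
# on any name that has no registered handler.
run_breaks_sketch <- function(break_order, handlers) {
  applied <- character(0)
  for (src in break_order) {
    if (!src %in% names(handlers)) {
      stop(
        "Unknown break source: '", src, "'. Known sources: ",
        paste(names(handlers), collapse = ", "),
        call. = FALSE
      )
    }
    handlers[[src]]()
    applied <- c(applied, src)
  }
  applied
}

# Default order from the commit message; each handler is a no-op placeholder.
sources <- c("observations", "gradient_minimal", "barriers_definite",
             "habitat_endpoints", "crossings")
handlers <- setNames(
  lapply(sources, function(s) function() invisible(NULL)),
  sources
)
cfg <- list(pipeline = list(break_order = sources))

applied <- run_breaks_sketch(cfg$pipeline$break_order, handlers)

# A misspelled source surfaces as a clear error instead of being skipped.
err <- tryCatch(
  run_breaks_sketch(c("observations", "typo_source"), handlers),
  error = function(e) conditionMessage(e)
)
```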

13 new tests — input validation, obs species expansion, exclusions filter, habitat empty/non-empty paths, break_order honored from config, unknown source errors. Full link suite at 229 passing.

Relates to #38
Fifth and sixth of six pipeline phase helpers. All six are now in place.

`lnk_pipeline_classify` — builds the access-gating `fresh.streams_breaks` table (gradient FULL set + falls + definite barriers + crossings, each WSG-filtered) then calls `fresh::frs_habitat_classify()` with the rules YAML, per-species parameters, and barrier overrides from the config bundle. Species default derives from `cfg$parameters_fresh$species_code` intersected with per-AOI presence in `cfg$wsg_species`.

`lnk_pipeline_connect` — runs fresh's `.frs_run_connectivity` (per-species `frs_cluster` + `.frs_connected_waterbody`) driven by `cfg$parameters_fresh` flags. Accesses fresh's internal orchestrator via `getFromNamespace` — fragility flagged in docs, fresh follow-up will export a stable API.

22 new tests covering input validation, species intersection, access-gating breaks SQL shape, no-species error path. Full link suite at 251 passing.

Relates to #38
Rewrites the 635-line comparison script as a thin orchestrator (136 lines) that:
- Loads the bcfishpass config bundle via lnk_config()
- Calls lnk_pipeline_setup → load → prepare → break → classify → connect in order
- Diffs the resulting fresh.streams_habitat against bcfishpass.habitat_linear_* on the read-only tunnel reference

ADMS end-to-end run (~67 s): all species within 5% of bcfishpass. Spawning values identical to the research doc. Rearing values within ~1% of the prior values — acceptable ordering variance from id_segment tie-breaking on segments with coincident (blk, drm). Verification log committed under `data-raw/logs/`.

Also along the way:

- **cfg$species** added to `lnk_config()` — parses the rules YAML at load time and exposes the classified-species list. `lnk_pipeline_classify_species()` now intersects against that 8-element list rather than `parameters_fresh$species_code` (11 rows including CT/DV/RB which bcfishpass does not model). Previous derivation tried to query `bcfishpass.habitat_linear_ct` and crashed.
- **barriers_definite** added to `config.yaml` pipeline `break_order` — was missing. No numeric change on ADMS (no definite barriers there) but matches legacy script behavior on other WSGs.
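The species fix comes down to which list is intersected with per-AOI presence. In the sketch below the 8-species list is inferred from the counts above and the species named in later commits; the AOI presence vector is hypothetical:

```r
# Species the rules YAML classifies (inferred: the 11 parameter rows
# minus CT, DV, RB).
rules_species <- c("BT", "CH", "CM", "CO", "PK", "SK", "ST", "WCT")

# parameters_fresh carries three extra rows that bcfishpass does not model.
params_species <- c(rules_species, "CT", "DV", "RB")

# Hypothetical presence list for one AOI.
aoi_present <- c("BT", "CH", "CO", "SK", "RB")

# Previous derivation: RB leaks through and has no reference table to diff.
species_old <- intersect(params_species, aoi_present)

# Current derivation: only species that are both classified and present.
species_new <- intersect(rules_species, aoi_present)
```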

Relates to #38
Adds a header to every compare run capturing:
- link version + git SHA
- fresh version
- wall-clock timestamp
- bcfishobs observation count for the AOI
- bcfishpass reference streams row count for the AOI

Lesson from today's session: 0.4 pp drift in BT rearing looked like a refactor regression. Legacy script on same DB produced identical numbers — drift was from env state changes (fwapg/bcfishobs/tunnel ref) between the research doc's run (2026-04-15) and today. Without a stamp, "what changed between these two runs" is unanswerable.

This is a minimal runtime stamp. Full lineage tracking (CSV provenance + drift detection) is filed as #40 and will expand `lnk_stamp()` (#24) into the canonical source.

Also commits a stamped verification log under `data-raw/logs/20260422_04_verify_stamped_ADMS.txt` — becomes the reference baseline for future drift checks.
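A rough sketch of what such a header could look like. `stamp_lines_sketch()` and its arguments are hypothetical, and the values in the example call are placeholders rather than numbers from a real run; the actual script collects them from the packages, git, and the database:

```r
# Build the header lines from values the caller has already collected.
stamp_lines_sketch <- function(aoi, link_version, git_sha, fresh_version,
                               n_obs, n_ref, time = Sys.time()) {
  c(
    sprintf("link: %s (git %s)", link_version, git_sha),
    sprintf("fresh: %s", fresh_version),
    sprintf("run at: %s", format(time, "%Y-%m-%d %H:%M:%S", tz = "UTC")),
    sprintf("bcfishobs observations [%s]: %d", aoi, as.integer(n_obs)),
    sprintf("bcfishpass reference streams [%s]: %d", aoi, as.integer(n_ref))
  )
}

# Placeholder inputs for illustration only.
header <- stamp_lines_sketch(
  aoi = "ADMS",
  link_version = "0.0.0",
  git_sha = "abc1234",
  fresh_version = "0.0.0",
  n_obs = 0,
  n_ref = 0,
  time = as.POSIXct("2026-04-22 12:00:00", tz = "UTC")
)
writeLines(header)
```

Two logs with different headers point at environment drift before anyone starts bisecting the refactor.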

Relates to #38, #24, #40
@NewGraphEnvironment NewGraphEnvironment merged commit 6cece3b into main Apr 23, 2026
1 check passed
@NewGraphEnvironment NewGraphEnvironment deleted the 38-targets-pipeline branch April 23, 2026 00:37
NewGraphEnvironment added a commit that referenced this pull request Apr 23, 2026
…) (#47)

* Vignette: fwapg prerequisites, user-definite barriers break bullet

Vignette additions:

- New "Prerequisites" section names fwapg as the source of the stream-network tables (ltree-typed watershed codes) and the traversal SQL functions (fwa_upstream, fwa_downstream, fwa_watershedatmeasure) the pipeline reads. bcfishobs marked as optional-but-recommended for observation overrides. Comparison tunnel called out as a validation convenience, not a runtime requirement.
- DAG first line changed from "FWA streams (raw)" to "FWA stream network (via fwapg, ltree-enriched)" to match the prerequisites note.
- Observations bullet now explicitly says overrides apply to gradient barriers, falls, AND user-definite barriers — plus an inline link to fwa_upstream since that's the SQL function driving the count.
- New "User-identified definite barriers" bullet in the break-positions list, with inline links to user_barriers_definite.csv in bcfishpass and the mirror in link. Treated the same as falls: always-blocking, always a break position, eligible for per-species override via lnk_barrier_overrides.

* Archive #38 PWF, init #44 PWF for barriers_definite_control wiring

- planning/archive/2026-04-23-targets-pipeline/ — three-PR arc closed (PRs #41/#42/#43 shipping 0.3.0 → 0.5.0). Bit-identical rollups across three consecutive tar_make runs.
- planning/active/ — new PWF for #44. Approach per /Users/airvine/.claude/plans/stateful-hopping-feather.md: fix latent ctrl_filter bug in lnk_barrier_overrides (current filter treats any control row as blocking; docstring says only barrier_ind = TRUE blocks) + wire control through from .lnk_pipeline_prep_overrides via manifest-driven gating (cfg$overrides$barriers_definite_control, not information_schema probe). Follow-up issue scoped to migrate remaining probes to manifest-driven gating.

Relates to #44

* Fix lnk_barrier_overrides control filter semantics

The `control` parameter's documented contract is "barriers in this table with `barrier_ind = TRUE` cannot be overridden." Previous implementation used `LEFT JOIN control c + WHERE c.blue_line_key IS NULL`, which treated ANY control row as blocking — including `barrier_ind = FALSE` rows (which mark "not actually a barrier" positions, not "do not override" positions).

Replaces the LEFT-JOIN/IS-NULL pattern with a `NOT EXISTS` subquery filtered to `barrier_ind::boolean = true`. Fix:

- Correctly blocks override only when at least one matching control row has `barrier_ind = TRUE`. Mixed TRUE/FALSE within the 1 m position tolerance resolves to "blocked" (TRUE wins).
- Removes a latent row-multiplication issue in the observation-count aggregation: the prior LEFT JOIN could multiply rows before `GROUP BY ... HAVING count(observation_key) >= threshold`, under-counting the threshold when multiple control rows matched one barrier. `NOT EXISTS` is a WHERE-clause subquery — no multiplication.

Applies identically to the observation and habitat override paths (same `ctrl_where` / `ctrl_filter` pair used in both INSERTs).
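The semantic difference can be shown on toy data without a database. This sketch uses made-up rows and matches on `blue_line_key` only, ignoring the 1 m position tolerance the real SQL applies:

```r
# Three barriers; control has a FALSE-only entry for key 1 and mixed
# TRUE/FALSE entries for key 2.
barriers <- data.frame(blue_line_key = c(1L, 2L, 3L))
control <- data.frame(
  blue_line_key = c(1L, 2L, 2L),
  barrier_ind   = c(FALSE, TRUE, FALSE)
)

# Old semantics (LEFT JOIN ... IS NULL): any control row blocks the override.
blocked_old <- barriers$blue_line_key %in% control$blue_line_key

# New semantics (NOT EXISTS ... barrier_ind = true): only TRUE rows block.
true_keys <- control$blue_line_key[control$barrier_ind]
blocked_new <- barriers$blue_line_key %in% true_keys
```

Key 1 is the case the fix targets: it was blocked before and is overridable now. Key 2 stays blocked because one TRUE row is enough.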

`.lnk_pipeline_prep_overrides` still calls `lnk_barrier_overrides()` with `control = NULL` — behavioural wiring lands in Phase 2 of this PR.

New `tests/testthat/test-lnk_barrier_overrides.R` — 11 mocked SQL tests covering:

- NOT EXISTS + `= true` clauses appear when `control` is non-NULL (observation path)
- Same when habitat path is taken (observation path disabled)
- Neither clause appears when `control = NULL`

Full suite: 269 PASS.

Relates to #44

* Wire barriers_definite_control through prep_overrides (manifest-gated)

`.lnk_pipeline_prep_overrides` now passes `<schema>.barriers_definite_control` to `lnk_barrier_overrides` as the `control` argument whenever `cfg$overrides$barriers_definite_control` is declared by the config manifest. The manifest key is the direct contract — same pattern used for the other override roles in the pipeline. Phase 1's NOT EXISTS filter is what honours it at the SQL layer.

Also fixes an asymmetric-gating bug caught in code review: `.lnk_pipeline_prep_load_aux` previously only wrote `<schema>.barriers_definite_control` when both the manifest declared the key AND the current AOI had matching rows. With Phase 2's manifest-driven gate, that meant AOIs where the manifest was declared but no rows matched would reference a table that was never created, and `lnk_barrier_overrides`'s NOT EXISTS subquery would raise "relation does not exist." Mirrored the `barriers_definite` pattern — whenever the manifest declares the key, write a schema-valid table (empty or populated). The NOT EXISTS against an empty table is always TRUE, so a zero-row AOI correctly blocks nothing.

Two new tests in `test-lnk_pipeline_prepare.R` mock `lnk_barrier_overrides` and assert the resolved `control` arg:

- manifest declares key → `control = "<schema>.barriers_definite_control"`
- manifest omits key → `control = NULL`
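The gate those two tests assert can be sketched in a few lines. `resolve_control_sketch()` is a hypothetical name and the CSV file name is a placeholder; only the `cfg$overrides$barriers_definite_control` key and the resulting table name come from the commit message:

```r
# Manifest-driven gate: the control table is passed only when the config
# declares the key. No probing of the database catalogue is involved.
resolve_control_sketch <- function(cfg, schema) {
  if (is.null(cfg$overrides$barriers_definite_control)) {
    return(NULL)
  }
  paste0(schema, ".barriers_definite_control")
}

cfg_declared <- list(overrides = list(barriers_definite_control = "control.csv"))
cfg_omitted  <- list(overrides = list())

control_declared <- resolve_control_sketch(cfg_declared, "working_adms")
control_omitted  <- resolve_control_sketch(cfg_omitted, "working_adms")
```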

Full suite: 271 PASS.

End-to-end verification via `tar_make()` follows in the next commits (rollup regeneration, research doc + NEWS updates, vignette artifact regen).

Relates to #44

* Gate barriers_definite_control per-species via observation_control_apply

Phase 2 applied the NOT EXISTS control filter across every species in the
`params` loop. A post-Phase-2 `tar_make()` drifted 11–22pp *away* from
bcfishpass on ADMS and BABL (BULK and ELKR unchanged — their TRUE control
rows have no upstream observations).

Root cause: bcfishpass applies the control filter only in
`model_access_ch_cm_co_pk_sk.sql` and `model_access_st.sql`. The BT, WCT,
and CT/DV/RB access models don't reference `user_barriers_definite_control`
at all. Their observations are allowed to override anadromous-blocking
falls because residents routinely inhabit reaches upstream of such falls —
post-glacial headwater connectivity populated many upper basins before the
present channel dropped below its late-Pleistocene profile, and residents
don't require ocean return.

Species scope as a parameter, not a hard-coded list:

- New column `observation_control_apply` in
  `inst/extdata/configs/bcfishpass/parameters_fresh.csv`. Logical.
  TRUE for CH, CM, CO, PK, SK, ST; FALSE for BT, WCT; NA for CT, DV, RB
  (which have no `observation_threshold` either — the flag is
  inapplicable).

- `lnk_barrier_overrides()` reads `params$observation_control_apply[i]`
  inside the per-species loop. `isTRUE(as.logical(...))` normalises NA,
  missing-column, character, and unexpected inputs to FALSE — the
  resident-safe default. When FALSE, the NOT EXISTS clause is omitted
  from both the observation and habitat override paths.
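The `isTRUE(as.logical(...))` idiom is base R, so its normalisation can be checked directly. The wrapper name and the input list are illustrative:

```r
# Only a single, non-NA TRUE survives; everything else collapses to FALSE.
control_applies_sketch <- function(flag) isTRUE(as.logical(flag))

inputs <- list(
  logical_true  = TRUE,
  logical_false = FALSE,
  logical_na    = NA,
  text_true     = "TRUE",
  text_junk     = "maybe",
  empty         = NULL
)
results <- vapply(inputs, control_applies_sketch, logical(1))

# A params table without the column behaves like the NULL case:
# `$` on a missing column yields NULL, and NULL[1] is still NULL.
params <- data.frame(species_code = c("BT", "CH"))
missing_column <- control_applies_sketch(params$observation_control_apply[1])
```

Note that text such as "TRUE" parses to TRUE, so a CSV read as character still gates correctly.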

Two concerns, two locations: `cfg$overrides$barriers_definite_control`
remains the table-level contract (is the control CSV declared for this
config?); the new column is the application-level contract (does this
species honour it?). Rules YAML stays focused on habitat classification.

Tests (test-lnk_barrier_overrides.R, +3 cases):

- `observation_control_apply = FALSE` → no NOT EXISTS / no
  `c.barrier_ind::boolean` clause in the rendered SQL.
- `observation_control_apply = NA` → same — resident-safe default.
- Mixed params (BT = FALSE, CH = TRUE) → per-species gating confirmed
  by inspecting the two emitted INSERT statements.

Full suite: 279 PASS.

Relates to #44

* Ungate habitat override path from control (bcfishpass parity)

Phase 2a gated the control filter per-species but left a second defect:
`ctrl_filter` was applied to BOTH the observation-path INSERT and the
habitat-path INSERT in `lnk_barrier_overrides()`. bcfishpass's
`model_access_ch_cm_co_pk_sk.sql` has separate CTEs:

  obs_upstr — joins observations, LEFT JOINs control, filters
              `bc.barrier_ind IS NULL` (control-gated)
  hab_upstr — joins habitat only, no control join at all

The biology: expert-confirmed habitat is higher-trust than observations.
By the time a reviewer has confirmed upstream habitat for a species at
a given position, they have already considered the barrier's passability
and the control-table designation. Observations are noisier and may be
misattributed, so the control table vetoes them; habitat is a direct
assertion that a species uses the reach upstream, so it stands.

The old behaviour under-overrode bcfishpass: on ADMS and BABL, TRUE
control positions that bcfishpass overrode via habitat stayed as
barriers in link, cutting 11-22pp of accessible spawning/rearing km on
CH, CM, CO, PK, SK, ST. Removing the habitat-path gate brings all four
WSGs back into parity.

Changes:

- `lnk_barrier_overrides()`: the habitat-path INSERT no longer
  interpolates `ctrl_where` / `ctrl_filter`. The observation-path INSERT
  is unchanged (still gated by `observation_control_apply` per species).
- Flipped the existing test "control filter applies to habitat overrides
  too" to its corrected form: "habitat override path is NOT gated by
  control (bcfishpass parity)". Filters captured SQL to the habitat
  INSERT only and asserts absence of the NOT EXISTS clause.
- Roxygen on `@param control` documents that habitat confirmations bypass
  the control table entirely.
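Taken together, the per-species gate and the ungated habitat path reduce to a small decision rule. This is a sketch of the logic as described in these commits, not the SQL that `lnk_barrier_overrides()` emits; the function and argument names are hypothetical:

```r
# A barrier is overridden when either path succeeds. Only the observation
# path can be vetoed by a TRUE control row, and only for species whose
# observation_control_apply flag is TRUE.
override_allowed_sketch <- function(obs_meets_threshold, habitat_confirmed,
                                    control_true, control_apply) {
  gated <- isTRUE(as.logical(control_apply)) && isTRUE(control_true)
  obs_path <- isTRUE(obs_meets_threshold) && !gated
  hab_path <- isTRUE(habitat_confirmed)
  obs_path || hab_path
}

# Anadromous species at a TRUE control row, observations only: vetoed.
anadromous_obs <- override_allowed_sketch(TRUE, FALSE, TRUE, TRUE)

# Resident species (flag FALSE) at the same position: observations stand.
resident_obs <- override_allowed_sketch(TRUE, FALSE, TRUE, FALSE)

# Expert-confirmed habitat bypasses the control table for any species.
anadromous_hab <- override_allowed_sketch(FALSE, TRUE, TRUE, TRUE)
```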

`devtools::test()`: 279 PASS.

`tar_make()` across ADMS, BULK, BABL, ELKR — all 34 rollup rows within 5%
of bcfishpass reference. Exact numeric match to pre-fix baseline (no
behaviour change vs pre-fix on these 4 WSGs, because habitat
classifications already covered the control-table positions). Filter is
now correctly applied semantically, not relying on coincidence with the
habitat path.

Relates to #44

* comms(→link): M1 verified as a ready R-worker host; crew.cluster 0.4.0 API gap

7/7 checks pass on M1, 1.1s SSH+Rscript round-trip, NG packages
load, .Renviron propagates. Includes a heads-up that
crew.cluster 0.4.0 only exports HPC-scheduler controllers —
no generic ssh one exists, so PR 3-of-3's launcher needs one of
custom crew_class_launcher, clustermq, or bespoke mirai+ssh.

Landing on 44-barriers-definite-control per soul 2026-04-23
branch-landing ruling (Policy A, commit on peer's current branch
with --only flag).

* Add DEAD as the end-to-end test WSG for barriers_definite_control

Phase 2b rollup was numerically identical to pre-fix on all four parity
WSGs (ADMS/BULK/BABL/ELKR) because none of their TRUE control rows
actually exercise the filter. Every row is rescued by either the
observation threshold or the habitat path. The filter semantics were
verified at the unit level but not end-to-end on real data.

Province-wide hunt for TRUE control rows where the filter actually fires
(obs >= threshold upstream AND zero habitat coverage) produced four
candidates: CAMB (11 obs), DEAD (6), LFRA (16 but too large), SALM (7).
Picked DEAD (Deadman River): smallest runtime, single TRUE control row at
FALLS (356361749, 45743), six CH/CM/CO/PK/SK obs upstream, zero habitat
classification for those species upstream. The tight "just above
threshold" condition means pre-fix link would have overridden the fall
(fall skipped, upstream accessible); post-fix link correctly blocks the
override on anadromous species.

bcfishpass reference: the fall at (356361749, 45743) IS present in
bcfishpass.barriers_ch_cm_co_pk_sk post-override. bcfishpass kept it as
a barrier via the control filter. link now matches.

Direct verification on working_dead.barrier_overrides:

- At the control position, exactly one species row is emitted: BT.
  observation_control_apply = FALSE for BT, so the NOT EXISTS clause
  was skipped and the observation count (>= threshold) produced the
  override. Matches bcfishpass BT model (no control join).
- CH, CM, CO, PK, SK, ST all blocked at that position by the filter.
  Matches bcfishpass ch_cm_co_pk_sk and st models (control join
  present).

DEAD rollup: all six species within 3% of bcfishpass reference.

Incremental tar_make: comparison_ADMS/BULK/BABL/ELKR cached from the
Phase 2b run, only comparison_DEAD + rollup rebuilt (42s).

Log files captured:

- 20260423_01_tar_make_post_44_phase2.txt (Phase 2, pre per-species gate)
- 20260423_02_tar_make_phase2a.txt (Phase 2a, species gate, still bad on CH/CO/SK/ST)
- 20260423_03_tar_make_phase2b.txt (Phase 2b, habitat path ungated, baseline rollup)
- 20260423_04_tar_make_repro.txt (Phase 2b reproducibility)
- 20260423_05_tar_make_dead.txt (incremental with DEAD added)

Relates to #44

* Add tar_make run logs for #44 phases 2, 2a, 2b, repro, DEAD

Evidence trail for the barriers_definite_control investigation.
Referenced from planning/active/task_plan.md.

Relates to #44

* Bump to 0.6.0: NEWS, DESCRIPTION, research doc, vignette, artifacts

Phase 4 of #44. Numerical artifacts regenerated from the 5-WSG
post-DEAD pipeline; narrative updates document the control-filter
wiring and the split between parity WSGs and the end-to-end test WSG.

- DESCRIPTION: Version 0.5.0 -> 0.6.0
- NEWS.md: 0.6.0 entry covering the control-filter wiring, per-species
  gating via observation_control_apply, habitat-path bypass, manifest-
  gated pipeline wiring, asymmetric-gating fix in prep_load_aux, and
  DEAD added as the end-to-end validation WSG.
- inst/extdata/configs/bcfishpass/README.md: 5 WSGs; note that DEAD
  is the control-filter end-to-end test, the other four are parity.
- inst/extdata/vignette-data/*.rds: regenerated from data-raw/
  vignette_reproducing_bcfishpass.R against the current tar_make
  rollup (46 rows, bit-identical across two full rebuilds; digest
  210c3f8254c47ac88573a80d96a2701e).
- research/bcfishpass_comparison.md: adds the DEAD parity table, a
  section explaining why DEAD is the filter test WSG, a row in the
  "Key fixes" table, and a subsection describing the three-part fix
  (observation-path NOT EXISTS, per-species gate, habitat-path bypass).
  DAG rollup node updated to 46 rows; tar_map includes DEAD.
- vignettes/reproducing-bcfishpass.Rmd: 5 WSGs; names DEAD's role;
  pivot tables include DEAD column via intersect() on names(w).

Reproducibility verified: two consecutive `targets::tar_destroy(ask =
FALSE); targets::tar_make()` runs on the same DB state produce
bit-identical rollups (same SHA256 via digest::digest()).
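A base-R sketch of the same kind of check, for readers without the pipeline at hand. It substitutes `tools::md5sum()` over serialized bytes for `digest::digest()`, and the rollup rows are toy values, not pipeline output:

```r
# Hash an R object by serializing it to a temp file and checksumming the file.
hash_object_sketch <- function(x) {
  path <- tempfile(fileext = ".bin")
  on.exit(unlink(path))
  writeBin(serialize(x, connection = NULL), path)
  unname(tools::md5sum(path))
}

# Toy rollups standing in for two consecutive rebuilds.
rollup_run1 <- data.frame(
  wsg = c("ADMS", "DEAD"),
  species = c("BT", "CH"),
  km = c(100.25, 50.5)
)
rollup_run2 <- rollup_run1

# A third run with a small numeric drift in one cell.
rollup_drift <- rollup_run1
rollup_drift$km[2] <- rollup_drift$km[2] + 0.01

same_hash  <- identical(hash_object_sketch(rollup_run1), hash_object_sketch(rollup_run2))
drift_hash <- identical(hash_object_sketch(rollup_run1), hash_object_sketch(rollup_drift))
```

Hashing the whole object catches drift in any cell, which a tolerance-based comparison such as the 5% check would let through.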

Relates to #44

* PWF: update progress for #44 Phase 4 + follow-up issue #46 + PR #47

Relates to #44