Decouple bcfp compare from link pipeline run (v0.37.0)#173
Merged
Conversation
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Decouple bcfp compare from link pipeline run — split lnk_compare_wsg into lnk_pipeline_run (modelling) + lnk_compare_rollup (comparison), rewrite run_provincial_parity.R resume check to use PG state, add --force flag. Mapping_code lens and family-shape follow-ups captured as out-of-scope in the plan. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
New exported function chains setup → load → prepare → crossings → break → classify → connect → species → persist_init → barriers_unify → persist into a single per-WSG call. Modelling-only — writes <persist_schema>.streams + per-species habitat + barriers and drops the working schema on exit. This is the modelling boundary for the link package. Comparison vs bcfishpass (or any future reference) will live in lnk_compare_rollup (next phase) and read the persisted state. Splits the bundled lnk_compare_wsg() into independent halves so a re-pipeline and a re-compare can happen separately. barriers_unify is promoted from gated-behind-with_mapping_code to always-on so <persist_schema>.barriers is canonical PG state for any future reader. 16 tests pass; mirrors test-lnk_compare_wsg.R composition pattern. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
New exported function reads <persist_schema>.streams + per-species streams_habitat_<sp> wide tables, queries the reference database, and returns the long-format rollup tibble. Reads-only — no PG writes, no working schema. Caller persists return value (e.g. saveRDS) if a side-artifact is wanted. Link-side queries rewritten as per-species UNION ALL across wide streams_habitat_<sp> tables (vs the working schema long-format streams_habitat with a species_code column). Species auto-discovered from PG via information_schema.tables probe filtered to streams_habitat_<sp> with rows for the requested WSG. Species-suffix filter regex ^[a-z]+$ defends against stray non-conforming table names being interpolated into raw DDL. Reference dispatch + long-format assembly reused from existing .lnk_compare_wsg_rollup_reference + .lnk_compare_wsg_assemble_rollup. Bit-identical-rollup live-DB validation deferred to Phase 7 smoke matrix where the full cache-state matrix runs against a real persisted WSG. 15 new tests pass; full suite 1164 PASS / 0 FAIL. /code-check clean after one round-1 fragility fix. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Internal helper in R/utils.R that returns TRUE when <persist_schema>.streams contains rows for the given WSG, FALSE otherwise. Two-stage probe: information_schema.tables for table existence then LIMIT 1 row check. The orchestrator loop (run_provincial_parity.R, rewrite in Phase 6) will use this as the canonical resume signal — PG state is authoritative, RDS files are diagnostic side-artifacts. Replaces the buggy RDS-file-existence check that silently skipped pipeline runs when PG was empty but stale RDS files existed (the 2026-05-14 16-WSG "16 OK but only 12 in PG" incident). 6 new test cases mock DBI::dbGetQuery to cover the three outcome states (no table / table with WSG rows / table without WSG rows) plus arg validation. 48 tests PASS in test-utils.R. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…re_rollup Both paths now delegate to the decoupled functions from Phase 1+2. The mapping_code branch additionally calls .lnk_compare_wsg_mapping_code on top of the persisted state, forcing cleanup_working = FALSE on the pipeline call so the working schema survives the mapping_code build. Function body shrinks from ~155 to ~75 lines. All inline phase orchestration moved into lnk_pipeline_run. The wrapper is now a genuine convenience-bundling layer over the new decoupled boundary, not a parallel implementation. Behavior shift: active-species set is discovered from PG state (post-persist via lnk_compare_rollup) rather than cfg-intersection (pre-persist via the old in-function lnk_pipeline_species call). Equivalent on a fresh single-call run; future-proofs callers against config drift between modelling and comparison. Composition tests rewritten to mock lnk_pipeline_run + lnk_compare_rollup at the new boundary instead of per-phase. Arg-validation tests unchanged. test-lnk_compare_wsg.R: 55 PASS. Full suite: 1172 PASS / 0 FAIL. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The single bundled wrapper becomes two compartmentalized scripts that mirror the R-library decoupling from Phases 1-4: - data-raw/wsg_pipeline_run.R — modelling-only. Opens local fwapg conn, stamps via lnk_stamp, calls link::lnk_pipeline_run(). Returns invisible. - data-raw/wsg_compare.R — compare-only. Opens local + reference conns, calls link::lnk_compare_rollup(), renames ref_value to bcfishpass_value when reference is bcfishpass. Returns rollup tibble. Created via git mv of the old bundled file so log --follow preserves history. Four callers updated to the two-call pattern: - data-raw/_targets.R — tar_map target bodies call both sequentially. - data-raw/regress_dams_isolation.R — dams flag now threads to wsg_pipeline_run; wsg_compare reads persisted state. - data-raw/rule_flexibility_demo.R — same two-call pattern. - data-raw/run_provincial_parity.R — sources the two new files, inlines wsg_pipeline_run + wsg_compare in the loop body. The with_mapping_code = TRUE branch still routes through the bundled link::lnk_compare_wsg() wrapper (mapping_code decoupling deferred per #168 scope). /code-check round 1 caught a real connection-leak bug: on.exit() at top-level in a script (not inside a function) binds to globalenv and never fires per-iteration. The mapping_code branch would have leaked 2 conns per WSG over ~225 WSGs and blown postgres max_connections. Fixed by wrapping the branch in an anonymous IIFE so on.exit attaches to a real call frame. Round 2 clean. Phase 6 will rewrite the resume-check logic in run_provincial_parity.R to use PG state rather than RDS file existence; the file-existence cache-skip is intentionally unchanged here. Full suite: 1172 PASS / 0 FAIL. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Rewrites the orchestrator's resume gate. PG state is now the canonical
signal of whether the modelling pipeline ran for a given WSG; the RDS
file is a diagnostic side-artifact, not the source of truth.
Resume logic — four branches:
force → re-run pipeline + compare unconditionally
fully cached → PG has rows AND RDS exists and isn't an error stub
(AND has mapping_code if --with-mapping-code is set);
skip
pipeline only → PG has rows but rollup missing/stub; re-run compare
only (~1-2s)
missing → PG empty for this WSG; run pipeline + compare
Closes the 2026-05-14 incident where 4 of 16 WSGs (FINA, CRKD, INGR,
MESI) had stale RDS files from prior failed attempts and were silently
skipped while PG was empty — the dispatch reported "16 OK" but only
12 had actually populated fresh_default.streams.
New CLI flag --force bypasses both checks for a clean re-run, e.g.
after a fwapg refresh or bcfp tunnel rebuild has shifted external
state.
Two new local helpers:
- .is_error_stub(rds_path) — TRUE iff RDS holds a per-WSG error stub
(list(error=..., elapsed_s=...)); invalidates the cache so the next
dispatch retries the WSG.
- .rollup_has_mapping_code(rds_path) — TRUE iff RDS holds a list with
$mapping_code; lets the resume gate force a re-run when the cache
has a bare rollup but the current run wants the mapping_code lens.
Script-level probe_conn opens once at the top for resume probes; per
-WSG wsg_pipeline_run / wsg_compare functions still open + close
their own short-lived conns.
Mapping_code branch (--with-mapping-code) still routes through the
bundled link::lnk_compare_wsg() flow — mapping_code decoupling
deferred per #168 scope.
/code-check round 1 clean (one dead assignment removed). Full suite
1172 PASS / 0 FAIL.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Ran the 4 cache-state smoke matrix on DEAD WSG against isolated schema fresh_smoke168. All four states behaved as designed: A empty 57s pipeline + compare fired B pipeline-cached 9s [compare-only] fired (~6x speedup) C fully cached 2s (cached, skip) no-op D --force 56s both re-fired regardless of cache Post-pipeline PG state matched canonical: 12,301 rows for DEAD in fresh_smoke168.streams (matches fresh.streams count), 6 per-species habitat tables, barriers table populated (confirms barriers_unify always-on promotion). Schema dropped after smoke; canonical fresh.* untouched. State B vs A demonstrates the resume gate value: keeping pipeline PG state and only re-running compare cuts the cycle from 57s to 9s. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Decouple bcfp comparison from the modelling pipeline. Closes #168. The link package deliverable — the per-WSG model in <persist_schema> .streams + per-species habitat + barriers — now runs independently of any comparison framework. Comparison vs bcfishpass (or any future reference) is a diagnostic overlay that reads the persisted state and never gates whether the model itself ran. New exports: - lnk_pipeline_run — modelling-only umbrella; chains 7 lnk_pipeline_* phases plus persist_init + barriers_unify + persist. - lnk_compare_rollup — comparison-only; reads persisted state + reference, returns long-format rollup. Reference-agnostic. Refactored: - lnk_compare_wsg is now a thin wrapper over both. with_mapping_code path retained as bundled flow (decoupling deferred). Operational: - data-raw/compare_bcfishpass_wsg.R split into wsg_pipeline_run.R + wsg_compare.R. - run_provincial_parity.R resume gate now uses PG state via .lnk_wsg_persisted, not RDS file existence. Adds --force flag. Closes the 2026-05-14 incident (silent skips due to stale error stubs). Smoke matrix on DEAD WSG validated all four cache states: empty (57s) → pipeline-cached (9s, ~6x speedup) → fully cached (2s skip) → --force (56s). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Decouple the bcfp comparison from the link modelling pipeline. Closes #168.
The link package deliverable — the per-WSG model in
<persist_schema>.streams+ per-species habitat + barriers — now runs independently of any comparison framework. Comparison vs bcfishpass (or any future reference) is a diagnostic overlay that reads the persisted state and never gates whether the model itself ran.lnk_pipeline_run()(modelling umbrella) +lnk_compare_rollup()(reference-agnostic comparison reader).lnk_compare_wsg()refactored as a thin wrapper.data-raw/run_provincial_parity.Rresume check now uses PG state vialink:::.lnk_wsg_persisted(). RDS files become diagnostic side-artifacts. Closes the 2026-05-14 incident where 4 of 16 WSGs were silently skipped because of stale error-stub RDS. New--forceCLI flag bypasses all caching.data-raw/split.compare_bcfishpass_wsg.R→wsg_pipeline_run.R+wsg_compare.R. 4 callers updated to the two-call pattern.Smoke matrix (Phase 7 — live DB on DEAD WSG / isolated schema)
[compare-only](~6x speedup)(cached, skip)no-op--forceOut of scope (filed as follow-ups in NEWS + archived task_plan)
lnk_compare_mapping_codeas its own family export (promoteswith_mapping_code = TRUE).lnk_compare_wsgtolnk_compare_runfamily rename.data-raw/script renames (stay in Provincial run autonomy: rename scripts to noun_verb + single 'approve once, builds end-to-end' wrapper #172).Test plan
devtools::test()— 1172 PASS / 0 FAIL.devtools::check()— 0 errors; 3 warnings + 2 notes all pre-existing on main (no regression)./code-checkclean across all 6 R+data-raw commits (one round-1 fix landed: connection leak from top-levelon.exitin mapping_code branch).Related Issues
Generated with Claude Code