Study-area mapping_code parity (tunnel-free) + cheap post-consolidate recompute (#175, #205) #206
Merged
NewGraphEnvironment merged 18 commits into … May 26, 2026
… cartesian New export lnk_compare_mapping_code() — segment-level mapping_code parity that reads the bcfp reference from the LOCAL snapshot fresh.streams_vw_bcfp (no :63333 tunnel, no conn_ref by default). Diffs <persist>.streams_mapping_code vs the snapshot on (blue_line_key, downstream_route_measure) per WSG-active species. .lnk_compare_wsg_mapping_code_diff now delegates; shared merge/match in .lnk_mc_diff. Tunnel path kept (pass conn_ref) for back-compat. Caught + fixed a real id_segment-collision bug: id_segment is per-WSG (80,555 distinct / 1.5M persist rows), unique only on the PK (id_segment, watershed_group_code). Joining persist tables on id_segment alone is a ~22x cartesian. Fixed lnk_compare_rollup's 3 habitat joins to the full PK (PARS BT spawning_km 36,820 -> 1,681). Added WSG-active species resolution so absent species (link "" vs bcfp NULL) don't register a spurious 0% match. Root fix (globally-unique position-derived id_segment, bcfp-style) filed as #203. Live PARS BT 98.95% reproduced tunnel-free; 1216 tests pass (lone fail is the env-only db_conn tunnel test). /code-check clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
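The id_segment collision described above can be reproduced on a toy example. This is a minimal sketch, assuming two made-up three-column tables (id_segment, watershed_group_code, value) held in shell variables; the awk row count stands in for the SQL join and is not the package's code.

```shell
#!/bin/sh
# Toy tables: id_segment is reused across WSGs, so only the pair
# (id_segment, watershed_group_code) is unique. All values are made up.
streams='1 PARS s1
1 FINA s2
2 PARS s3
2 FINA s4'
habitat='1 PARS h1
1 FINA h2
2 PARS h3
2 FINA h4'

# join_rows KEYCOLS: count the rows an inner join of the two toy tables
# would produce on the first KEYCOLS columns (1 = id_segment only,
# 2 = full PK).
join_rows() {
  { printf '%s\n' "$streams"; printf '%s\n' '--'; printf '%s\n' "$habitat"; } |
  awk -v k="$1" '
    $1 == "--" { second = 1; next }   # separator between the two tables
    { key = (k == 1) ? $1 : $1 SUBSEP $2 }
    !second { left[key]++; next }     # index the left table by join key
    { n += left[key] }                # each right row matches left[key] rows
    END { print n + 0 }'
}

echo "id_segment only: $(join_rows 1) rows"
echo "full PK:         $(join_rows 2) rows"
```

With two WSGs sharing the same id_segment values, the single-column join returns 8 rows where the full-PK join returns the expected 4; the duplication factor grows with the number of WSGs that reuse an id.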
… family lnk_compare_wsg(mapping_code = TRUE) now diffs against the LOCAL bcfp snapshot (fresh.streams_vw_bcfp) via lnk_compare_mapping_code with no conn_ref — the mapping_code lens is tunnel-free. conn_ref is still required for the rollup (lnk_compare_rollup needs bcfp habitat_linear, not in the snapshot). Species now auto-resolve to the WSG-active set rather than a hardcoded 8. Removed the now-dead .lnk_compare_wsg_mapping_code_diff helper (merge/match lives in .lnk_mc_diff); fixed the lnk_mapping_code doc ref. data-raw/wsg_compare.R: added wsg_compare_mapping_code() — tunnel-free (local conn only, no PG_PASS_SHARE/:63333). This is the per-segment mapping_code compare the orchestrator will run on the dispatcher after consolidate (cyphers just run + persist). Verified live: PARS BT 98.95% with PG_PASS_SHARE unset. Composition test repointed to mock lnk_compare_mapping_code. 93 compare / 1216 total tests pass (lone fail = env-only db_conn). /code-check clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3a: make persist/consolidate host- and species-count-agnostic so the 3-WSG smoke's cross-host wide-table drift can't recur.
- schema_consolidate.R: COPY by shared column name (runtime intersection, dest ordinal order) instead of positional SELECT-star / FROM STDIN. Handles hosts whose streams_access/streams_mapping_code carry different species column sets. Nothing hardcoded (cols/species/host discovered at runtime).
- cypher_prep.sh: seed lnk_persist_init from cfg$species (mirrors lnk_pipeline_run.R:157), not parameters_fresh (11 sp incl CT/DV/RB), which removes the drift at source.

Surfaced by the 3-WSG smoke (CRKD/LCHL/ZYMO): cyphers' wide tables had 30 cols vs M1's 24 -> positional COPY failed. Filed #204 for the deeper class (persist_init blind to species-column-set drift). /code-check clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
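The copy-by-shared-column-name idea can be sketched outside R. The following is an illustration only, assuming the column lists have already been read (for example from information_schema.columns ordered by ordinal_position), that names need no SQL quoting, and that `persist.streams_access` and the `access_*` columns are placeholder names; `shared_cols` and `copy_statements` are hypothetical helpers, not functions from schema_consolidate.R.

```shell
#!/bin/sh
# shared_cols DEST SRC: print the columns present in both lists, one per
# line, in the destination's ordinal order. Each argument is a
# newline-separated column list.
shared_cols() {
  printf '%s\n' "$1" | while IFS= read -r col; do
    # -F fixed string, -x whole-line match, -q quiet
    if printf '%s\n' "$2" | grep -Fxq -- "$col"; then
      printf '%s\n' "$col"
    fi
  done
}

# copy_statements TABLE DEST SRC: print the export and import COPY
# statements that move only the shared columns, by name.
copy_statements() {
  cols=$(shared_cols "$2" "$3" | paste -sd, -)
  printf 'COPY (SELECT %s FROM %s) TO STDOUT\n' "$cols" "$1"
  printf 'COPY %s (%s) FROM STDIN\n' "$1" "$cols"
}

# Placeholder column sets: the source host carries an extra species column
# (access_ct) and a different column order.
dest='id_segment
watershed_group_code
access_bt
access_ch'
src='id_segment
access_ch
access_ct
access_bt
watershed_group_code'

copy_statements persist.streams_access "$dest" "$src"
```

Because both statements name the same explicit column list, the extra source column is simply not exported and column order on either host stops mattering.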
…patch (#175)

Phase 3 REVISED: instead of refactoring the 1594-line M4-centric orchestrator (per user "are these already dealt with in our start-to-finish scripts?"), productionize the proven smoke flow into 4 small reusable scripts that reuse every existing piece (cypher_up/prep/down, schema_consolidate, lnk_pipeline_run, wsg_compare_mapping_code).

Cross-WSG ;DAM solved WITHOUT a post-consolidate recompute: each host gets a drainage-CLOSED bucket (focal + downstream closure via public.wsg_outlet) run DOWNSTREAM-FIRST, so a WSG's downstream dam barriers are persisted before its access/mapping_code is computed. Validated on PARS (depth 3) -> its Bennett-dam WSGs PCEA/UPCE/LPCE (depth 2) come first. One study area per host; areas are drainage-independent (roots 100/200/400).

New:
- study_area_wsgs.R: closure + DS-first list (public.wsg_outlet)
- wsg_run_one.R: lnk_pipeline_run(mapping_code=TRUE) for one WSG, local, host-agnostic (LNK_LOAD=loadall dispatcher / library cyphers)
- study_area_compare.R: tunnel-free wsg_compare_mapping_code loop -> CSV
- study_area_run.sh: driver: tunnel-free pre-flight -> spin -> prep -> run DS-first buckets (dispatcher + cyphers) -> consolidate -> BURN (minimise idle) -> compare -> CSV; trap-EXIT burn

No M4, no ssh m1, no :63333/PG_PASS_SHARE. /code-check clean (fixed burn-verify pipefail, added bucket-overlap warning). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…aunched (#175) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ea-on-dispatcher note (#175) Fresh cyphers race: cypher_up returns when the IP is assigned, before sshd is up, so scp of cypher_prep.sh hit "Connection closed". Poll ssh (up to ~150s, accept-new host key) before scp. Also document: put the largest study area on the dispatcher (fast/free M1); cyphers are slow+paid, give them smaller areas. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
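The poll-before-scp fix can be sketched as a small retry helper. This is a hedged illustration, not the code in study_area_run.sh: `wait_until`, `ssh_ready`, the `root@` login and the `CYPHER_IP` variable are assumptions made for the example; the ssh options themselves (`BatchMode`, `ConnectTimeout`, `StrictHostKeyChecking=accept-new`) are standard OpenSSH client options.

```shell
#!/bin/sh
# wait_until TRIES DELAY CMD...: run CMD until it succeeds, at most TRIES
# times, sleeping DELAY seconds between attempts. Returns 0 on the first
# success and 1 if every attempt fails.
wait_until() {
  tries=$1 delay=$2
  shift 2
  i=1
  while [ "$i" -le "$tries" ]; do
    "$@" && return 0
    [ "$i" -lt "$tries" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# ssh_ready HOST: succeed once sshd on HOST accepts a non-interactive login.
# accept-new trusts a host key on first contact but rejects a changed key.
ssh_ready() {
  ssh -o BatchMode=yes -o ConnectTimeout=5 \
      -o StrictHostKeyChecking=accept-new "root@$1" true 2>/dev/null
}

# Usage (30 tries with a 5 s pause gives about 150 s of waiting, plus
# connect time):
#   wait_until 30 5 ssh_ready "$CYPHER_IP" && scp cypher_prep.sh "root@$CYPHER_IP:"
```

Keeping the retry loop separate from the readiness check means the same helper could gate any other step that races a freshly booted host.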
…CH (#175) Cyphers git checkout main by default, but the driver scripts (wsg_run_one.R etc.) + branch link live on the feature branch. Pass the dispatcher's current branch to cypher_prep so cyphers carry the same ref. Branch must be pushed first (cypher_prep does git fetch origin + reset --hard origin/$BRANCH). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…157)

Root-causes the 2026-05-25 data loss: the drainage closure pulled in a species-less WSG (LEUT) -> lnk_pipeline_run errored "No species resolved for AOI" -> dispatcher loop `|| exit 1` -> driver FATAL -> trap burned the cyphers WITH their un-consolidated Peace+Skeena data. One bad WSG lost a whole run.

Per the records (research/provincial_run_runbook.md, data-raw/wsgs_run_host.R :88 #157) the proven runner already solved both:
- study_area_wsgs.R: filter the closure to bundle-species presence (cfg$species in wsg_species_presence); drops species-less closure WSGs (Fraser: LEUT, LNRS). Matches wsgs_run_host.R exactly.
- study_area_run.sh: per-WSG SOFT-FAIL: a WSG error logs WARN and the loop continues; a non-zero host exit is logged, not fatal. Always reaches consolidate so a late failure can't burn cyphers with unconsolidated data. Mirrors wsgs_run_host.R resume-safe behaviour.
- wsg_run_one.R: defensive exit-0 skip when lnk_pipeline_species() is empty.

/code-check clean (1 round, 0 findings). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
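The soft-fail ordering can be shown with stand-in steps. A minimal sketch, assuming hypothetical `run_wsg`, `consolidate` and `burn` functions that only log what the real steps would do (LEUT is hard-wired to fail to mimic the species-less WSG); it is not the driver's actual code.

```shell
#!/bin/sh
# Hypothetical stand-ins for the real steps; each just logs what it would do.
run_wsg()     { [ "$1" != "LEUT" ] && echo "run $1"; }   # LEUT simulates a failing WSG
consolidate() { echo "consolidate"; }
burn()        { echo "burn"; }

# run_bucket WSG...: soft-fail loop. A failing WSG is logged and skipped;
# the function itself always returns 0 so the caller proceeds to consolidate.
run_bucket() {
  for wsg in "$@"; do
    run_wsg "$wsg" || echo "[WARN] $wsg failed, continuing"
  done
  return 0
}

# The subshell body scopes the EXIT trap to this driver run: burn fires
# last, whether the body finishes normally or aborts.
driver() (
  trap burn EXIT
  run_bucket "$@"
  consolidate
)

driver PCEA LEUT PARS
```

The point of the design is the ordering: the trap still guarantees the hosts are burned, but because no single WSG failure can abort the loop, consolidate runs before the trap has a chance to fire.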
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
research/study_area_run.md — lean tunnel-free M1-dispatch study-area parity procedure + the 2026-05-25 gotchas (trap-burn data loss, species filter #157, soft-fail, sshd race, wide-table drift, cypher $0.06/hr). data-raw/README.md drivers table gains study_area_run.sh. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…arantee (#175, #205) The deliverable is a methodology correct regardless of machine count + WSG bucketing. Drainage-closed + DS-first per-host is NOT sufficient: a WSG's downstream barriers can be cross-bucket or arrive late in DS-first order, so its access (token1/token2) is computed against an incomplete barrier set. Caught 2026-05-25: FINA 75.5% / PARA 68.6% per-host -> 99%+ after re-modelling on the full consolidated barrier set. So: distribute (any bucketing) -> consolidate -> POST-CONSOLIDATE RECOMPUTE the diverged WSGs (any species <99%) on the dispatcher with the complete barrier set -> re-compare. Bucketing becomes a speed knob, not a correctness lever. Recompute uses the full pipeline today (slow); filed #205 for the cheap access-only recompute (reuse persisted streams/habitat) that makes recompute-ALL bulletproof + fast. Docs: research/study_area_run.md (procedure corrected). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
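Selecting the diverged WSGs (any species below 99%) from a compare CSV could look like the sketch below. The header `wsg,species,pct_match` and the helper name `diverged_wsgs` are assumptions for illustration, and every data row is made up; the real CSV layout is not shown in this PR.

```shell
#!/bin/sh
# diverged_wsgs THRESHOLD: read a compare CSV on stdin and print each WSG
# that has at least one species row below THRESHOLD percent match.
# Assumed (hypothetical) header: wsg,species,pct_match
diverged_wsgs() {
  awk -F, -v t="$1" '
    NR == 1 { next }                              # skip the header row
    $3 + 0 < t + 0 && !seen[$1]++ { print $1 }    # first sub-threshold row per WSG'
}

# Illustrative rows only.
printf '%s\n' \
  'wsg,species,pct_match' \
  'FINA,BT,75.5' \
  'PARA,BT,68.6' \
  'PARS,BT,99.0' \
  'FINA,CH,99.9' | diverged_wsgs 99
```

Each WSG is printed once even if several species fall below the threshold, which suits a recompute that works per WSG rather than per species.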
Correct the task_plan claim "cross-WSG ;DAM solved without recompute" — full run disproved it (FINA 75.5%/PARA 68.6% per-host -> 99%+ after recompute). Record authoritative post-recompute parity (median 99.66%) + the methodology finding + genuine divergences (UNRS reservoir, SETN salmon) in research/provincial_parity_2026_05_25.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nsolidate recompute, #205) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The post-consolidate recompute (#175) was running the full pipeline on diverged WSGs: ~2× cost on those WSGs because it re-derived streams/habitat (already correct) just to redo the cheap access step. #205 implements the access-only recompute that reuses persisted streams/habitat/barriers; FINA validated: 11.86s wall (vs ~90s full pipeline = ~8× faster), bcfp parity 99.8% / 57 diffs / ACCESS;DAM top, IDENTICAL to the full-pipeline recompute.

Five things had to be right; four are real gotchas worth knowing (RUNBOOK §6):
- R/lnk_access.R (new export, @family compare): portable access builder, twin of lnk_mapping_code (table_<role> params). Builds per-species _access + source `_unified` views internally via lnk_barriers_views over the persist barriers; merge=FALSE overwrites, merge=TRUE surgically UPDATEs cross-WSG cols (has_barriers_*, dam_dnstr_ind) on the target while PRESERVING remediated_dnstr_ind and observed access_<sp>=2 from the prior compute.
- lnk_access materialises the AOI streams as a REAL TABLE (CREATE TABLE + index id_segment, wscode_ltree GiST, localcode_ltree GiST, blue_line_key, ANALYZE), NOT a view. A view didn't carry small-table stats so the planner picked the ~800k-row barriers as the nested-loop outer driver, blowing cost ~1000× (>10min). With the real table, planner picks the 26k AOI streams as outer, walk takes seconds.
- R/lnk_persist_init.R: persist streams + barriers now get the same wscode_ltree / localcode_ltree GiST + btree indexes that fresh::utils.R builds on its working network table (frs_network_features needs them).
- R/lnk_mapping_code.R: #203 cross-WSG cartesian fix. The access read was `SELECT * FROM <access> WHERE id_segment IN (SELECT id_segment FROM streams WHERE wsg=aoi)`. Against persist (where id_segment is per-WSG, not globally unique) this matched N×WSGs of duplicates → 50× rows in mc_scratch → PK violation on persist write. Filter by watershed_group_code when the table carries it.
- data-raw/wsg_recompute_one.R (new): sibling of wsg_run_one.R. Sets statement_timeout (600s) + lock_timeout (60s) so a runaway/locked query cancels server-side instead of orphaning a backend. data-raw/study_area_run.sh wired to call it + switched to recompute-ALL (cheap → bulletproof; bucketing is now a speed knob, not a correctness lever).

Docs: research/study_area_run.md procedure updated; RUNBOOK.md §6 gotchas (orphaned backend / statement_timeout + pkill ≠ cancel query; view-vs-table planner gotcha; #203 cartesian-on-persist). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ata-loss followup) Without the transaction, a failed INSERT (e.g. the #203 PK-violation that caused FINA mc data loss 2026-05-25) leaves the WSG's rows deleted but not re-inserted. dbWithTransaction makes the DELETE/INSERT pair atomic. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- `mapping_code` parity for the 3 FWCP study areas (Peace / Fraser / Skeena, 50 drainage-closed WSGs). Authoritative parity: median 99.66%, mean 99.11%, 130/148 rows ≥99%. Authoritative CSV: `data-raw/logs/study_area_run/20260526_055645_compare.csv`. Numbers + methodology in `research/provincial_parity_2026_05_25.md` + `research/study_area_run.md`.
- … `data-raw/study_area_run.sh` + `study_area_wsgs.R` / `wsg_run_one.R` / `wsg_recompute_one.R` / `study_area_compare.R`): spin cyphers → DS-first per-host runs → consolidate → burn → cheap recompute-ALL → tunnel-free compare → CSV. No old M4-centric orchestrator, no `:63333` tunnel.
- `lnk_access`: new export, the missing twin of `lnk_mapping_code` (`@family compare`, `table_<role>` params). Portable, schema-aware access builder; `merge = TRUE` is the surgical UPDATE that powers the cheap post-consolidate recompute (#205): ~8× faster than the full-pipeline recompute (FINA 11.9 s wall vs ~90 s, bcfp parity identical).
- `cypher_prep` aligned to `cfg$species`; `schema_consolidate` COPYs shared columns by name (shape-tolerant).
- `lnk_mapping_code` now filters access by `watershed_group_code` when present (the `id_segment IN (…)` query was 50×-duplicating against persist).
- … `streams` + `barriers` (`lnk_persist_init`): `frs_network_features` traversal needs them; matches `fresh::utils.R` pattern.
- `RUNBOOK.md` §6: orphaned `frs_network_features` backends + `statement_timeout` / `lock_timeout`; `pkill <R>` ≠ cancel query; view-vs-table planner direction; #203 persist cartesian.
- `soul/conventions/code-check.md` Docker/Postgres: 6 cross-repo Postgres + R-client lessons (`pkill` ≠ cancel; set `statement_timeout` + `lock_timeout`; function-as-join-predicate inlineability; per-tenant key joins are cartesian; view vs real-table planner; two-statement DELETE/INSERT atomicity).

Related Issues
- … `lnk_mapping_code` symptom; root issue still open)

Test plan
- … `ACCESS;DAM;INTERMITTENT` (Bennett dams in PCEA/UPCE, DS-first, no recompute needed for the closure); BT match 99.0%.
- … `20260526_055645`): 50/50 WSGs, zero errors / `[WARN]`s, cyphers BURNED clean (✓ doctl: no cypher droplets). Median match 99.66%.
- … `lnk_access(merge=TRUE)`) reproduces full-pipeline recompute exactly on FINA (0/26,094 mismatches, identical bcfp parity 99.8% / 57 diffs / `ACCESS;DAM` top).
- `devtools::document()` regenerates `NAMESPACE` + `man/`; `devtools::test()` 1216 PASS / 1 FAIL (the known M1-env `test-lnk_db_conn`, unrelated).
- `research/study_area_run.md` procedure + `RUNBOOK.md` §6 new gotchas.

Notes
- … `lnk_parity_annotate` against `bcfp_divergence_taxonomy.yml`.
- … `wsgs_run_pipeline.sh` (kept untouched for the existing rollup / full-province flow). `study_area_run.sh` is the new path for per-segment mapping_code parity, validated end-to-end on 50 WSGs.
- `link` branch `175-promote-with-mapping-code-flag-to-stand` HEAD `d74d992`. bcfp reference `v0.7.15-14-ge12c1a5` (snapshot loaded via `snapshot_bcfp.sh --with-bcfp-views`).

🤖 Generated with Claude Code