From 2162fb7acc0b852efd5596db58dd8275672ee2aa Mon Sep 17 00:00:00 2001
From: almac2022
Date: Thu, 14 May 2026 14:53:32 -0700
Subject: [PATCH 1/9] Initialize PWF baseline for #172
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Provincial run autonomy + script renames. Built on top of v0.37.0
(#168) which decoupled pipeline/compare and switched the resume gate
to PG state — that foundation lets the autonomy CLI surface stay thin
and the 8 renames stay mechanical.

7 phases: CLI on trifecta_provincial.sh, CLI on province_run.sh,
M4+M1 integration test, 8-script rename, post-rename smoke, rtj
cross-repo update, release v0.38.0. Cypher integration deferred to a
follow-up issue.

Co-Authored-By: Claude Opus 4.7
---
 planning/active/findings.md  |  67 +++++++++++++++++++++++
 planning/active/progress.md  |  11 ++++
 planning/active/task_plan.md | 101 +++++++++++++++++++++++++++++++++++
 3 files changed, 179 insertions(+)
 create mode 100644 planning/active/findings.md
 create mode 100644 planning/active/progress.md
 create mode 100644 planning/active/task_plan.md

diff --git a/planning/active/findings.md b/planning/active/findings.md
new file mode 100644
index 00000000..3095572a
--- /dev/null
+++ b/planning/active/findings.md
@@ -0,0 +1,67 @@
+# Findings — Provincial run autonomy + script renames (#172)
+
+## Issue context
+
+### Problem
+
+After PR #171 (v0.36.1) the operational scripts work but are scattered, inconsistently named, and require operator handholding mid-run. Goal: single command — approved once — that runs end-to-end and lands clean output. M4+M1 baseline first; cyphers opt-in after baseline lands repeatably.
+
+Names lie about scope — these scripts run **any list of WSGs**, not just "provincial".
+
+### Goals
+
+1. **Single-command autonomous run.** `bash data-raw/.sh ...` runs everything (state-clean → snapshot → dispatch → pull → consolidate → burn cyphers if any) without further prompts.
+2. **Any WSG list.** `--wsgs=A,B,C` accepted at the umbrella level, auto-split via LPT across configured hosts.
+3. **Any config bundle.** `--config=default` or `bcfishpass`, `--schema=`.
+4. **Any host subset.** `--no-cyphers` (M4+M1 only) for the validated baseline; `--cy-workspaces=...` for full distributed.
+5. **Rename for honesty.** No more "provincial" / "trifecta" / "bcfishpass" in script names that work for any list/host count/reference.
+
+### 16-WSG test set
+
+`CARP, CRKD, FINA, FINL, FIRE, FOXR, INGR, LOMI, MESI, NATR, OSPK, PARA, PARS, PCEA, TOOD, UOMI`
+
+## Naming decision (yesterday's session, confirmed locked-in today)
+
+Resolved before this session started; pulled forward into this plan:
+
+- **Umbrella**: `province_run.sh` → `wsgs_run_pipeline.sh` (typed by user as `wsgs_run_pipeline.R` — confirmed `.R` was a typo for `.sh`, the user is the operator entry point and that's a shell script).
+- **Per-host loop**: `run_provincial_parity.R` → `wsgs_run_host.R` (plural `wsgs_` signals collection; suffix `host` signals scope = one host's bucket).
+- **Other wrappers**: user picked "Mixed nouns (more descriptive)" — `state_clean.sh`, `progress_check.sh`, `runs_archive.sh`, `buckets_balance.R`, `schema_consolidate.R`.
+
+The singular/plural distinction `wsg_*` (one WSG operations, from #168) vs `wsgs_*` (collection-level operations) reads naturally now that #168 has shipped `wsg_pipeline_run.R` + `wsg_compare.R`.
+
+## Architecture shift vs yesterday's first attempt at #172
+
+The scab fixes from yesterday's first attempt (smoke auto-skip when `--wsgs`, archive `--config`, phantom-cy from `paste0("cy", integer(0))` returning `"cy"` due to R constant recycling) are mostly **no longer load-bearing** because #168's PG-state resume gate makes the loop idempotent:
+
+- A stale RDS no longer silently skips a missing pipeline run.
+- Operators can re-dispatch with `--force` to bypass all caching.
+- Compare-only re-runs (`pipeline_done && !rollup_ok`) cost ~3s vs ~80s for full pipeline+compare.
+
+What remains genuinely needed:
+
+- `--wsgs=A,B,C` filter in `trifecta_provincial.sh` SPLIT_R block.
+- `--no-cyphers` mode (force `N_CY=0`, skip cypher subprocess + wrap + pullback paths).
+- `--force` passthrough to the per-host Rscript.
+- Phantom-cy bug fix (still real — `paste0("cy", integer(0))` returns `"cy"` length-1; need explicit `if (n_cy == 0L) character(0)` branch).
+- `province_run.sh` arg parser surface and config-aware ANN_CSV path.
+
+## Cross-repo coordination (rtj)
+
+`rtj/scripts/cypher/cypher_run.sh:8` references `~/Projects/repo/link/data-raw/run_provincial_parity.R`. After the rename, this reference becomes stale.
+
+User confirmed: direct commit + push to rtj is fine ("no one but us using this stuff"). Order: link's rename PR merges first, then a one-line update on rtj/main. This way `cypher_run.sh` never references a missing file on link/main.
+
+Coordinate via Phase 6 of this plan (not via comms thread — direct commit was approved).
+
+## Reference counts at start of session
+
+```
+trifecta_provincial: 15 files reference it
+run_provincial_parity: 12
+consolidate_schema: 9
+archive_provincial_runs: 7
+balance_provincial_buckets: 6
+```
+
+Most are inside data-raw/ scripts that source each other, plus README/runbook docs and CLAUDE.md. Phase 4 walks each rename + reference update in one commit per file (or one bulk commit — TBD during execution).
diff --git a/planning/active/progress.md b/planning/active/progress.md
new file mode 100644
index 00000000..db3452da
--- /dev/null
+++ b/planning/active/progress.md
@@ -0,0 +1,11 @@
+# Progress — Provincial run autonomy + script renames (#172)
+
+## Session 2026-05-14 (afternoon — post-#168)
+
+- Resumed #172 against the new #168 architecture. PG-state resume + decoupled `wsg_pipeline_run`/`wsg_compare` are now in place, so most of yesterday's first-attempt scab fixes (smoke auto-skip, archive --config, phantom-cy mitigation via fallback paths) became unnecessary.
+- Plan-mode exploration — phases approved by user. Locked in yesterday's rename decisions (umbrella = `wsgs_run_pipeline.sh`, per-host loop = `wsgs_run_host.R`, mixed nouns for other wrappers).
+- Cypher integration deferred to follow-up — keeps PR scope tight, matches #168 discipline.
+- rtj cross-repo update authorized (direct commit + push).
+- Created branch `172-provincial-run-autonomy-renames` off main (v0.37.0 baseline).
+- Scaffolded PWF baseline (`task_plan.md`, `findings.md`, `progress.md`) with 7 approved phases.
+- Next: start Phase 1 — patch `trifecta_provincial.sh` for `--wsgs`, `--no-cyphers`, `--force`, plus phantom-cy fix.
diff --git a/planning/active/task_plan.md b/planning/active/task_plan.md
new file mode 100644
index 00000000..92371f80
--- /dev/null
+++ b/planning/active/task_plan.md
@@ -0,0 +1,101 @@
+# Task: Provincial run autonomy + script renames (#172)
+
+After #168 shipped (v0.37.0, PG-state resume), this PR adds the CLI surface for autonomous M4+M1 runs and renames 8 operational scripts to noun_verb convention. Cypher integration deferred to a follow-up.
+
+## Phase 1 — CLI surface on `trifecta_provincial.sh`
+
+Patch the original filename first so the smoke (Phase 3) validates on the known-good name. Rename in Phase 4.
+
+- [ ] Add `--wsgs=A,B,C` arg parse. In SPLIT_R block (~line 135), intersect `all_wsgs` with the `--wsgs` list when provided; error loud on unknown WSGs.
+- [ ] Add `--no-cyphers` arg parse. When set: force `N_CY=0`, skip cypher wrap generation (lines 449-471), skip cypher subprocess launch (lines 523-536), skip cypher RDS pullback, skip cypher R log pullback.
+- [ ] Add `--force` arg parse and forward to `Rscript run_provincial_parity.R ... --force`.
+- [ ] Fix phantom-cy bug for `n_cy = 0`: `paste0("cy", integer(0))` returns `"cy"` (length-1 due to constant recycling); use explicit `if (n_cy == 0L) character(0)` branch.
+- [ ] Harden empty `CY_WORKSPACES` / `N_CY` init for the cy-less path.
+- [ ] Update usage block.
+- [ ] `bash -n data-raw/trifecta_provincial.sh` syntax-clean.
+- [ ] `/code-check` clean on staged diff.
+- [ ] Commit "trifecta_provincial.sh: --wsgs filter, --no-cyphers mode, --force passthrough"
+
+## Phase 2 — CLI surface on `province_run.sh`
+
+- [ ] Add `--wsgs=`, `--config=`, `--schema=`, `--no-cyphers`, `--force` to arg parser.
+- [ ] Defaults: `--config=bcfishpass`, `--schema=""` (use bundle default), `--wsgs=""` (full bundle).
+- [ ] Forward new flags to `trifecta_provincial.sh` invocation.
+- [ ] When `--no-cyphers`: skip Step 3 (cypher spin), Step 4 (cypher prep), Step 5 cypher iterations (M4+M1 archive only), Step 9 cypher consolidate sources (M1 only), Step 10 cypher burn (trap-EXIT no-op).
+- [ ] Step 8 ANN_CSV path: derive from `CONFIG_NAME` so non-bcfishpass bundles get `provincial_/` not hardcoded `provincial_parity/`.
+- [ ] Auto-skip-smoke when `--no-cyphers` OR `--wsgs` is set. Place notice AFTER the `exec > >(tee -a "$LOG")` redirect so it lands in the log.
+- [ ] Update usage block.
+- [ ] `bash -n data-raw/province_run.sh` syntax-clean.
+- [ ] `/code-check` clean.
+- [ ] Commit "province_run.sh: --wsgs / --config / --schema / --no-cyphers / --force passthrough"
+
+## Phase 3 — Integration test (M4+M1, 16-WSG default-bundle, pre-rename)
+
+- [ ] Pre-flight: M4 has bcfp tunnel up, M1 ssh reachable, `fresh.modelled_stream_crossings` present on both hosts.
+- [ ] Wipe state via `bash data-raw/province_clean.sh --skip-cy`.
+- [ ] Run autonomous:
+  ```bash
+  bash data-raw/province_run.sh \
+    --wsgs=CARP,CRKD,FINA,FINL,FIRE,FOXR,INGR,LOMI,MESI,NATR,OSPK,PARA,PARS,PCEA,TOOD,UOMI \
+    --config=default --schema=fresh_default --no-cyphers --with-mapping-code
+  ```
+- [ ] Acceptance: exit code 0, ~30–40 min wall, no operator prompts mid-run.
+- [ ] Verify `SELECT count(DISTINCT watershed_group_code) FROM fresh_default.streams` = 16 on M4.
+- [ ] Verify `fresh_default.streams_habitat_*` populated for each species in the bundle.
+- [ ] Verify annotated CSV written with all 16 WSGs.
+- [ ] Document outcome in `progress.md` with wall time + per-host breakdown.
+
+## Phase 4 — Rename 8 scripts + update all live references
+
+- [ ] `git mv data-raw/province_run.sh data-raw/wsgs_run_pipeline.sh`
+- [ ] `git mv data-raw/province_clean.sh data-raw/state_clean.sh`
+- [ ] `git mv data-raw/province_progress.sh data-raw/progress_check.sh`
+- [ ] `git mv data-raw/trifecta_provincial.sh data-raw/wsgs_dispatch.sh`
+- [ ] `git mv data-raw/run_provincial_parity.R data-raw/wsgs_run_host.R`
+- [ ] `git mv data-raw/consolidate_schema.R data-raw/schema_consolidate.R`
+- [ ] `git mv data-raw/archive_provincial_runs.sh data-raw/runs_archive.sh`
+- [ ] `git mv data-raw/balance_provincial_buckets.R data-raw/buckets_balance.R`
+- [ ] Update internal references in each renamed file (self-name in usage block, `source()` calls between them, log filenames).
+- [ ] Update `data-raw/README.md` (~27 refs).
+- [ ] Update `research/provincial_run_runbook.md` (~12 refs).
+- [ ] Update `research/post_compact_provincial_handoff.md` (~8 refs).
+- [ ] Update `CLAUDE.md` (2 refs).
+- [ ] **Do NOT update** `planning/archive/**`, `NEWS.md` historical entries.
+- [ ] `bash -n` clean on all 4 renamed shell scripts.
+- [ ] `/code-check` clean.
+- [ ] Commit "Rename 8 operational scripts to noun_verb convention"
+
+## Phase 5 — Smoke after rename
+
+- [ ] 1-WSG smoke via the renamed umbrella:
+  ```bash
+  bash data-raw/wsgs_run_pipeline.sh --wsgs=DEAD --config=bcfishpass --no-cyphers
+  ```
+- [ ] Acceptance: exit code 0, DEAD lands in M4 `fresh.streams`.
+- [ ] `devtools::test()` passes.
+- [ ] `devtools::check()` — same warning baseline as v0.37.0 (no new warnings).
+
+## Phase 6 — Cross-repo rtj update
+
+- [ ] In `~/Projects/repo/rtj`, update `scripts/cypher/cypher_run.sh` reference from `run_provincial_parity.R` → `wsgs_run_host.R`.
+- [ ] `bash -n scripts/cypher/cypher_run.sh` clean.
+- [ ] Commit "scripts/cypher/cypher_run.sh: update for link wsgs_run_host.R rename" on rtj/main.
+- [ ] Push to origin/main.
+- [ ] Order: rtj commit lands **after** link's rename PR merges so cypher_run.sh never references a missing file on link/main.
+
+## Phase 7 — Release v0.38.0
+
+- [ ] Update `DESCRIPTION` Version 0.37.0 → 0.38.0.
+- [ ] Update `NEWS.md` with v0.38.0 entry covering CLI surface + 8 renames.
+- [ ] Update `CLAUDE.md` if any rename touches its references.
+- [ ] Commit "Release v0.38.0".
+- [ ] `/planning-archive` with slug `provincial-run-autonomy-renames`.
+- [ ] `/gh-pr-push` opens PR with SRED tag in body.
+- [ ] After merge: `/gh-pr-merge` handles tag + post-merge CI watch + rtj coordination.
+
+## Validation
+
+- [ ] Tests pass (`devtools::test()`)
+- [ ] `/code-check` clean on each commit
+- [ ] PWF checkboxes match landed work
+- [ ] `/planning-archive` on completion

From 7658ea919e790660c78118f4da970d845dc2d8a3 Mon Sep 17 00:00:00 2001
From: almac2022
Date: Thu, 14 May 2026 16:04:34 -0700
Subject: [PATCH 2/9] trifecta_provincial.sh: --wsgs filter, --no-cyphers mode, --force passthrough
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Three new CLI flags + two bug fixes layered onto the multi-host
dispatcher:

- --wsgs=A,B,C — restrict to a subset of the bundle's WSG list.
  Intersected with the presence-filtered set inside SPLIT_R; unknown
  WSGs error loud via stop(call.=FALSE).
- --no-cyphers — skip cypher hosts entirely. Wipes CY_WORKSPACES so
  N_CY=0; downstream cypher loops (wrap, launch, RDS pullback, R-log
  pullback) become no-ops naturally.
- --force — forward to every per-host Rscript invocation. Bypasses
  both the PG-state and RDS resume gates from #168.

Phantom-cy fix: R's paste0("cy", integer(0)) returns "cy" length-1
(constant recycling), which would put a non-existent cypher in
host_keys when n_cy=0. Three-branched cy_host_keys construction:
character(0) / "cy" / paste0(...). Empty CY_WORKSPACES init now
explicit CY_WS_ARR=() rather than read -r -a (which yields a
single-element-empty-string array).

Error-surface fix: SPLIT_OUT=$(Rscript ...) under set -e propagates
the inner exit code but discards the output. A typo like --wsgs=BOGUS
aborted bash with no operator-visible message. Wrapped with explicit
|| block that dumps SPLIT_OUT to stderr and exits with the R rc, so
the R-side stop() reaches the operator.

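The error-surface fix above can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code in trifecta_provincial.sh: `run_captured` is a hypothetical helper name, and any command can stand in for the real `Rscript "$SPLIT_R"` call.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a command, capture stdout+stderr, and on
# failure replay the captured text to stderr before returning the
# command's own exit code. Without the `|| { ... }` block, `set -e`
# would abort the script at the assignment and the captured output
# would never be shown.
run_captured() {
  local out rc
  out=$("$@" 2>&1) || {
    rc=$?
    echo "ERROR: command failed (rc=$rc):" >&2
    echo "$out" >&2
    return "$rc"
  }
  # Success path: hand the captured output back on stdout.
  printf '%s\n' "$out"
}

# Success case: output passes through unchanged.
run_captured echo "split ok"
```

On failure the caller sees both the message and the original exit code, which appears to be the property the commit relies on so that the R-side `stop()` text reaches the operator.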
Validated:
- bash -n clean
- isolated SPLIT_R run with --wsgs=DEAD,ADMS --no-cyphers produces
  all_wsgs=ADMS,DEAD, n_cy=0, host_keys=m4,m1, cy_host_keys length 0
- --wsgs=BOGUS,ADMS surfaces the R stop() message verbatim

Co-Authored-By: Claude Opus 4.7
---
 data-raw/trifecta_provincial.sh | 78 ++++++++++++++++++++++++++++++---
 planning/active/progress.md     |  3 +-
 planning/active/task_plan.md    | 18 ++++----
 3 files changed, 84 insertions(+), 15 deletions(-)

diff --git a/data-raw/trifecta_provincial.sh b/data-raw/trifecta_provincial.sh
index 20d185e1..37f19f9f 100755
--- a/data-raw/trifecta_provincial.sh
+++ b/data-raw/trifecta_provincial.sh
@@ -16,11 +16,19 @@
 #   ./trifecta_provincial.sh # 3-host default: M4 + M1 + 1 cypher
 #   ./trifecta_provincial.sh --with-mapping-code # per-WSG mapping_code lens
 #   ./trifecta_provincial.sh --cy-workspaces=job1,job2,job3 # 5-host: 3 cyphers
+#   ./trifecta_provincial.sh --wsgs=ADMS,BULK,DEAD --no-cyphers # M4+M1, 3 WSGs
+#   ./trifecta_provincial.sh --force --no-cyphers # force re-run, M4+M1 only
 #
 # CLI flags:
 #   --config= bundle (default: bcfishpass)
 #   --schema= override cfg$pipeline$schema
 #   --rds-dir= override per-bundle RDS dir
+#   --wsgs= restrict to a subset of the bundle's WSG list
+#       (intersected with bundle presence-filtered set;
+#       unknown WSGs error loud)
+#   --no-cyphers skip cypher hosts entirely; dispatch M4+M1 only
+#   --force forward --force to the per-host Rscript
+#       (bypasses both PG-state and RDS resume gates)
 #   --host-speeds= per-host speed factor vs M4 (default: m4=1.0,m1=0.83,cy=1.83).
 #       Higher = slower. Used in LPT bucket projection.
 #       Per-cypher overrides via --host-speeds=...,cy1=1.83,cy2=2.10
@@ -32,7 +40,8 @@
 #   --with-mapping-code pass through to run_provincial_parity.R
 #   --skip-preflight skip version-match check (debug only)
 #
-# Estimated wall: ~2 hours single-cypher, ~50-60 min 3-cypher.
+# Estimated wall: ~2 hours single-cypher, ~50-60 min 3-cypher,
+#   ~30-40 min M4+M1 only (16-WSG default-bundle smoke test).

 set -euo pipefail

@@ -59,12 +68,18 @@ HOST_SPEEDS="m4=1.0,m1=0.79,cy=1.23"
 declare -A CYN_BUCKETS=()
 WITH_MAPPING_CODE=""
 SKIP_PREFLIGHT=0
+WSGS_FILTER=""
+NO_CYPHERS=0
+FORCE_FLAG=""

 for arg in "$@"; do
   case "$arg" in
     --config=*) CONFIG="${arg#--config=}" ;;
     --schema=*) SCHEMA="${arg#--schema=}" ;;
     --rds-dir=*) RDS_DIR="${arg#--rds-dir=}" ;;
+    --wsgs=*) WSGS_FILTER="${arg#--wsgs=}" ;;
+    --no-cyphers) NO_CYPHERS=1 ;;
+    --force) FORCE_FLAG="--force" ;;
     --host-speeds=*) HOST_SPEEDS="${arg#--host-speeds=}" ;;
     --m4-bucket=*) M4_OVERRIDE="${arg#--m4-bucket=}" ;;
     --m1-bucket=*) M1_OVERRIDE="${arg#--m1-bucket=}" ;;
@@ -79,13 +94,28 @@ for arg in "$@"; do
   esac
 done

+# --no-cyphers wipes the cypher workspace list so the rest of the
+# script sees a zero-cypher plan. Bash array init from an empty string
+# via `read -r -a` produces a single-element array containing "" —
+# clear it explicitly to get a true empty list.
+if [ "$NO_CYPHERS" -eq 1 ]; then
+  CY_WORKSPACES=""
+fi
+
 EXTRA_ARGS="--config=$CONFIG"
 [ -n "$SCHEMA" ] && EXTRA_ARGS="$EXTRA_ARGS --schema=$SCHEMA"
 [ -n "$RDS_DIR" ] && EXTRA_ARGS="$EXTRA_ARGS --rds-dir=$RDS_DIR"
 [ -n "$WITH_MAPPING_CODE" ] && EXTRA_ARGS="$EXTRA_ARGS $WITH_MAPPING_CODE"
+[ -n "$FORCE_FLAG" ] && EXTRA_ARGS="$EXTRA_ARGS $FORCE_FLAG"

-# Parse cypher workspace list into array
-IFS=',' read -r -a CY_WS_ARR <<< "$CY_WORKSPACES"
+# Parse cypher workspace list into array. Empty CY_WORKSPACES (set by
+# --no-cyphers) yields N_CY=0; all `for ((i=0; i 0L) {
+    stop("--wsgs contains WSGs not in the bundle presence-filtered set: ",
+         paste(unknown, collapse = ", "), call. = FALSE)
+  }
+  all_wsgs <- sort(intersect(all_wsgs, requested))
+  cat("[LPT] --wsgs subset: ", length(all_wsgs), " of ",
+      length(requested), " requested kept\n", sep = "")
+}
+
 # Parse --host-speeds=m4=1.0,m1=0.83,cy=1.83 into a named numeric vector
 parse_speeds <- function(s) {
   pairs <- strsplit(s, ",", fixed = TRUE)[[1]]
@@ -154,10 +200,25 @@ if (!all(c("m4","m1","cy") %in% names(speeds))) {
 # Hosts in the plan: m4, m1, cy1..cyN_CY (each cypher workspace is its
 # own host). Per-cypher speed: take \`cyN\` from --host-speeds if present,
 # else fall back to the generic \`cy\` factor.
+#
+# Phantom-cy guard for the --no-cyphers / empty workspace list case:
+# R's \`paste0("cy", integer(0))\` returns \`"cy"\` (length 1) due to
+# constant recycling, which would put a non-existent cypher in
+# host_keys. Three-branch the construction to keep n_cy = 0 honest.
 cy_ws_csv <- "$CY_WORKSPACES"
-cy_ws <- strsplit(cy_ws_csv, ",", fixed = TRUE)[[1]]
+cy_ws <- if (nzchar(cy_ws_csv)) {
+  strsplit(cy_ws_csv, ",", fixed = TRUE)[[1]]
+} else {
+  character(0)
+}
 n_cy <- length(cy_ws)
-cy_host_keys <- if (n_cy == 1L) "cy" else paste0("cy", seq_len(n_cy))
+cy_host_keys <- if (n_cy == 0L) {
+  character(0)
+} else if (n_cy == 1L) {
+  "cy"
+} else {
+  paste0("cy", seq_len(n_cy))
+}
 host_keys <- c("m4", "m1", cy_host_keys)
 host_factor <- numeric(length(host_keys))
 names(host_factor) <- host_keys
@@ -295,7 +356,12 @@ for (h in host_keys) {
 }
 SPLIT_EOF

-SPLIT_OUT=$(Rscript "$SPLIT_R" 2>&1)
+SPLIT_OUT=$(Rscript "$SPLIT_R" 2>&1) || {
+  rc=$?
+  echo "ERROR: split/LPT R script failed (rc=$rc):" >&2
+  echo "$SPLIT_OUT" >&2
+  exit "$rc"
+}
 echo "$SPLIT_OUT" | grep -E "^\[LPT\]" || true

 M4_WSGS=$(echo "$SPLIT_OUT" | awk -F'=' '$1=="M4" {print $2}')
diff --git a/planning/active/progress.md b/planning/active/progress.md
index db3452da..407e4303 100644
--- a/planning/active/progress.md
+++ b/planning/active/progress.md
@@ -8,4 +8,5 @@
 - rtj cross-repo update authorized (direct commit + push).
 - Created branch `172-provincial-run-autonomy-renames` off main (v0.37.0 baseline).
 - Scaffolded PWF baseline (`task_plan.md`, `findings.md`, `progress.md`) with 7 approved phases.
-- Next: start Phase 1 — patch `trifecta_provincial.sh` for `--wsgs`, `--no-cyphers`, `--force`, plus phantom-cy fix.
+- **Phase 1 done.** Added `--wsgs=`, `--no-cyphers`, `--force` to `trifecta_provincial.sh`. Fixed phantom-cy (R's `paste0("cy", integer(0))` → `"cy"` recycling bug) via 3-branch `cy_host_keys`. Hardened empty-`CY_WORKSPACES` init. `/code-check` round 1 caught a silent-abort bug (R `stop()` exits bash without operator-visible message under `SPLIT_OUT=$(...)`); fixed with explicit `||` block dumping SPLIT_OUT to stderr. Round 2 clean. SPLIT_R logic verified via isolated R run.
+- Next: Phase 2 — propagate the same flags through `province_run.sh` umbrella.
diff --git a/planning/active/task_plan.md b/planning/active/task_plan.md
index 92371f80..71255712 100644
--- a/planning/active/task_plan.md
+++ b/planning/active/task_plan.md
@@ -6,14 +6,16 @@ After #168 shipped (v0.37.0, PG-state resume), this PR adds the CLI surface for

 Patch the original filename first so the smoke (Phase 3) validates on the known-good name. Rename in Phase 4.

-- [ ] Add `--wsgs=A,B,C` arg parse. In SPLIT_R block (~line 135), intersect `all_wsgs` with the `--wsgs` list when provided; error loud on unknown WSGs.
-- [ ] Add `--no-cyphers` arg parse. When set: force `N_CY=0`, skip cypher wrap generation (lines 449-471), skip cypher subprocess launch (lines 523-536), skip cypher RDS pullback, skip cypher R log pullback.
-- [ ] Add `--force` arg parse and forward to `Rscript run_provincial_parity.R ... --force`.
-- [ ] Fix phantom-cy bug for `n_cy = 0`: `paste0("cy", integer(0))` returns `"cy"` (length-1 due to constant recycling); use explicit `if (n_cy == 0L) character(0)` branch.
-- [ ] Harden empty `CY_WORKSPACES` / `N_CY` init for the cy-less path.
-- [ ] Update usage block.
-- [ ] `bash -n data-raw/trifecta_provincial.sh` syntax-clean.
-- [ ] `/code-check` clean on staged diff.
+- [x] Add `--wsgs=A,B,C` arg parse. SPLIT_R block intersects `all_wsgs` with the `--wsgs` list; errors loud on unknown WSGs via `stop(call. = FALSE)`. Verified end-to-end with `--wsgs=BOGUS,ADMS`.
+- [x] Add `--no-cyphers` arg parse. Wipes `CY_WORKSPACES=""` → empty `CY_WS_ARR` → `N_CY=0`; all `for ((i=0; i
Date: Thu, 14 May 2026 16:09:48 -0700
Subject: [PATCH 3/9] province_run.sh: --wsgs / --config / --schema / --no-cyphers / --force passthrough
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Five new CLI flags layered onto the 10-step provincial wrapper, plus
config-aware paths so the script works with any bundle / WSG subset /
host count.

- --wsgs=A,B,C  restrict to a subset of the bundle WSG list
- --config=     bundle (default: bcfishpass)
- --schema=     override cfg$pipeline$schema for the output
- --no-cyphers  M4+M1 only — skip cypher spin/prep/archive/burn
- --force       forward to per-host Rscript (bypass resume gates)

Build DISPATCH_FLAGS for trifecta_provincial.sh passthrough.

Step gating under --no-cyphers:
- Step 3 (cypher_up.sh)     skipped
- Step 4 (cypher_prep.sh)   skipped
- Step 5 cypher archives    skipped (M4+M1 always archive)
- Step 7 invocation         omits --cy-workspaces=...
- Step 9 consolidate        sources M1-only variant
- Step 10 trap-EXIT burn    no-op (CYPHERS_UP stays 0)

Step 8 ANN_CSV path is now config-aware: provincial_parity/ for
bcfishpass (back-compat), provincial_/ otherwise.

Step 9 target-schema resolution: explicit --schema wins, else
lnk_config(CONFIG_NAME)$pipeline$schema lookup with stop() inside the
Rscript heredoc plus post-lookup empty/NULL guards. Replaces a silent
"fresh" fallback that masked misconfigured --config= flags (round-1
code-check fix).

Auto-skip-smoke when --no-cyphers OR --wsgs is set — the smoke
harness assumes 3 cyphers + fixed per-host WSGs, both break under the
new subset modes.

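The two small decisions described above (config-aware log directory, auto-skip of the smoke step) can be sketched as standalone functions. This is a hedged illustration only: `rds_dir_name` and `should_skip_smoke` are hypothetical names chosen for the sketch, while the naming rule and the skip condition are taken from the description in this message.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: map a bundle name to its log/RDS directory.
# bcfishpass keeps the legacy "provincial_parity" name; any other
# bundle gets "provincial_" followed by the bundle name.
rds_dir_name() {
  local config_name="$1"
  if [ "$config_name" = "bcfishpass" ]; then
    echo "provincial_parity"
  else
    echo "provincial_${config_name}"
  fi
}

# Hypothetical helper: succeed (exit 0) when the smoke step should
# not run, i.e. it was skipped explicitly, or --no-cyphers / --wsgs
# invalidate the smoke harness assumptions.
should_skip_smoke() {
  local skip_smoke="$1" no_cyphers="$2" wsgs_filter="$3"
  [ "$skip_smoke" = "1" ] || [ "$no_cyphers" = "1" ] || [ -n "$wsgs_filter" ]
}

# Example: a 2-WSG subset on the default bundle.
echo "logs under: data-raw/logs/$(rds_dir_name default)"
if should_skip_smoke 0 0 "ADMS,DEAD"; then
  echo "smoke: skipped"
fi
```

Keeping both rules in small predicates like these would make them easy to check in isolation with `bash -n` plus a few assertions; whether the real script factors them this way is not stated in the source.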
Validated:
- bash -n clean
- TARGET_SCHEMA lookup: bcfishpass→"fresh", default→"fresh", BOGUS
  errors loud with bundle-path hint.
- /code-check round 2 clean.

Co-Authored-By: Claude Opus 4.7
---
 data-raw/province_run.sh     | 255 ++++++++++++++++++++++++++---------
 planning/active/progress.md  |   3 +-
 planning/active/task_plan.md |  20 +--
 3 files changed, 202 insertions(+), 76 deletions(-)

diff --git a/data-raw/province_run.sh b/data-raw/province_run.sh
index 0d55da6d..fea194c8 100755
--- a/data-raw/province_run.sh
+++ b/data-raw/province_run.sh
@@ -19,9 +19,22 @@
 # only attempts burn when there's something to burn.
 #
 # Usage:
-#   bash data-raw/province_run.sh [--skip-smoke] [--no-mapping-code] [--keep-cyphers]
+#   bash data-raw/province_run.sh [flags]
 #
-# Total wall: ~95-110 min Cypher cost: ~$1-2
+# Flags:
+#   --wsgs=A,B,C restrict to a WSG subset (full bundle if omitted)
+#   --config= bundle name (default: bcfishpass)
+#   --schema= override cfg$pipeline$schema (default: bundle default)
+#   --no-cyphers M4+M1 only — skip cypher spin/prep/burn entirely
+#   --force forward --force to per-host Rscript (bypass resume gates)
+#   --skip-smoke skip the smoke pre-check
+#   --no-mapping-code drop the mapping_code lens
+#   --keep-cyphers don't burn cyphers on exit (debug)
+#
+# Total wall:
+#   ~95-110 min for full provincial (3 cyphers)
+#   ~30-40 min for --wsgs=<16-WSG-set> --no-cyphers (M4+M1 only)
+# Cypher cost: ~$1-2 per full provincial; $0 with --no-cyphers.

 set -euo pipefail

@@ -29,11 +42,21 @@ set -euo pipefail
 SKIP_SMOKE=0
 NO_MAPPING=0
 KEEP_CYPHERS=0
+WSGS_FILTER=""
+CONFIG_NAME="bcfishpass"
+SCHEMA=""
+NO_CYPHERS=0
+FORCE_FLAG=""
 for arg in "$@"; do
   case "$arg" in
     --skip-smoke) SKIP_SMOKE=1 ;;
     --no-mapping-code) NO_MAPPING=1 ;;
     --keep-cyphers) KEEP_CYPHERS=1 ;;
+    --wsgs=*) WSGS_FILTER="${arg#--wsgs=}" ;;
+    --config=*) CONFIG_NAME="${arg#--config=}" ;;
+    --schema=*) SCHEMA="${arg#--schema=}" ;;
+    --no-cyphers) NO_CYPHERS=1 ;;
+    --force) FORCE_FLAG="--force" ;;
     *) echo "FATAL: unknown arg: $arg" >&2; exit 1 ;;
   esac
 done
@@ -41,6 +64,14 @@ done
 MAPPING_FLAG="--with-mapping-code"
 [ "$NO_MAPPING" = "1" ] && MAPPING_FLAG=""

+# Build the passthrough flag string for trifecta_provincial.sh + trifecta_smoke.sh.
+DISPATCH_FLAGS=""
+[ -n "$WSGS_FILTER" ] && DISPATCH_FLAGS="$DISPATCH_FLAGS --wsgs=$WSGS_FILTER"
+[ -n "$CONFIG_NAME" ] && DISPATCH_FLAGS="$DISPATCH_FLAGS --config=$CONFIG_NAME"
+[ -n "$SCHEMA" ] && DISPATCH_FLAGS="$DISPATCH_FLAGS --schema=$SCHEMA"
+[ "$NO_CYPHERS" = "1" ] && DISPATCH_FLAGS="$DISPATCH_FLAGS --no-cyphers"
+[ -n "$FORCE_FLAG" ] && DISPATCH_FLAGS="$DISPATCH_FLAGS $FORCE_FLAG"
+
 REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
 cd "$REPO_ROOT"
 TS="$(date -u +%Y%m%d_%H%M%S)"
@@ -52,10 +83,26 @@ exec > >(tee -a "$LOG") 2>&1
 START_EPOCH=$(date +%s)
 echo "=== province_run.sh $TS ==="
 echo " log: $LOG"
+echo " config: $CONFIG_NAME"
+[ -n "$SCHEMA" ] && echo " schema: $SCHEMA"
+[ -n "$WSGS_FILTER" ] && echo " wsgs: $WSGS_FILTER"
+echo " no-cyphers: $([ "$NO_CYPHERS" = "0" ] && echo no || echo YES)"
+echo " force: $([ -n "$FORCE_FLAG" ] && echo YES || echo no)"
 echo " mapping: $([ "$NO_MAPPING" = "0" ] && echo with || echo without)"
 echo " smoke: $([ "$SKIP_SMOKE" = "0" ] && echo on || echo SKIPPED)"
 echo " keep-cy: $([ "$KEEP_CYPHERS" = "0" ] && echo no || echo YES)"

+# Auto-skip smoke when the smoke harness's preconditions are not met.
+# trifecta_smoke.sh assumes 3 cypher workspaces (job1/job2/job3) and a
+# fixed per-host WSG triplet; both break under --no-cyphers or --wsgs.
+# Setting SKIP_SMOKE here (after the log redirect) keeps the notice in
+# the log for post-hoc inspection.
+if [ "$SKIP_SMOKE" = "0" ] && { [ "$NO_CYPHERS" = "1" ] || [ -n "$WSGS_FILTER" ]; }; then
+  echo "[auto-skip-smoke] --no-cyphers or --wsgs is set; trifecta_smoke.sh"
+  echo " assumptions don't hold — skipping Step 6."
+  SKIP_SMOKE=1
+fi
+
 # --- trap: burn cyphers on exit, but only if we ever spun them ---
 CYPHERS_UP=0
 burn_cyphers() {
@@ -123,52 +170,58 @@ wait $M4_PID || { echo "FATAL: M4 snapshot failed; see $LOG_DIR/${TS}_snapshot_m
 wait $M1_PID || { echo "FATAL: M1 snapshot failed; see $LOG_DIR/${TS}_snapshot_m1.log"; exit 1; }
 echo " ✓ snapshots done"

-# --- Step 3: spin 3 cyphers (parallel) ---
-echo "=== Step 3: cypher_up.sh job1/job2/job3 ==="
-cd ~/Projects/repo/rtj/scripts/cypher
-for WS in job1 job2 job3; do
-  ./cypher_up.sh --workspace "$WS" > "$LOG_DIR/${TS}_up_$WS.log" 2>&1 &
-done
-wait
-cd "$REPO_ROOT"
+# --- Step 3: spin 3 cyphers (parallel) — skipped under --no-cyphers ---
 declare -A CY_IP
-for WS in job1 job2 job3; do
-  IP=$(cd ~/Projects/repo/rtj/env/do/dev/cypher && TF_WORKSPACE="$WS" tofu output -raw droplet_ip 2>/dev/null) || {
-    echo "FATAL: tofu output droplet_ip failed for $WS; see $LOG_DIR/${TS}_up_$WS.log"
-    exit 1
-  }
-  [ -n "$IP" ] || { echo "FATAL: empty droplet_ip for $WS"; exit 1; }
-  CY_IP[$WS]="$IP"
-  echo " cy[$WS] = $IP"
-done
-CYPHERS_UP=1 # trap EXIT will now attempt burn
+if [ "$NO_CYPHERS" = "0" ]; then
+  echo "=== Step 3: cypher_up.sh job1/job2/job3 ==="
+  cd ~/Projects/repo/rtj/scripts/cypher
+  for WS in job1 job2 job3; do
+    ./cypher_up.sh --workspace "$WS" > "$LOG_DIR/${TS}_up_$WS.log" 2>&1 &
+  done
+  wait
+  cd "$REPO_ROOT"
+  for WS in job1 job2 job3; do
+    IP=$(cd ~/Projects/repo/rtj/env/do/dev/cypher && TF_WORKSPACE="$WS" tofu output -raw droplet_ip 2>/dev/null) || {
+      echo "FATAL: tofu output droplet_ip failed for $WS; see $LOG_DIR/${TS}_up_$WS.log"
+      exit 1
+    }
+    [ -n "$IP" ] || { echo "FATAL: empty droplet_ip for $WS"; exit 1; }
+    CY_IP[$WS]="$IP"
+    echo " cy[$WS] = $IP"
+  done
+  CYPHERS_UP=1 # trap EXIT will now attempt burn

-# --- Step 4: per-cypher prep (parallel) ---
-echo "=== Step 4: cypher_prep.sh on all 3 cyphers ==="
-for WS in job1 job2 job3; do
-  IP="${CY_IP[$WS]}"
-  ( scp -q data-raw/cypher_prep.sh "cypher@$IP:/tmp/cypher_prep.sh" && \
-    ssh "cypher@$IP" "bash /tmp/cypher_prep.sh" ) > "$LOG_DIR/${TS}_prep_$WS.log" 2>&1 &
-done
-wait
-for WS in job1 job2 job3; do
-  if ! grep -q "snapshot_bcfp.sh: complete" "$LOG_DIR/${TS}_prep_$WS.log" 2>/dev/null; then
-    echo "FATAL: cypher[$WS] prep failed; see $LOG_DIR/${TS}_prep_$WS.log"
-    exit 1
-  fi
-done
-echo " ✓ cyphers prepped"
+  # --- Step 4: per-cypher prep (parallel) ---
+  echo "=== Step 4: cypher_prep.sh on all 3 cyphers ==="
+  for WS in job1 job2 job3; do
+    IP="${CY_IP[$WS]}"
+    ( scp -q data-raw/cypher_prep.sh "cypher@$IP:/tmp/cypher_prep.sh" && \
+      ssh "cypher@$IP" "bash /tmp/cypher_prep.sh" ) > "$LOG_DIR/${TS}_prep_$WS.log" 2>&1 &
+  done
+  wait
+  for WS in job1 job2 job3; do
+    if ! grep -q "snapshot_bcfp.sh: complete" "$LOG_DIR/${TS}_prep_$WS.log" 2>/dev/null; then
+      echo "FATAL: cypher[$WS] prep failed; see $LOG_DIR/${TS}_prep_$WS.log"
+      exit 1
+    fi
+  done
+  echo " ✓ cyphers prepped"
+else
+  echo "=== Step 3+4: SKIPPED (--no-cyphers) ==="
+fi

-# --- Step 5: archive prior RDS on all 5 hosts (parallel) ---
+# --- Step 5: archive prior RDS — M4+M1 always, cyphers only when up ---
 echo "=== Step 5: archive_provincial_runs.sh on all hosts ==="
 bash data-raw/archive_provincial_runs.sh > "$LOG_DIR/${TS}_archive_m4.log" 2>&1 &
 ssh m1 'cd ~/Projects/repo/link/data-raw && ./archive_provincial_runs.sh' \
   > "$LOG_DIR/${TS}_archive_m1.log" 2>&1 &
-for WS in job1 job2 job3; do
-  IP="${CY_IP[$WS]}"
-  ssh "cypher@$IP" 'cd ~/Projects/repo/link/data-raw && ./archive_provincial_runs.sh' \
-    > "$LOG_DIR/${TS}_archive_$WS.log" 2>&1 &
-done
+if [ "$NO_CYPHERS" = "0" ]; then
+  for WS in job1 job2 job3; do
+    IP="${CY_IP[$WS]}"
+    ssh "cypher@$IP" 'cd ~/Projects/repo/link/data-raw && ./archive_provincial_runs.sh' \
+      > "$LOG_DIR/${TS}_archive_$WS.log" 2>&1 &
+  done
+fi
 wait
 echo " ✓ archived"

@@ -187,9 +240,19 @@ if [ "$SKIP_SMOKE" = "0" ]; then
 fi

 # --- Step 7: FULL DISPATCH ---
-echo "=== Step 7: full provincial dispatch (~80-95 min wall) ==="
+# When --no-cyphers OR --wsgs is set, omit --cy-workspaces so
+# trifecta_provincial.sh runs with the M4+M1-only plan it derives
+# from DISPATCH_FLAGS (which includes --no-cyphers, --wsgs, etc.).
+if [ "$NO_CYPHERS" = "0" ] && [ -z "$WSGS_FILTER" ]; then
+  TRIFECTA_CY_ARG="--cy-workspaces=job1,job2,job3"
+  echo "=== Step 7: full provincial dispatch (~80-95 min wall) ==="
+else
+  TRIFECTA_CY_ARG=""
+  echo "=== Step 7: subset dispatch — see DISPATCH_FLAGS below ==="
+fi
+echo " DISPATCH_FLAGS=$DISPATCH_FLAGS"
 cd "$REPO_ROOT/data-raw"
-if ! bash trifecta_provincial.sh --cy-workspaces=job1,job2,job3 $MAPPING_FLAG \
+if ! bash trifecta_provincial.sh $TRIFECTA_CY_ARG $DISPATCH_FLAGS $MAPPING_FLAG \
     > "$LOG_DIR/${TS}_full.log" 2>&1; then
   echo "WARNING: trifecta_provincial.sh exited non-zero; partial result may exist"
   # don't exit — let acceptance + consolidate inspect what landed
@@ -199,8 +262,15 @@ tail -15 "$LOG_DIR/${TS}_full.log"
 cd "$REPO_ROOT"

 # --- Step 8: acceptance bar ---
+# RDS dir is config-aware: bcfishpass → provincial_parity (legacy
+# name, kept for back-compat); any other bundle → provincial_.
 echo "=== Step 8: acceptance bar ==="
-ANN_CSV=$(ls -1t data-raw/logs/provincial_parity/*_annotated.csv 2>/dev/null | head -1 || true)
+if [ "$CONFIG_NAME" = "bcfishpass" ]; then
+  RDS_DIR_NAME="provincial_parity"
+else
+  RDS_DIR_NAME="provincial_${CONFIG_NAME}"
+fi
+ANN_CSV=$(ls -1t "data-raw/logs/$RDS_DIR_NAME"/*_annotated.csv 2>/dev/null | head -1 || true)
 if [ -z "$ANN_CSV" ]; then
   echo " ✗ no annotated.csv found — dispatch likely failed before annotation"
   exit 1
@@ -215,44 +285,97 @@ if [ "$N_UNEXP" -gt 0 ]; then
   echo " WARNING: $N_UNEXP UNEXPLAINED rows — surface to user; consolidate still proceeds"
 fi

-# --- Step 9: consolidate fresh schema → M4 ---
-echo "=== Step 9: consolidate fresh schema ==="
+# --- Step 9: consolidate target schema → M4 ---
+# Target schema: --schema= if provided, else cfg$pipeline$schema for
+# the bundle (best-effort lookup via Rscript). Sources list is built
+# dynamically — M1 always present; cyphers only when --no-cyphers
+# wasn't set.
+echo "=== Step 9: consolidate target schema ===" ORCH_LOG=$(ls -1t data-raw/logs/*_trifecta_provincial_orchestrator.txt 2>/dev/null | head -1 || true) if [ -z "$ORCH_LOG" ]; then echo " ✗ no orchestrator log found — cannot extract per-host buckets" exit 1 fi M1_BUCKET=$(grep '^ m1 bucket:' "$ORCH_LOG" | sed 's/.*bucket: //' || true) -CY1_BUCKET=$(grep '^ cypher\[job1\] bucket:' "$ORCH_LOG" | sed 's/.*bucket: //' || true) -CY2_BUCKET=$(grep '^ cypher\[job2\] bucket:' "$ORCH_LOG" | sed 's/.*bucket: //' || true) -CY3_BUCKET=$(grep '^ cypher\[job3\] bucket:' "$ORCH_LOG" | sed 's/.*bucket: //' || true) -if [ -z "$M1_BUCKET" ] || [ -z "$CY1_BUCKET" ] || [ -z "$CY2_BUCKET" ] || [ -z "$CY3_BUCKET" ]; then - echo " ✗ failed to extract one or more buckets from $ORCH_LOG" - echo " m1=$M1_BUCKET cy1=$CY1_BUCKET cy2=$CY2_BUCKET cy3=$CY3_BUCKET" +if [ -z "$M1_BUCKET" ]; then + echo " ✗ failed to extract m1 bucket from $ORCH_LOG" exit 1 fi +if [ "$NO_CYPHERS" = "0" ]; then + CY1_BUCKET=$(grep '^ cypher\[job1\] bucket:' "$ORCH_LOG" | sed 's/.*bucket: //' || true) + CY2_BUCKET=$(grep '^ cypher\[job2\] bucket:' "$ORCH_LOG" | sed 's/.*bucket: //' || true) + CY3_BUCKET=$(grep '^ cypher\[job3\] bucket:' "$ORCH_LOG" | sed 's/.*bucket: //' || true) + if [ -z "$CY1_BUCKET" ] || [ -z "$CY2_BUCKET" ] || [ -z "$CY3_BUCKET" ]; then + echo " ✗ failed to extract cypher buckets from $ORCH_LOG" + echo " cy1=$CY1_BUCKET cy2=$CY2_BUCKET cy3=$CY3_BUCKET" + exit 1 + fi +fi + +# Resolve target schema name: explicit --schema wins, else look up +# cfg$pipeline$schema for the bundle. Explicit guards rather than a +# silent "fresh" fallback so a misconfigured --config= surfaces loud. 
+if [ -n "$SCHEMA" ]; then + TARGET_SCHEMA="$SCHEMA" +else + TARGET_SCHEMA=$(Rscript -e " + cfg <- link::lnk_config('$CONFIG_NAME') + s <- cfg\$pipeline\$schema + if (is.null(s) || !nzchar(s)) stop('cfg\$pipeline\$schema missing for bundle \"$CONFIG_NAME\"') + cat(s) + ") || { + echo " ✗ failed to resolve target schema for --config=$CONFIG_NAME" >&2 + echo " (lnk_config may be missing the bundle, or cfg\$pipeline\$schema is unset)" >&2 + exit 1 + } + if [ -z "$TARGET_SCHEMA" ] || [ "$TARGET_SCHEMA" = "NULL" ]; then + echo " ✗ lnk_config('$CONFIG_NAME')\$pipeline\$schema returned empty/NULL" >&2 + exit 1 + fi +fi +echo " target schema: $TARGET_SCHEMA" cd "$REPO_ROOT/data-raw" -M1_BUCKET="$M1_BUCKET" CY1_BUCKET="$CY1_BUCKET" CY2_BUCKET="$CY2_BUCKET" CY3_BUCKET="$CY3_BUCKET" \ -CY1_IP="${CY_IP[job1]}" CY2_IP="${CY_IP[job2]}" CY3_IP="${CY_IP[job3]}" \ -Rscript -e ' +if [ "$NO_CYPHERS" = "0" ]; then + SOURCES_R="list( + list(host = 'm1', via = 'docker', bucket = strsplit(Sys.getenv('M1_BUCKET'), ',')[[1]]), + list(host = paste0('cypher@', Sys.getenv('CY1_IP')), via = 'docker', bucket = strsplit(Sys.getenv('CY1_BUCKET'), ',')[[1]]), + list(host = paste0('cypher@', Sys.getenv('CY2_IP')), via = 'docker', bucket = strsplit(Sys.getenv('CY2_BUCKET'), ',')[[1]]), + list(host = paste0('cypher@', Sys.getenv('CY3_IP')), via = 'docker', bucket = strsplit(Sys.getenv('CY3_BUCKET'), ',')[[1]]) + )" + M1_BUCKET="$M1_BUCKET" CY1_BUCKET="$CY1_BUCKET" CY2_BUCKET="$CY2_BUCKET" CY3_BUCKET="$CY3_BUCKET" \ + CY1_IP="${CY_IP[job1]}" CY2_IP="${CY_IP[job2]}" CY3_IP="${CY_IP[job3]}" \ + TARGET_SCHEMA="$TARGET_SCHEMA" SOURCES_R="$SOURCES_R" \ + Rscript -e ' +suppressPackageStartupMessages({library(link)}) +source("consolidate_schema.R") +sources <- eval(parse(text = Sys.getenv("SOURCES_R"))) +result <- consolidate_schema(schema = Sys.getenv("TARGET_SCHEMA"), + sources = sources, backup = TRUE) +print(result) +saveRDS(result, "/tmp/consolidate_result.rds") +' > "$LOG_DIR/${TS}_consolidate.log" 
2>&1 || { + echo " ✗ consolidate_schema.R failed; see $LOG_DIR/${TS}_consolidate.log" + exit 1 + } +else + # M1-only consolidate (no cyphers). + M1_BUCKET="$M1_BUCKET" TARGET_SCHEMA="$TARGET_SCHEMA" \ + Rscript -e ' suppressPackageStartupMessages({library(link)}) source("consolidate_schema.R") result <- consolidate_schema( - schema = "fresh", - sources = list( - list(host = "m1", via = "docker", bucket = strsplit(Sys.getenv("M1_BUCKET"), ",")[[1]]), - list(host = paste0("cypher@", Sys.getenv("CY1_IP")), via = "docker", bucket = strsplit(Sys.getenv("CY1_BUCKET"), ",")[[1]]), - list(host = paste0("cypher@", Sys.getenv("CY2_IP")), via = "docker", bucket = strsplit(Sys.getenv("CY2_BUCKET"), ",")[[1]]), - list(host = paste0("cypher@", Sys.getenv("CY3_IP")), via = "docker", bucket = strsplit(Sys.getenv("CY3_BUCKET"), ",")[[1]]) - ), - backup = TRUE) + schema = Sys.getenv("TARGET_SCHEMA"), + sources = list(list(host = "m1", via = "docker", + bucket = strsplit(Sys.getenv("M1_BUCKET"), ",")[[1]])), + backup = TRUE) print(result) saveRDS(result, "/tmp/consolidate_result.rds") ' > "$LOG_DIR/${TS}_consolidate.log" 2>&1 || { - echo " ✗ consolidate_schema.R failed; see $LOG_DIR/${TS}_consolidate.log" - exit 1 -} + echo " ✗ consolidate_schema.R failed; see $LOG_DIR/${TS}_consolidate.log" + exit 1 + } +fi echo " ✓ consolidated (see $LOG_DIR/${TS}_consolidate.log)" cd "$REPO_ROOT" diff --git a/planning/active/progress.md b/planning/active/progress.md index 407e4303..2b800b40 100644 --- a/planning/active/progress.md +++ b/planning/active/progress.md @@ -9,4 +9,5 @@ - Created branch `172-provincial-run-autonomy-renames` off main (v0.37.0 baseline). - Scaffolded PWF baseline (`task_plan.md`, `findings.md`, `progress.md`) with 7 approved phases. - **Phase 1 done.** Added `--wsgs=`, `--no-cyphers`, `--force` to `trifecta_provincial.sh`. Fixed phantom-cy (R's `paste0("cy", integer(0))` → `"cy"` recycling bug) via 3-branch `cy_host_keys`. Hardened empty-`CY_WORKSPACES` init. 
`/code-check` round 1 caught a silent-abort bug (R `stop()` exits bash without operator-visible message under `SPLIT_OUT=$(...)`); fixed with explicit `||` block dumping SPLIT_OUT to stderr. Round 2 clean. SPLIT_R logic verified via isolated R run. -- Next: Phase 2 — propagate the same flags through `province_run.sh` umbrella. +- **Phase 2 done.** Added 5 new flags to `province_run.sh`. Gated Step 3+4 (cypher spin/prep), Step 5 cypher archive, Step 9 cypher-source consolidate, and trap-EXIT burn behind `if NO_CYPHERS=0`. Step 7 omits `--cy-workspaces=...` under `--no-cyphers` or `--wsgs`. Step 8 ANN_CSV path is config-aware. Auto-skip-smoke fires when smoke assumptions don't hold. `/code-check` round 1 caught a silent TARGET_SCHEMA fallback bug (masked misconfigured `--config=` with hardcoded "fresh"); fixed with explicit guards. Round 2 clean. +- Next: Phase 3 — M4+M1 16-WSG integration test on the patched-but-not-yet-renamed scripts. diff --git a/planning/active/task_plan.md b/planning/active/task_plan.md index 71255712..a6696962 100644 --- a/planning/active/task_plan.md +++ b/planning/active/task_plan.md @@ -20,15 +20,17 @@ Patch the original filename first so the smoke (Phase 3) validates on the known- ## Phase 2 — CLI surface on `province_run.sh` -- [ ] Add `--wsgs=`, `--config=`, `--schema=`, `--no-cyphers`, `--force` to arg parser. -- [ ] Defaults: `--config=bcfishpass`, `--schema=""` (use bundle default), `--wsgs=""` (full bundle). -- [ ] Forward new flags to `trifecta_provincial.sh` invocation. -- [ ] When `--no-cyphers`: skip Step 3 (cypher spin), Step 4 (cypher prep), Step 5 cypher iterations (M4+M1 archive only), Step 9 cypher consolidate sources (M1 only), Step 10 cypher burn (trap-EXIT no-op). -- [ ] Step 8 ANN_CSV path: derive from `CONFIG_NAME` so non-bcfishpass bundles get `provincial_/` not hardcoded `provincial_parity/`. -- [ ] Auto-skip-smoke when `--no-cyphers` OR `--wsgs` is set. 
Place notice AFTER the `exec > >(tee -a "$LOG")` redirect so it lands in the log. -- [ ] Update usage block. -- [ ] `bash -n data-raw/province_run.sh` syntax-clean. -- [ ] `/code-check` clean. +- [x] Add `--wsgs=`, `--config=`, `--schema=`, `--no-cyphers`, `--force` to arg parser. Defaults: `bcfishpass`, empty schema, no filter, cyphers on, no force. +- [x] Build `DISPATCH_FLAGS` for passthrough; forwarded to `trifecta_provincial.sh` invocation in Step 7. +- [x] Gate Step 3 (cypher spin) + Step 4 (cypher prep) behind `if NO_CYPHERS=0`. Step 5 only iterates cypher archive when `NO_CYPHERS=0` (M4+M1 always archive). `CYPHERS_UP=1` only sets inside the cypher branch, so trap-EXIT burn correctly no-ops under `--no-cyphers`. +- [x] Step 7 omits `--cy-workspaces=...` when `--no-cyphers` or `--wsgs` is set (trifecta_provincial.sh derives the M4+M1-only plan from DISPATCH_FLAGS). +- [x] Step 8 ANN_CSV path config-aware: `provincial_parity/` (bcfishpass back-compat) vs `provincial_/`. +- [x] Step 9 consolidate: split into multi-host (M1+cy1+cy2+cy3) vs M1-only branches. Target schema resolved via `--schema` first, else `lnk_config(CONFIG_NAME)$pipeline$schema` lookup with explicit error/empty/NULL guards (round-1 code-check fix; round 1 had silent fallback masking misconfigured `--config=`). +- [x] Auto-skip-smoke when `--no-cyphers` OR `--wsgs` is set. Notice placed AFTER `exec > >(tee -a "$LOG")` redirect so it lands in the log. +- [x] Update usage block. +- [x] `bash -n data-raw/province_run.sh` syntax-clean. +- [x] Empirically verified TARGET_SCHEMA lookup: bcfishpass → "fresh", default → "fresh" (operator must `--schema=fresh_default` for default-bundle isolation), BOGUS → errors loud. +- [x] `/code-check` round 1: 1 real bug (TARGET_SCHEMA fallback) fixed. Round 2 clean. 
- [ ] Commit "province_run.sh: --wsgs / --config / --schema / --no-cyphers / --force passthrough" ## Phase 3 — Integration test (M4+M1, 16-WSG default-bundle, pre-rename) From 89da2843d479d595648ce2e88e6b6d2b57b31fc6 Mon Sep 17 00:00:00 2001 From: almac2022 Date: Thu, 14 May 2026 16:57:03 -0700 Subject: [PATCH 4/9] state_clean.sh + province_run.sh: scoped --schemas= mode + Step 0 pre-clean MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When the umbrella receives --schema=, fire a Step 0 pre-clean that drops on every host before dispatch starts. Eliminates the leftover-WSG class of consolidate failures: pg_dump --schema= on a source host that has accumulated WSGs from prior runs would pull rows that collide with destination's current bucket data (observed empirically in today's 16-WSG Phase 3 dispatch — 4 of M1's 8 WSGs failed pg_restore on duplicate keys). state_clean.sh (currently named province_clean.sh until Phase 4 rename) gains a --schemas=A,B,C scoped mode: - Drops ONLY the listed schemas (parameterized IN-list via awk) - Skips the heuristic working* / fresh_* / fresh wipe - Skips the snapshot_bcfp.sh reload (canonical fresh isn't touched) - Wall: ~10-20s vs ~2-3 min for full mode Empty --schemas= is explicitly rejected with FATAL to prevent silent destructive fall-through under dynamic-arg construction (round-1 code-check fix). 
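The scoped-mode quoting and the empty-value guard described above can be sketched in isolation. This is a minimal illustration, not the code in this patch: `build_in_list` is a hypothetical helper name, and the identifier check is an added assumption (the patch's awk step trims and quotes the names but does not validate them).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the patch): turn "A,B,C" into
# 'A','B','C' for use inside a SQL IN (...) list.
build_in_list() {
  local csv="$1" out="" name
  local -a names

  # Mirror the guard: an empty value must fail loudly instead of
  # falling through to a broader, destructive mode.
  if [ -z "$csv" ]; then
    echo "FATAL: --schemas= requires at least one schema" >&2
    return 1
  fi

  IFS=',' read -r -a names <<< "$csv"
  for name in "${names[@]}"; do
    # Trim leading and trailing whitespace.
    name="${name#"${name%%[![:space:]]*}"}"
    name="${name%"${name##*[![:space:]]}"}"
    # Assumption: accept plain identifiers only, so no quote or
    # semicolon can reach the generated SQL.
    if ! [[ "$name" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
      echo "FATAL: invalid schema name: '$name'" >&2
      return 1
    fi
    out="${out:+$out,}'$name'"
  done
  printf '%s\n' "$out"
}

# Example call with a comma-separated list that includes stray whitespace.
build_in_list "fresh_default, working_tmp"
```

Rejecting anything that is not a plain identifier is stricter than trimming alone; whether that restriction suits the real schema names is a judgment call for the script's maintainers.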
province_run.sh Step 0: - Fires when --schema= is set on the umbrella - Routes through state_clean --schemas=$SCHEMA [--skip-cy] - Captures non-zero exit and aborts the umbrella loud - Skipped when --schema is empty (writes go to canonical fresh; Step 1+2 snapshot handles that path) Validated: - bash -n clean both files - Empty --schemas= guard fires as expected ("FATAL: --schemas= requires at least one schema") Co-Authored-By: Claude Opus 4.7 --- data-raw/province_clean.sh | 64 +++++++++++++++++++++++++++++++++----- data-raw/province_run.sh | 23 ++++++++++++++ 2 files changed, 80 insertions(+), 7 deletions(-) diff --git a/data-raw/province_clean.sh b/data-raw/province_clean.sh index e767f8e3..ad08943c 100755 --- a/data-raw/province_clean.sh +++ b/data-raw/province_clean.sh @@ -18,27 +18,52 @@ # - bcfishpass_ref (reference data; not pipeline output) # # Usage: -# bash data-raw/province_clean.sh [--cy-workspaces=job1,job2,job3] [--skip-m1] [--skip-cy] +# bash data-raw/province_clean.sh [flags] +# +# Flags: +# --cy-workspaces=A,B,C cypher workspaces to clean (default: job1,job2,job3) +# --skip-m1 skip M1 +# --skip-cy skip all cyphers +# --schemas=A,B,C SCOPED MODE — drop ONLY these exact schemas. +# Skips the working*/fresh_*/fresh heuristic AND +# skips the snapshot_bcfp.sh re-run (canonical state +# not touched). Use for per-bundle pre-cleans like +# `--schemas=fresh_default` before subset dispatches. # # Honors /tmp/cy_ips.env if present (set by trifecta_provincial.sh dispatch # wrapper). Otherwise derives cypher IPs from tofu state. # -# Expected wall: ~2-3 min (parallel across all hosts). 
+# Expected wall: +# Full mode (default): ~2-3 min (drop + snapshot reload, parallel) +# Scoped (--schemas=): ~10-20 s (drop only) set -euo pipefail CY_WORKSPACES="job1,job2,job3" SKIP_M1=0 SKIP_CY=0 +SCOPED_SCHEMAS="" +SAW_SCHEMAS=0 for arg in "$@"; do case "$arg" in --cy-workspaces=*) CY_WORKSPACES="${arg#--cy-workspaces=}" ;; --skip-m1) SKIP_M1=1 ;; --skip-cy) SKIP_CY=1 ;; + --schemas=*) SCOPED_SCHEMAS="${arg#--schemas=}"; SAW_SCHEMAS=1 ;; *) echo "FATAL: unknown arg: $arg" >&2; exit 1 ;; esac done +# Guard against empty `--schemas=` falling through to the destructive +# heuristic full-wipe. Callers that build the arg dynamically and end +# up with an empty value need to know — silently wiping `fresh` is the +# wrong default for "the operator forgot to populate $VAR". +if [ "$SAW_SCHEMAS" = "1" ] && [ -z "$SCOPED_SCHEMAS" ]; then + echo "FATAL: --schemas= requires at least one schema (got empty value)." >&2 + echo " Omit --schemas= entirely to invoke the heuristic full-wipe mode." >&2 + exit 1 +fi + REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)" # --- Resolve cypher IPs --- @@ -71,12 +96,28 @@ if [ "$SKIP_CY" = "0" ]; then fi echo " ✓ killed" -# --- Step 2-4: drop stale schemas + fresh, parallel across hosts --- -# Combined DROP block: drops working_*, fresh_*, AND fresh itself. -DROP_SQL="SELECT 'DROP SCHEMA \"' || schema_name || '\" CASCADE' FROM information_schema.schemata WHERE (schema_name LIKE 'working%' OR schema_name LIKE 'fresh_%' OR schema_name = 'fresh') AND schema_name NOT IN ('bcfishpass_ref') \\gexec +# --- Step 2-4: drop stale schemas, parallel across hosts --- +# Default (heuristic) mode: drops working_*, fresh_*, AND fresh +# itself; recreates empty `fresh`. +# +# Scoped mode (--schemas=A,B,C): drops ONLY the listed exact schemas; +# does NOT recreate `fresh`. Use for per-bundle pre-cleans before subset +# dispatches (e.g. --schemas=fresh_default). 
+if [ -n "$SCOPED_SCHEMAS" ]; then + # Build a literal IN-list of schema names, single-quoted as SQL string literals (names are not escaped). + SCOPED_IN=$(echo "$SCOPED_SCHEMAS" | awk -F',' '{ + for(i=1;i<=NF;i++) { + gsub(/^[ \t]+|[ \t]+$/, "", $i) + printf("%s\047%s\047", (i==1?"":","), $i) + } + }') + DROP_SQL="SELECT 'DROP SCHEMA \"' || schema_name || '\" CASCADE' FROM information_schema.schemata WHERE schema_name IN ($SCOPED_IN) \\gexec" + echo "--- step 2-4 (scoped): drop schemas [$SCOPED_SCHEMAS], parallel ---" +else + DROP_SQL="SELECT 'DROP SCHEMA \"' || schema_name || '\" CASCADE' FROM information_schema.schemata WHERE (schema_name LIKE 'working%' OR schema_name LIKE 'fresh_%' OR schema_name = 'fresh') AND schema_name NOT IN ('bcfishpass_ref') \\gexec CREATE SCHEMA fresh;" - -echo "--- step 2-4: drop stale schemas + recreate fresh, parallel ---" + echo "--- step 2-4: drop stale schemas + recreate fresh, parallel ---" +fi ( PGPASSWORD=postgres psql -h localhost -p 5432 -U postgres -d fwapg < "/tmp/clean_m4.log" 2>&1 @@ -107,6 +148,15 @@ wait echo " ✓ schemas dropped + fresh recreated empty" # --- Step 5: reload modelled_stream_crossings via snapshot_bcfp.sh --force --- +# Skipped under --schemas= (scoped mode): canonical fresh schema wasn't +# touched, so modelled_stream_crossings is still present. 
+if [ -n "$SCOPED_SCHEMAS" ]; then + echo "--- step 5: SKIPPED (scoped mode — canonical fresh untouched) ---" + echo + echo "=== province_clean.sh complete (scoped: [$SCOPED_SCHEMAS]) $(date -u +%Y-%m-%dT%H:%M:%SZ) ===" + exit 0 +fi + echo "--- step 5: snapshot_bcfp.sh --force on all hosts ---" ( diff --git a/data-raw/province_run.sh b/data-raw/province_run.sh index fea194c8..e254f3f4 100755 --- a/data-raw/province_run.sh +++ b/data-raw/province_run.sh @@ -50,6 +50,7 @@ FORCE_FLAG="" for arg in "$@"; do case "$arg" in --skip-smoke) SKIP_SMOKE=1 ;; + --with-mapping-code) ;; # default-on; accept explicitly for symmetry with --no-mapping-code --no-mapping-code) NO_MAPPING=1 ;; --keep-cyphers) KEEP_CYPHERS=1 ;; --wsgs=*) WSGS_FILTER="${arg#--wsgs=}" ;; @@ -157,6 +158,28 @@ doctl compute droplet list --no-header >/dev/null 2>&1 || { echo " ✗ doctl no [ "$fail" = "0" ] || { echo "FATAL: pre-flight failed; aborting before spend"; exit 1; } echo " ✓ pre-flight clean" +# --- Step 0: pre-clean target schema (when --schema= is set) --- +# Drops $SCHEMA on every host before dispatch so the per-WSG pipeline +# writes land into a clean slate AND so consolidate's pg_dump source +# contains only the current run's bucket (no leftover WSGs from prior +# runs colliding with destination data). Uses state_clean.sh in +# scoped mode (--schemas=...) which skips the canonical-fresh wipe and +# the snapshot_bcfp.sh reload. +# +# Skipped when --schema is empty (writes go to the bundle's default +# schema, typically the canonical `fresh` — which Step 1+2's snapshot +# already handles). 
+if [ -n "$SCHEMA" ]; then + echo "=== Step 0: pre-clean target schema [$SCHEMA] ===" + CLEAN_ARGS="--schemas=$SCHEMA" + [ "$NO_CYPHERS" = "1" ] && CLEAN_ARGS="$CLEAN_ARGS --skip-cy" + bash data-raw/province_clean.sh $CLEAN_ARGS > "$LOG_DIR/${TS}_preclean.log" 2>&1 || { + echo "FATAL: pre-clean failed; see $LOG_DIR/${TS}_preclean.log" + exit 1 + } + echo " ✓ pre-cleaned" +fi + # --- Step 1+2: snapshot_bcfp.sh on M4 + M1 (parallel) --- echo "=== Step 1+2: snapshot_bcfp.sh --force on M4 + M1 ===" ( PGUSER=postgres PGPASSWORD=postgres PGHOST=localhost PGPORT=5432 PGDATABASE=fwapg \ From ebb2d78d1ffe6c236b44ae4f0e81cc49c44b7fc1 Mon Sep 17 00:00:00 2001 From: almac2022 Date: Thu, 14 May 2026 17:48:10 -0700 Subject: [PATCH 5/9] =?UTF-8?q?Phase=203=20acceptance=20=E2=80=94=2016/16?= =?UTF-8?q?=20WSGs=20in=20fresh=5Fdefault.streams=20on=20M4?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Two-attempt journey: Attempt 1 (no pre-clean): 16-WSG dispatch landed cleanly on M4+M1's fwapg per-host, RDS files all came back, annotated CSV written, but consolidate hit 6 duplicate-key errors during pg_restore. Root cause: M1's fresh_default had leftover WSGs from yesterday's province-wide run; pg_dump --schema=fresh_default pulled rows outside the current bucket which collided with M4's data. Only 12 of 16 landed. Root-cause fix landed in commit 89da284 (state_clean.sh scoped mode + province_run.sh Step 0 pre-clean). Operator-friendly: --schema= auto-pre-cleans that schema on all hosts before dispatch. Attempt 2 (with pre-clean): 16 of 16 WSGs in M4 fresh_default.streams (468,631 rows), 20m total wall (Step 0 + cold-cache pipeline + Step 9 consolidate), exit 0, no operator prompts. Annotated CSV: 343 rows (263 NOT_APPLICABLE + 66 UNEXPLAINED at >=2% + 14 WITHIN_TOLERANCE). 
The 66 UNEXPLAINED are methodology divergence for the default bundle vs bcfishpass; expected for these northern WSGs and surfaced to the operator as a non-blocking WARNING. Autonomy CLI validated end-to-end: - --wsgs=CARP,CRKD,FINA,FINL,FIRE,FOXR,INGR,LOMI,MESI,NATR,OSPK,PARA, PARS,PCEA,TOOD,UOMI - --config=default --schema=fresh_default - --no-cyphers --with-mapping-code --force Co-Authored-By: Claude Opus 4.7 --- planning/active/progress.md | 7 ++++++- planning/active/task_plan.md | 19 ++++++++++--------- 2 files changed, 16 insertions(+), 10 deletions(-) diff --git a/planning/active/progress.md b/planning/active/progress.md index 2b800b40..760b3832 100644 --- a/planning/active/progress.md +++ b/planning/active/progress.md @@ -10,4 +10,9 @@ - Scaffolded PWF baseline (`task_plan.md`, `findings.md`, `progress.md`) with 7 approved phases. - **Phase 1 done.** Added `--wsgs=`, `--no-cyphers`, `--force` to `trifecta_provincial.sh`. Fixed phantom-cy (R's `paste0("cy", integer(0))` → `"cy"` recycling bug) via 3-branch `cy_host_keys`. Hardened empty-`CY_WORKSPACES` init. `/code-check` round 1 caught a silent-abort bug (R `stop()` exits bash without operator-visible message under `SPLIT_OUT=$(...)`); fixed with explicit `||` block dumping SPLIT_OUT to stderr. Round 2 clean. SPLIT_R logic verified via isolated R run. - **Phase 2 done.** Added 5 new flags to `province_run.sh`. Gated Step 3+4 (cypher spin/prep), Step 5 cypher archive, Step 9 cypher-source consolidate, and trap-EXIT burn behind `if NO_CYPHERS=0`. Step 7 omits `--cy-workspaces=...` under `--no-cyphers` or `--wsgs`. Step 8 ANN_CSV path is config-aware. Auto-skip-smoke fires when smoke assumptions don't hold. `/code-check` round 1 caught a silent TARGET_SCHEMA fallback bug (masked misconfigured `--config=` with hardcoded "fresh"); fixed with explicit guards. Round 2 clean. -- Next: Phase 3 — M4+M1 16-WSG integration test on the patched-but-not-yet-renamed scripts. 
+- **Phase 3 done — autonomy validated.** Two-attempt journey: + - **Attempt 1**: 16-WSG dispatch ran fine (16/16 RDS, 17m wall, exit 0) but consolidate hit 6 duplicate-key errors. M1's `fresh_default` had leftover WSGs from yesterday's province-wide dispatch; `pg_dump --schema=fresh_default` pulled rows for WSGs outside the current bucket, colliding with M4's data. Only 12 of 16 landed. + - **Root-cause fix** (Phase 1.5 commit `89da284`): added `state_clean.sh --schemas=` scoped mode (drops only listed schemas, skips canonical-fresh wipe + snapshot reload) + `province_run.sh` Step 0 pre-clean that fires when `--schema=` is set. Empty `--schemas=` guard added per round-1 code-check. + - **Attempt 2** (with pre-clean): 16/16 WSGs in `fresh_default.streams` on M4, 20m wall (pre-clean + cold-cache pipeline + consolidate). Exit 0. No mid-run prompts. Annotated CSV: 343 rows; 66 UNEXPLAINED at ≥2% surfaced as WARNING (methodology divergence for `default` bundle, expected for northern-WSG test set). +- Pre-existing limitation surfaced (consolidate stale-state collision) → resolved as part of #172 scope because the autonomy story requires it; the umbrella now genuinely runs end-to-end without operator handholding even when the cluster has leftover state. +- Next: Phase 4 — rename 8 operational scripts to noun_verb convention (git mv + reference updates). diff --git a/planning/active/task_plan.md b/planning/active/task_plan.md index a6696962..ef9d786c 100644 --- a/planning/active/task_plan.md +++ b/planning/active/task_plan.md @@ -35,19 +35,20 @@ Patch the original filename first so the smoke (Phase 3) validates on the known- ## Phase 3 — Integration test (M4+M1, 16-WSG default-bundle, pre-rename) -- [ ] Pre-flight: M4 has bcfp tunnel up, M1 ssh reachable, `fresh.modelled_stream_crossings` present on both hosts. -- [ ] Wipe state via `bash data-raw/province_clean.sh --skip-cy`. 
-- [ ] Run autonomous: +- [x] Pre-flight: M4 has bcfp tunnel up, M1 ssh reachable, `fresh.modelled_stream_crossings` present on both hosts. Verified at session start. +- [x] First attempt (no pre-clean) surfaced a consolidate edge case: M1's `fresh_default` had leftover WSGs from yesterday's province-wide run; `pg_dump --schema=fresh_default` pulled rows for WSGs outside the current bucket, colliding with M4's destination data on pg_restore. Six duplicate-key errors; 12 of 16 WSGs landed. +- [x] Root-cause fix: added `state_clean.sh --schemas=` scoped mode + `province_run.sh` Step 0 pre-clean. When `--schema=` is set, umbrella drops the target schema on all hosts BEFORE Step 1. +- [x] Run autonomous (relaunch with pre-clean): ```bash bash data-raw/province_run.sh \ --wsgs=CARP,CRKD,FINA,FINL,FIRE,FOXR,INGR,LOMI,MESI,NATR,OSPK,PARA,PARS,PCEA,TOOD,UOMI \ - --config=default --schema=fresh_default --no-cyphers --with-mapping-code + --config=default --schema=fresh_default --no-cyphers --with-mapping-code --force ``` -- [ ] Acceptance: exit code 0, ~30–40 min wall, no operator prompts mid-run. -- [ ] Verify `SELECT count(DISTINCT watershed_group_code) FROM fresh_default.streams` = 16 on M4. -- [ ] Verify `fresh_default.streams_habitat_*` populated for each species in the bundle. -- [ ] Verify annotated CSV written with all 16 WSGs. -- [ ] Document outcome in `progress.md` with wall time + per-host breakdown. +- [x] Acceptance: exit code 0, 20m wall (under 30-40 min budget), no operator prompts mid-run. +- [x] Verified: `fresh_default.streams` = **16/16 WSGs** on M4, 468,631 rows. CARP,CRKD,FINA,FINL,FIRE,FOXR,INGR,LOMI,MESI,NATR,OSPK,PARA,PARS,PCEA,TOOD,UOMI all present. +- [x] Per-species habitat tables: bt, gr, ko, rb + barriers (correct for the geographic test set — northern WSGs without CH/CO/SK/ST presence). +- [x] Annotated CSV: `data-raw/logs/provincial_default/202605141658_annotated.csv` — 343 rows (263 NOT_APPLICABLE + 66 UNEXPLAINED + 14 WITHIN_TOLERANCE). 
66 UNEXPLAINED at ≥2% surfaced as WARNING (methodology divergence, expected for `default` vs bcfishpass). +- [x] Consolidate (M1 → M4) succeeded — no duplicate-key conflicts now that pre-clean handles stale state. ## Phase 4 — Rename 8 scripts + update all live references From 575c625c34eca209f76e8b448342968ea90a1a4e Mon Sep 17 00:00:00 2001 From: almac2022 Date: Thu, 14 May 2026 17:52:48 -0700 Subject: [PATCH 6/9] Rename 8 operational scripts to noun_verb convention Mechanical rename pass on a known-working state (Phases 1-3 of #172 shipped + validated end-to-end before this commit lands): province_run.sh -> wsgs_run_pipeline.sh province_clean.sh -> state_clean.sh province_progress.sh -> progress_check.sh trifecta_provincial.sh -> wsgs_dispatch.sh run_provincial_parity.R -> wsgs_run_host.R consolidate_schema.R -> schema_consolidate.R archive_provincial_runs.sh -> runs_archive.sh balance_provincial_buckets.R -> buckets_balance.R All renames done via git mv so `git log --follow` traces history across the rename boundary. Internal reference updates and all live tree cross-references applied in this same commit so blame stays clean. Updated callers / docs: - The 8 renamed scripts themselves (self-references in usage blocks, log filename literals, cross-script invocations). - data-raw/{trifecta_smoke.sh,query_schema_delta.R,wsg_compare.R, wsg_pipeline_run.R,README.md} - research/{provincial_run_runbook,post_compact_provincial_handoff, bcfp_compare_mapping_code,distributed_2hosts_2026_05_01, provincial_parity_2026_05_{01,11,12}}.md - CLAUDE.md, R/utils.R Sealed (intentionally NOT updated): - NEWS.md historical entries (v0.36.1 / v0.37.0 etc. 
reference the old names because that's what those releases shipped) - planning/archive/** (sealed history of prior PWFs) The wsg_*-family naming makes the singular/plural distinction meaningful: wsg_pipeline_run.R / wsg_compare.R are per-WSG halves (from #168); wsgs_run_host.R loops over a host's bucket; wsgs_dispatch.sh fans the work across hosts; wsgs_run_pipeline.sh is the operator umbrella that wraps the whole thing. Other wrappers got "mixed nouns" (more descriptive) per yesterday's locked-in decision: state_clean / progress_check / runs_archive / buckets_balance / schema_consolidate. bash -n clean on all renamed shells. /code-check round 1 clean: 7 concerns verified including sed idempotency (no new name contains any old name as substring). Co-Authored-By: Claude Opus 4.7 --- CLAUDE.md | 2 +- R/utils.R | 2 +- data-raw/README.md | 36 +++++++-------- ...provincial_buckets.R => buckets_balance.R} | 10 ++-- ...province_progress.sh => progress_check.sh} | 12 ++--- data-raw/query_schema_delta.R | 2 +- ...ive_provincial_runs.sh => runs_archive.sh} | 6 +-- ...solidate_schema.R => schema_consolidate.R} | 10 ++-- .../{province_clean.sh => state_clean.sh} | 14 +++--- data-raw/trifecta_smoke.sh | 14 +++--- data-raw/wsg_compare.R | 2 +- data-raw/wsg_pipeline_run.R | 2 +- ...rifecta_provincial.sh => wsgs_dispatch.sh} | 40 ++++++++-------- ...un_provincial_parity.R => wsgs_run_host.R} | 8 ++-- .../{province_run.sh => wsgs_run_pipeline.sh} | 46 +++++++++---------- planning/active/progress.md | 3 +- planning/active/task_plan.md | 25 ++++------ research/bcfp_compare_mapping_code.md | 2 +- research/distributed_2hosts_2026_05_01.md | 8 ++-- research/post_compact_provincial_handoff.md | 32 ++++++------- research/provincial_parity_2026_05_01.md | 2 +- research/provincial_parity_2026_05_11.md | 18 ++++---- research/provincial_parity_2026_05_12.md | 10 ++-- research/provincial_run_runbook.md | 30 ++++++------ 24 files changed, 165 insertions(+), 171 deletions(-) rename 
data-raw/{balance_provincial_buckets.R => buckets_balance.R} (95%) rename data-raw/{province_progress.sh => progress_check.sh} (88%) rename data-raw/{archive_provincial_runs.sh => runs_archive.sh} (90%) rename data-raw/{consolidate_schema.R => schema_consolidate.R} (98%) rename data-raw/{province_clean.sh => state_clean.sh} (92%) rename data-raw/{trifecta_provincial.sh => wsgs_dispatch.sh} (95%) rename data-raw/{run_provincial_parity.R => wsgs_run_host.R} (98%) rename data-raw/{province_run.sh => wsgs_run_pipeline.sh} (91%) diff --git a/CLAUDE.md b/CLAUDE.md index eaf85b98..5e69da47 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -15,7 +15,7 @@ Infrastructure stack stabilized; methodology research is the active focus. Recen - **v0.27.0** (#114, #45) — `cfg$pipeline$gradient_classes` config knob + per-species filter derivation in `prep_minimal`. Bit-identical bcfp parity by default; lets bundles override the break vector. - **v0.28.0** (#119, #45-followup) — Orphan-class break source: classes below all species' access thresholds enter `gradient_barriers_minimal` as a shared `barriers_orphan` table (segmentation only, no access semantics). New `default_extrabreaks` bundle. Province-wide methodology delta (231 vs 225 WSGs): SK spawn +6.7%, BT/RB/ST/WCT/GR spawn +1–2%, CH/ST rear -11%, BT/CO/GR rear -3 to -7%. "Ceiling sub-segment" mechanism — flat parts of mixed reaches now pass spawn; steep pockets previously folded into rear via averaging exceed rear ceiling as standalone segments. -- **v0.29.0** (#120, #118) — DB hygiene: `cleanup_working = TRUE` in `compare_bcfishpass_wsg` drops working schemas after rollup; `keep_source = FALSE` in `consolidate_schema` drops source schemas after pg_restore. Prevents the disk-full incident that crashed cypher 2026-05-04. +- **v0.29.0** (#120, #118) — DB hygiene: `cleanup_working = TRUE` in `compare_bcfishpass_wsg` drops working schemas after rollup; `keep_source = FALSE` in `schema_consolidate` drops source schemas after pg_restore. 
Prevents the disk-full incident that crashed cypher 2026-05-04. Cypher recovered (volumes nuked); needs fwapg reload before next provincial run. Methodology research is active: `default_extrabreaks` proved the mechanism; next variants include spawn-only / rear-only break vectors (each adds ~4 new classes; per-WSG cost ~1.3–1.4× vs default) and the channel-class breaks research (#52, blocked on fresh-side helper). diff --git a/R/utils.R b/R/utils.R index 512c24d5..82c726ec 100644 --- a/R/utils.R +++ b/R/utils.R @@ -193,7 +193,7 @@ #' Probes `.streams` for `watershed_group_code = aoi`. #' Returns `TRUE` when the table exists and has at least one row for #' the WSG; `FALSE` when the table is absent OR has no rows for the -#' WSG. Used by the orchestrator loop (run_provincial_parity.R) as the +#' WSG. Used by the orchestrator loop (wsgs_run_host.R) as the #' canonical resume check — PG state is authoritative, RDS files are #' diagnostic side-artifacts. #' diff --git a/data-raw/README.md b/data-raw/README.md index 1d5bce14..b36c4dc7 100644 --- a/data-raw/README.md +++ b/data-raw/README.md @@ -76,10 +76,10 @@ The dispatch hierarchy: trifecta → run_provincial → compare_wsg. | Script | Calls | Purpose | |--------|-------|---------| -| `trifecta_provincial.sh` | `run_provincial_parity.R` (×N hosts) | M4 + M1 + N-cypher orchestrator. Inline LPT bucket allocation (reads `_per_wsg_times.csv` from prior runs, computes balanced split using `--host-speeds=`), pre-flight version check across all hosts, parallel dispatch, RDS pull-back, post-pull `lnk_parity_annotate` against the divergence taxonomy. See "Provincial dispatch" section below for full flag reference + gotchas. | +| `wsgs_dispatch.sh` | `wsgs_run_host.R` (×N hosts) | M4 + M1 + N-cypher orchestrator. 
Inline LPT bucket allocation (reads `_per_wsg_times.csv` from prior runs, computes balanced split using `--host-speeds=`), pre-flight version check across all hosts, parallel dispatch, RDS pull-back, post-pull `lnk_parity_annotate` against the divergence taxonomy. See "Provincial dispatch" section below for full flag reference + gotchas. | | `trifecta_15wsg.sh` | same | 15-WSG smoke variant (legacy 3-host, hardcoded WSG list). | -| `trifecta_smoke.sh` | `trifecta_provincial.sh` | N-host smoke shim: one small WSG per host, ~3 min wall. See `Provincial dispatch` section. | -| `run_provincial_parity.R` | `compare_bcfishpass_wsg.R` per WSG | Single-host provincial dispatcher. Loops every WSG in `wsg_species_presence`, saves per-WSG RDS, emits per-WSG times CSV. After the loop, optionally annotates the host's bucket against `research/bcfp_divergence_taxonomy.yml` (writes `__annotated.csv`). Accepts `--wsgs=`, `--config=`, `--schema=`, `--rds-dir=`, `--with-mapping-code`. | +| `trifecta_smoke.sh` | `wsgs_dispatch.sh` | N-host smoke shim: one small WSG per host, ~3 min wall. See `Provincial dispatch` section. | +| `wsgs_run_host.R` | `compare_bcfishpass_wsg.R` per WSG | Single-host provincial dispatcher. Loops every WSG in `wsg_species_presence`, saves per-WSG RDS, emits per-WSG times CSV. After the loop, optionally annotates the host's bucket against `research/bcfp_divergence_taxonomy.yml` (writes `__annotated.csv`). Accepts `--wsgs=`, `--config=`, `--schema=`, `--rds-dir=`, `--with-mapping-code`. | | `compare_bcfishpass_wsg.R` | `lnk_pipeline_*` family | Single-WSG end-to-end runner. Sources both connections (local fwapg + bcfp tunnel), runs the 6-phase pipeline, persists, emits comparison rollup tibble (link vs bcfp). The atomic unit of work in every multi-WSG run above. | ## Pipeline support @@ -88,14 +88,14 @@ Run-adjacent helpers (planning, consolidation across hosts). 
| Script | Purpose | |--------|---------| -| `balance_provincial_buckets.R` | Standalone LPT planner for the 3-host case. Reads per-host wall times from prior runs and prints buckets ready to paste into `trifecta_provincial.sh --m4-bucket=…`. **Superseded for the N-host orchestrator** — `trifecta_provincial.sh` now computes the LPT plan inline at dispatch time using the same algorithm. Kept here for one-off planning + cross-checks. Dedups `(wsg, host)` and across hosts before LPT so multi-run CSV accumulation doesn't double-assign WSGs. | -| `consolidate_schema.R` | pg_dump from M1 + cypher → scp to M4 → pg_restore --data-only. Bucket-aware destination cleanup (DELETEs each source host's WSG bucket from destination tables before restore — avoids duplicate-key violations on re-consolidation). `ok = TRUE` requires pg_restore rc=0 AND post-restore row count > 0; rc=0 with empty schema flags as failure. | -| `archive_provincial_runs.sh` | Moves the current top-level `_per_wsg_times.csv` + `*.rds` + `*_annotated.csv` artifacts in `provincial_/` to `archive//`. Operator cadence: run between provincial runs when you want the LPT planner to use the most recent run only. Skip to median-over multiple recent runs. | -| `trifecta_smoke.sh` | Thin shim over `trifecta_provincial.sh` — one small WSG per host (m4→DEAD, m1→ELKR, cyN→ADMS/BABL/BULL). ~3 min wall. Exercises every orchestrator code path (preflight, dispatch, tunnel, RDS pull-back, annotation) before committing to a 200-WSG run. All flags pass through (e.g. `--cy-workspaces=`, `--with-mapping-code`). | +| `buckets_balance.R` | Standalone LPT planner for the 3-host case. Reads per-host wall times from prior runs and prints buckets ready to paste into `wsgs_dispatch.sh --m4-bucket=…`. **Superseded for the N-host orchestrator** — `wsgs_dispatch.sh` now computes the LPT plan inline at dispatch time using the same algorithm. Kept here for one-off planning + cross-checks. 
Dedups `(wsg, host)` and across hosts before LPT so multi-run CSV accumulation doesn't double-assign WSGs. | +| `schema_consolidate.R` | pg_dump from M1 + cypher → scp to M4 → pg_restore --data-only. Bucket-aware destination cleanup (DELETEs each source host's WSG bucket from destination tables before restore — avoids duplicate-key violations on re-consolidation). `ok = TRUE` requires pg_restore rc=0 AND post-restore row count > 0; rc=0 with empty schema flags as failure. | +| `runs_archive.sh` | Moves the current top-level `_per_wsg_times.csv` + `*.rds` + `*_annotated.csv` artifacts in `provincial_/` to `archive//`. Operator cadence: run between provincial runs when you want the LPT planner to use the most recent run only. Skip to median-over multiple recent runs. | +| `trifecta_smoke.sh` | Thin shim over `wsgs_dispatch.sh` — one small WSG per host (m4→DEAD, m1→ELKR, cyN→ADMS/BABL/BULL). ~3 min wall. Exercises every orchestrator code path (preflight, dispatch, tunnel, RDS pull-back, annotation) before committing to a 200-WSG run. All flags pass through (e.g. `--cy-workspaces=`, `--with-mapping-code`). | -## Provincial dispatch (`trifecta_provincial.sh`) +## Provincial dispatch (`wsgs_dispatch.sh`) -The flagship orchestrator. Dispatches `run_provincial_parity.R` across +The flagship orchestrator. Dispatches `wsgs_run_host.R` across M4 + M1 + N cyphers in parallel, pulls RDS files back, and emits a province-wide annotated CSV. 
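The inline LPT bucket allocation that the README attributes to `wsgs_dispatch.sh` can be sketched roughly as follows. This is a minimal illustration, not the script's actual split logic: the function name `lpt_plan`, the sample WSG names and timings, and the rule of handing each WSG to the host with the earliest projected finish are all assumptions. Only the idea of `--host-speeds=`-style factors (lower = faster) comes from the README.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of greedy LPT (Longest Processing Time first).
# $1: host speed factors, e.g. "m4=1.0,m1=2.0" (lower = faster)
# $2: per-WSG seconds on the reference host, e.g. "A=100,B=60"
# Prints one "host=WSG,WSG,..." line per host, in the order hosts were given.
lpt_plan() {
  # Sort WSGs longest first (name breaks ties), then assign each one to
  # the host whose projected finish time after taking it is smallest.
  printf '%s\n' "$2" | tr ',' '\n' |
    sort -t= -k2,2nr -k1,1 |
    awk -F= -v speeds="$1" '
      BEGIN {
        n = split(speeds, parts, ",")
        for (i = 1; i <= n; i++) {
          split(parts[i], kv, "=")
          host[i] = kv[1]; factor[i] = kv[2]; busy[i] = 0; bucket[i] = ""
        }
      }
      NF < 2 { next }
      {
        pick = 1; best = busy[1] + $2 * factor[1]
        for (i = 2; i <= n; i++) {
          cand = busy[i] + $2 * factor[i]
          if (cand < best) { best = cand; pick = i }
        }
        busy[pick] = best
        bucket[pick] = (bucket[pick] == "" ? $1 : bucket[pick] "," $1)
      }
      END { for (i = 1; i <= n; i++) print host[i] "=" bucket[i] }
    '
}

# Illustrative two-host plan with made-up timings:
lpt_plan "m4=1.0,m1=2.0" "A=100,B=60,C=50,D=40"
# m4=A,C,D
# m1=B
```

In this sketch ties go to the host listed first, so the order of the speed list acts as a preference order. Whether the real planner breaks ties the same way is not stated in the README.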
@@ -106,7 +106,7 @@ cd ~/Projects/repo/link/data-raw # Optional: archive prior run's CSVs first if you want LPT to plan # against this run only (not median-of-recent-runs): -./archive_provincial_runs.sh +./runs_archive.sh # Smoke-test first (~3 min, one small WSG per host) — catches preflight, # tunnel, dispatch, and annotation surprises before the full run: @@ -114,14 +114,14 @@ cd ~/Projects/repo/link/data-raw ./trifecta_smoke.sh --cy-workspaces=job1,job2,job3 # 5-host smoke # Full run: -./trifecta_provincial.sh # 3-host default -./trifecta_provincial.sh --cy-workspaces=job1,job2,job3 # 5-host +./wsgs_dispatch.sh # 3-host default +./wsgs_dispatch.sh --cy-workspaces=job1,job2,job3 # 5-host # Add per-segment mapping_code lens (+50% cost): -./trifecta_provincial.sh --cy-workspaces=job1,job2,job3 --with-mapping-code +./wsgs_dispatch.sh --cy-workspaces=job1,job2,job3 --with-mapping-code # Custom host-speed factors (lower = faster): -./trifecta_provincial.sh --host-speeds=m4=1.0,m1=0.83,cy=1.83 +./wsgs_dispatch.sh --host-speeds=m4=1.0,m1=0.83,cy=1.83 ``` **Recommended cadence:** archive → smoke → full run. The smoke catches @@ -240,7 +240,7 @@ Research scratchpad and one-off verification scripts. | Script | Purpose | |--------|---------| -| `_targets.R` | The (legacy) targets pipeline that pre-dated `trifecta_provincial.sh`. Kept for the multi-WSG comparison harness. | +| `_targets.R` | The (legacy) targets pipeline that pre-dated `wsgs_dispatch.sh`. Kept for the multi-WSG comparison harness. | | `exp_gradient_extra_breaks.R` | Experimental script that prototyped the orphan-class break source via in-line `frs_break_apply` before it was absorbed into `lnk_pipeline_prep_minimal()` (link v0.28.0). Kept as the smoke-test reference. | | `rule_flexibility_demo.R` / `rule_flexibility_render.R` | Demonstration of rules.yaml format flexibility, rendered as RMarkdown. | | `regress_dams_isolation.R` | One-off regression test for the dams-isolation work (link #109). 
| @@ -266,9 +266,9 @@ discoverable without opening the file: Run artifacts land in subdirectories of `data-raw/logs/` keyed by topic: -- `provincial_parity/`, `provincial_default/`, `provincial_default_extrabreaks/` — per-WSG RDS + per-WSG times CSV from `run_provincial_parity.R`. +- `provincial_parity/`, `provincial_default/`, `provincial_default_extrabreaks/` — per-WSG RDS + per-WSG times CSV from `wsgs_run_host.R`. - `methodology_delta/` — schema-vs-schema delta RDS from `methodology_delta_query.R`. -- `dumps_/` — pg_dump custom-format files from `consolidate_schema.R`. +- `dumps_/` — pg_dump custom-format files from `schema_consolidate.R`. - `_*.txt` — orchestrator + per-host run logs. Reusable helper scripts read from these directories without hardcoded @@ -290,7 +290,7 @@ single bundle's persistent schema. Rough footprints on a 232-WSG run: The 2026-05-04 cypher disk-full incident filled a 96 GB droplet with 3 accumulated bundles + 60 working schemas at once. After the v0.29.0 hygiene fixes (`compare_bcfishpass_wsg(cleanup_working = TRUE)` drops -working schemas on completion; `consolidate_schema(keep_source = FALSE)` +working schemas on completion; `schema_consolidate(keep_source = FALSE)` drops source persistent schema after successful pg_restore), a single-bundle-in-flight worker holds ~60 GB total — comfortable on the existing 96 GB cypher tier. diff --git a/data-raw/balance_provincial_buckets.R b/data-raw/buckets_balance.R similarity index 95% rename from data-raw/balance_provincial_buckets.R rename to data-raw/buckets_balance.R index 63dbe6c3..cabfcd69 100644 --- a/data-raw/balance_provincial_buckets.R +++ b/data-raw/buckets_balance.R @@ -9,7 +9,7 @@ # cypher caught up. This script projects ~10-15 min savings per run. 
# # Usage: -# Rscript data-raw/balance_provincial_buckets.R +# Rscript data-raw/buckets_balance.R # # Hardcoded inputs (override above when re-running with new baseline): # - Yesterday's host log paths (one per host) @@ -23,16 +23,16 @@ suppressPackageStartupMessages({}) logs_dir <- "/Users/airvine/Projects/repo/link/data-raw/logs" -# Prefer per-WSG CSVs (emitted by run_provincial_parity.R after this script +# Prefer per-WSG CSVs (emitted by wsgs_run_host.R after this script # was added). Fall back to text-log regex parsing for older runs. csvs <- list.files(file.path(logs_dir, "provincial_parity"), pattern = "_per_wsg_times\\.csv$", full.names = TRUE) csvs <- c(csvs, list.files(file.path(logs_dir, "provincial_default"), pattern = "_per_wsg_times\\.csv$", full.names = TRUE)) if (length(csvs) == 0) { - m4_log <- file.path(logs_dir, "202605031423_trifecta_provincial_m4.txt") - m1_log <- file.path(logs_dir, "202605031423_trifecta_provincial_m1.txt") - cy_log <- "/Users/airvine/Projects/repo/rtj/scripts/cypher/logs/202605031423_cypher-run_202605031423_trifecta_provincial_cypher.txt" + m4_log <- file.path(logs_dir, "202605031423_wsgs_dispatch_m4.txt") + m1_log <- file.path(logs_dir, "202605031423_wsgs_dispatch_m1.txt") + cy_log <- "/Users/airvine/Projects/repo/rtj/scripts/cypher/logs/202605031423_cypher-run_202605031423_wsgs_dispatch_cypher.txt" } # Host speed factors (M4 = reference). Use yesterday's per-host MEAN diff --git a/data-raw/province_progress.sh b/data-raw/progress_check.sh similarity index 88% rename from data-raw/province_progress.sh rename to data-raw/progress_check.sh index 4ff3332b..aa2026d3 100755 --- a/data-raw/province_progress.sh +++ b/data-raw/progress_check.sh @@ -1,5 +1,5 @@ #!/usr/bin/env bash -# province_progress.sh — report live progress of an in-flight provincial dispatch. +# progress_check.sh — report live progress of an in-flight provincial dispatch. 
# # Reads each host's newest `_per_wsg_times.csv` (by mtime, NOT by date glob — # cypher logs use UTC, M4/M1 use local TZ; date-globbing across hosts breaks @@ -9,9 +9,9 @@ # plus a sample of the most recent completions to see what each host is on. # # Usage: -# bash data-raw/province_progress.sh [--cy-workspaces=job1,job2,job3] [--mtime-min=120] +# bash data-raw/progress_check.sh [--cy-workspaces=job1,job2,job3] [--mtime-min=120] # -# Honors /tmp/cy_ips.env if present (set by trifecta_provincial.sh dispatch). +# Honors /tmp/cy_ips.env if present (set by wsgs_dispatch.sh dispatch). # Otherwise derives cypher IPs from tofu state per workspace. # # --mtime-min=N : only consider CSVs modified in the last N minutes (default 240 @@ -85,11 +85,11 @@ done echo # Orchestrator-side state -if pgrep -f "trifecta_provincial.sh" >/dev/null 2>&1; then - PID=$(pgrep -f "trifecta_provincial.sh" | head -1) +if pgrep -f "wsgs_dispatch.sh" >/dev/null 2>&1; then + PID=$(pgrep -f "wsgs_dispatch.sh" | head -1) echo "dispatch process: ✓ PID=$PID (running)" # Find latest orchestrator log to extract bucket sizes if possible - LATEST_ORCH=$(find ~/Projects/repo/link/data-raw/logs -maxdepth 1 -name '*_trifecta_provincial_orchestrator.txt' -mmin -$MTIME_MIN | sort | tail -1) + LATEST_ORCH=$(find ~/Projects/repo/link/data-raw/logs -maxdepth 1 -name '*_wsgs_dispatch_orchestrator.txt' -mmin -$MTIME_MIN | sort | tail -1) if [ -n "$LATEST_ORCH" ]; then echo "orchestrator log: $LATEST_ORCH" grep -E "total WSGs|projected finish" "$LATEST_ORCH" 2>/dev/null | head -2 diff --git a/data-raw/query_schema_delta.R b/data-raw/query_schema_delta.R index 0b131897..12fccfbc 100644 --- a/data-raw/query_schema_delta.R +++ b/data-raw/query_schema_delta.R @@ -5,7 +5,7 @@ # `.streams` + `.streams_habitat_` pairs. # # Schemas are populated by lnk_pipeline_persist via a provincial trifecta -# run (see trifecta_provincial.sh). This script reads streams + +# run (see wsgs_dispatch.sh). 
This script reads streams + # streams_habitat_ from each schema and emits: # 1. Province-wide totals per species (spawn / rear / accessible km) # 2. Per-species delta summary (km, percent, # WSGs shifted) diff --git a/data-raw/archive_provincial_runs.sh b/data-raw/runs_archive.sh similarity index 90% rename from data-raw/archive_provincial_runs.sh rename to data-raw/runs_archive.sh index f67fb4ed..a43f909f 100755 --- a/data-raw/archive_provincial_runs.sh +++ b/data-raw/runs_archive.sh @@ -1,7 +1,7 @@ #!/usr/bin/env bash # Archive per-run artifacts in data-raw/logs/provincial_/ to # data-raw/logs/provincial_/archive// so the LPT planner -# (both trifecta_provincial.sh inline and balance_provincial_buckets.R) +# (both wsgs_dispatch.sh inline and buckets_balance.R) # sees only the LATEST run's _per_wsg_times.csv files in the top level. # # Operator cadence: run this BEFORE kicking off a new provincial run if @@ -10,8 +10,8 @@ # smoothing out noisy one-offs but slower to react to host changes). # # Usage: -# ./archive_provincial_runs.sh # bcfishpass (default) -# ./archive_provincial_runs.sh --config=default # different bundle +# ./runs_archive.sh # bcfishpass (default) +# ./runs_archive.sh --config=default # different bundle # # What's archived: # - *_per_wsg_times.csv (drives LPT) diff --git a/data-raw/consolidate_schema.R b/data-raw/schema_consolidate.R similarity index 98% rename from data-raw/consolidate_schema.R rename to data-raw/schema_consolidate.R index 374442e9..8e24d89a 100644 --- a/data-raw/consolidate_schema.R +++ b/data-raw/schema_consolidate.R @@ -1,12 +1,12 @@ #!/usr/bin/env Rscript -# data-raw/consolidate_schema.R +# data-raw/schema_consolidate.R # # Consolidate a Postgres schema from multiple remote hosts onto the # local fwapg via `pg_dump -Fc` + scp + `pg_restore --data-only`. 
# # Usage: # Source this file (or Rscript), then: -# consolidate_schema( +# schema_consolidate( # schema = "fresh", # sources = list( # list(host = "m1", via = "docker", container = "fresh-db"), @@ -18,7 +18,7 @@ # pattern (fresh schema across M4 + M1 + cypher) but not generalized # to arbitrary table subsets, alternative protocols (COPY streaming, # logical replication), or arbitrary destination conns. Promote to -# `lnk_consolidate_schema()` only after using it 2-3 times for +# `lnk_schema_consolidate()` only after using it 2-3 times for # different schemas — the right API will emerge from real usage. # # Why pg_dump/restore not COPY streaming: per-host dumps act as @@ -60,7 +60,7 @@ #' retry. #' #' @return Invisibly: list of per-source pg_dump + restore outcomes. -consolidate_schema <- function(schema, +schema_consolidate <- function(schema, sources, backup = TRUE, dest_conn = link::lnk_db_conn(), @@ -246,7 +246,7 @@ consolidate_schema <- function(schema, # --------------------------------------------------------------------------- if (!interactive() && length(commandArgs(trailingOnly = TRUE)) == 0L && sys.nframe() == 0L) { - consolidate_schema( + schema_consolidate( schema = "fresh", sources = list( list(host = "m1", via = "docker"), diff --git a/data-raw/province_clean.sh b/data-raw/state_clean.sh similarity index 92% rename from data-raw/province_clean.sh rename to data-raw/state_clean.sh index ad08943c..a43690d8 100755 --- a/data-raw/province_clean.sh +++ b/data-raw/state_clean.sh @@ -1,5 +1,5 @@ #!/usr/bin/env bash -# province_clean.sh — wipe link-pipeline state on all hosts to a known-clean +# state_clean.sh — wipe link-pipeline state on all hosts to a known-clean # baseline. Idempotent. Runs in <5 min wall on a healthy cluster. 
# # What it cleans: @@ -18,7 +18,7 @@ # - bcfishpass_ref (reference data; not pipeline output) # # Usage: -# bash data-raw/province_clean.sh [flags] +# bash data-raw/state_clean.sh [flags] # # Flags: # --cy-workspaces=A,B,C cypher workspaces to clean (default: job1,job2,job3) @@ -30,7 +30,7 @@ # not touched). Use for per-bundle pre-cleans like # `--schemas=fresh_default` before subset dispatches. # -# Honors /tmp/cy_ips.env if present (set by trifecta_provincial.sh dispatch +# Honors /tmp/cy_ips.env if present (set by wsgs_dispatch.sh dispatch # wrapper). Otherwise derives cypher IPs from tofu state. # # Expected wall: @@ -79,12 +79,12 @@ if [ "$SKIP_CY" = "0" ]; then done fi -echo "=== province_clean.sh starting $(date -u +%Y-%m-%dT%H:%M:%SZ) ===" +echo "=== state_clean.sh starting $(date -u +%Y-%m-%dT%H:%M:%SZ) ===" echo " hosts: M4 + $([ "$SKIP_M1" = "0" ] && echo "M1 + ")$([ "$SKIP_CY" = "0" ] && echo "${#CY_IPS[@]} cyphers" || echo "no cyphers")" # --- Step 1: kill in-flight processes --- echo "--- step 1: kill in-flight dispatch ---" -ps -ef | grep -E "trifecta_provincial|cypher_run|run_provincial_parity|ssh.*cypher@|ssh.*-R.*m1|consolidate_schema" \ +ps -ef | grep -E "wsgs_dispatch|cypher_run|wsgs_run_host|ssh.*cypher@|ssh.*-R.*m1|schema_consolidate" \ | grep -v grep | awk '{print $2}' | xargs -r kill -9 2>/dev/null || true sleep 2 [ "$SKIP_M1" = "0" ] && ssh m1 'pkill -9 -f "Rscript.*run_provincial" 2>/dev/null' 2>&1 || true @@ -153,7 +153,7 @@ echo " ✓ schemas dropped + fresh recreated empty" if [ -n "$SCOPED_SCHEMAS" ]; then echo "--- step 5: SKIPPED (scoped mode — canonical fresh untouched) ---" echo - echo "=== province_clean.sh complete (scoped: [$SCOPED_SCHEMAS]) $(date -u +%Y-%m-%dT%H:%M:%SZ) ===" + echo "=== state_clean.sh complete (scoped: [$SCOPED_SCHEMAS]) $(date -u +%Y-%m-%dT%H:%M:%SZ) ===" exit 0 fi @@ -204,4 +204,4 @@ if [ "$SKIP_CY" = "0" ]; then done fi -echo "=== province_clean.sh complete $(date -u +%Y-%m-%dT%H:%M:%SZ) ===" +echo "=== 
state_clean.sh complete $(date -u +%Y-%m-%dT%H:%M:%SZ) ===" diff --git a/data-raw/trifecta_smoke.sh b/data-raw/trifecta_smoke.sh index 863324b8..57afb4ac 100755 --- a/data-raw/trifecta_smoke.sh +++ b/data-raw/trifecta_smoke.sh @@ -1,5 +1,5 @@ #!/usr/bin/env bash -# Smoke variant of trifecta_provincial.sh — one small WSG per host. +# Smoke variant of wsgs_dispatch.sh — one small WSG per host. # # Thin shim that calls the production orchestrator with explicit per-host # bucket overrides. Goal: exercise EVERY code path the full provincial run @@ -19,7 +19,7 @@ # cy2 → BABL (small generic) # cy3 → BULL (small generic) # -# All other flags pass through to trifecta_provincial.sh. +# All other flags pass through to wsgs_dispatch.sh. set -euo pipefail @@ -32,7 +32,7 @@ for arg in "$@"; do --cy-workspaces=*) CY_WORKSPACES="${arg#--cy-workspaces=}" ;; --m4-bucket=*|--m1-bucket=*|--cy-bucket=*|--cy[1-9]-bucket=*) echo "ERROR: trifecta_smoke.sh does not accept manual bucket overrides." >&2 - echo " Smoke picks one small WSG per host. Use trifecta_provincial.sh directly to override." >&2 + echo " Smoke picks one small WSG per host. Use wsgs_dispatch.sh directly to override." >&2 exit 2 ;; *) @@ -89,7 +89,7 @@ find "$SMOKE_DIR" -maxdepth 1 -name '*.rds' 2>/dev/null | sort > "$PRE_RDS_LIST" # Run the orchestrator. Don't exec — we want control flow back so we # can run the smoke-pass assertion afterward. 
ORCH_RC=0 -bash "$SCRIPT_DIR/trifecta_provincial.sh" \ +bash "$SCRIPT_DIR/wsgs_dispatch.sh" \ --m4-bucket=DEAD \ --m1-bucket=ELKR \ --cy-workspaces="$CY_WORKSPACES" \ @@ -134,11 +134,11 @@ if [ -n "$ERR_LINE" ]; then echo "" >&2 echo "[smoke] FAILED: $N WSG(s) errored: $WSGS" >&2 echo "[smoke] inspect logs:" >&2 - echo " data-raw/logs/_trifecta_provincial_*.txt (orchestrator-side)" >&2 + echo " data-raw/logs/_wsgs_dispatch_*.txt (orchestrator-side)" >&2 echo " rtj/scripts/cypher/logs/_cypher-run_*.txt (cypher-side R output)" >&2 - echo "[smoke] DO NOT dispatch trifecta_provincial.sh until these are fixed." >&2 + echo "[smoke] DO NOT dispatch wsgs_dispatch.sh until these are fixed." >&2 exit 5 fi -echo "[smoke] PASS: all $(echo "$NEW_RDS" | wc -l | tr -d ' ') new RDS are successful tibbles. Safe to dispatch trifecta_provincial.sh." +echo "[smoke] PASS: all $(echo "$NEW_RDS" | wc -l | tr -d ' ') new RDS are successful tibbles. Safe to dispatch wsgs_dispatch.sh." exit $ORCH_RC diff --git a/data-raw/wsg_compare.R b/data-raw/wsg_compare.R index af601989..64de9423 100644 --- a/data-raw/wsg_compare.R +++ b/data-raw/wsg_compare.R @@ -2,7 +2,7 @@ # # Compare-only wrapper around `link::lnk_compare_rollup(reference = "bcfishpass")` # for the targets pipeline in data-raw/_targets.R and the orchestrator -# scripts (run_provincial_parity.R, trifecta_*.sh). +# scripts (wsgs_run_host.R, trifecta_*.sh). # # Reads persisted state in .streams + streams_habitat_ # (written by `wsg_pipeline_run.R` or any prior modelling call), queries diff --git a/data-raw/wsg_pipeline_run.R b/data-raw/wsg_pipeline_run.R index 29005d88..fefc702c 100644 --- a/data-raw/wsg_pipeline_run.R +++ b/data-raw/wsg_pipeline_run.R @@ -2,7 +2,7 @@ # # Modelling-only wrapper around `link::lnk_pipeline_run()` for the # targets pipeline (data-raw/_targets.R) and the orchestrator scripts -# (run_provincial_parity.R, trifecta_*.sh). +# (wsgs_run_host.R, trifecta_*.sh). 
# # Writes per-WSG segment-level data into the persistent # .streams + per-species streams_habitat_ + barriers diff --git a/data-raw/trifecta_provincial.sh b/data-raw/wsgs_dispatch.sh similarity index 95% rename from data-raw/trifecta_provincial.sh rename to data-raw/wsgs_dispatch.sh index 37f19f9f..f3b06a79 100755 --- a/data-raw/trifecta_provincial.sh +++ b/data-raw/wsgs_dispatch.sh @@ -1,6 +1,6 @@ #!/usr/bin/env bash # Provincial parity orchestrator — dispatch across M4 + M1 + N cyphers. -# Each host runs `run_provincial_parity.R --wsgs= --config=` +# Each host runs `wsgs_run_host.R --wsgs= --config=` # (resume-safe; skips WSGs whose RDS already exists). After all hosts # finish, pulls every host's RDS files back to M4, binds them, and writes # `_annotated.csv` against `research/bcfp_divergence_taxonomy.yml`. @@ -13,11 +13,11 @@ # Per-host `---bucket=` overrides still take precedence. # # Usage: -# ./trifecta_provincial.sh # 3-host default: M4 + M1 + 1 cypher -# ./trifecta_provincial.sh --with-mapping-code # per-WSG mapping_code lens -# ./trifecta_provincial.sh --cy-workspaces=job1,job2,job3 # 5-host: 3 cyphers -# ./trifecta_provincial.sh --wsgs=ADMS,BULK,DEAD --no-cyphers # M4+M1, 3 WSGs -# ./trifecta_provincial.sh --force --no-cyphers # force re-run, M4+M1 only +# ./wsgs_dispatch.sh # 3-host default: M4 + M1 + 1 cypher +# ./wsgs_dispatch.sh --with-mapping-code # per-WSG mapping_code lens +# ./wsgs_dispatch.sh --cy-workspaces=job1,job2,job3 # 5-host: 3 cyphers +# ./wsgs_dispatch.sh --wsgs=ADMS,BULK,DEAD --no-cyphers # M4+M1, 3 WSGs +# ./wsgs_dispatch.sh --force --no-cyphers # force re-run, M4+M1 only # # CLI flags: # --config= bundle (default: bcfishpass) @@ -37,7 +37,7 @@ # --cy-bucket= single-cypher override (only valid with 1 workspace) # --cy-workspaces= comma-list of cypher tofu workspaces (default: "default") # --cyN-bucket= per-cypher override (1-indexed, e.g. --cy1-bucket=...) 
-# --with-mapping-code pass through to run_provincial_parity.R +# --with-mapping-code pass through to wsgs_run_host.R # --skip-preflight skip version-match check (debug only) # # Estimated wall: ~2 hours single-cypher, ~50-60 min 3-cypher, @@ -129,7 +129,7 @@ REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)" LOG_DIR="$REPO_ROOT/data-raw/logs" mkdir -p "$LOG_DIR" TS=$(date +%Y%m%d%H%M) -ORCH_LOG="$LOG_DIR/${TS}_trifecta_provincial_orchestrator.txt" +ORCH_LOG="$LOG_DIR/${TS}_wsgs_dispatch_orchestrator.txt" # --------------------------------------------------------------------------- # Bucket allocation. Greedy LPT (Longest Processing Time first): @@ -151,7 +151,7 @@ ORCH_LOG="$LOG_DIR/${TS}_trifecta_provincial_orchestrator.txt" # Manual --m4-bucket / --m1-bucket / --cyN-bucket overrides ALWAYS take # precedence over the computed LPT plan. # --------------------------------------------------------------------------- -SPLIT_R="$LOG_DIR/${TS}_trifecta_provincial_split.R" +SPLIT_R="$LOG_DIR/${TS}_wsgs_dispatch_split.R" cat > "$SPLIT_R" < "$CY_SHELL" < "$M4_SHELL" < "$M4_LOG" 2>&1 ) & @@ -579,10 +579,10 @@ M4_PID=$! # - M1 must allow reverse port-forwards (OpenSSH default = yes). # - Nothing else listening on M1's port 63333 (M1 has no persistent # tunnel; this is free in practice). -M1_LOG="$LOG_DIR/${TS}_trifecta_provincial_m1.txt" +M1_LOG="$LOG_DIR/${TS}_wsgs_dispatch_m1.txt" ( ssh -o ServerAliveInterval=60 -o ServerAliveCountMax=10 \ -R 63333:127.0.0.1:63333 m1 \ - "cd ~/Projects/repo/link/data-raw && Rscript run_provincial_parity.R '--wsgs=$M1_WSGS' $EXTRA_ARGS" \ + "cd ~/Projects/repo/link/data-raw && Rscript wsgs_run_host.R '--wsgs=$M1_WSGS' $EXTRA_ARGS" \ > "$M1_LOG" 2>&1 ) & M1_PID=$! 
@@ -591,7 +591,7 @@ declare -a CY_PIDS=() declare -a CY_LOGS=() for ((i=0; i "$CY_LOG" 2>&1 ) & @@ -650,7 +650,7 @@ for ((i=0; i/dev/null | head -1 || true) if [ -n "$CY_R_LOG" ] && [ -f "$CY_R_LOG" ]; then - cp "$CY_R_LOG" "$LOG_DIR/${TS}_trifecta_provincial_cypher_${WS}_R.txt" + cp "$CY_R_LOG" "$LOG_DIR/${TS}_wsgs_dispatch_cypher_${WS}_R.txt" fi done @@ -705,7 +705,7 @@ cat(n_ok, n_err) echo "[trifecta-provincial] local RDS: $TOTAL_RDS / $TOTAL pulled — $N_OK OK, $N_ERR errors" if [ "$N_ERR" -gt 0 ]; then echo "[trifecta-provincial] WARN: $N_ERR error-stub RDS found. Inspect cypher-side R logs:" - ls "$LOG_DIR/${TS}_trifecta_provincial_cypher_"*_R.txt 2>/dev/null | sed 's/^/ /' || true + ls "$LOG_DIR/${TS}_wsgs_dispatch_cypher_"*_R.txt 2>/dev/null | sed 's/^/ /' || true fi else echo "[trifecta-provincial] local RDS file count: 0 / $TOTAL (no files pulled — all hosts failed?)" diff --git a/data-raw/run_provincial_parity.R b/data-raw/wsgs_run_host.R similarity index 98% rename from data-raw/run_provincial_parity.R rename to data-raw/wsgs_run_host.R index 0657fb78..a65d8289 100644 --- a/data-raw/run_provincial_parity.R +++ b/data-raw/wsgs_run_host.R @@ -28,7 +28,7 @@ # These are accepted as known gaps in this baseline. # # Run from data-raw/: -# Rscript run_provincial_parity.R > logs/_provincial_parity.txt 2>&1 & +# Rscript wsgs_run_host.R > logs/_provincial_parity.txt 2>&1 & suppressPackageStartupMessages({ library(link); library(fresh); library(dplyr); library(DBI); library(RPostgres) @@ -121,7 +121,7 @@ cat("Output dir :", out_dir, "\n\n") t_total <- Sys.time() # Per-WSG timings CSV (one row appended per WSG completion). -# Drives data-raw/balance_provincial_buckets.R for future LPT planning; +# Drives data-raw/buckets_balance.R for future LPT planning; # replaces the regex-parse-the-text-log path. Host-tagged via Sys.info() # so multi-host trifecta runs produce comparable rows. 
host_id <- Sys.info()[["nodename"]] @@ -203,7 +203,7 @@ stamp_bcfp_baseline <- function(config_name, link_schema) { bcfp_model_run_id = bcfp$model_run_id, bcfp_model_version = bcfp$model_version, bcfp_date_completed = bcfp$date_completed, - notes = "auto-stamped at run_provincial_parity.R start", + notes = "auto-stamped at wsgs_run_host.R start", stringsAsFactors = FALSE) write.table(row, csv_path, sep = ",", row.names = FALSE, col.names = FALSE, quote = FALSE, append = TRUE) @@ -343,7 +343,7 @@ cat("WSGs completed:", length(list.files(out_dir, pattern = "\\.rds$")), "\n") # Post-loop annotation: bind all per-WSG RDS rollups, annotate against # the bcfp divergence taxonomy, write `__annotated.csv`. # Each host writes its own bucket's annotated CSV. The orchestrator -# (trifecta_provincial.sh) does the province-wide aggregate after the +# (wsgs_dispatch.sh) does the province-wide aggregate after the # RDS pull-back step. # # Skipped if the taxonomy YAML doesn't exist relative to the script's diff --git a/data-raw/province_run.sh b/data-raw/wsgs_run_pipeline.sh similarity index 91% rename from data-raw/province_run.sh rename to data-raw/wsgs_run_pipeline.sh index e254f3f4..4b619874 100755 --- a/data-raw/province_run.sh +++ b/data-raw/wsgs_run_pipeline.sh @@ -1,5 +1,5 @@ #!/usr/bin/env bash -# province_run.sh — top-level wrapper for the full provincial parity run. +# wsgs_run_pipeline.sh — top-level wrapper for the full provincial parity run. # # Orchestrates the 10-step sequence documented in # research/post_compact_provincial_handoff.md: @@ -7,7 +7,7 @@ # 1+2. snapshot_bcfp.sh on M4 + M1 (parallel) # 3. cypher_up.sh job1/job2/job3 (parallel) # 4. cypher_prep.sh on each cypher (parallel) -# 5. archive_provincial_runs.sh on all 5 hosts (parallel) +# 5. runs_archive.sh on all 5 hosts (parallel) # 6. SMOKE — fail-fast # 7. FULL DISPATCH # 8. acceptance bar check (UNEXPLAINED at |diff_pct|>=2% == 0) @@ -19,7 +19,7 @@ # only attempts burn when there's something to burn. 
# # Usage: -# bash data-raw/province_run.sh [flags] +# bash data-raw/wsgs_run_pipeline.sh [flags] # # Flags: # --wsgs=A,B,C restrict to a WSG subset (full bundle if omitted) @@ -65,7 +65,7 @@ done MAPPING_FLAG="--with-mapping-code" [ "$NO_MAPPING" = "1" ] && MAPPING_FLAG="" -# Build the passthrough flag string for trifecta_provincial.sh + trifecta_smoke.sh. +# Build the passthrough flag string for wsgs_dispatch.sh + trifecta_smoke.sh. DISPATCH_FLAGS="" [ -n "$WSGS_FILTER" ] && DISPATCH_FLAGS="$DISPATCH_FLAGS --wsgs=$WSGS_FILTER" [ -n "$CONFIG_NAME" ] && DISPATCH_FLAGS="$DISPATCH_FLAGS --config=$CONFIG_NAME" @@ -76,13 +76,13 @@ DISPATCH_FLAGS="" REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)" cd "$REPO_ROOT" TS="$(date -u +%Y%m%d_%H%M%S)" -LOG_DIR="$REPO_ROOT/data-raw/logs/province_run" +LOG_DIR="$REPO_ROOT/data-raw/logs/wsgs_run_pipeline" mkdir -p "$LOG_DIR" -LOG="$LOG_DIR/${TS}_province_run.log" +LOG="$LOG_DIR/${TS}_wsgs_run_pipeline.log" exec > >(tee -a "$LOG") 2>&1 START_EPOCH=$(date +%s) -echo "=== province_run.sh $TS ===" +echo "=== wsgs_run_pipeline.sh $TS ===" echo " log: $LOG" echo " config: $CONFIG_NAME" [ -n "$SCHEMA" ] && echo " schema: $SCHEMA" @@ -173,7 +173,7 @@ if [ -n "$SCHEMA" ]; then echo "=== Step 0: pre-clean target schema [$SCHEMA] ===" CLEAN_ARGS="--schemas=$SCHEMA" [ "$NO_CYPHERS" = "1" ] && CLEAN_ARGS="$CLEAN_ARGS --skip-cy" - bash data-raw/province_clean.sh $CLEAN_ARGS > "$LOG_DIR/${TS}_preclean.log" 2>&1 || { + bash data-raw/state_clean.sh $CLEAN_ARGS > "$LOG_DIR/${TS}_preclean.log" 2>&1 || { echo "FATAL: pre-clean failed; see $LOG_DIR/${TS}_preclean.log" exit 1 } @@ -234,14 +234,14 @@ else fi # --- Step 5: archive prior RDS — M4+M1 always, cyphers only when up --- -echo "=== Step 5: archive_provincial_runs.sh on all hosts ===" -bash data-raw/archive_provincial_runs.sh > "$LOG_DIR/${TS}_archive_m4.log" 2>&1 & -ssh m1 'cd ~/Projects/repo/link/data-raw && ./archive_provincial_runs.sh' \ +echo "=== Step 5: runs_archive.sh on all hosts ===" 
+bash data-raw/runs_archive.sh > "$LOG_DIR/${TS}_archive_m4.log" 2>&1 & +ssh m1 'cd ~/Projects/repo/link/data-raw && ./runs_archive.sh' \ > "$LOG_DIR/${TS}_archive_m1.log" 2>&1 & if [ "$NO_CYPHERS" = "0" ]; then for WS in job1 job2 job3; do IP="${CY_IP[$WS]}" - ssh "cypher@$IP" 'cd ~/Projects/repo/link/data-raw && ./archive_provincial_runs.sh' \ + ssh "cypher@$IP" 'cd ~/Projects/repo/link/data-raw && ./runs_archive.sh' \ > "$LOG_DIR/${TS}_archive_$WS.log" 2>&1 & done fi @@ -264,7 +264,7 @@ fi # --- Step 7: FULL DISPATCH --- # When --no-cyphers OR --wsgs is set, omit --cy-workspaces so -# trifecta_provincial.sh runs with the M4+M1-only plan it derives +# wsgs_dispatch.sh runs with the M4+M1-only plan it derives # from DISPATCH_FLAGS (which includes --no-cyphers, --wsgs, etc.). if [ "$NO_CYPHERS" = "0" ] && [ -z "$WSGS_FILTER" ]; then TRIFECTA_CY_ARG="--cy-workspaces=job1,job2,job3" @@ -275,9 +275,9 @@ else fi echo " DISPATCH_FLAGS=$DISPATCH_FLAGS" cd "$REPO_ROOT/data-raw" -if ! bash trifecta_provincial.sh $TRIFECTA_CY_ARG $DISPATCH_FLAGS $MAPPING_FLAG \ +if ! bash wsgs_dispatch.sh $TRIFECTA_CY_ARG $DISPATCH_FLAGS $MAPPING_FLAG \ > "$LOG_DIR/${TS}_full.log" 2>&1; then - echo "WARNING: trifecta_provincial.sh exited non-zero; partial result may exist" + echo "WARNING: wsgs_dispatch.sh exited non-zero; partial result may exist" # don't exit — let acceptance + consolidate inspect what landed fi echo "--- full dispatch tail ---" @@ -314,7 +314,7 @@ fi # dynamically — M1 always present; cyphers only when --no-cyphers # wasn't set. 
echo "=== Step 9: consolidate target schema ===" -ORCH_LOG=$(ls -1t data-raw/logs/*_trifecta_provincial_orchestrator.txt 2>/dev/null | head -1 || true) +ORCH_LOG=$(ls -1t data-raw/logs/*_wsgs_dispatch_orchestrator.txt 2>/dev/null | head -1 || true) if [ -z "$ORCH_LOG" ]; then echo " ✗ no orchestrator log found — cannot extract per-host buckets" exit 1 @@ -371,14 +371,14 @@ if [ "$NO_CYPHERS" = "0" ]; then TARGET_SCHEMA="$TARGET_SCHEMA" SOURCES_R="$SOURCES_R" \ Rscript -e ' suppressPackageStartupMessages({library(link)}) -source("consolidate_schema.R") +source("schema_consolidate.R") sources <- eval(parse(text = Sys.getenv("SOURCES_R"))) -result <- consolidate_schema(schema = Sys.getenv("TARGET_SCHEMA"), +result <- schema_consolidate(schema = Sys.getenv("TARGET_SCHEMA"), sources = sources, backup = TRUE) print(result) saveRDS(result, "/tmp/consolidate_result.rds") ' > "$LOG_DIR/${TS}_consolidate.log" 2>&1 || { - echo " ✗ consolidate_schema.R failed; see $LOG_DIR/${TS}_consolidate.log" + echo " ✗ schema_consolidate.R failed; see $LOG_DIR/${TS}_consolidate.log" exit 1 } else @@ -386,8 +386,8 @@ else M1_BUCKET="$M1_BUCKET" TARGET_SCHEMA="$TARGET_SCHEMA" \ Rscript -e ' suppressPackageStartupMessages({library(link)}) -source("consolidate_schema.R") -result <- consolidate_schema( +source("schema_consolidate.R") +result <- schema_consolidate( schema = Sys.getenv("TARGET_SCHEMA"), sources = list(list(host = "m1", via = "docker", bucket = strsplit(Sys.getenv("M1_BUCKET"), ",")[[1]])), @@ -395,7 +395,7 @@ result <- consolidate_schema( print(result) saveRDS(result, "/tmp/consolidate_result.rds") ' > "$LOG_DIR/${TS}_consolidate.log" 2>&1 || { - echo " ✗ consolidate_schema.R failed; see $LOG_DIR/${TS}_consolidate.log" + echo " ✗ schema_consolidate.R failed; see $LOG_DIR/${TS}_consolidate.log" exit 1 } fi @@ -406,7 +406,7 @@ cd "$REPO_ROOT" END_EPOCH=$(date +%s) WALL=$(( END_EPOCH - START_EPOCH )) echo -echo "=== province_run.sh complete in ${WALL}s (~$((WALL/60))m) ===" +echo 
"=== wsgs_run_pipeline.sh complete in ${WALL}s (~$((WALL/60))m) ===" echo " annotated CSV: $ANN_CSV" echo " UNEXPLAINED ≥2%: $N_UNEXP" echo " trap EXIT will now burn cyphers (unless --keep-cyphers)" diff --git a/planning/active/progress.md b/planning/active/progress.md index 760b3832..9c50d381 100644 --- a/planning/active/progress.md +++ b/planning/active/progress.md @@ -15,4 +15,5 @@ - **Root-cause fix** (Phase 1.5 commit `89da284`): added `state_clean.sh --schemas=` scoped mode (drops only listed schemas, skips canonical-fresh wipe + snapshot reload) + `province_run.sh` Step 0 pre-clean that fires when `--schema=` is set. Empty `--schemas=` guard added per round-1 code-check. - **Attempt 2** (with pre-clean): 16/16 WSGs in `fresh_default.streams` on M4, 20m wall (pre-clean + cold-cache pipeline + consolidate). Exit 0. No mid-run prompts. Annotated CSV: 343 rows; 66 UNEXPLAINED at ≥2% surfaced as WARNING (methodology divergence for `default` bundle, expected for northern-WSG test set). - Pre-existing limitation surfaced (consolidate stale-state collision) → resolved as part of #172 scope because the autonomy story requires it; the umbrella now genuinely runs end-to-end without operator handholding even when the cluster has leftover state. -- Next: Phase 4 — rename 8 operational scripts to noun_verb convention (git mv + reference updates). +- **Phase 4 done.** 8 scripts renamed via `git mv` (preserves `git log --follow`). Bulk `sed -i ''` applied old→new substitutions across the live tree, then NEWS.md was reverted to keep historical entries sealed. Tree-wide grep for the 8 old names returns empty (after excluding NEWS.md, planning/archive/, planning/active/, data-raw/logs/). `/code-check` round 1 clean — 7 concerns verified including idempotency (no new name contains any old name as substring). +- Next: Phase 5 — post-rename smoke (1-WSG via renamed umbrella, verifies all the reference rewrites resolve at runtime). 
diff --git a/planning/active/task_plan.md b/planning/active/task_plan.md index ef9d786c..9d4cfa9d 100644 --- a/planning/active/task_plan.md +++ b/planning/active/task_plan.md @@ -52,22 +52,15 @@ Patch the original filename first so the smoke (Phase 3) validates on the known- ## Phase 4 — Rename 8 scripts + update all live references -- [ ] `git mv data-raw/province_run.sh data-raw/wsgs_run_pipeline.sh` -- [ ] `git mv data-raw/province_clean.sh data-raw/state_clean.sh` -- [ ] `git mv data-raw/province_progress.sh data-raw/progress_check.sh` -- [ ] `git mv data-raw/trifecta_provincial.sh data-raw/wsgs_dispatch.sh` -- [ ] `git mv data-raw/run_provincial_parity.R data-raw/wsgs_run_host.R` -- [ ] `git mv data-raw/consolidate_schema.R data-raw/schema_consolidate.R` -- [ ] `git mv data-raw/archive_provincial_runs.sh data-raw/runs_archive.sh` -- [ ] `git mv data-raw/balance_provincial_buckets.R data-raw/buckets_balance.R` -- [ ] Update internal references in each renamed file (self-name in usage block, `source()` calls between them, log filenames). -- [ ] Update `data-raw/README.md` (~27 refs). -- [ ] Update `research/provincial_run_runbook.md` (~12 refs). -- [ ] Update `research/post_compact_provincial_handoff.md` (~8 refs). -- [ ] Update `CLAUDE.md` (2 refs). -- [ ] **Do NOT update** `planning/archive/**`, `NEWS.md` historical entries. -- [ ] `bash -n` clean on all 4 renamed shell scripts. -- [ ] `/code-check` clean. +- [x] `git mv` all 8 renames (preserves `git log --follow`). +- [x] `sed -i ''` across live tree applies all 8 old→new substitutions atomically. Order chosen to avoid prefix collisions; verified no new name contains any old name as substring (sed map is idempotent). +- [x] Internal references updated in renamed files: usage blocks, `Rscript wsgs_run_host.R` invocations, log-filename literals (`${TS}_wsgs_dispatch_*`), cross-script `bash` calls. 
+- [x] Updated `data-raw/README.md`, `data-raw/trifecta_smoke.sh`, `data-raw/query_schema_delta.R`, `wsg_compare.R`, `wsg_pipeline_run.R`. +- [x] Updated `research/*.md` (runbook, handoff, parity docs). +- [x] Updated `CLAUDE.md`, `R/utils.R` (one-line docstring ref). +- [x] **NOT updated** (sealed): `NEWS.md` historical entries (reverted after sed swept them), `planning/archive/**`. +- [x] `bash -n` clean on all renamed shell scripts (wsgs_run_pipeline.sh, state_clean.sh, progress_check.sh, wsgs_dispatch.sh, runs_archive.sh) + trifecta_smoke.sh sibling. +- [x] `/code-check` round 1 clean (all 7 concerns verified: bash syntax, cross-refs, log literals, Rscript invocations, tree-wide grep empty, idempotency, R/utils.R docstring). - [ ] Commit "Rename 8 operational scripts to noun_verb convention" ## Phase 5 — Smoke after rename diff --git a/research/bcfp_compare_mapping_code.md b/research/bcfp_compare_mapping_code.md index 184aae51..6a3a0bf4 100644 --- a/research/bcfp_compare_mapping_code.md +++ b/research/bcfp_compare_mapping_code.md @@ -150,7 +150,7 @@ Eight species columns: `mapping_code_bt`, `mapping_code_ch`, builds `.crossings` + `.barriers_anthropogenic` / `barriers_pscis` / `barriers_dams` / `barriers_remediations` from primitives. These are the inputs to `lnk_pipeline_access(barrier_sources = ...)`. -- The provincial parity script `data-raw/run_provincial_parity.R` runs +- The provincial parity script `data-raw/wsgs_run_host.R` runs phases 1–6 + km/ha rollup against bcfp. It does *not* run phases 7–8 (access + mapping_code). The mapping_code comparison is its own driver — this document plus `compare_bcfp_mapping_code.R`. 
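The Phase 4 checklist in `planning/active/task_plan.md` states that the rename map is idempotent because no new name contains any old name as a substring. A minimal sketch of that check follows. It assumes the substitutions matched the extension-less stems: the diff rewrites log-file literals such as `_wsgs_dispatch_orchestrator.txt` and the `schema_consolidate()` function name, which suggests stem matching, but the actual `sed` expressions are not shown in this patch. The array names and the script itself are illustrative and are not part of the repo.

```shell
#!/usr/bin/env bash
# Illustrative check (not a repo script): confirm that no new stem contains
# an old stem, so re-applying the old->new substitutions would be a no-op.
# ASSUMPTION: the rename matched extension-less stems; the real sed map is
# not shown in the patch.
set -euo pipefail

old_names=(
  province_run province_clean province_progress trifecta_provincial
  run_provincial_parity consolidate_schema archive_provincial_runs
  balance_provincial_buckets
)
new_names=(
  wsgs_run_pipeline state_clean progress_check wsgs_dispatch
  wsgs_run_host schema_consolidate runs_archive buckets_balance
)

collisions=0
for new in "${new_names[@]}"; do
  for old in "${old_names[@]}"; do
    # Quoting $old inside the pattern forces a literal substring match.
    if [[ "$new" == *"$old"* ]]; then
      echo "collision: '$new' contains '$old'"
      collisions=$((collisions + 1))
    fi
  done
done

echo "collisions=$collisions"
```

A non-zero count would mean a second pass of the substitutions could rewrite an already-renamed reference, so the map would need reordering or anchoring before being applied tree-wide.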
diff --git a/research/distributed_2hosts_2026_05_01.md b/research/distributed_2hosts_2026_05_01.md index 364c0a03..27eeeb00 100644 --- a/research/distributed_2hosts_2026_05_01.md +++ b/research/distributed_2hosts_2026_05_01.md @@ -3,7 +3,7 @@ **Date**: 2026-05-01 08:12–08:18 PDT **Versions**: link 0.21.0, fresh 0.26.0, bcfishpass 440bc1e **Hosts**: M4 Max + M1 over Tailscale, both with local Docker fwapg :5432 + bcfp tunnel :63333 -**Coordination**: ad-hoc — `Rscript run_provincial_parity.R --wsgs=` per host, rsync per-WSG RDS files at end +**Coordination**: ad-hoc — `Rscript wsgs_run_host.R --wsgs=` per host, rsync per-WSG RDS files at end ## Headline @@ -27,7 +27,7 @@ The minor `bcfishobs` row-count delta between M4 (372,420) and M1 (372,505 from ## Architecture -Each host runs `run_provincial_parity.R --wsgs=` against its own: +Each host runs `wsgs_run_host.R --wsgs=` against its own: - writable Docker fwapg on :5432 (mutates `fresh.streams` per WSG) - bcfp reference DB on :63333 (read-only via SSH tunnel to db_newgraph) @@ -46,12 +46,12 @@ No shared filesystem. No coordination layer. 
Manual partition driven by per-WSG ```bash # m4 mkdir -p data-raw/logs/provincial_parity -Rscript data-raw/run_provincial_parity.R --wsgs=BULK,HARR,DEAD \ +Rscript data-raw/wsgs_run_host.R --wsgs=BULK,HARR,DEAD \ > data-raw/logs/_dist_m4.txt 2>&1 & # m1 (over Tailscale) ssh m1 "cd /Users/airvine/Projects/repo/link && nohup bash -c \ - '/usr/bin/time -p Rscript data-raw/run_provincial_parity.R --wsgs=ELKR,LFRA,VICT \ + '/usr/bin/time -p Rscript data-raw/wsgs_run_host.R --wsgs=ELKR,LFRA,VICT \ > data-raw/logs/_dist_m1.txt 2>&1' > /dev/null 2>&1 < /dev/null &" # wait for both, then rsync diff --git a/research/post_compact_provincial_handoff.md b/research/post_compact_provincial_handoff.md index a989d5b1..c0abdc39 100644 --- a/research/post_compact_provincial_handoff.md +++ b/research/post_compact_provincial_handoff.md @@ -6,7 +6,7 @@ Read this first if you're a fresh Claude session and the user asks you to run th - `lnk_compare_wsg()` + `lnk_parity_annotate()` — exported library functions - `research/bcfp_divergence_taxonomy.yml` — 11 verified-mechanism entries -- 5-host N-cypher orchestrator (`data-raw/trifecta_provincial.sh`) +- 5-host N-cypher orchestrator (`data-raw/wsgs_dispatch.sh`) - Phase 7 hardening: DDL drift detection, smoke fail-fast, log visibility, truth-in-headline - 5-WSG audit (ADMS/SETN/HORS/BULK/THOM) hit 0 UNEXPLAINED at |diff_pct|>=2% @@ -85,10 +85,10 @@ wait # If any cypher's prep fails, surface to user and STOP. 
# === Step 5: archive prior RDS on ALL hosts (cross-host smoke gotcha from 2026-05-12) === -cd ~/Projects/repo/link/data-raw && bash archive_provincial_runs.sh -ssh m1 'cd ~/Projects/repo/link/data-raw && ./archive_provincial_runs.sh' +cd ~/Projects/repo/link/data-raw && bash runs_archive.sh +ssh m1 'cd ~/Projects/repo/link/data-raw && ./runs_archive.sh' for IP in ; do - ssh "cypher@$IP" 'cd ~/Projects/repo/link/data-raw && ./archive_provincial_runs.sh' & + ssh "cypher@$IP" 'cd ~/Projects/repo/link/data-raw && ./runs_archive.sh' & done wait @@ -100,17 +100,17 @@ if [ $SMOKE_RC -ne 0 ]; then # Smoke caught an error stub. STOP. grep -E "smoke.*FAILED|smoke.*ERROR" /tmp/smoke.log # Inspect: - # data-raw/logs/_trifecta_provincial_cypher__R.txt + # data-raw/logs/_wsgs_dispatch_cypher__R.txt # DO NOT proceed to step 7 until smoke is clean. fi # === Step 7: FULL PROVINCIAL DISPATCH (~80-95 min wall) === cd ~/Projects/repo/link/data-raw -nohup bash trifecta_provincial.sh \ +nohup bash wsgs_dispatch.sh \ --cy-workspaces=job1,job2,job3 \ --with-mapping-code > /tmp/full_run.log 2>&1 & # Check completion via process list + log tail -ps -ef | grep trifecta_provincial.sh | grep -v grep | wc -l # 0 when done +ps -ef | grep wsgs_dispatch.sh | grep -v grep | wc -l # 0 when done tail -10 /tmp/full_run.log # Expected final headline: "local RDS: 217/217 pulled — 217 OK, 0 errors" @@ -130,20 +130,20 @@ if (nrow(unexp) > 0) print(head(unexp[, c('wsg','species','habitat_type','link_v # === Step 9: consolidate fresh schema (m1 + 3 cyphers -> M4) === # Extract per-host buckets from orchestrator log: -ORCH_LOG=$(ls -1t data-raw/logs/*_trifecta_provincial_orchestrator.txt | head -1) +ORCH_LOG=$(ls -1t data-raw/logs/*_wsgs_dispatch_orchestrator.txt | head -1) M1_BUCKET=$(grep '^ m1 bucket:' "$ORCH_LOG" | sed 's/.*bucket: //') CY1_BUCKET=$(grep '^ cypher\[job1\] bucket:' "$ORCH_LOG" | sed 's/.*bucket: //') CY2_BUCKET=$(grep '^ cypher\[job2\] bucket:' "$ORCH_LOG" | sed 's/.*bucket: //') 
CY3_BUCKET=$(grep '^ cypher\[job3\] bucket:' "$ORCH_LOG" | sed 's/.*bucket: //') -# Then invoke consolidate_schema.R (DBI calls, needs PG_PASS_SHARE): +# Then invoke schema_consolidate.R (DBI calls, needs PG_PASS_SHARE): cd ~/Projects/repo/link/data-raw M1_BUCKET="$M1_BUCKET" CY1_BUCKET="$CY1_BUCKET" CY2_BUCKET="$CY2_BUCKET" CY3_BUCKET="$CY3_BUCKET" \ CY1_IP= CY2_IP= CY3_IP= \ Rscript -e ' suppressPackageStartupMessages({library(link)}) -source("consolidate_schema.R") -result <- consolidate_schema( +source("schema_consolidate.R") +result <- schema_consolidate( schema = "fresh", sources = list( list(host = "m1", via = "docker", bucket = strsplit(Sys.getenv("M1_BUCKET"), ",")[[1]]), @@ -185,7 +185,7 @@ Use `trap EXIT` defense if you write a wrapper. Without a wrapper, you do this m **Trigger**: No `_per_wsg_times.csv` files in `data-raw/logs/provincial_parity/` at the top level. Happens if you archive prior runs (which moves the timing CSVs to `archive//`) without inheriting prior timing data into the new run. -**Fix landed 2026-05-13** (`trifecta_provincial.sh` SPLIT_R block): when no timing CSV exists, the fallback now uses host_speeds-weighted alphabetical split (`floor(n * speed_factor / sum(speed_factors))` per host, remainder to highest-factor hosts). Log line is now `[LPT] no timing CSVs found; using host_speeds-weighted split` and reports per-host bucket sizes. +**Fix landed 2026-05-13** (`wsgs_dispatch.sh` SPLIT_R block): when no timing CSV exists, the fallback now uses host_speeds-weighted alphabetical split (`floor(n * speed_factor / sum(speed_factors))` per host, remainder to highest-factor hosts). Log line is now `[LPT] no timing CSVs found; using host_speeds-weighted split` and reports per-host bucket sizes. **Tradeoff vs LPT-with-timings**: still doesn't know which specific WSGs are heavy (BULK / THOM heavyweights vs lightweight DEAD / ELKR). LPT-with-timings places heavyweights first; weighted-split is alphabetical. 
Better than equal split, worse than real LPT. The remedy: after the first successful provincial run, the per_wsg_times.csv landing in the top level gives the next dispatch real LPT data. @@ -195,7 +195,7 @@ All hosts run the link pipeline against their OWN local fwapg, and ALL HOSTS run | Host | How it reaches bcfp | |---|---| -| M4 | Operator's persistent tunnel + `trifecta_provincial.sh` opens an idempotent inline tunnel as backup (no harm if already up — bind fails silently, existing tunnel preserved) | +| M4 | Operator's persistent tunnel + `wsgs_dispatch.sh` opens an idempotent inline tunnel as backup (no harm if already up — bind fails silently, existing tunnel preserved) | | M1 | **Reverse forward from M4** (`ssh -R 63333:127.0.0.1:63333 m1`) — M1's localhost:63333 → M4's localhost:63333 → db_newgraph:5432. M1 does NOT need its own working db_newgraph identity. | | cypher[jobN] | Opens its OWN ssh tunnel via `ssh -L 63333 db_newgraph -N &` inside the generated wrapper script. Cyphers have `~/.ssh/id_db_newgraph` (passphrase-free, authorized on db_newgraph). | @@ -236,12 +236,12 @@ History: Prior runs (e.g. 
2026-05-12 with 67 M1 WSGs OK) worked only because the | Diagnostic recipes per Class | `research/bcfp_divergence_investigation.md` | | Operational runbook | `research/provincial_run_runbook.md` | | Latest live run record | `research/provincial_parity_2026_05_12.md` | -| Orchestrator | `data-raw/trifecta_provincial.sh` | +| Orchestrator | `data-raw/wsgs_dispatch.sh` | | Smoke shim | `data-raw/trifecta_smoke.sh` | | Per-cypher prep | `data-raw/cypher_prep.sh` (defaults to `main` branch) | | Snapshot loader | `data-raw/snapshot_bcfp.sh` (with `--force` flag) | -| Archive helper | `data-raw/archive_provincial_runs.sh` | -| Consolidation | `data-raw/consolidate_schema.R` (R, not shell — pre/post row delta verification) | +| Archive helper | `data-raw/runs_archive.sh` | +| Consolidation | `data-raw/schema_consolidate.R` (R, not shell — pre/post row delta verification) | | Cypher infra | `~/Projects/repo/rtj/scripts/cypher/cypher_{up,down,run}.sh --workspace ` | | Findings from Phase 7 | `planning/archive/2026-05-link162-lnk-compare-wsg-annotated-csv/findings.md` | | Memory state | `~/.claude/projects/-Users-airvine-Projects-repo-link/memory/project_link_state.md` | diff --git a/research/provincial_parity_2026_05_01.md b/research/provincial_parity_2026_05_01.md index 3c54a517..89f729cf 100644 --- a/research/provincial_parity_2026_05_01.md +++ b/research/provincial_parity_2026_05_01.md @@ -161,7 +161,7 @@ The genuine "needs work" set is **Class D and the unresolved part of C**. 
Maybe - `data-raw/logs/20260501_0251_provincial_parity_per_wsg.csv` — per-WSG summary - `data-raw/logs/provincial_parity/.rds` — per-WSG output, 217 files - `data-raw/logs/20260430_2155_provincial_parity.txt` — run log -- `data-raw/run_provincial_parity.R` — reusable runner script +- `data-raw/wsgs_run_host.R` — reusable runner script ## Next diff --git a/research/provincial_parity_2026_05_11.md b/research/provincial_parity_2026_05_11.md index d53f3582..a3c39ca4 100644 --- a/research/provincial_parity_2026_05_11.md +++ b/research/provincial_parity_2026_05_11.md @@ -5,7 +5,7 @@ **Software**: link 0.35.0 (sha 8f8c7b6 + #157 dispatch fix), fresh 0.31.0, bcfp reference `bcfishpass@v0.7.14-125-g6e9cf1c` via tunnel `db_newgraph` **Configuration**: bcfishpass bundle parity only **Source data**: 232 candidate WSGs filtered → 217 dispatched (link#157), 15 known-empty WSGs excluded from dispatch -**Source log**: `data-raw/logs/202605112010_trifecta_provincial_orchestrator.txt` +**Source log**: `data-raw/logs/202605112010_wsgs_dispatch_orchestrator.txt` **Output**: 232 RDS files in `data-raw/logs/provincial_parity/*.rds` (15 stub-error from this run, 217 OK) **Aggregate**: 1,647 comparable rollup rows (after dropping `-100%` lake/wetland centerline artifacts and NA-baseline rows) @@ -52,10 +52,10 @@ Single-host baseline (2026-05-01): 4h 55min. 3-host LPT split saved **~3 hours** | M1 | Allans MacBook Pro (tailnet) | 0.83 (faster!) | 100 (102 − 7 errors) | 114.7 min | 64s | | cypher | DO droplet (g-8vcpu-32gb) | 1.83 | 46 | 88.6 min | 97s | -LPT (Longest Processing Time first) bin-packing in `data-raw/balance_provincial_buckets.R` weights each WSG by its `m4_equiv` time, then assigns to the host whose projected finish time would be shortest. Cypher gets the fewest WSGs because its per-WSG cost is 1.83× M4's; M1 gets the most because it's slightly faster than M4 per-WSG. Predicted wall was 155.5 min vs actual 114.7 min — predictions tracked within 25%. 
+LPT (Longest Processing Time first) bin-packing in `data-raw/buckets_balance.R` weights each WSG by its `m4_equiv` time, then assigns to the host whose projected finish time would be shortest. Cypher gets the fewest WSGs because its per-WSG cost is 1.83× M4's; M1 gets the most because it's slightly faster than M4 per-WSG. Predicted wall was 155.5 min vs actual 114.7 min — predictions tracked within 25%. **Operational notes:** -- `data-raw/trifecta_provincial.sh` orchestrates dispatch via SSH + tailnet (`m1`) + reserved-IP SSH (`cypher@24.144.70.121`). cypher gets its bcfp-tunnel via in-script SSH local-forward `-L 63333:127.0.0.1:5432 db_newgraph`. +- `data-raw/wsgs_dispatch.sh` orchestrates dispatch via SSH + tailnet (`m1`) + reserved-IP SSH (`cypher@24.144.70.121`). cypher gets its bcfp-tunnel via in-script SSH local-forward `-L 63333:127.0.0.1:5432 db_newgraph`. - M1 + cypher needed a one-time data sync of `cabd.dams` (1.9 MB), `whse_fish.pscis_assessment_svw` (18 MB), `fresh.modelled_stream_crossings` (380 MB) from M4 via `pg_dump | ssh docker exec psql`. `snapshot_bcfp.sh` is the canonical loader but those hosts didn't have it configured. - cypher's fresh+link install required `R CMD INSTALL --no-test-load` because pak tried to upgrade `sf` and conflicted with the host's conda-managed GDAL; downgrading to `R CMD INSTALL` kept the existing `sf 1.1.0` intact. @@ -228,11 +228,11 @@ for HOST in m1 cypher@24.144.70.121; do done # 4. Compute LPT-balanced buckets -Rscript data-raw/balance_provincial_buckets.R +Rscript data-raw/buckets_balance.R # Copy the --m4-bucket= / --m1-bucket= / --cy-bucket= overrides into the next step # 5. Dispatch -cd data-raw && ./trifecta_provincial.sh --m4-bucket=... --m1-bucket=... --cy-bucket=... +cd data-raw && ./wsgs_dispatch.sh --m4-bucket=... --m1-bucket=... --cy-bucket=... # 6. After completion: consolidate per-host RDS files (auto-pulled by trifecta script) # 7. 
Aggregate: source /tmp/summary.R-style script over data-raw/logs/provincial_parity/*.rds @@ -242,8 +242,8 @@ Wall: ~2 hours with current LPT factors (M4=1.0, M1=0.83, cy=1.83). Each provinc ## Files -- Trifecta orchestrator log: `data-raw/logs/202605112010_trifecta_provincial_orchestrator.txt` -- Per-host run logs: `data-raw/logs/202605112010_trifecta_provincial_{m4,m1,cypher}.txt` +- Trifecta orchestrator log: `data-raw/logs/202605112010_wsgs_dispatch_orchestrator.txt` +- Per-host run logs: `data-raw/logs/202605112010_wsgs_dispatch_{m4,m1,cypher}.txt` - Per-WSG timing CSVs: `data-raw/logs/provincial_parity/20260511_2010_{m4,m1,cy}_per_wsg_times.csv` - Per-WSG rollup RDS: `data-raw/logs/provincial_parity/.rds` - Aggregate summary: `/tmp/provincial_summary.rds` (regenerate via `/tmp/summary.R`) @@ -323,7 +323,7 @@ Bucket allocation was identity-deterministic from yesterday's per-WSG times (yes ### Consolidation to M4 (province-wide `fresh` schema) -After per-host runs, `data-raw/consolidate_schema.R` pg_dumps fresh schema from m1 + cypher and pg_restores to M4. Hit two recoverable issues: +After per-host runs, `data-raw/schema_consolidate.R` pg_dumps fresh schema from m1 + cypher and pg_restores to M4. Hit two recoverable issues: 1. **Cypher restore (52 WSGs)**: clean. Reported `ok=FALSE` in the result list but data did land (script's `ok` boolean false-positives on certain pg_restore warning patterns — investigate later). 2. **M1 restore (90 WSGs)**: pg_restore aborted COPYs for `streams_habitat_` tables due to duplicate-key violations. Root cause: M1's local `streams_habitat_` carries cross-run residuals (yesterday's runs left rows for WSGs that today went to different hosts), and `pg_restore --data-only` doesn't filter by today's bucket — it tries to insert ALL of M1's rows. Surgical recovery: `DELETE FROM fresh. WHERE watershed_group_code IN (today's M1 bucket)` on M4, then per-table `\COPY (SELECT ... WHERE wsg IN ...) FROM STDIN`. 
All 8 species habitat tables recovered cleanly. @@ -350,7 +350,7 @@ Per-species WSG counts reflect species presence: BT in 136 WSGs (most widespread - **rtj#129** (filed) — bake conda-geo-activate into cypher snapshot (`/etc/profile.d/conda-geo-activate.sh`). - **link#159** (filed) — wrap per-WSG body in `tryCatch({...}, finally = drop_working_schema)` for error-path cleanup. -- **consolidate_schema.R** — investigate the m1 `streams_habitat_` pg_restore failure pattern. Probably need a `--clean-wsg-keys` arg that DELETEs source-host-bucket rows on dest before pg_restore. +- **schema_consolidate.R** — investigate the m1 `streams_habitat_` pg_restore failure pattern. Probably need a `--clean-wsg-keys` arg that DELETEs source-host-bucket rows on dest before pg_restore. - **Full provincial mapping_code re-run** to validate #158 fix province-wide (BBAR + THOM confirmed ≥99.93%; remaining 215 WSGs extrapolated). Deferred — ~2 hr separate run. ### Runbook ratification diff --git a/research/provincial_parity_2026_05_12.md b/research/provincial_parity_2026_05_12.md index e50f56dd..f7c4f732 100644 --- a/research/provincial_parity_2026_05_12.md +++ b/research/provincial_parity_2026_05_12.md @@ -9,8 +9,8 @@ - Per-WSG RDS: `data-raw/logs/provincial_parity/*.rds` - Per-WSG mapping_code stats embedded in each list-shape RDS - Aggregate annotated CSV: `data-raw/logs/provincial_parity/__TS___annotated.csv` (output of `lnk_parity_annotate()` against `research/bcfp_divergence_taxonomy.yml`) -- Per-host run logs: `data-raw/logs/202605122221_trifecta_provincial_*.txt` -- Orchestrator log: `data-raw/logs/202605122221_trifecta_provincial_orchestrator.txt` +- Per-host run logs: `data-raw/logs/202605122221_wsgs_dispatch_*.txt` +- Orchestrator log: `data-raw/logs/202605122221_wsgs_dispatch_orchestrator.txt` This is the **first run that exercises link#162's full machinery**: inline LPT bucket allocation (Phase 5), N-cypher dispatch via tofu workspaces, post-pull `lnk_parity_annotate()` 
against the divergence taxonomy YAML, the `lnk_compare_wsg` library function (replacing the inline `compare_bcfishpass_wsg.R` SQL), and the `mapping_code` branch (per-segment per-species token-level parity). @@ -177,7 +177,7 @@ If the diff count is non-zero AND those segments appear in the WSG's mapping_cod This run surfaced several gotchas worth codifying for the wrapper script (`data-raw/run_phase7.sh` follow-up): -1. **Cross-host archival before run.** `archive_provincial_runs.sh` ran on M4 only initially; M1's stale RDS got SCP-pulled back during the post-pull step, polluting the aggregate annotation. Fix: archive on ALL hosts (M4 + M1 + all cyphers) before dispatch. Now codified in the recommended cadence in `data-raw/README.md`. +1. **Cross-host archival before run.** `runs_archive.sh` ran on M4 only initially; M1's stale RDS got SCP-pulled back during the post-pull step, polluting the aggregate annotation. Fix: archive on ALL hosts (M4 + M1 + all cyphers) before dispatch. Now codified in the recommended cadence in `data-raw/README.md`. 2. **bcfp coverage gap is real, not a bug.** bcfp's 2026-05-12 build models 187 WSGs; we dispatch 217. The 36-WSG delta caused `with_mapping_code = TRUE` to stop loudly when it should have warned + returned NA. Fix (commit __TBD__): `.lnk_compare_wsg_mapping_code_diff` distinguishes (a) bcfp 0 rows → warn + NA fill, (b) bcfp has rows but no merge → stop loud (real misalignment). 
@@ -187,8 +187,8 @@ This run surfaced several gotchas worth codifying for the wrapper script (`data- ## Files -- Orchestrator log: `data-raw/logs/202605122221_trifecta_provincial_orchestrator.txt` -- Per-host run logs: `data-raw/logs/202605122221_trifecta_provincial_{m4,m1,cypher_job1,cypher_job2,cypher_job3}.txt` +- Orchestrator log: `data-raw/logs/202605122221_wsgs_dispatch_orchestrator.txt` +- Per-host run logs: `data-raw/logs/202605122221_wsgs_dispatch_{m4,m1,cypher_job1,cypher_job2,cypher_job3}.txt` - Per-host timing CSVs: `data-raw/logs/provincial_parity/__TS___{m4,m1,cy}_per_wsg_times.csv` - Per-WSG rollup RDS: `data-raw/logs/provincial_parity/*.rds` - Aggregate annotated CSV: `data-raw/logs/provincial_parity/__TS___annotated.csv` diff --git a/research/provincial_run_runbook.md b/research/provincial_run_runbook.md index 1b04367c..caa2010a 100644 --- a/research/provincial_run_runbook.md +++ b/research/provincial_run_runbook.md @@ -18,13 +18,13 @@ Companion docs: ```bash cd ~/Projects/repo/link/data-raw -./archive_provincial_runs.sh # 1. clean LPT input +./runs_archive.sh # 1. clean LPT input # (spin cyphers per §1) ./trifecta_smoke.sh --cy-workspaces=job1,job2,job3 # 2. smoke (~3 min) # (if smoke errors loud, fix and re-run; DO NOT skip to full) -./trifecta_provincial.sh --with-mapping-code --cy-workspaces=job1,job2,job3 # 3. full (~80 min) +./wsgs_dispatch.sh --with-mapping-code --cy-workspaces=job1,job2,job3 # 3. full (~80 min) # (inspect annotated CSV) -# (consolidate fresh schema via consolidate_schema.R) +# (consolidate fresh schema via schema_consolidate.R) ~/Projects/repo/rtj/scripts/cypher/cypher_down.sh --workspace job1 # 4. burn (mandatory) ~/Projects/repo/rtj/scripts/cypher/cypher_down.sh --workspace job2 ~/Projects/repo/rtj/scripts/cypher/cypher_down.sh --workspace job3 @@ -145,10 +145,10 @@ After this, `lnk_persist_init` future calls (during dispatch) are silent no-ops. 
```bash cd ~/Projects/repo/link/data-raw -./archive_provincial_runs.sh -ssh m1 'cd ~/Projects/repo/link/data-raw && ./archive_provincial_runs.sh' +./runs_archive.sh +ssh m1 'cd ~/Projects/repo/link/data-raw && ./runs_archive.sh' for IP in $JOB1_IP $JOB2_IP $JOB3_IP; do - ssh "cypher@$IP" 'cd ~/Projects/repo/link/data-raw && ./archive_provincial_runs.sh' & + ssh "cypher@$IP" 'cd ~/Projects/repo/link/data-raw && ./runs_archive.sh' & done wait ``` @@ -166,8 +166,8 @@ cd ~/Projects/repo/link/data-raw **Exits non-zero** if any host produced an error stub. Inspect: ``` -data-raw/logs/_trifecta_provincial_*.txt # orchestrator + per-host -data-raw/logs/_trifecta_provincial_cypher_*_R.txt # cypher R output (auto-pulled) +data-raw/logs/_wsgs_dispatch_*.txt # orchestrator + per-host +data-raw/logs/_wsgs_dispatch_cypher_*_R.txt # cypher R output (auto-pulled) ``` Common smoke failures + fixes: @@ -186,7 +186,7 @@ compute per failure mode. ```bash cd ~/Projects/repo/link/data-raw -./trifecta_provincial.sh \ +./wsgs_dispatch.sh \ --with-mapping-code \ --cy-workspaces=job1,job2,job3 \ > /tmp/full_run.log 2>&1 & @@ -227,8 +227,8 @@ taxonomy entries + re-annotate without rerunning the pipeline. ```bash cd ~/Projects/repo/link/data-raw Rscript -e ' -source("consolidate_schema.R") -result <- consolidate_schema( +source("schema_consolidate.R") +result <- schema_consolidate( schema = "fresh", sources = list( list(host = "m1", via = "docker", bucket = strsplit("WSG1,WSG2,...", ",")[[1]]), @@ -240,7 +240,7 @@ result <- consolidate_schema( ``` Bucket strings come from the orchestrator log (per-host bucket lines under -`[trifecta-provincial] dispatch start`). `consolidate_schema` (Phase 6) does +`[trifecta-provincial] dispatch start`). `schema_consolidate` (Phase 6) does bucket-aware DELETE + pg_restore + pre/post row-count assertion. ## 6. 
Burn cyphers — MANDATORY @@ -298,8 +298,8 @@ Required sections: |---|---|---| | `snapshot_bcfp.sh` | Load PSCIS + CABD + bchamp + bcfishobs into local fwapg | ~5-15 min | | `cypher_up.sh --workspace ` | Spin DO droplet from `cypher--warm` snapshot | ~3 min | -| `archive_provincial_runs.sh` | Move prior-run RDS + CSVs to archive/ | <1s | +| `runs_archive.sh` | Move prior-run RDS + CSVs to archive/ | <1s | | `trifecta_smoke.sh` | 1-WSG-per-host smoke; fails loud on any error stub | ~3 min | -| `trifecta_provincial.sh` | N-cypher full dispatch with inline LPT + annotation | ~80 min (5-host) | -| `consolidate_schema.R` | pg_dump from m1+cyphers → pg_restore on M4 | ~5 min | +| `wsgs_dispatch.sh` | N-cypher full dispatch with inline LPT + annotation | ~80 min (5-host) | +| `schema_consolidate.R` | pg_dump from m1+cyphers → pg_restore on M4 | ~5 min | | `cypher_down.sh --workspace ` | Destroy DO droplet (idempotent) | ~30s | From 025d13fc8a9d4038aea5256cc00bf31fa5fc251d Mon Sep 17 00:00:00 2001 From: almac2022 Date: Thu, 14 May 2026 18:01:40 -0700 Subject: [PATCH 7/9] Phase 5: smoke pass + sweep [trifecta-provincial] log prefix 3-WSG smoke via the renamed wsgs_run_pipeline.sh ran end-to-end: - Exit 0, 3m 13s wall - Step 0 pre-clean dropped fresh_default on M4+M1 - Step 7 dispatch via wsgs_dispatch.sh -> wsgs_run_host.R - Step 9 consolidate succeeded (target schema fresh_default) - 18/3 OK (pulled 18 RDS including legacy artifacts), 0 errors Pre-existing LPT-fallback edge case surfaced on the first 1-WSG attempt: with no timing CSVs for the requested config, the weighted split assigns all WSGs to M4 (highest weight host) -> M1 empty bucket -> empty-bucket guard fires. Not a rename bug; same behavior pre-rename. Worked around by using --config=default which has timing CSVs from Phase 3 runs. Cosmetic sweep: 10 occurrences of `[trifecta-provincial]` log prefix inside wsgs_dispatch.sh -> `[wsgs-dispatch]`. 
The Phase 4 sed pass only matched filename substrings, not hyphenated prose. devtools::test() still passes 1172/0. Co-Authored-By: Claude Opus 4.7 --- data-raw/wsgs_dispatch.sh | 22 +++++++++++----------- planning/active/progress.md | 3 ++- planning/active/task_plan.md | 16 +++++++++------- 3 files changed, 22 insertions(+), 19 deletions(-) diff --git a/data-raw/wsgs_dispatch.sh b/data-raw/wsgs_dispatch.sh index f3b06a79..70029f99 100755 --- a/data-raw/wsgs_dispatch.sh +++ b/data-raw/wsgs_dispatch.sh @@ -496,7 +496,7 @@ done TOTAL=$((M4_COUNT + M1_COUNT + CY_TOTAL)) echo "============================================" -echo "[trifecta-provincial] dispatch start: $(date '+%H:%M:%S')" +echo "[wsgs-dispatch] dispatch start: $(date '+%H:%M:%S')" echo " total WSGs: $TOTAL (m4=$M4_COUNT m1=$M1_COUNT cypher_total=$CY_TOTAL)" echo " config: $CONFIG with_mapping_code: ${WITH_MAPPING_CODE:-no}" echo " m4 bucket: $M4_WSGS" @@ -621,7 +621,7 @@ END=$(date +%s) ELAPSED=$((END - START)) echo "============================================" -printf '[trifecta-provincial] elapsed: %dh%02dm%02ds\n' \ +printf '[wsgs-dispatch] elapsed: %dh%02dm%02ds\n' \ $((ELAPSED/3600)) $(((ELAPSED%3600)/60)) $((ELAPSED%60)) printf ' m4 exit=%d log=%s\n' "$M4_EXIT" "$M4_LOG" printf ' m1 exit=%d log=%s\n' "$M1_EXIT" "$M1_LOG" @@ -660,7 +660,7 @@ done # Each cypher: via TF_WORKSPACE-resolved droplet IP # --------------------------------------------------------------------------- echo -echo "[trifecta-provincial] pulling m1 RDS files" +echo "[wsgs-dispatch] pulling m1 RDS files" scp -q "m1:~/Projects/repo/link/data-raw/logs/$RDS_DIR_NAME/*.rds" \ "$REPO_ROOT/data-raw/logs/$RDS_DIR_NAME/" 2>&1 | tail -3 || true @@ -668,10 +668,10 @@ for ((i=0; i/dev/null || echo "") if [ -z "$CY_IP" ]; then - echo "[trifecta-provincial] WARN: workspace '$WS' has no droplet_ip — skipping pull" + echo "[wsgs-dispatch] WARN: workspace '$WS' has no droplet_ip — skipping pull" continue fi - echo "[trifecta-provincial] pulling 
cypher[$WS] RDS files (cypher@$CY_IP)" + echo "[wsgs-dispatch] pulling cypher[$WS] RDS files (cypher@$CY_IP)" scp -q "cypher@$CY_IP:/home/cypher/Projects/repo/link/data-raw/logs/$RDS_DIR_NAME/*.rds" \ "$REPO_ROOT/data-raw/logs/$RDS_DIR_NAME/" 2>&1 | tail -3 || true done @@ -702,13 +702,13 @@ cat(n_ok, n_err) N_OK=$(echo "$RDS_COUNTS" | awk '{print $1+0}') N_ERR=$(echo "$RDS_COUNTS" | awk '{print $2+0}') N_OK=${N_OK:-0}; N_ERR=${N_ERR:-0} - echo "[trifecta-provincial] local RDS: $TOTAL_RDS / $TOTAL pulled — $N_OK OK, $N_ERR errors" + echo "[wsgs-dispatch] local RDS: $TOTAL_RDS / $TOTAL pulled — $N_OK OK, $N_ERR errors" if [ "$N_ERR" -gt 0 ]; then - echo "[trifecta-provincial] WARN: $N_ERR error-stub RDS found. Inspect cypher-side R logs:" + echo "[wsgs-dispatch] WARN: $N_ERR error-stub RDS found. Inspect cypher-side R logs:" ls "$LOG_DIR/${TS}_wsgs_dispatch_cypher_"*_R.txt 2>/dev/null | sed 's/^/ /' || true fi else - echo "[trifecta-provincial] local RDS file count: 0 / $TOTAL (no files pulled — all hosts failed?)" + echo "[wsgs-dispatch] local RDS file count: 0 / $TOTAL (no files pulled — all hosts failed?)" fi # --------------------------------------------------------------------------- @@ -721,7 +721,7 @@ TAXONOMY="$REPO_ROOT/research/bcfp_divergence_taxonomy.yml" if [ -f "$TAXONOMY" ]; then echo - echo "[trifecta-provincial] aggregating + annotating $TOTAL_RDS RDS files" + echo "[wsgs-dispatch] aggregating + annotating $TOTAL_RDS RDS files" Rscript - < 0) { cat(" acceptance bar met.\n") } RSCRIPT_EOF - echo "[trifecta-provincial] annotated CSV: $ANNOTATED_CSV" + echo "[wsgs-dispatch] annotated CSV: $ANNOTATED_CSV" else - echo "[trifecta-provincial] WARN: taxonomy YAML not at $TAXONOMY — skipping annotation" + echo "[wsgs-dispatch] WARN: taxonomy YAML not at $TAXONOMY — skipping annotation" fi # --------------------------------------------------------------------------- diff --git a/planning/active/progress.md b/planning/active/progress.md index 
9c50d381..adeafb35 100644 --- a/planning/active/progress.md +++ b/planning/active/progress.md @@ -16,4 +16,5 @@ - **Attempt 2** (with pre-clean): 16/16 WSGs in `fresh_default.streams` on M4, 20m wall (pre-clean + cold-cache pipeline + consolidate). Exit 0. No mid-run prompts. Annotated CSV: 343 rows; 66 UNEXPLAINED at ≥2% surfaced as WARNING (methodology divergence for `default` bundle, expected for northern-WSG test set). - Pre-existing limitation surfaced (consolidate stale-state collision) → resolved as part of #172 scope because the autonomy story requires it; the umbrella now genuinely runs end-to-end without operator handholding even when the cluster has leftover state. - **Phase 4 done.** 8 scripts renamed via `git mv` (preserves `git log --follow`). Bulk `sed -i ''` applied old→new substitutions across the live tree, then NEWS.md was reverted to keep historical entries sealed. Tree-wide grep for the 8 old names returns empty (after excluding NEWS.md, planning/archive/, planning/active/, data-raw/logs/). `/code-check` round 1 clean — 7 concerns verified including idempotency (no new name contains any old name as substring). -- Next: Phase 5 — post-rename smoke (1-WSG via renamed umbrella, verifies all the reference rewrites resolve at runtime). +- **Phase 5 done.** 3-WSG smoke via renamed `wsgs_run_pipeline.sh` ran end-to-end in 3m 13s, exit 0, consolidate clean. Pre-existing LPT-fallback edge case surfaced (1-WSG dispatch puts everything on M4 when no timing CSVs exist) — not a rename bug, documented for follow-up. Caught 10 cosmetic `[trifecta-provincial]` log prefixes inside `wsgs_dispatch.sh` (sed missed hyphenated prose); fixed to `[wsgs-dispatch]`. `devtools::test()` 1172 PASS / 0 FAIL. +- Next: Phase 6 — rtj cross-repo update to point cypher_run.sh at `wsgs_run_host.R`. 
diff --git a/planning/active/task_plan.md b/planning/active/task_plan.md index 9d4cfa9d..49362376 100644 --- a/planning/active/task_plan.md +++ b/planning/active/task_plan.md @@ -65,13 +65,15 @@ Patch the original filename first so the smoke (Phase 3) validates on the known- ## Phase 5 — Smoke after rename -- [ ] 1-WSG smoke via the renamed umbrella: - ```bash - bash data-raw/wsgs_run_pipeline.sh --wsgs=DEAD --config=bcfishpass --no-cyphers - ``` -- [ ] Acceptance: exit code 0, DEAD lands in M4 `fresh.streams`. -- [ ] `devtools::test()` passes. -- [ ] `devtools::check()` — same warning baseline as v0.37.0 (no new warnings). +- [x] First attempt with 1-WSG `--wsgs=DEAD --config=bcfishpass --no-cyphers` surfaced a pre-existing LPT-fallback edge case: with no timing CSVs available for bcfishpass, the weighted split puts all WSGs on M4 → M1 empty bucket → empty-bucket guard fires. Not a rename bug — same behavior on main pre-rename. Not in scope to fix here; documented for follow-up. +- [x] Re-launched with `--wsgs=DEAD,ADMS,CRKD --config=default --schema=fresh_default --no-cyphers --with-mapping-code` (timing CSVs for `default` bundle exist from Phase 3 runs → LPT does a proper greedy split): + - Exit 0, 3m 13s wall, 18/3 OK (pulled 18 RDS including legacy provincial_default/ artifacts), 0 errors. + - `wsgs_run_pipeline.sh complete in 193s` — umbrella ran end-to-end under the new name. + - Step 0 pre-clean dropped `fresh_default` on both hosts; Step 7 dispatch via `wsgs_dispatch.sh` → `wsgs_run_host.R`; Step 9 consolidate succeeded. + - Annotated CSV: `data-raw/logs/provincial_default/202605141757_annotated.csv` (427 rows, 84 UNEXPLAINED at ≥2%, methodology divergence vs bcfp, expected). +- [x] Cosmetic fixup: 10 occurrences of `[trifecta-provincial]` log prefix inside `wsgs_dispatch.sh` swept to `[wsgs-dispatch]` (sed only matched filenames originally, not hyphenated prose). Fold into the rename commit history. +- [x] `devtools::test()` — 1172 PASS / 0 FAIL. 
+- [x] Reverted M1 to main + dropped the test branch (for now); will re-pull when PR lands. — see Phase 6 cross-repo note below. ## Phase 6 — Cross-repo rtj update From 1366b95106a0772144bf69e0251297cb5405cf1e Mon Sep 17 00:00:00 2001 From: almac2022 Date: Thu, 14 May 2026 18:03:51 -0700 Subject: [PATCH 8/9] Release v0.38.0 --- DESCRIPTION | 2 +- NEWS.md | 25 +++++++++++++++++++++++++ planning/active/progress.md | 4 +++- planning/active/task_plan.md | 17 +++++++++-------- 4 files changed, 38 insertions(+), 10 deletions(-) diff --git a/DESCRIPTION b/DESCRIPTION index eb9209ca..dfb7cd81 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: link Title: Stream Network Habitat Interpretation (Experimental) -Version: 0.37.0 +Version: 0.38.0 Date: 2026-05-14 Authors@R: c( person("Allan", "Irvine", , "airvine@newgraphenvironment.com", diff --git a/NEWS.md b/NEWS.md index c80135d0..23e957fc 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,3 +1,28 @@ +# link 0.38.0 + +Provincial-run autonomy CLI + 8 operational-script renames to noun_verb convention. Closes [#172](https://github.com/NewGraphEnvironment/link/issues/172). Builds on v0.37.0's #168 decouple — with PG-state resume in place, the autonomy surface stays thin and the renames stay mechanical. + +- **Single-command autonomous run.** `wsgs_run_pipeline.sh` (was `province_run.sh`) accepts `--wsgs=A,B,C`, `--config=`, `--schema=`, `--no-cyphers`, `--force`, forwards to `wsgs_dispatch.sh` (was `trifecta_provincial.sh`) which intersects the WSG subset in its LPT split. M4+M1-only baseline validated end-to-end: 16-WSG default-bundle dispatch lands 16/16 in `fresh_default.streams` on M4, ~20 min wall, no operator prompts. +- **Step 0 pre-clean.** When `--schema=` is set, umbrella fires `state_clean.sh --schemas=` on every host before Step 1. Drops only the target schema (skips the canonical-fresh heuristic + snapshot reload). 
Eliminates a class of consolidate failures where stale leftover WSGs on a source host collided with destination data during pg_restore. +- **Scoped `state_clean.sh` (was `province_clean.sh`).** New `--schemas=A,B,C` mode drops only the listed schemas. Empty `--schemas=` rejected loud to prevent dynamic-arg silent fall-through to the destructive default mode. +- **Phantom-cy + error-surface fixes in `wsgs_dispatch.sh`.** R's `paste0("cy", integer(0))` returns `"cy"` length-1 (constant recycling) — would put a non-existent cypher in the host plan under `--no-cyphers`. Three-branched `cy_host_keys`. Empty `CY_WORKSPACES` init via explicit `CY_WS_ARR=()` (was `read -r -a` yielding single-empty-element). `SPLIT_OUT=$(Rscript ...)` wrapped with explicit `||` block so R-side `stop()` messages reach the operator (e.g. `--wsgs=BOGUS` surfaces the R error verbatim instead of silent abort). +- **8 rename mapping (`git mv` preserves `git log --follow`).** Names now describe scope honestly — these scripts work for any list of WSGs / any host count / any reference: + +| Old | New | +|---|---| +| `data-raw/province_run.sh` | `data-raw/wsgs_run_pipeline.sh` | +| `data-raw/province_clean.sh` | `data-raw/state_clean.sh` | +| `data-raw/province_progress.sh` | `data-raw/progress_check.sh` | +| `data-raw/trifecta_provincial.sh` | `data-raw/wsgs_dispatch.sh` | +| `data-raw/run_provincial_parity.R` | `data-raw/wsgs_run_host.R` | +| `data-raw/consolidate_schema.R` | `data-raw/schema_consolidate.R` | +| `data-raw/archive_provincial_runs.sh` | `data-raw/runs_archive.sh` | +| `data-raw/balance_provincial_buckets.R` | `data-raw/buckets_balance.R` | + +The `wsg_*` (singular, per-WSG functions from #168) vs `wsgs_*` (plural, collection-level orchestrators) distinction is now load-bearing in the naming. `compare_bcfishpass_wsg.R → wsg_compare.R` was renamed in #168. 
+ +Filed-but-not-closed follow-ups: cypher integration testing (issue #172 Phase 2 + 3 acceptance — defer until M4+M1 baseline lands repeatably); LPT-fallback empty-bucket edge case when N_WSGs ≤ N_hosts without timing CSVs (pre-existing, not a #172 regression). + # link 0.37.0 Decouple bcfp comparison from the modelling pipeline. Closes [#168](https://github.com/NewGraphEnvironment/link/issues/168). The link package's deliverable — the per-WSG model in `.streams` + per-species habitat + barriers — now runs and is observable independently of any comparison framework. Comparison vs bcfishpass (or any future reference) is a diagnostic overlay that reads the persisted state and never gates whether the model itself ran. diff --git a/planning/active/progress.md b/planning/active/progress.md index adeafb35..5825aaa0 100644 --- a/planning/active/progress.md +++ b/planning/active/progress.md @@ -17,4 +17,6 @@ - Pre-existing limitation surfaced (consolidate stale-state collision) → resolved as part of #172 scope because the autonomy story requires it; the umbrella now genuinely runs end-to-end without operator handholding even when the cluster has leftover state. - **Phase 4 done.** 8 scripts renamed via `git mv` (preserves `git log --follow`). Bulk `sed -i ''` applied old→new substitutions across the live tree, then NEWS.md was reverted to keep historical entries sealed. Tree-wide grep for the 8 old names returns empty (after excluding NEWS.md, planning/archive/, planning/active/, data-raw/logs/). `/code-check` round 1 clean — 7 concerns verified including idempotency (no new name contains any old name as substring). - **Phase 5 done.** 3-WSG smoke via renamed `wsgs_run_pipeline.sh` ran end-to-end in 3m 13s, exit 0, consolidate clean. Pre-existing LPT-fallback edge case surfaced (1-WSG dispatch puts everything on M4 when no timing CSVs exist) — not a rename bug, documented for follow-up. 
Caught 10 cosmetic `[trifecta-provincial]` log prefixes inside `wsgs_dispatch.sh` (sed missed hyphenated prose); fixed to `[wsgs-dispatch]`. `devtools::test()` 1172 PASS / 0 FAIL. -- Next: Phase 6 — rtj cross-repo update to point cypher_run.sh at `wsgs_run_host.R`. +- **Phase 6 deferred to post-merge.** rtj `cypher_run.sh:8` only has a docstring reference to `run_provincial_parity.R` (no runtime hardcode — script takes the workload path as an argv). Update lands as part of `/gh-pr-merge` after this PR lands. +- **Phase 7 prep done.** `devtools::check()` 0 errors, same warning baseline as v0.37.0. DESCRIPTION 0.37.0 → 0.38.0. NEWS.md v0.38.0 entry written with full rename mapping + CLI surface + Step 0 pre-clean + phantom-cy fix + error-surface fix. +- Next: Release commit, `/planning-archive`, `/gh-pr-push`. diff --git a/planning/active/task_plan.md b/planning/active/task_plan.md index 49362376..1f961d65 100644 --- a/planning/active/task_plan.md +++ b/planning/active/task_plan.md @@ -75,19 +75,20 @@ Patch the original filename first so the smoke (Phase 3) validates on the known- - [x] `devtools::test()` — 1172 PASS / 0 FAIL. - [x] Reverted M1 to main + dropped the test branch (for now); will re-pull when PR lands. — see Phase 6 cross-repo note below. -## Phase 6 — Cross-repo rtj update +## Phase 6 — Cross-repo rtj update (DEFERRED to post-merge) -- [ ] In `~/Projects/repo/rtj`, update `scripts/cypher/cypher_run.sh` reference from `run_provincial_parity.R` → `wsgs_run_host.R`. +Lands after link's rename PR merges so `cypher_run.sh` never references a missing file on link/main. Will be done as part of `/gh-pr-merge` workflow. + +- [ ] In `~/Projects/repo/rtj`, update `scripts/cypher/cypher_run.sh` reference from `run_provincial_parity.R` → `wsgs_run_host.R` (only the docstring at `scripts/cypher/cypher_run.sh:8` references the old name — confirmed; the script itself takes the workload-script path as an arg so no runtime hardcode). 
- [ ] `bash -n scripts/cypher/cypher_run.sh` clean. -- [ ] Commit "scripts/cypher/cypher_run.sh: update for link wsgs_run_host.R rename" on rtj/main. -- [ ] Push to origin/main. -- [ ] Order: rtj commit lands **after** link's rename PR merges so cypher_run.sh never references a missing file on link/main. +- [ ] Commit + push on rtj/main. ## Phase 7 — Release v0.38.0 -- [ ] Update `DESCRIPTION` Version 0.37.0 → 0.38.0. -- [ ] Update `NEWS.md` with v0.38.0 entry covering CLI surface + 8 renames. -- [ ] Update `CLAUDE.md` if any rename touches its references. +- [x] `devtools::check()` — 0 errors, same warning baseline as v0.37.0 (3 warnings + 2 notes all pre-existing on main; no regression). +- [x] Update `DESCRIPTION` Version 0.37.0 → 0.38.0. +- [x] Update `NEWS.md` with v0.38.0 entry covering CLI surface (5 new flags), Step 0 pre-clean, scoped `state_clean.sh --schemas=`, phantom-cy / error-surface fixes, and the 8-rename mapping table. +- [x] `CLAUDE.md` already updated via Phase 4 sed pass (one line in the Status v0.29.0 entry mentioned `consolidate_schema`). - [ ] Commit "Release v0.38.0". - [ ] `/planning-archive` with slug `provincial-run-autonomy-renames`. - [ ] `/gh-pr-push` opens PR with SRED tag in body. 
From eae62bed4497bc444b225cb397da6fe838de48c4 Mon Sep 17 00:00:00 2001 From: almac2022 Date: Thu, 14 May 2026 18:04:27 -0700 Subject: [PATCH 9/9] Archive planning files for issue #172 --- .../README.md | 33 +++++++++++++++++++ .../findings.md | 0 .../progress.md | 0 .../task_plan.md | 0 4 files changed, 33 insertions(+) create mode 100644 planning/archive/2026-05-issue-172-provincial-run-autonomy-renames/README.md rename planning/{active => archive/2026-05-issue-172-provincial-run-autonomy-renames}/findings.md (100%) rename planning/{active => archive/2026-05-issue-172-provincial-run-autonomy-renames}/progress.md (100%) rename planning/{active => archive/2026-05-issue-172-provincial-run-autonomy-renames}/task_plan.md (100%) diff --git a/planning/archive/2026-05-issue-172-provincial-run-autonomy-renames/README.md b/planning/archive/2026-05-issue-172-provincial-run-autonomy-renames/README.md new file mode 100644 index 00000000..3715b3a7 --- /dev/null +++ b/planning/archive/2026-05-issue-172-provincial-run-autonomy-renames/README.md @@ -0,0 +1,33 @@ +# Provincial run autonomy + script renames (#172) — 2026-05-14 + +## Outcome + +Shipped v0.38.0 on top of v0.37.0 (#168 decouple). Two-axis change: + +1. **Autonomy CLI surface.** `wsgs_run_pipeline.sh` (was `province_run.sh`) accepts `--wsgs=A,B,C`, `--config=`, `--schema=`, `--no-cyphers`, `--force`, forwards to `wsgs_dispatch.sh` (was `trifecta_provincial.sh`). New Step 0 pre-clean fires `state_clean.sh --schemas=` when `--schema=` is set, eliminating the consolidate-stale-WSG class of failures. Phase 3 acceptance: 16/16 WSGs in `fresh_default.streams` on M4 after a single `bash data-raw/wsgs_run_pipeline.sh ...` invocation, ~20 min wall, exit 0, no operator prompts. +2. **8 mechanical renames to noun_verb.** `province_*` / `trifecta_*` / `consolidate_schema` / `archive_provincial_runs` / `balance_provincial_buckets` / `run_provincial_parity` → honest names that describe scope. 
Done via `git mv` so `git log --follow` preserves history. `compare_bcfishpass_wsg.R → wsg_compare.R` was already renamed in #168. + +Resulting naming family: +- `wsg_*` (singular, per-WSG functions from #168): `wsg_pipeline_run.R`, `wsg_compare.R`. +- `wsgs_*` (plural, collection-level orchestrators): `wsgs_run_host.R`, `wsgs_dispatch.sh`, `wsgs_run_pipeline.sh`. +- Mixed nouns for other wrappers: `state_clean.sh`, `progress_check.sh`, `runs_archive.sh`, `buckets_balance.R`, `schema_consolidate.R`. + +Side-fixes that landed because they were load-bearing for autonomy: +- Phantom-cy bug (R's `paste0("cy", integer(0))` returns `"cy"` length-1 via constant recycling). +- Empty `CY_WORKSPACES` init now explicit `CY_WS_ARR=()`. +- `SPLIT_OUT=$(Rscript ...)` wrapped with `||` block so R `stop()` errors reach the operator instead of silent abort. + +`/code-check` caught 3 real bugs over the phases: +- Phase 1: silent R-error abort (no operator-visible message) +- Phase 2: empty TARGET_SCHEMA fallback (masked misconfigured `--config=`) +- Phase 1.5: empty `--schemas=` silent fall-through to destructive default + +All fixed inline. + +## Filed-but-not-closed follow-ups + +- **Cypher integration tests** (issue #172 Phase 2 + 3 acceptance) — defer until M4+M1 baseline lands repeatably. Will file as new issue. +- **LPT-fallback empty-bucket edge case** when N_WSGs ≤ N_hosts without prior timing CSVs — pre-existing, not a #172 regression. Surfaces under `--wsgs=DEAD --config=bcfishpass --no-cyphers` (config without timing CSVs). +- **rtj `scripts/cypher/cypher_run.sh` ref** — only a docstring reference at line 8; updated post-merge as part of `/gh-pr-merge` workflow. + +Closed by: PR (TBD, branch `172-provincial-run-autonomy-renames`, tag `v0.38.0`). 
diff --git a/planning/active/findings.md b/planning/archive/2026-05-issue-172-provincial-run-autonomy-renames/findings.md similarity index 100% rename from planning/active/findings.md rename to planning/archive/2026-05-issue-172-provincial-run-autonomy-renames/findings.md diff --git a/planning/active/progress.md b/planning/archive/2026-05-issue-172-provincial-run-autonomy-renames/progress.md similarity index 100% rename from planning/active/progress.md rename to planning/archive/2026-05-issue-172-provincial-run-autonomy-renames/progress.md diff --git a/planning/active/task_plan.md b/planning/archive/2026-05-issue-172-provincial-run-autonomy-renames/task_plan.md similarity index 100% rename from planning/active/task_plan.md rename to planning/archive/2026-05-issue-172-provincial-run-autonomy-renames/task_plan.md
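The error-surface fix described in the v0.38.0 notes (wrapping `SPLIT_OUT=$(Rscript ...)` in an explicit `||` block so R-side `stop()` messages reach the operator) can be illustrated with a minimal sketch. This is not the code from `wsgs_dispatch.sh`: `split_plan` is a hypothetical stand-in for the Rscript call, and the error wording, the `m4=` output format, and the `run_split` wrapper are assumptions made for illustration only.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the Rscript bucket-split call (assumption, not
# the real wsgs_dispatch.sh logic). Fails with a message on stderr for an
# unknown WSG code, mimicking an R-side stop().
split_plan() {
  local wsgs="$1"
  if [ "$wsgs" = "BOGUS" ]; then
    echo "Error: unknown WSG code(s): $wsgs" >&2
    return 1
  fi
  echo "m4=$wsgs"
}

# Capture the split output. On failure, replay the captured stderr with a
# tagged prefix line and return non-zero, instead of letting `set -e`
# end the script without any context.
run_split() {
  local wsgs="$1" out err_file
  err_file=$(mktemp)
  # `local` is declared separately from the assignment so the exit status
  # of the command substitution is not masked by `local` itself.
  out=$(split_plan "$wsgs" 2>"$err_file") || {
    echo "[wsgs-dispatch] ERROR: bucket split failed for --wsgs=$wsgs" >&2
    sed 's/^/  /' "$err_file" >&2
    rm -f "$err_file"
    return 1
  }
  rm -f "$err_file"
  echo "$out"
}
```

Under these assumptions, `run_split CARP,CRKD` prints the plan on stdout, while `run_split BOGUS` returns non-zero and writes both the tagged error line and the underlying message to stderr. Returning from the function rather than calling `exit` leaves the caller free to decide whether a failed split is fatal.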