NewGraphEnvironment · NewGraphEnvironment · May 15, 2026 · May 14, 2026 · May 14, 2026 · May 14, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -15,7 +15,7 @@ Infrastructure stack stabilized; methodology research is the active focus. Recen
 
 - **v0.27.0** (#114, #45) — `cfg$pipeline$gradient_classes` config knob + per-species filter derivation in `prep_minimal`. Bit-identical bcfp parity by default; lets bundles override the break vector.
 - **v0.28.0** (#119, #45-followup) — Orphan-class break source: classes below all species' access thresholds enter `gradient_barriers_minimal` as a shared `barriers_orphan` table (segmentation only, no access semantics). New `default_extrabreaks` bundle. Province-wide methodology delta (231 vs 225 WSGs): SK spawn +6.7%, BT/RB/ST/WCT/GR spawn +1–2%, CH/ST rear -11%, BT/CO/GR rear -3 to -7%. "Ceiling sub-segment" mechanism — flat parts of mixed reaches now pass spawn; steep pockets previously folded into rear via averaging exceed rear ceiling as standalone segments.
-- **v0.29.0** (#120, #118) — DB hygiene: `cleanup_working = TRUE` in `compare_bcfishpass_wsg` drops working schemas after rollup; `keep_source = FALSE` in `consolidate_schema` drops source schemas after pg_restore. Prevents the disk-full incident that crashed cypher 2026-05-04.
+- **v0.29.0** (#120, #118) — DB hygiene: `cleanup_working = TRUE` in `compare_bcfishpass_wsg` drops working schemas after rollup; `keep_source = FALSE` in `schema_consolidate` drops source schemas after pg_restore. Prevents the disk-full incident that crashed cypher 2026-05-04.
 
 Cypher recovered (volumes nuked); needs fwapg reload before next provincial run. Methodology research is active: `default_extrabreaks` proved the mechanism; next variants include spawn-only / rear-only break vectors (each adds ~4 new classes; per-WSG cost ~1.3–1.4× vs default) and the channel-class breaks research (#52, blocked on fresh-side helper).
 

diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: link
 Title: Stream Network Habitat Interpretation (Experimental)
-Version: 0.37.0
+Version: 0.38.0
 Date: 2026-05-14
 Authors@R: c(
     person("Allan", "Irvine", , "airvine@newgraphenvironment.com",

diff --git a/NEWS.md b/NEWS.md
@@ -1,3 +1,28 @@
+# link 0.38.0
+
+Provincial-run autonomy CLI + 8 operational-script renames to noun_verb convention. Closes [#172](https://github.com/NewGraphEnvironment/link/issues/172). Builds on v0.37.0's #168 decouple — with PG-state resume in place, the autonomy surface stays thin and the renames stay mechanical.
+
+- **Single-command autonomous run.** `wsgs_run_pipeline.sh` (was `province_run.sh`) accepts `--wsgs=A,B,C`, `--config=<name>`, `--schema=<name>`, `--no-cyphers`, `--force`, forwards to `wsgs_dispatch.sh` (was `trifecta_provincial.sh`) which intersects the WSG subset in its LPT split. M4+M1-only baseline validated end-to-end: 16-WSG default-bundle dispatch lands 16/16 in `fresh_default.streams` on M4, ~20 min wall, no operator prompts.
+- **Step 0 pre-clean.** When `--schema=` is set, umbrella fires `state_clean.sh --schemas=<schema>` on every host before Step 1. Drops only the target schema (skips the canonical-fresh heuristic + snapshot reload). Eliminates a class of consolidate failures where stale leftover WSGs on a source host collided with destination data during pg_restore.
+- **Scoped `state_clean.sh` (was `province_clean.sh`).** New `--schemas=A,B,C` mode drops only the listed schemas. Empty `--schemas=` rejected loud to prevent dynamic-arg silent fall-through to the destructive default mode.
+- **Phantom-cy + error-surface fixes in `wsgs_dispatch.sh`.** R's `paste0("cy", integer(0))` returns `"cy"` length-1 (constant recycling) — would put a non-existent cypher in the host plan under `--no-cyphers`. Three-branched `cy_host_keys`. Empty `CY_WORKSPACES` init via explicit `CY_WS_ARR=()` (was `read -r -a` yielding single-empty-element). `SPLIT_OUT=$(Rscript ...)` wrapped with explicit `||` block so R-side `stop()` messages reach the operator (e.g. `--wsgs=BOGUS` surfaces the R error verbatim instead of silent abort).
+- **8 rename mapping (`git mv` preserves `git log --follow`).** Names now describe scope honestly — these scripts work for any list of WSGs / any host count / any reference:
+
+| Old | New |
+|---|---|
+| `data-raw/province_run.sh` | `data-raw/wsgs_run_pipeline.sh` |
+| `data-raw/province_clean.sh` | `data-raw/state_clean.sh` |
+| `data-raw/province_progress.sh` | `data-raw/progress_check.sh` |
+| `data-raw/trifecta_provincial.sh` | `data-raw/wsgs_dispatch.sh` |
+| `data-raw/run_provincial_parity.R` | `data-raw/wsgs_run_host.R` |
+| `data-raw/consolidate_schema.R` | `data-raw/schema_consolidate.R` |
+| `data-raw/archive_provincial_runs.sh` | `data-raw/runs_archive.sh` |
+| `data-raw/balance_provincial_buckets.R` | `data-raw/buckets_balance.R` |
+
+The `wsg_*` (singular, per-WSG functions from #168) vs `wsgs_*` (plural, collection-level orchestrators) distinction is now load-bearing in the naming. `compare_bcfishpass_wsg.R → wsg_compare.R` was renamed in #168.
+
+Filed-but-not-closed follow-ups: cypher integration testing (issue #172 Phase 2 + 3 acceptance — defer until M4+M1 baseline lands repeatably); LPT-fallback empty-bucket edge case when N_WSGs ≤ N_hosts without timing CSVs (pre-existing, not a #172 regression).
+
 # link 0.37.0
 
 Decouple bcfp comparison from the modelling pipeline. Closes [#168](https://github.com/NewGraphEnvironment/link/issues/168). The link package's deliverable — the per-WSG model in `<persist_schema>.streams` + per-species habitat + barriers — now runs and is observable independently of any comparison framework. Comparison vs bcfishpass (or any future reference) is a diagnostic overlay that reads the persisted state and never gates whether the model itself ran.

diff --git a/R/utils.R b/R/utils.R
@@ -193,7 +193,7 @@
 #' Probes `<persist_schema>.streams` for `watershed_group_code = aoi`.
 #' Returns `TRUE` when the table exists and has at least one row for
 #' the WSG; `FALSE` when the table is absent OR has no rows for the
-#' WSG. Used by the orchestrator loop (run_provincial_parity.R) as the
+#' WSG. Used by the orchestrator loop (wsgs_run_host.R) as the
 #' canonical resume check — PG state is authoritative, RDS files are
 #' diagnostic side-artifacts.
 #'

diff --git a/data-raw/README.md b/data-raw/README.md
@@ -76,10 +76,10 @@ The dispatch hierarchy: trifecta → run_provincial → compare_wsg.
 
 | Script | Calls | Purpose |
 |--------|-------|---------|
-| `trifecta_provincial.sh` | `run_provincial_parity.R` (×N hosts) | M4 + M1 + N-cypher orchestrator. Inline LPT bucket allocation (reads `_per_wsg_times.csv` from prior runs, computes balanced split using `--host-speeds=`), pre-flight version check across all hosts, parallel dispatch, RDS pull-back, post-pull `lnk_parity_annotate` against the divergence taxonomy. See "Provincial dispatch" section below for full flag reference + gotchas. |
+| `wsgs_dispatch.sh` | `wsgs_run_host.R` (×N hosts) | M4 + M1 + N-cypher orchestrator. Inline LPT bucket allocation (reads `_per_wsg_times.csv` from prior runs, computes balanced split using `--host-speeds=`), pre-flight version check across all hosts, parallel dispatch, RDS pull-back, post-pull `lnk_parity_annotate` against the divergence taxonomy. See "Provincial dispatch" section below for full flag reference + gotchas. |
 | `trifecta_15wsg.sh` | same | 15-WSG smoke variant (legacy 3-host, hardcoded WSG list). |
-| `trifecta_smoke.sh` | `trifecta_provincial.sh` | N-host smoke shim: one small WSG per host, ~3 min wall. See `Provincial dispatch` section. |
-| `run_provincial_parity.R` | `compare_bcfishpass_wsg.R` per WSG | Single-host provincial dispatcher. Loops every WSG in `wsg_species_presence`, saves per-WSG RDS, emits per-WSG times CSV. After the loop, optionally annotates the host's bucket against `research/bcfp_divergence_taxonomy.yml` (writes `<TS>_<host>_annotated.csv`). Accepts `--wsgs=`, `--config=`, `--schema=`, `--rds-dir=`, `--with-mapping-code`. |
+| `trifecta_smoke.sh` | `wsgs_dispatch.sh` | N-host smoke shim: one small WSG per host, ~3 min wall. See `Provincial dispatch` section. |
+| `wsgs_run_host.R` | `compare_bcfishpass_wsg.R` per WSG | Single-host provincial dispatcher. Loops every WSG in `wsg_species_presence`, saves per-WSG RDS, emits per-WSG times CSV. After the loop, optionally annotates the host's bucket against `research/bcfp_divergence_taxonomy.yml` (writes `<TS>_<host>_annotated.csv`). Accepts `--wsgs=`, `--config=`, `--schema=`, `--rds-dir=`, `--with-mapping-code`. |
 | `compare_bcfishpass_wsg.R` | `lnk_pipeline_*` family | Single-WSG end-to-end runner. Sources both connections (local fwapg + bcfp tunnel), runs the 6-phase pipeline, persists, emits comparison rollup tibble (link vs bcfp). The atomic unit of work in every multi-WSG run above. |
 
 ## Pipeline support
@@ -88,14 +88,14 @@ Run-adjacent helpers (planning, consolidation across hosts).
 
 | Script | Purpose |
 |--------|---------|
-| `balance_provincial_buckets.R` | Standalone LPT planner for the 3-host case. Reads per-host wall times from prior runs and prints buckets ready to paste into `trifecta_provincial.sh --m4-bucket=…`. **Superseded for the N-host orchestrator** — `trifecta_provincial.sh` now computes the LPT plan inline at dispatch time using the same algorithm. Kept here for one-off planning + cross-checks. Dedups `(wsg, host)` and across hosts before LPT so multi-run CSV accumulation doesn't double-assign WSGs. |
-| `consolidate_schema.R` | pg_dump from M1 + cypher → scp to M4 → pg_restore --data-only. Bucket-aware destination cleanup (DELETEs each source host's WSG bucket from destination tables before restore — avoids duplicate-key violations on re-consolidation). `ok = TRUE` requires pg_restore rc=0 AND post-restore row count > 0; rc=0 with empty schema flags as failure. |
-| `archive_provincial_runs.sh` | Moves the current top-level `_per_wsg_times.csv` + `*.rds` + `*_annotated.csv` artifacts in `provincial_<bundle>/` to `archive/<TS>/`. Operator cadence: run between provincial runs when you want the LPT planner to use the most recent run only. Skip to median-over multiple recent runs. |
-| `trifecta_smoke.sh` | Thin shim over `trifecta_provincial.sh` — one small WSG per host (m4→DEAD, m1→ELKR, cyN→ADMS/BABL/BULL). ~3 min wall. Exercises every orchestrator code path (preflight, dispatch, tunnel, RDS pull-back, annotation) before committing to a 200-WSG run. All flags pass through (e.g. `--cy-workspaces=`, `--with-mapping-code`). |
+| `buckets_balance.R` | Standalone LPT planner for the 3-host case. Reads per-host wall times from prior runs and prints buckets ready to paste into `wsgs_dispatch.sh --m4-bucket=…`. **Superseded for the N-host orchestrator** — `wsgs_dispatch.sh` now computes the LPT plan inline at dispatch time using the same algorithm. Kept here for one-off planning + cross-checks. Dedups `(wsg, host)` and across hosts before LPT so multi-run CSV accumulation doesn't double-assign WSGs. |
+| `schema_consolidate.R` | pg_dump from M1 + cypher → scp to M4 → pg_restore --data-only. Bucket-aware destination cleanup (DELETEs each source host's WSG bucket from destination tables before restore — avoids duplicate-key violations on re-consolidation). `ok = TRUE` requires pg_restore rc=0 AND post-restore row count > 0; rc=0 with empty schema flags as failure. |
+| `runs_archive.sh` | Moves the current top-level `_per_wsg_times.csv` + `*.rds` + `*_annotated.csv` artifacts in `provincial_<bundle>/` to `archive/<TS>/`. Operator cadence: run between provincial runs when you want the LPT planner to use the most recent run only. Skip to median-over multiple recent runs. |
+| `trifecta_smoke.sh` | Thin shim over `wsgs_dispatch.sh` — one small WSG per host (m4→DEAD, m1→ELKR, cyN→ADMS/BABL/BULL). ~3 min wall. Exercises every orchestrator code path (preflight, dispatch, tunnel, RDS pull-back, annotation) before committing to a 200-WSG run. All flags pass through (e.g. `--cy-workspaces=`, `--with-mapping-code`). |
 
-## Provincial dispatch (`trifecta_provincial.sh`)
+## Provincial dispatch (`wsgs_dispatch.sh`)
 
-The flagship orchestrator. Dispatches `run_provincial_parity.R` across
+The flagship orchestrator. Dispatches `wsgs_run_host.R` across
 M4 + M1 + N cyphers in parallel, pulls RDS files back, and emits a
 province-wide annotated CSV.
 
@@ -106,22 +106,22 @@ cd ~/Projects/repo/link/data-raw
 
 # Optional: archive prior run's CSVs first if you want LPT to plan
 # against this run only (not median-of-recent-runs):
-./archive_provincial_runs.sh
+./runs_archive.sh
 
 # Smoke-test first (~3 min, one small WSG per host) — catches preflight,
 # tunnel, dispatch, and annotation surprises before the full run:
 ./trifecta_smoke.sh                                     # 3-host smoke
 ./trifecta_smoke.sh --cy-workspaces=job1,job2,job3      # 5-host smoke
 
 # Full run:
-./trifecta_provincial.sh                                # 3-host default
-./trifecta_provincial.sh --cy-workspaces=job1,job2,job3 # 5-host
+./wsgs_dispatch.sh                                # 3-host default
+./wsgs_dispatch.sh --cy-workspaces=job1,job2,job3 # 5-host
 
 # Add per-segment mapping_code lens (+50% cost):
-./trifecta_provincial.sh --cy-workspaces=job1,job2,job3 --with-mapping-code
+./wsgs_dispatch.sh --cy-workspaces=job1,job2,job3 --with-mapping-code
 
 # Custom host-speed factors (lower = faster):
-./trifecta_provincial.sh --host-speeds=m4=1.0,m1=0.83,cy=1.83
+./wsgs_dispatch.sh --host-speeds=m4=1.0,m1=0.83,cy=1.83
 ```
 
 **Recommended cadence:** archive → smoke → full run. The smoke catches
@@ -240,7 +240,7 @@ Research scratchpad and one-off verification scripts.
 
 | Script | Purpose |
 |--------|---------|
-| `_targets.R` | The (legacy) targets pipeline that pre-dated `trifecta_provincial.sh`. Kept for the multi-WSG comparison harness. |
+| `_targets.R` | The (legacy) targets pipeline that pre-dated `wsgs_dispatch.sh`. Kept for the multi-WSG comparison harness. |
 | `exp_gradient_extra_breaks.R` | Experimental script that prototyped the orphan-class break source via in-line `frs_break_apply` before it was absorbed into `lnk_pipeline_prep_minimal()` (link v0.28.0). Kept as the smoke-test reference. |
 | `rule_flexibility_demo.R` / `rule_flexibility_render.R` | Demonstration of rules.yaml format flexibility, rendered as RMarkdown. |
 | `regress_dams_isolation.R` | One-off regression test for the dams-isolation work (link #109). |
@@ -266,9 +266,9 @@ discoverable without opening the file:
 
 Run artifacts land in subdirectories of `data-raw/logs/` keyed by topic:
 
-- `provincial_parity/`, `provincial_default/`, `provincial_default_extrabreaks/` — per-WSG RDS + per-WSG times CSV from `run_provincial_parity.R`.
+- `provincial_parity/`, `provincial_default/`, `provincial_default_extrabreaks/` — per-WSG RDS + per-WSG times CSV from `wsgs_run_host.R`.
 - `methodology_delta/` — schema-vs-schema delta RDS from `methodology_delta_query.R`.
-- `dumps_<schema>/` — pg_dump custom-format files from `consolidate_schema.R`.
+- `dumps_<schema>/` — pg_dump custom-format files from `schema_consolidate.R`.
 - `<TS>_*.txt` — orchestrator + per-host run logs.
 
 Reusable helper scripts read from these directories without hardcoded
@@ -290,7 +290,7 @@ single bundle's persistent schema. Rough footprints on a 232-WSG run:
 The 2026-05-04 cypher disk-full incident filled a 96 GB droplet with
 3 accumulated bundles + 60 working schemas at once. After the v0.29.0
 hygiene fixes (`compare_bcfishpass_wsg(cleanup_working = TRUE)` drops
-working schemas on completion; `consolidate_schema(keep_source = FALSE)`
+working schemas on completion; `schema_consolidate(keep_source = FALSE)`
 drops source persistent schema after successful pg_restore), a
 single-bundle-in-flight worker holds ~60 GB total — comfortable on the
 existing 96 GB cypher tier.

diff --git a/data-raw/balance_provincial_buckets.R → data-raw/buckets_balance.R b/data-raw/balance_provincial_buckets.R → data-raw/buckets_balance.R
@@ -9,7 +9,7 @@
 # cypher caught up. This script projects ~10-15 min savings per run.
 #
 # Usage:
-#   Rscript data-raw/balance_provincial_buckets.R
+#   Rscript data-raw/buckets_balance.R
 #
 # Hardcoded inputs (override above when re-running with new baseline):
 #   - Yesterday's host log paths (one per host)
@@ -23,16 +23,16 @@ suppressPackageStartupMessages({})
 
 logs_dir <- "/Users/airvine/Projects/repo/link/data-raw/logs"
 
-# Prefer per-WSG CSVs (emitted by run_provincial_parity.R after this script
+# Prefer per-WSG CSVs (emitted by wsgs_run_host.R after this script
 # was added). Fall back to text-log regex parsing for older runs.
 csvs <- list.files(file.path(logs_dir, "provincial_parity"),
                    pattern = "_per_wsg_times\\.csv$", full.names = TRUE)
 csvs <- c(csvs, list.files(file.path(logs_dir, "provincial_default"),
                            pattern = "_per_wsg_times\\.csv$", full.names = TRUE))
 if (length(csvs) == 0) {
-  m4_log <- file.path(logs_dir, "202605031423_trifecta_provincial_m4.txt")
-  m1_log <- file.path(logs_dir, "202605031423_trifecta_provincial_m1.txt")
-  cy_log <- "/Users/airvine/Projects/repo/rtj/scripts/cypher/logs/202605031423_cypher-run_202605031423_trifecta_provincial_cypher.txt"
+  m4_log <- file.path(logs_dir, "202605031423_wsgs_dispatch_m4.txt")
+  m1_log <- file.path(logs_dir, "202605031423_wsgs_dispatch_m1.txt")
+  cy_log <- "/Users/airvine/Projects/repo/rtj/scripts/cypher/logs/202605031423_cypher-run_202605031423_wsgs_dispatch_cypher.txt"
 }
 
 # Host speed factors (M4 = reference). Use yesterday's per-host MEAN

diff --git a/data-raw/province_progress.sh → data-raw/progress_check.sh b/data-raw/province_progress.sh → data-raw/progress_check.sh
@@ -1,5 +1,5 @@
 #!/usr/bin/env bash
-# province_progress.sh — report live progress of an in-flight provincial dispatch.
+# progress_check.sh — report live progress of an in-flight provincial dispatch.
 #
 # Reads each host's newest `_per_wsg_times.csv` (by mtime, NOT by date glob —
 # cypher logs use UTC, M4/M1 use local TZ; date-globbing across hosts breaks
@@ -9,9 +9,9 @@
 # plus a sample of the most recent completions to see what each host is on.
 #
 # Usage:
-#   bash data-raw/province_progress.sh [--cy-workspaces=job1,job2,job3] [--mtime-min=120]
+#   bash data-raw/progress_check.sh [--cy-workspaces=job1,job2,job3] [--mtime-min=120]
 #
-# Honors /tmp/cy_ips.env if present (set by trifecta_provincial.sh dispatch).
+# Honors /tmp/cy_ips.env if present (set by wsgs_dispatch.sh dispatch).
 # Otherwise derives cypher IPs from tofu state per workspace.
 #
 # --mtime-min=N : only consider CSVs modified in the last N minutes (default 240
@@ -85,11 +85,11 @@ done
 
 echo
 # Orchestrator-side state
-if pgrep -f "trifecta_provincial.sh" >/dev/null 2>&1; then
-  PID=$(pgrep -f "trifecta_provincial.sh" | head -1)
+if pgrep -f "wsgs_dispatch.sh" >/dev/null 2>&1; then
+  PID=$(pgrep -f "wsgs_dispatch.sh" | head -1)
   echo "dispatch process: ✓ PID=$PID (running)"
   # Find latest orchestrator log to extract bucket sizes if possible
-  LATEST_ORCH=$(find ~/Projects/repo/link/data-raw/logs -maxdepth 1 -name '*_trifecta_provincial_orchestrator.txt' -mmin -$MTIME_MIN | sort | tail -1)
+  LATEST_ORCH=$(find ~/Projects/repo/link/data-raw/logs -maxdepth 1 -name '*_wsgs_dispatch_orchestrator.txt' -mmin -$MTIME_MIN | sort | tail -1)
   if [ -n "$LATEST_ORCH" ]; then
     echo "orchestrator log: $LATEST_ORCH"
     grep -E "total WSGs|projected finish" "$LATEST_ORCH" 2>/dev/null | head -2