Provincial run autonomy: rename scripts to noun_verb + single 'approve once, builds end-to-end' wrapper #172

@NewGraphEnvironment

Problem

After today's session (PR #171, v0.36.1) the operational scripts work but are scattered, inconsistently named, and require operator handholding mid-run. We want a single command, approved once, that runs end-to-end and lands clean output. The validated baseline is M4+M1 only; cyphers become opt-in once the baseline lands repeatably.

Current naming violations

The link script family is noun_verb (per NewGraphEnvironment/soul#46, matches rtj's cypher_up.sh / cypher_down.sh / cypher_run.sh). Several existing scripts violate this:

| Current | Convention | Proposed rename |
|---|---|---|
| trifecta_provincial.sh | noun_verb | wsgs_dispatch.sh |
| run_provincial_parity.R | noun_verb | wsgs_run.R |
| compare_bcfishpass_wsg.R | noun_verb | wsg_compare.R |
| consolidate_schema.R | noun_verb | schema_consolidate.R |
| archive_provincial_runs.sh | noun_verb | runs_archive.sh |
| balance_provincial_buckets.R | noun_verb | buckets_balance.R |
| province_run.sh | (already correct) | unchanged |
| province_clean.sh | (already correct) | unchanged |
| province_progress.sh | (already correct) | unchanged |

Plus the names lie about scope — these scripts run any list of WSGs, not just "provincial".
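
The renames in the table could be applied as pure `git mv` operations, roughly as sketched here. This is a sketch, not the project's actual tooling: it assumes every script lives under data-raw/ (the issue only shows that path for the umbrella script), and the `rename_scripts` function and `DIR` variable are hypothetical names. The default mode only prints the commands so they can be reviewed before anything is moved.

```shell
#!/usr/bin/env bash
# Sketch: print (or apply) the proposed renames as pure `git mv` operations.
# Assumption: all scripts sit in one directory; override DIR if they do not.
set -euo pipefail

DIR="${DIR:-data-raw}"

# old:new pairs copied from the rename table in this issue
RENAMES=(
  "trifecta_provincial.sh:wsgs_dispatch.sh"
  "run_provincial_parity.R:wsgs_run.R"
  "compare_bcfishpass_wsg.R:wsg_compare.R"
  "consolidate_schema.R:schema_consolidate.R"
  "archive_provincial_runs.sh:runs_archive.sh"
  "balance_provincial_buckets.R:buckets_balance.R"
)

rename_scripts() {
  local mode="${1:-dry-run}" pair old new
  for pair in "${RENAMES[@]}"; do
    old="${pair%%:*}"   # text before the colon
    new="${pair##*:}"   # text after the colon
    if [ "$mode" = "apply" ]; then
      git mv "$DIR/$old" "$DIR/$new"
    else
      echo "git mv $DIR/$old $DIR/$new"
    fi
  done
}

rename_scripts "${1:-dry-run}"
```

Committing the renames separately from any content edits keeps each file's similarity at 100%, which is what lets `git log --follow` trace history across the rename.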

Goals

  1. Single-command autonomous run. Operator approves bash data-raw/<umbrella>.sh ... once; everything inside (state-clean → snapshot → dispatch → pull → consolidate → burn cyphers if any) runs without further prompts.
  2. Any WSG list. --wsgs=A,B,C accepted at the umbrella level, auto-split via LPT across configured hosts.
  3. Any config bundle. --config=default or --config=bcfishpass (the default when the flag is omitted), --schema=<name> for output schema.
  4. Any host subset. --no-cyphers (M4+M1 only) for the validated baseline; --cy-workspaces=job1,job2,job3 for full distributed.
  5. Rename for honesty. No more "provincial" / "trifecta" / "bcfishpass" in script names that work for any list/host count/reference.
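
The LPT split in goal 2 could look roughly like the sketch below. LPT (longest processing time first) sorts jobs by descending cost and hands each one to the host with the least work so far. Several things here are assumptions rather than anything stated in the issue: the `lpt_split` function name, the "WSG cost" input format, the per-WSG cost estimates themselves, and the simplification that all hosts run at the same speed.

```shell
#!/usr/bin/env bash
# Sketch of an LPT split. Hypothetical interface:
#   stdin : one "WSG cost" pair per line (cost = estimated runtime, any unit)
#   args  : host names
#   stdout: one "host WSG" pair per line
# Assumption: hosts are treated as equally fast.
set -euo pipefail

lpt_split() {
  if [ "$#" -eq 0 ]; then
    echo "lpt_split: need at least one host" >&2
    return 1
  fi
  # Largest cost first (ties broken by name), then greedy assignment
  # of each WSG to the currently least-loaded host.
  sort -k2,2nr -k1,1 | awk -v hosts="$*" '
    BEGIN {
      n = split(hosts, h, " ")
      for (i = 1; i <= n; i++) load[i] = 0
    }
    {
      best = 1
      for (i = 2; i <= n; i++) if (load[i] < load[best]) best = i
      load[best] += $2
      print h[best], $1
    }'
}
```

With made-up costs, `printf '%s\n' "CARP 40" "CRKD 10" "FINA 30" "FINL 20" | lpt_split M4 M1` assigns CARP and CRKD to M4 and FINA and FINL to M1, for a load of 50 on each host.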

Acceptance

  • Phase 1 baseline: bash data-raw/province_run.sh --wsgs=<16-WSG-test-set> --config=default --schema=fresh_default --no-cyphers --with-mapping-code runs ~30 min wall on M4+M1, lands 16 WSGs in fresh_default.streams on M4, no operator prompts, exit code 0.
  • Phase 2 cypher integration: same command with --cy-workspaces=job1 (single cypher) runs end-to-end + burns the cypher on completion.
  • Phase 3 full distributed: same command with --cy-workspaces=job1,job2,job3 runs full 217-WSG provincial.
  • All scripts renamed per noun_verb convention; git log --follow works (renames are pure renames, not delete+add).
  • Internal references updated (source() calls, doc references, README, runbook, PWF, post_compact handoff).
  • Tests pass (devtools::test()).
  • bash -n syntax-clean on all 4 shell scripts.
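
The `bash -n` check in the last bullet could be scripted along these lines. This is a minimal sketch: the `syntax_check` function name and the default data-raw directory are assumptions, and it checks every `*.sh` file it finds rather than a fixed list.

```shell
#!/usr/bin/env bash
# Sketch: run `bash -n` (parse only, no execution) over each shell script
# in a directory and report per-file results.
set -uo pipefail

syntax_check() {
  local dir="${1:-data-raw}" f status=0
  for f in "$dir"/*.sh; do
    [ -e "$f" ] || continue   # glob matched nothing; skip the literal pattern
    if bash -n "$f"; then
      echo "ok   $f"
    else
      echo "FAIL $f"
      status=1
    fi
  done
  return "$status"            # non-zero if any script failed to parse
}
```

Calling `syntax_check data-raw` returns non-zero if any script fails to parse, so it can gate a commit or CI step.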

Out of scope (separate issues)

Reference

  • Today's session learnings: planning/active/findings.md (12 gotchas) + progress.md
  • Convention reference: NewGraphEnvironment/soul#46
  • The 16-WSG test set: CARP, CRKD, FINA, FINL, FIRE, FOXR, INGR, LOMI, MESI, NATR, OSPK, PARA, PARS, PCEA, TOOD, UOMI
