Problem
After today's session (PR #171, v0.36.1), the operational scripts work, but they are scattered, inconsistently named, and need operator handholding mid-run. We want a single command, approved once, that runs end-to-end and lands clean output. The validated baseline is M4+M1 only; cyphers become opt-in once that baseline lands repeatably.
Current naming violations
The link script family follows noun_verb naming (per NewGraphEnvironment/soul#46, matching rtj's cypher_up.sh / cypher_down.sh / cypher_run.sh). Several existing scripts violate this:
| Current | Convention | Proposed rename |
|---|---|---|
| trifecta_provincial.sh | noun_verb | wsgs_dispatch.sh |
| run_provincial_parity.R | noun_verb | wsgs_run.R |
| compare_bcfishpass_wsg.R | noun_verb | wsg_compare.R |
| consolidate_schema.R | noun_verb | schema_consolidate.R |
| archive_provincial_runs.sh | noun_verb | runs_archive.sh |
| balance_provincial_buckets.R | noun_verb | buckets_balance.R |
| province_run.sh | (already correct) | unchanged |
| province_clean.sh | (already correct) | unchanged |
| province_progress.sh | (already correct) | unchanged |
The names also misstate scope: these scripts run any list of WSGs, not just the provincial set.
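A minimal sketch of applying the table's renames as pure moves. It assumes every renamed script sits directly under data-raw/ (only province_run.sh is confirmed there), and scripts_rename is a hypothetical helper: it prints the `git mv` commands by default and runs them only when RUN=1.

```bash
#!/usr/bin/env bash
# Sketch: apply the noun_verb renames from the table as pure `git mv` renames.
# ASSUMPTION: every script lives directly under data-raw/; scripts_rename is
# a hypothetical helper. Prints the commands (dry run) unless RUN=1.
set -euo pipefail

scripts_rename() {
  local dir=${1:-data-raw}
  # old:new pairs copied from the rename table (unchanged scripts omitted)
  local pairs=(
    "trifecta_provincial.sh:wsgs_dispatch.sh"
    "run_provincial_parity.R:wsgs_run.R"
    "compare_bcfishpass_wsg.R:wsg_compare.R"
    "consolidate_schema.R:schema_consolidate.R"
    "archive_provincial_runs.sh:runs_archive.sh"
    "balance_provincial_buckets.R:buckets_balance.R"
  )
  local pair old new
  for pair in "${pairs[@]}"; do
    old=${pair%%:*}   # text before the colon
    new=${pair#*:}    # text after the colon
    if [[ ${RUN:-0} == 1 ]]; then
      git mv "$dir/$old" "$dir/$new"
    else
      echo "git mv $dir/$old $dir/$new"
    fi
  done
}
```

Git does not store renames; it infers them from content similarity when diffing. Committing the moves on their own, before touching file contents or callers, keeps each pair byte-identical, so `git diff -M` reports them as R100 and `git log --follow` traces history across them without ambiguity.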
Goals
Single-command autonomous run. The operator approves bash data-raw/<umbrella>.sh ... once; everything inside (state-clean → snapshot → dispatch → pull → consolidate → burn cyphers, if any) then runs without further prompts.
Any WSG list. --wsgs=A,B,C is accepted at the umbrella level and auto-split via LPT across the configured hosts.
Any config bundle. --config=default or --config=bcfishpass (default); --schema=<name> sets the output schema.
Any host subset. --no-cyphers (M4+M1 only) for the validated baseline; --cy-workspaces=job1,job2,job3 for a full distributed run.
Rename for honesty. No "provincial", "trifecta", or "bcfishpass" in the names of scripts that work for any WSG list, host count, or reference.
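The Goals imply two mechanical pieces at the umbrella level: flag parsing and an LPT (longest-processing-time-first) split of the WSG list across hosts. A minimal bash sketch of both, under assumptions that are not the project's code: the function and variable names are hypothetical; --schema is treated as required; --no-cyphers is assumed to conflict with --cy-workspaces; --config defaults to bcfishpass (reading "(default)" as attached to it); --with-mapping-code, taken from the Phase 1 command, is stored as a plain boolean because its behaviour isn't described; per-WSG costs are integer weights supplied by the caller (e.g. timings from an earlier run).

```bash
#!/usr/bin/env bash
# Sketch of the umbrella front end: parse the Goals' flags, then LPT-split WSGs.
# ASSUMPTIONS (hypothetical): function/variable names, --schema required,
# --no-cyphers vs --cy-workspaces conflict, integer per-WSG cost weights.
set -euo pipefail

umbrella_parse_args() {
  WSGS=() CY_WORKSPACES=()
  CONFIG=bcfishpass         # "(default)" per the Goals section
  SCHEMA=""
  NO_CYPHERS=0
  WITH_MAPPING_CODE=0       # flag from the Phase 1 command; semantics not specified
  local arg
  for arg in "$@"; do
    case $arg in
      --wsgs=*)            IFS=, read -r -a WSGS <<< "${arg#*=}" ;;
      --config=*)          CONFIG=${arg#*=} ;;
      --schema=*)          SCHEMA=${arg#*=} ;;
      --no-cyphers)        NO_CYPHERS=1 ;;
      --cy-workspaces=*)   IFS=, read -r -a CY_WORKSPACES <<< "${arg#*=}" ;;
      --with-mapping-code) WITH_MAPPING_CODE=1 ;;
      *) echo "unknown argument: $arg" >&2; return 2 ;;
    esac
  done
  if (( ${#WSGS[@]} == 0 )) || [[ -z $SCHEMA ]]; then
    echo "--wsgs and --schema are required" >&2; return 2
  fi
  if (( NO_CYPHERS )) && (( ${#CY_WORKSPACES[@]} > 0 )); then
    echo "--no-cyphers conflicts with --cy-workspaces" >&2; return 2
  fi
}

# LPT: sort jobs by cost, descending, and give each one to the host with the
# smallest accumulated load so far.
# Args: host names. Stdin: "WSG weight" lines. Stdout: "host WSG" lines.
wsgs_split_lpt() {
  local hosts=("$@") loads=() i
  local n=${#hosts[@]}
  (( n > 0 )) || { echo "wsgs_split_lpt: no hosts" >&2; return 1; }
  for (( i = 0; i < n; i++ )); do loads[i]=0; done
  sort -k2,2nr | while read -r wsg weight; do
    [[ -n $wsg ]] || continue
    best=0                              # ties go to the earliest host
    for (( i = 1; i < n; i++ )); do
      if (( loads[i] < loads[best] )); then best=$i; fi
    done
    loads[best]=$(( loads[best] + weight ))
    printf '%s %s\n' "${hosts[best]}" "$wsg"
  done
}
```

Host names such as m4 and m1 (plus any cypher workspaces) are illustrative. LPT is a greedy heuristic, not an optimal schedule, and with equal weights it degenerates to filling hosts in turn, so the split is only as balanced as the cost estimates fed into it.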
Acceptance
Phase 1 baseline: bash data-raw/province_run.sh --wsgs=<16-WSG-test-set> --config=default --schema=fresh_default --no-cyphers --with-mapping-code runs in ~30 min wall-clock on M4+M1, lands all 16 WSGs in fresh_default.streams on M4, with no operator prompts and exit code 0.
Phase 2 cypher integration: the same command with --cy-workspaces=job1 (single cypher) runs end-to-end and burns the cypher on completion.
Phase 3 full distributed: the same command with --cy-workspaces=job1,job2,job3 runs the full 217-WSG provincial set.
All scripts renamed per the noun_verb convention; git log --follow works (renames are pure renames, not delete+add).
References to the renamed scripts updated everywhere (source() calls, doc references, README, runbook, PWF, post_compact handoff).
Tests pass (devtools::test()).
bash -n syntax-clean on all 4 shell scripts.
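For the bash -n criterion, a minimal sketch that parses each shell script without executing it and reports per-file status. It assumes the scripts sit in data-raw/ and that checking every *.sh there covers the "all 4 shell scripts"; scripts_syntax_check is a hypothetical name. bash -n catches parse errors only, not runtime failures such as a missing command.

```bash
#!/usr/bin/env bash
# Sketch: syntax-check (parse, don't run) every *.sh in a directory.
# ASSUMPTION: scripts live in data-raw/; scripts_syntax_check is hypothetical.
set -euo pipefail

scripts_syntax_check() {
  local dir=${1:-data-raw} f failed=0 found=0
  for f in "$dir"/*.sh; do
    [[ -e $f ]] || continue          # glob matched nothing
    found=$((found + 1))
    if bash -n "$f"; then            # -n: read commands but do not execute
      printf 'ok   %s\n' "$f"
    else
      printf 'FAIL %s\n' "$f"
      failed=$((failed + 1))
    fi
  done
  (( found > 0 )) || { echo "no .sh files in $dir" >&2; return 1; }
  return $(( failed > 0 ))           # 1 if any script failed to parse
}
```

Wiring this into the umbrella's pre-flight (or CI) turns the criterion into a check that fails loudly instead of a manual step.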
Out of scope (separate issues)
lnk_persist_init (Simplify lnk_persist_init after clean cypher snapshot lands #169, after rtj#145)
Reference
planning/active/findings.md (12 gotchas) + progress.md