ops-hardening: trifecta_provincial.sh patches + province_run/clean/progress.sh wrappers #171

Merged
NewGraphEnvironment merged 1 commit into main from ops-hardening-20260514
May 14, 2026

@NewGraphEnvironment
Owner

Summary

Operational hardening from the 2026-05-13 → 2026-05-14 provincial dispatch session. Three new top-level scripts (province_run.sh / province_clean.sh / province_progress.sh) + a batch of hot patches to trifecta_provincial.sh that the session surfaced as needed.

  • trifecta_provincial.sh: M1 reverse-forward tunnel (sidesteps M1's passphrase-protected db_newgraph key); M4 inline-tunnel block (idempotent); LPT fallback uses host_speeds-weighted split when no timing CSV; HOST_SPEEDS recalibrated to time-multiplier semantics (m4=1.0, m1=0.79, cy=1.23) from real per-WSG medians (m4 ≈ 101s, m1 ≈ 80–105s, cy ≈ 124s).
  • NEW data-raw/province_run.sh: top-level 10-step wrapper with trap-EXIT cypher burn. Drafted and ready for a --smoke-only regression test in a follow-up.
  • NEW data-raw/province_clean.sh: idempotent multi-host state wipe (<5 min wall).
  • NEW data-raw/province_progress.sh: mtime-based progress probe (no cross-host TZ-glob hell).
  • PWF: planning/active/{task_plan,findings,progress}.md capturing 12 distinct gotchas (M1 ssh key passphrase, M1 tailnet ~1.7 MB/s, LPT fallback ignored host_speeds, HOST_SPEEDS inverted, stale cypher snapshots, pkill -f Rscript missing R --no-echo subprocess, RDS-cache-skip in run_provincial_parity.R, etc.) + wrapper-test strategy.
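
The trap-EXIT burn mentioned for province_run.sh can be pictured with a minimal sketch like the one below. It is not the wrapper's actual code: `burn_cyphers` and `run_wrapper` are hypothetical names, the teardown is a placeholder `echo`, and a step literally named `fail` is used only to simulate a mid-flight failure.

```shell
# Sketch of a trap-EXIT teardown. burn_cyphers and run_wrapper are
# hypothetical placeholders, not the contents of province_run.sh.

burn_cyphers() {
  # Placeholder: a real wrapper would tear down the cypher hosts here.
  echo "burn: tearing down cypher hosts (exit status was $1)"
}

run_wrapper() (
  # The function body is a subshell, so the EXIT trap fires when it ends,
  # whether every step succeeds or one of them fails part-way through.
  trap 'burn_cyphers $?' EXIT
  for step in "$@"; do
    echo "step: $step"
    [ "$step" != "fail" ] || exit 1   # simulate a mid-flight failure
  done
)
```

Calling `run_wrapper pre-flight snapshot fail dispatch` prints the first three step lines and then the burn line; `dispatch` never runs, but the teardown still does.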

Deliverable

M4 fresh.streams ends the session with 217 distinct WSGs = full BC stream network model. Annotated parity CSV in data-raw/logs/provincial_parity/20260514_0622_*_annotated.csv (4,739 rows).

Related Issues

  • Relates to NewGraphEnvironment/sred-2025-2026#24
  • Follow-up issues filed during the session (not closed by this PR):
    • link#167 — bcfp tunnel autossh (multi-hour run stability)
    • link#168 — decouple bcfp compare from link pipeline run
    • link#169 — simplify lnk_persist_init after rtj#145
    • link#170 — S3-based consolidate (route pg_dumps through s3://newgraph/)
    • rtj#145 — rebuild cypher snapshot with fwa dump tables ONLY
    • fresh#199 (reopened) — M4 PG over-tuning evidence + fix-up plan

Test plan

  • bash data-raw/province_progress.sh runs cleanly and reports current state (when there is one)
  • bash data-raw/province_clean.sh --help (or invocation) wipes fresh + working_* + reloads modelled_stream_crossings on all 5 hosts in <5 min
  • bash -n data-raw/province_run.sh syntax clean (lint)
  • Verify M4 fresh.streams has 217 distinct WSGs (SELECT count(DISTINCT watershed_group_code) FROM fresh.streams)
  • PWF readable (planning/active/findings.md gotcha list, progress.md chronology)
  • CI: pkgdown / R-CMD-check pass
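
The `bash -n` lint item could be extended to all three wrappers with a small helper along these lines. This is only a sketch: `lint_scripts` is a hypothetical name and is not part of this PR.

```shell
# Sketch: syntax-check several scripts with `bash -n` (parse only, no execution).
# lint_scripts is a hypothetical helper, not a script shipped in this PR.
lint_scripts() {
  status=0
  for f in "$@"; do
    if bash -n "$f"; then
      echo "ok: $f"
    else
      echo "syntax error: $f"
      status=1
    fi
  done
  return "$status"
}
```

For example, `lint_scripts data-raw/province_run.sh data-raw/province_clean.sh data-raw/province_progress.sh` returns non-zero if any of the three fails to parse.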

Notes

Operational artifacts (per-dispatch logs in data-raw/logs/2026051*_trifecta_provincial_*, RDS files in data-raw/logs/provincial_parity/) intentionally not committed — they're per-run outputs, not source.

bcfp_baselines.csv updated with the session's append-only stamps (13 new rows). Some come from abandoned or superseded dispatch attempts; they are kept as an accurate chronology, dense but not misleading. findings.md cross-references which runs were load-bearing.

Wrapper not yet tested end-to-end via --smoke-only; that's the first item for the next session before another provincial run.

Generated with Claude Code

…ogress.sh wrappers

Hot patches landed during 2026-05-13 provincial dispatch session and new
operational tooling. Final result: 217-WSG BC stream network model in M4
fresh schema after a chaotic multi-attempt session that surfaced 12 distinct
gotchas (captured in planning/active/findings.md).

trifecta_provincial.sh:
- M1 reverse-forward tunnel via `ssh -R 63333:127.0.0.1:63333` so M1 doesn't
  need its own (passphrase-protected) db_newgraph identity to reach bcfp.
- M4 inline-tunnel block (idempotent: bind-fail is harmless when the operator's
  persistent tunnel is already up).
- LPT fallback now uses host_speeds-weighted alphabetical split when no
  per_wsg_times.csv exists (instead of an equal split, which ignored
  host_speeds entirely).
- HOST_SPEEDS recalibrated: m4=1.0, m1=0.79, cy=1.23 with explicit
  time-multiplier semantics in the comment (larger=slower=fewer WSGs).
  Original cy=1.83 default was directionally correct but magnitude was off;
  intermediate cy=0.7 patch had inverted semantics. Calibrated from real
  per-WSG medians on 5-host dispatch: m4=101s, m1=80-105s, cy=124s.
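
The time-multiplier semantics can be illustrated with a small sketch. The multipliers are consistent with the medians taken relative to m4 (80/101 ≈ 0.79, 124/101 ≈ 1.23). The code below is an assumption about how such a split could work, not the logic in trifecta_provincial.sh: `weighted_split`, its `host=multiplier` argument format, and the contiguous-chunk reading of "alphabetical split" are all hypothetical.

```shell
# Sketch of a host_speeds-weighted alphabetical split.
# Assumption: values are time multipliers (larger = slower), so each host's
# share is proportional to 1/multiplier. weighted_split is a hypothetical name.
weighted_split() {
  # stdin: one WSG code per line; args: host=multiplier pairs
  LC_ALL=C sort | awk -v specs="$*" '
    BEGIN {
      n_hosts = split(specs, pair, " ")
      for (h = 1; h <= n_hosts; h++) {
        split(pair[h], kv, "=")
        host[h] = kv[1]
        weight[h] = 1 / kv[2]        # slower host => smaller weight
        total += weight[h]
      }
    }
    { wsg[NR] = $0 }
    END {
      h = 1
      cum = weight[1] / total
      for (i = 1; i <= NR; i++) {
        # move to the next host once this item midpoint passes the
        # cumulative share of the hosts seen so far
        while (h < n_hosts && (i - 0.5) / NR > cum) {
          h++
          cum += weight[h] / total
        }
        print host[h], wsg[i]
      }
    }'
}

# Usage with the multipliers from this commit and ten made-up codes:
printf '%s\n' A B C D E F G H I J | weighted_split m4=1.0 m1=0.79 cy=1.23
```

With these inputs m4 receives A to C, m1 receives D to G, and cy receives H to J, so the fastest host gets the largest chunk.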

New scripts (noun_verb naming family):
- province_run.sh: top-level 10-step wrapper (pre-flight, snapshot, spin,
  prep, archive, smoke, dispatch, acceptance, consolidate, burn) with
  trap-EXIT cypher burn that fires regardless of mid-flight failure.
- province_clean.sh: idempotent multi-host state wipe (kills lingering
  R/Rscript, drops fresh + working_* + fresh_<bundle>* schemas, reloads
  fresh.modelled_stream_crossings via snapshot_bcfp.sh --force). <5 min wall.
- province_progress.sh: mtime-based progress probe across all hosts.
  Avoids cross-host date-glob hell (cyphers run UTC, M4/M1 run PDT).
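
A minimal sketch of the mtime idea follows. It assumes each finished WSG leaves one log file and that a marker file is touched when the dispatch starts; `count_done`, the marker file, and the `*.log` pattern are hypothetical and may differ from what province_progress.sh does.

```shell
# Sketch of an mtime-based progress count. count_done, the marker file and
# the *.log pattern are assumptions, not taken from province_progress.sh.
count_done() {
  log_dir=$1
  marker=$2
  # -newer compares modification times, so no date has to be parsed out of
  # file names and the time zone of the host does not matter.
  find "$log_dir" -type f -name '*.log' -newer "$marker" | wc -l | tr -d ' '
}
```

Run on each host (for example over ssh), the same command yields comparable counts whether the host clock is set to UTC or PDT, because modification times are compared rather than date strings in file names.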

Docs:
- research/post_compact_provincial_handoff.md: added tunnel architecture
  gotcha section (how each host reaches bcfp: M4 persistent, M1 via -R
  reverse-forward, cyphers via inline -L wrapper) + LPT fallback gotcha
  section (host_speeds-weighted split fix).
- planning/active/{task_plan,findings,progress}.md: full PWF capture of
  the session, 12 gotchas, wrapper test strategy.

Issues filed (not closed here — referenced for context):
- link#167 bcfp tunnel autossh
- link#168 decouple bcfp compare from link pipeline run
- link#169 simplify lnk_persist_init after rtj#145
- link#170 S3-based consolidate
- rtj#145 rebuild cypher snapshot with fwa dump tables ONLY
- fresh#199 reopened (M4 PG over-tuning)

bcfp_baselines.csv: 13 new append-only stamps from today's session — accurate
chronology of each dispatch attempt (including abandoned/superseded ones).
Cross-references findings.md for context on which were the load-bearing runs.

NOT committed:
- data-raw/logs/2026051*_trifecta_provincial_* (operational log artifacts)
- data-raw/logs/aborted/, data-raw/logs/provincial_parity/ (run-specific outputs)
- .claude/settings.local.json (local user prefs)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@NewGraphEnvironment NewGraphEnvironment merged commit 7f03ef9 into main May 14, 2026
1 check passed
@NewGraphEnvironment NewGraphEnvironment deleted the ops-hardening-20260514 branch May 14, 2026 14:52