
fix(main): wire pair and run subcommands (fixes #21) #28

Merged: blaspat merged 2 commits into main from fix/21-wire-main-run-loop on Jun 11, 2026

blaspat (Owner) commented Jun 11, 2026

Summary

The hermes-node binary was a stub — it loaded the config, printed one line, and exited 0. The supervisor, dispatcher, pinger, and handlers were all implemented and tested in isolation but had no entry point. This PR wires them up.

Fixes #21.

What changed

Two new subcommands:

  • hermes-node pair --server <wss-url> --token <token> [--config <path>] — write a fresh config.toml with the supplied values, mode 0600. Refuses to overwrite an existing config so a re-pair is explicit.
  • hermes-node run [--config <path>] — long-lived background service. Connects, serves exec/read/write calls, reconnects on drops with exponential backoff.
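A minimal sketch of how the two subcommands could be dispatched with the standard library's flag package. The helper names (dispatch, parsePair, pairOpts) are hypothetical and not taken from cmd/hermes-node/main.go; only the subcommand and flag names come from the usage lines in this PR.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// pairOpts holds the flags of the pair subcommand (hypothetical type).
type pairOpts struct {
	Server, Token, Config string
}

// parsePair parses: pair --server <wss-url> --token <token> [--config <path>].
// Go's flag package accepts both -server and --server.
func parsePair(args []string) (pairOpts, error) {
	var o pairOpts
	fs := flag.NewFlagSet("pair", flag.ContinueOnError)
	fs.StringVar(&o.Server, "server", "", "WSS URL of the server")
	fs.StringVar(&o.Token, "token", "", "pairing token")
	fs.StringVar(&o.Config, "config", "", "config path (empty means the default path)")
	if err := fs.Parse(args); err != nil {
		return o, err
	}
	if o.Server == "" || o.Token == "" {
		return o, errors.New("pair: --server and --token are required")
	}
	return o, nil
}

// dispatch splits os.Args[1:] into a subcommand name and its own arguments.
func dispatch(args []string) (string, []string, error) {
	if len(args) == 0 {
		return "", nil, errors.New("usage: hermes-node <pair|run> [flags]")
	}
	switch args[0] {
	case "pair", "run":
		return args[0], args[1:], nil
	default:
		return "", nil, fmt.Errorf("unknown subcommand %q", args[0])
	}
}

func main() {
	cmd, rest, err := dispatch(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	switch cmd {
	case "pair":
		o, err := parsePair(rest)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(2)
		}
		fmt.Println("would write a 0600 config for", o.Server)
	case "run":
		fmt.Println("would start the supervisor with args", rest)
	}
}
```

Using one FlagSet per subcommand keeps `run` from accepting `--token` by accident, since each set only knows its own flags.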

Config schema: gains a required [node].token field. On Unix, config.Load() rejects the file unless its mode is 0600, so a chmod slip during a manual edit won't silently expose the token. config.Save() is the pair-side counterpart.

Test seam: runRun(ctx, ...) takes a ctx directly so unit tests can drive the supervisor with a deadline-bounded context and assert on connection + shutdown paths without subprocess gymnastics. Production uses signal.NotifyContext.

End-to-end coverage

TestRun_ConnectsToServer stands up an httptest WebSocket server, seeds a config pointing at it, runs hermes-node run in a goroutine, waits for the supervisor's dial to land, and asserts on a clean shutdown when the ctx is cancelled.

Quinn's review

Quinn reviewed the first cut. This branch addresses all 3 warnings (W1-W3) and the suggestions worth shipping: S1, S2, S5 (folded into W3), and S6.

  • W1 — close previous *exec.Session on reconnect (bash PID leak). Setup now closes the previous session before opening a new one via a closure-captured prevSession + prevSessionMu.
  • W2 — defaultConfigPath returns an error. The silent "return config.toml" fallback on os.UserHomeDir failure was misleading; run() now surfaces it.
  • W3 + S5 — pair stdout message now warns the operator that the default [node].name (the config filename) will silently fail auth per PROTOCOL.md §3.4, and that the default empty [node].allowed_paths will silently reject every call per §3.5.
  • S1 — deleted dead driveFakeServer helper.
  • S2 — OnError now captures runtime/debug.Stack() so the audit log and stderr both carry the trace.
  • S6 — added TestSave_CreatesFile0600 unit test (was only integration-tested).
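A sketch of the W1 fix. Session and newSetup are hypothetical stand-ins (the real type is *exec.Session, whose Close is assumed here to kill its bash process); the pattern is the one described: a setup closure that captures prevSession and prevSessionMu and closes the old session before opening the next.

```go
package main

import (
	"fmt"
	"sync"
)

// Session stands in for *exec.Session; Close is assumed to kill its bash process.
type Session struct {
	ID     int
	Closed bool
}

func (s *Session) Close() error {
	s.Closed = true
	return nil
}

// newSetup returns the per-connection setup hook. prevSession and
// prevSessionMu live in the closure, so they persist across reconnects.
func newSetup(open func() (*Session, error)) func() (*Session, error) {
	var (
		prevSession   *Session
		prevSessionMu sync.Mutex
	)
	return func() (*Session, error) {
		prevSessionMu.Lock()
		defer prevSessionMu.Unlock()
		if prevSession != nil {
			// Close before opening, so at most one bash process is alive.
			// A real implementation would log a Close error, not drop it.
			_ = prevSession.Close()
			prevSession = nil
		}
		s, err := open()
		if err != nil {
			return nil, err
		}
		prevSession = s
		return s, nil
	}
}

func main() {
	var opened []*Session
	setup := newSetup(func() (*Session, error) {
		s := &Session{ID: len(opened) + 1}
		opened = append(opened, s)
		return s, nil
	})
	for i := 0; i < 3; i++ { // simulate three reconnects
		if _, err := setup(); err != nil {
			panic(err)
		}
	}
	for _, s := range opened {
		fmt.Printf("session %d closed=%v\n", s.ID, s.Closed)
	}
}
```

Holding the mutex across both Close and open keeps two overlapping reconnects from each seeing the same prevSession and leaking one of the new sessions.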

Skipped per Quinn: S3, S4, S7 (not blockers).

Verification

  • go test -race ./... — 6/6 packages, stable across 3 consecutive runs
  • go vet ./... clean
  • go build ./... clean
  • ✅ Real-binary smoke test: pair writes correct TOML at mode 0600; a config at mode 0644 is rejected; run dials, backs off, and shuts down cleanly on SIGTERM; the audit log captures reconnect attempts
  • ✅ No secrets in diff — test fixtures use placeholder strings

Diff stat

 cmd/hermes-node/main.go        | 378 ++++++++++++++++++++++---
 cmd/hermes-node/main_test.go   | 494 ++++++++++++++++++++++++++++++++++
 internal/config/config.go      |  88 +++++-
 internal/config/config_test.go | 166 +++++++++++
 4 files changed, 1098 insertions(+), 28 deletions(-)

blaspat and others added 2 commits June 11, 2026 16:57
The hermes-node binary previously loaded the config, printed one
line, and exited 0 — the WSS client never started. The supervisor,
dispatcher, pinger, and handlers were all implemented and tested
in isolation but had no entry point.

This commit wires them up. Two subcommands:

  hermes-node pair --server <wss-url> --token <token> [--config <path>]
    Write a fresh config.toml with the supplied values, mode 0600.

  hermes-node run [--config <path>]
    Long-lived background service. Connects, serves exec/read/write
    calls, reconnects on drops with exponential backoff.

Config schema gains a [node].token field (required). The
config.Load() call on Unix verifies the file is mode 0600 — a
chmod slip during a manual edit won't silently expose the token.
config.Save() is the pair-side counterpart and refuses to overwrite
an existing config, so a re-pair is explicit.

The run subcommand splits signal handling (production) from the
ctx-driven run loop (testable). runRunWithSignalCtx wires the OS
signal handler; runRun takes a ctx directly so unit tests can
drive the supervisor with a deadline-bounded ctx and assert on
the connection and shutdown paths without subprocess gymnastics.

End-to-end coverage: TestRun_ConnectsToServer stands up an
httptest WebSocket server, seeds a config pointing at it, runs
hermes-node run in a goroutine, waits for the supervisor's dial
to land, and asserts on a clean shutdown when the ctx is
cancelled.

Co-authored-by: Kate <kate@local>

Quinn's review of fix/21-wire-main-run-loop surfaced three warnings
(real bugs / DX issues) and several nice-to-haves. This commit
applies the ones worth shipping before the PR goes up.

W1 — close previous exec Session on reconnect
  Setup now closes the previous *Session before opening a new one
  via a closure-captured prevSession + prevSessionMu. Without this,
  an operator on a flaky network leaked one bash PID per reconnect
  (Go does not reap subprocesses when a *Session reference is
  dropped). The leak is bounded only by process shutdown, so in
  a container with a tight ulimit it would effectively cap the
  node's uptime.

W2 — defaultConfigPath returns an error
  The silent "return config.toml" fallback on os.UserHomeDir
  failure could mislead the operator with a "file not found"
  error whose path was a lie. Now matches defaultLogPath's
  error-return shape; run() surfaces the error for the
  run/pair subcommands and lets version/help work without it.

W3 + S5 — pair stdout message
  The default [node].name (the config filename) would silently
  fail auth per PROTOCOL.md §3.4. The default empty
  [node].allowed_paths would silently reject every call. The
  pair message now spells both out with the protocol citations.

S1 — delete dead driveFakeServer
  The helper was defined and never called; the file's preamble
  promised a synthetic-exec round-trip test that the code never
  delivered. Removed.

S2 — OnError captures debug.Stack
  The panic-recovery path in dispatch.go pays the cost of
  recovering, but OnError only captured the panic value. The
  hook now also captures runtime/debug.Stack so the audit log
  and stderr both carry the trace, not just "handler panic:
  index out of range". dispatch.go's own docstring pointed
  callers at this idiom.
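A sketch of the idiom, with a hypothetical OnError signature and a safeCall helper standing in for dispatch.go's recovery path. The key detail is that debug.Stack is called inside the deferred function, before it returns, so the captured trace still includes the frame that panicked.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
)

// OnError is a hypothetical hook: method name, recovered value, and the stack
// captured at the moment of recovery.
type OnError func(method string, recovered any, stack []byte)

// safeCall runs h, turning a panic into an error plus an OnError call.
// debug.Stack runs inside the deferred func, before the panicking frames are
// unwound, so the trace includes the line that panicked.
func safeCall(method string, h func() error, onErr OnError) (err error) {
	defer func() {
		if r := recover(); r != nil {
			onErr(method, r, debug.Stack())
			err = fmt.Errorf("handler panic: %v", r)
		}
	}()
	return h()
}

func main() {
	logToStderr := func(method string, recovered any, stack []byte) {
		fmt.Fprintf(os.Stderr, "%s: handler panic: %v\n%s", method, recovered, stack)
	}
	err := safeCall("exec", func() error {
		var args []string
		_ = args[3] // deliberate index-out-of-range panic
		return nil
	}, logToStderr)
	fmt.Println("dispatcher still running; handler returned:", err)
}
```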

S6 — TestSave_CreatesFile0600 unit test
  Coverage was previously only via the integration test
  (TestRun_PairSubcommand_WritesConfig). The unit test localises
  the mode guarantee to the Save() function itself.

No changes to file ownership, no secret material added.
All three checkboxes (tests, lint, build) green; 3 consecutive
go test ./... runs stable.

Co-authored-by: Kate <kate@local>
blaspat merged commit 28ce458 into main Jun 11, 2026
blaspat deleted the fix/21-wire-main-run-loop branch June 11, 2026 10:26

Development

Linked issue: main.go is a stub: binary loads config, prints one line, and exits — the WSS client never starts
