fix(hermes): refresh seeded config files on every container boot#27

Closed
EnriqueCanals wants to merge 3 commits into capotej:main from EnriqueCanals:fix/hermes-seed-refresh-config

@EnriqueCanals
Collaborator

Problem

entrypoint-hermes.sh seeds /etc/harness/hermes-defaults/<flavor>/ into ~/.hermes-<flavor>/ using cp -rn (no-clobber). That works on first boot, but on every subsequent boot the seed is silently a no-op — even if the image was rebuilt with updated defaults. Downstream deployers using a persistent volume on ~/.hermes-openrouter (the topology this README's fly.io section documents) end up running stale config forever, with no indication that anything is wrong.
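
The no-clobber behavior can be reproduced outside a container. The following is a minimal sketch, assuming the seed is a plain `cp -rn` of the directory contents; the exact invocation in entrypoint-hermes.sh is not shown here, and the temp directories are hypothetical stand-ins for the image defaults and the persistent volume.

```shell
#!/usr/bin/env bash
# Illustration only: temp dirs stand in for the image defaults and the volume.
work="$(mktemp -d)"
src="$work/image-defaults"   # hypothetical stand-in for /etc/harness/hermes-defaults/<flavor>/
dst="$work/volume"           # hypothetical stand-in for ~/.hermes-<flavor>/
mkdir -p "$src" "$dst"

# First boot: the image ships v1 and the volume is empty, so the seed copies it.
echo "model: v1" > "$src/config.yaml"
# Some cp builds warn or return non-zero when -n skips a file; ignore both here.
cp -rn "$src/." "$dst/" 2>/dev/null || true
first_boot="$(cat "$dst/config.yaml")"

# The image is rebuilt with new defaults; the second boot reuses the same volume.
echo "model: v2" > "$src/config.yaml"
cp -rn "$src/." "$dst/" 2>/dev/null || true
second_boot="$(cat "$dst/config.yaml")"

echo "after first boot:  $first_boot"
echo "after second boot: $second_boot"
```

Both lines report `model: v1`: the second copy skips the existing file without any error, which is the silent staleness described above.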

This was easy to walk into: I built a custom hermes image on top of ghcr.io/capotej/harness:hermes-1.5.0, baked in a custom config.yaml and system-prompt.md, deployed to fly with the README's recommended volume mount, watched first-boot work, then spent ~30 minutes wondering why my config edits weren't taking effect on redeploys.

Fix

Distinguish "config" from "state" at the top level of the seed source:

  • Top-level files (config.yaml, .env, system-prompt.md, …) → refreshed from the image on every boot. Config-as-code.
  • Top-level directories (sessions/, logs/, hooks/, memories/, skills/, plans/, workspace/) → initialized once on first boot, then preserved across restarts. Runtime state.

Hidden files (.env) are handled correctly via dotglob. Seed paths are parameterized via HERMES_SEED_SRC_* / HERMES_SEED_DST_* env vars so the behavior is unit-testable without spinning up a real container; defaults match the prior baked-in paths so this is a no-op for production.

First-boot behavior is identical. Only subsequent boots change, and only in the direction users expect.
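
The file-versus-directory rule above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's actual code: `seed_flavor` and its two positional arguments are hypothetical names (the PR parameterizes paths through HERMES_SEED_SRC_* / HERMES_SEED_DST_* instead), and the "directory already exists, so skip it" check is one plausible way to preserve state.

```shell
#!/usr/bin/env bash
# Sketch of the file-vs-directory seeding rule; not the PR's actual code.
# seed_flavor and its two arguments are hypothetical names.
seed_flavor() {
  local src="$1" dst="$2" entry name
  [ -d "$src" ] || return 0          # missing seed source: nothing to do
  mkdir -p "$dst"
  (
    # Subshell keeps the shopt changes from leaking into the caller.
    shopt -s dotglob nullglob        # match .env too; an empty dir expands to nothing
    for entry in "$src"/*; do
      name="$(basename "$entry")"
      if [ -d "$entry" ]; then
        # Runtime state: seed only if the volume does not have it yet.
        [ -e "$dst/$name" ] || cp -r "$entry" "$dst/$name"
      else
        # Config file: always refresh from the image.
        cp -f "$entry" "$dst/$name"
      fi
    done
  )
}

# Demo with temp dirs standing in for the image defaults and the volume.
work="$(mktemp -d)"
mkdir -p "$work/src/sessions" "$work/dst/sessions"
echo "model: v2" > "$work/src/config.yaml"
echo "KEY=image" > "$work/src/.env"
: > "$work/src/sessions/.gitkeep"
echo "model: v1" > "$work/dst/config.yaml"       # stale config on the volume
echo "user chat" > "$work/dst/sessions/s1.json"  # existing user state

seed_flavor "$work/src" "$work/dst"
seed_flavor "$work/missing" "$work/dst"          # missing source: no-op
```

After the call, `config.yaml` and `.env` on the volume match the image, while `sessions/` keeps the user's file and does not receive the image's `.gitkeep`.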

Tests

New tests/e2e/entrypoint-hermes.test.mjs (4 cases, follows cli.test.mjs style):

  • first boot: empty volume → top-level files (incl. dotfiles) and state-scaffolding directories are all seeded
  • subsequent boot: stale top-level config files are overwritten from the image (config-as-code)
  • subsequent boot: pre-existing state directories are preserved untouched, including not letting the image's empty .gitkeep replace a user's session
  • missing seed source dir is a no-op (does not fail)

$ node --test tests/e2e/entrypoint-hermes.test.mjs
# pass 4
# fail 0

shellcheck, hadolint, biome, markdownlint all clean. Pre-existing CLI test failures on main (#16, #20, #24) confirmed unrelated to this change.

The previous `cp -rn` (no-clobber) seeding meant that once a config file
existed in `~/.hermes-{local,openrouter}`, image rebuilds with updated
defaults were silently ignored. This breaks the config-as-code workflow for
the most common deployment topology — and the one documented in this
README's own fly.io section — where a persistent volume is mounted on
`~/.hermes-openrouter`. Downstream deployers see their custom
`config.yaml`/`system-prompt.md`/`.env` baked into the image, deploy, then
get confused when their changes don't take effect after the first boot.

Distinguish "config" from "state" at the top level of the seed source:

  * Top-level files (config.yaml, .env, system-prompt.md, …) → always
    refreshed from the image on every boot. This is config-as-code.
  * Top-level directories (sessions/, logs/, hooks/, memories/, skills/,
    plans/, workspace/) → initialized once on first boot, then preserved
    across container restarts. This is runtime state.

Hidden files (.env) are handled correctly via dotglob. The seed source and
destination paths are now parameterized via env vars (HERMES_SEED_SRC_*,
HERMES_SEED_DST_*) so the behavior can be unit-tested without spinning up
a real container; defaults match the prior baked-in paths so this is a
no-op for production use.

First-boot behavior is identical to before. Only subsequent boots change,
and only in the direction users expect.

Made-with: Cursor
…rmes.sh

Four cases:
  * first boot: empty volume → top-level files (incl. dotfiles) and
    state-scaffolding directories are all seeded
  * subsequent boot: stale top-level config files are overwritten from
    the image (config-as-code)
  * subsequent boot: pre-existing state directories are preserved
    untouched, including not letting the image's empty .gitkeep replace
    a user's session
  * missing seed source dir is a no-op (does not fail)

Style follows tests/e2e/cli.test.mjs: node:test, tempdirs, asserts. The
entrypoint is invoked with /usr/bin/true as the exec target so we exercise
the full seeding code path without needing hermes installed on the test
host.

Made-with: Cursor
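
The exec-target trick mentioned in the test commit above can be illustrated with a toy script. This is a sketch assuming the entrypoint ends by exec-ing the command it was given; the script body and the `SEED_MARKER` variable are made up for the example and are not taken from entrypoint-hermes.sh.

```shell
#!/usr/bin/env bash
# Toy entrypoint (hypothetical, not entrypoint-hermes.sh): do setup, then exec "$@".
work="$(mktemp -d)"
cat > "$work/entrypoint.sh" <<'EOF'
#!/usr/bin/env bash
set -eu
# Setup step under test; SEED_MARKER is a made-up variable for this sketch.
touch "$SEED_MARKER"
# Hand off to whatever command the caller supplied.
exec "$@"
EOF

# Passing `true` as the command runs the setup and then exits 0 immediately,
# so the real binary never has to be installed on the test host.
SEED_MARKER="$work/seeded" bash "$work/entrypoint.sh" true
status=$?
echo "exit status: $status"
```

The test can then assert on the side effects of the setup step (here, the marker file) and on the exit status.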
@hermclaw
Contributor

Review Note: Runtime config mutations will be lost

Hermes writes to config.yaml at runtime — TUI settings, display preferences, model changes, etc. all go through save_config() in hermes_cli/config.py. Since this PR unconditionally overwrites top-level files on every boot, any config changes made while the container was running get wiped on restart.

The existing cp -rn behavior actually had this right — it preserved user modifications. This PR fixes a real problem (stale image defaults on redeploy) but trades it for a worse one (losing runtime configuration on every restart).

Suggestion: Split the file behavior:

  • .env — safe to overwrite every boot (Hermes routes API keys here, and the entrypoint's own OPENROUTER_API_KEY check already handles this)
  • config.yaml — only seed on first boot (preserve runtime mutations), or merge intelligently (seed default keys but preserve user-overridden ones)

The "config-as-code" framing works for truly static seed files, but config.yaml is more like "config-as-state" from Hermes's perspective.
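
One way to express the suggested split is a per-file policy. The sketch below is only an illustration: `seed_file` and the policy names `always` and `once` are hypothetical, and the "merge intelligently" option for config.yaml is not attempted, since it would need a YAML-aware tool.

```shell
#!/usr/bin/env bash
# Sketch of the reviewer's split; seed_file and its policy names are hypothetical.
seed_file() {
  local policy="$1" src="$2" dst="$3"
  [ -f "$src" ] || return 0
  case "$policy" in
    always) cp -f "$src" "$dst" ;;                # refresh on every boot
    once)   [ -e "$dst" ] || cp "$src" "$dst" ;;  # first boot only
    *)      echo "unknown policy: $policy" >&2; return 1 ;;
  esac
}

# Demo with temp dirs standing in for the image defaults and the volume.
work="$(mktemp -d)"
mkdir -p "$work/src" "$work/dst"
echo "KEY=image"        > "$work/src/.env"
echo "model: default"   > "$work/src/config.yaml"
echo "KEY=stale"        > "$work/dst/.env"
echo "model: user-pick" > "$work/dst/config.yaml"  # e.g. changed at runtime

seed_file always "$work/src/.env"        "$work/dst/.env"
seed_file once   "$work/src/config.yaml" "$work/dst/config.yaml"
```

With this policy the volume's `.env` is replaced by the image's copy, while the runtime-modified `config.yaml` is left alone.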

cc @capotej

@EnriqueCanals
Collaborator Author

Thanks for the careful review and the architectural pointer — both land.

You're right on the technical substance: config.yaml is hermes-owned mutable state via save_config(), not config-as-code. My PR would silently wipe TUI-set preferences, model overrides, and any other runtime mutations on every restart. The original cp -rn was correct for the world it was designed for — I conflated "files I baked in" with "files that should always reflect the image", and those are different things.

And on the broader point about FROM ghcr.io/capotej/harness:hermes-1.5.0 — that lands too. The bug I tried to fix only exists because I extended the image instead of mounting customizations as a volume into the unmodified upstream image. The README's fly.io section never told me to extend the image; I just assumed I had to in order to inject my CRM API wrappers and a custom system prompt. That assumption is the actual root cause, not the seeding semantics.

I'm refactoring our deployment to follow the claw pattern: use the upstream image directly via [build] image = …, inject tool wrappers via fly's [[files]], and let hermes own its volume entirely. That makes this PR moot. Closing.

Two small follow-ups I'd be happy to send if useful — let me know:

  1. A docs-only PR adding a "Don't extend the image — inject customizations via fly [[files]]" note to the fly.io section of the README, with a tool-wrapper example. Would have saved me ~3 hours of yak-shaving and is the kind of thing I expect others to walk into the same way.
  2. If you'd like the test fixture from this PR salvaged into a tests/e2e/entrypoint-hermes.test.mjs that just locks in the existing cp -rn semantics (i.e. asserting that pre-existing files in the volume are preserved), happy to extract that piece. It's a cheap regression guard for a behavior that's clearly intentional.

Either, both, or neither — your call. Thanks again for engaging in good faith on this.
