fix(hermes): enforce runtime env secret guard on direct hermes gateway (#4975)#4981
fix(hermes): enforce runtime env secret guard on direct hermes gateway (#4975)#4981abhi-0906 wants to merge 2 commits into
hermes gateway (#4975)#4981Conversation
…ay` (NVIDIA#4975) The Hermes runtime-env secret boundary (refuse raw secret-shaped env vars like SLACK_BOT_TOKEN=xoxb-real-...) ran only on the nemoclaw-start entrypoint and the host-side recovery path. A direct `docker exec ... hermes gateway run` invokes the CLI without crossing either, so the gateway started with raw secrets in its environment and no [SECURITY] rejection. Wrap /usr/local/bin/hermes: relocate the real binary to hermes.real and install a thin wrapper that runs the existing single-source-of-truth validator (validate-env-secret-boundary.py runtime-env) for the `gateway` subcommand before exec'ing the real binary. Every other subcommand passes straight through. SECURITY: the wrapper resolves the validator and the real binary from fixed installed paths, never from the environment — it exists to reject a malicious runtime env, so it must not let that env redirect the guard or the binary it protects (a dev fallback resolves against the script's own directory for checkouts, matching start.sh's _HERMES_BOUNDARY_VALIDATOR resolution). The guard is identical to the one start.sh already runs before launch, and the placeholder refresh only emits allow-listed openshell:resolve:env:* values, so the trusted startup path is not regressed; only a genuinely raw secret-shaped var is rejected. Add Linux-gated tests that run the wrapper against the real validator and a hermes.real stub: gateway+raw-secret is refused (exit 1, value not leaked, real never invoked), gateway+placeholders is allowed, and non-gateway subcommands (dashboard, --version) pass through even with raw secrets present. Signed-off-by: Abhimanyu Kumar <abhimanyukumar7290@gmail.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughThis PR adds a Hermes CLI wrapper that enforces runtime env-secret validation for ChangesHermes Gateway Env-Guard Wrapper
Sequence Diagram(s)sequenceDiagram
participant Wrapper as /usr/local/bin/hermes (wrapper)
participant Validator as validate-hermes-env-secret-boundary.py
participant Py as python3
participant HermesReal as /usr/local/bin/hermes.real
Wrapper->>Py: run Validator runtime-env (only for `gateway`)
Py->>Validator: execute validation
Validator-->>Py: exit 0 or non-zero
Py-->>Wrapper: exit code
alt validator succeeds
Wrapper->>HermesReal: exec with original args
else validator fails
Wrapper-->>Wrapper: exit with validator status (non-zero)
end
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly Related PRs
Suggested Labels
Suggested Reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
agents/hermes/Dockerfile (1)
98-109: Run the secret-boundary E2E lane on this branch before merge.This hunk changes the gateway startup security boundary path; run the boundary-focused job explicitly since nightly coverage is not PR-triggered.
gh workflow run nightly-e2e.yaml --ref <branch> -f jobs=hermes-secret-boundary-e2eAs per coding guidelines, changes under
agents/hermes/**should be validated with Hermes E2E coverage, and the referenced workflow docs identifyhermes-secret-boundary-e2eas the boundary contract check.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@agents/hermes/Dockerfile` around lines 98 - 109, This change relocates the Hermes binary and installs a wrapper (hermes-wrapper.sh -> /usr/local/bin/hermes, original moved to /usr/local/bin/hermes.real) which affects the gateway secret boundary; before merging, run the Hermes secret-boundary E2E lane to validate behavior by executing the nightly-e2e workflow for the hermes-secret-boundary-e2e job (use: gh workflow run nightly-e2e.yaml --ref <branch> -f jobs=hermes-secret-boundary-e2e), verify the wrapper correctly delegates non-gateway subcommands to hermes.real and that `/usr/local/bin/hermes --version` succeeds in the built image, and address any failures found by updating hermes-wrapper.sh or the Dockerfile install/move steps.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@agents/hermes/hermes-wrapper.sh`:
- Around line 36-37: The wrapper currently invokes the guard with an unpinned
"python3" from PATH; change it to use a fixed interpreter path (e.g.
/usr/bin/python3) and verify it exists/executable before running to enforce the
"never from the environment" intent. Replace the invocation of python3 "$GUARD"
in hermes-wrapper.sh with a pinned interpreter variable (e.g.
PINNED_PYTHON=/usr/bin/python3), check [ -x "$PINNED_PYTHON" ] and then run
"$PINNED_PYTHON" "$GUARD" runtime-env || exit $?, so the guard always uses the
known system Python rather than PATH-resolved python3.
---
Nitpick comments:
In `@agents/hermes/Dockerfile`:
- Around line 98-109: This change relocates the Hermes binary and installs a
wrapper (hermes-wrapper.sh -> /usr/local/bin/hermes, original moved to
/usr/local/bin/hermes.real) which affects the gateway secret boundary; before
merging, run the Hermes secret-boundary E2E lane to validate behavior by
executing the nightly-e2e workflow for the hermes-secret-boundary-e2e job (use:
gh workflow run nightly-e2e.yaml --ref <branch> -f
jobs=hermes-secret-boundary-e2e), verify the wrapper correctly delegates
non-gateway subcommands to hermes.real and that `/usr/local/bin/hermes
--version` succeeds in the built image, and address any failures found by
updating hermes-wrapper.sh or the Dockerfile install/move steps.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 3d414101-6efa-489f-8be6-733a60cd06e4
📒 Files selected for processing (3)
agents/hermes/Dockerfileagents/hermes/hermes-wrapper.shtest/hermes-gateway-wrapper.test.ts
…apper Address CodeRabbit review on NVIDIA#4981: the wrapper invoked the secret-boundary guard via PATH-resolved `python3`. Because PATH is part of the untrusted runtime environment this wrapper exists to reject, an attacker could shadow `python3` (e.g. `docker exec -e PATH=/tmp/evil:...`) with a script that exits 0 and no-op the guard, starting the gateway with raw secrets. Resolve python3 from a fixed set of absolute paths (/usr/bin, /usr/local/bin, the hermes venv), never via PATH, and fail closed if none is executable. The interpreter join the validator and real binary as env-independent, fixed-path resolutions. Add a test that plants a PATH-shadowed python3 and asserts the guard still refuses a raw secret-shaped gateway launch. Signed-off-by: Abhimanyu Kumar <abhimanyukumar7290@gmail.com>
|
Thanks @coderabbitai — good catch on the interpreter. Addressed in 7148a54: the wrapper resolved On the |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
# Check the current state of the wrapper script after the fix in commit 7148a5462
gh pr view 4981 --repo NVIDIA/NemoClaw --json commits,headRefName 2>/dev/null | jq .
# Try to fetch the latest wrapper content
gh api repos/NVIDIA/NemoClaw/contents/agents/hermes/hermes-wrapper.sh?ref=fix/issue-4975-hermes-gateway-secret-guard 2>/dev/null | jq -r '.content' | base64 -d 2>/dev/null || echo "Could not fetch file"Length of output: 4043 🏁 Script executed: #!/bin/bash
# Fetch the current wrapper script content at the latest commit
gh api "repos/NVIDIA/NemoClaw/contents/agents/hermes/hermes-wrapper.sh?ref=7148a54621c16645b6395e043a85ac713cabd870" 2>/dev/null | jq -r '.content' | base64 -d 2>/dev/null || echo "Could not fetch file"Length of output: 2902
The updated wrapper looks correct — the three security-critical resolutions (interpreter, validator, real binary) now all use the same fixed-path-first / dev-fallback pattern and are fully isolated from the untrusted runtime environment: for _candidate in /usr/bin/python3 /usr/local/bin/python3 /opt/hermes/.venv/bin/python3; do
if [ -x "$_candidate" ]; then PYTHON3="$_candidate"; break; fi
done
if [ -z "$PYTHON3" ]; then
echo "[SECURITY] Refusing hermes gateway: no python3 at a trusted absolute path ..." >&2
exit 127
fiFail-closed on a missing interpreter (exit 127 + On the 🐇✨ |
Summary
The Hermes runtime-env secret boundary (refuse startup when raw secret-shaped env vars like
SLACK_BOT_TOKEN=xoxb-real-…are present) ran only on thenemoclaw-startentrypoint and the host-side recovery path. A directdocker exec … hermes gateway runinvokes the CLI without crossing either, so the gateway started with raw secrets and no[SECURITY]rejection. This wraps thehermesCLI so the same guard runs for thegatewaysubcommand however it is invoked.Related Issue
Fixes #4975
Root cause
validate_hermes_runtime_env_secret_boundary(agents/hermes/start.sh) and the recovery path both call the single-source-of-truth validatoragents/hermes/validate-env-secret-boundary.py runtime-env, but the guard is attached to those entrypoints — not to thehermes gatewaycommand — so a direct invocation bypasses it (as the issue notes, "implemented only for a different entrypoint").Changes
agents/hermes/hermes-wrapper.sh(new) — installed at/usr/local/bin/hermes; runs the existing validator for thegatewaysubcommand, thenexecs the real binary. All other subcommands pass straight through.agents/hermes/Dockerfile— relocate the real binary tohermes.real, install the wrapper, re-asserthermes --versionthrough it so a broken relocation fails the build.test/hermes-gateway-wrapper.test.ts(new) — Linux-gated tests against the real validator + ahermes.realstub.Design notes for review: the wrapper reuses the existing validator (no second copy of the security logic); it resolves the validator and real binary from fixed installed paths, never from env (it guards against a malicious env, so env must not redirect it);
execpreserves the PID sostart.sh'swait/respawn still track the gateway; and the trusted path already runs the identical guard before launch with placeholder-only env, so legitimate startup is not regressed.Type of Change
Verification
Validated on Ubuntu 24.04: reporter's exact repro now refused (non-zero,
[SECURITY], gateway never started, token not leaked); placeholder/allow-listed env still starts the gateway; non-gateway subcommands and--versionpass through; Dockerfile relocation preserves the symlink→venv binary; guard is not env-redirectable;execpreserves PID.npm test/npx prek run --all-filesdeferred to CI (ran targeted vitest + typecheck + test-size budget locally)Summary by CodeRabbit
New Features
Tests