fix(hermes): enforce runtime env secret guard on direct hermes gateway (#4975) #4981

Open
abhi-0906 wants to merge 2 commits into NVIDIA:main from abhi-0906:fix/issue-4975-hermes-gateway-secret-guard
Conversation

@abhi-0906

@abhi-0906 abhi-0906 commented Jun 8, 2026

Summary

The Hermes runtime-env secret boundary (refuse startup when raw secret-shaped env vars like SLACK_BOT_TOKEN=xoxb-real-… are present) ran only on the nemoclaw-start entrypoint and the host-side recovery path. A direct docker exec … hermes gateway run invokes the CLI without crossing either, so the gateway started with raw secrets and no [SECURITY] rejection. This wraps the hermes CLI so the same guard runs for the gateway subcommand however it is invoked.

Related Issue

Fixes #4975

Root cause

validate_hermes_runtime_env_secret_boundary (agents/hermes/start.sh) and the recovery path both call the single-source-of-truth validator agents/hermes/validate-env-secret-boundary.py runtime-env, but the guard is attached to those entrypoints — not to the hermes gateway command — so a direct invocation bypasses it (as the issue notes, "implemented only for a different entrypoint").

Changes

  • agents/hermes/hermes-wrapper.sh (new) — installed at /usr/local/bin/hermes; runs the existing validator for the gateway subcommand, then execs the real binary. All other subcommands pass straight through.
  • agents/hermes/Dockerfile — relocate the real binary to hermes.real, install the wrapper, re-assert hermes --version through it so a broken relocation fails the build.
  • test/hermes-gateway-wrapper.test.ts (new) — Linux-gated tests against the real validator + a hermes.real stub.

Design notes for review: the wrapper reuses the existing validator (no second copy of the security logic); it resolves the validator and real binary from fixed installed paths, never from env (it guards against a malicious env, so env must not redirect it); exec preserves the PID so start.sh's wait/respawn still track the gateway; and the trusted path already runs the identical guard before launch with placeholder-only env, so legitimate startup is not regressed.
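
The flow described in the design notes can be sketched roughly as follows. This is a minimal illustration, not the PR's actual hermes-wrapper.sh: the function name `run_guarded` and its argument order are hypothetical, and the three paths are passed as arguments only so the sketch can be exercised outside the image. In an installed wrapper they would be hard-coded absolute paths that are never read from the environment.

```shell
# Sketch only: not the PR's hermes-wrapper.sh.
# run_guarded REAL_BIN PYTHON VALIDATOR [hermes args...]
# REAL_BIN, PYTHON and VALIDATOR stand in for fixed installed paths.
run_guarded() {
  real_bin=$1
  python=$2
  validator=$3
  shift 3

  if [ "${1:-}" = "gateway" ]; then
    # Guard only the gateway subcommand and propagate the validator's
    # non-zero status so the caller sees the refusal.
    "$python" "$validator" runtime-env || return $?
  fi

  # A standalone wrapper script would use `exec` here so the PID is
  # preserved; a plain call keeps this function usable in a test shell.
  "$real_bin" "$@"
}
```

Every subcommand other than `gateway` skips the validator entirely, which matches the pass-through behavior listed under Changes.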

Type of Change

  • Code change (bug fix)

Verification

Validated on Ubuntu 24.04: reporter's exact repro now refused (non-zero, [SECURITY], gateway never started, token not leaked); placeholder/allow-listed env still starts the gateway; non-gateway subcommands and --version pass through; Dockerfile relocation preserves the symlink→venv binary; guard is not env-redirectable; exec preserves PID.

  • Tests added for new behavior (Linux-gated)
  • No secrets or credentials committed
  • Full npm test / npx prek run --all-files deferred to CI (ran targeted vitest + typecheck + test-size budget locally)
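
The refusal criteria in the verification notes (non-zero exit, a [SECURITY] marker, token value not echoed) could be checked by hand with a small helper along these lines. The helper name `assert_refused` is hypothetical and is not part of the PR's test suite; it only inspects an exit code and captured output supplied by the caller.

```shell
# Sketch only: a hypothetical helper, not taken from the PR's tests.
# assert_refused EXIT_CODE OUTPUT SECRET_VALUE
# Succeeds when the run was refused without echoing the secret value.
assert_refused() {
  exit_code=$1
  output=$2
  secret=$3

  # The guard must fail the launch.
  [ "$exit_code" -ne 0 ] || { echo "expected non-zero exit" >&2; return 1; }

  # The refusal must be marked as a security rejection.
  case $output in
    *"[SECURITY]"*) ;;
    *) echo "missing [SECURITY] marker" >&2; return 1 ;;
  esac

  # The raw secret value must not appear anywhere in the output.
  case $output in
    *"$secret"*) echo "secret value leaked" >&2; return 1 ;;
  esac
  return 0
}

# Example use against an installed wrapper (not run here):
#   out=$(SLACK_BOT_TOKEN=xoxb-example hermes gateway run 2>&1); rc=$?
#   assert_refused "$rc" "$out" "xoxb-example"
```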

Summary by CodeRabbit

  • New Features

    • The gateway command now enforces an environment-secret boundary when invoked directly; other commands (including version) continue to pass through normally.
    • Direct gateway invocations that violate the boundary are refused to prevent accidental secret exposure.
  • Tests

    • Added tests covering acceptance/rejection of gateway invocations, resistance to PATH-based bypass attempts, and normal pass-through for non-gateway commands.

…ay` (NVIDIA#4975)

The Hermes runtime-env secret boundary (refuse raw secret-shaped env vars like
SLACK_BOT_TOKEN=xoxb-real-...) ran only on the nemoclaw-start entrypoint and the
host-side recovery path. A direct `docker exec ... hermes gateway run` invokes
the CLI without crossing either, so the gateway started with raw secrets in its
environment and no [SECURITY] rejection.

Wrap /usr/local/bin/hermes: relocate the real binary to hermes.real and install
a thin wrapper that runs the existing single-source-of-truth validator
(validate-env-secret-boundary.py runtime-env) for the `gateway` subcommand
before exec'ing the real binary. Every other subcommand passes straight through.

SECURITY: the wrapper resolves the validator and the real binary from fixed
installed paths, never from the environment — it exists to reject a malicious
runtime env, so it must not let that env redirect the guard or the binary it
protects (a dev fallback resolves against the script's own directory for
checkouts, matching start.sh's _HERMES_BOUNDARY_VALIDATOR resolution). The guard
is identical to the one start.sh already runs before launch, and the placeholder
refresh only emits allow-listed openshell:resolve:env:* values, so the trusted
startup path is not regressed; only a genuinely raw secret-shaped var is
rejected.

Add Linux-gated tests that run the wrapper against the real validator and a
hermes.real stub: gateway+raw-secret is refused (exit 1, value not leaked, real
never invoked), gateway+placeholders is allowed, and non-gateway subcommands
(dashboard, --version) pass through even with raw secrets present.

Signed-off-by: Abhimanyu Kumar <abhimanyukumar7290@gmail.com>
@copy-pr-bot

copy-pr-bot Bot commented Jun 8, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1d9be26e-9bba-417c-a785-526e9bc323dc

📥 Commits

Reviewing files that changed from the base of the PR and between c287d2d and 7148a54.

📒 Files selected for processing (2)
  • agents/hermes/hermes-wrapper.sh
  • test/hermes-gateway-wrapper.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • agents/hermes/hermes-wrapper.sh
  • test/hermes-gateway-wrapper.test.ts

📝 Walkthrough

Walkthrough

This PR adds a Hermes CLI wrapper that enforces runtime env-secret validation for gateway commands, installs that wrapper in the container by relocating the original binary, and adds tests verifying rejection of raw secret-shaped env vars and pass-through for allowed/non-gateway invocations.

Changes

Hermes Gateway Env-Guard Wrapper

| Layer / File(s) | Summary |
|---|---|
| Gateway guard wrapper logic (agents/hermes/hermes-wrapper.sh) | Adds /usr/local/bin/hermes wrapper behavior that resolves validator and hermes.real, runs the validator only when the first argument is gateway, exits non-zero on validator failure, and otherwise execs the real Hermes binary. |
| Container binary path wiring (agents/hermes/Dockerfile) | Copies the wrapper into the image, renames existing /usr/local/bin/hermes to /usr/local/bin/hermes.real, installs wrapper at the original CLI path, and runs hermes --version during build to validate wrapper wiring. |
| Wrapper behavior validation tests (test/hermes-gateway-wrapper.test.ts) | Adds Linux + python3 gated tests, temp-dir harness, copied wrapper/validator assets, and hermes.real stub assertions for gateway rejection with raw secret-shaped env vars, bypass attempts via PATH shadowing, gateway allowance with placeholders/allow-listed keys, and non-gateway/--version pass-through. |
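
The container wiring summarized in the table can be sketched as the shell steps a Dockerfile RUN instruction might perform. This is an assumption-laden illustration rather than the PR's Dockerfile: the function name `relocate_and_wrap` and its parameters are hypothetical, and the bin directory is a parameter only so the steps can be tried in a scratch directory.

```shell
# Sketch only: not the PR's Dockerfile steps.
# relocate_and_wrap BIN_DIR WRAPPER_SRC
# BIN_DIR stands in for /usr/local/bin inside the image.
relocate_and_wrap() {
  bin_dir=$1
  wrapper_src=$2

  # mv renames the directory entry itself, so if hermes is a symlink into
  # a venv, hermes.real remains a symlink to the same target.
  mv "$bin_dir/hermes" "$bin_dir/hermes.real" || return 1

  # Install the wrapper at the original CLI path with executable mode.
  install -m 0755 "$wrapper_src" "$bin_dir/hermes" || return 1

  # Re-assert --version through the wrapper so a broken relocation fails
  # here (and therefore fails the image build) instead of at runtime.
  "$bin_dir/hermes" --version >/dev/null || return 1
}
```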

Sequence Diagram(s)

sequenceDiagram
  participant Wrapper as /usr/local/bin/hermes (wrapper)
  participant Validator as validate-env-secret-boundary.py
  participant Py as python3
  participant HermesReal as /usr/local/bin/hermes.real

  Wrapper->>Py: run Validator runtime-env (only for `gateway`)
  Py->>Validator: execute validation
  Validator-->>Py: exit 0 or non-zero
  Py-->>Wrapper: exit code
  alt validator succeeds
    Wrapper->>HermesReal: exec with original args
  else validator fails
    Wrapper-->>Wrapper: exit with validator status (non-zero)
  end

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly Related PRs

  • NVIDIA/NemoClaw#4959: Implements closely related Hermes environment-boundary validation behavior in gateway startup paths using the same validator-oriented guard pattern.

Suggested Labels

integration: hermes, area: security, bug-fix, v0.0.61

Suggested Reviewers

  • prekshivyas
  • cv

Poem

🐇 I hopped through shells where secrets hide,
and wrapped the gate with watchful pride.
If tokens leak, I thump “not through!”
Safe paths pass by, neat as morning dew.
In Docker burrows, tests now sing— a guarded gateway, bright spring.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title clearly describes the main change: adding runtime env secret guard enforcement on direct hermes gateway invocations, directly addressing issue #4975. |
| Linked Issues check | ✅ Passed | The PR implements all coding requirements from #4975: wrapper enforces env secret validation for gateway subcommand, rejects raw secret-shaped vars with [SECURITY] logs, allows allow-listed keys, passes non-gateway commands through, and includes comprehensive tests. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to fixing #4975: wrapper script, Dockerfile relocation/validation, and comprehensive tests for the wrapper behavior. No unrelated modifications detected. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🧹 Nitpick comments (1)
agents/hermes/Dockerfile (1)

98-109: Run the secret-boundary E2E lane on this branch before merge.

This hunk changes the gateway startup security boundary path; run the boundary-focused job explicitly since nightly coverage is not PR-triggered.

gh workflow run nightly-e2e.yaml --ref <branch> -f jobs=hermes-secret-boundary-e2e

As per coding guidelines, changes under agents/hermes/** should be validated with Hermes E2E coverage, and the referenced workflow docs identify hermes-secret-boundary-e2e as the boundary contract check.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@agents/hermes/Dockerfile` around lines 98 - 109, This change relocates the
Hermes binary and installs a wrapper (hermes-wrapper.sh ->
/usr/local/bin/hermes, original moved to /usr/local/bin/hermes.real) which
affects the gateway secret boundary; before merging, run the Hermes
secret-boundary E2E lane to validate behavior by executing the nightly-e2e
workflow for the hermes-secret-boundary-e2e job (use: gh workflow run
nightly-e2e.yaml --ref <branch> -f jobs=hermes-secret-boundary-e2e), verify the
wrapper correctly delegates non-gateway subcommands to hermes.real and that
`/usr/local/bin/hermes --version` succeeds in the built image, and address any
failures found by updating hermes-wrapper.sh or the Dockerfile install/move
steps.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@agents/hermes/hermes-wrapper.sh`:
- Around line 36-37: The wrapper currently invokes the guard with an unpinned
"python3" from PATH; change it to use a fixed interpreter path (e.g.
/usr/bin/python3) and verify it exists/executable before running to enforce the
"never from the environment" intent. Replace the invocation of python3 "$GUARD"
in hermes-wrapper.sh with a pinned interpreter variable (e.g.
PINNED_PYTHON=/usr/bin/python3), check [ -x "$PINNED_PYTHON" ] and then run
"$PINNED_PYTHON" "$GUARD" runtime-env || exit $?, so the guard always uses the
known system Python rather than PATH-resolved python3.

---

Nitpick comments:
In `@agents/hermes/Dockerfile`:
- Around line 98-109: This change relocates the Hermes binary and installs a
wrapper (hermes-wrapper.sh -> /usr/local/bin/hermes, original moved to
/usr/local/bin/hermes.real) which affects the gateway secret boundary; before
merging, run the Hermes secret-boundary E2E lane to validate behavior by
executing the nightly-e2e workflow for the hermes-secret-boundary-e2e job (use:
gh workflow run nightly-e2e.yaml --ref <branch> -f
jobs=hermes-secret-boundary-e2e), verify the wrapper correctly delegates
non-gateway subcommands to hermes.real and that `/usr/local/bin/hermes
--version` succeeds in the built image, and address any failures found by
updating hermes-wrapper.sh or the Dockerfile install/move steps.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3d414101-6efa-489f-8be6-733a60cd06e4

📥 Commits

Reviewing files that changed from the base of the PR and between 16470fa and c287d2d.

📒 Files selected for processing (3)
  • agents/hermes/Dockerfile
  • agents/hermes/hermes-wrapper.sh
  • test/hermes-gateway-wrapper.test.ts

Comment thread agents/hermes/hermes-wrapper.sh Outdated
…apper

Address CodeRabbit review on NVIDIA#4981: the wrapper invoked the secret-boundary
guard via PATH-resolved `python3`. Because PATH is part of the untrusted runtime
environment this wrapper exists to reject, an attacker could shadow `python3`
(e.g. `docker exec -e PATH=/tmp/evil:...`) with a script that exits 0 and no-op
the guard, starting the gateway with raw secrets.

Resolve python3 from a fixed set of absolute paths (/usr/bin, /usr/local/bin,
the hermes venv), never via PATH, and fail closed if none is executable. The
interpreter joins the validator and real binary as env-independent, fixed-path
resolutions. Add a test that plants a PATH-shadowed python3 and asserts the
guard still refuses a raw secret-shaped gateway launch.

Signed-off-by: Abhimanyu Kumar <abhimanyukumar7290@gmail.com>
@abhi-0906

Author

Thanks @coderabbitai — good catch on the interpreter.

Addressed in 7148a54: the wrapper resolved python3 via PATH, which (as you note) is part of the untrusted environment this wrapper exists to reject — a shadowed python3 could no-op the guard. It now resolves the interpreter from a fixed set of absolute paths (/usr/bin/python3, /usr/local/bin/python3, the hermes venv), never via PATH, and fails closed if none is executable. Added a test that plants a PATH-shadowed python3 (exit 0) and asserts a raw secret-shaped gateway launch is still refused (real binary never invoked).
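
The fixed-path interpreter resolution described here can be sketched as a small function. This is an illustration under assumptions, not the wrapper's code: the name `resolve_trusted_interpreter` is hypothetical, and the candidate list is passed as arguments so the behavior can be tried with scratch paths. An installed wrapper would hard-code the candidates instead.

```shell
# Sketch only: not the PR's hermes-wrapper.sh.
# resolve_trusted_interpreter CANDIDATE...
# Prints the first absolute candidate that is an executable file.
# Returns 127 (fail closed) when none qualifies. PATH is never consulted.
resolve_trusted_interpreter() {
  for candidate in "$@"; do
    case $candidate in
      /*) ;;          # absolute path: eligible
      *) continue ;;  # bare names such as "python3" would need PATH; skip
    esac
    if [ -x "$candidate" ] && [ ! -d "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  echo "[SECURITY] no interpreter at a trusted absolute path" >&2
  return 127
}
```

Because the lookup only tests explicit absolute paths, a python3 planted earlier in PATH has no influence on which interpreter is chosen, which is the property the PATH-shadow test asserts.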

On the hermes-secret-boundary-e2e lane: agreed it should run against this change, but it's nightly-only and not PR-triggered, and as an external fork PR the NVIDIA-runner workflows are gated behind copy-pr-bot vetting — I can't dispatch it. Flagging for a maintainer to run nightly-e2e.yaml -f jobs=hermes-secret-boundary-e2e against the branch during review.

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Contributor

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the current state of the wrapper script after the fix in commit 7148a5462
gh pr view 4981 --repo NVIDIA/NemoClaw --json commits,headRefName 2>/dev/null | jq .
# Try to fetch the latest wrapper content
gh api repos/NVIDIA/NemoClaw/contents/agents/hermes/hermes-wrapper.sh?ref=fix/issue-4975-hermes-gateway-secret-guard 2>/dev/null | jq -r '.content' | base64 -d 2>/dev/null || echo "Could not fetch file"

Length of output: 4043


🏁 Script executed:

#!/bin/bash
# Fetch the current wrapper script content at the latest commit
gh api "repos/NVIDIA/NemoClaw/contents/agents/hermes/hermes-wrapper.sh?ref=7148a54621c16645b6395e043a85ac713cabd870" 2>/dev/null | jq -r '.content' | base64 -d 2>/dev/null || echo "Could not fetch file"

Length of output: 2902



@abhi-0906 Thanks for the detailed follow-up and for verifying the fix against the latest commit.

The updated wrapper looks correct — the three security-critical resolutions (interpreter, validator, real binary) now all use the same fixed-path-first / dev-fallback pattern and are fully isolated from the untrusted runtime environment:

for _candidate in /usr/bin/python3 /usr/local/bin/python3 /opt/hermes/.venv/bin/python3; do
  if [ -x "$_candidate" ]; then PYTHON3="$_candidate"; break; fi
done
if [ -z "$PYTHON3" ]; then
  echo "[SECURITY] Refusing hermes gateway: no python3 at a trusted absolute path ..." >&2
  exit 127
fi

Fail-closed on a missing interpreter (exit 127 + [SECURITY] log) is exactly the right posture. The shadow-python3 test is a solid regression guard for this specific vector.

On the hermes-secret-boundary-e2e lane — the constraint is understood; gating external fork PRs behind copy-pr-bot vetting is a sensible policy. The maintainer flag in the PR description should be enough to get it queued during review.

🐇✨

@wscurran wscurran added the bug-fix PR fixes a bug or regression label Jun 8, 2026
Labels

bug-fix PR fixes a bug or regression

Development

Successfully merging this pull request may close these issues.

[All Platforms][Security] Hermes gateway accepts raw SLACK_BOT_TOKEN env without security rejection in NemoHermes sandbox

2 participants