fix(security): revert gateway auth token externalization #2482

Merged
ericksoa merged 3 commits into main from revert/gateway-token-externalization
Apr 25, 2026

Conversation

@ericksoa (Contributor) commented Apr 25, 2026

Summary

  • Reverts 51aa6af (feat(security): externalize gateway auth token from openclaw.json (#2378))
  • The externalized token path breaks openclaw tui inside the sandbox — OpenClaw 2026.4.9 requires OPENCLAW_GATEWAY_TOKEN but the runtime injection fails under Landlock (non-root mode) and the token is no longer in openclaw.json where the TUI and gateway can read it
  • Restores build-time token generation in openclaw.json so gateways authenticate out-of-the-box again
  • The token externalization will be re-introduced in a separate PR with deeper testing across root/non-root modes and OpenClaw 2026.4.9
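For readers unfamiliar with the restored flow, here is a minimal Python sketch of build-time embedding. It assumes openclaw.json is a plain JSON object and that the token lives at `gateway.auth.token` (the key path named in the CodeRabbit walkthrough). `embed_gateway_token` is a hypothetical name; the actual Dockerfile step is not reproduced on this page.

```python
import json
import secrets
import tempfile
from pathlib import Path


def embed_gateway_token(config_path: Path) -> str:
    """Write a fresh random token to gateway.auth.token and return it.

    Sketch only: existing keys are preserved, and the nested
    "gateway" -> "auth" objects are created if they are missing.
    """
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    token = secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
    config.setdefault("gateway", {}).setdefault("auth", {})["token"] = token
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return token


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        token = embed_gateway_token(Path(d) / "openclaw.json")
        print(len(token))  # 64
```

Because the token is drawn at build time, every rebuilt image gets a different value, while every container started from the same image shares it.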

Fixes #2480

Test plan

  • npm run typecheck:cli passes
  • npx vitest run --project cli — 2110 tests pass
  • All pre-commit and pre-push hooks pass
  • Verify openclaw tui works inside sandbox after rebuild
  • Verify gateway auth works on Spark (non-root mode)
  • Verify gateway auth works in root mode

Summary by CodeRabbit

  • Documentation

    • Clarified security guidance: gateway auth tokens are stored in the sandbox configuration and risk notes updated.
  • Changes

    • Token generation moved earlier in the image/build process so auth is present in the sandbox config at runtime.
    • Runtime token retrieval simplified and connection instructions updated.
    • Gateway token is exported to an environment variable and persisted/removed in users' shell profiles.
  • Tests

    • Tests updated to validate token export, persistence, and retrieval behavior.

Reverts 51aa6af. The externalized token path breaks `openclaw tui`
inside the sandbox — OpenClaw 2026.4.9 requires OPENCLAW_GATEWAY_TOKEN
but the runtime injection fails under Landlock (non-root) and the token
is no longer in openclaw.json where the TUI can read it.

Restores build-time token generation in openclaw.json so gateways
authenticate out-of-the-box again. The externalization will be
re-introduced in a separate PR with deeper testing.

Fixes #2480
@coderabbitai (bot) commented Apr 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 618627f2-b577-4814-aa8c-c3c5e421c14e

📥 Commits

Reviewing files that changed from the base of the PR and between 4752d10 and 72ca3a0.

📒 Files selected for processing (1)
  • Dockerfile
🚧 Files skipped from review as they are similar to previous changes (1)
  • Dockerfile

📝 Walkthrough

Gateway token handling was changed: a per-build random token is embedded into /sandbox/.openclaw/openclaw.json at image build. Runtime reads gateway.auth.token from that file and exports OPENCLAW_GATEWAY_TOKEN (persisting to user rc files) instead of creating a separate external token file; host-side retrieval relies only on the config path.
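The runtime read can also be sketched in Python. `read_gateway_token` is a hypothetical stand-in for the script's `_read_gateway_token()`, which the tests describe as a Python `with open(...)` read. Mapping every failure (missing file, malformed JSON, missing or non-string key) to an empty string is an assumption. It lets callers treat "" as "no token".

```python
import json
from pathlib import Path


def read_gateway_token(config_path: Path) -> str:
    """Return gateway.auth.token from an openclaw.json-style file, or ""."""
    try:
        with open(config_path, encoding="utf-8") as f:
            config = json.load(f)
    except (OSError, ValueError):  # unreadable file or invalid JSON/encoding
        return ""
    node = config
    # Walk gateway -> auth -> token, bailing out if any level is not an object.
    for key in ("gateway", "auth", "token"):
        if not isinstance(node, dict):
            return ""
        node = node.get(key, "")
    return node if isinstance(node, str) else ""
```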

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation & Build: `.agents/skills/nemoclaw-user-configure-security/references/best-practices.md`, `docs/security/best-practices.md`, `Dockerfile` | Docs updated to state tokens reside in .openclaw/openclaw.json. Dockerfile now generates and embeds a per-build random gateway token (secrets.token_hex(32)) into openclaw.json, removing runtime token-generation/cleanup steps and related comments. |
| Runtime / Startup Script: `scripts/nemoclaw-start.sh` | Replaced external token file flow with _read_gateway_token() that parses gateway.auth.token from /sandbox/.openclaw/openclaw.json. Added export_gateway_token() to export OPENCLAW_GATEWAY_TOKEN and persist/remove marked export blocks in ${_SANDBOX_HOME}/.bashrc and ${_SANDBOX_HOME}/.profile; startup flows updated to call this. |
| Host-side Onboard Logic: `src/lib/onboard.ts` | Removed kubectl-exec and temp-file search fallbacks; fetchGatewayAuthTokenFromSandbox now uses only the openclaw.json download path. Updated fallback help text to instruct manual jq extraction from /sandbox/.openclaw/openclaw.json. |
| Tests: `test/nemoclaw-start.test.ts` | Reworked tests to validate export_gateway_token behavior: rc-file marker persistence/removal, shared _read_gateway_token() usage, Python with open(...) read, shell-escaping, empty-token unset behavior, and updated startup sequencing expectations. |
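The rc-file persistence in `export_gateway_token()` follows a common marked-block pattern: delete any earlier block between two marker lines, then append a fresh one. Below is a Python sketch of just that text transformation. The marker strings are hypothetical (the real ones are not shown on this page), and `shlex.quote` stands in for the shell escaping the tests mention. The sketch rewrites rc-file contents only; unsetting the variable in the running process when the token is empty is left to the caller.

```python
import shlex

# Hypothetical marker lines; the markers used by nemoclaw-start.sh may differ.
MARKER_BEGIN = "# >>> openclaw gateway token >>>"
MARKER_END = "# <<< openclaw gateway token <<<"


def strip_marked_block(text: str) -> str:
    """Drop every line from MARKER_BEGIN through MARKER_END, inclusive.

    An unterminated block (BEGIN with no END) drops everything after BEGIN.
    """
    kept, inside = [], False
    for line in text.splitlines(keepends=True):
        bare = line.rstrip("\n")
        if bare == MARKER_BEGIN:
            inside = True
            continue
        if inside and bare == MARKER_END:
            inside = False
            continue
        if not inside:
            kept.append(line)
    return "".join(kept)


def update_rc_text(text: str, token: str) -> str:
    """Replace any previous token block; add a new one only for a non-empty token."""
    text = strip_marked_block(text)
    if not token:
        return text  # empty token: leave no stale export behind
    if text and not text.endswith("\n"):
        text += "\n"
    return (
        f"{text}{MARKER_BEGIN}\n"
        f"export OPENCLAW_GATEWAY_TOKEN={shlex.quote(token)}\n"
        f"{MARKER_END}\n"
    )
```

Rewriting the block on every start is idempotent. Applying the update twice with the same token yields the same text, so repeated container starts do not pile up duplicate exports.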

Sequence Diagram(s)

sequenceDiagram
    actor Build as Build Time
    participant Docker as Dockerfile
    participant Config as /sandbox/.openclaw/openclaw.json
    participant StartSh as scripts/nemoclaw-start.sh
    participant RcFiles as .bashrc/.profile
    participant UserShell as User Interactive Shell
    participant TUI as openclaw tui

    Build->>Docker: generate per-build random token (secrets.token_hex(32))
    Docker->>Config: embed token in openclaw.json (gateway.auth.token)

    Note over StartSh,Config: container starts
    StartSh->>Config: _read_gateway_token() parses gateway.auth.token
    Config-->>StartSh: token value
    StartSh->>StartSh: export OPENCLAW_GATEWAY_TOKEN
    StartSh->>RcFiles: write/remove marked export blocks via export_gateway_token()
    RcFiles->>UserShell: rc files sourced on new shell
    UserShell->>TUI: openclaw tui (reads $OPENCLAW_GATEWAY_TOKEN)
    TUI-->>TUI: gateway authentication proceeds

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

security

Suggested reviewers

  • brandonpelfrey

Poem

🐰 A tiny token tucked in JSON bright,
Built at image time in the quiet night.
At boot I hop out, export with care,
I nest in rc files so shells find me there,
OpenClaw tui greets me—now we're square. 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(security): revert gateway auth token externalization' directly summarizes the main change—reverting a previous externalization of the gateway auth token. |
| Linked Issues check | ✅ Passed | The PR addresses issue #2480 by restoring build-time token generation in openclaw.json, ensuring the token is available for openclaw tui and the gateway to authenticate without manual intervention. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to reverting externalized gateway token handling: documentation updates, Dockerfile changes, token reading logic, and test updates align directly with the fix objective. |

@coderabbitai (bot) left a review comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Dockerfile`:
- Around line 230-232: The ARG NEMOCLAW_BUILD_ID is declared but never used, so
changing it does not invalidate the token-generation layer; update the
token-generation layer that creates the gateway token (the "token-generation"
RUN/step) to consume NEMOCLAW_BUILD_ID (e.g., reference it in that RUN via ENV
or a no-op echo/printf) so Docker sees the build-arg changes and busts the
cache; ensure you reference ARG NEMOCLAW_BUILD_ID before the token-generation
RUN and use the variable name NEMOCLAW_BUILD_ID in that step so token
regeneration runs on each build-arg change.

In `@scripts/nemoclaw-start.sh`:
- Around line 621-660: The startup currently aborts if writing
${_SANDBOX_HOME}/.bashrc or .profile fails when persisting
OPENCLAW_GATEWAY_TOKEN (snippet using marker_begin/marker_end), which breaks
non-root/sandboxed runs; change the logic to make rc-file writes best-effort by
routing token persistence through the existing /tmp sourced-file pattern (create
a /tmp/openclaw-env-<uid>.sh containing the snippet and ensure rc files source
that file if writable), and if you must directly update ${_SANDBOX_HOME}/.bashrc
or .profile only attempt writes when they are writable and swallow failures (do
not let errors from cat >"$rc_file" or printf >>"$rc_file" abort startup),
leaving the export OPENCLAW_GATEWAY_TOKEN="$token" in the current process
unconditional so gateway startup never depends on rc file writes.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 9b2a9d79-dfe8-4da3-93e1-2c11cc9ba0b2

📥 Commits

Reviewing files that changed from the base of the PR and between cc15689 and 1e497c6.

📒 Files selected for processing (6)
  • .agents/skills/nemoclaw-user-configure-security/references/best-practices.md
  • Dockerfile
  • docs/security/best-practices.md
  • scripts/nemoclaw-start.sh
  • src/lib/onboard.ts
  • test/nemoclaw-start.test.ts

…writes

The reverted export_gateway_token code predates the Landlock fix in
a54f9a3 and lacks || true guards on .bashrc/.profile writes. Under
Landlock enforcement, DAC check ([ -w file ]) passes but the actual
write is blocked, crashing the entrypoint under set -e — the exact
same failure pattern that caused the 5-day non-root outage.

Apply the same || true + continue pattern used in install_configure_guard.
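The `|| true` + `continue` fix can be illustrated with a Python analogue. It assumes a list of rc-file paths and uses a hypothetical `export_gateway_token_sketch` entry point. The in-process export happens first and unconditionally. Each rc-file write is then attempted, and any `OSError` is skipped, because (per the commit message above) a passing `[ -w file ]` check does not guarantee that Landlock will allow the write.

```python
import os
import shlex
from pathlib import Path
from typing import Iterable, List


def persist_best_effort(rc_files: Iterable[Path], snippet: str) -> List[Path]:
    """Append snippet to each rc file; return only the files actually written.

    The write attempt itself is the test: a denied or failing open() is
    skipped instead of propagating, so startup never dies over an rc file.
    """
    written = []
    for rc in rc_files:
        try:
            with open(rc, "a", encoding="utf-8") as f:
                f.write(snippet)
        except OSError:
            continue  # analogous to `|| true` followed by `continue`
        written.append(rc)
    return written


def export_gateway_token_sketch(token: str, rc_files: Iterable[Path]) -> List[Path]:
    # Export for the current process first, independent of any rc-file outcome.
    os.environ["OPENCLAW_GATEWAY_TOKEN"] = token
    snippet = f"export OPENCLAW_GATEWAY_TOKEN={shlex.quote(token)}\n"
    return persist_best_effort(rc_files, snippet)
```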

NEMOCLAW_BUILD_ID was declared as an ARG but never referenced by any
downstream instruction, so changing it via --build-arg had no effect on
Docker layer caching. Reference it on the token-generation RUN line so
Docker sees the value change and invalidates the cached layer, ensuring
each build produces a fresh gateway auth token.

Pre-existing issue surfaced by CodeRabbit review.
@ericksoa ericksoa merged commit 31c782c into main Apr 25, 2026
39 checks passed
ericksoa added a commit that referenced this pull request Apr 25, 2026
…d cache (#2483)

## Summary

- Fixes 4x build time regression on Spark (400s+ → ~100s) caused by
`NEMOCLAW_BUILD_ID` cache-busting the config generation layer, which
invalidated the expensive `openclaw doctor --fix` + `openclaw plugins
install` layer on every build
- Splits token generation into two steps: config layer writes a
placeholder (cacheable), then a late layer injects
`secrets.token_hex(32)` (cache-busted but trivially fast)
- The doctor/plugins layer no longer rebuilds on every build
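The placeholder-then-inject split can be sketched as two steps. The first renders identical text on every build, which is what lets Docker reuse the expensive layers built on top of it. The second is cheap and varies per build. `PLACEHOLDER`, `render_base_config`, and `inject_token` are hypothetical names, and the config shape is reduced to the single key this PR discusses.

```python
import json
import secrets

# Hypothetical placeholder; the value actually used in #2483 is not shown here.
PLACEHOLDER = "__GATEWAY_TOKEN_PLACEHOLDER__"


def render_base_config() -> str:
    """Deterministic config text: byte-identical on every build, so cacheable."""
    return json.dumps({"gateway": {"auth": {"token": PLACEHOLDER}}}, indent=2)


def inject_token(config_text: str) -> str:
    """Late, cheap step: swap the placeholder for a fresh per-build token."""
    config = json.loads(config_text)
    auth = config["gateway"]["auth"]
    if auth.get("token") != PLACEHOLDER:
        # Refuse to overwrite a real token or run on an unexpected config.
        raise ValueError("expected placeholder token in base config")
    auth["token"] = secrets.token_hex(32)
    return json.dumps(config, indent=2)
```

The guard in `inject_token` is a design choice for the sketch: failing loudly on a missing placeholder is safer than silently shipping an image whose token was never randomized.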

Depends on #2482

## Test plan

- [x] `npx vitest run --project cli` — 1947 tests pass (ssrf-parity skip
is pre-existing, needs plugin build)
- [x] All pre-commit and pre-push hooks pass
- [ ] Verify build time improvement on Spark


## Summary by CodeRabbit

* **Chores**
  * Optimized Docker image build layers to improve caching efficiency while ensuring unique credentials are generated for each build.

ericksoa added a commit that referenced this pull request Apr 27, 2026
Resolves conflicts in Dockerfile and test/nemoclaw-start.test.ts.

- Dockerfile config-generation block: kept the externalized
  scripts/generate-openclaw-config.py invocation (the PR's purpose)
  and dropped the inline python3 -c block from main.
- Dockerfile token step: dropped the PR's --clear-token step and took
  main's late-layer secrets.token_hex(32) injection (#2482 reverted
  gateway auth token externalization, so the token is again baked at
  build time).
- scripts/generate-openclaw-config.py: ported the inference_inputs
  parsing (#2441) and channel healthMonitor field from main; removed
  the now-obsolete --clear-token mode.
- test/nemoclaw-start.test.ts: took main's version, since the PR's
  token-externalization regression tests no longer match main's
  reverted design.
- test/generate-openclaw-config.test.ts: removed the --clear-token
  test cases.

Development

Successfully merging this pull request may close these issues.

nemoclaw <sandbox> connect fails to inject OPENCLAW_GATEWAY_TOKEN for openclaw tui

1 participant