
feat(policy): add safe common egress defaults#4768

Merged
cv merged 7 commits into main from feat/safe-common-egress-4767
Jun 5, 2026

Conversation

@ericksoa
Contributor

@ericksoa ericksoa commented Jun 4, 2026

Summary

  • add read-only weather and public-reference presets with host/path/method-scoped public API egress
  • include weather in balanced/open defaults and public-reference in open defaults
  • include all Hermes Nous managed-tool policy presets for Hermes open policy selection while keeping OpenClaw open defaults agent-specific
  • add direct agent E2E coverage for OpenClaw balanced/open and Hermes open common-egress paths

Closes #4767
Fixes #4814

Related

Tests

  • npm run build:cli
  • npx vitest run test/policy-tiers.test.ts test/policy-tiers-onboard.test.ts test/onboard-policy-suggestions.test.ts test/policies.test.ts test/validate-config-schemas.test.ts (302 passed)
  • npx vitest run test/policies.test.ts test/validate-config-schemas.test.ts (229 passed after preset binary-scope update)
  • npm run validate:configs
  • bash -n test/e2e/test-common-egress-agent-e2e.sh && shellcheck test/e2e/test-common-egress-agent-e2e.sh

E2E status

  • test/e2e/test-common-egress-agent-e2e.sh exercises the allowed common-egress paths through agent tool use for OpenClaw balanced/open and Hermes open.
  • Full live execution provisions multiple sandboxes and is expected to run in CI/self-hosted runner diagnostics.

Summary by CodeRabbit

  • New Features

    • Added "weather" and "public-reference" read-only presets for curated public data, geocoding, weather, and reference APIs.
  • Behavior

    • Tiers updated: balanced now includes "weather" by default; open includes "weather" and "public-reference".
    • Onboarding now filters presets by agent (Hermes vs OpenClaw) and auto-includes Hermes tool gateway presets for Hermes; agent-incompatible presets are removed when resuming with a different agent.
  • Tests

    • New e2e script and expanded unit/integration tests covering presets, onboarding, and agent-specific behavior.
  • Chores

    • Nightly CI wired to run the new e2e job.
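
The tier changes summarized above can be pictured with a small sketch. It lists only the entries this PR describes adding; the `ADDED_BY_TIER` table, the `TierEntry` shape, and both helper functions are hypothetical names for illustration and are not taken from tiers.yaml or the repository code.

```typescript
// Hypothetical model of the tier additions described in this PR.
// Presets that were already in each tier are omitted rather than guessed.
type Tier = "balanced" | "open";
type Access = "read" | "read-write";

interface TierEntry {
  preset: string;
  access: Access;
}

const ADDED_BY_TIER: Record<Tier, TierEntry[]> = {
  // balanced gains weather as read-only
  balanced: [{ preset: "weather", access: "read" }],
  // open gains weather and public-reference, both read-only
  open: [
    { preset: "weather", access: "read" },
    { preset: "public-reference", access: "read" },
  ],
};

// Names of the presets this PR adds to a tier.
function addedPresets(tier: Tier): string[] {
  return ADDED_BY_TIER[tier].map((entry) => entry.preset);
}

// True when every added entry in the tier is read-only.
function allReadOnly(tier: Tier): boolean {
  return ADDED_BY_TIER[tier].every((entry) => entry.access === "read");
}

console.log(addedPresets("balanced"), allReadOnly("balanced"));
console.log(addedPresets("open"), allReadOnly("open"));
```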

@github-actions
Contributor

github-actions Bot commented Jun 4, 2026

PR Review Advisor

Findings: 2 needs attention, 5 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 5 still apply, 1 new item found

Review findings

🛠️ Needs attention

  • Hermes open auto-enables non-default Nous managed-code policy (src/lib/onboard/policy-selection.ts:189): Hermes open-tier suggestions add every Hermes managed-tool policy through allHermesToolGatewayPolicyPresets(), including nous-code. That preset opens the private host-gateway broker at host.openshell.internal:11436 to /modal and /modal/** with write methods, despite hermes-managed-tools.ts marking managed code execution as defaultSelected: false. This changes managed code execution from explicit opt-in to default open-tier egress.
    • Recommendation: Only auto-add defaultSelected Hermes managed-tool presets for Hermes open, or keep nous-code tied to explicit Hermes managed-tool selection. Add a regression test that Hermes open does not gain nous-code unless the managed-code tool is explicitly selected.
    • Evidence: policy-selection.ts adds allHermesToolGatewayPolicyPresets() for Hermes open; hermes-managed-tools.ts defines nous-code with defaultSelected: false; nemoclaw-blueprint/policies/presets/nous-code.yaml allows GET/POST/PUT/PATCH/DELETE to host.openshell.internal:11436 /modal and /modal/**.
  • Policy-selection monolith grew further (src/lib/onboard/policy-selection.ts): policy-selection.ts is already an onboarding hotspot and this PR increases it by more than the repository's monolith-growth threshold. The PR extracts agent-specific filtering into a helper, but the net hotspot still grows.
    • Recommendation: Offset the growth by moving the Hermes-open suggestion policy or more of the agent-filtering orchestration out of policy-selection.ts, or otherwise shrink this hotspot before merge.
    • Evidence: Drift analysis reports src/lib/onboard/policy-selection.ts grew from 475 to 501 lines (+26), with a blocker rationale for growth of 20 or more lines.
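
One way to read the first recommendation is as a filter on `defaultSelected`. The sketch below is a minimal illustration under that assumption: the `ManagedToolPreset` shape, the `hermesOpenSuggestions` helper, and the `nous-web` entry are hypothetical, and only `nous-code` with `defaultSelected: false` comes from the finding itself.

```typescript
// Hypothetical shape; the real types in hermes-managed-tools.ts may differ.
interface ManagedToolPreset {
  name: string; // policy preset name
  defaultSelected: boolean; // true when the tool is on without explicit opt-in
}

// Illustrative data. "nous-web" is a made-up placeholder entry.
const HERMES_TOOL_PRESETS: ManagedToolPreset[] = [
  { name: "nous-web", defaultSelected: true },
  { name: "nous-code", defaultSelected: false },
];

// Suggest a preset for Hermes open only if it is default-selected
// or the user explicitly selected the corresponding tool.
function hermesOpenSuggestions(
  presets: ManagedToolPreset[],
  explicitlySelected: string[] = [],
): string[] {
  return presets
    .filter((p) => p.defaultSelected || explicitlySelected.indexOf(p.name) !== -1)
    .map((p) => p.name);
}

const byDefault = hermesOpenSuggestions(HERMES_TOOL_PRESETS);
const withOptIn = hermesOpenSuggestions(HERMES_TOOL_PRESETS, ["nous-code"]);
console.log(byDefault, withOptIn);
```

A regression test along these lines would assert that the default suggestion list never contains nous-code, while the explicit opt-in path still does.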

🔎 Worth checking

  • Source-of-truth review needed: Policy tier reference documentation: The advisor marked localized patch analysis as missing.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: docs/reference/network-policies.mdx:68-69 lists old Balanced/Open defaults while tiers.yaml adds weather/public-reference and policy-selection.ts adds Hermes Nous open suggestions.
  • New target-ref E2E job passes an unused GITHUB_TOKEN (.github/workflows/nightly-e2e.yaml:735): common-egress-agent-e2e runs a target-ref script through the reusable e2e-script workflow and sets github_token: true. The new test script does not reference GITHUB_TOKEN, so this widens the trusted-code boundary by exposing a repository token without a demonstrated need.
    • Recommendation: Set github_token: false for common-egress-agent-e2e unless this script needs GitHub API access. Keep only the NVIDIA_API_KEY secret required for live inference.
    • Evidence: nightly-e2e.yaml selects test/e2e/test-common-egress-agent-e2e.sh and sets github_token: true; .github/workflows/e2e-script.yaml exports GITHUB_TOKEN when that input is true; grep of the new script found no GITHUB_TOKEN references.
  • Policy tier reference still lists the old defaults (docs/reference/network-policies.mdx:68): The PR changes user-visible policy tier defaults by adding weather to balanced/open, public-reference to open, and Hermes-specific Nous managed-tool suggestions for Hermes open. The Network Policies reference table still lists the previous Balanced and Open preset sets, leaving the policy source of truth inconsistent.
    • Recommendation: Update the policy tier reference to include weather, public-reference, and the Hermes-specific open-tier managed-tool behavior, or point the docs to generated tier data if tiers.yaml and agent-specific selection logic are intended to be the only sources of truth.
    • Evidence: docs/reference/network-policies.mdx lists Balanced as npm, pypi, huggingface, brew, and brave when supported, and Open without weather/public-reference/Hermes Nous behavior. tiers.yaml and policy-selection.ts now define different behavior.
  • New default egress coverage proves allowed paths but not denied runtime boundaries (test/e2e/test-common-egress-agent-e2e.sh:392): The new E2E validates successful weather/public-reference requests and checks one balanced-scope policy absence, while unit tests inspect YAML for missing write methods. It does not exercise the runtime proxy denying forbidden write methods or direct news/social/current-events hosts. Because this PR broadens default sandbox egress, denied-path runtime coverage would better catch policy interpretation drift.
    • Recommendation: Add targeted runtime checks that POST to a read-only weather endpoint is blocked, POST to a public-reference endpoint is blocked, and direct news/social/current-events hosts remain blocked in balanced/open. Keep them as small integration checks if full agent E2E coverage would be too slow.
    • Evidence: The E2E prompts fetch Open-Meteo and REST Countries successfully; policies.test checks YAML strings for absence of POST/PUT/PATCH/DELETE but does not validate OpenShell runtime enforcement of those denials.
  • public-reference uses a wildcard host despite explicit-host acceptance language (nemoclaw-blueprint/policies/presets/public-reference.yaml:12): The linked acceptance text asks the new normal presets to use explicit hosts and avoid wildcard egress. public-reference adds *.wikipedia.org. This is domain-scoped rather than world-open, but it is not an explicit host and deserves an intentional acceptance decision.
    • Recommendation: Either replace *.wikipedia.org with the specific Wikipedia hosts required for the supported workflows, or document why this domain-scoped wildcard is acceptable and add a validation test that rejects broad/world wildcards while allowing only this deliberate scope.
    • Evidence: Issue #4767 ("Safe common egress defaults for balanced/open policy presets") states 'Use explicit hosts only' and 'No wildcard world egress'; public-reference.yaml includes host: "*.wikipedia.org".
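
The wildcard recommendation could be backed by a validation helper roughly like the following. This is a sketch under stated assumptions: `classifyHost`, `rejectedHosts`, and the allowlist parameter are hypothetical names, `api.example.org` is a placeholder host, and the "at least two labels after the wildcard" rule is a simplification that does not consult a public-suffix list.

```typescript
type HostKind = "explicit" | "domain-wildcard" | "world-wildcard";

// Classify a host pattern from a preset.
// "*.wikipedia.org" counts as domain-scoped; "*", "*.org", and wildcards
// anywhere other than the leftmost label are treated as world-wide.
function classifyHost(host: string): HostKind {
  if (host.indexOf("*") === -1) return "explicit";
  const labels = host.split(".");
  const rest = labels.slice(1);
  const leftmostOnly =
    labels[0] === "*" &&
    rest.every((label) => label.length > 0 && label.indexOf("*") === -1);
  return leftmostOnly && rest.length >= 2 ? "domain-wildcard" : "world-wildcard";
}

// Return hosts that should fail validation: every world wildcard, plus any
// domain wildcard that has not been deliberately allowlisted.
function rejectedHosts(hosts: string[], allowedWildcards: string[]): string[] {
  return hosts.filter((host) => {
    const kind = classifyHost(host);
    if (kind === "explicit") return false;
    return kind === "world-wildcard" || allowedWildcards.indexOf(host) === -1;
  });
}

const sample = ["api.example.org", "*.wikipedia.org", "*", "*.org"];
console.log(rejectedHosts(sample, ["*.wikipedia.org"]));
```

Keeping the allowlist explicit means a second domain-scoped wildcard would fail validation until someone adds it on purpose.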

🌱 Nice ideas

  • None.
Consider writing more tests for
  • Runtime validation: this PR changes sandbox network policy defaults, agent-specific policy selection, and a target-ref workflow job. Unit/static tests cover preset shape and selection logic, and the new E2E covers positive allowed paths, but high-risk denied runtime boundaries and workflow token minimization are not validated. Suggested cases:
    • common-egress-runtime-blocks-post-to-open-meteo-forecast
    • common-egress-runtime-blocks-post-to-restcountries-public-reference
    • common-egress-runtime-blocks-direct-news-and-social-hosts-in-balanced-and-open
    • hermes-open-tier-does-not-include-nous-code-unless-managed-code-tool-explicitly-selected
    • nightly-common-egress-agent-e2e-runs-without-github-token
  • New default egress coverage proves allowed paths but not denied runtime boundaries: Add targeted runtime checks that POST to a read-only weather endpoint is blocked, POST to a public-reference endpoint is blocked, and direct news/social/current-events hosts remain blocked in balanced/open. Keep them as small integration checks if full agent E2E coverage would be too slow.
  • Acceptance clause: `balanced` includes safe weather support. Add test evidence or identify existing coverage. tiers.yaml adds weather to balanced with access: read, weather.yaml uses explicit weather hosts with GET/HEAD and enforcement: enforce, and tests assert balanced contains read-only weather. Runtime denial of write methods is not exercised.
  • Acceptance clause: `open` includes weather, curated public reference/data APIs, and all Hermes Nous managed tool presets for Hermes. Add test evidence or identify existing coverage. tiers.yaml adds weather and public-reference to open, and policy-selection.ts adds all Hermes Nous presets for Hermes open. This includes nous-code, which conflicts with its defaultSelected: false managed-code opt-in posture.
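
The denied-path cases listed above depend on deny-by-default matching over host, path, and method. The sketch below models only that intended semantics in plain TypeScript; it is not the OpenShell runtime, and the `RestRule` shape, the `isAllowed` helper, and the `weather.example.test` host are hypothetical. A real runtime check would send requests through the sandbox proxy instead.

```typescript
// Hypothetical rule and request shapes for a read-only REST preset.
interface RestRule {
  host: string;
  pathPrefix: string;
  methods: string[]; // upper-case HTTP methods the rule allows
}

interface EgressRequest {
  host: string;
  path: string;
  method: string;
}

// Deny by default: a request passes only if some rule matches its
// host, path prefix, and method.
function isAllowed(rules: RestRule[], req: EgressRequest): boolean {
  const method = req.method.toUpperCase();
  return rules.some(
    (rule) =>
      rule.host === req.host &&
      req.path.indexOf(rule.pathPrefix) === 0 &&
      rule.methods.indexOf(method) !== -1,
  );
}

// Placeholder rule set standing in for a GET/HEAD-only weather preset.
const weatherRules: RestRule[] = [
  { host: "weather.example.test", pathPrefix: "/v1/forecast", methods: ["GET", "HEAD"] },
];

const getForecast = { host: "weather.example.test", path: "/v1/forecast?lat=1", method: "GET" };
const postForecast = { host: "weather.example.test", path: "/v1/forecast", method: "POST" };
const newsHost = { host: "news.example.test", path: "/", method: "GET" };

console.log(isAllowed(weatherRules, getForecast));
console.log(isAllowed(weatherRules, postForecast));
console.log(isAllowed(weatherRules, newsHost));
```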

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@coderabbitai
Contributor

coderabbitai Bot commented Jun 4, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds read-only weather and public-reference policy presets, includes weather in balanced and both presets in open, auto-includes Hermes Nous tool presets for open+hermes suggestions, updates unit tests, adds an e2e agent validation script, and wires a nightly workflow job.

Changes

Policy Presets and Tier Updates

Layer / File(s) / Summary

  • Weather and public-reference preset definitions
    • Files: nemoclaw-blueprint/policies/presets/weather.yaml, nemoclaw-blueprint/policies/presets/public-reference.yaml
    • Summary: New YAML presets define read-only REST access to weather APIs (OpenMeteo, NOAA) and curated public-reference APIs (Wikipedia/Wikidata, OSM Nominatim, REST Countries) with GET/HEAD-only rules and restricted binary allowlists.
  • Tier configuration
    • Files: nemoclaw-blueprint/policies/tiers.yaml
    • Summary: balanced tier now includes weather as read; open tier now includes weather and public-reference as read, inserted before existing third-party presets.
  • Agent applicability helpers
    • Files: src/lib/onboard/agent-policy-presets.ts
    • Summary: New module implements preset applicability predicates and filtering helpers to gate presets by agent (Hermes vs OpenClaw).
  • Hermes Nous tool gateway integration & selection
    • Files: src/lib/onboard/hermes-managed-tools.ts, src/lib/onboard/policy-selection.ts
    • Summary: Adds allHermesToolGatewayPolicyPresets() and applies agent-aware filtering across preset suggestion, merge, and resume flows; auto-includes Hermes Nous presets when tier === "open" and agent === "hermes".
  • Preset schema and allowlist validation
    • Files: test/policies.test.ts
    • Summary: Tests updated: listPresets includes public-reference; new assertions verify weather and public-reference are protocol: rest, allow only GET/HEAD, disallow write methods, and include expected binary paths.
  • Tier resolution and onboarding tests
    • Files: test/policy-tiers.test.ts, test/policy-tiers-onboard.test.ts, test/onboard-policy-suggestions.test.ts, test/onboard-preset-diff.test.ts
    • Summary: Tests adjusted to require weather in balanced (read-only) alongside read-write dev presets; open tier assertions require weather and public-reference as read-only; onboarding suggestion tests distinguish Hermes (includes Nous presets) from OpenClaw (includes openclaw-pricing).
  • End-to-end integration test & CI wiring
    • Files: test/e2e/test-common-egress-agent-e2e.sh, .github/workflows/nightly-e2e.yaml
    • Summary: New comprehensive Bash e2e validates three phases (OpenClaw balanced+weather, OpenClaw open+public-reference, Hermes open+public-reference) with sandbox management, onboarding, policy endpoint checks, agent turn execution and validation; nightly workflow gains common-egress-agent-e2e job and is wired into failure/reporting/scorecard.
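
The agent applicability filtering described for agent-policy-presets.ts might look roughly like this. It is a sketch only: the `PresetInfo` shape, the `PRESETS` table, and both function names are hypothetical, and treating weather and public-reference as agent-agnostic is an assumption. Only the pairing of openclaw-pricing with OpenClaw and Nous presets with Hermes comes from the summary above.

```typescript
type Agent = "hermes" | "openclaw";

// Hypothetical metadata: which agents a preset applies to.
interface PresetInfo {
  name: string;
  agents: Agent[] | "any";
}

const PRESETS: PresetInfo[] = [
  { name: "weather", agents: "any" }, // assumed agent-agnostic
  { name: "public-reference", agents: "any" }, // assumed agent-agnostic
  { name: "openclaw-pricing", agents: ["openclaw"] },
  { name: "nous-code", agents: ["hermes"] },
];

// Applicability predicate for a single preset.
function appliesToAgent(preset: PresetInfo, agent: Agent): boolean {
  return preset.agents === "any" || preset.agents.indexOf(agent) !== -1;
}

// Drop presets that do not apply to the given agent, for example when an
// onboarding session is resumed with a different agent than it started with.
function filterForAgent(names: string[], agent: Agent): string[] {
  return names.filter((name) => {
    const info = PRESETS.find((p) => p.name === name);
    // Presets without metadata are kept in this sketch; a stricter
    // implementation could reject them instead.
    return info === undefined || appliesToAgent(info, agent);
  });
}

const recorded = ["weather", "openclaw-pricing", "nous-code"];
console.log(filterForAgent(recorded, "hermes"));
console.log(filterForAgent(recorded, "openclaw"));
```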

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Possibly related PRs

Suggested labels

enhancement: policy

Suggested reviewers

  • cv
  • prekshivyas
  • jyaunches

Poem

🐰 I hop through APIs, maps, and skies,
I add some presets, tidy and wise,
Weather and references, read-only and neat,
Hermes and OpenClaw now play complete,
A rabbit cheers: tests green, two-tiered treat!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Title check: ✅ Passed. The PR title 'feat(policy): add safe common egress defaults' clearly summarizes the main change (adding safe, limited egress policy defaults), which directly aligns with the core objective of the changeset.
  • Linked Issues check: ✅ Passed. All primary coding objectives from issue #4767 are met: weather preset added to balanced/open tiers, public-reference preset added to open tier, Hermes open tier includes Nous managed-tool presets, agent-specific filtering implemented, tests cover tier resolution and agent behavior.
  • Out of Scope Changes check: ✅ Passed. All changes are directly scoped to implementing safe common egress defaults: new policy presets, tier configuration updates, agent-aware filtering logic, E2E validation, and test updates. No unrelated modifications present.
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.

@ericksoa ericksoa changed the title from "Add safe common egress policy defaults" to "feat(policy): add safe common egress defaults" on Jun 4, 2026
@github-actions
Contributor

github-actions Bot commented Jun 4, 2026

E2E Advisor Recommendation

Required E2E: common-egress-agent-e2e, network-policy-e2e, cloud-onboard-e2e
Optional E2E: hermes-e2e, onboard-resume-e2e, channels-stop-start-e2e

Dispatch hint: common-egress-agent-e2e,network-policy-e2e,cloud-onboard-e2e

Auto-dispatched E2E: network-policy-e2e, cloud-onboard-e2e via nightly-e2e.yaml at d66de2c4da2b929274b2a9a0c38692de28266ea7 (nightly run)

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • common-egress-agent-e2e (high): Direct coverage for this PR's new behavior: OpenClaw balanced weather egress, OpenClaw open public-reference egress, Hermes open public-reference plus Nous managed policy presets, all through real agent turns.
  • network-policy-e2e (high): Required because the PR expands built-in egress policy assets and changes live policy-add selection; this job validates deny-by-default behavior, whitelisted endpoint access, live policy-add, dry-run, and hot reload enforcement.
  • cloud-onboard-e2e (medium): Required because onboarding policy tier defaults and non-interactive policy selection changed. This validates the install/onboard path applies policy selections successfully in a real sandbox.

Optional E2E

  • hermes-e2e (medium): Useful additional confidence for the Hermes install/onboard/health/live inference path after Hermes-specific policy preset filtering and managed-tool selection changes.
  • onboard-resume-e2e (medium): Optional regression coverage for resume/re-onboard behavior because preparePolicyPresetResumeSelection now filters applied and recorded presets by agent and merges required presets differently.
  • channels-stop-start-e2e (high): Optional adjacent coverage because open-tier policy defaults changed alongside messaging presets; this verifies channel policies remain attached correctly across stop/start and rebuild flows.

New E2E recommendations

  • None.

Dispatch hint

  • Workflow: .github/workflows/nightly-e2e.yaml
  • jobs input: common-egress-agent-e2e,network-policy-e2e,cloud-onboard-e2e

@github-actions
Contributor

github-actions Bot commented Jun 4, 2026

E2E Scenario Advisor Recommendation

Required scenario E2E: ubuntu-repo-cloud-openclaw, ubuntu-repo-cloud-hermes
Optional scenario E2E: ubuntu-repo-cloud-openclaw-brave

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw
  • gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-hermes

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: medium

Required scenario E2E

  • ubuntu-repo-cloud-openclaw: Onboarding policy preset and tier selection changed for OpenClaw, including new builtin presets and agent-specific filtering. Run the primary Ubuntu repo OpenClaw scenario to exercise non-interactive cloud onboarding, policy application, smoke checks, inference, and baseline onboarding state.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw
  • ubuntu-repo-cloud-hermes: The PR adds Hermes/OpenClaw-specific policy preset filtering and Hermes managed tool preset behavior. Run the primary Ubuntu repo Hermes scenario to cover Hermes cloud onboarding and ensure the policy selection changes do not regress Hermes setup.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-hermes

Optional scenario E2E

  • ubuntu-repo-cloud-openclaw-brave: Optional adjacent coverage for OpenClaw onboarding with the Brave/web-search feature, since tier/preset filtering around builtin network-policy presets changed. Not required because the primary affected paths are covered by the baseline OpenClaw and Hermes onboarding scenarios.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw-brave

Relevant changed files

  • nemoclaw-blueprint/policies/presets/public-reference.yaml
  • nemoclaw-blueprint/policies/presets/weather.yaml
  • nemoclaw-blueprint/policies/tiers.yaml
  • src/lib/actions/sandbox/policy-channel.ts
  • src/lib/onboard/agent-policy-presets.ts
  • src/lib/onboard/hermes-managed-tools.ts
  • src/lib/onboard/policy-selection.ts

@ericksoa ericksoa added labels on Jun 4, 2026: area: policy (Network policy, egress rules, presets, or sandbox policy); area: onboarding (Onboarding FSM, provider setup, sandbox launch, or first-run flow)
@ericksoa ericksoa self-assigned this Jun 4, 2026
@ericksoa ericksoa added labels on Jun 4, 2026: area: e2e (End-to-end tests, nightly failures, or validation infrastructure); area: integrations (Third-party service integration behavior); integration: openclaw (OpenClaw integration behavior); integration: hermes (Hermes integration behavior); feature (PR adds or expands user-visible functionality); v0.0.60 (Release target)
Contributor

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
test/policies.test.ts (1)

1589-1603: ⚡ Quick win

Add explicit guards against access: full and wildcard binaries in this new safety test.

This test validates methods, but it won’t catch a future widening to access: full or binary /**. Add direct assertions so the regression fails fast.

Proposed test hardening
     it("weather and public-reference presets stay read-only and narrowly client-scoped", () => {
       for (const preset of ["weather", "public-reference"]) {
         const content = requirePresetContent(policies.loadPreset(preset));
+        const parsed = YAML.parse(content) as {
+          network_policies?: Record<string, { binaries?: Array<{ path?: string }> }>;
+        };
+        const binaryPaths = Object.values(parsed.network_policies ?? {})
+          .flatMap((policy) => policy.binaries ?? [])
+          .map((entry) => entry.path ?? "");
+
         expect(content).toContain("protocol: rest");
         expect(content).toContain("method: GET");
         expect(content).toContain("method: HEAD");
+        expect(content).not.toContain("access: full");
         expect(content).not.toContain("method: POST");
         expect(content).not.toContain("method: PUT");
         expect(content).not.toContain("method: PATCH");
         expect(content).not.toContain("method: DELETE");
+        expect(binaryPaths).not.toContain("/**");
         expect(content).toContain("/usr/local/bin/node");
         expect(content).toContain("/opt/hermes/.venv/bin/python");
         expect(content).toContain("/usr/bin/curl");
       }
     });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/policies.test.ts` around lines 1589 - 1603, The test "weather and
public-reference presets stay read-only and narrowly client-scoped" currently
checks allowed HTTP methods and specific binaries but doesn't assert against
expanding privileges; update the test that iterates presets from
policies.loadPreset(preset) and content = requirePresetContent(...) to also
assert that the preset content does NOT contain "access: full" and does NOT
contain wildcard/broad binary paths such as "/**" or other wildcard patterns
(e.g. "/*") so any regression to full access or wildcard binaries fails
immediately.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 36b8b1df-8c49-4119-a075-645b54cbe1a2

📥 Commits

Reviewing files that changed from the base of the PR and between 17734b1 and bce7b14.

📒 Files selected for processing (10)
  • nemoclaw-blueprint/policies/presets/public-reference.yaml
  • nemoclaw-blueprint/policies/presets/weather.yaml
  • nemoclaw-blueprint/policies/tiers.yaml
  • src/lib/onboard/hermes-managed-tools.ts
  • src/lib/onboard/policy-selection.ts
  • test/e2e/test-common-egress-agent-e2e.sh
  • test/onboard-policy-suggestions.test.ts
  • test/policies.test.ts
  • test/policy-tiers-onboard.test.ts
  • test/policy-tiers.test.ts

@github-actions
Contributor

github-actions Bot commented Jun 4, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 26955344390
Target ref: bce7b14ae6576095439318d0a43cedfefe535328
Workflow ref: main
Requested jobs: network-policy-e2e,cloud-e2e,hermes-discord-e2e
Summary: 2 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| cloud-e2e | ✅ success |
| hermes-discord-e2e | ✅ success |
| network-policy-e2e | ⚠️ cancelled |

@github-actions
Contributor

github-actions Bot commented Jun 4, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 26955772183
Target ref: 5964fde132b2deb241b9dc7b29d012aa66597385
Workflow ref: main
Requested jobs: cloud-e2e,network-policy-e2e,hermes-e2e
Summary: 0 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| cloud-e2e | ⚠️ cancelled |
| hermes-e2e | ⚠️ cancelled |
| network-policy-e2e | ⚠️ cancelled |

@github-actions
Contributor

github-actions Bot commented Jun 4, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 26955916534
Target ref: 8c4f41e0cd4fb0da72b2b88ae8bee12dc6b4ec90
Workflow ref: main
Requested jobs: network-policy-e2e
Summary: 1 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| network-policy-e2e | ✅ success |

@github-actions
Contributor

github-actions Bot commented Jun 4, 2026

Selective E2E Results — ❌ Some jobs failed

Run: 26956028685
Target ref: 8c4f41e0cd4fb0da72b2b88ae8bee12dc6b4ec90
Workflow ref: feat/safe-common-egress-4767
Requested jobs: common-egress-agent-e2e
Summary: 0 passed, 1 failed, 0 skipped

| Job | Result |
| --- | --- |
| common-egress-agent-e2e | ❌ failure |

Failed jobs: common-egress-agent-e2e. Check run artifacts for logs.

@github-actions
Contributor

github-actions Bot commented Jun 4, 2026

Selective E2E Results — ❌ Some jobs failed

Run: 26957675088
Target ref: e269088ec
Workflow ref: feat/safe-common-egress-4767
Requested jobs: common-egress-agent-e2e
Summary: 0 passed, 1 failed, 0 skipped

| Job | Result |
| --- | --- |
| common-egress-agent-e2e | ❌ failure |

Failed jobs: common-egress-agent-e2e. Check run artifacts for logs.

@github-actions
Contributor

github-actions Bot commented Jun 4, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 26957826438
Target ref: e269088ec9edac56c5ecde4a51ea5c40a98e8437
Workflow ref: main
Requested jobs: network-policy-e2e
Summary: 1 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| network-policy-e2e | ✅ success |

@github-actions
Contributor

github-actions Bot commented Jun 4, 2026

Selective E2E Results — ❌ Some jobs failed

Run: 26957800405
Target ref: e269088ec9edac56c5ecde4a51ea5c40a98e8437
Workflow ref: feat/safe-common-egress-4767
Requested jobs: common-egress-agent-e2e
Summary: 0 passed, 1 failed, 0 skipped

| Job | Result |
| --- | --- |
| common-egress-agent-e2e | ❌ failure |

Failed jobs: common-egress-agent-e2e. Check run artifacts for logs.

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
@github-actions
Contributor

github-actions Bot commented Jun 5, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 27006005488
Target ref: 8dbfeffb138f561eed8504af2bd76042491d0f68
Workflow ref: main
Requested jobs: network-policy-e2e
Summary: 1 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| network-policy-e2e | ✅ success |

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
@github-actions
Contributor

github-actions Bot commented Jun 5, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 27008152348
Target ref: 6da5868529a791bae076cdc65ef31b60058d1350
Workflow ref: main
Requested jobs: network-policy-e2e,cloud-onboard-e2e,hermes-e2e
Summary: 2 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| cloud-onboard-e2e | ✅ success |
| hermes-e2e | ✅ success |
| network-policy-e2e | ⚠️ cancelled |

@github-actions
Contributor

github-actions Bot commented Jun 5, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 27008581953
Target ref: d66de2c4da2b929274b2a9a0c38692de28266ea7
Workflow ref: main
Requested jobs: network-policy-e2e,cloud-onboard-e2e
Summary: 2 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| cloud-onboard-e2e | ✅ success |
| network-policy-e2e | ✅ success |

@cv cv merged commit a1fab20 into main Jun 5, 2026
33 checks passed
@cv cv deleted the feat/safe-common-egress-4767 branch June 5, 2026 16:01
miyoungc added a commit that referenced this pull request Jun 6, 2026
## Summary
- Adds the `v0.0.60` section to `docs/about/release-notes.mdx` using the
dev announcement from discussion #4877.
- Fills the source-doc gaps found during release-prep review across
inference, policy tiers, command behavior, security boundaries, Hermes
dashboard/tooling, runtime context, and troubleshooting.
- Refreshes generated agent skills under `.agents/skills/` from the
current Fern docs output and upgrades Fern from `5.44.3` to `5.45.0`.

## Source summary
- #4037 -> `docs/reference/architecture.mdx`,
`docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents
system-only runtime context that stays out of visible chat.
- #4875 -> `docs/reference/architecture.mdx`,
`docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents
try-first sandbox network/filesystem guidance and clearer failure
classification.
- #4788 -> `docs/security/best-practices.mdx`,
`docs/about/release-notes.mdx`: Documents shared OpenClaw
device-approval policy for startup and connect.
- #4768 -> `docs/reference/network-policies.mdx`,
`docs/network-policy/integration-policy-examples.mdx`,
`docs/get-started/quickstart.mdx`,
`docs/get-started/quickstart-hermes.mdx`, `docs/reference/commands.mdx`:
Documents `weather`, `public-reference`, and Hermes managed-tool gateway
preset behavior.
- #3788 and #4864 -> `docs/reference/network-policies.mdx`,
`docs/reference/commands.mdx`: Documents non-interactive policy-tier
fail-fast behavior and interactive prompt fallback.
- #4756 and #4866 -> `docs/reference/commands.mdx`: Documents env-aware
default sandbox resolution for `list`, `status`, and `tunnel` commands.
- #4320 -> `docs/reference/commands.mdx`: Documents `$$nemoclaw tunnel
status` behavior.
- #4328 -> `docs/reference/commands.mdx`: Documents line-scoped policy
preset descriptions in `policy-list`.
- #4580 and #4748 -> `docs/reference/architecture.mdx`: Documents
package-managed OpenShell gateway service and Docker-driver
gateway-marker behavior.
- #4598 -> `docs/manage-sandboxes/lifecycle.mdx`: Documents concurrent
gateway/dashboard cleanup isolation by sandbox name and port.
- #4777 -> `docs/reference/troubleshooting.mdx`: Documents Docker GPU
patch rollback behavior.
- #4610 -> `docs/reference/troubleshooting.mdx`,
`docs/reference/commands.mdx`: Keeps mutable OpenClaw config permission
guidance aligned and removes skipped experimental wording.
- #4868 -> `docs/reference/commands.mdx`: Keeps `.dockerignore` handling
for custom `onboard --from <Dockerfile>` contexts in generated skills.
- #4870 -> `docs/reference/commands.mdx`,
`docs/manage-sandboxes/runtime-controls.mdx`: Documents
`NEMOCLAW_MINIMAL_BOOTSTRAP` and generated skill coverage.
- #4641 -> `docs/inference/inference-options.mdx`,
`docs/reference/troubleshooting.mdx`: Documents local NVIDIA NIM
platform-digest pulls and served-model id adoption.
- #4810 and #4867 -> `docs/inference/inference-options.mdx`: Documents
stable NGC managed-vLLM image lineage and DGX Station DeepSeek V4 Flash
coverage.
- #4852 -> `docs/inference/use-local-inference.mdx`,
`docs/reference/troubleshooting.mdx`: Documents Ollama model fit
filtering, 16K context floor, cold-load retry, and failed-model
exclusion.
- #4847 -> `docs/inference/switch-inference-providers.mdx`: Documents
API-family sync, Hermes `api_mode`, and Bedrock Runtime exception.
- #4800 -> `docs/inference/tool-calling-reliability.mdx`: Documents
Nemotron managed-inference native tool-search fallback.
- #4333 -> `docs/inference/switch-inference-providers.mdx`: Documents
interactive multimodal input prompting.
- #4086 -> `docs/reference/troubleshooting.mdx`: Keeps proxy bypass
normalization in generated troubleshooting coverage.
- #4811 and #4855 -> `docs/get-started/quickstart-hermes.mdx`: Documents
prebuilt Hermes dashboard assets and TUI recovery without runtime
rebuilds.
- #4854 -> `docs/inference/switch-inference-providers.mdx`,
`docs/reference/commands.mdx`: Documents Hermes proxy API-key
placeholder preservation during inference switches.
- #4248 -> `docs/manage-sandboxes/messaging-channels.mdx`,
`.agents/skills/`: Keeps messaging enrollment behavior aligned with
manifest-hook implementation.
- #4771 -> `docs/security/best-practices.mdx`,
`docs/security/credential-storage.mdx`: Documents Hermes
placeholder-only secret boundary for sandbox-visible runtime files.
- #4787 -> `docs/security/best-practices.mdx`,
`docs/about/release-notes.mdx`: Documents expanded memory scanner
examples for OpenAI project keys and Slack app-level tokens.
- #4848 -> `docs/reference/commands.mdx`: Documents OpenClaw skill
install mirroring into the agent home directory.
- #4790 -> `docs/about/release-notes.mdx`: Uses the prior release-prep
structure and generated `.agents/skills/` refresh as the template for
this release.

## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx`
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ skills/
--prefix nemoclaw-user --doc-platform fern-mdx --dry-run`
- `npm run docs`
- `git diff --check`
- skip-term scan across `docs/`, `.agents/skills/`, and `skills/`
- `npm run build:cli`
- `npm run typecheck:cli`
- Commit and pre-push hook suites, including markdownlint, gitleaks,
env-var docs gate, docs-to-skills verification, and skills YAML tests

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **New Features**
  * DeepSeek-V4-Flash now available as default inference model for DGX Station.
  * Hermes dashboard improved with dedicated port and OAuth-authenticated tool gateway selection.
  * Added weather and public-reference policy presets for expanded agent capabilities.
  * Enhanced Ollama model selection with GPU memory filtering and automatic retry for timeouts.

* **Bug Fixes**
  * Improved policy tier validation to prevent invalid configurations.
  * Better sandbox cleanup scoping by port to prevent conflicts across deployments.
  * Added GPU patch failure recovery with automatic rollback.

* **Documentation**
  * Expanded troubleshooting guides for inference, security, and sandbox lifecycle.
  * Added .dockerignore best practices for custom deployments.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Carlos Villela <cvillela@nvidia.com>

Labels

  • area: e2e (End-to-end tests, nightly failures, or validation infrastructure)
  • area: integrations (Third-party service integration behavior)
  • area: onboarding (Onboarding FSM, provider setup, sandbox launch, or first-run flow)
  • area: policy (Network policy, egress rules, presets, or sandbox policy)
  • feature (PR adds or expands user-visible functionality)
  • integration: hermes (Hermes integration behavior)
  • integration: openclaw (OpenClaw integration behavior)
  • v0.0.60 (Release target)

Projects

None yet

3 participants