fix(uninstall): kill host openshell-gateway process during uninstall (#3516) #3554

Merged
cv merged 2 commits into NVIDIA:main from kagura-agent:fix/uninstall-kill-host-gateway
May 17, 2026

Conversation

@kagura-agent
Contributor

@kagura-agent kagura-agent commented May 15, 2026

Summary

Fixes #3516: nemoclaw uninstall --yes leaves the host-process openshell-gateway running and holding port 8080.

Root cause

When host glibc satisfies the gateway requirement (v0.0.41+), NemoClaw spawns /usr/local/bin/openshell-gateway directly as a host process instead of inside a container. The executePlan() "Stopping services" step did not include a kill for this host-process gateway — it only handled the containerized path via openshell gateway destroy.

Fix

Add a stopMatchingPids("openshell-gateway", ...) call in the "Stopping services" step of executePlan(), following the existing pattern used for OpenShell forward processes and orphaned openshell processes. This uses pgrep -f openshell-gateway to find and SIGTERM/SIGKILL all matching gateway processes before the state directory cleanup.
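For readers who have not seen the existing pattern, the find-then-signal flow described above might look roughly like the sketch below. This is not the NemoClaw source: `ProcessOps`, `stopMatchingPidsSketch`, and `hostOps` are hypothetical names, and the real `stopMatchingPids` signature is not shown in this PR. The process operations are injected so the logic can be exercised without touching real processes.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical sketch: the type, function names, and signatures are assumptions,
// not the actual NemoClaw stopMatchingPids implementation.
type ProcessOps = {
  // Returns PIDs whose full command line matches the pattern.
  findPids: (pattern: string) => number[];
  // Sends a signal to a PID; may throw if the process is already gone.
  kill: (pid: number, signal: "SIGTERM" | "SIGKILL") => void;
  // Reports whether the PID still exists.
  isAlive: (pid: number) => boolean;
};

function stopMatchingPidsSketch(pattern: string, ops: ProcessOps): number[] {
  const stopped: number[] = [];
  for (const pid of ops.findPids(pattern)) {
    try {
      ops.kill(pid, "SIGTERM");
      // A real implementation would likely wait a short grace period here.
      if (ops.isAlive(pid)) ops.kill(pid, "SIGKILL");
      stopped.push(pid);
    } catch {
      // The process exited between lookup and signal; nothing left to stop.
    }
  }
  return stopped;
}

// One possible host binding, using `pgrep -f` as the PR describes.
const hostOps: ProcessOps = {
  findPids: (pattern) => {
    const result = spawnSync("pgrep", ["-f", pattern], { encoding: "utf8" });
    return (result.stdout ?? "")
      .split("\n")
      .map((line) => Number.parseInt(line, 10))
      // Skip unparsable lines and never target the current process.
      .filter((pid) => Number.isInteger(pid) && pid !== process.pid);
  },
  kill: (pid, signal) => {
    process.kill(pid, signal);
  },
  isAlive: (pid) => {
    try {
      // Signal 0 only checks for existence. Any error is treated as
      // "not alive", which is a simplification.
      process.kill(pid, 0);
      return true;
    } catch {
      return false;
    }
  },
};
```

Calling `stopMatchingPidsSketch("openshell-gateway", hostOps)` would signal real processes on the host, so tests should pass a stubbed `ProcessOps` instead.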

Testing

  • Added a vitest test that mocks pgrep returning a gateway PID and verifies it gets killed
  • All 16 uninstall run-plan tests pass
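The shape of that test can be illustrated without Vitest by stubbing the command runner by hand. The sketch below is an assumption-laden stand-in, not the test in run-plan.test.ts: `Runner`, `parsePgrepOutput`, and `stopHostGateway` are hypothetical, and issuing the kill through a `kill` command is a guess at how the real plan does it. The PID 99887 and the log wording are taken from the CodeRabbit walkthrough of this PR.

```typescript
// Hypothetical harness: these names are illustrative, not NemoClaw APIs.
type RunResult = { status: number; stdout: string };
type Runner = (cmd: string, args: string[]) => RunResult;

// Turns pgrep's newline-separated output into a list of positive PIDs.
function parsePgrepOutput(stdout: string): number[] {
  return stdout
    .split("\n")
    .map((line) => Number.parseInt(line.trim(), 10))
    .filter((pid) => Number.isInteger(pid) && pid > 0);
}

function stopHostGateway(run: Runner, log: (msg: string) => void): number[] {
  const found = run("pgrep", ["-f", "openshell-gateway"]);
  // pgrep exits non-zero when nothing matches, so treat that as "no PIDs".
  const pids = found.status === 0 ? parsePgrepOutput(found.stdout) : [];
  const stopped = pids.filter((pid) => run("kill", [String(pid)]).status === 0);
  if (stopped.length > 0) {
    log(`Stopped host openshell-gateway processes ${stopped.join(", ")}`);
  }
  return stopped;
}

// Stub runner: pgrep "finds" one gateway PID, every other command succeeds.
const invocations: string[] = [];
const stubRunner: Runner = (cmd, args) => {
  invocations.push([cmd, ...args].join(" "));
  return cmd === "pgrep" ? { status: 0, stdout: "99887\n" } : { status: 0, stdout: "" };
};

const logs: string[] = [];
const stoppedPids = stopHostGateway(stubRunner, (msg) => logs.push(msg));
```

After this runs, `invocations` records both the lookup and the kill, and `logs` holds the single "Stopped host openshell-gateway processes" line, which is what the assertions in such a test would check.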

Changes

Signed-off-by: kagura-agent <kagura-agent@users.noreply.github.com>

Summary by CodeRabbit

  • Bug Fixes

    • Uninstall now stops host openshell-gateway processes to ensure complete cleanup of related services
  • Tests

    • Added test verifying host openshell-gateway processes are detected and terminated during uninstall

@copy-pr-bot

copy-pr-bot Bot commented May 15, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai Bot commented May 15, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8c8cc5cf-5d1f-45a3-9044-1a35ae0e0a01

📥 Commits

Reviewing files that changed from the base of the PR and between 90e3ea4 and c9c03e7.

📒 Files selected for processing (2)
  • src/lib/actions/uninstall/run-plan.test.ts
  • src/lib/actions/uninstall/run-plan.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/lib/actions/uninstall/run-plan.ts
  • src/lib/actions/uninstall/run-plan.test.ts

📝 Walkthrough

Adds host-level openshell-gateway termination to the uninstall "Stopping services" step and a unit test that stubs pgrep returning PID 99887, asserting the uninstall plan issues a kill and logs the stopped PID.

Changes

Host openshell-gateway process cleanup

| Layer / File(s) | Summary |
| --- | --- |
| Stopping services: stop openshell-gateway PIDs (src/lib/actions/uninstall/run-plan.ts) | executePlan now calls stopMatchingPids("openshell-gateway", ...) during the "Stopping services" step to terminate host openshell-gateway processes. |
| Test: kills host openshell-gateway process during uninstall (src/lib/actions/uninstall/run-plan.test.ts) | New Vitest case stubs pgrep to return PID 99887, runs runUninstallPlan with OpenShell enabled, asserts a kill is issued for 99887, checks the log "Stopped host openshell-gateway processes 99887", and verifies result.exitCode === 0. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#3405: Modifies the same executePlan "Stopping services" flow and adds cleanup for stale/dashboard listeners that this change builds upon.

Suggested labels

NemoClaw CLI, fix, bug, v0.0.41

Suggested reviewers

  • jyaunches
  • cv

Poem

🐰 A tiny PID hid in the grass,
openshell hummed as I let uninstall pass.
I sniffed and I chased till the process was gone,
Port 8080 clear at the rise of the dawn.
Hooray — no more orphans to frown upon!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: killing the host openshell-gateway process during uninstall, and references the issue number (#3516). |
| Linked Issues check | ✅ Passed | The PR fulfills all objectives from #3516: adds logic to kill the host openshell-gateway process via stopMatchingPids call, prevents port 8080 leaks, and includes a test verifying the fix. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to fixing issue #3516: one line in run-plan.ts to call stopMatchingPids for host gateway, and one test case in run-plan.test.ts verifying the fix. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


…VIDIA#3516)

When NemoClaw spawns openshell-gateway as a host process (non-containerized
mode, v0.0.41+), `nemoclaw uninstall` did not stop it — the process survived
and kept port 8080 bound.

Add a `stopMatchingPids` call for openshell-gateway in the 'Stopping services'
step, matching the existing pattern used for OpenShell forward processes and
orphaned openshell processes.

Signed-off-by: kagura-agent <kagura.agent.ai@gmail.com>
@kagura-agent kagura-agent force-pushed the fix/uninstall-kill-host-gateway branch from 90e3ea4 to c9c03e7 Compare May 15, 2026 02:35
@wscurran wscurran added the Docker, fix, and NemoClaw CLI labels May 15, 2026
@wscurran
Contributor

✨ Thanks for submitting this detailed PR to fix the issue with the uninstall process not killing the running openshell-gateway host process on Ubuntu 24.04. This change aims to improve the reliability of the uninstall lifecycle by ensuring that host-process gateways are cleaned up.


Related open issues:

@wscurran wscurran added the OpenShell label and removed the Docker label May 15, 2026
@cv cv added the v0.0.45 label May 17, 2026
@cv cv merged commit 62f5f42 into NVIDIA:main May 17, 2026
17 checks passed
ericksoa pushed a commit that referenced this pull request May 18, 2026
## Summary
Updates the NemoClaw documentation for the v0.0.45 release by
summarizing the user-facing changes merged since v0.0.44 and bumping the
docs version metadata.
Refreshes generated user skills so agent-facing references match the
source docs.

## Changes
- Added v0.0.45 release notes covering onboarding recovery, local
inference, channel cleanup, share mount diagnostics, uninstall cleanup,
and security redaction updates.
- Updated command and troubleshooting docs for sandbox name limits, GPU
gateway reuse, DNS preflight behavior, channel removal cleanup, and
share mount path validation.
- Bumped docs version metadata to 0.0.45 and regenerated NemoClaw user
skills from the docs.
- Source summary: #3672 -> `docs/reference/commands.md`: documented
channel removal detaching bridge providers and un-applying channel
policy presets.
- Source summary: #3678 -> `docs/about/release-notes.md`: documented
Ollama streamed usage accounting in the release notes.
- Source summary: #3670 -> `docs/reference/commands.md`,
`docs/reference/troubleshooting.md`: documented safe GPU gateway
replacement behavior.
- Source summary: #3664 -> `docs/about/release-notes.md`: documented
blueprint permission normalization in the release notes.
- Source summary: #3181 -> `docs/reference/troubleshooting.md`:
documented GPU toolkit guidance when host drivers work but passthrough
is disabled.
- Source summary: #3554 -> `docs/about/release-notes.md`: documented
host `openshell-gateway` cleanup during uninstall.
- Source summary: #3651 -> `docs/reference/troubleshooting.md`:
documented the uncached `.invalid` DNS preflight probe.
- Source summary: #3643 -> `docs/reference/commands.md`: included
existing `NEMOCLAW_PROVIDER` interactive-mode behavior in generated
docs.
- Source summary: #3647 -> `docs/reference/commands.md`: documented
remote sandbox path verification for `share mount`.
- Source summary: #3646 -> `docs/reference/commands.md`: included
existing local writable mount target guidance in generated docs.
- Source summary: #3642 -> `docs/inference/use-local-inference.md`,
`docs/reference/commands.md`: documented managed-vLLM model override and
gated-model token checks.
- Source summary: #3639 -> `docs/reference/commands.md`: documented the
63-character sandbox name limit.

## Type of Change
- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [x] Doc only (includes code sample changes)

## Verification
- [ ] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [x] `make docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

Commit hooks passed for the staged files. A standalone `npx prek run
--all-files` attempt was blocked by sandbox access to
`/Users/miyoungc/.cache/prek/prek.log`, so that checkbox is left
unchecked.

---
<!-- DCO sign-off required by CI. Run: git config user.name && git
config user.email -->
Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Documentation**
  * Enhanced CLI command reference documentation with clearer guidance on
    onboarding, GPU passthrough, inference configuration, channel removal,
    and shared mounts.
  * Improved troubleshooting sections with better DNS resolution and GPU
    passthrough remediation steps.
  * Added documentation for overriding managed vLLM model selection.
  * Updated release notes for v0.0.45 reflecting infrastructure and
    workflow improvements.

* **Version Bump**
  * Released v0.0.45.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
yimoj added a commit to yimoj/NemoClaw that referenced this pull request May 26, 2026
…VIDIA#3516)

PR NVIDIA#3554 stopped the host gateway process during `nemoclaw uninstall`, but
direct `openshell gateway destroy` (and the sandbox-destroy `--cleanup-gateway`
flow) on Linux Docker-driver mode can still leave `/usr/local/bin/openshell-gateway`
running on port 8080. Consolidate the cleanup logic in a shared helper used by
uninstall, sandbox destroy, and onboard, and update recovery hints to surface a
`sudo pkill -f openshell-gateway` remediation alongside the existing
`openshell gateway remove` / `destroy` verbs.

Key changes:
- New src/lib/onboard/host-gateway-process.ts: shared stopper with PID-file
  primary path, opt-out pgrep fallback for orphans, Docker compat parent
  recognition (argv0 == docker AND mount-path token), bounded
  SIGTERM/SIGKILL with verified exit via `ps`, sudo remediation log when a
  privileged PID survives, and a warning when pgrep is unavailable so an
  uninstaller never claims success while an orphan is still bound.
- src/lib/actions/uninstall/run-plan.ts: invoke the shared helper during the
  "Stopping services" step.
- src/lib/actions/sandbox/destroy.ts: replace the local PID-file probe with
  the shared helper and disable the pgrep sweep so a same-host gateway run
  for an unrelated project is not torn down by `destroy --cleanup-gateway`.
- src/lib/onboard.ts: route `stopDockerDriverGatewayProcess` and the drift
  restart through the shared helper (drift restart targets only the supplied
  PID); update the final-failure recovery banner to add the sudo pkill line
  on Linux while keeping both lifecycle verbs.
- Recovery hint helpers (gpu-recovery.ts, gateway-gpu-passthrough.ts) and
  user-facing docs (`docs/manage-sandboxes/lifecycle.mdx`,
  `docs/reference/commands.mdx`, `docs/reference/troubleshooting.mdx`,
  `scripts/install.sh`) tell users to run `gateway remove` first with the
  legacy `gateway destroy -g` fallback, then `sudo pkill -f openshell-gateway`
  for stuck privileged processes.
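
The PID selection rules listed above (explicit or PID-file PIDs first, an opt-out pgrep sweep for orphans, and a warning when pgrep is unavailable) could be sketched as follows. This is an illustration only: `StopOptions` and `selectGatewayPids` are hypothetical names, and the actual interface of host-gateway-process.ts is not shown in this commit message.

```typescript
// Hypothetical sketch of PID selection; option and function names are assumptions.
type StopOptions = {
  // PIDs supplied by the caller, for example read from a PID file.
  explicitPids?: number[];
  // False for flows that must not touch gateways of unrelated projects.
  allowPgrepSweep: boolean;
  // Returns matching PIDs, or null to model "pgrep is unavailable".
  pgrep: () => number[] | null;
  warn: (msg: string) => void;
};

// Removes duplicate PIDs while keeping first-seen order.
function uniquePids(pids: number[]): number[] {
  return pids.filter((pid, index) => pids.indexOf(pid) === index);
}

function selectGatewayPids(opts: StopOptions): number[] {
  const explicit = opts.explicitPids ?? [];
  // Explicit PIDs win: the pgrep sweep is skipped entirely.
  if (explicit.length > 0) return uniquePids(explicit);
  // Callers such as sandbox destroy opt out of the sweep.
  if (!opts.allowPgrepSweep) return [];
  const swept = opts.pgrep();
  if (swept === null) {
    // Do not claim success when orphans cannot be ruled out.
    opts.warn("pgrep unavailable; cannot confirm that no openshell-gateway orphan is still running");
    return [];
  }
  return uniquePids(swept);
}
```

Under these assumptions, uninstall would call it with `allowPgrepSweep: true`, while `destroy --cleanup-gateway` and the drift restart would pass only the PIDs they own and set `allowPgrepSweep: false`.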

Tests:
- New unit tests cover the PID-file path, pgrep fallback, false-positive
  rejection (random `vim /opt/nemoclaw/openshell-gateway`, unrelated
  cmdlines mentioning openshell-gateway), Docker compat parent matching,
  pgrep-skip behaviour when explicit PIDs are passed, and the
  pgrep-unavailable warning.
- Existing uninstall + onboard tests updated for the new copy.
- E2E verified by spawning a fake `/tmp/.../openshell-gateway` that binds a
  unique port and confirming both PID-file-only and pgrep-only branches
  stop it via the built dist/ entry point.
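
The false-positive rejection exercised by these tests could be approximated by classifying a process's argv rather than substring-matching its command line. The sketch below is hypothetical: `isHostGatewayArgv` is not a NemoClaw function, and reading "mount-path token" as any argument that contains both a `/` and the binary name is an assumption, as is the example Docker command line in the comment.

```typescript
// Hypothetical classifier; the real matching rules are not shown in this thread.
const GATEWAY_BINARY = "openshell-gateway";

// Returns the last path segment, e.g. "/usr/local/bin/foo" -> "foo".
function baseName(path: string): string {
  const parts = path.split("/");
  return parts[parts.length - 1] ?? "";
}

function isHostGatewayArgv(argv: string[]): boolean {
  const [first, ...rest] = argv;
  if (first === undefined) return false;
  const argv0 = baseName(first);
  // Direct host process: the executable itself is the gateway binary.
  if (argv0 === GATEWAY_BINARY) return true;
  // Docker compat parent: argv0 is docker AND a later token looks like a
  // mount path for the gateway binary (assumed form: "-v /host/path:/container/path").
  if (argv0 === "docker") {
    return rest.some((token) => token.includes("/") && token.includes(GATEWAY_BINARY));
  }
  // Anything else that merely mentions the name (an editor, grep) is rejected.
  return false;
}
```

With this rule, `["/usr/local/bin/openshell-gateway"]` is accepted, while `["vim", "/opt/nemoclaw/openshell-gateway"]` is rejected because argv0 is neither the gateway binary nor docker.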

Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Labels

fix, NemoClaw CLI, OpenShell, v0.0.45

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Ubuntu 24.04][Install] nemoclaw uninstall does not kill running openshell-gateway host process — port 8080 leaks after uninstall

3 participants