Skip to content

preflight: detect gateway port 18789 conflicts during onboard#251

Merged
ericksoa merged 4 commits into
NVIDIA:mainfrom
brianwtaylor:feat/preflight-port-check
Mar 18, 2026
Merged

preflight: detect gateway port 18789 conflicts during onboard#251
ericksoa merged 4 commits into
NVIDIA:mainfrom
brianwtaylor:feat/preflight-port-check

Conversation

@brianwtaylor
Copy link
Copy Markdown
Contributor

@brianwtaylor brianwtaylor commented Mar 17, 2026

Summary

  • Add checkPortAvailable() to bin/lib/preflight.js — uses lsof (primary) to identify blocking process name + PID, with a Node.js net probe fallback for cross-platform EADDRINUSE detection
  • Check both required ports: 8080 (OpenShell gateway) and 18789 (NemoClaw dashboard)
  • Integrate into onboard step 1 (after cgroup check, before GPU detection) so users get a clear error with remediation steps instead of a silent failure
  • 9 test cases covering lsof parsing, net probe, multi-port, and smoke tests

Closes #18 — detect host openclaw-gateway.service port conflicts and fail loudly before onboarding continues
Addresses #51install.sh assumes port 8080 isn't already in use

Problem

nemoclaw onboard starts the gateway on port 8080 and the dashboard on port 18789. If something else (e.g., a leftover openclaw-gateway.service, another container, or a dev server) is already listening, the gateway bind fails silently and the user ends up with a non-functional dashboard.

Detection chain

  1. lsof (primary): lsof -i :PORT -sTCP:LISTEN -P -n — works on Linux + macOS, returns process name + PID. Uses -sTCP:LISTEN to avoid false positives from TIME_WAIT connections.
  2. Node.js net probe (fallback): net.createServer().listen(port) — cross-platform, detects EADDRINUSE but can't identify the process. Catches cases where non-root users can't see root-owned listeners via lsof.
  3. If lsof is not installed, falls through to the net probe automatically

Test plan

Automated Tests

node --test test/preflight.test.js
  • 9 tests: lsof parsing (single + multi listener), net probe (free + occupied), empty lsof fallthrough, smoke test, default port, gateway port 8080

Manual verification

# Start a dummy server on 18789
node -e "require('net').createServer().listen(18789)" &
# Run onboard — should fail fast with clear error
nemoclaw onboard
# Kill dummy server
kill %1

# Same for gateway port
node -e "require('net').createServer().listen(8080)" &
nemoclaw onboard
# Should report port 8080 conflict with process info
kill %1

Closes #51
Closes #18

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 17, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Onboarding's preflight now calls a new checkPortAvailable(18789) which uses lsof parsing with a Node net probe fallback. If the port is in use the tool prints diagnostics (process, PID, stop hints, reason) and exits code 1; otherwise it logs the port as available and continues.

Changes

Cohort / File(s) Summary
Port Availability Utility
bin/lib/preflight.js
Added checkPortAvailable(port, opts) that parses lsof output (opts.lsofOutput) and falls back to a Node net probe (opts.skipLsof). Returns { ok: true } or { ok: false, process?, pid?, reason? } and exported via module.exports.
Onboarding Integration
bin/lib/onboard.js
preflight() now calls checkPortAvailable(18789); on failure prints diagnostics including process/PID, suggested stop/inspection commands and reason, then exits with code 1; logs availability on success.
Tests
test/preflight.test.js
New test suite for checkPortAvailable covering lsof parsing (single/multiple listeners), net-probe fallback, skipLsof behavior, occupied/free ports, live smoke test, and default-argument behavior.

Sequence Diagram

sequenceDiagram
    actor User
    participant Onboard as Onboarding
    participant Preflight as Preflight
    participant PortCheck as checkPortAvailable
    participant Lsof as lsof
    participant Net as NodeNetProbe

    User->>Onboard: start onboarding
    Onboard->>Preflight: run preflight()
    Preflight->>PortCheck: checkPortAvailable(18789)

    alt skipLsof not set
        PortCheck->>Lsof: invoke/parse lsof output for port 18789
        Lsof-->>PortCheck: lsof output (listeners/processes)
        PortCheck->>PortCheck: parse to identify process/pid
    end

    alt lsof inconclusive or skipped
        PortCheck->>Net: attempt TCP bind/connect probe (127.0.0.1)
        Net-->>PortCheck: success or EADDRINUSE/error
    end

    alt port available
        PortCheck-->>Preflight: { ok: true }
        Preflight-->>Onboard: preflight OK
        Onboard->>User: "✓ Port 18789 available" (continue)
    else port unavailable
        PortCheck-->>Preflight: { ok: false, process, pid, reason }
        Preflight-->>Onboard: port conflict detected
        Onboard->>User: diagnostics (process, PID, stop/inspect commands)
        Onboard->>Onboard: exit 1
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 I sniffed the port with lsof and net,

A tiny probe to see who's set,
If 18789 is free and bright,
I hop aboard and carry the light,
If someone blocks it, I trumpet the PID tonight.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: adding port conflict detection for gateway port 18789 during the onboarding preflight check.
Linked Issues check ✅ Passed The PR successfully implements all coding requirements from issue #18: it checks port 18789 availability, identifies conflicting processes/PIDs when possible, exits non-zero on conflict, and provides clear diagnostic output.
Out of Scope Changes check ✅ Passed All changes directly support port-conflict detection for issue #18; no unrelated modifications detected in onboard.js, preflight.js, or test files.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
📝 Coding Plan
  • Generate coding plan for human review comments

Comment @coderabbitai help to get the list of available commands and usage tips.

Tip

CodeRabbit can enforce grammar and style rules using `languagetool`.

Configure the reviews.tools.languagetool setting to enable/disable rules and categories. Refer to the LanguageTool Community to learn more.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
bin/lib/preflight.js (1)

95-109: Validate port before shell interpolation.

At Line 108, p is interpolated directly into a shell command. Add numeric/range validation to avoid malformed command execution if this helper is reused with non-numeric input.

Suggested validation
-  const p = port || 18789;
+  const p = port ?? 18789;
+  if (!Number.isInteger(p) || p < 1 || p > 65535) {
+    return {
+      ok: false,
+      process: "unknown",
+      pid: null,
+      reason: `invalid port: ${port}`,
+    };
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/lib/preflight.js` around lines 95 - 109, In checkPortAvailable, validate
and sanitize the incoming port (p) before using it in the lsof shell
interpolation: ensure it's a finite integer within the TCP port range (1–65535)
or else throw/return an error/false; coerce string digits to Number safely
(e.g., parseInt + Number.isFinite/Number.isInteger checks) and reject any
non-numeric or out-of-range values so the lsof invocation (`lsof -i :${p} ...`)
never receives malformed input.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/lib/onboard.js`:
- Around line 100-107: The CLI prints a `sudo kill ${portCheck.pid}` remediation
even when portCheck.pid is null, which yields misleading output; update the port
conflict handling in onboard.js (check the portCheck object and its
portCheck.process/portCheck.pid usage) to only print the specific "sudo kill
<PID>" line when portCheck.pid is a valid/defined PID (truthy number/string),
and otherwise omit that line or replace it with a generic instruction like "stop
the offending process (PID unknown) or use system tools" while still showing the
systemd suggestion.

In `@bin/lib/preflight.js`:
- Around line 114-121: The code currently treats an empty parsed lsof output as
definitive by returning { ok: true } when dataLines.length === 0; remove that
early return and instead let the function fall through to the existing fallback
probe logic so inconclusive lsof results trigger the fallback check.
Specifically, update the block that inspects lsofOut/lines/dataLines (variables
lsofOut, lines, dataLines) to avoid returning { ok: true } on empty dataLines
and ensure the subsequent probePort/fallback path runs and decides the final {
ok, pid } result.

In `@test/preflight.test.js`:
- Around line 168-171: The fixed-port smoke test is flaky; change it to pick a
guaranteed-free port at runtime by creating a temporary server bound to port 0
to obtain an OS-assigned free port, close that server, then call
checkPortAvailable(<thatPort>) and assert result.ok is true. Update the "smoke
test with live detection on an unlikely port" test to use this setup flow
(create server on 0 → read assigned port → close server → call
checkPortAvailable) so the assertion uses a deterministic free port rather than
hard-coded 59123.

---

Nitpick comments:
In `@bin/lib/preflight.js`:
- Around line 95-109: In checkPortAvailable, validate and sanitize the incoming
port (p) before using it in the lsof shell interpolation: ensure it's a finite
integer within the TCP port range (1–65535) or else throw/return an error/false;
coerce string digits to Number safely (e.g., parseInt +
Number.isFinite/Number.isInteger checks) and reject any non-numeric or
out-of-range values so the lsof invocation (`lsof -i :${p} ...`) never receives
malformed input.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4836c596-30be-4d40-a138-b2ff52b5fd6c

📥 Commits

Reviewing files that changed from the base of the PR and between 2a9afbc and ba9b95c.

📒 Files selected for processing (3)
  • bin/lib/onboard.js
  • bin/lib/preflight.js
  • test/preflight.test.js

Comment thread bin/lib/onboard.js Outdated
Comment thread bin/lib/preflight.js
Comment thread test/preflight.test.js Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/preflight.test.js (1)

115-121: Extract repeated free-port setup into a helper.

The same ephemeral-port allocation pattern appears in three tests; extracting it will reduce duplication and make future updates safer.

♻️ Suggested refactor
+async function getEphemeralFreePort() {
+  return await new Promise((resolve) => {
+    const srv = net.createServer();
+    srv.listen(0, "127.0.0.1", () => {
+      const port = srv.address().port;
+      srv.close(() => resolve(port));
+    });
+  });
+}
+
 describe("checkPortAvailable", () => {
   it("falls through to net probe when lsof output is empty", async () => {
-    const freePort = await new Promise((resolve) => {
-      const srv = net.createServer();
-      srv.listen(0, "127.0.0.1", () => {
-        const port = srv.address().port;
-        srv.close(() => resolve(port));
-      });
-    });
+    const freePort = await getEphemeralFreePort();
     const result = await checkPortAvailable(freePort, { lsofOutput: "" });
     assert.deepEqual(result, { ok: true });
   });
@@
   it("net probe returns ok for a free port", async () => {
-    const freePort = await new Promise((resolve) => {
-      const srv = net.createServer();
-      srv.listen(0, "127.0.0.1", () => {
-        const port = srv.address().port;
-        srv.close(() => resolve(port));
-      });
-    });
+    const freePort = await getEphemeralFreePort();
     const result = await checkPortAvailable(freePort, { skipLsof: true });
     assert.deepEqual(result, { ok: true });
   });
@@
   it("smoke test with live detection on a dynamically selected free port", async () => {
-    const freePort = await new Promise((resolve) => {
-      const srv = net.createServer();
-      srv.listen(0, "127.0.0.1", () => {
-        const port = srv.address().port;
-        srv.close(() => resolve(port));
-      });
-    });
+    const freePort = await getEphemeralFreePort();
     const result = await checkPortAvailable(freePort);
     assert.equal(result.ok, true);
   });

Also applies to: 169-175, 196-202

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/preflight.test.js` around lines 115 - 121, The ephemeral-port allocation
block is duplicated across three tests; extract it into a single async helper
(e.g., getFreePort) that returns a free port by creating a net.createServer(),
listening on 0, capturing srv.address().port, closing the server, and resolving
the port; then replace the inline Promise blocks in the tests with calls to
getFreePort to remove duplication and ensure the server is always closed before
returning.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/preflight.test.js`:
- Around line 139-140: The test's finally blocks call srv.close() without
awaiting it, leaving the server shutdown as fire-and-forget and risking
transient handle leakage; change both occurrences of srv.close() to await the
shutdown completion (either by awaiting a Promise-wrapped srv.close or using the
callback form: await new Promise(resolve => srv.close(resolve))) so the test
waits for the server to fully close before finishing.

---

Nitpick comments:
In `@test/preflight.test.js`:
- Around line 115-121: The ephemeral-port allocation block is duplicated across
three tests; extract it into a single async helper (e.g., getFreePort) that
returns a free port by creating a net.createServer(), listening on 0, capturing
srv.address().port, closing the server, and resolving the port; then replace the
inline Promise blocks in the tests with calls to getFreePort to remove
duplication and ensure the server is always closed before returning.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c12603fd-5db6-4e2d-b2f6-12f26dab6b51

📥 Commits

Reviewing files that changed from the base of the PR and between 76bb5bf and 8e18771.

📒 Files selected for processing (2)
  • bin/lib/preflight.js
  • test/preflight.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • bin/lib/preflight.js

Comment thread test/preflight.test.js Outdated
Add checkPortAvailable() to preflight checks. Uses lsof (primary)
to identify the blocking process name and PID, with a Node.js net
probe fallback for cross-platform EADDRINUSE detection. Integrated
into onboard step 1 before gateway operations so users get a clear
error instead of a dead dashboard.

Closes #18
Non-root lsof cannot see root-owned listeners (/proc/[pid]/fd returns
EACCES). Empty lsof output is not authoritative — fall through to the
net probe whose bind() checks the kernel socket table directly.

Added test covering the exact scenario: lsof empty + port occupied =
net probe catches EADDRINUSE.
@brianwtaylor brianwtaylor force-pushed the feat/preflight-port-check branch from 9ee0d70 to 027eddf Compare March 18, 2026 03:23
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/preflight.test.js (1)

15-21: Consider extracting a shared free-port helper to reduce repetition.

The same bind/release sequence is repeated three times; a small helper would simplify maintenance and keep setup behavior consistent across tests.

♻️ Optional refactor sketch
 const net = require("net");
 const { checkPortAvailable } = require("../bin/lib/preflight");
 
+async function getFreePort() {
+  const srv = net.createServer();
+  return await new Promise((resolve, reject) => {
+    srv.once("error", reject);
+    srv.listen(0, "127.0.0.1", () => {
+      const { port } = srv.address();
+      srv.close((err) => (err ? reject(err) : resolve(port)));
+    });
+  });
+}
+
 describe("checkPortAvailable", () => {
   it("falls through to net probe when lsof output is empty", async () => {
@@
-    const freePort = await new Promise((resolve) => {
-      const srv = net.createServer();
-      srv.listen(0, "127.0.0.1", () => {
-        const port = srv.address().port;
-        srv.close(() => resolve(port));
-      });
-    });
+    const freePort = await getFreePort();
@@
-    const freePort = await new Promise((resolve) => {
-      const srv = net.createServer();
-      srv.listen(0, "127.0.0.1", () => {
-        const port = srv.address().port;
-        srv.close(() => resolve(port));
-      });
-    });
+    const freePort = await getFreePort();
@@
-    const freePort = await new Promise((resolve) => {
-      const srv = net.createServer();
-      srv.listen(0, "127.0.0.1", () => {
-        const port = srv.address().port;
-        srv.close(() => resolve(port));
-      });
-    });
+    const freePort = await getFreePort();

Also applies to: 69-75, 96-102

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/preflight.test.js` around lines 15 - 21, Extract the repeated
bind/release sequence into a single async helper (e.g., async function
getFreePort() or acquireFreePort()) that creates a net.createServer(), listens
on port 0 at "127.0.0.1", reads srv.address().port, closes the server and
returns the port; then replace each inline Promise block that assigns freePort
with calls to this helper in test/preflight.test.js (where the current blocks
around the freePort assignments are repeated) to keep setup consistent and DRY.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@test/preflight.test.js`:
- Around line 15-21: Extract the repeated bind/release sequence into a single
async helper (e.g., async function getFreePort() or acquireFreePort()) that
creates a net.createServer(), listens on port 0 at "127.0.0.1", reads
srv.address().port, closes the server and returns the port; then replace each
inline Promise block that assigns freePort with calls to this helper in
test/preflight.test.js (where the current blocks around the freePort assignments
are repeated) to keep setup consistent and DRY.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 5dbdb7fd-60fa-477c-b123-bfc14be9830d

📥 Commits

Reviewing files that changed from the base of the PR and between 9ee0d70 and 027eddf.

📒 Files selected for processing (3)
  • bin/lib/onboard.js
  • bin/lib/preflight.js
  • test/preflight.test.js
🚧 Files skipped from review as they are similar to previous changes (2)
  • bin/lib/preflight.js
  • bin/lib/onboard.js

Check both required ports (8080 for OpenShell gateway, 18789 for
dashboard) during preflight instead of only 18789. Uses the same
lsof + net-probe detection chain with per-port error messaging.
Copy link
Copy Markdown
Contributor

@ericksoa ericksoa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — solid architecture with lsof + net probe fallback, good test coverage. Tested: port conflict correctly detected and blocks onboard, clean ports pass preflight.

@ericksoa ericksoa merged commit 776a5d1 into NVIDIA:main Mar 18, 2026
3 checks passed
Ryuketsukami pushed a commit to Ryuketsukami/NemoClaw that referenced this pull request Mar 24, 2026
…#251)

* preflight: detect gateway port 18789 conflicts during onboard

Add checkPortAvailable() to preflight checks. Uses lsof (primary)
to identify the blocking process name and PID, with a Node.js net
probe fallback for cross-platform EADDRINUSE detection. Integrated
into onboard step 1 before gateway operations so users get a clear
error instead of a dead dashboard.

Closes NVIDIA#18

* preflight: fall through to net probe when lsof returns empty

Non-root lsof cannot see root-owned listeners (/proc/[pid]/fd returns
EACCES). Empty lsof output is not authoritative — fall through to the
net probe whose bind() checks the kernel socket table directly.

Added test covering the exact scenario: lsof empty + port occupied =
net probe catches EADDRINUSE.

* test: await srv.close() in finally blocks to prevent handle leakage

* preflight: also check gateway port 8080 before onboard

Check both required ports (8080 for OpenShell gateway, 18789 for
dashboard) during preflight instead of only 18789. Uses the same
lsof + net-probe detection chain with per-port error messaging.
jessesanford pushed a commit to jessesanford/NemoClaw that referenced this pull request Mar 24, 2026
…#251)

* preflight: detect gateway port 18789 conflicts during onboard

Add checkPortAvailable() to preflight checks. Uses lsof (primary)
to identify the blocking process name and PID, with a Node.js net
probe fallback for cross-platform EADDRINUSE detection. Integrated
into onboard step 1 before gateway operations so users get a clear
error instead of a dead dashboard.

Closes NVIDIA#18

* preflight: fall through to net probe when lsof returns empty

Non-root lsof cannot see root-owned listeners (/proc/[pid]/fd returns
EACCES). Empty lsof output is not authoritative — fall through to the
net probe whose bind() checks the kernel socket table directly.

Added test covering the exact scenario: lsof empty + port occupied =
net probe catches EADDRINUSE.

* test: await srv.close() in finally blocks to prevent handle leakage

* preflight: also check gateway port 8080 before onboard

Check both required ports (8080 for OpenShell gateway, 18789 for
dashboard) during preflight instead of only 18789. Uses the same
lsof + net-probe detection chain with per-port error messaging.
mafueee pushed a commit to mafueee/NemoClaw that referenced this pull request Mar 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

./install.sh assumes port 8080 isn't already in use Detect host openclaw-gateway.service port conflicts and fail loudly before onboarding continues

2 participants