preflight: detect gateway port 18789 conflicts during onboard#251
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughOnboarding's preflight now calls a new Changes
Sequence DiagramsequenceDiagram
actor User
participant Onboard as Onboarding
participant Preflight as Preflight
participant PortCheck as checkPortAvailable
participant Lsof as lsof
participant Net as NodeNetProbe
User->>Onboard: start onboarding
Onboard->>Preflight: run preflight()
Preflight->>PortCheck: checkPortAvailable(18789)
alt skipLsof not set
PortCheck->>Lsof: invoke/parse lsof output for port 18789
Lsof-->>PortCheck: lsof output (listeners/processes)
PortCheck->>PortCheck: parse to identify process/pid
end
alt lsof inconclusive or skipped
PortCheck->>Net: attempt TCP bind/connect probe (127.0.0.1)
Net-->>PortCheck: success or EADDRINUSE/error
end
alt port available
PortCheck-->>Preflight: { ok: true }
Preflight-->>Onboard: preflight OK
Onboard->>User: "✓ Port 18789 available" (continue)
else port unavailable
PortCheck-->>Preflight: { ok: false, process, pid, reason }
Preflight-->>Onboard: port conflict detected
Onboard->>User: diagnostics (process, PID, stop/inspect commands)
Onboard->>Onboard: exit 1
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment Tip CodeRabbit can enforce grammar and style rules using `languagetool`.Configure the |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
bin/lib/preflight.js (1)
95-109: Validateportbefore shell interpolation.At Line 108,
pis interpolated directly into a shell command. Add numeric/range validation to avoid malformed command execution if this helper is reused with non-numeric input.Suggested validation
- const p = port || 18789; + const p = port ?? 18789; + if (!Number.isInteger(p) || p < 1 || p > 65535) { + return { + ok: false, + process: "unknown", + pid: null, + reason: `invalid port: ${port}`, + }; + }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@bin/lib/preflight.js` around lines 95 - 109, In checkPortAvailable, validate and sanitize the incoming port (p) before using it in the lsof shell interpolation: ensure it's a finite integer within the TCP port range (1–65535) or else throw/return an error/false; coerce string digits to Number safely (e.g., parseInt + Number.isFinite/Number.isInteger checks) and reject any non-numeric or out-of-range values so the lsof invocation (`lsof -i :${p} ...`) never receives malformed input.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/lib/onboard.js`:
- Around line 100-107: The CLI prints a `sudo kill ${portCheck.pid}` remediation
even when portCheck.pid is null, which yields misleading output; update the port
conflict handling in onboard.js (check the portCheck object and its
portCheck.process/portCheck.pid usage) to only print the specific "sudo kill
<PID>" line when portCheck.pid is a valid/defined PID (truthy number/string),
and otherwise omit that line or replace it with a generic instruction like "stop
the offending process (PID unknown) or use system tools" while still showing the
systemd suggestion.
In `@bin/lib/preflight.js`:
- Around line 114-121: The code currently treats an empty parsed lsof output as
definitive by returning { ok: true } when dataLines.length === 0; remove that
early return and instead let the function fall through to the existing fallback
probe logic so inconclusive lsof results trigger the fallback check.
Specifically, update the block that inspects lsofOut/lines/dataLines (variables
lsofOut, lines, dataLines) to avoid returning { ok: true } on empty dataLines
and ensure the subsequent probePort/fallback path runs and decides the final {
ok, pid } result.
In `@test/preflight.test.js`:
- Around line 168-171: The fixed-port smoke test is flaky; change it to pick a
guaranteed-free port at runtime by creating a temporary server bound to port 0
to obtain an OS-assigned free port, close that server, then call
checkPortAvailable(<thatPort>) and assert result.ok is true. Update the "smoke
test with live detection on an unlikely port" test to use this setup flow
(create server on 0 → read assigned port → close server → call
checkPortAvailable) so the assertion uses a deterministic free port rather than
hard-coded 59123.
---
Nitpick comments:
In `@bin/lib/preflight.js`:
- Around line 95-109: In checkPortAvailable, validate and sanitize the incoming
port (p) before using it in the lsof shell interpolation: ensure it's a finite
integer within the TCP port range (1–65535) or else throw/return an error/false;
coerce string digits to Number safely (e.g., parseInt +
Number.isFinite/Number.isInteger checks) and reject any non-numeric or
out-of-range values so the lsof invocation (`lsof -i :${p} ...`) never receives
malformed input.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 4836c596-30be-4d40-a138-b2ff52b5fd6c
📒 Files selected for processing (3)
bin/lib/onboard.jsbin/lib/preflight.jstest/preflight.test.js
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
test/preflight.test.js (1)
115-121: Extract repeated free-port setup into a helper.The same ephemeral-port allocation pattern appears in three tests; extracting it will reduce duplication and make future updates safer.
♻️ Suggested refactor
+async function getEphemeralFreePort() { + return await new Promise((resolve) => { + const srv = net.createServer(); + srv.listen(0, "127.0.0.1", () => { + const port = srv.address().port; + srv.close(() => resolve(port)); + }); + }); +} + describe("checkPortAvailable", () => { it("falls through to net probe when lsof output is empty", async () => { - const freePort = await new Promise((resolve) => { - const srv = net.createServer(); - srv.listen(0, "127.0.0.1", () => { - const port = srv.address().port; - srv.close(() => resolve(port)); - }); - }); + const freePort = await getEphemeralFreePort(); const result = await checkPortAvailable(freePort, { lsofOutput: "" }); assert.deepEqual(result, { ok: true }); }); @@ it("net probe returns ok for a free port", async () => { - const freePort = await new Promise((resolve) => { - const srv = net.createServer(); - srv.listen(0, "127.0.0.1", () => { - const port = srv.address().port; - srv.close(() => resolve(port)); - }); - }); + const freePort = await getEphemeralFreePort(); const result = await checkPortAvailable(freePort, { skipLsof: true }); assert.deepEqual(result, { ok: true }); }); @@ it("smoke test with live detection on a dynamically selected free port", async () => { - const freePort = await new Promise((resolve) => { - const srv = net.createServer(); - srv.listen(0, "127.0.0.1", () => { - const port = srv.address().port; - srv.close(() => resolve(port)); - }); - }); + const freePort = await getEphemeralFreePort(); const result = await checkPortAvailable(freePort); assert.equal(result.ok, true); });Also applies to: 169-175, 196-202
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/preflight.test.js` around lines 115 - 121, The ephemeral-port allocation block is duplicated across three tests; extract it into a single async helper (e.g., getFreePort) that returns a free port by creating a net.createServer(), listening on 0, capturing srv.address().port, closing the server, and resolving the port; then replace the inline Promise blocks in the tests with calls to getFreePort to remove duplication and ensure the server is always closed before returning.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/preflight.test.js`:
- Around line 139-140: The test's finally blocks call srv.close() without
awaiting it, leaving the server shutdown as fire-and-forget and risking
transient handle leakage; change both occurrences of srv.close() to await the
shutdown completion (either by awaiting a Promise-wrapped srv.close or using the
callback form: await new Promise(resolve => srv.close(resolve))) so the test
waits for the server to fully close before finishing.
---
Nitpick comments:
In `@test/preflight.test.js`:
- Around line 115-121: The ephemeral-port allocation block is duplicated across
three tests; extract it into a single async helper (e.g., getFreePort) that
returns a free port by creating a net.createServer(), listening on 0, capturing
srv.address().port, closing the server, and resolving the port; then replace the
inline Promise blocks in the tests with calls to getFreePort to remove
duplication and ensure the server is always closed before returning.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c12603fd-5db6-4e2d-b2f6-12f26dab6b51
📒 Files selected for processing (2)
bin/lib/preflight.jstest/preflight.test.js
🚧 Files skipped from review as they are similar to previous changes (1)
- bin/lib/preflight.js
Add checkPortAvailable() to preflight checks. Uses lsof (primary) to identify the blocking process name and PID, with a Node.js net probe fallback for cross-platform EADDRINUSE detection. Integrated into onboard step 1 before gateway operations so users get a clear error instead of a dead dashboard. Closes #18
Non-root lsof cannot see root-owned listeners (/proc/[pid]/fd returns EACCES). Empty lsof output is not authoritative — fall through to the net probe whose bind() checks the kernel socket table directly. Added test covering the exact scenario: lsof empty + port occupied = net probe catches EADDRINUSE.
9ee0d70 to
027eddf
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/preflight.test.js (1)
15-21: Consider extracting a shared free-port helper to reduce repetition.The same bind/release sequence is repeated three times; a small helper would simplify maintenance and keep setup behavior consistent across tests.
♻️ Optional refactor sketch
const net = require("net"); const { checkPortAvailable } = require("../bin/lib/preflight"); +async function getFreePort() { + const srv = net.createServer(); + return await new Promise((resolve, reject) => { + srv.once("error", reject); + srv.listen(0, "127.0.0.1", () => { + const { port } = srv.address(); + srv.close((err) => (err ? reject(err) : resolve(port))); + }); + }); +} + describe("checkPortAvailable", () => { it("falls through to net probe when lsof output is empty", async () => { @@ - const freePort = await new Promise((resolve) => { - const srv = net.createServer(); - srv.listen(0, "127.0.0.1", () => { - const port = srv.address().port; - srv.close(() => resolve(port)); - }); - }); + const freePort = await getFreePort(); @@ - const freePort = await new Promise((resolve) => { - const srv = net.createServer(); - srv.listen(0, "127.0.0.1", () => { - const port = srv.address().port; - srv.close(() => resolve(port)); - }); - }); + const freePort = await getFreePort(); @@ - const freePort = await new Promise((resolve) => { - const srv = net.createServer(); - srv.listen(0, "127.0.0.1", () => { - const port = srv.address().port; - srv.close(() => resolve(port)); - }); - }); + const freePort = await getFreePort();Also applies to: 69-75, 96-102
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/preflight.test.js` around lines 15 - 21, Extract the repeated bind/release sequence into a single async helper (e.g., async function getFreePort() or acquireFreePort()) that creates a net.createServer(), listens on port 0 at "127.0.0.1", reads srv.address().port, closes the server and returns the port; then replace each inline Promise block that assigns freePort with calls to this helper in test/preflight.test.js (where the current blocks around the freePort assignments are repeated) to keep setup consistent and DRY.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@test/preflight.test.js`:
- Around line 15-21: Extract the repeated bind/release sequence into a single
async helper (e.g., async function getFreePort() or acquireFreePort()) that
creates a net.createServer(), listens on port 0 at "127.0.0.1", reads
srv.address().port, closes the server and returns the port; then replace each
inline Promise block that assigns freePort with calls to this helper in
test/preflight.test.js (where the current blocks around the freePort assignments
are repeated) to keep setup consistent and DRY.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 5dbdb7fd-60fa-477c-b123-bfc14be9830d
📒 Files selected for processing (3)
bin/lib/onboard.jsbin/lib/preflight.jstest/preflight.test.js
🚧 Files skipped from review as they are similar to previous changes (2)
- bin/lib/preflight.js
- bin/lib/onboard.js
Check both required ports (8080 for OpenShell gateway, 18789 for dashboard) during preflight instead of only 18789. Uses the same lsof + net-probe detection chain with per-port error messaging.
ericksoa
left a comment
There was a problem hiding this comment.
LGTM — solid architecture with lsof + net probe fallback, good test coverage. Tested: port conflict correctly detected and blocks onboard, clean ports pass preflight.
…#251) * preflight: detect gateway port 18789 conflicts during onboard Add checkPortAvailable() to preflight checks. Uses lsof (primary) to identify the blocking process name and PID, with a Node.js net probe fallback for cross-platform EADDRINUSE detection. Integrated into onboard step 1 before gateway operations so users get a clear error instead of a dead dashboard. Closes NVIDIA#18 * preflight: fall through to net probe when lsof returns empty Non-root lsof cannot see root-owned listeners (/proc/[pid]/fd returns EACCES). Empty lsof output is not authoritative — fall through to the net probe whose bind() checks the kernel socket table directly. Added test covering the exact scenario: lsof empty + port occupied = net probe catches EADDRINUSE. * test: await srv.close() in finally blocks to prevent handle leakage * preflight: also check gateway port 8080 before onboard Check both required ports (8080 for OpenShell gateway, 18789 for dashboard) during preflight instead of only 18789. Uses the same lsof + net-probe detection chain with per-port error messaging.
…#251) * preflight: detect gateway port 18789 conflicts during onboard Add checkPortAvailable() to preflight checks. Uses lsof (primary) to identify the blocking process name and PID, with a Node.js net probe fallback for cross-platform EADDRINUSE detection. Integrated into onboard step 1 before gateway operations so users get a clear error instead of a dead dashboard. Closes NVIDIA#18 * preflight: fall through to net probe when lsof returns empty Non-root lsof cannot see root-owned listeners (/proc/[pid]/fd returns EACCES). Empty lsof output is not authoritative — fall through to the net probe whose bind() checks the kernel socket table directly. Added test covering the exact scenario: lsof empty + port occupied = net probe catches EADDRINUSE. * test: await srv.close() in finally blocks to prevent handle leakage * preflight: also check gateway port 8080 before onboard Check both required ports (8080 for OpenShell gateway, 18789 for dashboard) during preflight instead of only 18789. Uses the same lsof + net-probe detection chain with per-port error messaging.
Summary
checkPortAvailable()tobin/lib/preflight.js— useslsof(primary) to identify blocking process name + PID, with a Node.jsnetprobe fallback for cross-platformEADDRINUSEdetectionCloses #18 — detect host
openclaw-gateway.serviceport conflicts and fail loudly before onboarding continuesAddresses #51 —
install.shassumes port 8080 isn't already in useProblem
nemoclaw onboardstarts the gateway on port 8080 and the dashboard on port 18789. If something else (e.g., a leftoveropenclaw-gateway.service, another container, or a dev server) is already listening, the gateway bind fails silently and the user ends up with a non-functional dashboard.Detection chain
lsof -i :PORT -sTCP:LISTEN -P -n— works on Linux + macOS, returns process name + PID. Uses-sTCP:LISTENto avoid false positives from TIME_WAIT connections.net.createServer().listen(port)— cross-platform, detects EADDRINUSE but can't identify the process. Catches cases where non-root users can't see root-owned listeners via lsof.lsofis not installed, falls through to the net probe automaticallyTest plan
Automated Tests
Manual verification
Closes #51
Closes #18