fix: resolve silent output and remove nc dependency by Iamkewl · Pull Request #3669 · NVIDIA/NemoClaw

Iamkewl · 2026-05-17T22:40:12Z

Summary

Replaces the platform-dependent nc readiness probe with an inline Node TCP probe, aligns timeouts, hardens tests to avoid CI flakes, and adds CI reliability fixes.

Related Issue

Fixes #3665

Changes

Replaced nc with a Node-based TCP probe in src/lib/core/wait.ts: the probe runs via spawnSync(process.execPath, ["-e", probeScript, port]), aligns socket.setTimeout with pollIntervalMs, makes done() idempotent (listeners are removed before the socket closes), prefers socket.end(), and enforces a timeout on the child process.
Made the Ollama proxy startup timeout configurable via NEMOCLAW_OLLAMA_PROXY_STARTUP_TIMEOUT (default 10 seconds) and used it in src/lib/inference/ollama/proxy.ts; added the variable to ci/env-var-doc-allowlist.json.
Tests & mocks: updated test/onboard-selection.test.ts mocks to accept process.execPath probe invocations; added loopback integration tests for waitForPort in test/wait.test.ts; changed test/skills-frontmatter.test.ts to fail when generated skills are missing unless NEMOCLAW_TEST_ALLOW_MISSING_GENERATED_SKILLS=1.
CI: added retry for hadolint download in .github/actions/basic-checks/action.yaml to reduce transient failures.
Misc: hardened scripts/find-source-shape-tests.ts to ignore ENOENT/EACCES while walking directories.
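
For readers who want a concrete picture of the probe described in the first item, here is a minimal sketch, not the code from this PR. The names `PROBE_SCRIPT` and `probePortSync` are hypothetical, the socket timeout is passed as a second argument for illustration, and the sketch tears the socket down with `destroy()` for simplicity where the PR says it prefers `socket.end()`.

```typescript
import { spawnSync } from "node:child_process";

// Inline script run in a child Node process. With `node -e`, extra
// arguments start at process.argv[1]: the port, then the timeout in ms.
const PROBE_SCRIPT = `
const net = require("node:net");
const port = Number(process.argv[1]);
const timeoutMs = Number(process.argv[2]);
const socket = net.connect({ host: "127.0.0.1", port });
let finished = false;
function done(code) {
  if (finished) return;         // idempotent: only the first outcome counts
  finished = true;
  socket.removeAllListeners();  // drop handlers before closing
  socket.destroy();             // simplification; see the lead-in
  process.exitCode = code;
}
socket.setTimeout(timeoutMs);
socket.once("connect", () => done(0));
socket.once("timeout", () => done(1));
socket.once("error", () => done(1));
`;

// Returns true when a TCP connection to 127.0.0.1:port succeeds.
// Hypothetical helper; the real waitForPort signature may differ.
export function probePortSync(port: number, socketTimeoutMs = 200): boolean {
  const result = spawnSync(
    process.execPath,
    ["-e", PROBE_SCRIPT, String(port), String(socketTimeoutMs)],
    // The child timeout bounds the whole probe even if the socket hangs.
    { stdio: "ignore", timeout: socketTimeoutMs + 2000 },
  );
  return result.status === 0; // status is null when the child was killed
}
```

A polling loop would call `probePortSync` repeatedly, sleeping for the poll interval between attempts, until it returns true or an overall deadline passes.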

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
make docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Your Name your-email@example.com

Summary by CodeRabbit

New Features
- Ollama auth-proxy startup timeout is now configurable via environment variable for slow environments.
Bug Fixes
- Improved CI robustness with retry logic and checksum validation for dependency downloads.
- Enhanced error handling for filesystem traversal operations.
- Port reachability checks now use native Node.js for better reliability.
Tests
- Expanded test coverage for port availability checks and conditional test execution based on environment configuration.

copy-pr-bot · 2026-05-17T22:40:15Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-05-17T22:40:22Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

📝 Walkthrough

Adds Node type refs and guarded filesystem traversal, guards skills frontmatter tests when generated skills are absent, replaces an external nc port probe with an inline Node net probe invoked via process.execPath, makes the Ollama proxy startup timeout configurable, updates tests to match the new probe, and adds hadolint download retries.
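
The guarded traversal mentioned above can be sketched roughly as follows. This is an illustration under assumptions, not the contents of scripts/find-source-shape-tests.ts: the function name `walkFiles` is hypothetical, and treating only `ENOENT`/`EACCES` as ignorable for both `readdirSync` and `statSync` is a choice made for the sketch.

```typescript
import { readdirSync, statSync } from "node:fs";
import { join } from "node:path";

// Error codes treated as "skip this entry" rather than fatal.
const IGNORABLE = new Set(["ENOENT", "EACCES"]);

function isIgnorable(err: unknown): boolean {
  const code = (err as NodeJS.ErrnoException | null)?.code;
  return typeof code === "string" && IGNORABLE.has(code);
}

// Recursively collects file paths under `dir`, tolerating entries that
// vanish or are unreadable while the walk is in progress.
export function walkFiles(dir: string, out: string[] = []): string[] {
  let names: string[];
  try {
    names = readdirSync(dir);
  } catch (err) {
    if (isIgnorable(err)) return out; // unreadable or missing directory
    throw err;
  }
  for (const name of names) {
    const full = join(dir, name);
    let isDir: boolean;
    try {
      isDir = statSync(full).isDirectory();
    } catch (err) {
      if (isIgnorable(err)) continue; // entry removed or not accessible
      throw err;
    }
    if (isDir) walkFiles(full, out);
    else out.push(full);
  }
  return out;
}
```

Any other error code is rethrown, so unexpected failures still surface instead of being silently swallowed.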

Changes

Error Resilience Improvements

| Layer / File(s) | Summary |
| --- | --- |
| Node types and guarded directory traversal `scripts/find-source-shape-tests.ts` | Added `/// <reference types="node" />` and wrapped `readdirSync`/`statSync` calls in try/catch to return early on unreadable directories and skip entries when `stat` fails with `ENOENT`/`EACCES`. |
| Test suite existence guard `test/skills-frontmatter.test.ts` | Added `fs.existsSync(skillsRoot)` check that registers a single placeholder assertion and returns early when generated skills are missing, gated by `NEMOCLAW_TEST_ALLOW_MISSING_GENERATED_SKILLS`. |
| Replace nc-based port probe with Node net probe `src/lib/core/wait.ts` | `waitForPort` now runs a small Node `net` probe via `process.execPath` and uses the probe exit status to determine TCP reachability. |
| Test coverage and spawnSync mocks update `test/wait.test.ts`, `test/onboard-selection.test.ts` | Added network-based waitForPort tests with ephemeral servers; updated embedded `spawnSync` mocks in onboard tests to treat probes run via `process.execPath` as successful alongside `nc`. |
| Startup timeout env and CI retry `src/lib/inference/ollama/proxy.ts`, `ci/env-var-doc-allowlist.json`, `.github/actions/basic-checks/action.yaml` | `startOllamaAuthProxy()` reads `NEMOCLAW_OLLAMA_PROXY_STARTUP_TIMEOUT` (default 10s) for readiness checks; CI hadolint download now retries up to 3 times with SHA256 validation; allowlist documents the new env var. |
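
The retry-with-SHA256-validation behavior in the last row lives in a shell step of the CI action. The same logic is sketched here in TypeScript purely for illustration. The `download` callback and the function names are hypothetical, and the rule that empty hashes never count as a match is an assumption of the sketch.

```typescript
import { createHash } from "node:crypto";

export function sha256Hex(data: Buffer | string): string {
  return createHash("sha256").update(data).digest("hex");
}

// True only when both hashes are non-empty and equal, so two empty
// strings can never be mistaken for a successful verification.
export function checksumMatches(expected: string, actual: string): boolean {
  return expected.length > 0 && actual.length > 0 && expected === actual;
}

// Tries the download up to `maxAttempts` times and returns the body only
// once its SHA256 matches `expected`; otherwise throws.
export function fetchVerified(
  download: () => Buffer, // hypothetical downloader; may throw on transient failure
  expected: string,
  maxAttempts = 3,
): Buffer {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const body = download();
      if (checksumMatches(expected, sha256Hex(body))) return body;
    } catch {
      // Transient failure: fall through and retry.
    }
  }
  throw new Error(`download not verified after ${maxAttempts} attempts`);
}
```

Returning only from inside the verified branch plays the role of an explicit "verified" flag: there is no path out of the function that yields data without a matching checksum.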

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

NVIDIA/NemoClaw#3444: Related changes touching src/lib/inference/ollama/proxy.ts and proxy health probing.
NVIDIA/NemoClaw#3636: Related edits to scripts/find-source-shape-tests.ts.
NVIDIA/NemoClaw#2442: Prior changes to waitForPort readiness-probe behavior.

Suggested labels

configuration, enhancement: testing, Platform: All, v0.0.44

Suggested reviewers

jyaunches
prekshivyas
cjagwani

Poem

🐰 I hop through folders, stat and try,
If a port won't answer, I spawn Node and pry.
Tests that miss generated things politely pause,
CI retries dance to checksum laws.
A little rabbit cheered the resilient cause.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'fix: resolve silent output and remove nc dependency' directly addresses the main technical change: replacing the nc (netcat) command-line tool with a native Node.js TCP probe. |
| Linked Issues check | ✅ Passed | The PR code changes comprehensively address the linked issue `#3665` objectives: it removes the platform-dependent nc dependency with native Node TCP probing, makes Ollama proxy startup timeout configurable for slow/WSL2 environments, hardens directory walking to handle filesystem errors, and adds integration tests for the wait functionality. |
| Out of Scope Changes check | ✅ Passed | All code changes are directly scoped to fixing issue `#3665`: replacing nc with Node TCP probes, making timeout configurable, hardening error handling in file walking, updating corresponding tests, and improving CI download reliability. |

coderabbitai

🧹 Nitpick comments (1)

test/skills-frontmatter.test.ts (1)

48-53: ⚡ Quick win

Remove the duplicated skillsRoot existence guard.

This block is a duplicate of Lines 41-46 and is redundant/unreachable in practice.

Proposed cleanup

-  if (!fs.existsSync(skillsRoot)) {
-    it("no generated skills present (CI) — skipping", () => {
-      expect(true).toBe(true);
-    });
-    return;
-  }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/skills-frontmatter.test.ts` around lines 48 - 53, The duplicate
existence guard for skillsRoot (the block starting with if
(!fs.existsSync(skillsRoot)) { it("no generated skills present (CI) — skipping",
...); return; }) should be removed because it repeats the earlier guard; delete
this second block so only the initial check for skillsRoot remains in the test
file (keep the first occurrence that guards the test suite and remove the
redundant copy to avoid unreachable/duplicated logic).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 829d456c-baa2-4c7c-8cf7-8519bd4d2aa9

📥 Commits

Reviewing files that changed from the base of the PR and between 647566a and efeb4b2.

📒 Files selected for processing (2)

scripts/find-source-shape-tests.ts
test/skills-frontmatter.test.ts

coderabbitai

🧹 Nitpick comments (1)

src/lib/core/wait.ts (1)
52-55: 💤 Low value

Consider aligning socket timeout with the polling interval.

The socket timeout is hardcoded to 1000ms, but the probe is invoked every 200ms (line 64). Each probe attempt can block for up to 1 second, which reduces the effective polling frequency from the expected 5 Hz down to ~1 Hz. This means fewer probe attempts occur within the overall timeout window.

For tighter polling behavior, consider reducing the socket timeout to 200-500ms to match the polling interval.
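
The arithmetic behind this suggestion can be worked through with a small helper. This is a sketch under a stated assumption: each failed attempt blocks for the full socket timeout and is then followed by one poll-interval sleep. The real loop in src/lib/core/wait.ts may be structured differently, and the function names are hypothetical.

```typescript
// Worst-case duration of one failed attempt: the probe blocks for the
// socket timeout, then the loop sleeps for the poll interval (assumed model).
export function worstCaseCycleMs(pollIntervalMs: number, socketTimeoutMs: number): number {
  return socketTimeoutMs + pollIntervalMs;
}

// Number of complete worst-case attempts that fit inside the deadline.
export function worstCaseAttempts(
  deadlineMs: number,
  pollIntervalMs: number,
  socketTimeoutMs: number,
): number {
  return Math.floor(deadlineMs / worstCaseCycleMs(pollIntervalMs, socketTimeoutMs));
}

// With a 10 s deadline and a 200 ms poll interval:
const withHardcodedTimeout = worstCaseAttempts(10_000, 200, 1000); // 1200 ms per cycle
const withAlignedTimeout = worstCaseAttempts(10_000, 200, 200);    // 400 ms per cycle
console.log({ withHardcodedTimeout, withAlignedTimeout });
```

Under this model the hardcoded 1000 ms timeout gives a worst-case rate of roughly 0.8 attempts per second, which matches the "~1 Hz" estimate, while an aligned 200 ms timeout gives 2.5 attempts per second.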
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/core/wait.ts` around lines 52 - 55, The socket probe currently uses a
hardcoded 1000ms timeout (socket.setTimeout(1000)), which blocks each probe and
undermines the 200ms polling interval; change socket.setTimeout to match the
polling interval (e.g., 200–500ms) or use the existing poll interval
constant/variable instead of 1000 so socket.setTimeout(...) aligns with the
probe frequency; update the code around socket.setTimeout and its related
handlers (socket.once('connect'), socket.once('timeout'), socket.once('error'))
to use that interval.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ab07f6d6-012c-433e-91fc-6cd9def6555e

📥 Commits

Reviewing files that changed from the base of the PR and between 0a48cc5 and 59e2d5a.

📒 Files selected for processing (1)

src/lib/core/wait.ts

Iamkewl · 2026-05-18T13:50:27Z

@coderabbitai review

coderabbitai · 2026-05-18T13:50:32Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/actions/basic-checks/action.yaml:
- Around line 22-27: The checksum verification can silently succeed when both
EXPECTED and ACTUAL are empty; update the hadolint download loop to set and
check an explicit verified flag (e.g., verified=1) only when both EXPECTED and
ACTUAL are populated and match, and after the loop replace the current guard
that compares EXPECTED and ACTUAL with a conditional that fails if verified is
not set or if EXPECTED is empty (use [ -n "$EXPECTED" ] and the verified flag)
so the script errors out on missing checksum or failed verification; adjust the
loop body that currently computes EXPECTED and ACTUAL to assign those variables
conditionally and set verified when the hash check passes.

In `@src/lib/inference/ollama/proxy.ts`:
- Around line 213-214: The parsed proxyStartupTimeout can be NaN if
process.env.NEMOCLAW_OLLAMA_PROXY_STARTUP_TIMEOUT is non-numeric, causing
waitForPort(OLLAMA_PROXY_PORT, proxyStartupTimeout) to behave incorrectly;
update the code around Number.parseInt(...) to validate the parsed value (use
Number.parseInt(..., 10) result checked with Number.isFinite/Number.isNaN or
isNaN) and fall back to the default 10 (or clamp to a positive integer) when the
env value is invalid or <= 0 before calling waitForPort; ensure you keep the
existing symbols proxyStartupTimeout, OLLAMA_PROXY_PORT and waitForPort when
making the change.
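
One way to do the validation requested for `proxyStartupTimeout` is sketched below. The helper name `parseStartupTimeout` and the constant name are hypothetical, and the surrounding code in proxy.ts is not shown in this thread, so treat this as an illustration of the fallback logic only.

```typescript
const DEFAULT_STARTUP_TIMEOUT_S = 10;

// Parses a timeout in whole seconds, falling back to the default when the
// value is missing, non-numeric, zero, or negative.
export function parseStartupTimeout(
  raw: string | undefined,
  fallback: number = DEFAULT_STARTUP_TIMEOUT_S,
): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Reading the variable named in this PR; unset or invalid values yield 10.
const proxyStartupTimeout = parseStartupTimeout(
  process.env.NEMOCLAW_OLLAMA_PROXY_STARTUP_TIMEOUT,
);
console.log(proxyStartupTimeout);
```

Note that `Number.parseInt` accepts a numeric prefix, so a value such as `30s` parses as 30. A stricter check would be needed if such input should be rejected.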

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: c1dd8a3d-a52d-40bb-aa93-393505d1086b

📥 Commits

Reviewing files that changed from the base of the PR and between 0a48cc5 and 383b070.

📒 Files selected for processing (6)

.github/actions/basic-checks/action.yaml
ci/env-var-doc-allowlist.json
scripts/find-source-shape-tests.ts
src/lib/core/wait.ts
src/lib/inference/ollama/proxy.ts
test/onboard-selection.test.ts

✅ Files skipped from review due to trivial changes (1)

ci/env-var-doc-allowlist.json

🚧 Files skipped from review as they are similar to previous changes (2)

src/lib/core/wait.ts
scripts/find-source-shape-tests.ts

wscurran · 2026-05-18T14:44:28Z

✨ Thanks for submitting this detailed PR to fix the silent output issue and remove the dependency on the nc utility. This change aims to improve the reliability of the CLI by ensuring stdio is flushed and implementing a native Node.js TCP probe.

Related open issues:

#3665 Nemoclaw onboarding - gateway connection issue

Iamkewl · 2026-05-18T15:55:37Z

@coderabbitai review

coderabbitai · 2026-05-18T15:55:42Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai Bot reviewed May 17, 2026

Iamkewl mentioned this pull request May 17, 2026

Nemoclaw onboarding - gateway connection issue #3665

Open

2 tasks

Iamkewl force-pushed the fix/issue-3665-pr branch from d8085d2 to 0a48cc5 Compare May 18, 2026 06:02

coderabbitai Bot reviewed May 18, 2026

Iamkewl marked this pull request as draft May 18, 2026 13:36

Iamkewl marked this pull request as ready for review May 18, 2026 13:49

coderabbitai Bot reviewed May 18, 2026

Comment thread .github/actions/basic-checks/action.yaml

Comment thread src/lib/inference/ollama/proxy.ts Outdated

Iamkewl marked this pull request as draft May 18, 2026 13:59

wscurran added the fix and NemoClaw CLI labels May 18, 2026

Iamkewl marked this pull request as ready for review May 18, 2026 15:13

fix: resolve silent output and remove nc dependency

37f8eaa

Iamkewl force-pushed the fix/issue-3665-pr branch from 0ac306d to 37f8eaa Compare May 20, 2026 22:12

github-actions Bot mentioned this pull request May 21, 2026

fix(installer): preserve npm lockfiles during install #3840

Closed

Iamkewl mentioned this pull request May 21, 2026

fix(ci): use Node TCP probe for waitForPort; increase Ollama proxy wait timeout Iamkewl/NemoClaw#7

Open

github-actions Bot mentioned this pull request May 22, 2026

fix(installer): preserve npm lockfiles during install #4029

Open
