
refactor(onboard): extract modules from onboard.ts (Steps 1-3, rebased) #2489

Closed

jyaunches wants to merge 3 commits into main from refactor/onboard-extract-modules-v2

Conversation


@jyaunches (Contributor) commented Apr 26, 2026

Summary

Rebased onto current main (post-#2441 merge). Supersedes #2087 — same work, fresh branch to avoid force-push restriction.

Steps 1–3 of the original extraction plan:

  • Extract onboard-providers.ts — provider metadata, CRUD, labels, inference config
  • Extract onboard-ollama-proxy.ts — proxy lifecycle, model pull/validate
  • Extract onboard-inference-probes.ts — endpoint validation probes (OpenAI, Anthropic, Gemini)

Reduces onboard.ts by ~1,000 lines. All 3 new modules are under 400 lines. No behavior changes — all existing exports remain accessible via re-exports and thin wrappers.
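
For orientation, the back-compat surface in onboard.ts looks roughly like the sketch below; the real file wires this through CJS require(), and the exact export list here is an assumption beyond the symbols named in this description:

```ts
// Illustrative sketch only — onboard.ts delegates to the extracted modules and
// re-exports them so existing importers keep working unchanged.
import { REMOTE_PROVIDER_CONFIG, upsertProvider } from "./onboard-providers";
import { ensureOllamaAuthProxy } from "./onboard-ollama-proxy";
import { probeOpenAiLikeEndpoint } from "./onboard-inference-probes";

export { REMOTE_PROVIDER_CONFIG, upsertProvider, ensureOllamaAuthProxy, probeOpenAiLikeEndpoint };
```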

Step 4 (dashboard extraction) was dropped — main already landed dashboard-contract.ts, dashboard-health.ts, and dashboard-recover.ts (#2398) which supersede that extraction.

Steps 5–10 (gateway, preflight, messaging, policies, inference-setup, sandbox) remain deferred per the original plan.

Downstream: #2306

#2306 (extract rebuild/recreate path + canonicalize credential resolution) builds directly on top of this PR. It imports REMOTE_PROVIDER_CONFIG, upsertProvider, and ensureOllamaAuthProxy from the new extracted modules via onboard.ts re-exports. Merging this PR first gives #2306 a clean module surface to build on.

Rebase notes

  • Ported all main-side refinements into the extracted modules (e.g., startOllamaAuthProxy now returns boolean, getProbeAuthMode always returns undefined)
  • Added type annotations for the CJS require() imports to satisfy strict TypeScript
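
A minimal sketch of the typed-require pattern from the last note, mirroring the destructure quoted later in the review of src/lib/onboard.ts (the type shown is illustrative):

```ts
// Sketch: annotate the CJS require() result so strict TypeScript can check the
// destructured helpers; only getProviderLabel is shown here.
const onboardProviders = require("./onboard-providers");

const { getProviderLabel } = onboardProviders as {
  getProviderLabel: (key: string) => string;
};

const label = getProviderLabel("openai"); // "openai" is just an example key
```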

Test plan

  • npx tsc -p tsconfig.src.json --noEmit compiles cleanly
  • Affected tests pass: gemini-probe-auth, credential-exposure, wsl2-probe-timeout
  • Full npm test (CI)
  • Manual smoke test: nemoclaw onboard completes successfully

Co-authored-by: Aaron Erickson <aerickson@nvidia.com>

Summary by CodeRabbit

  • New Features

    • Enhanced inference endpoint validation during provider and model selection, with improved error handling for provider-specific failures.
    • Expanded Ollama integration with improved model selection, proxy management, and model preparation capabilities.
    • Refined provider configuration and setup handling across multiple endpoint types.
  • Tests

    • Updated test coverage to reflect internal code organization changes while maintaining validation of credential handling and endpoint probing behavior.

Move provider metadata constants (REMOTE_PROVIDER_CONFIG, endpoint URLs,
LOCAL_INFERENCE_PROVIDERS, DISCORD_SNOWFLAKE_RE), lookup helpers
(getEffectiveProviderName, getNonInteractive*, getRequested*Hint),
gateway CRUD (upsertProvider, providerExistsInGateway,
upsertMessagingProviders, buildProviderArgs), and
getSandboxInferenceConfig into a dedicated onboard-providers module.

Add getProviderLabel() to consolidate the scattered if/else provider
name→label chains (used in printDashboard).

Functions that need runOpenshell accept it as a last parameter
(dependency injection) to avoid circular requires. onboard.ts retains
thin wrappers with the original signatures.
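
For illustration, the injection shape reads roughly like the sketch below; the runner type, the gateway arguments, and the local wrapper name are assumptions, while providerExistsInGateway and _runOpenshell are the identifiers the updated tests assert on:

```ts
// Sketch of the dependency-injection pattern described above.
type OpenshellRunner = (args: string[]) => { code: number; stdout: string };
declare const _runOpenshell: OpenshellRunner; // provided by onboard.ts

// onboard-providers.ts side: the runner arrives as the last parameter, so this
// module never needs to require() onboard.ts back (no circular dependency).
export function providerExistsInGateway(name: string, runOpenshell: OpenshellRunner): boolean {
  const result = runOpenshell(["gateway", "providers", "list"]); // placeholder arguments
  return result.code === 0 && result.stdout.includes(name);
}

// onboard.ts side: a thin wrapper keeps the original one-argument signature.
const providerExistsInGatewayLocal = (name: string): boolean =>
  providerExistsInGateway(name, _runOpenshell);
```

Passing the runner explicitly keeps the extracted module independent of onboard.ts, which is what avoids the circular require.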

Update source-scanning tests to read onboard-providers.ts where the
code now lives.

Move the entire Ollama auth-proxy subsystem into a dedicated module:
token/PID persistence, proxy spawn/kill lifecycle, model prompt/pull,
and the exposure warning.

Zero coupling to other onboard logic — all dependencies (runner, ports,
local-inference, credentials, model-prompts) are imported directly.

Move inference endpoint probe functions (parseJsonObject,
hasResponsesToolCall, shouldRequireResponsesToolCalling,
getProbeAuthMode, getValidationProbeCurlArgs, probeResponsesToolCalling,
probeOpenAiLikeEndpoint, probeAnthropicEndpoint) into a dedicated
module.

All dependencies (http-probe, credentials, platform, validation) are
imported directly — no circular deps. The 4 validate* wrapper functions
remain in onboard.ts for now as they depend on promptValidationRecovery;
they'll consolidate when setupNim moves in a later step.
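
Roughly, the resulting split keeps wrappers of this shape in onboard.ts (a sketch only; the wrapper name, the probe's return shape, and promptValidationRecovery's signature are assumptions):

```ts
// Sketch: probeOpenAiLikeEndpoint now lives in onboard-inference-probes per this
// commit; its return shape and promptValidationRecovery's signature are assumed.
declare function probeOpenAiLikeEndpoint(
  endpointUrl: string,
  model: string,
  apiKey: string,
): { ok: boolean; detail?: string };
declare function promptValidationRecovery(failure: { ok: boolean; detail?: string }): Promise<boolean>;

async function validateOpenAiLikeEndpoint(
  endpointUrl: string,
  model: string,
  apiKey: string,
): Promise<boolean> {
  // Raw probing is delegated to the extracted module...
  const probe = probeOpenAiLikeEndpoint(endpointUrl, model, apiKey);
  if (probe.ok) return true;
  // ...while the interactive recovery loop stays in onboard.ts for now.
  return promptValidationRecovery(probe);
}
```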

Update source-scanning tests to read from the new compiled file.

coderabbitai Bot commented Apr 26, 2026

📝 Walkthrough

The pull request modularizes a large onboarding file (src/lib/onboard.ts) by extracting domain-focused helpers into three new modules: inference endpoint probing, Ollama proxy management, and provider configuration. The original file delegates to these modules while maintaining backward compatibility. A refactoring plan document specifies the staged extraction approach.

Changes

  • Refactoring Plan Documentation (.claude/refactor-onboard-plan.md): New documentation specifying an incremental refactor strategy to break up onboard.ts into ~11 domain-focused modules, including operational constraints (no behavior changes, backward compatibility via re-exports, staged commits, verification checklist).
  • Inference Probe Module (src/lib/onboard-inference-probes.ts): New module providing endpoint validation utilities: JSON parsing, Responses tool-call detection, probes for OpenAI-like and Anthropic endpoints with retry logic on timeout/connection errors, WSL-specific curl timing, and NVCF error detection.
  • Ollama Proxy Module (src/lib/onboard-ollama-proxy.ts): New module managing the Ollama auth-proxy lifecycle (token generation, PID persistence, process spawning/cleanup), model selection via prompts, and model preparation (pull, warmup, validation with 10-minute timeout).
  • Provider Configuration Module (src/lib/onboard-providers.ts): New module defining endpoint constants, provider metadata, and helpers: provider label/name resolution, provider CRUD via the gateway, messaging-provider upserts, non-interactive provider/model selection, and sandbox inference routing configuration.
  • Refactored Main Module (src/lib/onboard.ts): Delegates inference probes, Ollama proxy management, and provider configuration to the new modules; removes ~880 lines of extracted code while keeping existing public exports and backward compatibility.
  • Test Updates (test/credential-exposure.test.ts, test/onboard.test.ts, test/wsl2-probe-timeout.test.ts): Updated test source file references and assertion patterns to match the refactored module structure; existing validation logic is maintained.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 The warren grows orderly, modules align,
Breaking monoliths into tasks sublime,
Ollama hops freely, probes take their place,
Providers now flourish in structured space! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 52.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (4 passed)

  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title accurately describes the main refactoring effort: extracting three modules (providers, ollama-proxy, inference-probes) from onboard.ts as Steps 1-3 of a structured plan. It is specific, concise, and clearly communicates the primary change.
  • Linked Issues Check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check (✅ Passed): Check skipped because no linked issues were found for this pull request.


coderabbitai Bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (1)
test/onboard.test.ts (1)

2371-2371: Tighten the existence-check regex to assert injected runner usage.

The current pattern is broad and can pass without validating the DI call shape introduced by the refactor. Consider matching both parameters explicitly.

♻️ Suggested assertion tightening
-    assert.match(source, /providerExistsInGateway\(name/);
+    assert.match(source, /providerExistsInGateway\(name,\s*_runOpenshell\)/);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/onboard.test.ts` at line 2371, The test's regex assert.match(source,
/providerExistsInGateway\(name/) is too loose; update it to assert the
injected-runner call shape by matching both parameters, e.g., require the
function call to include the second parameter (runner) by changing the pattern
to something like /providerExistsInGateway\(name,\s*runner\)/ so the assertion
verifies the DI call form introduced by the refactor (look for
providerExistsInGateway and runner in the same call).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2ba835a1-84fc-4ee7-bd3f-fae48484c833

📥 Commits

Reviewing files that changed from the base of the PR and between f439189 and 4bf6f38.

📒 Files selected for processing (8)
  • .claude/refactor-onboard-plan.md
  • src/lib/onboard-inference-probes.ts
  • src/lib/onboard-ollama-proxy.ts
  • src/lib/onboard-providers.ts
  • src/lib/onboard.ts
  • test/credential-exposure.test.ts
  • test/onboard.test.ts
  • test/wsl2-probe-timeout.test.ts

Comment on lines +1 to +2
# Refactor `onboard.ts` (6,382 lines) into ~11 focused modules


⚠️ Potential issue | 🟠 Major

Add the required SPDX HTML header.

This new Markdown file is missing the repository-required SPDX header.

Proposed fix
+<!-- SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -->
+<!-- SPDX-License-Identifier: Apache-2.0 -->
+
 # Refactor `onboard.ts` (6,382 lines) into ~11 focused modules

As per coding guidelines, **/*.{js,mjs,ts,tsx,sh,md}: Every source file must include an SPDX license header: // SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. and // SPDX-License-Identifier: Apache-2.0. Use # comments for shell scripts and HTML comments for Markdown.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/refactor-onboard-plan.md around lines 1 - 2, Add the required SPDX
HTML header to the top of the Markdown file by inserting two HTML comment lines:
one with the exact copyright text "SPDX-FileCopyrightText: Copyright (c) 2026
NVIDIA CORPORATION & AFFILIATES. All rights reserved." and one with the license
identifier "SPDX-License-Identifier: Apache-2.0"; place them as HTML comments
(<!-- ... -->) at the very beginning of .claude/refactor-onboard-plan.md so the
file contains the repository-required SPDX header.

Comment on lines +101 to +113
### Step 4: Extract `onboard-dashboard.ts` (~200 lines)

**Move from `onboard.ts`:**
- `CONTROL_UI_PORT` (L5552)
- `ensureDashboardForward()` (L5558-5578)
- `findOpenclawJsonPath()` (L5580-5593)
- `fetchGatewayAuthTokenFromSandbox()` (L5599-5622)
- `getDashboardForwardPort()` / `getDashboardForwardTarget()` / `getDashboardForwardStartCommand()` (L5626-5651)
- `buildAuthenticatedDashboardUrl()` (L5653-5656)
- `getWslHostAddress()` (L5658-5672)
- `getDashboardAccessInfo()` / `getDashboardGuidanceLines()` (L5674-5714)
- `printDashboard()` (L5716-5790) — use `getProviderLabel()` from Step 1


⚠️ Potential issue | 🟡 Minor

Remove or rewrite Step 4 in this plan.

The PR objective says dashboard extraction was already dropped because those files landed in #2398, but this document still schedules that work as a future step. Merging it as-is will make the plan stale immediately.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/refactor-onboard-plan.md around lines 101 - 113, Step 4 should be
removed or rewritten because the dashboard extraction was already completed in
PR `#2398`; update the plan to delete the entire "Step 4: Extract
`onboard-dashboard.ts`" section (including all listed items: CONTROL_UI_PORT,
ensureDashboardForward(), findOpenclawJsonPath(),
fetchGatewayAuthTokenFromSandbox(), getDashboardForwardPort(),
getDashboardForwardTarget(), getDashboardForwardStartCommand(),
buildAuthenticatedDashboardUrl(), getWslHostAddress(), getDashboardAccessInfo(),
getDashboardGuidanceLines(), and printDashboard()) or rewrite it to reference
the landed changes and point to the existing landed artifacts (use
getProviderLabel() as noted) so the plan no longer schedules already-completed
work.

Comment on lines +135 to +136
// eslint-disable-next-line complexity
function probeOpenAiLikeEndpoint(endpointUrl, model, apiKey, options = {}) {

🛠️ Refactor suggestion | 🟠 Major

Don’t suppress the complexity guard here.

This extraction moved probe logic into its own module, but probeOpenAiLikeEndpoint() now bypasses the repo’s complexity limit instead of decomposing the retry/streaming/error-handling branches into helpers.

As per coding guidelines, {src,nemoclaw/src,scripts}/**/*.{js,ts,tsx}: TypeScript files must maintain cyclomatic complexity limit of 20, ratcheting down to 15. Enforced by ESLint.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard-inference-probes.ts` around lines 135 - 136, The function
probeOpenAiLikeEndpoint currently disables the complexity guard; instead
refactor it by extracting major branches into smaller helpers (e.g., create
helper functions such as attemptProbeForRetry, handleStreamResponse,
parseProbeError, and buildProbeOptions) and move the retry loop, streaming
response handling, and error-handling branches into those helpers so
probeOpenAiLikeEndpoint becomes a thin coordinator calling them; remove the
eslint-disable-next-line complexity comment and ensure the top-level
probeOpenAiLikeEndpoint is under the cyclomatic complexity limit by delegating
logic to these named functions (referencing probeOpenAiLikeEndpoint,
handleStreamResponse, parseProbeError, and attemptProbeForRetry to locate and
split the code).

Comment on lines +272 to +290
if (failures.length > 0 && isTimeoutOrConnFailure(failures[0].curlStatus)) {
  retriedAfterTimeout = true;
  const baseArgs = getValidationProbeCurlArgs();
  const doubledArgs = baseArgs.map((arg) =>
    /^\d+$/.test(arg) ? String(Number(arg) * 2) : arg,
  );
  const retryResult = runCurlProbe([
    "-sS",
    ...doubledArgs,
    "-H",
    "Content-Type: application/json",
    ...(apiKey ? ["-H", `Authorization: Bearer ${normalizeCredentialValue(apiKey)}`] : []),
    "-d",
    JSON.stringify({
      model,
      messages: [{ role: "user", content: "Reply with exactly: OK" }],
    }),
    `${String(endpointUrl).replace(/\/+$/, "")}/chat/completions`,
  ]);

⚠️ Potential issue | 🟡 Minor

Keep the retry probe aligned with authMode.

The main probe path honors options.authMode === "query-param", but the retry hardcodes Bearer auth and a plain /chat/completions URL. If a caller selects query-param auth, the retry can never succeed even when the timeout was transient.

Proposed fix
     const retryResult = runCurlProbe([
       "-sS",
       ...doubledArgs,
       "-H",
       "Content-Type: application/json",
-      ...(apiKey ? ["-H", `Authorization: Bearer ${normalizeCredentialValue(apiKey)}`] : []),
+      ...authHeader,
       "-d",
       JSON.stringify({
         model,
         messages: [{ role: "user", content: "Reply with exactly: OK" }],
       }),
-      `${String(endpointUrl).replace(/\/+$/, "")}/chat/completions`,
+      appendKey("/chat/completions"),
     ]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard-inference-probes.ts` around lines 272 - 290, The retry path
after isTimeoutOrConnFailure currently hardcodes Bearer header auth and a fixed
"/chat/completions" path, so it ignores options.authMode and will fail when
authMode === "query-param"; update the retry construction in the block that sets
retriedAfterTimeout so it reuses the same auth handling and endpoint formatting
as the main probe: call getValidationProbeCurlArgs() and then apply the same
logic that conditionally adds the Authorization header versus query-param (using
apiKey and normalizeCredentialValue(apiKey) and options.authMode), and build the
request URL the same way as the original probe (preserving any path adjustments
on endpointUrl) before calling runCurlProbe so the retry mirrors the main probe
behavior.

Comment on lines +327 to +344
function probeAnthropicEndpoint(endpointUrl, model, apiKey) {
  const result = runCurlProbe([
    "-sS",
    ...getCurlTimingArgs(),
    "-H",
    `x-api-key: ${normalizeCredentialValue(apiKey)}`,
    "-H",
    "anthropic-version: 2023-06-01",
    "-H",
    "content-type: application/json",
    "-d",
    JSON.stringify({
      model,
      max_tokens: 16,
      messages: [{ role: "user", content: "Reply with exactly: OK" }],
    }),
    `${String(endpointUrl).replace(/\/+$/, "")}/v1/messages`,
  ]);

⚠️ Potential issue | 🟠 Major

Use the short validation timeouts for the Anthropic probe too.

This path still calls getCurlTimingArgs(), so Anthropic validation can block much longer than the OpenAI-like probes. That undercuts the new “validation must not hang the wizard” behavior you just centralized in getValidationProbeCurlArgs().

Proposed fix
   const result = runCurlProbe([
     "-sS",
-    ...getCurlTimingArgs(),
+    ...getValidationProbeCurlArgs(),
     "-H",
     `x-api-key: ${normalizeCredentialValue(apiKey)}`,
Suggested change (complete replacement):

function probeAnthropicEndpoint(endpointUrl, model, apiKey) {
  const result = runCurlProbe([
    "-sS",
    ...getValidationProbeCurlArgs(),
    "-H",
    `x-api-key: ${normalizeCredentialValue(apiKey)}`,
    "-H",
    "anthropic-version: 2023-06-01",
    "-H",
    "content-type: application/json",
    "-d",
    JSON.stringify({
      model,
      max_tokens: 16,
      messages: [{ role: "user", content: "Reply with exactly: OK" }],
    }),
    `${String(endpointUrl).replace(/\/+$/, "")}/v1/messages`,
  ]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard-inference-probes.ts` around lines 327 - 344, The Anthropic
probe (probeAnthropicEndpoint) still uses getCurlTimingArgs(), which allows long
blocking validation; replace that call with getValidationProbeCurlArgs() so the
runCurlProbe invocation for Anthropic uses the short validation timeouts (keep
other args and the endpoint construction the same) and ensure you update any
imports/usage so probeAnthropicEndpoint calls getValidationProbeCurlArgs()
instead of getCurlTimingArgs().

Comment on lines +172 to +188
function ensureOllamaAuthProxy(): void {
  // Try to load persisted token first — if none, this isn't an Ollama setup.
  const token = loadPersistedProxyToken();
  if (!token) return;

  const pid = loadPersistedProxyPid();
  if (isOllamaProxyProcess(pid)) {
    ollamaProxyToken = token;
    return;
  }

  // Proxy not running — restart it with the persisted token.
  killStaleProxy();
  ollamaProxyToken = token;
  spawnOllamaAuthProxy(token);
  sleep(1);
}

⚠️ Potential issue | 🟠 Major

Verify the restart path before treating the proxy as recovered.

ensureOllamaAuthProxy() respawns the proxy but never checks whether the new process actually came up. On reconnect/resume, a failed restart now looks like success and the first real Ollama request fails much farther downstream.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard-ollama-proxy.ts` around lines 172 - 188,
ensureOllamaAuthProxy currently respawns the proxy without verifying it started;
change the restart path so you only treat it as recovered after confirming a new
proxy process is running. Modify spawnOllamaAuthProxy to return the child PID
(or have it expose the PID), then in ensureOllamaAuthProxy call
spawnOllamaAuthProxy(token) and poll isOllamaProxyProcess(newPid) with a short
retry loop and timeout (using sleep) before setting ollamaProxyToken; if the
check fails, do not set ollamaProxyToken, log the failure, and leave error
handling or retries to the caller. Ensure you still call killStaleProxy() before
spawning and use loadPersistedProxyPid()/loadPersistedProxyToken() to locate and
validate prior state.
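
For reference, a minimal sketch of the verified-restart shape this comment asks for; spawnOllamaAuthProxy returning the child PID is an assumed change, and the retry count and interval are illustrative:

```ts
// Sketch of the suggested fix, not the PR's current behavior.
declare function loadPersistedProxyToken(): string | null;
declare function loadPersistedProxyPid(): number | null;
declare function isOllamaProxyProcess(pid: number | null): boolean;
declare function killStaleProxy(): void;
declare function spawnOllamaAuthProxy(token: string): number; // assumed to return the child PID
declare function sleep(seconds: number): void;
let ollamaProxyToken: string | null = null;

function ensureOllamaAuthProxy(): void {
  const token = loadPersistedProxyToken();
  if (!token) return;

  if (isOllamaProxyProcess(loadPersistedProxyPid())) {
    ollamaProxyToken = token;
    return;
  }

  killStaleProxy();
  const newPid = spawnOllamaAuthProxy(token);
  for (let attempt = 0; attempt < 5; attempt += 1) {
    sleep(1);
    if (isOllamaProxyProcess(newPid)) {
      ollamaProxyToken = token; // only mark recovered once the new process is confirmed
      return;
    }
  }
  // Never confirmed: leave ollamaProxyToken unset so callers can surface the failure.
}
```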

Comment thread src/lib/onboard.ts
Comment on lines +90 to +110
const {
  REMOTE_PROVIDER_CONFIG,
  LOCAL_INFERENCE_PROVIDERS,
  DISCORD_SNOWFLAKE_RE,
  getProviderLabel,
  getEffectiveProviderName,
  getNonInteractiveProvider,
  getNonInteractiveModel,
  getSandboxInferenceConfig,
} = onboardProviders as {
  REMOTE_PROVIDER_CONFIG: Record<string, RemoteProviderConfigEntry>;
  LOCAL_INFERENCE_PROVIDERS: string[];
  DISCORD_SNOWFLAKE_RE: RegExp;
  getProviderLabel: (key: string) => string;
  getEffectiveProviderName: (key: string | null | undefined) => string | null;
  getNonInteractiveProvider: () => string | null;
  getNonInteractiveModel: (providerKey: string) => string | null;
  getSandboxInferenceConfig: (model: string, provider?: string | null, preferredInferenceApi?: string | null) => {
    providerKey: string; primaryModelRef: string; inferenceBaseUrl: string; inferenceApi: string; inferenceCompat: LooseObject | null;
  };
};

⚠️ Potential issue | 🔴 Critical

Import the extracted endpoint URL constants too.

setupNim() still falls back to OPENAI_ENDPOINT_URL and ANTHROPIC_ENDPOINT_URL, but this destructure no longer brings those symbols into scope after the extraction. That leaves unresolved identifiers at Lines 4000, 4027, 4072, and 4096.

Proposed fix
 const {
   REMOTE_PROVIDER_CONFIG,
   LOCAL_INFERENCE_PROVIDERS,
   DISCORD_SNOWFLAKE_RE,
+  OPENAI_ENDPOINT_URL,
+  ANTHROPIC_ENDPOINT_URL,
   getProviderLabel,
   getEffectiveProviderName,
   getNonInteractiveProvider,
   getNonInteractiveModel,
   getSandboxInferenceConfig,
 } = onboardProviders as {
   REMOTE_PROVIDER_CONFIG: Record<string, RemoteProviderConfigEntry>;
   LOCAL_INFERENCE_PROVIDERS: string[];
   DISCORD_SNOWFLAKE_RE: RegExp;
+  OPENAI_ENDPOINT_URL: string;
+  ANTHROPIC_ENDPOINT_URL: string;
   getProviderLabel: (key: string) => string;
   getEffectiveProviderName: (key: string | null | undefined) => string | null;
   getNonInteractiveProvider: () => string | null;
   getNonInteractiveModel: (providerKey: string) => string | null;
   getSandboxInferenceConfig: (model: string, provider?: string | null, preferredInferenceApi?: string | null) => {
Suggested change (complete replacement):

const {
  REMOTE_PROVIDER_CONFIG,
  LOCAL_INFERENCE_PROVIDERS,
  DISCORD_SNOWFLAKE_RE,
  OPENAI_ENDPOINT_URL,
  ANTHROPIC_ENDPOINT_URL,
  getProviderLabel,
  getEffectiveProviderName,
  getNonInteractiveProvider,
  getNonInteractiveModel,
  getSandboxInferenceConfig,
} = onboardProviders as {
  REMOTE_PROVIDER_CONFIG: Record<string, RemoteProviderConfigEntry>;
  LOCAL_INFERENCE_PROVIDERS: string[];
  DISCORD_SNOWFLAKE_RE: RegExp;
  OPENAI_ENDPOINT_URL: string;
  ANTHROPIC_ENDPOINT_URL: string;
  getProviderLabel: (key: string) => string;
  getEffectiveProviderName: (key: string | null | undefined) => string | null;
  getNonInteractiveProvider: () => string | null;
  getNonInteractiveModel: (providerKey: string) => string | null;
  getSandboxInferenceConfig: (model: string, provider?: string | null, preferredInferenceApi?: string | null) => {
    providerKey: string; primaryModelRef: string; inferenceBaseUrl: string; inferenceApi: string; inferenceCompat: LooseObject | null;
  };
};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 90 - 110, The destructure of
onboardProviders is missing the extracted endpoint URL constants used by
setupNim(), causing unresolved identifiers; update the destructure to also pull
in OPENAI_ENDPOINT_URL and ANTHROPIC_ENDPOINT_URL (or any other extracted
endpoint URL constants your refactor introduced) so that setupNim() can
reference them (look for the destructure block and the setupNim function to add
these symbols to the existing list alongside REMOTE_PROVIDER_CONFIG,
LOCAL_INFERENCE_PROVIDERS, etc.).

@jyaunches (Contributor, Author) commented

Superseded by next iteration — missing endpoint URL imports caused typecheck:cli failure.

jyaunches closed this Apr 27, 2026
jyaunches added a commit that referenced this pull request Apr 27, 2026
…d) (#2495)

## ⚡ Interim PR — Unblocks #2306

This is an **interim extraction PR** that lands the foundational module
decomposition needed by #2306 (extract rebuild/recreate path +
canonicalize credential resolution). It is intentionally scoped to pure
code movement with no behavior changes.

Rebased onto current main. Supersedes #2087 and #2489.

---

## Summary

Steps 1–3 of the original extraction plan (authored by @ericksoa in
#2087):

- Extract `onboard-providers.ts` — provider metadata, CRUD, labels,
inference config
- Extract `onboard-ollama-proxy.ts` — proxy lifecycle, model
pull/validate
- Extract `onboard-inference-probes.ts` — endpoint validation probes
(OpenAI, Anthropic, Gemini)

Reduces `onboard.ts` by ~1,000 lines. All 3 new modules are under 400
lines. No behavior changes — all existing exports remain accessible via
re-exports and thin wrappers.

**Step 4 (dashboard extraction) was dropped** — main already landed
`dashboard-contract.ts`, `dashboard-health.ts`, and
`dashboard-recover.ts` (#2398) which supersede that extraction.

Steps 5–10 (gateway, preflight, messaging, policies, inference-setup,
sandbox) remain deferred per the original plan.

## Downstream: #2306

`#2306` (extract rebuild/recreate path + canonicalize credential
resolution) builds directly on top of this PR. It imports
`REMOTE_PROVIDER_CONFIG`, `upsertProvider`, and `ensureOllamaAuthProxy`
from the new extracted modules via onboard.ts re-exports. Merging this
PR first gives #2306 a clean module surface to build on.

## CodeRabbit comments

All 6 comments addressed:
- 1 resolved (deleted planning doc)
- 4 are valid improvements on **pre-existing code** that was moved as-is
— intentionally out of scope for a pure extraction PR, flagged as
follow-up material
- 1 is incorrect (`upsertMessagingProviders` was never exported on main)

## Rebase notes

- Ported all main-side refinements into the extracted modules (e.g.,
`startOllamaAuthProxy` now returns `boolean`, `getProbeAuthMode` always
returns `undefined`)
- Added type annotations for the CJS `require()` imports to satisfy
strict TypeScript
- Re-imported endpoint URL constants that `onboard.ts` still references
after extraction

## Test plan

- [x] `npx tsc -p tsconfig.src.json --noEmit` compiles cleanly
- [x] `npx tsc -p tsconfig.cli.json --noEmit` compiles cleanly
(pre-existing failures only, same as main)
- [x] Affected tests pass: gemini-probe-auth, credential-exposure,
wsl2-probe-timeout
- [x] CI: all 15 checks green
- [ ] Manual smoke test: `nemoclaw onboard` completes successfully

Co-authored-by: Aaron Erickson <aerickson@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Refactor**
* Reorganized onboarding logic into modular utilities for improved code
maintainability and separation of concerns.
* Separated provider management, inference endpoint validation, and
Ollama proxy lifecycle into dedicated modules.

* **Tests**
  * Updated tests to reflect new code organization structure.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Aaron Erickson <aerickson@nvidia.com>
