
refactor(onboard): extract modules from onboard.ts (Steps 1-3, rebased) #2489

Closed

jyaunches wants to merge 3 commits into main from refactor/onboard-extract-modules-v2

Conversation


@jyaunches (Contributor) commented Apr 26, 2026

Summary

Rebased onto current main (post-#2441 merge). Supersedes #2087 — same work, fresh branch to avoid force-push restriction.

Steps 1–3 of the original extraction plan:

  • Extract onboard-providers.ts — provider metadata, CRUD, labels, inference config
  • Extract onboard-ollama-proxy.ts — proxy lifecycle, model pull/validate
  • Extract onboard-inference-probes.ts — endpoint validation probes (OpenAI, Anthropic, Gemini)

Reduces onboard.ts by ~1,000 lines. All 3 new modules are under 400 lines. No behavior changes — all existing exports remain accessible via re-exports and thin wrappers.
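
For orientation, the back-compat surface in onboard.ts looks roughly like the sketch below; the real file wires this through CJS require(), and the exact export list here is an assumption beyond the symbols named in this description:

```ts
// Illustrative sketch only — onboard.ts delegates to the extracted modules and
// re-exports them so existing importers keep working unchanged.
import { REMOTE_PROVIDER_CONFIG, upsertProvider } from "./onboard-providers";
import { ensureOllamaAuthProxy } from "./onboard-ollama-proxy";
import { probeOpenAiLikeEndpoint } from "./onboard-inference-probes";

export { REMOTE_PROVIDER_CONFIG, upsertProvider, ensureOllamaAuthProxy, probeOpenAiLikeEndpoint };
```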

Step 4 (dashboard extraction) was dropped — main already landed dashboard-contract.ts, dashboard-health.ts, and dashboard-recover.ts (#2398) which supersede that extraction.

Steps 5–10 (gateway, preflight, messaging, policies, inference-setup, sandbox) remain deferred per the original plan.

Downstream: #2306

#2306 (extract rebuild/recreate path + canonicalize credential resolution) builds directly on top of this PR. It imports REMOTE_PROVIDER_CONFIG, upsertProvider, and ensureOllamaAuthProxy from the new extracted modules via onboard.ts re-exports. Merging this PR first gives #2306 a clean module surface to build on.

Rebase notes

  • Ported all main-side refinements into the extracted modules (e.g., startOllamaAuthProxy now returns boolean, getProbeAuthMode always returns undefined)
  • Added type annotations for the CJS require() imports to satisfy strict TypeScript
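
A minimal sketch of the typed-require pattern from the last note, mirroring the destructure quoted later in the review of src/lib/onboard.ts (the type shown is illustrative):

```ts
// Sketch: annotate the CJS require() result so strict TypeScript can check the
// destructured helpers; only getProviderLabel is shown here.
const onboardProviders = require("./onboard-providers");

const { getProviderLabel } = onboardProviders as {
  getProviderLabel: (key: string) => string;
};

const label = getProviderLabel("openai"); // "openai" is just an example key
```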

Test plan

  • npx tsc -p tsconfig.src.json --noEmit compiles cleanly
  • Affected tests pass: gemini-probe-auth, credential-exposure, wsl2-probe-timeout
  • Full npm test (CI)
  • Manual smoke test: nemoclaw onboard completes successfully

Co-authored-by: Aaron Erickson <aerickson@nvidia.com>

Summary by CodeRabbit

  • New Features

    • Enhanced inference endpoint validation during provider and model selection, with improved error handling for provider-specific failures.
    • Expanded Ollama integration with improved model selection, proxy management, and model preparation capabilities.
    • Refined provider configuration and setup handling across multiple endpoint types.
  • Tests

    • Updated test coverage to reflect internal code organization changes while maintaining validation of credential handling and endpoint probing behavior.

Move provider metadata constants (REMOTE_PROVIDER_CONFIG, endpoint URLs,
LOCAL_INFERENCE_PROVIDERS, DISCORD_SNOWFLAKE_RE), lookup helpers
(getEffectiveProviderName, getNonInteractive*, getRequested*Hint),
gateway CRUD (upsertProvider, providerExistsInGateway,
upsertMessagingProviders, buildProviderArgs), and
getSandboxInferenceConfig into a dedicated onboard-providers module.

Add getProviderLabel() to consolidate the scattered if/else provider
name→label chains (used in printDashboard).

Functions that need runOpenshell accept it as a last parameter
(dependency injection) to avoid circular requires. onboard.ts retains
thin wrappers with the original signatures.
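
For illustration, the injection shape reads roughly like the sketch below; the runner type, the gateway arguments, and the local wrapper name are assumptions, while providerExistsInGateway and _runOpenshell are the identifiers the updated tests assert on:

```ts
// Sketch of the dependency-injection pattern described above.
type OpenshellRunner = (args: string[]) => { code: number; stdout: string };
declare const _runOpenshell: OpenshellRunner; // provided by onboard.ts

// onboard-providers.ts side: the runner arrives as the last parameter, so this
// module never needs to require() onboard.ts back (no circular dependency).
export function providerExistsInGateway(name: string, runOpenshell: OpenshellRunner): boolean {
  const result = runOpenshell(["gateway", "providers", "list"]); // placeholder arguments
  return result.code === 0 && result.stdout.includes(name);
}

// onboard.ts side: a thin wrapper keeps the original one-argument signature.
const providerExistsInGatewayLocal = (name: string): boolean =>
  providerExistsInGateway(name, _runOpenshell);
```

Passing the runner explicitly keeps the extracted module independent of onboard.ts, which is what avoids the circular require.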

Update source-scanning tests to read onboard-providers.ts where the
code now lives.

Move the entire Ollama auth-proxy subsystem into a dedicated module:
token/PID persistence, proxy spawn/kill lifecycle, model prompt/pull,
and the exposure warning.

Zero coupling to other onboard logic — all dependencies (runner, ports,
local-inference, credentials, model-prompts) are imported directly.

Move inference endpoint probe functions (parseJsonObject,
hasResponsesToolCall, shouldRequireResponsesToolCalling,
getProbeAuthMode, getValidationProbeCurlArgs, probeResponsesToolCalling,
probeOpenAiLikeEndpoint, probeAnthropicEndpoint) into a dedicated
module.

All dependencies (http-probe, credentials, platform, validation) are
imported directly — no circular deps. The 4 validate* wrapper functions
remain in onboard.ts for now as they depend on promptValidationRecovery;
they'll consolidate when setupNim moves in a later step.
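
Roughly, the resulting split keeps wrappers of this shape in onboard.ts (a sketch only; the wrapper name, the probe's return shape, and promptValidationRecovery's signature are assumptions):

```ts
// Sketch: probeOpenAiLikeEndpoint now lives in onboard-inference-probes per this
// commit; its return shape and promptValidationRecovery's signature are assumed.
declare function probeOpenAiLikeEndpoint(
  endpointUrl: string,
  model: string,
  apiKey: string,
): { ok: boolean; detail?: string };
declare function promptValidationRecovery(failure: { ok: boolean; detail?: string }): Promise<boolean>;

async function validateOpenAiLikeEndpoint(
  endpointUrl: string,
  model: string,
  apiKey: string,
): Promise<boolean> {
  // Raw probing is delegated to the extracted module...
  const probe = probeOpenAiLikeEndpoint(endpointUrl, model, apiKey);
  if (probe.ok) return true;
  // ...while the interactive recovery loop stays in onboard.ts for now.
  return promptValidationRecovery(probe);
}
```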

Update source-scanning tests to read from the new compiled file.

coderabbitai Bot commented Apr 26, 2026

📝 Walkthrough

The pull request modularizes a large onboarding file (src/lib/onboard.ts) by extracting domain-focused helpers into three new modules: inference endpoint probing, Ollama proxy management, and provider configuration. The original file delegates to these modules while maintaining backward compatibility. A refactoring plan document specifies the staged extraction approach.

Changes

  • Refactoring Plan Documentation (.claude/refactor-onboard-plan.md): New documentation specifying an incremental refactor strategy to break up onboard.ts into ~11 domain-focused modules, including operational constraints (no behavior changes, backward compatibility via re-exports, staged commits, verification checklist).
  • Inference Probe Module (src/lib/onboard-inference-probes.ts): New module providing endpoint validation utilities: JSON parsing, Responses tool-call detection, probes for OpenAI-like and Anthropic endpoints with retry logic on timeout/connection errors, WSL-specific curl timing, and NVCF error detection.
  • Ollama Proxy Module (src/lib/onboard-ollama-proxy.ts): New module managing the Ollama auth-proxy lifecycle (token generation, PID persistence, process spawning/cleanup), model selection via prompts, and model preparation (pull, warmup, validation with 10-minute timeout).
  • Provider Configuration Module (src/lib/onboard-providers.ts): New module defining endpoint constants, provider metadata, and helpers: provider label/name resolution, provider CRUD via the gateway, messaging-provider upserts, non-interactive provider/model selection, and sandbox inference routing configuration.
  • Refactored Main Module (src/lib/onboard.ts): Delegates inference probes, Ollama proxy management, and provider configuration to the new modules; removes ~880 lines of extracted code while keeping existing public exports and backward compatibility.
  • Test Updates (test/credential-exposure.test.ts, test/onboard.test.ts, test/wsl2-probe-timeout.test.ts): Updated test source file references and assertion patterns to match the refactored module structure; existing validation logic is maintained.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 The warren grows orderly, modules align,
Breaking monoliths into tasks sublime,
Ollama hops freely, probes take their place,
Providers now flourish in structured space! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 52.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (4 passed)

  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): The title accurately describes the main refactoring effort: extracting three modules (providers, ollama-proxy, inference-probes) from onboard.ts as Steps 1-3 of a structured plan. It is specific, concise, and clearly communicates the primary change.
  • Linked Issues Check (✅ Passed): Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes Check (✅ Passed): Check skipped because no linked issues were found for this pull request.


coderabbitai Bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (1)
test/onboard.test.ts (1)

2371-2371: Tighten the existence-check regex to assert injected runner usage.

The current pattern is broad and can pass without validating the DI call shape introduced by the refactor. Consider matching both parameters explicitly.

♻️ Suggested assertion tightening
-    assert.match(source, /providerExistsInGateway\(name/);
+    assert.match(source, /providerExistsInGateway\(name,\s*_runOpenshell\)/);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/onboard.test.ts` at line 2371, The test's regex assert.match(source,
/providerExistsInGateway\(name/) is too loose; update it to assert the
injected-runner call shape by matching both parameters, e.g., require the
function call to include the second parameter (runner) by changing the pattern
to something like /providerExistsInGateway\(name,\s*runner\)/ so the assertion
verifies the DI call form introduced by the refactor (look for
providerExistsInGateway and runner in the same call).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2ba835a1-84fc-4ee7-bd3f-fae48484c833

📥 Commits

Reviewing files that changed from the base of the PR and between f439189 and 4bf6f38.

📒 Files selected for processing (8)
  • .claude/refactor-onboard-plan.md
  • src/lib/onboard-inference-probes.ts
  • src/lib/onboard-ollama-proxy.ts
  • src/lib/onboard-providers.ts
  • src/lib/onboard.ts
  • test/credential-exposure.test.ts
  • test/onboard.test.ts
  • test/wsl2-probe-timeout.test.ts

Comment on lines +1 to +2
# Refactor `onboard.ts` (6,382 lines) into ~11 focused modules


⚠️ Potential issue | 🟠 Major

Add the required SPDX HTML header.

This new Markdown file is missing the repository-required SPDX header.

Proposed fix
+<!-- SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. -->
+<!-- SPDX-License-Identifier: Apache-2.0 -->
+
 # Refactor `onboard.ts` (6,382 lines) into ~11 focused modules

As per coding guidelines, **/*.{js,mjs,ts,tsx,sh,md}: Every source file must include an SPDX license header: // SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. and // SPDX-License-Identifier: Apache-2.0. Use # comments for shell scripts and HTML comments for Markdown.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/refactor-onboard-plan.md around lines 1 - 2, Add the required SPDX
HTML header to the top of the Markdown file by inserting two HTML comment lines:
one with the exact copyright text "SPDX-FileCopyrightText: Copyright (c) 2026
NVIDIA CORPORATION & AFFILIATES. All rights reserved." and one with the license
identifier "SPDX-License-Identifier: Apache-2.0"; place them as HTML comments
(<!-- ... -->) at the very beginning of .claude/refactor-onboard-plan.md so the
file contains the repository-required SPDX header.

Comment on lines +101 to +113
### Step 4: Extract `onboard-dashboard.ts` (~200 lines)

**Move from `onboard.ts`:**
- `CONTROL_UI_PORT` (L5552)
- `ensureDashboardForward()` (L5558-5578)
- `findOpenclawJsonPath()` (L5580-5593)
- `fetchGatewayAuthTokenFromSandbox()` (L5599-5622)
- `getDashboardForwardPort()` / `getDashboardForwardTarget()` / `getDashboardForwardStartCommand()` (L5626-5651)
- `buildAuthenticatedDashboardUrl()` (L5653-5656)
- `getWslHostAddress()` (L5658-5672)
- `getDashboardAccessInfo()` / `getDashboardGuidanceLines()` (L5674-5714)
- `printDashboard()` (L5716-5790) — use `getProviderLabel()` from Step 1


⚠️ Potential issue | 🟡 Minor

Remove or rewrite Step 4 in this plan.

The PR objective says dashboard extraction was already dropped because those files landed in #2398, but this document still schedules that work as a future step. Merging it as-is will make the plan stale immediately.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/refactor-onboard-plan.md around lines 101 - 113, Step 4 should be
removed or rewritten because the dashboard extraction was already completed in
PR `#2398`; update the plan to delete the entire "Step 4: Extract
`onboard-dashboard.ts`" section (including all listed items: CONTROL_UI_PORT,
ensureDashboardForward(), findOpenclawJsonPath(),
fetchGatewayAuthTokenFromSandbox(), getDashboardForwardPort(),
getDashboardForwardTarget(), getDashboardForwardStartCommand(),
buildAuthenticatedDashboardUrl(), getWslHostAddress(), getDashboardAccessInfo(),
getDashboardGuidanceLines(), and printDashboard()) or rewrite it to reference
the landed changes and point to the existing landed artifacts (use
getProviderLabel() as noted) so the plan no longer schedules already-completed
work.

Comment on lines +135 to +136
// eslint-disable-next-line complexity
function probeOpenAiLikeEndpoint(endpointUrl, model, apiKey, options = {}) {

🛠️ Refactor suggestion | 🟠 Major

Don’t suppress the complexity guard here.

This extraction moved probe logic into its own module, but probeOpenAiLikeEndpoint() now bypasses the repo’s complexity limit instead of decomposing the retry/streaming/error-handling branches into helpers.

As per coding guidelines, {src,nemoclaw/src,scripts}/**/*.{js,ts,tsx}: TypeScript files must maintain cyclomatic complexity limit of 20, ratcheting down to 15. Enforced by ESLint.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard-inference-probes.ts` around lines 135 - 136, The function
probeOpenAiLikeEndpoint currently disables the complexity guard; instead
refactor it by extracting major branches into smaller helpers (e.g., create
helper functions such as attemptProbeForRetry, handleStreamResponse,
parseProbeError, and buildProbeOptions) and move the retry loop, streaming
response handling, and error-handling branches into those helpers so
probeOpenAiLikeEndpoint becomes a thin coordinator calling them; remove the
eslint-disable-next-line complexity comment and ensure the top-level
probeOpenAiLikeEndpoint is under the cyclomatic complexity limit by delegating
logic to these named functions (referencing probeOpenAiLikeEndpoint,
handleStreamResponse, parseProbeError, and attemptProbeForRetry to locate and
split the code).

Comment on lines +272 to +290
if (failures.length > 0 && isTimeoutOrConnFailure(failures[0].curlStatus)) {
  retriedAfterTimeout = true;
  const baseArgs = getValidationProbeCurlArgs();
  const doubledArgs = baseArgs.map((arg) =>
    /^\d+$/.test(arg) ? String(Number(arg) * 2) : arg,
  );
  const retryResult = runCurlProbe([
    "-sS",
    ...doubledArgs,
    "-H",
    "Content-Type: application/json",
    ...(apiKey ? ["-H", `Authorization: Bearer ${normalizeCredentialValue(apiKey)}`] : []),
    "-d",
    JSON.stringify({
      model,
      messages: [{ role: "user", content: "Reply with exactly: OK" }],
    }),
    `${String(endpointUrl).replace(/\/+$/, "")}/chat/completions`,
  ]);

⚠️ Potential issue | 🟡 Minor

Keep the retry probe aligned with authMode.

The main probe path honors options.authMode === "query-param", but the retry hardcodes Bearer auth and a plain /chat/completions URL. If a caller selects query-param auth, the retry can never succeed even when the timeout was transient.

Proposed fix
     const retryResult = runCurlProbe([
       "-sS",
       ...doubledArgs,
       "-H",
       "Content-Type: application/json",
-      ...(apiKey ? ["-H", `Authorization: Bearer ${normalizeCredentialValue(apiKey)}`] : []),
+      ...authHeader,
       "-d",
       JSON.stringify({
         model,
         messages: [{ role: "user", content: "Reply with exactly: OK" }],
       }),
-      `${String(endpointUrl).replace(/\/+$/, "")}/chat/completions`,
+      appendKey("/chat/completions"),
     ]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard-inference-probes.ts` around lines 272 - 290, The retry path
after isTimeoutOrConnFailure currently hardcodes Bearer header auth and a fixed
"/chat/completions" path, so it ignores options.authMode and will fail when
authMode === "query-param"; update the retry construction in the block that sets
retriedAfterTimeout so it reuses the same auth handling and endpoint formatting
as the main probe: call getValidationProbeCurlArgs() and then apply the same
logic that conditionally adds the Authorization header versus query-param (using
apiKey and normalizeCredentialValue(apiKey) and options.authMode), and build the
request URL the same way as the original probe (preserving any path adjustments
on endpointUrl) before calling runCurlProbe so the retry mirrors the main probe
behavior.

Comment on lines +327 to +344
function probeAnthropicEndpoint(endpointUrl, model, apiKey) {
  const result = runCurlProbe([
    "-sS",
    ...getCurlTimingArgs(),
    "-H",
    `x-api-key: ${normalizeCredentialValue(apiKey)}`,
    "-H",
    "anthropic-version: 2023-06-01",
    "-H",
    "content-type: application/json",
    "-d",
    JSON.stringify({
      model,
      max_tokens: 16,
      messages: [{ role: "user", content: "Reply with exactly: OK" }],
    }),
    `${String(endpointUrl).replace(/\/+$/, "")}/v1/messages`,
  ]);

⚠️ Potential issue | 🟠 Major

Use the short validation timeouts for the Anthropic probe too.

This path still calls getCurlTimingArgs(), so Anthropic validation can block much longer than the OpenAI-like probes. That undercuts the new “validation must not hang the wizard” behavior you just centralized in getValidationProbeCurlArgs().

Proposed fix
   const result = runCurlProbe([
     "-sS",
-    ...getCurlTimingArgs(),
+    ...getValidationProbeCurlArgs(),
     "-H",
     `x-api-key: ${normalizeCredentialValue(apiKey)}`,
Suggested change (complete replacement):

function probeAnthropicEndpoint(endpointUrl, model, apiKey) {
  const result = runCurlProbe([
    "-sS",
    ...getValidationProbeCurlArgs(),
    "-H",
    `x-api-key: ${normalizeCredentialValue(apiKey)}`,
    "-H",
    "anthropic-version: 2023-06-01",
    "-H",
    "content-type: application/json",
    "-d",
    JSON.stringify({
      model,
      max_tokens: 16,
      messages: [{ role: "user", content: "Reply with exactly: OK" }],
    }),
    `${String(endpointUrl).replace(/\/+$/, "")}/v1/messages`,
  ]);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard-inference-probes.ts` around lines 327 - 344, The Anthropic
probe (probeAnthropicEndpoint) still uses getCurlTimingArgs(), which allows long
blocking validation; replace that call with getValidationProbeCurlArgs() so the
runCurlProbe invocation for Anthropic uses the short validation timeouts (keep
other args and the endpoint construction the same) and ensure you update any
imports/usage so probeAnthropicEndpoint calls getValidationProbeCurlArgs()
instead of getCurlTimingArgs().

Comment on lines +172 to +188
function ensureOllamaAuthProxy(): void {
  // Try to load persisted token first — if none, this isn't an Ollama setup.
  const token = loadPersistedProxyToken();
  if (!token) return;

  const pid = loadPersistedProxyPid();
  if (isOllamaProxyProcess(pid)) {
    ollamaProxyToken = token;
    return;
  }

  // Proxy not running — restart it with the persisted token.
  killStaleProxy();
  ollamaProxyToken = token;
  spawnOllamaAuthProxy(token);
  sleep(1);
}

⚠️ Potential issue | 🟠 Major

Verify the restart path before treating the proxy as recovered.

ensureOllamaAuthProxy() respawns the proxy but never checks whether the new process actually came up. On reconnect/resume, a failed restart now looks like success and the first real Ollama request fails much farther downstream.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard-ollama-proxy.ts` around lines 172 - 188,
ensureOllamaAuthProxy currently respawns the proxy without verifying it started;
change the restart path so you only treat it as recovered after confirming a new
proxy process is running. Modify spawnOllamaAuthProxy to return the child PID
(or have it expose the PID), then in ensureOllamaAuthProxy call
spawnOllamaAuthProxy(token) and poll isOllamaProxyProcess(newPid) with a short
retry loop and timeout (using sleep) before setting ollamaProxyToken; if the
check fails, do not set ollamaProxyToken, log the failure, and leave error
handling or retries to the caller. Ensure you still call killStaleProxy() before
spawning and use loadPersistedProxyPid()/loadPersistedProxyToken() to locate and
validate prior state.
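
For reference, a minimal sketch of the verified-restart shape this comment asks for; spawnOllamaAuthProxy returning the child PID is an assumed change, and the retry count and interval are illustrative:

```ts
// Sketch of the suggested fix, not the PR's current behavior.
declare function loadPersistedProxyToken(): string | null;
declare function loadPersistedProxyPid(): number | null;
declare function isOllamaProxyProcess(pid: number | null): boolean;
declare function killStaleProxy(): void;
declare function spawnOllamaAuthProxy(token: string): number; // assumed to return the child PID
declare function sleep(seconds: number): void;
let ollamaProxyToken: string | null = null;

function ensureOllamaAuthProxy(): void {
  const token = loadPersistedProxyToken();
  if (!token) return;

  if (isOllamaProxyProcess(loadPersistedProxyPid())) {
    ollamaProxyToken = token;
    return;
  }

  killStaleProxy();
  const newPid = spawnOllamaAuthProxy(token);
  for (let attempt = 0; attempt < 5; attempt += 1) {
    sleep(1);
    if (isOllamaProxyProcess(newPid)) {
      ollamaProxyToken = token; // only mark recovered once the new process is confirmed
      return;
    }
  }
  // Never confirmed: leave ollamaProxyToken unset so callers can surface the failure.
}
```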

Comment thread src/lib/onboard.ts
Comment on lines +90 to +110
const {
  REMOTE_PROVIDER_CONFIG,
  LOCAL_INFERENCE_PROVIDERS,
  DISCORD_SNOWFLAKE_RE,
  getProviderLabel,
  getEffectiveProviderName,
  getNonInteractiveProvider,
  getNonInteractiveModel,
  getSandboxInferenceConfig,
} = onboardProviders as {
  REMOTE_PROVIDER_CONFIG: Record<string, RemoteProviderConfigEntry>;
  LOCAL_INFERENCE_PROVIDERS: string[];
  DISCORD_SNOWFLAKE_RE: RegExp;
  getProviderLabel: (key: string) => string;
  getEffectiveProviderName: (key: string | null | undefined) => string | null;
  getNonInteractiveProvider: () => string | null;
  getNonInteractiveModel: (providerKey: string) => string | null;
  getSandboxInferenceConfig: (model: string, provider?: string | null, preferredInferenceApi?: string | null) => {
    providerKey: string; primaryModelRef: string; inferenceBaseUrl: string; inferenceApi: string; inferenceCompat: LooseObject | null;
  };
};

⚠️ Potential issue | 🔴 Critical

Import the extracted endpoint URL constants too.

setupNim() still falls back to OPENAI_ENDPOINT_URL and ANTHROPIC_ENDPOINT_URL, but this destructure no longer brings those symbols into scope after the extraction. That leaves unresolved identifiers at Lines 4000, 4027, 4072, and 4096.

Proposed fix
 const {
   REMOTE_PROVIDER_CONFIG,
   LOCAL_INFERENCE_PROVIDERS,
   DISCORD_SNOWFLAKE_RE,
+  OPENAI_ENDPOINT_URL,
+  ANTHROPIC_ENDPOINT_URL,
   getProviderLabel,
   getEffectiveProviderName,
   getNonInteractiveProvider,
   getNonInteractiveModel,
   getSandboxInferenceConfig,
 } = onboardProviders as {
   REMOTE_PROVIDER_CONFIG: Record<string, RemoteProviderConfigEntry>;
   LOCAL_INFERENCE_PROVIDERS: string[];
   DISCORD_SNOWFLAKE_RE: RegExp;
+  OPENAI_ENDPOINT_URL: string;
+  ANTHROPIC_ENDPOINT_URL: string;
   getProviderLabel: (key: string) => string;
   getEffectiveProviderName: (key: string | null | undefined) => string | null;
   getNonInteractiveProvider: () => string | null;
   getNonInteractiveModel: (providerKey: string) => string | null;
   getSandboxInferenceConfig: (model: string, provider?: string | null, preferredInferenceApi?: string | null) => {
Suggested change (complete replacement):

const {
  REMOTE_PROVIDER_CONFIG,
  LOCAL_INFERENCE_PROVIDERS,
  DISCORD_SNOWFLAKE_RE,
  OPENAI_ENDPOINT_URL,
  ANTHROPIC_ENDPOINT_URL,
  getProviderLabel,
  getEffectiveProviderName,
  getNonInteractiveProvider,
  getNonInteractiveModel,
  getSandboxInferenceConfig,
} = onboardProviders as {
  REMOTE_PROVIDER_CONFIG: Record<string, RemoteProviderConfigEntry>;
  LOCAL_INFERENCE_PROVIDERS: string[];
  DISCORD_SNOWFLAKE_RE: RegExp;
  OPENAI_ENDPOINT_URL: string;
  ANTHROPIC_ENDPOINT_URL: string;
  getProviderLabel: (key: string) => string;
  getEffectiveProviderName: (key: string | null | undefined) => string | null;
  getNonInteractiveProvider: () => string | null;
  getNonInteractiveModel: (providerKey: string) => string | null;
  getSandboxInferenceConfig: (model: string, provider?: string | null, preferredInferenceApi?: string | null) => {
    providerKey: string; primaryModelRef: string; inferenceBaseUrl: string; inferenceApi: string; inferenceCompat: LooseObject | null;
  };
};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/onboard.ts` around lines 90 - 110, The destructure of
onboardProviders is missing the extracted endpoint URL constants used by
setupNim(), causing unresolved identifiers; update the destructure to also pull
in OPENAI_ENDPOINT_URL and ANTHROPIC_ENDPOINT_URL (or any other extracted
endpoint URL constants your refactor introduced) so that setupNim() can
reference them (look for the destructure block and the setupNim function to add
these symbols to the existing list alongside REMOTE_PROVIDER_CONFIG,
LOCAL_INFERENCE_PROVIDERS, etc.).

@jyaunches (Contributor, Author) commented

Superseded by next iteration — missing endpoint URL imports caused typecheck:cli failure.

jyaunches closed this Apr 27, 2026
jyaunches added a commit that referenced this pull request Apr 27, 2026
…d) (#2495)

## ⚡ Interim PR — Unblocks #2306

This is an **interim extraction PR** that lands the foundational module
decomposition needed by #2306 (extract rebuild/recreate path +
canonicalize credential resolution). It is intentionally scoped to pure
code movement with no behavior changes.

Rebased onto current main. Supersedes #2087 and #2489.

---

## Summary

Steps 1–3 of the original extraction plan (authored by @ericksoa in
#2087):

- Extract `onboard-providers.ts` — provider metadata, CRUD, labels,
inference config
- Extract `onboard-ollama-proxy.ts` — proxy lifecycle, model
pull/validate
- Extract `onboard-inference-probes.ts` — endpoint validation probes
(OpenAI, Anthropic, Gemini)

Reduces `onboard.ts` by ~1,000 lines. All 3 new modules are under 400
lines. No behavior changes — all existing exports remain accessible via
re-exports and thin wrappers.

**Step 4 (dashboard extraction) was dropped** — main already landed
`dashboard-contract.ts`, `dashboard-health.ts`, and
`dashboard-recover.ts` (#2398) which supersede that extraction.

Steps 5–10 (gateway, preflight, messaging, policies, inference-setup,
sandbox) remain deferred per the original plan.

## Downstream: #2306

`#2306` (extract rebuild/recreate path + canonicalize credential
resolution) builds directly on top of this PR. It imports
`REMOTE_PROVIDER_CONFIG`, `upsertProvider`, and `ensureOllamaAuthProxy`
from the new extracted modules via onboard.ts re-exports. Merging this
PR first gives #2306 a clean module surface to build on.

## CodeRabbit comments

All 6 comments addressed:
- 1 resolved (deleted planning doc)
- 4 are valid improvements on **pre-existing code** that was moved as-is
— intentionally out of scope for a pure extraction PR, flagged as
follow-up material
- 1 is incorrect (`upsertMessagingProviders` was never exported on main)

## Rebase notes

- Ported all main-side refinements into the extracted modules (e.g.,
`startOllamaAuthProxy` now returns `boolean`, `getProbeAuthMode` always
returns `undefined`)
- Added type annotations for the CJS `require()` imports to satisfy
strict TypeScript
- Re-imported endpoint URL constants that `onboard.ts` still references
after extraction

## Test plan

- [x] `npx tsc -p tsconfig.src.json --noEmit` compiles cleanly
- [x] `npx tsc -p tsconfig.cli.json --noEmit` compiles cleanly
(pre-existing failures only, same as main)
- [x] Affected tests pass: gemini-probe-auth, credential-exposure,
wsl2-probe-timeout
- [x] CI: all 15 checks green
- [ ] Manual smoke test: `nemoclaw onboard` completes successfully

Co-authored-by: Aaron Erickson <aerickson@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Refactor**
* Reorganized onboarding logic into modular utilities for improved code
maintainability and separation of concerns.
* Separated provider management, inference endpoint validation, and
Ollama proxy lifecycle into dedicated modules.

* **Tests**
  * Updated tests to reflect new code organization structure.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Aaron Erickson <aerickson@nvidia.com>
