Skip to content

fix(onboard): probe host pythons and fall back before Model Router venv setup#3786

Merged
ericksoa merged 5 commits into
mainfrom
fix/model-router-python-fallback
May 19, 2026
Merged

fix(onboard): probe host pythons and fall back before Model Router venv setup#3786
ericksoa merged 5 commits into
mainfrom
fix/model-router-python-fallback

Conversation

@laitingsheng
Copy link
Copy Markdown
Contributor

@laitingsheng laitingsheng commented May 19, 2026

Summary

Model Router venv setup used to grab whatever python3 resolved to first and run python -m venv unconditionally. When that python's stdlib was broken at the C-extension level (e.g. Homebrew python@3.14 with a pyexpat dlopen mismatch against the system libexpat), onboarding aborted with a generic "Failed to create Model Router virtual environment." message and made no attempt to fall back to a healthy interpreter — even when one was right there on PATH.

Related Issue

Fixes #3781

Changes

  • New pickHostPython helper in src/lib/onboard/model-router-python.ts probes python3.13, python3.12, python3.11, python3.10, and bare python3 (plus an optional NEMOCLAW_MODEL_ROUTER_PYTHON override) and adopts the first interpreter whose version is in [3.10, 3.14) and whose ensurepip, pyexpat, ssl, and venv stdlib modules import cleanly.
  • installModelRouterCommand now delegates the host-python probe and python -m venv invocation to prepareModelRouterVenv, which surfaces the real per-candidate failure (for example, the _XML_SetAllocTrackerActivationThreshold ImportError) when no candidate qualifies.
  • Documents the supported Python range and the NEMOCLAW_MODEL_ROUTER_PYTHON override in docs/inference/inference-options.md and docs/reference/commands.md.
  • Adds pickHostPython unit tests covering version-range bounds, stdlib import failures, candidate dedup, env-var override, and the formatted error path. Updates the existing Model Router integration tests so the fake python3 stub answers the new probe.

Type of Change

  • Code change with doc updates
  • Code change (feature, bug fix, or refactor)
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai tinsonl@nvidia.com

Summary by CodeRabbit

  • New Features

    • Added NEMOCLAW_MODEL_ROUTER_PYTHON env var to strictly pin the host Python for the Model Router.
    • Model Router onboarding now creates a host-side virtual environment and probes candidate Python interpreters for supported versions and required stdlib modules; onboarding aborts with detailed per-candidate failures if none qualify.
  • Documentation

    • Documented interpreter auto-discovery, qualification rules, pinning semantics, and failure guidance.
  • Tests

    • Added tests for discovery, qualification, pin override, venv bootstrap, fallback, and failure messaging.

Review Change Stack

…nv setup

Closes #3781

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 19, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 19, 2026

📝 Walkthrough

Walkthrough

Host Python discovery and venv bootstrap for Model Router: probes candidate interpreters for required stdlib modules and version bounds, selects healthy candidates (supports strict override), attempts venv creation per-candidate, integrates into onboarding, and documents host requirements and failure messages.

Changes

Model Router Python Discovery and Venv Setup

Layer / File(s) Summary
Python discovery contracts and version/stdlib requirements
src/lib/onboard/model-router-python.ts
Module docs, exported constants (MIN_PYTHON_VERSION, MAX_PYTHON_EXCLUSIVE, REQUIRED_STDLIB_MODULES, OVERRIDE_ENV_VAR) and TypeScript interfaces for probe results and DI.
Interpreter probing and candidate selection
src/lib/onboard/model-router-python.ts
Probe script and pickHostPython implementation: resolve candidates or strict override, deduplicate paths, run JSON-emitting probe, validate version bounds, collect per-candidate failures, and return ok + ordered healthy candidates.
Failure message formatting and helpers
src/lib/onboard/model-router-python.ts
formatHostPythonFailureMessage and default helpers: which via command -v, probe runner using spawnSync, and logging.
venv creation from healthy candidates
src/lib/onboard/model-router-python.ts
prepareModelRouterVenv: ensure parent dir, handle allowReplaceExisting, iterate healthy interpreters removing partial venvs, run <python> -m venv with timeout, verify ${venvDir}/bin/python, and aggregate errors if all attempts fail.
Integration into Model Router onboarding
src/lib/onboard.ts
Imports prepareModelRouterVenv and replaces prior inline venv bootstrapping with prepareModelRouterVenv({ venvDir, allowReplaceExisting }) before dependency installation and executable validation.
Unit tests for discovery logic
src/lib/onboard/model-router-python.test.ts
Vitest coverage validating candidate ranking, fallback on probe failure, version bound enforcement (exclude 3.14 via exclusive max), path deduplication, strict override behavior, failure-message formatting, spawn/ENOENT error surface, and prepareModelRouterVenv refusal behavior.
Integration test mock updates
test/onboard-model-router.test.ts
Test harness updates to detect python3 venv creation via regex; mocked python3 now handles -c probe invocations returning JSON version payloads.
User documentation for host Python requirements
docs/inference/inference-options.md, docs/inference/inference-options.mdx, docs/reference/commands.md, docs/reference/commands.mdx
New "Host Python requirement" subsection and NEMOCLAW_MODEL_ROUTER_PYTHON CLI/env docs describing probe behavior, required stdlib imports, qualification/failure reporting, and strict absolute-path pin semantics.

Sequence Diagram

sequenceDiagram
  participant pickHostPython
  participant which
  participant probe
  participant prepareModelRouterVenv
  participant python_m_venv
  pickHostPython->>which: resolve candidate names to absolute paths
  which-->>pickHostPython: return paths
  pickHostPython->>probe: run probe script to import stdlib modules and emit JSON
  probe-->>pickHostPython: JSON version or error
  pickHostPython->>prepareModelRouterVenv: healthy candidates and failures
  prepareModelRouterVenv->>python_m_venv: run python -m venv for each candidate
  python_m_venv-->>prepareModelRouterVenv: success or stderr/exit
  prepareModelRouterVenv-->>pickHostPython: venv python path or aggregated error
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested labels

documentation, enhancement: testing

Suggested reviewers

  • jyaunches
  • ericksoa
  • cv

Poem

A rabbit hops through Python paths,
probes each interpreter with tiny maths,
skips the broken, finds a friendly one,
builds a venv so routers can run,
happy hops — Model Router's job is done. 🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 8.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: adding host Python probing and fallback logic before Model Router venv setup.
Linked Issues check ✅ Passed The PR implements all three objectives from #3781: smoke-test probing of candidate Python interpreters for required stdlib modules, fallback to next-highest healthy Python, and detailed error messages naming the broken interpreter and actual import failure.
Out of Scope Changes check ✅ Passed All changes are directly scoped to #3781 requirements: Python discovery/probing logic, venv preparation, documentation of supported versions and override env var, and related test coverage.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/model-router-python-fallback

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

ghost commented May 19, 2026

@github-actions
Copy link
Copy Markdown
Contributor

ghost commented May 19, 2026

E2E Advisor Recommendation

Required E2E: model-router-provider-routed-inference-e2e
Optional E2E: docs-validation-e2e, cloud-onboard-e2e

Dispatch hint: model-router-provider-routed-inference-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • model-router-provider-routed-inference-e2e (~45 min; requires NVIDIA_API_KEY and Docker on ubuntu-latest): Directly exercises NEMOCLAW_PROVIDER=routed onboarding, Model Router startup/health, provider registration, and an inference.local routed chat completion. This is the existing E2E most likely to catch regressions in the new Python/venv preparation path.

Optional E2E

  • docs-validation-e2e (~15 min): Useful because the PR updates inference and command reference docs, including a new environment variable. This validates CLI/docs parity and markdown links but is not merge-blocking for the runtime Model Router change.
  • cloud-onboard-e2e (~45 min; requires NVIDIA_API_KEY): Optional broad smoke for the onboard entrypoint after adding a new module import to src/lib/onboard.ts. It does not specifically exercise the routed provider, so the Model Router regression job remains the required targeted check.

New E2E recommendations

  • Model Router host Python fallback and strict pin behavior (medium): Existing routed-provider E2E validates the happy path with the runner's available Python, but there is no E2E that simulates a broken top-priority Python, a venv creation failure followed by fallback, or a strict NEMOCLAW_MODEL_ROUTER_PYTHON pin failure/success in a real onboard invocation.
    • Suggested test: Add a regression E2E that shims python3.13/python3.12 on PATH during NEMOCLAW_PROVIDER=routed onboard to verify fallback, and separately verifies an absolute NEMOCLAW_MODEL_ROUTER_PYTHON pin is honored while a relative pin fails before sandbox creation.
  • macOS Model Router onboarding (medium): The motivating failure mode documented in this PR is macOS/Homebrew pyexpat behavior, but the existing macOS E2E runs the default cloud OpenClaw path and does not cover NEMOCLAW_PROVIDER=routed or Model Router venv creation.
    • Suggested test: Add a macOS scenario or regression job for Model Router Python discovery that can run without full live routed inference, or a full routed-provider macOS E2E when secrets/runtime support are available.

Dispatch hint

  • Workflow: regression-e2e.yaml
  • jobs input: model-router-provider-routed-inference-e2e

@laitingsheng laitingsheng marked this pull request as ready for review May 19, 2026 04:19
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (6)
docs/reference/commands.mdx (1)

1151-1151: ⚡ Quick win

Replace colon with period in environment variable description.

The phrase "Strict: NemoClaw probes..." uses a colon to separate clauses rather than introduce a list.

As per coding guidelines, "Colons should only introduce a list. Flag colons used as general punctuation between clauses."

Suggested fix
-| `NEMOCLAW_MODEL_ROUTER_PYTHON` | absolute path | Pins the host Python interpreter used to create the Model Router virtual environment. Strict: NemoClaw probes only that interpreter and aborts with the failure reason if it does not qualify, rather than silently falling back to another python. When unset, NemoClaw probes `python3.13`, `python3.12`, `python3.11`, `python3.10`, and bare `python3`, retains every interpreter whose version is in `[3.10, 3.14)` and whose `ensurepip`, `pyexpat`, `ssl`, and `venv` stdlib modules import cleanly, and tries `python -m venv` on each in priority order until one succeeds. Set the pin when the auto-discovered interpreter is broken (for example, Homebrew `python@3.14` with a `pyexpat` dlopen mismatch on macOS). |
+| `NEMOCLAW_MODEL_ROUTER_PYTHON` | absolute path | Pins the host Python interpreter used to create the Model Router virtual environment. Strict. NemoClaw probes only that interpreter and aborts with the failure reason if it does not qualify, rather than silently falling back to another python. When unset, NemoClaw probes `python3.13`, `python3.12`, `python3.11`, `python3.10`, and bare `python3`, retains every interpreter whose version is in `[3.10, 3.14)` and whose `ensurepip`, `pyexpat`, `ssl`, and `venv` stdlib modules import cleanly, and tries `python -m venv` on each in priority order until one succeeds. Set the pin when the auto-discovered interpreter is broken (for example, Homebrew `python@3.14` with a `pyexpat` dlopen mismatch on macOS). |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/reference/commands.mdx` at line 1151, In the
NEMOCLAW_MODEL_ROUTER_PYTHON environment variable description, replace the colon
after the word "Strict" with a period and adjust capitalization if needed so the
sentence reads as two sentences (e.g., "Strict. NemoClaw probes only...")—update
the text near the NEMOCLAW_MODEL_ROUTER_PYTHON entry in
docs/reference/commands.mdx to use a period instead of the colon to conform to
the colon-usage guideline.
docs/reference/commands.md (1)

1164-1164: ⚡ Quick win

Replace colon with period in environment variable description.

The phrase "Strict: NemoClaw probes..." uses a colon to separate clauses rather than introduce a list.

As per coding guidelines, "Colons should only introduce a list. Flag colons used as general punctuation between clauses."

Suggested fix
-| `NEMOCLAW_MODEL_ROUTER_PYTHON` | absolute path | Pins the host Python interpreter used to create the Model Router virtual environment. Strict: NemoClaw probes only that interpreter and aborts with the failure reason if it does not qualify, rather than silently falling back to another python. When unset, NemoClaw probes `python3.13`, `python3.12`, `python3.11`, `python3.10`, and bare `python3`, retains every interpreter whose version is in `[3.10, 3.14)` and whose `ensurepip`, `pyexpat`, `ssl`, and `venv` stdlib modules import cleanly, and tries `python -m venv` on each in priority order until one succeeds. Set the pin when the auto-discovered interpreter is broken (for example, Homebrew `python@3.14` with a `pyexpat` dlopen mismatch on macOS). |
+| `NEMOCLAW_MODEL_ROUTER_PYTHON` | absolute path | Pins the host Python interpreter used to create the Model Router virtual environment. Strict. NemoClaw probes only that interpreter and aborts with the failure reason if it does not qualify, rather than silently falling back to another python. When unset, NemoClaw probes `python3.13`, `python3.12`, `python3.11`, `python3.10`, and bare `python3`, retains every interpreter whose version is in `[3.10, 3.14)` and whose `ensurepip`, `pyexpat`, `ssl`, and `venv` stdlib modules import cleanly, and tries `python -m venv` on each in priority order until one succeeds. Set the pin when the auto-discovered interpreter is broken (for example, Homebrew `python@3.14` with a `pyexpat` dlopen mismatch on macOS). |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/reference/commands.md` at line 1164, Update the environment variable
description for NEMOCLAW_MODEL_ROUTER_PYTHON by replacing the colon after
"Strict" with a period and capitalizing the following clause so it reads
"Strict. NemoClaw probes only that interpreter..." (ensure you modify the string
that begins with `NEMOCLAW_MODEL_ROUTER_PYTHON` in the docs so punctuation
follows the guideline that colons only introduce lists).
docs/inference/inference-options.md (1)

122-122: ⚡ Quick win

Replace colon with period.

The colon separates two independent clauses rather than introducing a list.
Colons should only introduce lists, per the style guide.

As per coding guidelines, "Colons should only introduce a list. Flag colons used as general punctuation between clauses."

Suggested fix
-The pin is strict: NemoClaw probes only that interpreter and aborts with the failure reason if it does not qualify, rather than silently falling back to a different python on `PATH`.
+The pin is strict.
+NemoClaw probes only that interpreter and aborts with the failure reason if it does not qualify, rather than silently falling back to a different python on `PATH`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/inference/inference-options.md` at line 122, Change the sentence "The
pin is strict: NemoClaw probes only that interpreter and aborts with the failure
reason if it does not qualify, rather than silently falling back to a different
python on `PATH`." to use a period instead of a colon so it reads "The pin is
strict. NemoClaw probes only that interpreter..."—ensure spacing and
capitalization are correct after replacing the colon.
docs/inference/inference-options.mdx (1)

109-109: ⚡ Quick win

Replace colon with period.

The colon separates two independent clauses rather than introducing a list.
Colons should only introduce lists, per the style guide.

As per coding guidelines, "Colons should only introduce a list. Flag colons used as general punctuation between clauses."

Suggested fix
-The pin is strict: NemoClaw probes only that interpreter and aborts with the failure reason if it does not qualify, rather than silently falling back to a different python on `PATH`.
+The pin is strict.
+NemoClaw probes only that interpreter and aborts with the failure reason if it does not qualify, rather than silently falling back to a different python on `PATH`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/inference/inference-options.mdx` at line 109, Replace the colon in the
sentence "The pin is strict: NemoClaw probes only that interpreter and aborts
with the failure reason if it does not qualify, rather than silently falling
back to a different python on `PATH`." with a period so it becomes two
independent sentences (e.g., "The pin is strict. NemoClaw probes..."), ensuring
the second sentence still begins with "NemoClaw" and preserves the rest of the
wording; update the sentence in the inference options doc where that exact
phrase appears.
src/lib/onboard/model-router-python.test.ts (2)

24-24: 💤 Low value

Consider using a supported version in the default parameter for clarity.

The default version of [3, 14, 5] is above the supported ceiling. When tests like line 92–94 omit the version argument, the probe reports both an import error and a version outside the supported range. For tests specifically validating stdlib probe failures (per the test name "falls back when the top candidate fails the stdlib probe"), it would be clearer to default to a version within [3, 10, 3.14), such as [3, 13, 0].

Optional refactor
-function probeImportError(detail: string, version: readonly [number, number, number] = [3, 14, 5]) {
+function probeImportError(detail: string, version: readonly [number, number, number] = [3, 13, 0]) {
   return {
     exit: 1,
     stdout: JSON.stringify({ version: [...version], error: detail }),
     stderr: "",
   };
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard/model-router-python.test.ts` at line 24, The default version
used by probeImportError is outside the supported ceiling, causing tests that
omit the version to also trigger a "version outside supported range" failure;
change the default parameter in probeImportError(version: readonly [number,
number, number] = [3, 13, 0]) to a supported tuple (e.g., [3,13,0]) so tests
that expect only the stdlib import error (like the "falls back when the top
candidate fails the stdlib probe" test) get a supported Python version by
default; update any test expectations that intentionally rely on an out-of-range
default to pass an explicit version when necessary.

203-203: 💤 Low value

Static analysis RegExp warning is a false positive.

The warnings about constructing RegExp from a variable flag potential ReDoS vulnerabilities when the pattern comes from untrusted input. Here, OVERRIDE_ENV_VAR is a constant string imported from the module under test, not user input, so there is no security risk.

Using the constant maintains synchronization with the actual environment variable name. If you prefer to silence the static analysis warning, you could replace with a string literal /NEMOCLAW_MODEL_ROUTER_PYTHON/, though this trades static analysis cleanliness for slightly looser coupling to the constant.

Also applies to: 220-220

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard/model-router-python.test.ts` at line 203, The
RegExp-from-variable warning is a static-analysis false positive because
OVERRIDE_ENV_VAR is a module constant, so either keep the current
assert.match(message, new RegExp(OVERRIDE_ENV_VAR)) and silence the rule for
that line by adding an inline eslint disable (e.g., add a single-line comment
like // eslint-disable-next-line security/detect-unsafe-regex) immediately above
the assert.match call, or replace the RegExp with a literal string match (e.g.,
assert.match(message, /NEMOCLAW_MODEL_ROUTER_PYTHON/) ) if you prefer to remove
the linter suppression; target the assert.match line referencing
OVERRIDE_ENV_VAR in model-router-python.test.ts.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard/model-router-python.ts`:
- Around line 291-294: The venv creation currently uses hostPython.command which
may re-resolve to a different interpreter later; change the run invocation that
creates the virtualenv (the call producing venvResult) to use
hostPython.executable instead of hostPython.command so the venv is created with
the exact probed binary and subsequent diagnostics/reference point to
hostPython.executable; keep the same options (ignoreError, timeout,
opts.venvDir) and update any related log/error messages to reference
hostPython.executable.
- Around line 308-314: The final Error thrown at the end of the Model Router
Python setup currently only includes venvFailures, which loses prior stdlib
probe failures; update the throw (in model-router-python.ts where the Error is
constructed) to include both the probe failures and venvFailures (e.g., include
probeFailures or stdlib probe failure messages before venvFailures in the array
passed to join), so the error message preserves the original broken-host
import/probe details as well as the subsequent -m venv failures.

---

Nitpick comments:
In `@docs/inference/inference-options.md`:
- Line 122: Change the sentence "The pin is strict: NemoClaw probes only that
interpreter and aborts with the failure reason if it does not qualify, rather
than silently falling back to a different python on `PATH`." to use a period
instead of a colon so it reads "The pin is strict. NemoClaw probes only that
interpreter..."—ensure spacing and capitalization are correct after replacing
the colon.

In `@docs/inference/inference-options.mdx`:
- Line 109: Replace the colon in the sentence "The pin is strict: NemoClaw
probes only that interpreter and aborts with the failure reason if it does not
qualify, rather than silently falling back to a different python on `PATH`."
with a period so it becomes two independent sentences (e.g., "The pin is strict.
NemoClaw probes..."), ensuring the second sentence still begins with "NemoClaw"
and preserves the rest of the wording; update the sentence in the inference
options doc where that exact phrase appears.

In `@docs/reference/commands.md`:
- Line 1164: Update the environment variable description for
NEMOCLAW_MODEL_ROUTER_PYTHON by replacing the colon after "Strict" with a period
and capitalizing the following clause so it reads "Strict. NemoClaw probes only
that interpreter..." (ensure you modify the string that begins with
`NEMOCLAW_MODEL_ROUTER_PYTHON` in the docs so punctuation follows the guideline
that colons only introduce lists).

In `@docs/reference/commands.mdx`:
- Line 1151: In the NEMOCLAW_MODEL_ROUTER_PYTHON environment variable
description, replace the colon after the word "Strict" with a period and adjust
capitalization if needed so the sentence reads as two sentences (e.g., "Strict.
NemoClaw probes only...")—update the text near the NEMOCLAW_MODEL_ROUTER_PYTHON
entry in docs/reference/commands.mdx to use a period instead of the colon to
conform to the colon-usage guideline.

In `@src/lib/onboard/model-router-python.test.ts`:
- Line 24: The default version used by probeImportError is outside the supported
ceiling, causing tests that omit the version to also trigger a "version outside
supported range" failure; change the default parameter in
probeImportError(version: readonly [number, number, number] = [3, 13, 0]) to a
supported tuple (e.g., [3,13,0]) so tests that expect only the stdlib import
error (like the "falls back when the top candidate fails the stdlib probe" test)
get a supported Python version by default; update any test expectations that
intentionally rely on an out-of-range default to pass an explicit version when
necessary.
- Line 203: The RegExp-from-variable warning is a static-analysis false positive
because OVERRIDE_ENV_VAR is a module constant, so either keep the current
assert.match(message, new RegExp(OVERRIDE_ENV_VAR)) and silence the rule for
that line by adding an inline eslint disable (e.g., add a single-line comment
like // eslint-disable-next-line security/detect-unsafe-regex) immediately above
the assert.match call, or replace the RegExp with a literal string match (e.g.,
assert.match(message, /NEMOCLAW_MODEL_ROUTER_PYTHON/) ) if you prefer to remove
the linter suppression; target the assert.match line referencing
OVERRIDE_ENV_VAR in model-router-python.test.ts.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 42884b4c-be81-4012-83b9-a145cad8e17e

📥 Commits

Reviewing files that changed from the base of the PR and between 0164e6c and 95a6a03.

📒 Files selected for processing (8)
  • docs/inference/inference-options.md
  • docs/inference/inference-options.mdx
  • docs/reference/commands.md
  • docs/reference/commands.mdx
  • src/lib/onboard.ts
  • src/lib/onboard/model-router-python.test.ts
  • src/lib/onboard/model-router-python.ts
  • test/onboard-model-router.test.ts

Comment thread src/lib/onboard/model-router-python.ts Outdated
Comment thread src/lib/onboard/model-router-python.ts Outdated
…lures in errors

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@laitingsheng laitingsheng added the v0.0.46 Release target label May 19, 2026
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (4)
docs/inference/inference-options.mdx (2)

110-112: ⚡ Quick win

Use inline code formatting for CLI command names in prose.

Line 110 and Line 112 use plain-text python in command-context prose; format those command names as inline code for consistency.

As per coding guidelines, "CLI commands, file paths, flags, parameter names, and values must use inline code formatting."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/inference/inference-options.mdx` around lines 110 - 112, Update the
prose to use inline code formatting for the CLI command names mentioned: replace
plain-text instances of python, python3.12, command -v python3.12, and python -m
venv with inline `code` formatting (e.g., `python`, `python3.12`, `command -v
python3.12`, `python -m venv`) so the sentences in the NemoClaw probing
paragraph follow the CLI/paths/flags formatting guideline.

111-111: ⚡ Quick win

Rewrite passive voice to active voice.

Line 111 uses passive voice ("are rejected"). Please switch this to an active construction.

As per coding guidelines, "Active voice required. Flag passive constructions."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/inference/inference-options.mdx` at line 111, Change the passive
sentence "Relative command names such as `python3.12` are rejected; use `command
-v python3.12` to find the absolute path." to an active construction that names
the actor—e.g., start with "The system rejects relative command names such as
`python3.12`; use `command -v python3.12` to find the absolute path."—so replace
the passive verb "are rejected" with an active verb phrase referencing "the
system" while keeping the example text (`python3.12` and `command -v
python3.12`) intact.
docs/inference/inference-options.md (2)

123-125: ⚡ Quick win

Use inline code formatting for CLI command names in prose.

Line 123 and Line 125 use plain-text python in command-context prose; format those command names as inline code for consistency.

As per coding guidelines, "CLI commands, file paths, flags, parameter names, and values must use inline code formatting."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/inference/inference-options.md` around lines 123 - 125, Update the prose
in docs/inference/inference-options.md so CLI command names use inline code
formatting: change plain "python" occurrences in the sentence "NemoClaw probes
only that interpreter..." and the phrase "If python -m venv itself fails..." to
use inline code formatting (e.g., `python`, `python -m venv`); ensure relative
command examples like python3.12 are also rendered as code (e.g., `python3.12`)
and keep the rest of the wording intact.

124-124: ⚡ Quick win

Rewrite passive voice to active voice.

Line 124 uses passive voice ("are rejected"). Please switch this to an active construction.

As per coding guidelines, "Active voice required. Flag passive constructions."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/inference/inference-options.md` at line 124, Change the passive sentence
"Relative command names such as `python3.12` are rejected; use `command -v
python3.12` to find the absolute path." to active voice by making the subject
perform the action (for example, "The system rejects relative command names such
as `python3.12`; use `command -v python3.12` to find the absolute path.").
Update the sentence in the docs/inference/inference-options.md content where the
phrase "Relative command names such as `python3.12`" appears so the wording uses
active voice (e.g., "The system rejects..." or "We reject...") while keeping the
recommendation to use `command -v python3.12`.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@docs/inference/inference-options.md`:
- Around line 123-125: Update the prose in docs/inference/inference-options.md
so CLI command names use inline code formatting: change plain "python"
occurrences in the sentence "NemoClaw probes only that interpreter..." and the
phrase "If python -m venv itself fails..." to use inline code formatting (e.g.,
`python`, `python -m venv`); ensure relative command examples like python3.12
are also rendered as code (e.g., `python3.12`) and keep the rest of the wording
intact.
- Line 124: Change the passive sentence "Relative command names such as
`python3.12` are rejected; use `command -v python3.12` to find the absolute
path." to active voice by making the subject perform the action (for example,
"The system rejects relative command names such as `python3.12`; use `command -v
python3.12` to find the absolute path."). Update the sentence in the
docs/inference/inference-options.md content where the phrase "Relative command
names such as `python3.12`" appears so the wording uses active voice (e.g., "The
system rejects..." or "We reject...") while keeping the recommendation to use
`command -v python3.12`.

In `@docs/inference/inference-options.mdx`:
- Around line 110-112: Update the prose to use inline code formatting for the
CLI command names mentioned: replace plain-text instances of python, python3.12,
command -v python3.12, and python -m venv with inline `code` formatting (e.g.,
`python`, `python3.12`, `command -v python3.12`, `python -m venv`) so the
sentences in the NemoClaw probing paragraph follow the CLI/paths/flags
formatting guideline.
- Line 111: Change the passive sentence "Relative command names such as
`python3.12` are rejected; use `command -v python3.12` to find the absolute
path." to an active construction that names the actor—e.g., start with "The
system rejects relative command names such as `python3.12`; use `command -v
python3.12` to find the absolute path."—so replace the passive verb "are
rejected" with an active verb phrase referencing "the system" while keeping the
example text (`python3.12` and `command -v python3.12`) intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 185b95df-c043-4362-8890-d74f73cdf23a

📥 Commits

Reviewing files that changed from the base of the PR and between 783bb56 and 8e96ccd.

📒 Files selected for processing (7)
  • docs/inference/inference-options.md
  • docs/inference/inference-options.mdx
  • docs/reference/commands.md
  • docs/reference/commands.mdx
  • src/lib/onboard.ts
  • src/lib/onboard/model-router-python.test.ts
  • src/lib/onboard/model-router-python.ts
✅ Files skipped from review due to trivial changes (1)
  • docs/reference/commands.md
🚧 Files skipped from review as they are similar to previous changes (2)
  • docs/reference/commands.mdx
  • src/lib/onboard.ts

@cjagwani cjagwani self-assigned this May 19, 2026
@github-actions
Copy link
Copy Markdown
Contributor

ghost commented May 19, 2026

Selective E2E Results — ✅ All requested jobs passed

Run: 26116771755
Target ref: 8e96ccd9b862f894d8f8bde78fcee645a81c4403
Workflow ref: fix/model-router-python-fallback
Requested jobs: cloud-onboard-e2e,double-onboard-e2e,onboard-repair-e2e,onboard-resume-e2e
Summary: 4 passed, 0 failed, 0 skipped

Job Result
cloud-onboard-e2e ✅ success
double-onboard-e2e ✅ success
onboard-repair-e2e ✅ success
onboard-resume-e2e ✅ success

@cjagwani cjagwani added bug Something isn't working priority: medium Issue that should be addressed in upcoming releases labels May 19, 2026
Copy link
Copy Markdown
Contributor

@cjagwani cjagwani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Acceptance match against #3781 is a perfect 4/4, scope is tight, tests are well-structured (DI, no real spawn), and the two CodeRabbit majors on 95a6a03 are properly resolved in 783bb56 (absolute-path spawn + merged probe+venv-stage error).

Couple of optional nits, none blocking merge:

  1. prepareModelRouterVenv's venv-stage fallback (probe-healthy python whose -m venv fails, retry next candidate) is the headline P2 behavior but it's only indirectly covered — the unit test asserts the healthy[] list shape, not the loop. Worth adding a prepareModelRouterVenv test where a fake-python's -m venv returns non-zero for the top candidate and zero for the second. Cheap insurance against a future refactor flattening that loop.

  2. MAX_PYTHON_EXCLUSIVE = [3, 14] is gated by an explicit test, which is the right call. Tracking suggestion: drop a TODO/issue link pointing at the upstream wheel-coverage signal (torch + litellm cp314) so we don't forget to bump.

The unrelated CI / Pull Request / checks red is a flake in src/lib/inference/local.test.ts:49:3 (5s timeout on a sync URL assertion) — not your code.

Copy link
Copy Markdown
Contributor

@ericksoa ericksoa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved after adversarial follow-up fixes and validation. Focused model-router regression and onboard nightly slice are green, and the PR check rerun is clean.

@ericksoa ericksoa merged commit 0427986 into main May 19, 2026
85 of 86 checks passed
@ericksoa ericksoa deleted the fix/model-router-python-fallback branch May 19, 2026 18:53
@miyoungc miyoungc mentioned this pull request May 20, 2026
12 tasks
miyoungc added a commit that referenced this pull request May 20, 2026
## Summary
Refreshes the NemoClaw docs for v0.0.46 by updating version metadata,
release notes, and generated user skills. The refresh also keeps public
docs aligned with the docs skip list by removing non-public experimental
references from the generated output.

## Related Issue
None.

## Changes
- #3744 and #3824 -> `docs/about/release-notes.mdx`: Added Windows
bootstrap and WSL express install coverage for v0.0.46.
- #3392 -> `docs/manage-sandboxes/messaging-channels.mdx`,
`docs/reference/commands.mdx`, `docs/reference/network-policies.mdx`,
and policy examples: Refreshed public messaging channel docs around
WhatsApp and matching policy presets.
- #3742, #3767, #3732, #3786, #3777, and #3808 ->
`docs/about/release-notes.mdx`: Added release-note coverage for Hermes
managed tools, Bedrock Runtime endpoint detection, WSL Ollama proxying,
Model Router Python fallback, plugin command registration, and
tool-catalog latency improvements.
- #3124 -> `docs/about/release-notes.mdx`: Added release-note coverage
for hosted uninstall flag guidance.
- Generated `nemoclaw-user-*` skills from the updated MDX docs for the
v0.0.46 release.

## Type of Change
- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [x] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification
- [ ] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [ ] `make docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

Verification notes:
- Commit hooks passed, including markdownlint, gitleaks, docs-to-skills
verification, env-var docs, and skills YAML checks.
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx` passed.
- `bash test/e2e/e2e-cloud-experimental/check-docs.sh --only-links
--local-only --with-skills` passed.
- `git diff --check` passed.
- `make docs` was attempted but blocked before MDX validation because
`npx` received HTTP 403 fetching `fern-api` from npm.

---
Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Released v0.0.46: improved Windows setup, WhatsApp messaging support,
Hermes sandbox/tool routing, Anthropic endpoint compatibility, Ollama
proxy routing, model-router fallback, OpenClaw plugin/backup
compatibility, sandbox build tooling fixes, and updated uninstall flag
behavior.

* **Documentation**
* Removed WeChat from messaging flows and presets across guides and CLI
docs; clarified onboarding and channel setup for WhatsApp. Clarified
runtime mutability and filesystem (Landlock) behavior — some changes
require sandbox rebuilds; prefer host-side commands for durable config.

<!-- review_stack_entry_start -->

[![Review Change
Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3911?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)

<!-- review_stack_entry_end -->
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working fix priority: medium Issue that should be addressed in upcoming releases v0.0.46 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[macOS][Onboard] Model Router venv build fails when latest host python is broken; no fallback

4 participants