fix(onboard): prompt for multimodal model inputs (Fixes #3850)#4333
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds helpers to detect/validate inference-input overrides and prompts interactively for text-only vs text+image when a selected model appears multimodal; integrates the prompt into onboarding after model selection and adds tests. ChangesInference Input Capability Prompting for Multimodal Models
sequenceDiagram
participant User
participant Onboard
participant InferenceInputCapability
participant Environment
User->>Onboard: select inference model
Onboard->>InferenceInputCapability: maybePromptForInferenceInputCapability(model, deps)
InferenceInputCapability->>InferenceInputCapability: shouldPromptForInferenceInputCapability(model)
alt Multimodal & interactive
InferenceInputCapability->>User: prompt (1: Text only, 2: Text+Image)
User->>InferenceInputCapability: choose
InferenceInputCapability->>Environment: set NEMOCLAW_INFERENCE_INPUTS accordingly
else Skip (non-interactive or valid env)
InferenceInputCapability->>Environment: no change
end
InferenceInputCapability-->>Onboard: continue with env state
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
test/onboard.test.ts (1)
360-366: ⚡ Quick winAdd an explicit
"image"acceptance assertion.This test currently validates
text,text,image, andimage,text, but not the singletonimagevalue. Adding that one assertion will lock the full accepted-input contract and prevent regressions.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/onboard.test.ts` around lines 360 - 366, Update the unit test that verifies accepted inference input overrides by adding an explicit assertion that the singleton "image" value is accepted; specifically, inside the same test block that calls isValidInferenceInputsOverride for "text", "text,image", and "image,text", add expect(isValidInferenceInputsOverride("image")).toBe(true) so the test covers the standalone "image" input case and prevents regressions.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 3964-3997: The onboarding file grew by adding the
inference-input-capability helpers; extract VALID_INFERENCE_INPUTS_PATTERN,
MULTIMODAL_MODEL_HINT_PATTERN, isValidInferenceInputsOverride,
shouldPromptForInferenceInputCapability and
maybePromptForInferenceInputCapability into a new module (e.g.,
src/lib/inferenceInputCapability.ts), export the functions/constants, then
import maybePromptForInferenceInputCapability (and any other needed symbols)
into src/lib/onboard.ts and replace the in-file definitions with the import so
onboard.ts no longer contains that block and stays within the entrypoint size
budget; ensure function signatures and behavior are unchanged and update any
references to the moved symbols accordingly.
- Around line 3975-3995: The prompt currently leaves an invalid
NEMOCLAW_INFERENCE_INPUTS value unchanged when the user selects "Text only"
(default). In maybePromptForInferenceInputCapability, normalize or clear the env
var when the choice is not "2": after resolving (choice || "1").trim(), set
process.env.NEMOCLAW_INFERENCE_INPUTS = "text" (or delete/clear it) for the
Text-only branch so invalid prior values cannot propagate; keep the existing
branch that sets "text,image" when the user explicitly selects "2".
---
Nitpick comments:
In `@test/onboard.test.ts`:
- Around line 360-366: Update the unit test that verifies accepted inference
input overrides by adding an explicit assertion that the singleton "image" value
is accepted; specifically, inside the same test block that calls
isValidInferenceInputsOverride for "text", "text,image", and "image,text", add
expect(isValidInferenceInputsOverride("image")).toBe(true) so the test covers
the standalone "image" input case and prevents regressions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 59abfcbd-6a40-4870-9269-fc49d667f6b3
📒 Files selected for processing (2)
src/lib/onboard.tstest/onboard.test.ts
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard/inference-input-capability.ts`:
- Line 10: The current regex constant VALID_INFERENCE_INPUTS_PATTERN allows
duplicate tokens (e.g., "text,text"); update the validation to ensure tokens are
unique by replacing or augmenting usage of VALID_INFERENCE_INPUTS_PATTERN with
logic that splits the input by comma, verifies each token is one of "text" or
"image", and then checks that new Set(tokens).size === tokens.length to reject
duplicates; update any code that relies on VALID_INFERENCE_INPUTS_PATTERN
(search for that constant) to use this uniqueness check so values like
"text,text" and "image,image" are considered invalid.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 4bbd1676-b232-4e40-90e6-59cf0d72cd06
📒 Files selected for processing (3)
src/lib/onboard.tssrc/lib/onboard/inference-input-capability.tstest/onboard.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- test/onboard.test.ts
- src/lib/onboard.ts
|
Moved input-capability logic under src/lib/onboard, normalized invalid text-only overrides to text, rejected duplicate tokens, and kept onboard.ts budget-neutral. Local checks: build:cli, typecheck:cli, lint, and full test/onboard.test.ts. |
|
✨ Thanks for submitting this detailed PR that addresses the issue with multimodal model inputs during onboarding. This proposes a fix for the problem where vision-capable models are treated as text-only and adds an interactive prompt to handle such cases, improving the overall onboarding experience. Related open issues: |
6299944 to
5a3d64d
Compare
|
Rebased on current main. build:cli and the focused multimodal input prompt test pass. |
prekshivyas
left a comment
There was a problem hiding this comment.
LGTM — narrow addition in setupNim with an isolated helper, env override preserved for non-interactive flows, tests cover detection, override vocabulary, and bad-override normalization. CI green on 5a3d64d.
Non-blocking follow-up: should non-NIM provider paths (e.g. setupOpenAI, the Ollama path) also surface this prompt for multimodal-hinting model names, or is NIM the only provider where the model id alone leaves capability ambiguous?
|
@deepujain — DCO is failing on the merge commit Could you rebase this branch onto current main and force-push? That'll drop the merge commit and DCO should go green. Sorry for the noise. |
Fixes NVIDIA#3850 Signed-off-by: Deepak Jain <deepujain@gmail.com>
Signed-off-by: Deepak Jain <deepujain@gmail.com>
Signed-off-by: Deepak Jain <deepujain@gmail.com>
Signed-off-by: Deepak Jain <deepujain@gmail.com>
6751692 to
409b372
Compare
|
Rebased onto current main and dropped the unsigned merge commit, so DCO should clear on the next run. The four PR commits are signed, and build:cli plus the focused multimodal input prompt test pass. |
## Summary - Adds the `v0.0.60` section to `docs/about/release-notes.mdx` using the dev announcement from discussion #4877. - Fills the source-doc gaps found during release-prep review across inference, policy tiers, command behavior, security boundaries, Hermes dashboard/tooling, runtime context, and troubleshooting. - Refreshes generated agent skills under `.agents/skills/` from the current Fern docs output and upgrades Fern from `5.44.3` to `5.45.0`. ## Source summary - #4037 -> `docs/reference/architecture.mdx`, `docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents system-only runtime context that stays out of visible chat. - #4875 -> `docs/reference/architecture.mdx`, `docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents try-first sandbox network/filesystem guidance and clearer failure classification. - #4788 -> `docs/security/best-practices.mdx`, `docs/about/release-notes.mdx`: Documents shared OpenClaw device-approval policy for startup and connect. - #4768 -> `docs/reference/network-policies.mdx`, `docs/network-policy/integration-policy-examples.mdx`, `docs/get-started/quickstart.mdx`, `docs/get-started/quickstart-hermes.mdx`, `docs/reference/commands.mdx`: Documents `weather`, `public-reference`, and Hermes managed-tool gateway preset behavior. - #3788 and #4864 -> `docs/reference/network-policies.mdx`, `docs/reference/commands.mdx`: Documents non-interactive policy-tier fail-fast behavior and interactive prompt fallback. - #4756 and #4866 -> `docs/reference/commands.mdx`: Documents env-aware default sandbox resolution for `list`, `status`, and `tunnel` commands. - #4320 -> `docs/reference/commands.mdx`: Documents `$$nemoclaw tunnel status` behavior. - #4328 -> `docs/reference/commands.mdx`: Documents line-scoped policy preset descriptions in `policy-list`. - #4580 and #4748 -> `docs/reference/architecture.mdx`: Documents package-managed OpenShell gateway service and Docker-driver gateway-marker behavior. - #4598 -> `docs/manage-sandboxes/lifecycle.mdx`: Documents concurrent gateway/dashboard cleanup isolation by sandbox name and port. - #4777 -> `docs/reference/troubleshooting.mdx`: Documents Docker GPU patch rollback behavior. - #4610 -> `docs/reference/troubleshooting.mdx`, `docs/reference/commands.mdx`: Keeps mutable OpenClaw config permission guidance aligned and removes skipped experimental wording. - #4868 -> `docs/reference/commands.mdx`: Keeps `.dockerignore` handling for custom `onboard --from <Dockerfile>` contexts in generated skills. - #4870 -> `docs/reference/commands.mdx`, `docs/manage-sandboxes/runtime-controls.mdx`: Documents `NEMOCLAW_MINIMAL_BOOTSTRAP` and generated skill coverage. - #4641 -> `docs/inference/inference-options.mdx`, `docs/reference/troubleshooting.mdx`: Documents local NVIDIA NIM platform-digest pulls and served-model id adoption. - #4810 and #4867 -> `docs/inference/inference-options.mdx`: Documents stable NGC managed-vLLM image lineage and DGX Station DeepSeek V4 Flash coverage. - #4852 -> `docs/inference/use-local-inference.mdx`, `docs/reference/troubleshooting.mdx`: Documents Ollama model fit filtering, 16K context floor, cold-load retry, and failed-model exclusion. - #4847 -> `docs/inference/switch-inference-providers.mdx`: Documents API-family sync, Hermes `api_mode`, and Bedrock Runtime exception. - #4800 -> `docs/inference/tool-calling-reliability.mdx`: Documents Nemotron managed-inference native tool-search fallback. - #4333 -> `docs/inference/switch-inference-providers.mdx`: Documents interactive multimodal input prompting. - #4086 -> `docs/reference/troubleshooting.mdx`: Keeps proxy bypass normalization in generated troubleshooting coverage. - #4811 and #4855 -> `docs/get-started/quickstart-hermes.mdx`: Documents prebuilt Hermes dashboard assets and TUI recovery without runtime rebuilds. - #4854 -> `docs/inference/switch-inference-providers.mdx`, `docs/reference/commands.mdx`: Documents Hermes proxy API-key placeholder preservation during inference switches. - #4248 -> `docs/manage-sandboxes/messaging-channels.mdx`, `.agents/skills/`: Keeps messaging enrollment behavior aligned with manifest-hook implementation. - #4771 -> `docs/security/best-practices.mdx`, `docs/security/credential-storage.mdx`: Documents Hermes placeholder-only secret boundary for sandbox-visible runtime files. - #4787 -> `docs/security/best-practices.mdx`, `docs/about/release-notes.mdx`: Documents expanded memory scanner examples for OpenAI project keys and Slack app-level tokens. - #4848 -> `docs/reference/commands.mdx`: Documents OpenClaw skill install mirroring into the agent home directory. - #4790 -> `docs/about/release-notes.mdx`: Uses the prior release-prep structure and generated `.agents/skills/` refresh as the template for this release. ## Verification - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx` - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ skills/ --prefix nemoclaw-user --doc-platform fern-mdx --dry-run` - `npm run docs` - `git diff --check` - skip-term scan across `docs/`, `.agents/skills/`, and `skills/` - `npm run build:cli` - `npm run typecheck:cli` - Commit and pre-push hook suites, including markdownlint, gitleaks, env-var docs gate, docs-to-skills verification, and skills YAML tests <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * **New Features** * DeepSeek-V4-Flash now available as default inference model for DGX Station. * Hermes dashboard improved with dedicated port and OAuth-authenticated tool gateway selection. * Added weather and public-reference policy presets for expanded agent capabilities. * Enhanced Ollama model selection with GPU memory filtering and automatic retry for timeouts. * **Bug Fixes** * Improved policy tier validation to prevent invalid configurations. * Better sandbox cleanup scoping by port to prevent conflicts across deployments. * Added GPU patch failure recovery with automatic rollback. * **Documentation** * Expanded troubleshooting guides for inference, security, and sandbox lifecycle. * Added .dockerignore best practices for custom deployments. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Summary
Fixes #3850.
Some providers only return a model id during discovery, so a vision-capable model can get baked into OpenClaw as text-only. NemoClaw already supports
NEMOCLAW_INFERENCE_INPUTS, but that is easy to miss during interactive onboarding.This adds a small interactive prompt for model names that strongly look multimodal, such as
omni,vision,vl,image, ormultimodal. The default remains text-only. ChoosingText + Imagesets the existingNEMOCLAW_INFERENCE_INPUTS=text,imagepath before the sandbox config is generated.Changes
src/lib/onboard.ts: detects likely multimodal model names and asks for input capability during interactive onboarding.src/lib/onboard.ts: preserves the existing env override for non-interactive and scripted flows.test/onboard.test.ts: covers the model-name detection and accepted input capability values.Testing
npm run build:clinpm run typecheck:clinpx vitest run test/onboard.test.ts -t 'input capability'npx vitest run test/onboard.test.tsEvidence
The focused tests cover the multimodal-name gate and strict capability vocabulary. The full onboard helper suite passes with 57 tests.
Signed-off-by: Deepak Jain dejain@nvidia.com
Summary by CodeRabbit