
feat(inference): validate streaming events for /v1/responses and add NEMOCLAW_PREFERRED_API override #1833

Merged
ericksoa merged 4 commits into main from feat/sglang-responses-streaming-validation
Apr 13, 2026
Conversation

@ericksoa (Contributor) commented Apr 13, 2026

Summary

  • Adds streaming SSE event validation to the /v1/responses probe for custom OpenAI-compatible endpoints, catching backends like SGLang that return valid non-streaming responses but emit incomplete streaming events
  • Adds NEMOCLAW_PREFERRED_API=openai-completions env var to bypass /v1/responses probe entirely during onboarding
  • Documents both the env var override and the existing NEMOCLAW_INFERENCE_API_OVERRIDE workaround for already-onboarded sandboxes

Context

Community user reported SGLang passes onboarding validation for /v1/responses but fails at runtime because its streaming mode only emits 3 lifecycle events (response.created, response.in_progress, response.completed) — missing the granular content deltas OpenClaw requires (response.output_text.delta, etc.).
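The gap described above comes down to which SSE `event:` names appear in the stream. The following sketch shows the kind of check the new probe performs; the function name, result shape, and required-event list are illustrative assumptions, not NemoClaw's actual implementation:

```typescript
// Illustrative sketch only: names and shapes are assumptions, not NemoClaw's code.
const REQUIRED_EVENTS = ["response.output_text.delta"];

function checkSseEvents(sseBody: string): { ok: boolean; missingEvents: string[] } {
  // Collect event names from `event:` lines; `data:` payloads are ignored.
  const seen = new Set<string>();
  for (const line of sseBody.split("\n")) {
    const match = /^event:\s*(\S+)/.exec(line);
    if (match) seen.add(match[1]);
  }
  const missingEvents = REQUIRED_EVENTS.filter((name) => !seen.has(name));
  return { ok: missingEvents.length === 0, missingEvents };
}

// An SGLang-like stream with only lifecycle events fails the check:
const lifecycleOnly = [
  "event: response.created",
  "event: response.in_progress",
  "event: response.completed",
].join("\n");
checkSseEvents(lifecycleOnly); // { ok: false, missingEvents: ["response.output_text.delta"] }
```

A stream that includes `event: response.output_text.delta` passes, which is the behavior the fallback logic keys on.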

Test plan

  • Unit tests for shouldForceCompletionsApi() (6 cases) and runStreamingEventProbe() (5 cases) pass
  • NEMOCLAW_PREFERRED_API=openai-completions skips /v1/responses probe during custom endpoint onboarding
  • Streaming probe detects SGLang-like incomplete SSE events and falls back to /chat/completions
  • Full test suite green

Summary by CodeRabbit

  • New Features

    • Added NEMOCLAW_PREFERRED_API to force Chat Completions (works in both interactive and non‑interactive modes) and optionally skip the /v1/responses probe
    • Onboarding now validates streaming events and will automatically fall back to Chat Completions if required events are missing; transport/probe failures produce a hard failure
  • Documentation

    • New troubleshooting and recovery steps (rerun nemoclaw onboard to re‑probe and bake the correct API)
    • Clarified that NEMOCLAW_INFERENCE_API_OVERRIDE only patches startup config and does not update baked image ARGs
    • Minor wording tweak about image rebuilds
  • Tests

    • Added tests covering streaming probes, cleanup, error cases, and the preference logic

…NEMOCLAW_PREFERRED_API override

Backends like SGLang expose /v1/responses and pass the existing non-streaming
validation probe, but their streaming mode only emits lifecycle events
(created/in_progress/completed) without the granular content deltas OpenClaw
requires (output_text.delta, etc.). This causes runtime failures after
onboarding succeeds.

Changes:
- Add runStreamingEventProbe() in http-probe.ts that sends a stream:true
  request and verifies the SSE event stream includes response.output_text.delta
- Integrate the streaming probe into probeOpenAiLikeEndpoint for custom
  endpoints (probeStreaming: true) — falls back to /chat/completions when
  streaming events are incomplete
- Add shouldForceCompletionsApi() in validation.ts checking
  NEMOCLAW_PREFERRED_API env var so users can bypass /responses entirely
- Wire both into validateCustomOpenAiLikeSelection
- Add unit tests for the new functions (11 new test cases)
- Document NEMOCLAW_PREFERRED_API, the NEMOCLAW_INFERENCE_API_OVERRIDE
  workaround, and a troubleshooting entry for the runtime failure scenario

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
@coderabbitai bot (Contributor) commented Apr 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: eebdabde-6905-4e12-bb17-8c62c68223ad

📥 Commits

Reviewing files that changed from the base of the PR and between 92e6a93 and be9383c.

📒 Files selected for processing (1)
  • src/lib/onboard.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/onboard.ts

📝 Walkthrough

Adds a streaming-event probe for OpenAI-compatible /v1/responses, a preference override to force /v1/chat/completions, onboarding changes to re-probe and bake the chosen API, tests for the probe and validation, and multiple documentation updates describing recovery and overrides.

Changes

  • Docs & Skill Guidance (/.agents/skills/nemoclaw-user-configure-inference/SKILL.md, /.agents/skills/nemoclaw-user-reference/references/troubleshooting.md, docs/inference/switch-inference-providers.md, docs/inference/use-local-inference.md, docs/reference/troubleshooting.md): Add troubleshooting and recovery guidance for /v1/responses streaming gaps; document NEMOCLAW_PREFERRED_API to force chat completions during onboarding; clarify NEMOCLAW_INFERENCE_API_OVERRIDE limits and instruct re-running nemoclaw onboard to re-probe and bake the API choice.
  • HTTP Streaming Probe (src/lib/http-probe.ts, src/lib/http-probe.test.ts): Add StreamingProbeResult and runStreamingEventProbe() to curl /v1/responses in streaming mode, parse SSE event: lines (checking for response.output_text.delta), and manage the temp-file lifecycle; add tests for success, missing events, timeout/exit handling, spawn errors, and cleanup.
  • Onboarding Integration (src/lib/onboard.ts): Run the streaming-event probe during /v1/responses validation when probeStreaming is enabled; record streaming-specific failures (with a "(streaming)" suffix), allow fallback to /v1/chat/completions, and support skipping /responses when shouldForceCompletionsApi() indicates a preference.
  • Validation & Tests (src/lib/validation.ts, src/lib/validation.test.ts): Add exported shouldForceCompletionsApi(preferredApi?) (case-insensitive detection of openai-completions / chat-completions) and tests verifying true/false cases and normalization.
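The case-insensitive preference check summarized above might look like the following pure function. This is a sketch under assumed naming; the exact accepted values and signature come from the description, not from reading the source:

```typescript
// Sketch of the preference check (assumed to match the described behavior):
// returns true only for the two documented values, after trimming and lowercasing.
function shouldForceCompletionsApi(preferredApi?: string): boolean {
  if (!preferredApi) return false; // unset or empty string: keep auto-detection
  const normalized = preferredApi.trim().toLowerCase();
  return normalized === "openai-completions" || normalized === "chat-completions";
}

shouldForceCompletionsApi(" OPENAI-COMPLETIONS "); // true: trimmed and case-insensitive
shouldForceCompletionsApi("openai-responses");     // false: not a force-completions value
```

Keeping this function free of process.env access (the caller passes the value in) is what makes it trivially unit-testable.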

Sequence Diagram

sequenceDiagram
    participant User
    participant Onboard as Onboarding
    participant Validation as ValidationLogic
    participant Probe as HTTPProbe
    participant Server as OpenAI-Compatible Server

    User->>Onboard: run "nemoclaw onboard"
    Onboard->>Validation: read NEMOCLAW_PREFERRED_API
    alt preference forces completions
        Validation-->>Onboard: skip /v1/responses
        Onboard->>Probe: probe /v1/chat/completions
        Probe->>Server: probe /v1/chat/completions
        Server-->>Probe: OK
    else probe responses first
        Onboard->>Probe: probe /v1/responses (non-stream)
        Probe->>Server: probe /v1/responses
        Server-->>Probe: OK
        Onboard->>Probe: runStreamingEventProbe(/v1/responses)
        Probe->>Server: curl -N /v1/responses (stream)
        Server-->>Probe: SSE events
        Probe->>Probe: parse events, check response.output_text.delta
        alt required event present
            Probe-->>Onboard: {ok: true}
        else missing events
            Probe-->>Onboard: {ok: false, missingEvents: [...]}
            Onboard->>Probe: probe /v1/chat/completions (fallback)
            Probe->>Server: probe /v1/chat/completions
            Server-->>Probe: OK
        end
    end
    Probe-->>Onboard: final probe result
    Onboard-->>User: onboarding result (and image baked config)

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped along the streaming trail tonight,
Listening for deltas in soft SSE light,
When responses falter, I nudge the flow,
Fall back to completions, and onward we go,
A small rabbit's tweak to make sandboxes right.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check ✅: check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check ✅: the title accurately summarizes the main change, adding streaming event validation for /v1/responses and introducing the NEMOCLAW_PREFERRED_API override.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (3)
src/lib/validation.ts (1)

132-140: Pass the preferred API into this helper instead of reading process.env here.

validation.ts is documented as a pure, input-driven module, but this addition now depends on process state. Moving the env lookup to src/lib/onboard.ts keeps this layer deterministic and easier to reuse/test.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/validation.ts` around lines 132 - 140, The helper
shouldForceCompletionsApi now reads process.env directly which breaks the pure,
input-driven design of validation.ts; change its signature to accept the
preferred API string (e.g., preferredApi: string | undefined) and remove any env
access from inside shouldForceCompletionsApi, then perform the
trim()/toLowerCase() & comparison there; update callers (notably in
src/lib/onboard.ts) to read process.env.NEMOCLAW_PREFERRED_API, pass that value
into shouldForceCompletionsApi, and adjust tests accordingly so validation.ts
remains deterministic and testable.
docs/inference/switch-inference-providers.md (1)

87-89: Rewrite the passive sentence in active voice.

No image rebuild is needed. reads passively. Say what NemoClaw does or what the reader does instead. As per coding guidelines, "Active voice required. Flag passive constructions."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/inference/switch-inference-providers.md` around lines 87 - 89, Rewrite
the passive sentence "No image rebuild is needed." in active voice—replace it
with a clear actor-based line such as "You do not need to rebuild the image." or
"NemoClaw does not require rebuilding the image." Update the sentence near the
existing text that mentions patching `openclaw.json` (the line that currently
reads "No image rebuild is needed.") so the statement explicitly names the
actor.
docs/inference/use-local-inference.md (1)

147-149: Address the reader directly in this paragraph.

This variable tells the wizard... / It works... is feature-centric wording. The docs style guide asks for second person when you describe what the reader should do or what happens when they set a value. As per coding guidelines, "Second person ('you') when addressing the reader."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/inference/use-local-inference.md` around lines 147 - 149, Update the two
sentences that currently start "This variable tells the wizard..." and "It works
in both..." to use second-person wording addressing the reader; for example,
replace with something like "Set this variable to make the wizard skip the
/v1/responses probe and use /v1/chat/completions directly." and "This works in
both interactive and non-interactive modes." Ensure the edited sentences mention
the endpoints (/v1/responses and /v1/chat/completions) and keep the meaning
unchanged while using "you"/direct instruction tone.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/skills/nemoclaw-user-configure-inference/SKILL.md:
- Around line 131-145: The SKILL.md changes were made directly but this artifact
must be regenerated from the canonical docs source; revert the manual edits in
SKILL.md (the nemoclaw-user-configure-inference SKILL.md) and run the project's
skill-generation pipeline/tool that produces .agents/skills/*/SKILL.md from
docs/ (regenerate the skill from docs to reapply the intended content), then
commit the regenerated SKILL.md so the file is produced consistently from docs
rather than edited in place.

In `@src/lib/onboard.ts`:
- Around line 1204-1229: The code treats any runStreamingEventProbe failure
(streamResult.ok === false) as a streaming-incompatibility and falls back
silently; instead, inspect runStreamingEventProbe's failure details (e.g.,
streamResult.reason, streamResult.errorCode, or the text in
streamResult.message) and only perform the streaming-to-chat fallback when the
failure explicitly indicates missing/unsupported SSE events (e.g., reason ===
"missing-events" or message contains "missing events"); for all other non-ok
results from runStreamingEventProbe, surface a validation error (push
failure/log it and abort/return the probe) rather than switching APIs. Use the
runStreamingEventProbe, streamResult.ok, and
streamResult.message/streamResult.reason identifiers to locate and implement
this conditional.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: edd33113-b66d-4071-aa12-59635240f544

📥 Commits

Reviewing files that changed from the base of the PR and between d4aac4c and 7df2237.

📒 Files selected for processing (10)
  • .agents/skills/nemoclaw-user-configure-inference/SKILL.md
  • .agents/skills/nemoclaw-user-reference/references/troubleshooting.md
  • docs/inference/switch-inference-providers.md
  • docs/inference/use-local-inference.md
  • docs/reference/troubleshooting.md
  • src/lib/http-probe.test.ts
  • src/lib/http-probe.ts
  • src/lib/onboard.ts
  • src/lib/validation.test.ts
  • src/lib/validation.ts

…file ARG precedence

NEMOCLAW_INFERENCE_API_OVERRIDE only patches openclaw.json at container
startup — it does not update the Dockerfile ARG baked into the image. On
recreate-sandbox the baked value wins. The reliable fix is a fresh
nemoclaw onboard which re-probes and rebakes the image.

Updated all three doc pages to recommend nemoclaw onboard instead of the
override env var, and added a note explaining the limitation.

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
- Distinguish transport failures from missing-events in streaming probe
  fallback: only fall back to /chat/completions when missingEvents is
  non-empty; surface transport errors as hard validation failures
- Make shouldForceCompletionsApi() pure by accepting the preferred API
  value as a parameter instead of reading process.env directly, keeping
  validation.ts free of I/O per its module contract
- Fix passive voice and second-person wording in docs

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
@ericksoa self-assigned this on Apr 13, 2026
@wscurran added the enhancement: feature label (requests for new capabilities in NemoClaw) on Apr 13, 2026
@ericksoa added the bug label and removed the enhancement: feature label on Apr 13, 2026
@brandonpelfrey (Collaborator)

🦞 NemoClaw Functional Review — PR #1833

Verdict: APPROVE

Summary

feat(inference): validate streaming events for /v1/responses and add NEMOCLAW_PREFERRED_API override

Changed files: 10
Tests: 1374 passed, 0 failed

Blocking Issues

  • ⚠️ Potential hardcoded credentials detected: false positive. The flagged line is getCredential(credentialEnv), a function call that reads from the secure credential store, not a hardcoded secret.

Functional Testing

  • Clone & checkout: git fetch origin pull/1833/head:pr-1833 && git checkout pr-1833 (clean)
  • Dependencies: npm install --include=dev (installed without errors)
  • Build: ✅ npm run build:cli exited 0, no TypeScript errors
  • Tests: ✅ all 1374 tests pass; npx vitest run --reporter=verbose exited 0
  • Dockerfile: ℹ️ not modified by this PR
  • Entrypoint: ℹ️ nemoclaw-start.sh not modified by this PR
  • Full onboard: ✅ pass in DinD (Ubuntu 22.04): ✅ 12, ❌ 0, ⚠️ 1

Adversarial Testing (15 tests — 15 pass, 0 concern)

Ran 15 adversarial tests against the live sandbox targeting SSE parsing, event name matching, env var validation, error handling, temp file cleanup, and DoS resilience.

Adversarial Test Report — PR #1833

Tested: 2026-04-13T13:55 UTC
Container: nemoclaw-onboard-1833
Sandbox: pr-test
Commit: 7df2237

Analysis

What the PR does: Adds streaming event validation for /v1/responses probes during onboarding. Backends like SGLang expose the endpoint but only emit lifecycle events (created/in_progress/completed), missing the response.output_text.delta events OpenClaw requires. The PR auto-detects this and falls back to /v1/chat/completions. Also adds NEMOCLAW_PREFERRED_API env var to force chat completions mode, and NEMOCLAW_INFERENCE_API_OVERRIDE for post-onboard switching.

Attack surfaces identified:

  1. SSE parsing robustness (malformed, injected, empty, huge payloads)
  2. Event name matching (case sensitivity, spoofing)
  3. NEMOCLAW_PREFERRED_API env var validation (injection, bypass)
  4. Temp file handling (cleanup, leak)
  5. Error handling (curl failures, timeouts)

Adversarial Tests

Test 1: SGLang-like incomplete streaming detected

  • Hypothesis: Response with only lifecycle events (no delta) should fail the probe
  • Impact: This is the core bug the PR fixes. If incomplete streaming isn't detected, users with SGLang backends would pass onboarding but have agents fail at runtime with no actionable error — the exact problem that motivated this PR.
  • Command: runStreamingEventProbe() with SSE containing only created/in_progress/completed
  • Output: ok: false, missingEvents: ['response.output_text.delta']
  • Result: PASS

Test 2: Full valid streaming response passes

  • Hypothesis: Response with all required events should pass the probe
  • Impact: If valid streaming is rejected, all working /v1/responses backends would be forced to fall back to chat completions — degrading the experience for users with fully-compliant backends.
  • Command: runStreamingEventProbe() with SSE including response.output_text.delta
  • Output: ok: true
  • Result: PASS

Test 3: Empty response body

  • Hypothesis: Empty SSE body should fail the probe (no events at all)
  • Impact: An empty response (server returns 200 but no content) must not be treated as "all events present" — it would let a broken backend through.
  • Command: runStreamingEventProbe() with empty string body
  • Output: ok: false, missingEvents: ['response.output_text.delta']
  • Result: PASS

Test 4: Malformed SSE (no event: prefix)

  • Hypothesis: Lines without proper event: prefix should not count as valid events
  • Impact: If the parser matches event names without the event: prefix, any line containing the event name (comments, data payloads) could be misinterpreted as a valid event — masking broken backends.
  • Command: runStreamingEventProbe() with body response.output_text.delta\ndata: {...}
  • Output: ok: false
  • Result: PASS — regex requires ^event:\s* prefix

Test 5: XSS in delta data

  • Hypothesis: Malicious content in data: field should not affect event name matching
  • Impact: The probe only parses event names, not data payloads. If data content leaked into event matching, a crafted response could inject events that don't actually exist.
  • Command: runStreamingEventProbe() with data: {"delta":"<script>alert(1)</script>"}
  • Output: ok: true — event name correctly matched, data content ignored
  • Result: PASS

Test 6: Case sensitivity of event names

  • Hypothesis: Response.Output_Text.Delta (wrong case) should NOT match response.output_text.delta
  • Impact: SSE event names are case-sensitive per spec. If the parser were case-insensitive, a backend emitting wrong-case events would pass the probe but fail at runtime in OpenClaw (which matches exact case).
  • Command: runStreamingEventProbe() with event: Response.Output_Text.Delta
  • Output: ok: false
  • Result: PASS — case-sensitive matching matches OpenClaw's runtime behavior

Test 7: curl timeout (exit 28) with valid events

  • Hypothesis: Timeout is expected for streaming — if events were captured before timeout, probe should pass
  • Impact: Without this handling, every streaming probe against a real server would fail (streaming doesn't "complete" — it's a long-lived connection), blocking all /v1/responses backends from being selected.
  • Command: runStreamingEventProbe() with exit code 28 and valid events
  • Output: ok: true
  • Result: PASS

Test 8: curl error code 7 (connection refused)

  • Hypothesis: Non-timeout curl errors should fail the probe even if some SSE was written
  • Impact: If connection errors were silently ignored, a server that crashes mid-stream would appear healthy — agents would be onboarded against an unreachable endpoint.
  • Command: runStreamingEventProbe() with exit code 7
  • Output: ok: false
  • Result: PASS
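Tests 7 and 8 together imply the following exit-code handling. This is an assumed reconstruction of the logic those tests exercise, not the probe's actual code:

```typescript
// Assumed reconstruction of the exit-code logic exercised by Tests 7-8.
// curl exit 28 (operation timed out) is expected for a long-lived SSE stream,
// so the probe can still pass if the required events were captured first.
// Any other non-zero exit (e.g. 7, connection refused) is a hard failure.
function evaluateStreamingProbe(curlExitCode: number, missingEvents: string[]): boolean {
  const timedOut = curlExitCode === 28;
  if (curlExitCode !== 0 && !timedOut) return false;
  return missingEvents.length === 0;
}

evaluateStreamingProbe(28, []);                            // true: timed out after capturing events
evaluateStreamingProbe(7, []);                             // false: connection refused
evaluateStreamingProbe(0, ["response.output_text.delta"]); // false: required event never arrived
```

Treating timeout as the normal termination mode is what lets the probe work against real streaming servers, where the connection never "completes" on its own.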

Test 9: NEMOCLAW_PREFERRED_API valid values

  • Hypothesis: openai-completions and chat-completions should both trigger force-completions mode
  • Impact: Users following the documented override must be able to bypass the responses probe. If valid values aren't recognized, the documented workaround doesn't work.
  • Command: shouldForceCompletionsApi() with both valid values
  • Output: true for both
  • Result: PASS

Test 10: NEMOCLAW_PREFERRED_API invalid/malicious values

  • Hypothesis: Values like openai-responses, rm -rf /, injection strings should NOT trigger force mode
  • Impact: If arbitrary values trigger force-completions, a typo or malicious env var could silently downgrade all users from Responses API to chat completions — losing functionality like built-in tool calling.
  • Command: shouldForceCompletionsApi() with openai-responses, rm -rf /, true; curl evil.com
  • Output: false for all
  • Result: PASS

Test 11: NEMOCLAW_PREFERRED_API case insensitivity

  • Hypothesis: OPENAI-COMPLETIONS should be treated the same as openai-completions
  • Impact: Case sensitivity in env var values is a common user frustration. If the check were case-sensitive, OPENAI-COMPLETIONS in a Dockerfile or shell export would silently not take effect.
  • Command: shouldForceCompletionsApi() with OPENAI-COMPLETIONS
  • Output: true
  • Result: PASS

Test 12: NEMOCLAW_PREFERRED_API whitespace handling

  • Hypothesis: Leading/trailing whitespace should be trimmed
  • Impact: Copy-pasting env var values from docs often introduces trailing whitespace. If not trimmed, the override silently fails.
  • Command: shouldForceCompletionsApi() with " openai-completions " (leading and trailing whitespace)
  • Output: true
  • Result: PASS

Test 13: NEMOCLAW_PREFERRED_API empty and unset

  • Hypothesis: Empty string and unset should both return false (auto-detect mode)
  • Impact: If empty string triggered force mode, any script that sets NEMOCLAW_PREFERRED_API= (common pattern to "unset" in Docker) would silently force chat completions, losing Responses API support.
  • Command: shouldForceCompletionsApi() with "" and undefined
  • Output: false for both
  • Result: PASS

Test 14: Temp file cleanup after probe

  • Hypothesis: SSE output file should be deleted after probe completes
  • Impact: Leaked temp files containing SSE data could accumulate on disk and potentially expose API responses (including model output) to other processes on the same host.
  • Command: Capture temp file path during probe, check existence after
  • Output: File does not exist after probe returns
  • Result: PASS

Test 15: Very large SSE body (10MB)

  • Hypothesis: A 10MB response body should not crash the probe
  • Impact: A malicious or buggy server returning a huge streaming response could cause out-of-memory in the probe, crashing the entire onboard process and leaving the user unable to set up NemoClaw.
  • Command: runStreamingEventProbe() with 10MB data payload
  • Output: ok: true — no crash
  • Result: PASS

Summary

Tests: 15, Pass: 15, Concern: 0, Fail: 0

Verdict impact: Clean pass across all tests. The streaming event detection, env var validation, error handling, and temp file cleanup are all solid. No concerns.

Security Scan

  • Dangerous patterns (eval/exec/proto): scanned pr.diff with grep -n; the match is RegExp.exec() used for SSE parsing, not child_process.exec (false positive)
  • Hardcoded credentials: scanned pr.diff with grep -niE; the flagged line is a getCredential() function call, not a hardcoded secret (false positive)
  • Dependency changes: checked the package.json diff; no dependency changes
  • Permission changes (chmod/chown): scanned pr.diff; none found

Notes

This is an automated functional review. Every ✅ above was verified by running the stated command/test.
Manual review is still recommended for business logic, API contracts, and performance.


🦞 Auto-reviewed by Nemo.

@brandonpelfrey self-requested a review on April 13, 2026 at 19:08
@ericksoa ericksoa merged commit a064e97 into main Apr 13, 2026
15 checks passed
ericksoa added a commit to cheese-head/NemoClaw that referenced this pull request Apr 14, 2026
…NEMOCLAW_PREFERRED_API override (NVIDIA#1833)

## Summary

- Adds streaming SSE event validation to the `/v1/responses` probe for
custom OpenAI-compatible endpoints, catching backends like SGLang that
return valid non-streaming responses but emit incomplete streaming
events
- Adds `NEMOCLAW_PREFERRED_API=openai-completions` env var to bypass
`/v1/responses` probe entirely during onboarding
- Documents both the env var override and the existing
`NEMOCLAW_INFERENCE_API_OVERRIDE` workaround for already-onboarded
sandboxes

## Context

Community user reported SGLang passes onboarding validation for
`/v1/responses` but fails at runtime because its streaming mode only
emits 3 lifecycle events (`response.created`, `response.in_progress`,
`response.completed`) — missing the granular content deltas OpenClaw
requires (`response.output_text.delta`, etc.).

## Test plan

- [ ] Unit tests for `shouldForceCompletionsApi()` (6 cases) and
`runStreamingEventProbe()` (5 cases) pass
- [ ] `NEMOCLAW_PREFERRED_API=openai-completions` skips `/v1/responses`
probe during custom endpoint onboarding
- [ ] Streaming probe detects SGLang-like incomplete SSE events and
falls back to `/chat/completions`
- [ ] Full test suite green

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added NEMOCLAW_PREFERRED_API to force Chat Completions (works
interactive/non‑interactive) and optionally skip the /v1/responses probe
* Onboarding now validates streaming events and will automatically fall
back to Chat Completions if required events are missing; transport/probe
failures produce a hard failure

* **Documentation**
* New troubleshooting and recovery steps (rerun `nemoclaw onboard` to
re‑probe and bake the correct API)
* Clarified that NEMOCLAW_INFERENCE_API_OVERRIDE only patches startup
config and does not update baked image ARGs
  * Minor wording tweak about image rebuilds

* **Tests**
* Added tests covering streaming probes, cleanup, error cases, and the
preference logic
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
ColinM-sys pushed a commit to ColinM-sys/NemoClaw that referenced this pull request Apr 14, 2026
Labels

bug Something isn't working