
feat(inference): validate streaming events for /v1/responses and add NEMOCLAW_PREFERRED_API override #1833

Merged
ericksoa merged 4 commits into main from feat/sglang-responses-streaming-validation
Apr 13, 2026
Conversation

@ericksoa (Contributor) commented Apr 13, 2026

Summary

  • Adds streaming SSE event validation to the /v1/responses probe for custom OpenAI-compatible endpoints, catching backends like SGLang that return valid non-streaming responses but emit incomplete streaming events
  • Adds NEMOCLAW_PREFERRED_API=openai-completions env var to bypass /v1/responses probe entirely during onboarding
  • Documents both the env var override and the existing NEMOCLAW_INFERENCE_API_OVERRIDE workaround for already-onboarded sandboxes

Context

Community user reported SGLang passes onboarding validation for /v1/responses but fails at runtime because its streaming mode only emits 3 lifecycle events (response.created, response.in_progress, response.completed) — missing the granular content deltas OpenClaw requires (response.output_text.delta, etc.).
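The gap described above comes down to which SSE `event:` names appear in the stream. The following sketch shows the kind of check the new probe performs; the function name, result shape, and required-event list are illustrative assumptions, not NemoClaw's actual implementation:

```typescript
// Illustrative sketch only: names and shapes are assumptions, not NemoClaw's code.
const REQUIRED_EVENTS = ["response.output_text.delta"];

function checkSseEvents(sseBody: string): { ok: boolean; missingEvents: string[] } {
  // Collect event names from `event:` lines; `data:` payloads are ignored.
  const seen = new Set<string>();
  for (const line of sseBody.split("\n")) {
    const match = /^event:\s*(\S+)/.exec(line);
    if (match) seen.add(match[1]);
  }
  const missingEvents = REQUIRED_EVENTS.filter((name) => !seen.has(name));
  return { ok: missingEvents.length === 0, missingEvents };
}

// An SGLang-like stream with only lifecycle events fails the check:
const lifecycleOnly = [
  "event: response.created",
  "event: response.in_progress",
  "event: response.completed",
].join("\n");
checkSseEvents(lifecycleOnly); // { ok: false, missingEvents: ["response.output_text.delta"] }
```

A stream that includes `event: response.output_text.delta` passes, which is the behavior the fallback logic keys on.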

Test plan

  • Unit tests for shouldForceCompletionsApi() (6 cases) and runStreamingEventProbe() (5 cases) pass
  • NEMOCLAW_PREFERRED_API=openai-completions skips /v1/responses probe during custom endpoint onboarding
  • Streaming probe detects SGLang-like incomplete SSE events and falls back to /chat/completions
  • Full test suite green

Summary by CodeRabbit

  • New Features

    • Added NEMOCLAW_PREFERRED_API to force Chat Completions (works in both interactive and non‑interactive modes) and optionally skip the /v1/responses probe
    • Onboarding now validates streaming events and will automatically fall back to Chat Completions if required events are missing; transport/probe failures produce a hard failure
  • Documentation

    • New troubleshooting and recovery steps (rerun nemoclaw onboard to re‑probe and bake the correct API)
    • Clarified that NEMOCLAW_INFERENCE_API_OVERRIDE only patches startup config and does not update baked image ARGs
    • Minor wording tweak about image rebuilds
  • Tests

    • Added tests covering streaming probes, cleanup, error cases, and the preference logic

…NEMOCLAW_PREFERRED_API override

Backends like SGLang expose /v1/responses and pass the existing non-streaming
validation probe, but their streaming mode only emits lifecycle events
(created/in_progress/completed) without the granular content deltas OpenClaw
requires (output_text.delta, etc.). This causes runtime failures after
onboarding succeeds.

Changes:
- Add runStreamingEventProbe() in http-probe.ts that sends a stream:true
  request and verifies the SSE event stream includes response.output_text.delta
- Integrate the streaming probe into probeOpenAiLikeEndpoint for custom
  endpoints (probeStreaming: true) — falls back to /chat/completions when
  streaming events are incomplete
- Add shouldForceCompletionsApi() in validation.ts checking
  NEMOCLAW_PREFERRED_API env var so users can bypass /responses entirely
- Wire both into validateCustomOpenAiLikeSelection
- Add unit tests for the new functions (11 new test cases)
- Document NEMOCLAW_PREFERRED_API, the NEMOCLAW_INFERENCE_API_OVERRIDE
  workaround, and a troubleshooting entry for the runtime failure scenario

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
@coderabbitai bot (Contributor) commented Apr 13, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: eebdabde-6905-4e12-bb17-8c62c68223ad

📥 Commits

Reviewing files that changed from the base of the PR and between 92e6a93 and be9383c.

📒 Files selected for processing (1)
  • src/lib/onboard.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/onboard.ts

📝 Walkthrough

Adds a streaming-event probe for OpenAI-compatible /v1/responses, a preference override to force /v1/chat/completions, onboarding changes to re-probe and bake the chosen API, tests for the probe and validation, and multiple documentation updates describing recovery and overrides.

Changes

  • Docs & Skill Guidance (/.agents/skills/nemoclaw-user-configure-inference/SKILL.md, /.agents/skills/nemoclaw-user-reference/references/troubleshooting.md, docs/inference/switch-inference-providers.md, docs/inference/use-local-inference.md, docs/reference/troubleshooting.md): Add troubleshooting and recovery guidance for /v1/responses streaming gaps; document NEMOCLAW_PREFERRED_API to force chat completions during onboarding; clarify NEMOCLAW_INFERENCE_API_OVERRIDE limits and instruct re-running nemoclaw onboard to re-probe and bake the API choice.
  • HTTP Streaming Probe (src/lib/http-probe.ts, src/lib/http-probe.test.ts): Add StreamingProbeResult and runStreamingEventProbe() to curl /v1/responses in streaming mode, parse SSE event: lines (checking for response.output_text.delta), and manage the temp-file lifecycle; add tests for success, missing events, timeout/exit handling, spawn errors, and cleanup.
  • Onboarding Integration (src/lib/onboard.ts): Run the streaming-event probe during /v1/responses validation when probeStreaming is enabled; record streaming-specific failures (with a "(streaming)" suffix), allow fallback to /v1/chat/completions, and support skipping /responses when shouldForceCompletionsApi() indicates a preference.
  • Validation & Tests (src/lib/validation.ts, src/lib/validation.test.ts): Add exported shouldForceCompletionsApi(preferredApi?) (case-insensitive detection of openai-completions / chat-completions) and tests verifying true/false cases and normalization.
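The case-insensitive preference check summarized above might look like the following pure function. This is a sketch under assumed naming; the exact accepted values and signature come from the description, not from reading the source:

```typescript
// Sketch of the preference check (assumed to match the described behavior):
// returns true only for the two documented values, after trimming and lowercasing.
function shouldForceCompletionsApi(preferredApi?: string): boolean {
  if (!preferredApi) return false; // unset or empty string: keep auto-detection
  const normalized = preferredApi.trim().toLowerCase();
  return normalized === "openai-completions" || normalized === "chat-completions";
}

shouldForceCompletionsApi(" OPENAI-COMPLETIONS "); // true: trimmed and case-insensitive
shouldForceCompletionsApi("openai-responses");     // false: not a force-completions value
```

Keeping this function free of process.env access (the caller passes the value in) is what makes it trivially unit-testable.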

Sequence Diagram

sequenceDiagram
    participant User
    participant Onboard as Onboarding
    participant Validation as ValidationLogic
    participant Probe as HTTPProbe
    participant Server as OpenAI-Compatible Server

    User->>Onboard: run "nemoclaw onboard"
    Onboard->>Validation: read NEMOCLAW_PREFERRED_API
    alt preference forces completions
        Validation-->>Onboard: skip /v1/responses
        Onboard->>Probe: probe /v1/chat/completions
        Probe->>Server: probe /v1/chat/completions
        Server-->>Probe: OK
    else probe responses first
        Onboard->>Probe: probe /v1/responses (non-stream)
        Probe->>Server: probe /v1/responses
        Server-->>Probe: OK
        Onboard->>Probe: runStreamingEventProbe(/v1/responses)
        Probe->>Server: curl -N /v1/responses (stream)
        Server-->>Probe: SSE events
        Probe->>Probe: parse events, check response.output_text.delta
        alt required event present
            Probe-->>Onboard: {ok: true}
        else missing events
            Probe-->>Onboard: {ok: false, missingEvents: [...]}
            Onboard->>Probe: probe /v1/chat/completions (fallback)
            Probe->>Server: probe /v1/chat/completions
            Server-->>Probe: OK
        end
    end
    Probe-->>Onboard: final probe result
    Onboard-->>User: onboarding result (and image baked config)

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped along the streaming trail tonight,
Listening for deltas in soft SSE light,
When responses falter, I nudge the flow,
Fall back to completions, and onward we go,
A small rabbit's tweak to make sandboxes right.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check ✅: check skipped; CodeRabbit’s high-level summary is enabled.
  • Title check ✅: the title accurately summarizes the main change, adding streaming event validation for /v1/responses and introducing the NEMOCLAW_PREFERRED_API override.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (3)
src/lib/validation.ts (1)

132-140: Pass the preferred API into this helper instead of reading process.env here.

validation.ts is documented as a pure, input-driven module, but this addition now depends on process state. Moving the env lookup to src/lib/onboard.ts keeps this layer deterministic and easier to reuse/test.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/lib/validation.ts` around lines 132 - 140, The helper
shouldForceCompletionsApi now reads process.env directly which breaks the pure,
input-driven design of validation.ts; change its signature to accept the
preferred API string (e.g., preferredApi: string | undefined) and remove any env
access from inside shouldForceCompletionsApi, then perform the
trim()/toLowerCase() & comparison there; update callers (notably in
src/lib/onboard.ts) to read process.env.NEMOCLAW_PREFERRED_API, pass that value
into shouldForceCompletionsApi, and adjust tests accordingly so validation.ts
remains deterministic and testable.
docs/inference/switch-inference-providers.md (1)

87-89: Rewrite the passive sentence in active voice.

No image rebuild is needed. reads passively. Say what NemoClaw does or what the reader does instead. As per coding guidelines, "Active voice required. Flag passive constructions."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/inference/switch-inference-providers.md` around lines 87 - 89, Rewrite
the passive sentence "No image rebuild is needed." in active voice—replace it
with a clear actor-based line such as "You do not need to rebuild the image." or
"NemoClaw does not require rebuilding the image." Update the sentence near the
existing text that mentions patching `openclaw.json` (the line that currently
reads "No image rebuild is needed.") so the statement explicitly names the
actor.
docs/inference/use-local-inference.md (1)

147-149: Address the reader directly in this paragraph.

This variable tells the wizard... / It works... is feature-centric wording. The docs style guide asks for second person when you describe what the reader should do or what happens when they set a value. As per coding guidelines, "Second person ('you') when addressing the reader."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/inference/use-local-inference.md` around lines 147 - 149, Update the two
sentences that currently start "This variable tells the wizard..." and "It works
in both..." to use second-person wording addressing the reader; for example,
replace with something like "Set this variable to make the wizard skip the
/v1/responses probe and use /v1/chat/completions directly." and "This works in
both interactive and non-interactive modes." Ensure the edited sentences mention
the endpoints (/v1/responses and /v1/chat/completions) and keep the meaning
unchanged while using "you"/direct instruction tone.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.agents/skills/nemoclaw-user-configure-inference/SKILL.md:
- Around line 131-145: The SKILL.md changes were made directly but this artifact
must be regenerated from the canonical docs source; revert the manual edits in
SKILL.md (the nemoclaw-user-configure-inference SKILL.md) and run the project's
skill-generation pipeline/tool that produces .agents/skills/*/SKILL.md from
docs/ (regenerate the skill from docs to reapply the intended content), then
commit the regenerated SKILL.md so the file is produced consistently from docs
rather than edited in place.

In `@src/lib/onboard.ts`:
- Around line 1204-1229: The code treats any runStreamingEventProbe failure
(streamResult.ok === false) as a streaming-incompatibility and falls back
silently; instead, inspect runStreamingEventProbe's failure details (e.g.,
streamResult.reason, streamResult.errorCode, or the text in
streamResult.message) and only perform the streaming-to-chat fallback when the
failure explicitly indicates missing/unsupported SSE events (e.g., reason ===
"missing-events" or message contains "missing events"); for all other non-ok
results from runStreamingEventProbe, surface a validation error (push
failure/log it and abort/return the probe) rather than switching APIs. Use the
runStreamingEventProbe, streamResult.ok, and
streamResult.message/streamResult.reason identifiers to locate and implement
this conditional.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: edd33113-b66d-4071-aa12-59635240f544

📥 Commits

Reviewing files that changed from the base of the PR and between d4aac4c and 7df2237.

📒 Files selected for processing (10)
  • .agents/skills/nemoclaw-user-configure-inference/SKILL.md
  • .agents/skills/nemoclaw-user-reference/references/troubleshooting.md
  • docs/inference/switch-inference-providers.md
  • docs/inference/use-local-inference.md
  • docs/reference/troubleshooting.md
  • src/lib/http-probe.test.ts
  • src/lib/http-probe.ts
  • src/lib/onboard.ts
  • src/lib/validation.test.ts
  • src/lib/validation.ts

…file ARG precedence

NEMOCLAW_INFERENCE_API_OVERRIDE only patches openclaw.json at container
startup — it does not update the Dockerfile ARG baked into the image. On
recreate-sandbox the baked value wins. The reliable fix is a fresh
nemoclaw onboard which re-probes and rebakes the image.

Updated all three doc pages to recommend nemoclaw onboard instead of the
override env var, and added a note explaining the limitation.

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
- Distinguish transport failures from missing-events in streaming probe
  fallback: only fall back to /chat/completions when missingEvents is
  non-empty; surface transport errors as hard validation failures
- Make shouldForceCompletionsApi() pure by accepting the preferred API
  value as a parameter instead of reading process.env directly, keeping
  validation.ts free of I/O per its module contract
- Fix passive voice and second-person wording in docs

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
@ericksoa self-assigned this on Apr 13, 2026
@wscurran added the enhancement: feature label (requests for new capabilities in NemoClaw) on Apr 13, 2026
@ericksoa added the bug label and removed the enhancement: feature label on Apr 13, 2026
@brandonpelfrey (Collaborator)

🦞 NemoClaw Functional Review — PR #1833

Verdict: APPROVE

Summary

feat(inference): validate streaming events for /v1/responses and add NEMOCLAW_PREFERRED_API override

Changed files: 10
Tests: 1374 passed, 0 failed

Blocking Issues

  • ⚠️ Potential hardcoded credentials detected: false positive. The flagged line is getCredential(credentialEnv), a function call that reads from the secure credential store, not a hardcoded secret.

Functional Testing

  • Clone & checkout: git fetch origin pull/1833/head:pr-1833 && git checkout pr-1833 (clean)
  • Dependencies: npm install --include=dev (installed without errors)
  • Build: ✅ npm run build:cli exited 0, no TypeScript errors
  • Tests: ✅ all 1374 tests pass; npx vitest run --reporter=verbose exited 0
  • Dockerfile: ℹ️ not modified by this PR
  • Entrypoint: ℹ️ nemoclaw-start.sh not modified by this PR
  • Full onboard: ✅ pass in DinD (Ubuntu 22.04): ✅ 12, ❌ 0, ⚠️ 1

Adversarial Testing (15 tests — 15 pass, 0 concern)

Ran 15 adversarial tests against the live sandbox targeting SSE parsing, event name matching, env var validation, error handling, temp file cleanup, and DoS resilience.

Adversarial Test Report — PR #1833

Tested: 2026-04-13T13:55 UTC
Container: nemoclaw-onboard-1833
Sandbox: pr-test
Commit: 7df2237

Analysis

What the PR does: Adds streaming event validation for /v1/responses probes during onboarding. Backends like SGLang expose the endpoint but only emit lifecycle events (created/in_progress/completed), missing the response.output_text.delta events OpenClaw requires. The PR auto-detects this and falls back to /v1/chat/completions. Also adds NEMOCLAW_PREFERRED_API env var to force chat completions mode, and NEMOCLAW_INFERENCE_API_OVERRIDE for post-onboard switching.

Attack surfaces identified:

  1. SSE parsing robustness (malformed, injected, empty, huge payloads)
  2. Event name matching (case sensitivity, spoofing)
  3. NEMOCLAW_PREFERRED_API env var validation (injection, bypass)
  4. Temp file handling (cleanup, leak)
  5. Error handling (curl failures, timeouts)

Adversarial Tests

Test 1: SGLang-like incomplete streaming detected

  • Hypothesis: Response with only lifecycle events (no delta) should fail the probe
  • Impact: This is the core bug the PR fixes. If incomplete streaming isn't detected, users with SGLang backends would pass onboarding but have agents fail at runtime with no actionable error — the exact problem that motivated this PR.
  • Command: runStreamingEventProbe() with SSE containing only created/in_progress/completed
  • Output: ok: false, missingEvents: ['response.output_text.delta']
  • Result: PASS

Test 2: Full valid streaming response passes

  • Hypothesis: Response with all required events should pass the probe
  • Impact: If valid streaming is rejected, all working /v1/responses backends would be forced to fall back to chat completions — degrading the experience for users with fully-compliant backends.
  • Command: runStreamingEventProbe() with SSE including response.output_text.delta
  • Output: ok: true
  • Result: PASS

Test 3: Empty response body

  • Hypothesis: Empty SSE body should fail the probe (no events at all)
  • Impact: An empty response (server returns 200 but no content) must not be treated as "all events present" — it would let a broken backend through.
  • Command: runStreamingEventProbe() with empty string body
  • Output: ok: false, missingEvents: ['response.output_text.delta']
  • Result: PASS

Test 4: Malformed SSE (no event: prefix)

  • Hypothesis: Lines without proper event: prefix should not count as valid events
  • Impact: If the parser matches event names without the event: prefix, any line containing the event name (comments, data payloads) could be misinterpreted as a valid event — masking broken backends.
  • Command: runStreamingEventProbe() with body response.output_text.delta\ndata: {...}
  • Output: ok: false
  • Result: PASS — regex requires ^event:\s* prefix

Test 5: XSS in delta data

  • Hypothesis: Malicious content in data: field should not affect event name matching
  • Impact: The probe only parses event names, not data payloads. If data content leaked into event matching, a crafted response could inject events that don't actually exist.
  • Command: runStreamingEventProbe() with data: {"delta":"<script>alert(1)</script>"}
  • Output: ok: true — event name correctly matched, data content ignored
  • Result: PASS

Test 6: Case sensitivity of event names

  • Hypothesis: Response.Output_Text.Delta (wrong case) should NOT match response.output_text.delta
  • Impact: SSE event names are case-sensitive per spec. If the parser were case-insensitive, a backend emitting wrong-case events would pass the probe but fail at runtime in OpenClaw (which matches exact case).
  • Command: runStreamingEventProbe() with event: Response.Output_Text.Delta
  • Output: ok: false
  • Result: PASS — case-sensitive matching matches OpenClaw's runtime behavior

Test 7: curl timeout (exit 28) with valid events

  • Hypothesis: Timeout is expected for streaming — if events were captured before timeout, probe should pass
  • Impact: Without this handling, every streaming probe against a real server would fail (streaming doesn't "complete" — it's a long-lived connection), blocking all /v1/responses backends from being selected.
  • Command: runStreamingEventProbe() with exit code 28 and valid events
  • Output: ok: true
  • Result: PASS

Test 8: curl error code 7 (connection refused)

  • Hypothesis: Non-timeout curl errors should fail the probe even if some SSE was written
  • Impact: If connection errors were silently ignored, a server that crashes mid-stream would appear healthy — agents would be onboarded against an unreachable endpoint.
  • Command: runStreamingEventProbe() with exit code 7
  • Output: ok: false
  • Result: PASS
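Tests 7 and 8 together imply the following exit-code handling. This is an assumed reconstruction of the logic those tests exercise, not the probe's actual code:

```typescript
// Assumed reconstruction of the exit-code logic exercised by Tests 7-8.
// curl exit 28 (operation timed out) is expected for a long-lived SSE stream,
// so the probe can still pass if the required events were captured first.
// Any other non-zero exit (e.g. 7, connection refused) is a hard failure.
function evaluateStreamingProbe(curlExitCode: number, missingEvents: string[]): boolean {
  const timedOut = curlExitCode === 28;
  if (curlExitCode !== 0 && !timedOut) return false;
  return missingEvents.length === 0;
}

evaluateStreamingProbe(28, []);                            // true: timed out after capturing events
evaluateStreamingProbe(7, []);                             // false: connection refused
evaluateStreamingProbe(0, ["response.output_text.delta"]); // false: required event never arrived
```

Treating timeout as the normal termination mode is what lets the probe work against real streaming servers, where the connection never "completes" on its own.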

Test 9: NEMOCLAW_PREFERRED_API valid values

  • Hypothesis: openai-completions and chat-completions should both trigger force-completions mode
  • Impact: Users following the documented override must be able to bypass the responses probe. If valid values aren't recognized, the documented workaround doesn't work.
  • Command: shouldForceCompletionsApi() with both valid values
  • Output: true for both
  • Result: PASS

Test 10: NEMOCLAW_PREFERRED_API invalid/malicious values

  • Hypothesis: Values like openai-responses, rm -rf /, injection strings should NOT trigger force mode
  • Impact: If arbitrary values trigger force-completions, a typo or malicious env var could silently downgrade all users from Responses API to chat completions — losing functionality like built-in tool calling.
  • Command: shouldForceCompletionsApi() with openai-responses, rm -rf /, true; curl evil.com
  • Output: false for all
  • Result: PASS

Test 11: NEMOCLAW_PREFERRED_API case insensitivity

  • Hypothesis: OPENAI-COMPLETIONS should be treated the same as openai-completions
  • Impact: Case sensitivity in env var values is a common user frustration. If the check were case-sensitive, OPENAI-COMPLETIONS in a Dockerfile or shell export would silently not take effect.
  • Command: shouldForceCompletionsApi() with OPENAI-COMPLETIONS
  • Output: true
  • Result: PASS

Test 12: NEMOCLAW_PREFERRED_API whitespace handling

  • Hypothesis: Leading/trailing whitespace should be trimmed
  • Impact: Copy-pasting env var values from docs often introduces trailing whitespace. If not trimmed, the override silently fails.
  • Command: shouldForceCompletionsApi() with " openai-completions " (leading and trailing whitespace)
  • Output: true
  • Result: PASS

Test 13: NEMOCLAW_PREFERRED_API empty and unset

  • Hypothesis: Empty string and unset should both return false (auto-detect mode)
  • Impact: If empty string triggered force mode, any script that sets NEMOCLAW_PREFERRED_API= (common pattern to "unset" in Docker) would silently force chat completions, losing Responses API support.
  • Command: shouldForceCompletionsApi() with "" and undefined
  • Output: false for both
  • Result: PASS

Test 14: Temp file cleanup after probe

  • Hypothesis: SSE output file should be deleted after probe completes
  • Impact: Leaked temp files containing SSE data could accumulate on disk and potentially expose API responses (including model output) to other processes on the same host.
  • Command: Capture temp file path during probe, check existence after
  • Output: File does not exist after probe returns
  • Result: PASS

Test 15: Very large SSE body (10MB)

  • Hypothesis: A 10MB response body should not crash the probe
  • Impact: A malicious or buggy server returning a huge streaming response could cause out-of-memory in the probe, crashing the entire onboard process and leaving the user unable to set up NemoClaw.
  • Command: runStreamingEventProbe() with 10MB data payload
  • Output: ok: true — no crash
  • Result: PASS

Summary

Tests: 15, Pass: 15, Concern: 0, Fail: 0

Verdict impact: Clean pass across all tests. The streaming event detection, env var validation, error handling, and temp file cleanup are all solid. No concerns.

Security Scan

  • Dangerous patterns (eval/exec/proto): scanned pr.diff with grep -n; the match is RegExp.exec() used for SSE parsing, not child_process.exec (false positive)
  • Hardcoded credentials: scanned pr.diff with grep -niE; the flagged line is a getCredential() function call, not a hardcoded secret (false positive)
  • Dependency changes: checked the package.json diff; no dependency changes
  • Permission changes (chmod/chown): scanned pr.diff; none found

Notes

This is an automated functional review. Every ✅ above was verified by running the stated command/test.
Manual review is still recommended for business logic, API contracts, and performance.


🦞 Auto-reviewed by Nemo.

@brandonpelfrey self-requested a review on April 13, 2026 at 19:08
@ericksoa ericksoa merged commit a064e97 into main Apr 13, 2026
15 checks passed
ericksoa added a commit to cheese-head/NemoClaw that referenced this pull request Apr 14, 2026
…NEMOCLAW_PREFERRED_API override (NVIDIA#1833)

## Summary

- Adds streaming SSE event validation to the `/v1/responses` probe for
custom OpenAI-compatible endpoints, catching backends like SGLang that
return valid non-streaming responses but emit incomplete streaming
events
- Adds `NEMOCLAW_PREFERRED_API=openai-completions` env var to bypass
`/v1/responses` probe entirely during onboarding
- Documents both the env var override and the existing
`NEMOCLAW_INFERENCE_API_OVERRIDE` workaround for already-onboarded
sandboxes

## Context

Community user reported SGLang passes onboarding validation for
`/v1/responses` but fails at runtime because its streaming mode only
emits 3 lifecycle events (`response.created`, `response.in_progress`,
`response.completed`) — missing the granular content deltas OpenClaw
requires (`response.output_text.delta`, etc.).

## Test plan

- [ ] Unit tests for `shouldForceCompletionsApi()` (6 cases) and
`runStreamingEventProbe()` (5 cases) pass
- [ ] `NEMOCLAW_PREFERRED_API=openai-completions` skips `/v1/responses`
probe during custom endpoint onboarding
- [ ] Streaming probe detects SGLang-like incomplete SSE events and
falls back to `/chat/completions`
- [ ] Full test suite green

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added NEMOCLAW_PREFERRED_API to force Chat Completions (works
interactive/non‑interactive) and optionally skip the /v1/responses probe
* Onboarding now validates streaming events and will automatically fall
back to Chat Completions if required events are missing; transport/probe
failures produce a hard failure

* **Documentation**
* New troubleshooting and recovery steps (rerun `nemoclaw onboard` to
re‑probe and bake the correct API)
* Clarified that NEMOCLAW_INFERENCE_API_OVERRIDE only patches startup
config and does not update baked image ARGs
  * Minor wording tweak about image rebuilds

* **Tests**
* Added tests covering streaming probes, cleanup, error cases, and the
preference logic
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
ColinM-sys pushed a commit to ColinM-sys/NemoClaw that referenced this pull request Apr 14, 2026
Labels

bug Something isn't working