docs(onboard): document NEMOCLAW_SANDBOX_READY_TIMEOUT#3440
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
🚀 Docs preview ready!
E2E Advisor Recommendation
Required E2E: None
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (2)
📝 Walkthrough
Adds documentation for a new sandbox readiness timeout
Changes
Sandbox readiness & onboarding timeouts
🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks | ✅ 5 passed
Actionable comments posted: 2
🧹 Nitpick comments (1)
docs/reference/commands.md (1)
1170-1170: 💤 Low value
Consider splitting dense table cell content for clarity.
The Purpose column contains three sentences packed together. While table formatting makes strict one-sentence-per-line difficult, consider whether this cell could be more scannable.
As per coding guidelines, "One sentence per line in source (makes diffs readable)."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/reference/commands.md` at line 1170, Split the dense Purpose cell for `NEMOCLAW_SANDBOX_READY_TIMEOUT` into multiple lines/sentences in the markdown source so each sentence lives on its own line: break the current three-sentence paragraph into three separate lines (e.g., one line describing what the flag controls, one line with examples/when to raise it, and one line describing the behavior when the deadline expires), updating the table cell content for `NEMOCLAW_SANDBOX_READY_TIMEOUT` accordingly so diffs are readable and follow the "one sentence per line" guideline.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/deployment/deploy-to-remote-gpu.md`:
- Line 146: Replace the passive clause "the partially-created sandbox is deleted
first" with an active construction that names the actor; rewrite the sentence so
that onboard performs the action (e.g., "onboard deletes the partially-created
sandbox first, so the next attempt with the raised budget starts from a clean
state"), updating the sentence in the docs string that currently reads "If
onboard ends with `Sandbox '<name>' was created but did not become ready within
180s`, the partially-created sandbox is deleted first, so the next attempt with
the raised budget starts from a clean state." to use active voice and reference
"onboard" as the actor.
In `@docs/reference/commands.md`:
- Line 1178: The line containing "If a timeout fires, onboarding emits the
elapsed budget plus a hint to raise the relevant variable. The Ollama pull
preserves its partial download for the next attempt; the readiness wait deletes
the orphaned sandbox first so the next `nemoclaw onboard` starts clean." should
be split so each sentence is on its own line: place the first sentence on one
line and the remaining sentence(s) each on their own lines to follow the
one-sentence-per-line guideline and improve diff readability.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f89b8574-7c48-4cff-8fda-7359840892ea
📒 Files selected for processing (4)
docs/deployment/deploy-to-remote-gpu.md
docs/inference/use-local-inference.md
docs/reference/commands.md
docs/reference/troubleshooting.md
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
🧹 Nitpick comments (4)
docs/deployment/deploy-to-remote-gpu.md (2)
133-133: ⚡ Quick win
Use active voice.
The phrase "is sized for" is passive. As per coding guidelines, "Active voice required. Flag passive constructions."
✏️ Suggested revision
-The post-create readiness wait defaults to 180 seconds (`NEMOCLAW_SANDBOX_READY_TIMEOUT`), which is sized for warm-cache, workstation-class onboarding and can be exceeded on:
+The post-create readiness wait defaults to 180 seconds (`NEMOCLAW_SANDBOX_READY_TIMEOUT`), which targets warm-cache, workstation-class onboarding and can be exceeded on:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/deployment/deploy-to-remote-gpu.md` at line 133, Rewrite the passive sentence that references NEMOCLAW_SANDBOX_READY_TIMEOUT into active voice; specifically change "which is sized for warm-cache, workstation-class onboarding" to an active construction like "which we set for warm-cache, workstation-class onboarding" or "which targets warm-cache, workstation-class onboarding" so the line reads e.g. "The post-create readiness wait defaults to 180 seconds (NEMOCLAW_SANDBOX_READY_TIMEOUT), which we set for warm-cache, workstation-class onboarding and can be exceeded on:"; update the sentence containing NEMOCLAW_SANDBOX_READY_TIMEOUT to use one of these active alternatives.
132-132: ⚡ Quick win
Replace colon with period and rewrite in active voice.
This sentence uses a colon to connect two clauses rather than introduce a list, and contains passive constructions ("is built," "uploaded"). As per coding guidelines, "Colons should only introduce a list. Flag colons used as general punctuation between clauses" and "Active voice required. Flag passive constructions."
✏️ Suggested revision
-On a remote GPU host, the first `nemoclaw onboard` typically does the slowest work of the lifecycle: the sandbox image is built locally and uploaded into the OpenShell gateway, which can stream hundreds of MiB over the VM's link before the readiness wait even starts.
+On a remote GPU host, the first `nemoclaw onboard` typically does the slowest work of the lifecycle.
+The sandbox image builds locally and uploads into the OpenShell gateway, streaming hundreds of MiB over the VM's link before the readiness wait even starts.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/deployment/deploy-to-remote-gpu.md` at line 132, Replace the colon with a period and rewrite the sentence in active voice: locate the sentence containing `nemoclaw onboard` and `OpenShell gateway` and change the passive phrases "is built" and "uploaded" to active verbs (e.g., "builds the sandbox image locally and uploads it to the OpenShell gateway"), and split the clauses with a period so it reads as two clear, active sentences describing that `nemoclaw onboard` performs the slowest work by building and uploading the sandbox image, which can stream hundreds of MiB before readiness waits begin.
docs/reference/commands.md (2)
1169-1169: ⚡ Quick win
Split sentences and avoid weak intensifier.
This table cell contains two sentences on the same line. Additionally, "very large" is a weak intensifier. As per coding guidelines, "One sentence per line in source (makes diffs readable). Flag paragraphs where multiple sentences appear on the same line."
✏️ Suggested revision
-| `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` | `180` | Wall-clock timeout for the inference-server validation probe during onboard, in seconds. Raise on slow networks or for very large prompts. |
+| `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` | `180` | Wall-clock timeout for the inference-server validation probe during onboard, in seconds. Raise on slow networks or for large prompts. |
Note: For table cells, consider keeping the description concise to fit the table format, or break the longer explanation into a separate paragraph below the table if detailed guidance is needed.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/reference/commands.md` at line 1169, The table cell for NEMOCLAW_LOCAL_INFERENCE_TIMEOUT currently contains two sentences and uses the weak intensifier "very large"; change it to a single concise sentence and replace "very large" with a more specific term (e.g., "extremely long prompts" or "very long prompts"), or move the additional guidance to a separate sentence below the table; update the cell describing NEMOCLAW_LOCAL_INFERENCE_TIMEOUT to be one sentence only and, if needed, add a short paragraph after the table with the expanded advice.
1170-1170: ⚡ Quick win
Split sentences to follow one-sentence-per-line formatting.
This table cell contains three sentences on the same line. As per coding guidelines, "One sentence per line in source (makes diffs readable). Flag paragraphs where multiple sentences appear on the same line."
✏️ Suggested revision
-| `NEMOCLAW_SANDBOX_READY_TIMEOUT` | `180` | Wall-clock timeout for the post-create readiness wait, in seconds. Raise when the sandbox image build, gateway upload, or in-sandbox boot exceeds the default (typical on 70B+ models, first-time gateway uploads over slow links, or DGX Station / remote-VM first runs). When the deadline expires onboarding deletes the orphaned sandbox and prints the retry hint. |
+| `NEMOCLAW_SANDBOX_READY_TIMEOUT` | `180` | Wall-clock timeout for the post-create readiness wait, in seconds. Raise when the sandbox image build, gateway upload, or in-sandbox boot exceeds the default (typical on 70B+ models, first-time gateway uploads over slow links, or DGX Station / remote-VM first runs). When the deadline expires, onboarding deletes the orphaned sandbox and prints the retry hint. |
Note: For table cells, consider keeping the description concise to fit the table format, or break the longer explanation into a separate paragraph below the table if multi-sentence guidance is needed.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/reference/commands.md` at line 1170, The table cell for `NEMOCLAW_SANDBOX_READY_TIMEOUT` contains multiple sentences on one line; edit the table cell so each sentence is on its own source line (one-sentence-per-line), e.g., split the current description into separate lines for the short definition, the examples/when to raise, and the note about deletion/retry hint; if the explanatory text is too long for a table cell, move the longer guidance into a separate paragraph below the table and keep the cell concise.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 4d2a51e4-4e25-4e85-a621-28467d9074d6
📒 Files selected for processing (2)
docs/deployment/deploy-to-remote-gpu.md
docs/reference/commands.md
Summary
`NEMOCLAW_SANDBOX_READY_TIMEOUT` has been a recognised env var since #2849, but no documentation accompanied it. `docs/reference/commands.md`, `docs/reference/troubleshooting.md`, and the inference / deployment guides only mention the companion `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` (added in #1620 and documented at that time). Operators hitting `Sandbox '<name>' was created but did not become ready within 180s` have no doc-grep path to the workaround, and the two timeouts are easy to conflate. This closes the documentation gap left by #2849.
Originally tried under #3435; closed because that PR mis-framed the docs as resolving #3344 / #3416 (the root cause of both was the GPU policy bug fixed in #3436, not a timeout misconfiguration). The docs themselves still have value as a follow-up to the env-var introductions, so reopening as a new PR with the correct framing.
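To illustrate how the quoted error text maps to the right knob, here is a minimal shell sketch. `suggest_timeout_fix` is a hypothetical helper and not part of `nemoclaw`; the matched substring is taken from the readiness-wait error quoted above, and the sandbox name `demo` is made up for the example.

```shell
#!/bin/sh
# Hypothetical helper, not part of nemoclaw: inspects one line of onboard
# output and names the timeout variable that governs the readiness wait.
suggest_timeout_fix() {
  case "$1" in
    *"did not become ready within"*)
      # Substring from the readiness-wait error quoted in the summary.
      echo "raise NEMOCLAW_SANDBOX_READY_TIMEOUT"
      ;;
    *)
      echo "no readiness timeout detected"
      ;;
  esac
}

# Example call with an illustrative sandbox name.
suggest_timeout_fix "Sandbox 'demo' was created but did not become ready within 180s"
```

The sketch only recognises the readiness-wait message; the inference-probe budget (`NEMOCLAW_LOCAL_INFERENCE_TIMEOUT`) is a separate timeout and is not matched here.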
Related Issue
Changes
- `docs/reference/commands.md`: add `NEMOCLAW_SANDBOX_READY_TIMEOUT` and `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` to the Onboard Timeouts table.
- `docs/reference/troubleshooting.md`: new troubleshooting entry "Sandbox onboard times out with 'did not become ready within Ns'" that distinguishes the readiness wait from the inference-probe budget, with a worked example.
- `docs/inference/use-local-inference.md`: cross-link the two timeouts from the existing `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` section so readers of either knob land on the other.
- `docs/deployment/deploy-to-remote-gpu.md`: new "First-Run Readiness Budget" section calling out DGX Station / cloud-VM / large-quantised-model conditions that exceed the default and showing how to raise it.
No code changes; the readiness behaviour is unchanged.
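As a rough sketch of what raising the readiness budget looks like in practice: the value `600` below is an illustrative assumption, not a figure recommended by the docs, and the `command -v` guard is there only so the snippet runs harmlessly on a machine without the CLI.

```shell
#!/bin/sh
# Illustrative value only: 600 seconds is an assumption, not a documented
# recommendation. The variable is read in seconds (default 180).
export NEMOCLAW_SANDBOX_READY_TIMEOUT=600

if command -v nemoclaw >/dev/null 2>&1; then
  # Re-run onboarding with the raised readiness budget.
  nemoclaw onboard
else
  echo "nemoclaw not installed; NEMOCLAW_SANDBOX_READY_TIMEOUT=${NEMOCLAW_SANDBOX_READY_TIMEOUT}"
fi
```

Per the documented behaviour, a timed-out onboard deletes the partially created sandbox, so the retry with the larger budget starts from a clean state.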
Type of Change
Verification
- `npx prek run --all-files` passes
- `npm test` passes
- `make docs` builds without warnings (doc changes only)
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Summary by CodeRabbit