
fix(ci): move sudo npm link into launchable to unblock non-full Brev E2E #2186

Merged
cv merged 1 commit into main from fix/brev-e2e-npm-link-timeout
Apr 21, 2026


@cjagwani cjagwani commented Apr 21, 2026

Summary

Every Brev E2E suite other than `full` has been failing on `main` since 2026-04-16 with `spawnSync /bin/sh ETIMEDOUT ... sudo npm link ... SIGTERM`. Root cause is PR #1888, which added a `sudo npm link` step inside the test with a 120s `execSync` cap. On a cold CPU Brev instance, `npm link` on a fresh global prefix consistently takes 2-3 min — past the cap.
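
The failure mode can be reproduced in miniature without Node. The following is a minimal sketch, assuming GNU coreutils `timeout` is available: `sleep 3` stands in for the slow cold `sudo npm link`, and the 1-second cap stands in for the 120s `execSync` cap. The helper name `run_with_cap` and the numbers are illustrative, not taken from the repository.

```shell
#!/usr/bin/env bash
# Illustrative only: `sleep 3` stands in for a cold `sudo npm link`,
# and the 1s cap stands in for the test's 120s execSync cap.

run_with_cap() {
  local cap_seconds="$1"; shift
  # GNU timeout sends SIGTERM when the cap expires and then exits with 124.
  timeout --signal=TERM "$cap_seconds" "$@"
}

status=0
run_with_cap 1 sleep 3 || status=$?

if [ "$status" -eq 124 ]; then
  echo "step exceeded its cap and was terminated"
fi
```

Any step whose cold-path duration exceeds its cap fails this way on every run, however healthy the instance is otherwise.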

Related Issue

Regression from #1888. No dedicated issue filed — surfaced while working on #1924/PR #2183.

Changes

  • `scripts/brev-launchable-ci-cpu.sh` — add `sudo npm link` + `chown` right after the plugin build. The cold `npm link` cost now happens inside the launchable's readiness window (no tight `execSync` timeout) rather than inside a test step. Idempotent, so later test-side `npm link` is a fast no-op against the existing symlink.
  • `test/e2e/brev-e2e.test.ts` — comment-only update explaining that the in-test `sudo npm link` is now idempotent against the launchable's pre-linked symlink. Timeout stays at 120_000 — no need to bump once the cold path is relocated.

Evidence

Type of Change

  • Code change (bug fix)

Verification

  • `npx prek run --all-files` — passes (pre-existing `install-preflight.test.ts:107` flake resolves on retry, unrelated)
  • No secrets, API keys, or credentials committed

Next step

Trigger the Brev E2E with any non-`full` suite (e.g. `test_suite=credential-sanitization`) to confirm the bootstrap completes. That's the one signal we need.

AI Disclosure

  • AI-assisted — tool: Claude Code

Summary by CodeRabbit

  • Chores
    • Enhanced initialization procedures for CPU-based environments with optimized state management setup and comprehensive diagnostic logging improvements throughout the entire startup sequence to improve system observability and reliability.
    • Refined and clarified internal test documentation to better reflect the idempotent nature and execution patterns of environment setup procedures for improved code maintainability.

Every Brev E2E suite other than 'full' has been failing on main since
2026-04-16 (PR #1888) with:

  spawnSync /bin/sh ETIMEDOUT ... sudo npm link ... SIGTERM

Root cause: #1888 added a `sudo npm link` step inside the test's
bootstrapLaunchable() with a 120s execSync cap. On a cold CPU Brev
instance, `npm link` on a fresh global prefix consistently takes 2-3
min while npm resolves and symlinks production deps for the first
time. The `full` suite skips this step (uses install.sh instead), so
that path was fine; every other suite (credential-sanitization,
telegram-injection, messaging-providers, and any newly added suites)
hits the timeout.

Fix: do the cold `sudo npm link` during the launchable's setup phase,
right after the plugin build. The launchable's readiness window has no
tight per-step execSync timeout, so the slow first link happens there.
When the test suite later rsyncs PR branch code over this clone and
re-runs `sudo npm link`, npm sees the existing global symlink and
finishes in seconds — well under the existing 120s cap.

Also updates the code comment in the test so future readers know the
cold-link cost has been relocated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>

coderabbitai Bot commented Apr 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 088b236b-5705-4687-90f7-465709cbe4a9

📥 Commits

Reviewing files that changed from the base of the PR and between 41b405c and c611a27.

📒 Files selected for processing (2)
  • scripts/brev-launchable-ci-cpu.sh
  • test/e2e/brev-e2e.test.ts

📝 Walkthrough


The CI startup script adds a setup step that creates a global nemoclaw CLI symlink via sudo npm link and restores directory ownership with chown. The E2E test file updates an inline comment to clarify this setup's idempotent behavior on subsequent Brev CPU runs.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **NemoClaw CLI Symlink Setup**<br>`scripts/brev-launchable-ci-cpu.sh`, `test/e2e/brev-e2e.test.ts` | Shell script adds a `sudo npm link` invocation to create the global `nemoclaw` symlink and recursively restores NemoClaw directory ownership via `chown` after the TypeScript plugin build. Test file clarifies the step's idempotent nature on repeat invocations. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 Hop, link, symlink—three times divine!
The nemoclaw CLI now shines,
Permissions restored with a chown so swift,
Idempotent setup, the perfect gift! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(ci): move sudo npm link into launchable to unblock non-full Brev E2E' directly and specifically describes the main change: moving the npm link setup step from the test into the launchable startup script to resolve timeout issues. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@cv cv merged commit ab7f368 into main Apr 21, 2026
18 checks passed
cjagwani added a commit that referenced this pull request Apr 21, 2026
SSH'd into the keep_alive Brev instance from the last failed run and
found the onboard log consisted entirely of:

  Error: Cannot find module '../dist/nemoclaw'
  ...
  code: 'MODULE_NOT_FOUND'
  Node.js v22.22.2

Root cause: the flow runs `npm install --ignore-scripts`, which skips
the `prepare` lifecycle that normally invokes `build:cli`. Before
PR #2186, `sudo npm link` implicitly triggered `prepare` via npm's
lifecycle machinery and built `dist/` as a side effect. The direct
`ln -sf` symlink this PR introduces does not — so `dist/` is never
populated, `bin/nemoclaw.js`'s `require("../dist/nemoclaw")` crashes,
onboard dies instantly, and the test polls a dead process for 20 min.
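
A guard along these lines would make the missing build explicit instead of relying on a lifecycle side effect. This is a sketch under stated assumptions: `ensure_cli_built` and the `BUILD_CMD` override are hypothetical names, and the two candidate paths are a guess at how `require("../dist/nemoclaw")` resolves, since the source does not show the build output layout.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: build the CLI only when its compiled entry point is missing.
# BUILD_CMD defaults to the `npm run build:cli` script named in the commit message.
ensure_cli_built() {
  local repo_dir="$1"
  local build_cmd="${BUILD_CMD:-npm run build:cli}"

  # Assumption: require("../dist/nemoclaw") resolves to one of these two paths.
  if [ -f "$repo_dir/dist/nemoclaw.js" ] || [ -f "$repo_dir/dist/nemoclaw/index.js" ]; then
    echo "dist/ already built"
    return 0
  fi

  echo "dist/ missing, running: $build_cmd"
  ( cd "$repo_dir" && bash -c "$build_cmd" )
}
```

Calling `ensure_cli_built "$NEMOCLAW_CLONE_DIR"` after an `npm install --ignore-scripts` would then fail loudly at build time rather than leaving the CLI to crash with `MODULE_NOT_FOUND` on first use.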

Changes:
  - scripts/brev-launchable-ci-cpu.sh: run `npm run build:cli`
    explicitly after the `--ignore-scripts` root install.
  - test/e2e/brev-e2e.test.ts: same — after rsync'ing PR branch src
    (which excludes dist/), run `npm run build:cli` before invoking
    the CLI.
  - .github/workflows/e2e-brev.yaml: upload a debug bundle on failure
    (/tmp/nemoclaw-onboard.log, openshell sandbox list, docker ps,
    gateway status). Future failures will leave breadcrumbs without
    needing keep_alive + manual SSH.
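
The debug-bundle idea can be sketched as a collector in which no single diagnostic is allowed to abort the rest. The helper names `capture` and `collect_debug_bundle` and the output file names are hypothetical; the log path and the `openshell sandbox list` and `docker ps` commands come from the list above. The gateway status command is not given in the source, so it is left out rather than guessed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a diagnostic command and save its output,
# without failing the overall collection if the command errors or is missing.
capture() {
  local out_file="$1"; shift
  {
    echo "\$ $*"
    "$@" 2>&1 || echo "(command failed with exit $?)"
  } > "$out_file"
}

collect_debug_bundle() {
  local bundle_dir="$1"
  mkdir -p "$bundle_dir"

  # Copy the onboard log if it exists; the path comes from the commit message.
  cp /tmp/nemoclaw-onboard.log "$bundle_dir/" 2>/dev/null || true

  capture "$bundle_dir/sandboxes.txt" openshell sandbox list
  capture "$bundle_dir/docker-ps.txt" docker ps
  # Gateway status is omitted: the exact command is not given in the source.
}
```

Recording the failure inside the output file, instead of exiting, means a missing binary still leaves a trace in the uploaded artifact.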

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
cv pushed a commit that referenced this pull request Apr 21, 2026
…#2196)

## Summary
Reverts #2186 (which made Brev E2E failures take 10× longer to
surface) and replaces `sudo npm link` with a direct `sudo ln -sf`
symlink in both the launchable setup and the in-test bootstrap path.
`npm link` is overkill for what we actually need and hangs
indefinitely on cold CPU Brev instances. A direct symlink is O(1) and
deterministic.

## Related Issue
Regression from my own #2186. Surfaced while validating #2183 (ollama
E2E).

## Changes
This PR contains two commits:

1. **Revert #2186.** Restores the launchable setup to its state before
   `sudo npm link` was added. This alone would leave non-full suites
   broken by the original pre-existing #1888 issue, but without burning
   a 20-min Brev instance per run.
2. **Replace `sudo npm link` with direct symlink.**
   - `scripts/brev-launchable-ci-cpu.sh`: after plugin build, create
     `/usr/local/bin/nemoclaw → $NEMOCLAW_CLONE_DIR/bin/nemoclaw.js` via
     `sudo ln -sf`. Drop `sudo chown -R` (only the single symlink is
     root-owned now, not node_modules).
   - `test/e2e/brev-e2e.test.ts`: replace the in-test `sudo npm link`
     with the same direct-symlink approach. Idempotent re-link so local
     dev runs that skip the launchable still work.
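
The idempotent re-link described above can be sketched in a scratch directory. Everything here is illustrative: `link_cli` is a hypothetical helper, the stub script stands in for the real `bin/nemoclaw.js`, and the `chmod +x` reflects the assumption that the entry point must be executable to run through the link. In CI the `ln -sf` would run under `sudo` against `/usr/local/bin`.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: point <bin_dir>/nemoclaw at the clone's CLI entry point.
link_cli() {
  local clone_dir="$1" bin_dir="$2"
  # Assumption: the target needs the executable bit to run via the link.
  chmod +x "$clone_dir/bin/nemoclaw.js"
  ln -sf "$clone_dir/bin/nemoclaw.js" "$bin_dir/nemoclaw"
}

# Demo in a scratch directory with a stub entry point (not the real CLI).
scratch="$(mktemp -d)"
mkdir -p "$scratch/clone/bin" "$scratch/usr-local-bin"
printf '#!/bin/sh\necho "nemoclaw stub"\n' > "$scratch/clone/bin/nemoclaw.js"

link_cli "$scratch/clone" "$scratch/usr-local-bin"
link_cli "$scratch/clone" "$scratch/usr-local-bin"   # second call replaces the existing link

link_output="$("$scratch/usr-local-bin/nemoclaw")"
echo "$link_output"
```

Because `-f` replaces an existing link, running the step twice (once in the launchable, once in the test) leaves the same single symlink in place.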

## Evidence
- **Before #2186 (2026-04-14):** non-full suites failed at ~2 min with a
  clear `sudo npm link ETIMEDOUT` (pre-existing #1888 regression, but
  cheap to fail).
- **After #2186 (today):** non-full suites hang on `sudo npm link`
  inside the launchable until the 20-min outer cap trips. 10× longer to
  fail, 10× more Brev credit per failed run. Evidence:
  [run 24737205166](https://github.com/NVIDIA/NemoClaw/actions/runs/24737205166),
  stuck on `Linking nemoclaw CLI globally...` from 808s to 1198s with
  no progress.
- `npm link` does two things: it creates a
  `/usr/local/lib/node_modules/<name>` symlink and a
  `/usr/local/bin/<bin>` symlink. We only need the second one.
  Replacing it with `sudo ln -sf` bypasses npm's global-prefix
  housekeeping and the `sudo chown -R node_modules` traversal that
  likely drove the hang.
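
The two layouts in the last bullet can be compared side by side in a scratch prefix. This is a sketch, not a reproduction of npm's behavior: the package name `nemoclaw`, the relative bin link, and all paths are assumptions chosen to mirror the description, and `readlink -f` is assumed to be available.

```shell
#!/usr/bin/env bash
set -eu

# Scratch stand-ins for the global prefix and the repo clone (all paths hypothetical).
prefix="$(mktemp -d)"
clone="$(mktemp -d)"
mkdir -p "$clone/bin" "$prefix/lib/node_modules" "$prefix/bin" "$prefix/direct-bin"
printf '#!/bin/sh\necho stub\n' > "$clone/bin/nemoclaw.js"

# Layout 1: the two symlinks described for `npm link`.
ln -s "$clone" "$prefix/lib/node_modules/nemoclaw"
ln -s ../lib/node_modules/nemoclaw/bin/nemoclaw.js "$prefix/bin/nemoclaw"

# Layout 2: the single direct symlink.
ln -sf "$clone/bin/nemoclaw.js" "$prefix/direct-bin/nemoclaw"

# Both chains should resolve to the same file in the clone.
via_npm_layout="$(readlink -f "$prefix/bin/nemoclaw")"
via_direct="$(readlink -f "$prefix/direct-bin/nemoclaw")"
echo "npm-style layout -> $via_npm_layout"
echo "direct symlink   -> $via_direct"
```

Both routes end at the same entry point, which is why the package-level symlink can be skipped when only the command on `PATH` is needed.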

## Type of Change
- [x] Code change (bug fix)

## Verification
- [x] `bash -n` on the launchable script: syntax OK
- [x] `npm run typecheck:cli`: passes
- [x] Pre-push hooks: pass (modulo the pre-existing flaky
  `test/install-preflight.test.ts:107`, resolves on retry, unrelated)
- [ ] **End-to-end Brev E2E run**: to be validated via
  `gh workflow run e2e-brev.yaml --ref fix/brev-cpu-npm-link-hang --field test_suite=credential-sanitization`
  (any non-`full` suite proves the bootstrap now completes). Will attach
  run URL before merge.

## Retro note
Shipped #2186 without running the Brev suite end-to-end — exact failure
mode I had flagged on #2123 earlier the same day. This PR is validated
the right way before merge.

## AI Disclosure
- [x] AI-assisted — tool: Claude Code

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
  * Made CLI installation deterministic and more reliable with explicit
    build and idempotent system-wide linking so the tool is consistently
    available and executable.

* **Tests**
  * Hardened remote test setup and reduced a CLI-linking timeout for
    faster feedback.
  * On workflow failures, automatically collect VM diagnostics and upload
    them as a debug artifact for easier troubleshooting.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cv cv added the v0.0.22 Release target label Apr 21, 2026