test(e2e): classify quick tunnel flakes as external by cv · Pull Request #4154 · NVIDIA/NemoClaw

cv · 2026-05-24T07:40:03Z

Summary

The nightly flake sweep had a tunnel lifecycle failure caused by Cloudflare quick-tunnel registration/parsing instability (failed to unmarshal quick Tunnel). This PR keeps NemoClaw tunnel failures blocking, but classifies known Cloudflare quick-tunnel registration/edge instability as an external skip instead of a NemoClaw test failure.

Changes

Add a shared Cloudflare-transient classifier for quick-tunnel registration, timeout, EOF, 502/503/504, and failed to unmarshal quick Tunnel messages.
Classify nemoclaw tunnel start non-zero exits caused by known quick-tunnel external failures as SKIP, with diagnostics and best-effort cleanup.
Treat healthy-local-dashboard but unreachable quick-tunnel edge after retries as SKIP only for connection failure/000, 502/503/504, or known Cloudflare transient response text.
Preserve failures for NemoClaw-owned cases: missing cloudflared spawn, status URL capture bug, local dashboard regression, unexpected HTTP statuses such as 400/401/403/404, wrong content, and stop/status failures.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
make docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

Release Notes

Tests
- Improved detection and handling of transient Cloudflare tunnel failures to reduce false test failures and enhance error classification accuracy.

copy-pr-bot · 2026-05-24T07:40:07Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-05-24T07:40:09Z

📝 Walkthrough

Walkthrough

This PR improves test resilience by adding Cloudflare transient failure detection to the tunnel lifecycle test. Two new helper functions identify transient Cloudflare errors from log text and HTTP status codes. Error classification is updated to use these helpers, then applied across three failure paths to skip tests on known transient errors instead of failing.

Changes

Cloudflare Transient Error Detection

Layer / File(s)	Summary
Transient detection helpers `test/e2e/test-tunnel-lifecycle.sh`	New `is_cloudflare_transient_text()` and `is_cloudflare_transient_http_code()` functions detect transient Cloudflare quick-tunnel failures from log patterns and status codes (000, 502/503/504).
Error classification integration `test/e2e/test-tunnel-lifecycle.sh`	`classify_cloudflared_log()` now uses the transient-text helper instead of direct grep matching to determine Cloudflare error classification.
Conditional skip/fail logic at failure points `test/e2e/test-tunnel-lifecycle.sh`	Tunnel startup, URL-not-found diagnostics, and reachability retry failures now skip on known transient Cloudflare errors (and print diagnostic logs) instead of failing unconditionally; reachability also distinguishes transient outcomes from unexpected HTTP statuses.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested reviewers

jyaunches

Poem

A rabbit hops through test logs clear, 🐰
Finding transient clouds without fear,
When Cloudflare blinks, the test now skips,
No false alarms from network blips! ✨
Resilience blooms where errors were old,
In this script, a story retold.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'test(e2e): classify quick tunnel flakes as external' accurately and clearly summarizes the main change—classifying Cloudflare quick-tunnel failures as external skip results rather than test failures.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch draft/tunnel-lifecycle-external-skips

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-05-24T07:40:57Z

E2E Advisor Recommendation

Required E2E: None
Optional E2E: tunnel-lifecycle-e2e

Dispatch hint: tunnel-lifecycle-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

None. Only an existing E2E test script changed; no production runtime or user-flow code was touched, so no merge-blocking E2E is required. The modified tunnel lifecycle job may be run optionally to validate the test harness change.

Optional E2E

tunnel-lifecycle-e2e (medium; 60-minute timeout, installs/onboards a sandbox and uses cloudflared plus NVIDIA_API_KEY): Optional validation of the modified E2E script itself. This job directly runs test/e2e/test-tunnel-lifecycle.sh and will exercise the changed Cloudflare transient classification and tunnel start/probe/stop assertions.

New E2E recommendations

None.

Dispatch hint

Workflow: E2E / Nightly
jobs input: tunnel-lifecycle-e2e

github-actions · 2026-05-24T07:40:57Z

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

None. No scenario workflow, scenario metadata, scenario runtime, or validation-suite files changed.

Optional scenario E2E

None.

Relevant changed files

None.

github-actions · 2026-05-24T07:42:01Z

PR Review Advisor

Findings: 0 needs attention, 0 worth checking, 0 nice ideas
Since last review: 1 prior item resolved, 0 still apply, 0 new items found

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

…e-external-skips # Conflicts: # test/e2e/test-hermes-inference-switch.sh # test/e2e/test-openclaw-inference-switch.sh

coderabbitai

🧹 Nitpick comments (1)

test/e2e/test-tunnel-lifecycle.sh (1)

172-174: 💤 Low value

Consider breaking the regex into documented patterns.

The single 300+ character regex is functional but hard to maintain. Consider refactoring to multiple grep calls or a documented pattern list for easier modification and review.

♻️ Example refactor (optional)

 is_cloudflare_transient_text() {
-  grep -qiE 'failed to unmarshal quick Tunnel|quick tunnels? (are )?(temporarily )?disabled|failed to (dial|register)|tunnel server.*error|i/o timeout|EOF.*tunnel|couldn.?t start tunnel|tunnel creation failed|bad gateway|\b50[234]\b' <<<"$1"
+  local patterns=(
+    'failed to unmarshal quick Tunnel'
+    'quick tunnels? (are )?(temporarily )?disabled'
+    'failed to (dial|register)'
+    'tunnel server.*error'
+    'i/o timeout'
+    'EOF.*tunnel'
+    "couldn.?t start tunnel"
+    'tunnel creation failed'
+    'bad gateway'
+    '\b50[234]\b'
+  )
+  local pattern
+  pattern=$(IFS='|'; echo "${patterns[*]}")
+  grep -qiE "$pattern" <<<"$1"
 }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-tunnel-lifecycle.sh` around lines 172 - 174, The
is_cloudflare_transient_text() function currently uses a single long regex which
is hard to maintain; refactor it into a list of named pattern variables or an
array of simpler regexes (e.g., PATTERN_UNMARSHAL, PATTERN_DISABLED,
PATTERN_DIAL_REGISTER, PATTERN_TIMEOUT, PATTERN_EOF, PATTERN_START_FAIL,
PATTERN_HTTP_ERRORS) and then test the input against each pattern with multiple
grep -qiE checks or a loop, preserving the same matching semantics and returning
success if any pattern matches; update the function to include short comments
above each pattern explaining what it detects so future reviewers can easily
modify or extend the transient-error list.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/e2e/test-tunnel-lifecycle.sh`:
- Around line 172-174: The is_cloudflare_transient_text() function currently
uses a single long regex which is hard to maintain; refactor it into a list of
named pattern variables or an array of simpler regexes (e.g., PATTERN_UNMARSHAL,
PATTERN_DISABLED, PATTERN_DIAL_REGISTER, PATTERN_TIMEOUT, PATTERN_EOF,
PATTERN_START_FAIL, PATTERN_HTTP_ERRORS) and then test the input against each
pattern with multiple grep -qiE checks or a loop, preserving the same matching
semantics and returning success if any pattern matches; update the function to
include short comments above each pattern explaining what it detects so future
reviewers can easily modify or extend the transient-error list.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a799de06-5ec3-4f94-85f3-1a5937c91df2

📥 Commits

Reviewing files that changed from the base of the PR and between bbc80df and b87d64e.

📒 Files selected for processing (1)

test/e2e/test-tunnel-lifecycle.sh

github-actions · 2026-05-24T16:22:44Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26366332493
Target ref: b87d64e66a63cb58cd4e297f67b8c4eec4fd99a4
Workflow ref: main
Requested jobs: tunnel-lifecycle-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job	Result
tunnel-lifecycle-e2e	✅ success

cv · 2026-05-24T16:27:01Z

Prepared for review after #4153 merged:

Retargeted PR base to main.
Merged current origin/main; PR diff is now only test/e2e/test-tunnel-lifecycle.sh.
Addressed PR Review Advisor feedback by restricting CloudflareEdge skips to connection failure/000, 502/503/504, or known Cloudflare transient response text. Unexpected statuses such as 400/401/403/404 now remain NemoClaw-owned failures.
Marked the PR ready for review.

Validation:

bash -n test/e2e/test-tunnel-lifecycle.sh passed.
Lightweight helper check confirmed only 000/502/503/504 are classified as transient HTTP statuses.
npx prek run --all-files passed.
npm test passed.
Optional selective E2E tunnel-lifecycle-e2e passed: https://github.com/NVIDIA/NemoClaw/actions/runs/26366332493

Latest gh pr checks is green. GitHub still includes older cancelled check runs in the rollup from retarget/draft churn, but current checks are passing.

## Summary The post-#4154 nightly sweep still shows `hermes-inference-switch-e2e` failing after the route/config/hash checks pass, when live Hermes `inference.local` or API chat requests time out against the upstream model. This PR keeps route/config regressions blocking but classifies explicit post-switch live timeout/5xx probes as external/transient skips. ## Changes - Capture HTTP status for post-switch Hermes `inference.local` and API chat probes. - Track transient state structurally from curl exit 28 or HTTP 502/503/504, instead of matching arbitrary response text. - Convert post-switch `inference.local` or Hermes API transient exhaustion to SKIP after earlier route/config checks have passed. - Preserve FAIL for non-timeout wrong-content responses, unexpected HTTP statuses, and all route/config/hash/registry regressions. ## Type of Change - [x] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [x] `npx prek run --all-files` passes - [ ] `npm test` passes - [x] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [ ] Docs updated for user-facing behavior changes - [ ] `make docs` builds without warnings (doc changes only) - [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) --- Signed-off-by: Carlos Villela <cvillela@nvidia.com>  ## Summary by CodeRabbit * **Tests** * Improved live-inference test reliability with enhanced transient failure detection and retry behavior. * Upgraded HTTP response handling to separately capture status codes and body for clearer diagnostics and enriched failure messages. * Added conditional API authentication in test requests and skip-on-transient-failure behavior to reduce flaky failures.  [![Review Change Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/4158?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)

cv added 5 commits May 24, 2026 00:04

fix(docker): retry gosu release download

e450d09

fix(docker): harden gosu curl download

68e5126

test(e2e): retry inference switch verification

26fdb76

test(e2e): rely on Kimi trajectory acceptance

bf26f2f

test(e2e): classify quick tunnel flakes as external

da77f2b

cv self-assigned this May 24, 2026

Merge remote-tracking branch 'origin/main' into draft/tunnel-lifecycl…

db05047

…e-external-skips # Conflicts: # test/e2e/test-hermes-inference-switch.sh # test/e2e/test-openclaw-inference-switch.sh

cv added the v0.0.51 Release target label May 24, 2026

test(e2e): restrict tunnel external skip statuses

b87d64e

cv changed the base branch from draft/kimi-e2e-final-text-accounting to main May 24, 2026 16:15

cv marked this pull request as ready for review May 24, 2026 16:15

coderabbitai Bot reviewed May 24, 2026

View reviewed changes

cv merged commit 69b4823 into main May 24, 2026
39 of 50 checks passed

cv mentioned this pull request May 24, 2026

test(e2e): classify Hermes live switch timeouts #4158

Merged

12 tasks

coderabbitai Bot mentioned this pull request May 25, 2026

test(e2e): classify OpenClaw live switch timeouts #4173

Merged

12 tasks

jyaunches mentioned this pull request May 26, 2026

test(e2e): migrate diagnostics, state, and runtime service coverage #3817

Closed

cv deleted the draft/tunnel-lifecycle-external-skips branch May 27, 2026 21:16

Conversation

cv commented May 24, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Type of Change

Verification

Summary by CodeRabbit

Release Notes

Uh oh!

copy-pr-bot Bot commented May 24, 2026

Uh oh!

coderabbitai Bot commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

Uh oh!

github-actions Bot commented May 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review Advisor

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 24, 2026

Selective E2E Results — ✅ All requested jobs passed

Uh oh!

Uh oh!

cv commented May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cv commented May 24, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 24, 2026 •

edited

Loading

github-actions Bot commented May 24, 2026 •

edited

Loading

github-actions Bot commented May 24, 2026 •

edited

Loading

github-actions Bot commented May 24, 2026 •

edited

Loading