Skip to content

test(e2e): classify quick tunnel flakes as external#4154

Merged
cv merged 7 commits into
mainfrom
draft/tunnel-lifecycle-external-skips
May 24, 2026
Merged

test(e2e): classify quick tunnel flakes as external#4154
cv merged 7 commits into
mainfrom
draft/tunnel-lifecycle-external-skips

Conversation

@cv
Copy link
Copy Markdown
Collaborator

@cv cv commented May 24, 2026

Summary

The nightly flake sweep had a tunnel lifecycle failure caused by Cloudflare quick-tunnel registration/parsing instability (failed to unmarshal quick Tunnel). This PR keeps NemoClaw tunnel failures blocking, but classifies known Cloudflare quick-tunnel registration/edge instability as an external skip instead of a NemoClaw test failure.

Changes

  • Add a shared Cloudflare-transient classifier for quick-tunnel registration, timeout, EOF, 502/503/504, and failed to unmarshal quick Tunnel messages.
  • Classify nemoclaw tunnel start non-zero exits caused by known quick-tunnel external failures as SKIP, with diagnostics and best-effort cleanup.
  • Treat healthy-local-dashboard but unreachable quick-tunnel edge after retries as SKIP only for connection failure/000, 502/503/504, or known Cloudflare transient response text.
  • Preserve failures for NemoClaw-owned cases: missing cloudflared spawn, status URL capture bug, local dashboard regression, unexpected HTTP statuses such as 400/401/403/404, wrong content, and stop/status failures.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

Release Notes

  • Tests
    • Improved detection and handling of transient Cloudflare tunnel failures to reduce false test failures and enhance error classification accuracy.

Review Change Stack

@cv cv self-assigned this May 24, 2026
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 24, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 24, 2026

📝 Walkthrough

Walkthrough

This PR improves test resilience by adding Cloudflare transient failure detection to the tunnel lifecycle test. Two new helper functions identify transient Cloudflare errors from log text and HTTP status codes. Error classification is updated to use these helpers, then applied across three failure paths to skip tests on known transient errors instead of failing.

Changes

Cloudflare Transient Error Detection

Layer / File(s) Summary
Transient detection helpers
test/e2e/test-tunnel-lifecycle.sh
New is_cloudflare_transient_text() and is_cloudflare_transient_http_code() functions detect transient Cloudflare quick-tunnel failures from log patterns and status codes (000, 502/503/504).
Error classification integration
test/e2e/test-tunnel-lifecycle.sh
classify_cloudflared_log() now uses the transient-text helper instead of direct grep matching to determine Cloudflare error classification.
Conditional skip/fail logic at failure points
test/e2e/test-tunnel-lifecycle.sh
Tunnel startup, URL-not-found diagnostics, and reachability retry failures now skip on known transient Cloudflare errors (and print diagnostic logs) instead of failing unconditionally; reachability also distinguishes transient outcomes from unexpected HTTP statuses.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested reviewers

  • jyaunches

Poem

A rabbit hops through test logs clear, 🐰
Finding transient clouds without fear,
When Cloudflare blinks, the test now skips,
No false alarms from network blips! ✨
Resilience blooms where errors were old,
In this script, a story retold.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 60.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'test(e2e): classify quick tunnel flakes as external' accurately and clearly summarizes the main change—classifying Cloudflare quick-tunnel failures as external skip results rather than test failures.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch draft/tunnel-lifecycle-external-skips

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 24, 2026

E2E Advisor Recommendation

Required E2E: None
Optional E2E: tunnel-lifecycle-e2e

Dispatch hint: tunnel-lifecycle-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • None. Only an existing E2E test script changed; no production runtime or user-flow code was touched, so no merge-blocking E2E is required. The modified tunnel lifecycle job may be run optionally to validate the test harness change.

Optional E2E

  • tunnel-lifecycle-e2e (medium; 60-minute timeout, installs/onboards a sandbox and uses cloudflared plus NVIDIA_API_KEY): Optional validation of the modified E2E script itself. This job directly runs test/e2e/test-tunnel-lifecycle.sh and will exercise the changed Cloudflare transient classification and tunnel start/probe/stop assertions.

New E2E recommendations

  • None.

Dispatch hint

  • Workflow: E2E / Nightly
  • jobs input: tunnel-lifecycle-e2e

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 24, 2026

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • None. No scenario workflow, scenario metadata, scenario runtime, or validation-suite files changed.

Optional scenario E2E

  • None.

Relevant changed files

  • None.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 24, 2026

PR Review Advisor

Findings: 0 needs attention, 0 worth checking, 0 nice ideas
Since last review: 1 prior item resolved, 0 still apply, 0 new items found

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

…e-external-skips

# Conflicts:
#	test/e2e/test-hermes-inference-switch.sh
#	test/e2e/test-openclaw-inference-switch.sh
@cv cv added the v0.0.51 Release target label May 24, 2026
@cv cv changed the base branch from draft/kimi-e2e-final-text-accounting to main May 24, 2026 16:15
@cv cv marked this pull request as ready for review May 24, 2026 16:15
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/e2e/test-tunnel-lifecycle.sh (1)

172-174: 💤 Low value

Consider breaking the regex into documented patterns.

The single 300+ character regex is functional but hard to maintain. Consider refactoring to multiple grep calls or a documented pattern list for easier modification and review.

♻️ Example refactor (optional)
 is_cloudflare_transient_text() {
-  grep -qiE 'failed to unmarshal quick Tunnel|quick tunnels? (are )?(temporarily )?disabled|failed to (dial|register)|tunnel server.*error|i/o timeout|EOF.*tunnel|couldn.?t start tunnel|tunnel creation failed|bad gateway|\b50[234]\b' <<<"$1"
+  local patterns=(
+    'failed to unmarshal quick Tunnel'
+    'quick tunnels? (are )?(temporarily )?disabled'
+    'failed to (dial|register)'
+    'tunnel server.*error'
+    'i/o timeout'
+    'EOF.*tunnel'
+    "couldn.?t start tunnel"
+    'tunnel creation failed'
+    'bad gateway'
+    '\b50[234]\b'
+  )
+  local pattern
+  pattern=$(IFS='|'; echo "${patterns[*]}")
+  grep -qiE "$pattern" <<<"$1"
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/e2e/test-tunnel-lifecycle.sh` around lines 172 - 174, The
is_cloudflare_transient_text() function currently uses a single long regex which
is hard to maintain; refactor it into a list of named pattern variables or an
array of simpler regexes (e.g., PATTERN_UNMARSHAL, PATTERN_DISABLED,
PATTERN_DIAL_REGISTER, PATTERN_TIMEOUT, PATTERN_EOF, PATTERN_START_FAIL,
PATTERN_HTTP_ERRORS) and then test the input against each pattern with multiple
grep -qiE checks or a loop, preserving the same matching semantics and returning
success if any pattern matches; update the function to include short comments
above each pattern explaining what it detects so future reviewers can easily
modify or extend the transient-error list.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/e2e/test-tunnel-lifecycle.sh`:
- Around line 172-174: The is_cloudflare_transient_text() function currently
uses a single long regex which is hard to maintain; refactor it into a list of
named pattern variables or an array of simpler regexes (e.g., PATTERN_UNMARSHAL,
PATTERN_DISABLED, PATTERN_DIAL_REGISTER, PATTERN_TIMEOUT, PATTERN_EOF,
PATTERN_START_FAIL, PATTERN_HTTP_ERRORS) and then test the input against each
pattern with multiple grep -qiE checks or a loop, preserving the same matching
semantics and returning success if any pattern matches; update the function to
include short comments above each pattern explaining what it detects so future
reviewers can easily modify or extend the transient-error list.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a799de06-5ec3-4f94-85f3-1a5937c91df2

📥 Commits

Reviewing files that changed from the base of the PR and between bbc80df and b87d64e.

📒 Files selected for processing (1)
  • test/e2e/test-tunnel-lifecycle.sh

@github-actions
Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26366332493
Target ref: b87d64e66a63cb58cd4e297f67b8c4eec4fd99a4
Workflow ref: main
Requested jobs: tunnel-lifecycle-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job Result
tunnel-lifecycle-e2e ✅ success

@cv cv merged commit 69b4823 into main May 24, 2026
39 of 50 checks passed
@cv
Copy link
Copy Markdown
Collaborator Author

cv commented May 24, 2026

Prepared for review after #4153 merged:

  • Retargeted PR base to main.
  • Merged current origin/main; PR diff is now only test/e2e/test-tunnel-lifecycle.sh.
  • Addressed PR Review Advisor feedback by restricting CloudflareEdge skips to connection failure/000, 502/503/504, or known Cloudflare transient response text. Unexpected statuses such as 400/401/403/404 now remain NemoClaw-owned failures.
  • Marked the PR ready for review.

Validation:

  • bash -n test/e2e/test-tunnel-lifecycle.sh passed.
  • Lightweight helper check confirmed only 000/502/503/504 are classified as transient HTTP statuses.
  • npx prek run --all-files passed.
  • npm test passed.
  • Optional selective E2E tunnel-lifecycle-e2e passed: https://github.com/NVIDIA/NemoClaw/actions/runs/26366332493

Latest gh pr checks is green. GitHub still includes older cancelled check runs in the rollup from retarget/draft churn, but current checks are passing.

cv added a commit that referenced this pull request May 25, 2026
## Summary
The post-#4154 nightly sweep still shows `hermes-inference-switch-e2e`
failing after the route/config/hash checks pass, when live Hermes
`inference.local` or API chat requests time out against the upstream
model. This PR keeps route/config regressions blocking but classifies
explicit post-switch live timeout/5xx probes as external/transient
skips.

## Changes
- Capture HTTP status for post-switch Hermes `inference.local` and API
chat probes.
- Track transient state structurally from curl exit 28 or HTTP
502/503/504, instead of matching arbitrary response text.
- Convert post-switch `inference.local` or Hermes API transient
exhaustion to SKIP after earlier route/config checks have passed.
- Preserve FAIL for non-timeout wrong-content responses, unexpected HTTP
statuses, and all route/config/hash/registry regressions.

## Type of Change
- [x] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification
- [x] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [ ] Docs updated for user-facing behavior changes
- [ ] `make docs` builds without warnings (doc changes only)
- [ ] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

---
Signed-off-by: Carlos Villela <cvillela@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Tests**
* Improved live-inference test reliability with enhanced transient
failure detection and retry behavior.
* Upgraded HTTP response handling to separately capture status codes and
body for clearer diagnostics and enriched failure messages.
* Added conditional API authentication in test requests and
skip-on-transient-failure behavior to reduce flaky failures.

<!-- review_stack_entry_start -->

[![Review Change
Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/4158?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)

<!-- review_stack_entry_end -->

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
@cv cv deleted the draft/tunnel-lifecycle-external-skips branch May 27, 2026 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

v0.0.51 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants