test(e2e): retry empty TUI chat event captures #4255

Merged
cv merged 2 commits into main from fix/tui-correlation-empty-event-retry
May 26, 2026

Conversation

Collaborator

@cv cv commented May 26, 2026

Summary

Retries the OpenClaw TUI chat correlation live repro only when an accepted-send attempt captures zero chat events, which matches the dominant harness flake. The strict correlation assertions remain unchanged for empty finals, duplicate turns, missing replies with observed events, and uncorrelated replies.

Changes

  • Add a one-time retry path for zero-event live TUI/webchat correlation captures with a fresh session key.
  • Add per-attempt diagnostics to the failure summary, including sent runs, event counts, and compact history messages.
  • Add a unit regression check that the retry predicate applies only to the zero-chat-events case and does not mask empty final events.
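The retry predicate described in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the PR: `looksLikeEventCaptureFailure` is the name the PR uses, but the `ReproTrace` and `ChatEvent` shapes and their fields (`sendAccepted`, `chatEvents`) are hypothetical stand-ins for whatever the test file actually defines.

```typescript
// Hypothetical trace shape; the real test file defines its own types.
interface ChatEvent {
  runId: string;
  state: "delta" | "final";
  text: string;
}

interface ReproTrace {
  sendAccepted: boolean; // the chat send was accepted
  chatEvents: ChatEvent[]; // chat events captured during this attempt
}

// Retry only when the send was accepted but no chat events were captured at all.
// An empty final event is still an event, so it is not classified as a capture
// failure and stays subject to the strict correlation assertions.
function looksLikeEventCaptureFailure(trace: ReproTrace): boolean {
  return trace.sendAccepted && trace.chatEvents.length === 0;
}

// Example traces for the cases the unit regression check is said to cover.
const zeroEvents: ReproTrace = { sendAccepted: true, chatEvents: [] };
const emptyFinal: ReproTrace = {
  sendAccepted: true,
  chatEvents: [{ runId: "run-1", state: "final", text: "" }],
};
const rejectedSend: ReproTrace = { sendAccepted: false, chatEvents: [] };
```

Keeping the predicate this narrow is what lets the retry absorb the harness flake without hiding an empty final event.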

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • make docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

Summary by CodeRabbit

  • Tests
    • Improved test infrastructure with per-run unique session identifiers.
    • Added detection for a specific event-capture failure mode and a retry wrapper to rerun repros when that condition occurs.
    • Enhanced failure reporting to include richer diagnostic context and aggregated attempt data.
    • Increased live-test timeout and added a unit test validating the event-capture failure classifier.

@cv cv self-assigned this May 26, 2026
Contributor

coderabbitai Bot commented May 26, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 20a72d7e-3745-41ff-b5b9-0496b79f7a4f

📥 Commits

Reviewing files that changed from the base of the PR and between 76bd808 and 749bc90.

📒 Files selected for processing (1)
  • test/openclaw-tui-chat-correlation.test.ts

📝 Walkthrough

This test update adds a one-time retry for zero-event chat captures in the Issue 2603 live reproduction. Session keys now carry a UUID suffix so that each run is unique, a classifier detects the zero-captured-events failure mode, and a retry wrapper re-executes the repro only when that mode is detected. Failure reporting now includes per-attempt histories and trace-derived event counts.

Changes

Issue 2603 Event Capture Resilience

| Layer / File(s) | Summary |
| --- | --- |
| Dependencies and data model (test/openclaw-tui-chat-correlation.test.ts) | Import randomUUID from node:crypto and define LiveIssue2603Run type to bundle a single repro trace with an ordered list of all attempts. |
| Failure reporting pipeline (test/openclaw-tui-chat-correlation.test.ts) | Add compactHistoryMessages and summarizeAttempt helpers; extend buildFailureSummary to accept optional trace and attempt list so emitted failure JSON includes expectations, trace-derived event counts, and per-attempt summaries. |
| Event capture retry mechanism (test/openclaw-tui-chat-correlation.test.ts) | Update session key generation to include UUID suffix; introduce looksLikeEventCaptureFailure classifier to detect zero-captured-events scenarios; add runLiveIssue2603ReproWithEventCaptureRetry wrapper to conditionally retry on capture failure and return both final repro and all attempts. |
| Test coverage and integration (test/openclaw-tui-chat-correlation.test.ts) | Add unit test validating event-capture failure classifier behavior, and update live sandbox regression test to invoke retry wrapper, pass trace and attempts to failure summary builder, and increase timeout allowance. |
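To make the session-key and failure-reporting rows above more concrete, here is a rough sketch of how per-run keys and per-attempt summaries could fit together. Only `randomUUID`, `compactHistoryMessages`, and `summarizeAttempt` are names taken from the PR; `makeSessionKey`, the key format, the truncation length, and every field name are assumptions made for illustration.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical key format: a stable prefix plus a UUID suffix for per-run uniqueness.
function makeSessionKey(prefix: string): string {
  return `${prefix}-${randomUUID()}`;
}

// Hypothetical shapes; the real test file defines its own.
interface HistoryMessage {
  role: string;
  text: string;
}

interface AttemptTrace {
  sessionKey: string;
  sentRunIds: string[];
  chatEventCount: number;
  history: HistoryMessage[];
}

interface AttemptSummary {
  attempt: number;
  sessionKey: string;
  sentRuns: string[];
  eventCount: number;
  history: HistoryMessage[];
}

// Truncate long message text so the emitted failure JSON stays readable.
function compactHistoryMessages(history: HistoryMessage[], maxChars = 80): HistoryMessage[] {
  return history.map((m) => ({
    role: m.role,
    text: m.text.length > maxChars ? `${m.text.slice(0, maxChars)}...` : m.text,
  }));
}

// Reduce one attempt to the diagnostics named in the PR description:
// sent runs, event counts, and compact history messages.
function summarizeAttempt(trace: AttemptTrace, index: number): AttemptSummary {
  return {
    attempt: index + 1, // 1-based for human-readable output
    sessionKey: trace.sessionKey,
    sentRuns: trace.sentRunIds,
    eventCount: trace.chatEventCount,
    history: compactHistoryMessages(trace.history),
  };
}
```

A failure summary builder could then map `summarizeAttempt` over the ordered attempt list and serialize the result alongside the final trace.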

Sequence Diagram

sequenceDiagram
  participant LiveTest as Live Sandbox Test
  participant RetryWrapper as runLiveIssue2603ReproWithEventCaptureRetry
  participant ReproExec as Live Repro Execution
  participant Classifier as looksLikeEventCaptureFailure
  participant Reporting as buildFailureSummary

  LiveTest->>RetryWrapper: trigger with UUID session key
  RetryWrapper->>ReproExec: run repro (attempt 1)
  ReproExec-->>RetryWrapper: repro trace
  RetryWrapper->>Classifier: analyze trace for zero captured events
  alt Capture failure detected
    RetryWrapper->>ReproExec: run repro (attempt 2)
    ReproExec-->>RetryWrapper: repro trace
  end
  RetryWrapper-->>LiveTest: return {repro, attempts}
  LiveTest->>Reporting: pass trace + attempts
  Reporting-->>LiveTest: enriched failure JSON
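
The flow in the diagram can be approximated with the sketch below. It is an illustration under assumptions, not the PR's implementation: `runWithEventCaptureRetry` is a shortened hypothetical name standing in for `runLiveIssue2603ReproWithEventCaptureRetry`, the repro runner is injected so the example stays self-contained, and the trace fields and key prefix are invented.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical trace shape for one repro attempt.
interface ReproTrace {
  sessionKey: string;
  sendAccepted: boolean;
  chatEventCount: number;
}

// Loosely mirrors the PR's LiveIssue2603Run: the final repro plus all attempts, in order.
interface LiveRun {
  repro: ReproTrace;
  attempts: ReproTrace[];
}

// The live repro is passed in as a function so the wrapper can be exercised with a fake.
type RunRepro = (sessionKey: string) => Promise<ReproTrace>;

function looksLikeEventCaptureFailure(trace: ReproTrace): boolean {
  return trace.sendAccepted && trace.chatEventCount === 0;
}

async function runWithEventCaptureRetry(
  runRepro: RunRepro,
  keyPrefix = "issue-2603", // assumed prefix, not taken from the PR
): Promise<LiveRun> {
  const attempts: ReproTrace[] = [];

  const first = await runRepro(`${keyPrefix}-${randomUUID()}`);
  attempts.push(first);
  if (!looksLikeEventCaptureFailure(first)) {
    return { repro: first, attempts };
  }

  // Retry exactly once, with a fresh session key. If the second attempt also
  // captures nothing, it is returned as-is so the strict assertions still fail.
  const second = await runRepro(`${keyPrefix}-${randomUUID()}`);
  attempts.push(second);
  return { repro: second, attempts };
}
```

Returning every attempt rather than only the last one is what allows the failure summary to report what happened on the discarded first run.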

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibble logs and chase each missing trace,
UUID tails keep every run in place,
When captures vanish, I hop back to try,
Two runs recorded, then the summaries fly—
A rabbit's cheers for tests that don't say die.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'test(e2e): retry empty TUI chat event captures' directly and specifically describes the main change: adding retry logic for empty TUI chat event captures in end-to-end tests. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.



Contributor

github-actions Bot commented May 26, 2026

E2E Advisor Recommendation

Required E2E: None
Optional E2E: openclaw-tui-chat-correlation-e2e

Dispatch hint: openclaw-tui-chat-correlation-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • None. No merge-blocking E2E is required because this PR changes only an existing test file/E2E harness and cannot affect NemoClaw runtime behavior or real user flows. The directly affected E2E is recommended as optional to validate the harness change itself.

Optional E2E

  • openclaw-tui-chat-correlation-e2e (high): Optional confidence check for the modified live E2E harness and retry behavior. This is the directly affected existing job and runs test/e2e/test-openclaw-tui-chat-correlation.sh, which invokes test/openclaw-tui-chat-correlation.test.ts with NEMOCLAW_ISSUE_2603_LIVE=1.
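
The environment gate mentioned above might look roughly like this inside the test file. This is only a guess at the shape: the variable name `NEMOCLAW_ISSUE_2603_LIVE` and the value `1` come from the advisor text, while the helper name `isLiveIssue2603Enabled` and the strict comparison against `"1"` are assumptions.

```typescript
// Hypothetical helper: true only when the live repro has been explicitly enabled.
// Taking the environment as a parameter keeps the check easy to unit test.
function isLiveIssue2603Enabled(env: Record<string, string | undefined>): boolean {
  return env.NEMOCLAW_ISSUE_2603_LIVE === "1";
}
```

With a gate like this, a plain `npm test` run would skip the live sandbox test, and the E2E script would opt in by exporting the variable before invoking the test file.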

New E2E recommendations

  • None.

Dispatch hint

  • Workflow: nightly-e2e.yaml
  • jobs input: openclaw-tui-chat-correlation-e2e

Contributor

github-actions Bot commented May 26, 2026

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • None. No scenario workflow, scenario metadata, scenario runtime, or validation-suite files changed.

Optional scenario E2E

  • None.

Relevant changed files

  • None.

Contributor

github-actions Bot commented May 26, 2026

PR Review Advisor

Findings: 0 needs attention, 0 worth checking, 0 nice ideas
Since last review: 2 prior items resolved, 0 still apply, 0 new items found

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@cv cv added the v0.0.51 Release target label May 26, 2026
@wscurran wscurran added the E2E (End-to-end testing: Brev infrastructure, test cases, nightly failures, and coverage gaps) and fix labels May 26, 2026
Collaborator Author

cv commented May 26, 2026

Addressed the review advisor note in 749bc90 by adding source-boundary and removal-criteria context next to the retry helper. The comment now documents that the tolerated zero-event state is a live repro observability race at the pinned OpenClaw gateway/websocket boundary, that this PR keeps the strict #2603/#3145 assertions intact, and when to remove the guard.

Verification re-run:

  • npx vitest run test/openclaw-tui-chat-correlation.test.ts --reporter=verbose
  • npx prek run --all-files

@cv cv requested a review from ericksoa May 26, 2026 18:21
@cv cv added v0.0.52 Release target and removed v0.0.51 Release target labels May 26, 2026
@cv cv merged commit 4bb00d3 into main May 26, 2026
28 checks passed
@cv cv deleted the fix/tui-correlation-empty-event-retry branch May 27, 2026 21:16