[None][infra] Fix hang when generating report#14625
Conversation
Signed-off-by: EmmaQiaoCh <qqiao@nvidia.com>
|
/bot run |
📝 WalkthroughWalkthroughThe PR adds a 15-minute timeout wrapper around the rerun report generation step in the LLM test execution pipeline, preventing indefinite execution if the report generation hangs or performs unexpectedly. ChangesRerun Report Generation Timeout
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~3 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
jenkins/L0_Test.groovy (1)
3285-3287: ⚖️ Poor tradeoffConsider graceful timeout handling to avoid masking test failures.
The timeout wrapper correctly prevents indefinite hangs during report generation. However, if the timeout fires, it will throw an exception that fails the stage before the
rerunFailedcheck at line 3290, potentially masking actual test failures. Since rerun report generation is supplementary to the core test results, consider wrapping the timeout block in a try-catch to log a warning and continue:♻️ Proposed fix for graceful timeout handling
// Generate comprehensive rerun report if any reruns occurred stage ("Generate Report") { - timeout(time: 15, unit: 'MINUTES'){ - generateRerunReport(stageName, llmSrc) + try { + timeout(time: 15, unit: 'MINUTES'){ + generateRerunReport(stageName, llmSrc) + } + } catch (org.jenkinsci.plugins.workflow.steps.FlowInterruptedException e) { + if (e.causes.any { it instanceof org.jenkinsci.plugins.workflow.steps.TimeoutStepExecution.ExceededTimeout }) { + echo "WARNING: Rerun report generation timed out after 15 minutes. Continuing without rerun report." + } else { + throw e + } } }This ensures that test failures are properly reported even if report generation hangs.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@jenkins/L0_Test.groovy` around lines 3285 - 3287, Wrap the timeout(...) { generateRerunReport(stageName, llmSrc) } call in a try-catch so that if the timeout throws it is caught, a warning is logged, and execution continues to the existing rerunFailed check; specifically, catch the timeout/Exception around the timeout block that invokes generateRerunReport and use the pipeline logger to emit a warning (including the exception message) rather than letting the exception fail the stage.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@jenkins/L0_Test.groovy`:
- Around line 3285-3287: Wrap the timeout(...) { generateRerunReport(stageName,
llmSrc) } call in a try-catch so that if the timeout throws it is caught, a
warning is logged, and execution continues to the existing rerunFailed check;
specifically, catch the timeout/Exception around the timeout block that invokes
generateRerunReport and use the pipeline logger to emit a warning (including the
exception message) rather than letting the exception fail the stage.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 72595a57-6b81-4d8f-9007-da096ebb2caf
📒 Files selected for processing (1)
jenkins/L0_Test.groovy
|
PR_Github #50508 [ run ] triggered by Bot. Commit: |
|
PR_Github #50508 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #50565 [ run ] triggered by Bot. Commit: |
|
PR_Github #50565 [ run ] completed with state
|
|
/bot skip --comment "The timeout will not impact CI flow" |
|
PR_Github #50670 [ skip ] triggered by Bot. Commit: |
|
PR_Github #50670 [ skip ] completed with state |
Summary by CodeRabbit
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either
api-compatibleorapi-breaking. Forapi-breaking, includeBREAKINGin the PR title.Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.