
[None][test] Remove RTX-6000 OOM test cases #12800

Merged
ruodil merged 4 commits into NVIDIA:main from yufeiwu-nv:fix_RTX
Apr 8, 2026

Conversation


@yufeiwu-nv yufeiwu-nv commented Apr 7, 2026

Summary by CodeRabbit

Release Notes

  • Tests
    • Extended timeout for non-error performance test hangs from 10 to 30 minutes to improve test reliability
    • Rebalanced performance test configurations across different GPU hardware setups to enhance coverage for B200, B300, GB300, and RTX6000 server environments

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
@yufeiwu-nv yufeiwu-nv requested review from a team as code owners April 7, 2026 08:16
@yufeiwu-nv
Collaborator Author

/bot skip --comment "only test list modify"

@github-actions

github-actions bot commented Apr 7, 2026

👎 Promotion blocked, new vulnerability found

Vulnerability report

| Component | Vulnerability | Description | Severity |
|---|---|---|---|
| encode/uvicorn | CVE-2020-7694 | This affects all versions of package uvicorn. The request logger provided by the package is vulnerable to ANSI escape sequence injection. Whenever any HTTP request is received, the default behaviour of uvicorn is to log its details to either the console or a log file. When attackers request crafted URLs with percent-encoded escape sequences, the logging component will log the URL after it's been processed with urllib.parse.unquote, therefore converting any percent-encoded characters into their single-character equivalent, which can have special meaning in terminal emulators. By requesting URLs with crafted paths, attackers can: (1) Pollute uvicorn's access logs, therefore jeopardising the integrity of such files. (2) Use ANSI sequence codes to attempt to interact with the terminal emulator that's displaying the logs (either in real time or from a file). | HIGH |
| encode/uvicorn | CVE-2020-7695 | Uvicorn before 0.11.7 is vulnerable to HTTP response splitting. CRLF sequences are not escaped in the value of HTTP headers. Attackers can exploit this to add arbitrary headers to HTTP responses, or even return an arbitrary response body, whenever crafted input is used to construct HTTP headers. | MEDIUM |
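The decoding step described for CVE-2020-7694 can be illustrated with a minimal sketch. The request path and the `sanitize_for_log` helper below are hypothetical and are not taken from uvicorn; the helper shows one possible way to neutralize control characters before logging, not the fix uvicorn shipped.

```python
from urllib.parse import unquote

# Hypothetical crafted request path: %1b is ESC and %5B is '[',
# so the decoded text contains the start of an ANSI colour sequence.
raw_path = "/index%1b%5B31mFAKE"
decoded = unquote(raw_path)


def sanitize_for_log(text: str) -> str:
    """Illustrative mitigation: escape non-printable characters before logging."""
    return "".join(ch if ch.isprintable() else repr(ch)[1:-1] for ch in text)


print(repr(decoded))
print(sanitize_for_log(decoded))
```

Writing `decoded` straight to a terminal would let the escape sequence be interpreted, whereas the sanitized form prints the escape character as visible text.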

@coderabbitai
Contributor

coderabbitai bot commented Apr 7, 2026

📝 Walkthrough

Walkthrough

The stall timeout for command execution in performance utilities was increased from 600 to 1800 seconds, with documentation comments added. Performance test cases for deepseek_r1_0528_fp4 were reorganized across different GPU system configurations in the test list.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Utility Timeout Configuration: `tests/integration/defs/perf/utils.py` | Increased stall timeout from 600 to 1800 seconds with added comments documenting 30-minute kill threshold for non-error hangs and 3-minute threshold for error states. |
| Performance Test Case Reorganization: `tests/integration/test_lists/qa/llm_perf_core.yml` | Removed and repositioned multiple deepseek_r1_0528_fp4 test cases between GPU configuration sections; moved tests from RTX6000-Server to B200/GB200/B300/GB300 chunked prefill subsection; added new fp4 max throughput test for B200/B300 condition. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description is entirely empty, containing only the repository template with placeholders and no actual description of changes, rationale, or test coverage. | Provide a concrete description explaining the OOM test case changes, the rationale for moving/removing RTX-6000 tests, and what test coverage validates these changes. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title is partially related to the changeset, referring to RTX-6000 OOM test removal, but the changes also include increased stall timeout and test reorganization across GPU sections. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/defs/perf/utils.py`:
- Around line 135-138: The comments above the constants _STALL_TIMEOUT and
_ERROR_STALL_TIMEOUT violate PEP 8 E265 by missing a space after the '#'
characters; update the two comment lines that precede those constants so each
'#' is followed by a single space (e.g., "# if hang time > 30 mins, it will be
killed") to fix the lint error while leaving the constant names and values
unchanged.
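The E265 rule cited in this comment can be sketched with a simplified check. The `violates_e265` function is a hypothetical approximation of the lint rule, not pycodestyle's implementation, and the unspaced "before" comment is an assumed form of the original line.

```python
def violates_e265(line: str) -> bool:
    """Simplified check: a block comment should start with '# '."""
    stripped = line.strip()
    if not stripped.startswith("#"):
        return False
    body = stripped.lstrip("#")
    # A bare '#' (or a run of only '#') is not flagged by this sketch.
    return bool(body) and not body.startswith(" ")


before = "#if hang time > 30 mins, it will be killed"   # assumed original form
after = "# if hang time > 30 mins, it will be killed"   # form suggested in the review

print(violates_e265(before))
print(violates_e265(after))
```

Only the comment text changes; the constants `_STALL_TIMEOUT` and `_ERROR_STALL_TIMEOUT` keep their names and values.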
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f8627007-c275-47aa-a5fc-f02cc3b1a2da

📥 Commits

Reviewing files that changed from the base of the PR and between 4e69c14 and e242fd5.

📒 Files selected for processing (2)
  • tests/integration/defs/perf/utils.py
  • tests/integration/test_lists/qa/llm_perf_core.yml

Comment thread tests/integration/defs/perf/utils.py
@yufeiwu-nv
Collaborator Author

/bot skip --comment "only test list modify"

@tensorrt-cicd
Collaborator

PR_Github #42119 [ skip ] triggered by Bot. Commit: 545982e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42119 [ skip ] completed with state SUCCESS. Commit: 545982e
Skipping testing for commit 545982e

Link to invocation

@ruodil ruodil merged commit 6175e2c into NVIDIA:main Apr 8, 2026
5 checks passed
suyoggupta pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request Apr 8, 2026
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
3 participants