
[TRTLLM-11574][feat] Some updates on Perf Sanity System codes#12430

Merged
chenfeiz0326 merged 5 commits into NVIDIA:main from
chenfeiz0326:chenfeiz/integrate-pre-merge-dashboard-into-ci-report
Apr 7, 2026

Conversation

@chenfeiz0326
Collaborator

@chenfeiz0326 chenfeiz0326 commented Mar 22, 2026

Refactor the perf regression system to use a three-layer architecture.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@coderabbitai
Contributor

coderabbitai bot commented Mar 22, 2026

📝 Walkthrough

Walkthrough

This pull request removes the pre-merge perf sanity HTML report generation pipeline and replaces it with runtime regression checking. The Jenkins pipeline no longer generates or uploads perf reports, while the pytest execution now passes a stageName environment variable. Perf data records now include stage name and test list fields, and regression analysis is performed at test time rather than as a post-processing step.
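The walkthrough says Jenkins now passes stageName into the pytest environment and that each perf record gains stage-name and test-list fields. A minimal sketch of how a test might attach that context, assuming the variable is read through os.environ. The helper name build_perf_record, the d_ prefix for metrics, the comma-joined test list, and the "unknown" fallback are all hypothetical and not taken from the PR:

```python
import os
from typing import Any, Dict, List


def build_perf_record(metrics: Dict[str, float], test_list: List[str]) -> Dict[str, Any]:
    """Attach CI context to a perf data record (hypothetical helper).

    Assumes Jenkins exports stageName=<stage> into the pytest environment,
    as the walkthrough describes; falls back to "unknown" for local runs.
    """
    # Assumption: numeric metrics are stored under a "d_" prefix.
    record: Dict[str, Any] = {f"d_{name}": value for name, value in metrics.items()}
    # s_stage_name and s_test_list are the field names listed in the walkthrough;
    # the "s_" prefix appears to mark string-typed fields.
    record["s_stage_name"] = os.environ.get("stageName", "unknown")
    # Assumption: the test list is serialized as one comma-separated string.
    record["s_test_list"] = ",".join(test_list)
    return record
```

Tagging every record with its stage makes it possible to compare a new result only against history from the same CI stage.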

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Jenkins Pipeline Configuration | jenkins/L0_MergeRequest.groovy, jenkins/L0_Test.groovy | Removed the "Collect Perf Sanity Test Result" stage that generated and uploaded HTML reports. Added stageName=${stageName} to the environment variables passed to pytest. |
| Performance Report Generation | jenkins/scripts/perf/get_pre_merge_html.py | Removed the entire script, which loaded perf data YAML, queried OpenSearch history, and generated self-contained HTML reports with inline SVG charts. |
| Perf Testing Infrastructure | tests/integration/defs/perf/open_search_db_utils.py | Replaced generate_perf_yaml() with regression-checking logic; removed the SCENARIO_MATCH_FIELDS constant; implemented check_perf_regression() to filter regression entries, optionally write regression_data.yaml, and print regression details. |
| Perf Sanity Tests | tests/integration/defs/perf/test_perf_sanity.py | Switched post-processing from YAML generation to regression checking with fail_on_regression=not is_post_merge. Removed scenario-specific match-field logic. Added s_stage_name and s_test_list fields to perf data records. |
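Per the summary above, the new post-processing step filters records flagged as regressions, optionally dumps them to regression_data.yaml, prints them, and fails the test only in pre-merge runs (fail_on_regression=not is_post_merge). The sketch below illustrates that flow under stated assumptions: the signature is guessed, s_test_case_name is a hypothetical field name, and the file is written as JSON (which YAML 1.2 accepts as flow syntax) so the sketch needs no PyYAML dependency. It is not the PR's implementation.

```python
import json
from typing import Any, Callable, Dict, List, Optional


def check_perf_regression(
    new_data_dict: Dict[str, Dict[str, Any]],
    output_path: Optional[str] = None,
    fail_on_regression: bool = False,
    print_func: Callable[[str], None] = print,
) -> List[Dict[str, Any]]:
    """Sketch of a test-time regression check (guessed signature)."""
    # Keep only records that an earlier step already flagged as regressions.
    regressive = [r for r in new_data_dict.values() if r.get("b_is_regression", False)]
    if not regressive:
        print_func("No regression data found.")
        return regressive
    if output_path is not None:
        # JSON output is also valid YAML 1.2; the real code likely uses a YAML dumper.
        with open(output_path, "w") as f:
            json.dump(regressive, f, indent=2, sort_keys=True)
    for rec in regressive:
        # s_test_case_name is a hypothetical field; s_regression_info is named in the review.
        name = rec.get("s_test_case_name", "<unnamed>")
        print_func(f"Regression: {name}: {rec.get('s_regression_info', '')}")
    if fail_on_regression:
        raise RuntimeError(f"{len(regressive)} perf regression(s) detected")
    return regressive
```

A test would then call something like check_perf_regression(new_data_dict, output_path="regression_data.yaml", fail_on_regression=not is_post_merge), so post-merge runs record regressions without blocking the pipeline while pre-merge runs fail.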

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Description check | ⚠️ Warning | The PR description is incomplete: it contains only the repository template, with empty Description and Test Coverage sections. | Complete the Description section explaining what changed and why. Add a Test Coverage section listing the tests that validate the refactoring. |
| Title check | ❓ Inconclusive | The title uses vague language ("Some updates") that obscures the actual scope of the changes, which remove perf sanity report generation and shift to regression checking. | Consider a more descriptive title such as "Replace perf sanity report generation with regression checking" or "Integrate regression checking into perf validation workflow". |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/defs/perf/open_search_db_utils.py`:
- Around lines 649-653: The code currently treats records with the default
b_is_regression=False as "no regression" even when history lookups were skipped;
update prepare_regressive_test_cases() to surface whether a history comparison
actually ran (e.g., set a per-record flag like b_history_checked or return an
overall comparisons_performed boolean) and then change the
regressive_data_list/filtering (the comprehension over new_data_dict.values()
and the logging that prints "No regression data found.") to only consider
records where history was checked (or only log "no regression" when
comparisons_performed is true). Ensure both locations (the regressive_data_list
filter and the similar branch around lines ~700-702) use that new flag/returned
boolean to avoid false-green reports during skipped history lookups.
- Around lines 631-637: The loop that prints config entries (using config_keys,
print_func, and skipping s_regression_info) currently emits sensitive env var
fields like s_server_env_var and s_client_env_vars; update the printing logic to
redact any key that contains "env_var" or matches disaggregated variants (e.g.,
keys starting with "s_" and containing "env_var") before calling print_func, or
better yet replace the current full-dump approach with an explicit allowlist of
safe keys to print and skip everything else; ensure you reference and modify the
block that builds config_keys and the loop that iterates over it so keys
matching the env-var pattern are printed as redacted (e.g., "<REDACTED>") or
omitted.
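The two open_search_db_utils.py suggestions above can be combined into one reporting helper. This is a sketch of the reviewer's proposal, not merged code: b_history_checked is the hypothetical per-record flag the review suggests, report_regressions and is_sensitive_key are illustrative names, and redaction uses the "contains env_var" pattern from the comment rather than the allowlist alternative.

```python
from typing import Any, Callable, Dict, Iterable

REDACTED = "<REDACTED>"


def is_sensitive_key(key: str) -> bool:
    # Pattern from the review: any field whose name contains "env_var",
    # which covers s_server_env_var, s_client_env_vars and disaggregated variants.
    return "env_var" in key


def report_regressions(
    records: Iterable[Dict[str, Any]],
    print_func: Callable[[str], None] = print,
) -> bool:
    """Print regression details; return True only if a history comparison ran."""
    records = list(records)
    # b_history_checked is the hypothetical flag proposed in the review.
    checked = [r for r in records if r.get("b_history_checked", False)]
    if not checked:
        # Avoid a false green: nothing was compared, so say so explicitly.
        print_func("Regression check skipped: no history comparison was performed.")
        return False
    regressive = [r for r in checked if r.get("b_is_regression", False)]
    if not regressive:
        print_func("No regression data found.")
    for rec in regressive:
        for key in sorted(rec):
            if key == "s_regression_info":
                continue  # printed last, on its own line
            value = REDACTED if is_sensitive_key(key) else rec[key]
            print_func(f"  {key}: {value}")
        print_func(f"  s_regression_info: {rec.get('s_regression_info', '')}")
    return True
```

Returning a boolean lets the caller distinguish "compared and clean" from "never compared", which is the false-green case the review flags.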

In `@tests/integration/defs/perf/test_perf_sanity.py`:
- Around lines 1525-1529: The aggregated-match branch currently always extends
match_keys with server_config.to_match_keys() and client_config.to_match_keys(),
ignoring any ServerConfig.match_mode/client_config.match_mode declarations;
change the logic so after seeding match_keys with ["s_gpu_type","s_runtime"] it
only extends server_config.to_match_keys() if server_config.match_mode !=
"scenario" (and likewise only extend client_config.to_match_keys() if
client_config.match_mode != "scenario"), so configs that declare match_mode:
scenario keep the scenario-only matching behavior; reference
ServerConfig.match_mode, match_keys, server_config.to_match_keys(), and
client_config.to_match_keys() when making the conditional change.
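The test_perf_sanity.py suggestion amounts to a guard on each extend call. A minimal sketch, assuming simplified stand-ins for ServerConfig and ClientConfig (the real classes are not shown on this page; the shared base class, the keys field, and the "default" match_mode value are invented for illustration):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class _MatchableConfig:
    # Hypothetical stand-in; the real configs carry many more fields.
    match_mode: str = "default"
    keys: List[str] = field(default_factory=list)

    def to_match_keys(self) -> List[str]:
        return list(self.keys)


class ServerConfig(_MatchableConfig):
    pass


class ClientConfig(_MatchableConfig):
    pass


def build_match_keys(server_config: ServerConfig, client_config: ClientConfig) -> List[str]:
    # Always match on hardware and runtime.
    match_keys = ["s_gpu_type", "s_runtime"]
    # Configs that declare match_mode: scenario skip their config-derived keys.
    if server_config.match_mode != "scenario":
        match_keys.extend(server_config.to_match_keys())
    if client_config.match_mode != "scenario":
        match_keys.extend(client_config.to_match_keys())
    return match_keys
```

Seeding with s_gpu_type and s_runtime keeps hardware and runtime matching in both modes; only the server- or client-derived keys are dropped for scenario-mode configs.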

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fec06597-6aa7-4870-a866-c3f3456d511f

📥 Commits

Reviewing files that changed from the base of the PR and between 6dd98b8 and 14f412b.

📒 Files selected for processing (5)
  • jenkins/L0_MergeRequest.groovy
  • jenkins/L0_Test.groovy
  • jenkins/scripts/perf/get_pre_merge_html.py
  • tests/integration/defs/perf/open_search_db_utils.py
  • tests/integration/defs/perf/test_perf_sanity.py
💤 Files with no reviewable changes (2)
  • jenkins/L0_MergeRequest.groovy
  • jenkins/scripts/perf/get_pre_merge_html.py

Comment thread tests/integration/defs/perf/open_search_db_utils.py Outdated
Comment thread tests/integration/defs/perf/open_search_db_utils.py Outdated
Comment thread tests/integration/defs/perf/test_perf_sanity.py
@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #39947 [ run ] triggered by Bot. Commit: cf59812 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39947 [ run ] completed with state SUCCESS. Commit: cf59812
/LLM/main/L0_MergeRequest_PR pipeline #31112 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40025 [ run ] triggered by Bot. Commit: cf59812 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40025 [ run ] completed with state SUCCESS. Commit: cf59812
/LLM/main/L0_MergeRequest_PR pipeline #31182 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40072 [ run ] triggered by Bot. Commit: cf59812 Link to invocation

@chenfeiz0326 chenfeiz0326 enabled auto-merge (squash) March 24, 2026 06:24
@tensorrt-cicd
Collaborator

PR_Github #40072 [ run ] completed with state SUCCESS. Commit: cf59812
/LLM/main/L0_MergeRequest_PR pipeline #31224 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40129 [ run ] triggered by Bot. Commit: cf59812 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40129 [ run ] completed with state SUCCESS. Commit: cf59812
/LLM/main/L0_MergeRequest_PR pipeline #31276 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40216 [ run ] triggered by Bot. Commit: cf59812 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40216 [ run ] completed with state SUCCESS. Commit: cf59812
/LLM/main/L0_MergeRequest_PR pipeline #31353 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #40342 [ run ] triggered by Bot. Commit: cf59812 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #40342 [ run ] completed with state FAILURE. Commit: cf59812
/LLM/main/L0_MergeRequest_PR pipeline #31447 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41155 [ run ] triggered by Bot. Commit: 5d5147a Link to invocation

@chenfeiz0326 chenfeiz0326 disabled auto-merge April 1, 2026 08:39
@tensorrt-cicd
Collaborator

PR_Github #41155 [ run ] completed with state SUCCESS. Commit: 5d5147a
/LLM/main/L0_MergeRequest_PR pipeline #32124 completed with status: 'SUCCESS'

CI Report

Link to invocation

@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast --post-merge

@tensorrt-cicd
Collaborator

PR_Github #41217 [ run ] triggered by Bot. Commit: 3889600 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41220 [ run ] triggered by Bot. Commit: 3889600 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41220 [ run ] completed with state SUCCESS. Commit: 3889600
/LLM/main/L0_MergeRequest_PR pipeline #32180 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
@chenfeiz0326 chenfeiz0326 force-pushed the chenfeiz/integrate-pre-merge-dashboard-into-ci-report branch from 3889600 to 9dbe2fa April 2, 2026 03:55
Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast --post-merge

@tensorrt-cicd
Collaborator

PR_Github #41331 [ run ] triggered by Bot. Commit: fc5cd09 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41331 [ run ] completed with state SUCCESS. Commit: fc5cd09
/LLM/main/L0_MergeRequest_PR pipeline #32279 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
@chenfeiz0326 chenfeiz0326 changed the title [TRTLLM-11574][feat] Some updates for integrating Pre-Merge Report into CI Report [TRTLLM-11574][feat] Some updates on Perf Sanity System codes Apr 3, 2026
Chenfei Zhang added 2 commits April 3, 2026 01:43
Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast --post-merge

@tensorrt-cicd
Collaborator

PR_Github #41688 [ run ] triggered by Bot. Commit: 3f1167e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41688 [ run ] completed with state SUCCESS. Commit: 3f1167e
/LLM/main/L0_MergeRequest_PR pipeline #32591 completed with status: 'ABORTED'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42036 [ run ] triggered by Bot. Commit: 3f1167e Link to invocation

@chenfeiz0326 chenfeiz0326 enabled auto-merge (squash) April 7, 2026 04:02
@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42036 [ run ] completed with state SUCCESS. Commit: 3f1167e
/LLM/main/L0_MergeRequest_PR pipeline #32882 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42121 [ run ] triggered by Bot. Commit: 3f1167e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42121 [ run ] completed with state SUCCESS. Commit: 3f1167e
/LLM/main/L0_MergeRequest_PR pipeline #32958 completed with status: 'SUCCESS'

CI Report

Link to invocation

@chenfeiz0326 chenfeiz0326 merged commit 390c093 into NVIDIA:main Apr 7, 2026
5 checks passed
karen-sy pushed a commit to karen-sy/TensorRT-LLM that referenced this pull request Apr 7, 2026
…#12430)

Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
Co-authored-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
suyoggupta pushed a commit to nv-auto-deploy/TensorRT-LLM that referenced this pull request Apr 8, 2026
…#12430)

Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
Co-authored-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>

Labels

None yet

Projects

None yet

4 participants