[TRTLLM-11574][feat] Some updates on Perf Sanity System codes#12430
Conversation
📝 WalkthroughWalkthroughThis pull request removes the pre-merge perf sanity HTML report generation pipeline and replaces it with runtime regression checking. The Jenkins pipeline no longer generates or uploads perf reports, while the pytest execution now passes a Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ❌ 3❌ Failed checks (2 warnings, 1 inconclusive)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/integration/defs/perf/open_search_db_utils.py`:
- Around line 649-653: The code currently treats records with the default
b_is_regression=False as "no regression" even when history lookups were skipped;
update prepare_regressive_test_cases() to surface whether a history comparison
actually ran (e.g., set a per-record flag like b_history_checked or return an
overall comparisons_performed boolean) and then change the
regressive_data_list/filtering (the comprehension over new_data_dict.values()
and the logging that prints "No regression data found.") to only consider
records where history was checked (or only log "no regression" when
comparisons_performed is true). Ensure both locations (the regressive_data_list
filter and the similar branch around lines ~700-702) use that new flag/returned
boolean to avoid false-green reports during skipped history lookups.
- Around line 631-637: The loop that prints config entries (using config_keys,
print_func, and skipping s_regression_info) currently emits sensitive env var
fields like s_server_env_var and s_client_env_vars; update the printing logic to
redact any key that contains "env_var" or matches disaggregated variants (e.g.,
keys starting with "s_" and containing "env_var") before calling print_func, or
better yet replace the current full-dump approach with an explicit allowlist of
safe keys to print and skip everything else; ensure you reference and modify the
block that builds config_keys and the loop that iterates over it so keys
matching the env-var pattern are printed as redacted (e.g., "<REDACTED>") or
omitted.
In `@tests/integration/defs/perf/test_perf_sanity.py`:
- Around line 1525-1529: The aggregated-match branch currently always extends
match_keys with server_config.to_match_keys() and client_config.to_match_keys(),
ignoring any ServerConfig.match_mode/client_config.match_mode declarations;
change the logic so after seeding match_keys with ["s_gpu_type","s_runtime"] it
only extends server_config.to_match_keys() if server_config.match_mode !=
"scenario" (and likewise only extend client_config.to_match_keys() if
client_config.match_mode != "scenario"), so configs that declare match_mode:
scenario keep the scenario-only matching behavior; reference
ServerConfig.match_mode, match_keys, server_config.to_match_keys(), and
client_config.to_match_keys() when making the conditional change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fec06597-6aa7-4870-a866-c3f3456d511f
📒 Files selected for processing (5)
jenkins/L0_MergeRequest.groovyjenkins/L0_Test.groovyjenkins/scripts/perf/get_pre_merge_html.pytests/integration/defs/perf/open_search_db_utils.pytests/integration/defs/perf/test_perf_sanity.py
💤 Files with no reviewable changes (2)
- jenkins/L0_MergeRequest.groovy
- jenkins/scripts/perf/get_pre_merge_html.py
|
/bot run --disable-fail-fast |
|
PR_Github #39947 [ run ] triggered by Bot. Commit: |
|
PR_Github #39947 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #40025 [ run ] triggered by Bot. Commit: |
|
PR_Github #40025 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #40072 [ run ] triggered by Bot. Commit: |
|
PR_Github #40072 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #40129 [ run ] triggered by Bot. Commit: |
|
PR_Github #40129 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #40216 [ run ] triggered by Bot. Commit: |
|
PR_Github #40216 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #40342 [ run ] triggered by Bot. Commit: |
|
PR_Github #40342 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #41155 [ run ] triggered by Bot. Commit: |
|
PR_Github #41155 [ run ] completed with state |
|
/bot run --disable-fail-fast --post-merge |
|
PR_Github #41217 [ run ] triggered by Bot. Commit: |
|
PR_Github #41220 [ run ] triggered by Bot. Commit: |
|
PR_Github #41220 [ run ] completed with state
|
3889600 to
9dbe2fa
Compare
|
/bot run --disable-fail-fast --post-merge |
|
PR_Github #41331 [ run ] triggered by Bot. Commit: |
|
PR_Github #41331 [ run ] completed with state
|
|
/bot run --disable-fail-fast --post-merge |
|
PR_Github #41688 [ run ] triggered by Bot. Commit: |
|
PR_Github #41688 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #42036 [ run ] triggered by Bot. Commit: |
|
/bot run --disable-fail-fast |
|
PR_Github #42036 [ run ] completed with state
|
|
PR_Github #42121 [ run ] triggered by Bot. Commit: |
|
PR_Github #42121 [ run ] completed with state |
…#12430) Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster> Co-authored-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
…#12430) Signed-off-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster> Co-authored-by: Chenfei Zhang <chenfeiz@oci-hsg-cs-001-vscode-01.cm.cluster>
Refactor perf regression system by using a 3-layers architecture.
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.