[TRTLLM-12152][infra] change based testing rules on tests by crazydemo · Pull Request #13899 · NVIDIA/TensorRT-LLM

crazydemo · 2026-05-08T09:45:42Z

Summary by CodeRabbit

New Features
- Added three new change-based test selection rules: test definition matching, test list validation, and out-of-scope detection.
- Introduced dry-run tool to test CBTS behavior across historical commits.
- Enhanced test matching to support both test IDs and file path lookups.
Documentation
- Expanded rule system documentation with detailed scope definitions and fallback behaviors.
- Added comprehensive rule architecture guide with trigger patterns and stage resolution logic.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

coderabbitai · 2026-05-08T09:53:18Z

📝 Walkthrough

Walkthrough

This PR expands CBTS from a single-rule waive-matching system to a multi-rule architecture. It adds path-based YAML matching in YAMLIndex, introduces three new rules (TestsDefRule for test file changes, TestListRule for test-db YAML additions, OutOfScopeRule for exclusions), refactors WaivesRule to use shared helpers, updates scope-combination logic to merge testsonly-family scopes, and includes a dry-run tool for testing against historical commits.

Changes

CBTS Multi-Rule System

Layer / File(s)	Summary
Architecture & Documentation `jenkins/scripts/cbts/README.md`, `jenkins/scripts/cbts/rules/README.md`	READMEs expanded to document four-rule system, scope combinations, YAML lookup algorithms including path-based matching, trigger-mode filtering, and guidelines for adding rules.
YAMLIndex Path Matching `jenkins/scripts/cbts/blocks.py`	Added git-path-to-YAML-namespace translation (`git_path_to_yaml_key`), path-based block matching (`find_match_for_path`), and support data (`_yaml_first_components`, `_path_lookup_anchor`).
Shared Helper Utilities `jenkins/scripts/cbts/rules/_helpers.py`	New module providing diff parsing (`iter_diff_changes`, `iter_diff_post_line_numbers`), stage grouping (`stages_by_yaml_stem`), and block-filter resolution (`lookup_ids_into_block_filters`, `lookup_paths_into_block_filters`, `resolve_affected_stages`).
Rule Implementations `jenkins/scripts/cbts/rules/out_of_scope_rule.py`, `jenkins/scripts/cbts/rules/test_list_rule.py`, `jenkins/scripts/cbts/rules/tests_def_rule.py`, `jenkins/scripts/cbts/rules/waives_rule.py`	OutOfScopeRule identifies out-of-scope paths; TestListRule narrows stages from test-db YAML additions; TestsDefRule maps test file changes to affected blocks via AST and path matching; WaivesRule refactored to use shared helpers.
Rules Orchestration `jenkins/scripts/cbts/main.py`	RULE_CLASSES extended to register all four rules; `build_rules()` now accepts `repo_root` for TestsDefRule; `_combine_scopes()` merges waiveonly/testdefonly/testlistonly into testsonly; selector respects noop scope.
Dry-Run Testing Tool `jenkins/scripts/cbts/tools/dryrun.py`	New executable script replaying CBTS over historical git commits, generating per-commit output directories and an INDEX.md summary with scope distribution.

🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: introducing new change-based testing rules focused on test file modifications to improve CI test selection.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Warning

Tools execution failed with the following error:

Failed to run tools: 14 UNAVAILABLE: read ECONNRESET

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@jenkins/scripts/cbts/rules/test_list_rule.py`:
- Around line 40-45: The _ENTRY_RE currently treats any indented list item as a
test entry; restrict matching so only list items that are children of a tests:
field are captured (or alternatively parse indentation context) by checking for
a preceding "tests:" key and matching indentation level before accepting "-
<pytest_id>" — update _ENTRY_RE and the related logic (also applied in the code
around lines 72-84) to require the same indentation as the tests: block or to
detect the nearest parent key == "tests" before classifying entries, so
wildcard/non-tests lists (e.g. "- \"*h100*\"") are not misclassified as test
IDs.

In `@jenkins/scripts/cbts/rules/tests_def_rule.py`:
- Around line 143-146: The try/except that reads the changed test asset with
(self._repo_root / git_path).read_text(encoding="utf-8") only catches OSError
but not decode failures; update the handler to also catch UnicodeDecodeError (or
UnicodeError) so binary/non-UTF8 fixtures fall back to returning [yaml_path]
instead of crashing, and make the same change for the second occurrence that
mirrors this logic later in the file (the other read_text block around the
second occurrence).
- Around line 167-181: The current per-model refinement is unsafe because
iter_diff_post_line_numbers can misattribute deleted top-level model sections;
fix by detecting deletions in the raw diff and falling back to the full file
when any deletion could affect top-level keys: in the block using diff, before
calling iter_diff_post_line_numbers/_yaml_top_keys_for_lines, check the diff
string for removal lines (e.g. any line starting with '-' in the unified diff
hunks) and if found return [yaml_path]; keep the existing flow (diff ->
iter_diff_post_line_numbers -> changed_models -> model_map -> anchors) only when
no deletion lines are present so changed_models are provable from the post-image
alone.
- Around line 218-247: In apply(), markdown docs under test YAML trees are being
processed into actionable scopes; before calling
yaml_index.git_path_to_yaml_key/_compute_anchors/_combine_scopes, detect
markdown (e.g., git_path.endswith(".md") or basename == "README.md") and treat
them as noop by adding the path to handled and no_match (or out_of_namespace
depending on existing semantics) and continue, so markdown files do not produce
testdefonly results; update the branch that currently handles yaml_path != None
to early-skip .md files before calling _compute_anchors/_combine_scopes.

In `@jenkins/scripts/cbts/tools/dryrun.py`:
- Around line 314-323: The CLI currently allows invalid numeric values for
--window and --limit; update the ap.add_argument calls for "--window" and
"--limit" to validate at parse time by using custom argparse types (or a
validator) that raise argparse.ArgumentTypeError for invalid inputs: implement a
positive_int(value) used for --window that ensures int(value) > 0, and use a
positive_int_or_none(value) (or accept None via default and validate if
provided) for --limit to ensure int(value) > 0 when not None; replace the
current type=int usages in the ap.add_argument calls and/or add explicit
argparse validation so invalid or negative values are rejected during parsing.
- Around line 252-255: The loop over pr_dir.iterdir() uses old.unlink() which
raises on directories and causes crashes when --keep-stale leaves
subdirectories; update the cleanup in dryrun.py to detect whether old is a
file/symlink or a directory (using old.is_dir()/old.is_file()/old.is_symlink())
and call the appropriate remover (unlink for files/symlinks, shutil.rmtree for
directories) or otherwise skip non-removable entries so per-label cleanup in the
for loop does not raise; adjust imports if needed and keep the rest of the logic
(the pr_dir.iterdir() loop and subsequent shared_out assignment) unchanged.
- Around line 100-113: The snapshot helpers (_snapshot_test_db and
_snapshot_groovy) currently call _git with check=False and unconditionally write
.stdout, which can create empty/missing snapshot files; change them to fail fast
by checking the _git result (use check=True or validate result.returncode and
result.stdout) and raise/abort if the git show/ls-tree calls failed or returned
empty content for any expected file; specifically validate the outputs from
_git(repo, "ls-tree", ..., sha, "--", TEST_DB_REL) and each _git(repo, "show",
f"{sha}:{f}") in _snapshot_test_db, and validate the _git(repo, "show",
f"{sha}:{GROOVY_REL}") call in _snapshot_groovy, and only write files when the
calls succeeded and stdout is non-empty, otherwise raise an error so snapshot
creation fails fast.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 622d9a53-626c-4e14-892b-a9e0af1b6311

📥 Commits

Reviewing files that changed from the base of the PR and between 2e4b05c and 2df702c.

📒 Files selected for processing (10)

jenkins/scripts/cbts/README.md
jenkins/scripts/cbts/blocks.py
jenkins/scripts/cbts/main.py
jenkins/scripts/cbts/rules/README.md
jenkins/scripts/cbts/rules/_helpers.py
jenkins/scripts/cbts/rules/out_of_scope_rule.py
jenkins/scripts/cbts/rules/test_list_rule.py
jenkins/scripts/cbts/rules/tests_def_rule.py
jenkins/scripts/cbts/rules/waives_rule.py
jenkins/scripts/cbts/tools/dryrun.py

crazydemo · 2026-05-11T13:26:28Z

/bot run

tensorrt-cicd · 2026-05-11T13:32:12Z

PR_Github #47744 [ run ] triggered by Bot. Commit: b191409 Link to invocation

tensorrt-cicd · 2026-05-11T14:11:10Z

PR_Github #47744 [ run ] completed with state FAILURE. Commit: b191409
/LLM/main/L0_MergeRequest_PR pipeline #37639 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

crazydemo · 2026-05-12T02:21:37Z

/bot run

tensorrt-cicd · 2026-05-12T02:26:49Z

PR_Github #47830 [ run ] triggered by Bot. Commit: 585a6c2 Link to invocation

crazydemo · 2026-05-12T05:06:28Z

/bot kill

tensorrt-cicd · 2026-05-12T05:13:11Z

PR_Github #47882 [ kill ] triggered by Bot. Commit: 585a6c2 Link to invocation

tensorrt-cicd · 2026-05-12T05:13:55Z

PR_Github #47882 [ kill ] completed with state SUCCESS. Commit: 585a6c2
Successfully killed previous jobs for commit 585a6c2

Link to invocation

crazydemo · 2026-05-12T05:41:41Z

/bot run

tensorrt-cicd · 2026-05-12T05:47:18Z

PR_Github #47887 [ run ] triggered by Bot. Commit: b005bf7 Link to invocation

crazydemo · 2026-05-12T07:36:24Z

/bot kill

tensorrt-cicd · 2026-05-12T07:43:57Z

PR_Github #47925 [ kill ] triggered by Bot. Commit: 549afa8 Link to invocation

tensorrt-cicd · 2026-05-12T07:44:43Z

PR_Github #47925 [ kill ] completed with state SUCCESS. Commit: 549afa8
Successfully killed previous jobs for commit 549afa8

Link to invocation

crazydemo · 2026-05-12T07:51:16Z

/bot run

Top-level: replace stale "v0 scope" section with current 4-rule map, scope catalog (waiveonly / testdefonly / testlistonly / testsonly / noop / null), and trigger-mode filter section. Update INPUT_JSON and decision JSON examples to reflect post_merge field, sanity/ perfsanity flags, and noop scope. rules/README: full per-rule writeup (WaivesRule, TestsDefRule, TestListRule, OutOfScopeRule) covering each rule's diff handling, lookup, and outcome matrix. Add helpers table for _helpers.py. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

Markdown files under tests/ have no test-execution impact; claim them as scope=noop so PRs that only touch docs (or mix docs with a code change) don't fall back over the .md files. Adds OUT_OF_SCOPE_TESTS_SUFFIXES alongside the existing OUT_OF_SCOPE_PREFIXES; both feed _is_out_of_scope. Updates rule docs in the top-level and rules/ READMEs. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

Generalize TestsDefRule to handle any path under tests/ in YAML namespace — not just .py files. find_match_for_path walks up enclosing directories to the narrowest YAML-covered ancestor, so: - accuracy/references/*.yaml → anchor accuracy/ - disaggregated/test_configs/*.yaml → anchor disaggregated/ - unittest/api_stability/references/*.yaml → anchor unittest/api_stability/ - conftest.py / __init__.py / helpers (existing) For test_*.py files, file-level anchor is preserved (no walk-up) so new test files in already-covered dirs still get the noop semantic when no YAML entry covers them. Out-of-namespace paths (top-level integration conftest, dirs no YAML references like tests/integration/test_input_files/) still fall back — those could have wide implicit blast radius. Smoke-tested across accuracy refs, disagg configs, api_stability refs, input fixtures, and a mixed py+yaml PR. Re-ran 60-PR dry-run; same distribution (none of the historical PRs touched these data files in the sampled window). Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

…hors Each top-level YAML key in `accuracy/references/<dataset>.yaml` is a HF model name. AST-scan `accuracy/test_*.py` for class-level `MODEL_NAME = "<hf>"` literals to build {model_name → [Class anchor]} (cached). A diff line under `meta-llama/Llama-3.1-8B-Instruct:` narrows to `accuracy/test_*.py::TestLlama3_1_8BInstruct` only. Smoke: single-model diff on mmlu.yaml → 21 blocks/45 stages instead of the dir walk-up's 48 blocks/90 stages (>50% reduction). Models in the YAML with no matching test class (aliases, new entries without test code yet) fall back to the dir-level anchor — same as before. No diff / unreadable file → same fallback. Map covers 110 distinct MODEL_NAMEs across 9 accuracy/test_*.py files. For mmlu.yaml's 65 model keys, 59 (90%) match a test class. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

…ts / microbenchmarks Add three subtrees no L0 pipeline (pre-merge or post-merge) consumes: - tests/integration/test_lists/dev/ — dev artifacts, no L0 consumer - tests/integration/defs/.test_durations — pytest-split timing cache - tests/microbenchmarks/ — benchmarking scripts only Verified consumption: dev/ has no Jenkins or test-db reference; .test_durations is a pytest cache file; microbenchmarks/ has zero references in jenkins/ and only a single internal cross-link from allreduce_perf/ — no L0 stage runs it. 500-commit dry-run effective narrow rate: 97.6% (161/165), up from 95.8% (158/165). Three previously-fallback PRs (12978, 12887, 12886) now correctly resolve to noop. tests/scripts/perf-sanity/aggregated/*.yaml is intentionally left as fallback: those configs are loaded by perf/test_perf_sanity.py at runtime in both pre-merge and post-merge PerfSanity stages, so noop would be wrong. Refining them to per-config narrow is a future-work item. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

A standalone `test_*.py` whose path is out-of-namespace (`git_path_to_yaml_key` returns None) is one no L0 YAML references — i.e. no L0 stage runs it. Pytest doesn't auto-import test files into other tests, so editing such a file can't change what L0 selects. Claim it as noop instead of falling back. The existing fallback path is preserved for non-test out-of-namespace files (conftest, __init__, helpers, data fixtures) — those can still impact selection via implicit pytest discovery / sys.path imports. 500-commit dry-run effective narrow rate: 98.2% (162/165), up from 97.6%. PR-12625 (which deleted top-level tests/integration/defs/ test_mlpf_results.py and test_sanity.py — neither in any YAML) now narrows correctly via the 20 in-namespace test files it also touched. Reword the partial-narrow miss-note to "N path(s) not in any L0 YAML" since `no_match` now covers both in-namespace files with no covering entry and standalone out-of-namespace test files. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

…ical commits Promotes the one-shot /tmp/cbts_dryrun.py into a permanent debug CLI under jenkins/scripts/cbts/tools/. Snapshots test-db YAMLs and L0_Test.groovy at each commit's own SHA so results reflect the CBTS decision at the time the PR landed. Supports tests-only / all filters, --limit, --post-merge, and explicit --sha replay; writes per-PR summary.txt + narrowed cbts_test_db YAMLs + a top-level INDEX.md. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

…ates OutOfScopeRule claims tests/**/*.md (and dev/qa lists, .test_durations, microbenchmarks) as noop, but TestsDefRule was also processing those paths. For files that mapped into a YAML namespace, TestsDefRule emitted a testdefonly narrow that _combine_scopes preferred over noop, silently defeating the documented out-of-scope contract. Make is_out_of_scope public and filter it out of TestsDefRule's candidates so the noop claim is no longer overridden by a same-file narrow contribution. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

The prior entry parser matched any indented `- ...` line, so YAML lists under `wildcards:`, `terms:`, or `condition:` (e.g. `- "*newgpu*"` under `wildcards.gpu:`) were classified as added test ids. A purely structural stage-selection change could thus produce scope="noop" instead of falling back. Replace the single regex with a diff walker that maintains a per-hunk key stack and only treats a list item as a tests-list entry when its enclosing key is `tests:`. Items under any other key — and items whose enclosing key is not visible in the hunk — are reported as structural, forcing fallback. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

… dryrun `--window 0` made `git log -0` silently return no commits, and negative `--limit` values sliced the SHA list from the tail (`shas[:-N]`) instead of capping it, producing surprising selections rather than an error. Replace `type=int` with a `_positive_int` argparse type so any value < 1 (or non-integer) is rejected with a clear ArgumentTypeError before the run starts. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

The per-label cleanup loop called `old.unlink()` on every entry except summary.txt, which raises `IsADirectoryError` if a subdirectory exists in pr_dir. The default path wipes pr_dir upstream via `shutil.rmtree`, so this only surfaced under `--keep-stale` when a stale subdirectory remained from a prior layout — replay would then crash for that label. Dispatch on `is_dir()` so directories go through `shutil.rmtree` and plain files still use `unlink`. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

The snapshot helpers ran git with check=False and unconditionally wrote stdout to disk, so a missing path / bogus SHA produced empty test-db or L0_Test.groovy files. CBTS would then replay against those empty inputs and emit a misleading scope (typically scope="noop" because no entry resolves) instead of surfacing the replay as a failure. Make `git ls-tree` and `git show` use check=True so subprocess errors surface as CalledProcessError, and additionally raise RuntimeError when ls-tree returns no .yml files or the groovy snapshot is empty. The main per-SHA exception handler catches both, logs the SHA, and continues. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

iter_diff_post_line_numbers anchors `-` lines to the next surviving post-image line, so deleting or moving a whole top-level model section in accuracy/references/*.yaml attributes those lines to the *following* model. The accuracy-references refinement would then narrow to the wrong test class. Refinement is only sound when every changed line has a post-image position whose top-level YAML key can be read directly. Short-circuit to `[yaml_path]` (dir walk-up to `accuracy/`) when the diff contains any deletion; pure-addition diffs still refine to per-class anchors as before. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

…ssets Path.read_text(encoding="utf-8") raises UnicodeDecodeError on binary fixtures or non-UTF8 files under tests/. UnicodeDecodeError subclasses ValueError, not OSError, so the existing `except OSError` did not catch it and a fixture-only PR could crash CBTS selection. Extend the except tuples in _compute_anchors, _compute_accuracy_reference_anchors, and _accuracy_model_to_classes to also catch UnicodeDecodeError, taking the documented file-level fallback (or skipping the source file in the cached scanner) instead of propagating. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

The previous TestListRule walked a per-hunk key_stack to decide whether a `+/-` list item was under `tests:`. When the hunk did not include the `tests:` parent in its 3-line context (common for large test-db files where additions live deep under condition: blocks), the rule fell back, dropping many real tests-entry edits to None and forcing the whole PR to fallback. Drive classification from the post-PR YAML structure instead. For each touched test-db file, walk up from every changed line in the post-PR file content; if any ancestor key (or the line itself) is in {condition, wildcards, terms, ranges}, the edit changed stage- selection semantics and the rule returns scope=None. Otherwise every `+/-` list item is by construction inside a `tests:` block and is treated as an add/remove. A cheap pre-check on the raw diff catches whole-block deletions whose anchors fall past EOF in the post-PR file. Pass `repo_root` through to TestListRule so it can read post-PR files, mirroring TestsDefRule's existing setup. Verified pr-12554 recovers from None to testsonly (7 tests: entries previously misclassified as structural), while wildcards/terms edits still correctly fall back. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

`_snapshot_test_db` + `_snapshot_groovy` only carried `test-db/*.yml` and `L0_Test.groovy` to the replay; everything else CBTS reads — post- PR test source files (TestsDefRule's AST scope mapping, accuracy refs walk-up), accuracy/test_*.py for MODEL_NAME → class mapping, and now the post-PR test-db YAML in TestListRule's ancestor-key walk — was served from the live `--repo-root`. As upstream/main advanced after each replayed SHA landed, line numbers in `iter_diff_post_line_numbers` were resolved against drifted file content, producing both spurious narrowing (anchors finding the wrong YAML coverage) and spurious fallbacks (e.g. a deleted tests-list line landing on a stage-select key in the current file). Replace the file-by-file snapshot with a one-shot `git worktree add --detach <tmp>/wt <SHA>` per replay and route `--repo-root` to the worktree. The whole post-PR tree is then consistent and matches what real CI sees when the PR is at HEAD. Copy the filtered test-db YAMLs out of `<wt>/cbts_test_db/` before the `finally` removes the worktree. Add `git worktree prune` at startup to clean any metadata stranded by interrupted prior runs. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

…hboring blocks `iter_diff_post_line_numbers` anchors every `-` line at the next surviving post-image line. When the last entry of a `tests:` list is deleted, that anchor lands on the start of the next `- condition:` block — ancestor walk-up then sees a stage-select key as the line's neighborhood and forces fallback even though the actual edit was just a tests-entry removal under tests:. Split the `-` line handling out of the post-PR ancestor walk: - `_added_line_numbers(diff)` collects post-image positions for `+` lines only (and consumes context lines to keep the cursor correct). Only those positions are walked against the post-PR file. - `_diff_has_suspicious_minus(diff)` body-shape-checks every `-` line. Comment / blank lines and clean tests-entry list items pass; key edits, sub-key removals (`stage: pre_merge`), or non-tests-id list values (`- "*newgpu*"`) force fallback. `_diff_edits_stage_select_key` still catches direct STAGE_SELECT_KEY edits as before. Verified on pr-12789 (deletes 3 tests-entry lines right before a `- condition:` boundary): old behavior fell back to `None`; new behavior correctly emits `testsonly` and narrows to the affected stages. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

`tests/integration/defs/agg_unit_mem_df.csv` is a runtime tuning table consumed by `test_unittests.py:80-117`: it maps (gpu, unittest_case) to the pytest-xdist `parallel_factor` (clamped to <=8 workers) and nothing else. It does not select which tests run, change test bodies, or affect test results — only worker concurrency. Wrong values surface as visible runtime failures (OOM if too high, slower CI if too low). This is the same category as `.test_durations` (pytest-split timing cache), which is already in OUT_OF_SCOPE_PREFIXES. Add the CSV path so edits to it are claimed as noop instead of leaving the file unhandled and forcing a baseline fallback for the whole PR. Verified on pr-13756 (split _torch/speculative into hw-agnostic subdir + updated the CSV): replay flips from `None` to `testsonly`, with the narrow driven by the actual test/test-db edits. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

…t-db entries Bundles the rule-side change with concrete new accuracy and perf- sanity test methods + their test-db registrations so a single PR exercises every CBTS rule (incl. perfsanity_required signal) against a real CI run. Revert all five files together after `/bot run` confirms the narrowing pipeline. - jenkins/scripts/cbts/rules/out_of_scope_rule.py: claim `jenkins/scripts/cbts/` itself as out-of-scope. CBTS runs at PR- decision time; edits to its own source can't affect any test stage in the same PR. - tests/integration/defs/accuracy/test_llm_api_pytorch.py: new method `TestLlama3_1_8B.test_cbts_validation` — lightweight assertion on `self.MODEL_NAME`, no LLM load. - tests/integration/test_lists/test-db/l0_b200.yml: register the new accuracy test under the existing B200 pre_merge pytorch block. - tests/integration/defs/perf/test_perf_sanity.py: new top-level function `test_cbts_validation()` — `assert True`, no fixtures, no real perf measurement. - tests/integration/test_lists/test-db/l0_dgx_h200_perf_sanity.yml: register the new perf test under the existing DGX-H200 post_merge pytorch perf-sanity block. Expected CBTS decision for this PR: - jenkins/scripts/cbts/** → outofscope, noop - test_llm_api_pytorch.py edit → testdef narrow - l0_b200.yml edit → testlist narrow - test_perf_sanity.py edit → testdef narrow - l0_dgx_h200_perf_sanity.yml edit → testlist narrow, sets perfsanity_required=True - combined scope → testsonly, narrow to DGX_B200-PyTorch-* and DGX_H200-8_GPUs-PyTorch- PerfSanity-Post-Merge-* stages Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

Blank or comment-only `+/-` lines carry no test-selection signal — they can't introduce imports, decorators, test methods, or YAML entries. But CBTS's existing diff walkers treat every `+/-` line the same, so a trailing `+` blank line in an otherwise method-local diff drops `_map_lines_to_pytest_scopes` to module-level (AST has no scope for it), forces a file-level anchor fallback, and pulls in every sibling class/test in the file — turning a 3-stage narrow into a 30-stage one. Add `strip_noop_diff_lines(diff)` to `rules/_helpers.py` and apply it in `main.py::_load_pr_inputs` so every rule transparently sees a diff where: - `+` blank / comment-only lines are downgraded to context. The post-image cursor in `iter_diff_post_line_numbers` still advances, so all subsequent line numbers remain correct. - `-` blank / comment-only lines are dropped. `-` lines never advance the post-image cursor, so removal is safe. Hunk header counts (`-A,B +C,D`) are intentionally left stale — CBTS walkers don't read B or D, only the `+C` start line. Verified via dryrun on a real PR (b005bf7): testdef narrow goes from 56 blocks / 65 stages → 2 blocks / 4 stages; `affected_stages` collapses from 30 to 3 (DGX_B200-PyTorch-{1,2,3}), matching the actual intent of the diff. testlist / waives are unaffected since they already filtered comment bodies internally. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

After /bot run validated the end-to-end CBTS narrowing pipeline (incl. `perfsanity_required` signal triggering perf-sanity stages and the new `strip_noop_diff_lines` helper tightening narrow precision), drop the four synthetic test entries that were only added to drive the validation: - tests/integration/defs/accuracy/test_llm_api_pytorch.py: remove `TestLlama3_1_8B.test_cbts_validation` - tests/integration/test_lists/test-db/l0_b200.yml: unregister the entry - tests/integration/defs/perf/test_perf_sanity.py: remove top-level `test_cbts_validation` - tests/integration/test_lists/test-db/l0_dgx_h200_perf_sanity.yml: unregister the entry The OOS rule extension that added `jenkins/scripts/cbts/` to `OUT_OF_SCOPE_PREFIXES` is kept as a real long-term improvement — CBTS's own source can't affect any test stage in the same PR, so treating it as out-of-scope is correct and matches the existing `tests/integration/test_lists/dev/` precedent. Partial revert of b005bf7. Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>

crazydemo · 2026-05-12T13:25:50Z

/bot run

tensorrt-cicd · 2026-05-12T13:31:23Z

PR_Github #47970 [ run ] triggered by Bot. Commit: c5c348c Link to invocation

tensorrt-cicd · 2026-05-12T14:09:40Z

PR_Github #47970 [ run ] completed with state SUCCESS. Commit: c5c348c
/LLM/main/L0_MergeRequest_PR pipeline #37810 completed with status: 'SUCCESS'

CI Report

Link to invocation

QiJune

LGTM

crazydemo requested review from a team as code owners May 8, 2026 09:45

crazydemo requested review from dpitman-nvda and niukuo May 8, 2026 09:45

github-actions Bot assigned crazydemo May 8, 2026

crazydemo requested a review from QiJune May 8, 2026 09:45

coderabbitai Bot reviewed May 8, 2026

View reviewed changes

QiJune reviewed May 11, 2026

View reviewed changes

Comment thread jenkins/scripts/cbts/rules/tests_def_rule.py

QiJune reviewed May 11, 2026

View reviewed changes

Comment thread jenkins/scripts/cbts/rules/out_of_scope_rule.py

QiJune reviewed May 11, 2026

View reviewed changes

Comment thread jenkins/scripts/cbts/rules/README.md

crazydemo requested a review from a team as a code owner May 11, 2026 13:22

crazydemo force-pushed the cbts-v1 branch from b191409 to 585a6c2 Compare May 12, 2026 02:19

crazydemo force-pushed the cbts-v1 branch 2 times, most recently from 36f8b86 to b005bf7 Compare May 12, 2026 05:38

crazydemo requested a review from a team as a code owner May 12, 2026 05:38

crazydemo added 21 commits May 12, 2026 21:24

crazydemo force-pushed the cbts-v1 branch from 46ae3c4 to c5c348c Compare May 12, 2026 13:25

QiJune approved these changes May 13, 2026

View reviewed changes

crazydemo enabled auto-merge (squash) May 13, 2026 01:47

EmmaQiaoCh approved these changes May 15, 2026

View reviewed changes

crazydemo merged commit d989ff5 into NVIDIA:main May 15, 2026
7 checks passed

coderabbitai Bot mentioned this pull request May 18, 2026

[TRTLLM-12719][cbts] Add core code related rule #14266

Merged

1 task

Conversation

crazydemo commented May 8, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Description

Test Coverage

PR Checklist

GitHub Bot Help

Uh oh!

coderabbitai Bot commented May 8, 2026

Walkthrough

Changes

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

crazydemo commented May 11, 2026

Uh oh!

tensorrt-cicd commented May 11, 2026

Uh oh!

tensorrt-cicd commented May 11, 2026

Uh oh!

crazydemo commented May 12, 2026

Uh oh!

tensorrt-cicd commented May 12, 2026

Uh oh!

crazydemo commented May 12, 2026

Uh oh!

tensorrt-cicd commented May 12, 2026

Uh oh!

tensorrt-cicd commented May 12, 2026

Uh oh!

crazydemo commented May 12, 2026

Uh oh!

tensorrt-cicd commented May 12, 2026

Uh oh!

crazydemo commented May 12, 2026

Uh oh!

tensorrt-cicd commented May 12, 2026

Uh oh!

tensorrt-cicd commented May 12, 2026

Uh oh!

crazydemo commented May 12, 2026

Uh oh!

crazydemo commented May 12, 2026

Uh oh!

tensorrt-cicd commented May 12, 2026

Uh oh!

tensorrt-cicd commented May 12, 2026

Uh oh!

QiJune left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

crazydemo commented May 8, 2026 •

edited by coderabbitai Bot

Loading