Skip to content

[TRTLLM-12152][infra] change based testing rules on tests#13899

Merged
crazydemo merged 28 commits into
NVIDIA:mainfrom
crazydemo:cbts-v1
May 15, 2026
Merged

[TRTLLM-12152][infra] change based testing rules on tests#13899
crazydemo merged 28 commits into
NVIDIA:mainfrom
crazydemo:cbts-v1

Conversation

@crazydemo
Copy link
Copy Markdown
Collaborator

@crazydemo crazydemo commented May 8, 2026

Summary by CodeRabbit

  • New Features

    • Added three new change-based test selection rules: test definition matching, test list validation, and out-of-scope detection.
    • Introduced dry-run tool to test CBTS behavior across historical commits.
    • Enhanced test matching to support both test IDs and file path lookups.
  • Documentation

    • Expanded rule system documentation with detailed scope definitions and fallback behaviors.
    • Added comprehensive rule architecture guide with trigger patterns and stage resolution logic.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@crazydemo crazydemo requested review from a team as code owners May 8, 2026 09:45
@crazydemo crazydemo requested review from dpitman-nvda and niukuo May 8, 2026 09:45
@crazydemo crazydemo requested a review from QiJune May 8, 2026 09:45
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 8, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR expands CBTS from a single-rule waive-matching system to a multi-rule architecture. It adds path-based YAML matching in YAMLIndex, introduces three new rules (TestsDefRule for test file changes, TestListRule for test-db YAML additions, OutOfScopeRule for exclusions), refactors WaivesRule to use shared helpers, updates scope-combination logic to merge testsonly-family scopes, and includes a dry-run tool for testing against historical commits.

Changes

CBTS Multi-Rule System

Layer / File(s) Summary
Architecture & Documentation
jenkins/scripts/cbts/README.md, jenkins/scripts/cbts/rules/README.md
READMEs expanded to document four-rule system, scope combinations, YAML lookup algorithms including path-based matching, trigger-mode filtering, and guidelines for adding rules.
YAMLIndex Path Matching
jenkins/scripts/cbts/blocks.py
Added git-path-to-YAML-namespace translation (git_path_to_yaml_key), path-based block matching (find_match_for_path), and support data (_yaml_first_components, _path_lookup_anchor).
Shared Helper Utilities
jenkins/scripts/cbts/rules/_helpers.py
New module providing diff parsing (iter_diff_changes, iter_diff_post_line_numbers), stage grouping (stages_by_yaml_stem), and block-filter resolution (lookup_ids_into_block_filters, lookup_paths_into_block_filters, resolve_affected_stages).
Rule Implementations
jenkins/scripts/cbts/rules/out_of_scope_rule.py, jenkins/scripts/cbts/rules/test_list_rule.py, jenkins/scripts/cbts/rules/tests_def_rule.py, jenkins/scripts/cbts/rules/waives_rule.py
OutOfScopeRule identifies out-of-scope paths; TestListRule narrows stages from test-db YAML additions; TestsDefRule maps test file changes to affected blocks via AST and path matching; WaivesRule refactored to use shared helpers.
Rules Orchestration
jenkins/scripts/cbts/main.py
RULE_CLASSES extended to register all four rules; build_rules() now accepts repo_root for TestsDefRule; _combine_scopes() merges waiveonly/testdefonly/testlistonly into testsonly; selector respects noop scope.
Dry-Run Testing Tool
jenkins/scripts/cbts/tools/dryrun.py
New executable script replaying CBTS over historical git commits, generating per-commit output directories and an INDEX.md summary with scope distribution.

🎯 4 (Complex) | ⏱️ ~75 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: introducing new change-based testing rules focused on test file modifications to improve CI test selection.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

Tools execution failed with the following error:

Failed to run tools: 14 UNAVAILABLE: read ECONNRESET

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@jenkins/scripts/cbts/rules/test_list_rule.py`:
- Around line 40-45: The _ENTRY_RE currently treats any indented list item as a
test entry; restrict matching so only list items that are children of a tests:
field are captured (or alternatively parse indentation context) by checking for
a preceding "tests:" key and matching indentation level before accepting "-
<pytest_id>" — update _ENTRY_RE and the related logic (also applied in the code
around lines 72-84) to require the same indentation as the tests: block or to
detect the nearest parent key == "tests" before classifying entries, so
wildcard/non-tests lists (e.g. "- \"*h100*\"") are not misclassified as test
IDs.

In `@jenkins/scripts/cbts/rules/tests_def_rule.py`:
- Around line 143-146: The try/except that reads the changed test asset with
(self._repo_root / git_path).read_text(encoding="utf-8") only catches OSError
but not decode failures; update the handler to also catch UnicodeDecodeError (or
UnicodeError) so binary/non-UTF8 fixtures fall back to returning [yaml_path]
instead of crashing, and make the same change for the second occurrence that
mirrors this logic later in the file (the other read_text block around the
second occurrence).
- Around line 167-181: The current per-model refinement is unsafe because
iter_diff_post_line_numbers can misattribute deleted top-level model sections;
fix by detecting deletions in the raw diff and falling back to the full file
when any deletion could affect top-level keys: in the block using diff, before
calling iter_diff_post_line_numbers/_yaml_top_keys_for_lines, check the diff
string for removal lines (e.g. any line starting with '-' in the unified diff
hunks) and if found return [yaml_path]; keep the existing flow (diff ->
iter_diff_post_line_numbers -> changed_models -> model_map -> anchors) only when
no deletion lines are present so changed_models are provable from the post-image
alone.
- Around line 218-247: In apply(), markdown docs under test YAML trees are being
processed into actionable scopes; before calling
yaml_index.git_path_to_yaml_key/_compute_anchors/_combine_scopes, detect
markdown (e.g., git_path.endswith(".md") or basename == "README.md") and treat
them as noop by adding the path to handled and no_match (or out_of_namespace
depending on existing semantics) and continue, so markdown files do not produce
testdefonly results; update the branch that currently handles yaml_path != None
to early-skip .md files before calling _compute_anchors/_combine_scopes.

In `@jenkins/scripts/cbts/tools/dryrun.py`:
- Around line 314-323: The CLI currently allows invalid numeric values for
--window and --limit; update the ap.add_argument calls for "--window" and
"--limit" to validate at parse time by using custom argparse types (or a
validator) that raise argparse.ArgumentTypeError for invalid inputs: implement a
positive_int(value) used for --window that ensures int(value) > 0, and use a
positive_int_or_none(value) (or accept None via default and validate if
provided) for --limit to ensure int(value) > 0 when not None; replace the
current type=int usages in the ap.add_argument calls and/or add explicit
argparse validation so invalid or negative values are rejected during parsing.
- Around line 252-255: The loop over pr_dir.iterdir() uses old.unlink() which
raises on directories and causes crashes when --keep-stale leaves
subdirectories; update the cleanup in dryrun.py to detect whether old is a
file/symlink or a directory (using old.is_dir()/old.is_file()/old.is_symlink())
and call the appropriate remover (unlink for files/symlinks, shutil.rmtree for
directories) or otherwise skip non-removable entries so per-label cleanup in the
for loop does not raise; adjust imports if needed and keep the rest of the logic
(the pr_dir.iterdir() loop and subsequent shared_out assignment) unchanged.
- Around line 100-113: The snapshot helpers (_snapshot_test_db and
_snapshot_groovy) currently call _git with check=False and unconditionally write
.stdout, which can create empty/missing snapshot files; change them to fail fast
by checking the _git result (use check=True or validate result.returncode and
result.stdout) and raise/abort if the git show/ls-tree calls failed or returned
empty content for any expected file; specifically validate the outputs from
_git(repo, "ls-tree", ..., sha, "--", TEST_DB_REL) and each _git(repo, "show",
f"{sha}:{f}") in _snapshot_test_db, and validate the _git(repo, "show",
f"{sha}:{GROOVY_REL}") call in _snapshot_groovy, and only write files when the
calls succeeded and stdout is non-empty, otherwise raise an error so snapshot
creation fails fast.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 622d9a53-626c-4e14-892b-a9e0af1b6311

📥 Commits

Reviewing files that changed from the base of the PR and between 2e4b05c and 2df702c.

📒 Files selected for processing (10)
  • jenkins/scripts/cbts/README.md
  • jenkins/scripts/cbts/blocks.py
  • jenkins/scripts/cbts/main.py
  • jenkins/scripts/cbts/rules/README.md
  • jenkins/scripts/cbts/rules/_helpers.py
  • jenkins/scripts/cbts/rules/out_of_scope_rule.py
  • jenkins/scripts/cbts/rules/test_list_rule.py
  • jenkins/scripts/cbts/rules/tests_def_rule.py
  • jenkins/scripts/cbts/rules/waives_rule.py
  • jenkins/scripts/cbts/tools/dryrun.py

Comment thread jenkins/scripts/cbts/rules/test_list_rule.py Outdated
Comment thread jenkins/scripts/cbts/rules/tests_def_rule.py
Comment thread jenkins/scripts/cbts/rules/tests_def_rule.py
Comment thread jenkins/scripts/cbts/rules/tests_def_rule.py
Comment thread jenkins/scripts/cbts/tools/dryrun.py Outdated
Comment thread jenkins/scripts/cbts/tools/dryrun.py Outdated
Comment thread jenkins/scripts/cbts/tools/dryrun.py Outdated
Comment thread jenkins/scripts/cbts/rules/tests_def_rule.py
Comment thread jenkins/scripts/cbts/rules/out_of_scope_rule.py
Comment thread jenkins/scripts/cbts/rules/README.md
@crazydemo crazydemo requested a review from a team as a code owner May 11, 2026 13:22
@crazydemo
Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #47744 [ run ] triggered by Bot. Commit: b191409 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #47744 [ run ] completed with state FAILURE. Commit: b191409
/LLM/main/L0_MergeRequest_PR pipeline #37639 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

CI Agent Failure Analysis

Link to invocation

@crazydemo
Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #47830 [ run ] triggered by Bot. Commit: 585a6c2 Link to invocation

@crazydemo
Copy link
Copy Markdown
Collaborator Author

/bot kill

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #47882 [ kill ] triggered by Bot. Commit: 585a6c2 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #47882 [ kill ] completed with state SUCCESS. Commit: 585a6c2
Successfully killed previous jobs for commit 585a6c2

Link to invocation

@crazydemo crazydemo force-pushed the cbts-v1 branch 2 times, most recently from 36f8b86 to b005bf7 Compare May 12, 2026 05:38
@crazydemo crazydemo requested a review from a team as a code owner May 12, 2026 05:38
@crazydemo
Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #47887 [ run ] triggered by Bot. Commit: b005bf7 Link to invocation

@crazydemo
Copy link
Copy Markdown
Collaborator Author

/bot kill

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #47925 [ kill ] triggered by Bot. Commit: 549afa8 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #47925 [ kill ] completed with state SUCCESS. Commit: 549afa8
Successfully killed previous jobs for commit 549afa8

Link to invocation

@crazydemo
Copy link
Copy Markdown
Collaborator Author

/bot run

crazydemo added 21 commits May 12, 2026 21:24
Top-level: replace stale "v0 scope" section with current 4-rule map,
scope catalog (waiveonly / testdefonly / testlistonly / testsonly /
noop / null), and trigger-mode filter section. Update INPUT_JSON
and decision JSON examples to reflect post_merge field, sanity/
perfsanity flags, and noop scope.

rules/README: full per-rule writeup (WaivesRule, TestsDefRule,
TestListRule, OutOfScopeRule) covering each rule's diff handling,
lookup, and outcome matrix. Add helpers table for _helpers.py.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Markdown files under tests/ have no test-execution impact; claim them as
scope=noop so PRs that only touch docs (or mix docs with a code change)
don't fall back over the .md files.

Adds OUT_OF_SCOPE_TESTS_SUFFIXES alongside the existing
OUT_OF_SCOPE_PREFIXES; both feed _is_out_of_scope. Updates rule docs in
the top-level and rules/ READMEs.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Generalize TestsDefRule to handle any path under tests/ in YAML
namespace — not just .py files. find_match_for_path walks up enclosing
directories to the narrowest YAML-covered ancestor, so:

- accuracy/references/*.yaml          → anchor accuracy/
- disaggregated/test_configs/*.yaml   → anchor disaggregated/
- unittest/api_stability/references/*.yaml → anchor unittest/api_stability/
- conftest.py / __init__.py / helpers (existing)

For test_*.py files, file-level anchor is preserved (no walk-up) so
new test files in already-covered dirs still get the noop semantic
when no YAML entry covers them.

Out-of-namespace paths (top-level integration conftest, dirs no YAML
references like tests/integration/test_input_files/) still fall back —
those could have wide implicit blast radius.

Smoke-tested across accuracy refs, disagg configs, api_stability refs,
input fixtures, and a mixed py+yaml PR. Re-ran 60-PR dry-run; same
distribution (none of the historical PRs touched these data files in
the sampled window).

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
…hors

Each top-level YAML key in `accuracy/references/<dataset>.yaml` is a
HF model name. AST-scan `accuracy/test_*.py` for class-level
`MODEL_NAME = "<hf>"` literals to build {model_name → [Class anchor]}
(cached). A diff line under `meta-llama/Llama-3.1-8B-Instruct:` narrows
to `accuracy/test_*.py::TestLlama3_1_8BInstruct` only.

Smoke: single-model diff on mmlu.yaml → 21 blocks/45 stages instead of
the dir walk-up's 48 blocks/90 stages (>50% reduction).

Models in the YAML with no matching test class (aliases, new entries
without test code yet) fall back to the dir-level anchor — same as
before. No diff / unreadable file → same fallback.

Map covers 110 distinct MODEL_NAMEs across 9 accuracy/test_*.py files.
For mmlu.yaml's 65 model keys, 59 (90%) match a test class.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
…ts / microbenchmarks

Add three subtrees no L0 pipeline (pre-merge or post-merge) consumes:

- tests/integration/test_lists/dev/      — dev artifacts, no L0 consumer
- tests/integration/defs/.test_durations — pytest-split timing cache
- tests/microbenchmarks/                  — benchmarking scripts only

Verified consumption: dev/ has no Jenkins or test-db reference;
.test_durations is a pytest cache file; microbenchmarks/ has zero
references in jenkins/ and only a single internal cross-link from
allreduce_perf/ — no L0 stage runs it.

500-commit dry-run effective narrow rate: 97.6% (161/165), up from 95.8%
(158/165). Three previously-fallback PRs (12978, 12887, 12886) now
correctly resolve to noop.

tests/scripts/perf-sanity/aggregated/*.yaml is intentionally left as
fallback: those configs are loaded by perf/test_perf_sanity.py at
runtime in both pre-merge and post-merge PerfSanity stages, so noop
would be wrong. Refining them to per-config narrow is a future-work
item.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
A standalone `test_*.py` whose path is out-of-namespace
(`git_path_to_yaml_key` returns None) is one no L0 YAML references —
i.e. no L0 stage runs it. Pytest doesn't auto-import test files into
other tests, so editing such a file can't change what L0 selects.
Claim it as noop instead of falling back.

The existing fallback path is preserved for non-test out-of-namespace
files (conftest, __init__, helpers, data fixtures) — those can still
impact selection via implicit pytest discovery / sys.path imports.

500-commit dry-run effective narrow rate: 98.2% (162/165), up from
97.6%. PR-12625 (which deleted top-level tests/integration/defs/
test_mlpf_results.py and test_sanity.py — neither in any YAML) now
narrows correctly via the 20 in-namespace test files it also touched.

Reword the partial-narrow miss-note to "N path(s) not in any L0 YAML"
since `no_match` now covers both in-namespace files with no covering
entry and standalone out-of-namespace test files.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
…ical commits

Promotes the one-shot /tmp/cbts_dryrun.py into a permanent debug CLI
under jenkins/scripts/cbts/tools/. Snapshots test-db YAMLs and
L0_Test.groovy at each commit's own SHA so results reflect the CBTS
decision at the time the PR landed. Supports tests-only / all
filters, --limit, --post-merge, and explicit --sha replay; writes
per-PR summary.txt + narrowed cbts_test_db YAMLs + a top-level
INDEX.md.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
…ates

OutOfScopeRule claims tests/**/*.md (and dev/qa lists, .test_durations,
microbenchmarks) as noop, but TestsDefRule was also processing those
paths. For files that mapped into a YAML namespace, TestsDefRule emitted
a testdefonly narrow that _combine_scopes preferred over noop, silently
defeating the documented out-of-scope contract.

Make is_out_of_scope public and filter it out of TestsDefRule's
candidates so the noop claim is no longer overridden by a same-file
narrow contribution.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
The prior entry parser matched any indented `- ...` line, so YAML lists
under `wildcards:`, `terms:`, or `condition:` (e.g. `- "*newgpu*"`
under `wildcards.gpu:`) were classified as added test ids. A purely
structural stage-selection change could thus produce scope="noop"
instead of falling back.

Replace the single regex with a diff walker that maintains a per-hunk
key stack and only treats a list item as a tests-list entry when its
enclosing key is `tests:`. Items under any other key — and items whose
enclosing key is not visible in the hunk — are reported as structural,
forcing fallback.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
… dryrun

`--window 0` made `git log -0` silently return no commits, and negative
`--limit` values sliced the SHA list from the tail (`shas[:-N]`) instead
of capping it, producing surprising selections rather than an error.

Replace `type=int` with a `_positive_int` argparse type so any value
< 1 (or non-integer) is rejected with a clear ArgumentTypeError before
the run starts.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
The per-label cleanup loop called `old.unlink()` on every entry except
summary.txt, which raises `IsADirectoryError` if a subdirectory exists
in pr_dir. The default path wipes pr_dir upstream via `shutil.rmtree`,
so this only surfaced under `--keep-stale` when a stale subdirectory
remained from a prior layout — replay would then crash for that label.

Dispatch on `is_dir()` so directories go through `shutil.rmtree` and
plain files still use `unlink`.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
The snapshot helpers ran git with check=False and unconditionally wrote
stdout to disk, so a missing path / bogus SHA produced empty test-db
or L0_Test.groovy files. CBTS would then replay against those empty
inputs and emit a misleading scope (typically scope="noop" because no
entry resolves) instead of surfacing the replay as a failure.

Make `git ls-tree` and `git show` use check=True so subprocess errors
surface as CalledProcessError, and additionally raise RuntimeError when
ls-tree returns no .yml files or the groovy snapshot is empty. The main
per-SHA exception handler catches both, logs the SHA, and continues.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
iter_diff_post_line_numbers anchors `-` lines to the next surviving
post-image line, so deleting or moving a whole top-level model section
in accuracy/references/*.yaml attributes those lines to the *following*
model. The accuracy-references refinement would then narrow to the
wrong test class.

Refinement is only sound when every changed line has a post-image
position whose top-level YAML key can be read directly. Short-circuit
to `[yaml_path]` (dir walk-up to `accuracy/`) when the diff contains
any deletion; pure-addition diffs still refine to per-class anchors as
before.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
…ssets

Path.read_text(encoding="utf-8") raises UnicodeDecodeError on binary
fixtures or non-UTF8 files under tests/. UnicodeDecodeError subclasses
ValueError, not OSError, so the existing `except OSError` did not catch
it and a fixture-only PR could crash CBTS selection.

Extend the except tuples in _compute_anchors,
_compute_accuracy_reference_anchors, and _accuracy_model_to_classes to
also catch UnicodeDecodeError, taking the documented file-level
fallback (or skipping the source file in the cached scanner) instead
of propagating.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
The previous TestListRule walked a per-hunk key_stack to decide whether
a `+/-` list item was under `tests:`. When the hunk did not include the
`tests:` parent in its 3-line context (common for large test-db files
where additions live deep under condition: blocks), the rule fell
back, dropping many real tests-entry edits to None and forcing the
whole PR to fallback.

Drive classification from the post-PR YAML structure instead. For each
touched test-db file, walk up from every changed line in the post-PR
file content; if any ancestor key (or the line itself) is in
{condition, wildcards, terms, ranges}, the edit changed stage-
selection semantics and the rule returns scope=None. Otherwise every
`+/-` list item is by construction inside a `tests:` block and is
treated as an add/remove. A cheap pre-check on the raw diff catches
whole-block deletions whose anchors fall past EOF in the post-PR file.

Pass `repo_root` through to TestListRule so it can read post-PR files,
mirroring TestsDefRule's existing setup. Verified pr-12554 recovers
from None to testsonly (7 tests: entries previously misclassified as
structural), while wildcards/terms edits still correctly fall back.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
`_snapshot_test_db` + `_snapshot_groovy` only carried `test-db/*.yml`
and `L0_Test.groovy` to the replay; everything else CBTS reads — post-
PR test source files (TestsDefRule's AST scope mapping, accuracy refs
walk-up), accuracy/test_*.py for MODEL_NAME → class mapping, and now
the post-PR test-db YAML in TestListRule's ancestor-key walk — was
served from the live `--repo-root`. As upstream/main advanced after
each replayed SHA landed, line numbers in `iter_diff_post_line_numbers`
were resolved against drifted file content, producing both spurious
narrowing (anchors finding the wrong YAML coverage) and spurious
fallbacks (e.g. a deleted tests-list line landing on a stage-select
key in the current file).

Replace the file-by-file snapshot with a one-shot `git worktree add
--detach <tmp>/wt <SHA>` per replay and route `--repo-root` to the
worktree. The whole post-PR tree is then consistent and matches what
real CI sees when the PR is at HEAD. Copy the filtered test-db YAMLs
out of `<wt>/cbts_test_db/` before the `finally` removes the worktree.
Add `git worktree prune` at startup to clean any metadata stranded by
interrupted prior runs.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
…hboring blocks

`iter_diff_post_line_numbers` anchors every `-` line at the next
surviving post-image line. When the last entry of a `tests:` list is
deleted, that anchor lands on the start of the next `- condition:`
block — ancestor walk-up then sees a stage-select key as the line's
neighborhood and forces fallback even though the actual edit was just
a tests-entry removal under tests:.

Split the `-` line handling out of the post-PR ancestor walk:

- `_added_line_numbers(diff)` collects post-image positions for `+`
  lines only (and consumes context lines to keep the cursor correct).
  Only those positions are walked against the post-PR file.
- `_diff_has_suspicious_minus(diff)` body-shape-checks every `-` line.
  Comment / blank lines and clean tests-entry list items pass; key
  edits, sub-key removals (`stage: pre_merge`), or non-tests-id list
  values (`- "*newgpu*"`) force fallback. `_diff_edits_stage_select_key`
  still catches direct STAGE_SELECT_KEY edits as before.

Verified on pr-12789 (deletes 3 tests-entry lines right before a
`- condition:` boundary): old behavior fell back to `None`; new
behavior correctly emits `testsonly` and narrows to the affected
stages.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
`tests/integration/defs/agg_unit_mem_df.csv` is a runtime tuning table
consumed by `test_unittests.py:80-117`: it maps (gpu, unittest_case) to
the pytest-xdist `parallel_factor` (clamped to <=8 workers) and nothing
else. It does not select which tests run, change test bodies, or affect
test results — only worker concurrency. Wrong values surface as visible
runtime failures (OOM if too high, slower CI if too low).

This is the same category as `.test_durations` (pytest-split timing
cache), which is already in OUT_OF_SCOPE_PREFIXES. Add the CSV path so
edits to it are claimed as noop instead of leaving the file unhandled
and forcing a baseline fallback for the whole PR.

Verified on pr-13756 (split _torch/speculative into hw-agnostic subdir
+ updated the CSV): replay flips from `None` to `testsonly`, with the
narrow driven by the actual test/test-db edits.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
…t-db entries

Bundles the rule-side change with concrete new accuracy and perf-
sanity test methods + their test-db registrations so a single PR
exercises every CBTS rule (incl. perfsanity_required signal) against
a real CI run. Revert all five files together after `/bot run`
confirms the narrowing pipeline.

- jenkins/scripts/cbts/rules/out_of_scope_rule.py: claim
  `jenkins/scripts/cbts/` itself as out-of-scope. CBTS runs at PR-
  decision time; edits to its own source can't affect any test stage
  in the same PR.

- tests/integration/defs/accuracy/test_llm_api_pytorch.py: new method
  `TestLlama3_1_8B.test_cbts_validation` — lightweight assertion on
  `self.MODEL_NAME`, no LLM load.

- tests/integration/test_lists/test-db/l0_b200.yml: register the new
  accuracy test under the existing B200 pre_merge pytorch block.

- tests/integration/defs/perf/test_perf_sanity.py: new top-level
  function `test_cbts_validation()` — `assert True`, no fixtures, no
  real perf measurement.

- tests/integration/test_lists/test-db/l0_dgx_h200_perf_sanity.yml:
  register the new perf test under the existing DGX-H200 post_merge
  pytorch perf-sanity block.

Expected CBTS decision for this PR:
  - jenkins/scripts/cbts/**                  → outofscope, noop
  - test_llm_api_pytorch.py edit             → testdef narrow
  - l0_b200.yml edit                         → testlist narrow
  - test_perf_sanity.py edit                 → testdef narrow
  - l0_dgx_h200_perf_sanity.yml edit         → testlist narrow,
                                               sets perfsanity_required=True
  - combined scope                           → testsonly, narrow to
                                               DGX_B200-PyTorch-* and
                                               DGX_H200-8_GPUs-PyTorch-
                                               PerfSanity-Post-Merge-* stages

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Blank or comment-only `+/-` lines carry no test-selection signal —
they can't introduce imports, decorators, test methods, or YAML
entries. But CBTS's existing diff walkers treat every `+/-` line
the same, so a trailing `+` blank line in an otherwise method-local
diff drops `_map_lines_to_pytest_scopes` to module-level (AST has no
scope for it), forces a file-level anchor fallback, and pulls in
every sibling class/test in the file — turning a 3-stage narrow into
a 30-stage one.

Add `strip_noop_diff_lines(diff)` to `rules/_helpers.py` and apply it
in `main.py::_load_pr_inputs` so every rule transparently sees a diff
where:

  - `+` blank / comment-only lines are downgraded to context. The
    post-image cursor in `iter_diff_post_line_numbers` still advances,
    so all subsequent line numbers remain correct.
  - `-` blank / comment-only lines are dropped. `-` lines never
    advance the post-image cursor, so removal is safe.

Hunk header counts (`-A,B +C,D`) are intentionally left stale —
CBTS walkers don't read B or D, only the `+C` start line.

Verified via dryrun on a real PR (b005bf7): testdef narrow goes
from 56 blocks / 65 stages → 2 blocks / 4 stages; `affected_stages`
collapses from 30 to 3 (DGX_B200-PyTorch-{1,2,3}), matching the
actual intent of the diff. testlist / waives are unaffected since
they already filtered comment bodies internally.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
After /bot run validated the end-to-end CBTS narrowing pipeline (incl.
`perfsanity_required` signal triggering perf-sanity stages and the
new `strip_noop_diff_lines` helper tightening narrow precision), drop
the four synthetic test entries that were only added to drive the
validation:

- tests/integration/defs/accuracy/test_llm_api_pytorch.py:
  remove `TestLlama3_1_8B.test_cbts_validation`
- tests/integration/test_lists/test-db/l0_b200.yml:
  unregister the entry
- tests/integration/defs/perf/test_perf_sanity.py:
  remove top-level `test_cbts_validation`
- tests/integration/test_lists/test-db/l0_dgx_h200_perf_sanity.yml:
  unregister the entry

The OOS rule extension that added `jenkins/scripts/cbts/` to
`OUT_OF_SCOPE_PREFIXES` is kept as a real long-term improvement —
CBTS's own source can't affect any test stage in the same PR, so
treating it as out-of-scope is correct and matches the existing
`tests/integration/test_lists/dev/` precedent.

Partial revert of b005bf7.

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
@crazydemo
Copy link
Copy Markdown
Collaborator Author

/bot run

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #47970 [ run ] triggered by Bot. Commit: c5c348c Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #47970 [ run ] completed with state SUCCESS. Commit: c5c348c
/LLM/main/L0_MergeRequest_PR pipeline #37810 completed with status: 'SUCCESS'

CI Report

Link to invocation

Copy link
Copy Markdown
Collaborator

@QiJune QiJune left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@crazydemo crazydemo enabled auto-merge (squash) May 13, 2026 01:47
@crazydemo crazydemo merged commit d989ff5 into NVIDIA:main May 15, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants