[None][feat] Add Auto-Deploy dashboard failures analysis skill by tcherckez-nvidia · Pull Request #12033 · NVIDIA/TensorRT-LLM

tcherckez-nvidia · 2026-03-09T12:01:49Z

Summary by CodeRabbit

Documentation
- Added new skill specification documentation for pipeline analysis and failure handling workflows.

Note: This release includes internal documentation updates with no direct user-facing changes.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>

coderabbitai · 2026-03-09T12:05:29Z

📝 Walkthrough

A new skill specification is introduced that defines a workflow for analyzing AutoDeploy pipeline failures, categorizing them into root-cause buckets, and creating targeted PRs or issues per bucket. The specification includes multi-phase control flow (0-4), pipeline resolution rules, bucket validation procedures, and multiple output format options.

Changes

| Cohort / File(s) | Summary |
|---|---|
| New Skill Specification `.claude/skills/ad-pipeline-failure-pr/SKILL.md` | Introduces a comprehensive skill specification defining a 4-phase workflow to analyze AutoDeploy pipeline failures (resolve scope, gather evidence, validate buckets, create fixes), with pipeline resolution rules, bucket validation criteria, PR/issue guardrails, and output formatting in chat, Markdown, and CSV formats. |
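The bucketing and CSV output steps named in the summary could look roughly like the sketch below. This is illustrative only: the record fields, the sample job names, the `bucket_failures` and `buckets_to_csv` helpers, and the rule of keying a bucket on the text before the first colon are all assumptions, not taken from `SKILL.md`.

```python
import csv
import io
from collections import defaultdict

# Hypothetical failure records; field names are illustrative, not from SKILL.md.
failures = [
    {"job": "ad-test-1", "error": "CUDA out of memory"},
    {"job": "ad-test-2", "error": "CUDA out of memory"},
    {"job": "ad-test-3", "error": "ModuleNotFoundError: foo"},
]


def bucket_failures(records):
    """Group failure records by a normalized error signature (the bucket key)."""
    buckets = defaultdict(list)
    for rec in records:
        # Assumption: the text before the first ':' is a good-enough bucket key.
        key = rec["error"].split(":", 1)[0].strip()
        buckets[key].append(rec["job"])
    return dict(buckets)


def buckets_to_csv(buckets):
    """Render one CSV row per bucket: name, failure count, affected jobs."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["bucket", "count", "jobs"])
    for name, jobs in sorted(buckets.items()):
        writer.writerow([name, len(jobs), ";".join(jobs)])
    return out.getvalue()


buckets = bucket_failures(failures)
print(buckets_to_csv(buckets))
```

A real workflow would presumably validate each bucket against log evidence before opening a PR or issue for it; that step is omitted here.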

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | PR description contains only the template with all sections blank; Title, Description, and Test Coverage sections are completely unfilled. | Fill in a proper PR title matching template format (e.g., [TRTLLM-####][feat] Add...), complete the Description and Test Coverage sections, and explain what the skill does and how it is tested. |
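For illustration, a title in the format the resolution mentions could be checked with a small regular expression. This is a sketch under assumptions: it accepts only `None` or `TRTLLM-<digits>` as the ticket and any lowercase word as the type, which may be narrower or broader than the project's actual template, and `is_valid_title` is a hypothetical helper.

```python
import re

# Assumption: the ticket is "None" or "TRTLLM-<digits>"; the type is a lowercase word.
TITLE_RE = re.compile(r"^\[(None|TRTLLM-\d+)\]\[([a-z]+)\] \S.*$")


def is_valid_title(title):
    """Return True if the title looks like '[ticket][type] summary'."""
    return TITLE_RE.match(title) is not None


print(is_valid_title("[None][feat] Add Auto-Deploy dashboard failures analysis skill"))
print(is_valid_title("Add Auto-Deploy dashboard failures analysis skill"))
```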

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title '[None][feat] Add Auto-Deploy dashboard failures analysis skill' clearly and concisely summarizes the main change: introducing a new feature for analyzing Auto-Deploy dashboard failures. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/skills/ad-pipeline-failure-pr/SKILL.md:
- Line 8: The document uses the misspelled environment variable GITLAN_TOKEN
(e.g., in the SKILL description text and any references) which breaks GitLab
auth; update every occurrence of GITLAN_TOKEN to the correct GITLAB_TOKEN so all
mentions (including the initial auth requirement and any later examples or
instructions) consistently reference GITLAB_TOKEN.
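The rename requested above can be sketched as a whole-word substitution. The `fix_token_name` helper and the sample text are hypothetical; only the two variable names come from the review comment.

```python
import re


def fix_token_name(text):
    """Replace the misspelled GITLAN_TOKEN with GITLAB_TOKEN.

    Returns a (new_text, replacement_count) tuple, as re.subn does.
    """
    return re.subn(r"\bGITLAN_TOKEN\b", "GITLAB_TOKEN", text)


# Hypothetical sample text; the real SKILL.md wording is not shown in this PR page.
sample = "Requires GITLAN_TOKEN for auth. Export GITLAN_TOKEN before running."
fixed, count = fix_token_name(sample)
print(count, fixed)
```

Checking the returned count against the number of expected occurrences is a cheap way to confirm that no reference was missed.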

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ead35b5c-83a8-41ca-9271-9113c5e5c687

📥 Commits

Reviewing files that changed from the base of the PR and between d704b5e and 6acdb62.

📒 Files selected for processing (1)

.claude/skills/ad-pipeline-failure-pr/SKILL.md

lucaslie · 2026-03-09T16:59:36Z

/bot skip --comment "doc changes only"

tensorrt-cicd · 2026-03-09T17:08:01Z

PR_Github #38300 [ skip ] triggered by Bot. Commit: b04b43d Link to invocation

tensorrt-cicd · 2026-03-09T17:58:18Z

PR_Github #38300 [ skip ] completed with state SUCCESS. Commit: b04b43d
Skipping testing for commit b04b43d

Link to invocation

[None][feat] Add Auto-Deploy dashboard failures analysis skill

6acdb62

Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>

github-actions bot assigned tcherckez-nvidia Mar 9, 2026

coderabbitai bot reviewed Mar 9, 2026

.claude/skills/ad-pipeline-failure-pr/SKILL.md (outdated, resolved)

Fix typo

b04b43d

Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>

lucaslie approved these changes Mar 9, 2026

lucaslie enabled auto-merge (squash) March 9, 2026 16:59

lucaslie merged commit 7747f25 into NVIDIA:main Mar 9, 2026
5 checks passed

tcherckez-nvidia deleted the dev-dashboard-failures-skill branch March 17, 2026 07:05

lucaslie merged 2 commits into NVIDIA:main from tcherckez-nvidia:dev-dashboard-failures-skill
