[#12593][feat] AutoDeploy: onboard DeepSeek-R1 (#12601)
Conversation
Force-pushed 129f9de to 4b25712
📝 Walkthrough
This change introduces registry-based configuration management for DeepSeek-R1 model deployment, including updated runtime parameters, dequantization hooks for FP8 weights, refined sharding logic for non-divisible world sizes, and corresponding accuracy reference entries and tests.
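For context on the "non-divisible world sizes" item in the walkthrough, sharding a dimension across ranks reduces to `torch.tensor_split`-style chunking. A dependency-free sketch (illustrative only; the sizes below are hypothetical):

```python
# Chunk sizes produced by torch.tensor_split semantics when splitting a
# dimension of size n into k parts: the first n % k chunks get one extra
# element, so non-divisible sizes still cover every element exactly once.
def tensor_split_sizes(n: int, k: int) -> list[int]:
    base, extra = divmod(n, k)
    return [base + 1 if i < extra else base for i in range(k)]

print(tensor_split_sizes(10, 3))  # [4, 3, 3]
print(tensor_split_sizes(3, 8))   # [1, 1, 1, 0, 0, 0, 0, 0]
```

Note that some chunks can be empty when the dimension is smaller than the world size, which is why the sharding code needs the extra guards discussed below.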
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
tests/unittest/auto_deploy/multigpu/transformations/library/test_tp_sharding.py (1)
1-1: ⚠️ Potential issue | 🟠 Major: Add the required NVIDIA copyright header.
This modified Python test file is missing the required NVIDIA OSS copyright header at the top.
As per coding guidelines: “Add NVIDIA copyright header to ALL new files; update year on modified files.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unittest/auto_deploy/multigpu/transformations/library/test_tp_sharding.py` at line 1, Add the required NVIDIA OSS copyright header to the top of the test file (tests/unittest/auto_deploy/multigpu/transformations/library/test_tp_sharding.py) by inserting the standard NVIDIA header block as the first lines of the file; if this is a modified file rather than new, update the copyright year range in that header accordingly so it reflects the current year.
tensorrt_llm/_torch/auto_deploy/transform/library/sharding.py (1)
1-1: ⚠️ Potential issue | 🟠 Major: Add the required NVIDIA copyright header.
This modified Python file is missing the required NVIDIA OSS copyright header at the top.
As per coding guidelines: “All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the year of its latest meaningful modification.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/auto_deploy/transform/library/sharding.py` at line 1, This file is missing the required NVIDIA OSS copyright header: add the standard NVIDIA copyright header as the very first lines of tensorrt_llm/_torch/auto_deploy/transform/library/sharding.py (before the module docstring """Transformations to support graph sharding."""), include the correct year of latest meaningful modification and the canonical NVIDIA wording used across the repo, and ensure the header remains a top-of-file comment block so tools and license scanners recognize it.
tensorrt_llm/_torch/auto_deploy/models/custom/modeling_deepseek.py (1)
1-1: ⚠️ Potential issue | 🟡 Minor: Update the copyright year on this modified file.
This file changed in this PR but still carries a 2025 header. Please bump it to 2026.
Possible fix:

```diff
-# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
```

As per coding guidelines, "Add NVIDIA copyright header to ALL new files; update year on modified files."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/auto_deploy/models/custom/modeling_deepseek.py` at line 1, Update the copyright header at the top of tensorrt_llm/_torch/auto_deploy/models/custom/modeling_deepseek.py by changing the year from 2025 to 2026 so the file header reflects the modification year; locate the file's top-of-file copyright comment and replace "2025" with "2026".
🧹 Nitpick comments (5)
tensorrt_llm/_torch/auto_deploy/custom_ops/normalization/triton_rms_norm.py (1)
54-59: Good fix for the stride mismatch bug.
The contiguity guard correctly prevents out-of-bounds writes when `hidden_states` is a non-contiguous view. The comment clearly explains the root cause.
Consider applying the same fix to `triton_fused_add_rms_norm_quant_fp8.py`. Both `rms_norm_quant_fp8()` (lines 75–77) and `fused_add_rms_norm_quant_fp8()` (lines 181–200) compute strides from reshaped tensors without ensuring contiguity first, exhibiting the same vulnerability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/auto_deploy/custom_ops/normalization/triton_rms_norm.py` around lines 54 - 59, Add the same contiguity guard used in triton_rms_norm.py to triton_fused_add_rms_norm_quant_fp8.py: inside rms_norm_quant_fp8() (around lines where hidden_states is reshaped and strides are computed) and inside fused_add_rms_norm_quant_fp8() (before computing strides from the reshaped input tensors), check if the relevant input tensors (e.g., hidden_states or the addend) are contiguous and call .contiguous() to replace them if not; this prevents input_stride != output_stride mismatches that lead to out-of-bounds writes in the Triton kernel.
tensorrt_llm/_torch/auto_deploy/transform/library/sharding.py (1)
606-616: Harden `_split_scale` against invalid `world_size`/`rank` inputs.
Line 615 can raise `ZeroDivisionError` for `world_size <= 0`, and out-of-range ranks can silently map to the wrong shard group. Add explicit argument checks up front.
Proposed patch:

```diff
 scale_dim = scale.shape[dim]
 if scale_dim <= 0:
     raise ValueError(f"Invalid scale dimension ({scale_dim}) for dim={dim}.")
+if world_size <= 0:
+    raise ValueError(f"world_size must be > 0, got {world_size}.")
+if rank < 0 or rank >= world_size:
+    raise ValueError(f"rank must be in [0, {world_size}), got {rank}.")
 if scale_dim >= world_size:
     return torch.tensor_split(scale, world_size, dim=dim)[rank]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tensorrt_llm/_torch/auto_deploy/transform/library/sharding.py` around lines 606 - 616, The _split_scale function lacks validation for world_size and rank which can cause ZeroDivisionError or incorrect grouping; add upfront argument checks: validate that world_size is an integer > 0 and raise ValueError if not, and validate that rank is an integer in [0, world_size-1] and raise IndexError or ValueError if out of range; keep the existing logic (use torch.tensor_split and proportional binning) after these checks so torch.tensor_split(scale, world_size, dim=dim)[rank] and the group computation cannot divide by zero or map an invalid rank to the wrong group.
tests/unittest/auto_deploy/multigpu/transformations/library/test_tp_sharding.py (1)
106-117: Nice regression test; consider covering the new invalid-dimension guard too.
This test covers the non-divisible mapping path well. Add a companion case for empty scale rows to assert the `ValueError` path introduced in `_split_scale`.
Suggested additional test:
```diff
+def test_finegrained_fp8_split_scale_invalid_scale_dim():
+    scale = torch.empty(0, 4, dtype=torch.float32)
+    with pytest.raises(ValueError, match="Invalid scale dimension"):
+        FineGrainedFP8WeightShardingInfo._split_scale(
+            scale=scale, dim=0, rank=0, world_size=8
+        )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unittest/auto_deploy/multigpu/transformations/library/test_tp_sharding.py` around lines 106 - 117, Add a companion unit test that asserts FineGrainedFP8WeightShardingInfo._split_scale raises a ValueError when the computed row slice would be empty: construct a small scale tensor with fewer rows than the mapping requires (e.g., scale with shape (N, ...) and a larger world_size so some ranks map to group >= N), call _split_scale for a rank that maps outside the available rows, and use pytest.raises(ValueError) to verify the guard path.
examples/auto_deploy/model_registry/generate_csv.py (2)
9-9: Unused constant `REPO_ROOT`.
`REPO_ROOT` is defined but never used in the script. Consider removing it to avoid confusion.

```diff
-REPO_ROOT = Path(__file__).resolve().parents[3]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/auto_deploy/model_registry/generate_csv.py` at line 9, The constant REPO_ROOT defined as Path(__file__).resolve().parents[3] in generate_csv.py is unused; remove the REPO_ROOT declaration (and its import usage if only used for that) to avoid dead code and confusion—search for the symbol REPO_ROOT in this file and delete the assignment and any now-unneeded imports (e.g., Path) or replace with a local path usage if it was intended to be used.
21-32: Consider adding basic error handling for robustness.
The script doesn't handle cases where `models.yaml` is missing or malformed. While this is a development utility, adding minimal error handling would improve usability.
🛡️ Optional: Add error handling
```diff
 def main():
+    if not MODELS_YAML.exists():
+        print(f"Error: {MODELS_YAML} not found")
+        return
+
     with open(MODELS_YAML) as f:
         data = yaml.safe_load(f)
+    if not data or "models" not in data:
+        print("Error: Invalid models.yaml format")
+        return
+
     rows = []
     for entry in data.get("models", []):
-        name = entry["name"]
+        name = entry.get("name")
+        if not name:
+            continue
         hf_link = f"{HF_BASE_URL}/{name}"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/auto_deploy/model_registry/generate_csv.py` around lines 21 - 32, Wrap file reading and YAML parsing in try/except to handle missing or malformed MODELS_YAML and validate the structure before iterating: in main() catch FileNotFoundError when opening MODELS_YAML and yaml.YAMLError from yaml.safe_load, log/print a clear error and exit/return with non‑zero status; also validate that data.get("models") is a list before looping and handle missing keys (entry["name"]) by skipping or reporting the malformed entry; keep references to MODELS_YAML, yaml.safe_load, build_command, HF_BASE_URL and rows so the fixes are made in the same scope.
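A runnable sketch of the defensive-loading pattern suggested above. `load_models` is a hypothetical name, and `parse` stands in for `yaml.safe_load` so the sketch stays dependency-free (with PyYAML, the `except ValueError` branch would catch `yaml.YAMLError` instead):

```python
import sys
from pathlib import Path

def load_models(models_path: Path, parse):
    """Defensively load a model-registry file instead of crashing.

    `parse` is a text -> object function (stand-in for yaml.safe_load).
    Returns a list of well-formed model entries, or [] on any error.
    """
    try:
        text = models_path.read_text()
    except FileNotFoundError:
        print(f"Error: {models_path} not found", file=sys.stderr)
        return []
    try:
        data = parse(text)
    except ValueError as exc:  # json.JSONDecodeError / yaml.YAMLError analogue
        print(f"Error: malformed {models_path}: {exc}", file=sys.stderr)
        return []
    models = data.get("models") if isinstance(data, dict) else None
    if not isinstance(models, list):
        print("Error: expected a top-level 'models' list", file=sys.stderr)
        return []
    # Skip malformed entries instead of raising KeyError on entry["name"].
    return [m for m in models if isinstance(m, dict) and m.get("name")]
```

For example, `load_models(Path("models.json"), json.loads)` returns only the entries that carry a `name`, and an empty list for a missing or malformed file.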
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/auto_deploy/model_registry/generate_csv.py`:
- Around line 1-6: Add the required NVIDIA copyright header at the top of the
generate_csv.py module above the module docstring; include the current year (or
year of latest meaningful modification) and follow the project's standard NVIDIA
copyright header template so the file-level docstring and imports remain
unchanged.
In `@tensorrt_llm/_torch/auto_deploy/models/custom/mla_rope_utils.py`:
- Around line 107-115: The code currently silently casts FP8 kv_b_proj weights
to bfloat16 when the corresponding scale_key is missing (see state_dict,
scale_key, w_key, kv_b_proj, layer_idx, and the w.to(torch.bfloat16) branch);
change this to fail fast by raising a clear exception instead of continuing:
replace the silent fallback (and possibly the ad_logger.warning_once) with
raising a RuntimeError (or ValueError) that includes the missing scale_key,
w_key, and layer_idx so loading stops immediately and the checkpoint-format
problem is surfaced.
- Around line 119-130: The code currently expands FP8 scales assuming scale has
shape [N/block_n, K/block_k], but triton_fp8_quantize_1x128 emits scales with
shape [ceil(K/128), N] (K-chunk major), so the axes are swapped; fix by treating
scale as (num_k_chunks, N), transpose it to (N, num_k_chunks), then
repeat_interleave along dim=1 by the FP8 chunk size (128) to expand K, and
finally slice to [:N, :K]; update the expansion using the existing symbols
scale, w (N, K), and state_dict[w_key] so dequantization uses the correct
per-128-K-chunk scales (assume block_k = 128 as in triton_fp8_quantize_1x128).
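The axis bookkeeping described in the item above can be illustrated with NumPy (hypothetical sizes; assumes `block_k = 128` as in `triton_fp8_quantize_1x128`):

```python
import numpy as np

# Weight w is (N, K); the quantizer emits scales shaped (ceil(K/128), N),
# i.e. K-chunk major, so the axes must be swapped before expansion.
N, K, block_k = 6, 300, 128
num_k_chunks = -(-K // block_k)  # ceil(300 / 128) = 3
scale = np.arange(num_k_chunks * N, dtype=np.float32).reshape(num_k_chunks, N)

# (num_k_chunks, N) -> (N, num_k_chunks), broadcast each chunk's scale
# across its 128 K-columns, then trim the padding from the last chunk.
expanded = np.repeat(scale.T, block_k, axis=1)[:, :K]

assert expanded.shape == (N, K)
# Column k of any row uses the scale of K-chunk k // block_k:
assert expanded[0, 129] == scale[1, 0]
```

In the actual fix the same transpose / `repeat_interleave` / slice sequence would be applied to the torch scale tensor before multiplying into the dequantized weight.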
In `@tests/integration/defs/accuracy/test_llm_api_autodeploy.py`:
- Around line 954-964: The _get_registry_yaml_extra() helper currently only
parses world_size from filenames; modify it to first open each YAML in the
provided yaml_extra list, safe-load the YAML content (e.g., yaml.safe_load) and,
if a top-level integer key "world_size" exists, use that value as world_size,
then only if not found fall back to the existing regex filename extraction for
patterns like world_size_N.yaml to preserve backward compatibility; ensure you
reference and update the logic inside the _get_registry_yaml_extra function so
tests like deepseek-ai/DeepSeek-R1 (yaml_extra: ['deepseek-r1.yaml'] with
world_size: 8) return world_size=8.
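The YAML-first, filename-fallback logic described above can be sketched as follows. `get_world_size` is a hypothetical helper name; a real implementation would use `yaml.safe_load`, but a line scan is used here to keep the sketch dependency-free:

```python
import re
import tempfile
from pathlib import Path

def get_world_size(yaml_path: Path):
    """Prefer an explicit top-level `world_size:` key in the YAML; fall back
    to the legacy world_size_N.yaml filename convention; else return None."""
    for line in yaml_path.read_text().splitlines():
        m = re.fullmatch(r"world_size:\s*(\d+)", line.strip())
        if m:
            return int(m.group(1))
    m = re.search(r"world_size_(\d+)\.yaml$", yaml_path.name)
    return int(m.group(1)) if m else None

with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "deepseek-r1.yaml"
    cfg.write_text("model: deepseek-ai/DeepSeek-R1\nworld_size: 8\n")
    print(get_world_size(cfg))  # 8
```

With this ordering, `deepseek-r1.yaml` carrying `world_size: 8` resolves to 8, while an older `world_size_4.yaml` without the key still resolves to 4 via its filename.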
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fa984ac1-742f-4595-9d94-8584a5065b8d
📒 Files selected for processing (12)
examples/auto_deploy/build_and_run_ad.py
examples/auto_deploy/model_registry/configs/deepseek-r1.yaml
examples/auto_deploy/model_registry/generate_csv.py
examples/auto_deploy/model_registry/models.yaml
tensorrt_llm/_torch/auto_deploy/custom_ops/normalization/triton_rms_norm.py
tensorrt_llm/_torch/auto_deploy/models/custom/mla_rope_utils.py
tensorrt_llm/_torch/auto_deploy/models/custom/modeling_deepseek.py
tensorrt_llm/_torch/auto_deploy/transform/library/sharding.py
tests/integration/defs/accuracy/references/gsm8k.yaml
tests/integration/defs/accuracy/references/mmlu.yaml
tests/integration/defs/accuracy/test_llm_api_autodeploy.py
tests/unittest/auto_deploy/multigpu/transformations/library/test_tp_sharding.py
Force-pushed 11cf343 to b55bfb1
taylor-yb-lee left a comment:
LGTM, except question about the accuracy value commits
/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-Post-Merge-1"
PR_Github #41490 [ run ] triggered by Bot. Commit:
PR_Github #41490 [ run ] completed with state
/bot run
PR_Github #41505 [ run ] triggered by Bot. Commit:
PR_Github #41505 [ run ] completed with state
Force-pushed 2744cf2 to cef786a
/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-Post-Merge-1"
PR_Github #41642 [ run ] triggered by Bot. Commit:
PR_Github #41642 [ run ] completed with state
/bot run
PR_Github #41661 [ run ] triggered by Bot. Commit:
PR_Github #41661 [ run ] completed with state
/bot run
PR_Github #42093 [ run ] triggered by Bot. Commit:
PR_Github #42093 [ run ] completed with state
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-Post-Merge-1,DGX_B200-8_GPUs-AutoDeploy-Post-Merge-1"
PR_Github #42135 [ run ] triggered by Bot. Commit:
PR_Github #42135 [ run ] completed with state
/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-Post-Merge-1,DGX_B200-8_GPUs-AutoDeploy-Post-Merge-1" --reuse-test --disbale-fail-fast
PR_Github #42146 Bot args parsing error: usage: /bot [-h]
/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-Post-Merge-1,DGX_B200-8_GPUs-AutoDeploy-Post-Merge-1" --reuse-test --disable-fail-fast
PR_Github #42149 [ run ] triggered by Bot. Commit:
PR_Github #42149 [ run ] completed with state
/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-Post-Merge-1,DGX_B200-8_GPUs-AutoDeploy-Post-Merge-1" --reuse-test --disable-fail-fast
PR_Github #42178 [ run ] triggered by Bot. Commit:
PR_Github #42178 [ run ] completed with state
/bot run
PR_Github #42202 [ run ] triggered by Bot. Commit:
PR_Github #42202 [ run ] completed with state
/bot run
PR_Github #42283 [ run ] triggered by Bot. Commit:
PR_Github #42283 [ run ] completed with state
/bot run
PR_Github #42361 [ run ] triggered by Bot. Commit:
PR_Github #42361 [ run ] completed with state
/bot run --reuse-test
PR_Github #42483 [ run ] triggered by Bot. Commit:
PR_Github #42483 [ run ] completed with state
Summary by CodeRabbit
New Features
- `--use-registry` flag.
Bug Fixes
Tests
Description
Registers DeepSeek-R1 in the AutoDeploy model registry and fixes all blocking issues needed to run it end-to-end with torch-cudagraph backend.
Added accuracy test for development (not added to test lists yet).
Output is no longer gibberish; MMLU and GSM8K are both ~75, but we need to check whether higher accuracy can be recovered (TRT-LLM reports higher accuracy for the NVFP4 checkpoint).
Fixes:
Test Coverage
tests/integration/defs/accuracy/test_llm_api_autodeploy.py::TestModelRegistryAccuracy::test_autodeploy_from_registry[deepseek-ai_DeepSeek-R1-True]
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment `/bot help`.