
[#12593][feat] AutoDeploy: onboard DeepSeek-R1#12601

Merged
galagam merged 19 commits into NVIDIA:main from nv-auto-deploy:gagam/deepseek-r1-onboard
Apr 9, 2026

Conversation

@galagam
Collaborator

@galagam galagam commented Mar 30, 2026

Summary by CodeRabbit

  • New Features

    • Added DeepSeek-R1 model configuration with optimized compile settings and transformation passes.
    • Introduced model registry CSV generation for streamlined deployment.
    • Enabled registry-based deployment option via --use-registry flag.
  • Bug Fixes

    • Fixed RMS normalization tensor contiguity handling.
    • Improved Float8 tensor indexing compatibility and weight dequantization.
    • Corrected scale-row grouping logic for distributed weight sharding.
  • Tests

    • Added accuracy benchmarks for DeepSeek-R1 (MMLU, GSM8K).
    • Added unit test coverage for weight sharding edge cases.
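The registry-based deployment flow mentioned above (the `--use-registry` flag) can be sketched roughly as follows. This is a hypothetical simplification of the PR's `_inject_registry_yaml_extra()` logic, which resolves a model-specific YAML and injects it into `sys.argv` before config parsing; the `--yaml-extra` flag name and the registry mapping below are illustrative stand-ins, not the actual implementation:

```python
# Hypothetical sketch of the --use-registry injection; the real helper is
# _inject_registry_yaml_extra() in examples/auto_deploy/build_and_run_ad.py.
# The "--yaml-extra" flag name and this mapping are illustrative.
REGISTRY_CONFIGS = {
    "deepseek-ai/DeepSeek-R1": "model_registry/configs/deepseek-r1.yaml",
}


def inject_registry_yaml_extra(argv):
    """Resolve the model's registry YAML and splice it into the CLI args."""
    if "--use-registry" not in argv:
        return argv  # registry mode not requested; leave args untouched
    argv = [a for a in argv if a != "--use-registry"]
    model = argv[argv.index("--model") + 1]
    return argv + ["--yaml-extra", REGISTRY_CONFIGS[model]]
```

In the PR this happens before config parsing, so the downstream parser sees a single consolidated config (e.g. deepseek-r1.yaml with world_size: 8 and the torch-cudagraph backend) rather than multiple composable YAML files.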

Description

Registers DeepSeek-R1 in the AutoDeploy model registry and fixes all blocking issues needed to run it end-to-end with the torch-cudagraph backend.
Added an accuracy test for development (not added to test lists yet).
Output is no longer gibberish; MMLU and GSM8K are both ~75, but we need to check whether higher accuracy can be recovered (TRT-LLM reports higher accuracy for the NVFP4 checkpoint).
Fixes:

  • Fixed MLA to correctly dequantize FP8-quantized weights (instead of casting to BF16)
  • Fixed fine-grained FP8 weight binning when scale_dim doesn't evenly divide world_size
  • Added a contiguity check before RMS norm; non-contiguous inputs (e.g. split_with_sizes views) cause a stride mismatch (illegal memory access)
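The scale-binning fix can be illustrated with the proportional grouping formula this PR introduces in FineGrainedFP8WeightShardingInfo._split_scale; the standalone helper name below is hypothetical:

```python
def scale_row_for_rank(rank, world_size, scale_dim):
    """Map a TP rank to its scale row when scale_dim < world_size.

    Proportional binning: ranks are spread as evenly as possible across
    the available scale rows; min() clamps the last ranks to the final row.
    """
    if scale_dim <= 0:
        raise ValueError(f"Invalid scale dimension: {scale_dim}")
    return min((rank * scale_dim) // world_size, scale_dim - 1)


# scale_dim=3, world_size=8: ranks 0-2 -> row 0, ranks 3-5 -> row 1, ranks 6-7 -> row 2
```

With the previous integer-rank formula, a non-divisible scale_dim could map ranks past the last scale row; the proportional form keeps every rank inside [0, scale_dim).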

Test Coverage

tests/integration/defs/accuracy/test_llm_api_autodeploy.py::TestModelRegistryAccuracy::test_autodeploy_from_registry[deepseek-ai_DeepSeek-R1-True]

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@galagam galagam force-pushed the gagam/deepseek-r1-onboard branch from 129f9de to 4b25712 on March 31, 2026 15:45
@galagam galagam marked this pull request as ready for review March 31, 2026 15:49
@galagam galagam requested review from a team as code owners March 31, 2026 15:49
@coderabbitai
Contributor

coderabbitai bot commented Mar 31, 2026

📝 Walkthrough

Walkthrough

This change introduces registry-based configuration management for DeepSeek-R1 model deployment, including updated runtime parameters, dequantization hooks for FP8 weights, refined sharding logic for non-divisible world sizes, and corresponding accuracy reference entries and tests.

Changes

DeepSeek-R1 Model Registry
examples/auto_deploy/build_and_run_ad.py, examples/auto_deploy/model_registry/configs/deepseek-r1.yaml, examples/auto_deploy/model_registry/models.yaml
Added registry path constants and get_registry_yaml_extra() function to load model-specific YAML configs. Introduced _inject_registry_yaml_extra() to parse --use-registry and --model CLI flags, resolve YAML extra file paths, and inject resolved paths into sys.argv before config parsing. Updated models.yaml entry for deepseek-ai/DeepSeek-R1 to reference single consolidated deepseek-r1.yaml config instead of multiple composable YAML files. New config file defines backend mode (torch-cudagraph), distributed sizing (world_size: 8), and optimization passes including SWiGLU/NVFP4 fusion, MoE/MLA attention streaming, and cached MLA insertion.
Model Registry Tooling
examples/auto_deploy/model_registry/generate_csv.py
New script reads models.yaml and generates models.csv with columns for Hugging Face model IDs, URLs, and build commands. Includes build_command() function to construct python examples/auto_deploy/build_and_run_ad.py --model {name} --use-registry invocations.
MLA RoPE Dequantization Hooks
tensorrt_llm/_torch/auto_deploy/models/custom/mla_rope_utils.py, tensorrt_llm/_torch/auto_deploy/models/custom/modeling_deepseek.py
Replaced advanced indexing with torch.index_select for Float8 tensor permutations (RoPE columns and PE entries). Added _kv_b_proj_dequant_load_hook to detect and dequantize FP8 kv_b_proj.weight tensors to BF16 using block-wise scale factors, with fallback to raw cast if scale missing. Integrated hook into DeepSeekV3ForCausalLM.__init__ via state dict pre-hook registered before weight loading.
Tensor Operations
tensorrt_llm/_torch/auto_deploy/custom_ops/normalization/triton_rms_norm.py
Added contiguity check in rms_norm to ensure input tensor matches output contiguity, replacing non-contiguous inputs with .contiguous() copy before kernel launch to prevent stride mismatch issues.
Distributed Sharding Logic
tensorrt_llm/_torch/auto_deploy/transform/library/sharding.py
Refined FineGrainedFP8WeightShardingInfo._split_scale scale-row grouping for cases where scale_dim < world_size and dimensions do not divide evenly. Changed from integer-rank formula to proportional binning (group = min((rank * scale_dim) // world_size, scale_dim - 1)). Added validation to raise ValueError when scale_dim <= 0.
Accuracy References & Tests
tests/integration/defs/accuracy/references/gsm8k.yaml, tests/integration/defs/accuracy/references/mmlu.yaml, tests/integration/defs/accuracy/test_llm_api_autodeploy.py
Added baseline accuracy entries for deepseek-ai/DeepSeek-R1 (73.730 for GSM8K, 76.34 for MMLU without quantization). Added corresponding pytest.param to TestModelRegistryAccuracy.MODEL_REGISTRY_ACCURACY_PARAMS with task list [MMLU, GSM8K], skip markers for pre-Blackwell devices, and minimum 120GB device memory.
Sharding Unit Tests
tests/unittest/auto_deploy/multigpu/transformations/library/test_tp_sharding.py
Added test_finegrained_fp8_split_scale_non_divisible_world_size to validate scale-row grouping with scale_dim < world_size. Test iterates over ranks 0–7 with non-evenly-divisible dimensions and asserts returned shard matches expected single-row slice per rank.
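The block-wise dequantization performed by the kv_b_proj load hook can be sketched in pure Python. This is a simplified sketch: the real hook operates on torch tensors and casts to BF16, and the 128x128 default block size here is an assumption (one inline review comment notes the actual scale layout from triton_fp8_quantize_1x128 may be transposed relative to this):

```python
def dequantize_blockwise(w_fp8, scale, block_n=128, block_k=128):
    """Expand per-block scales to per-element factors and multiply.

    Each (block_n x block_k) tile of the weight shares one scale factor,
    so element (i, j) is scaled by scale[i // block_n][j // block_k].
    """
    n, k = len(w_fp8), len(w_fp8[0])
    return [
        [w_fp8[i][j] * scale[i // block_n][j // block_k] for j in range(k)]
        for i in range(n)
    ]
```

If the scale tensor is missing from the checkpoint, the PR's hook falls back to a raw cast; one review comment suggests turning that fallback into a hard error instead.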

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: coverage is 44.44%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Title check ✅ Passed: title clearly identifies the main change: registering DeepSeek-R1 in the AutoDeploy model registry with the appropriate feature flag.
  • Description check ✅ Passed: description explains the purpose, lists specific fixes, and identifies test coverage, though it doesn't fully utilize the template structure with clear sectioning.


Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
tests/unittest/auto_deploy/multigpu/transformations/library/test_tp_sharding.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add the required NVIDIA copyright header.

This modified Python test file is missing the required NVIDIA OSS copyright header at the top.

As per coding guidelines: “Add NVIDIA copyright header to ALL new files; update year on modified files.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/unittest/auto_deploy/multigpu/transformations/library/test_tp_sharding.py`
at line 1, Add the required NVIDIA OSS copyright header to the top of the test
file
(tests/unittest/auto_deploy/multigpu/transformations/library/test_tp_sharding.py)
by inserting the standard NVIDIA header block as the first lines of the file; if
this is a modified file rather than new, update the copyright year range in that
header accordingly so it reflects the current year.
tensorrt_llm/_torch/auto_deploy/transform/library/sharding.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Add the required NVIDIA copyright header.

This modified Python file is missing the required NVIDIA OSS copyright header at the top.

As per coding guidelines: “All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the year of its latest meaningful modification.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/auto_deploy/transform/library/sharding.py` at line 1,
This file is missing the required NVIDIA OSS copyright header: add the standard
NVIDIA copyright header as the very first lines of
tensorrt_llm/_torch/auto_deploy/transform/library/sharding.py (before the module
docstring """Transformations to support graph sharding."""), include the correct
year of latest meaningful modification and the canonical NVIDIA wording used
across the repo, and ensure the header remains a top-of-file comment block so
tools and license scanners recognize it.
tensorrt_llm/_torch/auto_deploy/models/custom/modeling_deepseek.py (1)

1-1: ⚠️ Potential issue | 🟡 Minor

Update the copyright year on this modified file.

This file changed in this PR but still carries a 2025 header. Please bump it to 2026.

Possible fix
-# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
+# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.

As per coding guidelines, "Add NVIDIA copyright header to ALL new files; update year on modified files."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/auto_deploy/models/custom/modeling_deepseek.py` at line
1, Update the copyright header at the top of
tensorrt_llm/_torch/auto_deploy/models/custom/modeling_deepseek.py by changing
the year from 2025 to 2026 so the file header reflects the modification year;
locate the file's top-of-file copyright comment and replace "2025" with "2026".
🧹 Nitpick comments (5)
tensorrt_llm/_torch/auto_deploy/custom_ops/normalization/triton_rms_norm.py (1)

54-59: Good fix for the stride mismatch bug.

The contiguity guard correctly prevents out-of-bounds writes when hidden_states is a non-contiguous view. The comment clearly explains the root cause.

Consider applying the same fix to triton_fused_add_rms_norm_quant_fp8.py. Both rms_norm_quant_fp8() (lines 75–77) and fused_add_rms_norm_quant_fp8() (lines 181–200) compute strides from reshaped tensors without ensuring contiguity first, exhibiting the same vulnerability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/auto_deploy/custom_ops/normalization/triton_rms_norm.py`
around lines 54 - 59, Add the same contiguity guard used in triton_rms_norm.py
to triton_fused_add_rms_norm_quant_fp8.py: inside rms_norm_quant_fp8() (around
lines where hidden_states is reshaped and strides are computed) and inside
fused_add_rms_norm_quant_fp8() (before computing strides from the reshaped input
tensors), check if the relevant input tensors (e.g., hidden_states or the
addend) are contiguous and call .contiguous() to replace them if not; this
prevents input_stride != output_stride mismatches that lead to out-of-bounds
writes in the Triton kernel.
tensorrt_llm/_torch/auto_deploy/transform/library/sharding.py (1)

606-616: Harden _split_scale against invalid world_size/rank inputs.

Line [615] can raise ZeroDivisionError for world_size <= 0, and out-of-range ranks can silently map to the wrong shard group. Add explicit argument checks up front.

Proposed patch
     scale_dim = scale.shape[dim]
     if scale_dim <= 0:
         raise ValueError(f"Invalid scale dimension ({scale_dim}) for dim={dim}.")
+    if world_size <= 0:
+        raise ValueError(f"world_size must be > 0, got {world_size}.")
+    if rank < 0 or rank >= world_size:
+        raise ValueError(f"rank must be in [0, {world_size}), got {rank}.")
     if scale_dim >= world_size:
         return torch.tensor_split(scale, world_size, dim=dim)[rank]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/auto_deploy/transform/library/sharding.py` around lines
606 - 616, The _split_scale function lacks validation for world_size and rank
which can cause ZeroDivisionError or incorrect grouping; add upfront argument
checks: validate that world_size is an integer > 0 and raise ValueError if not,
and validate that rank is an integer in [0, world_size-1] and raise IndexError
or ValueError if out of range; keep the existing logic (use torch.tensor_split
and proportional binning) after these checks so torch.tensor_split(scale,
world_size, dim=dim)[rank] and the group computation cannot divide by zero or
map an invalid rank to the wrong group.
tests/unittest/auto_deploy/multigpu/transformations/library/test_tp_sharding.py (1)

106-117: Nice regression test; consider covering the new invalid-dimension guard too.

This test covers the non-divisible mapping path well. Add a companion case for empty scale rows to assert the ValueError path introduced in _split_scale.

Suggested additional test
+def test_finegrained_fp8_split_scale_invalid_scale_dim():
+    scale = torch.empty(0, 4, dtype=torch.float32)
+    with pytest.raises(ValueError, match="Invalid scale dimension"):
+        FineGrainedFP8WeightShardingInfo._split_scale(
+            scale=scale, dim=0, rank=0, world_size=8
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@tests/unittest/auto_deploy/multigpu/transformations/library/test_tp_sharding.py`
around lines 106 - 117, Add a companion unit test that asserts
FineGrainedFP8WeightShardingInfo._split_scale raises a ValueError when the
computed row slice would be empty: construct a small scale tensor with fewer
rows than the mapping requires (e.g., scale with shape (N, ...) and a larger
world_size so some ranks map to group >= N), call _split_scale for a rank that
maps outside the available rows, and use pytest.raises(ValueError) to verify the
guard path.
examples/auto_deploy/model_registry/generate_csv.py (2)

9-9: Unused constant REPO_ROOT.

REPO_ROOT is defined but never used in the script. Consider removing it to avoid confusion.

-REPO_ROOT = Path(__file__).resolve().parents[3]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/auto_deploy/model_registry/generate_csv.py` at line 9, The constant
REPO_ROOT defined as Path(__file__).resolve().parents[3] in generate_csv.py is
unused; remove the REPO_ROOT declaration (and its import usage if only used for
that) to avoid dead code and confusion—search for the symbol REPO_ROOT in this
file and delete the assignment and any now-unneeded imports (e.g., Path) or
replace with a local path usage if it was intended to be used.

21-32: Consider adding basic error handling for robustness.

The script doesn't handle cases where models.yaml is missing or malformed. While this is a development utility, adding minimal error handling would improve usability.

🛡️ Optional: Add error handling
 def main():
+    if not MODELS_YAML.exists():
+        print(f"Error: {MODELS_YAML} not found")
+        return
+
     with open(MODELS_YAML) as f:
         data = yaml.safe_load(f)

+    if not data or "models" not in data:
+        print("Error: Invalid models.yaml format")
+        return
+
     rows = []
     for entry in data.get("models", []):
-        name = entry["name"]
+        name = entry.get("name")
+        if not name:
+            continue
         hf_link = f"{HF_BASE_URL}/{name}"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/auto_deploy/model_registry/generate_csv.py` around lines 21 - 32,
Wrap file reading and YAML parsing in try/except to handle missing or malformed
MODELS_YAML and validate the structure before iterating: in main() catch
FileNotFoundError when opening MODELS_YAML and yaml.YAMLError from
yaml.safe_load, log/print a clear error and exit/return with non‑zero status;
also validate that data.get("models") is a list before looping and handle
missing keys (entry["name"]) by skipping or reporting the malformed entry; keep
references to MODELS_YAML, yaml.safe_load, build_command, HF_BASE_URL and rows
so the fixes are made in the same scope.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/auto_deploy/model_registry/generate_csv.py`:
- Around line 1-6: Add the required NVIDIA copyright header at the top of the
generate_csv.py module above the module docstring; include the current year (or
year of latest meaningful modification) and follow the project's standard NVIDIA
copyright header template so the file-level docstring and imports remain
unchanged.

In `@tensorrt_llm/_torch/auto_deploy/models/custom/mla_rope_utils.py`:
- Around line 107-115: The code currently silently casts FP8 kv_b_proj weights
to bfloat16 when the corresponding scale_key is missing (see state_dict,
scale_key, w_key, kv_b_proj, layer_idx, and the w.to(torch.bfloat16) branch);
change this to fail fast by raising a clear exception instead of continuing:
replace the silent fallback (and possibly the ad_logger.warning_once) with
raising a RuntimeError (or ValueError) that includes the missing scale_key,
w_key, and layer_idx so loading stops immediately and the checkpoint-format
problem is surfaced.
- Around line 119-130: The code currently expands FP8 scales assuming scale has
shape [N/block_n, K/block_k], but triton_fp8_quantize_1x128 emits scales with
shape [ceil(K/128), N] (K-chunk major), so the axes are swapped; fix by treating
scale as (num_k_chunks, N), transpose it to (N, num_k_chunks), then
repeat_interleave along dim=1 by the FP8 chunk size (128) to expand K, and
finally slice to [:N, :K]; update the expansion using the existing symbols
scale, w (N, K), and state_dict[w_key] so dequantization uses the correct
per-128-K-chunk scales (assume block_k = 128 as in triton_fp8_quantize_1x128).

In `@tests/integration/defs/accuracy/test_llm_api_autodeploy.py`:
- Around line 954-964: The _get_registry_yaml_extra() helper currently only
parses world_size from filenames; modify it to first open each YAML in the
provided yaml_extra list, safe-load the YAML content (e.g., yaml.safe_load) and,
if a top-level integer key "world_size" exists, use that value as world_size,
then only if not found fall back to the existing regex filename extraction for
patterns like world_size_N.yaml to preserve backward compatibility; ensure you
reference and update the logic inside the _get_registry_yaml_extra function so
tests like deepseek-ai/DeepSeek-R1 (yaml_extra: ['deepseek-r1.yaml'] with
world_size: 8) return world_size=8.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fa984ac1-742f-4595-9d94-8584a5065b8d

📥 Commits

Reviewing files that changed from the base of the PR and between b3304b8 and 4b25712.

📒 Files selected for processing (12)
  • examples/auto_deploy/build_and_run_ad.py
  • examples/auto_deploy/model_registry/configs/deepseek-r1.yaml
  • examples/auto_deploy/model_registry/generate_csv.py
  • examples/auto_deploy/model_registry/models.yaml
  • tensorrt_llm/_torch/auto_deploy/custom_ops/normalization/triton_rms_norm.py
  • tensorrt_llm/_torch/auto_deploy/models/custom/mla_rope_utils.py
  • tensorrt_llm/_torch/auto_deploy/models/custom/modeling_deepseek.py
  • tensorrt_llm/_torch/auto_deploy/transform/library/sharding.py
  • tests/integration/defs/accuracy/references/gsm8k.yaml
  • tests/integration/defs/accuracy/references/mmlu.yaml
  • tests/integration/defs/accuracy/test_llm_api_autodeploy.py
  • tests/unittest/auto_deploy/multigpu/transformations/library/test_tp_sharding.py

@galagam galagam force-pushed the gagam/deepseek-r1-onboard branch from 11cf343 to b55bfb1 on March 31, 2026 17:40
Collaborator

@taylor-yb-lee taylor-yb-lee left a comment


LGTM, except question about the accuracy value commits

@galagam
Collaborator Author

galagam commented Apr 2, 2026

/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-Post-Merge-1"

@tensorrt-cicd
Collaborator

PR_Github #41490 [ run ] triggered by Bot. Commit: 2744cf2 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41490 [ run ] completed with state SUCCESS. Commit: 2744cf2
/LLM/main/L0_MergeRequest_PR pipeline #32410 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

@galagam
Collaborator Author

galagam commented Apr 2, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41505 [ run ] triggered by Bot. Commit: 2744cf2 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41505 [ run ] completed with state SUCCESS. Commit: 2744cf2
/LLM/main/L0_MergeRequest_PR pipeline #32423 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@galagam galagam force-pushed the gagam/deepseek-r1-onboard branch from 2744cf2 to cef786a on April 3, 2026 09:21
@galagam
Collaborator Author

galagam commented Apr 3, 2026

/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-Post-Merge-1"

@tensorrt-cicd
Collaborator

PR_Github #41642 [ run ] triggered by Bot. Commit: cef786a Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41642 [ run ] completed with state SUCCESS. Commit: cef786a
/LLM/main/L0_MergeRequest_PR pipeline #32549 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

@galagam
Collaborator Author

galagam commented Apr 3, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41661 [ run ] triggered by Bot. Commit: cef786a Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41661 [ run ] completed with state SUCCESS. Commit: cef786a
/LLM/main/L0_MergeRequest_PR pipeline #32566 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@galagam
Collaborator Author

galagam commented Apr 3, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #42093 [ run ] triggered by Bot. Commit: b102c51 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42093 [ run ] completed with state FAILURE. Commit: b102c51
/LLM/main/L0_MergeRequest_PR pipeline #32932 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

galagam added 2 commits April 7, 2026 03:05
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
@galagam
Collaborator Author

galagam commented Apr 7, 2026

/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-Post-Merge-1,DGX_B200-8_GPUs-AutoDeploy-Post-Merge-1"

@tensorrt-cicd
Collaborator

PR_Github #42135 [ run ] triggered by Bot. Commit: 69dd2b6 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42135 [ run ] completed with state SUCCESS. Commit: 69dd2b6
/LLM/main/L0_MergeRequest_PR pipeline #32970 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@galagam
Collaborator Author

galagam commented Apr 7, 2026

/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-Post-Merge-1,DGX_B200-8_GPUs-AutoDeploy-Post-Merge-1" --reuse-test --disbale-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42146 Bot args parsing error: usage: /bot [-h]
{run,kill,skip,submit,reviewers,reuse-pipeline,reuse-review} ...
/bot: error: unrecognized arguments: --disbale-fail-fast

Link to invocation

@galagam
Collaborator Author

galagam commented Apr 7, 2026

/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-Post-Merge-1,DGX_B200-8_GPUs-AutoDeploy-Post-Merge-1" --reuse-test --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42149 [ run ] triggered by Bot. Commit: 69dd2b6 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42149 [ run ] completed with state SUCCESS. Commit: 69dd2b6
/LLM/main/L0_MergeRequest_PR pipeline #32982 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@galagam
Collaborator Author

galagam commented Apr 7, 2026

/bot run --stage-list "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-Post-Merge-1,DGX_B200-8_GPUs-AutoDeploy-Post-Merge-1" --reuse-test --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #42178 [ run ] triggered by Bot. Commit: 69dd2b6 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42178 [ run ] completed with state SUCCESS. Commit: 69dd2b6
/LLM/main/L0_MergeRequest_PR pipeline #33005 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

@galagam
Collaborator Author

galagam commented Apr 7, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #42202 [ run ] triggered by Bot. Commit: 69dd2b6 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42202 [ run ] completed with state SUCCESS. Commit: 69dd2b6
/LLM/main/L0_MergeRequest_PR pipeline #33021 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@galagam
Collaborator Author

galagam commented Apr 8, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #42283 [ run ] triggered by Bot. Commit: 69dd2b6 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42283 [ run ] completed with state FAILURE. Commit: 69dd2b6
/LLM/main/L0_MergeRequest_PR pipeline #33080 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@galagam
Collaborator Author

galagam commented Apr 8, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #42361 [ run ] triggered by Bot. Commit: 69dd2b6 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42361 [ run ] completed with state SUCCESS. Commit: 69dd2b6
/LLM/main/L0_MergeRequest_PR pipeline #33146 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@galagam
Collaborator Author

galagam commented Apr 9, 2026

/bot run --reuse-test

@tensorrt-cicd
Collaborator

PR_Github #42483 [ run ] triggered by Bot. Commit: 69dd2b6 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #42483 [ run ] completed with state SUCCESS. Commit: 69dd2b6
/LLM/main/L0_MergeRequest_PR pipeline #33233 completed with status: 'SUCCESS'

CI Report

Link to invocation

@galagam galagam merged commit 28b6afb into NVIDIA:main Apr 9, 2026
5 checks passed

Labels

gagam-slack-watch Label for getting slack notifications via GitHub Slack app
