
[#9306][cleanup] Remove some fields with redefined defaults #11671

Merged
2ez4bz merged 4 commits into NVIDIA:main from 2ez4bz:dev-ad-llm-args
Apr 6, 2026

Conversation

@2ez4bz
Collaborator

@2ez4bz 2ez4bz commented Feb 24, 2026

Summary by CodeRabbit

Release Notes

  • Breaking Changes

    • Removed model_kwargs, sampler_type, max_beam_width, and attn_backend configuration fields from LlmArgs. Update your deployment configurations to remove references to these deprecated fields.
  • Improvements

    • Streamlined sampler type selection with automatic TorchSampler resolution when auto-mode is specified, plus improved error reporting.

Description

  • Why?

We would like to be able to use a TorchLlmArgs config in AutoDeploy's own version with minimal changes.

  • What?

This commit removes the redefinition of:

  • model_kwargs: existing usages already guard against None the same way as against an empty dict.
  • max_beam_width: a validator is added instead to enforce the constraint.
  • attn_backend: although the defaults differ between the base class ("TRTLLM") and AutoDeploy ("flashinfer"), the update_transforms_with_shortcuts validator in practice reads the default from default.yaml, which is "flashinfer".
  • sampler_type: the executor code already supported both samplers. We just tweak it so that the "auto" value resolves to the now-removed default.
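The max_beam_width change above follows a general pattern: keep the inherited default and enforce the constraint with a validator rather than redefining the field. A minimal sketch of that pattern, assuming pydantic v2 and using illustrative class names (not the actual TRT-LLM code):

```python
from pydantic import BaseModel, ValidationError, field_validator


class BaseArgs(BaseModel):
    """Stand-in for the base TorchLlmArgs-style config."""

    max_beam_width: int = 1  # default defined once, in the base class


class AutoDeployArgs(BaseArgs):
    """Stand-in for AutoDeploy's LlmArgs: no field redefinition, only a constraint."""

    @field_validator("max_beam_width")
    @classmethod
    def ensure_no_beam_search(cls, v: int) -> int:
        # AutoDeploy does not support beam search, so reject widths > 1.
        if v > 1:
            raise ValueError("beam search is not supported; max_beam_width must be 1")
        return v


# The subclass picks up the base default unchanged.
assert AutoDeployArgs().max_beam_width == 1

# Beam search is rejected at construction time.
try:
    AutoDeployArgs(max_beam_width=4)
except ValidationError:
    pass
```

This keeps a single source of truth for the default while still letting the subclass restrict the accepted values.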

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

@2ez4bz 2ez4bz requested a review from a team as a code owner February 24, 2026 07:18
@2ez4bz 2ez4bz requested a review from MrGeva February 24, 2026 07:18
@coderabbitai
Contributor

coderabbitai bot commented Feb 24, 2026

📝 Walkthrough

Walkthrough

The pull request removes deprecated configuration fields from the LlmArgs public API (model_kwargs, sampler_type, max_beam_width, attn_backend) and adds a validation hook to enforce beam-search constraints. It also updates the sampler selection logic in ad_executor.py to handle auto-resolution of sampler_type to TorchSampler.

Changes

Cohort / File(s) Summary
Configuration Cleanup
tensorrt_llm/_torch/auto_deploy/llm_args.py
Removed four public fields (model_kwargs, sampler_type, max_beam_width, attn_backend), added validator method ensure_no_beam_search to enforce max_beam_width constraint, consolidated imports, and added inline comments about configuration improvements.
Sampler Selection Logic
tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py
Introduced local sampler_type variable with auto-resolution logic that defaults to TorchSampler when auto is requested, updating control flow and error reporting to use normalized sampler type.
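The auto-resolution described above can be sketched as a small normalization step. The function and the supported-sampler set below are purely illustrative assumptions, not the actual ad_executor.py code:

```python
def resolve_sampler(sampler_type: str) -> str:
    """Normalize the requested sampler type, resolving "auto" to the Torch sampler."""
    # "auto" now maps to TorchSampler, i.e. the default that was previously
    # redefined on the AutoDeploy config.
    resolved = "TorchSampler" if sampler_type == "auto" else sampler_type
    supported = {"TorchSampler", "TRTLLMSampler"}  # illustrative set
    if resolved not in supported:
        # Error reporting uses the normalized value, not the raw input.
        raise ValueError(f"Unsupported sampler_type: {resolved!r}")
    return resolved


assert resolve_sampler("auto") == "TorchSampler"
assert resolve_sampler("TRTLLMSampler") == "TRTLLMSampler"
```

Normalizing once up front means the branching and error messages downstream only ever see a concrete sampler type.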

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: removing field redefinitions with default values from the LlmArgs configuration.
Description check ✅ Passed The description provides clear motivation (why), concrete details (what fields are removed), and reasoning for each removal. However, the 'Test Coverage' section is left blank, and specific test cases are not identified.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py (1)

1-1: ⚠️ Potential issue | 🟠 Major

Update the copyright header year to 2026.
This file was modified in 2026 but still carries a 2025 header.

🗓️ Suggested change
-# Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

As per coding guidelines: “Add NVIDIA copyright header to ALL new files; update year on modified files.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py` at line 1, Update the
copyright header year in the file
tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py from 2025 to 2026 by editing
the top-of-file copyright comment (the existing "# Copyright (c) 2025, NVIDIA
CORPORATION & AFFILIATES. All rights reserved." line) so it reads 2026.
tensorrt_llm/_torch/auto_deploy/llm_args.py (2)

1-3: ⚠️ Potential issue | 🟠 Major

Add the NVIDIA Apache 2.0 header (2026).
This file is missing the required NVIDIA copyright/license header for modified source files.

🧾 Suggested header
+ # Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
 from importlib.resources import files

As per coding guidelines: “All TensorRT-LLM source files should contain an NVIDIA copyright header with the year of the latest meaningful modification. The header should be an Apache 2.0 license block as specified.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/auto_deploy/llm_args.py` around lines 1 - 3, Add the
NVIDIA Apache 2.0 copyright/license header (with year 2026) to the top of the
module tensorrt_llm._torch.auto_deploy.llm_args (i.e., before the existing
imports like from importlib.resources import files and from pathlib import
Path); insert the full Apache 2.0 header block used across the codebase
including the NVIDIA copyright line and license text so the file conforms to the
project coding guidelines.

11-143: ⚠️ Potential issue | 🟡 Minor

Use a module-level import to preserve namespace.
The current import violates coding guidelines which require importing the module rather than individual classes.

🔧 Suggested import/style fix
-from ...llmapi.llm_args import BuildConfig, EagleDecodingConfig, TorchLlmArgs, _ParallelConfig
+from ...llmapi import llm_args

Then update all usages in the file:

  • class LlmArgs(DynamicYamlMixInForSettings, llm_args.TorchLlmArgs, BaseSettings):
  • build_config: Optional[llm_args.BuildConfig] = Field(
  • isinstance(self.speculative_config, llm_args.EagleDecodingConfig)
  • self._parallel_config = llm_args._ParallelConfig(

Per coding guidelines: "When importing in Python, always maintain the namespace. Import the module, not individual classes or functions."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/_torch/auto_deploy/llm_args.py` around lines 11 - 143, The file
imports specific names from ...llmapi.llm_args which breaks the namespace
guideline; replace that import with a module-level import (e.g. import
...llmapi.llm_args as llm_args) and update all references: change base class
TorchLlmArgs to llm_args.TorchLlmArgs in the LlmArgs declaration, change
type/field annotations like BuildConfig and EagleDecodingConfig to
llm_args.BuildConfig and llm_args.EagleDecodingConfig (e.g. build_config and
speculative_config checks), and change the _ParallelConfig usage to
llm_args._ParallelConfig when assigning self._parallel_config; ensure all
occurrences (including isinstance checks and type annotations) use the
llm_args.<Name> namespace.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/_torch/auto_deploy/llm_args.py`:
- Line 160: Remove the inline class-body comment about reusing LoadFormat.DUMMY
in the public config class and instead document it in the class or field
docstring for the public API; update the docstring of the config class (or the
specific field docstring for load_format in llm_args.py) to mention that
LoadFormat.DUMMY can be reused and any rationale, and delete the commented line
(referencing LoadFormat.DUMMY and the load_format field) from the class body so
no user-facing comments remain inline.
- Around line 88-93: Replace the custom validator ensure_no_beam_search for the
max_beam_width field with a Pydantic Field constraint: remove the
`@field_validator` method ensure_no_beam_search and instead redefine the
max_beam_width field in the model (or its child class) using Field(le=1) so the
numeric upper-bound is enforced declaratively; ensure you stop using the
default_factory pattern for this field and place the Field(...) definition
directly on max_beam_width in the class where the field is declared.
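The declarative alternative suggested in that comment could look like the following minimal sketch, assuming pydantic v2 (the model name is illustrative, not the actual LlmArgs class):

```python
from pydantic import BaseModel, Field, ValidationError


class Args(BaseModel):
    # The upper bound is part of the field definition itself,
    # so no custom validator method is needed.
    max_beam_width: int = Field(default=1, le=1)


assert Args().max_beam_width == 1

try:
    Args(max_beam_width=2)
except ValidationError:
    pass
```

The trade-off: Field(le=1) is more concise, but redefining the field in the subclass is exactly what this PR set out to avoid, which is why a validator was used instead.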

---

Outside diff comments:
In `@tensorrt_llm/_torch/auto_deploy/llm_args.py`:
- Around line 1-3: Add the NVIDIA Apache 2.0 copyright/license header (with year
2026) to the top of the module tensorrt_llm._torch.auto_deploy.llm_args (i.e.,
before the existing imports like from importlib.resources import files and from
pathlib import Path); insert the full Apache 2.0 header block used across the
codebase including the NVIDIA copyright line and license text so the file
conforms to the project coding guidelines.
- Around line 11-143: The file imports specific names from ...llmapi.llm_args
which breaks the namespace guideline; replace that import with a module-level
import (e.g. import ...llmapi.llm_args as llm_args) and update all references:
change base class TorchLlmArgs to llm_args.TorchLlmArgs in the LlmArgs
declaration, change type/field annotations like BuildConfig and
EagleDecodingConfig to llm_args.BuildConfig and llm_args.EagleDecodingConfig
(e.g. build_config and speculative_config checks), and change the
_ParallelConfig usage to llm_args._ParallelConfig when assigning
self._parallel_config; ensure all occurrences (including isinstance checks and
type annotations) use the llm_args.<Name> namespace.

In `@tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py`:
- Line 1: Update the copyright header year in the file
tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py from 2025 to 2026 by editing
the top-of-file copyright comment (the existing "# Copyright (c) 2025, NVIDIA
CORPORATION & AFFILIATES. All rights reserved." line) so it reads 2026.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b5953b9 and 1c68a84.

📒 Files selected for processing (2)
  • tensorrt_llm/_torch/auto_deploy/llm_args.py
  • tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py

@2ez4bz 2ez4bz changed the title [#9306][cleanup] Remove fields with redefined defaults [#9306][cleanup] Remove some fields with redefined defaults Feb 24, 2026
@2ez4bz
Collaborator Author

2ez4bz commented Feb 24, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #36683 [ run ] triggered by Bot. Commit: 1c68a84 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #36683 [ run ] completed with state SUCCESS. Commit: 1c68a84
/LLM/main/L0_MergeRequest_PR pipeline #28403 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@2ez4bz
Collaborator Author

2ez4bz commented Feb 25, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #36762 [ run ] triggered by Bot. Commit: 1c68a84 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #36762 [ run ] completed with state SUCCESS. Commit: 1c68a84
/LLM/main/L0_MergeRequest_PR pipeline #28471 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@2ez4bz 2ez4bz requested review from a team as code owners March 14, 2026 06:10
@2ez4bz 2ez4bz requested review from yiqingy0 and zeroepoch March 14, 2026 06:10
@2ez4bz 2ez4bz force-pushed the dev-ad-llm-args branch 3 times, most recently from 67fc2b7 to fa2e3ab Compare March 17, 2026 04:35
@2ez4bz
Collaborator Author

2ez4bz commented Mar 17, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #39180 [ run ] triggered by Bot. Commit: fa2e3ab Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39180 [ run ] completed with state FAILURE. Commit: fa2e3ab
/LLM/main/L0_MergeRequest_PR pipeline #30435 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@2ez4bz
Collaborator Author

2ez4bz commented Mar 17, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #39288 [ run ] triggered by Bot. Commit: 4dfeeb2 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #39288 [ run ] completed with state SUCCESS. Commit: 4dfeeb2
/LLM/main/L0_MergeRequest_PR pipeline #30536 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@2ez4bz
Collaborator Author

2ez4bz commented Apr 3, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41608 [ run ] triggered by Bot. Commit: e45c88f Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41608 [ run ] completed with state SUCCESS. Commit: e45c88f
/LLM/main/L0_MergeRequest_PR pipeline #32517 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@2ez4bz
Collaborator Author

2ez4bz commented Apr 3, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41715 [ run ] triggered by Bot. Commit: e45c88f Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41715 [ run ] completed with state SUCCESS. Commit: e45c88f
/LLM/main/L0_MergeRequest_PR pipeline #32617 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
@2ez4bz 2ez4bz requested a review from a team as a code owner April 3, 2026 20:01
@2ez4bz
Collaborator Author

2ez4bz commented Apr 3, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41734 [ run ] triggered by Bot. Commit: a7b37a2 Link to invocation

Collaborator

@yuanjingx87 yuanjingx87 left a comment


a7b37a2 looks good to me, approved

@tensorrt-cicd
Collaborator

PR_Github #41734 [ run ] completed with state SUCCESS. Commit: a7b37a2
/LLM/main/L0_MergeRequest_PR pipeline #32635 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@2ez4bz
Collaborator Author

2ez4bz commented Apr 3, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41761 [ run ] triggered by Bot. Commit: a7b37a2 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41761 [ run ] completed with state SUCCESS. Commit: a7b37a2
/LLM/main/L0_MergeRequest_PR pipeline #32659 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@2ez4bz
Collaborator Author

2ez4bz commented Apr 4, 2026

/bot run

@tensorrt-cicd
Collaborator

PR_Github #41798 [ run ] triggered by Bot. Commit: a7b37a2 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41798 [ run ] completed with state SUCCESS. Commit: a7b37a2
/LLM/main/L0_MergeRequest_PR pipeline #32692 completed with status: 'SUCCESS'

CI Report

Link to invocation

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
@2ez4bz
Collaborator Author

2ez4bz commented Apr 6, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41899 [ run ] triggered by Bot. Commit: 572c71d Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41899 [ run ] completed with state SUCCESS. Commit: 572c71d
/LLM/main/L0_MergeRequest_PR pipeline #32759 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@2ez4bz
Collaborator Author

2ez4bz commented Apr 6, 2026

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #41945 [ run ] triggered by Bot. Commit: 572c71d Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #41945 [ run ] completed with state SUCCESS. Commit: 572c71d
/LLM/main/L0_MergeRequest_PR pipeline #32801 completed with status: 'SUCCESS'

CI Report

Link to invocation

@2ez4bz 2ez4bz merged commit 2b80f8d into NVIDIA:main Apr 6, 2026
5 checks passed
xinhe-nv pushed a commit to xinhe-nv/TensorRT-LLM that referenced this pull request Apr 7, 2026
…IDIA#11671)

* Why?

We would like to be able to use a TorchLlmArgs config in
AutoDeploy's own version with minimal changes.

* What?

This commit removes the redefinition of:
- `model_kwargs`: existing usages guarded against `None` the same way
  as an empty dict.
- `max_batch_size: most unit tests set it explicitly; a few configs were
  updated to have the old default.
- `max_beam_width`: instead adds a validator for it.
- `att_backend`: although the default between the base class ("TRTLLM")
  and autodeploy ("flashinfer") differ, the
  `update_transforms_with_shortcuts` validator in practice reads the
  default from `default.yaml`, which is "flashinfer".
- `sampler`: the executor code already supported both. We just tweak it
  so that the "auto" value corresponds to the now removed default.

It also removes the `cuda_graph_batch_sizes` in favor of
`cuda_graph_config.batch_sizes`, with necessary adjustments to unit
tests and existing configs.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
yufeiwu-nv pushed a commit to yufeiwu-nv/TensorRT-LLM that referenced this pull request Apr 7, 2026
…IDIA#11671)

* Why?

We would like to be able to use a TorchLlmArgs config in
AutoDeploy's own version with minimal changes.

* What?

This commit removes the redefinition of:
(same commit message as above)
karen-sy pushed a commit to karen-sy/TensorRT-LLM that referenced this pull request Apr 7, 2026
…IDIA#11671)

* Why?

We would like to be able to use a TorchLlmArgs config in
AutoDeploy's own version with minimal changes.

* What?

This commit removes the redefinition of:
(same commit message as above)