[None][feat] llmc: standalone package improvements and enforce import discipline #13466
lucaslie merged 7 commits into NVIDIA:main from
Conversation
📝 Walkthrough

Restructures the AutoDeploy source to enforce import discipline within the auto_deploy package while preparing a standalone package.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 4
🧹 Nitpick comments (1)
scripts/check_auto_deploy_imports.py (1)
25-25: Replace legacy typing generics with built-in types (`list`/`tuple`)

Per the coding guidelines, use built-in generics instead of `typing.List` and `typing.Tuple`. The file currently imports and uses these legacy types at multiple locations; switch to the modern syntax.

♻️ Suggested fix

```diff
-from typing import List, Tuple
+from typing import Iterable

-def _file_package_parts(path: pathlib.Path) -> List[str]:
+def _file_package_parts(path: pathlib.Path) -> list[str]:

-def _check_file(path: pathlib.Path) -> List[Tuple[int, str]]:
+def _check_file(path: pathlib.Path) -> list[tuple[int, str]]:

-    violations: List[Tuple[int, str]] = []
+    violations: list[tuple[int, str]] = []

-def main(argv: List[str]) -> int:
+def main(argv: list[str]) -> int:

-    failures: List[Tuple[pathlib.Path, int, str]] = []
+    failures: list[tuple[pathlib.Path, int, str]] = []
```

Also applies to: 32–33, 39, 50, 104, 111
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/check_auto_deploy_imports.py` at line 25, Replace the legacy typing generics import and annotations: remove "from typing import List, Tuple" and update all type annotations that use List[...] and Tuple[...] to use built-in generics list[...] and tuple[...]; update function signatures and variable annotations (wherever List/Tuple are referenced, e.g., the imports line and any functions or variables around the previous uses on lines noted) and ensure no other typing-only imports are required; delete the now-unused List/Tuple import.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/auto_deploy/README.md`:
- Line 5: Remove the stale standalone import snippet in
examples/auto_deploy/README.md (the example that imports torch_export_to_gm,
TransformRegistry, and ModelFactoryRegistry around lines ~133-140); replace that
block with a short pointer to ./llmc/README.md (or simply delete the outdated
code example) so the in-tree README no longer contradicts the redirect on line 5
and standalone usage is documented only in llmc/README.md.
In `@scripts/check_auto_deploy_imports.py`:
- Around line 44-47: The current code silently swallows a SyntaxError from
ast.parse and returns an empty list; change the except SyntaxError block to
capture the exception (except SyntaxError as e) and treat it as a check failure
by returning a non-empty violation (e.g., a message or violation object that
includes str(path) and the exception text) or by re-raising a clear exception so
the hook fails; update the handler around ast.parse(source, filename=str(path))
to include the error details in the returned result (or raised error) so
malformed files do not pass the check.
In `@tensorrt_llm/_torch/auto_deploy/llm.py`:
- Around line 6-10: This file is missing the required NVIDIA copyright/license
header; add the standard NVIDIA copyright header (with the year of latest
meaningful modification) as the very first lines of
tensorrt_llm._torch.auto_deploy.llm before any imports, preserving the existing
imports (e.g., CompletionOutput, DefaultInputProcessor, _TorchLLM,
TokenizerBase/TransformersTokenizer/tokenizer_factory, SamplingParams) and file
contents unchanged otherwise.
In `@tensorrt_llm/_torch/auto_deploy/shim/interface.py`:
- Around line 13-15: This file (module
tensorrt_llm._torch.auto_deploy.shim.interface) is missing the required NVIDIA
copyright/license header at the top; add the standard NVIDIA header block to the
very top of the file and ensure the copyright year is updated for a modified
file, leaving the existing imports (MambaHybridCacheManager, KVCacheManager,
torch_dtype_to_binding) intact so signatures like MambaHybridCacheManager and
KVCacheManager remain unchanged.
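For reference, an SPDX-style header of the kind discussed here typically looks like the following (years and exact wording are illustrative; use the repo's own template):

```python
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
```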
---
Nitpick comments:
In `@scripts/check_auto_deploy_imports.py`:
- Line 25: Replace the legacy typing generics import and annotations: remove
"from typing import List, Tuple" and update all type annotations that use
List[...] and Tuple[...] to use built-in generics list[...] and tuple[...];
update function signatures and variable annotations (wherever List/Tuple are
referenced, e.g., the imports line and any functions or variables around the
previous uses on lines noted) and ensure no other typing-only imports are
required; delete the now-unused List/Tuple import.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: c42cc167-48df-4aef-b6a6-1eb9d2f6dba2
📒 Files selected for processing (25)
- .pre-commit-config.yaml
- examples/auto_deploy/README.md
- examples/auto_deploy/llmc/CONTRIBUTING.md
- examples/auto_deploy/llmc/README.md
- examples/auto_deploy/llmc/create_standalone_package.py
- scripts/check_auto_deploy_imports.py
- tensorrt_llm/_torch/auto_deploy/custom_ops/attention/flashinfer_attention.py
- tensorrt_llm/_torch/auto_deploy/custom_ops/normalization/flashinfer_fused_add_rms_norm.py
- tensorrt_llm/_torch/auto_deploy/custom_ops/normalization/rms_norm.py
- tensorrt_llm/_torch/auto_deploy/llm.py
- tensorrt_llm/_torch/auto_deploy/llm_args.py
- tensorrt_llm/_torch/auto_deploy/models/custom/modeling_deepseek_ir.py
- tensorrt_llm/_torch/auto_deploy/models/custom/modeling_eagle.py
- tensorrt_llm/_torch/auto_deploy/models/custom/modeling_llama3_ir.py
- tensorrt_llm/_torch/auto_deploy/models/custom/modeling_minimax_m2.py
- tensorrt_llm/_torch/auto_deploy/models/custom/modeling_nemotron_h_ir.py
- tensorrt_llm/_torch/auto_deploy/models/custom/modeling_qwen3_5_moe_ir.py
- tensorrt_llm/_torch/auto_deploy/models/custom/modeling_qwen3_ir.py
- tensorrt_llm/_torch/auto_deploy/models/eagle.py
- tensorrt_llm/_torch/auto_deploy/shim/ad_executor.py
- tensorrt_llm/_torch/auto_deploy/shim/demollm.py
- tensorrt_llm/_torch/auto_deploy/shim/interface.py
- tensorrt_llm/_torch/auto_deploy/transform/library/moe_routing.py
- tensorrt_llm/_torch/auto_deploy/utils/quantization_utils.py
- tests/unittest/auto_deploy/standalone/test_standalone_package.py
/bot run --stage-list "A10-Build_Docs, A10-PackageSanityCheck-PY310-UB2204, A100X-PackageSanityCheck-PY312-UB2404, A30-AutoDeploy-1, H100_PCIe-AutoDeploy-1, DGX_B200-AutoDeploy-1, A100X-PyTorch-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_B200-4_GPUs-AutoDeploy-1"

PR_Github #45518 [ run ] triggered by Bot. Commit:
/bot help

GitHub Bot Help

Provide a user friendly way for developers to interact with a Jenkins server. See details below for each supported subcommand.

run
Launch build/test pipelines. All previously running jobs will be killed.

kill
Kill all running builds associated with the pull request.

skip
Skip testing for the latest commit on the pull request.

reuse-pipeline
Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
|
/bot kill |
|
PR_Github #45522 [ kill ] triggered by Bot. Commit: |
|
PR_Github #45522 [ kill ] completed with state |
|
/bot run --stage-list "A10-Build_Docs, A10-PackageSanityCheck-PY310-UB2204, A100X-PackageSanityCheck-PY312-UB2404, A30-AutoDeploy-1, H100_PCIe-AutoDeploy-1, DGX_B200-AutoDeploy-1, A100X-PyTorch-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_B200-4_GPUs-AutoDeploy-1" |
|
PR_Github #45527 [ run ] triggered by Bot. Commit: |
|
PR_Github #45527 [ run ] completed with state
|
|
PR_Github #46064 [ run ] completed with state
|
f51b105 to edae8ab
/bot run --stage-list "A10-Build_Docs, A10-PackageSanityCheck-PY310-UB2204, A100X-PackageSanityCheck-PY312-UB2404, A30-AutoDeploy-1, H100_PCIe-AutoDeploy-1, DGX_B200-AutoDeploy-1, A100X-PyTorch-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_B200-4_GPUs-AutoDeploy-1"

PR_Github #46210 [ run ] triggered by Bot. Commit:

PR_Github #46210 [ run ] completed with state

/bot run --stage-list "A10-Build_Docs, A10-PackageSanityCheck-PY310-UB2204, A100X-PackageSanityCheck-PY312-UB2404, A30-AutoDeploy-1, H100_PCIe-AutoDeploy-1, DGX_B200-AutoDeploy-1, A100X-PyTorch-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_B200-4_GPUs-AutoDeploy-1" --disable-fail-fast

PR_Github #46242 [ run ] triggered by Bot. Commit:

PR_Github #46242 [ run ] completed with state
…ipline
Renames the standalone AutoDeploy distribution to `llmc` (PyPI:
`nvidia-llmc`), splits the example READMEs/CONTRIBUTING for in-tree
vs standalone usage, and adds a pre-commit hook that enforces import
discipline inside `tensorrt_llm/_torch/auto_deploy/` so the source
tree can be copied verbatim into the standalone repo.
Highlights:
- New pre-commit hook `auto-deploy-import-discipline` (AST-based) plus
`scripts/check_auto_deploy_imports.py`. Rules: in-package imports
must be relative; relative imports must not escape the package
(use absolute `tensorrt_llm.X` for that).
- Source fixes to make the hook pass: 7 absolute self-imports
converted to relative (modeling_*_ir.py, moe_routing.py); 9
escaping relative imports flipped to absolute `from tensorrt_llm.X`
(llm.py, llm_args.py, shim/{ad_executor,demollm,interface}.py,
models/eagle.py, models/custom/modeling_eagle.py, several custom
ops, utils/quantization_utils.py). `llm_args.py` now resolves the
config dir via `files(_ad_config_pkg)` instead of a hardcoded
string so it works in both `tensorrt_llm._torch.auto_deploy` and
`llmc` flavors.
- `create_standalone_package.py` moved to
`examples/auto_deploy/llmc/`. Output package is now `llmc/` with
distribution name `nvidia-llmc`. Source-side import rewriting is
dropped (the lint hook guarantees no rewriting is needed); test
files keep their absolute imports and are still rewritten on copy.
- `examples/auto_deploy/README.md` reverted to TRT-LLM-only and
links to `llmc/`. New `examples/auto_deploy/llmc/README.md` adds
install instructions and a comprehensive ModelFactory +
InferenceOptimizer + CachedSequenceInterface example. New
`examples/auto_deploy/llmc/CONTRIBUTING.md` documents that the
standalone repo is read-only and PRs must land on TensorRT-LLM.
- Standalone test suite (`tests/unittest/auto_deploy/standalone/`)
updated for the new package name; full suite passes locally
(12/12, ~4m30s, including the nested run of the standalone
package's own unit tests).
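The copy-time rewrite of test-file imports described above can be sketched as a plain textual substitution (names illustrative; the real `create_standalone_package.py` may do more):

```python
import re

# In-tree package prefix -> standalone package name (per the PR description).
_SRC_PKG = "tensorrt_llm._torch.auto_deploy"
_DST_PKG = "llmc"


def rewrite_test_imports(source: str) -> str:
    """Rewrite absolute in-tree imports in a copied test file to `llmc`.

    Word boundaries keep shorter prefixes like `tensorrt_llm.logger` intact.
    """
    return re.sub(rf"\b{re.escape(_SRC_PKG)}\b", _DST_PKG, source)
```

Because the lint hook guarantees the source tree itself uses only relative (or truly external absolute) imports, this rewrite only needs to run on the copied test files.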
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
- README/CONTRIBUTING: use long names ("LLM Compiler" / "llm-compiler")
in titles; drop PyPI/wheels references; simplify install to clone +
`uv pip install -e ".[dev]"` plus a `pip install git+…` one-liner;
drop the internal "Regenerating the standalone repo" section.
- examples/auto_deploy/README.md: remove a stale standalone code
snippet that still referenced `auto_deploy.X` imports — the in-tree
README is now TRT-LLM-only and points to llmc/README.md.
- scripts/check_auto_deploy_imports.py: modernize typing
(`List`/`Tuple` → `list`/`tuple`); treat `SyntaxError` from
ast.parse as a violation instead of silently passing.
- tensorrt_llm/_torch/auto_deploy/{llm,shim/interface}.py: add the
missing NVIDIA copyright/license header.
Standalone test suite re-verified locally (12/12 pass, ~4m35s).
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Plug in the real standalone-repo URL where the previous round had placeholders, per review feedback: - examples/auto_deploy/README.md: top-level redirect now points to github.com/NVIDIA/llm-compiler instead of the local llmc/README.md. - examples/auto_deploy/llmc/README.md: drop the "we don't publish wheels yet" wording; install instructions now show both the https and ssh variants of `pip install git+…` and `git clone`. Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Make TRT-LLM the source of truth for OSS compliance metadata in the standalone llmc package. The standalone-package generator now copies the following from this repo on every regen: - CODE_OF_CONDUCT.md, SECURITY.md, ATTRIBUTIONS-Python.md (from repo root) - .editorconfig (from repo root) - .github/ tree (issue/PR templates) sourced from examples/auto_deploy/llmc/.github_for_llmc/ — stored under that non-".github" name so it does not interfere with TRT-LLM's own .github/ The issue template config disables blank issues and redirects bug reports, feature requests, discussions, and security reports to NVIDIA/TensorRT-LLM (or PSIRT) since the standalone repo is regenerated and read-only. The PR template likewise points contributors back to NVIDIA/TensorRT-LLM. All copied paths are added to _MANAGED_PATHS so the regen stays idempotent. Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Test files that needed KvCacheConfig or ActivationType were importing them from their canonical TRT-LLM paths and the standalone packaging script translated those imports to llmc._compat on copy. Source the symbols from tensorrt_llm._torch.auto_deploy._compat directly so the generic tensorrt_llm._torch.auto_deploy -> llmc rewrite handles them, and drop the now-dead special cases from create_standalone_package.py. Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Reset SPDX years on shim/interface.py and llm.py from 2022-2026 to 2025-2026 to reflect the actual content authoring date per reviewer feedback. Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
edae8ab to 8fcc5b9
/bot run --stage-list "A10-Build_Docs, A10-PackageSanityCheck-PY310-UB2204, A100X-PackageSanityCheck-PY312-UB2404, A30-AutoDeploy-1, H100_PCIe-AutoDeploy-1, DGX_B200-AutoDeploy-1, A100X-PyTorch-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_B200-4_GPUs-AutoDeploy-1" --disable-fail-fast

PR_Github #46434 [ run ] triggered by Bot. Commit:

/bot skip --comment "AD tests are passing in CI and locally

PR_Github #46464 Bot args parsing error: Traceback (most recent call last): During handling of the above exception, another exception occurred: Traceback (most recent call last):

/bot skip --comment "AD tests are passing in CI and locally"

PR_Github #46476 [ skip ] triggered by Bot. Commit:

PR_Github #46476 [ skip ] completed with state
Summary

- Renames the standalone AutoDeploy distribution to `llmc` (PyPI: `nvidia-llmc`); the AutoDeploy source tree itself stays put. The renaming happens at standalone-package generation time only.
- Adds a pre-commit hook (`auto-deploy-import-discipline`) that enforces relative-only imports inside `tensorrt_llm/_torch/auto_deploy/`, so the source tree can be copied verbatim into the standalone repo without rewriting in-package imports.
- Splits the example READMEs/CONTRIBUTING for in-tree vs standalone (`llmc`) usage; relocates `create_standalone_package.py` to `examples/auto_deploy/llmc/`.
- OSS compliance: `CODE_OF_CONDUCT.md`, `SECURITY.md`, `ATTRIBUTIONS-Python.md`, `.editorconfig`, and a `.github/` tree (issue/PR templates) are now copied into the standalone package on every regen.

Notable changes

- New lint script `scripts/check_auto_deploy_imports.py` (AST-based). Two rules: (A) imports resolving inside `tensorrt_llm._torch.auto_deploy` must be relative; (B) relative imports must not escape the package; use absolute `tensorrt_llm.X` for that.
- Source fixes: 7 absolute self-imports in `models/custom/modeling_*_ir.py` and `transform/library/moe_routing.py` converted to relative; 9 escaping relative imports in `llm.py`, `llm_args.py`, `shim/{ad_executor,demollm,interface}.py`, `models/eagle.py`, `models/custom/modeling_eagle.py`, several custom-ops files, and `utils/quantization_utils.py` flipped to absolute `from tensorrt_llm.X`.
- `llm_args.py` resolves the bundled config dir via `files(_ad_config_pkg)` instead of a hardcoded `"tensorrt_llm._torch.auto_deploy.config"` string so it works under both flavors.
- `create_standalone_package.py` moved to `examples/auto_deploy/llmc/`. Output package is now `llmc/` with distribution name `nvidia-llmc` (Python import: `import llmc`). Source-side import rewriting is dropped (no longer needed); test files keep absolute imports and are still rewritten on copy.
- The generator copies `CODE_OF_CONDUCT.md`, `SECURITY.md`, `ATTRIBUTIONS-Python.md`, and `.editorconfig` from the TRT-LLM repo root, plus a `.github/` tree (issue/PR templates) from `examples/auto_deploy/llmc/.github_for_llmc/` (stored under that non-`.github` name to avoid colliding with TRT-LLM's own `.github/`). The issue-template `config.yml` disables blank issues and redirects bug reports, feature requests, discussions, and security reports back to NVIDIA/TensorRT-LLM (or PSIRT). The PR template likewise points contributors at NVIDIA/TensorRT-LLM. All copied paths are added to `_MANAGED_PATHS` so the regen stays idempotent.
- `examples/auto_deploy/README.md` reverted to TRT-LLM-only and links to `llmc/`. New `examples/auto_deploy/llmc/README.md` adds install instructions and a comprehensive `ModelFactory` + `InferenceOptimizer` + `CachedSequenceInterface` example showing how to build a custom inference pipeline. `examples/auto_deploy/llmc/CONTRIBUTING.md` documents that the standalone repo is read-only and PRs must land on TensorRT-LLM (you can fork llmc to experiment, but the upstream source of truth is here).
- The torch op namespace stays as `torch.ops.auto_deploy` (no rename). No public-API changes.

Test plan

- `scripts/check_auto_deploy_imports.py` passes on the entire `tensorrt_llm/_torch/auto_deploy/` tree.
- `pre-commit run --all-files` passes for the new `auto-deploy-import-discipline` hook.
- `pre-commit run` on every changed file passes (ruff, ruff-format, mdformat, codespell, the new hook, etc.).
- `python -c "import tensorrt_llm._torch.auto_deploy; from tensorrt_llm._torch.auto_deploy.llm_args import LlmArgs; from tensorrt_llm._torch.auto_deploy.shim.demollm import DemoEngine; from tensorrt_llm._torch.auto_deploy.shim.ad_executor import ADEngine"` succeeds.
- `pytest tests/unittest/auto_deploy/standalone/`: 12/12 passed (~4m30s, re-run after OSS compliance changes). This includes the nested run of the standalone package's own unit-test suite (`test_run_unit_tests` installs `nvidia-llmc` into an isolated venv and runs the copied tests).
- `python examples/auto_deploy/llmc/create_standalone_package.py --output-dir /tmp/<...>` writes the new files (`CODE_OF_CONDUCT.md`, `SECURITY.md`, `ATTRIBUTIONS-Python.md`, `.editorconfig`, `.github/ISSUE_TEMPLATE/config.yml`, `.github/PULL_REQUEST_TEMPLATE.md`) byte-identical to their TRT-LLM sources.
- CI: `/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"`.
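The two lint rules can be sketched as a small AST walk (illustrative only; the function name, signature, and return shape are assumptions, not the actual script, which also handles plain `import x.y` statements and file discovery):

```python
import ast

# Dotted prefix of the package whose imports are being policed.
PKG = ("tensorrt_llm", "_torch", "auto_deploy")


def check_imports(source: str, module_parts: tuple[str, ...]) -> list[str]:
    """Flag (A) in-package absolute imports and (B) relative imports that
    escape the package. `module_parts` is the dotted path of the file under
    check, e.g. ("tensorrt_llm", "_torch", "auto_deploy", "llm")."""
    problems: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ImportFrom):
            continue
        if node.level == 0 and node.module:
            # Rule A: absolute imports resolving inside the package must be relative.
            if tuple(node.module.split("."))[: len(PKG)] == PKG:
                problems.append(f"line {node.lineno}: use a relative import for {node.module}")
        elif node.level > 0:
            # Rule B: `level` leading dots climb out of the containing module;
            # if they climb above the package root, the import escapes.
            if len(module_parts) - node.level < len(PKG):
                problems.append(f"line {node.lineno}: relative import escapes the package")
    return problems
```

Under this sketch, `from tensorrt_llm._torch.auto_deploy.llm_args import LlmArgs` inside the package trips rule A, while `from .. import x` in a top-level package module trips rule B; `from tensorrt_llm.logger import logger` (outside the package) passes.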
Documentation
llmcpackage, including installation, usage examples, and contributing guidelines.Developer Tools
Updates
llmcwith updated generation and packaging configuration.