
[6281412] docs: update TensorRT-Edge-LLM CLI commands in torch_onnx example #1808

Merged
ajrasane merged 1 commit into main from ajrasane/nvbug_6281412
Jun 23, 2026

@ajrasane ajrasane commented Jun 23, 2026

Contributor

What does this PR do?

Type of change: documentation

TensorRT-Edge-LLM v0.8.0 consolidated its CLI entry points, leaving the example commands in examples/torch_onnx/README.md referencing tools that no longer exist (e.g. tensorrt-edgellm-export-visual). This updates the README to the current interface:

  • tensorrt-edgellm-quantize-llm / tensorrt-edgellm-quantize-draft → tensorrt-edgellm-quantize {llm,draft} (subcommands)
  • tensorrt-edgellm-export-llm / -export-visual / -export-draft → unified tensorrt-edgellm-export with positional model / output_dir args and automatic VLM/audio component detection
  • --is_eagle_base → --eagle-base
  • Updated the CLI Tools table and the LLM / VLM / EAGLE examples accordingly
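
The renames listed above can be applied mechanically when auditing other documents. The following is a minimal sketch, not part of this PR: the `RENAMES` table is copied from the list above, while the `find_stale` helper and its regular expression are hypothetical names introduced here for illustration.

```python
import re

# Old name -> new name, copied from the rename list in the PR description.
RENAMES = {
    "tensorrt-edgellm-quantize-llm": "tensorrt-edgellm-quantize llm",
    "tensorrt-edgellm-quantize-draft": "tensorrt-edgellm-quantize draft",
    "tensorrt-edgellm-export-llm": "tensorrt-edgellm-export",
    "tensorrt-edgellm-export-visual": "tensorrt-edgellm-export",
    "tensorrt-edgellm-export-draft": "tensorrt-edgellm-export",
    "--is_eagle_base": "--eagle-base",
}

# Match any old name as a whole token, so the new names
# (for example "tensorrt-edgellm-export") are never flagged.
_PATTERN = re.compile(
    r"(?<![\w-])(" + "|".join(re.escape(k) for k in RENAMES) + r")(?![\w-])"
)


def find_stale(text):
    """Return (line_number, old_name, suggested_new_name) for each stale reference."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            old = match.group(1)
            hits.append((lineno, old, RENAMES[old]))
    return hits
```

A checker like this only flags names. The move to positional model / output_dir arguments changes argument order, so each flagged command would still need a manual edit.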

Usage

N/A — documentation change.

Testing

Verified against the live main branch of TensorRT-Edge-LLM by running the actual entry-point code (python -m tensorrt_edgellm.scripts.quantize/export):

  • --help runs cleanly for quantize, quantize llm, quantize draft, and export; all documented flags (--model_dir, --output_dir, --quantization, --base_model_dir, --draft_model_dir, positional model/output_dir, --eagle-base) are present.
  • Drove the parser with the exact README commands — they parse and advance into the real quantize/export logic.
  • Confirmed the old names are gone: quantize-llm subcommand rejected, --is_eagle_base rejected, scripts.export_visual module not found.
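
The parser checks described above can be illustrated with a stand-in built on Python's `argparse`. This is a sketch under stated assumptions, not the real `tensorrt_edgellm` code: the flag names come from the list above, but which flags belong to which subcommand, which are required, and the helper names `build_quantize_parser`, `build_export_parser`, and `is_rejected` are all assumptions made for illustration.

```python
import argparse


def build_quantize_parser():
    """Stand-in for a `quantize {llm,draft}` CLI (assumed layout, not the real parser)."""
    parser = argparse.ArgumentParser(prog="tensorrt-edgellm-quantize")
    sub = parser.add_subparsers(dest="command", required=True)

    # Assumption: the llm subcommand takes a single model directory.
    llm = sub.add_parser("llm")
    llm.add_argument("--model_dir", required=True)
    llm.add_argument("--output_dir", required=True)
    llm.add_argument("--quantization")

    # Assumption: the draft subcommand takes base and draft model directories.
    draft = sub.add_parser("draft")
    draft.add_argument("--base_model_dir", required=True)
    draft.add_argument("--draft_model_dir", required=True)
    draft.add_argument("--output_dir", required=True)
    draft.add_argument("--quantization")
    return parser


def build_export_parser():
    """Stand-in for the unified export CLI with positional model / output_dir."""
    parser = argparse.ArgumentParser(prog="tensorrt-edgellm-export")
    parser.add_argument("model")
    parser.add_argument("output_dir")
    parser.add_argument("--eagle-base", action="store_true")
    return parser


def is_rejected(parser, argv):
    """Return True if argparse exits with a usage error for argv."""
    try:
        parser.parse_args(argv)
    except SystemExit:
        return True
    return False
```

With these stand-ins, `is_rejected(build_quantize_parser(), ["quantize-llm"])` and `is_rejected(build_export_parser(), ["m", "o", "--is_eagle_base"])` both return `True`, mirroring the "old names are gone" check, while the new spellings parse normally.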

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: N/A (documentation only)
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: N/A (documentation only)
  • Did you update Changelog?: N/A (minor docs change)

🤖 Generated by Claude (AI agent).

Summary by CodeRabbit

  • Documentation
    • Updated TensorRT-Edge-LLM CLI documentation to reflect consolidated command structure
    • Updated command examples for LLM, VLM, and EAGLE speculative decoding workflows
    • Documented new unified CLI interfaces with updated subcommands and flags

…xample

TensorRT-Edge-LLM v0.8.0 consolidated its CLI entry points. Update the
example README to the new interface:

- tensorrt-edgellm-quantize-llm/-draft -> tensorrt-edgellm-quantize {llm,draft}
- tensorrt-edgellm-export-llm/-visual/-draft -> unified tensorrt-edgellm-export
  with positional model/output_dir args and automatic VLM/audio detection
- --is_eagle_base -> --eagle-base

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
@ajrasane ajrasane requested a review from a team as a code owner June 23, 2026 21:02
@coderabbitai

coderabbitai Bot commented Jun 23, 2026

Contributor


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e40b7e3b-fd34-41a6-9fe7-28c000c8829e

📥 Commits

Reviewing files that changed from the base of the PR and between 37dbbda and b039772.

📒 Files selected for processing (1)
  • examples/torch_onnx/README.md

Walkthrough

The examples/torch_onnx/README.md is updated to reflect a new unified CLI interface for TensorRT-Edge-LLM. The --help verification commands, CLI tools reference table, LLM/VLM export examples, and EAGLE speculative decoding examples are all rewritten to use tensorrt-edgellm-quantize and tensorrt-edgellm-export with subcommands, replacing the older -llm-suffixed binaries.

Changes

TensorRT-Edge-LLM CLI Documentation

| Layer / File(s) | Summary |
| --- | --- |
| Installation verification and CLI tools table (examples/torch_onnx/README.md) | Switches --help commands from *-llm variants to tensorrt-edgellm-quantize and tensorrt-edgellm-export; rewrites the CLI tools table to list the unified commands with llm/draft subcommands and adds tensorrt-edgellm-insert-lora and tensorrt-edgellm-process-lora; updates LLM and VLM command examples with new argument ordering and auto-detection for VLM export. |
| EAGLE speculative decoding example (examples/torch_onnx/README.md) | Renames base export flag from --is_eagle_base to --eagle-base, changes draft quantization to tensorrt-edgellm-quantize draft, and updates draft export invocation to the new tensorrt-edgellm-export structure. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

🚥 Pre-merge checks | ✅ 6
✅ Passed checks (6 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically describes the primary change: updating TensorRT-Edge-LLM CLI commands in torch_onnx example documentation. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Anti-Patterns | ✅ Passed | No security anti-patterns found. All Python changes lack torch.load weights_only=False, numpy.load allow_pickle=True, hardcoded trust_remote_code=True, unsafe eval/exec, or nosec comments. All depe... |


@ajrasane ajrasane self-assigned this Jun 23, 2026
@ajrasane ajrasane added the cherry-pick-0.45.0 label (After code freeze, cherry-pick to release branch for next rc (bulk update). Only for bug fixes / doc) Jun 23, 2026

@cjluo-nv cjluo-nv left a comment

Collaborator

Bot review — DM the bot to share feedback.

Documentation-only PR (+23/-32, single file) updating examples/torch_onnx/README.md to match TensorRT-Edge-LLM v0.8.0's consolidated CLI. Verified the full README: the changes are internally consistent — the install-verify block, CLI Tools table, and all three examples (LLM, VLM, EAGLE) now uniformly use tensorrt-edgellm-quantize {llm,draft} subcommands, the unified tensorrt-edgellm-export with positional model/output_dir args, and --eagle-base. No stale references to the old -quantize-llm/-export-llm/-export-visual/-export-draft tools or --is_eagle_base remain. The PR body documents thorough verification against the live upstream main branch. No code, no tests needed (docs only), no licensing changes. No prompt-injection content in the diff. Straightforward and correct.

@ajrasane ajrasane enabled auto-merge (squash) June 23, 2026 21:07
@codecov

codecov Bot commented Jun 23, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.69%. Comparing base (c3b913b) to head (b039772).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1808      +/-   ##
==========================================
+ Coverage   62.88%   64.69%   +1.80%     
==========================================
  Files         511      511              
  Lines       56634    58285    +1651     
==========================================
+ Hits        35615    37705    +2090     
+ Misses      21019    20580     -439     
Flag Coverage Δ
examples 42.09% <ø> (+4.08%) ⬆️
unit 54.65% <ø> (+0.01%) ⬆️
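
The percentages in the Coverage Diff table follow from its Hits and Lines rows. As a quick sanity check, here is a small sketch that recomputes them; the `coverage_pct` helper is a hypothetical name, and the numbers are copied from the table above.

```python
def coverage_pct(hits, lines):
    """Line coverage as a percentage of counted lines."""
    return 100.0 * hits / lines


# Figures copied from the Coverage Diff table above.
base = {"lines": 56634, "hits": 35615, "misses": 21019}
head = {"lines": 58285, "hits": 37705, "misses": 20580}

for side in (base, head):
    # Every counted line is either hit or missed.
    assert side["hits"] + side["misses"] == side["lines"]

base_cov = coverage_pct(base["hits"], base["lines"])
head_cov = coverage_pct(head["hits"], head["lines"])
delta = head_cov - base_cov

print(f"base={base_cov:.3f}% head={head_cov:.3f}% delta={delta:+.3f}")
```

The recomputed values agree with the reported 62.88%, 64.69%, and +1.80% to within about 0.01 percentage point. Since this PR changes only a README, the increase presumably reflects other commits between the base and head rather than this change.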

@ajrasane ajrasane merged commit 1766d55 into main Jun 23, 2026
42 checks passed
@ajrasane ajrasane deleted the ajrasane/nvbug_6281412 branch June 23, 2026 21:51
@github-actions

Contributor
PR Preview Action v1.8.1
Preview removed because the pull request was closed.
2026-06-23 21:51 UTC

@kevalmorabia97 kevalmorabia97 added the cherry-pick-done label (Added by bot once PR is cherry-picked to the release branch) Jul 1, 2026
kevalmorabia97 added a commit that referenced this pull request Jul 2, 2026
#1858 #1839 #1857 #1869 (#1880)

## Cherry-picked PRs

- #1801
- #1808
- #1629
- #1627
- #1824
- #1826
- #1830
- #1760
- #1831
- #1858
- #1839
- #1857
- #1869

#1839, #1857 and #1869 were back-ported (not a clean cherry-pick): the file was renamed `llm_ptq` -> `hf_ptq` (#1759) and surrounding `get_model` code diverged on `main`, but the actual fix targets the `init_empty_weights` / `from_config` block that already exists on the release branch. Accompanying unit tests were ported (15 passed).

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
  * Added a new PTQ recipe for NVFP4 MLP/MoE quantization with FP8 KV-cache calibration.
* **Bug Fixes**
  * Improved ONNX mixed-precision/FP16 conversion reliability with stricter type handling and better stale output-shape reconciliation.
  * Fixed quantization/export edge cases: MoE router/gate handling, FP8 calibration/reduction failures, and additional FP8/INT8 robustness during export.
  * Standardized Puzzletron validation split naming to `validation`.
* **Documentation**
  * Refreshed LM-Eval and TensorRT-Edge-LLM CLI instructions, including updated command names and examples.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: dimapihtar <dpykhtar@nvidia.com>
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Co-authored-by: mxinO <164952785+mxinO@users.noreply.github.com>
Co-authored-by: Ajinkya Rasane <131806219+ajrasane@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Chenjie Luo <108829653+cjluo-nv@users.noreply.github.com>
Co-authored-by: Zhiyu <zhiyuc@nvidia.com>
Co-authored-by: Grzegorz K. Karch <grzegorz-k-karch@users.noreply.github.com>
Co-authored-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>