[6281412] docs: update TensorRT-Edge-LLM CLI commands in torch_onnx example #1808
TensorRT-Edge-LLM v0.8.0 consolidated its CLI entry points. Update the
example README to the new interface:
- tensorrt-edgellm-quantize-llm/-draft -> tensorrt-edgellm-quantize {llm,draft}
- tensorrt-edgellm-export-llm/-visual/-draft -> unified tensorrt-edgellm-export
with positional model/output_dir args and automatic VLM/audio detection
- --is_eagle_base -> --eagle-base
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
CodeRabbit: No actionable comments were generated in the recent review. 🎉
Recent review info: Configuration used: Path: .coderabbit.yaml; Review profile: CHILL; Plan: Enterprise. Files selected for processing: 1.
Walkthrough (changes): TensorRT-Edge-LLM CLI Documentation
Estimated code review effort: 1 (Trivial), ~5 minutes. Pre-merge checks: 6 passed.
cjluo-nv left a comment:
Bot review — DM the bot to share feedback.
Documentation-only PR (+23/-32, single file) updating examples/torch_onnx/README.md to match TensorRT-Edge-LLM v0.8.0's consolidated CLI. Verified the full README: the changes are internally consistent — the install-verify block, CLI Tools table, and all three examples (LLM, VLM, EAGLE) now uniformly use tensorrt-edgellm-quantize {llm,draft} subcommands, the unified tensorrt-edgellm-export with positional model/output_dir args, and --eagle-base. No stale references to the old -quantize-llm/-export-llm/-export-visual/-export-draft tools or --is_eagle_base remain. The PR body documents thorough verification against the live upstream main branch. No code, no tests needed (docs only), no licensing changes. No prompt-injection content in the diff. Straightforward and correct.
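The "no stale references" check described in this review could be reproduced with a small scan. The following is a minimal sketch, not the reviewer's actual method: the old tool and flag names come from the PR description, while `STALE_PATTERNS` and `find_stale_references` are hypothetical names introduced only for illustration.

```python
import re

# Old CLI names as listed in the PR description. The helper itself is
# hypothetical and not part of TensorRT-Edge-LLM or this repository.
STALE_PATTERNS = [
    r"tensorrt-edgellm-quantize-(?:llm|draft)\b",
    r"tensorrt-edgellm-export-(?:llm|visual|draft)\b",
    r"--is_eagle_base\b",
]
STALE_RE = re.compile("|".join(STALE_PATTERNS))


def find_stale_references(text):
    """Return (line_number, matched_text) pairs for every old CLI name found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in STALE_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

Running it over the contents of `examples/torch_onnx/README.md` should return an empty list if the review's claim holds. The new forms (`tensorrt-edgellm-quantize llm`, `tensorrt-edgellm-export`, `--eagle-base`) are not matched because the patterns require the old hyphenated suffixes.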
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1808      +/-   ##
==========================================
+ Coverage   62.88%   64.69%   +1.80%
==========================================
  Files         511      511
  Lines       56634    58285    +1651
==========================================
+ Hits        35615    37705    +2090
+ Misses      21019    20580     -439
#1858 #1839 #1857 #1869 (#1880)

## Cherry-picked PRs

- #1801
- #1808
- #1629
- #1627
- #1824
- #1826
- #1830
- #1760
- #1831
- #1858
- #1839
- #1857
- #1869

#1839, #1857 and #1869 were back-ported (not a clean cherry-pick): the file was renamed `llm_ptq` -> `hf_ptq` (#1759) and surrounding `get_model` code diverged on `main`, but the actual fix targets the `init_empty_weights` / `from_config` block that already exists on the release branch. Accompanying unit tests were ported (15 passed).

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **New Features**
  * Added a new PTQ recipe for NVFP4 MLP/MoE quantization with FP8 KV-cache calibration.
* **Bug Fixes**
  * Improved ONNX mixed-precision/FP16 conversion reliability with stricter type handling and better stale output-shape reconciliation.
  * Fixed quantization/export edge cases: MoE router/gate handling, FP8 calibration/reduction failures, and additional FP8/INT8 robustness during export.
  * Standardized Puzzletron validation split naming to `validation`.
* **Documentation**
  * Refreshed LM-Eval and TensorRT-Edge-LLM CLI instructions, including updated command names and examples.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Meng Xin <mxin@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: dimapihtar <dpykhtar@nvidia.com>
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Co-authored-by: mxinO <164952785+mxinO@users.noreply.github.com>
Co-authored-by: Ajinkya Rasane <131806219+ajrasane@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: Chenjie Luo <108829653+cjluo-nv@users.noreply.github.com>
Co-authored-by: Zhiyu <zhiyuc@nvidia.com>
Co-authored-by: Grzegorz K. Karch <grzegorz-k-karch@users.noreply.github.com>
Co-authored-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
What does this PR do?
Type of change: documentation
TensorRT-Edge-LLM v0.8.0 consolidated its CLI entry points, leaving the example commands in `examples/torch_onnx/README.md` referencing tools that no longer exist (e.g. `tensorrt-edgellm-export-visual`). This updates the README to the current interface:
- `tensorrt-edgellm-quantize-llm` / `tensorrt-edgellm-quantize-draft` → `tensorrt-edgellm-quantize {llm,draft}` (subcommands)
- `tensorrt-edgellm-export-llm` / `-export-visual` / `-export-draft` → unified `tensorrt-edgellm-export` with positional `model` / `output_dir` args and automatic VLM/audio component detection
- `--is_eagle_base` → `--eagle-base`
Usage
N/A — documentation change.
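For readers migrating their own scripts, the renames listed under "What does this PR do?" can be written as a lookup table. This is a minimal sketch under stated assumptions: `ENTRY_POINT_RENAMES`, `FLAG_RENAMES`, and `migrate_command` are hypothetical names that exist in neither project, and the sketch only renames entry points and the one flag named in this PR. It does not attempt to convert any old export options into the new positional `model` / `output_dir` arguments, since the old option names are not given here.

```python
# Hypothetical migration helper built only from the renames stated in this PR.
# It is not part of TensorRT-Edge-LLM or this repository.
ENTRY_POINT_RENAMES = {
    "tensorrt-edgellm-quantize-llm": ["tensorrt-edgellm-quantize", "llm"],
    "tensorrt-edgellm-quantize-draft": ["tensorrt-edgellm-quantize", "draft"],
    "tensorrt-edgellm-export-llm": ["tensorrt-edgellm-export"],
    "tensorrt-edgellm-export-visual": ["tensorrt-edgellm-export"],
    "tensorrt-edgellm-export-draft": ["tensorrt-edgellm-export"],
}
FLAG_RENAMES = {"--is_eagle_base": "--eagle-base"}


def migrate_command(argv):
    """Rewrite an old-style argv list to the consolidated entry points.

    Unknown commands and arguments pass through unchanged.
    """
    if not argv:
        return []
    head, *rest = argv
    new_head = ENTRY_POINT_RENAMES.get(head, [head])
    return new_head + [FLAG_RENAMES.get(arg, arg) for arg in rest]
```

For example, `migrate_command(["tensorrt-edgellm-quantize-draft"])` yields `["tensorrt-edgellm-quantize", "draft"]`. Because the three old export tools collapse into a single command, the mapping is lossy in that direction, which is consistent with the PR's note that VLM/audio components are now detected automatically.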
Testing
Verified against the live `main` branch of TensorRT-Edge-LLM by running the actual entry-point code (`python -m tensorrt_edgellm.scripts.quantize/export`):
- `--help` runs cleanly for `quantize`, `quantize llm`, `quantize draft`, and `export`; all documented flags (`--model_dir`, `--output_dir`, `--quantization`, `--base_model_dir`, `--draft_model_dir`, positional `model` / `output_dir`, `--eagle-base`) are present.
- `quantize-llm` subcommand rejected, `--is_eagle_base` rejected, `scripts.export_visual` module not found.
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).
- `CONTRIBUTING.md`: N/A
Summary by CodeRabbit