[Cherry-pick] PRs #1393 #1389 #1268 #1397 #1402 #1411 #1410 #1419 #1408 #1416 (#1426)
kevalmorabia97 merged 11 commits into release/0.44.0 from
Conversation
…compat (#1393)

Fix for NVBug 6120631:

```
finetune.py: error: argument --warmup_steps/--warmup-steps: invalid int value: '0.0'
```

## Summary by CodeRabbit

* **Chores**
  * Corrected parameter format in finetuning example script for consistency.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
When the dataset has a `category` column with >1 distinct categories and ``--num_requests N`` is below the dataset size, take ceil(N / num_categories) rows from each category and round-robin interleave them so any prefix is balanced. Falls back to the existing ``range(N)`` slice when category metadata is absent or there's only one category.

Fixes a sampling bug where SPEED-Bench parquet files are sorted by category, so e.g. ``--num_requests 64`` on throughput_8k pulls 64 high_entropy prompts and zero from low_entropy / mixed.

## Summary by CodeRabbit

* **New Features**
  * SPEEDBench dataset truncation now performs deterministic, balanced stratified sampling across categories (round-robin) when multiple categories exist and a smaller sample size is requested.
  * Falls back to simple prefix selection when no category or only one category is present, preserving prior behavior in that case.

---------

Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…1268)

### What does this PR do?

**Type of change:** code improvement

Subprocess is needed to run `trtexec` commands but those require `nosec` approval. This PR centralizes it into a single function for the ONNX workflow to avoid future approval triggers. The torch workflow has that centralized in `run_command` (renamed to `_run_trtexec_streamed`).

Original request: #1259 (comment)

### Usage

```python
results = _run_trtexec(cmd)
```

### Testing

N/A

### Before your PR is "*Ready for review*"

- Is this change backward compatible?: ✅
- If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`: N/A
- Did you write any new necessary tests?: N/A
- Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: N/A

## Summary by CodeRabbit

* **Refactor**
  * Standardized how the app invokes and logs the external conversion/benchmarking tool across build, profile, and benchmark flows for consistent execution and clearer logs.
* **Behavior Change**
  * The tool is now invoked by name from PATH (configurable path option removed); execution and logging were centralized.
* **Bug Fixes**
  * Improved missing-tool error messaging: failures now instruct to ensure the external tool is available in PATH.

---------

Signed-off-by: gcunhase <4861122+gcunhase@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
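For context, a minimal sketch of what a single centralized runner can look like. The function name, signature, and error handling here are illustrative assumptions, not the actual `_run_trtexec` implementation:

```python
# Hypothetical sketch of a single, centralized trtexec runner; the real
# _run_trtexec signature and behavior in ModelOpt may differ.
import shutil
import subprocess


def run_trtexec(args: list[str], timeout: int = 3600) -> subprocess.CompletedProcess:
    """Invoke trtexec by name from PATH with an argv list (no shell) and capture its output."""
    if shutil.which("trtexec") is None:
        raise FileNotFoundError("trtexec not found in PATH; ensure TensorRT is installed")
    cmd = ["trtexec", *args]
    # A single subprocess call site means security review happens in one place.
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
```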
## Summary

- Saturates `per_block_scale * 448 / per_block_scale_max` to ≤ 448 before the `to(torch.float8_e4m3fn)` cast in `NVFP4QTensor.get_weights_scaling_factor_from_quantizer`.
- Adds a regression test that reproduces the NaN byte without the clamp.

## Why

When `_amax` contains a zero entry (e.g. an all-zero weight block left untouched by max calibration), the existing `per_block_scale[per_block_scale == 0] = 1.0` safety net drives the pre-cast value to `1.0 * 448 / (global_amax / 6)`. `fp8_e4m3fn` has no Inf — anything `≥ 480` rounds to NaN — so a 0x7F byte slips into the exported `weight_scale`. This was observed in a saved Kimi-K2.6-NVFP4-MSE checkpoint at `language_model.model.layers.1.mlp.experts.21.down_proj.weight_scale[4001, 18]`.

The MSE FP8 sweep itself never produces zero per-block amax (it always emits at least `c[0] * global_amax`), but any export path where `_amax` ends up zero — including pure max calibration — hits the bug.

With the clamp the byte saturates to `0x7E` (= 448, fp8 max finite) and dequantization is unaffected: the FP4 nibbles for an all-zero block are all 0, so `0 × 448 × weight_scale_2 = 0` regardless of the stored fp8 scale. For non-degenerate blocks the clamp is a no-op since `per_block_amax ≤ global_amax` already bounds the pre-cast value at 448.

## Test plan

- [x] New regression test `test_export_fp8_scale_no_nan_for_zero_amax_block` fails on `main`'s export code (reproduces the 0x7F NaN byte) and passes with the clamp.
- [x] Existing tests in `tests/gpu/torch/quantization/test_nvfp4_static_quantizer_cuda.py` still pass (10/10).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

## Summary by CodeRabbit

* **Bug Fixes**
  * Improved numerical stability in FP8 quantization scaling by preventing overflow and NaN conditions
  * Enhanced handling of edge cases in quantization processing for zero-weight blocks

Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
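For illustration, a minimal sketch of the clamp idea. The tensor names and helper function are hypothetical and simplified relative to the actual `NVFP4QTensor` code:

```python
# Illustrative sketch of clamping per-block FP8 scales before the float8_e4m3fn
# cast; names are hypothetical, not the actual NVFP4QTensor implementation.
import torch


def fp8_block_scales(per_block_amax: torch.Tensor, global_amax: torch.Tensor) -> torch.Tensor:
    per_block_scale = per_block_amax / 6.0
    per_block_scale[per_block_scale == 0] = 1.0  # existing zero-amax safety net
    global_scale = global_amax / (6.0 * 448.0)
    scaled = per_block_scale / global_scale  # == per_block_scale * 448 / per_block_scale_max
    # fp8_e4m3fn has no Inf: values >= 480 round to NaN, so saturate at the max finite value.
    return scaled.clamp(max=448.0).to(torch.float8_e4m3fn)
```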
…tom configs (#1402)

The export step's collect_shared_input_modules() runs a dummy forward pass to trace shared input modules for resmoothing/requantization. This forward call hits the same AttributeError as PR #1324 on configs that don't assign use_cache (e.g., stepfun-ai/Step-3.5-Flash's Step3p5Config): `AttributeError: 'Step3p5Config' object has no attribute 'use_cache'`

Wrap the dummy forward with the _disable_use_cache context manager so it sets/restores config.use_cache around the call, mirroring the calibration fix from #1324. Covers all callsites since both unified_export_hf and plugins/vllm_fakequant_hf funnel through collect_shared_input_modules.

## Summary by CodeRabbit

* **Bug Fixes**
  * Improved model export process by optimizing cache handling during profiling operations.

Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
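For illustration, a minimal sketch of the set/restore pattern described above; the actual `_disable_use_cache` helper may be implemented differently:

```python
# Illustrative sketch of a set/restore context manager for config.use_cache;
# the real _disable_use_cache helper in ModelOpt may differ in details.
from contextlib import contextmanager


@contextmanager
def disable_use_cache(config):
    had_attr = hasattr(config, "use_cache")
    original = getattr(config, "use_cache", None)
    config.use_cache = False  # make sure the attribute exists for the dummy forward pass
    try:
        yield
    finally:
        if had_attr:
            config.use_cache = original  # restore the original value
        else:
            del config.use_cache  # remove the attribute we added
```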
…le_2 consistent (#1411)

## Summary

- Pre-fill the source `gate_up_proj_weight_quantizers[idx]._amax` from the **fused** `gate_up[idx]` tensor when it is uncalibrated, **before** the per-projection deepcopy in `_export_fused_experts`. Both gate's clone and up's clone then inherit the same scalar amax, so `weight_scale_2 = amax / (6 · 448)` matches across the W1/W3 fusion that vLLM expects at load time.
- Add a unit regression test that builds a fused-experts module with intentionally mismatched gate vs. up weight magnitudes, leaves every expert uncalibrated, and asserts the gate and up wrappers carry the same amax into the FP4 quantization step. Fails on `main`, passes with this fix.

## Why

NVbug 6142360: Qwen3.5-MoE / Qwen3-Next NVFP4 checkpoints produced by ModelOpt yielded garbled output under vLLM (`1\n1\n1\n…` style degenerate token loops). The vLLM log showed `w1_weight_scale_2 must match w3_weight_scale_2. Accuracy may be affected.`.

Root cause: in `_export_fused_experts`, when an expert receives no calibration tokens (common for low-frequency experts even with `--calib_size 512` on 128-expert MoEs), the per-projection fallback computed `amax` independently from each split slice:

```python
w_quantizer.amax = weight_slice.abs().amax().to(torch.float32)
```

`weight_slice` is the gate-only or up-only 2-D half of the fused `gate_up_proj`. Since gate and up have different magnitudes, gate and up end up with different scalar amax values — and therefore different `weight_scale_2`. vLLM fuses W1 and W3 into a single weight at load time and asserts a single shared scale; the half whose scale was discarded is now off by the gate/up magnitude ratio (~10× was typical), which catastrophically corrupts the MoE output.

The fix derives the fallback amax once from the whole `gate_up[idx]` tensor before the deepcopies, so gate and up share the same amax — exactly what calibration would have produced if any token had hit the expert. Calibrated experts are unaffected (the new code path is gated on `_amax` being missing or zero). `down_proj` keeps its existing per-projection fallback because it has its own quantizer with no fusion partner.

## End-to-end vLLM verification

Used `vllm/vllm-openai:nightly` Docker on RTX 6000 Ada, Marlin NVFP4 backend, with a real Qwen3-30B-A3B-Instruct NVFP4 checkpoint. The bug requires the export-time scale skew, so I A/B-tested by mutating `gate_proj.weight_scale_2` directly:

| Variant | gate vs up `weight_scale_2` | Output for "Write an article about AI." |
|---|---|---|
| Baseline (original checkpoint) | matched (calibrated) | coherent: "The Rise of Artificial Intelligence: Transforming the World One Algorithm at a Time…" |
| Bug-simulated (gate = up / 10 for all 6143 expert pairs) | mismatched ~10× | `__':\n__':\n__':\n…` (degenerate loop, same shape as user's `1\n1\n1\n…`) |
| Fix-simulated (gate restored to up) | matched | coherent: same article as baseline |

The 10× skew matches what the bug actually emits for uncalibrated experts (e.g. `gate=2.77e-5, up=2.30e-4` from a synthetic repro), and reproduces the same garbled-loop pattern the user reported.

## Test plan

- [x] New unit test `tests/unit/torch/quantization/plugins/test_fused_experts.py::TestExportFusedExperts::test_uncalibrated_expert_gate_up_share_amax` passes.
- [x] Full `tests/unit/torch/quantization/plugins/test_fused_experts.py`: 28/28 pass.
- [x] Broader `tests/unit/torch/export/` + `tests/unit/torch/quantization/`: 609 passed, 9 skipped.
- [x] vLLM end-to-end A/B (table above): bug reproduces with mismatched scales, fix produces coherent output identical to baseline.

## Summary by CodeRabbit

* **Bug Fixes**
  * Enhanced handling of uncalibrated mixture-of-experts weight quantizers during export with improved consistency checks and informative warnings.
* **Tests**
  * Added regression test for uncalibrated expert quantization behavior.

Signed-off-by: weimingc <weimingc@nvidia.com>
Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
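For illustration, a minimal sketch of the shared-amax fallback; the names and surrounding flow are hypothetical, not the actual `_export_fused_experts` code:

```python
# Illustrative sketch: derive one fallback amax from the fused gate_up tensor so
# the gate and up clones share weight_scale_2. Names are hypothetical.
import torch


def prefill_fused_amax(gate_up_weight: torch.Tensor, quantizer) -> None:
    """Give an uncalibrated expert a single scalar amax derived from the fused gate_up tensor."""
    amax = getattr(quantizer, "_amax", None)
    if amax is None or not torch.any(amax):
        quantizer.amax = gate_up_weight.abs().amax().to(torch.float32)
    # Per-projection deepcopies made afterwards (gate and up) inherit the same scalar amax,
    # so weight_scale_2 = amax / (6 * 448) matches for the W1/W3 pair that vLLM fuses.
```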
…zer disabled (#1410)

## Summary

`awq_lite.setup()` disables `module.input_quantizer` at the start of search. The calibrated branch re-enables it inside `postprocess()`, but the uncalibrated branch (no cache-pass tokens, e.g. an MoE expert that never gets routed) never did. Worse, for experts that had cache hits but missed the search pass, the per-channel `_amax` left over from `max_calibrate` during cache mode tripped `preprocess_linear_fusion`'s `numel == 1` assertion and prevented `_export_quantized_weight` from emitting a per-tensor `input_scale`.

Result: per-expert `input_scale` was missing in the exported HF checkpoint, and TRT-LLM `CutlassFusedMoE` crashed on load with `KeyError: '<idx>.w1.input_scale'` for any expert that did not see enough calibration tokens (e.g. Qwen3-30B-A3B + `nvfp4_awq` from the bug report).

## Fix

Mirror the calibrated `postprocess()` path in `modelopt/torch/quantization/model_calib.py`: collapse any per-channel `_amax` to scalar (axis=None) and re-enable the `input_quantizer`.

## Test plan

- [x] Added regression test `test_awq_lite_uncalibrated_linear_keeps_input_quantizer_enabled` using `NVFP4_AWQ_LITE_CFG` with a two-branch model where only one branch is exercised; verifies the uncalibrated linear's `input_quantizer` remains enabled after `mtq.quantize`.
- [x] End-to-end pipeline test on a tiny synthetic Qwen3-MoE (8 experts, top-1 routing) confirms all 48 expert `input_scale` keys are present in the exported state_dict (vs. multiple missing pre-fix).
- [x] Manual repro of the bug command (Qwen3-30B-A3B, `--qformat nvfp4_awq`) confirmed the missing `input_scale` keys before the fix.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

## Summary by CodeRabbit

* **Bug Fixes**
  * Fixed AWQ-Lite quantization for uncalibrated modules so export/preprocessing invariants are preserved even when calibration/parameter updates are skipped.
* **Tests**
  * Added regression tests: one verifies uncalibrated linear modules keep their input quantizer enabled after quantization; another verifies weight-only AWQ-Lite cases keep the input quantizer disabled as expected.

---------

Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
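For illustration, a minimal sketch of the uncalibrated-branch cleanup described above; the real `postprocess()` code and quantizer API in ModelOpt may differ:

```python
# Illustrative sketch: collapse a leftover per-channel amax to a per-tensor scalar
# and re-enable the input quantizer, mirroring the calibrated postprocess() path.
def finalize_uncalibrated_input_quantizer(module) -> None:
    quantizer = module.input_quantizer
    amax = getattr(quantizer, "_amax", None)
    if amax is not None and amax.numel() > 1:
        # Per-channel amax left over from cache-mode max calibration would trip the
        # numel == 1 assertion downstream, so reduce it to a single value.
        quantizer.amax = amax.max()
    quantizer.enable()  # the calibrated branch re-enables the input quantizer; do the same here
```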
## Summary by CodeRabbit

* **Bug Fixes**
  * Updated compatibility handling to support Transformers 5.0 while maintaining backward compatibility with earlier versions.

Signed-off-by: Suguna Velury <178320438+sugunav14@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do?

**Type of change:** Bug fix (backward-compat restoration), follow-up to #1356.

#1356 restored `modelopt.onnx.llm_export_utils` as a deprecation shim so that TensorRT-Edge-LLM 0.6.1 could resume importing `fold_fp8_qdq_to_dq`. QA reports that the next line of `tensorrt_edgellm/onnx_export/onnx_utils.py` still aborts at module load:

```python
from modelopt.onnx.llm_export_utils.surgeon_utils import fold_fp8_qdq_to_dq  # OK
from modelopt.onnx.quantization.qdq_utils import (fp4qdq_to_2dq, quantize_weights_to_int4, quantize_weights_to_mxfp8)
# ImportError: cannot import name 'fp4qdq_to_2dq' from 'modelopt.onnx.quantization.qdq_utils'
```

Those three symbols were removed when the LLM ONNX export pipeline was refactored into the staged `modelopt.onnx.export` exporters (`INT4QuantExporter`, `NVFP4QuantExporter`, `MXFP8QuantExporter`). The new exporters still implement the same conversions, but edgellm 0.6.1 imports the old top-level functions unconditionally, so every `tensorrt-edgellm-*` CLI fails immediately — same 5/5 (100%) failure rate as before, just one line further in.

## What this PR does

- Restores `fp4qdq_to_2dq`, `quantize_weights_to_int4`, and `quantize_weights_to_mxfp8` on `modelopt.onnx.quantization.qdq_utils` as deprecation shims that:
  - emit a `DeprecationWarning` pointing readers at the new `modelopt.onnx.export` exporters and at TensorRT-Edge-LLM as the long-term migration target,
  - reuse the existing in-tree helpers (`_cast_fp8` here, `_cast_fp4` and `_replace_fp4qdq_with_2dq` from `modelopt.onnx.export.nvfp4_exporter` via a lazy import to avoid a circular import — `nvfp4_exporter` already imports `onnx_dtype_map` from this module),
  - reuse `pack_weights_to_int4`, `get_amax`, `compute_e8m0`, `get_weights_scaling_factor[_2]`, `quantize`, `get_attribute`, `has_attribute`, and `read_f16_tensor_as_fp32` from their canonical locations rather than duplicating them.
- Adds the matching imports at the top of `qdq_utils.py`.

No new files, no public API removals, no behavior changes for any existing caller.

I also audited every other `modelopt` import edgellm 0.6.1 makes (`modelopt.torch.quantization`, `modelopt.torch.opt`, `modelopt.torch.export.quant_utils`, `modelopt.torch.opt.plugins.huggingface`, `modelopt.onnx.quantization.gs_patching`, `modelopt.torch.quantization.nn.{QuantLinear,TensorQuantizer}`, `modelopt.torch.quantization.utils.{is_quantized,is_quantized_linear}`) and confirmed each still resolves on current `main`; only the three `qdq_utils` symbols above were missing.

## Testing

- All four edgellm-side import lines from the failing traceback now succeed against this branch (verified by importing them from the v0.6.1 source tree).
- `tests/unit/onnx/quantization/test_qdq_utils.py` + `tests/unit/onnx/test_onnx_utils.py`: 36 passed in 5s.
- Each restored shim emits `DeprecationWarning` when called and runs to completion on an empty `ModelProto` smoke input.
- `pre-commit run --files modelopt/onnx/quantization/qdq_utils.py`: ruff, ruff-format, mypy, bandit, license header all pass.

## Notes for reviewers

- The shim functions are intentionally near-verbatim copies of the pre-removal implementations rather than thin adapters over the new exporters: the new exporters split work across `compute_scales` / `compress_weights` / `post_process` stages, while edgellm calls the legacy single-shot entry points and depends on side effects (e.g. casting bias to fp16 inside `quantize_weights_to_int4`) that the staged exporters do not all reproduce in one call. Wrapping them up was higher-risk than restoring the legacy paths verbatim with a deprecation warning.
- Removal target: same as the existing `llm_export_utils` shim — drop together once edgellm releases a version that targets `modelopt.onnx.export` directly.

## Before your PR is "*Ready for review*"

- [x] Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md) and your commits are signed.
- [x] Is this change backward compatible?: **Yes** — purely additive; no existing API changes.
- [x] Did you write any new necessary tests?: relied on existing qdq_utils unit tests + a manual repro of the failing edgellm import path.
- [x] Did you add or update any necessary documentation?: deprecation message in each shim points to the migration target.
- [x] Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: not applicable for a compat shim.

## Summary by CodeRabbit

* **Chores**
  * Added deprecated legacy quantization conversion utilities to preserve compatibility; invoking them now emits deprecation warnings.
* **Documentation**
  * Updated helper docstrings to note compatibility/shim usage.
* **Tests**
  * Added end-to-end smoke tests verifying the deprecated shims emit warnings and produce expected transformed/quantized models.

---------

Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
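For illustration, a minimal sketch of the deprecation-shim pattern (warning plus lazy import); the function name and body are hypothetical, not the restored `qdq_utils` code:

```python
# Hypothetical sketch of a deprecation shim with a lazy import to avoid a
# circular dependency; the actual restored functions differ in signature and body.
import warnings


def legacy_entry_point(onnx_model):
    """Legacy single-shot entry point kept so older TensorRT-Edge-LLM releases keep importing."""
    warnings.warn(
        "This function is deprecated; use the exporters in modelopt.onnx.export instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    # Lazy import avoids a circular dependency: the exporter module itself imports
    # symbols from qdq_utils at import time.
    from modelopt.onnx.export import nvfp4_exporter  # module path taken from the PR text

    # ... delegate to the in-tree helpers and return the transformed model ...
    return onnx_model
```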
No actionable comments were generated in the recent review. 🎉
✅ Files skipped from review due to trivial changes (1)
📝 Walkthrough

This PR consolidates TensorRT subprocess execution into shared helpers, adds legacy ONNX quantization shims for backwards compatibility, fixes FP8 scale overflow handling, improves quantizer state consistency for fused experts, extends model export compatibility for newer dependency versions, and updates evaluation examples to integrate with lm-eval 0.4.10+.

Changes

- TensorRT Subprocess Execution Refactoring
- Legacy ONNX Quantization Deprecation Shims
- Quantizer State and FP8 Scale Handling
- Model Export and Dependency Compatibility
- LLM Evaluation Framework Integration
- Example Scripts and Dataset Utilities

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested reviewers
Important

Pre-merge checks failed. Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)
✅ Passed checks (4 passed)
…1416)

### What does this PR do?

**Type of change:** Bug fix

Related: NVbug 6153721. To make `lm_eval_hf.py` and the `vllm_causallms` adapter compatible with vLLM 0.20+, **lm_eval must be upgraded to 0.4.11**.

`examples/llm_eval/lm_eval_hf.py` failed to import on lm-eval >= 0.4.10:

```
ImportError: cannot import name 'parse_eval_args' from 'lm_eval.__main__'
```

lm-eval 0.4.10 replaced `lm_eval.__main__.{setup_parser, parse_eval_args, cli_evaluate}` with a `HarnessCLI`-based interface in `lm_eval._cli`. This PR drops the legacy code path and drives `HarnessCLI` directly:

- Attach the ModelOpt arguments (`--quant_cfg`, `--auto_quantize_*`, `--calib_*`, `--compress`, `--sparse_cfg`) to the new `run` subparser.
- After parsing, move those keys out of the argparse namespace and into `args.model_args` (now a dict, courtesy of `MergeDictAction`), so `EvaluatorConfig.from_cli` doesn't reject them as unknown kwargs and so they reach our `HFLM.create_from_arg_obj` override.
- Hard-require lm-eval >= 0.4.10 via `packaging.version.Version` and bump `examples/llm_eval/requirements.txt` and `examples/puzzletron/requirements.txt` accordingly.

### Usage

```bash
# Same CLI as before — HarnessCLI auto-inserts the `run` subcommand for legacy-style invocations.
python examples/llm_eval/lm_eval_hf.py \
    --model hf \
    --model_args pretrained=<HF model> \
    --tasks hellaswag \
    --quant_cfg FP8_DEFAULT_CFG \
    --batch_size 4
```

### Testing

Added `tests/examples/llm_eval/test_llm_eval.py::test_lm_eval_hf` — an end-to-end test that builds a tiny qwen3 (no GPU required) and runs `lm_eval_hf.py` against MMLU with `--limit 0.1`. Verifies both the new HarnessCLI integration and the `HFLM.create_from_arg_obj` override actually execute. Existing FP8 PTQ test (`test_qwen3_eval_fp8`) was retargeted to the same tiny qwen3 helper so it no longer pulls down the 1.1B TinyLlama checkpoint.

### Before your PR is "*Ready for review*"

Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md) and your commits are signed (`git commit -s -S`). Make sure you read and follow the [Security Best Practices](https://github.com/NVIDIA/Model-Optimizer/blob/main/SECURITY.md#security-coding-practices-for-contributors) (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).

- Is this change backward compatible?: ❌ — drops support for lm-eval < 0.4.10. Requirements pins are bumped in the same PR; existing CLI invocations continue to work because HarnessCLI auto-inserts the `run` subcommand for legacy-style argv.
- If you copied code from any other sources or added a new PIP dependency, did you follow guidance in `CONTRIBUTING.md`: N/A
- Did you write any new necessary tests?: ✅
- Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?: N/A
- Did you get Claude approval on this PR?: ❌

### Additional Information

NVbug 6153721 — vLLM 0.20+ compatibility requires lm_eval 0.4.11. This PR pins to >= 0.4.10 to fix the import error; bumping the floor to 0.4.11 (or letting users opt in via vLLM extras) unblocks vLLM 0.20+ usage.

## Summary by CodeRabbit

* **Chores**
  * Require lm_eval >= 0.4.10 for evaluation examples and remove an lm-eval pin from a puzzletron example.
  * Improve LM-eval CLI integration so model-related CLI options (including trust_remote_code) are recognized and forwarded correctly.
* **Tests**
  * Added an end-to-end smoke test for the LM evaluation example.
  * Updated evaluation tests to use the new tiny model setup for more robust validation.

---------

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
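For illustration, a minimal sketch of the post-parse hand-off described above; the key names come from the PR text, but the surrounding code is hypothetical rather than the actual `lm_eval_hf.py`:

```python
# Illustrative sketch: move ModelOpt-specific CLI options out of the argparse
# namespace and into args.model_args after parsing. Names are hypothetical.
MODELOPT_ARG_KEYS = ("quant_cfg", "auto_quantize_bits", "calib_size", "compress", "sparse_cfg")


def move_modelopt_args_into_model_args(args) -> None:
    """Relocate ModelOpt keys so EvaluatorConfig.from_cli doesn't see unknown kwargs."""
    model_args = args.model_args if isinstance(args.model_args, dict) else {}
    for key in MODELOPT_ARG_KEYS:
        if hasattr(args, key):
            model_args[key] = getattr(args, key)
            delattr(args, key)  # remove from the namespace so the harness CLI ignores it
    args.model_args = model_args
```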
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tests/unit/torch/deploy/_runtime/tensorrt/test_engine_builder.py (1)
67-67: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Match mocked subprocess output type to runtime contract (`bytes`).

Line 67 returns `"success"` as `str`, but `_run_trtexec_with_logging` returns `bytes`. Keeping the mock as bytes avoids type-skew in tests.

Proposed fix

- mock_run.return_value = (0, "success")
+ mock_run.return_value = (0, b"success")

(Then update string assertions in this test file to `b"success"`.)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/torch/deploy/_runtime/tensorrt/test_engine_builder.py` at line 67, The test mocks subprocess output as a str but the runtime function _run_trtexec_with_logging returns bytes; change mock_run.return_value in test_engine_builder.py from (0, "success") to (0, b"success") and update any assertions comparing the captured output (e.g., equality checks against "success") to use b"success" so types match the runtime contract; look for uses of mock_run, _run_trtexec_with_logging-related assertions, and string comparisons in that test file and replace them with the bytes literal.

examples/llm_eval/lm_eval_hf.py (1)
193-217: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't overwrite user `--model_args` with parser defaults.

Lines 248-259 unconditionally copy every ModelOpt parser field into `args.model_args`, but several of those fields have defaults on Lines 193, 206, 217, plus the implicit `False` from Line 231. That means a call like `--model_args calib_size=1024,compress=True` gets silently rewritten back to `512`/`False` unless the same values are also passed as top-level flags.

`create_from_arg_obj()` already supplies these defaults, so the safer pattern is to leave parser defaults unset and only inject keys the user explicitly provided.

Proposed fix

     parser.add_argument(
-        "--calib_size", type=int, help="Calibration size for quantization", default=512
+        "--calib_size", type=int, help="Calibration size for quantization"
     )
@@
     parser.add_argument(
         "--auto_quantize_method",
         type=str,
-        default="gradient",
         choices=["gradient", "kl_div"],
         help=(
             "Method for auto_quantize sensitivity analysis. 'gradient' uses gradient-based method "
@@
     parser.add_argument(
         "--auto_quantize_score_size",
         type=int,
-        default=128,
         help=(
             "Number of samples to use for auto_quantize scoring. Most of auto_quantize time is spent on "
             "sensitivity score estimation, so reducing this speeds it up while only minimally affecting "
@@
     parser.add_argument(
         "--compress",
         action="store_true",
+        default=None,
         help="Compress the model after quantization",
     )
@@
     for key in _MODELOPT_ARG_KEYS:
         if hasattr(args, key):
-            model_args[key] = getattr(args, key)
+            value = getattr(args, key)
+            if value is not None:
+                model_args[key] = value
             delattr(args, key)

Also applies to: 229-232, 241-259
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/llm_eval/lm_eval_hf.py` around lines 193 - 217, The code is overwriting user-provided args.model_args with parser default values for ModelOpt fields (e.g., calib_size, auto_quantize_bits, auto_quantize_method, auto_quantize_score_size and the implicit booleans); instead, only merge keys that the user explicitly passed at the command line into args.model_args so create_from_arg_obj() can supply defaults. Modify the merge logic that currently copies every ModelOpt parser field into args.model_args to: detect which flags were explicitly set (via argparse's actions or by comparing against parser.get_default for each option) and only inject those keys into args.model_args (leave absent fields untouched so create_from_arg_obj() provides defaults). Ensure this change applies to the same merging blocks that handle calib_size/auto_quantize_* and any other ModelOpt fields referenced earlier.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/specdec_bench/specdec_bench/datasets/speed.py`:
- Around line 745-773: Add an early guard in _stratified_select to handle
non-positive n: if n <= 0 return dataset.select(range(n)) before checking
"category" presence or building cat_to_rows, so the interleaving code never runs
for n<=0 and the function returns an empty selection deterministically.
In `@modelopt/onnx/quantization/ort_utils.py`:
- Line 66: Remove the inline Bandit suppression by deleting the trailing "#
nosec B603" on the subprocess.run call so the security scan can validate it;
ensure the call remains using the argv list variable `cmd` (as already used) and
keeps the existing parameters (capture_output=True, text=True, timeout=timeout)
or replace with an explicit safer pattern if desired (e.g., subprocess.run(cmd,
capture_output=True, text=True, timeout=timeout, check=True))—locate the return
statement that invokes subprocess.run(cmd, ...) in ort_utils.py and remove only
the "# nosec B603" comment.
In `@modelopt/onnx/quantization/qdq_utils.py`:
- Around line 1264-1272: The current filter builds weight_dq_nodes by selecting
all DequantizeLinear nodes, causing index errors when activation DQ nodes (whose
inputs aren't in initializer_map) are processed; update the selection logic to
only include DequantizeLinear nodes whose input[0] is present in initializer_map
(i.e., backed by an initializer) before iterating. Locate the list comprehension
that defines weight_dq_nodes and change it to check initializer_map membership
of node.input[0] (or validate via get_tensor_producer_nodes/initializer_map) so
only weight-backed DequantizeLinear nodes are processed.
- Around line 1348-1359: The branch that replaces "Constant_output_0" with
"scale" and unconditionally appends a new initializer can create duplicate
initializer names; update the logic in qdq_utils.py around
scale_name/initializer_map/tensor_producer_map so that before calling
graph.initializer.append(...) you check whether an initializer with the target
name already exists (e.g., consult initializer_map or graph.initializer names)
and if it does, reuse that existing initializer and do not append a new one;
alternatively generate a unique name when needed and set node.input[1] to that
unique or existing initializer name to avoid duplicate initializers.
In `@modelopt/torch/_deploy/_runtime/tensorrt/engine_builder.py`:
- Around line 64-70: Remove the `# nosec` suppressions around the subprocess
invocation in engine_builder.py and harden the call instead: keep using
subprocess.run with a list (no shell=True), remove the comments,
validate/sanitize the `args` list before building `cmd = ["trtexec", *args]`
(reject or escape unsafe characters and ensure each entry is a string), verify
the executable exists (e.g., use shutil.which("trtexec") and raise a clear
error), and wrap subprocess.run in try/except to log and rethrow the caught
exception from the subprocess.run invocation so no security bypass comment is
needed; update references to subprocess.run, cmd, and args accordingly.
---
Outside diff comments:
In `@examples/llm_eval/lm_eval_hf.py`:
- Around line 193-217: The code is overwriting user-provided args.model_args
with parser default values for ModelOpt fields (e.g., calib_size,
auto_quantize_bits, auto_quantize_method, auto_quantize_score_size and the
implicit booleans); instead, only merge keys that the user explicitly passed at
the command line into args.model_args so create_from_arg_obj() can supply
defaults. Modify the merge logic that currently copies every ModelOpt parser
field into args.model_args to: detect which flags were explicitly set (via
argparse's actions or by comparing against parser.get_default for each option)
and only inject those keys into args.model_args (leave absent fields untouched
so create_from_arg_obj() provides defaults). Ensure this change applies to the
same merging blocks that handle calib_size/auto_quantize_* and any other
ModelOpt fields referenced earlier.
In `@tests/unit/torch/deploy/_runtime/tensorrt/test_engine_builder.py`:
- Line 67: The test mocks subprocess output as a str but the runtime function
_run_trtexec_with_logging returns bytes; change mock_run.return_value in
test_engine_builder.py from (0, "success") to (0, b"success") and update any
assertions comparing the captured output (e.g., equality checks against
"success") to use b"success" so types match the runtime contract; look for uses
of mock_run, _run_trtexec_with_logging-related assertions, and string
comparisons in that test file and replace them with the bytes literal.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: caa499f6-ea17-46a7-a8dd-11708cb15dd5
📒 Files selected for processing (22)
- examples/llm_eval/lm_eval_hf.py
- examples/llm_eval/requirements.txt
- examples/llm_sparsity/weight_sparsity/launch_finetune.sh
- examples/puzzletron/requirements.txt
- examples/specdec_bench/specdec_bench/datasets/speed.py
- modelopt/onnx/export/nvfp4_exporter.py
- modelopt/onnx/quantization/autotune/benchmark.py
- modelopt/onnx/quantization/ort_utils.py
- modelopt/onnx/quantization/qdq_utils.py
- modelopt/torch/_deploy/_runtime/tensorrt/constants.py
- modelopt/torch/_deploy/_runtime/tensorrt/engine_builder.py
- modelopt/torch/export/moe_utils.py
- modelopt/torch/export/unified_export_hf.py
- modelopt/torch/opt/plugins/transformers.py
- modelopt/torch/quantization/model_calib.py
- modelopt/torch/quantization/qtensor/nvfp4_tensor.py
- tests/examples/llm_eval/test_llm_eval.py
- tests/gpu/torch/quantization/test_nvfp4_static_quantizer_cuda.py
- tests/unit/onnx/quantization/test_qdq_utils.py
- tests/unit/torch/deploy/_runtime/tensorrt/test_engine_builder.py
- tests/unit/torch/quantization/plugins/test_fused_experts.py
- tests/unit/torch/quantization/test_calib.py
💤 Files with no reviewable changes (2)
- examples/puzzletron/requirements.txt
- modelopt/torch/_deploy/_runtime/tensorrt/constants.py
def _stratified_select(dataset: "Dataset", n: int) -> "Dataset":
    """Select ``n`` samples uniformly across the ``category`` column.

    Round-robin across categories until ``n`` rows are collected. The
    resulting prefix is balanced; once a smaller category is exhausted
    the remaining categories continue contributing, so exactly ``n``
    rows are returned whenever ``n`` does not exceed the dataset size.
    Falls back to ``range(n)`` when ``category`` is absent or there is
    only one category. Indices come from ``range(category_size)`` (not
    random) so behavior is deterministic.
    """
    if "category" not in dataset.column_names:
        return dataset.select(range(n))
    cat_to_rows: dict[str, list[int]] = {}
    for i, c in enumerate(dataset["category"]):
        cat_to_rows.setdefault(c, []).append(i)
    if len(cat_to_rows) <= 1:
        return dataset.select(range(n))
    cat_lists = list(cat_to_rows.values())
    interleaved: list[int] = []
    max_len = max(len(c) for c in cat_lists)
    for i in range(max_len):
        for c in cat_lists:
            if i < len(c):
                interleaved.append(c[i])
                if len(interleaved) == n:
                    return dataset.select(interleaved)
    return dataset.select(interleaved)
Guard non-positive sample counts before interleaving.
When n <= 0, this method can still return non-empty results for multi-category datasets (because the round-robin loop appends before any len(interleaved) == n check can succeed). That makes num_samples=0 behave incorrectly.
Suggested fix
`@staticmethod`
def _stratified_select(dataset: "Dataset", n: int) -> "Dataset":
+ if n <= 0:
+ return dataset.select([])
+
if "category" not in dataset.column_names:
        return dataset.select(range(n))

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/specdec_bench/specdec_bench/datasets/speed.py` around lines 745 -
773, Add an early guard in _stratified_select to handle non-positive n: if n <=
0 return dataset.select(range(n)) before checking "category" presence or
building cat_to_rows, so the interleaving code never runs for n<=0 and the
function returns an empty selection deterministically.
| """ | ||
| cmd = ["trtexec", *(args or [])] | ||
| try: | ||
| return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout) # nosec B603 |
Remove the inline Bandit suppression.
This subprocess call is already using an argv list, so the new # nosec B603 is just a policy violation on a security-sensitive path and should not be merged as-is.
Suggested fix
- return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout) # nosec B603
+ return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)

As per coding guidelines, "Bandit security checks must pass without exceptions. # nosec comments are not allowed as a bypass for security checks."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)  # nosec B603
+ return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modelopt/onnx/quantization/ort_utils.py` at line 66, Remove the inline Bandit
suppression by deleting the trailing "# nosec B603" on the subprocess.run call
so the security scan can validate it; ensure the call remains using the argv
list variable `cmd` (as already used) and keeps the existing parameters
(capture_output=True, text=True, timeout=timeout) or replace with an explicit
safer pattern if desired (e.g., subprocess.run(cmd, capture_output=True,
text=True, timeout=timeout, check=True))—locate the return statement that
invokes subprocess.run(cmd, ...) in ort_utils.py and remove only the "# nosec
B603" comment.
    weight_dq_nodes = [node for node in graph.node if node.op_type == "DequantizeLinear"]
    tensor_producer_map = get_tensor_producer_nodes(graph)

    nodes_to_remove = []
    for node in weight_dq_nodes:
        weight_name = node.input[0]
        scale_name = node.input[1]
        logger.debug(f"Processing INT4 conversion for weight {weight_name}")
        weight = numpy_helper.to_array(initializer_map[weight_name])
Filter this shim to weight-backed DequantizeLinear nodes only.
Right now weight_dq_nodes includes every DequantizeLinear in the graph. The first activation DQ will fail at initializer_map[weight_name] because its input is not an initializer. That makes the compatibility shim unusable on mixed QDQ graphs even though only weight DQs should be rewritten here.
Suggested fix
- weight_dq_nodes = [node for node in graph.node if node.op_type == "DequantizeLinear"]
+ weight_dq_nodes = [
+ node
+ for node in graph.node
+ if node.op_type == "DequantizeLinear"
+ and node.input
+ and node.input[0] in initializer_map
+ ]

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modelopt/onnx/quantization/qdq_utils.py` around lines 1264 - 1272, The
current filter builds weight_dq_nodes by selecting all DequantizeLinear nodes,
causing index errors when activation DQ nodes (whose inputs aren't in
initializer_map) are processed; update the selection logic to only include
DequantizeLinear nodes whose input[0] is present in initializer_map (i.e.,
backed by an initializer) before iterating. Locate the list comprehension that
defines weight_dq_nodes and change it to check initializer_map membership of
node.input[0] (or validate via get_tensor_producer_nodes/initializer_map) so
only weight-backed DequantizeLinear nodes are processed.
        if scale_name not in initializer_map:
            # Remove scale producer if it's a Constant node
            scale_name = node.input[1]
            scale_producer = tensor_producer_map[scale_name]
            if scale_producer.op_type == "Constant":
                graph.node.remove(scale_producer)

            # Create a new scale tensor
            scale_name = scale_name.replace("Constant_output_0", "scale")
            scale_tensor = onnx.numpy_helper.from_array(scale, scale_name)
            graph.initializer.append(scale_tensor)
            node.input[1] = scale_name
Don't append a second initializer with the same replacement name.
In the constant-scale path, Constant_output_0 is renamed to scale and then always appended as a new initializer. The fixture added in this PR already carries an unused scale initializer, so this branch produces duplicate initializer names, which makes the ONNX graph ambiguous/invalid.
Suggested fix
- scale_tensor = onnx.numpy_helper.from_array(scale, scale_name)
- graph.initializer.append(scale_tensor)
- node.input[1] = scale_name
+ scale_tensor = onnx.numpy_helper.from_array(scale, scale_name)
+ if scale_name in initializer_map:
+ initializer_map[scale_name].CopyFrom(scale_tensor)
+ else:
+ graph.initializer.append(scale_tensor)
+ initializer_map[scale_name] = graph.initializer[-1]
+ node.input[1] = scale_name

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modelopt/onnx/quantization/qdq_utils.py` around lines 1348 - 1359, The branch
that replaces "Constant_output_0" with "scale" and unconditionally appends a new
initializer can create duplicate initializer names; update the logic in
qdq_utils.py around scale_name/initializer_map/tensor_producer_map so that
before calling graph.initializer.append(...) you check whether an initializer
with the target name already exists (e.g., consult initializer_map or
graph.initializer names) and if it does, reuse that existing initializer and do
not append a new one; alternatively generate a unique name when needed and set
node.input[1] to that unique or existing initializer name to avoid duplicate
initializers.
import subprocess  # nosec

cmd = ["trtexec", *args]
logging.info(" ".join(cmd))
with NamedTemporaryFile("w+b") as log:
    p = subprocess.Popen(cmd, stdout=log, stderr=log, cwd=str(cwd) if cwd else None)  # nosec
    p.wait()
    log.seek(0)
    output = log.read()
    if p.returncode != 0:
        logging.error(output.decode(errors="ignore"))
    return p.returncode, output
try:
    result = subprocess.run(  # nosec B603 - cmd[0] is hardcoded "trtexec"
        cmd,
Remove # nosec suppressions in subprocess invocation.
Line 64 and Line 69 use # nosec bypasses. This violates repo security policy and must be removed (or escalated with explicit approved exception flow in PR metadata).
Proposed fix
- import subprocess # nosec
+ import subprocess
@@
- result = subprocess.run( # nosec B603 - cmd[0] is hardcoded "trtexec"
+ result = subprocess.run(
cmd,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
cwd=str(cwd) if cwd else None,
timeout=3600,
    )

As per coding guidelines: "Any use of '# nosec' comments to bypass Bandit security checks is not allowed."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modelopt/torch/_deploy/_runtime/tensorrt/engine_builder.py` around lines 64 -
70, Remove the `# nosec` suppressions around the subprocess invocation in
engine_builder.py and harden the call instead: keep using subprocess.run with a
list (no shell=True), remove the comments, validate/sanitize the `args` list
before building `cmd = ["trtexec", *args]` (reject or escape unsafe characters
and ensure each entry is a string), verify the executable exists (e.g., use
shutil.which("trtexec") and raise a clear error), and wrap subprocess.run in
try/except to log and rethrow the caught exception from the subprocess.run
invocation so no security bypass comment is needed; update references to
subprocess.run, cmd, and args accordingly.
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@           Coverage Diff            @@
## release/0.44.0 #1426 +/- ##
==================================================
- Coverage 77.19% 76.69% -0.51%
==================================================
Files 467 466 -1
Lines 50581 50778 +197
==================================================
- Hits 39048 38944 -104
- Misses 11533 11834 +301
Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Cherry-picked PRs
Summary by CodeRabbit
New Features
Bug Fixes
Improvements
Tests